Scientific report: "Classifying Biological Full-Text Articles for Multi-Database Curation"

Wen-Juan Hou, Chih Lee and Hsin-Hsi Chen
Department of Computer Science and Information Engineering
National Taiwan University, Taipei, Taiwan
wjhou clee @ hhchen@

Abstract

In this paper, we propose an approach for identifying curatable articles from a large document set. This system considers three parts of an article (title and abstract, MeSH terms, and captions) as its three individual representations and utilizes two domain-specific resources (UMLS and a tumor name list) to reveal the deep knowledge contained in the article. An SVM classifier is trained and cross-validation is employed to find the best combination of representations. The experimental results show overall high performance.

1 Introduction

Organism databases play a crucial role in genomic and proteomic research. They store the up-to-date profile of each gene of the species of interest. For example, the Mouse Genome Database (MGD) provides essential integration of experimental knowledge for the mouse system with information annotated from both literature and online sources (Bult et al., 2004). To provide biomedical scientists with easy access to complete and accurate information, curators have to constantly update databases with new information. With the rapidly growing rate of publication, it is impossible for curators to read every published article.

Since fully automated curation systems have not met the strict requirement of high accuracy and recall, database curators still have to read some, if not all, of the articles sent to them. Therefore, it would be very helpful if a classification system could correctly identify the curatable or relevant articles among a large number of biological articles. Recently, several attempts have been made to classify documents from the biomedical domain (Hirschman et al., 2002). Couto et al. (2004) used the information extracted from related web resources to classify biomedical literature. Hou .
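The selection step summarized in the abstract (score each combination of the three representations by cross-validation and keep the best) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names `title_abstract`, `mesh` and `captions`, all function names, and the fold assignment by index are hypothetical, and the classifier is a toy token-overlap stand-in used in place of the SVM so that the sketch needs only the Python standard library. The UMLS and tumor name list features are not modeled.

```python
from collections import Counter
from itertools import combinations

# Hypothetical field names for the three article parts named in the abstract.
REPRESENTATIONS = ("title_abstract", "mesh", "captions")


def features(doc, combo):
    """Bag-of-words counts; each token is tagged with the representation it came from."""
    counts = Counter()
    for rep in combo:
        for token in doc[rep].lower().split():
            counts[(rep, token)] += 1
    return counts


def train(docs, labels, combo):
    """Toy stand-in for SVM training (assumption): sum feature counts per class."""
    profiles = {}
    for doc, label in zip(docs, labels):
        profiles.setdefault(label, Counter()).update(features(doc, combo))
    return profiles


def predict(profiles, doc, combo):
    """Return the class whose profile overlaps most with the document's features."""
    feats = features(doc, combo)

    def overlap(label):
        return sum(min(count, profiles[label][feat]) for feat, count in feats.items())

    # Sorting the labels makes tie-breaking deterministic (smallest label wins a tie).
    return max(sorted(profiles), key=overlap)


def cross_val_accuracy(docs, labels, combo, k):
    """k-fold cross-validation; fold i holds out documents whose index % k == i."""
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(docs)) if i % k != fold]
        test_idx = [i for i in range(len(docs)) if i % k == fold]
        model = train([docs[i] for i in train_idx], [labels[i] for i in train_idx], combo)
        correct += sum(predict(model, docs[i], combo) == labels[i] for i in test_idx)
    return correct / len(docs)


def best_combination(docs, labels, k=2):
    """Score every non-empty subset of representations; return the best one and all scores."""
    combos = [c for r in range(1, len(REPRESENTATIONS) + 1)
              for c in combinations(REPRESENTATIONS, r)]
    scores = {c: cross_val_accuracy(docs, labels, c, k) for c in combos}
    return max(combos, key=lambda c: scores[c]), scores
```

Calling `best_combination(docs, labels)` with a list of dictionaries keyed by the three field names and a parallel list of labels returns the highest-scoring subset together with the accuracy of every subset. In a real setting the `train` and `predict` functions would be replaced by an actual SVM, and accuracy by whatever evaluation measure the task prescribes.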
