Scientific report: "A Term Recognition Approach to Acronym Recognition"

Naoaki Okazaki (Research Fellow of the Japan Society for the Promotion of Science, JSPS)
Graduate School of Information Science and Technology, The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
okazaki@

Sophia Ananiadou
National Centre for Text Mining, School of Informatics, Manchester University
PO Box 88, Sackville Street, Manchester M60 1QD, United Kingdom

Abstract

We present a term recognition approach to extract acronyms and their definitions from a large text collection. Parenthetical expressions appearing in a text collection are identified as potential acronyms. Assuming terms appearing frequently in the proximity of an acronym to be the expanded forms (definitions) of the acronyms, we apply a term recognition method to enumerate such candidates and to measure the likelihood scores of the expanded forms. Based on the list of the expanded forms and their likelihood scores, the proposed algorithm determines the final acronym-definition pairs. The proposed method, combined with a letter matching algorithm, achieved 78% precision and 85% recall on an evaluation corpus with 4,212 acronym-definition pairs.

1 Introduction

In the biomedical literature, the amount of terms (names of genes, proteins, chemical compounds, drugs, organisms, etc.) is increasing at an astounding rate. Existing terminological resources and scientific databases (such as Swiss-Prot, SGD, FlyBase, and UniProt) cannot keep up-to-date with the growth of neologisms (Pustejovsky et al., 2001). Although curation teams maintain terminological resources, integrating neologisms is very difficult if not based on systematic extraction and collection of terminology from literature. Term identification in literature is one of the major bottlenecks in processing information in biology as it faces many challenges
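
The pipeline summarized in the abstract (find parenthetical expressions, enumerate the word sequences that precede them, score those sequences by how often they occur) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the regular expression for acronyms, the five-word window, and the scoring rule, which discounts a candidate by the frequency of its one-word-longer variants. The excerpt does not give the authors' likelihood formula, and the first-letter check is only a simplified stand-in for the letter matching algorithm the abstract mentions.

```python
import re
from collections import Counter, defaultdict

# Assumed pattern: a short token in parentheses that starts with a capital letter.
ACRONYM = re.compile(r"\(([A-Z][A-Za-z0-9-]{1,9})\)")


def collect_candidates(sentences, max_words=5):
    """Count the word n-grams that end right before each '(ACRONYM)'."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for match in ACRONYM.finditer(sentence):
            left_context = sentence[:match.start()].lower()
            words = re.findall(r"[a-z0-9-]+", left_context)
            for n in range(1, min(max_words, len(words)) + 1):
                counts[match.group(1)][tuple(words[-n:])] += 1
    return counts


def score_candidates(ngram_counts):
    """Frequency minus a discount for occurrences explained by longer candidates.

    Illustrative rule (an assumption): if one longer variant accounts for most
    occurrences of a candidate, the shorter candidate is probably a fragment.
    """
    extensions = defaultdict(list)
    for ngram, freq in ngram_counts.items():
        if len(ngram) > 1:
            # ngram extends the shorter candidate ngram[1:] by one word on the left.
            extensions[ngram[1:]].append(freq)
    scores = {}
    for ngram, freq in ngram_counts.items():
        longer = extensions.get(ngram, [])
        discount = sum(f * f for f in longer) / sum(longer) if longer else 0.0
        scores[ngram] = freq - discount
    return scores


def best_definition(sentences, acronym, max_words=5):
    """Return the best-scoring candidate whose first letter matches the acronym."""
    counts = collect_candidates(sentences, max_words).get(acronym)
    if not counts:
        return None
    scores = score_candidates(counts)
    # Simplified stand-in for letter matching: compare first letters only.
    matching = {g: s for g, s in scores.items() if g[0][0] == acronym[0].lower()}
    if not matching:
        return None
    best = max(matching, key=lambda g: (matching[g], len(g)))
    return " ".join(best)


if __name__ == "__main__":
    sentences = [
        "We measured the thyroid transcription factor (TTF) in all samples.",
        "Expression of thyroid transcription factor (TTF) was reduced.",
        "Binding requires a thyroid transcription factor (TTF) site.",
    ]
    print(best_definition(sentences, "TTF"))
```

In this toy input, "factor" and "transcription factor" occur as often as the full form, but every one of their occurrences is covered by a single longer variant, so the discount removes them. The full form is preceded by a different word each time, so it keeps most of its score.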
