TAILIEUCHUNG - Scientific report: "Chinese Term Extraction Using Different Types of Relevance"

Chinese Term Extraction Using Different Types of Relevance

Yuhang Yang1, Tiejun Zhao1, Qin Lu2, Dequan Zheng1 and Hao Yu1
1School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
yhyang tjzhao dqzheng yu @
2Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
csluqin@

Abstract

This paper presents a new term extraction approach using relevance between term candidates calculated by a link-analysis-based method. Different types of relevance are used separately or jointly for term verification. The proposed approach requires no prior domain knowledge and no adaptation for new domains. Consequently, the method can be used on any domain corpus, and it is especially useful for resource-limited domains. Evaluations conducted on two different domains for Chinese term extraction show significant improvements over existing techniques and also verify the efficiency and relatively domain-independent nature of the approach.

1 Introduction

Terms are the lexical units that represent the most fundamental knowledge of a domain. Term extraction is an essential task in domain knowledge acquisition, which can be used for lexicon update, domain ontology construction, etc. Term extraction involves two steps. The first step extracts candidates by unithood calculation to qualify a string as a valid term. The second step verifies them through termhood measures (Kageura and Umino, 1996) to validate their domain specificity. Many previous studies have been conducted on term candidate extraction.
Other tasks, such as named entity recognition, meaningful word extraction, and unknown word detection, use techniques similar to those for term candidate extraction, but their focus is not on domain specificity. This study focuses on the verification of candidates by termhood calculation. Relevance between term candidates and documents is the most popular feature used for term verification, such as TF-IDF Salton and McGill 1983 Frank
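
To make the candidate-document relevance feature mentioned above concrete, the following is a minimal sketch of plain TF-IDF scoring for term candidates. It illustrates only the conventional baseline feature, not the link-analysis-based relevance that the paper proposes. The function name tf_idf, the toy documents, and the candidate list are all hypothetical, and the sketch assumes the documents have already been segmented into tokens and that candidates are single tokens.

```python
import math
from collections import Counter


def tf_idf(candidates, documents):
    """Score candidate terms against documents with plain TF-IDF.

    candidates: candidate term strings (assumed already extracted).
    documents: list of documents, each a list of tokens (assumed segmented).
    Returns a dict mapping (term, document index) -> score, with entries
    only for documents in which the term occurs.
    """
    n_docs = len(documents)
    counts = [Counter(doc) for doc in documents]
    scores = {}
    for term in candidates:
        # Document frequency: number of documents containing the term.
        df = sum(1 for c in counts if c[term] > 0)
        if df == 0:
            continue
        idf = math.log(n_docs / df)
        for i, c in enumerate(counts):
            if c[term] > 0:
                # Term frequency normalized by document length.
                tf = c[term] / len(documents[i])
                scores[(term, i)] = tf * idf
    return scores


# Toy data, purely illustrative (not taken from the paper).
documents = [
    ["neural", "network", "model", "the"],
    ["the", "market", "price"],
    ["the", "neural", "model"],
]
candidates = ["neural", "the", "market"]
scores = tf_idf(candidates, documents)
```

In this toy setup a token that appears in every document, such as "the", receives a score of zero, while a token concentrated in one document, such as "market", scores highest. That contrast is the intuition behind using document relevance as a termhood signal.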
