TAILIEUCHUNG - Scientific report: "A Bio-inspired Approach for Multi-Word Expression Extraction"

A Bio-inspired Approach for Multi-Word Expression Extraction

Jianyong Duan, Ruzhan Lu, Weilin Wu, Yi Hu
Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China
duanjy@ lu-rz wl-wu huyi @

Yan Tian
School of Foreign Languages, Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China
tianyan@ cn

Abstract

This paper proposes a new approach to Multi-word Expression (MWE) extraction, motivated by gene sequence alignment, because a textual sequence is similar to a gene sequence in pattern analysis. The theory of the Longest Common Subsequence (LCS) originates in computer science and has been established as the affine gap model in bioinformatics. We apply this developed LCS technique, combined with linguistic criteria, to MWE extraction. In comparison with the traditional n-gram method, which is the major technique for MWE extraction, the LCS approach is applied with great efficiency and a performance guarantee. Experimental results show that the LCS-based approach achieves better results than n-gram.

1 Introduction

Language is under continuous development. People enlarge their vocabulary and let words carry more meanings. Meanwhile, language also develops larger lexical units to carry specific meanings, namely MWEs, which include compounds, phrases, technical terms, idioms, collocations, etc.
The MWE has a relatively fixed pattern because every MWE denotes a whole concept. From a computational view, the MWE repeats itself constantly in a corpus (Taneli 2003). The extraction of MWEs plays an important role in several areas, such as machine translation (Pascale 1997) and information extraction (Kalliopi 2000). On the other hand, there is also a need for MWE extraction in a much more widespread scenario, namely that of human translation and technical writing. Many efforts have been devoted to the study of MWE extraction (Beatrice 2003; Ivan 2002; Jordi 2001). These statistical methods detect MWEs by frequency of candidate .
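
To make the LCS idea from the abstract concrete, the following is a minimal sketch of the classic dynamic-programming LCS applied to word tokens rather than characters or nucleotides. It is not the affine gap variant the paper develops, and it applies no linguistic criteria. The function name lcs_tokens and the two example sentences are assumptions introduced here for illustration only.

```python
from typing import List


def lcs_tokens(a: List[str], b: List[str]) -> List[str]:
    """Return one longest common subsequence of two token lists.

    Classic O(len(a) * len(b)) dynamic programme; illustrative only,
    not the paper's affine gap model.
    """
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right cell to recover the tokens.
    out: List[str] = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    out.reverse()
    return out


# Hypothetical sentences: the shared expression is interrupted by
# different words in each, so no contiguous trigram matches.
s1 = "the committee will take the proposal into account".split()
s2 = "we must take all costs into account".split()
print(lcs_tokens(s1, s2))  # ['take', 'into', 'account']
```

In this example the expression "take ... into account" is recovered even though other words intervene, which a contiguous n-gram window would miss. The sketch treats every gap as free; an affine gap model would instead penalize opening and extending gaps.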
