Scientific report: "Evolving new lexical association measures using genetic programming"

Jan Snajder, Bojana Dalbelo Basic, Sasa Petrovic, Ivan Sikiric
Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, Zagreb, Croatia

Abstract

Automatic extraction of collocations from large corpora has been the focus of many research efforts. Most approaches concentrate on improving and combining known lexical association measures. In this paper, we describe a genetic programming approach for evolving new association measures, which is not limited to any specific language, corpus, or type of collocation. Our preliminary experimental results show that the evolved measures outperform three known association measures.

1 Introduction

A collocation is an expression consisting of two or more words that correspond to some conventional way of saying things (Manning and Schütze, 1999). Related to the term collocation is the term n-gram, which is used to denote any sequence of n words. There are many possible applications of collocations: automatic language generation, word sense disambiguation, improving text categorization, information retrieval, etc. As different applications require different types of collocations that are often not found in dictionaries, automatic extraction of collocations from large textual corpora has been the focus of much research in the last decade (see, for example, Pecina and Schlesinger, 2006; Evert and Krenn, 2005).

Automatic extraction of collocations is usually performed by employing lexical association measures (AMs) to indicate how strongly the words comprising an n-gram are associated. However, the use of lexical AMs for the purpose of collocation extraction has reached a plateau: recent research in this field has focused on combining the existing AMs in the hope of improving the results (Pecina and Schlesinger, 2006). In this paper we propose an approach for deriving new AMs for collocation extraction based on .
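
To make the notion of a lexical association measure concrete, the following minimal sketch scores every bigram in a token list with pointwise mutual information (PMI), a widely used AM. It is an illustration only: the text above does not say which measures the paper evaluates, and the function name `bigram_pmi`, the toy sentence, and the maximum-likelihood probability estimates are all assumptions made for this example.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI score} for every adjacent word pair in tokens.

    Hypothetical helper for illustration; probabilities are plain
    relative frequencies with no smoothing.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi          # estimated P(w1 w2)
        p_x = unigrams[w1] / n_uni   # estimated P(w1)
        p_y = unigrams[w2] / n_uni   # estimated P(w2)
        # PMI compares the observed joint probability with the one
        # expected if the two words occurred independently.
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus (an assumption for the example, not data from the paper).
tokens = "new york is big and new york is old".split()
scores = bigram_pmi(tokens)

# Rank candidate collocations from most to least strongly associated.
ranked = sorted(scores, key=scores.get, reverse=True)
```

On such a tiny sample, pairs that occur only once can outrank the repeated pair "new york", which reflects PMI's known tendency to favour rare events; in practice a minimum frequency threshold is usually applied before ranking.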
