Scientific report: "Beyond Lexical Units: Enriching Wordnets with Phrasets"

Luisa Bentivogli, Emanuele Pianta
ITC-irst, Trento, Italy
bentiVO pianta @

Abstract

In this paper we present a proposal to extend WordNet-like lexical databases by adding phrasets, i.e. sets of free combinations of words which are recurrently used to express a concept (let's call them recurrent free phrases). Phrasets are a useful source of information for different NLP tasks, and particularly in a multilingual environment to manage lexical gaps. Two experiments are presented to check the possibility of acquiring recurrent free phrases from dictionaries and corpora.

1 Introduction

WordNet (Fellbaum, 1998) is a popular lexical database for English in which content words are organized into sets of synonyms (synsets), each representing one underlying lexical concept. Words and concepts are further connected through various lexical and semantic relations. WordNet has been widely adopted in the NLP community for a variety of practical tasks such as word sense disambiguation, question answering, information retrieval, summarization, etc. The English WordNet database is being used as a basis for the development of different multilingual databases such as EuroWordNet, MultiWordNet, and the recent BalkaNet project. To make it more useful in NLP applications, WordNet is constantly updated and extended with different kinds of information, such as domain information, syntactic information, topic signatures, syntactic parsing and PoS tagging of the glosses, etc.
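
As a rough illustration of the synset organization described above, and of the phrasets proposed in the abstract, the following Python sketch models a synset that carries both lexical units and a phraset. The class name, field names, helper methods, and the example entries are all assumptions made for illustration; they are not taken from the paper or from any WordNet implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One concept, expressed by lexical units and, optionally, a phraset.

    Hypothetical structure for illustration only; not the authors' schema.
    """
    concept_id: str
    language: str
    # Lexical units (synonyms) that lexicalize the concept.
    lexical_units: set[str] = field(default_factory=set)
    # Proposed extension: recurrent free phrases expressing the same concept.
    phraset: set[str] = field(default_factory=set)

    def expressions(self) -> set[str]:
        """All ways of expressing the concept: lexical units plus free phrases."""
        return self.lexical_units | self.phraset

    def is_lexical_gap(self) -> bool:
        """True when the language has no lexical unit for the concept."""
        return not self.lexical_units


# Made-up example entries (assumed for illustration, not data from the paper).
english = Synset("cycle.v", "en", lexical_units={"cycle", "bike"})
italian = Synset("cycle.v", "it", phraset={"andare in bicicletta"})

print(english.is_lexical_gap())
print(italian.is_lexical_gap())
print(italian.expressions())
```

In this sketch a concept with an empty set of lexical units is treated as a lexical gap, and the phraset still gives a way to link it to expressions in the other language.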

In this paper we propose to extend the WordNet model by adding a new data structure called phraset. A phraset is a set of free combinations of words (as opposed to lexical units) which are recurrently used to express a concept. Phrasets can provide useful information for different kinds of NLP tasks, both in a monolingual and in a multilingual environment. For instance, phrasets can be useful for knowledge-based word alignment of parallel corpora, to find correspondences when one language has a
