Scientific report: "Extracting Hypernym Pairs from the Web"

Erik Tjong Kim Sang
ISLA, Informatics Institute
University of Amsterdam
erikt@

Abstract

We apply pattern-based methods for collecting hypernym relations from the web. We compare our approach with hypernym extraction from morphological clues and from large text corpora. We show that the abundance of available data on the web enables obtaining good results with relatively unsophisticated techniques.

1 Introduction

WordNet is a key lexical resource for natural language applications. However, its coverage (currently 155k synsets for the English WordNet) is far from complete. For languages other than English, the available WordNets are considerably smaller, for example the 44k-synset WordNet for Dutch. Here, the lack of coverage creates bigger problems. A manual extension of the WordNets is costly. Currently, there is a lot of interest in automatic techniques for updating and extending taxonomies like WordNet.

Hearst (1992) was the first to apply fixed syntactic patterns like "such NP as NP" for extracting hypernym-hyponym pairs. Carballo (1999) built noun hierarchies from evidence collected from conjunctions. Pantel, Ravichandran and Hovy (2004) learned syntactic patterns for identifying hypernym relations and combined these with clusters built from co-occurrence information. Recently, Snow, Jurafsky and Ng (2005) generated tens of thousands of hypernym patterns and combined these with noun clusters to generate high-precision suggestions for unknown noun insertion into WordNet (Snow et al., 2006). The previously mentioned papers deal with English. Little work has been done for other languages. IJzereef (2004) used fixed patterns to extract Dutch hypernyms from text and encyclopedias. Van der Plas and Bouma (2005) employed noun distribution characteristics for extending the Dutch part of EuroWordNet.

In earlier work, different techniques have been applied to large and very large text corpora. Today, the web contains more data than the largest ...
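
To make the idea of a fixed syntactic pattern concrete, the following is a minimal sketch of how the "such NP as NP" pattern mentioned above could be matched with a regular expression. It is an illustration only, not the method used in this paper or in the cited work. The names `NP`, `SEP`, `SUCH_AS` and `extract_pairs` are hypothetical, and a noun phrase is crudely approximated by a single alphabetic word, whereas a real system would use a part-of-speech tagger or chunker.

```python
import re

# Assumption for illustration: a noun phrase is a single alphabetic word.
NP = r"[A-Za-z]+"

# Separator between list items: a comma (optionally followed by and/or),
# or a bare "and"/"or" surrounded by whitespace.
SEP = r"(?:\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+)"

# "such NP as NP, NP, ... and NP"
SUCH_AS = re.compile(
    rf"\bsuch\s+(?P<hyper>{NP})\s+as\s+(?P<hypos>{NP}(?:{SEP}{NP})*)",
    re.IGNORECASE,
)


def extract_pairs(text):
    """Return (hyponym, hypernym) pairs found by the 'such NP as NP' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group("hyper").lower()
        # Every word in the list part is a candidate hyponym,
        # except the conjunctions that join the items.
        for word in re.findall(NP, match.group("hypos")):
            if word.lower() not in {"and", "or"}:
                pairs.append((word.lower(), hypernym))
    return pairs


if __name__ == "__main__":
    sentence = "He read such authors as Herrick, Goldsmith, and Shakespeare."
    for hyponym, hypernym in extract_pairs(sentence):
        print(f"{hyponym} is-a {hypernym}")
```

Because the noun-phrase approximation is so coarse, a sketch like this over-generates on real text (any word following "and" is accepted as a list item), which is one reason pattern output is usually filtered or ranked by frequency.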
