Scientific report: "SEXTANT: EXPLORING UNEXPLORED CONTEXTS FOR SEMANTIC EXTRACTION FROM SYNTACTIC ANALYSIS"

Gregory Grefenstette
Computer Science Department, University of Pittsburgh, Pittsburgh, PA 15260
grefen@cs.pitt.edu

Abstract

For a very long time, it has been considered that the only way of automatically extracting similar groups of words from a text collection for which no semantic information exists is to use document co-occurrence data. But with robust syntactic parsers becoming more frequently available, syntactically recognizable phenomena about word usage can be confidently noted in large collections of texts. We present here a new system called SEXTANT, which uses these parsers and the finer-grained contexts they produce to judge word similarity.

BACKGROUND

Many machine-based approaches to term similarity, such as those found in TRUMP (Jacobs and Zernik 1988) and FERRET (Mauldin 1991), can be characterized as knowledge-rich in that they presuppose that known lexical items possess Conceptual Dependence (CD)-like descriptions. Such an approach necessitates a great amount of manual encoding of semantic information and suffers from the drawbacks of cost (in terms of initial coding, coherence checking, maintenance after modifications, and costs derivable from a host of other software engineering concerns); of domain dependence (a semantic structure developed for one domain would not be applicable to another; for example, sugar would have very different semantic relations in a medical domain than in a commodities exchange domain); and of rigidity (even within a well-established domain, new subdomains spring up, e.g. AIDS; can hand-coded systems keep up with new discoveries and new relations with an acceptable latency?).

In the Information Retrieval community, researchers have consistently considered that the linguistic apparatus required for effective domain-independent analysis is not yet at hand, and have concentrated on counting document co-occurrence statistics (Peat and Willett 1991), based on the idea ...
