An Endogeneous Corpus-Based Method for Structural Noun Phrase Disambiguation

Didier Bourigault
Centre d'Analyse et de Mathématiques Sociales, EHESS - Paris Sorbonne - CNRS
and
Electricité de France - Direction des Etudes et Recherches, Service Informatique et Mathématiques Appliquées
1 avenue du Général de Gaulle, 92141 Clamart Cedex, FRANCE

Abstract

In this paper, we describe a method for structural noun phrase disambiguation which relies mainly on the examination of the text corpus under analysis and does not need to integrate any domain-dependent lexico- or syntactico-semantic information. This method is implemented in the Terminology Extraction Software LEXTER. We first explain why the integration of LEXTER in the LEXTER-K project, which aims at building a tool for knowledge extraction from large technical text corpora, requires improving the quality of the terminology extracted by LEXTER. Then we briefly describe the way LEXTER works and show what kind of disambiguation it has to perform when parsing maximal-length noun phrases. We introduce a method of disambiguation which relies on a very simple idea: whenever LEXTER has to choose among several competing noun sub-groups in order to disambiguate a maximal-length noun phrase, it checks whether each of these sub-groups occurs anywhere else in the corpus in a non-ambiguous situation, and then it makes a choice. The analysis of a half-a-million-word corpus resulted in an efficient strategy of disambiguation. The average rates are: 27 % no disambiguation, 70 % correct disambiguation, 3 % wrong disambiguation.

1 The LEXTER-K project: knowledge extraction from large technical text corpora

LEXTER is a Terminology Extraction Software (Bourigault, 1992a, 1992b). A corpus of French-language texts on any technical subject is fed in. LEXTER performs a grammatical analysis of this corpus and yields a list of noun phrases which are likely to be terminological units representing the concepts of the subject field. This list together with the corpus it has ...
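
As an illustration of the disambiguation idea stated in the abstract, the following is a minimal Python sketch, not LEXTER's actual implementation. The function names (`collect_attested`, `disambiguate`), the representation of a sub-group as a tuple of words, and the example phrase are all assumptions made for this sketch. In particular, the rule of abstaining when zero or several competing sub-groups are attested is a simplification; the text above does not specify how LEXTER resolves such cases.

```python
from typing import Iterable, Optional, Sequence, Set, Tuple

# Hypothetical representation: a noun sub-group is a tuple of word forms.
SubGroup = Tuple[str, ...]


def collect_attested(unambiguous_phrases: Iterable[Sequence[str]]) -> Set[SubGroup]:
    """Collect the sub-groups that occur in the corpus in a non-ambiguous situation."""
    return {tuple(phrase) for phrase in unambiguous_phrases}


def disambiguate(candidates: Sequence[SubGroup],
                 attested: Set[SubGroup]) -> Optional[SubGroup]:
    """Choose among competing sub-groups of a maximal-length noun phrase.

    Returns the single candidate that is attested elsewhere in the corpus.
    Returns None ("no disambiguation") when no candidate, or more than one,
    is attested; this tie-breaking rule is an assumption of the sketch.
    """
    supported = [c for c in candidates if c in attested]
    return supported[0] if len(supported) == 1 else None


# Invented example: in "vanne d'alimentation électrique", the adjective may
# attach to "alimentation" or to "vanne".
candidates = [("alimentation", "électrique"), ("vanne", "électrique")]

# Invented corpus evidence: "alimentation électrique" also occurs on its own.
attested = collect_attested([["alimentation", "électrique"],
                             ["circuit", "primaire"]])

choice = disambiguate(candidates, attested)
print(choice)
```

The point of the design is that the only evidence consulted is the corpus itself: no domain lexicon or semantic resource is needed, and the procedure simply declines to choose when the corpus offers no unambiguous occurrence.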
