Scientific report: "Expansion of Multi-Word Terms for Indexing and Retrieval Using Morphology and Syntax"

Evelyne Tzoukermann
Bell Laboratories, Lucent Technologies
700 Mountain Avenue, 2D-448, Box 636
Murray Hill, NJ 07974, USA

Christian Jacquemin
Institut de Recherche en Informatique de Nantes
BP 92208, 2 chemin de la Houssinière
44322 NANTES Cedex 3, FRANCE

Judith L. Klavans
Center for Research on Information Access
Columbia University
535 W. 114th Street, MC 1101
New York, NY 10027, USA

Abstract

A system for the automatic production of controlled index terms is presented using linguistically-motivated techniques. This includes a finite-state part-of-speech tagger, a derivational morphological processor for analysis and generation, and a unification-based shallow-level parser using transformational rules over syntactic patterns. The contribution of this research is the successful combination of parsing over a seed term list coupled with derivational morphology to achieve greater coverage of multi-word terms for indexing and retrieval. Final results are evaluated for precision and recall, and implications for indexing and retrieval are discussed.

1 Motivation

Terms are known to be excellent descriptors of the informational content of textual documents (Srinivasan, 1996), but they are subject to numerous linguistic variations. Terms cannot be retrieved properly with coarse text simplification techniques (e.g. stemming); their identification requires precise and efficient NLP techniques. We have developed a domain-independent system for automatic term recognition from unrestricted text. The system presented in this paper takes as input a list of controlled terms and a corpus; it detects and marks occurrences of term ...

We would like to thank the NLP Group of Columbia University, Bell Laboratories - Lucent Technologies, and the Institut Universitaire de Technologie de Nantes for their support of the exchange visitor ...
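
As a rough illustration of the input/output behaviour described above (a list of controlled terms and a corpus in, marked term occurrences out), the following Python sketch may help. It is not the authors' system: the unification-based parser and the morphological processor are replaced here by a hand-written table of derivational families and a simple token window. The names `FAMILIES`, `family`, `find_variants` and the parameter `max_gap` are all hypothetical and introduced only for this example.

```python
import re

# Hypothetical toy table: each surface form maps to a derivational family.
# A real system would obtain this from a morphological processor.
FAMILIES = {
    "retrieve": "retrieve", "retrieval": "retrieve", "retrieved": "retrieve",
    "document": "document", "documents": "document",
    "index": "index", "indexing": "index", "indexed": "index",
}

def tokenize(text):
    """Split text into alphabetic tokens (punctuation is discarded)."""
    return re.findall(r"[A-Za-z]+", text)

def family(word):
    """Return the derivational family of a word, or the lowercased word itself."""
    w = word.lower()
    return FAMILIES.get(w, w)

def find_variants(term, text, max_gap=2):
    """Return token spans that contain every word family of `term`
    within a window of len(term) + max_gap tokens (assumed heuristic)."""
    tokens = tokenize(text)
    wanted = {family(w) for w in tokenize(term)}
    hits = []
    i = 0
    while i < len(tokens):
        if family(tokens[i]) in wanted:
            seen = set()
            end = min(len(tokens), i + len(wanted) + max_gap)
            for j in range(i, end):
                f = family(tokens[j])
                if f in wanted:
                    seen.add(f)
                if seen == wanted:
                    hits.append(" ".join(tokens[i:j + 1]))
                    i = j  # skip past this match to avoid overlapping hits
                    break
        i += 1
    return hits

corpus = ("We retrieve documents quickly. Document retrieval is hard. "
          "The retrieval of relevant documents matters.")
variants = find_variants("document retrieval", corpus)
print(variants)
```

In this sketch the seed term "document retrieval" matches a verbal variant ("retrieve documents") and a syntactic variant with inserted words ("retrieval of relevant documents"). The window ignores sentence boundaries and syntactic structure, which is exactly the kind of imprecision that a pattern-based parser is meant to avoid.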
