TAILIEUCHUNG - Báo cáo khoa học: "Automatic Acquisition of Adjectival Subcategorization from Corpora"

This paper describes a novel system for acquiring adjectival subcategorization frames (SCFs) and associated frequency information from English corpus data. The system incorporates a decision-tree classifier for 30 SCF types which tests for the presence of grammatical relations (GRs) in the output of a robust statistical parser. It uses a powerful patternmatching language to classify GRs into frames hierarchically in a way that mirrors inheritance-based lexica. The experiments show that the system is able to detect SCF types with 70% precision and 66% recall rate. . | Automatic Acquisition of Adjectival Subcategorization from Corpora Jeremy Yallop Anna Korhonen and Ted Briscoe Computer Laboratory University of Cambridge 15 JJ Thomson Avenue Cambridge CB3 OFD UK yallop@ @ Abstract This paper describes a novel system for acquiring adjectival subcategorization frames scfs and associated frequency information from English corpus data. The system incorporates a decision-tree classifier for 30 SCF types which tests for the presence of grammatical relations GRs in the output of a robust statistical parser. It uses a powerful patternmatching language to classify GRs into frames hierarchically in a way that mirrors inheritance-based lexica. The experiments show that the system is able to detect SCF types with 70 precision and 66 recall rate. A new tool for linguistic annotation of scfs in corpus data is also introduced which can considerably alleviate the process of obtaining training and test data for subcategorization acquisition. 1 Introduction Research into automatic acquisition of lexical information from large repositories of unannotated text such as the web corpora of published text etc. is starting to produce large scale lexical resources which include frequency and usage information tuned to genres and sublanguages. Such resources are critical for natural language processing nlp both for enhancing the performance of Part of this research was conducted while this author was at the University of Edinburgh Laboratory for Foundations of Computer Science. state-of-art statistical systems and for improving the portability of these systems between domains. One type of lexical information with particular importance for NLP is subcategorization. Access to an accurate and comprehensive subcategorization lexicon is vital for the development of successful parsing technology . Carroll et al. 1998b important for many NLP tasks . automatic verb classification Schulte im Walde and Brew 2002

TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.