Scientific report: "AUTOMATIC ACQUISITION OF A LARGE SUBCATEGORIZATION DICTIONARY FROM CORPORA"

Christopher D. Manning
Xerox PARC and Stanford University
Stanford University Dept. of Linguistics, Bldg. 100, Stanford, CA 94305-2150, USA
Internet: manning@csli.stanford.edu

Abstract

This paper presents a new method for producing a dictionary of subcategorization frames from unlabelled text corpora. It is shown that statistical filtering of the results of a finite state parser running on the output of a stochastic tagger produces high quality results, despite the error rates of the tagger and the parser. Further, it is argued that this method can be used to learn all subcategorization frames, whereas previous methods are not extensible to a general solution to the problem.

INTRODUCTION

Rule-based parsers use subcategorization information to constrain the number of analyses that are generated. For example, from subcategorization alone we can deduce that the PP in (1) must be an argument of the verb, not a noun phrase modifier:

(1) John put [NP the cactus] [PP on the table].

Knowledge of subcategorization also aids text generation programs and people learning a foreign language.

A subcategorization frame is a statement of what types of syntactic arguments a verb (or adjective) takes, such as objects, infinitives, that-clauses, participial clauses, and subcategorized prepositional phrases. In general, verbs and adjectives each appear in only a small subset of all possible argument subcategorization frames.
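To make the role of subcategorization frames concrete, the deduction in example (1) can be sketched as a lookup in a small frame dictionary. This is only an illustrative sketch, not the paper's implementation: the dictionary `SUBCAT`, its three entries, and the function `pp_attachments` are hypothetical, and the sketch ignores the possibility of the PP being a verbal adjunct.

```python
from typing import Dict, FrozenSet, List, Tuple

# A frame is the ordered tuple of argument categories a verb takes.
Frame = Tuple[str, ...]

# Hypothetical toy dictionary; the entries are illustrative assumptions,
# not output of the method described in the paper.
SUBCAT: Dict[str, FrozenSet[Frame]] = {
    "put": frozenset({("NP", "PP")}),   # requires an object and a PP
    "see": frozenset({("NP",)}),        # plain transitive
    "sleep": frozenset({()}),           # intransitive
}

def pp_attachments(verb: str) -> List[str]:
    """Return the analyses of a 'V NP PP' string that the verb's frames allow."""
    frames = SUBCAT.get(verb, frozenset())
    analyses = []
    if ("NP", "PP") in frames:
        analyses.append("argument")      # the PP is an argument of the verb
    if ("NP",) in frames:
        analyses.append("np_modifier")   # the PP modifies the object noun phrase
    return analyses

print(pp_attachments("put"))
print(pp_attachments("see"))
```

Because the toy entry for "put" lists only the NP PP frame, the only analysis left for (1) is the one in which the PP is an argument of the verb. This is the kind of pruning a parser gains from such a dictionary.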
A major bottleneck in the production of high-coverage parsers is assembling lexical information

Thanks to Julian Kupiec for providing the tagger on which this work depends and for helpful discussions and comments along the way. I am also indebted for comments on an earlier draft to Marti Hearst (whose comments were the most useful), Hinrich Schütze, Penni Sibun, Mary Dalrymple, and others at Xerox PARC, where this research was completed during a summer internship; Stanley Peters; and the two anonymous ACL .
