TAILIEUCHUNG - Scientific report: "AUTOMATIC ACQUISITION OF A LARGE SUBCATEGORIZATION DICTIONARY FROM CORPORA"

This paper presents a new method for producing a dictionary of subcategorization frames from unlabelled text corpora. It is shown that statistical filtering of the results of a finite state parser running on the output of a stochastic tagger produces high quality results, despite the error rates of the tagger and the parser. Further, it is argued that this method can be used to learn all subcategorization frames, whereas previous methods are not extensible to a general solution to the problem.

Christopher D. Manning
Xerox PARC and Stanford University
Stanford University Dept of Linguistics, Bldg. 100, Stanford, CA 94305-2150, USA
Internet: manning@csli.stanford.edu

INTRODUCTION

Rule-based parsers use subcategorization information to constrain the number of analyses that are generated. For example, from subcategorization alone we can deduce that the PP in (1) must be an argument of the verb, not a noun phrase modifier:

(1) John put [NP the cactus] [PP on the table].

Knowledge of subcategorization also aids text generation programs and people learning a foreign language.

A subcategorization frame is a statement of what types of syntactic arguments a verb (or adjective) takes, such as objects, infinitives, that-clauses, participial clauses, and subcategorized prepositional phrases. In general, verbs and adjectives each appear in only a small subset of all possible argument subcategorization frames.
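To make the role of a subcategorization dictionary concrete, here is a minimal sketch of how a parser might consult one to decide the attachment in example (1). The frame labels (`NP`, `PP`, `S_that`), the toy entries in `LEXICON`, and the helper `pp_must_be_argument` are all hypothetical illustrations, not the representation or frame inventory used in the paper.

```python
from typing import Dict, Set, Tuple

# A frame is modelled as the ordered tuple of argument types a verb takes.
# The labels below are illustrative assumptions, not the paper's inventory.
Frame = Tuple[str, ...]

# Toy hand-written lexicon (hypothetical entries for illustration only).
LEXICON: Dict[str, Set[Frame]] = {
    "put": {("NP", "PP")},
    "see": {("NP",), ("S_that",)},
    "give": {("NP", "NP"), ("NP", "PP")},
}


def pp_must_be_argument(verb: str) -> bool:
    """Return True if every frame listed for the verb includes a PP.

    When that holds, a PP following the object cannot be analysed as a
    noun phrase modifier without leaving the verb's frame unsatisfied.
    Unknown verbs return False because nothing can be deduced for them.
    """
    frames = LEXICON.get(verb, set())
    return bool(frames) and all("PP" in frame for frame in frames)


if __name__ == "__main__":
    for verb in ("put", "see", "give"):
        print(verb, pp_must_be_argument(verb))
```

With these toy entries, `pp_must_be_argument("put")` is true, so the PP in (1) has to be a verbal argument, while for "see" and "give" the lexicon alone leaves the attachment open. Each verb lists only a few frames, in line with the observation that verbs appear in a small subset of the possible frames.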
A major bottleneck in the production of high-coverage parsers is assembling lexical information

Thanks to Julian Kupiec for providing the tagger on which this work depends and for helpful discussions and comments along the way. I am also indebted for comments on an earlier draft to Marti Hearst (whose comments were the most useful), Hinrich Schütze, Penni Sibun, Mary Dalrymple, and others at Xerox PARC, where this research was completed during a summer internship; Stanley Peters; and the two anonymous ACL .
