Scientific report: "The Effect of Corpus Size in Combining Supervised and Unsupervised Training for Disambiguation"

Michaela Atterer
Institute for NLP, University of Stuttgart
atterer@

Hinrich Schütze
Institute for NLP, University of Stuttgart
hinrich@

Abstract

We investigate the effect of corpus size in combining supervised and unsupervised learning for two types of attachment decisions: relative clause attachment and prepositional phrase attachment. The supervised component is Collins' parser, trained on the Wall Street Journal. The unsupervised component gathers lexical statistics from an unannotated corpus of newswire text. We find that the combined system improves the performance of the parser only for small training sets. Surprisingly, the size of the unannotated corpus has little effect, due to the noisiness of the lexical statistics acquired by unsupervised learning.

1 Introduction

The best performing systems for many tasks in natural language processing are based on supervised training on annotated corpora such as the Penn Treebank (Marcus et al., 1993) and the prepositional phrase data set first described in (Ratnaparkhi et al., 1994). However, the production of such training sets is expensive, and they are not available for many domains and languages.
This motivates research on combining supervised with unsupervised learning, since unannotated text is in ample supply for most domains in the major languages of the world. The question arises of how much annotated and unannotated data is necessary in combination learning strategies. We investigate this question for two attachment ambiguity problems: relative clause (RC) attachment and prepositional phrase (PP) attachment. The supervised component is Collins' parser (Collins, 1997), trained on the Wall Street Journal. The unsupervised component gathers lexical statistics from an unannotated corpus of newswire text. The sizes of both types of corpora, annotated and unannotated, are of interest. We would expect that large annotated ...
