Scientific report: "Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets"

Roi Reichart, ICNC, Hebrew University of Jerusalem, roiri@
Ari Rappoport, Institute of Computer Science, Hebrew University of Jerusalem, arir@

Abstract

Creating large amounts of annotated data to train statistical PCFG parsers is expensive, and the performance of such parsers declines when training and test data are taken from different domains. In this paper we use self-training in order to improve the quality of a parser and to adapt it to a different domain, using only small amounts of manually annotated seed data. We report significant improvement both when the seed and test data are in the same domain and in the out-of-domain adaptation scenario. In particular, we achieve a 50% reduction in annotation cost for the in-domain case, yielding an improvement of 66% over previous work, and a 20-33% reduction for the domain adaptation case. This is the first time that self-training with small labeled datasets is applied successfully to these tasks. We were also able to formulate a characterization of when self-training is valuable.

1 Introduction

State-of-the-art statistical parsers (Collins, 1999; Charniak, 2000; Koo and Collins, 2005; Charniak and Johnson, 2005) are trained on manually annotated treebanks that are highly expensive to create. Furthermore, the performance of these parsers decreases as the distance between the genres of their training and test data increases.

Therefore, enhancing the performance of parsers when trained on small manually annotated datasets is of great importance, both when the seed and test data are taken from the same domain (the in-domain scenario) and when they are taken from different domains (the out-of-domain or parser adaptation scenario). Since the problem is the expense in manual annotation, we define small to be sentences which are the sizes of sentence sets that can be manually annotated by constituent structure in a
