Scientific paper: "Self-Training for Biomedical Parsing"

David McClosky and Eugene Charniak
Brown Laboratory for Linguistic Information Processing (BLLIP)
Brown University, Providence, RI 02912
dmcc ec @

Abstract

Parser self-training is the technique of taking an existing parser, parsing extra data, and then creating a second parser by treating the extra data as further training data. Here we apply this technique to parser adaptation. In particular, we self-train the standard Charniak/Johnson Penn-Treebank parser using unlabeled biomedical abstracts. This achieves an f-score of on a standard test set of biomedical abstracts from the Genia corpus. This is a 20% error reduction over the best previous result on biomedical data ( on the same test set).

1 Introduction

Parser self-training is the technique of taking an existing parser, parsing extra data, and then creating a second parser by treating the extra data as further training data. While for many years it was thought not to help state-of-the-art parsers, more recent work has shown otherwise. In this paper we apply this technique to parser adaptation. In particular, we self-train the standard Charniak/Johnson Penn-Treebank (C&J) parser using unannotated biomedical data. As is well known, biomedical data is hard on parsers because it is so far from more standard English. To our knowledge this is the first application of self-training where the gap between the training and self-training data is so large.
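The self-training recipe just described (train a first parser, parse unlabeled text with it, then retrain a second parser on the union of the original and automatically labeled data) can be sketched in a few lines. The `TinyParser` class below is a deliberately toy stand-in for a statistical parser, not the actual Charniak/Johnson parser interface; all names here are illustrative assumptions.

```python
class TinyParser:
    """Toy stand-in for a statistical parser: it just counts word-tag
    events, with (word, tag) pairs standing in for full parse trees."""

    def __init__(self):
        self.counts = {}

    def train(self, treebank):
        # treebank: list of (word, tag) pairs serving as "training trees"
        for word, tag in treebank:
            self.counts.setdefault(word, {})
            self.counts[word][tag] = self.counts[word].get(tag, 0) + 1

    def parse(self, sentence):
        # Return the most frequently seen tag per word ("X" if unseen).
        result = []
        for w in sentence:
            tags = self.counts.get(w, {"X": 1})  # "X" = unknown-word tag
            result.append((w, max(tags, key=tags.get)))
        return result


def self_train(labeled, unlabeled_sentences):
    """1) Train a first parser on the labeled data; 2) parse the unlabeled
    sentences with it; 3) train a second parser on labeled + parsed data."""
    first = TinyParser()
    first.train(labeled)

    auto_labeled = []
    for sent in unlabeled_sentences:
        auto_labeled.extend(first.parse(sent))

    second = TinyParser()
    second.train(labeled + auto_labeled)
    return second
```

In the paper's setting, `labeled` corresponds to the Penn Treebank training trees and `unlabeled_sentences` to the biomedical abstracts; the second parser is the adapted one that is then evaluated on Genia.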
In section two we look at previous work. In particular, we note that there is in fact very little data on self-training when the corpus used for self-training is so different from the original labeled data. Section three describes our main experiment on standard test data (Clegg and Shepherd, 2005). Section four looks at some preliminary results we obtained on development data that show in slightly more detail how self-training improved the parser. We conclude in section five.

2 Previous Work

While self-training has worked in several .
