Scientific report: "Boosting Statistical Word Alignment Using Labeled and Unlabeled Data"

Hua Wu, Haifeng Wang, Zhanyi Liu
Toshiba (China) Research and Development Center
5/F., Tower W2, Oriental Plaza, East Chang An Ave., Dong Cheng District, Beijing, 100738, China
{wuhua, wanghaifeng, liuzhanyi}@

Abstract

This paper proposes a semi-supervised boosting approach to improve statistical word alignment with limited labeled data and large amounts of unlabeled data. The proposed approach turns the supervised boosting algorithm into a semi-supervised learning algorithm by incorporating the unlabeled data. In this algorithm, we build a word aligner using both the labeled data and the unlabeled data. We then build a pseudo reference set for the unlabeled data, and calculate the error rate of each word aligner using only the labeled data. Based on this semi-supervised boosting algorithm, we investigate two boosting methods for word alignment. In addition, we improve the word alignment results by combining the results of the two semi-supervised boosting methods. Experimental results on word alignment indicate that semi-supervised boosting achieves relative error reductions of [...] and [...] as compared with supervised boosting and unsupervised boosting, respectively.

1 Introduction

Word alignment was first proposed as an intermediate result of statistical machine translation (Brown et al., 1993). In recent years, many researchers have built alignment links from bilingual corpora (Wu, 1997; Och and Ney, 2003; Cherry and Lin, 2003; Wu et al., 2005; Zhang and Gildea, 2005). These methods train the alignment models on unlabeled data in an unsupervised manner. One open question about word alignment is whether the performance of word aligners can be improved further using the available data and the available alignment models. One possible solution is the boosting method (Freund and Schapire, 1996), which is one of the ensemble methods (Dietterich, 2000). The underlying idea of boosting is to combine simple rules to form an ensemble
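The loop outlined in the abstract (train an aligner on labeled and unlabeled data, build pseudo references for the unlabeled data, measure the error rate on labeled data only) can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given in this extract. Everything below is an assumption made for illustration: the function names are hypothetical, a toy co-occurrence aligner stands in for a real statistical alignment model, the per-sentence error is taken as 1 minus the F-measure of the links, the pseudo references come from a weighted vote of the ensemble, and the reweighting follows a generic AdaBoost-style rule.

```python
import math
from collections import Counter


def train_aligner(pairs, weights):
    """Toy stand-in for a statistical aligner: weighted co-occurrence counts."""
    cooc = Counter()
    for (src, tgt), w in zip(pairs, weights):
        for s in src:
            for t in tgt:
                cooc[(s, t)] += w

    def align(src, tgt):
        # Link each source position to the target position with the highest count.
        return {
            (i, max(range(len(tgt)), key=lambda j: cooc[(s, tgt[j])]))
            for i, s in enumerate(src)
        }

    return align


def link_error(links, reference):
    """Assumed error measure: 1 - F-measure between two sets of links."""
    if not links and not reference:
        return 0.0
    return 1.0 - 2.0 * len(links & reference) / (len(links) + len(reference))


def combine(ensemble, src, tgt):
    """Weighted vote: keep links backed by at least half of the total weight."""
    votes = Counter()
    for alpha, aligner in ensemble:
        for link in aligner(src, tgt):
            votes[link] += alpha
    total = sum(alpha for alpha, _ in ensemble)
    return {link for link, v in votes.items() if v >= total / 2}


def semi_supervised_boost(labeled, unlabeled, rounds=3):
    pairs = [(s, t) for s, t, _ in labeled] + list(unlabeled)
    weights = [1.0 / len(pairs)] * len(pairs)
    ensemble = []
    for _ in range(rounds):
        # 1. Train on labeled and unlabeled sentence pairs together.
        aligner = train_aligner(pairs, weights)
        # 2. Compute the weighted error rate from the labeled data only.
        lab_w = weights[:len(labeled)]
        eps = sum(
            w * link_error(aligner(s, t), ref)
            for w, (s, t, ref) in zip(lab_w, labeled)
        ) / sum(lab_w)
        if eps >= 0.5:  # usual boosting stop condition (assumed here)
            break
        eps = max(eps, 1e-6)  # avoid division by zero for a perfect aligner
        beta = eps / (1.0 - eps)
        ensemble.append((math.log(1.0 / beta), aligner))
        # 3. Pseudo references for unlabeled pairs: the ensemble's current vote.
        refs = [ref for _, _, ref in labeled]
        refs += [combine(ensemble, s, t) for s, t in unlabeled]
        # 4. Down-weight sentence pairs that the new aligner handles well.
        weights = [
            w * beta ** (1.0 - link_error(aligner(s, t), r))
            for w, (s, t), r in zip(weights, pairs, refs)
        ]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble


# Toy data: word lists plus reference links as (source index, target index).
labeled = [
    (["a", "b"], ["x", "y"], {(0, 0), (1, 1)}),
    (["a"], ["x"], {(0, 0)}),
    (["b"], ["y"], {(0, 0)}),
]
unlabeled = [(["b", "a"], ["y", "x"])]

ensemble = semi_supervised_boost(labeled, unlabeled, rounds=3)
print(combine(ensemble, ["b", "a"], ["y", "x"]))
```

The point of the sketch is the division of roles: unlabeled sentence pairs influence training and reweighting through their pseudo references, while the weight given to each aligner in the final vote depends only on its error on the labeled pairs.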
