Boosting Statistical Word Alignment Using Labeled and Unlabeled Data

Hua Wu, Haifeng Wang, Zhanyi Liu
Toshiba China Research and Development Center
5F., Tower W2, Oriental Plaza, East Chang An Ave., Dong Cheng District, Beijing 100738, China
{wuhua, wanghaifeng, liuzhanyi}@

Abstract

This paper proposes a semi-supervised boosting approach to improve statistical word alignment with limited labeled data and large amounts of unlabeled data. The proposed approach extends the supervised boosting algorithm into a semi-supervised learning algorithm by incorporating the unlabeled data. In this algorithm, we build a word aligner by using both the labeled data and the unlabeled data. Then we build a pseudo reference set for the unlabeled data and calculate the error rate of each word aligner using only the labeled data. Based on this semi-supervised boosting algorithm, we investigate two boosting methods for word alignment. In addition, we improve the word alignment results by combining the results of the two semi-supervised boosting methods. Experimental results on word alignment indicate that semi-supervised boosting achieves relative error reductions of and as compared with supervised boosting and unsupervised boosting, respectively.

1 Introduction

Word alignment was first proposed as an intermediate result of statistical machine translation (Brown et al., 1993). In recent years, many researchers have built alignment links from bilingual corpora (Wu, 1997; Och and Ney, 2003; Cherry and Lin, 2003; Wu et al., 2005; Zhang and Gildea, 2005). These methods train the alignment models on unlabeled data in an unsupervised manner. A remaining question is whether we can further improve the performance of word aligners with the available data and alignment models. One possible solution is boosting (Freund and Schapire, 1996), which is one of the ensemble methods (Dietterich, 2000). The underlying idea of boosting is to combine simple rules to form an ensemble
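The abstract describes the semi-supervised loop only in outline: train an aligner on labeled and unlabeled data, build pseudo references for the unlabeled sentence pairs, and measure each aligner's error rate on the labeled pairs only. The Python sketch below shows one way those pieces could fit into an AdaBoost.M1-style loop (Freund and Schapire, 1996). It is not the authors' implementation. The trainer callback `train`, the 1 − F1 link loss in `link_error`, the half-of-total-weight voting rule in `weighted_vote`, and the choice to reweight unlabeled pairs against their pseudo references are all assumptions made for illustration, and every function name here is hypothetical.

```python
# Sketch only: not the authors' implementation; all names are hypothetical.
import math
from typing import Callable, Dict, List, Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)
Alignment = Set[Link]   # all links for one sentence pair
Aligner = Callable[[int], Alignment]  # sentence-pair id -> predicted links
# Hypothetical trainer: (example weights, pseudo references) -> new aligner.
Trainer = Callable[[Dict[int, float], Dict[int, Alignment]], Aligner]


def link_error(pred: Alignment, ref: Alignment) -> float:
    """Assumed per-pair loss in [0, 1]: 1 - F1 between predicted and reference links."""
    if not pred and not ref:
        return 0.0
    return 1.0 - 2.0 * len(pred & ref) / (len(pred) + len(ref))


def weighted_vote(outputs: List[Alignment], alphas: List[float]) -> Alignment:
    """Keep a link if aligners holding at least half of the total weight propose it."""
    total = sum(alphas)
    score: Dict[Link, float] = {}
    for links, alpha in zip(outputs, alphas):
        for link in links:
            score[link] = score.get(link, 0.0) + alpha
    return {link for link, s in score.items() if s >= total / 2.0}


def semi_supervised_boost(train: Trainer, gold: Dict[int, Alignment],
                          unlabeled: List[int], rounds: int) -> List[Tuple[Aligner, float]]:
    """Return (aligner, weight) pairs; error rates are measured on labeled pairs only."""
    ids = list(gold) + list(unlabeled)
    weights = {i: 1.0 / len(ids) for i in ids}
    pseudo: Dict[int, Alignment] = {i: set() for i in unlabeled}
    ensemble: List[Tuple[Aligner, float]] = []
    for _ in range(rounds):
        aligner = train(weights, pseudo)  # may use labeled and unlabeled pairs
        # Weighted error rate over the labeled pairs only, renormalized to their mass.
        mass = sum(weights[i] for i in gold)
        eps = sum(weights[i] * link_error(aligner(i), gold[i]) for i in gold) / mass
        if eps >= 0.5:  # AdaBoost.M1 stops once the error is no better than 0.5
            break
        beta = max(eps, 1e-10) / (1.0 - eps)  # guard against log(1/0)
        ensemble.append((aligner, math.log(1.0 / beta)))
        # Pseudo references for unlabeled pairs: weighted vote of the ensemble so far.
        alphas = [a for _, a in ensemble]
        for i in unlabeled:
            pseudo[i] = weighted_vote([h(i) for h, _ in ensemble], alphas)
        # Reweight every pair: beta ** (1 - loss) shrinks pairs the aligner got right.
        refs = {**gold, **pseudo}
        for i in ids:
            weights[i] *= beta ** (1.0 - link_error(aligner(i), refs[i]))
        z = sum(weights.values())
        weights = {i: w / z for i, w in weights.items()}
    return ensemble


def align(ensemble: List[Tuple[Aligner, float]], i: int) -> Alignment:
    """Final alignment for pair i: weighted vote over all boosted aligners."""
    return weighted_vote([h(i) for h, _ in ensemble], [a for _, a in ensemble])
```

For example, with a toy `train` that always returns the aligner `lambda i: {(0, 0), (1, 1)}` and two labeled pairs whose gold links are {(0, 0), (1, 1)} and {(0, 0), (1, 1), (2, 2)}, the first round's weighted error on the labeled pairs is 0.1, so that aligner receives weight ln 9. The sketch keeps the abstract's one firm constraint, that error rates come from labeled data only, and leaves the aligner itself pluggable.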
