Scientific report: "Word Alignment for Languages with Scarce Resources Using Bilingual Corpora of Other Language Pairs"

Haifeng Wang, Hua Wu, Zhanyi Liu
Toshiba China Research and Development Center
5/F., Tower W2, Oriental Plaza, East Chang An Ave., Dong Cheng District, Beijing 100738, China
wanghaifeng, wuhua, liuzhanyi @

Abstract

This paper proposes an approach to improve word alignment for languages with scarce resources using bilingual corpora of other language pairs. To perform word alignment between languages L1 and L2, we introduce a third language, L3. Although only small amounts of bilingual data are available for the desired language pair L1-L2, large-scale bilingual corpora in L1-L3 and L2-L3 are available. Based on these two additional corpora, and with L3 as the pivot language, we build a word alignment model for L1 and L2. This approach can build a word alignment model for two languages even if no bilingual corpus is available for that language pair. In addition, we build another word alignment model for L1 and L2 using the small L1-L2 bilingual corpus. We then interpolate the two models to further improve word alignment between L1 and L2. Experimental results indicate a relative error rate reduction as compared with the method that uses only the small bilingual corpus in L1 and L2.

1 Introduction

Word alignment was first proposed as an intermediate result of statistical machine translation (Brown et al., 1993).
Many researchers build alignment links with bilingual corpora (Wu, 1997; Och and Ney, 2003; Cherry and Lin, 2003; Zhang and Gildea, 2005). In order to achieve satisfactory results, all of these methods require a large-scale bilingual corpus for training. When a large-scale bilingual corpus is unavailable, some researchers have acquired class-based alignment rules from existing dictionaries to improve word alignment (Ker and Chang, 1997). Wu et al. (2005) used a large-scale bilingual corpus in the general domain to improve domain-specific word alignment when only a ...
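
The pivot idea described in the abstract can be illustrated with a small sketch. The excerpt does not say how the pivot model is induced or how the two models are interpolated, so everything below is an assumption: the induced lexical probability is taken to be a sum over pivot words, t(w2 | w1) = Σ_w3 t(w2 | w3) · t(w3 | w1), which treats w2 as independent of w1 given w3, and the interpolation is taken to be linear with a weight `lam`. The function names, the weight, and the toy probabilities are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def pivot_translation_table(t_l3_given_l1, t_l2_given_l3):
    """Induce t(w2 | w1) by summing over pivot words w3 (assumed method).

    Both arguments are nested dicts: source word -> {target word: probability}.
    """
    induced = defaultdict(lambda: defaultdict(float))
    for w1, pivots in t_l3_given_l1.items():
        for w3, p_w3_given_w1 in pivots.items():
            # Pivot words with no L2 translations contribute nothing.
            for w2, p_w2_given_w3 in t_l2_given_l3.get(w3, {}).items():
                induced[w1][w2] += p_w3_given_w1 * p_w2_given_w3
    return {w1: dict(targets) for w1, targets in induced.items()}


def interpolate(t_small, t_pivot, lam):
    """Linearly mix two tables: lam * t_small + (1 - lam) * t_pivot (assumed form)."""
    mixed = {}
    for w1 in set(t_small) | set(t_pivot):
        small = t_small.get(w1, {})
        pivot = t_pivot.get(w1, {})
        mixed[w1] = {
            w2: lam * small.get(w2, 0.0) + (1.0 - lam) * pivot.get(w2, 0.0)
            for w2 in set(small) | set(pivot)
        }
    return mixed


# Toy tables (made-up numbers): L1 = English, L3 = French pivot, L2 = Spanish.
t_fr_given_en = {"house": {"maison": 0.8, "domicile": 0.2}}
t_es_given_fr = {
    "maison": {"casa": 0.9, "hogar": 0.1},
    "domicile": {"hogar": 0.6, "casa": 0.4},
}
# Table estimated from the small direct L1-L2 corpus (also made up).
t_es_given_en_small = {"house": {"casa": 0.5, "vivienda": 0.5}}

pivot_table = pivot_translation_table(t_fr_given_en, t_es_given_fr)
mixed_table = interpolate(t_es_given_en_small, pivot_table, lam=0.5)
```

In this sketch the pivot table can be built with no direct L1-L2 data at all, which mirrors the claim in the abstract, and the mixed table keeps entries that only one of the two models knows about. In practice the weight would be tuned on held-out data; the value 0.5 here is arbitrary.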
