Active Learning for Multilingual Statistical Machine Translation∗

Gholamreza Haffari and Anoop Sarkar
School of Computing Science, Simon Fraser University
British Columbia, Canada
{ghaffar1,anoop}@[...]

Abstract

Statistical machine translation (SMT) models require bilingual corpora for training, and these corpora are often multilingual, with parallel text in multiple languages simultaneously. We introduce an active learning task of adding a new language to an existing multilingual set of parallel text and constructing high quality MT systems from each language in the collection into this new target language. We show that adding a new language using active learning to the EuroParl corpus provides a significant improvement compared to a random sentence selection baseline. We also provide new, highly effective sentence selection methods that improve AL for phrase-based SMT in the multilingual and single language pair setting.

1 Introduction

The main source of training data for statistical machine translation (SMT) models is a parallel corpus. In many cases the same information is available in multiple languages simultaneously as a multilingual parallel corpus, e.g. European Parliament (EuroParl) and [...] proceedings.
In this paper, we consider how to use active learning (AL) to add a new language to such a multilingual parallel corpus while, at the same time, constructing an MT system from each language in the original corpus into this new target language. We introduce a novel combined measure of translation quality for multiple target language outputs (the same content from multiple source languages). The multilingual setting provides new opportunities for AL over and above a single language pair. This setting is similar to the multi-task AL scenario (Reichart et al., 2008). In our case, the multiple tasks are individual machine translation tasks for several language pairs. The nature of the translation processes varies from any of the source [...]

∗ Thanks to James Peltier for systems [...]
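The task described above can be pictured as a pool-based loop: pick a batch of multilingual sentences, obtain their translation into the new language, and reuse each translation as training data for every source language. The following is only a minimal sketch under stated assumptions, not the selection method of the paper. All names (`MultiSentence`, `random_scorer`, `select_batch`, `active_learning_round`, `bilingual_corpora`) and the dictionary representation of a multilingual sentence are hypothetical, and `random_scorer` stands in for the random sentence selection baseline mentioned in the abstract.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: one pool item holds the same sentence in every
# existing source language, keyed by a language code such as "fr" or "de".
MultiSentence = Dict[str, str]
Scorer = Callable[[MultiSentence], float]
Translator = Callable[[MultiSentence], str]
Labeled = List[Tuple[MultiSentence, str]]


def random_scorer(seed: int = 0) -> Scorer:
    """Random selection baseline: each pool item receives a random score."""
    rng = random.Random(seed)
    return lambda item: rng.random()


def select_batch(pool: List[MultiSentence], scorer: Scorer,
                 batch_size: int) -> List[int]:
    """Return the pool indices of the batch_size highest-scoring items."""
    scores = [scorer(item) for item in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:batch_size])


def active_learning_round(pool: List[MultiSentence], labeled: Labeled,
                          scorer: Scorer, translate: Translator,
                          batch_size: int) -> List[MultiSentence]:
    """Label one batch and return the pool items that were not selected.

    `translate` is an assumed stand-in for the human translator who supplies
    the new-language side of each selected sentence.
    """
    chosen = select_batch(pool, scorer, batch_size)
    for i in chosen:
        labeled.append((pool[i], translate(pool[i])))
    chosen_set = set(chosen)
    return [item for i, item in enumerate(pool) if i not in chosen_set]


def bilingual_corpora(labeled: Labeled) -> Dict[str, List[Tuple[str, str]]]:
    """Split the labeled data into one bilingual corpus per source language."""
    corpora: Dict[str, List[Tuple[str, str]]] = {}
    for item, target in labeled:
        for lang, source in item.items():
            corpora.setdefault(lang, []).append((source, target))
    return corpora
```

In this sketch, a more informed selection strategy would only need to replace the scorer passed to `active_learning_round`; the per-language corpora returned by `bilingual_corpora` would then be used to retrain one MT system per source language before the next round.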
