Scientific report: "Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages"

Preslav Nakov
Qatar Computing Research Institute, Qatar Foundation
P.O. Box 5825, Doha, Qatar
pnakov@

Jörg Tiedemann
Department of Linguistics and Philology, Uppsala University
Uppsala, Sweden

Abstract

We propose several techniques for improving statistical machine translation between closely-related languages with scarce resources. We use character-level translation trained on n-gram-character-aligned bitexts and tuned using word-level BLEU, which we further augment with character-based transliteration at the word level and combine with a word-level translation model. The evaluation on Macedonian-Bulgarian movie subtitles shows an improvement in BLEU over a phrase-based word-level baseline.

1 Introduction

Statistical machine translation (SMT) systems require parallel corpora of sentences and their translations, called bitexts, which are often not sufficiently large. However, for many closely-related languages, SMT can be carried out even with small bitexts by exploring relations below the word level.

Closely-related languages such as Macedonian and Bulgarian exhibit a large overlap in their vocabulary and strong syntactic and lexical similarities. Spelling conventions in such related languages can still be different, and they may diverge more substantially at the level of morphology. However, the differences often constitute consistent regularities that can be generalized when translating.
The language similarities and the regularities in morphological variation and spelling motivate the use of character-level translation models, which were applied to translation (Vilar et al., 2007; Tiedemann, 2009a) and transliteration (Matthews, 2007).

Table 1: Examples from a character-level phrase table (without scores). Columns: Macedonian, Bulgarian.
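
To make the idea of character-level translation more concrete, the following minimal Python sketch shows one way a sentence could be rewritten as a sequence of characters before being passed to a phrase-based system, and how character-level output could be mapped back to words so that it can be scored at the word level. The function names, the use of `_` as a word-boundary symbol, and the choice of character bigrams as the n-gram unit are assumptions made for illustration; they are not taken from the authors' implementation.

```python
def to_char_tokens(sentence, boundary="_"):
    """Rewrite a sentence as space-separated characters.

    Word boundaries are marked with `boundary` (an assumed convention),
    so that the original tokenization can be restored later.
    """
    words = sentence.split()
    # Join the words with the boundary symbol, then put a space between every character.
    return " ".join(boundary.join(words))


def from_char_tokens(char_sentence, boundary="_"):
    """Invert to_char_tokens: glue characters together and restore spaces."""
    return "".join(char_sentence.split()).replace(boundary, " ")


def to_char_ngrams(sentence, n=2, boundary="_"):
    """Return overlapping character n-grams (bigrams by default).

    Such units could be used in place of single characters when aligning
    a bitext; this is an illustrative assumption, not the paper's exact recipe.
    """
    chars = to_char_tokens(sentence, boundary).split()
    if len(chars) < n:
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


if __name__ == "__main__":
    example = "не знам"  # an arbitrary Cyrillic example string
    chars = to_char_tokens(example)
    print(chars)
    print(to_char_ngrams(example))
    print(from_char_tokens(chars))
```

In such a setup, the character-level system would be trained on the output of `to_char_tokens` for both sides of the bitext, and its output would be passed through `from_char_tokens` before computing word-level BLEU. Note that this sketch does not escape underscores that already occur in the input text.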
