Scientific report: "Adapting Translation Models to Translationese Improves SMT"

Gennadi Lembersky (glembers@), Noam Ordan, Shuly Wintner (shuly@)
Dept. of Computer Science, University of Haifa, 31905 Haifa, Israel

Abstract

Translation models used for statistical machine translation are compiled from parallel corpora; such corpora are manually translated, but the direction of translation is usually unknown, and is consequently ignored. However, much research in Translation Studies indicates that the direction of translation matters, as translated language (translationese) has many unique properties. Specifically, phrase tables constructed from parallel corpora translated in the same direction as the translation task perform better than ones constructed from corpora translated in the opposite direction. We reconfirm that this is indeed the case, but emphasize the importance of also using texts translated in the wrong direction. We take advantage of information pertaining to the direction of translation in constructing phrase tables by adapting the translation model to the special properties of translationese. We define entropy-based measures that estimate the correspondence of target-language phrases to translationese, thereby eliminating the need to annotate the parallel corpus with information pertaining to the direction of translation. We show that incorporating these measures as features in the phrase tables of statistical machine translation systems results in consistent, statistically significant improvement in the quality of the translation.

1 Introduction

Much research in Translation Studies indicates that translated texts have unique characteristics that set them apart from original texts (Toury, 1980; Gellerstam, 1986; Toury, 1995). Known as translationese, translated texts in any language constitute a genre or a dialect of
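
As a rough illustration of what an entropy-based phrase feature of the kind mentioned in the abstract might look like, the sketch below scores a target-language phrase by its per-token cross-entropy under a language model trained on translated text. The excerpt does not specify the language model, the smoothing, or how the feature is stored, so everything here is an assumption made for illustration and not the authors' method: the unigram model, the add-one smoothing, the toy corpora, and the names `train_unigram`, `cross_entropy`, `add_translationese_feature`, and `ce_translated`.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count whitespace-separated, lowercased tokens; return (counts, total)."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    return counts, sum(counts.values())


def cross_entropy(phrase, model, vocab_size):
    """Per-token cross-entropy (in bits) of a phrase under an
    add-one smoothed unigram model (an assumed, simplified model)."""
    counts, total = model
    tokens = phrase.lower().split()
    if not tokens:
        raise ValueError("phrase must contain at least one token")
    log_prob = sum(
        math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return -log_prob / len(tokens)


def add_translationese_feature(entry, model, vocab_size):
    """Return a copy of a (hypothetical) phrase-table entry with the
    cross-entropy of its target side attached as an extra feature."""
    scored = dict(entry)
    scored["ce_translated"] = cross_entropy(entry["target"], model, vocab_size)
    return scored


# Toy corpora, invented for this example only.
translated_text = [
    "in the framework of the agreement",
    "the framework of the report",
]
original_text = [
    "under the deal",
    "the report says so",
]

translated_lm = train_unigram(translated_text)
original_lm = train_unigram(original_text)

# Shared vocabulary size, plus one slot reserved for unseen tokens.
vocab = set(translated_lm[0]) | set(original_lm[0])
vocab_size = len(vocab) + 1

entry = {"source": "dans le cadre de", "target": "in the framework of"}
scored_entry = add_translationese_feature(entry, translated_lm, vocab_size)

ce_translated = cross_entropy(entry["target"], translated_lm, vocab_size)
ce_original = cross_entropy(entry["target"], original_lm, vocab_size)
```

With these toy corpora, the phrase receives a lower cross-entropy under the model trained on translated text than under the one trained on original text, which is the kind of signal such a feature is meant to expose to the decoder. A real system would use a higher-order, properly smoothed language model trained on much larger corpora.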
