Scientific report: "Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish"

Reyyan Yeniterzi
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
reyyan@

Kemal Oflazer
Computer Science, Carnegie Mellon University-Qatar, PO Box 24866, Doha, Qatar
ko@

Abstract

We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. Our approach relies on syntactic analysis on the source side (English) and then encodes a wide variety of local and non-local syntactic structures as complex structural tags, which appear as additional factors in the training data. On the target side (Turkish), we only perform morphological analysis and disambiguation but treat the complete complex morphological tag as a factor, instead of separating morphemes. We incrementally explore capturing various syntactic substructures as complex tags on the English side and evaluate how our translations improve in BLEU scores. Our maximal set of source and target side transformations, coupled with some additional techniques, provides a 39% relative improvement in BLEU over a baseline, averaged over 10 training and test sets.
Now that the syntactic analysis on the English side is available, we also experiment with more long-distance constituent reordering to bring the English constituent order close to Turkish, but find that these transformations do not provide any additional consistent tangible gains when averaged over the 10 sets.

1 Introduction

Statistical machine translation into a morphologically complex language such as Turkish, Finnish, or Arabic involves the generation of target words with the proper morphology, in addition to properly ordering the target words. Earlier work on translation from English to Turkish (Oflazer and Durgar-El-Kahlout, 2007; Oflazer, 2008; Durgar-El-Kahlout and Oflazer, 2010) has used an approach which relied on identifying the contextually
