Scientific report: "Multi-Engine Machine Translation Guided by Explicit Word Matching"

Shyamsundar Jayaraman, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, shyamj @
Alon Lavie, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, alavie @

Abstract
We describe a new approach for synthetically combining the output of several different Machine Translation (MT) engines operating on the same input. The goal is to produce a synthetic combination that surpasses all of the original systems in translation quality. Our approach uses the individual MT engines as "black boxes" and does not require any explicit cooperation from the original MT systems. A decoding algorithm uses explicit word matches, in conjunction with confidence estimates for the various engines and a trigram language model, to score and rank a collection of sentence hypotheses that are synthetic combinations of words from the various original engines. The highest-scoring sentence hypothesis is selected as the final output of our system. Experiments using several Arabic-to-English systems of similar quality show a substantial improvement in the quality of the translation output.
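To make the scoring idea in the abstract concrete, the following is a minimal sketch and not the decoder described in the paper. It assumes that candidate hypotheses are already given, that a "word match" means a case-insensitive exact match, that each engine has a single scalar confidence, and that the two signals are combined as a length-normalized weighted sum. The function names, the add-one smoothing, and the `lm_weight` parameter are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def train_trigram_counts(corpus):
    """Collect trigram counts and their bigram-context counts (toy LM)."""
    tri, ctx = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>", "<s>"] + list(sent) + ["</s>"]
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            ctx[tuple(toks[i - 2:i])] += 1
    return tri, ctx


def trigram_logprob(sent, tri, ctx, vocab_size):
    """Add-one smoothed trigram log-probability (smoothing is an assumption)."""
    toks = ["<s>", "<s>"] + list(sent) + ["</s>"]
    logp = 0.0
    for i in range(2, len(toks)):
        num = tri[tuple(toks[i - 2:i + 1])] + 1
        den = ctx[tuple(toks[i - 2:i])] + vocab_size
        logp += math.log(num / den)
    return logp


def word_support(hyp, engine_outputs, confidences):
    """For each hypothesis word, add the confidence of every engine whose
    output contains that word (case-insensitive exact match)."""
    vocab_per_engine = [{w.lower() for w in out} for out in engine_outputs]
    total = 0.0
    for word in hyp:
        for vocab, conf in zip(vocab_per_engine, confidences):
            if word.lower() in vocab:
                total += conf
    return total


def rank_hypotheses(hyps, engine_outputs, confidences, tri, ctx,
                    vocab_size, lm_weight=1.0):
    """Score each hypothesis and return (score, hypothesis) pairs, best first."""
    scored = []
    for hyp in hyps:
        support = word_support(hyp, engine_outputs, confidences) / max(len(hyp), 1)
        lm = trigram_logprob(hyp, tri, ctx, vocab_size) / (len(hyp) + 1)
        scored.append((support + lm_weight * lm, hyp))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy usage: three hypothetical engine outputs for one source sentence.
engine_outputs = [
    "the cat sat on the mat".split(),
    "the cat sits on mat".split(),
    "a cat sat on the mat".split(),
]
confidences = [0.5, 0.3, 0.2]  # made-up per-engine confidence estimates
tri, ctx = train_trigram_counts(["the cat sat on the mat".split(),
                                 "a dog sat on the rug".split()])
ranked = rank_hypotheses(engine_outputs, engine_outputs, confidences,
                         tri, ctx, vocab_size=10)
best_score, best_hypothesis = ranked[0]
```

In this sketch the candidates are simply the engine outputs themselves. The approach in the abstract instead builds hypotheses as synthetic combinations of words drawn from several engines, which would call for a search procedure that this example does not attempt.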
1 Introduction
A variety of different paradigms for machine translation (MT) have been developed over the years, ranging from statistical systems that learn mappings between words and phrases in the source language and their corresponding translations in the target language, to Interlingua-based systems that perform deep semantic analysis. Each approach and system has different advantages and disadvantages. While statistical systems provide broad coverage with little manpower, the quality of corpus-based systems rarely reaches that of knowledge-based systems. With such a wide range of approaches to machine translation, it would be beneficial to have an effective framework for combining these systems into an MT system that carries many of the advantages of the individual
