N-gram-based Statistical Machine Translation versus Syntax Augmented Machine Translation: comparison and system combination

Maxim Khalilov and José . Fonollosa
Universitat Politècnica de Catalunya, Campus Nord UPC, 08034 Barcelona, Spain
khalilov adrian @

Abstract

In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and the UPC-TALP N-gram-based Statistical Machine Translation (SMT) system. SAMT is a hierarchical, syntax-driven translation system underlain by a phrase-based model and a target-side parse tree. In N-gram-based SMT, the translation process is based on bilingual units derived from word-to-word alignment and on statistical modeling of the bilingual context within a maximum-entropy framework. We provide a step-by-step comparison of the systems and report results in terms of automatic evaluation metrics and required computational resources for a smaller Arabic-to-English translation task (… tokens in the training corpus). Human error analysis clarifies the advantages and disadvantages of the systems under consideration. Finally, we combine the output of both systems to yield significant improvements in translation quality.

1 Introduction

There is an ongoing controversy regarding whether or not information about the syntax of language can benefit MT or contribute to a hybrid system.
Classical IBM word-based models were recently augmented with a phrase translation capability, as shown in Koehn et al. (2003) or, in a more recent implementation, the MOSES MT system (Koehn et al., 2007). In parallel to the phrase-based approach, the N-gram-based approach appeared (Marino et al., 2006). It stems from the Finite-State Transducers paradigm and is extended to the log-linear modeling framework, as shown in Marino et al. (2006). A system following this approach deals with bilingual units called tuples, which are composed of one or more words from the source language and zero or …
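To make the notion of tuples more concrete, the following Python sketch shows one plausible way to cut a word-aligned sentence pair into minimal monotone tuples, score the resulting tuple sequence with a bilingual bigram model, and combine feature values log-linearly. It is an illustration under our own assumptions, not the UPC-TALP implementation: the alignment is given as a set of (source index, target index) links, each tuple has a non-empty source side and a possibly empty target side, unaligned target words are attached to the following tuple, and add-one smoothing stands in for a properly smoothed higher-order N-gram model. The names `extract_tuples`, `train_bigram`, `log_prob`, `loglinear_score`, `tuple_lm` and `word_bonus`, as well as the toy Spanish-English sentence, are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# In this sketch a tuple pairs a non-empty source phrase with a
# possibly empty target phrase.
BiTuple = Tuple[Tuple[str, ...], Tuple[str, ...]]
BOS: BiTuple = (("<s>",), ("<s>",))
EOS: BiTuple = (("</s>",), ("</s>",))


def extract_tuples(src: Sequence[str], tgt: Sequence[str],
                   links: Set[Tuple[int, int]]) -> List[BiTuple]:
    """Greedily split an aligned sentence pair into minimal monotone tuples.

    A cut after src[:i] / tgt[:j] is valid when every link from src[:i]
    lands in tgt[:j] and every link from src[i:] lands in tgt[j:].
    """
    tuples: List[BiTuple] = []
    i0, j0 = 0, 0
    n_src, n_tgt = len(src), len(tgt)
    while i0 < n_src:
        for i in range(i0 + 1, n_src + 1):
            # Smallest target cut keeping all links of src[:i] on the left.
            lo = max([j0] + [b + 1 for a, b in links if a < i])
            # Largest target cut keeping all links of src[i:] on the right.
            hi = min([n_tgt] + [b for a, b in links if a >= i])
            if lo <= hi:
                # Unaligned target words between tuples go to the next tuple;
                # the last tuple absorbs whatever target words remain.
                j = n_tgt if i == n_src else lo
                tuples.append((tuple(src[i0:i]), tuple(tgt[j0:j])))
                i0, j0 = i, j
                break
    return tuples


def train_bigram(corpus: List[List[BiTuple]]):
    """Count tuple bigrams over sentences padded with boundary tuples."""
    histories: Counter = Counter()
    bigrams: Counter = Counter()
    for seq in corpus:
        padded = [BOS] + seq + [EOS]
        histories.update(padded[:-1])
        bigrams.update(zip(padded[:-1], padded[1:]))
    vocab_size = len({t for seq in corpus for t in seq} | {EOS})
    return histories, bigrams, vocab_size


def log_prob(seq: List[BiTuple], model) -> float:
    """Add-one smoothed bigram log-probability of a tuple sequence."""
    histories, bigrams, vocab_size = model
    padded = [BOS] + seq + [EOS]
    return sum(
        math.log((bigrams[(h, w)] + 1) / (histories[h] + vocab_size))
        for h, w in zip(padded[:-1], padded[1:])
    )


def loglinear_score(features: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of feature values, as in a log-linear model."""
    return sum(weights[name] * value for name, value in features.items())


if __name__ == "__main__":
    src, tgt = ["la", "casa", "verde"], ["the", "green", "house"]
    pair_tuples = extract_tuples(src, tgt, {(0, 0), (1, 2), (2, 1)})
    print(pair_tuples)
    # prints [(('la',), ('the',)), (('casa', 'verde'), ('green', 'house'))]
    model = train_bigram([pair_tuples])
    features = {"tuple_lm": log_prob(pair_tuples, model),
                "word_bonus": float(len(tgt))}
    print(loglinear_score(features, {"tuple_lm": 1.0, "word_bonus": 0.2}))
```

Taking the smallest valid cut at each step yields a single segmentation for a given alignment. Crossing links (here *casa verde* / *green house*) force the affected words into one tuple, which is how local reordering is absorbed inside tuples in this sketch, while a source word without links becomes a tuple with an empty target side.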
