TAILIEUCHUNG - Báo cáo khoa học: " A Decoder for Syntax-based Statistical MT"

A statistical machine translation system based on the noisy channel model consists of three components: a language model (LM), a translation model (TM), and a decoder. For a system which translates from to English , the LM gives a foreign language a prior probability P and the TM gives a channel translation probability P . These models are automatically trained using monolingual (for the LM) and bilingual (for the TM) corpora. | Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics ACL Philadelphia July 2002 pp. 303-310. A Decoder for Syntax-based Statistical MT Kenji Yamada and Kevin Knight Information Sciences Institute University of Southern California 4676 Admiralty Way Suite 1001 Marina del Rey CA 90292 kyamada knight @ Abstract This paper describes a decoding algorithm for a syntax-based translation model Yamada and Knight 2001 . The model has been extended to incorporate phrasal translations as presented here. In contrast to a conventional word-to-word statistical model a decoder for the syntaxbased model builds up an English parse tree given a sentence in a foreign language. As the model size becomes huge in a practical setting and the decoder considers multiple syntactic structures for each word alignment several pruning techniques are necessary. We tested our decoder in a Chinese-to-English translation system and obtained better results than IBM Model 4. We also discuss issues concerning the relation between this decoder and a language model. 1 Introduction A statistical machine translation system based on the noisy channel model consists of three components a language model LM a translation model TM and a decoder. For a system which translates from a foreign language F to English E the LM gives a prior probability P F and the TM gives a channel translation probability P F F . These models are automatically trained using monolingual for the LM and bilingual for the TM corpora. A decoder then finds the best English sentence given a foreign sentence that maximizes P F F which also maximizes P F E P E according to Bayes rule. A different decoder is needed for different choices of LM and TM. Since P E and P F E are not simple probability tables but are parameterized models a decoder must conduct a search over the space defined by the models. For the IBM models defined by a pioneering paper Brown et al. 1993 a decoding algorithm based on a .

TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.