Scientific report: "A Localized Prediction Model for Statistical Machine Translation"

A Localized Prediction Model for Statistical Machine Translation

Christoph Tillmann and Tong Zhang
IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
ctill tzhang @

Abstract

In this paper, we present a novel training method for a localized phrase-based prediction model for statistical machine translation (SMT). The model predicts blocks with orientation to handle local phrase re-ordering. We use a maximum likelihood criterion to train a log-linear block bigram model which uses real-valued features (e.g. a language model score) as well as binary features based on the block identities themselves (e.g. block bigram features). Our training algorithm can easily handle millions of features. The best system obtains an improvement over the baseline on a standard Arabic-English translation task.

1 Introduction

In this paper, we present a block-based model for statistical machine translation. A block is a pair of phrases which are translations of each other. For example, Fig. 1 shows an Arabic-English translation example that uses 4 blocks. During decoding, we view translation as a block segmentation process, where the input sentence is segmented from left to right and the target sentence is generated from bottom to top, one block at a time.
A monotone block sequence is generated, except for the possibility to swap a pair of neighboring blocks. We use an orientation model similar to the lexicalized block re-ordering model in (Tillmann, 2004; Och et al., 2004) to generate a block $b$ with orientation $o$ relative to its predecessor block $b'$. During decoding, we compute the probability $P(b_1^n, o_1^n)$ of a block sequence $b_1^n$ with orientation $o_1^n$ as a product of block bigram probabilities:

$$P(b_1^n, o_1^n) \approx \prod_{i=1}^{n} p(b_i, o_i \mid b_{i-1}, o_{i-1})$$

Figure 1: An Arabic-English block translation example, where the Arabic words are romanized. The following orientation sequence is generated: $o_1 = N$, $o_2 = L$, $o_3 = N$, $o_4$ …
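To make the product of block bigram probabilities concrete, here is a minimal Python sketch of how a log-linear block bigram model could score a block sequence with orientations. It is an illustration under stated assumptions, not the authors' implementation: the feature map, the weight names, the `START` event, and the choice to normalize each bigram over an explicit list of candidate (block, orientation) pairs are all hypothetical, and the source and target phrases are placeholders.

```python
import math

# Hypothetical start-of-sentence event used as predecessor of the first block.
START = (("<s>", "<s>"), "N")


def features(cur, prev):
    """Toy feature map for one block bigram (all feature names are illustrative)."""
    (block, orient), (prev_block, _prev_orient) = cur, prev
    target_len = len(block[1].split())
    return {
        # Stand-in for a real-valued feature such as a language model score.
        "lm_score": -0.5 * target_len,
        # Binary features keyed on the block identities and on the orientation.
        f"bigram:{prev_block}->{block}": 1.0,
        f"orient:{orient}": 1.0,
    }


def score(weights, cur, prev):
    """Linear score w . f(cur, prev); features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features(cur, prev).items())


def bigram_log_prob(weights, cur, prev, candidates):
    """log p(b_i, o_i | b_{i-1}, o_{i-1}), normalized over `candidates`.

    Assumption: `candidates` lists the competing (block, orientation) events
    for this step and contains `cur`.
    """
    scores = [score(weights, c, prev) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return score(weights, cur, prev) - log_z


def sequence_log_prob(weights, events, candidates_per_step):
    """Sum of log block bigram probabilities, i.e. the log of their product."""
    total, prev = 0.0, START
    for cur, candidates in zip(events, candidates_per_step):
        total += bigram_log_prob(weights, cur, prev, candidates)
        prev = cur
    return total


if __name__ == "__main__":
    # Placeholder blocks: (source phrase, target phrase) plus an orientation label.
    e1 = (("f1 f2", "e1"), "N")
    e2 = (("f3", "e2 e3"), "L")
    e3 = (("f4", "e4"), "N")
    weights = {"lm_score": 1.0, "orient:N": 0.2}
    print(sequence_log_prob(weights, [e1, e2], [[e1, e3], [e2, e3]]))
```

Working in log space turns the product over positions into a sum, which avoids numerical underflow on long block sequences. With all weights at zero, every candidate at a step is equally likely, which gives a quick sanity check of the normalization.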
