Scientific report: "Partial Matching Strategy for Phrase-based Statistical Machine Translation"

Zhongjun He1,2, Qun Liu1, and Shouxun Lin1
1 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2 Graduate University of Chinese Academy of Sciences, Beijing 100049, China
{zjhe, liuqun, sxlin}@

Abstract

This paper presents a partial matching strategy for phrase-based statistical machine translation (PBSMT). Source phrases that do not appear in the training corpus can be translated by word substitution according to partially matched phrases. The advantage of this method is that it alleviates the data sparseness problem when the amount of bilingual corpus data is limited. We incorporate our approach into the state-of-the-art PBSMT system Moses and achieve statistically significant improvements on both small and large corpora.

1 Introduction

Currently, most phrase-based statistical machine translation (PBSMT) models (Marcu and Wong, 2002; Koehn et al., 2003) adopt a full matching strategy for phrase translation, which means that a phrase pair $(\tilde{f}, \tilde{e})$ can be used for translating a source phrase $\bar{f}$ only if $\tilde{f} = \bar{f}$. Because it lacks generalization ability, the full matching strategy has some limitations. On the one hand, the data sparseness problem is serious, especially when the amount of bilingual data is limited. On the other hand, for a given source text the phrase table is redundant, since most of the bilingual phrases cannot be fully matched.
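The contrast between full matching and translation by word substitution can be illustrated with a small sketch. It is not the authors' algorithm: this excerpt does not say how partially matched phrases are selected or scored, so the selection rule used here (same length, at most `max_mismatches` differing positions), the word-alignment dictionary, the `lexicon` of single-word translations, and all function names are assumptions made only for illustration.

```python
from typing import Dict, Optional, Tuple

Phrase = Tuple[str, ...]
# Toy phrase table: source phrase -> (target phrase, word alignment).
# The alignment maps a source position to a target position (an assumption).
PhraseTable = Dict[Phrase, Tuple[Phrase, Dict[int, int]]]


def full_match(table: PhraseTable, source: Phrase) -> Optional[Phrase]:
    """Full matching: a pair is usable only if its source side equals `source`."""
    entry = table.get(source)
    return entry[0] if entry is not None else None


def partial_match_translate(
    table: PhraseTable,
    source: Phrase,
    lexicon: Dict[str, str],
    max_mismatches: int = 1,
) -> Optional[Phrase]:
    """Hypothetical partial matching: reuse a phrase pair whose source side
    differs from `source` in a few positions, then substitute the aligned
    target words with lexicon translations of the new source words."""
    for cand, (target, align) in table.items():
        if len(cand) != len(source):
            continue
        # Positions where the table phrase and the input phrase disagree.
        diff = [i for i, (a, b) in enumerate(zip(cand, source)) if a != b]
        if not 0 < len(diff) <= max_mismatches:
            continue
        # Every mismatched word needs an alignment link and a translation.
        if not all(i in align and source[i] in lexicon for i in diff):
            continue
        out = list(target)
        for i in diff:
            out[align[i]] = lexicon[source[i]]
        return tuple(out)
    return None


# Placeholder tokens, not real data.
table: PhraseTable = {("f1", "f2", "f3"): (("e1", "e2", "e3"), {0: 0, 1: 1, 2: 2})}
lexicon = {"f4": "e4"}
unseen = ("f1", "f2", "f4")

print(full_match(table, unseen))                        # None
print(partial_match_translate(table, unseen, lexicon))  # ('e1', 'e2', 'e4')
```

Under full matching the unseen phrase has no translation at all, while the sketch reuses the stored pair and replaces only the target word aligned to the mismatched source word.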
In this paper, we address the problem of translating unseen phrases, that is, source phrases that are not observed in the training corpus. The alignment template model (Och and Ney, 2004) enhanced phrasal generalization by using word classes rather than the words themselves, but the resulting phrases are overly generalized. The hierarchical phrase-based model (Chiang, 2005) used hierarchical phrase pairs to strengthen the generalization ability of phrases and to allow long-distance reorderings. However, the
