Scientific report: "Modified Distortion Matrices for Phrase-Based Statistical Machine Translation"

Arianna Bisazza and Marcello Federico
Fondazione Bruno Kessler, Trento, Italy
bisazza federico @

Abstract

This paper presents a novel method to suggest long word reorderings to a phrase-based SMT decoder. We address language pairs where long reordering concentrates on few patterns, and use fuzzy chunk-based rules to predict likely reorderings for these phenomena. Then we use reordered n-gram LMs to rank the resulting permutations and select the n-best for translation. Finally, we encode these reorderings by modifying selected entries of the distortion cost matrix, on a per-sentence basis. In this way, we expand the search space by a much finer degree than if we simply raised the distortion limit. The proposed techniques are tested on Arabic-English and German-English using well-known SMT benchmarks.

1 Introduction

Despite the large research effort devoted to the modeling of word reordering, this remains one of the main obstacles to the development of accurate SMT systems for many language pairs. On one hand, the phrase-based approach (PSMT) (Och, 2002; Zens et al., 2002; Koehn et al., 2003), with its shallow and loose modeling of linguistic equivalences, appears as the most competitive choice for closely related language pairs with similar clause structures, both in terms of quality and of efficiency. On the other, tree-based approaches (Wu, 1997; Yamada, 2002; Chiang, 2005) gain advantage, at the cost of higher complexity and isomorphism assumptions, on language pairs with radically different word orders.

Lying between these two extremes are language pairs where most of the reordering happens locally and where long reorderings can be isolated and described by a handful of linguistic rules. Notable examples are the family-unrelated Arabic-English and the related German-English language pairs. Interestingly, on these pairs PSMT generally prevails over tree-based SMT, producing overall high-quality outputs and
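
The per-sentence matrix modification summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a word-level distortion cost of |j - i - 1| for translating source position j right after position i (a common PSMT definition), and the function names, the `shortcut_cost` value, and the hard-coded jumps are all hypothetical. In the paper, the suggested jumps would come from the chunk-based rules and the reordered n-gram LM ranking.

```python
from typing import Iterable, List, Tuple

Matrix = List[List[int]]


def base_distortion_matrix(n: int) -> Matrix:
    """Standard word-level costs: D[i][j] = |j - i - 1|.

    D[i][j] is the cost of translating source position j immediately
    after source position i (0 for the monotone step j = i + 1).
    """
    return [[abs(j - i - 1) for j in range(n)] for i in range(n)]


def apply_shortcuts(matrix: Matrix,
                    jumps: Iterable[Tuple[int, int]],
                    shortcut_cost: int = 1) -> Matrix:
    """Return a copy in which each suggested jump (i, j) is made cheap.

    shortcut_cost is an assumed value chosen for illustration only;
    entries that are already cheaper are left untouched.
    """
    modified = [row[:] for row in matrix]
    for i, j in jumps:
        modified[i][j] = min(modified[i][j], shortcut_cost)
    return modified


def reachable(matrix: Matrix, i: int, distortion_limit: int) -> List[int]:
    """Source positions the decoder may visit right after position i."""
    return [j for j, cost in enumerate(matrix[i])
            if j != i and cost <= distortion_limit]


if __name__ == "__main__":
    base = base_distortion_matrix(8)
    # Hypothetical suggestion: jump from word 0 to word 6, then back to word 1.
    modified = apply_shortcuts(base, [(0, 6), (6, 1)])
    print(reachable(base, 0, distortion_limit=3))
    print(reachable(modified, 0, distortion_limit=3))
```

With the distortion limit held fixed, only the suggested long jumps become available in the modified matrix; all other long jumps stay out of reach. This is the sense in which the search space grows by a finer degree than it would if the limit itself were raised.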
