Scientific report: "Modified Distortion Matrices for Phrase-Based Statistical Machine Translation"


Arianna Bisazza and Marcello Federico
Fondazione Bruno Kessler, Trento, Italy
{bisazza,federico}@fbk.eu

Abstract

This paper presents a novel method to suggest long word reorderings to a phrase-based SMT decoder. We address language pairs where long reordering concentrates on few patterns, and use fuzzy chunk-based rules to predict likely reorderings for these phenomena. Then we use reordered n-gram LMs to rank the resulting permutations and select the n-best for translation. Finally, we encode these reorderings by modifying selected entries of the distortion cost matrix, on a per-sentence basis. In this way, we expand the search space by a much finer degree than if we simply raised the distortion limit. The proposed techniques are tested on Arabic-English and German-English using well-known SMT benchmarks.

1 Introduction

Despite the large research effort devoted to the modeling of word reordering, this remains one of the main obstacles to the development of accurate SMT systems for many language pairs. On one hand, the phrase-based approach (PSMT) (Och, 2002; Zens et al., 2002; Koehn et al., 2003), with its shallow and loose modeling of linguistic equivalences, appears as the most competitive choice for closely related language pairs with similar clause structures, both in terms of quality and of efficiency. On the other, tree-based approaches (Wu, 1997; Yamada, 2002; Chiang, 2005) gain advantage, at the cost of higher complexity and isomorphism assumptions, on language pairs with radically different word orders.
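The idea stated in the abstract, modifying selected entries of the distortion cost matrix instead of raising the distortion limit, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the word-level cost |j - i - 1| for a jump from source position i to position j, and the choice to lower a suggested jump to cost 0.

```python
def distortion_matrix(n):
    """Baseline costs (assumed form): jumping from word i to word j costs |j - i - 1|."""
    return [[abs(j - i - 1) for j in range(n)] for i in range(n)]


def apply_suggested_jumps(matrix, jumps, new_cost=0):
    """Return a copy of the matrix where each suggested (i, j) jump is made cheap.

    `jumps` stands in for the reorderings that would be selected for one
    sentence; `new_cost` is a hypothetical parameter of this sketch.
    """
    modified = [row[:] for row in matrix]
    for i, j in jumps:
        modified[i][j] = min(modified[i][j], new_cost)
    return modified


def allowed_jumps(matrix, limit):
    """Set of (i, j) jumps whose cost does not exceed the distortion limit."""
    return {
        (i, j)
        for i, row in enumerate(matrix)
        for j, cost in enumerate(row)
        if i != j and cost <= limit
    }


# Toy 8-word sentence with one suggested long jump from position 1 to position 6.
base = distortion_matrix(8)
modified = apply_suggested_jumps(base, [(1, 6)])

extra_with_modified_matrix = allowed_jumps(modified, 2) - allowed_jumps(base, 2)
extra_with_higher_limit = allowed_jumps(base, 4) - allowed_jumps(base, 2)
print(len(extra_with_modified_matrix), len(extra_with_higher_limit))
```

In this toy setting the modified matrix admits only the suggested jump under the original limit, whereas raising the limit admits every jump of comparable length, which is the finer-grained expansion of the search space that the abstract describes.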
Lying between these two extremes are language pairs where most of the reordering happens locally, and where long reorderings can be isolated and described by a handful of linguistic rules. Notable examples are the family-unrelated Arabic-English and the related German-English language pairs. Interestingly, on these pairs PSMT generally prevails over tree-based SMT, producing overall high-quality outputs and

