Research paper: "Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation"

Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation
Bing Xiang and Abraham Ittycheriah
IBM T. J. Watson Research Center, Yorktown Heights, NY 10598
bxiang, abei @

Abstract

In this paper we present a novel discriminative mixture model for statistical machine translation (SMT). We model the feature space with a log-linear combination of multiple mixture components. Each component contains a large set of features trained in a maximum-entropy framework. All features within the same mixture component are tied and share the same mixture weights, where the mixture weights are trained discriminatively to maximize the translation performance. This approach aims at bridging the gap between maximum-likelihood training and discriminative training for SMT. It is shown that the feature space can be partitioned in a variety of ways, such as based on feature types, word alignments, or domains, for various applications. The proposed approach improves the translation performance significantly on a large-scale Arabic-to-English MT task.

1 Introduction

Significant progress has been made in statistical machine translation (SMT) in recent years. Among all the proposed approaches, the phrase-based method (Koehn et al., 2003) has become the most widely adopted one in SMT due to its capability of capturing local context information from adjacent words. There exists a significant amount of work focused on improving translation performance with better features.
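The feature-tying idea above can be made concrete with a small sketch. This is not the authors' code; the function names, the two toy components, and all feature values are hypothetical. It only illustrates the structure described in the abstract: each mixture component scores a hypothesis with a maximum-entropy-style weighted feature sum, and the component scores are then combined log-linearly, with every feature inside a component tied to that component's single mixture weight.

```python
# Hypothetical sketch of feature-tied mixture scoring.
# All names and values are illustrative, not from the paper.

def component_score(feature_values, feature_weights):
    """Maximum-entropy-style score of one mixture component:
    a weighted sum over that component's (potentially large) feature set."""
    return sum(feature_weights[f] * v for f, v in feature_values.items())

def mixture_score(component_scores, mixture_weights):
    """Log-linear combination of component scores. All features inside a
    component share its single mixture weight, so only len(mixture_weights)
    parameters need to be tuned discriminatively."""
    return sum(lam * s for lam, s in zip(mixture_weights, component_scores))

# Two toy components, e.g. partitioned by feature type.
lex = component_score({"f_lex_1": 1.0, "f_lex_2": 0.5},
                      {"f_lex_1": 0.2, "f_lex_2": 0.4})   # -> 0.4
phr = component_score({"f_phr_1": 2.0},
                      {"f_phr_1": 0.3})                    # -> 0.6
total = mixture_score([lex, phr], mixture_weights=[0.6, 0.4])
```

The key point of the design is the parameter count: the inner feature weights can number in the millions and are trained by maximum likelihood (maximum entropy), while only the handful of tied mixture weights are optimized discriminatively for translation quality.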
The feature set can be either small (on the order of 10 features) or large (up to millions). For example, the system described in Koehn et al. (2003) is a widely known one using a small number of features in a maximum-entropy log-linear model (Och and Ney, 2002). The features include phrase translation probabilities, lexical probabilities, the number of phrases, and language model scores, etc. The feature weights are usually optimized with minimum error rate training (MERT), as in Och (2003). Besides the MERT-based feature …
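For contrast with the mixture model, the baseline described here is a flat log-linear model over a small feature set. The following sketch (feature names and weight values are made up for illustration; MERT tuning itself is not shown) shows how such a decoder score is formed:

```python
import math

# Illustrative sketch of the standard phrase-based log-linear score
# (Och and Ney, 2002). Feature names and numeric values are hypothetical;
# in practice the weights would be tuned with MERT (Och, 2003).

def loglinear_score(features, weights):
    """Decoder score: weighted sum of a small set of (mostly log-domain) features."""
    return sum(weights[name] * value for name, value in features.items())

hypothesis_features = {
    "log_phrase_trans_prob": math.log(0.25),
    "log_lexical_prob": math.log(0.4),
    "phrase_count": 3.0,
    "log_lm_score": math.log(0.01),
}
tuned_weights = {
    "log_phrase_trans_prob": 1.0,
    "log_lexical_prob": 0.5,
    "phrase_count": -0.2,
    "log_lm_score": 0.8,
}
score = loglinear_score(hypothesis_features, tuned_weights)
```

During decoding, the hypothesis with the highest such score is selected; MERT searches the weight space to maximize a translation metric (e.g. BLEU) on a development set.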
