TAILIEUCHUNG - Scientific report: "Continuous Space Language Models for Statistical Machine Translation"

Holger Schwenk, Daniel Déchelotte and Jean-Luc Gauvain
LIMSI-CNRS, BP 133, 91403 Orsay cedex, FRANCE
{schwenk,dechelot,gauvain}@

Abstract

Statistical machine translation systems are based on one or more translation models and a language model of the target language. While many different translation models and phrase extraction algorithms have been proposed, a standard word n-gram back-off language model is used in most systems. In this work, we propose to use a new statistical language model that is based on a continuous representation of the words in the vocabulary. A neural network is used to perform the projection and the probability estimation. We consider the translation of European Parliament Speeches. This task is part of an international evaluation organized by the TC-STAR project in 2006. The proposed method achieves consistent improvements in the BLEU score on the development and test data. We also present algorithms to improve the estimation of the language model probabilities when splitting long sentences into shorter chunks.

1 Introduction

The goal of statistical machine translation (SMT) is to produce a target sentence e from a source sentence f. Among all possible target sentences, the one with maximal probability is chosen. The classical Bayes relation is used to introduce a target language model (Brown et al., 1993):

$$ e^{*} = \arg\max_{e} \Pr(e \mid f) = \arg\max_{e} \Pr(f \mid e)\,\Pr(e) $$

where Pr(f|e) is the translation model and Pr(e) is the target language model. This approach is usually referred to as the noisy source-channel approach in statistical machine translation.

Since the introduction of this basic model, many improvements have been made, but it seems that research is mainly focused on better translation and alignment models or phrase extraction algorithms, as demonstrated by numerous publications on these topics. On the other hand, we are aware of only a small number of papers investigating new approaches to ...
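
To make the noisy source-channel decision rule above concrete, here is a minimal Python sketch. It assumes a small, explicitly enumerated set of candidate target sentences and lookup tables for Pr(f|e) and Pr(e). The function name `noisy_channel_best` and all probability values are hypothetical and chosen only for illustration; they do not come from the paper, and a real decoder searches a far larger hypothesis space.

```python
import math

def noisy_channel_best(candidates, translation_prob, language_prob):
    """Return the candidate e that maximizes log Pr(f|e) + log Pr(e).

    Hypothetical helper: both models are plain dictionaries keyed by the
    candidate target sentence, for one fixed source sentence f.
    """
    def score(e):
        # Summing log-probabilities is equivalent to multiplying
        # Pr(f|e) * Pr(e), but avoids numerical underflow.
        return math.log(translation_prob[e]) + math.log(language_prob[e])
    return max(candidates, key=score)

# Made-up toy values: the second candidate has the higher translation
# model score, but the language model strongly prefers the first one.
translation_prob = {"the house is small": 0.20, "small is the house": 0.25}
language_prob = {"the house is small": 0.010, "small is the house": 0.001}

best = noisy_channel_best(list(translation_prob), translation_prob, language_prob)
print(best)
```

In this toy setting the language model term changes which candidate wins, which is the role Pr(e) plays in the decision rule.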
