Boosting-based System Combination for Machine Translation

Tong Xiao, Jingbo Zhu, Muhua Zhu, Huizhen Wang
Natural Language Processing Lab, Northeastern University, China
{xiaotong,zhujingbo,wanghuizhen}@mail.neu.edu.cn, zhumuhua@gmail.com

Abstract

In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation system is built from the ensemble of these weak translation systems. To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system. The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems.

1 Introduction

Recent research on Statistical Machine Translation (SMT) has achieved substantial progress. Many SMT frameworks have been developed, including phrase-based SMT (Koehn et al., 2003), hierarchical phrase-based SMT (Chiang, 2005), syntax-based SMT (Eisner, 2003; Ding and Palmer, 2005; Liu et al., 2006; Galley et al., 2006; Cowan et al., 2006), etc.
With the emergence of various structurally different SMT systems, more and more studies are focused on combining multiple SMT systems for achieving higher translation accuracy rather than using a single translation system. The basic idea of system combination is to extract or generate a translation by voting from an ensemble of translation outputs. Depending on how the translation is combined and what voting strategy is adopted, several methods can be used for system