How Much Can We Gain from Supervised Word Alignment?

Jinxi Xu and Jinying Chen
Raytheon BBN Technologies
10 Moulton Street, Cambridge, MA 02138, USA
{jxu, jchen}@bbn.com

Abstract

Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned Chinese-English corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state-of-the-art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we show that the benefit of improved alignment becomes smaller with more training data, implying that the above limit also holds for large training conditions.

1 Introduction

Word alignment is a central problem in statistical machine translation (SMT). A recent trend in this area of research is to exploit supervised learning to improve alignment accuracy by mimicking human alignment. Studies in this line of work include Haghighi et al. (2009), DeNero and Klein (2010), and Setiawan et al. (2010), to name just a few. The objective of this work is to explore the performance limit of supervised word alignment.
More specifically, we would like to know what magnitude of gain in MT performance we can expect from supervised alignment over state-of-the-art unsupervised alignment when we have access to a large amount of parallel data. Since alignment errors have been assumed to be a major hindrance to good MT, an answer to this question might help us find new directions in MT research. Our method is to use human alignment as the oracle of supervised learning and to compare its performance against that of GIZA++ (Och and Ney, 2003), a state of …