Better Alignments = Better Translations?

Kuzman Ganchev
Computer & Information Science, University of Pennsylvania
kuzman@cis.upenn.edu

Joao V. Graca
L2f INESC-ID, Lisboa, Portugal
javg@l2f.inesc-id.pt

Ben Taskar
Computer & Information Science, University of Pennsylvania
taskar@cis.upenn.edu

Abstract

Automatic word alignment is a key step in training statistical machine translation systems. Despite much recent work on word alignment methods, increases in alignment accuracy often produce little or no improvement in machine translation quality. In this work we analyze a recently proposed agreement-constrained EM algorithm for unsupervised alignment models. We attempt to tease apart the effects that this simple but effective modification has on alignment precision and recall trade-offs, and how rare and common words are affected across several language pairs. We propose and extensively evaluate a simple method for using alignment models to produce alignments better suited for phrase-based MT systems, and show significant gains, as measured by BLEU score, in end-to-end translation systems for six language pairs used in recent MT competitions.

1 Introduction

The typical pipeline for a machine translation (MT) system starts with a parallel, sentence-aligned corpus and proceeds to align the words in every sentence pair.
The word alignment problem has received much recent attention, but improvements in standard measures of word alignment performance often do not result in better translations. Fraser and Marcu (2007) note that none of the tens of papers published over the last five years has shown that significant decreases in alignment error rate (AER) result in significant increases in translation performance. In this work we show that, by changing the way the word alignment models are trained and used, we can improve not only alignment performance but also the performance of the MT system that uses those alignments. We present extensive experimental results evaluating a new training