Scientific report: "Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices"

Shankar Kumar (1), Wolfgang Macherey (1), Chris Dyer (2), and Franz Och (1)

(1) Google Inc., 1600 Amphitheatre Pkwy., Mountain View, CA 94043, USA. {shankarkumar, wmach, och}@
(2) Department of Linguistics, University of Maryland, College Park, MD 20742, USA. redpony@

Abstract

Minimum Error Rate Training (MERT) and Minimum Bayes-Risk (MBR) decoding are used in most current state-of-the-art Statistical Machine Translation (SMT) systems. The algorithms were originally developed to work with N-best lists of translations, and recently extended to lattices that encode many more hypotheses than typical N-best lists. We here extend lattice-based MERT and MBR algorithms to work with hypergraphs that encode a vast number of translations produced by MT systems based on Synchronous Context Free Grammars. These algorithms are more efficient than the lattice-based versions presented earlier. We show how MERT can be employed to optimize parameters for MBR decoding. Our experiments show speedups from MERT and MBR as well as performance improvements from MBR decoding on several language pairs.

1 Introduction

Statistical Machine Translation (SMT) systems have improved considerably by directly using the error criterion in both training and decoding. By doing so, the system can be optimized for the translation task instead of a criterion such as likelihood that is unrelated to the evaluation metric. Two popular techniques that incorporate the error criterion are Minimum Error Rate Training (MERT) (Och, 2003) and Minimum Bayes-Risk (MBR) decoding (Kumar and Byrne, 2004). These two techniques were originally developed for N-best lists of translation hypotheses and recently extended to translation lattices (Macherey et al., 2008; Tromble et al., 2008) generated by a phrase-based SMT system (Och and Ney, 2004). Translation lattices contain a significantly higher number of translation alternatives relative to ...
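To make the idea of using the error criterion at decoding time concrete, here is a minimal sketch of MBR decoding over an N-best list, the setting in which the technique was originally developed. It is an illustration only, not the hypergraph algorithm of this paper: the function names (`posteriors`, `unigram_f1`, `mbr_decode`), the `scale` parameter, and the toy unigram-F1 gain (a stand-in for a BLEU-based gain) are all assumptions made for the example.

```python
import math
from collections import Counter


def posteriors(scores, scale=1.0):
    """Turn model scores (log-domain) into a normalized posterior over the list.

    `scale` is a hypothetical scaling factor applied to the scores.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(scale * (s - m)) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]


def unigram_f1(hyp, ref):
    """Toy gain: clipped unigram F1 between two token lists (NOT BLEU)."""
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(nbest, scores, scale=1.0):
    """Return the hypothesis with the highest expected gain under the posterior.

    Maximizing expected gain is equivalent to minimizing expected loss
    when loss = 1 - gain.
    """
    post = posteriors(scores, scale)

    def expected_gain(hyp):
        return sum(p * unigram_f1(hyp, other) for other, p in zip(nbest, post))

    return max(nbest, key=expected_gain)


# Three made-up hypotheses with made-up model scores.
nbest = [
    "the cat sat".split(),
    "the cat sat down".split(),
    "a dog ran".split(),
]
scores = [-1.0, -1.1, -0.9]

map_choice = nbest[scores.index(max(scores))]  # highest-scoring hypothesis
mbr_choice = mbr_decode(nbest, scores)         # consensus hypothesis
```

In this toy list the highest-scoring hypothesis is "a dog ran", but the two similar "the cat sat" variants together hold most of the posterior mass, so the MBR choice is "the cat sat". That is the sense in which MBR decoding prefers a hypothesis that agrees with the other likely translations over the single best-scoring one.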
