Terminal-Aware Synchronous Binarization

Licheng Fang, Tagyoung Chung, and Daniel Gildea
Department of Computer Science
University of Rochester
Rochester, NY 14627

Abstract

We present an SCFG binarization algorithm that combines the strengths of early terminal matching on the source language side and early language model integration on the target language side. We also examine how different strategies of target-side terminal attachment during binarization can significantly affect translation quality.

1 Introduction

Synchronous context-free grammars (SCFGs) are behind most syntax-based machine translation models. Efficient machine translation decoding with an SCFG requires converting the grammar into a binarized form, either explicitly, as in synchronous binarization (Zhang et al., 2006), where virtual nonterminals are generated for binarization, or implicitly, as in Earley parsing (Earley, 1970), where dotted items are used.

Given a source-side binarized SCFG with terminal set T and nonterminal set N, the time complexity of decoding a sentence of length n with an m-gram language model is (Venugopal et al., 2007):

$$O\left(n^3 \left(|N| \cdot |T|^{2(m-1)}\right)^K\right)$$

where K is the maximum number of right-hand-side nonterminals. SCFG binarization serves two important goals:

• Parsing complexity for unbinarized SCFG grows exponentially with the number of nonterminals on the right-hand side of grammar rules. Binarization ensures cubic time decoding in terms of input sentence length.

• In machine translation, integrating language model states as early as possible is essential to reducing search errors. Synchronous binarization (Zhang et al., 2006) enables the decoder to incorporate language model scores as soon as a binarized rule is applied.

In this paper, we examine a CYK-like synchronous binarization algorithm that integrates a novel criterion in a unified semiring parsing framework. The criterion we present has explicit consideration of source-side terminals. In general, terminals in a rule have a lower probability of being matched