A Syntax-Driven Bracketing Model for Phrase-Based Translation

Deyi Xiong, Min Zhang, Aiti Aw and Haizhou Li
Human Language Technology, Institute for Infocomm Research
1 Fusionopolis Way, 21-01 South Connexis, Singapore 138632
{dyxiong, mzhang, aaiti, hli}@i2r.a-star.edu.sg

Abstract

Syntactic analysis influences the way in which the source sentence is translated. Previous efforts add syntactic constraints to phrase-based translation by directly rewarding/punishing a hypothesis whenever it matches/violates source-side constituents. We present a new model that automatically learns syntactic constraints, including but not limited to constituent matching/violation, from a training corpus. The model brackets a source phrase according to whether it satisfies the learnt syntactic constraints. The bracketed phrases are then translated as a whole unit by the decoder. Experimental results and analysis show that the new model outperforms previous methods and achieves a substantial improvement over the baseline, which is not syntactically informed.

1 Introduction

The phrase-based approach is widely adopted in statistical machine translation (SMT). It segments a source sentence into a sequence of phrases, then translates and reorders these phrases in the target. In such a process, original phrase-based decoding (Koehn et al., 2003) does not take advantage of any linguistic analysis, which, however, is broadly used in rule-based approaches. Since it is not linguistically motivated, original phrase-based decoding might produce ungrammatical or even wrong translations. Consider the following Chinese fragment with its parse tree:

Src: [Chinese source fragment with bracketed parse tree; constituent labels NP, NP, PP, VP, IP, VP]
Ref: established July 11 as Sailing Festival day
Output: to set up for navigation on July 11 knots

The output is generated from a phrase-based system which does not involve any syntactic analysis. Here we use straight-orientation and inverted-orientation brackets to denote the common structure of the source fragment and its .