Learning Hierarchical Translation Structure with Linguistic Annotations

Markos Mylonakis, ILLC, University of Amsterdam, m.mylonakis@uva.nl
Khalil Sima'an, ILLC, University of Amsterdam, k.simaan@uva.nl

Abstract

While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters by selecting and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
1 Introduction

Recent advances in Statistical Machine Translation (SMT) are widely centred around two concepts: (a) hierarchical translation processes, frequently employing Synchronous Context Free Grammars (SCFGs), and (b) transduction or synchronous rewrite processes over a linguistic syntactic tree. SCFGs in the form of the Inversion-Transduction Grammar (ITG) were first introduced by Wu (1997) as a formalism to recursively describe the translation process. The Hiero system (Chiang, 2005) utilised an ITG-flavour which focused on hierarchical phrase-pairs to capture context-driven translation and reordering patterns with 'gaps', offering competitive performance particularly for language pairs with extensive reordering. As Hiero uses a single non-terminal and