Dependency-based Pre-ordering For English-Vietnamese Statistical Machine Translation
VNU Journal of Science: Comp. Science & Com. Eng., Vol. 33, No. 2 (2017) 14-27

Tran Hong Viet1,2,*, Nguyen Van Vinh2, Vu Thuong Huyen3, Nguyen Le Minh4

1 University of Economic and Technical Industries, Hanoi, Vietnam
2 VNU University of Engineering and Technology, 144 Xuan Thuy, Cau Giay, Hanoi, Vietnam
3 Thuy Loi University, Hanoi, Vietnam
4 Japan Advanced Institute of Science and Technology

Abstract

Reordering is a major challenge in machine translation (MT) between two languages with significant differences in word order. In this paper, we present a pre-processing approach for phrase-based statistical machine translation (SMT) that is based on a dependency parser and uses automatically learned and manually written reordering rules for English-to-Vietnamese translation. The dependency parse trees and transformation rules are used to reorder the source sentences, which are then passed to systems translating from English to Vietnamese. We evaluated our approach on English-Vietnamese machine translation tasks and showed that it outperforms the baseline phrase-based SMT system.

Received 16 May 2017; Revised 07 Sep 2017; Accepted 29 Sep 2017

Keywords: Natural Language Processing, Machine Translation, Phrase-based Statistical Machine Translation.

1. Introduction

[...] strengths of phrases, while incorporating syntax into SMT. Some approaches were applied at the word level [3].
They are useful for languages with rich morphology, because they reduce data sparseness. Other kinds of syntax-based reordering methods require parse trees, such as the work in [3]. A parse tree is more powerful in capturing the sentence structure. However, it is expensive to create tree structures and to build a good-quality parser. All the above approaches require much decoding time, which is expensive. The approach that we are interested in balances the quality of translation against decoding time. Reordering approaches as a preprocessing step [5, 21, 27] are [...]
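To make the idea of dependency-based pre-ordering concrete, the following is a minimal sketch under stated assumptions, not the rule set or implementation used in this paper. It assumes a sentence that has already been parsed into tokens with head indices and dependency labels, and it applies a single hypothetical hand-written rule: an adjectival modifier (`amod`) is moved to the right of its head noun, which matches the usual noun-adjective order in Vietnamese. The names `Token`, `MOVE_AFTER_HEAD`, and `preorder` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based position in the source sentence
    form: str    # surface word
    head: int    # index of the head token (0 marks the root)
    deprel: str  # dependency label, e.g. "amod", "det"


# Hypothetical hand-written rule: dependents carrying these labels are
# emitted after their head. Illustrative only, not the paper's rules.
MOVE_AFTER_HEAD = {"amod"}


def preorder(tokens):
    """Return the source words reordered by a recursive walk of the tree."""
    # Group tokens by head index; each group keeps source order.
    children = {}
    for tok in tokens:
        children.setdefault(tok.head, []).append(tok)

    def visit(node):
        before, after = [], []
        for child in children.get(node.idx, []):
            # A dependent goes after its head if a rule moves it there,
            # or if it already followed the head in the source sentence.
            if child.deprel in MOVE_AFTER_HEAD or child.idx > node.idx:
                after.append(child)
            else:
                before.append(child)
        words = []
        for child in before:
            words.extend(visit(child))
        words.append(node.form)
        for child in after:
            words.extend(visit(child))
        return words

    result = []
    for root in children.get(0, []):
        result.extend(visit(root))
    return result


# Hand-annotated toy parse of "I bought a red book".
sentence = [
    Token(1, "I", 2, "nsubj"),
    Token(2, "bought", 0, "root"),
    Token(3, "a", 5, "det"),
    Token(4, "red", 5, "amod"),
    Token(5, "book", 2, "obj"),
]
print(" ".join(preorder(sentence)))  # prints "I bought a book red"
```

In a pre-ordering pipeline of this kind, the reordered English sentences would replace the original source side before training and decoding with the phrase-based system, so the extra cost is paid in pre-processing and not in the decoder.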