Towards a Unified Approach to Memory- and Statistical-Based Machine Translation

Daniel Marcu
Information Sciences Institute and Department of Computer Science
University of Southern California
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292
marcu@isi.edu

Abstract

We present a set of algorithms that enable us to translate natural language sentences by exploiting both a translation memory and a statistical-based translation model. Our results show that an automatically derived translation memory can be used within a statistical framework to often find translations of higher probability than those found using solely a statistical model. The translations produced using both the translation memory and the statistical model are significantly better than translations produced by two commercial systems: our hybrid system translated perfectly 58% of the 505 sentences in a test collection, while the commercial systems translated perfectly only 40-42% of them.

1 Introduction

Over the last decade, much progress has been made in the fields of example-based (EBMT) and statistical machine translation (SMT). EBMT systems work by modifying existing, human-produced translation instances, which are stored in a translation memory (TMEM).
Many methods have been proposed for storing translation pairs in a TMEM, finding translation examples that are relevant for translating unseen sentences, and modifying and integrating translation fragments to produce correct outputs. Sato (1992), for example, stores complete parse trees in the TMEM and selects and generates new translations by performing similarity matchings on these trees. Veale and Way (1997) store complete sentences; new translations are generated by modifying the TMEM translation that is most similar to the input sentence. Others store phrases; new translations are produced by optimally partitioning the input into phrases that match examples from the TMEM (Maruyama and Watanabe, 1992) or by finding all partial matches and then choosing the best.
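To make the phrase-partitioning idea concrete, here is a minimal sketch in Python. It is not the algorithm of any of the cited systems: it assumes that "optimal" means "fewest phrases", that the TMEM is a plain dictionary from source phrases to stored translations, and that translations can simply be concatenated in source order. The names `TMEM`, `partition`, and `translate`, and the toy English-French entries, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Phrase = Tuple[str, ...]

# Hypothetical toy translation memory: source phrase -> stored translation.
TMEM: Dict[Phrase, str] = {
    ("the", "cat"): "le chat",
    ("the",): "le",
    ("cat",): "chat",
    ("sat", "down"): "s'est assis",
    ("sat",): "s'est assis",
    ("down",): "en bas",
}


def partition(words: List[str], tmem: Dict[Phrase, str]) -> Optional[List[Phrase]]:
    """Split `words` into the fewest contiguous phrases that all occur in `tmem`.

    Returns None when the input cannot be fully covered by TMEM phrases.
    """
    n = len(words)
    # best[i]: fewest phrases needed to cover words[:i] (None if unreachable).
    # back[i]: start index of the last phrase in that best cover.
    best: List[Optional[int]] = [None] * (n + 1)
    back: List[int] = [0] * (n + 1)
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] is None:
                continue
            if tuple(words[start:end]) not in tmem:
                continue
            cost = best[start] + 1
            if best[end] is None or cost < best[end]:
                best[end] = cost
                back[end] = start
    if best[n] is None:
        return None
    # Walk the back-pointers from the end of the sentence to recover the phrases.
    phrases: List[Phrase] = []
    i = n
    while i > 0:
        phrases.append(tuple(words[back[i]:i]))
        i = back[i]
    phrases.reverse()
    return phrases


def translate(sentence: str, tmem: Dict[Phrase, str]) -> Optional[str]:
    """Concatenate the stored translations of the best partition (no reordering)."""
    phrases = partition(sentence.split(), tmem)
    if phrases is None:
        return None
    return " ".join(tmem[p] for p in phrases)
```

With this toy memory, `translate("the cat sat down", TMEM)` picks the two-phrase cover `("the", "cat")`, `("sat", "down")` over any word-by-word cover. A real system would also have to score competing covers, handle input that the memory does not cover, and reorder the target fragments, all of which this sketch leaves out.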