Paraphrase Lattice for Statistical Machine Translation

Takashi Onishi and Masao Utiyama and Eiichiro Sumita
Language Translation Group, MASTAR Project
National Institute of Information and Communications Technology
3-5 Hikaridai, Keihanna Science City, Kyoto, 619-0289, Japan
{takashi.onishi,mutiyama,eiichiro.sumita}@nict.go.jp

Abstract

Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.

1 Introduction

Lattice decoding in SMT is useful in speech translation and in the translation of German (Bertoldi et al., 2007; Dyer, 2009). In speech translation, by using lattices that represent not only the 1-best result but also other possibilities of speech recognition, we can take into account the ambiguities of speech recognition. Thus, the translation quality for lattice inputs is better than the quality for 1-best inputs.
In this paper, we show that lattice decoding is also useful for handling input variations. Input variations refer to the differences between input texts with the same meaning. For example, "Is there a beauty salon?" and "Is there a beauty parlor?" have the same meaning, with variations in "beauty salon" and "beauty parlor". Since these variations are frequently found in natural language texts, a mismatch between the expressions in source sentences and the expressions in the training corpus leads to a decrease in translation quality. Therefore, we propose a novel method that .
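As a rough illustration of the paraphrase lattice idea described in the abstract, the following Python sketch builds a word lattice in which each known paraphrase adds an alternative path alongside the original words. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `paraphrases` dictionary, and the `max_len` limit are all hypothetical, and edge weights and the decoder itself are omitted.

```python
from collections import defaultdict


def build_paraphrase_lattice(words, paraphrases, max_len=3):
    """Return lattice edges as (from_node, to_node, word) triples.

    Nodes 0..len(words) are positions in the original sentence. Each
    paraphrase adds an alternative path between the two positions that
    delimit the phrase it replaces.

    `paraphrases` is a hypothetical table mapping a tuple of source
    words to a list of alternative word tuples.
    """
    # The original sentence is the backbone path of the lattice.
    edges = [(i, i + 1, w) for i, w in enumerate(words)]
    next_node = len(words) + 1  # fresh ids for nodes inside paraphrase paths

    for start in range(len(words)):
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            for alt in paraphrases.get(tuple(words[start:end]), []):
                prev = start
                for k, w in enumerate(alt):
                    is_last = k == len(alt) - 1
                    # The final word of a paraphrase rejoins the backbone.
                    nxt = end if is_last else next_node
                    if not is_last:
                        next_node += 1
                    edges.append((prev, nxt, w))
                    prev = nxt
    return edges


def enumerate_paths(edges, start, goal):
    """List every word sequence the lattice encodes between two nodes."""
    outgoing = defaultdict(list)
    for u, v, w in edges:
        outgoing[u].append((v, w))

    def walk(node):
        if node == goal:
            yield []
            return
        for v, w in outgoing[node]:
            for rest in walk(v):
                yield [w] + rest

    return [" ".join(path) for path in walk(start)]


if __name__ == "__main__":
    sentence = "is there a beauty salon".split()
    table = {("beauty", "salon"): [("beauty", "parlor")]}  # assumed toy table
    lattice = build_paraphrase_lattice(sentence, table)
    for path in enumerate_paths(lattice, 0, len(sentence)):
        print(path)
    # is there a beauty salon
    # is there a beauty parlor
```

A lattice decoder would score all such paths jointly and translate the best one rather than enumerating them. Note that the node ids generated here are not in topological order, which a real decoder's input format may require.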