On-line Language Model Biasing for Statistical Machine Translation

Sankaranarayanan Ananthakrishnan, Rohit Prasad, and Prem Natarajan
Raytheon BBN Technologies
Cambridge, MA 02138, U.S.A.
{sanantha,rprasad,pnataraj}@bbn.com

The views expressed are those of the author and do not reflect the official policy or position of [...]

Abstract

The language model (LM) is a critical component in most statistical machine translation (SMT) systems, serving to establish a probability distribution over the hypothesis space. Most SMT systems use a static LM, independent of the source language input. While previous work has shown that adapting LMs based on the input improves SMT performance, none of the techniques has thus far been shown to be feasible for on-line systems. In this paper, we develop a novel measure of cross-lingual similarity for biasing the LM based on the test input. We also illustrate an efficient on-line implementation that supports integration with on-line SMT systems by transferring much of the computational load off-line. Our approach yields significant reductions in target perplexity compared to the static LM, as well as consistent improvements in SMT performance across language pairs (English-Dari and English-Pashto).

1 Introduction

While much of the focus in developing a statistical machine translation (SMT) system revolves around the translation model (TM), most systems do not emphasize the role of the language model (LM). The latter generally follows an n-gram structure and is estimated from a large monolingual corpus of target sentences. In most systems, the LM is independent of the test input, i.e., fixed n-gram probabilities determine the likelihood of all translation hypotheses, regardless of the source input.

Some previous work exists in LM adaptation for SMT. Snover et al. (2008) used a cross-lingual information retrieval (CLIR) system to select a subset of target documents comparable to the source document; bias LMs estimated from these subsets were interpolated with a static