TAILIEUCHUNG - Scientific report: "On-line Language Model Biasing for Statistical Machine Translation"

Sankaranarayanan Ananthakrishnan, Rohit Prasad, and Prem Natarajan
Raytheon BBN Technologies, Cambridge, MA 02138
sanantha rprasad pnataraj @

Abstract

The language model (LM) is a critical component in most statistical machine translation (SMT) systems, serving to establish a probability distribution over the hypothesis space. Most SMT systems use a static LM, independent of the source language input. While previous work has shown that adapting LMs based on the input improves SMT performance, none of the techniques has thus far been shown to be feasible for on-line systems. In this paper, we develop a novel measure of cross-lingual similarity for biasing the LM based on the test input. We also illustrate an efficient on-line implementation that supports integration with on-line SMT systems by transferring much of the computational load off-line. Our approach yields significant reductions in target perplexity compared to the static LM, as well as consistent improvements in SMT performance across language pairs (English-Dari and English-Pashto).

1 Introduction

While much of the focus in developing a statistical machine translation (SMT) system revolves around the translation model (TM), most systems do not emphasize the role of the language model (LM). The latter generally follows an n-gram structure and is estimated from a large, monolingual corpus of target sentences. In most systems, the LM is independent of the test input: fixed n-gram probabilities determine the likelihood of all translation hypotheses, regardless of the source input.

The views expressed are those of the author and do not reflect the official policy or position of

Some previous work exists in LM adaptation for SMT. Snover et al. (2008) used a cross-lingual information retrieval (CLIR) system to select a subset of target documents comparable to the source document; bias LMs estimated from these subsets were interpolated with a static
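
The general idea of interpolating a bias LM with a static LM, and of measuring the effect through target perplexity, can be sketched as follows. This is only a toy illustration under stated assumptions, not the method developed in the paper: it uses add-alpha smoothed unigram models rather than full n-gram models, and the corpus, the "comparable" subset, the interpolation weight `lam`, and all function names are hypothetical.

```python
import math
from collections import Counter


def unigram_lm(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram LM over a fixed vocabulary (toy stand-in for an n-gram LM)."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(static_lm, bias_lm, lam):
    """Linear interpolation: (1 - lam) * P_static(w) + lam * P_bias(w)."""
    return {w: (1.0 - lam) * static_lm[w] + lam * bias_lm[w] for w in static_lm}


def perplexity(lm, sentence):
    """Per-word perplexity of a sentence under a unigram LM."""
    words = sentence.split()
    log_prob = sum(math.log(lm[w]) for w in words)
    return math.exp(-log_prob / len(words))


# Hypothetical target-side training corpus for the static LM.
static_corpus = [
    "the road is closed",
    "the school is open",
    "the market is open",
    "the doctor is here",
]
# Hypothetical subset judged comparable to the current source input.
bias_subset = ["the doctor is here"]

vocab = sorted({w for s in static_corpus for w in s.split()})
static_lm = unigram_lm(static_corpus, vocab)
bias_lm = unigram_lm(bias_subset, vocab)
biased_lm = interpolate(static_lm, bias_lm, lam=0.5)  # lam is an assumed weight

reference = "the doctor is here"
print("static LM perplexity:", perplexity(static_lm, reference))
print("biased LM perplexity:", perplexity(biased_lm, reference))
```

In this toy setup the biased LM assigns more probability mass to words that appear in the comparable subset, so the perplexity of a matching reference sentence is lower than under the static LM alone. How the subset is selected is exactly what an adaptation technique has to decide; the sketch simply takes it as given.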
