Scientific report: "Toward Statistical Machine Translation without Parallel Corpora"

Alexandre Klementiev, Ann Irvine, Chris Callison-Burch, David Yarowsky
Center for Language and Speech Processing, Johns Hopkins University

Abstract

We estimate the parameters of a phrase-based statistical machine translation system from monolingual corpora instead of a bilingual parallel corpus. We extend existing research on bilingual lexicon induction to estimate both lexical and phrasal translation probabilities for MT-scale phrase tables. We propose a novel algorithm to estimate reordering probabilities from monolingual data. We report translation results for an end-to-end translation system using these monolingual features alone. Our method requires only monolingual corpora in the source and target languages, a small bilingual dictionary, and a small bitext for tuning feature weights. In this paper, we examine an idealization where a phrase table is given. We examine the degradation in translation performance when bilingually estimated translation probabilities are removed, and show that 80% of the loss can be recovered with monolingually estimated features alone. We further show that our monolingual features add BLEU points when combined with standard bilingually estimated phrase table features.
1 Introduction

The parameters of statistical models of translation are typically estimated from large bilingual parallel corpora (Brown et al., 1993). However, these resources are not available for most language pairs, and they are expensive to produce in quantities sufficient for building a good translation system (Germann, 2001). We attempt an entirely different approach: we use cheap and plentiful monolingual resources to induce an end-to-end statistical machine translation system. In particular, we extend the long line of work on inducing translation lexicons (beginning with Rapp, 1995) and propose to use multiple independent cues present in monolingual texts to estimate lexical and phrasal translation probabilities for ...
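
To make the idea of a monolingual cue concrete, the following is a minimal sketch of one cue commonly used in the bilingual lexicon induction literature: contextual similarity computed through a small seed dictionary. It is an illustration under stated assumptions, not the authors' implementation. This excerpt does not say which cues the paper uses or how it scores them, and the function names, the window size, the toy Spanish and English sentences, and the seed dictionary are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """For every word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def project(vector, seed_dict):
    """Map a source-side context vector into target vocabulary.

    Context words missing from the seed dictionary are dropped.
    """
    projected = Counter()
    for src_word, count in vector.items():
        if src_word in seed_dict:
            projected[seed_dict[src_word]] += count
    return projected


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_candidates(src_word, candidates, src_vecs, tgt_vecs, seed_dict):
    """Score target candidates for `src_word`, best first."""
    projected = project(src_vecs[src_word], seed_dict)
    scored = [(cosine(projected, tgt_vecs[c]), c) for c in candidates]
    return sorted(scored, reverse=True)


# Hypothetical toy data: two unaligned monolingual corpora and a seed
# dictionary that covers neither "gato" nor "perro".
src_sents = [["el", "gato", "bebe", "leche"],
             ["el", "perro", "come", "carne"],
             ["el", "gato", "come", "pescado"]]
tgt_sents = [["the", "cat", "drinks", "milk"],
             ["the", "dog", "eats", "meat"],
             ["the", "cat", "eats", "fish"]]
seed_dict = {"el": "the", "bebe": "drinks", "leche": "milk",
             "come": "eats", "carne": "meat", "pescado": "fish"}

ranked = rank_candidates("gato", ["cat", "dog"],
                         context_vectors(src_sents),
                         context_vectors(tgt_sents),
                         seed_dict)
print(ranked[0][1])  # best-scoring candidate for "gato"
```

On this toy data "cat" scores higher than "dog" for "gato", because the projected contexts of "gato" overlap more with the contexts of "cat". A score of this kind is only a similarity, not a probability. Turning several such cues into the translation probabilities the abstract refers to would need further steps that this excerpt does not describe.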
