Scientific report: "Terminal-Aware Synchronous Binarization"

Licheng Fang, Tagyoung Chung, and Daniel Gildea
Department of Computer Science, University of Rochester, Rochester, NY 14627

Abstract

We present an SCFG binarization algorithm that combines the strengths of early terminal matching on the source language side and early language model integration on the target language side. We also examine how different strategies of target-side terminal attachment during binarization can significantly affect translation quality.

1 Introduction

Synchronous context-free grammars (SCFG) are behind most syntax-based machine translation models. Efficient machine translation decoding with an SCFG requires converting the grammar into a binarized form, either explicitly, as in synchronous binarization (Zhang et al., 2006), where virtual nonterminals are generated for binarization, or implicitly, as in Earley parsing (Earley, 1970), where dotted items are used.

Given a source-side binarized SCFG with terminal set T and nonterminal set N, the time complexity of decoding a sentence of length n with an m-gram language model is (Venugopal et al., 2007):

$$O\left(n^3 \left[\,|N| \cdot |T|^{2(m-1)}\right]^K\right)$$

where K is the maximum number of right-hand-side nonterminals.

SCFG binarization serves two important goals:

- Parsing complexity for unbinarized SCFG grows exponentially with the number of nonterminals on the right-hand side of grammar rules. Binarization ensures cubic-time decoding in terms of input sentence length.
- In machine translation, integrating language model states as early as possible is essential to reducing search errors. Synchronous binarization (Zhang et al., 2006) enables the decoder to incorporate language model scores as soon as a binarized rule is applied.

In this paper, we examine a CYK-like synchronous binarization algorithm that integrates a novel criterion in a unified semiring parsing framework. The criterion we present has explicit consideration of source-side terminals. In general, terminals in a rule have a lower probability of being matched
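
To make the idea of explicit synchronous binarization more concrete, the following is a minimal Python sketch of a shift-reduce check in the spirit of the synchronous binarization cited above. It is not the terminal-aware, CYK-like algorithm of this paper: it ignores terminals entirely and only looks at how the right-hand-side nonterminals are permuted between source and target. The function name `binarize_permutation` and the tuple-based tree encoding are illustrative assumptions. Each successful merge corresponds to one virtual nonterminal whose children are contiguous on both sides.

```python
def binarize_permutation(perm):
    """Try to binarize a rule given its nonterminal permutation.

    perm[i] is the 1-based target-side position of the i-th source-side
    nonterminal. Returns a nested tuple describing the binarization tree,
    or None if the permutation cannot be synchronously binarized.
    (Hypothetical helper for illustration only.)
    """
    stack = []  # each item: (lowest target pos, highest target pos, tree)
    for src_index, tgt_pos in enumerate(perm):
        # Shift: the next source-side nonterminal covers one target position.
        stack.append((tgt_pos, tgt_pos, src_index))
        # Reduce: merge the top two items while their target spans are adjacent.
        while len(stack) >= 2:
            lo1, hi1, tree1 = stack[-2]
            lo2, hi2, tree2 = stack[-1]
            if hi1 + 1 == lo2:
                # Same order on both sides.
                merged = (lo1, hi2, ("straight", tree1, tree2))
            elif hi2 + 1 == lo1:
                # Order is swapped on the target side.
                merged = (lo2, hi1, ("inverted", tree1, tree2))
            else:
                break
            stack[-2:] = [merged]
    return stack[0][2] if len(stack) == 1 else None


if __name__ == "__main__":
    print(binarize_permutation([2, 1, 3]))     # binarizable
    print(binarize_permutation([2, 4, 1, 3]))  # not binarizable
```

Because every merged item spans a contiguous range of target positions, a decoder could in principle attach language model scores as soon as such a virtual nonterminal is built, which is the second goal listed in the introduction. Where source-side and target-side terminals should attach during binarization is exactly the question this sketch leaves out.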
