Efficient Multi-pass Decoding for Synchronous Context Free Grammars

Hao Zhang and Daniel Gildea
Computer Science Department, University of Rochester, Rochester, NY 14627

Abstract

We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model. The trigram pass closes most of the performance gap between a bigram decoder and a much slower trigram decoder, but takes time that is insignificant in comparison to the bigram pass. An additional fast decoding pass maximizing the expected count of correct translation hypotheses increases the BLEU score significantly.

1 Introduction

Statistical machine translation systems based on synchronous grammars have recently shown great promise, but one stumbling block to their widespread adoption is that the decoding, or search, problem during translation is more computationally demanding than in phrase-based systems. This complexity arises from the interaction of the tree-based translation model with an n-gram language model. Use of longer n-grams improves translation results, but exacerbates this interaction.
In this paper, we present three techniques for attacking this problem in order to obtain fast, high-quality decoders. First, we present a two-pass decoding algorithm, in which the first pass explores states resulting from an integrated bigram language model, and the second pass expands these states into trigram-based states. The general bigram-to-trigram technique is common in speech recognition (Murveit et al., 1993), where lattices from a bigram-based decoder are re-scored with a trigram language model. We examine the question of whether, given the reordering inherent in the machine translation problem, lower-order n-grams will provide as valuable a search heuristic as they do for speech recognition.
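
As a rough illustration of the bigram-to-trigram idea in its speech-recognition form, the Python sketch below runs a Viterbi search over a toy word lattice twice, once with bigram states and once with trigram states. It is not the decoder described in this paper, which operates on parse forests produced by a synchronous grammar; the lattice, the score tables, and the names `LATTICE`, `make_lm`, and `viterbi` are all hypothetical and chosen only to show how raising the n-gram order splits search states and can change the best path.

```python
# Toy word lattice (hypothetical data): edges are (from_node, to_node, word).
# Node ids are assumed to be topologically ordered: every edge has from < to.
LATTICE = [(0, 1, "the"), (1, 2, "cat"), (1, 2, "cut"), (2, 3, "sat")]
START, END = 0, 3


def make_lm(table, default=-5.0):
    """Wrap a dict of n-gram log scores; unseen n-grams get a flat penalty."""
    def logprob(context, word):
        return table.get(context + (word,), default)
    return logprob


# Made-up log scores, for illustration only.
BIGRAM_LM = make_lm({
    ("<s>", "the"): -0.1,
    ("the", "cat"): -1.0,
    ("the", "cut"): -0.8,
    ("cat", "sat"): -0.5,
    ("cut", "sat"): -0.5,
})
TRIGRAM_LM = make_lm({
    ("<s>", "<s>", "the"): -0.1,
    ("<s>", "the", "cat"): -1.0,
    ("<s>", "the", "cut"): -0.8,
    ("the", "cat", "sat"): -0.2,
    ("the", "cut", "sat"): -2.0,
})


def viterbi(lattice, order, logprob):
    """Return (best score, best word sequence, number of search states).

    A search state is (node, context), where context holds the last
    order-1 words. Moving from order 2 to order 3 therefore splits a
    bigram state into one trigram state per distinct two-word history.
    """
    pad = ("<s>",) * (order - 1)
    best = {(START, pad): (0.0, [])}  # state -> (score, words so far)
    # Sorting by source node is a valid processing order here because
    # node ids are topological.
    for src, dst, word in sorted(lattice):
        for (node, ctx), (score, words) in list(best.items()):
            if node != src:
                continue
            new_ctx = (ctx + (word,))[1:]  # keep only the last order-1 words
            new_score = score + logprob(ctx, word)
            key = (dst, new_ctx)
            if key not in best or new_score > best[key][0]:
                best[key] = (new_score, words + [word])
    finals = [v for (node, _), v in best.items() if node == END]
    score, words = max(finals, key=lambda v: v[0])
    return score, words, len(best)


for order, lm in ((2, BIGRAM_LM), (3, TRIGRAM_LM)):
    score, words, n_states = viterbi(LATTICE, order, lm)
    print(order, " ".join(words), round(score, 2), n_states)
```

With these made-up scores, the bigram pass prefers "the cut sat" while the trigram pass prefers "the cat sat", and the trigram search visits more states on the same lattice. A real two-pass decoder would additionally use the first-pass scores to decide which expanded states are worth exploring; that step is omitted here.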
