Stochastic Lexicalized Inversion Transduction Grammar for Alignment

Hao Zhang and Daniel Gildea
Computer Science Department
University of Rochester
Rochester, NY 14627

Abstract

We present a version of Inversion Transduction Grammar where rule probabilities are lexicalized throughout the synchronous parse tree, along with pruning techniques for efficient training. Alignment results improve over unlexicalized ITG on short sentences for which full EM is feasible, but pruning seems to have a negative impact on longer sentences.

1 Introduction

The Inversion Transduction Grammar (ITG) of Wu (1997) is a syntactically motivated algorithm for producing word-level alignments of pairs of translationally equivalent sentences in two languages. The algorithm builds a synchronous parse tree for both sentences, and assumes that the trees have the same underlying structure but that the ordering of constituents may differ in the two languages.

This probabilistic syntax-based approach has inspired much subsequent research. Alshawi et al. (2000) use hierarchical finite-state transducers. In the tree-to-string model of Yamada and Knight (2001), a parse tree for one sentence of a translation pair is projected onto the other string. Melamed (2003) presents algorithms for synchronous parsing with more complex grammars, discussing how to parse grammars with greater than binary branching and lexicalization of synchronous grammars.

Despite being one of the earliest probabilistic syntax-based translation models, ITG remains state-of-the-art.
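To make ITG's assumption about constituent order concrete, here is a minimal Python sketch, not taken from the paper, that checks whether a word-order permutation can be derived by binary ITG rules. Each rule combines two adjacent constituents either in straight order (`[]`, same order in both languages) or in inverted order (`<>`, order swapped on the target side). The sketch assumes a one-to-one alignment with no unaligned words, given as a list `perm` where `perm[i]` is the target position of source word `i`; the function name `itg_tree` and the tuple tree encoding are hypothetical. It uses a greedy shift-reduce pass that merges the top two stack entries whenever they cover adjacent blocks of target positions.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical tree encoding: a leaf is a target position; an internal node is
# (label, left, right) with label "[]" (straight) or "<>" (inverted).
Tree = Union[int, Tuple[str, "Tree", "Tree"]]


def itg_tree(perm: List[int]) -> Optional[Tree]:
    """Return a binary ITG tree deriving `perm`, or None if none exists.

    Assumes perm is a permutation of 0..n-1 (one-to-one alignment,
    no unaligned words), which is a simplification of full ITG.
    """
    # Each stack entry: (lowest target pos, highest target pos, subtree).
    # Every entry covers a contiguous block of target positions.
    stack: List[Tuple[int, int, Tree]] = []
    for pos in perm:
        stack.append((pos, pos, pos))  # shift the next source word
        # Reduce: merge the two topmost blocks while they are adjacent
        # on the target side.
        while len(stack) >= 2:
            a_lo, a_hi, a_tree = stack[-2]
            b_lo, b_hi, b_tree = stack[-1]
            if b_lo == a_hi + 1:
                label = "[]"  # straight: target order matches source order
            elif b_hi + 1 == a_lo:
                label = "<>"  # inverted: target order is swapped
            else:
                break  # not adjacent; wait for more words
            merged = (min(a_lo, b_lo), max(a_hi, b_hi), (label, a_tree, b_tree))
            stack[-2:] = [merged]
    # A derivation exists only if everything reduced to a single block.
    return stack[0][2] if len(stack) == 1 else None


if __name__ == "__main__":
    print(itg_tree([2, 3, 0, 1]))  # ('<>', ('[]', 2, 3), ('[]', 0, 1))
    print(itg_tree([1, 3, 0, 2]))  # None
```

Permutations such as `[1, 3, 0, 2]` and `[2, 0, 3, 1]`, where neither half of the sentence maps to a contiguous block, have no binary ITG derivation; this is the kind of reordering the grammar rules out. For the binary case, merging adjacent blocks greedily is sufficient, so a single left-to-right pass decides the question without search.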
Zens and Ney (2003) found that the constraints of ITG were a better match to the decoding task than the heuristics used in the IBM decoder of Berger et al. (1996). Zhang and Gildea (2004) found ITG to outperform the tree-to-string model for word-level alignment, as measured against human gold-standard alignments. One explanation for this result is that while a tree representation is helpful for modeling translation, the trees assigned by the traditional monolingual parsers and
