Scientific report: "cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models"

Chris Dyer (University of Maryland, redpony@), Jonathan Weese (Johns Hopkins University, jweese@), Hendra Setiawan (University of Maryland, hendra@), Adam Lopez (University of Edinburgh, alopez@), Ferhan Ture (University of Maryland, fture@), Vladimir Eidelman (University of Maryland, vlad@), Juri Ganitkevitch (Johns Hopkins University, juri@), Phil Blunsom (Oxford University, pblunsom@), Philip Resnik (University of Maryland, resnik@)

Abstract

We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques.
Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.

1 Introduction

The dominant models used in machine translation and sequence tagging are formally based on either weighted finite-state transducers (FSTs) or weighted synchronous context-free grammars (SCFGs) (Lopez, 2008). Phrase-based models (Koehn et al., 2003), lexical translation models (Brown et al., 1993), and finite-state conditional random fields (Sha and Pereira, 2003) exemplify the former, and hierarchical phrase-based models the latter (Chiang, 2007). We introduce a software package called cdec that manipulates both classes in a unified way. Although open source ...
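The idea of one forest representation shared by finite-state and context-free models can be illustrated with a small sketch. The Python below is not cdec's API or its actual data structures (cdec itself is written in C++). The names `Edge`, `Forest`, and `viterbi` are hypothetical, the scores are made up, and nodes are assumed to be numbered in topological order with the goal node last. It only shows how a single 1-best routine might serve both a lattice-like forest (every edge has at most one tail) and an SCFG-like forest (edges with several tails and reordering).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Edge:
    head: int                       # node that this edge derives
    tails: List[int]                # child nodes (empty for axioms)
    score: float                    # log-domain score (illustrative values only)
    target: List[Union[int, str]]   # words, or ints that index into `tails`


@dataclass
class Forest:
    num_nodes: int                  # nodes 0..num_nodes-1, topologically ordered
    edges: List[Edge] = field(default_factory=list)


def viterbi(forest: Forest) -> Tuple[float, List[str]]:
    """Return the best score and target words derivable at the goal node."""
    incoming: Dict[int, List[Edge]] = {}
    for e in forest.edges:
        incoming.setdefault(e.head, []).append(e)

    best: Dict[int, Tuple[float, List[str]]] = {}
    for node in range(forest.num_nodes):
        for e in incoming.get(node, []):
            # Score of this edge plus the best scores of all its children.
            score = e.score + sum(best[t][0] for t in e.tails)
            if node not in best or score > best[node][0]:
                words: List[str] = []
                for sym in e.target:
                    if isinstance(sym, int):
                        words.extend(best[e.tails[sym]][1])  # substitute child yield
                    else:
                        words.append(sym)
                best[node] = (score, words)
    return best[forest.num_nodes - 1]


# Finite-state style forest: a left-to-right lattice, one tail per edge.
lattice = Forest(3, [
    Edge(0, [], 0.0, []),
    Edge(1, [0], -0.5, [0, "the"]),
    Edge(1, [0], -1.5, [0, "a"]),
    Edge(2, [1], -0.25, [0, "house"]),
    Edge(2, [1], -1.0, [0, "home"]),
])

# Context-free style forest: a binary rule that can reorder its two children.
scfg_forest = Forest(3, [
    Edge(0, [], -0.25, ["house"]),
    Edge(0, [], -1.0, ["home"]),
    Edge(1, [], -0.5, ["blue"]),
    Edge(2, [0, 1], -0.5, [1, 0]),   # inverted order
    Edge(2, [0, 1], -1.0, [0, 1]),   # monotone order
])

print(viterbi(lattice))      # (-0.75, ['the', 'house'])
print(viterbi(scfg_forest))  # (-1.25, ['blue', 'house'])
```

In this sketch the model-specific part is confined to how each forest is built, while `viterbi` never needs to know which formalism produced the edges. The other quantities the abstract mentions, such as k-best lists or training statistics, would in principle be computed over the same structure, but they are not implemented here.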
