A Syntax-based Statistical Translation Model

Kenji Yamada and Kevin Knight
Information Sciences Institute
University of Southern California
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292
kyamada knight @

Abstract

We present a syntax-based statistical translation model. Our model transforms a source-language parse tree into a target-language string by applying stochastic operations at each node. These operations capture linguistic differences such as word order and case marking. Model parameters are estimated in polynomial time using an EM algorithm. The model produces word alignments that are better than those produced by IBM Model 5.

1 Introduction

A statistical translation model (TM) is a mathematical model in which the process of human-language translation is statistically modeled. Model parameters are automatically estimated using a corpus of translation pairs. TMs have been used for statistical machine translation (Berger et al., 1996), word alignment of a translation corpus (Melamed, 2000), multilingual document retrieval (Franz et al., 1999), automatic dictionary construction (Resnik and Melamed, 1997), and data preparation for word sense disambiguation programs (Brown et al., 1991). Developing a better TM is a fundamental issue for those applications.

Researchers at IBM first described such a statistical TM in Brown et al. (1988). Their models are based on a string-to-string noisy channel model.
The channel converts a sequence of words in one language (such as English) into another (such as French). The channel operations are movements, duplications, and translations, applied to each word independently. The movement is conditioned only on word classes and positions in the string, and the duplication and translation are conditioned only on the word identity. Mathematical details are fully described in Brown et al. (1993). One criticism of the IBM-style TM is that it does not model structural or syntactic aspects of the language. The TM was only demonstrated for a .
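To make these conditioning assumptions concrete, here is a toy Python sketch that scores one derivation of an IBM-style string-to-string channel as a product of duplication, translation, and movement probabilities. All probability tables, the word classes, and the function name `derivation_prob` are hypothetical and invented for illustration. The sketch also leaves out details of the actual IBM models (for example, words generated from an empty NULL source), so it is not the formulation of Brown et al. (1993).

```python
from collections import Counter

# Hypothetical toy parameters, invented for illustration (not estimated from any corpus).
# Duplication (fertility): depends only on the identity of the source word.
FERTILITY = {
    "the": {0: 0.1, 1: 0.9},
    "house": {1: 0.8, 2: 0.2},
}
# Translation: depends only on the identity of the source word.
TRANSLATION = {
    "the": {"la": 0.7, "le": 0.3},
    "house": {"maison": 0.9, "domicile": 0.1},
}
# Movement: depends only on the source word's class and the source/target positions.
DISTORTION = {
    ("DET", 0, 0): 0.8, ("DET", 0, 1): 0.2,
    ("NOUN", 1, 0): 0.25, ("NOUN", 1, 1): 0.75,
}


def derivation_prob(english, classes, fertilities, copies):
    """Probability of one channel derivation (hypothetical helper).

    english     -- source words, e.g. ["the", "house"]
    classes     -- one word class per source word (hypothetical tag set)
    fertilities -- number of copies made of each source word (duplication)
    copies      -- (i, f, j) triples: a copy of english[i] is translated
                   into f and moved to target position j
    """
    # Each source word must produce exactly as many copies as its fertility says.
    made = Counter(i for i, _, _ in copies)
    if any(made[i] != n for i, n in enumerate(fertilities)):
        raise ValueError("copies do not match fertilities")

    p = 1.0
    for e, n in zip(english, fertilities):
        p *= FERTILITY[e].get(n, 0.0)                 # duplication
    for i, f, j in copies:
        p *= TRANSLATION[english[i]].get(f, 0.0)      # translation
        p *= DISTORTION.get((classes[i], i, j), 0.0)  # movement
    return p


# Usage: "the house" -> "la maison", keeping the original word order.
print(derivation_prob(["the", "house"], ["DET", "NOUN"], [1, 1],
                      [(0, "la", 0), (1, "maison", 1)]))
```

Because every factor looks at a single word (its identity, or its class and positions), nothing in this score depends on phrase or tree structure, which is the gap the criticism above points at.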

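The abstract's central idea, a source-language parse tree rewritten into a target-language string by stochastic operations applied at each node, can be illustrated with a toy Python sketch. The excerpt does not spell out the operations, so this sketch assumes three hypothetical ones: reordering a node's children (word order), optionally inserting a word to the right of a node (for example, a case-marking particle), and translating each leaf word. The tree, the probability tables, the romanized Japanese-like output tokens, and the names `Node`, `best`, and `transform` are all invented for illustration; this is not the paper's exact parameterization.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical toy tables, invented for illustration only.
# Reorder: keyed by the sequence of child labels; maps a child permutation to its probability.
REORDER = {
    ("PRP", "VP"): {(0, 1): 0.9, (1, 0): 0.1},
    ("VB", "NN"): {(0, 1): 0.2, (1, 0): 0.8},
}
# Insert: keyed by node label; maps a word appended after the node (None = no insertion).
INSERT = {
    "PRP": {"ha": 0.6, None: 0.4},
    "NN": {"ga": 0.7, None: 0.3},
}
# Translate: keyed by the source leaf word; maps target words to probabilities.
TRANSLATE = {
    "he": {"kare": 0.8, "sore": 0.2},
    "adores": {"daisuki": 0.5, "suki": 0.4, "aisuru": 0.1},
    "music": {"ongaku": 0.9, "kyoku": 0.1},
}


@dataclass
class Node:
    label: str
    word: Optional[str] = None                  # set only on leaves
    children: List["Node"] = field(default_factory=list)


def best(dist):
    """Return the (choice, probability) pair with the highest probability."""
    return max(dist.items(), key=lambda kv: kv[1])


def transform(node: Node) -> Tuple[List[str], float]:
    """Most probable derivation of this subtree: (target words, probability)."""
    prob = 1.0
    if node.word is not None:
        target, p = best(TRANSLATE[node.word])  # translate the leaf word
        words = [target]
        prob *= p
    else:
        results = [transform(c) for c in node.children]
        order, p = best(REORDER[tuple(c.label for c in node.children)])  # reorder children
        prob *= p
        words = []
        for idx in order:
            child_words, child_prob = results[idx]
            words += child_words
            prob *= child_prob
    inserted, p = best(INSERT.get(node.label, {None: 1.0}))  # optional insertion
    prob *= p
    if inserted is not None:
        words.append(inserted)
    return words, prob


# Toy English parse tree for "he adores music".
TREE = Node("S", children=[
    Node("PRP", "he"),
    Node("VP", children=[Node("VB", "adores"), Node("NN", "music")]),
])

words, prob = transform(TREE)
print(" ".join(words), prob)
```

With these made-up numbers the sketch yields the verb-final string "kare ha ongaku ga daisuki". Because the toy probabilities factor over nodes and each node's choices are independent, taking the most probable option at every node gives the most probable single derivation; it does not necessarily give the most probable output string, since several derivations can yield the same string.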