Scientific report: "Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach"

Ismael García Varea
Dpto. de Informática, Univ. de Castilla-La Mancha, Campus Universitario s/n, 02071 Albacete, Spain
ivarea@

Franz J. Och and Hermann Ney
Lehrstuhl für Inf. VI, RWTH Aachen, Ahornstr. 55, D-52056 Aachen, Germany
och ney @

Francisco Casacuberta
Dpto. de Sist. Inf. y Comp., Inst. Tecn. de Inf., UPV, Avda. de Los Naranjos s/n, 46071 Valencia, Spain
fcn@

Abstract

Typically, the lexicon models used in statistical machine translation systems do not include any kind of linguistic or contextual information, which often leads to problems in performing a correct word sense disambiguation. One way to deal with this problem within the statistical framework is to use maximum entropy methods. In this paper, we present how to use this type of information within a statistical machine translation system. We show that it is possible to significantly decrease training and test corpus perplexity of the translation models. In addition, we perform a rescoring of N-best lists using our maximum entropy model and thereby yield an improvement in translation quality. Experimental results are presented on the so-called Verbmobil task.

1 Introduction

Typically, the lexicon models used in statistical machine translation systems are only single-word based; that is, one word in the source language corresponds to only one word in the target language.
Those lexicon models lack context information that can be extracted from the same parallel corpus. This additional information could be:

- Simple context information: information on the words surrounding the word pair.
- Syntactic information: part-of-speech information, syntactic constituent, sentence mood.
- Semantic information: disambiguation information (e.g. from WordNet), current/previous speech or dialog act.

To include this additional information within the statistical framework, we use the maximum entropy approach. This .
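
To make the idea of a context-dependent lexicon model concrete, the following is a minimal sketch of a maximum entropy (log-linear) model for p(target word | source word, context), using only the "simple context information" type listed above. Everything in it is an assumption made for illustration: the function names, the feature templates, the toy German-English samples, and the use of plain stochastic gradient ascent for training. The excerpt does not describe the authors' feature definitions or training procedure, so this should not be read as their implementation.

```python
import math
from collections import defaultdict

# Invented toy data (not from the paper): source word, surrounding
# source words, and the target word chosen in that context.
SAMPLES = [
    ("Bank", ("Geld", "abheben"), "bank"),
    ("Bank", ("Konto", "Geld"), "bank"),
    ("Bank", ("Park", "sitzen"), "bench"),
    ("Bank", ("Garten", "sitzen"), "bench"),
]
CANDIDATES = {"Bank": ["bank", "bench"]}  # hypothetical translation candidates


def features(target, source, context):
    """Names of the binary features that fire for one candidate translation."""
    feats = [f"lex:{source}->{target}"]  # plain single-word lexicon feature
    feats += [f"ctx:{w}&{source}->{target}" for w in context]  # context features
    return feats


def predict(weights, source, context, candidates):
    """Return p(target | source, context) as a normalized exponential."""
    scores = {
        t: sum(weights[f] for f in features(t, source, context))
        for t in candidates
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: v / z for t, v in exps.items()}


def train(samples, candidates, epochs=200, lr=0.5):
    """Fit feature weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for source, context, gold in samples:
            probs = predict(weights, source, context, candidates[source])
            for t in candidates[source]:
                # Gradient: observed feature count minus expected feature count.
                grad = (1.0 if t == gold else 0.0) - probs[t]
                for f in features(t, source, context):
                    weights[f] += lr * grad
    return weights


def perplexity(weights, samples, candidates):
    """Perplexity of the model on samples: exp of the mean negative log-probability."""
    log_sum = sum(
        math.log(predict(weights, s, c, candidates[s])[g]) for s, c, g in samples
    )
    return math.exp(-log_sum / len(samples))


weights = train(SAMPLES, CANDIDATES)
print(predict(weights, "Bank", ("Geld",), CANDIDATES["Bank"]))
print(perplexity(weights, SAMPLES, CANDIDATES))
```

With all weights at zero the model is uniform over the candidates, so comparing `perplexity` before and after training gives a small-scale analogue of the training-corpus perplexity reduction mentioned in the abstract. Syntactic or semantic information could be added in the same way, as further feature templates inside `features`.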
