TAILIEUCHUNG - Scientific report: "Flow Network Models for Word Alignment and Terminology Extraction from Bilingual Corpora"

Eric Gaussier
Xerox Research Centre Europe, 6 Chemin de Maupertuis, 38240 Meylan F.

Abstract

This paper presents a new model for word alignments between parallel sentences, which allows one to accurately estimate different parameters in a computationally efficient way. An application of this model to bilingual terminology extraction, where terms are identified in one language and guessed, through the alignment process, in the other one, is also described. An experiment conducted on a small English-French parallel corpus gave results with high precision, demonstrating the validity of the model.

1 Introduction

Early works (Gale and Church, 1993; Brown et al., 1993) and, to a certain extent, (Kay and Röscheisen, 1993) presented methods to extract bilingual lexicons of words from a parallel corpus, relying on the distribution of the words in the set of parallel sentences (or other units). Brown et al. (1993) then extended their method and established a sound series of probabilistic models, relying on different parameters describing how words within parallel sentences are aligned to each other. On the other hand, Dagan et al. (1993) proposed an algorithm, borrowed from the field of dynamic programming and based on the output of their previous work, to find the best alignment, subject to certain constraints, between words in parallel sentences. A similar algorithm was used by Vogel et al. (1996).
Investigating alignments at the sentence level makes it possible to clean and refine the lexicons otherwise extracted from a parallel corpus as a whole, pruning what Melamed (1996) calls indirect associations. What differentiates the proposed models and algorithms is the set of parameters and constraints they rely on, their ability to find an appropriate solution under the constraints defined, and their ability to integrate new parameters cleanly. We want to present here a model of the possible .
