Scientific report: "High-Performance Bilingual Text Alignment Using Statistical and Dictionary Information"

Masahiko Haruno, Takefumi Yamazaki
NTT Communication Science Labs.
1-2356 Take Yokosuka-Shi Kanagawa 238-03 Japan
haruno@ yamazaki@

Abstract

This paper describes an accurate and robust text alignment system for structurally different languages. Among structurally different languages such as Japanese and English, there is a limit on the number of word correspondences that can be statistically acquired. The proposed method makes use of two kinds of word correspondences in aligning bilingual texts. One is a bilingual dictionary of general use. The other is the word correspondences that are statistically acquired in the alignment process. Our method gradually determines sentence pairs (anchors) that correspond to each other by relaxing parameters. By combining the two kinds of word correspondences, the method obtains enough word correspondences for complete alignment. As a result, texts of various lengths and genres in structurally different languages can be aligned with high precision. Experimental results show that our system outperforms conventional methods for various kinds of Japanese-English texts.

1 Introduction

Corpus-based approaches based on bilingual texts are promising for various applications: lexical knowledge extraction (Kupiec 1993; Matsumoto et al. 1993; Smadja et al. 1996; Dagan and Church 1994; Kumano and Hirakawa 1994; Haruno et al. 1996), machine translation (Brown and others 1993; Sato and Nagao 1990; Kaji et al. 1992), and information retrieval (Sato 1992).
Most of these works assume voluminous aligned corpora. Many methods have been proposed to align bilingual corpora. One of the major approaches is based on the statistics of simple features such as sentence length in words (Brown and others 1991) or in characters (Gale and Church 1993). These techniques are widely used because they can be implemented in an efficient and simple way through .
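
To make the length-based idea concrete, the following is a minimal sketch of sentence alignment driven only by sentence length in characters. It is not the method of this paper or of the cited works: the dynamic-programming search, the `length_cost` function, the `BEADS` table and its penalty values are all assumptions chosen for illustration. Real length-based aligners also model the expected length ratio between the two languages, which this sketch ignores.

```python
import math
from typing import List, Tuple

# Allowed alignment patterns ("beads"): (source sentences, target sentences, penalty).
# The penalty values are illustrative assumptions, not figures from any paper.
BEADS = [(1, 1, 0.0), (2, 1, 0.2), (1, 2, 0.2), (1, 0, 1.0), (0, 1, 1.0)]


def length_cost(src_len: int, tgt_len: int) -> float:
    """Hypothetical cost: relative difference in character length, in [0, 1]."""
    return abs(src_len - tgt_len) / max(src_len, tgt_len, 1)


def align_by_length(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return the lowest-cost sequence of beads as (source indices, target indices)."""
    n, m = len(src), len(tgt)
    best = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj, penalty in BEADS:
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0 or best[pi][pj] == math.inf:
                    continue
                src_len = sum(len(s) for s in src[pi:i])
                tgt_len = sum(len(t) for t in tgt[pj:j])
                cost = best[pi][pj] + length_cost(src_len, tgt_len) + penalty
                if cost < best[i][j]:
                    best[i][j] = cost
                    back[i][j] = (di, dj)

    # Walk the back-pointers from the end of both texts to recover the beads.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


if __name__ == "__main__":
    source = ["aaaa", "bbbbbbbbbb", "cc"]
    target = ["AAAA", "BBBBB", "BBBBB", "CC"]
    for src_ids, tgt_ids in align_by_length(source, target):
        print(src_ids, "<->", tgt_ids)
```

In the toy input, the second source sentence is as long as the second and third target sentences combined, so the search pairs it with both. Because the score uses length alone, a sketch like this has no way to check whether paired sentences share vocabulary, which is the kind of information word correspondences add.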
