Using Similarity Scoring To Improve the Bilingual Dictionary for Word Alignment

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 409-416.

Katharina Probst, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA 15213, kathrin@
Ralf Brown, Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA 15213, ralf@

Abstract

We describe an approach to improve the bilingual cooccurrence dictionary that is used for word alignment, and evaluate the improved dictionary using a version of the Competitive Linking algorithm. We demonstrate a problem faced by the Competitive Linking algorithm and present an approach to ameliorate it. In particular, we rebuild the bilingual dictionary by clustering similar words in a language and assigning them a higher cooccurrence score with a given word in the other language than each single word would have otherwise. Experimental results show a significant improvement in precision and recall for word alignment when the improved dictionary is used.

1 Introduction and Related Work

Word alignment is a well-studied problem in Natural Language Computing. This is hardly surprising given its significance in many applications: word-aligned data is crucial for example-based machine translation and statistical machine translation, but also for other applications such as cross-lingual information retrieval. Since it is a hard and time-consuming task to hand-align bilingual data, the automation of this task receives a fair amount of attention. In this paper, we present an approach to improve the bilingual dictionary that is used by word alignment algorithms. Our method is based on similarity scores between words, which in effect results in the clustering of morphological variants.

One line of related work is research in clustering based on word similarities. This problem is an area of active research in the Information Retrieval community. For .
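The idea summarized in the abstract (cluster similar words, then let each cluster member share a pooled cooccurrence score) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the shared-prefix similarity test, the 0.6 threshold, the summing of counts within a cluster, and all function names are hypothetical. The linking step is the generic greedy one-to-one form of competitive linking, which may differ from the version the authors evaluate.

```python
from collections import defaultdict


def common_prefix_len(a, b):
    """Length of the prefix shared by two words."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def similar(a, b, threshold=0.6):
    """Assumed similarity test: shared prefix relative to the longer word."""
    return common_prefix_len(a, b) / max(len(a), len(b)) >= threshold


def cluster_words(words, threshold=0.6):
    """Greedy single pass: a word joins the first cluster whose seed is similar."""
    clusters = []
    for w in sorted(words):
        for c in clusters:
            if similar(c[0], w, threshold):
                c.append(w)
                break
        else:
            clusters.append([w])
    return clusters


def rebuild_dictionary(cooc, threshold=0.6):
    """Give every member of a target-word cluster the cluster's pooled count.

    cooc maps (source_word, target_word) -> cooccurrence count.
    Summing the counts is an assumption, not the paper's stated formula.
    """
    cluster_of = {}
    for c in cluster_words({t for _, t in cooc}, threshold):
        for w in c:
            cluster_of[w] = tuple(c)
    pooled = defaultdict(float)
    for (s, t), n in cooc.items():
        pooled[(s, cluster_of[t])] += n
    rebuilt = {}
    for (s, cluster), n in pooled.items():
        for t in cluster:
            rebuilt[(s, t)] = n
    return rebuilt


def competitive_linking(src, tgt, scores):
    """Link the best-scoring pair, retire both words, and repeat."""
    candidates = sorted(
        ((scores.get((s, t), 0.0), i, j)
         for i, s in enumerate(src) for j, t in enumerate(tgt)),
        key=lambda c: (-c[0], c[1], c[2]),
    )
    used_src, used_tgt, links = set(), set(), []
    for score, i, j in candidates:
        if score > 0 and i not in used_src and j not in used_tgt:
            links.append((src[i], tgt[j]))
            used_src.add(i)
            used_tgt.add(j)
    return links


# Invented toy counts: "house" is split across two morphological variants.
cooc = {("the", "maison"): 4, ("the", "la"): 3,
        ("house", "maison"): 3, ("house", "maisons"): 2}
print(competitive_linking(["the", "house"], ["la", "maison"], cooc))
print(competitive_linking(["the", "house"], ["la", "maison"],
                          rebuild_dictionary(cooc)))
```

In this toy data the counts for "house" are split between "maison" and "maisons", so the raw dictionary lets "the" claim "maison" first and leaves "house" unlinked. Pooling the two variants raises the score of ("house", "maison") above that of ("the", "maison"), and the greedy linker then recovers both links.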
