TAILIEUCHUNG - Scientific report: "Named Entity Transliteration with Comparable Corpora"

Richard Sproat, Tao Tao, ChengXiang Zhai
University of Illinois at Urbana-Champaign, Urbana, IL 61801
rws@ taotao czhai @

Abstract

In this paper we investigate Chinese-English name transliteration using comparable corpora: corpora in which texts in the two languages deal with some of the same topics, and therefore share references to named entities, but are not translations of each other. We present two distinct methods for transliteration, one using phonetic transliteration and the other using the temporal distribution of candidate pairs. Each of these approaches works quite well, but combining them achieves even better results. We then propose a novel score propagation method that utilizes the co-occurrence of transliteration pairs within document pairs. This propagation method achieves further improvement over the best results from the previous step.

1 Introduction

As part of a more general project on multilingual named entity identification, we are interested in the problem of name transliteration across languages that use different scripts.
One particular issue is the discovery of named entities in comparable texts in multiple languages, where by comparable we mean texts that are about the same topic but are not, in general, translations of each other. For example, if one were to go through an English, a Chinese, and an Arabic newspaper on the same day, it is likely that the more important international events in various topics, such as politics, business, science, and sports, would each be covered in each of the newspapers. Names of the same persons, locations, and so forth, which are often transliterated rather than translated, would be found in comparable stories across the three newspapers. We wish to use this expectation to leverage transliteration, and thus the identification of named entities, across languages. Our idea is that the occurrence of a cluster of names in, say, an English text should
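
The temporal-distribution idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes, and this is not stated in the excerpt, that each candidate name is represented by its per-day occurrence counts over a shared sequence of days, and that similarity between two names is measured by the Pearson correlation of those count vectors. The function names and the toy counts below are hypothetical and serve only as an illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length count vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat distribution carries no temporal signal
    return cov / math.sqrt(var_x * var_y)


def rank_candidates(
    english_counts: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank candidate names by temporal correlation with the English name."""
    scored = [
        (name, pearson(english_counts, counts))
        for name, counts in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical daily counts over the same seven days (assumed data).
english = [0, 5, 9, 2, 0, 0, 1]
candidates = {
    "candidate_a": [0, 4, 8, 3, 0, 0, 0],  # bursts on the same days
    "candidate_b": [3, 3, 3, 2, 3, 3, 3],  # roughly uniform over time
}
ranking = rank_candidates(english, candidates)
```

In this toy setting the candidate whose mentions peak on the same days as the English name is ranked first. A score of this kind could then be combined with a phonetic score, as the abstract describes, although the exact combination is not given in this excerpt.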
