Nonparametric Bayesian Machine Transliteration with Synchronous Adaptor Grammars

Yun Huang (1, 2), Min Zhang (1), Chew Lim Tan (2)
huangyun@, mzhang@, tancl@
(1) Human Language Department, Institute for Infocomm Research, 1 Fusionopolis Way, Singapore
(2) Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore

Abstract

Machine transliteration is defined as the automatic phonetic translation of names across languages. In this paper, we propose the synchronous adaptor grammar, a novel nonparametric Bayesian learning approach to machine transliteration. This model provides a general framework that automatically learns syllable equivalents between languages without heuristics or restrictions. The proposed model outperforms the state-of-the-art EM-based model on the English-to-Chinese transliteration task.

1 Introduction

Proper names are one source of out-of-vocabulary (OOV) words in many NLP tasks, such as machine translation and cross-lingual information retrieval. They are often translated through transliteration, i.e., translation that preserves how words sound in both languages. In general, machine transliteration is modelled as monotonic machine translation (Rama and Gali, 2009; Finch and Sumita, 2009; Finch and Sumita, 2010), as joint source-channel models (Li et al., 2004; Yang et al., 2009), or as sequential labeling problems (Reddy and Waxmonsky, 2009; Abdul Hamid and Darwish, 2010). Acquiring syllable equivalents is a critical phase for all these models. Traditional learning approaches maximize the likelihood of the training data with the Expectation-Maximization (EM) algorithm.
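To make the likelihood-maximization step concrete, here is a minimal EM sketch for a unigram joint model over aligned pair-units (a source chunk paired with a target chunk). It is an illustration under stated assumptions, not the authors' model or any cited system: the joint source-channel models cited above typically condition each pair on preceding pairs, which is dropped here; alignments are assumed monotone with no null units; and `max_src`, `max_tgt`, the helper names, and the toy name pairs are all hypothetical.

```python
import math
from collections import defaultdict

# Minimal EM for a *unigram* joint model over pair-units (src chunk, tgt chunk).
# Assumptions (hypothetical, not from the paper): monotone alignment, no null
# units, and chunk lengths capped by max_src / max_tgt.


def candidate_units(src, tgt, i, j, max_src, max_tgt):
    """Yield (di, dj, unit) for every pair-unit that starts at cell (i, j)."""
    for di in range(1, min(max_src, len(src) - i) + 1):
        for dj in range(1, min(max_tgt, len(tgt) - j) + 1):
            yield di, dj, (src[i:i + di], tgt[j:j + dj])


def forward_backward(src, tgt, prob, max_src, max_tgt):
    """Forward/backward tables summing over all monotone segmentations of a pair."""
    n, m = len(src), len(tgt)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    beta = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    beta[n][m] = 1.0
    # Forward pass: push mass from each cell to every cell a unit can reach.
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj, unit in candidate_units(src, tgt, i, j, max_src, max_tgt):
                alpha[i + di][j + dj] += alpha[i][j] * prob.get(unit, 0.0)
    # Backward pass: collect mass from the cells reachable from each cell.
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            for di, dj, unit in candidate_units(src, tgt, i, j, max_src, max_tgt):
                beta[i][j] += prob.get(unit, 0.0) * beta[i + di][j + dj]
    return alpha, beta


def uniform_init(pairs, max_src, max_tgt):
    """Uniform distribution over every pair-unit that occurs in the data."""
    units = {
        unit
        for src, tgt in pairs
        for i in range(len(src))
        for j in range(len(tgt))
        for _, _, unit in candidate_units(src, tgt, i, j, max_src, max_tgt)
    }
    return {u: 1.0 / len(units) for u in units}


def em_step(pairs, prob, max_src, max_tgt):
    """One EM iteration; returns (new_prob, log-likelihood under the old prob)."""
    counts = defaultdict(float)
    loglik = 0.0
    for src, tgt in pairs:
        alpha, beta = forward_backward(src, tgt, prob, max_src, max_tgt)
        z = beta[0][0]  # probability of the pair, summed over segmentations
        loglik += math.log(z)
        # E-step: posterior expected number of times each unit is used.
        for i in range(len(src) + 1):
            for j in range(len(tgt) + 1):
                for di, dj, unit in candidate_units(src, tgt, i, j, max_src, max_tgt):
                    p = prob.get(unit, 0.0)
                    counts[unit] += alpha[i][j] * p * beta[i + di][j + dj] / z
    # M-step: renormalize the expected counts (relative frequencies).
    total = sum(counts.values())
    return {u: c / total for u, c in counts.items()}, loglik


# Hypothetical toy data: lowercase = source letters, uppercase = target symbols.
pairs = [("ab", "XY"), ("ac", "XZ"), ("db", "WY"), ("dc", "WZ")]
prob = uniform_init(pairs, max_src=2, max_tgt=2)
history = []
for _ in range(20):
    prob, ll = em_step(pairs, prob, max_src=2, max_tgt=2)
    history.append(ll)
print(sorted(prob, key=prob.get, reverse=True)[:4])
```

The recorded log-likelihoods never decrease, as EM guarantees. On this toy data each pair has exactly two segmentations (the whole pair, or letter by letter), and the four whole-pair units such as `("ab", "XY")` end up with nearly all of the probability mass even though the letter-level units would generalize better.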
However, the EM algorithm may over-fit the training data by memorizing entire training instances. To avoid this problem, some approaches restrict the alignment so that a single character in one language can be aligned to many characters in the other, but not vice versa (Li et al., 2004; Yang et al., 2009). Heuristics are introduced to obtain many-to-many alignments by combining two …
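The one-to-many restriction can be pictured by enumerating the alignments it allows. This sketch assumes, for English-to-Chinese, that each Chinese character is the "single character" side and is paired with a non-empty, contiguous chunk of English letters; the helper name `one_to_many_segmentations` and the example name are illustrative assumptions, not code from the cited work. Splitting an n-letter name into m chunks amounts to choosing m-1 of the n-1 gaps between letters, so there are C(n-1, m-1) candidate alignments.

```python
from itertools import combinations
from math import comb


def one_to_many_segmentations(src, tgt):
    """Yield monotone alignments pairing each target character with a
    non-empty, contiguous chunk of the source string (hypothetical helper)."""
    n, m = len(src), len(tgt)
    if m == 0 or m > n:
        return  # no alignment satisfies the restriction
    # Choosing m-1 cut points among the n-1 gaps splits src into m chunks.
    for cuts in combinations(range(1, n), m - 1):
        bounds = (0, *cuts, n)
        yield [(src[bounds[k]:bounds[k + 1]], tgt[k]) for k in range(m)]


segs = list(one_to_many_segmentations("smith", "史密斯"))
print(len(segs), comb(4, 2))  # prints: 6 6
for seg in segs:
    print(seg)
```

Every unit carries exactly one target character, so a three-character name can never be stored as a single whole-word unit; this is how such a restriction blocks memorization. The cost is that genuine many-to-many correspondences cannot be represented directly, which is why heuristics are then needed to recover them.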