Scientific report: "Jointly optimizing a two-step conditional random field model for machine transliteration and its fast decoding algorithm"

Dong Yang, Paul Dixon, and Sadaoki Furui
Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan
raymond dixonp furui @

Abstract

This paper presents a joint optimization method for a two-step conditional random field (CRF) model for machine transliteration, together with a fast decoding algorithm for the proposed method. Our method belongs to the category of direct orthographical mapping (DOM) between two languages, which does not use any intermediate phonemic mapping. In the two-step CRF model, the first CRF segments an input word into chunks and the second converts each chunk into one unit in the target language. In this paper, we propose a method to jointly optimize the two CRFs, as well as a fast algorithm to realize it. Our experiments show that the proposed method outperforms the well-known joint source channel model (JSCM) and that our proposed fast algorithm decreases the decoding time significantly. Furthermore, combining the proposed method with the JSCM gives a further improvement, which outperforms state-of-the-art results in terms of top-1 accuracy.

1 Introduction

There are more than 6000 languages in the world, and 10 of them have more than 100 million native speakers.
With the information revolution and globalization, systems that support multilingual processing and spoken language translation are urgently needed. The translation of named entities from an alphabetic language to a syllabary language is usually performed through transliteration, which tries to preserve the pronunciation in the original language. For example, in Chinese, foreign words are written with Chinese characters; in Japanese, foreign words are usually written with special char-

Figure 1: Transliteration examples (columns: Source Name, Target Name, Note; rows: English-to-Chinese with Chinese Romanized writing, English-to-Japanese with Japanese Romanized writing).
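As a rough illustration of the two-step idea described in the abstract (segment the input word into chunks, convert each chunk into one target unit, and choose the output by the combined score of both steps), the following Python sketch may help. It is not the authors' implementation: the score tables `SEG_SCORE` and `CONV_SCORE` are hypothetical hand-picked log-scores standing in for two trained CRFs, each chunk is converted independently of its neighbours, and the joint search is plain brute-force enumeration rather than the fast decoding algorithm the paper proposes.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical log-scores standing in for the first (segmentation) CRF.
SEG_SCORE: Dict[str, float] = {
    "goo": -0.3, "gle": -0.4, "go": -0.6, "o": -0.9, "g": -1.0, "le": -0.8,
}

# Hypothetical log-scores standing in for the second (conversion) CRF:
# each source chunk maps to candidate target units (romanized here).
CONV_SCORE: Dict[str, Dict[str, float]] = {
    "goo": {"guu": -1.5},
    "gle": {"guru": -2.5},
    "go": {"go": -0.2},
    "o": {"o": -0.3},
    "g": {"gu": -0.2},
    "le": {"ru": -0.2},
}


def segmentations(word: str, max_len: int = 3) -> Iterator[List[str]]:
    """Yield every split of `word` into chunks known to SEG_SCORE."""
    if not word:
        yield []
        return
    for i in range(1, min(max_len, len(word)) + 1):
        head = word[:i]
        if head in SEG_SCORE:
            for rest in segmentations(word[i:], max_len):
                yield [head] + rest


def seg_score(chunks: List[str]) -> float:
    """Sum of per-chunk segmentation log-scores."""
    return sum(SEG_SCORE[c] for c in chunks)


def convert(chunks: List[str]) -> Tuple[List[str], float]:
    """Pick the best target unit for each chunk independently."""
    units: List[str] = []
    total = 0.0
    for chunk in chunks:
        unit, score = max(CONV_SCORE[chunk].items(), key=lambda kv: kv[1])
        units.append(unit)
        total += score
    return units, total


def pipeline_decode(word: str) -> Tuple[List[str], float]:
    """Two separate steps: commit to the best segmentation, then convert it."""
    best = max(segmentations(word), key=seg_score)
    units, conv = convert(best)
    return units, seg_score(best) + conv


def joint_decode(word: str) -> Tuple[List[str], float]:
    """Joint choice: maximize segmentation + conversion score together."""
    candidates = []
    for chunks in segmentations(word):
        units, conv = convert(chunks)
        candidates.append((units, seg_score(chunks) + conv))
    return max(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    print("pipeline:", pipeline_decode("google"))
    print("joint:   ", joint_decode("google"))
```

With these toy scores, the pipeline commits to the segmentation that looks best in isolation, while the joint search prefers a segmentation whose chunks also convert well. The exhaustive enumeration grows quickly with word length, which is the kind of cost a faster decoding algorithm would aim to reduce.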
