Scientific report: "Substring-Based Transliteration"

Substring-Based Transliteration

Tarek Sherif and Grzegorz Kondrak
Department of Computing Science, University of Alberta
Edmonton, Alberta, Canada T6G 2E8
tarek kondrak @

Abstract

Transliteration is the task of converting a word from one alphabetic script to another. We present a novel, substring-based approach to transliteration, inspired by phrase-based models of machine translation. We investigate two implementations of substring-based transliteration: a dynamic programming algorithm, and a finite-state transducer. We show that our substring-based transducer not only outperforms a state-of-the-art letter-based approach by a significant margin, but is also orders of magnitude faster.

1 Introduction

A significant proportion of out-of-vocabulary words in machine translation models or cross-language information retrieval systems are named entities. If the languages are written in different scripts, these names must be transliterated. Transliteration is the task of converting a word from one writing script to another, usually based on the phonetics of the original word. If the target language contains all the phonemes used in the source language, the transliteration is straightforward. For example, the Arabic transliteration of Amanda is اماندا, which is essentially pronounced in the same way. However, if some of the sounds are missing in the target language, they are generally mapped to the most phonetically similar letter. For example, the sound [p] in the name Paul does not exist in Arabic, and the phonotactic constraints of Arabic disallow the sound [a] in this context, so the word is transliterated as بول, pronounced [bul].
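To make the substring idea concrete, the following is a minimal sketch of a dynamic programming search over substring segmentations, applied to the Paul example. It is not the authors' implementation: the mapping table, its probabilities, the romanized target letters, and the names TABLE, MAX_LEN and transliterate are all hypothetical and chosen only for illustration. The sketch assumes that each source substring maps independently to a target substring and that the score of a segmentation is the product of the mapping probabilities.

```python
import math

# Hypothetical toy table: source substring -> [(target substring, probability)].
# The probabilities are invented for illustration; a real system would estimate
# them from aligned name pairs. Targets are romanized to keep the example simple.
TABLE = {
    "p": [("b", 0.9)],
    "a": [("a", 0.6)],
    "u": [("u", 0.7)],
    "au": [("u", 0.8)],
    "l": [("l", 0.9)],
}
MAX_LEN = max(len(source) for source in TABLE)


def transliterate(word):
    """Return (best_output, log_score), or None if TABLE cannot cover the word."""
    n = len(word)
    # best[i] holds (log score, output) for the best transliteration of word[:i].
    best = [None] * (n + 1)
    best[0] = (0.0, "")
    for end in range(1, n + 1):
        # Try every source substring word[start:end] of length at most MAX_LEN.
        for start in range(max(0, end - MAX_LEN), end):
            if best[start] is None:
                continue  # word[:start] has no valid segmentation
            for target, prob in TABLE.get(word[start:end], []):
                score = best[start][0] + math.log(prob)
                if best[end] is None or score > best[end][0]:
                    best[end] = (score, best[start][1] + target)
    if best[n] is None:
        return None
    return best[n][1], best[n][0]


if __name__ == "__main__":
    print(transliterate("paul"))
```

With this toy table, the segmentation p|au|l outscores p|a|u|l, because the two-letter substring "au" is mapped as a single unit. A letter-by-letter table cannot express that kind of context directly.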
The information loss inherent in the process of transliteration makes back-transliteration, which is the restoration of a previously transliterated word, a particularly difficult task. Any phonetically reasonable forward transliteration is essentially correct, although occasionally there is a standard transliteration (e.g. Omar Sharif). In the original .
