Scientific report: "Collapsed Consonant and Vowel Models: New Approaches for English-Persian Transliteration and Back-Transliteration"

Sarvnaz Karimi, Falk Scholer, Andrew Turpin
School of Computer Science and Information Technology, RMIT University, GPO Box 2476V, Melbourne 3001, Australia
sarvnaz fscholer aht @

Abstract

We propose a novel algorithm for English to Persian transliteration. Previous methods proposed for this language pair apply a word alignment tool for training. By contrast, we introduce an alignment algorithm particularly designed for transliteration. Our new model improves the English to Persian transliteration accuracy by 14% over an n-gram baseline. We also propose a novel back-transliteration method for this language pair, a previously unstudied problem. Experimental results demonstrate that our algorithm leads to an absolute improvement of 25% over standard transliteration approaches.

1 Introduction

Translation of a text from a source language to a target language requires dealing with technical terms and proper names. These occur in almost any text but rarely appear in bilingual dictionaries. The solution is the transliteration of such out-of-dictionary terms: a word from the source language is transformed to a word in the target language, preserving its pronunciation. Recovering the original word from the transliterated target is called back-transliteration.
Automatic transliteration is important for many different applications, including machine translation, cross-lingual information retrieval, and cross-lingual question answering. Transliteration methods can be categorized into grapheme-based (AbdulJaleel and Larkey, 2003; Li et al., 2004), phoneme-based (Knight and Graehl, 1998; Jung et al., 2000), and combined (Bilac and Tanaka, 2005) approaches. Grapheme-based methods perform a direct orthographical mapping between source and target words, while phoneme-based approaches use an intermediate phonetic representation. Both grapheme- and phoneme-based methods usually begin by ...
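The difference between the grapheme-based and phoneme-based categories can be illustrated with a minimal Python sketch. This is not the model proposed in the paper: the three lookup tables and both function names are hypothetical, hand-written for a single toy word, whereas real systems learn such mappings from aligned training pairs.

```python
# Toy illustration only. All tables below are hypothetical and cover
# just the letters and phonemes needed for the single word "bank".

# Grapheme-based: English letters map directly to Persian letters.
GRAPHEME_TABLE = {"b": "ب", "a": "ا", "n": "ن", "k": "ک"}

# Phoneme-based: English word -> phoneme sequence -> Persian letters.
PRONUNCIATIONS = {"bank": ["b", "ae", "n", "k"]}
PHONEME_TABLE = {"b": "ب", "ae": "ا", "n": "ن", "k": "ک"}


def grapheme_based(word: str) -> str:
    """Map each source letter straight to a target letter."""
    return "".join(GRAPHEME_TABLE[ch] for ch in word.lower())


def phoneme_based(word: str) -> str:
    """Go through an intermediate phonetic representation first."""
    phonemes = PRONUNCIATIONS[word.lower()]
    return "".join(PHONEME_TABLE[p] for p in phonemes)


if __name__ == "__main__":
    print(grapheme_based("bank"))
    print(phoneme_based("bank"))
```

For this toy word both routes produce the same four-letter Persian string. The sketch only shows where the intermediate phonetic step sits; it assumes a one-to-one mapping per letter or phoneme, which does not hold in general.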
