Scientific report: "Corpus Effects on the Evaluation of Automated Transliteration Systems"

Sarvnaz Karimi, Andrew Turpin, Falk Scholer
School of Computer Science and Information Technology, RMIT University
GPO Box 2476V, Melbourne 3001, Australia
sarvnaz aht fscholer @

Abstract

Most current machine transliteration systems employ a corpus of known source-target word pairs to train their system, and typically evaluate their systems on a similar corpus. In this paper we explore the performance of transliteration systems on corpora that are varied in a controlled way. In particular, we control the number and prior language knowledge of human transliterators used to construct the corpora, and the origin of the source words that make up the corpora. We find that the word accuracy of automated transliteration systems can vary by up to 30% in absolute terms depending on the corpus on which they are run. We conclude that at least four human transliterators should be used to construct corpora for evaluating automated transliteration systems, and that although absolute word accuracy metrics may not translate across corpora, the relative rankings of system performance remain stable across differing corpora.

1 Introduction

Machine transliteration is the process of transforming a word written in a source language into a word in a target language without the aid of a bilingual dictionary. Word pronunciation is preserved as far as possible, but the script used to render the target word is different from that of the source language.
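To make the word accuracy figure quoted in the abstract concrete, the following is a minimal sketch rather than the authors' evaluation code. It assumes that word accuracy is the fraction of source words whose top system output matches at least one transliteration accepted in the corpus. The function name `word_accuracy` and the toy word pairs are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set


def word_accuracy(predictions: Dict[str, str],
                  references: Dict[str, Set[str]]) -> float:
    """Fraction of source words whose predicted transliteration appears
    among the accepted human transliterations for that word.

    Assumed definition for illustration; the paper may define variants.
    """
    if not references:
        raise ValueError("reference corpus is empty")
    correct = sum(
        1 for source, accepted in references.items()
        if predictions.get(source) in accepted
    )
    return correct / len(references)


# Hypothetical system output (romanized placeholders, not real data).
predictions = {"tom": "tam", "anna": "ana", "peter": "piter"}

# Corpus built by a single transliterator: one accepted variant per word.
corpus_one = {"tom": {"tom"}, "anna": {"ana"}, "peter": {"piter"}}

# Corpus built by several transliterators: more accepted variants per word.
corpus_many = {
    "tom": {"tom", "tam"},
    "anna": {"ana", "anna"},
    "peter": {"piter", "peter"},
}

acc_one = word_accuracy(predictions, corpus_one)
acc_many = word_accuracy(predictions, corpus_many)
print(f"one transliterator:  {acc_one:.2f}")
print(f"many transliterators: {acc_many:.2f}")
```

In this toy setup the same system output scores differently against the two corpora, which is the kind of corpus dependence the abstract describes; the size of the gap here is an artifact of the made-up data.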
Transliteration is applied to proper nouns and out-of-vocabulary terms as part of machine translation and cross-lingual information retrieval (CLIR) (AbdulJaleel and Larkey, 2003; Pirkola et al., 2006).

Several transliteration methods are reported in the literature for a variety of languages, with their performance being evaluated on multilingual corpora. Source-target pairs are either extracted from bilingual documents or dictionaries (AbdulJaleel and Larkey, 2003; Bilac and
