Scientific report: "Active Sample Selection for Named Entity Transliteration"

Active Sample Selection for Named Entity Transliteration
Dan Goldwasser, Dan Roth
Department of Computer Science, University of Illinois, Urbana, IL 61801
goldwas1, danr @

Abstract
This paper introduces a new method for identifying named-entity (NE) transliterations within bilingual corpora. Current state-of-the-art approaches usually require annotated data and relevant linguistic knowledge, which may not be available for all languages. We show how to effectively train an accurate transliteration classifier using very little data, obtained automatically. To perform this task, we introduce a new active sampling paradigm for guiding and adapting the sample selection process. We also investigate how to improve the classifier by identifying repeated patterns in the training data. We evaluated our approach using English, Russian and Hebrew corpora.

1 Introduction
This paper presents a new approach for constructing a discriminative transliteration model. Our approach is fully automated and requires little knowledge of the source and target languages. Named entity (NE) transliteration is the process of transcribing an NE from a source language to a target language based on phonetic similarity between the entities. Figure 1 provides examples of NE transliterations in English, Russian and Hebrew.
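A discriminative transliteration model decides, for a candidate (source NE, target word) pair, whether the two are transliterations of each other. The text does not spell out the representation, so what follows is only a minimal sketch under stated assumptions: each pair is mapped to counts of (source substring, target substring) pairs and scored by a linear model trained with a mistake-driven perceptron. The names `substrings`, `pair_features`, `score` and `perceptron_update` are hypothetical, and this feature design is an illustrative assumption, not necessarily the authors' own.

```python
from collections import Counter


def substrings(word: str, max_len: int = 2) -> list[str]:
    """All contiguous substrings of `word` with length 1..max_len."""
    return [word[i:i + n]
            for n in range(1, max_len + 1)
            for i in range(len(word) - n + 1)]


def pair_features(source: str, target: str, max_len: int = 2) -> Counter:
    """Hypothetical feature map: counts of (source substring, target substring) pairs.

    Simplifying assumption: character positions and alignment are ignored,
    so every source substring is paired with every target substring.
    """
    feats: Counter = Counter()
    for s in substrings(source.lower(), max_len):
        for t in substrings(target.lower(), max_len):
            feats[(s, t)] += 1
    return feats


def score(weights: dict, feats: Counter) -> float:
    """Linear score w . phi(source, target); positive means 'transliteration pair'."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def perceptron_update(weights: dict, feats: Counter, label: int, lr: float = 1.0) -> None:
    """Mistake-driven update; label is +1 (transliteration pair) or -1 (not a pair)."""
    if label * score(weights, feats) <= 0:
        for f, v in feats.items():
            weights[f] = weights.get(f, 0.0) + lr * label * v


# Usage: one positive and one negative Russian candidate for "london".
weights: dict = {}
examples = [(("london", "лондон"), 1), (("london", "париж"), -1)]
for _ in range(3):
    for (src, tgt), y in examples:
        perceptron_update(weights, pair_features(src, tgt), y)
```

A linear model over sparse substring-pair features keeps training cheap, which matters when the classifier must be retrained repeatedly as new labeled samples arrive.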
Figure 1: NE in English, Russian and Hebrew. Example: Saint Petersburg (English), Санкт-Петербург (Russian).
Identifying transliteration pairs is an important component in many linguistic applications, such as machine translation and information retrieval, which require identifying out-of-vocabulary words. In our setting, we have access to source-language NEs and the ability to label data upon request. We introduce a new active sampling paradigm that aims to guide the learner toward informative samples, allowing learning from a small number of representative examples. After the data is obtained, it is analyzed to identify repeating patterns, which can be used to focus the …
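The general idea of guiding the learner toward informative samples can be illustrated with generic pool-based uncertainty sampling: at each round the learner requests the label of the candidate pair whose score lies closest to the decision boundary, then retrains on everything labeled so far. This is a standard active-learning heuristic swapped in for illustration, not the sampling paradigm proposed in the paper; the bigram feature map, the perceptron learner, the `oracle` callback standing in for "the ability to label the data upon request", and all function names are hypothetical.

```python
from typing import Callable


def features(source: str, target: str) -> set[tuple[str, str]]:
    """Hypothetical feature map: (source bigram, target bigram) pairs with boundary markers."""
    def bigrams(word: str) -> list[str]:
        padded = f"^{word.lower()}$"
        return [padded[i:i + 2] for i in range(len(padded) - 1)]
    return {(s, t) for s in bigrams(source) for t in bigrams(target)}


def score(w: dict, feats: set) -> float:
    """Linear score: sum of the weights of the active (binary) features."""
    return sum(w.get(f, 0.0) for f in feats)


def train(w: dict, labeled: list, epochs: int = 5) -> None:
    """Perceptron passes over the labeled set; labels are +1 / -1."""
    for _ in range(epochs):
        for feats, y in labeled:
            if y * score(w, feats) <= 0:
                for f in feats:
                    w[f] = w.get(f, 0.0) + y


def active_sampling(pool: list[tuple[str, str]],
                    oracle: Callable[[str, str], int],
                    budget: int) -> tuple[dict, list]:
    """Uncertainty sampling: each round, query the candidate the model is least sure about."""
    w: dict = {}
    labeled: list = []
    remaining = list(pool)
    for _ in range(min(budget, len(remaining))):
        # Pick the candidate whose score is closest to the decision boundary (0).
        idx = min(range(len(remaining)),
                  key=lambda i: abs(score(w, features(*remaining[i]))))
        src, tgt = remaining.pop(idx)
        labeled.append((features(src, tgt), oracle(src, tgt)))  # request a label
        train(w, labeled)  # retrain on all labels collected so far
    return w, labeled


# Usage: a toy candidate pool with a hypothetical gold-standard oracle.
gold = {("london", "лондон"), ("paris", "париж")}
pool = [(s, t) for s in ("london", "paris") for t in ("лондон", "париж")]
w, labeled = active_sampling(pool, lambda s, t: 1 if (s, t) in gold else -1, budget=3)
```

Querying near-boundary pairs spends a small labeling budget where the current model is least certain; the paper's own method for guiding and adapting sample selection may choose samples differently.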
