Scientific report: "Learning to Find Translations and Transliterations on the Web"

Learning to Find Translations and Transliterations on the Web

Joseph Z. Chang, Department of Computer Science, National Tsing Hua University, 101 Kuangfu Road, Hsinchu 300, Taiwan, j
Jason S. Chang, Department of Computer Science, National Tsing Hua University, 101 Kuangfu Road, Hsinchu 300, Taiwan, j schang@
Jyh-Shing Roger Jang, Department of Computer Science, National Tsing Hua University, 101 Kuangfu Road, Hsinchu 300, Taiwan, jang@

Abstract

In this paper, we present a new method for learning to find translations and transliterations on the Web for a given term. The approach involves using a small set of terms and translations to obtain mixed-code snippets from a search engine, and automatically annotating the snippets with tags and features for training a conditional random field model. At runtime, the model is used to extract translation candidates for a given term. Preliminary experiments and evaluation show that our method cleanly combines various features, resulting in a system that outperforms previous work.

1 Introduction

The phrase translation problem is critical to machine translation, cross-lingual information retrieval, and multilingual terminology (Bian and Chen, 2000; Kupiec, 1993). Such systems typically use a parallel corpus. However, the out-of-vocabulary (OOV) problem is hard to overcome even with a very large training corpus, due to the Zipfian nature of word distribution and the ever-growing stock of new terminology and named entities. Luckily, there is an abundance of webpages consisting of mixed-code text, typically written in one language but interspersed with some sentential or phrasal translations in another language.
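To make the annotation step mentioned in the abstract more concrete, here is a minimal sketch of how a seed pair could be used to tag a mixed-code snippet automatically. Everything in it is an assumption made for illustration rather than the authors' implementation: the tag set (`B`/`I` for the known translation, `E` for the query term, `O` for everything else), the character-level tokenization of Chinese, the helper names `tokenize` and `annotate`, and the sample snippet, which is invented.

```python
import re

# Hypothetical tokenizer: each CJK ideograph is one token, a run of Latin
# letters/digits (with internal hyphens) is one token, and any other
# non-space character is a token of its own.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*|\S")


def tokenize(text):
    """Split mixed-code text into tokens."""
    return TOKEN_RE.findall(text)


def annotate(snippet, term, translation):
    """Tag a snippet using a known (term, translation) seed pair.

    Assumed tag set (not taken from the paper):
      B / I : first / subsequent token of the known translation
      E     : token of the English query term
      O     : any other token
    """
    tokens = tokenize(snippet)
    tags = ["O"] * len(tokens)

    # Mark every occurrence of the known translation with B, I, I, ...
    trans = tokenize(translation)
    n = len(trans)
    if n:
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == trans:
                tags[i] = "B"
                for j in range(i + 1, i + n):
                    tags[j] = "I"

    # Mark every occurrence of the query term (case-insensitive) with E.
    term_tokens = [t.lower() for t in tokenize(term)]
    m = len(term_tokens)
    lowered = [t.lower() for t in tokens]
    if m:
        for i in range(len(tokens) - m + 1):
            if lowered[i:i + m] == term_tokens:
                for j in range(i, i + m):
                    tags[j] = "E"

    return list(zip(tokens, tags))


if __name__ == "__main__":
    # Invented snippet, for illustration only.
    sample = "命名實體識別 (named-entity recognition) 是資訊擷取的子任務"
    for token, tag in annotate(sample, "named-entity recognition", "命名實體識別"):
        print(token, tag)
```

Tagged sequences of this kind, together with per-token features, are the sort of input a sequence labeler such as a conditional random field is trained on. The features themselves are not specified in this excerpt, so none are sketched here.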
By retrieving and identifying such translation counterparts on the Web, we can cope with the OOV problem. Consider the technical term named-entity recognition. The best places to find Chinese translations for named-entity recognition are probably not parallel corpora or dictionaries, but rather mixed-code webpages. The
