Scientific report: "Low-cost Enrichment of Spanish WordNet with Automatically Translated Glosses: Combining General and Specialized Models"

Jesús Giménez and Lluís Màrquez
TALP Research Center, LSI Department
Universitat Politècnica de Catalunya
Jordi Girona Salgado 1-3, E-08034 Barcelona
jgimenez lluism @

Abstract

This paper studies the enrichment of Spanish WordNet with synset glosses automatically obtained from the English WordNet glosses using a phrase-based Statistical Machine Translation system. We construct the English-Spanish translation system from a parallel corpus of proceedings of the European Parliament, and study how to adapt statistical models to the domain of dictionary definitions. We build specialized language and translation models from a small set of parallel definitions and experiment with robust ways of combining them. A statistically significant increase in performance is obtained. Finally, the best system is used to generate a definition for all Spanish synsets, which are currently ready for manual revision. As a complementary issue, we analyze the impact of the amount of in-domain data needed to improve a system trained entirely on out-of-domain data.

1 Introduction

Statistical Machine Translation (SMT) is today a very promising approach. It allows one to build Machine Translation (MT) systems very quickly and fully automatically, with very competitive results, from nothing more than a parallel corpus aligning sentences from the two languages involved.

In this work we approach the task of enriching Spanish WordNet with automatically translated glosses [1]. The source glosses for these translations are taken from the English WordNet (Fellbaum, 1998), which is linked at the synset level to Spanish WordNet. This resource is available, among other sources, through the Multilingual Central Repository (MCR) developed by the MEANING project (Atserias et al., 2004). We start by ...

[1] Glosses are short dictionary definitions that accompany WordNet synsets. See examples in Tables 5 and 6.
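The abstract mentions combining general and specialized models but this excerpt does not say how. As a rough illustration only, the sketch below shows two schemes commonly used for this purpose in SMT: linear interpolation of two probability estimates, and a log-linear combination in which each model contributes a weighted log-probability feature. The function names, the weights, and the probabilities are all hypothetical and are not taken from the paper.

```python
import math
from typing import Mapping


def interpolate(p_general: float, p_specialized: float, lam: float) -> float:
    """Linear interpolation of two probability estimates.

    lam is the weight given to the specialized (in-domain) model;
    the general (out-of-domain) model receives 1 - lam.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_specialized + (1.0 - lam) * p_general


def log_linear_score(probs: Mapping[str, float],
                     weights: Mapping[str, float]) -> float:
    """Log-linear combination: weighted sum of log-probabilities.

    Each entry in probs is the probability one model assigns to a
    candidate translation; weights holds one (hypothetical) weight per model.
    """
    return sum(weights[name] * math.log(p) for name, p in probs.items())


if __name__ == "__main__":
    # Toy numbers, chosen only to exercise the functions.
    mixed = interpolate(p_general=0.2, p_specialized=0.6, lam=0.5)
    score = log_linear_score(
        {"lm_general": 0.5, "lm_specialized": 0.25},
        {"lm_general": 1.0, "lm_specialized": 2.0},
    )
    print(mixed, score)
```

In practice the interpolation weight and the log-linear weights would be tuned on held-out in-domain data; the values above are placeholders.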
