TAILIEUCHUNG - Báo cáo khoa học: "Deciphering Foreign Language by Combining Language Models and Context Vectors"

In this paper we show how to train statistical machine translation systems on reallife tasks using only non-parallel monolingual data from two languages. We present a modification of the method shown in (Ravi and Knight, 2011) that is scalable to vocabulary sizes of several thousand words. On the task shown in (Ravi and Knight, 2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model. | Deciphering Foreign Language by Combining Language Models and Context Vectors Malte Nuhn and Arne Mauser and Hermann Ney Human Language Technology and Pattern Recognition Group RWTH Aachen University Germany surname @ Abstract In this paper we show how to train statistical machine translation systems on real-life tasks using only non-parallel monolingual data from two languages. We present a modification of the method shown in Ravi and Knight 2011 that is scalable to vocabulary sizes of several thousand words. On the task shown in Ravi and Knight 2011 we obtain better results with only 5 of the computational effort when running our method with an n-gram language model. The efficiency improvement of our method allows us to run experiments with vocabulary sizes of around 5 000 words such as a non-parallel version of the Verbmobil corpus. We also report results using data from the monolingual French and English Gigaword corpora. 1 Introduction It has long been a vision of science fiction writers and scientists to be able to universally communicate in all languages. In these visions even previously unknown languages can be learned automatically from analyzing foreign language input. In this work we attempt to learn statistical translation models from only monolingual data in the source and target language. The reasoning behind this idea is that the elements of languages share statistical similarities that can be automatically identified and matched with other languages. This work is a big step towards large-scale and large-vocabulary unsupervised training of statistical translation models. Previous approaches have faced constraints in vocabulary or data size. We show how Author now at Google Inc. amauser@. 156 to scale unsupervised training to real-life translation tasks and how large-scale experiments can be done. Monolingual data is more readily available if not abundant compared to true parallel or even just translated data. Learning from

TỪ KHÓA LIÊN QUAN
TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.