Scientific report: "Character-Based Pivot Translation for Under-Resourced Languages and Domains"

Jörg Tiedemann
Department of Linguistics and Philology
Uppsala University
Uppsala, Sweden

Abstract

In this paper we investigate the use of character-level translation models to support translation to and from under-resourced languages and textual domains via closely related pivot languages. Our experiments show that these low-level models can be successful even with tiny amounts of training data. We test the approach on movie subtitles for three language pairs and on legal texts for another language pair in a domain adaptation task. Our pivot translations outperform the baselines by a large margin.

1 Introduction

Data-driven approaches have been extremely successful in most areas of natural language processing (NLP) and can be considered the main paradigm in application-oriented research and development. Research in machine translation is a typical example, with statistical models dominating over the last decade. This dominance is further reinforced by the availability of toolboxes such as Moses (Koehn et al., 2007), which make it possible to build translation engines within days or even hours for any language pair, provided that appropriate training data is available. However, this reliance on training data is also the most severe limitation of statistical approaches. Resources in large quantities are only available for a few languages and domains.
In the case of statistical machine translation (SMT), the dilemma is even more apparent, as parallel corpora are rare and usually quite sparse. Some languages can be considered lucky, for example because of political situations that lead to the production of freely available translated material on a large scale. A lot of research and development would not have been possible without the European Union and its language policies, to give one example. One of the main challenges of current NLP research is to port data-driven techniques to under-resourced languages which ...
