Scientific report: "Character-Based Pivot Translation for Under-Resourced Languages and Domains"

Jörg Tiedemann
Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden

Abstract

In this paper, we investigate the use of character-level translation models to support translation from and to under-resourced languages and textual domains via closely related pivot languages. Our experiments show that these low-level models can be successful even with tiny amounts of training data. We test the approach on movie subtitles for three language pairs and on legal texts for another language pair in a domain adaptation task. Our pivot translations outperform the baselines by a large margin.

1 Introduction

Data-driven approaches have been extremely successful in most areas of natural language processing (NLP) and can be considered the main paradigm in application-oriented research and development. Research in machine translation is a typical example, with statistical models dominating over the last decade. This dominance is further reinforced by the availability of toolboxes such as Moses (Koehn et al., 2007), which make it possible to build translation engines within days or even hours for any language pair, provided that appropriate training data is available. However, this reliance on training data is also the most severe limitation of statistical approaches. Resources in large quantities are only available for a few languages and domains.

In the case of SMT, the dilemma is even more apparent, as parallel corpora are rare and usually quite sparse. Some languages can be considered lucky, for example because of political situations that lead to the production of freely available translated material on a large scale. To give an example, a lot of research and development would not have been possible without the European Union and its language policies. One of the main challenges of current NLP research is to port data-driven techniques to under-resourced languages, which ...
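As an illustration of the two ideas named in the abstract, the following minimal sketch shows one way to represent a sentence as a sequence of character tokens and to chain two translation steps through a pivot language. It is not taken from the paper: the function names, the underscore used as a space marker, and the callables `src_to_pivot` and `pivot_to_tgt` are all assumptions made for this example, and the translators themselves are left abstract.

```python
from typing import Callable


def to_char_tokens(sentence: str, space_marker: str = "_") -> str:
    """Split a sentence into space-separated character tokens.

    Original spaces are replaced by `space_marker` (an assumed convention)
    so that word boundaries survive the character-level tokenization.
    """
    return " ".join(space_marker if ch == " " else ch for ch in sentence.strip())


def from_char_tokens(tokens: str, space_marker: str = "_") -> str:
    """Invert to_char_tokens: join the characters and restore spaces.

    Note: a literal `space_marker` in the original text would be turned
    into a space here; a real pipeline would need to escape it first.
    """
    return "".join(" " if tok == space_marker else tok for tok in tokens.split(" "))


def pivot_translate(
    sentence: str,
    src_to_pivot: Callable[[str], str],
    pivot_to_tgt: Callable[[str], str],
) -> str:
    """Translate via a pivot language by composing two translators.

    Both translators are hypothetical callables standing in for trained
    models; this function only shows the source -> pivot -> target chain.
    """
    return pivot_to_tgt(src_to_pivot(sentence))
```

In such a setup, the source-to-pivot step would operate on the output of `to_char_tokens` when the two languages are closely related, and its output would be passed through `from_char_tokens` before an ordinary word-level translator handles the pivot-to-target step. This ordering is one possible arrangement, not a description of the author's exact pipeline.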
