Prediction of Learning Curves in Machine Translation

Prasanth Kolachina, Nicola Cancedda, Marc Dymetman, Sriram Venkatapathy
LTRC, IIIT-Hyderabad, Hyderabad, India
Xerox Research Centre Europe, 6 chemin de Maupertuis, 38240 Meylan, France
This research was carried out during an internship at Xerox Research Centre Europe.

Abstract

Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific purpose. Since ad-hoc manual translation can represent a significant investment in time and money, a prior assessment of the amount of training data required to achieve a satisfactory accuracy level can be very useful. In this work, we show how to predict what the learning curve would look like if we were to manually translate increasing amounts of data. We consider two scenarios: (1) monolingual samples in the source and target languages are available, and (2) a small additional parallel corpus is also available. We propose methods for predicting learning curves in both scenarios.

1 Introduction

Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific business purpose. In many cases it is possible to allocate some budget for manually translating a limited sample of relevant documents, be it via professional translation services or through increasingly fashionable crowdsourcing. However, it is often difficult to predict how much training data will be required to achieve satisfactory translation accuracy, which prevents sound provisional budgeting. This prediction, or more generally the prediction of the learning curve of an SMT system as a function of the available in-domain parallel data, is the objective of this paper.

We consider two scenarios representative of realistic situations.

1. In the first scenario (S1), the SMT developer is given only monolingual source and target samples from the relevant domain, and a small test parallel corpus.
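The second scenario in the abstract, where a small parallel corpus is available, can be pictured as measuring BLEU on a few growing subsets of that corpus and extrapolating the curve to larger sizes. The sketch below illustrates only that general idea and is not the method proposed in this paper. It assumes a three-parameter power law, BLEU(x) ≈ c − a·x^(−α), which is one common shape for learning curves. The fitting procedure is also an assumption: a grid search over the asymptote c, plus least squares in log space. The function names `fit_power_law` and `predict` and all numbers are hypothetical.

```python
import math


def fit_power_law(sizes, scores, c_grid):
    """Fit y = c - a * x**(-alpha) by grid search over the asymptote c.

    For a fixed c (which must exceed every observed score),
    log(c - y) = log(a) - alpha * log(x) is linear in log(x), so a and
    alpha follow from ordinary least squares. The c whose curve has the
    smallest squared error on the original BLEU scale is kept.
    """
    xs = [math.log(x) for x in sizes]
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    best = None
    for c in c_grid:
        if c <= max(scores):
            continue  # log(c - y) is undefined when y >= c
        ys = [math.log(c - y) for y in scores]
        my = sum(ys) / n
        sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
        slope = sxy / sxx
        alpha = -slope
        a = math.exp(my - slope * mx)
        sse = sum((y - (c - a * x ** (-alpha))) ** 2
                  for x, y in zip(sizes, scores))
        if best is None or sse < best[0]:
            best = (sse, c, a, alpha)
    if best is None:
        raise ValueError("every c in c_grid is <= the largest observed score")
    _, c, a, alpha = best
    return c, a, alpha


def predict(params, size):
    """Evaluate the fitted power-law curve at a given training-set size."""
    c, a, alpha = params
    return c - a * size ** (-alpha)


# Synthetic points generated from a known curve (c=30, a=200, alpha=0.5);
# these are illustrative numbers, not measurements reported in the paper.
true_c, true_a, true_alpha = 30.0, 200.0, 0.5
sizes = [1000, 2000, 5000, 10000]  # hypothetical sentence-pair counts
scores = [true_c - true_a * x ** (-true_alpha) for x in sizes]
c_grid = [28.5 + 0.1 * i for i in range(116)]  # candidate asymptotes 28.5..40.0
params = fit_power_law(sizes, scores, c_grid)
print("predicted BLEU at 100k pairs:", round(predict(params, 100_000), 2))
```

Fixing c first turns a nonlinear three-parameter fit into a sequence of one-variable linear regressions, which keeps the sketch dependency-free. With real, noisy BLEU measurements, a proper nonlinear least-squares routine and a check on how far the extrapolation reaches beyond the observed sizes would be advisable.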


