Scientific report: "Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation"

Michael Bloodgood
Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, MD 21211
bloodgood@

Chris Callison-Burch
Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD 21211
ccb@

Abstract

We explore how to improve machine translation systems by adding more translation data in situations where we already have substantial resources. The main challenge is how to buck the trend of diminishing returns that is commonly encountered. We present an active learning-style data solicitation algorithm to meet this challenge. We test it, gathering annotations via Amazon Mechanical Turk, and find that we get an order-of-magnitude increase in the rate of performance improvement.

1 Introduction

Figure 1 shows the learning curves for two state-of-the-art statistical machine translation (SMT) systems for Urdu-English translation. Observe how the learning curves rise rapidly at first, but then a trend of diminishing returns occurs: put simply, the curves flatten. This paper investigates whether we can buck the trend of diminishing returns and, if so, how we can do it effectively.

Active learning (AL) has been applied to SMT recently (Haffari et al., 2009; Haffari and Sarkar, 2009), but those studies started with a tiny seed set of data and stopped after adding only a relatively tiny amount of data, as depicted in Figure 1. In contrast, we are interested in applying AL when a large amount of data already exists, as is the case for many important language pairs.

We develop an AL algorithm that focuses on keeping annotation costs (measured by time in seconds) low. It succeeds in doing this by soliciting translations for only parts of sentences. We show that this yields a savings in human annotation time above and beyond what the reduction in words annotated would have indicated, by a factor of about three and ...
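The idea of soliciting translations for only parts of sentences can be illustrated with a minimal sketch. This is not the paper's algorithm, whose selection criterion is not given in this excerpt. It assumes, purely for illustration, that the parts worth sending to annotators are the most frequent n-grams in an untranslated pool that never occur in the existing training data. The function names `ngrams` and `select_phrases`, and the parameters `max_n` and `budget`, are hypothetical.

```python
from collections import Counter


def ngrams(tokens, max_n):
    """Yield every n-gram of length 1..max_n as a tuple of tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def select_phrases(train_sents, pool_sents, max_n=2, budget=3):
    """Pick up to `budget` phrases from the pool to send for translation.

    Assumption (not from the paper): a phrase is worth soliciting if it is
    absent from the training data, and more frequent phrases come first.
    """
    # Collect every n-gram the existing training data already covers.
    seen = set()
    for sent in train_sents:
        seen.update(ngrams(sent.split(), max_n))

    # Count how often each uncovered n-gram occurs in the untranslated pool.
    counts = Counter()
    for sent in pool_sents:
        for gram in ngrams(sent.split(), max_n):
            if gram not in seen:
                counts[gram] += 1

    # Rank by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [" ".join(gram) for gram, _ in ranked[:budget]]


if __name__ == "__main__":
    train = ["the cat sat"]
    pool = ["the dog sat", "a dog ran", "the dog ran"]
    print(select_phrases(train, pool, max_n=2, budget=3))
```

Under this assumption, annotators see short phrases rather than whole sentences, so words that the training data already covers are not paid for again.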
