TAILIEUCHUNG - Scientific report: "Semi-Supervised Training for Statistical Word Alignment"

Alexander Fraser
ISI / University of Southern California
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292
fraser@

Daniel Marcu
ISI / University of Southern California
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292
marcu@

Abstract

We introduce a semi-supervised approach to training for statistical machine translation that alternates the traditional Expectation Maximization step, applied on a large training corpus, with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus. We show that our algorithm leads not only to improved alignments but also to machine translation outputs of higher quality.

1 Introduction

The most widely applied training procedure for statistical machine translation, IBM Model 4 (Brown et al., 1993) unsupervised training followed by post-processing with symmetrization heuristics (Och and Ney, 2003), yields low-quality word alignments. When compared with gold standard parallel data which was manually aligned using a high-recall/precision methodology (Melamed, 1998), the word-level alignments produced automatically have an F-measure accuracy of and (see Section 2 for details).

In this paper, we improve word alignment and, subsequently, MT accuracy by developing a range of increasingly sophisticated methods:

1. We first recast the problem of estimating the IBM models (Brown et al., 1993) in a discriminative framework, which leads to an initial increase in word-alignment accuracy.

2. We extend the IBM models with new sub-models, which leads to additional increases in word-alignment accuracy. In the process, we also show that these improvements are explained not only by the power of the new models but also by a novel search procedure for the alignment of highest probability.

3. Finally, we propose a training procedure that interleaves discriminative training with maximum likelihood training.

These steps lead to word alignments
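As a rough illustration of the F-measure that the introduction uses to score automatic alignments against a gold standard, the sketch below computes precision, recall, and their weighted harmonic mean over sets of alignment links. The function name `alignment_f_measure`, the `alpha` weighting parameter, and the toy link sets are assumptions made for this example; the excerpt does not give the exact formula or weighting the authors use.

```python
from typing import Set, Tuple

# A link pairs a source word index with a target word index (assumed representation).
Link = Tuple[int, int]


def alignment_f_measure(hypothesis: Set[Link], gold: Set[Link], alpha: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall over alignment links.

    alpha = 0.5 gives the balanced F-measure; this weighting is an assumption
    of the sketch, not a value taken from the paper.
    """
    if not hypothesis or not gold:
        return 0.0
    correct = len(hypothesis & gold)  # links that appear in both sets
    if correct == 0:
        return 0.0
    precision = correct / len(hypothesis)
    recall = correct / len(gold)
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Toy example: 4 gold links, 3 hypothesized links, 2 of them correct.
gold = {(0, 0), (1, 2), (2, 1), (3, 3)}
hypothesis = {(0, 0), (1, 2), (2, 2)}
score = alignment_f_measure(hypothesis, gold)
print(f"F-measure: {score:.4f}")
```

In the toy example, precision is 2/3 and recall is 2/4, so the balanced score is 4/7. Raising `alpha` above 0.5 weights precision more heavily and lowering it favors recall.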
