Scientific report: "Phrase-Based Backoff Models for Machine Translation of Highly Inflected Languages"

Mei Yang, Department of Electrical Engineering, University of Washington, Seattle, WA, USA, yangmei@
Katrin Kirchhoff, Department of Electrical Engineering, University of Washington, Seattle, WA, USA, katrin@

Abstract

We propose a backoff model for phrase-based machine translation that translates unseen word forms in foreign-language text by hierarchical morphological abstractions at the word and the phrase level. The model is evaluated on the Europarl corpus for German-English and Finnish-English translation and shows improvements over state-of-the-art phrase-based models.

1 Introduction

Current statistical machine translation (SMT) usually works well in cases where the domain is fixed, the training and test data match, and a large amount of training data is available. Nevertheless, standard SMT models tend to perform much better on languages that are morphologically simple, whereas highly inflected languages with a large number of potential word forms are more problematic, particularly when training data is sparse.

SMT attempts to find a sentence $\hat{e}$ in the desired output language given the corresponding sentence $f$ in the source language, according to

$$\hat{e} = \operatorname*{argmax}_{e} P(f \mid e)\, P(e) \qquad (1)$$

Most state-of-the-art SMT systems adopt a phrase-based approach such that $e$ is chunked into $I$ phrases $\tilde{e}_1, \ldots, \tilde{e}_I$ and the translation model is defined over mappings between phrases in $e$ and in $f$, i.e. $P(\tilde{f} \mid \tilde{e})$. Typically, phrases are extracted from a word-aligned training corpus. Different inflected forms of the same lemma are treated as different words, and there is no provision for unseen forms, i.e. unknown words encountered in the test data are not translated at all but appear verbatim in the output. Although the percentage of such unseen word forms may be negligible when the training set is large and matches the test set well, it may rise drastically when training data is limited or from a different domain. Many ...
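The contrast between copying an unknown word verbatim and backing off to a morphological abstraction can be illustrated with a minimal sketch. It is not the authors' model, which this excerpt does not specify: the two lookup tables, the suffix list, and the functions `toy_stem`, `translate_word`, and `translate` are all hypothetical, and the stemmer is a crude suffix stripper standing in for real morphological analysis.

```python
# Hypothetical toy tables; the entries are invented for illustration only.
SURFACE_TABLE = {"haus": "house"}                 # full word forms seen in training
STEM_TABLE = {"haus": "house", "kind": "child"}   # entries keyed by (toy) stem

# Assumed suffix list for a crude German-like stemmer (not a real analyzer).
SUFFIXES = ("ern", "en", "er", "es", "e", "s", "n")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def translate_word(word):
    """Return (translation, level), backing off surface -> stem -> verbatim."""
    key = word.lower()
    if key in SURFACE_TABLE:
        return SURFACE_TABLE[key], "surface"
    stem = toy_stem(key)
    if stem in STEM_TABLE:
        return STEM_TABLE[stem], "stem"
    # No entry at any level: the word is copied through unchanged,
    # which is the baseline behaviour described in the introduction.
    return word, "verbatim"


def translate(sentence):
    """Translate a whitespace-tokenized sentence word by word."""
    return [translate_word(w) for w in sentence.split()]
```

In this sketch a form such as "Kindes" is absent from the surface table but is still translated through its stem, whereas a word with no entry at either level falls through unchanged. A phrase-level version would apply the same fallback to multi-word entries.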
