Phrase Linguistic Classification and Generalization for Improving Statistical Machine Translation

Adrià de Gispert
TALP Research Center
Universitat Politècnica de Catalunya (UPC), Barcelona
agispert@

Abstract
In this paper, a method to incorporate linguistic information regarding single-word and compound verbs is proposed as a first step towards an SMT model based on linguistically-classified phrases. By replacing these verb structures with the base form of the head verb, we achieve better statistical word alignment performance and are able to better estimate the translation model and generalize to unseen verb forms during translation. Preliminary experiments for the English-Spanish language pair are performed, and future research lines are detailed.

1 Introduction
Since its revival in the early 1990s, statistical machine translation (SMT) has shown promising results in several evaluation campaigns. Starting from the original word-based models, results were further improved by the appearance of phrase-based translation models. However, many SMT systems still ignore any morphological analysis and work at the surface level of word forms. For highly inflected languages such as German, Spanish, or any language of the Romance family, this poses severe limitations both in training from parallel corpora and in producing a correct translation of an input sentence.
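The verb substitution summarized in the abstract can be pictured as a preprocessing pass over tokenized text. What follows is a minimal sketch under stated assumptions, not the paper's implementation: the auxiliary list and lemma lexicon are hypothetical toy tables, and a compound verb is approximated as a run of auxiliaries followed by a known lexical verb, which is taken to be the head.

```python
# Minimal sketch of verb-structure substitution.
# ASSUMPTION: both tables below are hypothetical toy lexicons, not the
# linguistic resources used in the paper.

# Hypothetical English auxiliaries that may open a compound verb.
AUXILIARIES = {"will", "would", "have", "has", "had", "be", "been", "is",
               "are", "was", "were", "am", "do", "does", "did"}

# Hypothetical lemma lexicon mapping inflected verb forms to their base form.
VERB_LEMMAS = {
    "eat": "eat", "eats": "eat", "ate": "eat", "eaten": "eat", "eating": "eat",
    "come": "come", "comes": "come", "came": "come", "coming": "come",
}


def substitute_verbs(tokens):
    """Replace each (possibly compound) verb by the base form of its head verb.

    A compound verb is approximated as a run of auxiliaries followed by a
    lexical verb found in VERB_LEMMAS; that final verb is treated as the head.
    """
    output = []
    i = 0
    while i < len(tokens):
        j = i
        # Skip over a run of auxiliaries that may precede the head verb.
        while j < len(tokens) and tokens[j].lower() in AUXILIARIES:
            j += 1
        if j < len(tokens) and tokens[j].lower() in VERB_LEMMAS:
            # Collapse the auxiliaries and the head verb into the head's base form.
            output.append(VERB_LEMMAS[tokens[j].lower()])
            i = j + 1
        else:
            # No known head verb follows: keep the current token unchanged.
            output.append(tokens[i])
            i += 1
    return output
```

For example, `substitute_verbs("she has been eating apples".split())` yields `['she', 'eat', 'apples']`, while `"I have a dog"` is left untouched because no known lexical verb follows "have". Collapsing forms this way discards person and tense, which a complete system would have to recover when producing inflected output; the sketch does not address that.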
This lack of linguistic knowledge in SMT forces the translation model to learn a separate translation probability distribution for every inflected form of nouns, adjectives or verbs ("vengo", "vienes", "viene", etc.), which suffers from the usual data sparseness problem. Despite recent efforts in the community to provide models with this kind of information (see Section 6 for details on related previous work), results have yet to be encouraging. In this paper, we address the incorporation of morphological and shallow syntactic information regarding verbs and compound verbs as a first step towards an SMT model based on ...
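The sparseness argument can be made concrete with relative-frequency estimation, p(e | f) = count(f, e) / count(f), a standard estimator used here purely for illustration. The sketch assumes hypothetical toy word alignments and a hypothetical Spanish lemma lexicon: each surface form gets its own thinly estimated distribution, while mapping forms to their base form pools the counts and gives an unseen form such as "venimos" an entry to fall back on.

```python
from collections import Counter

# ASSUMPTION: hypothetical word-aligned (Spanish, English) pairs; illustrative only.
ALIGNED_PAIRS = [
    ("vengo", "come"), ("vienes", "come"), ("vienes", "come"),
    ("viene", "comes"), ("viene", "comes"),
]

# ASSUMPTION: hypothetical lemma lexicon; "venimos" never occurs in the toy data.
SPANISH_LEMMAS = {"vengo": "venir", "vienes": "venir",
                  "viene": "venir", "venimos": "venir"}


def relative_frequencies(pairs):
    """Estimate p(e | f) = count(f, e) / count(f) by relative frequency."""
    pair_counts = Counter(pairs)
    source_counts = Counter(f for f, _ in pairs)
    table = {}
    for (f, e), count in pair_counts.items():
        table.setdefault(f, {})[e] = count / source_counts[f]
    return table


def lemmatize_source(pairs, lemmas):
    """Replace each source word by its base form when the lexicon knows it."""
    return [(lemmas.get(f, f), e) for f, e in pairs]


# Surface forms: each inflection gets its own distribution from very few counts.
surface_table = relative_frequencies(ALIGNED_PAIRS)
# Base forms: all inflections of "venir" pool their counts into one entry.
lemma_table = relative_frequencies(lemmatize_source(ALIGNED_PAIRS, SPANISH_LEMMAS))

# An unseen inflection has no surface entry but its base form does.
unseen = "venimos"
print(unseen in surface_table)              # False
print(lemma_table[SPANISH_LEMMAS[unseen]])  # {'come': 0.6, 'comes': 0.4}
```

Here "vengo" is estimated from a single aligned pair, whereas "venir" is estimated from all five.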
