Using Noisy Bilingual Data for Statistical Machine Translation

Stephan Vogel
Interactive Systems Lab
Language Technologies Institute
Carnegie Mellon University
vogel @

Abstract

SMT systems rely on a sufficient amount of parallel corpora to train the translation model. This paper investigates possibilities to use word-to-word and phrase-to-phrase translations extracted not only from clean parallel corpora but also from noisy comparable corpora. Translation results for a Chinese-to-English translation task are given.

1 Introduction

Statistical machine translation systems typically use a translation model trained on bilingual data and a language model for the target language trained on possibly larger monolingual data. Often the amount of clean parallel data is limited. This raises the question of whether translation quality can be improved by using additional, noisier bilingual data.

Some approaches, such as Fung and McKeown 1997, have been developed to extract word translations from non-parallel corpora. In Munteanu and Marcu 2002, bilingual suffix trees are used to extract parallel sequences of words from a comparable corpus. 95 of those phrase translation pairs were judged to be correct. However, no results were reported on whether these additional translation correspondences improved translation quality.

2 The SMT System

Statistical translation as introduced in Brown et al. 1993 is based on word-to-word translations. The SMT system used in this study relies on multi-word to multi-word translations.
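The split into a translation model trained on bilingual data and a target-language model trained on monolingual data follows the noisy-channel decision rule of Brown et al. 1993. For a source sentence f and a candidate target sentence e, that rule can be written as below; the notation is ours, and the phrase-based system described here may weight or combine its models differently.

```latex
\hat{e} = \operatorname*{arg\,max}_{e} \Pr(e \mid f)
        = \operatorname*{arg\,max}_{e} \Pr(e)\,\Pr(f \mid e)
```

Pr(f) is dropped because it does not depend on e. Pr(e) is the language model and Pr(f | e) the translation model, so additional bilingual data, clean or noisy, enters the decision only through the second factor.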
The term "phrase translations" will be used throughout this paper without implying that these multi-word translation pairs are phrases in any linguistic sense. Phrase translations can be extracted from the Viterbi alignment of the alignment model. Phrase translation pairs are typically seen only a few times; in fact, most of the longer phrases are seen only once, even in the larger corpora. Using relative frequency to estimate the translation probability would make most …
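A minimal Python sketch of the two steps mentioned above: extracting phrase pairs from a word alignment and estimating their probabilities by relative frequency. It assumes the alignment is already given as a set of (source position, target position) links standing in for the Viterbi alignment, and it uses the widely used consistency criterion (no alignment link may leave the source/target block), which may differ from the extraction procedure of the system described here. The function names `extract_phrase_pairs` and `relative_frequency`, the toy French-English sentences, and their alignments are all hypothetical; unaligned boundary words are not handled.

```python
from collections import Counter
from typing import Iterable

# Hypothetical representation: a word alignment is a set of
# (source_index, target_index) links, standing in for a Viterbi alignment.
Alignment = set[tuple[int, int]]


def extract_phrase_pairs(src: list[str], tgt: list[str], alignment: Alignment,
                         max_len: int = 3) -> list[tuple[str, str]]:
    """Return phrase pairs consistent with the alignment (illustrative sketch).

    A block (src[i1..i2], tgt[j1..j2]) is kept if it contains at least one
    link and no link connects a word inside the block to a word outside it.
    Unaligned words at the block edges are not added (a simplification).
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Reject if a target word in [j1, j2] links outside the source span.
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


def relative_frequency(pairs: Iterable[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Estimate p(target phrase | source phrase) = count(s, t) / count(s)."""
    pair_counts = Counter(pairs)
    src_counts: Counter[str] = Counter()
    for (s, _t), c in pair_counts.items():
        src_counts[s] += c
    return {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}


# Hypothetical toy corpus: (source tokens, target tokens, alignment links).
corpus = [
    ("la maison verte".split(), "the green house".split(), {(0, 0), (1, 2), (2, 1)}),
    ("la maison".split(), "the home".split(), {(0, 0), (1, 1)}),
]
all_pairs = [p for s, t, a in corpus for p in extract_phrase_pairs(s, t, a)]
probs = relative_frequency(all_pairs)
for (s, t), p in sorted(probs.items()):
    print(f"p({t!r} | {s!r}) = {p:.2f}")
```

With only two sentence pairs, "maison" is seen with two different translations and gets 0.5 for each, while every phrase observed exactly once, including the three-word pair, receives an estimate of 1.0. This illustrates why relative-frequency estimates are unreliable for rarely seen phrases.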
