Scientific report: "Deciphering Foreign Language by Combining Language Models and Context Vectors"

Malte Nuhn and Arne Mauser and Hermann Ney
Human Language Technology and Pattern Recognition Group, RWTH Aachen University, Germany
surname @

Abstract

In this paper we show how to train statistical machine translation systems on real-life tasks using only non-parallel monolingual data from two languages. We present a modification of the method of Ravi and Knight (2011) that is scalable to vocabulary sizes of several thousand words. On the task of Ravi and Knight (2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model. The efficiency improvement of our method allows us to run experiments with vocabulary sizes of around 5,000 words, such as a non-parallel version of the Verbmobil corpus. We also report results using data from the monolingual French and English Gigaword corpora.

1 Introduction

It has long been a vision of science fiction writers and scientists to be able to communicate universally in all languages. In these visions, even previously unknown languages can be learned automatically by analyzing foreign language input. In this work we attempt to learn statistical translation models from only monolingual data in the source and target language. The reasoning behind this idea is that the elements of languages share statistical similarities that can be automatically identified and matched with other languages.
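To make the idea of matching statistical similarities concrete, the following toy sketch pairs words from two unrelated monolingual corpora purely by frequency rank. It is an illustration only and not the method of this paper or of Ravi and Knight (2011); the function names and the tiny corpora are hypothetical, and real data would be far noisier than this hand-picked example.

```python
from collections import Counter


def rank_by_frequency(sentences):
    """Return the words of a corpus ordered from most to least frequent."""
    counts = Counter(word for s in sentences for word in s.split())
    # Sort by descending count; ties are broken alphabetically so the
    # result is deterministic (an arbitrary choice for this sketch).
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def match_by_rank(source_sentences, target_sentences):
    """Pair the i-th most frequent source word with the i-th most frequent target word."""
    src = rank_by_frequency(source_sentences)
    tgt = rank_by_frequency(target_sentences)
    return dict(zip(src, tgt))


# Hypothetical toy corpora (assumption): the sentences are NOT translations
# of each other, so no parallel data is used.
french = ["le chat dort", "le chien dort", "le chat mange"]
english = ["the dog eats", "the cat sleeps", "the cat sleeps"]

lexicon = match_by_rank(french, english)
print(lexicon["le"])  # the most frequent word on each side gets paired
```

Frequency rank alone is a very weak signal. The title of the paper points to richer statistics, namely language models and context vectors, which this sketch does not implement.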
This work is a big step towards large-scale and large-vocabulary unsupervised training of statistical translation models. Previous approaches have faced constraints in vocabulary or data size. We show how to scale unsupervised training to real-life translation tasks and how large-scale experiments can be done. Monolingual data is more readily available, if not abundant, compared to true parallel or even just translated data. Learning from

(Footnote: Author now at Google Inc. amauser@.)

