TAILIEUCHUNG - Scientific report: "Learning Common Grammar from Multilingual Corpus"

Tomoharu Iwata, Daichi Mochihashi, Hiroshi Sawada
NTT Communication Science Laboratories
2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan
iwata daichi sawada @

Abstract

We propose a corpus-based probabilistic framework to extract hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language-dependent probabilistic context-free grammar (PCFG), and these PCFGs are generated from a prior grammar that is common across languages. We also develop a variational method for efficient inference. Experiments on a non-parallel multilingual corpus of eleven languages demonstrate the feasibility of the proposed method.

1 Introduction

Languages share certain common properties (Pinker, 1994). For example, the word order in most European languages is subject-verb-object (SVO), and some words with similar forms are used with similar meanings in different languages. The reasons for these common properties can be attributed to (1) a common ancestor language, (2) borrowing from nearby languages, and (3) the innate abilities of humans (Chomsky, 1965). We assume hidden commonalities in syntax across languages and try to extract a common grammar from non-parallel multilingual corpora. For this purpose, we propose a generative model for multilingual grammars that is learned in an unsupervised fashion.
There are some computational models for capturing commonalities at the phoneme and word level (Oakes, 2000; Bouchard-Cote et al., 2008), but as far as we know, no attempt has been made to extract commonalities at the syntax level from non-parallel and non-annotated multilingual corpora. In our scenario, we use probabilistic context-free grammars (PCFGs) as our monolingual grammar model. We assume that a PCFG for each language is generated from a general model that is common across languages and each ...
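
The generative story described above (a common prior grammar, from which each language's PCFG is drawn, from which sentences are generated) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the four-rule grammar, the pseudo-counts in COMMON_PRIOR, the use of a Dirichlet distribution to draw each language's rule probabilities, and all function names are hypothetical and are not taken from the paper. It also covers only the generative direction, not the variational inference the abstract mentions.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to its candidate expansions.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"), ("NP", "V")],  # verb-object vs. object-verb order
    "NP": [("noun",)],
    "V": [("verb",)],
}

# Assumed common prior: pseudo-counts over the expansions of each nonterminal,
# shared by all languages. Here it favours verb-object order.
COMMON_PRIOR = {
    "S": [1.0],
    "VP": [8.0, 2.0],
    "NP": [1.0],
    "V": [1.0],
}


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from a Dirichlet via normalised gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_language_pcfg(rng):
    """Draw language-dependent rule probabilities from the common prior."""
    return {lhs: sample_dirichlet(COMMON_PRIOR[lhs], rng) for lhs in RULES}


def generate(symbol, pcfg, rng):
    """Expand a symbol top-down into a list of terminal words."""
    if symbol not in RULES:  # anything without rules is treated as a terminal
        return [symbol]
    rhs = rng.choices(RULES[symbol], weights=pcfg[symbol], k=1)[0]
    words = []
    for child in rhs:
        words.extend(generate(child, pcfg, rng))
    return words


rng = random.Random(0)
pcfgs = {lang: sample_language_pcfg(rng) for lang in ("lang_a", "lang_b")}
for lang, pcfg in pcfgs.items():
    print(lang, pcfg["VP"], " ".join(generate("S", pcfg, rng)))
```

In this sketch the two languages receive different probabilities for the two VP expansions, but both are drawn around the same shared pseudo-counts, which is the sense in which the prior grammar is common across languages.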
