Scientific report: "Learning Common Grammar from Multilingual Corpus"

Tomoharu Iwata, Daichi Mochihashi, Hiroshi Sawada
NTT Communication Science Laboratories
2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, Japan
iwata daichi sawada @

Abstract

We propose a corpus-based probabilistic framework to extract hidden common syntax across languages from non-parallel multilingual corpora in an unsupervised fashion. For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language-dependent probabilistic context-free grammar (PCFG), and these PCFGs are generated from a prior grammar that is common across languages. We also develop a variational method for efficient inference. Experiments on a non-parallel multilingual corpus of eleven languages demonstrate the feasibility of the proposed method.

1 Introduction

Languages share certain common properties (Pinker, 1994). For example, the word order in most European languages is subject-verb-object (SVO), and some words with similar forms are used with similar meanings in different languages. The reasons for these common properties can be attributed to (1) a common ancestor language, (2) borrowing from nearby languages, and (3) the innate abilities of humans (Chomsky, 1965). We assume hidden commonalities in syntax across languages and try to extract a common grammar from non-parallel multilingual corpora. For this purpose, we propose a generative model for multilingual grammars that is learned in an unsupervised fashion.
There are some computational models for capturing commonalities at the phoneme and word level (Oakes, 2000; Bouchard-Cote et al., 2008), but as far as we know, no attempt has been made to extract commonalities at the syntax level from non-parallel and non-annotated multilingual corpora. In our scenario, we use probabilistic context-free grammars (PCFGs) as our monolingual grammar model. We assume that a PCFG for each language is generated from a general model that is common across languages and each .
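The generative assumption described here (a common prior grammar, from which each language's PCFG is drawn, from which sentences are then generated) can be illustrated with a small sketch. This is not the authors' model: the toy rule set `RULES`, the pseudo-counts in `COMMON_PRIOR`, the choice of a Dirichlet distribution as the prior, and the language names are all assumptions made for illustration only.

```python
import random

# Hypothetical toy grammar: nonterminal -> candidate right-hand sides.
# Symbols that are not keys of RULES are treated as terminals.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"), ("NP", "V")],  # verb-object vs. object-verb order
    "NP": [("noun",)],
    "V": [("verb",)],
}

# Assumed common prior: Dirichlet pseudo-counts shared by all languages.
# The values are invented; larger counts favor the corresponding rule.
COMMON_PRIOR = {
    "S": [1.0],
    "VP": [4.0, 1.0],
    "NP": [1.0],
    "V": [1.0],
}


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_language_pcfg(rng):
    """Draw language-specific rule probabilities from the common prior."""
    return {lhs: sample_dirichlet(alpha, rng)
            for lhs, alpha in COMMON_PRIOR.items()}


def generate(symbol, pcfg, rng):
    """Expand a symbol top-down into a list of terminal tokens."""
    if symbol not in RULES:
        return [symbol]
    rhs = rng.choices(RULES[symbol], weights=pcfg[symbol])[0]
    words = []
    for child in rhs:
        words.extend(generate(child, pcfg, rng))
    return words


rng = random.Random(0)
# Each language gets its own PCFG, but all are drawn from the same prior.
pcfgs = {lang: sample_language_pcfg(rng) for lang in ("lang_a", "lang_b")}
sentences = {lang: generate("S", pcfg, rng) for lang, pcfg in pcfgs.items()}
```

In this sketch, each language ends up with its own preference between verb-object and object-verb order, while the shared pseudo-counts pull every language toward the same tendency. Learning would run in the opposite direction, inferring the shared prior from unannotated sentences, which the abstract says is done with a variational method.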
