A Geometric View on Bilingual Lexicon Extraction from Comparable Corpora

E. Gaussier, Renders, I. Matveeva, C. Goutte, H. Dejean
Xerox Research Centre Europe, 6 Chemin de Maupertuis, 38320 Meylan, France
Dept of Computer Science, University of Chicago, 1100 E. 58th St., Chicago, IL 60637, USA
matveeva@

Abstract

We present a geometric view on bilingual lexicon extraction from comparable corpora, which allows one to re-interpret the methods proposed so far and to identify unresolved problems. This motivates three new methods that aim at solving these problems. Empirical evaluation shows the strengths and weaknesses of these methods, as well as a significant gain in the accuracy of extracted lexicons.

1 Introduction

Comparable corpora contain texts written in different languages that, roughly speaking, "talk about the same thing". In comparison to parallel corpora, i.e. corpora which are mutual translations, comparable corpora have not received much attention from the research community, and very few methods have been proposed to extract bilingual lexicons from such corpora. However, except for those found in translation services or in a few international organisations, which by their nature produce parallel documentation, most existing multilingual corpora are not parallel but comparable. This concern is reflected in major evaluation conferences on cross-language information retrieval (CLIR), such as CLEF, which only use comparable corpora for their multilingual tracks.
We adopt here a geometric view on bilingual lexicon extraction from comparable corpora, which allows one to re-interpret the methods proposed thus far and to formulate new ones inspired by latent semantic analysis (LSA), which was developed within the information retrieval (IR) community to treat synonymous and polysemous terms (Deerwester et al., 1990). We will explain in this paper the motivations behind the use of such methods for bilingual lexicon extraction from comparable corpora, and show how to
