Unsupervised Sense Disambiguation Using Bilingual Probabilistic Models

Indrajit Bhattacharya, Dept. of Computer Science, University of Maryland, College Park, MD, USA, indrajit@
Lise Getoor, Dept. of Computer Science, University of Maryland, College Park, MD, USA, getoor@
Yoshua Bengio, Dept. IRO, Université de Montréal, Montréal, Québec, Canada, bengioy@

Abstract

We describe two probabilistic models for unsupervised word-sense disambiguation using parallel corpora. The first model, which we call the Sense model, builds on the work of Diab and Resnik (2002), which uses both parallel text and a sense inventory for the target language, and recasts their approach in a probabilistic framework. The second model, which we call the Concept model, is a hierarchical model that uses a concept latent variable to relate different language-specific sense labels. We show that both models improve performance on the word sense disambiguation task over previous unsupervised approaches, with the Concept model showing the largest improvement. Furthermore, as a by-product of learning the Concept model, we learn a sense inventory for the parallel language.

1 Introduction

Word sense disambiguation (WSD) has been a central question in the computational linguistics community since its inception. WSD is fundamental to natural language understanding and is a useful intermediate step for many other language processing tasks (Ide and Véronis, 1998).
Many recent approaches make use of ideas from statistical machine learning; the availability of shared sense definitions (WordNet; Fellbaum, 1998) and recent international competitions (Kilgarriff and Rosenzweig, 2000) have enabled researchers to compare their results. Supervised approaches, which make use of a small hand-labeled training set (Bruce and Wiebe, 1994; Yarowsky, 1993), typically outperform unsupervised approaches (Agirre et al., 2000; Litkowski, 2000; Lin, 2000; Resnik, 1997; Yarowsky, 1992; Yarowsky, 1995), but tend to be tuned to a specific corpus and …
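To make the idea of a concept latent variable that links language-specific sense labels more concrete, here is a minimal Python sketch of posterior sense inference in a toy hierarchical model. It assumes a factorization in which a concept c generates an English sense s_e and a foreign sense s_f, and each sense generates its own word, so that P(s_e | w_e, w_f) is proportional to the sum over c of P(c) P(s_e | c) P(w_e | s_e) times the sum over s_f of P(s_f | c) P(w_f | s_f). This factorization, the sense labels, and every probability below are hypothetical illustrations rather than the paper's parameterization. Parameter learning (for example by EM) is omitted, and the foreign sense labels are fixed by hand instead of being learned.

```python
from collections import defaultdict

# Hypothetical toy parameters: hand-picked for illustration, not learned and not from the paper.
P_CONCEPT = {"FINANCE": 0.5, "RIVER": 0.5}

# P(English sense | concept)
P_SE_GIVEN_C = {
    "FINANCE": {"bank.n.finance": 0.9, "bank.n.river": 0.1},
    "RIVER": {"bank.n.finance": 0.1, "bank.n.river": 0.9},
}

# P(foreign sense | concept); foreign sense labels are assumed fixed here.
P_SF_GIVEN_C = {
    "FINANCE": {"banco.n.finance": 0.9, "orilla.n.shore": 0.1},
    "RIVER": {"banco.n.finance": 0.2, "orilla.n.shore": 0.8},
}

# P(word | sense) for each language.
P_WE_GIVEN_SE = {"bank.n.finance": {"bank": 1.0}, "bank.n.river": {"bank": 1.0}}
P_WF_GIVEN_SF = {"banco.n.finance": {"banco": 1.0}, "orilla.n.shore": {"orilla": 1.0}}


def english_sense_posterior(w_e: str, w_f: str) -> dict:
    """Return P(s_e | w_e, w_f) for an aligned word pair under the toy model."""
    scores = defaultdict(float)
    for c, p_c in P_CONCEPT.items():
        # Marginalize the foreign side: sum over s_f of P(s_f | c) * P(w_f | s_f).
        foreign = sum(
            p_sf * P_WF_GIVEN_SF[s_f].get(w_f, 0.0)
            for s_f, p_sf in P_SF_GIVEN_C[c].items()
        )
        # Accumulate the joint score for each English sense, summing over concepts.
        for s_e, p_se in P_SE_GIVEN_C[c].items():
            scores[s_e] += p_c * p_se * P_WE_GIVEN_SE[s_e].get(w_e, 0.0) * foreign
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("word pair has zero probability under the toy model")
    # Normalize the joint scores into a posterior distribution.
    return {s: v / total for s, v in scores.items()}


post = english_sense_posterior("bank", "orilla")
print(max(post, key=post.get))  # → bank.n.river
```

In this toy setting, the aligned foreign word drives the choice. Pairing "bank" with "orilla" favors the river sense (about 0.81), while pairing it with "banco" favors the finance sense (about 0.75), even though the English word alone is ambiguous.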
