TAILIEUCHUNG - Scientific report: "Topic Models for Word Sense Disambiguation and Token-based Idiom Detection"

Topic Models for Word Sense Disambiguation and Token-based Idiom Detection

Linlin Li, Benjamin Roth, and Caroline Sporleder
Saarland University, Postfach 15 11 50, 66041 Saarbrücken, Germany
{linlin, beroth, csporled}@

Abstract

This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outperform state-of-the-art systems either quantitatively or statistically significantly.

1 Introduction

Word sense disambiguation (WSD) is the task of automatically determining the correct sense for a target word given the context in which it occurs. WSD is an important problem in NLP and an essential preprocessing step for many applications, including machine translation, question answering, and information extraction. However, WSD is a difficult task, and despite the fact that it has been the focus of much research over the years, state-of-the-art systems are still often not good enough for real-world applications.
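
The decomposition mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conditional probability of a sense s given a context c factorizes over latent topics z as P(s | c) = sum_z P(s | z) * P(z | c). That factorization, the function names, the sense labels, and the toy probabilities are all illustrative assumptions and are not taken from the paper; in practice the two conditional distributions would come from a trained topic model.

```python
from typing import Dict, List


def sense_scores(
    p_sense_given_topic: Dict[str, List[float]],
    p_topic_given_context: List[float],
) -> Dict[str, float]:
    """Marginalize over latent topics z: P(s | c) = sum_z P(s | z) * P(z | c).

    Assumed factorization for illustration only.
    """
    return {
        sense: sum(p_sz * p_zc for p_sz, p_zc in zip(per_topic, p_topic_given_context))
        for sense, per_topic in p_sense_given_topic.items()
    }


def best_sense(
    p_sense_given_topic: Dict[str, List[float]],
    p_topic_given_context: List[float],
) -> str:
    """Return the sense with the highest conditional probability."""
    scores = sense_scores(p_sense_given_topic, p_topic_given_context)
    return max(scores, key=scores.get)


# Hypothetical toy values: two senses of "bank", two latent topics.
# Each list holds P(sense | topic) for topic 0 and topic 1.
P_SENSE_GIVEN_TOPIC = {
    "bank/finance": [0.9, 0.2],
    "bank/river": [0.1, 0.8],
}
# P(topic | context) for a context that mostly activates topic 1.
P_TOPIC_GIVEN_CONTEXT = [0.3, 0.7]

scores = sense_scores(P_SENSE_GIVEN_TOPIC, P_TOPIC_GIVEN_CONTEXT)
chosen = best_sense(P_SENSE_GIVEN_TOPIC, P_TOPIC_GIVEN_CONTEXT)
print(scores, chosen)
```

In this toy setting the context puts most of its mass on the second topic, so the sense that is more probable under that topic receives the higher score.
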
One major factor that makes WSD difficult is a relative lack of manually annotated corpora, which hampers the performance of supervised systems. To address this problem, there has been a significant amount of work on unsupervised WSD that does not require manually sense-disambiguated training data (see McCarthy (2009) for an overview). Recently, several researchers have experimented with topic models (Brody and Lapata, 2009; Boyd-Graber et al., 2007; Boyd-Graber and
