Scientific report: "Topic Models for Word Sense Disambiguation and Token-based Idiom Detection"

Linlin Li, Benjamin Roth, and Caroline Sporleder
Saarland University, Postfach 15 11 50, 66041 Saarbrücken, Germany
{linlin, beroth, csporled}@

Abstract

This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outperform state-of-the-art systems either quantitatively or statistically significantly.

1 Introduction

Word sense disambiguation (WSD) is the task of automatically determining the correct sense for a target word given the context in which it occurs. WSD is an important problem in NLP and an essential preprocessing step for many applications, including machine translation, question answering, and information extraction. However, WSD is a difficult task, and despite the fact that it has been the focus of much research over the years, state-of-the-art systems are still often not good enough for real-world applications.
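The decomposition mentioned in the abstract can be illustrated with a small sketch. This excerpt does not state the exact factorization, so the form used here, p(s | c) = sum over z of p(s | z) * p(z | c) with s and c assumed conditionally independent given the latent topic z, is an assumption. The function names, the example word "bank", and all probabilities are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the factorization and every number below are
# assumptions, not values or definitions taken from the paper.

def sense_scores(p_sense_given_topic, p_topic_given_context):
    """Compute p(s | c) = sum_z p(s | z) * p(z | c) for each candidate sense s."""
    return {
        sense: sum(p_sz * p_topic_given_context[z]
                   for z, p_sz in enumerate(per_topic))
        for sense, per_topic in p_sense_given_topic.items()
    }


def best_sense(p_sense_given_topic, p_topic_given_context):
    """Return the sense with the highest conditional probability given the context."""
    scores = sense_scores(p_sense_given_topic, p_topic_given_context)
    return max(scores, key=scores.get)


# Toy setup: two latent topics (index 0 = finance, index 1 = nature)
# and two hypothetical senses of the word "bank".
p_sense_given_topic = {
    "financial institution": [0.9, 0.1],  # p(s | z=0), p(s | z=1)
    "river side": [0.1, 0.9],
}
p_topic_given_context = [0.8, 0.2]  # p(z | c): the context is mostly about finance

scores = sense_scores(p_sense_given_topic, p_topic_given_context)
chosen = best_sense(p_sense_given_topic, p_topic_given_context)
print(chosen, scores)
```

In this toy setting the context puts most of its mass on the finance topic, so the financial sense receives the higher score. In a real system both distributions would be estimated by a trained topic model rather than written by hand.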
One major factor that makes WSD difficult is a relative lack of manually annotated corpora, which hampers the performance of supervised systems. To address this problem, there has been a significant amount of work on unsupervised WSD that does not require manually sense-disambiguated training data (see McCarthy (2009) for an overview). Recently, several researchers have experimented with topic models (Brody and Lapata, 2009; Boyd-Graber et al., 2007; Boyd-Graber and




