Scientific report: "Discovering Corpus-Specific Word Senses"

Beate Dorow
Institut für Maschinelle Sprachverarbeitung
Universität Stuttgart, Germany

Dominic Widdows
Center for the Study of Language and Information
Stanford University, California
dwiddows@

Abstract

This paper presents an unsupervised algorithm which automatically discovers word senses from text. The algorithm is based on a graph model representing words and relationships between them. Sense clusters are iteratively computed by clustering the local graph of similar words around an ambiguous word. Discrimination against previously extracted sense clusters enables us to discover new senses. We use the same data for both recognising and resolving ambiguity.

1 Introduction

This paper describes an algorithm which automatically discovers word senses from free text and maps them to the appropriate entries of existing dictionaries or taxonomies. Automatic word sense discovery has applications of many kinds. It can greatly facilitate a lexicographer's work and can be used to automatically construct corpus-based taxonomies or to tune existing ones. The same corpus evidence which supports a clustering of an ambiguous word into distinct senses can be used to decide which sense is referred to in a given context (Schütze, 1998).

This paper is organised as follows. In section 2 we present the graph model from which we discover word senses.
Section 3 describes how we divide the graph surrounding an ambiguous word into areas corresponding to different senses using Markov clustering (van Dongen, 2000). The quality of the Markov clustering depends strongly on several parameters, such as a granularity factor and the size of the local graph. In section 4 we outline a word sense discovery algorithm which bypasses the problem of parameter tuning. We conducted a pilot experiment to examine the performance of our algorithm on a set of words with varying degrees of ambiguity. Section 5 describes
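
The Markov clustering step named in the introduction can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain-Python version of the standard expansion and inflation iteration, and the toy word graph, the function names, the inflation value of 2.0 (standing in for the "granularity factor"), and the thresholds are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total
    return out


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def markov_cluster(words, edges, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected, unweighted word graph; returns a set of frozensets."""
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    # Adjacency matrix with self-loops, a common convention in Markov clustering.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        m[index[a]][index[b]] = 1.0
        m[index[b]][index[a]] = 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Each row that keeps non-negligible mass defines one cluster:
    # the columns (words) whose random walks end up in that row.
    clusters = set()
    for i in range(n):
        members = frozenset(words[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical neighbours of an ambiguous word such as "mouse" (toy data,
# not taken from the paper): an animal group and a hardware group.
words = ["rat", "cat", "rabbit", "keyboard", "monitor", "printer"]
edges = [("rat", "cat"), ("cat", "rabbit"), ("rat", "rabbit"),
         ("keyboard", "monitor"), ("monitor", "printer")]

for cluster in markov_cluster(words, edges):
    print(sorted(cluster))
```

In this sketch a larger inflation value tends to produce more, smaller clusters, which is one way to see why the clustering quality depends on such parameters.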
