TAILIEUCHUNG - Báo cáo khoa học: "Determining Word Sense Dominance Using a Thesaurus"

The degree of dominance of a sense of a word is the proportion of occurrences of that sense in text. We propose four new methods to accurately determine word sense dominance using raw text and a published thesaurus. Unlike the McCarthy et al. (2004) system, these methods can be used on relatively small target texts, without the need for a similarly-sensedistributed auxiliary text. We perform an extensive evaluation using artiﬁcially generated thesaurus-sense-tagged data. In the process, we create a word–category cooccurrence matrix, which can be used for unsupervised word sense disambiguation and estimating distributional similarity of word senses, as. | Determining Word Sense Dominance Using a Thesaurus Saif Mohammad and Graeme Hirst Department of Computer Science University of Toronto Toronto ON M5S 3G4 Canada smm gh @ Abstract The degree of dominance of a sense of a word is the proportion of occurrences of that sense in text. We propose four new methods to accurately determine word sense dominance using raw text and a published thesaurus. Unlike the McCarthy et al. 2004 system these methods can be used on relatively small target texts without the need for a similarly-sense-distributed auxiliary text. We perform an extensive evaluation using artificially generated thesaurus-sense-tagged data. In the process we create a word-category cooccurrence matrix which can be used for unsupervised word sense disambiguation and estimating distributional similarity of word senses as well. 1 Introduction The occurrences of the senses of a word usually have skewed distribution in text. Further the distribution varies in accordance with the domain or topic of discussion. For example the assertion of illegality sense of charge is more frequent in the judicial domain while in the domain of economics the expense cost sense occurs more often. Formally the degree of dominance of a particular sense of a word target word in a given text target text may be defined as the ratio of the occurrences of the sense to the total occurrences of the target word. The sense with the highest dominance in the target text is called the predominant sense of the target word. Determination of word sense dominance has many uses. An unsupervised system will benefit by backing off to the predominant sense in case of insufficient evidence. The dominance values may be used as prior probabilities for the different senses obviating the need for labeled training data in a sense disambiguation task. Natural language systems can choose to ignore infrequent senses of words or consider only the most dominant senses McCarthy et al. 2004 . An .

Khải Hòa 76 8 pdf

Upload

Bấm vào đây để xem trước nội dung

Tải xuống

TÀI LIỆU LIÊN QUAN

Báo cáo khoa học: "An Efficient Method for Determining Bilingual Word Classes"

6 51 0

Báo cáo khoa học: "Determining Word Sense Dominance Using a Thesaurus"

8 62 0

TÀI LIỆU XEM NHIỀU

Một Case Về Hematology (1)

8 461838 55

Giới thiệu :Lập trình mã nguồn mở

14 22503 57

Tiểu luận: Tư tưởng Hồ Chí Minh về xây dựng nhà nước trong sạch vững mạnh

13 10855 529

Câu hỏi và đáp án bài tập tình huống Quản trị học

14 10024 445

Phân tích và làm rõ ý kiến sau: “Bài thơ Tự tình II vừa nói lên bi kịch duyên phận vừa cho thấy khát vọng sống, khát vọng hạnh phúc của Hồ Xuân Hương”

3 9479 104

Ebook Facts and Figures – Basic reading practice: Phần 1 – Đặng Tuấn Anh (Dịch)

249 8241 1124

Tiểu luận: Nội dung tư tưởng Hồ Chí Minh về đạo đức

16 8198 423

Mẫu đơn thông tin ứng viên ngân hàng VIB

8 7859 2219

Đề tài: Dự án kinh doanh thời trang quần áo nữ

17 6639 253

Vật lý hạt cơ bản (1)

29 5752 85

TỪ KHÓA LIÊN QUAN

TÀI LIỆU MỚI ĐĂNG

Đánh giá hao mòn và độ tin cậy của chi tiết và kết cấu trên đầu máy diezel part 3

12 293 0 18-04-2024

Động cơ đốt trong và máy kéo công nghiêp tập 2 part 8

32 258 0 18-04-2024

Sáng tạo trong thuật toán và lập trình với ngôn ngữ Pascal và C# Tập 2 - Chương 4

47 245 1 18-04-2024

Bibliography on Medieval Women, Gender, and Medicine 1980-2009

82 205 0 18-04-2024

Bơm máy nén quạt trong công nghệ part 1

20 248 2 18-04-2024

Báo cáo nghiên cứu khoa học " KẾT QUẢ NGHIÊN CỨU BƯỚC ĐẦU VỀ THIÊN ĐỊCH CHÂN KHỚP TRÊN CÂY THANH TRÀ Ở THỪA THIÊN HUẾ "

7 173 0 18-04-2024

MySQL Database Usage & Administration PHẦN 9

37 137 0 18-04-2024

BÀI GIẢNG VỀ - MẠCH ĐIỆN II - Chương I: Phân tích mạch trong miền thời gian

38 140 0 18-04-2024

Hướng dẫn sử dụng Quickoffice cho Ipad và Iphone

13 150 0 18-04-2024

báo cáo hóa học:" Endoscopic decompression for intraforaminal and extraforaminal nerve root compression"

7 106 0 18-04-2024

TÀI LIỆU HOT

Mẫu đơn thông tin ứng viên ngân hàng VIB

8 7859 2219

Giáo trình Tư tưởng Hồ Chí Minh - Mạch Quang Thắng (Dành cho bậc ĐH - Không chuyên ngành Lý luận chính trị)

152 5580 1320

Ebook Chào con ba mẹ đã sẵn sàng

112 3740 1228

Ebook Facts and Figures – Basic reading practice: Phần 1 – Đặng Tuấn Anh (Dịch)

249 8241 1124

Ebook Tuyển tập đề bài và bài văn nghị luận xã hội: Phần 1

62 5241 1124

Giáo trình Văn hóa kinh doanh - PGS.TS. Dương Thị Liễu

561 3471 641

Tiểu luận: Tư tưởng Hồ Chí Minh về xây dựng nhà nước trong sạch vững mạnh

13 10855 529

Giáo trình Sinh lí học trẻ em: Phần 1 - TS Lê Thanh Vân

122 3668 524

Giáo trình Pháp luật đại cương: Phần 1 - NXB ĐH Sư Phạm

274 4015 513

Bài tập nhóm quản lý dự án: Dự án xây dựng quán cafe

35 4092 478