Scientific paper: "Estimating Class Priors in Domain Adaptation for Word Sense Disambiguation"

Yee Seng Chan and Hwee Tou Ng
Department of Computer Science
National University of Singapore
3 Science Drive 2, Singapore 117543
chanys nght @

Abstract

Instances of a word drawn from different domains may have different sense priors (the proportions of the different senses of a word). This in turn affects the accuracy of word sense disambiguation (WSD) systems trained and applied on different domains. This paper presents a method to estimate the sense priors of words drawn from a new domain, and highlights the importance of using well-calibrated probabilities when performing these estimations. By using well-calibrated probabilities, we are able to estimate the sense priors effectively and achieve significant improvements in WSD accuracy.

1 Introduction

Many words have multiple meanings, and the process of identifying the correct meaning, or sense, of a word in context is known as word sense disambiguation (WSD). Among the various approaches to WSD, corpus-based supervised machine learning methods have been the most successful to date. With this approach, one needs a corpus in which each ambiguous word has been manually annotated with the correct sense, to serve as training data. However, supervised WSD systems face an important issue of domain dependence when using such a corpus-based approach.
To investigate this, Escudero et al. (2000) conducted experiments using the DSO corpus, which contains sentences drawn from two different corpora, namely the Brown Corpus (BC) and the Wall Street Journal (WSJ). They found that training a WSD system on one part (BC or WSJ) of the DSO corpus and applying it to the other part can result in an accuracy drop of 12% to 19%. One reason for this is the difference in sense priors (i.e., the proportions of the different senses of a word) between BC and WSJ. For instance, the noun interest has these 6 senses in the DSO corpus: senses 1, 2, 3, 4, 5, and 8. In the BC part of …
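The abstract describes estimating the sense priors of a new domain from the output probabilities of a classifier trained on another domain, but this excerpt does not spell out the procedure. The sketch below assumes one standard technique for this setting, the EM prior-adjustment procedure of Saerens et al. (2002): starting from the training-domain priors, each posterior is rescaled by the ratio of the current prior estimate to the training prior and renormalized, and the new priors are taken as the average adjusted posterior, repeating until the estimate stops changing. It uses Python with NumPy; the function name `estimate_priors`, its parameters, and the two-sense toy data are hypothetical illustrations, not the authors' code.

```python
# Hypothetical sketch of EM-based prior estimation (Saerens et al., 2002);
# not the authors' implementation.
import numpy as np


def estimate_priors(posteriors, train_priors, max_iter=1000, tol=1e-6):
    """Re-estimate sense priors on an unlabeled target domain with EM.

    posteriors:   (n_instances, n_senses) array of P(sense | x) from a
                  classifier trained on the source domain (assumed calibrated).
    train_priors: (n_senses,) sense proportions in the source training data
                  (assumed strictly positive).
    Returns (target_priors, adjusted_posteriors).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    train_priors = np.asarray(train_priors, dtype=float)

    def adjust(priors):
        # Bayes-rule correction: scale each posterior by the new/old prior
        # ratio, then renormalize so every row sums to 1.
        scaled = posteriors * (priors / train_priors)
        return scaled / scaled.sum(axis=1, keepdims=True)

    priors = train_priors.copy()  # start from the source-domain priors
    for _ in range(max_iter):
        # E-step (adjust posteriors) + M-step (average them into new priors).
        new_priors = adjust(priors).mean(axis=0)
        step = np.max(np.abs(new_priors - priors))
        priors = new_priors
        if step < tol:
            break
    return priors, adjust(priors)


if __name__ == "__main__":
    # Toy setup: two senses, 50/50 source priors. With P(a | sense 0) = 0.8
    # and P(a | sense 1) = 0.2, a calibrated source classifier outputs
    # P(sense 0 | x) = 0.8 for feature value "a" and 0.2 for "b".
    # If the target priors are 0.8/0.2, a fraction 0.8*0.8 + 0.2*0.2 = 0.68
    # of target instances show "a".
    posteriors = np.array([[0.8, 0.2]] * 68 + [[0.2, 0.8]] * 32)
    naive = posteriors.mean(axis=0)  # plain average of the raw outputs
    priors, _ = estimate_priors(posteriors, [0.5, 0.5])
    print("naive average:", naive.round(3))
    print("EM estimate:  ", priors.round(3))
```

In this toy run the posteriors are calibrated by construction, and EM converges to the true target prior of 0.8 for sense 0, whereas simply averaging the raw posteriors gives 0.608, which stays biased toward the 50/50 training priors. Because the update treats classifier outputs as true probabilities, overconfident or otherwise poorly calibrated scores would distort the estimate, consistent with the abstract's emphasis on well-calibrated probabilities.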
