TAILIEUCHUNG - Scientific report: "Estimating Class Priors in Domain Adaptation for Word Sense Disambiguation"

Yee Seng Chan and Hwee Tou Ng
Department of Computer Science, National University of Singapore
3 Science Drive 2, Singapore 117543
chanys nght @

Abstract

Instances of a word drawn from different domains may have different sense priors (the proportions of the different senses of a word). This in turn affects the accuracy of word sense disambiguation (WSD) systems trained and applied on different domains. This paper presents a method to estimate the sense priors of words drawn from a new domain, and highlights the importance of using well calibrated probabilities when performing these estimations. By using well calibrated probabilities, we are able to estimate the sense priors effectively to achieve significant improvements in WSD accuracy.

1 Introduction

Many words have multiple meanings, and the process of identifying the correct meaning, or sense, of a word in context is known as word sense disambiguation (WSD). Among the various approaches to WSD, corpus-based supervised machine learning methods have been the most successful to date. With this approach, one would need to obtain a corpus in which each ambiguous word has been manually annotated with the correct sense, to serve as training data. However, supervised WSD systems faced an important issue of domain dependence when using such a corpus-based approach.

To investigate this, Escudero et al. (2000) conducted experiments using the DSO corpus, which contains sentences drawn from two different corpora, namely the Brown Corpus (BC) and the Wall Street Journal (WSJ). They found that training a WSD system on one part (BC or WSJ) of the DSO corpus and applying it to the other part can result in an accuracy drop of 12% to 19%. One reason for this is the difference in sense priors (i.e., the proportions of the different senses of a word) between BC and WSJ. For instance, the noun interest has these 6 senses in the DSO corpus: sense 1, 2, 3, 4, 5, and 8. In the BC part of .
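
To make the notion of sense priors concrete, the following Python sketch computes them as simple proportions over sense-labeled instances and shows how a classifier's decision can change when the priors change. Everything here is an illustrative assumption: the label lists and the posterior values are made up (they are not DSO corpus figures), the function names `sense_priors` and `rescale_posteriors` are hypothetical, and the rescaling step is the standard Bayes' rule reweighting, which is not necessarily the estimation method the paper goes on to present.

```python
from collections import Counter


def sense_priors(labels):
    """Return the proportion of each sense among the labeled instances."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {sense: n / total for sense, n in sorted(counts.items())}


def rescale_posteriors(posteriors, train_priors, new_priors):
    """Reweight p(sense | context) from training-domain priors to
    new-domain priors using Bayes' rule, then renormalize."""
    weighted = {
        sense: p * new_priors.get(sense, 0.0) / train_priors[sense]
        for sense, p in posteriors.items()
    }
    norm = sum(weighted.values())
    if norm == 0:
        raise ValueError("no sense has non-zero probability after rescaling")
    return {sense: w / norm for sense, w in weighted.items()}


# Hypothetical sense labels for one noun in two domains (NOT real corpus data).
bc_labels = [1] * 5 + [2] * 1 + [5] * 4
wsj_labels = [1] * 1 + [2] * 1 + [5] * 8

bc_priors = sense_priors(bc_labels)
wsj_priors = sense_priors(wsj_labels)

# Hypothetical posteriors from a classifier trained on the BC-like data.
posteriors = {1: 0.6, 2: 0.1, 5: 0.3}
adjusted = rescale_posteriors(posteriors, bc_priors, wsj_priors)

print("BC priors: ", bc_priors)
print("WSJ priors:", wsj_priors)
print("Predicted sense before rescaling:", max(posteriors, key=posteriors.get))
print("Predicted sense after rescaling: ", max(adjusted, key=adjusted.get))
```

With these made-up numbers, the most probable sense shifts once the new-domain priors are applied, which is one way a mismatch in sense priors can reduce accuracy when a system is trained on one domain and applied to another.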
