Estimating Class Priors in Domain Adaptation for Word Sense Disambiguation

Yee Seng Chan and Hwee Tou Ng
Department of Computer Science
National University of Singapore
3 Science Drive 2, Singapore 117543
{chanys, nght}@comp.nus.edu.sg

Abstract

Instances of a word drawn from different domains may have different sense priors (the proportions of the different senses of a word). This in turn affects the accuracy of word sense disambiguation (WSD) systems trained and applied on different domains. This paper presents a method to estimate the sense priors of words drawn from a new domain, and highlights the importance of using well calibrated probabilities when performing these estimations. By using well calibrated probabilities, we are able to estimate the sense priors effectively to achieve significant improvements in WSD accuracy.

1 Introduction

Many words have multiple meanings, and the process of identifying the correct meaning, or sense, of a word in context is known as word sense disambiguation (WSD). Among the various approaches to WSD, corpus-based supervised machine learning methods have been the most successful to date. With this approach, one would need to obtain a corpus in which each ambiguous word has been manually annotated with the correct sense, to serve as training data.
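
As a small illustration of the term, the sense priors of a word in a sense-annotated corpus can be read off as relative frequencies of its sense labels. The sketch below is not the estimation method of this paper; it only shows what the quantity is when labels are available. The function name `sense_priors` and the two label lists are hypothetical toy values, not taken from any corpus.

```python
from collections import Counter


def sense_priors(labels):
    """Return the proportion of each sense label among the given instances."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Sort by sense label so the output order is stable.
    return {sense: n / total for sense, n in sorted(counts.items())}


# Hypothetical sense annotations for one ambiguous word in two domains
# (made-up toy data, used only to show that the priors can differ).
domain_a = [1, 1, 1, 2, 2, 5]
domain_b = [1, 5, 5, 5, 5, 2]

priors_a = sense_priors(domain_a)
priors_b = sense_priors(domain_b)

print(priors_a)
print(priors_b)
```

In this toy setting sense 1 is the most frequent sense in the first list and sense 5 in the second, so a classifier that has absorbed the priors of one list would be biased toward the wrong sense on the other.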
However, supervised WSD systems face an important issue of domain dependence when using such a corpus-based approach. To investigate this, Escudero et al. (2000) conducted experiments using the DSO corpus, which contains sentences drawn from two different corpora, namely the Brown Corpus (BC) and the Wall Street Journal (WSJ). They found that training a WSD system on one part (BC or WSJ) of the DSO corpus and applying it to the other part can result in an accuracy drop of 12% to 19%. One reason for this is the difference in sense priors (i.e., the proportions of the different senses of a word) between BC and WSJ. For instance, the noun interest has these 6 senses in the DSO corpus: sense 1, 2, 3, 4, 5, and 8. In the BC part of .