Supervised Domain Adaption for WSD

Eneko Agirre and Oier Lopez de Lacalle
IXA NLP Group, University of the Basque Country
Donostia, Basque Country
{e.agirre, oier.lopezdelacalle}@ehu.es

Abstract

The lack of positive results on supervised domain adaptation for WSD has cast some doubt on the utility of hand-tagging general corpora and thus developing generic supervised WSD systems. In this paper we show for the first time that our WSD system, trained on a general source corpus (BNC) and the target corpus, obtains up to 22% error reduction when compared to a system trained on the target corpus alone. In addition, we show that as little as 40% of the target corpus (when supplemented with the source corpus) is sufficient to obtain the same results as training on the full target data. The key to success is the use of unlabeled data with SVD, a combination of kernels, and SVM.

1 Introduction

In many Natural Language Processing (NLP) tasks, a large collection of manually annotated text is used to train and test supervised machine learning models. While these models have been shown to perform very well when tested on the text collection related to the training data (what we call the source domain), the performance drops considerably when testing on text from other domains (called target domains). In order to build models that perform well in new target domains, we usually find two settings (Daume III, 2007).
In the semi-supervised setting, the training (hand-annotated) text from the source domain is supplemented with unlabeled data from the target domain. In the supervised setting, we use training data from both the source and target domains to test on the target domain. In Agirre and Lopez de Lacalle (2008) we studied semi-supervised Word Sense Disambiguation (WSD) adaptation, and in this paper we focus on supervised WSD adaptation. We compare the performance of similar supervised WSD systems on three different scenarios. In the source-to-target scenario, the WSD system is trained on the
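The abstract attributes the system's success to unlabeled data, SVD, and an SVM over combined kernels. The following is only a minimal sketch of that general idea, not the paper's actual system: it assumes scikit-learn, a toy sense-tagged dataset invented here for illustration, and a single linear kernel in place of the paper's kernel combination.

```python
# Minimal sketch of an SVD + SVM pipeline for WSD (hypothetical data,
# scikit-learn assumed; this is NOT the authors' implementation).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import SVC

# Toy sense-tagged contexts for the ambiguous word "bank".
contexts = [
    "deposited money at the bank downtown",
    "the bank approved the loan application",
    "fishing from the river bank at dawn",
    "grass grew along the muddy bank",
]
senses = ["FINANCE", "FINANCE", "RIVER", "RIVER"]

pipeline = make_pipeline(
    CountVectorizer(),             # bag-of-words features from each context
    TruncatedSVD(n_components=2),  # SVD projects features to latent dimensions
    SVC(kernel="linear"),          # SVM classifier over the reduced space
)
pipeline.fit(contexts, senses)
print(pipeline.predict(["she walked along the bank of the stream"]))
```

In the paper's setting, the SVD step is where unlabeled data can contribute: the decomposition can be computed over both labeled and unlabeled occurrences, so the latent space reflects the target domain even before any target labels are seen.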