Investigations on Word Senses and Word Usages

Katrin Erk, University of Texas at Austin, katrin.erk@mail.utexas.edu
Diana McCarthy, University of Sussex, dianam@sussex.ac.uk
Nicholas Gaylord, University of Texas at Austin, nlgaylord@mail.utexas.edu

Abstract

The vast majority of work on word senses has relied on predefined sense inventories and an annotation schema where each word instance is tagged with the best fitting sense. This paper examines the case for a graded notion of word meaning in two experiments: one which uses WordNet senses in a graded fashion, contrasted with the “winner takes all” annotation, and one which asks annotators to judge the similarity of two usages. We find that the graded responses correlate with annotations from previous datasets, but sense assignments are used in a way that weakens the case for clear cut sense boundaries. The responses from both experiments correlate with the overlap of paraphrases from the English lexical substitution task, which bodes well for the use of substitutes as a proxy for word sense. This paper also provides two novel datasets which can be used for evaluating computational systems.
1 Introduction

The vast majority of work on word sense tagging has assumed that predefined word senses from a dictionary are an adequate proxy for the task, although of course there are issues with this enterprise, both in terms of cognitive validity (Hanks, 2000; Kilgarriff, 1997; Kilgarriff, 2006) and adequacy for computational linguistics applications (Kilgarriff, 2006). Furthermore, given a predefined list of senses, annotation efforts and computational approaches to word sense disambiguation (WSD) have usually assumed that one best fitting sense should be selected for each usage. While there is usually some allowance made for multiple senses, this is typically not adopted by annotators or computational systems. Research on the psychology of concepts (Murphy, 2002; Hampton, 2007) shows that categories in the human mind are not simply sets with clearcut ...