Bayesian Word Sense Induction

Samuel Brody
Dept. of Biomedical Informatics, Columbia University
samuel.brody@dbmi.columbia.edu

Mirella Lapata
School of Informatics, University of Edinburgh
mlap@inf.ed.ac.uk

Abstract

Sense induction seeks to automatically identify word senses directly from a corpus. A key assumption underlying previous work is that the context surrounding an ambiguous word is indicative of its meaning. Sense induction is thus typically viewed as an unsupervised clustering problem where the aim is to partition a word's contexts into different classes, each representing a word sense. Our work places sense induction in a Bayesian context by modeling the contexts of the ambiguous word as samples from a multinomial distribution over senses, which are in turn characterized as distributions over words. The Bayesian framework provides a principled way to incorporate a wide range of features beyond lexical co-occurrences and to systematically assess their utility on the sense induction task. The proposed approach yields improvements over state-of-the-art systems on a benchmark dataset.

1 Introduction

Sense induction is the task of discovering automatically all possible senses of an ambiguous word.
It is related to, but distinct from, word sense disambiguation (WSD), where the senses are assumed to be known and the aim is to identify the intended meaning of the ambiguous word in context. Although the bulk of previous work has been devoted to the disambiguation problem,¹ there are good reasons to believe that sense induction may be able to overcome some of the issues associated with WSD. Since most disambiguation methods assign senses according to, and with the aid of, dictionaries or other lexical resources, it is difficult to adapt them to new domains or to languages where such resources are scarce. A related problem concerns the .

¹ Approaches to WSD are too numerous to list. We refer the interested reader to Agirre et al. (2007) for an overview of the state of the art.