Combining Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation

German Rigau, Jordi Atserias
Dept. de Llenguatges i Sist. Informàtics
Universitat Politècnica de Catalunya
Barcelona, Catalonia
{g.rigau, batalla}@lsi.upc.es

Eneko Agirre
Lengoaia eta Sist. Informatikoak saila
Euskal Herriko Unibertsitatea
Donostia, Basque Country
jibagbee@si.ehu.es

Abstract

This paper presents a method to combine a set of unsupervised algorithms that can accurately disambiguate word senses in a large, completely untagged corpus. Although most of the techniques for word sense resolution have been presented as stand-alone, it is our belief that full-fledged lexical ambiguity resolution should combine several information sources and techniques. The set of techniques has been applied in a combined way to disambiguate the genus terms of two machine-readable dictionaries (MRD), enabling us to construct complete taxonomies for Spanish and French. Tested accuracy is above 80% overall and 95% for two-way ambiguous genus terms, showing that taxonomy building is not limited to structured dictionaries such as LDOCE.

1 Introduction

While in English the lexical bottleneck problem (Briscoe, 1991) seems to be softened (e.g. WordNet (Miller, 1990), Alvey Lexicon (Grover et al., 1993), COMLEX (Grishman et al., 1994), etc.), there are no available wide-range lexicons for natural language processing (NLP) for other languages.
Manual construction of lexicons is the most reliable technique for obtaining structured lexicons, but it is costly and highly time-consuming. For this reason, many researchers have focused on the massive acquisition of lexical knowledge and semantic information from pre-existing structured lexical resources, as automatically as possible.

This research has been partially funded by CICYT TIC96-1243-C03-02 (ITEM project) and the European Commission LE-4003 (EuroWordNet project).

As dictionaries are special texts whose subject matter is a language (or a pair of languages in the case of bilingual