Learning the Countability of English Nouns from Corpus Data

Timothy Baldwin
CSLI, Stanford University
Stanford, CA 94305
tbaldwin@csli.stanford.edu

Francis Bond
NTT Communication Science Laboratories
Nippon Telegraph and Telephone Corporation
Kyoto, Japan
bond@cslab.kecl.ntt.co.jp

Abstract

This paper describes a method for learning the countability preferences of English nouns from raw text corpora. The method maps the corpus-attested lexico-syntactic properties of each noun onto a feature vector, and uses a suite of memory-based classifiers to predict membership in 4 countability classes. We were able to assign countability to English nouns with a precision of 94.6%.

1 Introduction

This paper is concerned with the task of knowledge-rich lexical acquisition from unannotated corpora, focusing on the case of countability in English. Knowledge-rich lexical acquisition takes unstructured text and extracts linguistically precise categorisations of word and expression types. By combining this with a grammar, we can build broad-coverage deep-processing tools with a minimum of human effort. This research is close in spirit to the work of Light (1996) on classifying the semantics of derivational affixes, and Siegel and McKeown (2000) on learning verb aspect.

In English, nouns heading noun phrases are typically either countable or uncountable (also called count and mass). Countable nouns can be modified by denumerators, prototypically numbers, and have a morphologically marked plural form: one dog, two dogs. Uncountable nouns cannot be modified by denumerators, but can be modified by unspecific quantifiers such as much, and do not show any number distinction (prototypically being singular): *one equipment, some equipment, *two equipments. Many nouns can be used in countable or uncountable environments, with differences in interpretation.

We call the lexical property that determines which uses a noun can have the noun's countability preference. Knowledge of countability preferences is important both for the analysis and generation of English. In analysis, it helps to constrain the interpretations of parses.
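The pipeline summarised in the abstract (corpus evidence mapped to a feature vector, then one memory-based classifier per countability class) can be illustrated with a small sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the cue lists `DENUMERATORS` and `UNSPECIFIC`, the three features, the naive "add -s" pluralisation, the toy corpus, and the use of a 1-nearest-neighbour lookup as a stand-in for memory-based learning. Only the two classes named in this section (countable and uncountable) are covered.

```python
import math
from collections import Counter

# Hypothetical cue lists; the real feature set is not given in this section.
DENUMERATORS = {"one", "two", "three", "four", "five"}
UNSPECIFIC = {"much"}

def feature_vector(lemma, sentences):
    """Map corpus occurrences of a noun onto a small feature vector:
    (share preceded by a denumerator, share preceded by 'much', share plural)."""
    plural = lemma + "s"  # naive pluralisation, an assumption of this sketch
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token not in (lemma, plural):
                continue
            counts["total"] += 1
            previous = tokens[i - 1] if i > 0 else ""
            if previous in DENUMERATORS:
                counts["denum"] += 1
            if previous in UNSPECIFIC:
                counts["much"] += 1
            if token == plural:
                counts["plural"] += 1
    total = counts["total"] or 1  # avoid division by zero for unseen nouns
    return (counts["denum"] / total, counts["much"] / total, counts["plural"] / total)

class MemoryBasedClassifier:
    """Binary classifier that stores all training instances and
    answers with the label of the nearest stored instance (1-NN)."""
    def __init__(self):
        self.memory = []

    def train(self, vector, is_member):
        self.memory.append((vector, is_member))

    def predict(self, vector):
        nearest = min(self.memory, key=lambda item: math.dist(item[0], vector))
        return nearest[1]

# Toy corpus and gold labels, invented for this example.
corpus = [
    "I saw one dog and two dogs",
    "three dogs barked",
    "there is not much equipment",
    "we bought much equipment",
    "two cats and one cat",
    "too much furniture",
]
gold = {
    "dog": {"countable": True, "uncountable": False},
    "equipment": {"countable": False, "uncountable": True},
}

# One classifier per countability class, mirroring the "suite" of classifiers.
classifiers = {name: MemoryBasedClassifier() for name in ("countable", "uncountable")}
for lemma, labels in gold.items():
    vector = feature_vector(lemma, corpus)
    for name, is_member in labels.items():
        classifiers[name].train(vector, is_member)

def classify(lemma):
    """Return the predicted membership of a noun in each class."""
    vector = feature_vector(lemma, corpus)
    return {name: clf.predict(vector) for name, clf in classifiers.items()}
```

Calling `classify("cat")` or `classify("furniture")` returns a dictionary with one boolean per class. Using a separate binary classifier for each class, rather than a single multi-class one, lets a noun belong to more than one class, which fits the observation that many nouns occur in both countable and uncountable environments.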