Đang chuẩn bị nút TẢI XUỐNG, xin hãy chờ
Tải xuống
Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data. We describe a new approach which involves clustering subcategorization frame (SCF) distributions using the Information Bottleneck and nearest neighbour methods. In contrast to previous work, we particularly focus on clustering polysemic verbs. A novel evaluation scheme is proposed which accounts for the effect of polysemy on the clusters, offering us a good insight into the potential and limitations of semantically classifying undisambiguated SCF data. . | Clustering Polysemic Subcategorization Frame Distributions Semantically Anna Korhonen Computer Laboratory University of Cambridge 15 JJ Thomson Avenue Cambridge CB3 0FD UK alk23@cl.cam.ac.uk Yuval Krymolowski Division of Informatics University of Edinburgh 2 Buccleuch Place Edinburgh EH8 9LW Scotland UK ykrymolo@inf.ed.ac.uk Zvika Marx Interdisciplinary Center for Neural Computation The Hebrew University Jerusalem Israel zvim@cs.huji.ac.il Abstract Previous research has demonstrated the utility of clustering in inducing semantic verb classes from undisambiguated corpus data. We describe a new approach which involves clustering subcategorization frame scf distributions using the Information Bottleneck and nearest neighbour methods. In contrast to previous work we particularly focus on clustering polysemic verbs. A novel evaluation scheme is proposed which accounts for the effect of polysemy on the clusters offering us a good insight into the potential and limitations of semantically classifying undisambiguated SCF data. 1 Introduction Classifications which aim to capture the close relation between the syntax and semantics of verbs have attracted a considerable research interest in both linguistics and computational linguistics e.g. Jack-endoff 1990 Levin 1993 Pinker 1989 Dang et al. 1998 Dorr 1997 Merlo and Stevenson 2001 . While such classifications may not provide a means for full semantic inferencing they can capture generalizations over a range of linguistic properties and can therefore be used as a means of reducing redundancy in the lexicon and for filling gaps in lexical knowledge. This work was partly supported by UK EPSRC project GR N36462 93 Robust Accurate Statistical Parsing RASP . Verb classifications have in fact been used to support many natural language processing nlp tasks such as language generation machine translation Dorr 1997 document classification Klavans and Kan 1998 word sense disambiguation Dorr and Jones 1996 and subcategorization .