Proceedings of EACL '99

Automatic Verb Classification Using Distributions of Grammatical Features

Suzanne Stevenson
Dept of Computer Science and Center for Cognitive Science (RuCCS)
Rutgers University
CoRE Building, Busch Campus
New Brunswick, NJ 08903, U.S.A.
suzanne@ruccs.rutgers.edu

Paola Merlo
LATL-Department of Linguistics
University of Geneva
2 rue de Candolle
1211 Genève 4, Switzerland
merlo@lettres.unige.ch

Abstract

We apply machine learning techniques to classify automatically a set of verbs into lexical semantic classes, based on distributional approximations of diatheses, extracted from a very large annotated corpus. Distributions of four grammatical features are sufficient to reduce error rate by 50% over chance. We conclude that corpus data is a usable repository of verb class information, and that corpus-driven extraction of grammatical features is a promising methodology for automatic lexical acquisition.

1 Introduction

Recent years have witnessed a shift in grammar development methodology, from crafting large grammars to annotation of corpora. Correspondingly, there has been a change from developing rule-based parsers to developing statistical methods for inducing grammatical knowledge from annotated corpus data. The shift has mostly occurred because building wide-coverage grammars is time-consuming, error-prone, and difficult.
The same can be said for crafting the rich lexical representations that are a central component of linguistic knowledge, and research in automatic lexical acquisition has sought to address this (Dorr and Jones, 1996; Dorr, 1997, among others). Yet there have been few attempts to learn fine-grained lexical classifications from the statistical analysis of distributional data, analogously to the induction of syntactic knowledge (though see, e.g., Brent, 1993; Klavans and Chodorow, 1992; Resnik, 1992). In this paper, we propose such an approach for the automatic classification of verbs into lexical semantic classes.1 We can express the issues raised by this .