Investigating GIS and Smoothing for Maximum Entropy Taggers

James R. Curran and Stephen Clark
School of Informatics, University of Edinburgh
2 Buccleuch Place, Edinburgh. EH8 9LW
{jamesc,stephenc}@cogsci.ed.ac.uk

Abstract

This paper investigates two elements of Maximum Entropy tagging: the use of a correction feature in the Generalised Iterative Scaling (GIS) estimation algorithm, and techniques for model smoothing. We show analytically and empirically that the correction feature, assumed to be required for the correctness of GIS, is unnecessary. We also explore the use of a Gaussian prior and a simple cutoff for smoothing. The experiments are performed with two tagsets: the standard Penn Treebank POS tagset and the larger set of lexical types from Combinatory Categorial Grammar.

1 Introduction

The use of maximum entropy (ME) models has become popular in Statistical NLP; some example applications include part-of-speech (POS) tagging (Ratnaparkhi, 1996), parsing (Ratnaparkhi, 1999; Johnson et al., 1999) and language modelling (Rosenfeld, 1996). Many tagging problems have been successfully modelled in the ME framework, including POS tagging, with state of the art performance (van Halteren et al., 2001), supertagging (Clark, 2002) and chunking (Koeling, 2000).

Generalised Iterative Scaling (GIS) is a very simple algorithm for estimating the parameters of an ME model.
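To give a rough sense of how simple the estimation loop is, here is a minimal Python sketch of GIS for a toy conditional ME tagger. It is an illustration under stated assumptions, not the authors' implementation: the names `gis` and `tag_probs`, the three-tag toy data, and the three binary features are all hypothetical. The sketch takes C to be the largest total feature value of any (context, tag) event and adds no correction feature, which is the variant this paper argues is sufficient.

```python
import math

def tag_probs(weights, features, tags, x):
    """Return the model distribution p(y | x) as a dict keyed by tag."""
    scores = {y: math.exp(sum(w * f(x, y) for w, f in zip(weights, features)))
              for y in tags}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

def gis(data, tags, features, iterations=2000):
    """Estimate feature weights with GIS (hypothetical helper, no correction feature)."""
    n = len(data)
    # C bounds the total feature value of any (context, tag) event.
    C = max(sum(f(x, y) for f in features) for x, _ in data for y in tags)
    # Empirical expectation of each feature over the training pairs.
    empirical = [sum(f(x, y) for x, y in data) / n for f in features]
    weights = [0.0] * len(features)
    for _ in range(iterations):
        # Model expectation of each feature, using the observed contexts.
        expected = [0.0] * len(features)
        for x, _ in data:
            for y, p in tag_probs(weights, features, tags, x).items():
                for i, f in enumerate(features):
                    expected[i] += p * f(x, y) / n
        # GIS update: move each weight by (1/C) * log(empirical / expected).
        # Features never seen in the data (empirical == 0) are left untouched.
        weights = [w + math.log(e / m) / C if e > 0 else w
                   for w, e, m in zip(weights, empirical, expected)]
    return weights

# Toy data (assumed for illustration): feature sums differ across events,
# so the constant-sum condition of the original GIS formulation does not hold.
tags = ["DT", "NN", "VB"]
data = [("the", "DT"), ("the", "DT"), ("the", "NN"),
        ("dog", "NN"), ("dog", "NN"), ("dog", "VB")]
features = [
    lambda x, y: int(y == "DT"),
    lambda x, y: int(y == "NN"),
    lambda x, y: int(x == "dog" and y == "NN"),
]
weights = gis(data, tags, features)
```

After training, calling `tag_probs(weights, features, tags, "dog")` should give a probability for `NN` close to the 2/3 relative frequency in the toy data, since the third feature constrains exactly that quantity. A fixed iteration count is used here only to keep the sketch short; a practical implementation would stop on a convergence criterion instead.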
The original formulation of GIS (Darroch and Ratcliff, 1972) required the sum of the feature values for each event to be constant. Since this is not the case for many applications, the standard method is to add a correction, or slack, feature to each event. Improved Iterative Scaling (IIS) (Berger et al., 1996; Della Pietra et al., 1997) eliminated the correction feature to improve the convergence rate of the algorithm. However, the extra book keeping required for IIS means that GIS is often faster in practice (Malouf, 2002). This paper shows, by a simple adaptation of Berger's proof for the convergence of IIS (Berger, 1997), that GIS does not require a correction