Sequential Conditional Generalized Iterative Scaling

Joshua Goodman
Microsoft Research
One Microsoft Way, Redmond, WA 98052
joshuago@microsoft.com

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 9-16.

Abstract

We describe a speedup for training conditional maximum entropy models. The algorithm is a simple variation on Generalized Iterative Scaling, but converges roughly an order of magnitude faster, depending on the number of constraints and on how speed is measured. Rather than attempting to train all model parameters simultaneously, the algorithm trains them sequentially. The algorithm is easy to implement, typically uses only slightly more memory, and will lead to improvements for most maximum entropy problems.

1 Introduction

Conditional maximum entropy models have been used for a variety of natural language tasks, including language modeling (Rosenfeld, 1994), part-of-speech tagging, prepositional phrase attachment, and parsing (Ratnaparkhi, 1998), word selection for machine translation (Berger et al., 1996), and finding sentence boundaries (Reynar and Ratnaparkhi, 1997). Unfortunately, although maximum entropy (maxent) models can be applied very generally, the typical training algorithm for maxent, Generalized Iterative Scaling (GIS) (Darroch and Ratcliff, 1972), can be extremely slow. We have personally used up to a month of computer time to train a single model.
There have been several attempts to speed up maxent training (Della Pietra et al., 1997; Wu and Khudanpur, 2000; Goodman, 2001). However, as we describe later, each of these has suffered from applicability to a limited number of applications. Darroch and Ratcliff (1972) describe GIS for joint probabilities and mention a fast variation, which appears to have been missed by the conditional maxent community. We show that this fast variation can also be used for conditional probabilities, and that it is useful for a larger range of problems than traditional speedup techniques.
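To make the parallel-versus-sequential contrast concrete, the following is a minimal sketch on a toy two-class problem, assuming binary features. The data, class set, and the names DATA, feats, gis_step, and sequential_step are illustrative assumptions for this sketch, not definitions from the paper.

```python
import math

# Toy training set and feature function (illustrative, not from the paper).
DATA = [("sunny", 0), ("sunny", 0), ("rainy", 1)]
CLASSES = (0, 1)
NUM_FEATS = 2

def feats(x, y):
    """Indices of the binary features that fire on (x, y)."""
    active = []
    if x == "sunny" and y == 0:
        active.append(0)
    if x == "rainy" and y == 1:
        active.append(1)
    return active

def p(lam, x):
    """Conditional distribution p(y | x) under weights lam."""
    scores = [math.exp(sum(lam[i] for i in feats(x, y))) for y in CLASSES]
    z = sum(scores)
    return [s / z for s in scores]

F_SHARP = 1  # max over (x, y) of the number of firing features; 1 in this toy

def gis_step(lam):
    """One parallel GIS iteration: every weight is updated at once,
    using expectations computed under the same (old) model."""
    obs = [0.0] * NUM_FEATS
    exp_ = [1e-12] * NUM_FEATS
    for x, y in DATA:
        for i in feats(x, y):
            obs[i] += 1.0
        probs = p(lam, x)
        for yy in CLASSES:
            for i in feats(x, yy):
                exp_[i] += probs[yy]
    return [l + (1.0 / F_SHARP) * math.log(max(o, 1e-12) / e)
            for l, o, e in zip(lam, obs, exp_)]

def sequential_step(lam):
    """One sweep of a sequential variant: weights are updated one at a
    time, each against expectations under the just-updated model.
    (Recomputing expectations from scratch here is only for clarity;
    an efficient implementation would update them incrementally.)"""
    lam = list(lam)
    for i in range(NUM_FEATS):
        obs, exp_ = 0.0, 1e-12
        for x, y in DATA:
            if i in feats(x, y):
                obs += 1.0
            probs = p(lam, x)
            for yy in CLASSES:
                if i in feats(x, yy):
                    exp_ += probs[yy]
        lam[i] += math.log(max(obs, 1e-12) / exp_)
    return lam

lam = [0.0, 0.0]
for _ in range(50):
    lam = sequential_step(lam)
```

Both update rules push each weight toward matching its observed feature count; the sequential sweep simply lets each parameter see the effect of the parameters updated before it within the same pass.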