Estimators for Stochastic Unification-Based Grammars

Mark Johnson, Cognitive and Linguistic Sciences, Brown University
Stuart Geman, Applied Mathematics, Brown University
Stephen Canon, Cognitive and Linguistic Sciences, Brown University
Zhiyi Chi, Dept. of Statistics, The University of Chicago
Stefan Riezler, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart

Abstract

Log-linear models provide a statistically sound framework for Stochastic "Unification-Based" Grammars (SUBGs) and stochastic versions of other kinds of grammars. We describe two computationally tractable ways of estimating the parameters of such grammars from a training corpus of syntactic analyses, and apply these to estimate a stochastic version of Lexical-Functional Grammar.

1 Introduction

Probabilistic methods have revolutionized computational linguistics. They can provide a systematic treatment of preferences in parsing. Given a suitable estimation procedure, stochastic models can be tuned to reflect the properties of a corpus. On the other hand, Unification-Based Grammars (UBGs) can express a variety of linguistically important syntactic and semantic constraints. However, developing Stochastic Unification-Based Grammars (SUBGs) has not proved as straightforward as might be hoped.

The simple relative frequency estimator for PCFGs yields the maximum likelihood parameter estimate, which is to say that it minimizes the Kullback-Leibler divergence between the training and estimated distributions.
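To make the relative frequency estimator concrete, the following is a minimal sketch in Python, not code from the paper. It assumes a hypothetical toy treebank in which each tree is a nested tuple `(label, child, ...)` and each leaf is a plain string; the names `rules`, `relative_frequency`, and `toy_treebank` are illustrative only. Each rule probability is estimated as the count of the rule divided by the count of its left-hand side.

```python
from collections import Counter


def rules(tree):
    """Yield (lhs, rhs) productions from a tree given as nested tuples.

    A tree is (label, child1, child2, ...); a leaf is a plain string.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)


def relative_frequency(treebank):
    """Estimate P(lhs -> rhs) as count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Hypothetical two-tree treebank, invented for illustration.
toy_treebank = [
    ("S", ("NP", "she"), ("VP", ("V", "sleeps"))),
    ("S", ("NP", "she"), ("VP", ("V", "sees"), ("NP", "him"))),
]

probs = relative_frequency(toy_treebank)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}: {p:.3f}")
```

Under this sketch the probabilities of all rules sharing a left-hand side sum to one, which is the normalization a PCFG requires.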
On the other hand, as Abney (1997) points out, the context-sensitive dependencies that unification-based constraints introduce render the relative frequency estimator suboptimal: in general it does not maximize the likelihood and it is inconsistent.

This research was supported by the National Science Foundation (SBR-9720368), the US Army Research Office (DAAH04-96-BAA5), and the Office of Naval Research (N00014-97-1-0249).

Abney (1997) proposes a Markov Random Field or log-linear model for SUBGs, and the models described here are ...