TAILIEUCHUNG - Scientific report: "Joint and conditional estimation of tagging and parsing models∗"

Mark Johnson
Brown University
Mark-Johnson@

Abstract

This paper compares two different ways of estimating statistical language models. Many statistical NLP tagging and parsing models are estimated by maximizing the (joint) likelihood of the fully-observed training data. However, since these applications only require the conditional probability distributions, these distributions can in principle be learnt by maximizing the conditional likelihood of the training data. Perhaps somewhat surprisingly, models estimated by maximizing the joint were superior to models estimated by maximizing the conditional, even though some of the latter models intuitively had access to “more information”.

1 Introduction

Many statistical NLP applications, such as tagging and parsing, involve finding the value of some hidden variable $Y$ (e.g., a tag or a parse tree) which maximizes a conditional probability distribution $P_\theta(Y \mid X)$, where $X$ is a given word string. The model parameters $\theta$ are typically estimated by maximum likelihood, i.e., by maximizing the likelihood of the training data. Given a fully observed training corpus $D = ((y_1, x_1), \ldots, (y_n, x_n))$, the maximum (joint) likelihood estimate (MLE) of $\theta$ is

$$\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{n} P_\theta(y_i, x_i). \qquad (1)$$

However, it turns out there is another maximum likelihood estimation method, which maximizes the conditional likelihood or “pseudo-likelihood” of the training data (Besag, 1975). Maximum conditional likelihood is consistent for the conditional distribution. Given a training corpus $D$, the maximum conditional likelihood estimate (MCLE) of the model parameters $\theta$ is …

∗ I would like to thank Eugene Charniak and the other members of BLLIP for their comments and suggestions. Fernando Pereira was especially generous with comments and suggestions, as were the ACL reviewers; I apologize for not being able to follow up all of your good suggestions. This research was supported by NSF awards 9720368 and 9721276 and NIH award R01 MH60922-01A2.
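The difference between the two criteria can be illustrated on a toy tagging problem. The sketch below is not the paper's model: it assumes an unconstrained probability table over (tag, word) pairs, a four-item made-up corpus, and hypothetical function names. It takes the conditional likelihood to be the product of P(y_i | x_i) over the training pairs, with P(y | x) obtained by dividing the joint probability P(y, x) by the marginal P(x).

```python
import math
from collections import Counter, defaultdict


def relative_frequency_estimate(data):
    """Joint MLE for an unconstrained table over (y, x) pairs:
    each pair's probability is its relative frequency in the corpus."""
    counts = Counter(data)
    n = len(data)
    return {pair: c / n for pair, c in counts.items()}


def joint_log_likelihood(theta, data):
    """Sum over the corpus of log P(y_i, x_i)."""
    return sum(math.log(theta[(y, x)]) for y, x in data)


def conditional_log_likelihood(theta, data):
    """Sum over the corpus of log P(y_i | x_i),
    where P(y | x) = P(y, x) / sum over y' of P(y', x)."""
    marginal = defaultdict(float)
    for (_, x), p in theta.items():
        marginal[x] += p
    return sum(math.log(theta[(y, x)] / marginal[x]) for y, x in data)


# Toy corpus of (tag, word) pairs; purely illustrative.
corpus = [("N", "dog"), ("V", "run"), ("N", "run"), ("N", "dog")]
theta = relative_frequency_estimate(corpus)

joint_ll = joint_log_likelihood(theta, corpus)
cond_ll = conditional_log_likelihood(theta, corpus)
print(f"joint log-likelihood:       {joint_ll:.4f}")
print(f"conditional log-likelihood: {cond_ll:.4f}")
```

Because P(y | x) is never smaller than P(y, x), the conditional log-likelihood of a corpus is always at least as large as its joint log-likelihood under the same parameters. The two criteria differ in what they reward: the joint criterion also has to model how likely each word string is, while the conditional criterion only scores the choice of tag given the words.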
