A Scalable Probabilistic Classifier for Language Modeling

Joel Lang
Institute for Language, Cognition and Computation
School of Informatics, University of Edinburgh
10 Crichton Street, Edinburgh EH8 9AB, UK
J.Lang-3@sms.ed.ac.uk

Abstract

We present a novel probabilistic classifier which scales well to problems that involve a large number of classes and require training on large datasets. A prominent example of such a problem is language modeling. Our classifier is based on the assumption that each feature is associated with a predictive strength, which quantifies how well the feature can predict the class by itself. The predictions of individual features can then be combined according to their predictive strength, resulting in a model whose parameters can be reliably and efficiently estimated. We show that a generative language model based on our classifier consistently matches modified Kneser-Ney smoothing and can outperform it if sufficiently rich features are incorporated.

1 Introduction

A Language Model (LM) is an important component within many natural language applications, including speech recognition and machine translation. The task of a generative LM is to assign a probability p(w) to a sequence of words w = w_1 ... w_L.
It is common to factorize this probability as

p(w) = \prod_{i=1}^{L} p(w_i | w_{i-N+1} ... w_{i-1})    (1)

Thus, the central problem that arises from this formulation consists of estimating the conditional probability p(w_i | w_{i-N+1} ... w_{i-1}). This can be viewed as a classification problem, in which the target word w_i corresponds to the class that must be predicted, based on features extracted from the conditioning context, e.g. a word occurring in the context.

This paper describes a novel approach for modeling such conditional probabilities. We propose a classifier which is based on the assumption that each feature has a predictive strength, quantifying how well the feature can predict the class (the target word) by itself. The predictions made by individual features can then be combined into a mixture model.
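The two ideas above can be sketched in code: a strength-weighted mixture over per-feature conditional distributions gives p(w_i | context), and the chain-rule factorization turns those conditionals into a sentence probability. This is a minimal illustration only; the toy distributions, feature names, and strength values below are invented for the example and are not the paper's actual estimator.

```python
import math

# Hypothetical per-feature conditional distributions p(class | feature).
feature_dists = {
    "prev:the":  {"cat": 0.6, "dog": 0.4},
    "prev2:saw": {"cat": 0.5, "dog": 0.5},
}
# Hypothetical predictive strengths for each feature.
strength = {"prev:the": 0.8, "prev2:saw": 0.2}

def mixture_prob(target, features):
    """p(target | features): mix each feature's prediction, weighting
    by its predictive strength (normalized to sum to one)."""
    z = sum(strength[f] for f in features)
    return sum((strength[f] / z) * feature_dists[f].get(target, 0.0)
               for f in features)

def sentence_logprob(word_feature_pairs):
    """Chain rule: log p(w) = sum_i log p(w_i | context features)."""
    return sum(math.log(mixture_prob(w, feats))
               for w, feats in word_feature_pairs)

p = mixture_prob("cat", ["prev:the", "prev2:saw"])
# 0.8 * 0.6 + 0.2 * 0.5 = 0.58
```

Because the mixture weights are a convex combination of proper distributions, the result is itself a proper distribution over classes, which is what makes the combined model usable inside the generative factorization of Equation (1).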