Integrating history-length interpolation and classes in language modeling

Hinrich Schütze
Institute for NLP, University of Stuttgart, Germany

Abstract

Building on earlier work that integrates different factors in language modeling, we view (i) backing off to a shorter history and (ii) class-based generalization as two complementary mechanisms of using a larger equivalence class for prediction when the default equivalence class is too small for reliable estimation. This view entails that the classes in a language model should be learned from rare events only and should be preferably applied to rare events. We construct such a model and show that both training on rare events and preferable application to rare events improve perplexity when compared to a simple direct interpolation of class-based with standard language models.

1 Introduction

Language models, probability distributions over strings of words, are fundamental to many applications in natural language processing. The main challenge in language modeling is to estimate string probabilities accurately, given that even very large training corpora cannot overcome the inherent sparseness of word sequence data. One way to improve the accuracy of estimation is class-based generalization.
The idea is that even though a particular word sequence s may not have occurred in the training set (or occurred too infrequently for accurate estimation), the occurrence of sequences similar to s can help us better estimate p(s). Plausible though this line of reasoning is, the language models most commonly used today do not incorporate class-based generalization. This is partially due to the additional cost of creating classes and using classes as part of the model. But an equally important reason is that most models that integrate class-based information do so by way of a simple interpolation and achieve only a modest improvement in performance.

In this paper, we propose a new type of class-based language model. The key novelty is that we recognize that