Improved Modeling of Out-Of-Vocabulary Words Using Morphological Classes

Thomas Müller and Hinrich Schütze
Institute for Natural Language Processing
University of Stuttgart, Germany
muellets@ims.uni-stuttgart.de

Abstract

We present a class-based language model that clusters rare words of similar morphology together. The model improves the prediction of words after histories containing out-of-vocabulary words. The morphological features used are obtained without the use of labeled data. The perplexity improvement compared to a state-of-the-art Kneser-Ney model is 4% overall and 81% on unknown histories.

1 Introduction

One of the challenges in statistical language modeling is words that appear in the recognition task at hand but not in the training set, so-called out-of-vocabulary (OOV) words. Especially for productive languages it is often necessary to at least reduce the number of OOVs. We present a novel approach to handling OOV words in language modeling for English, based on morphological classes. Previous work on morphological classes in English has not been able to show noticeable improvements in perplexity. In this article, class-based language models as proposed by Brown et al. (1992) are used to tackle the problem. Our model improves the perplexity of a Kneser-Ney (KN) model for English by 4%, the largest improvement of a state-of-the-art model for English due to morphological modeling that we are aware of.

A class-based language model groups words into classes and replaces the word transition probability by a class transition probability and a word emission probability:

$P(w_3 \mid w_1 w_2) = P(c_3 \mid c_1 c_2) \cdot P(w_3 \mid c_3)$  (1)

Brown et al. and many other authors primarily use context information for clustering. Niesler et al. (1998) showed that context clustering works better than clusters based on part-of-speech tags. However, since the context of an OOV word is unknown and it therefore cannot be assigned to a cluster, OOV words are as much a problem for a context-based class model as for a word-based model.
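To make the decomposition in equation (1) concrete, the following is a minimal sketch of a class-based trigram model with a suffix-based fallback that assigns OOV words to morphological classes. The class labels, suffix rules, and toy probabilities are illustrative assumptions, not the paper's actual clustering or estimates.

```python
from collections import defaultdict

# Hypothetical word -> class map learned from training data.
WORD_CLASS = {"the": "DET", "dogs": "NOUN_PL", "bark": "VERB", "walked": "VERB_PAST"}

def morph_class(word):
    """Map a word to a class; OOV words fall back to crude suffix rules."""
    if word in WORD_CLASS:
        return WORD_CLASS[word]
    if word.endswith("ed"):
        return "VERB_PAST"   # e.g. unseen "strolled"
    if word.endswith("s"):
        return "NOUN_PL"     # e.g. unseen "wombats"
    return "UNK"             # catch-all class for everything else

# Toy, unsmoothed probability tables (purely illustrative values).
CLASS_TRIGRAM = defaultdict(float, {("DET", "NOUN_PL", "VERB_PAST"): 0.4})
CLASS_EMISSION = defaultdict(float, {("VERB_PAST", "walked"): 0.05})

def class_lm_prob(w1, w2, w3):
    """P(w3 | w1 w2) = P(c3 | c1 c2) * P(w3 | c3), as in equation (1)."""
    c1, c2, c3 = morph_class(w1), morph_class(w2), morph_class(w3)
    return CLASS_TRIGRAM[(c1, c2, c3)] * CLASS_EMISSION[(c3, w3)]

if __name__ == "__main__":
    # "wombats" is OOV but its suffix still maps it to NOUN_PL, so the
    # history (the, wombats) is not collapsed to an unknown token.
    print(class_lm_prob("the", "wombats", "walked"))  # 0.4 * 0.05 = 0.02
```

The point of the sketch is the fallback in morph_class: a word-based or context-clustered model has no usable representation for an unseen history word, whereas a morphological class assignment lets the class transition probability remain informative.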