Improved Smoothing for N-gram Language Models Based on Ordinary Counts

Robert C. Moore and Chris Quirk
Microsoft Research
Redmond, WA 98052, USA
{bobmoore,chrisq}@microsoft.com

Abstract

Kneser-Ney (1995) smoothing and its variants are generally recognized as having the best perplexity of any known method for estimating N-gram language models. Kneser-Ney smoothing, however, requires nonstandard N-gram counts for the lower-order models used to smooth the highest-order model. For some applications, this makes Kneser-Ney smoothing inappropriate or inconvenient. In this paper, we introduce a new smoothing method based on ordinary counts that outperforms all of the previous ordinary-count methods we have tested, with the new method eliminating most of the gap between Kneser-Ney and those methods.

1 Introduction

Statistical language models are potentially useful for any language technology task that produces natural-language text as a final or intermediate output. In particular, they are extensively used in speech recognition and machine translation. Despite the criticism that they ignore the structure of natural language, simple N-gram models, which estimate the probability of each word in a text string based on the N-1 preceding words, remain the most widely used type of model.

The simplest possible N-gram model is the maximum likelihood estimate (MLE), which takes the probability of a word w_n, given the preceding context w_1 ... w_{n-1}, to be the ratio of the number of occurrences in a training corpus of the N-gram w_1 ... w_n to the total number of occurrences of any word in the same context:

$$
p(w_n \mid w_1 \ldots w_{n-1}) = \frac{C(w_1 \ldots w_n)}{\sum_{w'_n} C(w_1 \ldots w_{n-1} w'_n)}
$$

One obvious problem with this method is that it assigns a probability of zero to any N-gram that is not observed in the training corpus; hence, numerous smoothing methods have been invented that reduce the probabilities assigned to some or all observed N-grams in order to provide a non-zero probability for N-grams not observed in the training corpus. The best methods for smoothing
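To make the MLE formula above concrete, here is a minimal sketch in Python. It is not part of the paper; the toy corpus, function names, and queried trigrams are invented for illustration. It estimates a trigram probability as a ratio of ordinary counts and shows the zero probability assigned to an unseen N-gram, which is exactly the problem that smoothing addresses.

# Minimal sketch (not from the paper): maximum-likelihood N-gram estimates
# computed from ordinary counts, as in the formula above. The corpus and
# the queried trigrams are invented toy examples.
from collections import Counter

def ngram_counts(tokens, n):
    """Count every n-gram (stored as a tuple) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def mle_prob(context, word, counts):
    """MLE p(word | context): C(context + word) divided by the total count
    of all n-grams that share the same context, i.e. the sum over w' of
    C(context + w')."""
    denom = sum(c for ngram, c in counts.items() if ngram[:-1] == context)
    if denom == 0:
        return 0.0  # context never observed in the training corpus
    return counts[context + (word,)] / denom

tokens = "the cat sat on the mat the cat ate".split()
trigrams = ngram_counts(tokens, 3)

print(mle_prob(("the", "cat"), "sat", trigrams))    # 0.5: "the cat" is followed by "sat" once, "ate" once
print(mle_prob(("the", "cat"), "slept", trigrams))  # 0.0: unseen N-gram, the motivation for smoothing

In a real model the denominator would be precomputed rather than recomputed by a linear scan, but the scan keeps the correspondence with the formula explicit.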