STOCHASTIC MODELING OF LANGUAGE VIA SENTENCE SPACE PARTITIONING

Alex Martelli
IBM Rome Scientific Center
via Giorgione 159, ROME, Italy

ABSTRACT

In some computer applications of linguistics (such as maximum-likelihood decoding of speech or handwriting), the purpose of the language-handling component (Language Model) is to estimate the linguistic (a priori) probability of arbitrary natural-language sentences. This paper discusses theoretical and practical issues regarding an approach to building such a language model based on any equivalence criterion defined on incomplete sentences. It also presents experimental results and measurements performed on such a model of the Italian language, which is part of the prototype for the recognition of spoken Italian built at the IBM Rome Scientific Center.

STOCHASTIC MODELS OF LANGUAGE

In some computer applications it is necessary to have a way to estimate the probability of any arbitrary natural-language sentence. A prominent example is maximum-likelihood speech recognition, as discussed in [1, 4, 7], whose underlying mathematical approach can be generalized to the recognition of natural language encoded in any medium (e.g. handwriting). The subsystem that estimates this probability can be called a stochastic model of the target language.
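To make clear why a prior probability over sentences is needed at all, the maximum-likelihood decoding mentioned above is commonly written as the following decision rule. This is standard background from the recognition literature, not a formula quoted from this paper; the symbol $A$ for the observed evidence (acoustic signal or handwriting) is introduced here only for illustration.

$$\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} \frac{P(A \mid W)\,P(W)}{P(A)} = \arg\max_{W} P(A \mid W)\,P(W)$$

Since $P(A)$ does not depend on the candidate sentence $W$, it can be dropped from the maximization; the remaining factor $P(W)$ is exactly the quantity that the stochastic model of the language must supply.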
If the sentence is to be recognized while it is being produced, as is necessary for a real-time application, the computation of its probability should proceed left-to-right, i.e. word by word from the beginning towards the end of the sentence, allowing the application of fast tree-search algorithms such as stack decoding [5]. Left-to-right computation of the probability of any word string is made possible by a formal manipulation based on the definition of conditional probability: if $w_i$ is the $i$-th word in the sequence $W$ of length $N$, then

$$P(W) = \prod_{i=1}^{N} P(w_i \mid w_1, \ldots, w_{i-1}) \qquad (1)$$

In other terms, the probability of a sequence of words is the product of the conditional probability of each word given all of the previous ones. As a formal step, this holds for .
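The chain-rule factorization above can be sketched as a left-to-right loop that extends the incomplete sentence one word at a time and accumulates log-probabilities (summing logs rather than multiplying probabilities avoids underflow on long sentences). This is a minimal sketch under stated assumptions, not the paper's implementation: `sentence_log_prob`, `last_word_class`, `toy_cond_prob` and the probability table are all hypothetical, and the "same last word" equivalence criterion is only one illustrative way of partitioning incomplete sentences.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

History = Tuple[str, ...]  # an incomplete sentence w_1 ... w_{i-1}


def sentence_log_prob(
    words: Sequence[str],
    cond_prob: Callable[[str, History], float],
) -> float:
    """Return log P(W) = sum_i log P(w_i | w_1, ..., w_{i-1}), computed left to right."""
    log_p = 0.0
    history: History = ()
    for w in words:
        p = cond_prob(w, history)
        if p <= 0.0:
            return float("-inf")  # the sentence is impossible under this model
        log_p += math.log(p)
        history = history + (w,)  # extend the incomplete sentence by one word
    return log_p


# Hypothetical equivalence criterion: two incomplete sentences fall in the
# same class if they end with the same word (a bigram-style partition).
def last_word_class(history: History) -> str:
    return history[-1] if history else "<s>"


# Toy conditional probabilities P(word | class), invented for illustration only.
TOY_TABLE: Dict[str, Dict[str, float]] = {
    "<s>": {"il": 0.6, "la": 0.4},
    "il": {"gatto": 0.5, "cane": 0.5},
    "gatto": {"dorme": 1.0},
}


def toy_cond_prob(word: str, history: History) -> float:
    # Map the full history to its equivalence class, then look up the word.
    return TOY_TABLE.get(last_word_class(history), {}).get(word, 0.0)


lp = sentence_log_prob(["il", "gatto", "dorme"], toy_cond_prob)
print(math.exp(lp))  # approximately 0.3 (= 0.6 * 0.5 * 1.0)
```

Because `cond_prob` sees the whole history, any equivalence criterion can be plugged in by mapping the history to its class before the table lookup; the chain rule itself is exact, and approximation enters only through that mapping.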