Scientific report: "Generalized Algorithms for Constructing Statistical Language Models"

Cyril Allauzen, Mehryar Mohri, Brian Roark
AT&T Labs - Research
180 Park Avenue, Florham Park, NJ 07932, USA
allauzen mohri roark @

Abstract

Recent text and speech processing applications such as speech mining raise new and more general problems related to the construction of language models. We present and describe in detail several new and efficient algorithms to address these more general problems and report experimental results demonstrating their usefulness. We give an algorithm for computing efficiently the expected counts of any sequence in a word lattice output by a speech recognizer or any arbitrary weighted automaton; describe a new technique for creating exact representations of n-gram language models by weighted automata whose size is practical for offline use even for a vocabulary size of about 500,000 words and an n-gram order n = 6; and present a simple and more general technique for constructing class-based language models that allows each class to represent an arbitrary weighted automaton. An efficient implementation of our algorithms and techniques has been incorporated in a general software library for language modeling, the GRM Library, that includes many other text and grammar processing functionalities.

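To make the notion of an expected count concrete, the following Python sketch computes it by brute force on a toy lattice. It is only an illustration under stated assumptions: the lattice `ARCS`, the states `START` and `FINAL`, and the function names are hypothetical, the lattice is assumed acyclic with no arcs leaving the final state, and the arc weights are assumed to be probabilities that sum to 1 at each state. Enumerating every path is exponential in general and is not the efficient algorithm the abstract refers to; it only shows the quantity being computed, namely the sum over paths of the path probability times the number of occurrences of the sequence on that path.

```python
from collections import defaultdict

# Hypothetical toy lattice: each arc is (source, target, word, probability).
# Assumption: probabilities of arcs leaving a state sum to 1, so the
# probability of a complete path is the product of its arc probabilities.
ARCS = [
    (0, 1, "a", 0.6),
    (0, 1, "b", 0.4),
    (1, 2, "a", 0.5),
    (1, 2, "b", 0.5),
]
START, FINAL = 0, 2


def paths(arcs, start, final):
    """Yield (word_tuple, probability) for every start-to-final path.

    Assumes an acyclic lattice whose final state has no outgoing arcs.
    """
    outgoing = defaultdict(list)
    for src, dst, word, prob in arcs:
        outgoing[src].append((dst, word, prob))

    stack = [(start, (), 1.0)]
    while stack:
        state, words, prob = stack.pop()
        if state == final:
            yield words, prob
            continue
        for dst, word, arc_prob in outgoing[state]:
            stack.append((dst, words + (word,), prob * arc_prob))


def count_occurrences(words, target):
    """Count (possibly overlapping) occurrences of target in words."""
    n = len(target)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)


def expected_count(arcs, start, final, target):
    """Sum over all paths of P(path) * (occurrences of target on the path)."""
    target = tuple(target)
    return sum(
        prob * count_occurrences(words, target)
        for words, prob in paths(arcs, start, final)
    )
```

On this toy lattice the four paths are "a a" (0.3), "a b" (0.3), "b a" (0.2), and "b b" (0.2), so `expected_count(ARCS, START, FINAL, ["a"])` is approximately 1.1 and `expected_count(ARCS, START, FINAL, ["a", "b"])` is approximately 0.3.
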
1 Motivation

Statistical language models are crucial components of many modern natural language processing systems such as speech recognition, information extraction, machine translation, or document classification. In all cases, a language model is used in combination with other information sources to rank alternative hypotheses by assigning them some probabilities. There are classical techniques for constructing language models such as n-gram models with various smoothing techniques (see Chen and Goodman (1998) and the references therein for a survey and comparison of these techniques). In some recent text and speech processing applications, several new and more general problems ...
