A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

Yee Whye Teh
School of Computing, National University of Singapore
3 Science Drive 2, Singapore 117543
tehyw@

Abstract

We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes, which produce power-law distributions more closely resembling those found in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross-entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.

1 Introduction

Probabilistic language models are used extensively in a variety of linguistic applications, including speech recognition, handwriting recognition, optical character recognition, and machine translation. Most language models fall into the class of n-gram models, which approximate the distribution over sentences using the conditional distribution of each word given a context consisting of only the previous n - 1 words:

$$P(\text{sentence}) \approx \prod_{i=1}^{T} P(\text{word}_i \mid \text{word}_{i-n+1} \cdots \text{word}_{i-1})$$

with n = 3 (trigram models) being typical. Even for such a modest value of n, the number of parameters is still tremendous due to the large vocabulary size. As a result, direct maximum-likelihood parameter fitting severely overfits to the training data, and smoothing methods are indispensable for proper training of n-gram models. A large number of smoothing methods have been proposed in the literature (see Chen and Goodman 1998, Goodman 2001, and Rosenfeld 2000 for good overviews). Most methods take a rather ad hoc approach, in which n-gram probabilities for various values of n are combined using either interpolation or back-off schemes. Though some of these methods are intuitively appealing, the main justification has always been empirical: better perplexities or error rates on test data.
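Concretely, maximum-likelihood estimation of these conditional distributions reduces to ratios of n-gram counts. The following Python sketch (a toy illustration with hypothetical names, not code from the paper) shows a trigram MLE estimator and the overfitting problem described above: any event unseen in training receives probability zero.

```python
from collections import defaultdict

def mle_trigram(corpus):
    """Maximum-likelihood trigram model: P(w | u, v) = c(u, v, w) / c(u, v)."""
    ctx_counts = defaultdict(int)   # c(u, v): how often context (u, v) occurs
    tri_counts = defaultdict(int)   # c(u, v, w): how often w follows (u, v)
    for sentence in corpus:
        tokens = ["<s>", "<s>"] + sentence + ["</s>"]
        for u, v, w in zip(tokens, tokens[1:], tokens[2:]):
            ctx_counts[(u, v)] += 1
            tri_counts[(u, v, w)] += 1

    def prob(u, v, w):
        if ctx_counts[(u, v)] == 0:
            return 0.0              # unseen context: MLE assigns no mass at all
        return tri_counts[(u, v, w)] / ctx_counts[(u, v)]

    return prob

p = mle_trigram([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(p("the", "cat", "sat"))  # 1.0
print(p("the", "cat", "ran"))  # 0.0: unseen events get zero mass (overfitting)
```

With a realistic vocabulary, the vast majority of the possible trigrams never occur in training, which is why smoothing is indispensable.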
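The paper's central result concerns interpolated Kneser-Ney. As a reference point, here is a hedged sketch of bigram interpolated Kneser-Ney in the standard formulation of Chen and Goodman (1998): seen bigram counts are discounted by a fixed amount d, and the freed mass is redistributed through a "continuation" unigram distribution that counts distinct contexts rather than raw frequencies. The function names and the fixed discount d = 0.75 are illustrative choices, not taken from the paper.

```python
from collections import defaultdict

def interpolated_kn_bigram(corpus, d=0.75):
    """Bigram interpolated Kneser-Ney:
    P(w | v) = max(c(v, w) - d, 0) / c(v)  +  lambda(v) * P_cont(w)."""
    big = defaultdict(int)        # c(v, w)
    ctx_total = defaultdict(int)  # c(v): total count of context v
    followers = defaultdict(set)  # distinct words seen after v
    contexts = defaultdict(set)   # distinct contexts w has been seen in
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        for v, w in zip(tokens, tokens[1:]):
            big[(v, w)] += 1
            ctx_total[v] += 1
            followers[v].add(w)
            contexts[w].add(v)
    n_bigram_types = len(big)

    def prob(v, w):
        p_cont = len(contexts[w]) / n_bigram_types   # continuation probability
        if ctx_total[v] == 0:
            return p_cont                            # unseen context: back off fully
        discounted = max(big[(v, w)] - d, 0.0) / ctx_total[v]
        lam = d * len(followers[v]) / ctx_total[v]   # mass freed by discounting
        return discounted + lam * p_cont

    return prob
```

Summing over all words, the discounted term contributes 1 - lambda(v) and the continuation term contributes lambda(v), so the estimates form a proper distribution. The abstract's claim is that this interpolated form is recovered as an approximation to the hierarchical Pitman-Yor language model.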
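Finally, the power-law behaviour the abstract attributes to Pitman-Yor processes can be seen directly in their Chinese-restaurant-process representation: customer i joins an existing table k with probability proportional to (c_k - d) and opens a new table with probability proportional to (theta + d * t), where c_k is the table's occupancy, t the current number of tables, d the discount, and theta the concentration parameter. A minimal simulation sketch follows, with parameter values chosen purely for illustration.

```python
import random

def pitman_yor_crp(n, d=0.8, theta=1.0, seed=0):
    """Simulate table occupancies of the two-parameter Chinese restaurant
    process underlying a Pitman-Yor process with discount d and
    concentration theta."""
    rng = random.Random(seed)
    tables = []  # occupancy of each table (each table = one word type)
    for i in range(n):
        # Total weight over all choices at step i is theta + i.
        r = rng.uniform(0.0, theta + i)
        if r < theta + d * len(tables):
            tables.append(1)              # open a new table
        else:
            r -= theta + d * len(tables)
            k = 0                         # walk the tables, weight c_k - d each
            while k < len(tables) - 1 and r > tables[k] - d:
                r -= tables[k] - d
                k += 1
            tables[k] += 1
    return tables

tables = pitman_yor_crp(10_000)
print(len(tables))                        # number of types grows roughly as n^d
print(sorted(tables, reverse=True)[:5])   # a few very large tables ...
print(sum(1 for c in tables if c == 1))   # ... and very many singletons
```

The occupancy counts exhibit the heavy-tailed, Zipf-like shape referred to in the abstract: with d > 0 the number of distinct types keeps growing as a power of the number of tokens, unlike the Dirichlet case d = 0.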
