A Hierarchical Bayesian Language Model based on Pitman-Yor Processes

Yee Whye Teh
School of Computing, National University of Singapore
3 Science Drive 2, Singapore 117543
tehyw@

Abstract

We propose a new hierarchical Bayesian n-gram model of natural languages. Our model makes use of a generalization of the commonly used Dirichlet distributions called Pitman-Yor processes, which produce power-law distributions more closely resembling those found in natural languages. We show that an approximation to the hierarchical Pitman-Yor language model recovers the exact formulation of interpolated Kneser-Ney, one of the best smoothing methods for n-gram language models. Experiments verify that our model gives cross-entropy results superior to interpolated Kneser-Ney and comparable to modified Kneser-Ney.

1 Introduction

Probabilistic language models are used extensively in a variety of linguistic applications, including speech recognition, handwriting recognition, optical character recognition, and machine translation. Most language models fall into the class of n-gram models, which approximate the distribution over sentences using the conditional distribution of each word given a context consisting of only the previous n - 1 words:

$$P(\text{sentence}) \approx \prod_{i=1}^{T} P(\text{word}_i \mid \text{word}_{i-n+1} \cdots \text{word}_{i-1})$$

with n = 3 (trigram models) being typical. Even for such a modest value of n, the number of parameters is still tremendous due to the large vocabulary size. As a result, direct maximum-likelihood parameter fitting severely overfits to the training data, and smoothing methods are indispensable for proper training of n-gram models. A large number of smoothing methods have been proposed in the literature (see Chen and Goodman 1998, Goodman 2001, and Rosenfeld 2000 for good overviews). Most methods take a rather ad hoc approach, in which n-gram probabilities for various values of n are combined using either interpolation or back-off schemes. Though some of these methods are intuitively appealing, the main justification has always been empirical: better perplexities or error rates on test data.
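Concretely, maximum-likelihood estimation of these conditional distributions reduces to ratios of n-gram counts. The following Python sketch (a toy illustration with hypothetical names, not code from the paper) shows a trigram MLE estimator and the overfitting problem described above: any event unseen in training receives probability zero.

```python
from collections import defaultdict

def mle_trigram(corpus):
    """Maximum-likelihood trigram model: P(w | u, v) = c(u, v, w) / c(u, v)."""
    ctx_counts = defaultdict(int)   # c(u, v): how often context (u, v) occurs
    tri_counts = defaultdict(int)   # c(u, v, w): how often w follows (u, v)
    for sentence in corpus:
        tokens = ["<s>", "<s>"] + sentence + ["</s>"]
        for u, v, w in zip(tokens, tokens[1:], tokens[2:]):
            ctx_counts[(u, v)] += 1
            tri_counts[(u, v, w)] += 1

    def prob(u, v, w):
        if ctx_counts[(u, v)] == 0:
            return 0.0              # unseen context: MLE assigns no mass at all
        return tri_counts[(u, v, w)] / ctx_counts[(u, v)]

    return prob

p = mle_trigram([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(p("the", "cat", "sat"))  # 1.0
print(p("the", "cat", "ran"))  # 0.0: unseen events get zero mass (overfitting)
```

With a realistic vocabulary, the vast majority of the possible trigrams never occur in training, which is why smoothing is indispensable.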
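The paper's central result concerns interpolated Kneser-Ney. As a reference point, here is a hedged sketch of bigram interpolated Kneser-Ney in the standard formulation of Chen and Goodman (1998): seen bigram counts are discounted by a fixed amount d, and the freed mass is redistributed through a "continuation" unigram distribution that counts distinct contexts rather than raw frequencies. The function names and the fixed discount d = 0.75 are illustrative choices, not taken from the paper.

```python
from collections import defaultdict

def interpolated_kn_bigram(corpus, d=0.75):
    """Bigram interpolated Kneser-Ney:
    P(w | v) = max(c(v, w) - d, 0) / c(v)  +  lambda(v) * P_cont(w)."""
    big = defaultdict(int)        # c(v, w)
    ctx_total = defaultdict(int)  # c(v): total count of context v
    followers = defaultdict(set)  # distinct words seen after v
    contexts = defaultdict(set)   # distinct contexts w has been seen in
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        for v, w in zip(tokens, tokens[1:]):
            big[(v, w)] += 1
            ctx_total[v] += 1
            followers[v].add(w)
            contexts[w].add(v)
    n_bigram_types = len(big)

    def prob(v, w):
        p_cont = len(contexts[w]) / n_bigram_types   # continuation probability
        if ctx_total[v] == 0:
            return p_cont                            # unseen context: back off fully
        discounted = max(big[(v, w)] - d, 0.0) / ctx_total[v]
        lam = d * len(followers[v]) / ctx_total[v]   # mass freed by discounting
        return discounted + lam * p_cont

    return prob
```

Summing over all words, the discounted term contributes 1 - lambda(v) and the continuation term contributes lambda(v), so the estimates form a proper distribution. The abstract's claim is that this interpolated form is recovered as an approximation to the hierarchical Pitman-Yor language model.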
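Finally, the power-law behaviour the abstract attributes to Pitman-Yor processes can be seen directly in their Chinese-restaurant-process representation: customer i joins an existing table k with probability proportional to (c_k - d) and opens a new table with probability proportional to (theta + d * t), where c_k is the table's occupancy, t the current number of tables, d the discount, and theta the concentration parameter. A minimal simulation sketch follows, with parameter values chosen purely for illustration.

```python
import random

def pitman_yor_crp(n, d=0.8, theta=1.0, seed=0):
    """Simulate table occupancies of the two-parameter Chinese restaurant
    process underlying a Pitman-Yor process with discount d and
    concentration theta."""
    rng = random.Random(seed)
    tables = []  # occupancy of each table (each table = one word type)
    for i in range(n):
        # Total weight over all choices at step i is theta + i.
        r = rng.uniform(0.0, theta + i)
        if r < theta + d * len(tables):
            tables.append(1)              # open a new table
        else:
            r -= theta + d * len(tables)
            k = 0                         # walk the tables, weight c_k - d each
            while k < len(tables) - 1 and r > tables[k] - d:
                r -= tables[k] - d
                k += 1
            tables[k] += 1
    return tables

tables = pitman_yor_crp(10_000)
print(len(tables))                        # number of types grows roughly as n^d
print(sorted(tables, reverse=True)[:5])   # a few very large tables ...
print(sum(1 for c in tables if c == 1))   # ... and very many singletons
```

The occupancy counts exhibit the heavy-tailed, Zipf-like shape referred to in the abstract: with d > 0 the number of distinct types keeps growing as a power of the number of tokens, unlike the Dirichlet case d = 0.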
