Scientific report: "Improving Language Model Size Reduction using Better Pruning Criteria"

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 176-182.

Improving Language Model Size Reduction using Better Pruning Criteria

Jianfeng Gao, Microsoft Research Asia, Beijing 100080, China, jfgao@
Min Zhang, State Key Lab of Intelligent Tech. & Sys., Computer Science & Technology Dept., Tsinghua University, China

Abstract

Reducing language model (LM) size is a critical issue when applying an LM to realistic applications which have memory constraints. In this paper, three measures are studied for the purpose of LM pruning: probability, rank, and entropy. We evaluated the performance of the three pruning criteria in a real application of Chinese text input in terms of character error rate (CER). We first present an empirical comparison, showing that rank performs the best in most cases. We also show that the high performance of rank lies in its strong correlation with error rate. We then present a novel method of combining two criteria in model pruning. Experimental results show that the combined criterion consistently leads to smaller models than the models pruned using either of the criteria separately, at the same CER.

1 Introduction

Backoff n-gram models for applications such as large-vocabulary speech recognition are typically trained on very large text corpora. An uncompressed LM is usually too large for practical use, since all realistic applications have memory constraints. Therefore, LM pruning techniques are used to produce the smallest possible model while keeping the performance loss as small as possible. Research on backoff n-gram model pruning has focused on the development of the pruning criterion, which is used to estimate the performance loss of the pruned model. The traditional count cutoff method (Jelinek, 1990) used a pruning criterion based on absolute frequency, while recent research has shown that better pruning criteria can be developed based on more sophisticated measures such as …
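To make the count-cutoff baseline concrete, below is a minimal Python sketch in the spirit of Jelinek (1990): any n-gram whose absolute frequency falls below a fixed threshold is dropped from the model, and queries for pruned n-grams are later answered by lower-order backoff estimates. The function name, the dictionary layout, and the toy counts are illustrative assumptions, not the paper's implementation.

```python
def count_cutoff_prune(counts, cutoff):
    """Keep only n-grams observed at least `cutoff` times.

    counts: dict mapping n-gram tuples, e.g. ("the", "cat"), to raw counts.
    cutoff: minimum absolute frequency an n-gram needs to survive pruning.
    Pruned n-grams are not stored; a backoff model would estimate their
    probabilities from lower-order n-grams instead.
    """
    return {ngram: c for ngram, c in counts.items() if c >= cutoff}

# Toy bigram counts (hypothetical data, for illustration only).
bigram_counts = {
    ("the", "cat"): 12,
    ("cat", "sat"): 3,
    ("sat", "on"): 1,   # below cutoff=2, so it is pruned
}
pruned = count_cutoff_prune(bigram_counts, cutoff=2)
print(pruned)  # {('the', 'cat'): 12, ('cat', 'sat'): 3}
```

The criterion here is purely frequency-based; the paper's point is that criteria estimating the pruned model's performance loss (probability, rank, entropy) yield better size/accuracy trade-offs than such raw counts.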
