Scientific report: "Improving Language Model Size Reduction using Better Pruning Criteria"

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 176-182.

Improving Language Model Size Reduction using Better Pruning Criteria

Jianfeng Gao, Microsoft Research Asia, Beijing 100080, China, jfgao@
Min Zhang, State Key Lab of Intelligent Tech. & Sys., Computer Science & Technology Dept., Tsinghua University, China

Abstract

Reducing language model (LM) size is a critical issue when applying an LM to realistic applications which have memory constraints. In this paper, three measures are studied for the purpose of LM pruning: probability, rank, and entropy. We evaluated the performance of the three pruning criteria in a real application of Chinese text input in terms of character error rate (CER). We first present an empirical comparison, showing that rank performs the best in most cases. We also show that the high performance of rank lies in its strong correlation with error rate. We then present a novel method of combining two criteria in model pruning. Experimental results show that the combined criterion consistently leads to smaller models than the models pruned using either of the criteria separately, at the same CER.

1 Introduction

Backoff n-gram models for applications such as large-vocabulary speech recognition are typically trained on very large text corpora. An uncompressed LM is usually too large for practical use, since all realistic applications have memory constraints. Therefore, LM pruning techniques are used to produce the smallest possible model while keeping the performance loss as small as possible. Research on backoff n-gram model pruning has focused on the development of the pruning criterion, which is used to estimate the performance loss of the pruned model. The traditional count cutoff method (Jelinek, 1990) used a pruning criterion based on absolute frequency, while recent research has shown that better pruning criteria can be developed based on more sophisticated measures such as …
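To make the count-cutoff baseline concrete, below is a minimal Python sketch in the spirit of Jelinek (1990): any n-gram whose absolute frequency falls below a fixed threshold is dropped from the model, and queries for pruned n-grams are later answered by lower-order backoff estimates. The function name, the dictionary layout, and the toy counts are illustrative assumptions, not the paper's implementation.

```python
def count_cutoff_prune(counts, cutoff):
    """Keep only n-grams observed at least `cutoff` times.

    counts: dict mapping n-gram tuples, e.g. ("the", "cat"), to raw counts.
    cutoff: minimum absolute frequency an n-gram needs to survive pruning.
    Pruned n-grams are not stored; a backoff model would estimate their
    probabilities from lower-order n-grams instead.
    """
    return {ngram: c for ngram, c in counts.items() if c >= cutoff}

# Toy bigram counts (hypothetical data, for illustration only).
bigram_counts = {
    ("the", "cat"): 12,
    ("cat", "sat"): 3,
    ("sat", "on"): 1,   # below cutoff=2, so it is pruned
}
pruned = count_cutoff_prune(bigram_counts, cutoff=2)
print(pruned)  # {('the', 'cat'): 12, ('cat', 'sat'): 3}
```

The criterion here is purely frequency-based; the paper's point is that criteria estimating the pruned model's performance loss (probability, rank, entropy) yield better size/accuracy trade-offs than such raw counts.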
