Randomized Language Models via Perfect Hash Functions

David Talbot, School of Informatics, University of Edinburgh, 2 Buccleuch Place, Edinburgh, UK (work completed while this author was at Google Inc.)
Thorsten Brants, Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94303, USA, brants@

Abstract

We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters. The scheme can represent any standard n-gram model and is easily combined with existing model reduction techniques such as entropy pruning. We demonstrate the space savings of the scheme via machine translation experiments within a distributed language modeling framework.

1 Introduction

Language models (LMs) are a core component in statistical machine translation, speech recognition, optical character recognition, and many other areas. They distinguish plausible word sequences from a set of candidates. LMs are usually implemented as n-gram models, parameterized for each distinct sequence of up to n words observed in the training corpus. Using higher-order models and larger amounts of training data can significantly improve performance in applications, but the size of the resulting LM can become prohibitive. With large monolingual corpora available in major languages, making use of all the available data is now a fundamental challenge in language modeling. Efficiency is paramount in applications such as machine translation, which make huge numbers of LM requests per sentence. To scale LMs to larger corpora with higher-order dependencies, researchers have considered alternative parameterizations such as class-based models (Brown et al., 1992), model reduction techniques such as entropy-based pruning (Stolcke, 1998), novel representation schemes such as suffix arrays (Emami et al., 2007) and Golomb coding (Church et al., 2007), and distributed language models that scale more readily.
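To make the idea concrete, the sketch below illustrates in Python the general fingerprint-and-value storage that such randomized models rely on: each n-gram is hashed, only a short fingerprint plus a quantized parameter is kept, and a lookup of an unseen n-gram returns a value only if its fingerprint happens to match, which occurs with small, controllable probability. This is a simplified illustration under assumed parameters (a plain bucket table, 12-bit fingerprints, MD5 hashing), not the authors' perfect-hash construction, which additionally guarantees that stored n-grams occupy distinct cells.

```python
import hashlib

class FingerprintLM:
    """Toy sketch of a randomized n-gram store: keep only a short fingerprint
    and a quantized value per n-gram instead of the n-gram itself. The paper's
    scheme uses a perfect hash function, so stored n-grams never collide;
    here an ordinary bucket table stands in for it."""

    def __init__(self, num_buckets, fingerprint_bits=12):
        self.num_buckets = num_buckets
        self.fp_mask = (1 << fingerprint_bits) - 1
        self.table = [None] * num_buckets  # each cell: (fingerprint, value) or None

    def _hash(self, ngram):
        # Derive a 64-bit hash of the n-gram, split into a bucket index
        # and a short fingerprint.
        digest = hashlib.md5(" ".join(ngram).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        return h % self.num_buckets, (h >> 40) & self.fp_mask

    def insert(self, ngram, quantized_value):
        bucket, fp = self._hash(ngram)
        # A perfect hash would guarantee a distinct cell for every stored n-gram;
        # this simplification just overwrites on collision.
        self.table[bucket] = (fp, quantized_value)

    def lookup(self, ngram):
        bucket, fp = self._hash(ngram)
        entry = self.table[bucket]
        if entry is not None and entry[0] == fp:
            return entry[1]  # stored value (or a rare false positive for an unseen n-gram)
        return None          # treated as unseen


# Example usage with hypothetical values:
lm = FingerprintLM(num_buckets=1000)
lm.insert(("the", "quick", "fox"), quantized_value=7)
print(lm.lookup(("the", "quick", "fox")))  # 7
print(lm.lookup(("a", "slow", "dog")))     # almost certainly None
```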
