Randomized Language Models via Perfect Hash Functions

David Talbot, School of Informatics, University of Edinburgh, 2 Buccleuch Place, Edinburgh, UK (work completed while this author was at Google Inc.)
Thorsten Brants, Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94303, USA, brants@

Abstract

We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters. The scheme can represent any standard n-gram model and is easily combined with existing model reduction techniques such as entropy pruning. We demonstrate the space savings of the scheme via machine translation experiments within a distributed language modeling framework.

1 Introduction

Language models (LMs) are a core component in statistical machine translation, speech recognition, optical character recognition, and many other areas. They distinguish plausible word sequences from a set of candidates. LMs are usually implemented as n-gram models, parameterized for each distinct sequence of up to n words observed in the training corpus. Using higher-order models and larger amounts of training data can significantly improve performance in applications, but the size of the resulting LM can become prohibitive. With large monolingual corpora available in major languages, making use of all the available data is now a fundamental challenge in language modeling. Efficiency is paramount in applications such as machine translation, which make huge numbers of LM requests per sentence. To scale LMs to larger corpora with higher-order dependencies, researchers have considered alternative parameterizations such as class-based models (Brown et al., 1992), model reduction techniques such as entropy-based pruning (Stolcke, 1998), novel representation schemes such as suffix arrays (Emami et al., 2007) and Golomb coding (Church et al., 2007), and distributed language models that scale more readily.
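To make the idea concrete, the sketch below illustrates in Python the general fingerprint-and-value storage that such randomized models rely on: each n-gram is hashed, only a short fingerprint plus a quantized parameter is kept, and a lookup of an unseen n-gram returns a value only if its fingerprint happens to match, which occurs with small, controllable probability. This is a simplified illustration under assumed parameters (a plain bucket table, 12-bit fingerprints, MD5 hashing), not the authors' perfect-hash construction, which additionally guarantees that stored n-grams occupy distinct cells.

```python
import hashlib

class FingerprintLM:
    """Toy sketch of a randomized n-gram store: keep only a short fingerprint
    and a quantized value per n-gram instead of the n-gram itself. The paper's
    scheme uses a perfect hash function, so stored n-grams never collide;
    here an ordinary bucket table stands in for it."""

    def __init__(self, num_buckets, fingerprint_bits=12):
        self.num_buckets = num_buckets
        self.fp_mask = (1 << fingerprint_bits) - 1
        self.table = [None] * num_buckets  # each cell: (fingerprint, value) or None

    def _hash(self, ngram):
        # Derive a 64-bit hash of the n-gram, split into a bucket index
        # and a short fingerprint.
        digest = hashlib.md5(" ".join(ngram).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        return h % self.num_buckets, (h >> 40) & self.fp_mask

    def insert(self, ngram, quantized_value):
        bucket, fp = self._hash(ngram)
        # A perfect hash would guarantee a distinct cell for every stored n-gram;
        # this simplification just overwrites on collision.
        self.table[bucket] = (fp, quantized_value)

    def lookup(self, ngram):
        bucket, fp = self._hash(ngram)
        entry = self.table[bucket]
        if entry is not None and entry[0] == fp:
            return entry[1]  # stored value (or a rare false positive for an unseen n-gram)
        return None          # treated as unseen


# Example usage with hypothetical values:
lm = FingerprintLM(num_buckets=1000)
lm.insert(("the", "quick", "fox"), quantized_value=7)
print(lm.lookup(("the", "quick", "fox")))  # 7
print(lm.lookup(("a", "slow", "dog")))     # almost certainly None
```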
