Scientific paper: "A Pylonic Decision-Tree Language Model with Optimal Question Selection"

A Pylonic Decision-Tree Language Model with Optimal Question Selection

Adrian Corduneanu
University of Toronto
73 Saint George St 299, Toronto, Ontario M5S 2E5, Canada
g7adrian@

Abstract

This paper discusses a decision-tree approach to the problem of assigning probabilities to words following a given text. In contrast with previous decision-tree language model attempts, an algorithm for selecting nearly optimal questions is considered. The model is to be tested on a standard task, the Wall Street Journal, allowing a fair comparison with the well-known trigram model.

1 Introduction

In many applications, such as automatic speech recognition, machine translation, and spelling correction, a statistical language model (LM) is needed to assign probabilities to sentences. This probability assignment may be used, e.g., to choose one of many transcriptions hypothesized by the recognizer or to make decisions about capitalization. Without any loss of generality, we consider models that operate left-to-right on the sentences, assigning a probability to the next word given its word history. Specifically, we consider statistical LMs which compute probabilities of the type $P(w_n \mid w_1, w_2, \ldots, w_{n-1})$, where $w_i$ denotes the $i$-th word in the text.

Even for a small vocabulary, the space of word histories is so large that any attempt to estimate the conditional probabilities for each distinct history from raw frequencies is infeasible. To make the problem manageable, one partitions the word histories into classes $C(w_1, w_2, \ldots, w_{n-1})$ and identifies the word probabilities with $P(w_n \mid C(w_1, w_2, \ldots, w_{n-1}))$. Such probabilities are easier to estimate, as each class gets significantly more counts from a training corpus. With this setup, building a language model becomes a classification problem: group the word histories into a small number of classes while preserving their predictive power.

Currently popular N-gram models classify the word histories by their last N-1 words. N varies from 2 to 4, and the …
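To make the history-classing formulation above concrete, here is a minimal Python sketch, not taken from the paper, of a class-based LM estimated by relative frequency. The names `ClassBasedLM` and `classer` are illustrative assumptions, and smoothing, which any practical model needs, is omitted. Choosing a classer that keeps only the last two words of the history recovers the trigram model used as the baseline for comparison.

```python
from collections import defaultdict

class ClassBasedLM:
    """Sketch of a language model over classed word histories.

    A classing function maps the full history w_1 ... w_{n-1} to an
    equivalence class; P(w_n | C(w_1 ... w_{n-1})) is then estimated
    per class from corpus counts. Classing by the last N-1 words
    recovers the familiar N-gram model.
    """

    def __init__(self, classer):
        self.classer = classer                    # history tuple -> class label
        self.class_word_counts = defaultdict(lambda: defaultdict(int))
        self.class_counts = defaultdict(int)

    def train(self, sentences):
        for sentence in sentences:
            # Pad so every predicted word has at least two words of history.
            words = ["<s>", "<s>"] + sentence + ["</s>"]
            for i in range(2, len(words)):
                c = self.classer(tuple(words[:i]))
                self.class_word_counts[c][words[i]] += 1
                self.class_counts[c] += 1

    def prob(self, word, history):
        # Unsmoothed relative-frequency estimate of P(w_n | C(history)).
        c = self.classer(tuple(history))
        if self.class_counts[c] == 0:
            return 0.0
        return self.class_word_counts[c][word] / self.class_counts[c]

# Trigram classing: the class of a history is just its last two words.
trigram = ClassBasedLM(classer=lambda h: h[-2:])
trigram.train([["the", "cat", "sat"], ["the", "cat", "slept"]])
print(trigram.prob("sat", ["<s>", "the", "cat"]))   # 0.5
```

A decision-tree LM in the paper's sense replaces the fixed `lambda h: h[-2:]` classer with a tree that partitions histories by asking learned questions about them; the estimation machinery above stays the same.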
