
An Empirical Investigation of Discounting in Cross-Domain Language Models

Greg Durrett and Dan Klein
Computer Science Division
University of California, Berkeley

Abstract

We investigate the empirical behavior of n-gram discounts within and across domains. When a language model is trained and evaluated on two corpora from exactly the same domain, discounts are roughly constant, matching the assumptions of modified Kneser-Ney LMs. However, when training and test corpora diverge, the empirical discount grows essentially as a linear function of the n-gram count. We adapt a Kneser-Ney language model to incorporate such growing discounts, resulting in perplexity improvements over modified Kneser-Ney and Jelinek-Mercer baselines.

1 Introduction

Discounting, or subtracting from the count of each n-gram, is one of the core aspects of Kneser-Ney language modeling (Kneser and Ney, 1995). For all but the smallest n-gram counts, Kneser-Ney uses a single discount, one that does not grow with the n-gram count, because such constant discounting was seen in early experiments on held-out data (Church and Gale, 1991). However, due to increasing computational power and corpus sizes, language modeling today presents a different set of challenges than it did 20 years ago. In particular, modeling cross-domain effects has become increasingly important (Klakow, 2000; Moore and Lewis, 2010), and deployed systems must frequently process data that is out-of-domain from the standpoint of the language model.

In this work, we perform experiments on held-out data to evaluate how discounting behaves in the cross-domain setting. We find that when training and testing on corpora that are as similar as possible, empirical discounts indeed do not grow with n-gram count, which validates the parametric assumption of Kneser-Ney smoothing. However, when the train and evaluation corpora differ even slightly, discounts generally exhibit linear growth in the count of the n-gram, with the amount of ...
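For reference (not reproduced from the paper itself), the single-discount scheme discussed above corresponds to the standard interpolated Kneser-Ney estimator, in which a fixed absolute discount D is subtracted from every observed n-gram count before interpolating with the lower-order model:

\[
P_{\mathrm{KN}}(w \mid h) \;=\; \frac{\max\bigl(c(hw) - D,\; 0\bigr)}{c(h)} \;+\; \lambda(h)\, P_{\mathrm{KN}}(w \mid h'),
\qquad
\lambda(h) \;=\; \frac{D \cdot N_{1+}(h\,\cdot)}{c(h)}
\]

Here h' is the history h with its earliest word dropped, and N_{1+}(h ·) is the number of distinct word types observed after h. Modified Kneser-Ney replaces the single D with three discounts D_1, D_2, D_{3+} selected by the n-gram's count, but all of them remain fixed as counts grow, which is exactly the assumption the paper examines.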

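The paper's measurements rest on comparing training counts with held-out counts. As a minimal illustrative sketch in the spirit of Church and Gale (1991), and assuming same-sized training and held-out corpora and hypothetical function names (this is not the authors' code), the empirical discount d(k) for n-grams seen k times in training can be estimated as k minus their average held-out count:

    from collections import Counter, defaultdict

    def empirical_discounts(train_tokens, heldout_tokens, n=3, max_count=10):
        """Return {k: d(k)}, where d(k) is k minus the average held-out count of
        n-grams that occurred exactly k times in the training tokens."""
        def ngram_counts(tokens):
            return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

        train_counts = ngram_counts(train_tokens)
        heldout_counts = ngram_counts(heldout_tokens)

        heldout_sum = defaultdict(float)  # total held-out count, keyed by training count k
        num_types = defaultdict(int)      # number of n-gram types with training count k
        for gram, k in train_counts.items():
            if k <= max_count:
                heldout_sum[k] += heldout_counts.get(gram, 0)
                num_types[k] += 1

        # Empirical discount: training count minus average held-out count.
        return {k: k - heldout_sum[k] / num_types[k] for k in sorted(num_types)}

A roughly flat d(k) across k supports the fixed-discount assumption of (modified) Kneser-Ney, while d(k) growing linearly with k is the cross-domain pattern the paper reports.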