
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 199-206.

Entropy Rate Constancy in Text

Dmitriy Genzel and Eugene Charniak
Brown Laboratory for Linguistic Information Processing
Department of Computer Science
Brown University, Providence, RI, USA 02912
dg ec @

Abstract

We present a constancy rate principle governing language generation. We show that this principle implies that local measures of entropy (ignoring context) should increase with the sentence number. We demonstrate that this is indeed the case by measuring entropy in three different ways. We also show that this effect has both lexical (which words are used) and non-lexical (how the words are used) causes.

1 Introduction

It is well known from information theory that the most efficient way to send information through noisy channels is at a constant rate. If humans try to communicate in the most efficient way, then they must obey this principle. The communication medium we examine in this paper is text, and we present some evidence that this principle holds here.

Entropy is a measure of information first proposed by Shannon (1948). Informally, the entropy of a random variable is proportional to the difficulty of correctly guessing the value of this variable when its distribution is known. Entropy is highest when all values are equally probable, and is lowest (equal to 0) when one of the choices has probability 1, i.e., is deterministically known in advance.
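As an informal illustration (not part of the paper itself), the behavior described above follows directly from Shannon's definition of entropy, which can be sketched in a few lines:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum_i p_i * log2(p_i), in bits.

    Terms with p == 0 are skipped, following the convention
    0 * log 0 = 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform distribution over 4 outcomes: entropy is maximal (2 bits).
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0

# Deterministic outcome (one choice has probability 1): entropy is 0.
print(entropy([1.0, 0.0, 0.0, 0.0]))  # 0.0
```

A skewed distribution such as [0.7, 0.1, 0.1, 0.1] falls strictly between these two extremes, matching the claim that entropy peaks at the uniform distribution and vanishes when the outcome is known in advance.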
In this paper we are concerned with the entropy of English as exhibited through written text, though these results could easily be extended to speech as well. The random variable we deal with is therefore a unit of text (a word, for our purposes) that a random person who has produced all the previous words in the text stream is likely to produce next. We have as many random variables as we have words in a text. The distributions of these variables are obviously different and depend on all previous
