Scientific report: "Parametric Models of Linguistic Count Data"

Parametric Models of Linguistic Count Data

Martin Jansche
Department of Linguistics
The Ohio State University
Columbus, OH 43210, USA
jansche@

Abstract

It is well known that occurrence counts of words in documents are often modeled poorly by standard distributions like the binomial or Poisson. Observed counts vary more than simple models predict, prompting the use of overdispersed models like Gamma-Poisson or Beta-binomial mixtures as robust alternatives. Another deficiency of standard models arises because most words never occur in a given document, resulting in large numbers of zero counts. We propose using zero-inflated models for dealing with this, and evaluate competing models on a Naive Bayes text classification task. Simple zero-inflated models can account for practically relevant variation and can be easier to work with than overdispersed models.

1 Introduction

Linguistic count data often violate the simplistic assumptions of standard probability models like the binomial or Poisson distribution. In particular, the inadequacy of the Poisson distribution for modeling word token frequency is well known, and robust alternatives have been proposed (Mosteller and Wallace, 1984; Church and Gale, 1995).
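To make the abstract's zero-inflation idea concrete, here is a minimal Python sketch of a zero-inflated Poisson (ZIP) model. It assumes the common mixture form in which a count is a structural zero with probability `pi` and otherwise follows Poisson(`lam`). The function names, the method-of-moments fitting rule, and the example counts are illustrative assumptions, not the paper's estimation procedure.

```python
import math
from statistics import mean, pvariance


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of count k, computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a draw from Poisson(lam)."""
    p = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + p if k == 0 else p


def zip_moments_fit(counts: list[int]) -> tuple[float, float]:
    """Hypothetical method-of-moments fit of (lam, pi) for a ZIP model.

    Solves mean = (1 - pi) * lam and variance = mean * (1 + pi * lam),
    which only has a valid solution when the data are overdispersed
    (sample variance > sample mean).
    """
    m = mean(counts)
    v = pvariance(counts)
    if m <= 0 or v <= m:
        raise ValueError("ZIP moment fit needs variance > mean > 0")
    pi_lam = v / m - 1.0  # equals pi * lam
    lam = m + pi_lam      # from mean = lam - pi * lam
    return lam, pi_lam / lam


if __name__ == "__main__":
    # Hypothetical occurrence counts of one word in ten documents.
    counts = [0, 0, 0, 0, 0, 0, 0, 3, 4, 5]
    lam, pi = zip_moments_fit(counts)
    print(f"lam={lam:.3f} pi={pi:.3f}")
    print(f"P(0) under ZIP:     {zip_pmf(0, lam, pi):.3f}")
    print(f"P(0) under Poisson: {poisson_pmf(0, mean(counts)):.3f}")
    print(f"observed zero rate: {counts.count(0) / len(counts):.3f}")
```

With these hypothetical counts, the fitted ZIP assigns noticeably more probability to a zero count than a Poisson with the same mean, which is the excess-zeros effect the abstract describes. Moment matching is used here only because it has a closed form; maximum likelihood (for example via EM) would typically be preferred in practice.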
In the case of the Poisson, a commonly used robust alternative is the negative binomial distribution (Pawitan, 2001), which has the ability to capture extra-Poisson variation in the data; in other words, it is overdispersed compared with the Poisson. When a small set of parameters controls all properties of the distribution, it is important to have enough parameters to model the relevant aspects of one's data. Simple models like the Poisson or binomial do not have enough parameters for many realistic applications, and we suspect that the same might be true of log-linear models. When applying robust models like the negative binomial to linguistic count data like word occurrences in documents, it is natural to ask to what extent the extra-Poisson variation has been …
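The overdispersion of the negative binomial relative to the Poisson can be checked numerically. The Python sketch below assumes the common mean/shape parameterization (`mu`, `r`), under which the negative binomial is exactly a Gamma-Poisson mixture: each document draws its own rate from a Gamma distribution with shape `r` and scale `mu / r`, and its count is Poisson given that rate. All function names and the values `mu = 2.0`, `r = 0.5` are hypothetical choices for illustration, not taken from the paper.

```python
import math
import random
import statistics


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of count k (log space avoids overflow)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def negbin_pmf(k: int, mu: float, r: float) -> float:
    """Negative binomial with mean mu and shape r (variance mu + mu**2 / r)."""
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)


def moments(pmf, kmax: int = 2000) -> tuple[float, float]:
    """Mean and variance of a count pmf, truncating the support at kmax."""
    probs = [pmf(k) for k in range(kmax + 1)]
    m = sum(k * p for k, p in enumerate(probs))
    v = sum((k - m) ** 2 * p for k, p in enumerate(probs))
    return m, v


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for the small rates used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def gamma_poisson_sample(mu: float, r: float, rng: random.Random) -> int:
    """Draw a per-document rate from Gamma(shape=r, scale=mu/r), then a count."""
    return poisson_sample(rng.gammavariate(r, mu / r), rng)


if __name__ == "__main__":
    mu, r = 2.0, 0.5  # hypothetical mean and shape
    print("Poisson (mean, var):", moments(lambda k: poisson_pmf(k, mu)))
    print("NegBin  (mean, var):", moments(lambda k: negbin_pmf(k, mu, r)))
    rng = random.Random(0)
    draws = [gamma_poisson_sample(mu, r, rng) for _ in range(20000)]
    print("Gamma-Poisson sample (mean, var):",
          statistics.fmean(draws), statistics.pvariance(draws))
```

Both distributions have mean 2 here, but the negative binomial's variance is `mu + mu**2 / r` = 10 rather than 2, and its probability of a zero count is also higher; the simulated Gamma-Poisson draws should land close to the same mean and variance. Working in log space with `math.lgamma` keeps the probabilities finite for large `k`, while the Knuth sampler is only suitable for small rates.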
