Scientific report: "Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-of-Speech Tagging"

Ashish Vaswani (1), Adam Pauls (2), David Chiang (1)

(1) Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292. avaswani chiang @
(2) Computer Science Division, University of California at Berkeley, Soda Hall, Berkeley, CA 94720. adpauls@

Abstract

The Minimum Description Length (MDL) principle is a method for model selection that trades off between the explanation of the data by the model and the complexity of the model itself. Inspired by the MDL principle, we develop an objective function for generative models that captures the description of the data by the model (log-likelihood) and the description of the model (model size). We also develop an efficient general search algorithm based on the MAP-EM framework to optimize this function. Since recent work has shown that minimizing the model size in a Hidden Markov Model for part-of-speech (POS) tagging leads to higher accuracies, we test our approach by applying it to this problem. The search algorithm involves a simple change to EM and achieves high POS tagging accuracies on both English and Italian data sets.

1 Introduction

The Minimum Description Length (MDL) principle is a method for model selection that provides a generic solution to the overfitting problem (Barron et al., 1998). A formalization of Ockham's Razor, it says that the parameters should be chosen to minimize the description length of the data given the model plus the description length of the model itself.

It has been shown that minimizing the model size in a Hidden Markov Model (HMM) for part-of-speech (POS) tagging leads to higher accuracies than simply running the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). Goldwater and Griffiths (2007) employ a Bayesian approach to POS tagging and use sparse Dirichlet priors to minimize model size. More recently, Ravi and Knight (2009) ...
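
The trade-off that the MDL principle describes, data fit against model size, can be illustrated with a small sketch. This is not the paper's objective function, which this excerpt does not give. It assumes that model size is simply the number of nonzero parameters, and the function name `mdl_style_objective`, the weight `alpha`, and the threshold `eps` are all hypothetical names introduced for illustration.

```python
import math


def mdl_style_objective(log_likelihood, params, alpha=1.0, eps=1e-12):
    """Return a description-length-style score; lower is better.

    log_likelihood: log P(data | params), a non-positive float.
    params: iterable of model probabilities.
    alpha: hypothetical weight on model size (an assumption, not from the paper).
    eps: threshold below which a parameter is treated as zero (also an assumption).
    """
    # Model size is approximated here as the count of nonzero parameters.
    model_size = sum(1 for p in params if p > eps)
    # Cost of describing the data plus cost of describing the model.
    return -log_likelihood + alpha * model_size


# Two hypothetical models that fit the data equally well.
dense = [0.25, 0.25, 0.25, 0.25]
sparse = [0.5, 0.5, 0.0, 0.0]
ll = 4 * math.log(0.5)

dense_score = mdl_style_objective(ll, dense)
sparse_score = mdl_style_objective(ll, sparse)
```

With equal likelihoods, the sparser model receives the lower score, which is the preference for small models that the introduction attributes to MDL.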
