Scientific report: "Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-of-Speech Tagging"

Ashish Vaswani (1), Adam Pauls (2), David Chiang (1)

(1) Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292. avaswani, chiang @
(2) Computer Science Division, University of California at Berkeley, Soda Hall, Berkeley, CA 94720. adpauls@

Abstract

The Minimum Description Length (MDL) principle is a method for model selection that trades off between the explanation of the data by the model and the complexity of the model itself. Inspired by the MDL principle, we develop an objective function for generative models that captures the description of the data by the model (log-likelihood) and the description of the model (model size). We also develop an efficient general search algorithm based on the MAP-EM framework to optimize this function. Since recent work has shown that minimizing the model size in a Hidden Markov Model for part-of-speech (POS) tagging leads to higher accuracies, we test our approach by applying it to this problem. The search algorithm involves a simple change to EM and achieves high POS tagging accuracies on both English and Italian data sets.

1 Introduction

The Minimum Description Length (MDL) principle is a method for model selection that provides a generic solution to the overfitting problem (Barron et al., 1998). A formalization of Ockham's Razor, it says that the parameters are to be chosen that minimize the description length of the data given the model plus the description length of the model itself.

It has been successfully shown that minimizing the model size in a Hidden Markov Model (HMM) for part-of-speech (POS) tagging leads to higher accuracies than simply running the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). Goldwater and Griffiths (2007) employ a Bayesian approach to POS tagging and use sparse Dirichlet priors to minimize model size. More recently, Ravi and Knight (2009) ...
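As a rough illustration, the trade-off stated in the introduction (description length of the data given the model, plus description length of the model itself) can be written as a two-part minimization. The symbols X (the observed data), \theta (the model parameters), and L(\theta) (some measure of model size) are notation assumed here for illustration only; this excerpt does not give the paper's exact definition of the model-size term.

```latex
\hat{\theta} \;=\; \arg\min_{\theta}\;
  \Big[\,
    \underbrace{-\log P(X \mid \theta)}_{\text{description length of the data given the model}}
    \;+\;
    \underbrace{L(\theta)}_{\text{description length of the model}}
  \,\Big]
```

Under this reading, plain maximum-likelihood EM corresponds to dropping the L(\theta) term, which is why it has no pressure toward smaller models.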

