Scientific report: "A Comparative Study of Parameter Estimation Methods for Statistical Natural Language Processing"

Jianfeng Gao, Galen Andrew, Mark Johnson, Kristina Toutanova
Microsoft Research, Redmond, WA 98052 (Gao, Andrew, Toutanova)
Brown University, Providence, RI 02912 (Johnson)

Abstract

This paper presents a comparative study of five parameter estimation algorithms on four NLP tasks. Three of the five algorithms are well known in the computational linguistics community: Maximum Entropy (ME) estimation with L2 regularization, the Averaged Perceptron (AP), and Boosting. We also investigate ME estimation with L1 regularization using a novel optimization algorithm, and BLasso, a version of Boosting with Lasso (L1) regularization. We first investigate all of our estimators on two re-ranking tasks: a parse selection task and a language model (LM) adaptation task. Then we apply the best of these estimators to two additional tasks involving conditional sequence models: a Conditional Markov Model (CMM) for part-of-speech tagging and a Conditional Random Field (CRF) for Chinese word segmentation. Our experiments show that, across tasks, three of the estimators (ME estimation with L1 or L2 regularization, and AP) are in a near statistical tie for first place.

1 Introduction

Parameter estimation is fundamental to many statistical approaches to NLP.
Because of the high-dimensional nature of natural language, it is often easy to generate an extremely large number of features. The challenge of parameter estimation is to find a combination of the typically noisy, redundant features that accurately predicts the target output variable and avoids overfitting. Intuitively, this can be achieved either by selecting a small number of highly effective features and ignoring the others, or by averaging over a large number of weakly informative features. The first intuition motivates feature selection methods such as Boosting and BLasso (Collins, 2000; Zhao and Yu, 2004), which usually work best when many …
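The contrast between keeping a few strong features and spreading weight over many weak ones can be seen in miniature with two of the estimators named in the abstract, ME estimation with L1 and with L2 regularization, applied to a re-ranking problem. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: each candidate is a small dense feature list, the toy data and the function names `nll_gradient` and `train` are hypothetical, and instead of the novel optimizer mentioned in the abstract it uses plain gradient descent for L2 and a proximal (soft-thresholding) step for L1.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: (candidate feature vectors, index of the best candidate).
Example = Tuple[List[List[float]], int]


def dot(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))


def nll_gradient(w: List[float], data: List[Example]) -> List[float]:
    """Gradient of the conditional negative log-likelihood of a log-linear re-ranker."""
    grad = [0.0] * len(w)
    for cands, best in data:
        scores = [dot(w, f) for f in cands]
        m = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        for f, e in zip(cands, exps):
            p = e / z
            for j, fj in enumerate(f):
                grad[j] += p * fj  # expected feature value under the model
        for j, fj in enumerate(cands[best]):
            grad[j] -= fj  # minus the feature value of the best candidate
    return grad


def train(data: List[Example], dim: int, reg: str, alpha: float,
          lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Minimise NLL + alpha * R(w) (hypothetical helper, not the paper's optimizer).

    reg="l2": R(w) = sum of w_j^2, handled with an ordinary gradient step.
    reg="l1": R(w) = sum of |w_j|, handled with a soft-thresholding (proximal)
    step, which can set weights exactly to zero.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        g = nll_gradient(w, data)
        if reg == "l2":
            w = [wj - lr * (gj + 2 * alpha * wj) for wj, gj in zip(w, g)]
        elif reg == "l1":
            step = [wj - lr * gj for wj, gj in zip(w, g)]
            t = lr * alpha
            w = [math.copysign(max(abs(v) - t, 0.0), v) for v in step]
        else:
            raise ValueError(f"unknown regularizer: {reg}")
    return w


# Toy re-ranking data (hypothetical): feature 0 always fires on the best
# candidate; feature 1 helps in two examples and hurts in one.
data = [
    ([[1.0, 1.0], [0.0, 0.0]], 0),
    ([[1.0, 1.0], [0.0, 0.0]], 0),
    ([[1.0, 0.0], [0.0, 1.0]], 0),
]
w_l1 = train(data, dim=2, reg="l1", alpha=1.0)
w_l2 = train(data, dim=2, reg="l2", alpha=1.0)
print(w_l1)  # feature 1's weight is exactly 0.0 under L1
print(w_l2)  # both weights are non-zero under L2
```

With these settings the L1 penalty drives the weak, inconsistent feature 1 to exactly zero while keeping feature 0, whereas the L2 penalty only shrinks the weights and leaves both non-zero. Those exact zeros are what let L1-style estimators behave like feature selection.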
