Scientific report: "SVM Model Tampering and Anchored Learning: A Case Study in Hebrew NP Chunking"

Yoav Goldberg and Michael Elhadad
Computer Science Department, Ben Gurion University of the Negev
P.O.B. 653, Be'er Sheva 84105, Israel
{yoavg,elhadad}@

Abstract

We study the issue of porting a known NLP method to a language with little existing NLP resources, specifically Hebrew SVM-based chunking. We introduce two SVM-based methods: Model Tampering and Anchored Learning. These allow fine-grained analysis of the learned SVM models, which provides guidance to identify errors in the training corpus, distinguish the role and interaction of lexical features, and eventually construct a model with ∼10% error reduction. The resulting chunker is shown to be robust in the presence of noise in the training corpus, relies on fewer lexical features than was previously understood, and achieves an F-measure of [...] on automatically PoS-tagged text. The SVM analysis methods also provide general insight into SVM-based chunking.

1 Introduction

While high-quality NLP corpora and tools are available in English, such resources are difficult to obtain in most other languages. Three challenges must be met when adapting results established in English to another language: (1) acquiring high-quality annotated data; (2) adapting the English task definition to the nature of a different language; and (3) adapting the algorithm to the new language. This paper presents a case study in the adaptation of a well-known task to a language with few NLP resources available. Specifically, we deal with SVM-based Hebrew NP chunking. In (Goldberg et al., 2006) we established that the task is not trivially transferable to Hebrew, but reported that SVM-based chunking (Kudo and Matsumoto, 2000) performs well. We extend that work and study the problem from 3 angles: (1) how to deal with a corpus that is smaller and with a higher level of noise than is available in English; we propose techniques that help identify suspicious data points in t…

