TAILIEUCHUNG - Scientific report: "SVM Model Tampering and Anchored Learning: A Case Study in Hebrew NP Chunking"

Yoav Goldberg and Michael Elhadad
Computer Science Department, Ben Gurion University of the Negev
653 Be'er Sheva 84105, Israel
yoavg, elhadad@

Abstract

We study the issue of porting a known NLP method to a language with little existing NLP resources, specifically Hebrew SVM-based chunking. We introduce two SVM-based methods – Model Tampering and Anchored Learning. These allow fine-grained analysis of the learned SVM models, which provides guidance to identify errors in the training corpus, distinguish the role and interaction of lexical features, and eventually construct a model with ∼10% error reduction. The resulting chunker is shown to be robust in the presence of noise in the training corpus, relies on less lexical features than was previously understood, and achieves an F-measure performance of [...] on automatically PoS-tagged text. The SVM analysis methods also provide general insight on SVM-based chunking.

1 Introduction

While high-quality NLP corpora and tools are available in English, such resources are difficult to obtain in most other languages. Three challenges must be met when adapting results established in English to another language: (1) acquiring high-quality annotated data; (2) adapting the English task definition to the nature of a different language; and (3) adapting the algorithm to the new language. This paper presents a case study in the adaptation of a well-known task to a language with few NLP resources available. Specifically, we deal with SVM-based Hebrew NP chunking. In (Goldberg et al., 2006) we established that the task is not trivially transferable to Hebrew, but reported that SVM-based chunking (Kudo and Matsumoto, 2000) performs well. We extend that work and study the problem from 3 angles: (1) how to deal with a corpus that is smaller and with a higher level of noise than is available in English; we propose techniques that help identify suspicious data points in t...
