Scientific report: "Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia"

Sungchul Kim, POSTECH, Pohang, South Korea, subright@
Kristina Toutanova, Microsoft Research, Redmond, WA 98502, kristout@
Hwanjo Yu, POSTECH, Pohang, South Korea, hwanjoyu@

Abstract

In this paper we propose a method to automatically label multi-lingual data with named entity tags. We build on prior work utilizing Wikipedia metadata and show how to effectively combine the weak annotations stemming from Wikipedia metadata with information obtained through English-foreign language parallel Wikipedia sentences. The combination is achieved using a novel semi-CRF model for foreign sentence tagging in the context of a parallel English sentence. The model outperforms both standard annotation projection methods and methods based solely on Wikipedia metadata.

1 Introduction

Named Entity Recognition (NER) is a frequently needed technology in NLP applications. State-of-the-art statistical models for NER typically require a large amount of training data and linguistic expertise to be sufficiently accurate, which makes it nearly impossible to build high-accuracy models for a large number of languages.

Recently, there have been two lines of work which have offered hope for creating NER analyzers in many languages. The first has been to devise an algorithm to tag foreign language entities using metadata from the semi-structured Wikipedia repository: inter-wiki links, article categories, and cross-language links (Richman and Schone, 2008). The second has been to use parallel English-foreign language data, a high-quality NER tagger for English, and projected annotations for the foreign language (Yarowsky et al., 2001; Das and Petrov, 2011). Parallel data has also been used to improve existing monolingual taggers or other analyzers in two languages (Burkett et al., 2010a; Burkett et al., 2010b).

Footnote: This research was conducted during the author's internship at Microsoft Research.

The goal of this
