Scientific report: "Named Entity Recognition for Catalan Using Spanish Resources"

This work studies Named Entity Recognition (NER) for Catalan without making use of annotated resources of this language. The approach presented is based on machine learning techniques and exploits Spanish resources, either by first training models for Spanish and then translating them into Catalan, or by directly training bilingual models. The resulting models are retrained on unlabelled Catalan data using bootstrapping techniques. Exhaustive experimentation has been conducted on real data, showing competitive results for the obtained NER systems.

Xavier Carreras, Lluís Màrquez, and Lluís Padró
TALP Research Center, LSI Department, Universitat Politècnica de Catalunya
Jordi Girona 1-3, E-08034 Barcelona
{carreras, lluism, padro}@

1 Introduction

A Named Entity (NE) is a lexical unit consisting of a sequence of contiguous words which refers to a concrete entity, such as a person, a location, an organization, or an artifact. Figure 1 contains an example sentence, extracted from the Spanish corpus referred to in Section 2 and translated into Catalan, including several entities. There is wide consensus that Named Entity Recognition and Classification (NERC) are Natural Language Processing tasks which may improve the performance of many applications, such as Information Extraction, Machine Translation, Question Answering, Topic Detection and Tracking, etc. Thus, interest in detecting and classifying those units in a text has kept growing in recent years.

Named Entity processing consists of two steps, which are usually approached sequentially. First, NEs are detected in the text and their boundaries delimited (Named Entity Recognition, NER). Second, entities are classified into a predefined set of classes, which usually contains labels such as person, organization, location, etc. (Named Entity Classification, NEC). In this paper we will focus on the first of these stages, that is, Named Entity boundary detection.
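To make the boundary-detection step concrete, the following minimal sketch shows one common way of representing NE boundaries: each token is tagged B (begins an entity), I (inside an entity), or O (outside any entity), and the tags are then decoded into token spans. The B/I/O encoding is an assumption made here for illustration and is not stated in the text above. The function name `bio_to_spans`, the example sentence, and its tags are likewise hypothetical.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Decode B/I/O boundary tags into (start, end) token spans, end exclusive.

    Assumption: boundaries are encoded with B/I/O tags (illustrative only).
    """
    spans: List[Tuple[int, int]] = []
    start = None  # index where the currently open entity began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new entity starts; close the previous one if it is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Tolerate an I with no preceding B by opening an entity here.
            if start is None:
                start = i
        else:
            # Any other tag (normally "O") closes an open entity.
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical Catalan sentence with hand-assigned boundary tags.
tokens = ["La", "Universitat", "Politècnica", "de", "Catalunya",
          "és", "a", "Barcelona", "."]
tags = ["O", "B", "I", "I", "I", "O", "O", "B", "O"]

entities = [" ".join(tokens[s:e]) for s, e in bio_to_spans(tags)]
print(entities)
```

In this representation, recognition only delimits the spans. Assigning a class such as person, organization, or location to each span would be the separate classification (NEC) step.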
