Scientific report: "Tree Representations in Probabilistic Models for Extended Named Entities Detection"

Marco Dinarelli, LIMSI-CNRS, Orsay, France, marcod@
Sophie Rosset, LIMSI-CNRS, Orsay, France, rosset@

Abstract

In this paper we deal with Named Entity Recognition (NER) on transcriptions of French broadcast data. Two aspects make the task more difficult than previous NER tasks: i) the named entities annotated in this work have a tree structure, so the task cannot be tackled as a sequence labelling task; ii) the data are noisier than those used for previous NER tasks. We approach the task in two steps, involving Conditional Random Fields and Probabilistic Context-Free Grammars, integrated in a single parsing algorithm. We analyse the effect of using several tree representations. Our system outperforms the best system of the evaluation campaign by a significant margin.

1 Introduction

Named Entity Recognition is a traditional task of the Natural Language Processing domain. The task aims at mapping words in a text into semantic classes such as persons, organizations or locations. While at first the NER task was quite simple, involving a limited number of classes (Grishman and Sundheim, 1996), over the years the task complexity increased as more complex class taxonomies were defined (Sekine and Nobata, 2004).
The interest in the task is related to its use in complex frameworks for semantic content extraction, such as Relation Extraction applications (Doddington et al., 2004). This work presents research on a Named Entity Recognition task defined with a new set of named entities. The distinguishing characteristic of this set is that named entities have a tree structure. As a consequence, the task cannot be tackled with a sequence labelling approach. Additionally, the use of noisy data such as transcriptions of French broadcast data makes the task very challenging for traditional NLP solutions. To deal with such problems we adopt a two-step approach, the first being realized with Conditional
