Scientific report: "Exploiting Structure for Event Discovery Using the MDI Algorithm"

Martina Naughton
School of Computer Science and Informatics
University College Dublin, Ireland

Abstract

Effectively identifying events in unstructured text is a very difficult task. This is largely due to the fact that an individual event can be expressed by several sentences. In this paper, we investigate the use of clustering methods for the task of grouping the text spans in a news article that refer to the same event. The key idea is to cluster the sentences using a novel distance metric that exploits regularities in the sequential structure of events within a document. When this approach is compared to a simple bag-of-words baseline, a statistically significant increase in performance is observed.

1 Introduction

Accurately identifying events in unstructured text is an important goal for many applications that require natural language understanding. There has been an increased focus on this problem in recent years. The Automatic Content Extraction (ACE) program¹ is dedicated to developing methods that automatically infer meaning from language data. Tasks include the detection and characterisation of Entities, Relations and Events. Extensive research has been dedicated to entity recognition and binary relation detection, with significant results (Bikel et al., 1999). However, event extraction is still considered one of the most challenging tasks because an individual event can be expressed by several sentences (Xu et al., 2006).

¹ http speech tests ace

In this paper, we primarily focus on techniques for identifying events within a given news article. Specifically, we describe and evaluate clustering methods for the task of grouping sentences in a news article that refer to the same event. We generate sentence clusters using three variations of the well-documented Hierarchical Agglomerative Clustering (HAC) (Manning and Schütze, 1999) as a baseline for this task. We provide ...
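To make the baseline idea concrete, the following is a minimal sketch of agglomerative clustering over bag-of-words sentence vectors. It is an illustration under stated assumptions, not the paper's implementation: the excerpt does not say which three HAC variations were used, so the single, complete and average linkage options here are assumed, as are the tokenizer, the cosine distance, the function names and the example sentences. The sequential-structure distance metric mentioned in the abstract is not defined in this excerpt and is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical example sentences (not taken from the paper's data).
SENTENCES = [
    "A storm flooded the harbour in Cork on Monday",
    "The Cork harbour storm damaged several boats",
    "Voters elected a new mayor in Galway",
    "The new Galway mayor thanked voters",
]


def bag_of_words(sentence):
    """Lowercased token counts, using a deliberately simple tokenizer."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def hac(sentences, num_clusters, linkage="average"):
    """Naive agglomerative clustering; returns lists of sentence indices."""
    vectors = [bag_of_words(s) for s in sentences]
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per sentence

    def cluster_distance(c1, c2):
        pairwise = [dist[i][j] for i in c1 for j in c2]
        if linkage == "single":      # closest pair of members
            return min(pairwise)
        if linkage == "complete":    # farthest pair of members
            return max(pairwise)
        if linkage == "average":     # mean over all cross-cluster pairs
            return sum(pairwise) / len(pairwise)
        raise ValueError(f"unknown linkage: {linkage}")

    # Repeatedly merge the two closest clusters until the target count is met.
    while len(clusters) > num_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return sorted(clusters)


if __name__ == "__main__":
    for linkage in ("single", "complete", "average"):
        print(linkage, hac(SENTENCES, 2, linkage))
```

On these four sentences, all three linkage choices group sentences 0 and 1 (the storm) and sentences 2 and 3 (the election) together. A distance function that also accounts for sentence position could be swapped in for `cosine_distance` without changing the merge loop.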
