TAILIEUCHUNG - Scientific report: "Finding Salient Dates for Building Thematic Timelines"

Finding Salient Dates for Building Thematic Timelines

Remy Kessler, LIMSI-CNRS, Orsay, France (kessler@)
Xavier Tannier, Univ. Paris-Sud, LIMSI-CNRS, Orsay, France (xtannier@)
Caroline Hagege, Xerox Research Center Europe, Meylan, France (hagege@)
Veronique Moriceau, Univ. Paris-Sud, LIMSI-CNRS, Orsay, France (moriceau@)
Andre Bittar, Xerox Research Center Europe, Meylan, France (bittar@)

Abstract

We present an approach for detecting salient (important) dates in texts in order to automatically build event timelines from a search query (e.g. the name of an event or person, etc.). This work was carried out on a corpus of newswire texts in English provided by the Agence France Presse (AFP). In order to extract salient dates that warrant inclusion in an event timeline, we first recognize and normalize temporal expressions in texts and then use a machine-learning approach to extract salient dates that relate to a particular topic. We focused only on extracting the dates and not the events to which they are related.

1 Introduction

Our aim here was to build thematic timelines for a general-domain topic defined by a user query. This task, which involves the extraction of important events, is related to the tasks of Retrospective Event Detection (Yang et al., 1998) or New Event Detection, as defined, for example, in the Topic Detection and Tracking (TDT) campaigns (Allan, 2002). The majority of systems designed to tackle this task make use of textual information in a bag-of-words manner. They use little temporal information, generally only using document metadata such as the document creation time (DCT). The few systems that do make use of temporal information (such as the now discontinued Google timeline) only extract absolute, full dates (that feature a day, month and year). In our corpus (described in a later section), we found that only 7% of extracted temporal expressions are absolute dates.

We distinguish our work from that of previous researchers in that we have focused ...
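The distinction between absolute dates and expressions that can only be resolved against the document creation time can be illustrated with a minimal sketch. This is not the authors' system: the function name `normalize`, the handful of patterns it covers, and the labels "absolute", "relative" and "unknown" are all assumptions chosen for illustration, and a real normalizer would handle far more expression types.

```python
import re
from datetime import date, timedelta

# Hypothetical lookup tables for this sketch only.
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Absolute full date: day, month name and four-digit year.
ABSOLUTE = re.compile(r"^(\d{1,2}) ([a-z]+) (\d{4})$")


def normalize(expression, dct):
    """Map a temporal expression to a calendar date.

    `dct` is the document creation time (a datetime.date).
    Returns (date, kind), where kind is "absolute" or "relative",
    or (None, "unknown") if the expression is not covered.
    """
    text = expression.strip().lower()

    match = ABSOLUTE.match(text)
    if match and match.group(2) in MONTHS:
        try:
            resolved = date(int(match.group(3)),
                            MONTHS[match.group(2)],
                            int(match.group(1)))
        except ValueError:
            # Impossible calendar date, e.g. 31 February.
            return None, "unknown"
        return resolved, "absolute"

    # Relative expressions are anchored to the DCT.
    if text == "today":
        return dct, "relative"
    if text == "yesterday":
        return dct - timedelta(days=1), "relative"
    if text == "tomorrow":
        return dct + timedelta(days=1), "relative"
    if text.startswith("last ") and text[5:] in WEEKDAYS:
        target = WEEKDAYS.index(text[5:])
        # Days back to the most recent such weekday strictly before the DCT.
        delta = (dct.weekday() - target) % 7 or 7
        return dct - timedelta(days=delta), "relative"

    return None, "unknown"
```

Calling `normalize("last Monday", dct)` on an article only yields a usable date because the DCT is supplied. A system that extracts only absolute full dates would discard such an expression entirely.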
