Finding Salient Dates for Building Thematic Timelines

Remy Kessler, LIMSI-CNRS, Orsay, France, kessler@limsi.fr
Xavier Tannier, Univ. Paris-Sud, LIMSI-CNRS, Orsay, France, xtannier@limsi.fr
Caroline Hagege, Xerox Research Center Europe, Meylan, France, hagege@xrce.xerox.com
Veronique Moriceau, Univ. Paris-Sud, LIMSI-CNRS, Orsay, France, moriceau@limsi.fr
Andre Bittar, Xerox Research Center Europe, Meylan, France, bittar@xrce.xerox.com

Abstract

We present an approach for detecting salient (important) dates in texts in order to automatically build event timelines from a search query (e.g. the name of an event or person, etc.). This work was carried out on a corpus of newswire texts in English provided by the Agence France Presse (AFP). In order to extract salient dates that warrant inclusion in an event timeline, we first recognize and normalize temporal expressions in texts and then use a machine-learning approach to extract salient dates that relate to a particular topic. We focused only on extracting the dates and not the events to which they are related.

1 Introduction

Our aim here was to build thematic timelines for a general domain topic defined by a user query. This task, which involves the extraction of important events, is related to the tasks of Retrospective Event Detection (Yang et al., 1998) or New Event Detection, as defined, for example, in Topic Detection and Tracking (TDT) campaigns (Allan, 2002).

The majority of systems designed to tackle this task make use of textual information in a bag-of-words manner. They use little temporal information, generally only using document metadata, such as the document creation time (DCT). The few systems that do make use of temporal information (such as the now discontinued Google timeline) only extract absolute, full dates (that feature a day, month and year). In our corpus (described in Section 3.1), we found that only 7% of extracted temporal expressions are absolute dates.

We distinguish our work from that of previous researchers in that we have focused .