TAILIEUCHUNG - Scientific report: "Labeling Documents with Timestamps: Learning from their Time Expressions"

Labeling Documents with Timestamps: Learning from their Time Expressions

Nathanael Chambers
Department of Computer Science
United States Naval Academy
nchamber@

Abstract

Temporal reasoners for document understanding typically assume that a document's creation date is known. Algorithms to ground relative time expressions and order events often rely on this timestamp to assist the learner. Unfortunately, the timestamp is not always known, particularly on the Web. This paper addresses the task of automatic document timestamping, presenting two new models that incorporate rich linguistic features about time. The first is a discriminative classifier with new features extracted from the text's time expressions (e.g., "since 1999"). This model alone improves on previous generative models by 77%. The second model learns probabilistic constraints between time expressions and the unknown document time. Imposing these learned constraints on the discriminative model further improves its accuracy. Finally, we present a new experiment design that facilitates easier comparison by future work.

1 Introduction

This paper addresses a relatively new task in the NLP community: automatic document dating. Given a document with unknown origins, what characteristics of its text indicate the year in which the document was written? This paper proposes a learning approach that builds constraints from a document's use of time expressions and combines them with a new discriminative classifier that greatly improves on previous work.

The temporal reasoning community has long depended on document timestamps to ground relative time expressions and events (Mani and Wilson, 2000; Llido et al., 2001). For instance, consider the following passage from the TimeBank corpus (Pustejovsky et al., 2003):

And while there was no profit this year from discontinued operations, last year they contributed 34 million before tax.

Reconstructing the timeline of events from this document requires extensive temporal knowledge, most
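
To make the dependence on the timestamp concrete, the following is a minimal sketch of grounding the relative expressions in the TimeBank passage once a document year is assumed. It is not the paper's model. The function name `ground_relative_years`, the offset table, and the document year 1998 are all hypothetical choices made for illustration only.

```python
import re

# Hypothetical lookup of relative year expressions and their offsets
# from the document's creation year (illustrative, not exhaustive).
RELATIVE_YEAR_OFFSETS = {
    "this year": 0,
    "last year": -1,
    "next year": 1,
}

# One alternation pattern over all known expressions, case-insensitive.
_PATTERN = re.compile(
    "|".join(re.escape(expr) for expr in RELATIVE_YEAR_OFFSETS),
    re.IGNORECASE,
)


def ground_relative_years(text, document_year):
    """Return (expression, grounded_year) pairs in order of appearance.

    Grounding is only possible because document_year is supplied;
    without a timestamp, "last year" has no absolute value.
    """
    grounded = []
    for match in _PATTERN.finditer(text):
        expr = match.group(0).lower()
        grounded.append((expr, document_year + RELATIVE_YEAR_OFFSETS[expr]))
    return grounded


PASSAGE = (
    "And while there was no profit this year from discontinued "
    "operations last year they contributed 34 million before tax."
)

# The document year below is an assumed value, not taken from TimeBank.
for expression, year in ground_relative_years(PASSAGE, 1998):
    print(expression, "->", year)
```

Changing the assumed document year shifts every grounded value, which is why an unknown timestamp blocks this kind of reasoning and motivates predicting the timestamp from the text itself.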
