Model-Portability Experiments for Textual Temporal Analysis

Oleksandr Kolomiyets, Steven Bethard and Marie-Francine Moens
Department of Computer Science
Katholieke Universiteit Leuven
Celestijnenlaan 200A, Heverlee 3001, Belgium
{oleksandr.kolomiyets, steven.bethard, sien.moens}@cs.kuleuven.be

Abstract

We explore a semi-supervised approach for improving the portability of time expression recognition to non-newswire domains: we generate additional training examples by substituting temporal expression words with potential synonyms. We explore using synonyms both from WordNet and from the Latent Words Language Model (LWLM), which predicts synonyms in context using an unsupervised approach. We evaluate a state-of-the-art time expression recognition system trained both with and without the additional training examples, using data from TempEval 2010, Reuters, and Wikipedia. We find that the LWLM provides substantial improvements on the Reuters corpus and smaller improvements on the Wikipedia corpus. We find that WordNet alone never improves performance, though intersecting the examples from the LWLM and WordNet provides more stable results for Wikipedia.

1 Introduction

The recognition of time expressions such as April 2011, mid-September, and early next week is a crucial first step for applications like question answering that must be able to handle temporally anchored queries.
This need has inspired a variety of shared tasks for identifying time expressions, including the Message Understanding Conference named entity task (Grishman and Sundheim, 1996), the Automatic Content Extraction time normalization task (http://fofoca.mitre.org/tern.html), and the TempEval 2010 time expression task (Verhagen et al., 2010). Many researchers competed in these tasks, applying both rule-based and machine-learning approaches (Mani and Wilson, 2000; Negri and Marseglia, 2004; Hacioglu et al., 2005; Ahn et al., 2007; Poveda et al., 2007; Strötgen and Gertz, 2010; Llorens et al., 2010) and achieving F1 measures as high as 0.86.
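The augmentation idea summarized in the abstract (substituting words inside time expressions with candidate synonyms to create new labeled training examples, optionally keeping only the candidates that both WordNet and the LWLM propose) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The training data is assumed to be token/BIO-label pairs, the two synonym sources are hypothetical toy dictionaries rather than real WordNet or LWLM output, and a static dictionary ignores the context sensitivity that distinguishes the LWLM. The names `augment`, `intersect`, `wordnet_like` and `lwlm_like` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data format: one sentence as (token, BIO label) pairs,
# where B-TIMEX/I-TIMEX mark words inside a time expression.
Example = List[Tuple[str, str]]
SynonymMap = Dict[str, Set[str]]


def augment(example: Example, synonyms: SynonymMap) -> List[Example]:
    """Return new examples, each with one time-expression token swapped for a synonym.

    Labels are copied unchanged, so every new example keeps the original
    expression boundaries (a one-token-for-one-token assumption).
    """
    new_examples: List[Example] = []
    for i, (token, label) in enumerate(example):
        if label == "O":
            continue  # only words inside a time expression are substituted
        for candidate in sorted(synonyms.get(token.lower(), set())):
            if candidate == token.lower():
                continue  # skip no-op substitutions
            variant = list(example)
            variant[i] = (candidate, label)
            new_examples.append(variant)
    return new_examples


def intersect(a: SynonymMap, b: SynonymMap) -> SynonymMap:
    """Keep only the synonyms that both sources propose for the same word."""
    shared = {word: a[word] & b[word] for word in a.keys() & b.keys()}
    return {word: cands for word, cands in shared.items() if cands}


# Toy stand-ins for the two synonym sources (NOT real WordNet/LWLM output).
wordnet_like: SynonymMap = {"week": {"workweek", "hebdomad"}, "early": {"premature"}}
lwlm_like: SynonymMap = {"week": {"month", "workweek"}, "early": {"late"}}

sentence: Example = [
    ("the", "O"), ("report", "O"), ("arrives", "O"),
    ("early", "B-TIMEX"), ("next", "I-TIMEX"), ("week", "I-TIMEX"),
]

both = intersect(wordnet_like, lwlm_like)
print(both)  # {'week': {'workweek'}}
for variant in augment(sentence, lwlm_like):
    print(" ".join(token for token, _ in variant))
```

In this toy setup the intersection is much smaller than either source alone (only "workweek" survives), which illustrates why intersecting the two sources trades coverage for more conservative substitutions. Swapping exactly one token at a time keeps the label sequence valid without recomputing expression boundaries.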