Event-based Hyperspace Analogue to Language for Query Expansion

Tingxu Yan (Tianjin University, Tianjin, China) sunriser2008@gmail.com
Tamsin Maxwell (University of Edinburgh, Edinburgh, United Kingdom) t.maxwell@ed.ac.uk
Dawei Song (Robert Gordon University, Aberdeen, United Kingdom) d.song@rgu.ac.uk
Yuexian Hou (Tianjin University, Tianjin, China) yxhou@tju.edu.cn
Peng Zhang (Robert Gordon University, Aberdeen, United Kingdom) p.zhang1@rgu.ac.uk

Abstract

Bag-of-words approaches to information retrieval (IR) are effective but assume independence between words. The Hyperspace Analogue to Language (HAL) is a cognitively motivated and validated semantic space model that captures statistical dependencies between words by considering their co-occurrences in a surrounding window of text. HAL has been successfully applied to query expansion in IR, but has several limitations, including high processing cost and the use of distributional statistics that do not exploit syntax. In this paper, we pursue two methods for incorporating syntactic-semantic information from textual events into HAL. We build the HAL space directly from events to investigate whether processing costs can be reduced through a more careful definition of word co-occurrence, and we improve the quality of pseudo-relevance feedback by applying event information as a constraint during HAL construction. Both methods significantly improve performance over the original HAL, and an interpolation of HAL and relevance model expansion outperforms either method alone.
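As a minimal illustration of the HAL construction the abstract refers to (a sketch of the standard HAL scheme, not the authors' implementation), the following builds a co-occurrence space in which each word accumulates distance-weighted counts of the words preceding it within a fixed window, with closer words weighted more heavily:

```python
from collections import defaultdict

def build_hal(tokens, window=5):
    """Build a HAL-style co-occurrence matrix.

    For each token, record weighted counts of the tokens appearing
    before it within `window` positions; the weight is inversely
    proportional to distance (window - distance + 1), following the
    usual HAL convention.
    """
    hal = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i - d
            if j < 0:
                break
            # A word at distance d contributes weight (window - d + 1),
            # so adjacent words contribute the most.
            hal[word][tokens[j]] += window - d + 1
    return hal

tokens = "the quick brown fox jumps over the lazy dog".split()
space = build_hal(tokens, window=3)
# "brown" immediately precedes "fox" (distance 1), so it gets weight 3.
```

A full HAL model also records the symmetric "following" context (each word's column vector), which this sketch omits for brevity; the paper's event-based variants restrict which co-occurrences are counted rather than changing this weighting scheme.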
1 Introduction

Despite its intuitive appeal, the incorporation of linguistic and semantic word dependencies in IR has not been shown to significantly improve over a bigram language modeling approach (Song and Croft, 1999), which encodes word dependencies assumed from mere syntactic adjacency. Both the dependence language model for IR (Gao et al., 2004), which incorporates linguistic relations between non-adjacent words while limiting the generation of meaningless phrases, and the