Summarization-based Query Expansion in Information Retrieval

Tomek Strzalkowski, Jin Wang, and Bowden Wise
GE Corporate Research and Development
1 Research Circle, Niskayuna, NY 12309
strzalkowski@crd.ge.com

Abstract

We discuss a semi-interactive approach to information retrieval which consists of two tasks performed in sequence. First, the system assists the searcher in building a comprehensive statement of information need, using automatically generated topical summaries of sample documents. Second, the detailed statement of information need is automatically processed by a series of natural language processing routines in order to derive an optimal search query for a statistical information retrieval system. In this paper, we investigate the role of automated document summarization in building effective search statements. We also discuss the results of the latest evaluation of our system at the annual Text Retrieval Conference (TREC).

Information Retrieval

Information retrieval (IR) is the task of selecting documents from a database in response to a user's query and ranking them according to relevance. This has usually been accomplished using statistical methods (often coupled with manual encoding) that (a) select terms (words, phrases, and other units) from documents that are deemed to best represent their content, and (b) create an inverted index file (or files) that provides easy access to documents containing these terms. A subsequent search process attempts to match preprocessed user queries against term-based representations of documents, in each case determining a degree of relevance between the two which depends upon the number and types of matching terms.
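The indexing and matching steps described above can be illustrated with a minimal Python sketch. It is not the system discussed in this paper: the function names (`tokenize`, `build_inverted_index`, `search`) and the sample documents are hypothetical, terms are plain lowercase words, and relevance is approximated simply by counting distinct matching query terms, ignoring term types and weights.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Rank documents by the number of distinct query terms they contain.

    Assumption: a real statistical IR system would weight terms
    (for example by frequency); this sketch only counts matches.
    """
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        # index.get avoids creating empty entries for unknown terms.
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample collection, for illustration only.
docs = {
    "d1": "automatic summarization of news documents",
    "d2": "query expansion for document retrieval",
    "d3": "statistical retrieval with query terms",
}
index = build_inverted_index(docs)
ranked = search(index, "query retrieval terms")
```

Here `ranked` lists the matching documents with the best match first. Documents that share no term with the query (such as `d1` in this sample) are omitted from the ranking.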
A search is successful if it returns as many documents relevant to the query as possible, with as few non-relevant documents as possible. In addition, the relevant documents should be ranked ahead of non-relevant ones. The quantitative text representation methods predominant in today's leading information retrieval .