Bayesian Query-Focused Summarization

Hal Daume III and Daniel Marcu
Information Sciences Institute
4676 Admiralty Way, Suite 1001
Marina del Rey, CA 90292
me@hal3.name, marcu@isi.edu

Abstract

We present BayeSum (for "Bayesian summarization"), a model for sentence extraction in query-focused summarization. BayeSum leverages the common case in which multiple documents are relevant to a single query. Using these documents as reinforcement for query terms, BayeSum is not afflicted by the paucity of information in short queries. We show that approximate inference in BayeSum is possible on large data sets and results in a state-of-the-art summarization system. Furthermore, we show how BayeSum can be understood as a justified query expansion technique in the language modeling for IR framework.

1 Introduction

We describe BayeSum, an algorithm for performing query-focused summarization in the common case that there are many relevant documents for a given query. Given a query and a collection of relevant documents, our algorithm functions by asking itself the following question: what is it about these relevant documents that differentiates them from the non-relevant documents? BayeSum can be seen as providing a statistical formulation of this exact question. The key requirement of BayeSum is that multiple relevant documents are known for the query in question.
This is not a severe limitation. In two well-studied problems, it is the de facto standard. In standard multi-document summarization (with or without a query), we have access to known relevant documents for some user need. Similarly, in the case of a web-search application, an underlying IR engine will retrieve multiple (presumably) relevant documents for a given query. For both of these tasks, BayeSum performs well, even when the underlying retrieval model is noisy. The idea of leveraging known relevant documents is known as query expansion in the information retrieval community, where it has been shown to be successful in ad hoc retrieval
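
To make the guiding question concrete (what distinguishes the relevant documents from the non-relevant ones?), the following is a minimal Python sketch. It is not the BayeSum model and does no Bayesian inference: it swaps in a simple smoothed log-ratio of term frequencies as a stand-in, and uses the resulting weights to rank candidate sentences. The function names (`tokenize`, `term_weights`, `rank_sentences`), the `smoothing` and `query_boost` parameters, and the toy documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def term_weights(relevant_docs, background_docs, smoothing=1.0):
    """Smoothed log-ratio of term probability in relevant vs. background text.

    Positive weights mark terms that are more typical of the relevant
    documents; negative weights mark terms more typical of the background.
    """
    rel = Counter(t for d in relevant_docs for t in tokenize(d))
    bg = Counter(t for d in background_docs for t in tokenize(d))
    vocab = set(rel) | set(bg)
    rel_total = sum(rel.values()) + smoothing * len(vocab)
    bg_total = sum(bg.values()) + smoothing * len(vocab)
    return {
        t: math.log((rel[t] + smoothing) / rel_total)
        - math.log((bg[t] + smoothing) / bg_total)
        for t in vocab
    }


def rank_sentences(sentences, query, weights, query_boost=1.0):
    """Order sentences by mean token weight, with a bonus for query terms."""
    query_terms = set(tokenize(query))
    scored = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if not tokens:
            continue  # skip sentences with no usable tokens
        total = sum(
            weights.get(t, 0.0) + (query_boost if t in query_terms else 0.0)
            for t in tokens
        )
        scored.append((total / len(tokens), sentence))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored]


# Toy data, invented for this example.
relevant_docs = [
    "Solar panels convert sunlight into electricity.",
    "Falling panel prices make solar energy cheaper.",
]
background_docs = [
    "The football match ended in a draw.",
    "Stock prices fell sharply on Monday.",
]
weights = term_weights(relevant_docs, background_docs)
ranked = rank_sentences(
    ["The match ended in a draw.", "Solar panels convert sunlight into electricity."],
    query="solar energy",
    weights=weights,
)
print(ranked[0])
```

The sketch shows why a short query need not be the only source of evidence: terms such as "panels" or "sunlight" never appear in the query, yet they receive positive weight because they recur in the relevant documents and not in the background. This is the same intuition behind query expansion, reduced here to a heuristic.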