Đang chuẩn bị nút TẢI XUỐNG, xin hãy chờ
Tải xuống
We design a class of submodular functions meant for document summarization tasks. These functions each combine two terms, one which encourages the summary to be representative of the corpus, and the other which positively rewards diversity. Critically, our functions are monotone nondecreasing and submodular, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. | A Class of Submodular Functions for Document Summarization Hui Lin Dept. of Electrical Engineering University of Washington Seattle WA 98195 USA hlin@ee.washington.edu Jeff Bilmes Dept. of Electrical Engineering University of Washington Seattle Wa 98195 UsA bilmes@ee.washington.edu Abstract We design a class of submodular functions meant for document summarization tasks. These functions each combine two terms one which encourages the summary to be representative of the corpus and the other which positively rewards diversity. Critically our functions are monotone nondecreasing and submodular which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. When evaluated on DuC 2004-2007 corpora we obtain better than existing state-of-art results in both generic and query-focused document summarization. Lastly we show that several well-established methods for document summarization correspond in fact to submodular function optimization adding further evidence that submodular functions are a natural fit for document summarization. 1 Introduction In this paper we address the problem of generic and query-based extractive summarization from collections of related documents a task commonly known as multi-document summarization. We treat this task as monotone submodular function maximization to be defined in Section 2 . This has a number of critical benefits. On the one hand there exists a simple greedy algorithm for monotone submodular function maximization where the summary solution obtained say S is guaranteed to be almost as good as the best possible solution say Sopt according to an objective F. More precisely the greedy algorithm is a constant factor approximation to the cardinality constrained version of the problem so that 510 F S 1 - 1 e F Sopt - 0.632F Sopt . This is particularly attractive since the quality of the solution does not depend on the size of the problem so even very large size problems do well. It is .