Automatic Generation of Story Highlights

Kristian Woodsend and Mirella Lapata
School of Informatics, University of Edinburgh
Edinburgh EH8 9AB, United Kingdom
k.woodsend@ed.ac.uk, mlap@inf.ed.ac.uk

Abstract

In this paper we present a joint content selection and compression model for single-document summarization. The model operates over a phrase-based representation of the source document, which we obtain by merging information from PCFG parse trees and dependency graphs. Using an integer linear programming formulation, the model learns to select and combine phrases subject to length, coverage and grammar constraints. We evaluate the approach on the task of generating story highlights, a small number of brief, self-contained sentences that allow readers to quickly gather information on news stories. Experimental results show that the model's output is comparable to human-written highlights in terms of both grammaticality and content.

1 Introduction

Summarization is the process of condensing a source text into a shorter version while preserving its information content. Humans summarize on a daily basis and effortlessly, but producing high-quality summaries automatically remains a challenge. The difficulty lies primarily in the nature of the task, which is complex, must satisfy many constraints (e.g., summary length, informativeness, coherence, grammaticality) and ultimately requires wide-coverage text understanding. Since the latter is beyond the capabilities of current NLP technology, most work today focuses on extractive summarization, where a summary is created simply by identifying and subsequently concatenating the most important sentences in a document. Without a great deal of linguistic analysis, it is possible to create summaries for a wide range of documents. Unfortunately, extracts are often documents of low readability and text quality, and contain much redundant information. This is in marked contrast with hand-written summaries, which often combine several pieces of [...]
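To make the abstract's mention of an integer linear programming formulation more concrete, the following is a minimal sketch of a generic phrase-selection ILP with a length budget and a simple head-dependency constraint. It is not the authors' model (which also includes coverage and richer grammar constraints); the scores, lengths and dependency pairs are hypothetical inputs, and the PuLP library is used only for illustration.

```python
# Minimal phrase-selection ILP sketch (illustrative, not the paper's exact model).
from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum

scores  = [2.5, 1.0, 1.8, 0.6]   # salience score of each phrase (hypothetical)
lengths = [7, 4, 9, 3]           # phrase lengths in tokens (hypothetical)
budget  = 15                     # summary length limit in tokens
deps    = [(1, 0), (3, 2)]       # (dependent, head): dependent needs its head

n = len(scores)
prob = LpProblem("phrase_selection", LpMaximize)
x = [LpVariable(f"x{i}", cat=LpBinary) for i in range(n)]  # 1 if phrase i selected

# Objective: maximise the total salience of the selected phrases.
prob += lpSum(scores[i] * x[i] for i in range(n))

# Length constraint: selected phrases must fit within the summary budget.
prob += lpSum(lengths[i] * x[i] for i in range(n)) <= budget

# One plausible grammaticality-style constraint: a dependent phrase may only
# be included if its head phrase is also included.
for child, head in deps:
    prob += x[child] <= x[head]

prob.solve()
print("selected phrases:", [i for i in range(n) if x[i].value() == 1])
```

In this toy instance the solver picks the subset of phrases with the highest total score that respects both the token budget and the dependency links, which is the general shape of constraint-based content selection the abstract describes.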