Identifying correlates of input difficulty for generic multi-document summarization

Ani Nenkova, University of Pennsylvania, Philadelphia, PA 19104, USA, nenkova@seas.upenn.edu
Annie Louis, University of Pennsylvania, Philadelphia, PA 19104, USA, lannie@seas.upenn.edu

Abstract

Different summarization requirements could make the writing of a good summary more difficult, or easier. Summary length and the characteristics of the input are such constraints influencing the quality of a potential summary. In this paper we report the results of a quantitative analysis on data from large-scale evaluations of multi-document summarization, empirically confirming this hypothesis. We further show that features measuring the cohesiveness of the input are highly correlated with eventual summary quality and that it is possible to use these as features to predict the difficulty of new, unseen summarization inputs.

1 Introduction

In certain situations, even the best automatic summarizers or professional writers can find it hard to write a good summary of a set of articles. If there is no clear topic shared across the input articles, or if they follow the development of the same event in time for a longer period, it could become difficult to decide what information is most representative and should be conveyed in a summary.
Similarly, length requirements could pre-determine summary quality: a short outline of a story might be confusing and unclear, but a page-long discussion might give an excellent overview of the same issue. Even systems that perform well on average produce summaries of poor quality for some inputs. For this reason, understanding what aspects of the input make it difficult for summarization becomes an interesting and important issue that has not been addressed in the summarization community until now. In information retrieval, for example, the variable system performance has been recognized as a research challenge and numerous studies on identifying query .