Scientific report: "Topic-Focused Multi-document Summarization Using an Approximate Oracle Score"

John M. Conroy, Judith D. Schlesinger, Dianne P. O'Leary
IDA Center for Computing Sciences, Bowie, Maryland, USA; University of Maryland, College Park, Maryland, USA
conroy@ judith@ oleary@

Abstract

We consider the problem of producing a multi-document summary given a collection of documents. Since most successful methods of multi-document summarization are still largely extractive, in this paper we explore just how well an extractive method can perform. We introduce an "oracle" score, based on the probability distribution of unigrams in human summaries. We then demonstrate that with the oracle score we can generate extracts which score, on average, better than the human summaries when evaluated with ROUGE. In addition, we introduce an approximation to the oracle score which produces a system with the best known performance for the 2005 Document Understanding Conference (DUC) evaluation.

1 Introduction

We consider the problem of producing a multi-document summary given a collection of documents. Most automatic methods of multi-document summarization are largely extractive. This mimics the behavior of humans for single-document summarization: Kupiec, Pedersen, and Chen (1995) reported that 79% of the sentences in a human-generated abstract were a direct match to a sentence in a document.
In contrast, for multi-document summarization, Copeck and Szpakowicz (2004) report that no more than 55% of the vocabulary contained in human-generated abstracts can be found in the given documents. Furthermore, multiple human summaries of the same collection of documents often have little agreement. For example, Hovy and Lin (2002) report that unigram overlap is around 40%. Teufel and van Halteren (2004) used a "factoid" agreement analysis of human summaries for a single document and concluded that a resulting consensus summary is stable only if 30-40 summaries are collected. In light of the strong evidence that ...
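
To make the idea of an oracle score from the abstract concrete, here is a minimal sketch in Python. The excerpt says only that the score is based on the probability distribution of unigrams in human summaries, so everything else is an assumption made for illustration: estimating a term's probability as the fraction of human summaries that contain it, scoring a sentence by the average of those probabilities over its distinct terms, and the names `tokenize`, `term_probabilities`, `oracle_score`, and `rank_sentences`. It should not be read as the authors' exact formulation.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and trim surrounding punctuation.

    A real system would likely also remove stop words and apply stemming.
    """
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


def term_probabilities(human_summaries):
    """Assumed estimate: P(t) = fraction of human summaries containing term t."""
    counts = Counter()
    for summary in human_summaries:
        # Count each term at most once per summary.
        counts.update(set(tokenize(summary)))
    total = len(human_summaries)
    return {term: count / total for term, count in counts.items()}


def oracle_score(sentence, probs):
    """Average P(t) over the distinct terms of a sentence (unseen terms count as 0)."""
    terms = set(tokenize(sentence))
    if not terms:
        return 0.0
    return sum(probs.get(term, 0.0) for term in terms) / len(terms)


def rank_sentences(sentences, human_summaries):
    """Order candidate sentences from highest to lowest oracle score."""
    probs = term_probabilities(human_summaries)
    return sorted(sentences, key=lambda s: oracle_score(s, probs), reverse=True)
```

An extractive summarizer built on such a score would pick the top-ranked sentences until a length limit is reached. The score is an "oracle" because it needs the human summaries, which are not available at summarization time; that is why the abstract goes on to describe an approximation to it.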
