Scientific report: "A Topic-Model based approach for update summarization"

DualSum: a Topic-Model based approach for update summarization

Jean-Yves Delort, Google Research, Brandschenkestrasse 110, 8002 Zurich, Switzerland, jydelort@
Enrique Alfonseca, Google Research, Brandschenkestrasse 110, 8002 Zurich, Switzerland, ealfonseca@

Abstract

Update summarization is a new challenge in multi-document summarization that focuses on summarizing a set of recent documents relative to another set of earlier documents. We present an unsupervised probabilistic approach to model novelty in a document collection and apply it to the generation of update summaries. The new model, called DualSum, ranks second or third in terms of the ROUGE metrics when tuned for previous TAC competitions and tested on TAC-2011, and is statistically indistinguishable from the winning system. A manual evaluation of the generated summaries shows state-of-the-art results for DualSum with respect to focus, coherence, and overall responsiveness.

1 Introduction

Update summarization is the problem of extracting and synthesizing novel information in a collection of documents with respect to a set of documents assumed to be known by the reader. This problem has received much attention in recent years, as can be observed in the number of participants in the special track on update summarization organized by DUC and TAC since 2007.
The problem is usually formalized as follows: given two collections A and B, where the documents in A chronologically precede the documents in B, generate a summary of B under the assumption that the user of the summary has already read the documents in A. Extractive techniques are the most common approaches in multi-document summarization. Summaries generated by such techniques consist of sentences extracted from the document collection. Extracts can have coherence and cohesion problems, but they generally offer a good tradeoff between linguistic quality and informativeness. While numerous extractive summarization techniques have been .
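
To make the formalization concrete, the following is a minimal sketch of an extractive update summarizer over two collections A and B. It is not the DualSum model: it is a simple word-frequency baseline chosen here only for illustration, and every name in it (`update_summary`, `novelty`, `max_sentences`) is hypothetical. It assumes that a word is "novel" when its add-one smoothed relative frequency is higher in B than in A, and it ranks the sentences of B by their average word novelty.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def update_summary(collection_a, collection_b, max_sentences=2):
    """Return up to max_sentences sentences of B, ranked by novelty with respect to A.

    Illustrative baseline only (an assumption of this sketch, not the paper's method).
    """
    counts_a = Counter(w for doc in collection_a for w in tokenize(doc))
    counts_b = Counter(w for doc in collection_b for w in tokenize(doc))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vocab_size = len(set(counts_a) | set(counts_b))

    def novelty(word):
        # Log-ratio of add-one smoothed relative frequencies in B versus A.
        p_b = (counts_b[word] + 1) / (total_b + vocab_size)
        p_a = (counts_a[word] + 1) / (total_a + vocab_size)
        return math.log(p_b / p_a)

    scored = []
    for doc in collection_b:
        for sentence in split_sentences(doc):
            words = tokenize(sentence)
            if words:
                score = sum(novelty(w) for w in words) / len(words)
                scored.append((score, sentence))

    # Highest average novelty first; the sort is stable, so ties keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:max_sentences]]
```

Calling `update_summary(docs_a, docs_b, max_sentences=1)` returns the single sentence of B whose vocabulary is least covered by A. Because the sentences are extracted verbatim, such a baseline inherits the coherence and cohesion problems of extracts mentioned above.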
