Discovery of Topically Coherent Sentences for Extractive Summarization

Asli Celikyilmaz, Microsoft Speech Labs, Mountain View, CA 94041, asli@
Dilek Hakkani-Tur, Microsoft Speech Labs, Microsoft Research, Mountain View, CA 94041, dilek@

Abstract

Extractive methods for multi-document summarization are mainly governed by information overlap, coherence, and content constraints. We present an unsupervised probabilistic approach that models the hidden abstract concepts across documents, as well as the correlations between these concepts, to generate topically coherent and non-redundant summaries. Based on human evaluations, our models generate summaries with higher linguistic quality in terms of coherence, readability, and redundancy than benchmark systems. Although our system is unsupervised and optimized for topical coherence, we achieve a ROUGE score on the DUC-07 test set roughly in the range of state-of-the-art supervised models.

1 Introduction

A query-focused multi-document summarization model produces a short summary of a set of documents retrieved in response to a user's query. An ideal summary should contain the relevant content shared among the documents only once, plus other unique information from individual documents that is directly related to the user's query, addressing different levels of detail. Recent approaches to the summarization task have focused, to some extent, on redundancy and coherence issues.
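The abstract reports results in terms of ROUGE, which scores a system summary by its n-gram overlap with human reference summaries. As a rough illustration of what such a score measures, here is a minimal Python sketch of ROUGE-N recall, assuming pre-tokenized input. It is not the evaluation script used in the paper: it omits the stemming and stopword options of the official ROUGE toolkit, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=2):
    """Simplified ROUGE-N recall: clipped n-gram matches / total reference n-grams.

    candidate: list of tokens for the system summary.
    references: list of token lists, one per human reference summary.
    """
    cand_counts = ngrams(candidate, n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        # A reference n-gram is credited at most as many times as it occurs in the candidate.
        matched += sum(min(count, cand_counts[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For example, scoring the candidate "the cat sat on the mat" against the single reference "the cat lay on the mat" gives 5/6 for unigrams (n=1) and 0.6 for bigrams (n=2), since only "the cat", "on the", and "the mat" are shared bigrams.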
In this paper, we introduce a series of new generative models for multiple documents, based on the discovery of hierarchical topics and their correlations, to extract topically coherent sentences. Prior research has demonstrated the usefulness of sentence extraction for generating summary text, taking advantage of surface-level features such as word repetition, position in text, and cue phrases (Radev, 2004; Nenkova and Vanderwende, 2005a; Wan and Yang, 2006; Nenkova et al., 2006). Because documents have pre-defined structures (e.g., sections …
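To make the surface-feature idea concrete, here is a minimal Python sketch of frequency-based sentence extraction in the spirit of SumBasic, the word-frequency summarizer of Nenkova and Vanderwende. It is a simplified illustration under stated assumptions, not the topic-model approach this paper proposes: sentences are assumed to be pre-tokenized, each step picks the sentence with the highest average word probability (the original SumBasic additionally restricts the choice to sentences containing the currently most probable word), and the function name `sumbasic_sketch` is hypothetical.

```python
from collections import Counter


def sumbasic_sketch(sentences, max_sentences=2):
    """Simplified SumBasic-style extractor.

    sentences: list of token lists (assumed already lowercased and tokenized).
    Returns indices of the selected sentences, in selection order.
    """
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    # Unigram probability of each word over the whole document set.
    prob = {w: c / total for w, c in counts.items()}
    selected = []
    while len(selected) < min(max_sentences, len(sentences)):
        best, best_score = None, -1.0
        for i, s in enumerate(sentences):
            if i in selected or not s:
                continue
            # Score a sentence by the average probability of its words.
            score = sum(prob[w] for w in s) / len(s)
            if score > best_score:
                best, best_score = i, score
        if best is None:
            break  # only empty sentences remain
        selected.append(best)
        # Squaring the probabilities of covered words discourages redundant picks.
        for w in set(sentences[best]):
            prob[w] = prob[w] ** 2
    return selected
```

With the three sentences "the storm hit the coast", "the storm hit the coast hard", and "rescue teams arrived at dawn", the sketch selects sentences 0 and 2: after the first pick, the near-duplicate second sentence loses most of its weight, which is the kind of redundancy control the introduction describes.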
