Topic-Focused Multi-document Summarization Using an Approximate Oracle Score

John M. Conroy, Judith D. Schlesinger
IDA Center for Computing Sciences, Bowie, Maryland, USA

Dianne P. O'Leary
University of Maryland, College Park, Maryland, USA

conroy@ judith@ oleary@

Abstract

We consider the problem of producing a multi-document summary given a collection of documents. Since most successful methods of multi-document summarization are still largely extractive, in this paper we explore just how well an extractive method can perform. We introduce an "oracle" score based on the probability distribution of unigrams in human summaries. We then demonstrate that with the oracle score we can generate extracts which score, on average, better than the human summaries when evaluated with ROUGE. In addition, we introduce an approximation to the oracle score which produces a system with the best known performance for the 2005 Document Understanding Conference (DUC) evaluation.

1 Introduction

We consider the problem of producing a multi-document summary given a collection of documents. Most automatic methods of multi-document summarization are largely extractive. This mimics the behavior of humans for single-document summarization: Kupiec, Pedersen, and Chen (1995) reported that 79% of the sentences in a human-generated abstract were a direct match to a sentence in a document.

In contrast, for multi-document summarization, Copeck and Szpakowicz (2004) report that no more than 55% of the vocabulary contained in human-generated abstracts can be found in the given documents. Furthermore, multiple human summaries of the same collection of documents often have little agreement. For example, Hovy and Lin (2002) report that unigram overlap is around 40%. Teufel and van Halteren (2004) used a "factoid" agreement analysis of human summaries for a single document and concluded that a resulting consensus summary is stable only if 30-40 summaries are collected. In light of the strong evidence that ...
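To make the idea of an oracle score built from unigrams in human summaries more concrete, the following Python sketch shows one plausible reading of it. It is an illustration under stated assumptions, not the authors' definition: the excerpt above does not give the formula, so the estimate of a term's probability (the fraction of human summaries containing it), the averaging over a sentence's distinct terms, the simple tokenizer, and all function names are hypothetical choices made here.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation (assumed tokenizer)."""
    tokens = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [t for t in tokens if t]


def unigram_probabilities(human_summaries):
    """Assumption: estimate P(t) as the fraction of human summaries containing term t."""
    counts = Counter()
    for summary in human_summaries:
        counts.update(set(tokenize(summary)))  # count each term once per summary
    n = len(human_summaries)
    return {term: c / n for term, c in counts.items()}


def oracle_score(sentence, probs):
    """Assumption: score a sentence by the mean P(t) over its distinct terms."""
    terms = set(tokenize(sentence))
    if not terms:
        return 0.0
    return sum(probs.get(t, 0.0) for t in terms) / len(terms)


def rank_sentences(sentences, human_summaries):
    """Order candidate sentences from highest to lowest oracle score."""
    probs = unigram_probabilities(human_summaries)
    return sorted(sentences, key=lambda s: oracle_score(s, probs), reverse=True)
```

Under these assumptions, an extract could be assembled by taking the top-ranked sentences until a length limit is reached. Because the score depends on human summaries, it is only available as an upper-bound experiment; a practical system would need an approximation that does not look at them, which is the role the abstract assigns to the approximate oracle score.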
