Scientific report: "Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models"

Sameer Singh, Amarnag Subramanya, Fernando Pereira, Andrew McCallum
Department of Computer Science, University of Massachusetts, Amherst MA 01002
Google Research, Mountain View CA 94043
sameer@ asubram@ pereira@ mccallum@

Abstract

Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large-scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy, with an error reduction of 38%, on this large dataset, demonstrating the scalability of our approach.
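The scale argument in the abstract can be made concrete with a small illustration. The number of distinct ways to group n mentions into entities is the Bell number B(n), which grows faster than exponentially. The sketch below is not part of the paper's method; the function name `bell_numbers` is hypothetical, and it uses the standard Bell triangle recurrence only to show how quickly the space of groupings grows.

```python
def bell_numbers(n_max):
    """Return [B(0), ..., B(n_max)], where B(n) counts the ways to
    partition n mentions into non-empty groups (candidate entities)."""
    bells = [1]   # B(0) = 1: the empty collection has one (empty) grouping
    row = [1]     # current row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last value of the previous row.
        new_row = [row[-1]]
        # Each further value is the value to its left plus the value above-left.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        # The first entry of each new row is the next Bell number.
        bells.append(row[0])
    return bells


if __name__ == "__main__":
    for n, count in enumerate(bell_numbers(20)):
        print(f"{n:2d} mentions -> {count} possible groupings")
```

Even for a few dozen mentions the count is far beyond what exhaustive search could cover, which is why approximate inference is needed for collections with millions of mentions.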
1 Introduction

Given a collection of mentions of entities extracted from a body of text, coreference or entity resolution consists of clustering the mentions such that two mentions belong to the same cluster if and only if they refer to the same entity. Solutions to this problem are important in semantic analysis and knowledge discovery tasks (Blume, 2005; Mayfield et al., 2009). While significant progress has been made in within-document coreference (Ng, 2005; Culotta et al., 2007; Haghighi and Klein, 2007; Bengston and Roth, 2008; Haghighi and Klein, 2009; Haghighi and Klein, 2010), the larger problem of cross-document coreference has not received as much attention. Unlike
