Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models

Sameer Singh, Amarnag Subramanya, Fernando Pereira, Andrew McCallum
Department of Computer Science, University of Massachusetts, Amherst, MA 01002
† Google Research, Mountain View, CA 94043
sameer@ asubram@ pereira@ mccallum@

Abstract
Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem, we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large-scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy, with an error reduction of 38%, on this large dataset, demonstrating the scalability of our approach.
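The claim that it is impractical to consider all possible groupings of mentions can be made concrete with a standard combinatorial fact: the number of ways to partition n mentions into non-empty, disjoint entities is the Bell number B_n. The following Python sketch is illustrative only and is not taken from the paper; the function names `set_partitions` and `bell_numbers` and the mention IDs are hypothetical. It enumerates every grouping of a few mentions by brute force, then uses the Bell triangle to count groupings for larger n without enumerating them.

```python
from typing import Iterator, List


def set_partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every way to group `items` into non-empty, disjoint clusters.

    Each cluster plays the role of one entity; each item is one mention.
    """
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Option 1: `first` joins each existing cluster (entity) in turn.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # Option 2: `first` starts a cluster of its own (a new entity).
        yield [[first]] + partition


def bell_numbers(n_max: int) -> List[int]:
    """Return [B_0, B_1, ..., B_n_max] computed with the Bell triangle."""
    bells = [1]  # B_0 = 1: the empty set has exactly one partition
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbor.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])  # the first entry of row n is B_n
    return bells


if __name__ == "__main__":
    mentions = ["m1", "m2", "m3", "m4"]  # hypothetical mention IDs
    print(sum(1 for _ in set_partitions(mentions)))  # prints 15
    bells = bell_numbers(20)
    for n in (5, 10, 20):
        print(n, bells[n])
```

Already at 20 mentions there are about 5.2 × 10^13 possible groupings (B_20 = 51,724,158,235,372), while a Web-scale collection holds millions of mentions, so exhaustive search over entity assignments is out of reach and some form of approximate inference is required.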
1 Introduction
Given a collection of mentions of entities extracted from a body of text, coreference or entity resolution consists of clustering the mentions such that two mentions belong to the same cluster if and only if they refer to the same entity. Solutions to this problem are important in semantic analysis and knowledge discovery tasks (Blume, 2005; Mayfield et al., 2009). While significant progress has been made in within-document coreference (Ng, 2005; Culotta et al., 2007; Haghighi and Klein, 2007; Bengston and Roth, 2008; Haghighi and Klein, 2009; Haghighi and Klein, 2010), the larger problem of cross-document coreference has not received as much attention. Unlike
