TAILIEUCHUNG - Scientific report: "Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction"

Among the major categories of named entities (NEs, which in this paper refer to entity names, excluding the MUC time and numerical NEs), company and product names are often trademarked or uniquely registered, and hence less subject to name ambiguity. This paper focuses on cross-document disambiguation of person names. Previous research on cross-document name disambiguation applies the vector space model (VSM) for context similarity, using only co-occurring words [Bagga & Baldwin 1998].

Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction

Cheng Niu, Wei Li, and Rohini K. Srihari
Cymfony Inc., 600 Essjay Road, Williamsville, NY 14221, USA
cniu wei rohini @

Abstract

It is fairly common that different people are associated with the same name. In tracking person entities in a large document pool, it is important to determine whether multiple mentions of the same name across documents refer to the same entity or not. The previous approach to this problem measures context similarity based only on co-occurring words. This paper presents a new algorithm that uses information extraction support in addition to co-occurring words. A learning scheme with minimal supervision is developed within the Bayesian framework. Maximum entropy modeling is then used to represent the probability distribution of context similarities based on heterogeneous features. Statistical annealing is applied to derive the final entity coreference chains by globally fitting the pairwise context similarities. Benchmarking shows that our new approach significantly outperforms the existing algorithm by 25 percentage points in overall F-measure.

1 Introduction

Cross document name disambiguation is required for various tasks of knowledge discovery from textual documents, such as entity tracking, link discovery, information fusion, and event tracking.
This task is part of the co-reference task: if two mentions of the same name refer to the same (different) entities, by definition they should (should not) be co-referenced. As far as names are concerned, co-reference consists of two sub-tasks: (i) name disambiguation, to handle the problem of different entities happening to use the same name; (ii) alias association, to handle the problem of the same entity using multiple names (aliases). The Message Understanding Conference (MUC) community has established within-document coreference standards [MUC-7 1998]. Compared with within-document name disambiguation which
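
The word-based context similarity that the abstract describes as the previous approach can be illustrated with a minimal sketch. This is not the authors' algorithm or the exact formulation of [Bagga & Baldwin 1998]: it assumes raw term counts without any term weighting, a simple alphabetic tokenizer, and invented example contexts, and the names bag_of_words and cosine_similarity are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(context: str) -> Counter:
    """Lowercase the context and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", context.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty context shares nothing with any other
    return dot / (norm_a * norm_b)


# Invented contexts around three mentions of the same name (assumption).
doc1 = "John Smith, the senator, voted on the budget bill."
doc2 = "Senator John Smith defended the budget in a debate."
doc3 = "John Smith scored twice in the football match."

sim_12 = cosine_similarity(bag_of_words(doc1), bag_of_words(doc2))
sim_13 = cosine_similarity(bag_of_words(doc1), bag_of_words(doc3))
print(f"doc1 vs doc2: {sim_12:.3f}")
print(f"doc1 vs doc3: {sim_13:.3f}")
```

In this toy case the two political contexts score higher than the political and sports pair, which is the signal a word-only approach relies on. The name tokens and function words are shared by all three contexts, so they add similarity without helping to tell the entities apart.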
