Scientific report: "A Probabilistic Model for Canonicalizing Named Entity Mentions"

Dani Yogatama, Yanchuan Sim, Noah A. Smith
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
{dyogatama,ysim,nasmith}@

Abstract

We present a statistical model for canonicalizing named entity mentions into a table whose rows represent entities and whose columns are attributes (or parts of attributes). The model is novel in that it incorporates entity context, surface features, first-order dependencies among attribute-parts, and a notion of noise. Transductive learning from a few seeds and a collection of mention tokens combines Bayesian inference and conditional estimation. We evaluate our model and its components on two datasets collected from political blogs and sports news, finding that it outperforms a simple agglomerative clustering approach and previous work.

1 Introduction

Proper handling of mentions in text of real-world entities (identifying and resolving them) is a central part of many NLP applications. We seek an algorithm that infers a set of real-world entities from mentions in a text, mapping each entity mention token to an entity, and discovers general categories of words used in names (e.g., titles and last names). Here, we use a probabilistic model to infer a structured representation of canonical forms of entity attributes through transductive learning from named entity mentions with a small number of seeds (see Table 1). The input is a collection of mentions found by a named entity recognizer, along with their contexts, and, following Eisenstein et al. (2011), the output is a table in which entities are rows (the number of which is not pre-specified) and attribute words are organized into columns.

This paper contributes a model that builds on the approach of Eisenstein et al. (2011) but also incorporates context of the mention to help with disambiguation and to allow mentions that do not share words to be merged liberally; conditions against shape features which .
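
As a rough illustration of the output described above (a table whose rows are entities and whose columns hold attribute parts, with each mention token mapped to a row), the following Python sketch shows one possible representation. It is not the authors' implementation and performs no inference. The column names, the class names Mention and EntityTable, and the example person are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical column names. The text only says that columns hold attributes
# or parts of attributes, such as titles and last names.
COLUMNS = ("title", "first_name", "last_name")


@dataclass
class Mention:
    """A mention found by a named entity recognizer, with its context."""
    tokens: list[str]
    context: list[str] = field(default_factory=list)


@dataclass
class EntityTable:
    """Rows are entities; columns are attribute parts (assumed layout)."""
    rows: list[dict[str, str]] = field(default_factory=list)
    # Maps a mention index to the index of the row (entity) it refers to.
    assignment: dict[int, int] = field(default_factory=dict)

    def add_entity(self, **attrs: str) -> int:
        """Append a new row; the number of rows is not fixed in advance."""
        unknown = set(attrs) - set(COLUMNS)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        self.rows.append({c: attrs.get(c, "") for c in COLUMNS})
        return len(self.rows) - 1

    def assign(self, mention_id: int, row_id: int) -> None:
        """Record that a mention refers to the entity in the given row."""
        if not 0 <= row_id < len(self.rows):
            raise IndexError(f"no such row: {row_id}")
        self.assignment[mention_id] = row_id

    def mentions_of(self, row_id: int) -> list[int]:
        """Return the sorted indices of mentions mapped to a row."""
        return sorted(m for m, r in self.assignment.items() if r == row_id)


# Made-up example: two mentions that share only one word map to one row.
mentions = [
    Mention(tokens=["Senator", "Doe"], context=["voted", "against"]),
    Mention(tokens=["Jane", "Doe"], context=["said", "yesterday"]),
]
table = EntityTable()
row = table.add_entity(title="Senator", first_name="Jane", last_name="Doe")
for i in range(len(mentions)):
    table.assign(i, row)
```

In this sketch the assignments are entered by hand; in the model described here, both the rows and the mention-to-row mapping would instead be inferred from the mentions, their contexts, and a few seeds.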
