Scientific report: "Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining"

Fan Zhang2, Shuming Shi1, Jing Liu2, Shuqi Sun3, Chin-Yew Lin1
1 Microsoft Research Asia; 2 Nankai University, China; 3 Harbin Institute of Technology, China
{shumings, cyl}@
This work was performed when Fan Zhang and Shuqi Sun were interns at Microsoft Research Asia.

Abstract

This paper focuses on mining the hyponymy (or is-a) relation from large-scale, open-domain web documents. A nonlinear probabilistic model is exploited to model the correlation between sentences in the aggregation of pattern matching results. Based on the model, we design a set of evidence combination and propagation algorithms. These significantly improve the result quality of existing approaches. Experimental results conducted on 500 million web pages and hypernym labels for 300 terms show over 20% performance improvement in terms of P@5, MAP, and R-Precision.

1 Introduction

An important task in text mining is the automatic extraction of entities and their lexical relations; this has wide applications in natural language processing and web search. This paper focuses on mining the hyponymy (or is-a) relation from large-scale, open-domain web documents. From the viewpoint of entity classification, the problem is to automatically assign fine-grained class labels to terms.

There have been a number of approaches (Hearst, 1992; Pantel & Ravichandran, 2004; Snow et al., 2005; Durme & Pasca, 2008; Talukdar et al., 2008) to address the problem. These methods typically exploited manually-designed or automatically-learned patterns (e.g., "NP such as NP", "NP like NP", "NP is a NP").
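To make the three example patterns concrete, the following is a minimal sketch of surface-pattern matching in Python. It is not the paper's implementation: the name `extract_pairs`, the `PATTERNS` table, and the approximation of a noun phrase (NP) by a single word are all assumptions made for illustration. A realistic system would identify noun phrases with a tagger or chunker.

```python
import re

# Assumption: a noun phrase (NP) is approximated by a single word.
# A real system would use a part-of-speech tagger or chunker instead.
NP = r"[A-Za-z][A-Za-z-]*"

# Each entry: (compiled pattern, True if the hypernym is the first NP).
PATTERNS = [
    (re.compile(rf"\b(?P<a>{NP}) such as (?P<b>{NP})\b"), True),
    (re.compile(rf"\b(?P<a>{NP}) like (?P<b>{NP})\b"), True),
    (re.compile(rf"\b(?P<a>{NP}) is an? (?P<b>{NP})\b"), False),
]


def extract_pairs(sentence):
    """Return (term, hypernym_label) pairs matched in one sentence."""
    pairs = []
    for pattern, hypernym_first in PATTERNS:
        for m in pattern.finditer(sentence):
            a, b = m.group("a").lower(), m.group("b").lower()
            # Normalize so the term always comes first, the label second.
            pairs.append((b, a) if hypernym_first else (a, b))
    return pairs


if __name__ == "__main__":
    print(extract_pairs("Fruits such as apples are healthy."))
    print(extract_pairs("Python is a language"))
    print(extract_pairs("I like pizza"))  # a spurious match
```

In this sketch the last sentence, "I like pizza", also matches the "NP like NP" pattern and yields a wrong pair, which illustrates why raw pattern matches are noisy and need to be aggregated over many sentences.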
Although some degree of success has been achieved with these efforts, the results are still far from perfect in terms of both recall and precision. As will be demonstrated in this paper, even by processing a large corpus of 500 million web pages with the most popular patterns, we are not able to extract correct labels for many (especially rare) entities. Even for popular terms incorrect .
