Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web

Yulan Yan, Naoaki Okazaki, Yutaka Matsuo, Zhenglu Yang and Mitsuru Ishizuka
The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
yulan@ okazaki@ matsuo@ yangzl@ ishizuka@

Abstract

This paper presents an unsupervised relation extraction method for discovering and enhancing relations in which a specified concept in Wikipedia participates. Using the respective characteristics of Wikipedia articles and a Web corpus, we develop a clustering approach based on combinations of patterns: dependency patterns obtained from dependency analysis of texts in Wikipedia, and surface patterns generated from highly redundant information on the Web. Evaluations of the proposed approach on two different domains demonstrate the superiority of the pattern combination over existing approaches. Fundamentally, our method demonstrates how deep linguistic patterns complement Web surface patterns in generating various relations.

1 Introduction

Machine learning approaches for relation extraction tasks require substantial human effort, particularly when applied to the broad range of documents, entities, and relations existing on the Web.
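The abstract describes clustering based on a combination of two pattern types: dependency patterns from Wikipedia text and surface patterns from the Web. The sketch below only illustrates that idea under stated assumptions and is not the authors' algorithm. Each entity pair is represented by hypothetical pattern strings, the two pattern types are kept in separate weighted feature dimensions, and a simple single-pass threshold clustering over cosine similarity stands in for whatever clustering the paper actually uses. The function names, pattern notation, weights, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter
from math import sqrt


def combine_features(dep_patterns, surface_patterns, dep_weight=1.0, surf_weight=1.0):
    """Merge dependency patterns and surface patterns into one weighted feature vector.

    The "dep:" / "surf:" prefixes keep the two pattern types in separate dimensions.
    """
    features = Counter()
    for p in dep_patterns:
        features["dep:" + p] += dep_weight
    for p in surface_patterns:
        features["surf:" + p] += surf_weight
    return features


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_pairs(vectors, threshold=0.5):
    """Single-pass threshold clustering (a hypothetical stand-in, not the paper's method).

    Each entity pair joins the most similar existing cluster if its cosine similarity
    to that cluster's summed vector is at least `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each entry: [summed feature vector, list of member names]
    for name, vec in vectors.items():
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster[0])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([Counter(vec), [name]])  # copy, so inputs stay unchanged
        else:
            best[0].update(vec)  # cosine ignores scale, so a sum works as a centroid
            best[1].append(name)
    return [members for _, members in clusters]


# Hypothetical entity pairs and pattern strings, for illustration only.
pairs = {
    "pair_1": combine_features(["X <-nsubj- found -dobj-> Y"], ["X founded Y", "X, founder of Y"]),
    "pair_2": combine_features(["X <-nsubj- found -dobj-> Y"], ["X founded Y"]),
    "pair_3": combine_features(["X <-nsubjpass- bear -prep_in-> Y"], ["X was born in Y"]),
}

clusters = cluster_pairs(pairs)
print(clusters)  # [['pair_1', 'pair_2'], ['pair_3']]
```

Here the two "found" pairs share both a dependency pattern and a surface pattern, so they fall into one cluster, while the "born in" pair shares nothing and starts its own. Setting `dep_weight` and `surf_weight` differently is one way to explore how much each pattern source contributes; this excerpt does not describe the paper's own weighting scheme.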
Even with semi-supervised approaches, which use a large unlabeled corpus, the manual construction of a small set of seeds (i.e., known true instances of the target entity or relation) is susceptible to arbitrary human decisions. Consequently, there is a need to develop semantic information-retrieval algorithms that can operate in as unsupervised a manner as possible. Currently, the leading methods in unsupervised information extraction collect redundancy information from a local corpus or use the Web as a corpus (Pantel and Pennacchiotti, 2006; Banko et al., 2007; Bollegala et al., 2007; Fan et al., 2008; Davidov and Rappoport, 2008). The standard process is to
