Scientific report: "Experiments in Graph-based Semi-Supervised Learning Methods for Class-Instance Acquisition"

Partha Pratim Talukdar, Search Labs, Microsoft Research, Mountain View, CA 94043, partha@
Fernando Pereira, Google Inc., Mountain View, CA 94043, pereira@

Abstract

Graph-based semi-supervised learning (SSL) algorithms have been successfully used to extract class-instance pairs from large unstructured and structured text collections. However, a careful comparison of different graph-based SSL algorithms on that task has been lacking. We compare three graph-based SSL algorithms for class-instance acquisition on a variety of graphs constructed from different domains. We find that the recently proposed MAD algorithm is the most effective. We also show that class-instance extraction can be significantly improved by adding semantic information in the form of instance-attribute edges derived from an independently developed knowledge base. All of our code and data will be made publicly available to encourage reproducible research in this area.

1 Introduction

Traditionally, named-entity recognition (NER) has focused on a small number of broad classes such as person, location, and organization. However, those classes are too coarse to support important applications such as sense disambiguation, semantic matching, and textual inference in Web search. For those tasks, we need a much larger inventory of specific classes and accurate classification of terms into those classes.
While supervised learning methods perform well for traditional NER, they are impractical for fine-grained classification because sufficient labeled data to train classifiers for all the classes is unavailable and would be very expensive to obtain.

(Footnote: Research carried out while at the University of Pennsylvania, Philadelphia, PA, USA.)

To overcome these difficulties, seed-based information extraction methods have been developed over the years (Hearst, 1992; Riloff and Jones, 1999; Etzioni et al., 2005; Talukdar et al., 2006; Van Durme and
