Using Corpus Statistics on Entities to Improve Semi-supervised Relation Extraction from the Web

Benjamin Rosenfeld, Information Systems, HU School of Business, Hebrew University, Jerusalem, Israel, grurgrur@gmail.com
Ronen Feldman, Information Systems, HU School of Business, Hebrew University, Jerusalem, Israel, ronen.feldman@huji.ac.il

Abstract

Many errors produced by unsupervised and semi-supervised relation extraction (RE) systems occur because of wrong recognition of the entities that participate in the relations. This is especially true for systems that do not use separate named-entity recognition components, instead relying on general-purpose shallow parsing. Such systems have greater applicability, because they are able to extract relations that contain attributes of unknown types. However, this generality comes at a cost in accuracy. In this paper we show how to use corpus statistics to validate and correct the arguments of extracted relation instances, improving the overall RE performance. We test the methods on SRES, a self-supervised Web relation extraction system. We also compare the performance of corpus-based methods to the performance of validation and correction methods based on supervised NER components.

1 Introduction

Information Extraction (IE) is the task of extracting factual assertions from text.
Most IE systems rely on knowledge engineering or on machine learning to generate the task model that is subsequently used for extracting instances of entities and relations from new text. In the knowledge engineering approach, the model (usually in the form of extraction rules) is created manually, and in the machine learning approach, the model is learned automatically from a manually labeled training set of documents. Both approaches require substantial human effort, particularly when applied to the broad range of documents, entities, and relations on the Web. In order to minimize the manual effort necessary to build Web IE systems, semi-supervised and completely unsupervised .