Outclassing Wikipedia in Open-Domain Information Extraction: Weakly-Supervised Acquisition of Attributes over Conceptual Hierarchies

Marius Paşca
Google Inc.
Mountain View, California 94043
mars@google.com

Abstract

A set of labeled classes of instances is extracted from text and linked into an existing conceptual hierarchy. Besides a significant increase in the coverage of the class labels assigned to individual instances, the resulting resource of labeled classes is more effective than similar data derived from the manually-created Wikipedia in the task of attribute extraction over conceptual hierarchies.

1 Introduction

Motivation: Sharing basic intuitions and long-term goals with other tasks within the area of Web-based information extraction (Banko and Etzioni, 2008; Davidov and Rappoport, 2008), the task of acquiring class attributes relies on unstructured text available on the Web as a data source for extracting generally useful knowledge. In the case of attribute extraction, the knowledge to be extracted consists of quantifiable properties of various classes (e.g., top speed, body style, and gas mileage for the class of sports cars). Existing work on large-scale attribute extraction focuses on producing ranked lists of attributes for target classes of instances, where each target class is available as a flat set of instances (e.g., ferrari modena, porsche carrera gt) sharing the same class label (e.g., sports cars).
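To make this input format concrete, the Python sketch below represents a flat target class as a surface-string label shared by a set of instances, using the sports-car examples above. The `LabeledClass` type and the `labels_for_instance` helper are hypothetical names introduced only for illustration; they are not taken from the cited systems. Nothing in the representation links one class to another, which is exactly what "flat" means here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledClass:
    """A flat target class: a label string shared by a set of instances.

    Hypothetical representation for illustration only; a class holds no
    reference to any other class (no hierarchy).
    """
    label: str
    instances: frozenset[str]


def labels_for_instance(classes: list[LabeledClass], instance: str) -> list[str]:
    """Return, sorted, the labels of all flat classes that contain `instance`."""
    return sorted(c.label for c in classes if instance in c.instances)


# Toy input built from the examples in the text.
CLASSES = [
    LabeledClass("sports cars", frozenset({"ferrari modena", "porsche carrera gt"})),
]

print(labels_for_instance(CLASSES, "porsche carrera gt"))  # ['sports cars']
```

An instance absent from every flat set receives no label at all, so the coverage of labels per instance depends entirely on how the sets were populated.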
Independently of how the input target classes are populated with instances (manually, as in Pasca (2007), or automatically, as in Pasca and Van Durme (2008)) and of what type of textual data source is used for extracting attributes (Web documents or query logs), the extraction of attributes operates at a lexical rather than a semantic level. Indeed, the class labels of the target classes may be no more than text surface strings (e.g., sports cars) or even artificially-created labels (e.g., CartoonChar in lieu of cartoon characters). Moreover, although it is commonly accepted that sports cars are also cars, which in turn ...
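The difference between a lexical and a semantic treatment of class labels can be illustrated with a minimal Python sketch. It assumes a hypothetical parent map from a child label to its parent label; only the relation "sports cars are also cars" comes from the text, and the helper names `lexical_match` and `ancestors` are illustrative rather than drawn from any existing system. Plain string comparison treats related or equivalent labels as unrelated, whereas following is-a links recovers the more general class.

```python
# Hypothetical is-a links (child label -> parent label). Only the
# "sports cars are also cars" relation is taken from the text.
PARENT: dict[str, str] = {"sports cars": "cars"}


def lexical_match(label_a: str, label_b: str) -> bool:
    """Lexical view: two class labels are related only if the strings are equal."""
    return label_a == label_b


def ancestors(label: str, parent: dict[str, str]) -> list[str]:
    """Semantic view: follow is-a links upward, nearest ancestor first."""
    chain: list[str] = []
    seen = {label}
    # Stop at a root label, or if a malformed map would lead back to a seen label.
    while label in parent and parent[label] not in seen:
        label = parent[label]
        seen.add(label)
        chain.append(label)
    return chain


print(lexical_match("sports cars", "cars"))                # False
print(lexical_match("CartoonChar", "cartoon characters"))  # False
print(ancestors("sports cars", PARENT))                    # ['cars']
```

The `seen` set only guards against cycles in a malformed parent map; the sketch does not model how such is-a links would be acquired.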