Compensating for Annotation Errors in Training a Relation Extractor

Bonan Min
New York University
715 Broadway, 7th floor
New York, NY 10003 USA
min@cs.nyu.edu

Ralph Grishman
New York University
715 Broadway, 7th floor
New York, NY 10003 USA
grishman@cs.nyu.edu

Abstract

The well-studied supervised Relation Extraction algorithms require training data that is accurate and has good coverage. To obtain such a gold standard, the common practice is to do independent double annotation followed by adjudication. This takes significantly more human effort than annotation done by a single annotator. We do a detailed analysis on a snapshot of the ACE 2005 annotation files to understand the differences between single-pass annotation and the more expensive, nearly three-pass process, and then propose an algorithm that learns from the much cheaper single-pass annotation and achieves performance on a par with the extractor trained on multi-pass annotated data. Furthermore, we show that, given the same amount of human labor, the better way to do relation annotation is not to annotate with high-cost quality assurance but to annotate more.

1. Introduction

Relation Extraction aims at detecting and categorizing semantic relations between pairs of entities in text. It is an important NLP task that has many practical applications, such as answering factoid questions, building knowledge bases, and improving web search.

Supervised methods for relation extraction have been studied extensively since rich annotated linguistic resources, e.g., the Automatic Content Extraction¹ (ACE) training corpus, were released. We summarize related methods in Section 2. Those methods rely on accurate and complete annotation. To obtain high-quality annotation, the common wisdom is to let two annotators independently annotate a corpus and then ask a senior annotator to adjudicate the disagreements². This annotation procedure roughly requires 3 passes³ over the same .

¹ http://www.itl.nist.gov/iad/mig/tests/ace/