Scientific report: "Compensating for Annotation Errors in Training a Relation Extractor"

Bonan Min
New York University, 715 Broadway, 7th floor, New York, NY 10003 USA
min@

Ralph Grishman
New York University, 715 Broadway, 7th floor, New York, NY 10003 USA
grishman@

Abstract

The well-studied supervised Relation Extraction algorithms require training data that is accurate and has good coverage. To obtain such a gold standard, the common practice is to do independent double annotation followed by adjudication. This takes significantly more human effort than annotation done by a single annotator. We perform a detailed analysis of a snapshot of the ACE 2005 annotation files to understand the differences between single-pass annotation and the more expensive, nearly three-pass process, and then propose an algorithm that learns from the much cheaper single-pass annotation and achieves performance on a par with an extractor trained on multi-pass annotated data. Furthermore, we show that, given the same amount of human labor, the better way to do relation annotation is not to annotate with high-cost quality assurance but to annotate more.

1. Introduction

Relation Extraction aims at detecting and categorizing semantic relations between pairs of entities in text. It is an important NLP task with many practical applications, such as answering factoid questions, building knowledge bases, and improving web search.

Supervised methods for relation extraction have been studied extensively since richly annotated linguistic resources, e.g. the Automatic Content Extraction [1] (ACE) training corpus, were released. We summarize related methods in Section 2. Those methods rely on accurate and complete annotation. To obtain high-quality annotation, the common wisdom is to have two annotators independently annotate a corpus and then ask a senior annotator to adjudicate the disagreements. This annotation procedure requires roughly 3 passes over the same ...

[1] http iad mig tests ace
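The double-annotation-plus-adjudication procedure described above, and the labor trade-off it implies, can be sketched in a few lines of Python. This is only an illustration and not the authors' method: the function names, the data layout, the example entity pairs, the relation labels, and the adjudicator's rule are all assumptions made for the example. The only figure taken from the text is the estimate of roughly 3 passes per document.

```python
from typing import Callable, Dict, Tuple

# Hypothetical layout: an annotation maps an entity pair to a relation label.
Pair = Tuple[str, str]
Annotation = Dict[Pair, str]


def adjudicate(first: Annotation, second: Annotation,
               resolve: Callable[[Pair, str, str], str]) -> Annotation:
    """Merge two independent annotation passes into one adjudicated set.

    Pairs on which both annotators agree are kept as-is; every disagreement
    (including a pair marked by only one annotator) goes to `resolve`,
    which stands in for the senior annotator.
    """
    merged: Annotation = {}
    for pair in sorted(set(first) | set(second)):
        a = first.get(pair, "NONE")
        b = second.get(pair, "NONE")
        label = a if a == b else resolve(pair, a, b)
        if label != "NONE":
            merged[pair] = label
    return merged


def documents_within_budget(budget_passes: int, passes_per_document: int) -> int:
    """Number of documents that a fixed annotation budget can cover."""
    return budget_passes // passes_per_document


# Illustrative data only; the labels are placeholders, not taken from the paper.
first_pass = {("Smith", "Acme"): "ORG-AFF"}
second_pass = {("Smith", "Acme"): "ORG-AFF", ("Acme", "Boston"): "PHYS"}


def senior_annotator(pair: Pair, a: str, b: str) -> str:
    # Assumed rule for this toy example: accept whichever annotator found a relation.
    return b if a == "NONE" else a


gold = adjudicate(first_pass, second_pass, senior_annotator)

# With a budget of 300 annotation passes, the ~3-pass process covers
# a third as many documents as single-pass annotation.
multi_pass_docs = documents_within_budget(300, 3)
single_pass_docs = documents_within_budget(300, 1)
```

Under these assumptions, the same budget covers three times as many documents with single-pass annotation, which is the trade-off the abstract refers to when it argues for annotating more rather than adding high-cost quality assurance.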
