Scientific report: "Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction"

Shasha Liao, Ralph Grishman
Computer Science Department, New York University
liaoss@ grishman@

Abstract

Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of in trigger labeling and in role labeling through IR and an additional in trigger labeling and in role labeling by applying global inference.

1 Introduction

The goal of event extraction is to identify instances of a class of events in text. In addition to identifying the event itself, it also identifies all of the participants and attributes of each event; these are the entities that are involved in that event. The same event might be presented in various expressions, and an expression might represent different events in different contexts. Moreover, for each event type, the event participants and attributes may also appear in multiple forms, and exemplars of the different forms may be required. Thus, event extraction is a difficult task and requires substantial training data. However, annotating events for training is a tedious task. Annotators need to read the whole sentence, possibly several sentences, to decide whether there is a specific event or not, and then need to identify the event participants (like Agent and Patient) and attributes (like place).
