Multi-Task Active Learning for Linguistic Annotations

Roi Reichart (1), Katrin Tomanek (2), Udo Hahn (2), Ari Rappoport (1)

(1) Institute of Computer Science, Hebrew University of Jerusalem, Israel
{roiri, arir}@cs.huji.ac.il
(2) Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Germany
{katrin.tomanek, udo.hahn}@uni-jena.de

Both authors contributed equally to this work.

Abstract

We extend the classical single-task active learning (AL) approach. In the multi-task active learning (MTAL) paradigm, we select examples for several annotation tasks rather than for a single one, as is usually done in the context of AL. We introduce two MTAL metaprotocols, alternating selection and rank combination, and propose a method to implement them in practice. We experiment with a two-task annotation scenario that includes named entity and syntactic parse tree annotations on three different corpora. MTAL outperforms random selection and a stronger baseline, one-sided example selection, in which one task is pursued using AL and the selected examples are also provided to the other task.

1 Introduction

Supervised machine learning methods have been successfully applied to many NLP tasks in the last few decades. These techniques have demonstrated their superiority over both hand-crafted rules and unsupervised learning approaches. However, they require large amounts of labeled training data for every level of linguistic processing (e.g., POS tags, parse trees, or named entities). When domains and text genres change (e.g., moving from commonsense newspapers to scientific biology journal articles), extensive retraining on newly supplied training material is often required, since different domains may use different syntactic structures as well as different semantic classes (entities and relations).

Consequently, with an increasing coverage of a wide variety of domains in human language technology (HLT) systems, we can expect a growing need for manual annotations to support many kinds of .
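The rank combination metaprotocol named in the abstract can be illustrated with a minimal sketch. It assumes each annotation task exposes an informativeness scorer over the unlabeled pool; each task ranks the pool by its own score, the per-task ranks are summed, and the examples with the best combined rank are selected for annotation. Function and variable names here are ours for illustration, not the paper's implementation.

```python
def rank_combination(pool, scorers, batch_size):
    """Select `batch_size` examples with the best summed per-task ranks.

    pool       -- list of unlabeled examples
    scorers    -- one scoring function per task; a higher score means
                  the example is more informative for that task
    batch_size -- number of examples to send for annotation
    """
    combined = [0] * len(pool)
    for score in scorers:
        # Rank the pool for this task: rank 0 = most informative.
        order = sorted(range(len(pool)), key=lambda i: -score(pool[i]))
        for rank, i in enumerate(order):
            combined[i] += rank
    # The lowest summed rank wins across tasks.
    best = sorted(range(len(pool)), key=lambda i: combined[i])[:batch_size]
    return [pool[i] for i in best]
```

Summing ranks rather than raw scores sidesteps the problem that different tasks (here, named entity tagging and parsing) produce scores on incomparable scales.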