Do Automatic Annotation Techniques Have Any Impact on Supervised Complex Question Answering?

Yllias Chali
University of Lethbridge
Lethbridge, AB, Canada
chali@cs.uleth.ca

Sadid A. Hasan
University of Lethbridge
Lethbridge, AB, Canada
hasan@cs.uleth.ca

Shafiq R. Joty
University of British Columbia
Vancouver, BC, Canada
rjoty@cs.ubc.ca

Abstract

In this paper, we analyze the impact of different automatic annotation methods on the performance of supervised approaches to the complex question answering problem (defined in the DUC-2007 main task). A huge amount of annotated or labeled data is a prerequisite for supervised training. The task of labeling can be accomplished either by humans or by computer programs. When humans are employed, the whole process becomes time consuming and expensive. So, in order to produce a large set of labeled data, we prefer the automatic annotation strategy. We apply five different automatic annotation techniques to produce labeled data, using the ROUGE similarity measure, Basic Element (BE) overlap, a syntactic similarity measure, a semantic similarity measure, and the Extended String Subsequence Kernel (ESSK). The representative supervised methods we use are Support Vector Machines (SVM), Conditional Random Fields (CRF), Hidden Markov Models (HMM), and Maximum Entropy (MaxEnt). Evaluation results are presented to show the impact.

1 Introduction

In this paper, we consider the complex question answering problem defined in the DUC-2007 main task.[1] We focus on an extractive approach to summarization to answer complex questions, where a subset of the sentences in the original documents is chosen. For supervised learning methods, a huge amount of annotated or labeled data is obviously required as a precondition. The decision as to whether a sentence is important enough to be annotated can be taken either by humans or by computer programs. When humans are employed in the process, producing such large labeled corpora becomes time consuming and expensive.

[1] http://www-nlpir.nist.gov/projects/duc/duc2007/
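To make the automatic annotation idea concrete, below is a minimal sketch (in Python; not the authors' code) of a ROUGE-1-style annotator: each document sentence receives a positive label if its unigram recall against a human reference summary meets a threshold. The threshold of 0.5, the function names, and the toy data are all illustrative assumptions, not values from the paper.

# Hypothetical sketch of ROUGE-1-style automatic annotation.
# Sentences are labeled 1 (summary-worthy) or 0 based on unigram
# recall against a reference summary; 0.5 is an assumed threshold.
from collections import Counter

def rouge1_recall(candidate, reference):
    """Fraction of the reference's unigrams covered by the candidate."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(cand_counts[w], c) for w, c in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

def annotate(sentences, reference_summary, threshold=0.5):
    """Return (sentence, label) pairs: 1 = include in extract, 0 = skip."""
    return [(s, int(rouge1_recall(s, reference_summary) >= threshold))
            for s in sentences]

if __name__ == "__main__":
    ref = "automatic annotation produces labeled data for supervised training"
    sents = [
        "Automatic annotation can produce large labeled data sets cheaply.",
        "The weather in Lethbridge was pleasant that spring.",
    ]
    for sentence, label in annotate(sents, ref):
        print(label, sentence)

The same labeling loop could be driven by any of the other similarity measures named above (BE overlap, syntactic, semantic, or ESSK) simply by swapping out the scoring function; the resulting labeled sentences then serve as training data for the supervised learners.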