Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 41-47.

Learning Surface Text Patterns for a Question Answering System

Deepak Ravichandran and Eduard Hovy
Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina del Rey, CA 90292-6695
USA
{ravichan,hovy}@isi.edu

Abstract

In this paper we explore the power of surface text patterns for open-domain question answering systems. In order to obtain an optimal set of patterns, we have developed a method for learning such patterns automatically. A tagged corpus is built from the Internet in a bootstrapping process by providing a few hand-crafted examples of each question type to Altavista. Patterns are then automatically extracted from the returned documents and standardized. We calculate the precision of each pattern, and the average precision for each question type. These patterns are then applied to find answers to new questions. Using the TREC-10 question set, we report results for two cases: answers determined from the TREC-10 corpus and from the web.

1 Introduction

Most of the recent open-domain question-answering systems use external knowledge and tools for answer pinpointing. These may include named entity taggers, WordNet, parsers, hand-tagged corpora, and ontology lists (Srihari and Li, 00; Harabagiu et al., 01; Hovy et al., 01; Prager et al., 01).
However, at the recent TREC-10 QA evaluation (Voorhees, 01), the winning system used just one resource: a fairly extensive list of surface patterns (Soubbotin and Soubbotin, 01). The apparent power of such patterns surprised many. We therefore decided to investigate their potential by acquiring patterns automatically and to measure their accuracy. It has been noted in several QA systems that certain types of answer are expressed using characteristic phrases (Lee et al., 01; Wang et al., 01). For example, for BIRTHDATEs (with questions like "When was X born?"), typical answers are "Mozart was born in 1756."
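To make the idea of a characteristic phrase concrete, the following is a minimal sketch of how a single surface pattern for the BIRTHDATE type might be applied to text. It is an illustration only, not the system described in this paper: the placeholder tags `<NAME>` and `<ANSWER>`, the function names, and the restriction of the answer to a four-digit year are all assumptions made for the example.

```python
import re

# Hypothetical placeholder tags used in the pattern template (assumed for this sketch).
NAME_TAG = "<NAME>"
ANSWER_TAG = "<ANSWER>"


def pattern_to_regex(template, name):
    """Compile a surface-pattern template into a regex for one question term.

    Literal text is escaped, the name tag is replaced by the question term,
    and the answer tag becomes a capturing group. Assumption: the answer is
    a four-digit year, which is a simplification for BIRTHDATE questions.
    """
    # Splitting on a capturing group keeps the tags in the resulting list.
    parts = re.split("(" + re.escape(NAME_TAG) + "|" + re.escape(ANSWER_TAG) + ")", template)
    pieces = []
    for part in parts:
        if part == NAME_TAG:
            pieces.append(re.escape(name))
        elif part == ANSWER_TAG:
            pieces.append(r"(?P<answer>\d{4})")
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


def find_answers(template, name, sentences):
    """Return every answer string the pattern captures in the given sentences."""
    regex = pattern_to_regex(template, name)
    return [m.group("answer") for s in sentences for m in regex.finditer(s)]


if __name__ == "__main__":
    sentences = [
        "Mozart was born in 1756.",
        "Mozart composed many operas.",
    ]
    print(find_answers("<NAME> was born in <ANSWER>", "Mozart", sentences))
```

The sketch matches on surface strings only, with no parsing or named entity tagging, which is the property that makes such patterns attractive as a sole resource.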