A Semantic Approach to IE Pattern Induction
Mark Stevenson and Mark A. Greenwood
Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK
{marks, m.greenwood}@dcs.shef.ac.uk

Abstract
This paper presents a novel algorithm for the acquisition of Information Extraction patterns. The approach makes the assumption that useful patterns will have similar meanings to those already identified as relevant. Patterns are compared using a variation of the standard vector space model in which information from an ontology is used to capture semantic similarity. Evaluation shows this algorithm performs well when compared with a previously reported document-centric approach.

1 Introduction
Developing systems that can be easily adapted to new domains with the minimum of human intervention is a major challenge in Information Extraction (IE). Early IE systems were based on knowledge engineering approaches but suffered from a knowledge acquisition bottleneck. For example, Lehnert et al. (1992) reported that their system required around 1,500 person-hours of expert labour to modify for a new extraction task. One approach to this problem is to use machine learning to automatically learn the domain-specific information required to port a system (Riloff, 1996). Yangarber et al. (2000) proposed an algorithm for learning extraction patterns from a small number of examples, which greatly reduced the burden on the application developer and eased the knowledge acquisition bottleneck.
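The abstract's central idea, comparing patterns in a vector space where an ontology supplies similarity between *different* words, can be illustrated with a small sketch. It assumes that patterns are bags of words over a toy vocabulary and that a symmetric matrix `W` holds word-to-word similarities. The vocabulary, the `W` values, the "soft cosine" normalization and the rule of scoring a candidate by its best match to any relevant pattern are all illustrative assumptions. They are not the paper's exact formulation. In a real system, `W` would be derived from an ontology rather than written by hand.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

# Hypothetical vocabulary of pattern elements (illustrative only).
VOCAB = ["company", "firm", "appoint", "name", "resign", "person"]

# Hypothetical, hand-written word-to-word similarity matrix (symmetric).
# In practice these values would come from an ontology.
W: Matrix = [
    [1.0, 0.9, 0.0, 0.0, 0.0, 0.1],  # company
    [0.9, 1.0, 0.0, 0.0, 0.0, 0.1],  # firm
    [0.0, 0.0, 1.0, 0.8, 0.2, 0.0],  # appoint
    [0.0, 0.0, 0.8, 1.0, 0.2, 0.0],  # name
    [0.0, 0.0, 0.2, 0.2, 1.0, 0.0],  # resign
    [0.1, 0.1, 0.0, 0.0, 0.0, 1.0],  # person
]


def to_vector(pattern: Sequence[str]) -> List[float]:
    """Binary bag-of-words vector for a pattern over VOCAB."""
    return [1.0 if w in pattern else 0.0 for w in VOCAB]


def bilinear(a: Sequence[float], b: Sequence[float], sim: Matrix) -> float:
    """Compute a . sim . b^T, so related (not only identical) words contribute."""
    n = len(a)
    return sum(a[i] * sim[i][j] * b[j] for i in range(n) for j in range(n))


def soft_cosine(a: Sequence[float], b: Sequence[float], sim: Matrix = W) -> float:
    """Cosine similarity generalised by sim; identical non-zero vectors score 1."""
    denom = math.sqrt(bilinear(a, a, sim)) * math.sqrt(bilinear(b, b, sim))
    return bilinear(a, b, sim) / denom if denom else 0.0


def rank_candidates(
    relevant: Sequence[Tuple[str, ...]],
    candidates: Sequence[Tuple[str, ...]],
    sim: Matrix = W,
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score each candidate by its best similarity to any pattern already judged relevant."""
    rel_vecs = [to_vector(p) for p in relevant]
    scored = [
        (max(soft_cosine(to_vector(c), r, sim) for r in rel_vecs), c)
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    relevant = [("company", "appoint", "person")]
    candidates = [("person", "resign"), ("firm", "name", "person")]
    for score, pattern in rank_candidates(relevant, candidates):
        print(f"{score:.2f}  {pattern}")
```

In this toy example, the candidate ("firm", "name", "person") shares only "person" with the relevant pattern, yet it scores about 0.91. It ranks first because `W` treats "firm"/"company" and "name"/"appoint" as near-synonyms. Passing an identity matrix as `sim` reduces the measure to the ordinary cosine, which gives the same pair only 1/3.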
Weakly supervised algorithms, which bootstrap from a small number of examples, have the advantage of requiring only small amounts of annotated data, which is often difficult and time-consuming to produce. However, this also means that there are fewer examples of the patterns to be learned, making the learning task more challenging. Providing the learning algorithm with access to additional knowledge can compensate for the limited number of annotated examples. This paper presents a novel weakly supervised ...