A Rote Extractor with Edit Distance-based Generalisation and Multi-corpora Precision Calculation

Enrique Alfonseca (1,2), Pablo Castells (1), Manabu Okumura (2), Maria Ruiz-Casado (1,2)

(1) Computer Science Department, Univ. Autonoma de Madrid
Enrique.Alfonseca@uam.es, Pablo.Castells@uam.es, Maria.Ruiz@uam.es

(2) Precision and Intelligence Laboratory, Tokyo Institute of Technology
enrique@lr.pi.titech.ac.jp, oku@pi.titech.ac.jp, maria@lr.pi.titech.ac.jp

Abstract

In this paper, we describe a rote extractor that learns patterns for finding semantic relationships in unrestricted text, with new procedures for pattern generalization and scoring. These include the use of part-of-speech tags to guide the generalization, Named Entity categories inside the patterns, an edit-distance-based pattern generalization algorithm, and a pattern accuracy calculation procedure based on evaluating the patterns on several test corpora. In an evaluation with 14 entities, the system attains a precision higher than 50% for half of the relationships considered.

1 Introduction

Recently, there has been increasing interest in automatically extracting structured information from large corpora and, in particular, from the Web (Craven et al., 1999). Because of the difficulty of collecting annotated data, several procedures have been described that can be trained on unannotated textual corpora (Riloff and Schmelzenbach, 1998; Soderland, 1999; Mann and Yarowsky, 2005).
An interesting approach is that of rote extractors (Brin, 1998; Agichtein and Gravano, 2000; Ravichandran and Hovy, 2002), which look for textual contexts that happen to convey a certain relationship between two concepts. In this paper, we describe some contributions to the training of rote extractors, including a procedure for generalizing the patterns and a more complex way of calculating their accuracy. We first introduce the general structure of a rote extractor and its limitations. Next, we describe the proposed modifications (Sections 2, 3 and 4) and the evaluation performed (Section 5).
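
To make the idea of edit-distance-based generalization concrete, the following is a minimal sketch and not the algorithm described in this paper: it aligns two tokenized patterns with plain word-level edit distance and replaces every position where they disagree with a wildcard. The function name `generalize`, the `HOOK` and `TARGET` placeholders for the two related concepts, and the `*` wildcard are all assumptions made for illustration. The sketch ignores part-of-speech tags and Named Entity categories, which the paper uses to guide the generalization.

```python
def generalize(p1, p2, wildcard="*"):
    """Merge two token lists into one pattern (illustrative sketch only).

    Tokens shared by the edit-distance alignment are kept; every
    substitution, insertion, or deletion becomes a wildcard.
    """
    n, m = len(p1), len(p2)
    # dist[i][j] = word-level edit distance between p1[:i] and p2[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if p1[i - 1] == p2[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a token of p1
                dist[i][j - 1] + 1,         # insert a token of p2
                dist[i - 1][j - 1] + cost,  # match or substitute
            )

    # Trace the alignment back from the bottom-right corner.
    merged = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and p1[i - 1] == p2[j - 1]:
            merged.append(p1[i - 1])  # shared token: keep it
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            merged.append(wildcard)   # substitution
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            merged.append(wildcard)   # deletion
            i -= 1
        else:
            merged.append(wildcard)   # insertion
            j -= 1
    merged.reverse()

    # Collapse runs of wildcards into a single wildcard.
    collapsed = []
    for tok in merged:
        if tok == wildcard and collapsed and collapsed[-1] == wildcard:
            continue
        collapsed.append(tok)
    return collapsed


a = ["HOOK", "was", "born", "in", "TARGET"]
b = ["HOOK", "was", "born", "during", "TARGET"]
print(" ".join(generalize(a, b)))  # HOOK was born * TARGET
```

In this sketch, a wildcard stands for any sequence of tokens, so two contexts that differ in a single word collapse into one more general pattern. Restricting what a wildcard may match is where information such as part-of-speech tags would come in.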