Proceedings of EACL '99

Full Text Parsing using Cascades of Rules: an Information Extraction Perspective

Fabio Ciravegna and Alberto Lavelli
ITC-irst Centro per la Ricerca Scientifica e Tecnologica
via Sommarive 18, 38050 Povo (TN), Italy
{cirave, lavelli}@irst.itc.it

Abstract

This paper proposes an approach to full parsing suitable for Information Extraction from texts. Sequences of cascades of rules deterministically analyze the text, building unambiguous structures. Initially, basic chunks are analyzed; then argumental relations are recognized; finally, modifier attachment is performed and the global parse tree is built. The approach was proven to work for three languages and different domains. It was implemented in the IE module of FACILE, a EU project for multilingual text classification and IE.

1 Introduction

Most successful approaches in IE (Appelt et al., 1993; Grishman, 1995; Aone et al., 1998) make very poor use of syntactic information. They are generally based on shallow parsing for the analysis of non-recursive NPs and Verbal Groups (VGs). After this step, regular patterns are applied in order to trigger primitive actions that fill templates; meta-rules are applied to the patterns to cope with different syntactic clausal forms (e.g., passive forms). If we consider the most complex MUC-7 task, i.e., the Scenario Template task (MUC7, 1998), the current technology is not able to provide results near an operational level: an F1 of 75 would be expected, but the best system scored about 50 (Aone et al., 1998). One of the limitations of the current technology is the inability to extract and represent syntactic relations among elements in the sentence, i.e., grammatical functions and thematic roles. Scenario Template recognition needs the correct treatment of syntactic relations at both sentence and text level (Aone et al., 1998). Full parsing systems are generally able to model syntactic relations correctly, but they tend to be slow (because of huge search spaces) and brittle (because of gaps in the grammar). The use