Enriching the Output of a Parser Using Memory-Based Learning

Valentin Jijkoun and Maarten de Rijke
Informatics Institute, University of Amsterdam
{jijkoun,mdr}@science.uva.nl

Abstract

We describe a method for enriching the output of a parser with information available in a corpus. The method is based on graph rewriting using memory-based learning, applied to dependency structures. This general framework allows us to accurately recover both grammatical and semantic information as well as non-local dependencies. It also facilitates dependency-based evaluation of phrase structure parsers. Our method is largely independent of the choice of parser and corpus, and shows state-of-the-art performance.

1 Introduction

We describe a method to automatically enrich the output of parsers with information that is present in existing treebanks but usually not produced by the parsers themselves. Our motivation is two-fold. First, and most important, for applications requiring information extraction or semantic interpretation of text, it is desirable to have parsers produce grammatically and semantically rich output. Second, to facilitate dependency-based comparison and evaluation of different parsers, their outputs may need to be transformed into specific, rich dependency formalisms. The method allows us to automatically transform the output of a parser into structures as they are annotated in a dependency treebank.
For a phrase structure parser, we first convert the produced phrase structures into dependency graphs in a straightforward way, and then apply a sequence of graph transformations: changing dependency labels, adding new nodes, and adding new dependencies. A memory-based learner trained on a dependency corpus is used to detect which modifications should be performed. For a dependency corpus derived from the Penn Treebank and the parsers we considered, these transformations correspond to adding Penn functional tags (e.g., -SBJ, -TMP, -LOC), empty nodes (e.g., NP PRO), and non-local dependencies.
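To make the first kind of transformation (changing dependency labels) concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that memory-based learning can be approximated by a nearest-neighbour classifier with a simple overlap distance. The `Edge` type, the three-element feature set, and the toy training examples are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One dependency edge: head word, dependent word, and label."""
    head: str
    dep: str
    label: str  # e.g. "NP" from the parser, "NP-SBJ" in the treebank


def features(edge):
    """Hypothetical feature vector for an edge (illustration only)."""
    return (edge.head.lower(), edge.dep.lower(), edge.label)


class MemoryBasedLearner:
    """Stores all training examples and classifies by nearest neighbours."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (feature tuple, target label)

    def train(self, examples):
        self.memory.extend(examples)

    def predict(self, feats):
        # Overlap distance: number of feature positions that differ.
        def distance(stored):
            return sum(a != b for a, b in zip(stored, feats))

        nearest = sorted(self.memory, key=lambda ex: distance(ex[0]))[: self.k]
        votes = Counter(target for _, target in nearest)
        return votes.most_common(1)[0][0]


def relabel(graph, learner):
    """Graph transformation: replace each edge label with the predicted one."""
    return [Edge(e.head, e.dep, learner.predict(features(e))) for e in graph]


# Toy training data: parser-style edge features paired with treebank labels.
learner = MemoryBasedLearner(k=1)
learner.train([
    (("saw", "john", "NP"), "NP-SBJ"),
    (("saw", "yesterday", "NP"), "NP-TMP"),
    (("saw", "mary", "NP"), "NP"),
    (("lives", "amsterdam", "PP"), "PP-LOC"),
])

# Dependency edges as a parser might produce them, without functional tags.
parser_output = [Edge("saw", "John", "NP"), Edge("saw", "yesterday", "NP")]
enriched = relabel(parser_output, learner)
print([e.label for e in enriched])
```

Adding new nodes and adding new dependencies could be framed in the same way, with the learner deciding for each candidate position whether an insertion should be made. That extension is not shown here.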