Proceedings of EACL '99

Cascaded Markov Models

Thorsten Brants
Universität des Saarlandes, Computerlinguistik
D-66041 Saarbrücken, Germany
thorsten@coli.uni-sb.de

Abstract

This paper presents a new approach to partial parsing of context-free structures. The approach is based on Markov Models. Each layer of the resulting structure is represented by its own Markov Model, and the output of a lower layer is passed as input to the next higher layer. An empirical evaluation of the method yields very good results for NP/PP chunking of German newspaper texts.

1 Introduction

Partial parsing, often referred to as chunking, is used as a pre-processing step before deep analysis or as shallow processing for applications like information retrieval, message extraction, and text summarization. Chunking concentrates on constructs that can be recognized with a high degree of certainty. For several applications, this type of information with high accuracy is more valuable than deep analysis with lower accuracy.

We will present a new approach to partial parsing that uses Markov Models. The presented models are extensions of the part-of-speech tagging technique and are capable of emitting structure. They utilize context-free grammar rules and add left-to-right transitional context information. This type of model is used to facilitate the syntactic annotation of the NEGRA corpus of German newspaper texts (Skut et al., 1997).

Part-of-speech tagging is the assignment of syntactic categories (tags) to words that occur in the processed text. Among others, this task is efficiently solved with Markov Models.
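As a rough illustration of tagging with a Markov Model, the sketch below runs the Viterbi algorithm over a bigram model in which states are tags and outputs are words. It is a minimal sketch and not the paper's implementation: the function name `viterbi`, the `floor` smoothing constant, the tag names (written in the style of the STTS tagset), and all probabilities are assumptions invented for this example.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram Markov Model.

    States are tags, outputs are words. Probabilities missing from the tables
    are replaced by `floor` (an assumed, very crude form of smoothing).
    """
    if not words:
        return []

    def lp(p):
        # Log probability with a floor so that log(0) never occurs.
        return math.log(max(p, floor))

    # best[i][t] = (log probability of the best path ending in tag t at word i,
    #               the previous tag on that path)
    best = [{t: (lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + lp(trans_p[p].get(t, 0.0)), p) for p in tags
            )
            column[t] = (score + lp(emit_p[t].get(words[i], 0.0)), prev)
        best.append(column)

    # Follow the back-pointers from the best final state.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Toy model: every number below is made up for illustration only.
TAGS = ["ART", "NN", "VVFIN"]
START = {"ART": 0.6, "NN": 0.3, "VVFIN": 0.1}
TRANS = {
    "ART": {"ART": 0.05, "NN": 0.9, "VVFIN": 0.05},
    "NN": {"ART": 0.2, "NN": 0.2, "VVFIN": 0.6},
    "VVFIN": {"ART": 0.5, "NN": 0.4, "VVFIN": 0.1},
}
EMIT = {
    "ART": {"der": 0.5, "die": 0.5},
    "NN": {"Hund": 0.5, "Katze": 0.5},
    "VVFIN": {"bellt": 0.6, "sieht": 0.4},
}

print(viterbi(["der", "Hund", "bellt"], TAGS, START, TRANS, EMIT))
```

In practice the transition and output probabilities would be estimated from an annotated corpus rather than written by hand, and unknown words would need a more careful treatment than a constant floor.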
States of a Markov Model represent syntactic categories (or tuples of syntactic categories), and outputs represent words and punctuation (Church, 1988; DeRose, 1988; and others). This technique of statistical part-of-speech tagging operates very successfully, and accuracy rates between 96 and 97% are usually reported for new, unseen text. Brants et al. (1997) showed that the technique of statistical tagging can be .