Combining Deep and Shallow Approaches in Parsing German

Michael Schiehlen
Institute for Computational Linguistics, University of Stuttgart
Azenbergstr. 12, D-70174 Stuttgart
mike@adler.ims.uni-stuttgart.de

Abstract

The paper describes two parsing schemes: a shallow approach based on machine learning and a cascaded finite-state parser with a hand-crafted grammar. It discusses several ways to combine them and presents evaluation results for the two individual approaches and their combination. An underspecification scheme for the output of the finite-state parser is introduced and shown to improve performance.

1 Introduction

In several areas of Natural Language Processing, a combination of different approaches has been found to give the best results. It is especially rewarding to combine deep and shallow systems, where the former guarantees interpretability and high precision and the latter provides robustness and high recall. This paper investigates such a combination, consisting of an n-gram based shallow parser and a cascaded finite-state parser¹ with a hand-crafted grammar and morphological checking. The respective strengths and weaknesses of these approaches are brought to light in an in-depth evaluation on a treebank of German newspaper texts (Skut et al., 1997) containing ca. 340,000 tokens in 19,546 sentences.
The evaluation format chosen (dependency tuples) is used as the common denominator of the systems in building a hybrid parser with improved performance. An underspecification scheme allows the finite-state parser to produce partially ambiguous output. It is shown that the other parser can in most cases successfully disambiguate such information.

¹ Although not everyone would agree that finite-state parsers constitute a "deep" approach to parsing, they are still knowledge-based, require grammar-writing effort and a complex linguistic lexicon, manage without training data, etc.

Section 2 discusses the evaluation format adopted (dependency structures), its advantages, but also some of its .