Automatic Measurement of Syntactic Development in Child Language

Kenji Sagae and Alon Lavie
Language Technologies Institute
Carnegie Mellon University
Pittsburgh, PA 15232
{sagae,alavie}@cs.cmu.edu

Brian MacWhinney
Department of Psychology
Carnegie Mellon University
Pittsburgh, PA 15232
macw@cmu.edu

Abstract

To facilitate the use of syntactic information in the study of child language acquisition, a coding scheme for Grammatical Relations (GRs) in transcripts of parent-child dialogs has been proposed by Sagae, MacWhinney and Lavie (2004). We discuss the use of current NLP techniques to produce the GRs in this annotation scheme. By using a statistical parser (Charniak, 2000) and memory-based learning tools for classification (Daelemans et al., 2004), we obtain high precision and recall of several GRs. We demonstrate the usefulness of this approach by performing automatic measurements of syntactic development with the Index of Productive Syntax (Scarborough, 1990) at similar levels to what child language researchers compute manually.

1 Introduction

Automatic syntactic analysis of natural language has benefited greatly from statistical and corpus-based approaches in the past decade. The availability of syntactically annotated data has fueled the development of high quality statistical parsers, which have had a large impact in several areas of human language technologies.
Similarly, in the study of child language, the availability of large amounts of electronically accessible empirical data in the form of child language transcripts has been shifting much of the research effort towards a corpus-based mentality. However, child language researchers have only recently begun to utilize modern NLP techniques for syntactic analysis. Although it is now common for researchers to rely on automatic morphosyntactic analyses of transcripts to obtain part-of-speech and morphological analyses, their use of syntactic parsing is rare. Sagae, MacWhinney and Lavie (2004) have proposed a syntactic annotation scheme