Proceedings of EACL '99

An experiment on the upper bound of interjudge agreement: the case of tagging

Atro Voutilainen
Research Unit for Multilingual Language Technology
P.O. Box 4, FIN-00014 University of Helsinki, Finland
Atro.Voutilainen@ling.Helsinki.FI

Abstract

We investigate the controversial issue about the upper bound of interjudge agreement in the use of a low-level grammatical representation. Pessimistic views suggest that several percent of words in running text are undecidable in terms of part-of-speech categories. Our experiments with 55kW data give reason for optimism: linguists with only 30 hours' training apply the EngCG-2 morphological tags with almost 100% interjudge agreement.

1 Orientation

Linguistic analysers are developed for assigning linguistic descriptions to linguistic utterances. Linguistic descriptions are based on a fixed inventory of descriptors plus their usage principles (in short, a grammatical representation) specified by linguists for the specific kind of analysis (e.g. morphological analysis, tagging, syntax, discourse structure) that the program should perform. Because automatic linguistic analysis is generally a very difficult problem, various methods for evaluating the success of such analysers have been used. One such method is based on the degree of correctness of the analysis provided, e.g. the percentage of linguistic tokens in the analysed text that receive the appropriate description, relative to analyses provided independently of the program by competent linguists, ideally not involved in the development of the analyser itself.

Now the use of benchmark corpora like this turns out to be problematic, because arguments have been made to the effect that linguists themselves make erroneous and inconsistent analyses. Unintentional mistakes, due e.g. to slips of attention, are obviously unavoidable, but these errors can largely be identified by the double-blind method: first by having two (or more) linguists analyse the same text independently by using the same grammatical representation, then identifying differences of analysis by automatically comparing the analysed text versions with each other, and finally ...
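The automatic comparison step of the double-blind method can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the procedure used in the experiments: the function name `compare_annotations`, the representation of an analysed text as a list of (word, tag) pairs, and the toy tags `DET`, `N`, `A` and `V` are all hypothetical and are not EngCG-2 tags. The sketch assumes both judges worked on the same tokenisation of the same text.

```python
from typing import List, Tuple

# An analysed text is assumed to be a list of (word, tag) pairs.
Tagged = List[Tuple[str, str]]
# A difference records: token index, word, tag from judge A, tag from judge B.
Difference = Tuple[int, str, str, str]


def compare_annotations(a: Tagged, b: Tagged) -> Tuple[float, List[Difference]]:
    """Compare two independent analyses of the same text.

    Returns the fraction of tokens on which the two judges chose the
    same tag, plus the list of tokens where their tags differ.
    """
    if len(a) != len(b):
        raise ValueError("the two analyses must cover the same tokens")
    if not a:
        raise ValueError("cannot compute agreement on an empty text")

    differences: List[Difference] = []
    for index, ((word_a, tag_a), (word_b, tag_b)) in enumerate(zip(a, b)):
        if word_a != word_b:
            raise ValueError(f"token mismatch at position {index}")
        if tag_a != tag_b:
            differences.append((index, word_a, tag_a, tag_b))

    agreement = (len(a) - len(differences)) / len(a)
    return agreement, differences


# Toy example with hypothetical tags: the judges disagree on "round".
judge_a = [("the", "DET"), ("round", "N"), ("table", "N"), ("stands", "V")]
judge_b = [("the", "DET"), ("round", "A"), ("table", "N"), ("stands", "V")]

agreement, differences = compare_annotations(judge_a, judge_b)
print(agreement)    # 0.75
print(differences)  # [(1, 'round', 'N', 'A')]
```

The list of differences, rather than the agreement figure alone, is what makes the method useful for error detection: each listed token is a candidate for either an unintentional mistake or a genuine disagreement about the analysis.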