Scientific report: "Methods for the Qualitative Evaluation of Lexical Association Measures"

Stefan Evert, IMS, University of Stuttgart, Azenbergstr. 12, D-70174 Stuttgart, Germany, evert@
Brigitte Krenn, Austrian Research Institute for Artificial Intelligence (OFAI), Schottengasse 3, A-1010 Vienna, Austria, brigitte@

Abstract

This paper presents methods for a qualitative, unbiased comparison of lexical association measures and the results we have obtained for adjective-noun pairs and preposition-noun-verb triples extracted from German corpora. In our approach, we compare the entire list of candidates, sorted according to the particular measures, to a reference set of manually identified "true positives". We also show how estimates for the very large number of hapax legomena and double occurrences can be inferred from random samples.

1 Introduction

In computational linguistics, a variety of statistical measures have been proposed for identifying lexical associations between words in lexical tuples extracted from text corpora. Methods used range from pure frequency counts to information-theoretic measures and statistical significance tests. While the mathematical properties of those measures have been extensively discussed (see, for instance, Manning and Schütze 1999, chapter 5; Kilgarriff 1996; and Pedersen 1996), the strategies employed for evaluating the identification results are far from adequate. Another crucial but still unsolved issue in statistical collocation identification is the treatment of low-frequency data.

In this paper, we first specify requirements for a qualitative evaluation of lexical association measures (AMs). Based on these requirements, we introduce an experimentation procedure and discuss the evaluation results for a number of widely used AMs. Finally, methods and strategies for handling low-frequency data are suggested. The measures: Mutual Information (MI; Church and Hanks 1989), the log-likelihood ratio test (Dunning 1993), two statistical tests: t-test ...
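
The comparison described in the abstract (sorting the full candidate list by each measure and checking it against a reference set of true positives) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy frequency counts, and the labels are invented, and the MI and t-score formulas are the common textbook approximations, not necessarily the exact definitions used by the authors.

```python
import math

def pointwise_mi(joint, f1, f2, n):
    """Pointwise mutual information (base 2): log2(observed / expected)."""
    expected = f1 * f2 / n
    return math.log2(joint / expected)

def t_score(joint, f1, f2, n):
    """Textbook t-score approximation: (observed - expected) / sqrt(observed)."""
    expected = f1 * f2 / n
    return (joint - expected) / math.sqrt(joint)

def precision_curve(candidates, score, true_positives):
    """Sort candidates by descending score; return the ranking and the
    precision of the n-best list for every n."""
    ranked = sorted(candidates, key=score, reverse=True)
    hits = 0
    curve = []
    for n, cand in enumerate(ranked, start=1):
        if cand in true_positives:
            hits += 1
        curve.append(hits / n)
    return ranked, curve

# Invented toy data: pair -> (joint frequency, frequency of word 1, frequency of word 2)
SAMPLE_SIZE = 1000
counts = {
    ("a", "x"): (20, 40, 50),
    ("b", "y"): (5, 10, 10),
    ("c", "z"): (30, 300, 200),
    ("d", "w"): (8, 100, 80),
}
# Hypothetical manual annotation of true positives
reference = {("a", "x"), ("d", "w")}

mi_ranked, mi_curve = precision_curve(
    counts, lambda c: pointwise_mi(*counts[c], SAMPLE_SIZE), reference)
t_ranked, t_curve = precision_curve(
    counts, lambda c: t_score(*counts[c], SAMPLE_SIZE), reference)

for name, ranked, curve in [("MI", mi_ranked, mi_curve), ("t", t_ranked, t_curve)]:
    print(name, ranked, [round(p, 2) for p in curve])
```

Because the whole ranked list is evaluated, the two measures can be compared at every cut-off n instead of at a single arbitrary n-best threshold. In this toy data, MI ranks the low-frequency pair first, while the t-score prefers the more frequent one.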
