Scientific report: "The Problem with Kappa"

The Problem with Kappa
David M W Powers
Centre for Knowledge Interaction Technology, CSEM, Flinders University

Abstract
It is becoming clear that traditional evaluation measures used in Computational Linguistics (including Error Rates, Accuracy, Recall, Precision and F-measure) are of limited value for unbiased evaluation of systems, and are not meaningful for comparison of algorithms unless both the dataset and algorithm parameters are strictly controlled for skew (Prevalence and Bias). The use of techniques originally designed for other purposes, in particular Receiver Operating Characteristics Area Under Curve, plus variants of Kappa, has been proposed to fill the void. This paper aims to clear up some of the confusion relating to evaluation by demonstrating that the usefulness of each evaluation method is highly dependent on the assumptions made about the distributions of the dataset and the underlying populations. The behaviour of a number of evaluation measures is compared under common assumptions. Deploying a system in a context which has the opposite skew from its validation set can be expected to approximately negate Fleiss Kappa and halve Cohen Kappa, but leave Powers Kappa unchanged. For most performance evaluation purposes the latter is thus most appropriate, whilst for comparison of behaviour Matthews Correlation is recommended.

Introduction
Research in Computational Linguistics usually requires some form of quantitative evaluation.
A number of traditional measures borrowed from Information Retrieval (Manning & Schütze 1999) are in common use, but there has been considerable critical evaluation of these measures themselves over the last decade or so (Entwisle & Powers 1998; Flach 2003; Ben-David 2008). Receiver Operating Analysis (ROC) has been advocated as an alternative by many, and in particular has been used by Fürnkranz and Flach (2005), Ben-David (2008) and Powers (2008) to better understand both learning algorithms and the relationship between the
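
To make the measures named in the abstract concrete, here is a minimal Python sketch that computes them from a 2x2 contingency table. It rests on assumptions that the excerpt does not state: "Powers Kappa" is read as Informedness (recall + inverse recall - 1), "Fleiss Kappa" is taken in its two-rater form (equivalent to Scott's pi), and the function name `chance_corrected` and the example counts are purely illustrative. It also assumes every row and column of the table is non-empty.

```python
from math import sqrt


def chance_corrected(tp, fn, fp, tn):
    """Chance-corrected measures for a 2x2 table (hypothetical helper)."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    prevalence = (tp + fn) / n  # proportion of real positives
    bias = (tp + fp) / n        # proportion of predicted positives

    # Cohen: expected agreement is the product of the two sets of marginals.
    pe_cohen = prevalence * bias + (1 - prevalence) * (1 - bias)
    # Fleiss, two-rater case (Scott's pi): marginals are averaged first.
    mean_pos = (prevalence + bias) / 2
    pe_fleiss = mean_pos ** 2 + (1 - mean_pos) ** 2

    recall = tp / (tp + fn)          # true positive rate
    inverse_recall = tn / (tn + fp)  # true negative rate

    return {
        "cohen": (accuracy - pe_cohen) / (1 - pe_cohen),
        "fleiss": (accuracy - pe_fleiss) / (1 - pe_fleiss),
        # Assumption: Informedness stands in for "Powers Kappa".
        "informedness": recall + inverse_recall - 1,
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Illustrative counts only: the same per-class rates at two prevalences.
balanced = chance_corrected(tp=40, fn=10, fp=5, tn=45)
skewed = chance_corrected(tp=400, fn=100, fp=5, tn=45)
for name in balanced:
    print(f"{name:>12}: balanced={balanced[name]:.3f} skewed={skewed[name]:.3f}")
```

Because Informedness depends only on the per-class rates, it is identical for the two tables, while the other measures shift with prevalence. This illustrates the kind of skew sensitivity the abstract refers to; it does not reproduce the paper's analysis.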
