
Evaluating language understanding accuracy with respect to objective outcomes in a dialogue system

Myroslava O. Dzikovska, Peter Bell, Amy Isard and Johanna D. Moore
Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, United Kingdom

Abstract

It is not always clear how the differences in intrinsic evaluation metrics for a parser or classifier will affect the performance of the system that uses it. We investigate the relationship between the intrinsic evaluation scores of an interpretation component in a tutorial dialogue system and the learning outcomes in an experiment with human users. Following the PARADISE methodology, we use multiple linear regression to build predictive models of learning gain, an important objective outcome metric in tutorial dialogue. We show that standard intrinsic metrics such as F-score alone do not predict the outcomes well. However, we can build predictive performance functions that account for up to 50% of the variance in learning gain by combining features based on standard evaluation scores and on the confusion matrix entries.
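The PARADISE-style performance function described in the abstract can be sketched as a multiple linear regression from interaction features to an outcome metric. The data and feature names below are purely illustrative, not the paper's actual feature set: each row pairs hypothetical interpreter scores (an overall F-score and two confusion-matrix-derived error rates) with an observed learning gain.

```python
import numpy as np

# Hypothetical per-dialogue features: [F-score, false-accept rate,
# false-reject rate]. These names are illustrative assumptions.
features = np.array([
    [0.72, 0.10, 0.05],
    [0.65, 0.18, 0.09],
    [0.80, 0.07, 0.03],
    [0.58, 0.22, 0.12],
    [0.75, 0.09, 0.06],
])
learning_gain = np.array([0.41, 0.28, 0.52, 0.19, 0.44])

# Add an intercept column and fit the regression by least squares.
X = np.hstack([np.ones((features.shape[0], 1)), features])
coefs, _, _, _ = np.linalg.lstsq(X, learning_gain, rcond=None)

# R^2: the fraction of variance in learning gain the model accounts for,
# the quantity the abstract reports (up to 50% in the paper's models).
predicted = X @ coefs
ss_res = np.sum((learning_gain - predicted) ** 2)
ss_tot = np.sum((learning_gain - learning_gain.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(coefs, r_squared)
```

With a real evaluation corpus, the feature matrix would be built from the intrinsic evaluation scores and confusion matrix entries of the interpretation component, and model selection would choose which features enter the regression.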
We argue that building such predictive models can help us better evaluate the performance of NLP components that cannot be distinguished based on F-score alone, and we illustrate our approach by comparing the current interpretation component in the system to a new classifier trained on the evaluation data.

1 Introduction

Much of the work in natural language processing relies on intrinsic evaluation: computing standard evaluation metrics, such as precision, recall and F-score, on the same data set to compare the performance of different approaches to the same NLP problem. However, once a component such as a parser is included in a larger system, it is not always clear that improvements in intrinsic evaluation scores will translate into improved overall system performance. Therefore, extrinsic or task-based evaluation can be used to
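The intrinsic metrics the introduction refers to are computed per label from confusion-matrix counts. A minimal sketch, with a toy label set that is an assumption for illustration (student answers classified as "correct", "partial" or "incorrect"):

```python
def confusion_and_f1(gold, pred, label):
    """Precision, recall and F-score for one label, together with the
    confusion-matrix counts (tp, fp, fn) they are computed from."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}

# Toy annotations (hypothetical), not data from the paper.
gold = ["correct", "partial", "incorrect", "correct", "partial"]
pred = ["correct", "incorrect", "incorrect", "correct", "partial"]
print(confusion_and_f1(gold, pred, "partial"))
```

The point of the paper is that the summary scores on the left of this dictionary (precision, recall, F1) can hide distinctions that the raw counts on the right still carry, which is why the predictive models combine both kinds of features.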
