Assessing the Effect of Inconsistent Assessors on Summarization Evaluation

Karolina Owczarzak
National Institute of Standards and Technology, Gaithersburg, MD 20899

Peter A. Rankel
University of Maryland, College Park, Maryland
rankel@

Hoa Trang Dang
National Institute of Standards and Technology, Gaithersburg, MD 20899

John M. Conroy
IDA Center for Computing Sciences, Bowie, Maryland
conroy@

Abstract

We investigate the consistency of human assessors involved in summarization evaluation in order to understand its effect on system ranking and on automatic evaluation techniques. Using Text Analysis Conference data, we measure annotator consistency based on human scoring of summaries for Responsiveness, Readability, and Pyramid scoring. We identify inconsistencies in the data and measure the extent to which these inconsistencies affect the ranking of automatic summarization systems. Finally, we examine the stability of the automatic metrics ROUGE and CLASSY with respect to the inconsistent assessments.

1 Introduction

Automatic summarization of documents is a research area that, unfortunately, depends on human feedback. Although attempts have been made to automate the evaluation of summaries, none is yet good enough to remove the need for human assessors. Human judgment of summaries, however, is not perfect either. We investigate two ways of measuring evaluation consistency in order to see what effect it has on summarization evaluation and on the training of automatic evaluation metrics.

2 Assessor consistency

In the Text Analysis Conference (TAC) Summarization track, participants are allowed to submit more than one run (usually two), and this option is often used to test different settings or versions of the same summarization system. When the system versions are not too divergent, they sometimes produce identical summaries for a given topic. Summaries are randomized within each topic before they are evaluated, so the identical copies are usually [...]
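The consistency check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the record layout, topic IDs, summary texts, and scores below are hypothetical stand-ins for the actual TAC distribution format. The idea is to flag pairs of identical summaries submitted for the same topic and count how often the assessor gave both the same Responsiveness score.

    from collections import defaultdict
    from itertools import combinations

    # Hypothetical records: (topic_id, run_id, summary_text, responsiveness).
    # Real TAC data uses different identifiers and file formats.
    records = [
        ("D1001", "run01", "The storm made landfall on Monday, causing ...", 4),
        ("D1001", "run02", "The storm made landfall on Monday, causing ...", 3),
        ("D1001", "run03", "A hurricane struck the coast, and officials ...", 5),
        ("D1002", "run01", "Parliament passed the bill after a long debate ...", 2),
        ("D1002", "run02", "Parliament passed the bill after a long debate ...", 2),
    ]

    def assessor_consistency(records):
        """Count score (dis)agreements over pairs of identical summaries
        submitted for the same topic."""
        by_topic = defaultdict(list)
        for topic, run, text, score in records:
            by_topic[topic].append((text.strip(), score))
        agree = disagree = 0
        for summaries in by_topic.values():
            for (t1, s1), (t2, s2) in combinations(summaries, 2):
                if t1 == t2:           # identical summaries ...
                    if s1 == s2:
                        agree += 1     # ... scored consistently
                    else:
                        disagree += 1  # ... scored inconsistently
        return agree, disagree

    agree, disagree = assessor_consistency(records)
    print(f"identical pairs: {agree + disagree}, consistent: {agree}")

On real TAC data, the same comparison could be run per assessor and per measure (Responsiveness, Readability, Pyramid) to quantify how often a single judge scores identical text differently.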
