Human Evaluation of a German Surface Realisation Ranker

Aoife Cahill, Institut für Maschinelle Sprachverarbeitung (IMS), University of Stuttgart, 70174 Stuttgart, Germany
Martin Forst, Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304, USA (mforst@)

Abstract

In this paper we present a human-based evaluation of surface realisation alternatives. We examine the relative rankings of naturally occurring corpus sentences and automatically generated strings chosen by statistical models (a language model and a log-linear model), as well as the naturalness of the strings chosen by the log-linear model. We also investigate to what extent preceding context has an effect on choice. We show that native speakers do accept quite some variation in word order, but there are also clearly factors that make certain realisation alternatives more natural.

1 Introduction

An important component of research on surface realisation, the task of generating strings for a given abstract representation, is evaluation, especially if we want to be able to compare across systems. There is consensus that exact match with respect to an actually observed corpus sentence is too strict a metric, and that BLEU score measured against corpus sentences can only give a rough impression of the quality of the system output.
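A log-linear realisation ranker of the kind evaluated here scores each candidate string as a weighted sum of feature values and selects the highest-scoring candidate. The sketch below illustrates this idea only; the feature names, weights, and example candidates are hypothetical and not taken from the paper:

```python
def loglinear_score(features, weights):
    """Score a candidate as the dot product of its feature vector and the weight vector."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def rank_candidates(candidates, weights):
    """Return candidates sorted best-first by log-linear score."""
    return sorted(candidates,
                  key=lambda c: loglinear_score(c["features"], weights),
                  reverse=True)

# Illustrative feature weights (hypothetical, not from the paper).
weights = {"lm_logprob": 1.0,
           "subject_before_object": 0.5,
           "heavy_constituent_final": 0.3}

# Two word-order variants of the same input, with hypothetical feature values.
candidates = [
    {"string": "Den Hund beißt der Mann.",
     "features": {"lm_logprob": -14.2, "subject_before_object": 0.0,
                  "heavy_constituent_final": 1.0}},
    {"string": "Der Mann beißt den Hund.",
     "features": {"lm_logprob": -12.8, "subject_before_object": 1.0,
                  "heavy_constituent_final": 1.0}},
]

best = rank_candidates(candidates, weights)[0]["string"]
```

With these weights the ranker prefers the subject-initial variant, mirroring the paper's point that some orders are more natural than others even when all are grammatical.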
It is unclear, however, what kind of metric would be most suitable for the evaluation of string realisations, with the result that a range of automatic metrics have been applied, including, inter alia, exact match, string edit distance, NIST SSA, BLEU, NIST, ROUGE, generation string accuracy, generation tree accuracy and word accuracy (Bangalore et al., 2000; Callaway, 2003; Nakanishi et al., 2005; Velldal and Oepen, 2006; Belz and Reiter, 2006). It is not always clear how appropriate these metrics are, especially at the level of individual sentences. Using automatic evaluation metrics cannot be avoided, but ideally a metric for the evaluation of realisation rankers would rank …
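Two of the simplest string-based metrics listed above, exact match and edit-distance-based string accuracy, can be sketched as follows. This is a minimal word-level illustration with made-up German word-order variants; published metrics such as SSA differ in their exact normalisation:

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all remaining reference tokens
    for j in range(n + 1):
        d[0][j] = j          # insert all remaining hypothesis tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match / substitution
    return d[m][n]

def exact_match(ref, hyp):
    """Strictest metric: the realisation must equal the corpus sentence."""
    return ref == hyp

def string_accuracy(ref, hyp):
    """1 minus length-normalised edit distance; 1.0 means identical."""
    return 1.0 - edit_distance(ref, hyp) / max(len(ref), 1)

# Two grammatical German word-order variants of the same sentence.
ref = "der Mann gibt dem Kind den Ball".split()
hyp = "dem Kind gibt der Mann den Ball".split()
```

Here `exact_match` rejects the variant outright while `string_accuracy` gives it partial credit, which illustrates why such metrics are problematic for ranking realisations that are merely reordered, not wrong.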
