Scientific Paper: "Comparing Automatic and Human Evaluation of NLG Systems"

Anja Belz, Natural Language Technology Group, CMIS, University of Brighton, UK
Ehud Reiter, Dept of Computing Science, University of Aberdeen, UK (ereiter@)

Abstract

We consider the evaluation problem in Natural Language Generation (NLG) and present results for evaluating several NLG systems with similar functionality, including a knowledge-based generator and several statistical systems. We compare evaluation results for these systems by human domain experts, human non-experts, and several automatic evaluation metrics, including NIST, BLEU, and ROUGE. We find that NIST scores correlate best with human judgments, but that all automatic metrics we examined are biased in favour of generators that select on the basis of frequency alone. We conclude that automatic evaluation of NLG systems has considerable potential, in particular where high-quality reference texts and only a small number of human evaluators are available. However, in general it is probably best for automatic evaluations to be supported by human-based evaluations, or at least by studies that demonstrate that a particular metric correlates well with human judgments in a given domain.

1 Introduction

Evaluation is becoming an increasingly important topic in Natural Language Generation (NLG), as in other fields of computational linguistics. Some NLG researchers are impressed by the success of the BLEU evaluation metric (Papineni et al., 2002) in Machine Translation (MT), which has transformed the MT field by allowing researchers to quickly and cheaply evaluate the impact of new ideas, algorithms, and data sets. BLEU and related metrics work by comparing the output of an MT system to a set of reference (gold-standard) translations, and in principle this kind of evaluation could be done with NLG systems as well. Indeed, NLG researchers are already starting to use BLEU (Habash, 2004; Belz, 2005) in their evaluations, as this is much cheaper and easier to…
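As a concrete illustration of the n-gram comparison described above, the following is a minimal, self-contained Python sketch of a BLEU-style score. It is a deliberate simplification for illustration only (a single reference, uniform n-gram weights, no smoothing) rather than the official BLEU implementation of Papineni et al. (2002), and the example sentences are invented.

    # Minimal sketch of BLEU-style scoring: clipped n-gram precision
    # against a reference text, combined with a brevity penalty.
    # Simplified for illustration: one reference, no smoothing.
    import math
    from collections import Counter

    def ngrams(tokens, n):
        """All contiguous n-grams of a token list."""
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def bleu(candidate, reference, max_n=4):
        """Geometric mean of clipped 1..max_n-gram precisions, times a brevity penalty."""
        cand, ref = candidate.split(), reference.split()
        log_precision_sum = 0.0
        for n in range(1, max_n + 1):
            cand_counts = Counter(ngrams(cand, n))
            ref_counts = Counter(ngrams(ref, n))
            # Clip each candidate n-gram count by its count in the reference,
            # so repeating a matching n-gram cannot inflate the score.
            matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
            total = sum(cand_counts.values())
            if matches == 0:
                return 0.0  # no overlap at this n-gram order (unsmoothed BLEU)
            log_precision_sum += math.log(matches / total)
        # Brevity penalty discourages very short candidates that would
        # otherwise achieve high precision trivially.
        bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
        return bp * math.exp(log_precision_sum / max_n)

    print(bleu("the cat sat on the mat", "the cat sat on a mat"))  # ~0.54

In a study like the one described here, such metric scores for each generator's outputs would then be set against human expert and non-expert ratings, e.g. via correlation coefficients, to test how well the automatic metric tracks human judgment.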
