Scientific report: "Phrase-based Statistical Language Generation using Graphical Models and Active Learning"

François Mairesse, Milica Gašić, Filip Jurčíček, Simon Keizer, Blaise Thomson, Kai Yu and Steve Young
Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, UK
mg436, fj228, sk561, brmt2, ky219, sjy @

Abstract

Most previous work on trainable language generation has focused on two paradigms: (a) using a statistical model to rank a set of generated utterances, or (b) using statistics to inform the generation decision process. Both approaches rely on the existence of a handcrafted generator, which limits their scalability to new domains. This paper presents BAGEL, a statistical language generator which uses dynamic Bayesian networks to learn from semantically-aligned data produced by 42 untrained annotators. A human evaluation shows that BAGEL can generate natural and informative utterances from unseen inputs in the information presentation domain. Additionally, generation performance on sparse datasets is improved significantly by using certainty-based active learning, yielding ratings close to the human gold standard with a fraction of the data.

1 Introduction

The field of natural language generation (NLG) is one of the last areas of computational linguistics to embrace statistical methods. Over the past decade, statistical NLG has followed two lines of research.
The first one, pioneered by Langkilde and Knight (1998), introduces statistics in the generation process by training a model which reranks candidate outputs of a handcrafted generator. While their HALOGEN system uses an n-gram language model trained on news articles, other systems have used hierarchical syntactic models (Bangalore and Rambow, 2000), models trained on user ratings of utterance quality (Walker et al. ...

This research was partly funded by the UK EPSRC under grant agreement EP/F013930/1 and funded by the EU FP7 Programme under grant agreement 216594 (CLASSiC project).
