Scientific report: "Learning Effective Multimodal Dialogue Strategies from Wizard-of-Oz data: Bootstrapping and Evaluation"

Verena Rieser, School of Informatics, University of Edinburgh, Edinburgh, EH8 9LW, GB, vrieser@
Oliver Lemon, School of Informatics, University of Edinburgh, Edinburgh, EH8 9LW, GB, olemon@

Abstract

We address two problems in the field of automatic optimization of dialogue strategies: learning effective dialogue strategies when no initial data or system exists, and evaluating the result with real users. We use Reinforcement Learning (RL) to learn multimodal dialogue strategies by interaction with a simulated environment which is "bootstrapped" from small amounts of Wizard-of-Oz (WOZ) data. This use of WOZ data allows development of optimal strategies for domains where no working prototype is available. We compare the RL-based strategy against a supervised strategy which mimics the wizards' policies. This comparison allows us to measure relative improvement over the training data. Our results show that RL significantly outperforms Supervised Learning when interacting in simulation as well as for interactions with real users. The RL-based policy gains on average 50 times more reward when tested in simulation, and almost 18 times more reward when interacting with real users. Users also subjectively rate the RL-based policy on average 10% higher.

1 Introduction

Designing a spoken dialogue system is a time-consuming and challenging task. A developer may spend a lot of time and effort anticipating the potential needs of a specific application environment and then deciding on the most appropriate system action (e.g. confirm, present items, ...). One of the key advantages of statistical optimisation methods, such as Reinforcement Learning (RL), for dialogue strategy design is that the problem can be formulated as a principled mathematical model which can be automatically trained on real data (Lemon and Pietquin, 2007; Frampton and Lemon, to appear). In cases where a system is designed
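
As a rough illustration of the idea stated in the abstract (learning a strategy by RL through interaction with a simulated environment estimated from a small amount of WOZ data), the following Python sketch may help. It is not the authors' system. The counts in `WOZ_COUNTS`, the state space (number of filled slots), the two actions (`ask`, `present`), the reward values, and all function names are assumptions invented for this example, and plain tabular Q-learning stands in for whatever RL algorithm the paper uses.

```python
import random

# Hypothetical WOZ counts, invented for illustration only: for each number of
# filled slots, (accepted, rejected) outcomes of a wizard's presentation.
WOZ_COUNTS = {0: (1, 9), 1: (2, 8), 2: (4, 6), 3: (9, 1)}
MAX_SLOTS = 3


def bootstrap_user_model(counts):
    """Estimate P(accept | filled slots) from counts, with add-one smoothing."""
    return {k: (acc + 1) / (acc + rej + 2) for k, (acc, rej) in counts.items()}


def available_actions(state):
    """Once every slot is filled, the only remaining move is to present."""
    return ("present",) if state == MAX_SLOTS else ("ask", "present")


def step(state, action, p_accept, rng):
    """Simulated environment: returns (next_state, reward, done)."""
    if action == "ask":
        return state + 1, -1.0, False  # assumed cost of one extra turn
    accepted = rng.random() < p_accept[state]
    return state, (20.0 if accepted else -20.0), True  # assumed final reward


def q_learning(p_accept, episodes=20000, alpha=0.01, gamma=1.0,
               epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration in the simulation."""
    rng = random.Random(seed)
    q = {(s, a): 0.0
         for s in range(MAX_SLOTS + 1) for a in available_actions(s)}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            actions = available_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action, p_accept, rng)
            future = 0.0 if done else max(
                q[(next_state, a)] for a in available_actions(next_state))
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return {s: max(available_actions(s), key=lambda a: q[(s, a)])
            for s in range(MAX_SLOTS + 1)}


if __name__ == "__main__":
    model = bootstrap_user_model(WOZ_COUNTS)
    print(greedy_policy(q_learning(model)))
```

In this toy setting the simulated user becomes more likely to accept a presentation as more slots are filled, so the learner has to trade the per-turn cost of asking against the risk of presenting too early. A supervised baseline in the same spirit would simply copy the most frequent wizard action per state rather than optimising the reward.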
