Movie-DiC: a Movie Dialogue Corpus for Research and Development

Rafael E. Banchs
Human Language Technology, Institute for Infocomm Research
Singapore 138632
rembanchs@

Abstract

This paper describes Movie-DiC, a Movie Dialogue Corpus recently collected for research and development purposes. The collected dataset comprises 132,229 dialogues containing a total of 764,146 turns that have been extracted from 753 movies. Details on how the data collection has been created and how it is structured are provided, along with its main statistics and characteristics.

1 Introduction

Data-driven applications have proliferated in Computational Linguistics during the last decade. Several factors have contributed significantly to this proliferation: the availability of more powerful computers, almost unlimited storage capacity, the availability of large volumes of data in digital format, and recent advances in machine learning theory. Among the many applications that have benefited from this data-driven boom, probably the most representative examples are information retrieval (Qin et al., 2008), machine translation (Brown et al., 1993), question answering (Molla-Aliod and Vicedo, 2010), and dialogue systems (Rieser and Lemon, 2011). In the specific case of dialogue systems, data acquisition can pose challenges depending on the domain and task the dialogue system is targeted for.
In some domains in which human-human dialogue applications already exist, data collection is generally straightforward, while in other cases data design and collection can constitute a complex problem (Williams and Young, 2003; Zue, 2007; Misu et al., 2009). Depending on the objective being pursued, dialogue systems can be grouped into two major categories: task-oriented and chat-oriented systems. In the first case, the system is required to help the user accomplish a specific goal or objective (Busemann et al., 1997; Stallard, 2000). In the second case, the system …
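The totals reported in the abstract (753 movies, 132,229 dialogues, 764,146 turns) imply a few simple averages that help characterise the corpus, for instance that a dialogue contains fewer than six turns on average. The following is a minimal sketch of that arithmetic, not part of the paper: it only divides the reported totals, the constant names and the function `corpus_averages` are hypothetical, and the results are plain means, so they say nothing about how dialogue lengths are actually distributed.

```python
# Corpus totals as reported in the Movie-DiC abstract.
NUM_MOVIES = 753
NUM_DIALOGUES = 132_229
NUM_TURNS = 764_146


def corpus_averages(movies: int, dialogues: int, turns: int) -> dict[str, float]:
    """Return simple per-unit means derived from corpus totals (hypothetical helper)."""
    return {
        "turns_per_dialogue": turns / dialogues,
        "dialogues_per_movie": dialogues / movies,
        "turns_per_movie": turns / movies,
    }


if __name__ == "__main__":
    averages = corpus_averages(NUM_MOVIES, NUM_DIALOGUES, NUM_TURNS)
    for name, value in averages.items():
        print(f"{name}: {value:.2f}")
```

Running it prints roughly 5.78 turns per dialogue, 175.60 dialogues per movie, and 1014.80 turns per movie.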
