Revised for the International Journal of Speech Technology, January 2, 2004

Automatic User-Adaptive Speaking Rate Selection

NIGEL WARD and SATOSHI NAKAGAWA
University of Tokyo¹

Abstract: Today there are many services which provide information over the phone using a prerecorded or synthesized voice. These voices are invariant in speed. Humans giving information over the telephone, however, tend to adapt the speed of their presentation to suit the needs of the listener. This paper presents a preliminary model of this adaptation. In a corpus of simulated directory assistance dialogs, the operator’s speed in number-giving correlates with the speed of the user’s initial response and with the user’s speaking rate. Multiple regression gives a formula which predicts appropriate speaking rates, and these predictions correlate (.46) with the speeds observed in good dialogs in the corpus. It is therefore easy, at least in principle, to make systems which adapt their speed to users’ needs.

Keywords: rate, speed, pace, adaptation, number-giving

1 Introduction

Many commercial telephone dialogs include an information delivery phase, in which the system gives the user information such as a time, a price, a password, directions, a confirmation number, etc.
As far as we know, all IVR and spoken dialog systems today provide information either by playing back a fixed, prerecorded voice, or by using a synthesized voice generated with fixed parameters. Information delivered at a single speed, invariant across users, will be too fast for some users, such as non-native speakers, children, and people in noisy environments, and too slow for others, such as business people in a hurry. There is a time cost either way: if the speed is too slow there is a clear loss in user time, system time, and connection time; if the speed is too fast there is again a time loss as the user waits for a repetition.

¹ Ward is currently at the University of Texas at El Paso. Nakagawa is currently at IBM Japan.
