Unification-based Multimodal Integration

Michael Johnston, Philip R. Cohen, David McGee, Sharon L. Oviatt, James A. Pittman, Ira Smith
Center for Human Computer Communication
Department of Computer Science and Engineering
Oregon Graduate Institute
PO BOX 91000, Portland, OR 97291, USA
{johnston, pcohen, dmcgee, oviatt, jay, ira}@cse.ogi.edu

Abstract

Recent empirical research has shown conclusive advantages of multimodal interaction over speech-only interaction for map-based tasks. This paper describes a multimodal language processing architecture which supports interfaces allowing simultaneous input from speech and gesture recognition. Integration of spoken and gestural input is driven by unification of typed feature structures representing the semantic contributions of the different modes. This integration method allows the component modalities to mutually compensate for each other's errors. It is implemented in QuickSet, a multimodal pen/voice system that enables users to set up and control distributed interactive simulations.

1 Introduction

By providing a number of channels through which information may pass between user and computer, multimodal interfaces promise to significantly increase the bandwidth and fluidity of the interface between humans and machines. In this work we are concerned with the addition of multimodal input to the interface. In particular, we focus on interfaces which support simultaneous input from speech and pen, utilizing speech recognition and recognition of gestures and drawings made with a pen on a complex visual display, such as a map.

Our focus on multimodal interfaces is motivated in part by the trend toward portable computing devices for which complex graphical user interfaces are infeasible. For such devices, speech and gesture will be the primary means of user input. Recent empirical results (Oviatt 1996) demonstrate clear task performance and user preference advantages for multimodal interfaces over speech-only interfaces, in particular for spatial
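The abstract's unification-based integration can be illustrated with a minimal sketch. The paper gives no code, so the Python fragment below is only a simplified illustration: the feature names, example command, and coordinate values are hypothetical, feature structures are modeled as plain nested dictionaries, and the type hierarchy that "typed" feature structures carry is omitted. It unifies a partial interpretation from speech, which leaves the location unspecified, with one from a pen gesture that supplies only a map coordinate.

def unify(a, b):
    # Recursively unify two feature structures represented as nested dicts.
    # Returns the merged structure, or None if any shared feature conflicts.
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for feature, value in b.items():
            if feature in merged:
                sub = unify(merged[feature], value)
                if sub is None:
                    return None          # conflicting values: unification fails
                merged[feature] = sub
            else:
                merged[feature] = value  # feature contributed by only one mode
        return merged
    return a if a == b else None         # atomic values must be identical

# Hypothetical partial interpretations: speech supplies the action and object,
# the pen gesture supplies the location that the speech left unspecified.
speech = {"command": "create_unit", "object": {"type": "medical_company"}}
gesture = {"command": "create_unit", "location": {"xcoord": 120, "ycoord": 305}}

print(unify(speech, gesture))
# -> {'command': 'create_unit', 'object': {'type': 'medical_company'},
#     'location': {'xcoord': 120, 'ycoord': 305}}

In this sketch, structures that assign conflicting values to a shared feature simply fail to unify, which is one way to read the abstract's claim that the modes can rule out implausible pairings and so compensate for each other's recognition errors.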