Spoken language user interfaces can dramatically speed up computer use. Unfortunately, if the speech user interface interferes too often, the user turns it off. Users are unforgiving: a technology that impairs productivity just once may never get a second chance. To give the user interface a fighting chance, why not endow it with a certain amount of emotional sensitivity? Users respond better to an avatar that displays appropriate emotional nuance; conversely, if the avatar detects extreme frustration on the part of the user, it can hide in the corner of the monitor until the frustration has passed. A hidden avatar is still present and can continue to be of service to the user upon request. This paper describes experiments in emotive spoken language user interfaces. We find that both recognition accuracy and synthesis quality are improved when one takes a multimodal approach, synthesizing and recognizing information in both the audio and video modalities.
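As a rough illustration of why combining modalities can improve recognition accuracy, consider late fusion, a common baseline in audiovisual recognition (this is an assumed sketch, not the paper's actual method): per-class scores from independent audio and video recognizers are combined with a modality weight, so that a confident video cue (e.g. lip shape) can override a noisy audio cue. All scores and labels below are hypothetical.

```python
# Illustrative sketch of late fusion for audiovisual recognition.
# Not the paper's method: scores, labels, and weights are invented.

def fuse_scores(audio_scores, video_scores, audio_weight=0.7):
    """Weighted combination of per-class log-scores from two modalities."""
    assert audio_scores.keys() == video_scores.keys()
    return {
        label: audio_weight * audio_scores[label]
               + (1.0 - audio_weight) * video_scores[label]
        for label in audio_scores
    }

def recognize(audio_scores, video_scores, audio_weight=0.7):
    """Pick the class with the highest fused score."""
    fused = fuse_scores(audio_scores, video_scores, audio_weight)
    return max(fused, key=fused.get)

# Hypothetical log-scores for two candidate words: the audio channel is
# noisy and ambiguous, but the video channel (lip-reading) disambiguates.
audio = {"bat": -1.1, "pat": -1.0}   # audio slightly prefers "pat"
video = {"bat": -0.2, "pat": -2.5}   # video strongly prefers "bat"
print(recognize(audio, video))       # → bat
```

With the weights above, the fused score for "bat" (-0.83) beats "pat" (-1.45), so a strong visual cue corrects an ambiguous acoustic one; with audio alone, the recognizer would have chosen wrongly.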