Recognition errors make recognition-based systems brittle and lead to usability problems. Multimodal interaction is widely regarded as an effective means of avoiding and recovering from such errors. This work explores how to combine gaze and speech, two error-prone modes, into a robust multimodal architecture. Combining the two overcomes the imperfections of individual recognition techniques, compensates for the weaknesses of a single mode, resolves linguistic ambiguity, and yields a substantially more effective system. In addition, we propose a new performance criterion for error-handling ability with which to analyze and assess multimodal integration strategies. With this measure, we not only demonstrate the benefits of mutual disambiguation between the individual input signals within the multimodal architecture, but also identify the conditions under which the multimodal system is most effective.
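To make the idea of mutual disambiguation concrete, the following is a minimal, hypothetical sketch of one common fusion scheme: re-ranking a speech recognizer's n-best hypotheses by gaze evidence. The function name, score values, and the simple multiplicative combination are illustrative assumptions, not the paper's actual algorithm or data.

```python
# Hypothetical sketch of mutual disambiguation: re-rank a speech
# recognizer's n-best list using gaze dwell evidence. All names and
# scores here are illustrative assumptions.

def disambiguate(speech_nbest, gaze_scores):
    """Combine per-hypothesis speech confidences with the probability
    that the user's gaze dwelled on each hypothesized referent.

    speech_nbest: list of (referent, confidence) pairs from the recognizer.
    gaze_scores:  dict mapping referent -> gaze dwell probability.
    Returns the hypotheses re-ranked by the joint score.
    """
    return sorted(
        ((ref, conf * gaze_scores.get(ref, 0.0))  # joint score
         for ref, conf in speech_nbest),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Example: speech alone slightly prefers "file", but strong gaze
# evidence on the folder icon promotes the competing hypothesis.
nbest = [("file", 0.55), ("folder", 0.45)]
gaze = {"file": 0.2, "folder": 0.8}
print(disambiguate(nbest, gaze))  # "folder" now ranks first
```

In this toy example each mode's top choice is wrong or uncertain on its own, but their combination recovers the intended referent, which is the error-compensation effect the abstract describes.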