Skip to Main Content
The capability to coordinate and interrelate speech and vision is a virtual prerequisite for adaptive, cooperative, and flexible interaction among people. It is therefore fair to assume that human-machine interaction, too, would benefit from intelligent interfaces for integrated speech and image processing. We first sketch an interactive system that integrates automatic speech processing with image understanding. Then, we concentrate on performance assessment which we believe is an emerging key issue in multimodal interaction. We explain the benefit of time scale analysis and usability studies and evaluate our system accordingly.