This paper presents a system that animates 3D facial agents by combining real-time facial expression analysis with facial expression synthesis and text-to-speech. The system combines visual, auditory, and primary interfaces to deliver a single coherent multimodal chat experience. Users represent themselves with agents selected from a predefined set. When a user shows a particular expression while typing a message, the 3D agent at the receiving end speaks the message aloud, replays the recognized facial expression sequence, and augments the synthesized voice with appropriate emotional content. Because the visual data exchanged is the MPEG-4 high-level Facial Animation Parameter for facial expressions (FAP 2) rather than real-time video, the method requires very low bandwidth.
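To make the bandwidth argument concrete, the following is a minimal sketch of how one chat message with an attached expression might be serialized. It is not the system's actual wire format: the function names, the 4-byte header layout, and the numeric expression codes are assumptions made for illustration, and the real MPEG-4 FAP 2 bitstream syntax is defined by the standard rather than by this code. The expression labels follow the six primary expressions that FAP 2 covers.

```python
import struct

# Labels follow the six primary expressions covered by MPEG-4 FAP 2.
# The numeric codes and the byte layout are illustrative assumptions,
# not the bitstream syntax defined by the standard.
EXPRESSIONS = {
    "joy": 1, "sadness": 2, "anger": 3,
    "fear": 4, "disgust": 5, "surprise": 6,
}
CODE_TO_NAME = {code: name for name, code in EXPRESSIONS.items()}


def encode_chat_message(expression: str, intensity: int, text: str) -> bytes:
    """Pack an expression code, an intensity (0-63) and UTF-8 text into bytes."""
    if not 0 <= intensity <= 63:
        raise ValueError("intensity must be in 0..63")
    payload = text.encode("utf-8")
    if len(payload) > 0xFFFF:
        raise ValueError("text too long for a 2-byte length field")
    # Header: 1 byte expression, 1 byte intensity, 2 bytes big-endian text length.
    header = struct.pack("!BBH", EXPRESSIONS[expression], intensity, len(payload))
    return header + payload


def decode_chat_message(packet: bytes) -> tuple[str, int, str]:
    """Inverse of encode_chat_message."""
    code, intensity, length = struct.unpack("!BBH", packet[:4])
    text = packet[4:4 + length].decode("utf-8")
    return CODE_TO_NAME[code], intensity, text


packet = encode_chat_message("joy", 40, "See you tomorrow!")
print(len(packet))  # 21 bytes: 4-byte header + 17 bytes of text
print(decode_chat_message(packet))
```

For comparison, a single uncompressed 320x240 RGB video frame occupies 320 * 240 * 3 = 230,400 bytes, so even under these simplified assumptions an expression-plus-text message is several orders of magnitude smaller than one raw frame.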