In this paper, a system that transforms speech waveforms into animated faces is proposed. The system relies on a state space model to perform the mapping. To create a photorealistic image, an active appearance model is used. The main contribution of the paper is a comparison of a Kalman filter approach and a hidden Markov model (HMM) approach to the mapping. It is shown that even though the HMM can achieve a higher test likelihood than the Kalman filter, the Kalman filter is much easier to train and yields better animation quality.
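As a rough illustration of the state space mapping, and not the authors' implementation, the following minimal sketch assumes a single scalar hidden state that generates both a speech feature and one appearance-model parameter. A Kalman filter infers the hidden state from the speech features alone, and each state estimate is then projected to a face parameter. The function name `kalman_map` and the parameters `a`, `c`, `b`, `q`, `r`, `x0`, and `p0` are hypothetical and chosen only for this example.

```python
def kalman_map(speech, a, c, b, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter sketch (all parameters are illustrative assumptions).

    Assumed model:
        x_t = a * x_{t-1} + process noise (variance q)   hidden state
        z_t = c * x_t     + sensor noise  (variance r)   speech feature
        y_t = b * x_t                                    face parameter
    """
    x, p = x0, p0          # current state estimate and its variance
    face = []
    for z in speech:
        # Predict step: propagate the estimate through the dynamics.
        x_pred = a * x
        p_pred = a * p * a + q
        # Update step: correct the prediction with the observed speech feature.
        k = p_pred * c / (c * p_pred * c + r)   # Kalman gain
        x = x_pred + k * (z - c * x_pred)
        p = (1.0 - k * c) * p_pred
        # Project the filtered state to the face parameter.
        face.append(b * x)
    return face
```

For example, `kalman_map([1.0, 1.0], a=1.0, c=1.0, b=2.0, q=0.0, r=1.0)` returns one face parameter per speech frame. A real system would use vector-valued states and matrices learned from data in place of these scalars.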
Published in: 2004 IEEE 6th Workshop on Multimedia Signal Processing
Date of Conference: 29 Sept.-1 Oct. 2004