Abstract:
We introduce Language2Gesture (L2G), a cross-modal generative model designed to predict head gesture animations directly from audio inputs. Unlike existing head gesture prediction models, L2G excels at modelling both macro-scale and subtle micro-scale head gesture motions. L2G further refines the generated motions by conditioning the output of the L2G decoder on the emotional dimensions of Valence, Arousal, and Dominance (VAD). L2G preserves gesture diversity and supports a wide range of expressive behaviors. Our evaluation shows that L2G can directly regress continuous 3D head gestures in multi-speaker settings, outperforming the current state-of-the-art method S3 [1].
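As a rough illustration of the conditioning idea described above, the sketch below modulates decoder features with a 3-D VAD vector using a FiLM-style scale-and-shift. The class name, layer sizes, and the choice of FiLM conditioning are assumptions for illustration only, not the authors' published architecture.

```python
import torch
import torch.nn as nn

class VADConditionedDecoder(nn.Module):
    """Hypothetical decoder head: modulates gesture features with a
    Valence-Arousal-Dominance (VAD) vector via FiLM-style scale/shift.
    Layer sizes and the conditioning scheme are illustrative assumptions."""

    def __init__(self, feat_dim: int = 256, pose_dim: int = 3):
        super().__init__()
        # Maps the 3-D VAD vector to per-channel scale and shift terms.
        self.vad_film = nn.Linear(3, 2 * feat_dim)
        # Regresses continuous 3-D head rotations (e.g. pitch/yaw/roll).
        self.out = nn.Linear(feat_dim, pose_dim)

    def forward(self, feats: torch.Tensor, vad: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) audio-derived features
        # vad:   (batch, 3) emotion vector, e.g. in [-1, 1]
        scale, shift = self.vad_film(vad).unsqueeze(1).chunk(2, dim=-1)
        conditioned = feats * (1 + scale) + shift
        return self.out(conditioned)  # (batch, time, pose_dim) head pose

# Example usage with dummy tensors.
decoder = VADConditionedDecoder()
audio_feats = torch.randn(2, 100, 256)   # 2 clips, 100 frames
vad = torch.tensor([[0.5, 0.2, -0.1],    # mildly positive, calm
                    [-0.6, 0.8, 0.3]])   # negative, highly aroused
head_pose = decoder(audio_feats, vad)    # -> (2, 100, 3)
```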
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025