Audio–Visual Fusion for Emotion Recognition in the Valence–Arousal Space Using Joint Cross-Attention (IEEE Journals & Magazine, IEEE Xplore)