A Pre-Trained Audio-Visual Transformer for Emotion Recognition | IEEE Conference Publication | IEEE Xplore