Self-Supervised Learning with Cross-Modal Transformers for Emotion Recognition | IEEE Conference Publication | IEEE Xplore