Disentangled Speech Embeddings Using Cross-Modal Self-Supervision | IEEE Conference Publication