Cross-Modal Audio-Visual Co-Learning for Text-Independent Speaker Verification | IEEE Conference Publication | IEEE Xplore