Unsupervised Improvement of Audio-Text Cross-Modal Representations | IEEE Conference Publication | IEEE Xplore