Bridging the Modality Gap for Speech-image Retrieval with Text Supervision | IEEE Conference Publication