Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval | IEEE Conference Publication