Coarse-to-fine Alignment Makes Better Speech-image Retrieval | IEEE Conference Publication | IEEE Xplore