Unsupervised Vision-and-Language Pretraining via Retrieval-based Multi-Granular Alignment | IEEE Conference Publication | IEEE Xplore