Improved CLIP cross-modal retrieval model for fine-grained interactions | IEEE Conference Publication | IEEE Xplore