Modeling Cross-Modal Semantic Transformations from Coarse to Fine in CLIP | IEEE Journals & Magazine | IEEE Xplore