Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models | IEEE Conference Publication | IEEE Xplore