A Multiscale Grouping Transformer With CLIP Latents for Remote Sensing Image Captioning | IEEE Journals & Magazine | IEEE Xplore