VLCA: vision-language aligning model with cross-modal attention for bilingual remote sensing image captioning | BIAI Journals & Magazine | IEEE Xplore