Attention-Based Multimodal Deep Learning on Vision-Language Data: Models, Datasets, Tasks, Evaluation Metrics and Applications | IEEE Journals & Magazine | IEEE Xplore