A CNN-Transformer Approach for Image-Text Multimodal Classification with Cross-Modal Feature Fusion | IEEE Conference Publication | IEEE Xplore