Visual Question Answering: Convolutional Vision Transformers with Image-Guided Knowledge and Stacked Attention | IEEE Conference Publication | IEEE Xplore