Modelling Complex Associations for Image Captioning Using Vision Transformers | IEEE Conference Publication | IEEE Xplore