Balanced Overall and Local: Improving Image Captioning with Enhanced Transformer Model | IEEE Conference Publication