Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning | IEEE Conference Publication | IEEE Xplore