Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | IEEE Conference Publication | IEEE Xplore