Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding | IEEE Conference Publication