Learning Fine-Grained Semantics in Spoken Language Using Visual Grounding | IEEE Conference Publication | IEEE Xplore