Multimodal Grounding for Sequence-to-sequence Speech Recognition | IEEE Conference Publication | IEEE Xplore