Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation | IEEE Journals & Magazine | IEEE Xplore