Listen, Look and Deliberate: Visual Context-Aware Speech Recognition Using Pre-Trained Text-Video Representations | IEEE Conference Publication