Robust Audio-Visual ASR with Unified Cross-Modal Attention | IEEE Conference Publication | IEEE Xplore