Multimodal Sparse Transformer Network for Audio-Visual Speech Recognition | IEEE Journals & Magazine | IEEE Xplore