Cross-modal mask fusion and modality-balanced audio-visual speech recognition | IEEE Conference Publication