Text-to-Speech With Lip Synchronization Based on Speech-Assisted Text-to-Video Alignment and Masked Unit Prediction | IEEE Journals & Magazine | IEEE Xplore