Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation | IEEE Conference Publication | IEEE Xplore