We present a two stage framework for automatic video text removal to detect and remove embedded video texts and fill-in their remaining regions by appropriate data. In the video text detection stage, text locations in each frame are found via an unsupervised clustering performed on the connected components produced by the stroke width transform (SWT). Since SWT needs an accurate edge map, we develop a novel edge detector which benefits from the geometric features revealed by the bandlet transform. Next, the motion patterns of the text objects of each frame are analyzed to localize video texts. The detected video text regions are removed, then the video is restored by an inpainting scheme. The proposed video inpainting approach applies spatio-temporal geometric flows extracted by bandlets to reconstruct the missing data. A 3D volume regularization algorithm, which takes advantage of bandlet bases in exploiting the anisotropic regularities, is introduced to carry out the inpainting task. The method does not need extra processes to satisfy visual consistency. The experimental results demonstrate the effectiveness of both our proposed video text detection approach and the video completion technique, and consequently the entire automatic video text removal and restoration process.