Audio-to-video synchronization (AV-sync) can drift, and restoring it typically requires dedicated manual effort. In this work, we develop an interactive method that recovers drifted AV-sync through audiovisual correlation analysis. Given a video segment, the user specifies a rough time span during which a person is speaking. Our system first locates the speaker region using face detection. It then performs a two-stage search for the AV-drift that maximizes the average audiovisual correlation inside the speaker region. The correlation is measured by quadratic mutual information, estimated with kernel density estimation. Finally, AV-sync is restored by compensating for the detected optimum AV-drift. Experimental results demonstrate the effectiveness of our method.
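To make the search step concrete, the following is a minimal sketch of one way such a drift search could be arranged. It is not the implementation evaluated in this work. Several choices are assumptions made only for illustration: the "two-stage" search is read as a coarse-to-fine scan over integer frame offsets, the audio is reduced to one scalar feature per video frame, the speaker region is given as a `(frames, pixels)` array of per-pixel features, the quadratic mutual information takes the Euclidean-distance form with a Gaussian kernel, and the signals are standardised with a fixed kernel width `sigma`. The names `gram`, `qmi`, `score` and `find_drift` are hypothetical.

```python
import numpy as np


def gram(v, sigma=0.5):
    """Gaussian kernel matrix of a standardised 1-D signal.

    Two KDE kernels of width sigma convolve into a Gaussian of variance
    2*sigma**2. The normalising constant is dropped (assumption): it scales
    every QMI term equally, so the arg-max over drifts is unchanged.
    """
    v = (v - v.mean()) / (v.std() + 1e-12)
    d = v[:, None] - v[None, :]
    return np.exp(-d ** 2 / (4.0 * sigma ** 2))


def qmi(ka, kv):
    """Euclidean-distance quadratic mutual information from two kernel matrices."""
    v_joint = np.mean(ka * kv)                             # joint term
    v_marg = ka.mean() * kv.mean()                         # product of marginals
    v_cross = np.mean(ka.mean(axis=1) * kv.mean(axis=1))   # cross term
    return v_joint - 2.0 * v_cross + v_marg


def score(audio, visual, d, max_drift):
    """Average QMI over all pixels when audio frame t + d is paired with video frame t."""
    t = len(audio)
    # A fixed-length window keeps the sample count identical for every drift.
    ka = gram(audio[max_drift + d: t - max_drift + d])
    region = visual[max_drift: t - max_drift]
    return float(np.mean([qmi(ka, gram(region[:, p]))
                          for p in range(region.shape[1])]))


def find_drift(audio, visual, max_drift, coarse_step=3):
    """Coarse-to-fine search for the drift (in frames) with the highest average QMI."""
    def f(d):
        return score(audio, visual, d, max_drift)

    # Stage 1: scan the full range on a coarse grid.
    coarse_best = max(range(-max_drift, max_drift + 1, coarse_step), key=f)
    # Stage 2: scan every frame offset around the coarse winner.
    lo = max(-max_drift, coarse_best - coarse_step + 1)
    hi = min(max_drift, coarse_best + coarse_step - 1)
    return max(range(lo, hi + 1), key=f)
```

Under this sign convention, a returned drift `d` means the audio feature at frame `t + d` best matches the video at frame `t`, so sync would be restored by shifting the audio `d` frames earlier. The coarse stage only helps when the features vary smoothly enough that neighbouring offsets still score well. With rapidly varying features, `coarse_step=1` falls back to an exhaustive scan.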