Skip to Main Content
In this paper, we consider the problem of estimating the spatiotemporal alignment between N unsynchronized video sequences of the same dynamic 3D scene, captured from distinct viewpoints. Unlike most existing methods, which work for N = 2 and rely on a computationally intensive search in the space of temporal alignments, we present a novel approach that reduces the problem for general N to the robust estimation of a single line in RN. This line captures all temporal relations between the sequences and can be computed without any prior knowledge of these relations. Considering that the spatial alignment is captured by the parameters of fundamental matrices, an iterative algorithm is used to refine simultaneously the parameters representing the temporal and spatial relations between the sequences. Experimental results with real-world and synthetic sequences show that our method can accurately align the videos even when they have large misalignments (e.g., hundreds of frames), when the problem is seemingly ambiguous (e.g., scenes with roughly periodic motion), and when accurate manual alignment is difficult (e.g., due to slow-moving objects).