We investigate the problem of spatiotemporal alignment of videos, signals, or feature sequences extracted from them. Specifically, we consider the scenario in which the spatiotemporal misalignments can be characterized by parametric transformations. Using a nonlinear analytical structure referred to as an alignment manifold, we formulate alignment as an optimization problem on this nonlinear space. We focus on semantically meaningful videos or signals, e.g., those describing or capturing human motion or activities, and propose a new formalism for temporal alignment that accounts for execution-rate variations among instances of the same video event. Our approach bridges the family of geometric optimization algorithms and the family of stochastic algorithms: we regard the search for optimal alignment parameters as a recursive state estimation problem for a particular dynamic system evolving on the alignment manifold. We then design a Sequential Importance Sampling procedure on the alignment manifold to carry out this estimation. We further extend the basic Sequential Importance Sampling algorithm into a new version called Stochastic Gradient Sequential Importance Sampling, which incorporates a steepest descent structure on the alignment manifold and provides a more efficient particle propagation mechanism. We demonstrate the performance of manifold-based alignment on several types of input data that arise in vision problems.