We consider the problem of aligning multiple time series, focusing in particular on synchronizing multiple motion videos of the same kind of human activity. Finding an optimal global alignment of multiple sequences is computationally infeasible, and several approximate solutions have been proposed, including iterative pairwise warping algorithms and variants of hidden Markov models. In this paper, we propose a novel probabilistic model that represents the conditional densities of latent target sequences, which are aligned with the observed sequences through hidden alignment variables. By imposing certain constraints on the target sequences during learning, we obtain a sensible model for multiple alignment that can be learned efficiently with the EM algorithm. Compared with existing methods, our approach yields more accurate alignments and is more robust to local optima and initial configurations. We demonstrate its efficacy on both synthetic and real-world motion videos, including videos of facial emotions and human activities.
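For readers less familiar with the pairwise warping baselines mentioned above, the sketch below shows classic dynamic time warping (DTW) between two 1-D sequences. It illustrates only the pairwise building block that iterative methods repeat. It is not the probabilistic model proposed here. The function name `dtw` and the absolute-difference local cost are illustrative assumptions.

```python
import math


def dtw(x, y):
    """Classic dynamic time warping between 1-D sequences x and y.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    aligning x[i] with y[j]. The local cost |x[i] - y[j]| is an assumption
    chosen for illustration.
    """
    n, m = len(x), len(y)
    # D[i][j] = minimal cumulative cost of aligning x[:i] with y[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Allowed steps: advance x only, advance y only, or advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

    # Backtrack from (n, m) to (1, 1) along minimal-cost predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = []
        if i > 1 and j > 1:
            candidates.append((D[i - 1][j - 1], i - 1, j - 1))
        if i > 1:
            candidates.append((D[i - 1][j], i - 1, j))
        if j > 1:
            candidates.append((D[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path


if __name__ == "__main__":
    # y is a time-stretched copy of x, so a zero-cost warping exists.
    cost, path = dtw([0, 1, 2, 3], [0, 0, 1, 2, 2, 3])
    print(cost, path)
```

Aligning more than two sequences this way typically means warping each sequence to a reference (or to a running average) and iterating. That greedy, pairwise strategy is the kind of approximation whose sensitivity to initialization the abstract refers to.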