Skip to Main Content
We propose two new models for human action recognition from video sequences using topic models. Video sequences are represented by a novel "bag-of-words" representation, where each frame corresponds to a "word". Our models differ from previous latent topic models for visual recognition in two major aspects: first of all, the latent topics in our models directly correspond to class labels; second, some of the latent variables in previous topic models become observed in our case. Our models have several advantages over other latent topic models used in visual recognition. First of all, the training is much easier due to the decoupling of the model parameters. Second, it alleviates the issue of how to choose the appropriate number of latent topics. Third, it achieves much better performance by utilizing the information provided by the class labels in the training set. We present action classification results on five different data sets. Our results are either comparable to, or significantly better than previously published results on these data sets.