Skip to Main Content
Human action recognition in video is often approached by means of sequential probabilistic models as they offer a natural match to the temporal dimension of the actions. However, effective estimation of the models' parameters is critical if one wants to achieve significant recognition accuracy. Parameter estimation is typically performed over a set of training data by maximizing objective functions such as the data likelihood or the conditional likelihood. However, such functions are nonconvex in nature and subject to local maxima. This problem is major since any solution algorithm (expectation-maximization, gradient ascent, variational methods and others) requires an arbitrary initialization and can only find a corresponding local maximum. Exhaustive search is otherwise impossible since the number of local maxima is unknown. While no theoretical solutions are available for this problem, the only practicable mollification is to repeat training with different initializations until satisfactory cross-validation accuracy is attained. Such a process is overall empirical and highly time-consuming. In this paper, we propose two methods for one-off initialization of hidden Markov models achieving interesting tradeoffs between accuracy and training time. Experiments over three challenging human action video datasets (Weizmann, MuHAVi and Hollywood Human Actions) and with various feature sets measured from the frames (STIP descriptors, projection histograms, notable contour points) prove that the proposed one-off initializations are capable of achieving accuracy above the average of repeated random initializations and comparable to the best. In addition, the methods proposed are not restricted solely to human action recognition as they suit time series classification as a general problem.