This paper presents an approach to semantic video analysis based on the statistical processing and representation of the motion signal. The examined video is first temporally segmented into shots, and appropriate motion features are extracted for every resulting shot; using these features, hidden Markov models (HMMs) associate each shot with one of the semantic classes of interest. The novel contributions of this paper lie in motion information processing and representation. For motion information processing, the kurtosis of the optical flow motion estimates is calculated to identify which motion values originate from true motion rather than from measurement noise. Additionally, unlike most approaches in the relevant literature, which are mainly limited to global- or camera-level motion representations, a new representation is presented that provides local-level motion information to the HMMs. This representation focuses only on the pixels where true motion is observed. For the selected pixels, energy distribution-related information is extracted, together with a complementary set of features that highlight particular spatial attributes of the motion signal. Experimental results and a comparative evaluation from applying the proposed approach to Tennis, News and Volleyball broadcast video and to Human Action video demonstrate the efficiency of the proposed method.
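To make the kurtosis step more concrete, here is a minimal sketch. It is not the authors' implementation. The sketch assumes that measurement noise in each optical flow component is approximately zero-mean Gaussian, so its excess kurtosis is close to 0. A pixel is flagged as exhibiting true motion when the excess kurtosis of either flow component, computed over a window of T frames, deviates from 0 by more than a threshold. The function names, the (T, H, W, 2) array layout and the threshold value are hypothetical, and the paper's exact criterion may differ.

```python
import numpy as np


def excess_kurtosis(x, axis=0):
    """Sample excess kurtosis m4 / m2**2 - 3 along `axis` (about 0 for Gaussian data)."""
    centered = x - x.mean(axis=axis, keepdims=True)
    m2 = np.mean(centered ** 2, axis=axis)  # second central moment (variance)
    m4 = np.mean(centered ** 4, axis=axis)  # fourth central moment
    ratio = np.zeros_like(m2)
    # Divide only where the variance is non-negligible, to avoid division by zero.
    np.divide(m4, m2 ** 2, out=ratio, where=m2 > 1e-12)
    # Constant (zero-variance) signals get excess kurtosis 0, i.e. "no motion".
    return np.where(m2 > 1e-12, ratio - 3.0, 0.0)


def true_motion_mask(flow, threshold=1.0):
    """Hypothetical true-motion detector.

    flow: array of shape (T, H, W, 2) holding per-frame (u, v) optical flow estimates.
    Returns a boolean (H, W) mask that is True where either flow component's
    temporal excess kurtosis deviates from the Gaussian value 0 by more than
    `threshold` (an assumed, illustrative value).
    """
    k = excess_kurtosis(flow, axis=0)  # shape (H, W, 2)
    return np.any(np.abs(k) > threshold, axis=-1)
```

As a usage example, a pixel whose horizontal flow shows a short burst of large values among Gaussian noise is flagged, while a pure-noise pixel and a static (all-zero) pixel are not. Using |excess kurtosis| as a proxy for non-Gaussianity is a simplification of this sketch, because some true-motion patterns can also yield excess kurtosis near 0.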