Automatic semantic annotation of video events has received considerable attention from the scientific community in recent years. Events can be defined by the spatio-temporal relations and properties of objects and entities, which change over time; some events can be described by a set of patterns. Despite the application of dynamic graphical models, event modeling and detection remain challenging in scenarios where a very large number of training samples is not available. In such situations there is an acute need for event models built on discriminative classifiers, and for well-designed features that capture the motion information of video shots in a small number of feature dimensions. In this paper, we present a framework for semantic video event annotation that exploits global, local, and motion features. Using these features, a video clip can be encoded as a set of feature vectors. We then train SVM classifiers on the different features and run a genetic algorithm based on bi-coded chromosomes to obtain the optimal set of classifiers and their corresponding weights during the training stage. Given the optimal classifier set and weights, an unlabeled video clip is assigned the label of the clip in the original database with which it has maximum similarity.
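The classifier selection and weighting step can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: "bi-coded" is read here as a chromosome with a binary part (which classifiers are kept) and a real-valued part (their fusion weights), fitness is taken to be labeling accuracy on held-out clips, and the per-class score lists stand in for the outputs of the trained SVM classifiers. All function names are hypothetical, and the actual encoding and operators in the paper may differ.

```python
import random

def fuse(scores, mask, weights):
    """Weighted sum of the selected classifiers' scores for one clip.

    scores[k][c] is the score classifier k gives to class c (assumed input).
    """
    fused = [0.0] * len(scores[0])
    for k, (bit, w) in enumerate(zip(mask, weights)):
        if bit:  # classifier k is switched on by the binary part
            for c in range(len(fused)):
                fused[c] += w * scores[k][c]
    return fused

def predict(scores, mask, weights):
    """Index of the class with the highest fused score."""
    fused = fuse(scores, mask, weights)
    return max(range(len(fused)), key=fused.__getitem__)

def fitness(chrom, clips, labels):
    """Labeling accuracy of one chromosome (assumed fitness function)."""
    mask, weights = chrom
    if not any(mask):  # selecting no classifier is treated as invalid
        return 0.0
    hits = sum(predict(s, mask, weights) == y for s, y in zip(clips, labels))
    return hits / len(labels)

def random_chromosome(n, rng):
    """Binary selection mask plus real-valued weights in [0, 1]."""
    return ([rng.randint(0, 1) for _ in range(n)],
            [rng.random() for _ in range(n)])

def crossover(a, b, rng):
    """Single-point crossover applied to both parts at the same cut."""
    n = len(a[0])
    cut = rng.randrange(1, n) if n > 1 else 0
    return (a[0][:cut] + b[0][cut:], a[1][:cut] + b[1][cut:])

def mutate(chrom, rng, rate=0.1):
    """Flip mask bits and perturb weights, each with probability `rate`."""
    mask = [1 - bit if rng.random() < rate else bit for bit in chrom[0]]
    weights = [
        min(1.0, max(0.0, w + rng.gauss(0, 0.1))) if rng.random() < rate else w
        for w in chrom[1]
    ]
    return (mask, weights)

def evolve(clips, labels, n_classifiers, pop_size=20, generations=30, seed=0):
    """Elitist GA: keep the better half, refill with mutated offspring."""
    rng = random.Random(seed)
    pop = [random_chromosome(n_classifiers, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ch: fitness(ch, clips, labels), reverse=True)
        elite = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=lambda ch: fitness(ch, clips, labels))
```

Calling `evolve(clips, labels, n_classifiers)` returns a `(mask, weights)` pair. Because the better half of the population is carried over unchanged each generation, the best fitness found never decreases, which keeps the sketch easy to reason about at the cost of slower exploration.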