By Topic

Integrating Spatio-Temporal Context With Multiview Representation for Object Recognition in Visual Surveillance

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

5 Author(s)
Xiaobai Liu ; Sch. of Comput. Sci. & Technol., Huazhong Univ. of Sci. & Technol., Wuhan, China ; Liang Lin ; Shuicheng Yan ; Hai Jin
more authors

We present in this paper an integrated solution to rapidly recognizing dynamic objects in surveillance videos by exploring various contextual information. This solution consists of three components. The first one is a multi-view object representation. It contains a set of deformable object templates, each of which comprises an ensemble of active features for an object category in a specific view/pose. The template can be efficiently learned via a small set of roughly aligned positive samples without negative samples. The second component is a unified spatio-temporal context model, which integrates two types of contextual information in a Bayesian way. One is the spatial context, including main surface property (constraints on object type and density) and camera geometric parameters (constraints on object size at a specific location). The other is the temporal context, containing the pixel-level and instance-level consistency models, used to generate the foreground probability map and local object trajectory prediction. We also combine the above spatial and temporal contextual information to estimate the object pose in scene and use it as a strong prior for inference. The third component is a robust sampling-based inference procedure. Taking the spatio-temporal contextual knowledge as the prior model and deformable template matching as the likelihood model, we formulate the problem of object category recognition as a maximum-a-posteriori problem. The probabilistic inference can be achieved by a simple Markov chain Mento Carlo sampler, owing to the informative spatio-temporal context model which is able to greatly reduce the computation complexity and the category ambiguities. The system performance and benefit gain from the spatio-temporal contextual information are quantitatively evaluated on several challenging datasets and the comparison results clearly demonstrate that our proposed algorithm outperforms other state-of-the-art algorithms.

Published in:

Circuits and Systems for Video Technology, IEEE Transactions on  (Volume:21 ,  Issue: 4 )