Abstract:
The goal of data analytics in surveillance videos is to fully understand and reconstruct the 3D scene, i.e., to recover the trajectory and action of each object. In a surveillance system with camera arrays of overlapping views, we propose a novel video scene reconstruction framework to collaboratively track multiple human objects and estimate their 3D poses. First, tracklets are extracted from each single view following the tracking-by-detection paradigm. We propose an effective integration of visual and semantic object attributes, i.e., appearance models, geometry information and poses/actions, to associate tracklets across different views. Based on the optimum viewing perspectives derived from tracking, a hierarchical estimation of human poses is introduced to generate the 3D skeleton of each object. The estimated body joint points are fed back to the tracking stage to enhance tracklet association. Experiments on benchmarks of multiview tracking and 3D pose estimation validate the effectiveness of the proposed method.
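As an illustration of the cross-view association step, the following is a minimal sketch, not the authors' implementation. It assumes each tracklet is summarized by a hypothetical appearance feature vector (`feat`), a ground-plane position (`ground_xy`), and an action label (`action`); the weights, the cue definitions, and the brute-force matching are all illustrative assumptions. The abstract does not specify how the three cues are combined or how the assignment is solved.

```python
from itertools import permutations
from math import dist, exp


def affinity(a, b, w_app=0.5, w_geo=0.3, w_pose=0.2):
    """Combine three cues into one score in [0, 1] (weights are assumptions)."""
    # Appearance cue: cosine similarity of feature vectors, clipped at 0.
    dot = sum(x * y for x, y in zip(a["feat"], b["feat"]))
    norm_a = sum(x * x for x in a["feat"]) ** 0.5
    norm_b = sum(y * y for y in b["feat"]) ** 0.5
    app = max(0.0, dot / (norm_a * norm_b)) if norm_a and norm_b else 0.0
    # Geometry cue: ground-plane distance mapped to (0, 1] by exp(-d).
    geo = exp(-dist(a["ground_xy"], b["ground_xy"]))
    # Pose/action cue: 1 if the action labels agree, otherwise 0.
    pose = 1.0 if a["action"] == b["action"] else 0.0
    return w_app * app + w_geo * geo + w_pose * pose


def associate(view_a, view_b):
    """Exhaustive one-to-one matching that maximizes total affinity.

    Assumes len(view_a) <= len(view_b); only practical for a few tracklets.
    """
    best_score, best_match = -1.0, []
    for perm in permutations(range(len(view_b)), len(view_a)):
        score = sum(affinity(view_a[i], view_b[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_match = score, list(enumerate(perm))
    return best_match


# Toy tracklet summaries from two overlapping views (fabricated for illustration).
view_a = [
    {"feat": [1.0, 0.0], "ground_xy": (0.0, 0.0), "action": "walk"},
    {"feat": [0.0, 1.0], "ground_xy": (5.0, 5.0), "action": "stand"},
]
view_b = [
    {"feat": [0.1, 0.9], "ground_xy": (5.2, 4.9), "action": "stand"},
    {"feat": [0.9, 0.1], "ground_xy": (0.1, 0.2), "action": "walk"},
]

matches = associate(view_a, view_b)
print(matches)  # pairs of (index in view_a, index in view_b)
```

The exhaustive search is used only to keep the sketch dependency-free; with more than a handful of tracklets, a polynomial-time assignment solver would be the usual substitute.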
Date of Conference: 23-27 July 2018
Date Added to IEEE Xplore: 11 October 2018