In this paper, we describe how information obtained from multiple views by a network of cameras can be effectively combined to yield a reliable and fast human action recognition system. We present a score-based fusion technique for combining information from multiple cameras that can handle arbitrary orientation of the subject with respect to the cameras. Our fusion technique does not rely on a symmetric deployment of the cameras and does not require that the camera network's deployment configuration be preserved between the training and testing phases. To classify human actions, we use motion information characterized by the spatio-temporal shape of a human silhouette. By relying on feature vectors that are relatively easy to compute, our technique lends itself to an efficient distributed implementation while maintaining a high frame capture rate. We evaluate the performance of our system under different camera densities and view availabilities. Finally, we demonstrate the performance of our system in an online setting, where the camera network is used to identify human actions as they are being performed.
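The abstract does not spell out the fusion rule, so the following is only a minimal sketch of generic sum-rule score-level fusion, not the authors' method. It assumes that each available camera's classifier emits one score per action class (for example, a log-likelihood) and that these scores are comparable across cameras. The function name `fuse_scores` and the score format are hypothetical. The sketch sums scores over whichever cameras happen to be present. Because of this, it does not depend on a fixed number of cameras or on their ordering, which is one simple way a fusion step can tolerate a deployment that differs between training and testing. It does not model subject orientation.

```python
from typing import List, Sequence, Tuple


def fuse_scores(per_camera_scores: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Hypothetical sum-rule fusion of per-camera class scores.

    per_camera_scores: one score vector per camera that observed the action;
    entry k is that camera's score for action class k (assumed comparable
    across cameras, e.g. log-likelihoods).
    Returns (index of the winning class, fused score vector).
    """
    if not per_camera_scores:
        raise ValueError("at least one camera view is required")
    num_classes = len(per_camera_scores[0])
    if any(len(scores) != num_classes for scores in per_camera_scores):
        raise ValueError("all cameras must score the same set of classes")
    # Sum each class's score over the available cameras; the result does not
    # depend on how many cameras there are or in which order they are listed.
    fused = [sum(scores[k] for scores in per_camera_scores) for k in range(num_classes)]
    # Pick the class with the highest fused score (ties go to the lowest index).
    best = max(range(num_classes), key=fused.__getitem__)
    return best, fused


# Example: two of three cameras see the subject; three hypothetical action classes.
label, fused = fuse_scores([[-2.0, -0.5, -3.0], [-1.0, -1.5, -0.2]])
print(label)
```

Summing log-likelihoods corresponds to treating the camera views as conditionally independent given the action. That is a common simplifying assumption in score-level fusion, not a claim about this paper.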