Automatic initialization and tracking of multiple people and their body parts is one of the first steps in designing interactive multimedia applications. The key problems in this context are the robust detection and tracking of people and their body parts in an unconstrained environment. This paper presents an integrated framework that addresses the detection and tracking of multiple objects in a computationally efficient manner. In particular, a neural network-based face detector is employed to detect faces and to compute a person-specific statistical model of skin color from the face regions. A probabilistic model is proposed to fuse color and motion information in order to localize the moving body parts (hands). A multiple hypothesis tracking (MHT) algorithm is adopted to track the face and hands. In real-world scenes, the extracted features (face and hands) usually contain spurious measurements that create implausible trajectories and needless computation. To deal with this problem, a path coherence function is incorporated into MHT to reduce the number of hypotheses, which in turn reduces the computational cost and improves the structure of the trajectories. The performance of the framework is validated by experiments on synthetic and real image sequences.
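To make the hypothesis-pruning idea concrete, the following is a minimal sketch of how a path coherence cost could be used to discard implausible track extensions. The abstract does not give the exact function, so this assumes the classic Sethi and Jain form, which penalizes changes in direction and speed across three consecutive positions. The names `path_coherence` and `prune_hypotheses`, the equal weights, and the `max_cost` threshold are all hypothetical choices for illustration, not the paper's settings.

```python
import math

def path_coherence(p_prev, p_curr, p_next, w_dir=0.5, w_speed=0.5):
    """Cost of extending a track through three consecutive 2D points.

    Assumed Sethi-Jain style form (not confirmed by the abstract):
        w_dir * (1 - cos(theta)) + w_speed * (1 - 2*sqrt(d1*d2) / (d1 + d2))
    The cost is 0 for straight, constant-speed motion and grows with
    changes in direction or speed. Weights are illustrative assumptions.
    """
    ax, ay = p_curr[0] - p_prev[0], p_curr[1] - p_prev[1]
    bx, by = p_next[0] - p_curr[0], p_next[1] - p_curr[1]
    d1 = math.hypot(ax, ay)
    d2 = math.hypot(bx, by)

    if d1 == 0.0 and d2 == 0.0:
        # A stationary object is treated as perfectly coherent (assumption).
        return 0.0
    if d1 == 0.0 or d2 == 0.0:
        # Direction is undefined; only the speed term contributes (assumption).
        return w_speed

    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, (ax * bx + ay * by) / (d1 * d2)))
    direction_term = 1.0 - cos_theta
    speed_term = 1.0 - 2.0 * math.sqrt(d1 * d2) / (d1 + d2)
    return w_dir * direction_term + w_speed * speed_term

def prune_hypotheses(track, candidates, max_cost=0.2):
    """Keep only candidate measurements that extend `track` smoothly.

    `track` needs at least two past positions; `max_cost` is a
    hypothetical gating threshold.
    """
    p_prev, p_curr = track[-2], track[-1]
    return [c for c in candidates
            if path_coherence(p_prev, p_curr, c) <= max_cost]

if __name__ == "__main__":
    track = [(0.0, 0.0), (1.0, 0.0)]
    candidates = [(2.0, 0.0), (1.0, 1.0), (0.0, 0.0), (2.1, 0.1)]
    print(prune_hypotheses(track, candidates))
```

In this sketch, a candidate that continues the motion roughly straight ahead survives, while a sharp turn or a reversal is rejected before it can spawn further hypotheses. Gating of this kind is one plausible way the number of hypotheses in an MHT tree could be kept small.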