This paper proposes an efficient system that integrates multiple vision models for robust multiperson detection and tracking by mobile service and social robots in public environments. The core technique is a novel maximum likelihood (ML)-based algorithm that combines the multimodel detections within mean-shift tracking. First, a likelihood probability that integrates the detections and the similarity to local appearance is defined. Then, an expectation-maximization (EM)-like mean-shift algorithm is derived under the ML framework: in each iteration, the E-step estimates the associations to the detections, and the M-step locates the new position according to the ML criterion. To remain robust in complex, crowded multiperson scenarios, an improved sequential strategy for mean-shift tracking is proposed, under which human objects are tracked one at a time according to their priority order. To balance efficiency and robustness for real-time performance, at each stage the first two objects in the priority list are tested, and the one with the higher score is selected. The proposed method has been successfully implemented on real-world service and social robots. The vision system integrates stereo-based and histogram of oriented gradients (HOG)-based human detection, occlusion reasoning, and sequential mean-shift tracking. Various examples are presented that show the advantages and robustness of the proposed system for multiperson tracking from mobile robots, and quantitative evaluations of multiperson tracking performance are also performed. The experimental results indicate that the proposed method achieves significant improvements.
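The abstract does not give the exact form of the likelihood, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the likelihood of a target position p is a weighted sum of isotropic Gaussian terms: one per appearance sample (x, y, w), where w is a similarity weight such as a histogram back-projection value, and one per detection (x, y, c), where c is a detector confidence. The mixing weight `alpha` and bandwidths `h` and `sigma` are hypothetical parameters. Under this assumption the E-step computes each term's responsibility at the current position, and the M-step moves p to the precision-weighted mean of the term centers. For Gaussian terms this update cannot decrease log L(p). The function `sequential_track` illustrates the two-candidate sequential strategy; its `track_fn` callback interface is likewise hypothetical.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[float, float]


def _components(samples, detections, h, sigma, alpha):
    # Assumed model: one isotropic Gaussian per appearance sample and per
    # detection, stored as (center, prior weight, bandwidth).
    comps = [((x, y), (1.0 - alpha) * w, h) for x, y, w in samples]
    comps += [((x, y), alpha * c, sigma) for x, y, c in detections]
    return comps


def _log_likelihood(p: Point, comps) -> float:
    total = sum(wt * math.exp(-((cx - p[0]) ** 2 + (cy - p[1]) ** 2) / (2 * s * s))
                for (cx, cy), wt, s in comps)
    return math.log(total) if total > 0 else float("-inf")


def em_mean_shift(start: Point, samples, detections, h=8.0, sigma=12.0,
                  alpha=0.5, iters=20, tol=1e-3) -> Tuple[Point, float]:
    """Hypothetical EM-like mean-shift for one target; returns (position, log L)."""
    comps = _components(samples, detections, h, sigma, alpha)
    p = start
    for _ in range(iters):
        # E-step: unnormalised responsibility of each component at p.
        q = [wt * math.exp(-((cx - p[0]) ** 2 + (cy - p[1]) ** 2) / (2 * s * s))
             for (cx, cy), wt, s in comps]
        if sum(q) == 0.0:
            break  # no support near p: keep the current estimate
        # M-step: the maximiser of sum_k q_k * log N(z_k; p, s_k^2) is the
        # precision-weighted mean of the component centers.
        a = [qk / (s * s) for qk, (_, _, s) in zip(q, comps)]
        den = sum(a)
        new_p = (sum(ak * z[0] for ak, (z, _, _) in zip(a, comps)) / den,
                 sum(ak * z[1] for ak, (z, _, _) in zip(a, comps)) / den)
        shift = math.hypot(new_p[0] - p[0], new_p[1] - p[1])
        p = new_p
        if shift < tol:
            break  # converged
    return p, _log_likelihood(p, comps)


def sequential_track(priority: Sequence[str],
                     track_fn: Callable[[str, Dict[str, Point]], Tuple[Point, float]]
                     ) -> Dict[str, Point]:
    """Track objects one at a time in priority order.

    At each stage only the first two remaining objects are tried, and the one
    with the higher score is committed (ties go to the higher-priority object).
    track_fn receives the positions committed so far, e.g. so that occlusion
    reasoning can mask already-tracked people.
    """
    remaining = list(priority)
    committed: Dict[str, Point] = {}
    while remaining:
        trials = [(obj, track_fn(obj, committed)) for obj in remaining[:2]]
        best, (pos, _) = max(trials, key=lambda t: t[1][1])
        committed[best] = pos
        remaining.remove(best)
    return committed
```

Testing only two candidates per stage, rather than every remaining object, keeps the number of mean-shift runs roughly linear in the number of people. This matches the efficiency and robustness trade-off the abstract describes.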