Our goal is to detect people in highly articulated poses, such as bending and crouching. This diversity in human poses makes detection much more difficult than for pedestrian poses. "Divide-and-conquer" is a favorable strategy for detecting objects with large intra-class variations: it splits object instances into several sub-categories and trains a relatively simple classifier for each sub-category. We propose a novel sample split method that improves learning for articulated humans. We adopt the cluster boosted tree (CBT) structure to automatically decide when a split should be triggered. Unlike the simple k-means used in CBT for sample split, our approach aims at minimizing the training loss after the split. Since this minimization is an NP-hard problem, we design a heuristic algorithm in which we find the optimal sample division according to each single feature and then reconcile these per-feature divisions into a final division through a voting-like process. We name our training method voting cluster boosted tree (VCBT). Furthermore, to avoid large background areas in training samples, we first cluster samples according to their width/height ratios and then train a VCBT for each subset. We conduct an experiment on 17 infrared surveillance video clips, report superior performance compared with previous human detection methods, and show how our approach benefits the learning results by reducing training loss.
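The per-feature split followed by voting can be illustrated with a minimal sketch. This is not the VCBT training procedure itself, whose loss and voting rule are not specified here. It assumes a stand-in loss (summed within-group squared error on a single feature), a simple majority vote, and an alignment step that flips a feature's division when it disagrees with the first feature's division on more than half of the samples. The function names `best_split_for_feature` and `vote_split` are hypothetical.

```python
from typing import List, Sequence


def best_split_for_feature(values: Sequence[float]) -> List[int]:
    """Return a 0/1 label per sample for the threshold split of one feature.

    Assumption: the loss is the summed within-group squared error,
    used here only as a stand-in for the real training loss.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    sorted_vals = [values[i] for i in order]

    def sse(group: Sequence[float]) -> float:
        if not group:
            return 0.0
        mean = sum(group) / len(group)
        return sum((v - mean) ** 2 for v in group)

    best_cut, best_loss = 1, float("inf")
    for cut in range(1, n):  # keep both groups non-empty
        loss = sse(sorted_vals[:cut]) + sse(sorted_vals[cut:])
        if loss < best_loss:
            best_cut, best_loss = cut, loss

    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = 0 if rank < best_cut else 1
    return labels


def vote_split(samples: Sequence[Sequence[float]]) -> List[int]:
    """Combine the per-feature divisions into one division by majority vote."""
    n = len(samples)
    n_features = len(samples[0])
    divisions = [
        best_split_for_feature([s[f] for s in samples])
        for f in range(n_features)
    ]

    # Assumption: group names are arbitrary per feature, so each division
    # is aligned with the first feature's division before votes are counted.
    reference = divisions[0]
    votes = [0] * n
    for division in divisions:
        agreement = sum(1 for a, b in zip(division, reference) if a == b)
        if agreement * 2 < n:
            division = [1 - label for label in division]
        for i, label in enumerate(division):
            votes[i] += label

    # A sample joins group 1 when more than half of the features vote for it.
    return [1 if 2 * v > n_features else 0 for v in votes]


samples = [
    [0.0, 10.0, 0.1],
    [0.1, 9.0, 5.0],
    [5.0, 1.0, 0.2],
    [5.2, 0.5, 5.1],
]
print(vote_split(samples))
```

In this toy input the first two features agree on the division (after alignment) and outvote the third, so the samples are grouped as the first two versus the last two.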