Unsupervised categorization of objects is a fundamental problem in computer vision. While appearance-based methods have become popular recently, other important cues such as functionality are largely neglected. Motivated by psychological studies giving evidence that human demonstration has a facilitative effect on categorization in infancy, we propose an approach for object categorization from depth video streams. To this end, we have developed a method for capturing human motion in real time. The captured data is then used to temporally segment the depth streams into actions. The segmented actions are then categorized in an unsupervised manner, using a novel descriptor for motion capture data that is robust to subject variations. Furthermore, we automatically localize the object that is manipulated within a video segment and categorize it using the corresponding action. For evaluation, we have recorded a dataset that comprises depth data with registered video sequences for 6 subjects, 13 action classes, and 174 object manipulations.