In this paper, we propose a new method of object modeling for environment perception through human-robot interaction. Specifically, within a multi-modal object modeling architecture, we address the gestural language component using a stereo camera. To this end, we define three human gestures according to the size of the target object: holding small objects, pointing at medium-sized ones, and touching two corner points of large ones. When a user indicates where the target object is located in the environment, the robot interprets the user's gesture and captures one or more images that include the target object. The region of interest in which the target object is likely to be located in the captured image is estimated from the environmental context and the user's gesture. Finally, given an image with a region of interest, the robot automatically performs foreground/background segmentation, for which we propose a marker-based watershed segmentation method. Experimental results show that the segmentation quality of our method is as good as that of the GrabCut algorithm, while its computation time is much shorter, making it suitable for on-line interactive object modeling.
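As an illustration of the segmentation step, the following is a minimal sketch of a generic marker-based watershed (a priority-flood variant), not the specific method evaluated in this paper. It assumes that the region of interest is an axis-aligned rectangle, that every pixel outside it can serve as a background seed, and that its centre pixel can serve as a foreground seed. The names `markers_from_roi` and `marker_watershed`, the label values, and the toy gradient map are hypothetical and chosen only for this example.

```python
import heapq

# Hypothetical label values; 0 means "not yet labelled".
BACKGROUND, FOREGROUND = 1, 2


def markers_from_roi(height, width, roi):
    """Build seed markers from a rectangular ROI (top, left, bottom, right).

    Assumption for this sketch: pixels outside the ROI are background seeds,
    the ROI centre is a foreground seed, and the rest of the ROI is unlabelled.
    The bottom and right bounds are exclusive.
    """
    top, left, bottom, right = roi
    markers = [[BACKGROUND] * width for _ in range(height)]
    for r in range(top, bottom):
        for c in range(left, right):
            markers[r][c] = 0
    markers[(top + bottom) // 2][(left + right) // 2] = FOREGROUND
    return markers


def marker_watershed(gradient, markers):
    """Flood unlabelled pixels from the seeds in order of gradient magnitude."""
    height, width = len(gradient), len(gradient[0])
    labels = [row[:] for row in markers]
    heap = []
    counter = 0  # insertion order, used to break ties between equal gradients

    def push_neighbours(r, c):
        nonlocal counter
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width and labels[nr][nc] == 0:
                # The candidate pixel inherits the label of the pixel that reached it.
                heapq.heappush(heap, (gradient[nr][nc], counter, nr, nc, labels[r][c]))
                counter += 1

    # Start the flood from every seeded pixel.
    for r in range(height):
        for c in range(width):
            if labels[r][c] != 0:
                push_neighbours(r, c)

    # Always grow into the lowest-gradient candidate first.
    while heap:
        _, _, r, c, label = heapq.heappop(heap)
        if labels[r][c] != 0:
            continue  # already claimed by an earlier, cheaper path
        labels[r][c] = label
        push_neighbours(r, c)
    return labels


# Toy input: a 7x7 gradient map with a strong edge ring around a 3x3 object.
size = 7
gradient = [[9 if (r in (1, 5) and 1 <= c <= 5) or (c in (1, 5) and 1 <= r <= 5) else 0
             for c in range(size)] for r in range(size)]
markers = markers_from_roi(size, size, (1, 1, 6, 6))
labels = marker_watershed(gradient, markers)
```

Because pixels are claimed in order of increasing gradient, each seed fills its own low-gradient region before any label crosses a strong edge. In a real pipeline the gradient map would be computed from the captured image, and the seeds would be derived from the user's gesture and the environmental context rather than from the ROI centre.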