Summary form only given. Vision and manipulation are inextricably intertwined in the primate brain. Tantalizing results from neuroscience are shedding light on the mixed motor and sensory representations used by the brain during reaching, grasping, and object recognition. We now know a great deal about what happens in the brain during these activities, but not necessarily why. Is the integration we see functionally important, or just a reflection of evolution's lack of enthusiasm for sharp modularity? We wish to instantiate these results in robotic form to probe their technical advantages and to find any lacunae in existing models. We believe it would miss the point to investigate this on a platform where dextrous manipulation and sophisticated machine vision are already implemented in their mature form; instead, we follow a developmental approach that builds up from simpler primitives. We begin with a precursor to manipulation, simple poking and prodding, and show how it facilitates object segmentation, a long-standing problem in machine vision. The robot can familiarize itself with the objects in its environment by acting upon them. It can then recognize other actors (such as humans) in the environment through their effect on the objects it has learned about. We argue that following causal chains of events out from the robot's body into the environment allows for a very natural developmental progression of visual competence, and we relate this idea to results in neuroscience.