This correspondence describes a method of building and maintaining a spatial representation of a robot's workspace, using a sensor that moves about in the world. From the known camera position at which an image is obtained, and the two-dimensional silhouettes of the objects in that image, a series of cones is projected to describe the possible positions of the objects in space. When an object is seen from several viewpoints, the intersections of the cones constrain its position and size. After several views have been processed, the representation of the object begins to resemble its true shape. At all times, the spatial representation contains the best guess at the true situation in the world, with uncertainties in position and shape explicitly represented. An octree is used as the data structure for the representation. It not only provides a relatively compact representation, but also allows fast access to information and enables large parts of the workspace to be ignored. The purpose of constructing this representation is not so much to recognize objects as to describe the volumes in the workspace that are occupied and those that are empty. This enables trajectory planning to be carried out, and also provides a means of spatially indexing objects without needing to represent them at an extremely fine resolution. The spatial representation is one part of a more complex representation of the workspace used by the sensory system of a robot manipulator in understanding its environment.
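
The idea of intersecting silhouette volumes in an octree can be illustrated with a minimal sketch. It is not the implementation described in this correspondence, and it makes several simplifying assumptions: views are axis-aligned parallel projections (so each "cone" degenerates to a prism), silhouettes are square boolean masks at voxel resolution, and the workspace starts out as entirely possibly-occupied. The names `Node`, `carve`, and `volume` are hypothetical and chosen only for this illustration.

```python
# Illustrative sketch only: octree carving from silhouettes, simplified to
# axis-aligned parallel projections. All names here are hypothetical.
EMPTY, FULL, MIXED = "empty", "full", "mixed"


class Node:
    """A cubic region of the workspace, in integer voxel coordinates."""

    def __init__(self, origin, size, state=FULL):
        self.origin = origin      # (x, y, z) of the minimum corner
        self.size = size          # edge length, assumed to be a power of two
        self.state = state        # FULL means "possibly occupied"
        self.children = None      # None for a leaf, else a list of 8 Nodes

    def split(self):
        """Replace this leaf by eight half-size children of the same state."""
        h = self.size // 2
        x, y, z = self.origin
        self.children = [Node((x + dx, y + dy, z + dz), h, self.state)
                         for dx in (0, h) for dy in (0, h) for dz in (0, h)]
        self.state = MIXED


def carve(node, axis, mask):
    """Intersect the octree with the volume swept by one silhouette.

    axis: viewing direction (0, 1, or 2).
    mask: 2D list of booleans indexed by the two remaining coordinates,
          True where the silhouette covers that cell.
    """
    if node.state == EMPTY:
        return  # known-empty space is skipped without further work
    u, v = [i for i in range(3) if i != axis]
    cells = [mask[a][b]
             for a in range(node.origin[u], node.origin[u] + node.size)
             for b in range(node.origin[v], node.origin[v] + node.size)]
    if all(cells):
        return  # node lies wholly inside the silhouette volume: no change
    if not any(cells):
        node.state, node.children = EMPTY, None  # wholly outside: carve away
        return
    # Partially covered. A single voxel maps to one mask cell, so this branch
    # is only reached when size > 1 and the node can be subdivided.
    if node.children is None:
        node.split()
    for child in node.children:
        carve(child, axis, mask)
    # Merge children back into one leaf when they all agree.
    states = {c.state for c in node.children}
    if len(states) == 1 and all(c.children is None for c in node.children):
        node.state, node.children = states.pop(), None


def volume(node):
    """Number of voxels still marked as possibly occupied."""
    if node.children is not None:
        return sum(volume(c) for c in node.children)
    return node.size ** 3 if node.state == FULL else 0
```

Calling `carve` once per view on a root such as `Node((0, 0, 0), 4)` shrinks the possibly-occupied volume monotonically, mirroring how additional viewpoints tighten the estimate of an object's position and size. A perspective camera would replace the mask lookup with a projection of each node into the image, which this sketch does not attempt.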