We describe the current state of the 3-D Mosaic project, whose goal is to incrementally acquire a 3-D model of a complex urban scene from images. The notion of incremental acquisition arises from the observations that 1) single images contain only partial information about a scene, 2) complex images are difficult to fully interpret, and 3) different features of a given scene tend to be easier to extract in different images because of differences in viewpoint and lighting conditions. In our approach, multiple images of the scene are sequentially analyzed so as to incrementally construct the model. Each new image provides information that refines the model. We describe some experiments toward this end. Our method of extracting 3-D shape information from the images is stereo analysis. Because we are dealing with urban scenes, a junction-based matching technique proves very useful. This technique produces rather sparse wire-frame descriptions of the scene. A reasoning system that relies on task-specific knowledge generates an approximate model of the scene from the stereo output. Gray scale information is also acquired for the faces in the model. Finally, we describe an experiment in combining two views of the scene to obtain a refined model.