Man-made indoor environments possess regularities that can be efficiently exploited for automated model acquisition from visual sensing. In this context, we propose an approach for inferring a topological model of an environment from images or from the video stream captured by a mobile robot during exploration. The proposed model consists of a set of locations and the neighborhood relationships between them. Initially, each location in the model is represented by a collection of similar, temporally adjacent views, where similarity is defined by a simple appearance-based distance measure. A sparser representation is then obtained in a subsequent learning stage by means of learning vector quantization (LVQ). The quality of the model is tested through location recognition within a qualitative localization scheme: given a new view, the most likely location from which that view was taken is determined.
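The three stages described above (grouping temporally adjacent views into locations, compressing each location with LVQ, and recognizing the location of a new view) can be illustrated with a minimal sketch. It is not the authors' implementation. It rests on several assumptions that do not come from the abstract: views are plain numeric feature vectors, the "simple appearance-based distance" is taken to be Euclidean distance, a new location starts whenever a view drifts more than a fixed `threshold` from the first view of the current location, temporally consecutive locations are treated as neighbors, and the LVQ variant is standard LVQ1 with a linearly decaying learning rate. All function names and parameters are hypothetical.

```python
import math
import random


def distance(a, b):
    # Hypothetical appearance distance: Euclidean distance between feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_views(views, threshold):
    """Assign each view in the exploration sequence to a location.

    Assumed rule: a new location begins when a view is farther than
    `threshold` from the first view of the current location."""
    labels, loc, anchor = [], 0, views[0]
    for v in views:
        if distance(v, anchor) > threshold:
            loc += 1
            anchor = v
        labels.append(loc)
    return labels


def neighbor_edges(labels):
    # Assumption: locations visited one after the other are neighbors.
    edges = set()
    for a, b in zip(labels, labels[1:]):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return sorted(edges)


def init_prototypes(views, labels, per_location):
    # Seed up to `per_location` prototypes per location from its member views.
    protos = []
    for loc in sorted(set(labels)):
        members = [v for v, l in zip(views, labels) if l == loc]
        step = max(1, len(members) // per_location)
        for v in members[::step][:per_location]:
            protos.append((list(v), loc))
    return protos


def lvq1(views, labels, protos, lr=0.1, epochs=20, seed=0):
    """LVQ1: move the nearest prototype toward a view of the same location,
    away from a view of a different location."""
    rng = random.Random(seed)
    order = list(range(len(views)))
    for epoch in range(epochs):
        rng.shuffle(order)
        alpha = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for i in order:
            x, y = views[i], labels[i]
            w, c = min(protos, key=lambda p: distance(p[0], x))
            sign = 1 if c == y else -1
            for k in range(len(w)):
                w[k] += sign * alpha * (x[k] - w[k])  # updates prototype in place
    return protos


def recognize(view, protos):
    # Location recognition: label of the nearest prototype.
    return min(protos, key=lambda p: distance(p[0], view))[1]


if __name__ == "__main__":
    # Toy 2-D "views" from three places visited in sequence.
    views = [[0, 0], [0.2, 0.1], [0.1, 0.3],
             [5, 5], [5.2, 4.9], [4.8, 5.1],
             [10, 0], [10.1, 0.2], [9.9, -0.1]]
    labels = segment_views(views, threshold=2.0)
    print(labels)                  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
    print(neighbor_edges(labels))  # [(0, 1), (1, 2)]
    protos = lvq1(views, labels, init_prototypes(views, labels, per_location=1))
    print(recognize([9.5, 0.5], protos))
```

In this toy run, nine views collapse to three prototypes, one per location, which mirrors the sparser representation the LVQ stage is meant to produce. In a real system the feature vectors, threshold, and number of prototypes per location would all need to be chosen for the actual imagery.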