We propose a place recognition algorithm for simultaneous localization and mapping (SLAM) systems using stereo cameras that considers both the appearance and the geometric information of points of interest in the images. Both near and far scene points contribute information to the recognition process. Loop closing hypotheses are generated using a fast, appearance-only technique based on the bag-of-words (BoW) method. We propose several important improvements to BoW that exploit the fact that, in this problem, images are provided in sequence. Loop closing candidates are evaluated using a novel normalized similarity score that measures similarity in the context of recent images in the sequence. When similarity is not sufficiently clear, the loop closing is verified using a method based on conditional random fields (CRFs). We build on CRF matching with two main novelties: we use both image and 3-D geometric information, and we carry out inference on a minimum spanning tree (MST) instead of on a densely connected graph. Our results show that MSTs provide an adequate representation of the problem, with the additional advantages that exact inference is possible and that the computational cost of inference remains limited. We compare our system with the state of the art using indoor and outdoor visual data from three different locations and show that our system attains full precision (no false positives) at a higher recall (fewer false negatives).
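The appearance-only scoring step can be illustrated with a minimal sketch. It assumes tf-idf weighted, L1-normalized BoW vectors stored as sparse dictionaries (visual word id to weight) and the L1 score s(v1, v2) = 1 - 0.5 * ||v1 - v2||_1 that is commonly used with vocabulary trees. The normalization, dividing the score against a candidate by the score against the immediately preceding image, is a hypothetical stand-in for "similarity in the context of recent images" and is not necessarily the exact score proposed here. The names `tfidf_vector`, `l1_score`, and `normalized_score` are illustrative only.

```python
from collections import Counter


def tfidf_vector(words, idf):
    """Build an L1-normalized tf-idf BoW vector from a list of visual word ids.

    `idf` maps word id -> inverse document frequency (assumed precomputed).
    """
    counts = Counter(words)
    n = len(words)
    vec = {w: (c / n) * idf.get(w, 0.0) for w, c in counts.items()}
    norm = sum(abs(x) for x in vec.values())
    # Return an empty vector if every weight is zero (or there are no words).
    return {w: x / norm for w, x in vec.items()} if norm > 0 else {}


def l1_score(v1, v2):
    """L1 similarity in [0, 1] for L1-normalized sparse vectors."""
    keys = set(v1) | set(v2)
    dist = sum(abs(v1.get(k, 0.0) - v2.get(k, 0.0)) for k in keys)
    return 1.0 - 0.5 * dist


def normalized_score(v_t, v_j, v_prev, eps=1e-9):
    """Hypothetical normalization: score of the current image against
    candidate j, relative to its score against the previous image."""
    return l1_score(v_t, v_j) / max(l1_score(v_t, v_prev), eps)
```

Normalizing by a recent image makes the threshold on candidates less sensitive to how distinctive a given place is: a scene with few features yields low raw scores everywhere, but the ratio can still be high for a true revisit.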
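Replacing a densely connected graph with an MST can be sketched with Prim's algorithm on the complete graph of scene points. This sketch assumes the edge weight is the Euclidean distance between 3-D point coordinates; how the actual system weights edges (for instance, whether far points use image coordinates instead) is not specified here, and `mst_edges` is a hypothetical helper.

```python
import math


def mst_edges(points):
    """Prim's algorithm, O(n^2), on the complete graph of 3-D points.

    Edge weights are Euclidean distances (an assumption for illustration).
    Returns the n - 1 edges (parent, child) of a minimum spanning tree.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax connections from the new tree vertex to the remaining ones.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges
```

The dense O(n^2) variant is used because the graph is complete; a heap-based version only pays off on sparse graphs. The resulting tree keeps each point linked to a nearby neighbour, which preserves local geometric structure with only n - 1 pairwise terms.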
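Why a tree makes exact inference possible can be shown with max-sum dynamic programming (the log-domain form of max-product belief propagation), which returns the exact MAP labeling on a tree in O(n * L^2) time for n nodes and L labels per node. In this sketch, labels stand for hypothetical candidate matches, and the unary and pairwise log-potentials are assumed inputs rather than the CRF features used in this work; `tree_map` is an illustrative name. The edges are assumed to form a single spanning tree over nodes 0..n-1.

```python
from collections import defaultdict


def tree_map(n, edges, unary, pairwise):
    """Exact MAP labeling of a tree-structured pairwise model by max-sum.

    unary[i][a]: log-potential of label a at node i.
    pairwise(i, j, a, b): log-potential for node i with label a and node j
    with label b; it is called as (child, parent, child_label, parent_label).
    """
    adj = defaultdict(list)
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    root = 0
    # Iterative DFS: parent pointers and a pre-order (parents before children).
    parent = [-1] * n
    visited = [False] * n
    visited[root] = True
    order, stack = [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if not visited[v]:
                visited[v] = True
                parent[v] = u
                stack.append(v)
    # belief[u][a]: best log-score of the subtree rooted at u given label a.
    belief = [list(unary[u]) for u in range(n)]
    best_child_label = {}  # (child, parent_label) -> best child label
    for u in reversed(order):  # children are processed before their parents
        p = parent[u]
        if p < 0:
            continue
        for b in range(len(unary[p])):
            scores = [belief[u][a] + pairwise(u, p, a, b) for a in range(len(unary[u]))]
            a_best = max(range(len(scores)), key=scores.__getitem__)
            best_child_label[(u, b)] = a_best
            belief[p][b] += scores[a_best]
    # Backtrack from the root using the stored argmax choices.
    labels = [0] * n
    labels[root] = max(range(len(belief[root])), key=belief[root].__getitem__)
    for u in order:
        if parent[u] >= 0:
            labels[u] = best_child_label[(u, labels[parent[u]])]
    return labels
```

On a densely connected graph the same MAP problem generally requires approximate inference (for example loopy belief propagation), whereas on a tree each edge is visited once in each direction and the result is exact.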