This article addresses the problem of image-based localization in indoor environments. Localization is achieved by querying a database of omnidirectional images that constitutes a detailed visual map of the building where the robot operates. Omnidirectional cameras have the advantage, compared to standard perspective cameras, of capturing the entire visual content of a room in a single frame. This not only speeds up data acquisition for building the map, but also favors scalability by significantly reducing the size of the database. The difficulty is that omnidirectional images exhibit strong non-linear distortion, which leads to poor retrieval results when the query images are standard perspectives. This paper reports, for the first time, thorough experiments on using perspective images to index a database of para-catadioptric images for the purpose of robot localization. We propose modifications to the SIFT algorithm that significantly improve point matching between the two types of images, with a positive impact on recognition based on visual words. We also compare the classical bag-of-words framework against the recent visual-phrases framework, showing that the latter outperforms the former.
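The retrieval pipeline the abstract refers to can be illustrated with a minimal bag-of-visual-words sketch. This is not the paper's implementation: the vocabulary, descriptor dimensions, and synthetic "images" below are all illustrative stand-ins (real systems would quantize SIFT descriptors against a k-means vocabulary learned from the map images), but the quantize-histogram-rank structure is the standard scheme the paper builds on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (sizes are illustrative, not from the paper): a visual
# vocabulary of K "words", i.e. cluster centres in descriptor space, normally
# obtained by running k-means over SIFT descriptors of the map images.
K, D = 8, 16
vocab = rng.normal(size=(K, D))

def bow_histogram(descriptors, vocab):
    """Quantize each descriptor to its nearest visual word and return an
    L2-normalized word-frequency histogram (the image's bag-of-words)."""
    dists = np.linalg.norm(descriptors[:, None, :] - vocab[None, :, :], axis=2)
    hist = np.bincount(dists.argmin(axis=1), minlength=len(vocab)).astype(float)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

def make_image(word_ids, n_per_word=20):
    """Synthetic 'image': descriptors scattered around chosen vocabulary words."""
    return np.concatenate(
        [vocab[w] + 0.1 * rng.normal(size=(n_per_word, D)) for w in word_ids])

# Database standing in for the omnidirectional map; each entry shows a
# different pair of visual words.
database = [make_image(ids) for ids in ([0, 1], [2, 3], [4, 5], [6, 7], [0, 4])]
db_hists = np.stack([bow_histogram(d, vocab) for d in database])

# Perspective "query" seeing the same content as map image 1.
query = make_image([2, 3])
scores = db_hists @ bow_histogram(query, vocab)  # cosine similarity
best = int(scores.argmax())
print(best)  # retrieves image 1
```

The paper's contribution sits upstream of this scheme: its modified SIFT makes descriptors of distorted para-catadioptric images quantize to the same words as their perspective counterparts, and visual phrases replace the orderless histogram with word co-occurrence statistics.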