In this work, we address the problem of building recognition across two camera views with large changes in scale and viewpoint. The main idea is to construct a semantically rich, sketch-based representation of buildings that is invariant under large scale and perspective changes. After multi-scale maximally stable extremal region (MSER) detection, the proposed approach finds repeated structural components of buildings, such as windows, doors, and facades, and extracts semantically rich features, which are organized into a sketch-based representation of each building. These descriptors are then clustered according to the building plane they belong to and matched across video frames using spectral graph analysis. Our experiments demonstrate that the proposed approach outperforms SIFT-based matching schemes, especially for images with large viewpoint changes.
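The abstract does not spell out the matching step, so the following is only a minimal sketch of one common form of spectral graph matching, not the authors' formulation. It assumes each structural component is reduced to a 2D point and that two candidate correspondences are compatible when they preserve the distance between the points involved. The function name `spectral_match`, the Gaussian affinity, and the parameter `sigma` are all hypothetical choices made for illustration. A distance-based affinity is not scale invariant, so a real system for the setting described above would need a different compatibility measure.

```python
import numpy as np


def spectral_match(pts_a, pts_b, sigma=0.5):
    """Match two small 2D point sets by pairwise-distance consistency.

    Illustrative only: the affinity and the greedy rounding are assumptions,
    not the method of the paper.
    """
    n_a, n_b = len(pts_a), len(pts_b)
    # Every (i, j) is a candidate assignment "point i in A matches point j in B".
    cands = [(i, j) for i in range(n_a) for j in range(n_b)]
    m = len(cands)

    # Pairwise distances inside each point set.
    d_a = np.linalg.norm(pts_a[:, None] - pts_a[None, :], axis=2)
    d_b = np.linalg.norm(pts_b[:, None] - pts_b[None, :], axis=2)

    # Affinity between two candidate assignments: high when both preserve
    # the distance between the points they involve, zero when they conflict.
    M = np.zeros((m, m))
    for p, (i, j) in enumerate(cands):
        for q, (k, l) in enumerate(cands):
            if i != k and j != l:
                diff = d_a[i, k] - d_b[j, l]
                M[p, q] = np.exp(-(diff ** 2) / (2 * sigma ** 2))

    # The principal eigenvector of the symmetric affinity matrix scores
    # how strongly each candidate belongs to the main consistent cluster.
    _, vecs = np.linalg.eigh(M)
    score = np.abs(vecs[:, -1])

    # Greedy rounding: accept candidates by decreasing score while keeping
    # the assignment one-to-one.
    matches, used_b = {}, set()
    for p in np.argsort(-score):
        i, j = cands[p]
        if i not in matches and j not in used_b:
            matches[i] = j
            used_b.add(j)
    return matches
```

Calling `spectral_match(pts_a, pts_b)` with two `(n, 2)` float arrays returns a dictionary mapping indices in `pts_a` to indices in `pts_b`. The affinity matrix has `(n_a * n_b)^2` entries, so this dense version only suits small sets of components.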