Many successful indoor mapping techniques employ frame-to-frame matching of laser scans to produce detailed local maps as well as the closing of large loops. In this paper, we propose a framework for applying the same techniques to visual imagery. We match visual frames with large numbers of point features, using classic bundle adjustment techniques from computational vision, but we keep only relative frame pose information (a skeleton). The skeleton is a reduced nonlinear system that is a faithful approximation of the larger system and can be used to solve large loop closures quickly, as well as forming a backbone for data association and local registration. We illustrate the workings of the system with large outdoor datasets (10 km), showing large-scale loop closure and precise localization in real time.