Micro aerial vehicles (MAVs) are gaining importance as image acquisition tools in urban environments, where areas of interest are often close to buildings and to the ground. While GPS is still the most widely used sensor for outdoor localization, urban applications motivate a shift towards visual localization. We present a framework based on metric, geo-referenced visual landmarks, which can be obtained by taking images with a consumer camera at ground level. These visual landmarks serve as prior knowledge for the MAV and allow robust, high-accuracy localization in urban environments. The problem of differing camera views at higher altitudes is mitigated by incremental feature updates, a novel technique that improves performance by 30% compared to previous work, facilitates long-term operation, and results in a localization rate of 83%. We validate the visual pose estimation in flight by comparison with IMU and GPS data, and evaluate our positioning accuracy with respect to differential GPS.