• Abstract

# Visual SLAM for Flying Vehicles

The ability to learn a map of the environment is important for numerous types of robotic vehicles. In this paper, we address the problem of learning a visual map of the ground using flying vehicles. We assume that the vehicles are equipped with one or two low-cost downlooking cameras in combination with an attitude sensor. Our approach is able to construct a visual map that can later on be used for navigation. Key advantages of our approach are that it is comparably easy to implement, can robustly deal with noisy camera images, and can operate either with a monocular camera or a stereo camera system. Our technique uses visual features and estimates the correspondences between features using a variant of the progressive sample consensus (PROSAC) algorithm. This allows our approach to extract spatial constraints between camera poses that can then be used to address the simultaneous localization and mapping (SLAM) problem by applying graph methods. Furthermore, we address the problem of efficiently identifying loop closures. We performed several experiments with flying vehicles that demonstrate that our method is able to construct maps of large outdoor and indoor environments.

SECTION I

## Introduction

The problem of learning maps with mobile robots is a large and active research field in the robotic community. In the past, a variety of solutions to the simultaneous localization and mapping (SLAM) problem have been developed that focus on learning2-D maps. Additionally, several approaches for building 3-D maps have been proposed [8], [15], [21]. However, most of these methods rely on bulky sensors that have a high range and accuracy (e.g., SICK laser range finders) but cannot be used on robots such as small flying vehicles. As a result, several researchers focused on utilizing vision sensors instead of laser range finders. Cameras are an attractive alternative due to their limited weight and low power consumption. Many existing approaches that address the vision-based SLAM problem focus on scenarios in which a robot repeatedly observes a set of features [5], [7] and have been shown to learn accurate feature-based maps.

In this paper, we present a system that allows aerial vehicles to acquire visual maps of large environments using an attitude sensor and low-quality cameras pointing downward. In particular, we are interested in enabling miniature aerial vehicles such as small blimps or helicopters to robustly navigate. To achieve this, our system deals with cameras that provide comparably low-quality images that are also affected by significant motion blur. Furthermore, it can operate in two different configurations: with a stereo as well as with a monocular camera. If a stereo setup is available, our approach is able to learn visual elevation maps of the ground. If, however, only one camera is carried by the vehicle, our system can be applied by making a flat ground assumption providing a visual map without elevation information. To simplify the problem, we used an attitude (roll and pitch) sensor. Our current system employs an XSens MTi IMU, which has an error below 0.5 . The advantages of our approach are that it is easy to implement, provides robust pose and map estimates, and is suitable for small flying vehicles. Fig. 1 depicts the blimp and the helicopter used to carry out the experimental evaluations described in this paper.

Fig. 1. Two small-scale aerial vehicles used to evaluate our mapping approach.
SECTION II

## Related Work

Building maps with robots equipped with perspective cameras has received increasing attention in the last decade. Davison et al. [7] proposed a single-camera SLAM algorithm based on a Kalman filter. The features considered in this approach are initialized by using a particle filter that estimates their depth. Civera et al. [5] extended this framework by proposing an inverse-depth parameterization of the landmarks. Since this parameterization can be accurately approximated by a Gaussian, the particle filter can be avoided in the initialization of the features. Subsequently, Clemente et al. [6] integrated this technique in a hierarchical SLAM framework, which has been reported to successfully build large-scale maps with comparably poor sensors. Chekhlovet al. [3] proposed an online visual SLAM framework that uses a scale-invariant feature transform (SIFT)-like feature descriptors and tracks the 3-D motion of a single camera by using an unscented Kalman filter. The computation of the features is speeded up by utilizing the estimated camera position to guess the scale. Jensfelt et al. [11] proposed an effective way for online mapping applications by combining a SIFT feature extractor with an interest points tracker. While the feature extraction can be performed at low frequency, the movement of the robot is constantly estimated by tracking the interest points at high frequency.

Other approaches utilize a combination of inertial sensors and cameras. For example, Eustice et al. [8]rely on a combination of highly accurate gyroscopes, magnetometers, and pressure sensors to obtain an accurate estimate of the orientation and altitude of an underwater vehicle. Based on these estimates, they construct a precise global map using an information filter based on high-resolution stereo images. Piniés et al. [17] present a SLAM system for handheld monocular cameras and employ an inertial measurement unit (IMU) to improve the estimated trajectories. Andreasson et al. [1] presented a technique that is based on a local similarity measure for images. They store reference images at different locations and use these references as a map. In this way, their approach is reported to scale well with the size of the environment. Recently, Konolige and Agrawal [13]presented a technique inspired by scan matching with laser range finders. The poses of the camera are connected by synthetic measurements obtained from incremental bundle adjustment performed on the images. An optimization procedure is used to find the configuration of camera poses that is maximally consistent with the measurements. Our approach uses a similar SLAM formulation but computes the synthetic measurements between poses based on an efficient pairwise frame alignment technique.

Jung et al. [12] proposed a technique that is close to our approach. They use a high-resolution stereo camera for building elevation maps with a blimp. The map consists of 3-D landmarks extracted from interest points in the stereo image obtained by a Harris corner detector, and the map is estimated using an extended kalman filter (EKF). Due to the wide field of view and the high quality of the images, the nonlinearities in the process were adequately solved by the EKF. In contrast to this, our approach is able to deal with low-resolution and low-quality images. It is suitable for mapping indoor and outdoor environments and for operating on small-size flying vehicles. We furthermore apply a highly efficient error minimization approach [9], which is an extension of the research of Olson et al. [16].

Our previous study [19] focused more on the optimization approach and neglected the uncertainty of the vehicle when seeking for loop closures. In the approach presented in this paper, we consider this uncertainty and provide a significantly improved experimental evaluation using real aerial vehicles.

SECTION III

## Graph-Based SLAM

Throughout this paper, we consider the SLAM problem in its graph-based formulation. Accordingly, the poses of the robot are described by the nodes of a graph. Edges between these nodes represent spatial constraints between them. They are typically constructed from observations or from odometry. Under this formulation, a solution to the SLAM problem is a configuration of the nodes that minimizes the error introduced by the constraints.

We apply an online variant of the 3-D optimization technique recently presented by Grisetti et al. [9]to compute the maximum likely configuration of the nodes. Online performance is achieved by optimizing only those portions of the graph that require updates after introducing new constraints. Additional speedups result from reusing previously computed solutions to obtain the current one, as explained by Grisettiet al. [10]. Our system can be used as a black box to which one provides an initial guess of the positions of the nodes as well as the edges, and it computes the new configuration of the network (see Fig. 2). The computed solution minimizes the error introduced by contradicting constraints.

Fig. 2. System overview. This paper describes the visual SLAM system represented by the grayish box.

In our approach, each node xi models a 6-DoF camera pose. The spatial constraints between two poses are computed from the camera images and the attitude measurements. An edge between two nodes i and j is represented by the tuple 〈 δjiji〉, where δji and Ωji are the mean and the information matrix of the measurement. Leteji(x) be the error introduced by the constraint 〈 j,i\rangle. Assuming the independence of the constraints, a solution to the SLAM problem is given by TeX Source $${\bf x}^{*}=\mathop{\rm argmin}_{{\bf x}} \sum_{\left\langle j, i \right\rangle} e_{ji}({\bf x})^T \Omega_{ji} e_{ji}({\bf x}) .\eqno{\hbox{(1)}}$$

Our approach relies on visual features extracted from the images obtained from downlooking cameras. We use speeded-up robust features (SURF) [2] that are invariant with respect to rotation and scale. Each feature is represented by a descriptor vector and the position, orientation, and scale in the image. By matching features between different images, one can estimate the relative motion of the camera, and thus, construct the graph that serves as input to the optimizer. In addition to that, the attitude sensor provides the roll and pitch angle of the camera. In our experiments, we found that the roll and the pitch measurements are comparably accurate even for low-cost sensors and can be directly integrated into the estimate. This reduces the dimensionality of each pose that needs to be estimated from .

In this context, the main challenge is to compute the constraints between the nodes (here, camera poses) based on the data from the camera and the attitude sensor. Given these constraints, the optimizer processes the incrementally constructed graph to obtain estimates of the most likely configuration on-the-fly.

SECTION IV

## Spatial Relation Between Camera Poses

As mentioned in the previous section, the input to the optimization approach is a set of poses and constraints between them. In this section, we describe how to determine such constraints.

As a map, we directly use the graph structure of the optimizer. Thus, each camera pose corresponds to one node. Additionally, we store for each node the observed features as well as their 3-D positions relative to the node. The constraints between nodes are computed from the features associated with the nodes. In general, at least three pairs of correspondences between image points and their3-D positions in the map are necessary to compute the camera position and orientation [18]. However, in our setting we need only two such pairs since the attitude of the camera is known from the IMU.

In practice, we can distinguish two different situations when extracting constraints: visual odometry and place revisiting. Odometry describes the relative motion between subsequent poses. To obtain an odometry estimate, we match the features in the current image to the ones stored in the previous n nodes. This situation is easier than place revisiting because the set of potential feature correspondences is relatively small. In the case of place revisiting, we compare the current features with all the features acquired from robot poses that lie within the 3σ confidence interval given by the pose uncertainty. This interval is computed by using the approach of Tipaldi et al. [20] and applies covariance intersection on a spanning tree to obtain conservative estimates of the covariances. Since the number of features found during place revisiting can be quite high, we introduce further approximations in the search procedure. First, we use only a small number of features from the current image when looking for potential correspondences. These features are the ones that were better matched when computing visual odometry (which have the lowest descriptor distance). Second, we apply a k-D tree to efficiently query for similar features, and we use the best-bins-first technique proposed by Lowe [14].

Every time a new image is acquired, we compute the current pose of the camera based on both visual odometry and place revisiting, and augment the graph accordingly. The optimization of the graph is performed only if the computed poses are contradictory.

In the remainder of this section, we first describe how to compute a camera pose given a pair of known correspondences, and subsequently, we describe our PROSAC-like procedure for determining the best transformation given a set of correspondences between the features in the current image and another set of features computed either via visual odometry or place revisiting.

### A. Computing a Transformation From Feature Correspondences

In this section, we explain how to compute the transformation of the camera if we know the 3-D position of two features f1 and f2in the map and their projections i1 and i2 on the current image. Assuming known camera calibration parameters, we can compute the projections of the points on the normalized image plane. By using the attitude measurements from the IMU, we can compute the positions of these points as they would have been captured from a perfectly downward facing camera. Let these transformed positions bei1′, i2′.

Subsequently, we compute the altitude of the camera according to the procedure illustrated in Fig. 3, by exploiting the similarity of triangles. Once the altitude is known, we can compute the yaw of the camera by projecting the map features f1 and f2 into the same plane as i1′, i2′. As a result, the yaw is the angle between the two resulting lines on this plane.

Finally, we determine x and y as the difference between the positions of the map features and the projections of the corresponding image points, by reprojecting the image features into the map according to the known altitude and yaw angle.

Fig. 3. Illustration of how to compute the height of the camera, given two corresponding features, under known attitude. Whereas cam is the camera position, i1′, i2′are the projections of the features f1 and f2 on the normalized image plane, already rotated according to the attitude measured by the IMU. The term pp is the principle point of the camera (vertically downward from the camera) on the projection plane. The termspp′ and f1′ are the projections of pp and f1 at the altitude of f2. Also, h is the altitude difference between the camera and f2. It can be determined by exploiting the similarity of the triangles {i1′, i2′, pp} and{f1′, f2, pp′}.

### B. Computing the Best Camera Transformation Based on a Set of Feature Correspondences

In the previous section, we described how to compute the camera pose given only two correspondences. However, both visual odometry and place revisiting return a set of correspondences. In the following, we describe our procedure to efficiently select from the input set the pair of correspondences for computing the most likely camera transformation.

We first order these correspondences according to the Euclidean distance of their descriptor vectors. Let this ordered set be C = {c1,…,cn}. Then, we select pairs of correspondences in the order defined by the following predicate: TeX Source \eqalignno{\left\langle c_{a_1}, c_{b_1} \right\rangle < \left\langle c_{a_2}, c_{b_2} \right\rangle&\Leftrightarrow(b_1<b_2 \lor (b_1=b_2 \land a_1 < a_2)) \cr&\qquad\!\!\! \land\; a_1 < b_1 \land a_2 < b_2.&{\hbox{(2)}}}

In this way, the best correspondences (according to the descriptor distance) are used first but the search procedure will not get stuck for a long time in case of false matches with low descriptor distances. This is illustrated in the following example: assume that the first correspondence c1 is a false match. Our selection strategy generates the sequence 〈 c1,c2 〉, 〈 c1,c3 〉, 〈 c2,c3 〉, 〈 c1,c4 〉, 〈 c2,c4〉, 〈 c3,c4〉, etc. A pair without the false match 〈 c2,c3〉 will be selected in the third step. A more naïve selection strategy would be to first try all pairs of correspondences 〈 c1,cx〉 with c1in the first position. However, this results in a less effective search.

Only pairs that involve different features are used. The corresponding transformation Tca, cb is then determined for the current pair (see Section IV-A). This transformation is then evaluated based on the other features in both sets using a score function, which is presented in the next section. The process can be stopped when a transformation with a satisfying score is found or when a timeout is reached. The solution with the highest score is returned as the current assumption for the transformation.

### C. Evaluating a Camera Transformation

In the previous sections, we explained how to compute a camera transformation based on two pairs of corresponding features and how to select those pairs from two input sets of features. By using different pairs, we can compute a set of candidate transformations. In this section, we explain how to evaluate them and how to choose the best one among them.

To select the best transformation, we rank them according to a score function, which first projects the features in the map into the current camera image and then compares the distance between the feature positions in the image. The score is given by TeX Source $${\rm score}(T_{c_a, c_b}) = \sum_{ \left\{i \,\mid\, i\notin\{a, b\} \right\} } v(c_i).\eqno{\hbox{(3)}}$$In this equation, the function v(ci) calculates the weighted sum of the relative displacement of the corresponding features in the current image and the Euclidean distance of their feature descriptors TeX Source $$v(c_i) = 1 - \left[\alpha {{d^{\rm desc}(c_i)}\over { d^{\rm desc}_{\rm max}}} +(1-\alpha){{d^{\rm img}(c_i)}\over {d^{\rm img}_{\rm max}}}\right].\eqno{\hbox{(4)}}$$In the sum on the right-hand side of (3), we consider only those feature correspondences ci whose distancesdimg(ci) in the image and distancesddesc(ci) in the descriptor space are smaller than the thresholds dimgmax andddescmax introduced in (4). This prevents single outliers from leading to overly bad scores.

More detailed, dimgmax is the maximum distance in pixels between the original and the reprojected feature. In our experiments, this value was set to 2 pixels for images of320 × 240 pixels. The higher the motion blur in the image, the larger the value for dimgmax should be chosen. The minimum value depends on the accuracy of the feature extractor. Increasing this threshold also allows the matching procedure to return less accurate solutions for the position estimation. The blending factor α mixes the contribution of the descriptor distance and the reprojection error. The more distinct the features are, the higher α can be chosen. In all our experiments, we used α = 0.5. The value ddescmax has been tuned manually. For the 64-D SURF descriptors, we obtained the best results by setting this threshold to values close to 0.3. The lower the quality of the image, the higher the value that should be chosen for ddescmax.

Note that the technique to identify the correspondences between images is similar to the PROSAC [4] algorithm, which is a variant of random sample consensus (RANSAC). Whereas PROSAC takes into account a quality measure of the correspondences during sampling, RANSAC draws the samples uniformly. We use the distance between feature descriptors as a quality measure. In our variant of PROSAC, the correspondences are selected deterministically. Since we only need two correspondences to compute the camera transformation, the chances that the algorithm gets stuck due to wrong correspondences are very small.

After identifying the transformation between the current pose of the camera and a node in the map, we can directly add a constraint to the graph. In the subsequent iteration of the optimizer, the constraint is thus taken into account when computing the updated positions of the nodes.

SECTION V

## Experiments

In this section, we present the experiments carried out to evaluate our approach. We used only real world data that we partially recorded with a sensor platform carried in the hand of a person as well as with a real blimp and a helicopter (see Fig. 1). In all experiments, our system was running at 5–15 Hz on a 2.4-GHz dual-core. Videos of the experiments can be found at http://ieeexplore.ieee.org.

### A. Outdoor Environments

In the first experiment, we measured the performance of our algorithm using data recorded in outdoor environments. Since even calm winds outside buildings prevent us from making outdoor experiments with our blimp or our small-size helicopter, we mounted a sensor platform on the tip of a rod and carried this by hand to simulate a freely floating vehicle. This sensor platform is equipped with two standard Web cams (Logitech Communicate STX). The person carried the platform along an extended path around a building over different types of grounds like grass and pavement. The trajectory has a length of about 190 m. The final graph contains approximately1400 nodes and 1600 constraints. The trajectory resulting from the visual odometry is illustrated in the left image of Fig. 4. Our system autonomously extracted data association hypotheses and constructed the graph. These matching constraints are colored as light blue/gray in the same image. After applying our optimization technique, we obtained a map in which the loop has been closed successfully. The corrected trajectory is shown in the right image of Fig. 4. A perspective view, which also shows the elevations, is depicted in Fig. 5.

Fig. 4. (Left) Path of the camera in black and the matching constraints in gray. (Right) Corrected trajectory after applying our optimization technique.
Fig. 5. (Left) Perspective view of the map of the outdoor experiment together with two camera images recorded at the corresponding locations. (Right) Person with the sensor platform mounted on a rod to simulate a freely floating vehicle.

This experiment illustrates that our approach is able to build maps of comparably large environments and that it is able to find the correct correspondences between observations. Note that this result has been achieved without any odometry information and despite the fact that the cameras are of low quality and that the images are blurry due to the motion and mostly show grass and concrete.

### B. Statistical Experiments

The second experiment evaluates the performance of our approach quantitatively in an indoor environment. We acquired the data with the same sensor setup as in the previous experiment. We moved the sensor platform through the corridor of our building that has a wooden floor. For a statistical evaluation of the accuracy of our approach, we placed several objects on the ground at known locations, whose positions we measured manually with a measuring tape (up to an accuracy of approximately 3 cm). The distance between neighboring landmarks is 5 m in the x-direction and1.5 m in the y-direction. The six landmarks are labeled A to F in Fig. 6. We used these six known locations as ground truth, which allowed us to measure the accuracy of our mapping technique. Fig. 6 depicts a resulting map after applying our least-square error minimization approach. We repeated the experiment ten times and calculated the average error obtained by the optimization procedure.

Fig. 6. Top view of the map of the indoor experiment. The image shows the map after least-square error minimization. The labels A to F present six landmarks for which we determined the ground truth location manually to evaluate the accuracy of our approach.
TABLE I Accuracy of the Relative Pose Estimate Between Landmarks

Table I summarizes this experiment. As can be seen, the error of the relative pose estimates is always below 8% and typically around 5% compared to the true difference. This results mainly from the error in our self-made and comparably low-quality stereo setup. To our opinion, this is an accurate estimate for a system consisting of two low-cost cameras and an IMU, lacking sonar, laser range data, and real odometry information.

### C. Experiments With a Blimp

The third experiment is also a statistical analysis carried out with our blimp. The blimp has only one camera looking downward. Instead of the stereo setup, we mounted a sonar sensor to measure its altitude. Since no attitude sensor was available, we assumed the roll and pitch angle to be zero (which is an acceptable approximation given the smooth motion of a blimp). We placed two landmarks on the ground at a distance of 5 m and flew ten times over the scene. The mean estimated distance between the two landmarks was 4.91 m with a standard deviation of 0.11 m. Thus, the real position was within the 1σ interval.

Fig. 7. Map constructed by the blimp. The ground truth distance between both landmarks is 5 m and the estimated distance was4.91 m with 0.11 m standard deviation (averaged over ten runs). The map was used to autonomously steer the blimp to user-specified locations.

The next experiment in this paper is designed to illustrate that such a visual map can be used for navigation. We constructed the map shown in Fig. 7 with our blimp. During this task, the blimp was instructed to return always to the same location and was repeatedly pushed away several meters (see Fig. 8). The blimp was always able to register its current camera image against the map constructed so far and in this way kept track of its location relative to the map. This enabled the controller of the blimp to steer the air vehicle to the desired starting location. The experiment lasted 18 min, and the blimp recorded 10 800 images. The robot processed around 500 000features, and the map was constructed online.

Fig. 8. After a person pushes the blimp away, the vehicle is able to localize itself and return to the home location using the map shown in Fig. 7 (see video material).

### D. Experiments With a Lightweight Helicopter

We finally mounted an analog RF camera on our lightweight helicopter depicted in Fig. 1. This helicopter is neither equipped with an attitude sensor nor with a sonar sensor to measure its altitude. Since neither stereo information nor the elevation of the helicopter is known, the scale of the visual map was determined by a known size of one landmark (a book lying on the ground).Furthermore, the attitude was assumed to be zero, which is a quite rough approximation for a helicopter. We recorded a dataset by flying the helicopter and overlayed the resulting map with an occupancy grid map recorded from laser range data with a wheeled robot. Fig. 9 depicts the result. The crosses mark some locations in the occupancy grid map, while the circles mark the same places in the visual map. Even under the severe sensory limitations, our approach was able to estimate its position in a quite accurate manner. The helicopter flew a distance of around35 m, and the map has an error in the landmark locations that varies between 20 and 60 cm.

Fig. 9. Visual map built with a helicopter overlayed on a 2-D grid map constructed from laser range finder data recorded with a wheeled robot.
SECTION VI

## Conclusion

In this paper, we presented a robust and practical approach to learn visual maps based on downlooking cameras and an attitude sensor. Our approach applies a robust feature matching technique based on a variant of the PROSAC algorithm and in combination with SURF features. The main advantages of the proposed methods are that it can operate with monocular or with a stereo camera system, is easy to implement, and is robust to noise in the camera images.

We presented a series of real-world experiments carried out with a small-sized helicopter, a blimp, and by manually carrying a sensor platform. Different statistical evaluations of our approach show its ability to learn consistent maps of comparably large indoor and outdoor environments. We furthermore illustrated that such maps can be used for navigation tasks of air vehicles.

## Footnotes

Manuscript received December 12, 2007; revised May 16, 2008. First published September 30, 2008; current version published nulldate. This paper was recommended for publication by Associate Editor J. Neira and Editor L. Parker upon evaluation of the reviewers' comments. This work was supported in part by the German Research Foundation(DFG) under Contract SFB/TR-8 and in part by the European Commission(EC) under Contract FP6-2005-IST-6-RAWSEEDS and Contract FP6-IST-34120-muFly.

The authors are with the Department of Computer Science, University of Freiburg, D-79110 Freiburg, Germany.

This paper has supplementary downloadable multimedia material available at http://ieeexplore.ieee.org. provided by the author. This material includes two videos. Both videos are AVI files encoded with standard MPEG4 (XVID) codec and no audio. The first video, steder07tro-outdoorexperiment-mpeg4.avi (10.1 MB), shows the experiment in which a person was carrying the sensor platform outside a building to acquire vision data. After the data acquisition phase, the iterations of the optimization approach are illustrated. The second video, steder07tro-blimpexperiment-mpeg4.avi (19.5 MB), shows parts of the indoor experiment using the blimp. After building a map, the blimp was pushed away several times and had to return to its starting location. The visual map was used for self-localization and navigation. The live optimization during the map creation can be seen when the trajectory changes. Contact steder@informatik.uni-freiburg.de for further questions about this work.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.org.

## References

1. Mini-SLAM: Minimalistic visual SLAM in large-scale environments based on a new interpretation of image similarity

H. Andreasson, T. Duckett, A. Lilienthal

Proc. IEEE Int. Conf. Robot. Autom. (ICRA) Rome Italy 2007 pp. 4096–4101

2. Surf: Speeded up robust features

H. Bay, T. Tuytelaars, L. Van Gool

Eur. Conf. Comput. Vis. (ECCV), presented at the, Graz, Austria, 2006

3. Robust real-time visual SLAM using scale prediction and exemplar based feature description

D. Chekhlov, M. Pupilli, W. Mayol-Cuevas, A. Calway

Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) 2007-06, pp. 1–7

4. Matching with PROSAC-progressive sample consensus

O. Chum, J. Matas

Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) Los Alamitos CA 2005 pp. 220–226

5. Inverse depth parametrization for monocular SLAM

J. Civera, A. J. Davison, J. M. M. Montiel

IEEE Trans. Robot., Vol. 24, issue (5), 2008-10

6. Mapping large loops with a single hand-held camera

L. A. Clemente, A. Davison, I. Reid, J. Neira, J. D. Tardós

Robot. Sci. Syst. (RSS) Conf., presented at the, Atlanta, GA 2007

7. MonoSLAM: Real time single camera SLAM

A. Davison, I. Reid, N. Molton, O. Stasse

IEEE Trans. Pattern Anal. Mach. Intell., Vol. 29, issue (6) pp. 1052–1067, 2007-06

8. Visually mapping the RMS Titanic: Conservative covariance estimates for SLAM information filters

R. M. Eustice, H. Singh, J. J. Leonard, M. R. Walter

Int. J. Robot. Res., vol. 25, issue (12), p. 1223–1242, 2006-12

9. Efficient estimation of accurate maximum likelihood maps in 3-D

G. Grisetti, S. Grzonka, C. Stachniss, P. Pfaff, W. Burgard

Int. Conf. Intell. Robots Syst. (IROS), presented at the, San Diego CA 2007

10. Online constraint network optimization for efficient maximum likelihood map learning

G. Grisetti, D. Lodi Rizzini C. Stachniss, E. Olson, W Burgard

IEEE Int. Conf. Robot. Autom. (ICRA), presented at the, Pasadena CA 2008

11. A framework for vision based bearing only 3-D SLAM

P. Jensfelt, D. Kragic, J. Folkesson, M. Björkman

Proc. IEEE Int. Conf. Robot. Autom. (ICRA) Orlando CA 2006 pp. 1944–1950

12. High resolution terrain mapping using low altitude stereo imagery

I. Jung, S. Lacroix

Int. Conf. Comput. Vis. (ICCV), presented at the, Nice France 2003

13. Frame-frame matching for realtime consistent visual mapping

K. Konolige, M. Agrawal

Proc. IEEE Int. Conf. Robot. Autom. (ICRA) Rome Italy 2007 pp. 2803–2810

14. Object recognition from local scale-invariant features

D. G. Lowe

Proc. Int. Conf. Comput. Vis. (ICCV) Krekyra Greece 1999 pp. 1150–1157

15. 6d SLAM with approximate data association

A. Nüchter, K. Lingemann, J. Hertzberg, H. Surmann

Proc. 12th Int. Conf. Adv. Robot. (ICAR) 2005 pp. 242–249

16. Fast iterative optimization of pose graphs with poor initial estimates

E. Olson, J. Leonard, S. Teller

Proc. IEEE Int. Conf. Robot. Autom. (ICRA) 2006 pp. 2262–2269

17. Inertial aiding of inverse depth slam using a monocular camera

P. Piniés, T. Lupton, S. Sukkarieh, J. D. Tardós

Proc. IEEE Int. Conf. Robot. Autom. (ICRA) Rome Italy 2007 pp. 2797–2802

18. Linear N-Point camera pose determination

L. Quan, Z.-D. Lan

IEEE Trans. Pattern Anal. Mach. Intell., Vol. 21, issue (8) pp. 774–780 1999-08

19. Learning maps in 3-D using attitude and noisy vision sensors

B. Steder, G. Grisetti, S. Grzonka, C. Stachniss, A. Rottmann, W. Burgard

Int. Conf. Intell. Robots Syst. (IROS), presented at the, San Diego CA 2007

20. Approximate covariance estimation in graphical approaches to SLAM

G.D. Tipaldi, G. Grisetti, W. Burgard

Proc. Int. Conf. Intell. Robots Syst. (IROS) San Diego CA 2007 pp. 3460–3465

21. Multi-level surface maps for outdoor terrain mapping and loop closing

R. Triebel, P. Pfaff, W. Burgard

Int. Conf. Intell. Robots Syst. (IROS), presented at the, Beijing China 2006

## Authors

No Authors Available

## Cited By

No Citations Available

## Keywords

### IEEE Keywords

No Keywords Available

### INSPEC: Controlled Indexing

aerospace control, aircraft, mobile robots, path planning, robot vision

### More Keywords

No Keywords Available

No Corrections

## Media

Video

20,012 KB
Video

### steder07tro-outdoorexperiment-mpeg4

10,314 KB
This paper appears in:
IEEE Transactions on Robotics
Issue Date:
OCTOBER 2008
On page(s):
1088 - 1093
ISBN:
1552-3098
Print ISBN:
N/A
INSPEC Accession Number:
10291070
Digital Object Identifier:
10.1109/TRO.2008.2004521
Date of Current Version:
31 Oct, 2008
Date of Original Publication:
30 Sep, 2008

Noreils, F.R.