## I. Introduction

THE SIMULTANEOUS localization and mapping (SLAM) problem consists of processing the information obtained by a sensor installed on a mobile platform to estimate the platform's pose while building a map of the environment. It has been the subject of continuous attention during the last two decades (for recent reviews, see [1], [2], [3]). The first consistent solution proposed, and still a popular one, is extended Kalman filter (EKF)-SLAM [4], [5], [6], which represents the vehicle pose and the locations of a set of environment features in a joint state vector that is estimated using the EKF. Under the assumption of white Gaussian noise, the EKF provides a suboptimal way to deal with the uncertainties associated with the motion and measurement processes, due to the inherent linearization errors. To clarify, in the rest of the paper, we will refer to the EKF-SLAM solution as *suboptimal*, as opposed to other *approximated* techniques that introduce additional approximations besides linearization.

Despite its relative success, the EKF-SLAM algorithm suffers from two main limitations.

It requires updating the full map covariance matrix after each measurement, giving a memory complexity of *O*(*n*^{2}) and a time complexity of *O*(*n*^{2}) per step, where *n* is the total number of features stored in the map [5].

The EKF linearization approximations produce optimistic values for the map covariance matrix and introduce errors in the estimation, which may result in inconsistency [7], [8].

Techniques based on building submaps confront both problems at the same time. The main motivation for using submaps is clear: if a large area is split into several submaps with the number of features bounded by a constant, the submaps can be built in constant time per step. To clarify terminology, in this paper, we will use the generic term *submap* for a map of a small area inside a larger map. We will call *absolute submap* a submap expressed in global coordinates. We will use the term *local submap* or simply *local map* for a submap expressed with respect to a local coordinate frame. Although there is no formal proof, there is strong empirical evidence that using local submaps also improves the consistency of the EKF-SLAM [8]. The intuitive explanation is that, in local maps, uncertainty is small and the linearization errors introduced in the EKF remain small. Another advantage of these algorithms is that they allow direct implementation of data association methods since they work with covariance matrices.

The main contribution of this paper is a novel technique that allows the use of submap algorithms, avoiding the limitations imposed by the requirement of statistical independence between maps. The technique is based on the intrinsic structure of the SLAM problem that allows us to build submaps that can share information, remaining *conditionally independent.* During exploration of new terrain, it obtains local maps in constant time. After simple loop closures, it can recover the global map in linear time, without introducing any approximations besides the inherent EKF linearizations. As it works in covariance space, robust data association algorithms such as joint compatibility branch and bound (JCBB) [9] can be directly used. The technique has been implemented using absolute submaps or local submaps. In the second case, the effects of linearization errors are minimized, and the maps obtained are actually more precise and consistent than the maps obtained by the techniques based on EKF or extended information filter (EIF) that use global coordinates.

Section II discusses the related work. The basic technique for building conditionally independent submaps is introduced in Section III and particularized in Section IV for the case of Gaussian maps in covariance form. Section V presents the algorithms for exploration and loop closing. Section VI shows the application of the technique to the challenging case of pure monocular SLAM, closing a loop of 140 m with a handheld camera in a public square. Finally, in Section VII, we summarize the main characteristics of the algorithm presented and propose future work. A preliminary version of this paper was presented in [10]. Apart from a more detailed presentation and discussion of the technique, this paper adds the loop closing technique and new experiments demonstrating the performance of the method.

## II. Related Work

In the context of Gaussian filters, several techniques have been proposed to address the computational complexity problem. Postponement [11] and the compressed EKF filter (CEKF) [12] reduce the computational cost by making updates in a local area around the robot, delaying the global map update until the vehicle moves to another area. The result obtained is suboptimal, but the global map update is still *O*(*n*^{2}).

Techniques based on the information filter take advantage of the nearly sparse structure of the information matrix (the inverse of the map covariance matrix) to reduce the computational burden. In the sparse EIF (SEIF) [13], the information matrix of the SLAM posterior is approximated by rounding the small off-diagonal elements to zero. This prevents the interlandmark links from forming, and therefore, limits the density of the information matrix. The Thin-Junction Tree Filter (TJTF) [14] and the exactly sparse EIF (ESEIF) [15] maintain the sparsity by discarding some *weak* information such as the robot odometry. The exactly sparse delayed-state filters (ESDFs) [16] avoid the previous approximations by including the trajectory of the vehicle in the state vector, which makes the information matrix exactly sparse. Nevertheless, some approximations are still performed when portions of the mean state vector are recovered from its canonical (information) form in order to evaluate the Jacobians.

The main advantage of information filters is that both the measurement and motion steps can be performed by updating the information vector and matrix in constant time. However, to recover the estimated value of the map state, a sparse linear system has to be solved. This has been addressed using conjugate gradient [13], relaxation [17], or multilevel relaxation [18], which require quadratic or, at best, linear time to converge (see [19] for a discussion). A recent and very efficient technique that also works in information space is the treemap algorithm [20], which requires *O*(log *n*) time per step to recover a part of the state and *O*(*n*) to recover the whole map. However, it has only been tested using simulations, with known data association.

One important limitation of the techniques based on the information form is the difficulty of performing data association since the covariance matrix is not available. Most techniques resort to approximating the classical individual Mahalanobis gating, which is known to be problematic in difficult data association scenarios [9].

The problem of map consistency has motivated algorithms such as the unscented Kalman filter (UKF) [21] that achieve better consistency properties, but do not take into account the computational complexity problem.

Finally, some techniques based on building submaps confront complexity and consistency issues at the same time. The first technique using absolute submaps was decoupled stochastic mapping [22]. The main difficulty of the technique was that absolute submaps are not statistically independent, and some approximations were needed to get rid of the dependencies, introducing inconsistency in the map. In local submaps, the base reference is usually chosen to be the first robot pose when the local map was started. This allows local maps to be initialized with zero uncertainty in the robot pose. Under the assumption of white noise, and if no information is shared between maps, local maps are statistically *independent* and thus uncorrelated [23]. Local maps can be consistently combined using map joining [23], or the equivalent constrained local submap filter (CLSF) [24], to obtain the global map in an *O*(*n*^{2}) operation. The more recent Divide and Conquer SLAM [25] is able to recover the global map in amortized *O*(*n*) time, provided that the overlap between maps remains small.

However, it is important to note that for a set of local maps to be independent, no information can be shared between them. This has several consequences.

Features that are seen from two neighboring local maps have a different estimation in each map. If the information that both features are the same were used, the map independence would be destroyed. This information can only be used when recovering the global map with map joining, which has *O*(*n*^{2}) cost. More efficient techniques such as constant-time SLAM (CTS) [26], Atlas [27], or hierarchical SLAM [28] discard this information, which results in weak links between maps, obtaining approximated solutions.

Loop consistency can be imposed at the global map level as hierarchical SLAM does, but the corrections obtained cannot be propagated to the individual features inside the local maps because this would destroy map independence. Other techniques such as CTS and Atlas simply discard the loop constraints to remain efficient.

Sensors with partial observability such as monocular vision require the integration of measurements taken from several robot poses to obtain an accurate estimation of a feature. With independent maps, features that are observed at the end of one local map and at the beginning of the next map remain quite imprecise in both maps.

Vehicle states such as velocities or sensor biases that have been estimated in real time cannot be transferred between maps. For example, this precludes the use of inertial sensors. Also, sensors that give absolute measurements such as Global Positioning System (GPS) or compass cannot be used without destroying map independence.

These limitations are particularly important in the extreme case of pure monocular SLAM, where the only sensory input is a single camera, with no odometry. Under these conditions, real-time EKF-SLAM has been successfully demonstrated in small areas [29], [30], [31]. The first system able to extend the approach to large outdoor areas is based on building independent local maps that are combined using the hierarchical SLAM approach [32]. In that system, the constraint of map independence forces each local map to be started from scratch, without any information about the environment or camera velocities. This makes the system slightly unreliable, as the most critical part, map initialization, is repeated again and again along the trajectory. Furthermore, as the scale is intrinsically unobservable in monocular SLAM, the different local maps obtained have quite different scale factors.

The method proposed in this paper avoids these problems by building *conditionally independent* submaps, which can share information about the environment and vehicle states: the submaps are conditionally independent given the common states. The idea of conditional independence has been previously used in Rao–Blackwellized particle filter (RBPF) SLAM in a different sense: the estimations of the elements in the map are conditionally independent given the robot trajectory [33]. Recent optimizations of this approach have produced very efficient and accurate techniques for indoor and outdoor SLAM with laser data [34]. The idea of conditional independence between local maps has been recently applied in [35], but the method makes the approximation that there are no common features between submaps; this approximation is not needed in our technique.

Our method presents some similarities with the treemap algorithm [20] in the sense that flows of information are transferred between submaps to update previous map estimates. However, we use the covariance form instead of the information form, which allows us to apply effective data association algorithms such as JCBB. The second crucial difference is the use of local coordinates, which improves precision, as shown in our experiments. Finally, our technique represents the information using sequential local maps instead of ordering features in a tree structure that has to be maintained and balanced, resulting in an algorithm that is easier to implement.

## III. Building Conditionally Independent Submaps

### A. Basic Probability Concepts

For the reader's convenience, this section summarizes the basic probability concepts that will be used in the rest of the paper. More detailed presentations can be found in [1] and [36].

The *conditional probability* of a random variable **x** given the value of the random variable **y** is defined as
$$p({\bf x} \vert {\bf y})={p({\bf x},{\bf y}) \over p({\bf y})}\eqno{\hbox{(1)}}$$

where *p*(**x**,**y**) is the *joint distribution* and *p*(**y**) is the *marginal* distribution of **y**. In a more general case,

$$p({\bf x}\vert {\bf y},{\bf z})={p({\bf x},{\bf y}\vert {\bf z}) \over p({\bf y}\vert {\bf z})}.\eqno{\hbox{(2)}}$$

Two random variables are *independent* when

$$p({\bf x}, {\bf y})= p({\bf x})p({\bf y})\eqno{\hbox{(3)}}$$

which is equivalent to

$$p({\bf x}\vert {\bf y})= p({\bf x}).\eqno{\hbox{(4)}}$$

Intuitively, this means that knowledge of **y** does not provide any information about **x**.

Two random variables **x** and **y** are *conditionally independent* given **z** when
$$p({\bf x}, {\bf y}\vert {\bf z})= p({\bf x}\vert {\bf z})p({\bf y}\vert {\bf z})\eqno{\hbox{(5)}}$$

which is equivalent to

$$p({\bf x}\vert {\bf y},{\bf z})= p({\bf x}\vert {\bf z}).\eqno{\hbox{(6)}}$$

In this case, if **z** is known, **y** does not provide any additional information about **x**.

In the case of two random variables that are jointly Gaussian, with mean and covariance given by
$$p({\bf x},{\bf y}) = {\cal N} \left(\left[\matrix{\hat{{\bf x}} \cr\hat{{\bf y}} }\right],\left[\matrix{P_x & P_{xy} \cr P_{yx} & P_{y} }\right]\right)\eqno{\hbox{(7)}}$$

the process of *marginalization* consists simply in choosing the appropriate rows and columns of the mean vector and the covariance matrix

$$p({\bf y})=\int p({\bf x},{\bf y})\, d{\bf x} = {\cal N} \left(\hat{{\bf y}}, P_y \right)\eqno{\hbox{(8)}}$$

and the process of *conditioning* is performed by [37]

$$\eqalignno{p({\bf x} \vert {\bf y})&={p({\bf x},{\bf y}) \over p({\bf y})} ={\cal N} \left(\hat{{\bf x}}^\prime, P^\prime_x \right)&\hbox{(9)}\cr\hat{{\bf x}}^\prime &= \hat{{\bf x}} + P_{xy}P_{y}^{-1}({\bf y}-\hat{{\bf y}})&\hbox{(10)}\cr P^\prime_x &= P_x-P_{xy}P_{y}^{-1}P_{yx}.&\hbox{(11)}}$$
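For concreteness, the marginalization (8) and conditioning (10) and (11) operations can be sketched in NumPy. This is a minimal illustration with our own function and variable names, not code from the paper:

```python
import numpy as np

def marginalize_y(mu, P, nx):
    """p(y) from p(x, y): keep the rows/columns of y, as in (8)."""
    return mu[nx:], P[nx:, nx:]

def condition_on_y(mu, P, nx, y):
    """p(x | y) from p(x, y): equations (10) and (11)."""
    mu_x, mu_y = mu[:nx], mu[nx:]
    Px, Pxy, Py = P[:nx, :nx], P[:nx, nx:], P[nx:, nx:]
    K = Pxy @ np.linalg.inv(Py)
    return mu_x + K @ (y - mu_y), Px - K @ Pxy.T

# Toy joint over scalar x and y.
mu = np.array([1.0, 2.0])
P = np.array([[2.0, 0.6],
              [0.6, 1.0]])
mu_y, P_y = marginalize_y(mu, P, nx=1)
mu_c, P_c = condition_on_y(mu, P, nx=1, y=np.array([2.5]))
# Conditioning never increases the uncertainty: P_c <= Px.
```

Note that conditioning needs the cross-covariance *P*_{xy}, whereas marginalization only selects blocks; this asymmetry is what makes the covariance form convenient for the operations used throughout this paper.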

### B. Conditionally Independent Absolute Submaps

Fig. 1 (top) shows an example of a Bayesian network that represents the probabilistic dependencies between the stochastic variables involved in SLAM. Node **x**_{i} represents the state of the platform at the *i*th time step, **u**_{i} models the motion applied to the system at **x**_{i}, node **f**_{j} represents the *j*th feature of the map, and **z**_{i} compactly represents all feature observations taken from the *i*th platform location. Without loss of generality, we will use this example to illustrate the development of the technique.

The graph describes a map in which the vehicle has moved along five different locations **x**_{1:5} and has observed five features **f**_{1:5} during the trajectory. As the inputs **u**_{1:4} and observations **z**_{1:4} are known, the probability density function (pdf) associated with the graph is given by
$$p({\bf x}_{1:5},{\bf f}_{1:5}\vert {\bf z}_{1:4},{\bf u}_{1:4}).\eqno{\hbox{(12)}}$$

This pdf represents the joint distribution of the whole map and the trajectory. Now assume that we want to estimate the same map by building two submaps as shown in Fig. 1. In submap 1, the vehicle starts at **x**_{1} and finishes at **x**_{3} and has observed features **f**_{1:3} through measurements **z**_{1:2}. Therefore, at the end of submap 1, the pdf that describes the map estimate is given by
$$p({\bf x}_{1:3},{\bf f}_{1:3}\vert {\bf z}_{1:2},{\bf u}_{1:2}).\eqno{\hbox{(13)}}$$

Differences with current independent submap techniques begin now. Instead of starting submap 2 from scratch, we want to take advantage of the available estimation of features that are on the border between both submaps. In the example, feature **f**_{3}, which is visible from both submaps, will be copied to the second map. In addition, if we want to build absolute submaps, we should also include in submap 2 the current vehicle estimate **x**_{3}. So, the pdf that describes the initial state of submap 2 is just the result of marginalizing out the elements of the pdf (13) that we are not interested in:
$$p({\bf x}_3,{\bf f}_3\vert {\bf z}_{1:2},{\bf u}_{1:2}) = \int p({\bf x}_{1:3},{\bf f}_{1:3}\vert {\bf z}_{1:2},{\bf u}_{1:2})\, d{\bf x}_{1:2}\, d{\bf f}_{1:2}.\eqno{\hbox{(14)}}$$

Then, the vehicle continues traversing the second area, building submap 2. The vehicle has been in two new positions **x**_{4:5}, has reobserved feature **f**_{3} (and indirectly **x**_{3}) through **z**_{3}, and has observed two new features **f**_{4:5} through measurements **z**_{3:4}. Therefore, the final pdf of submap 2 is given by

$$p({\bf x}_{3:5},{\bf f}_{3:5}\vert {\bf z}_{1:4},{\bf u}_{1:4}).\eqno{\hbox{(15)}}$$

As can be noticed in (13) and (15), both local maps share in common a robot location **x**_{3}, a feature **f**_{3}, and some measurements (**z**_{1:2}, **u**_{1:2}); hence, they are not independent.

For clarity and generality, several nodes in the Bayesian network will be grouped together, as shown in Fig. 1 (bottom). The notations used are as follows.

**x**_{A}: Features and robot positions that are only observed in the first submap. In the example, this corresponds to **f**_{1:2} and **x**_{1:2}.

**x**_{B}: Features and robot positions that are only observed in the second submap, i.e., **f**_{4:5} and **x**_{4:5}.

**x**_{C}: Common features and robot position that are observed in both the first and second submaps, i.e., **f**_{3} and **x**_{3}.

**z**_{a}: Inputs and observations in the first submap, gathered from features in **x**_{A} and **x**_{C}, i.e., **u**_{1:2} and **z**_{1:2}.

**z**_{b}: Inputs and observations in the second submap, gathered from features in **x**_{B} and **x**_{C}, i.e., **u**_{3:4} and **z**_{3:4}.

As can be seen in Fig. 1, the only connection between the set of nodes (**x**_{A}, **z**_{a}) and (**x**_{B}, **z**_{b}) is through node **x**_{C}, i.e., both subgraphs are *d-separated* given **x**_{C} [38]. This implies that nodes **x**_{A} and **z**_{a} are conditionally independent of nodes **x**_{B} and **z**_{b} given node **x**_{C}. Intuitively, this means that if **x**_{C} is known, submaps 1 and 2 do not carry any additional information about each other. In the following, we will call this the *submap conditional independence (CI) property*, which can be stated as

$$\eqalignno{
& p({\bf x}_A\vert {\bf x}_B,{\bf x}_C, {\bf z}_a,{\bf z}_b) = p({\bf x}_A\vert {\bf x}_C, {\bf z}_a) \cr
& p({\bf x}_B\vert {\bf x}_A,{\bf x}_C, {\bf z}_a,{\bf z}_b) = p({\bf x}_B\vert {\bf x}_C, {\bf z}_b).
&\hbox{(16)}
}$$

### C. Conditionally Independent Local Maps

The method described earlier can be easily adapted to building sequences of conditionally independent local maps, each with its own local base reference. Let us return to the moment when the first map was finished in the example of Fig. 1. With absolute maps, the last vehicle position **x**_{3} and feature **f**_{3} were chosen to initialize submap 2 in order to represent both maps with respect to the same reference and take advantage of the available estimation of the vehicle and the feature. Instead, we now want to represent submap 2 with respect to a local reference given by the current vehicle position **x**_{3}, and still use the information about feature **f**_{3} in submap 2. To do so in a consistent way, a copy of feature **f**_{3} expressed in the new reference must be calculated and included in submap 1. In the following, a prime will be used to denote entities relative to the new base reference:
$${\bf f}_3^{\prime} = \ominus {\bf x}_3 \oplus {\bf f}_3.\eqno{\hbox{(17)}}$$

After this process, the pdf that describes submap 1 is

$$p({\bf x}_{1:3},{\bf f}_{1:3}, {\bf f}_3^\prime\vert {\bf z}_{1:2},{\bf u}_{1:2}).\eqno{\hbox{(18)}}$$

The new local map will start with robot position **x**_{3}′ being exactly zero. Obviously, this variable is completely independent of submap 1. By marginalizing (18), we obtain the pdf that describes the initial state of submap 2

$$p({\bf f}_3^\prime\vert {\bf z}_{1:2},{\bf u}_{1:2}).\eqno{\hbox{(19)}}$$

Once the vehicle has traversed the second submap and has incorporated all observations gathered in it, the pdf associated with the final estimate of submap 2 is

$$p({\bf x}_{4:5}^\prime,{\bf f}_{3:5}^\prime\vert {\bf z}_{1:4},{\bf u}_{1:4}).\eqno{\hbox{(20)}}$$

Fig. 2 shows the Bayesian network that corresponds to the new algorithm. As can be seen, the structure of the network is the same as in Fig. 1 (bottom). The only difference is that the part shared by both maps, **x**_{C}, in this case corresponds to the local representation of feature **f**_{3}′. As a consequence, the submap CI property (16) is valid for local submaps as well as for absolute submaps.
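In (17), ⊕ and ⊖ denote the usual transformation composition and inversion operators, in the style of stochastic mapping. A minimal planar sketch (our own function names, assuming 2-D poses and 2-D point features for simplicity):

```python
import numpy as np

def ominus(x):
    """Inverse of the pose x = (x, y, theta)."""
    px, py, th = x
    c, s = np.cos(th), np.sin(th)
    return np.array([-c * px - s * py, s * px - c * py, -th])

def oplus_point(x, f):
    """Compound pose x with point f = (x, y): express f in x's parent frame."""
    px, py, th = x
    c, s = np.cos(th), np.sin(th)
    return np.array([px + c * f[0] - s * f[1],
                     py + s * f[0] + c * f[1]])

# f3 expressed relative to the new base reference x3, as in (17):
x3 = np.array([2.0, 1.0, np.pi / 2])
f3 = np.array([2.0, 3.0])
f3_local = oplus_point(ominus(x3), f3)   # f3' = (-)x3 (+) f3
```

Composing the inverse pose with the feature expresses the feature in the frame of the last robot pose, which is exactly the copy added to submap 1 before starting the new local map.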

### D. Recovering the Global Map

The process of building the two conditionally independent submaps can be summarized in three steps.

Build the first submap, obtaining
$$p({\bf x}_A,{\bf x}_C\vert {\bf z}_a).\eqno{\hbox{(21)}}$$

In the case of local maps, add to the first map the common elements relative to the last robot pose. Start the second map with the result of marginalizing out the noncommon elements
$$p({\bf x}_C\vert {\bf z}_a)=\int p({\bf x}_A,{\bf x}_C\vert {\bf z}_a)\, d{\bf x}_A.\eqno{\hbox{(22)}}$$

Continue building the second submap, adding new features to it, obtaining

$$p({\bf x}_B,{\bf x}_C\vert {\bf z}_a,{\bf z}_b).\eqno{\hbox{(23)}}$$

Our objective now is to combine the maps in (21) and (23) to obtain the joint distribution that corresponds to the global map. To do so, the global map can be factorized as follows:
$$\eqalignno{
& p({\bf x}_A,{\bf x}_B,{\bf x}_C\vert {\bf z}_a,{\bf z}_b) \cr
&\quad = p({\bf x}_A\vert {\bf x}_B,{\bf x}_C,{\bf z}_a,{\bf z}_b)p({\bf x}_B,{\bf x}_C\vert {\bf z}_a,{\bf z}_b) \cr
&\quad = p({\bf x}_A\vert {\bf x}_C,{\bf z}_a)p({\bf x}_B,{\bf x}_C\vert {\bf z}_a,{\bf z}_b)
&\hbox{(24)}
}$$

where the first equality comes from (2) and the second from the submap CI property (16). The second term in the factorization is directly the second submap (23). The first term can be obtained from the first submap by conditioning
$$p({\bf x}_A\vert {\bf x}_C,{\bf z}_a)={p({\bf x}_A,{\bf x}_C\vert {\bf z}_a) \over p({\bf x}_C\vert {\bf z}_a)}.$$

Therefore, all the information needed to recover the global map can be obtained from the information stored in each of the submaps. Notice that no assumptions have been made about the particular distribution of the probability densities. The previous factorizations only depend on general probabilistic theorems and the intrinsic structure of SLAM.
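Since the factorization above holds for arbitrary distributions, it can be checked numerically on a toy discrete example (binary variables, with the observations omitted). The distributions below are arbitrary and serve only to show that a joint with the CI structure is recovered exactly from the two "submap" distributions:

```python
import numpy as np

rng = np.random.default_rng(0)
p_c = np.array([0.4, 0.6])                        # p(c)
p_a_given_c = rng.dirichlet(np.ones(2), size=2)   # rows indexed by c
p_b_given_c = rng.dirichlet(np.ones(2), size=2)

# Joint with the CI structure: p(a,b,c) = p(c) p(a|c) p(b|c).
joint = np.einsum('c,ca,cb->abc', p_c, p_a_given_c, p_b_given_c)

p_bc = joint.sum(axis=0)               # "second submap": p(b, c)
p_ac = joint.sum(axis=1)               # "first submap": p(a, c)
p_a_cond_c = p_ac / p_ac.sum(axis=0)   # p(a|c), by conditioning
recovered = np.einsum('ac,bc->abc', p_a_cond_c, p_bc)
# recovered == joint: the "global map" is rebuilt from the two submaps.
```

This mirrors (24): the full joint is the product of the first submap conditioned on the common part and the second submap kept whole.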

## IV. Case of Gaussian Submaps

In this section, we will focus on the case when the probability densities are Gaussians represented in covariance form. Suppose we have built two submaps
$$\eqalignno{p({\bf x}_A,{\bf x}_C\vert {\bf z}_a) &= {\cal N} \left(\left[\matrix{\hat{{\bf x}}_{A_a} \cr\hat{{\bf x}}_{C_a} }\right],\left[\matrix{P_{A_a} & P_{AC_a} \cr P_{CA_a} & P_{C_a} \cr}\right]\right)&\hbox{(25)}\cr p({\bf x}_C,{\bf x}_B\vert {\bf z}_a,{\bf z}_b) &= {\cal N}\left(\left[\matrix{\hat{{\bf x}}_{C_{ab}} \cr\hat{{\bf x}}_{B_{ab}} \cr}\right],\left[\matrix{P_{C_{ab}} & P_{CB_{ab}} \cr P_{BC_{ab}} & P_{B_{ab}} }\right]\right)&\hbox{(26)}}$$

where uppercase subindices denote state vector components, whereas lowercase subindices describe which observations **z** have been used to obtain the estimate. For example, in the first submap, the common elements **x**_{C} have been estimated using only observations **z**_{a}; hence, the mean and covariance estimates are denoted by $\hat{{\bf x}}_{C_a}$ and *P*_{Ca}, respectively.

We are interested in recovering the global map, represented by
$$\eqalignno{& p({\bf x}_A,{\bf x}_B,{\bf x}_C\vert {\bf z}_a,{\bf z}_b) \cr&= {\cal N} \left(\left[\matrix{\hat{{\bf x}}_{A_{ab}} \cr\hat{{\bf x}}_{C_{ab}} \cr\hat{{\bf x}}_{B_{ab}} }\right],\left[\matrix{P_{A_{ab}} & P_{AC_{ab}} & P_{AB_{ab}} \cr P_{CA_{ab}} & P_{C_{ab}} & P_{CB_{ab}} \cr P_{BA_{ab}} & P_{BC_{ab}} & P_{B_{ab}} }\right]\right).&\hbox{(27)}}$$

Comparing (26) and (27), we observe that the second local map by itself coincides exactly with the last two blocks of the global map. Only the terms related to **x**_{A} in the global map need to be computed. This is because the first submap has only been updated with the observations **z**_{a}, but not with the more recent observations **z**_{b}. In the next sections, we will show how to *backpropagate* **z**_{b} to update the first submap and how to compute the correlation between both submaps, *P*_{ABab}.

### A. Backpropagation

From the submap CI property, we know that

$$p({\bf x}_A\vert {\bf z}_a,{\bf z}_b,{\bf x}_C) = p({\bf x}_A\vert {\bf z}_a,{\bf x}_C) = {\cal N} (\hat{{\bf x}}_{A\vert C}, P_{A\vert C}).\eqno{\hbox{(28)}}$$

The conditional distribution *p*(**x**_{A}|**z**_{a},**z**_{b},**x**_{C}) can be obtained from the global map by marginalizing out **x**_{B} using (8) and conditioning on **x**_{C} using (10) and (11)
$$\eqalignno{\hat{{\bf x}}_{A\vert C} &= \hat{{\bf x}}_{A_{ab}}+P_{{AC}_{ab}}P_{C_{ab}}^{-1}({\bf x}_C-\hat{{\bf x}}_{C_{ab}})&\hbox{(29)}\cr P_{A\vert C} &= P_{A_{ab}}-P_{{AC}_{ab}}P_{C_{ab}}^{-1}P_{{CA}_{ab}}.&\hbox{(30)}}$$

The conditional probability *p*(**x**_{A}|**z**_{a},**x**_{C}) can also be obtained from the first map by conditioning on **x**_{C}, which gives
$$\eqalignno{ \hat{{\bf x}}_{A\vert C} &= \hat{{\bf x}}_{A_a}+P_{{AC}_a}P_{C_a}^{-1}({\bf x}_C-\hat{{\bf x}}_{C_a})&\hbox{(31)}\cr P_{A\vert C} &= P_{A_a}-P_{{AC}_a}P_{C_a}^{-1}P_{{CA}_a}.&\hbox{(32)}}$$

Equating (29)–(32) for all **x**_{C}, and after some manipulations, we obtain the following backpropagation equations:
$$\eqalignno{K &= P_{{AC}_a}P_{C_a}^{-1} \cr&= P_{{AC}_{ab}}P_{C_{ab}}^{-1}&\hbox{(33)}\cr P_{{AC}_{ab}} &= K P_{C_{ab}}&\hbox{(34)}\cr P_{A_{ab}} &= P_{A_a} + K(P_{{CA}_{ab}} - P_{{CA}_a})&\hbox{(35)}\cr\hat{{\bf x}}_{A_{ab}}&= \hat{{\bf x}}_{A_a}+ K(\hat{{\bf x}}_{C_{ab}}-\hat{{\bf x}}_{C_a}).&\hbox{(36)}}$$

Observe that, in order to propagate the influence of the new observations **z**_{b} to the first map, we only need the mean and covariance of the common elements from the second map: $\hat{{\bf x}}_{C_{ab}}$ and *P*_{Cab}. An important property of the previous equations is that **x**_{A} can be updated with the information contained in **z**_{b} without having to compute the correlations between both maps, *P*_{ABab}.

An interesting property of the backpropagation equations is that they can be applied at any moment. They work correctly even if we backpropagate the same information twice: the terms inside the parentheses in (35) and (36) will be zero, and the maps will remain unchanged. This allows us to schedule the backpropagation in moments with low CPU load, or to delay it until a loop closure is detected.
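The backpropagation step and its idempotence are easy to verify numerically. The following is a minimal NumPy rendering of (33)–(36), with our own variable names (scalar blocks for brevity), not the paper's code:

```python
import numpy as np

def back_propagate(xA_a, P_A_a, P_AC_a, xC_a, P_C_a, xC_ab, P_C_ab):
    """Update the first submap with the newer estimate (xC_ab, P_C_ab)
    of the common elements x_C held by the second submap, per (33)-(36)."""
    K = P_AC_a @ np.linalg.inv(P_C_a)          # (33)
    P_AC_ab = K @ P_C_ab                       # (34)
    P_A_ab = P_A_a + K @ (P_AC_ab - P_AC_a).T  # (35), using P_CA = P_AC^T
    xA_ab = xA_a + K @ (xC_ab - xC_a)          # (36)
    return xA_ab, P_A_ab, P_AC_ab

# First submap blocks (scalar x_A and x_C).
xA, P_A, P_AC = np.array([1.0]), np.array([[1.0]]), np.array([[0.5]])
xC, P_C = np.array([2.0]), np.array([[1.0]])
# Newer estimate of the common part, coming from the second submap.
xC_new, P_C_new = np.array([2.2]), np.array([[0.5]])

xA1, P_A1, P_AC1 = back_propagate(xA, P_A, P_AC, xC, P_C, xC_new, P_C_new)
# Applying the step again with the same common estimate changes nothing.
xA2, P_A2, _ = back_propagate(xA1, P_A1, P_AC1, xC_new, P_C_new,
                              xC_new, P_C_new)
```

In the second call the innovation terms vanish, so the map is left unchanged, which is what allows the update to be scheduled or delayed freely.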

### B. Computing the Correlation Between Submaps

If we want to obtain the covariance matrix of the whole map, the correlation term *P*_{ABab} should also be computed. To do so, we first obtain the expression of the covariance of *p*(**x**_{A},**x**_{B}|**z**_{a},**z**_{b},**x**_{C}) by conditioning the global map on **x**_{C}:
$$\left[\matrix{P_{A_{ab}}-P_{AC_{ab}}P_{C_{ab}}^{-1}P_{CA_{ab}}& P_{AB_{ab}}-P_{AC_{ab}}P_{C_{ab}}^{-1}P_{CB_{ab}}\cr P_{BA_{ab}}-P_{BC_{ab}}P_{C_{ab}}^{-1}P_{CA_{ab}} & P_{B_{ab}}-P_{BC_{ab}}P_{C_{ab}}^{-1}P_{CB_{ab}}\cr}\right]\!.\eqno{\hbox{(37)}}$$

Due to the submap CI property, we know that **x**_{A} and **x**_{B} are conditionally independent given **x**_{C}; therefore, the off-diagonal term in (37) must be zero, which gives the following expression for the correlation between the submaps:
$$\eqalignno{P_{AB_{ab}}&= P_{AC_{ab}}P_{C_{ab}}^{-1}P_{CB_{ab}} \cr&= KP_{CB_{ab}}.&\hbox{(38)}}$$

However, computing all the correlation blocks in the global map is an *O*(*n*^{2}) operation and, in fact, they are never required by our method, as will be explained next.

## V. EKF-SLAM With Conditionally Independent Submaps

### A. Exploration

Fig. 3 shows a schematic view of the elements of the total covariance matrix that are actually calculated during the process of building a sequence of conditionally independent submaps. Notice that the off-diagonal blocks of the matrix are not zero because the submaps are not *independent.* However, they are not required to obtain the global map. If the maximum size of the submaps is bounded by a constant, the process of building the CI submaps is *O*(1) per step. In the absence of loop closures, the last submap, including the current robot pose, is already suboptimal. The suboptimal estimation of the previous submaps can be obtained with a complete backpropagation in *O*(*n*). It is important to point out that the backpropagation operation is, in fact, delayed until a submap is revisited or a loop closure is detected, further reducing the computational cost of the algorithm.

A simple implementation of our SLAM method is shown in Algorithm 1. The implementation follows the structure of the standard EKF-SLAM algorithm but introduces two new functions: *map_transition* and *back_propagation*. Function *back_propagation* is implemented directly using (33)–(36). Function *map_transition* creates a new submap when the number of features in the current map exceeds a given threshold. When using absolute submaps, the common features are directly copied to the new map **m**_{j+1}, and the last robot pose in map **m**_{j} is replicated twice in the new submap. One of the copies will change as the robot moves through the new map, carrying the current position, while the other will remain as a common element with map **m**_{j} to perform backpropagation. In the case of local submaps, map **m**_{j} is augmented with the common features expressed relative to the last robot pose in map **m**_{j}. These features are then copied to map **m**_{j+1}, which is started with the robot pose equal to zero.
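The bookkeeping performed by *map_transition* can be sketched as follows. The `Submap` class, the threshold value, and all names are our own illustration (absolute-submap case), and the EKF prediction/update machinery is elided:

```python
# Minimal runnable sketch of the map_transition bookkeeping:
# when the current submap is full, a new one is seeded with the
# features on the border, which remain common to both submaps.

class Submap:
    def __init__(self, common):
        # 'common' holds the features shared with the previous submap
        self.features = list(common)

    def add(self, feat):
        self.features.append(feat)

def map_transition(current, n_common=1):
    """Start a new submap seeded with the last n_common features."""
    return Submap(common=current.features[-n_common:])

maps = [Submap(common=[])]
threshold = 3  # assumed feature limit per submap
for feat in ["f1", "f2", "f3", "f4", "f5"]:
    if len(maps[-1].features) >= threshold:
        maps.append(map_transition(maps[-1]))
    maps[-1].add(feat)
# maps[0] holds f1..f3; maps[1] starts with the shared border feature f3.
```

The copy of the border feature in both submaps is precisely the common part **x**_{C} through which backpropagation later flows.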

When using absolute submaps, our technique is similar to postponement [11] and the compressed EKF [12] in the sense that most operations are performed in a local area, and then, the results are propagated to the rest of the map, obtaining the same solution as with the basic EKF-SLAM. However, we never need to compute the covariance matrix of the whole map, which reduces the computational cost to *O*(1) for the local operations and to *O*(*n*) for the complete backpropagation.

### B. Loop Closing

Fig. 4 (top) shows the dependencies between three absolute submaps that have been built using our technique, before a loop closure. The pdfs that define the state of each submap are
$$\eqalignno{
\hbox{Submap} \quad 1 & \rightarrow p({\bf x}_{1:3},{\bf f}_{1:3}\vert {\bf z}_{1:2}) \cr
\hbox{Submap} \quad 2 &\rightarrow p({\bf x}_{3:5},{\bf f}_{3:5}\vert {\bf z}_{1:4}) \cr
\hbox{Submap} \quad 3 &\rightarrow p({\bf x}_{5:7},{\bf f}_{5:7}\vert {\bf z}_{1:6}).
&\hbox{(39)}}$$

Observe that the most updated map is the current one, submap 3, which takes into account all the available observations.

Now assume that the robot is at position **x**_{7} and it closes a loop by observing feature **f**_{1} through measurement **z**_{7}. The algorithm used to maintain the CI between submaps is as follows.

The loop closing features, in this example **f**_{1}, are copied to the common parts of all the intermediate submaps belonging to the loop, including the current submap. The correlation of the copied features with the elements of each submap is calculated with (38). The pdfs of the submaps are now given by
$$\eqalignno{\hbox{Submap} \quad 1 &\rightarrow p({\bf x}_{1:3},{\bf f}_{1:3}\vert {\bf z}_{1:2}) \cr\hbox{Submap} \quad 2 &\rightarrow p({\bf x}_{3:5},{\bf f}_{3:5},{\bf f}_1\vert {\bf z}_{1:4}) \cr\hbox{Submap} \quad 3 &\rightarrow p({\bf x}_{5:7},{\bf f}_{5:7},{\bf f}_1\vert {\bf z}_{1:6}).&\hbox{(40)}}$$

The current submap (submap 3) is updated with the loop closing observations (**z**_{7}) using the standard EKF equations. The state of the Bayesian network after performing the previous operations is shown in Fig. 4 (middle). In Fig. 4 (bottom), we have grouped some nodes together to clearly show that the CI property between submaps still holds. Submap 3 is now described by
$$\eqalignno{\hbox{Submap} \quad 3&\rightarrow p({\bf x}_{5:7},{\bf f}_{5:7},{\bf f}_1\vert {\bf z}_{1:7}).&\hbox{(41)}}$$

Due to the CI property, submaps 1 and 2 are updated using the *backpropagation* equations (33)–(36), obtaining
$$\eqalignno{\hbox{Submap} \quad 1&\rightarrow p({\bf x}_{1:3},{\bf f}_{1:3}\vert {\bf z}_{1:7}) \cr\hbox{Submap} \quad 2&\rightarrow p({\bf x}_{3:5},{\bf f}_{3:5},{\bf f}_1\vert {\bf z}_{1:7}).&\hbox{(42)}}$$

Notice that after applying this procedure, all the submaps are suboptimal (up to EKF linearization errors) because they have been updated with all the available information. The price paid to maintain conditional independence is that all the submaps belonging to the loop contain a copy of the loop closing features.

In our algorithm, the current submap is always kept suboptimal. With the loop closing procedure described earlier, when the information is propagated to a neighboring map, that map also becomes suboptimal. Repeating the process along the chain of submaps yields the global suboptimal map. Under the assumption that the size of the common part between submaps is bounded by a constant, the cost of each propagation is *O*(1), and the total cost of obtaining the global map after a loop closure is *O*(*n*). For this assumption to hold, the number of loops each local map belongs to must be bounded by a constant, regardless of the size of the environment. In extremely loopy environments, such as a Manhattan world, this requirement can easily be violated. Maintaining efficiency in such situations is the subject of future investigation.
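The backward pass along the chain can be sketched as follows. All names and the data layout (a list of mean/covariance pairs with explicit shared-index links between consecutive submaps) are hypothetical; the per-step update is the standard conditional-Gaussian back-propagation, and each step is constant time if the common parts are bounded.

```python
import numpy as np

def propagate_loop_closure(submaps, links):
    """Walk backwards along the chain of CI submaps after a loop closure.

    submaps : list of (x, P) pairs, newest (already updated) submap last
    links   : links[j] = (idx_j, idx_prev), indices of the elements
              shared between submap j and submap j-1
    """
    for j in range(len(submaps) - 1, 0, -1):
        x_j, P_j = submaps[j]
        x_p, P_p = submaps[j - 1]
        idx_j, idx_p = links[j]
        # updated estimate of the common part, taken from submap j
        dx = x_j[idx_j] - x_p[idx_p]
        dP = P_j[np.ix_(idx_j, idx_j)] - P_p[np.ix_(idx_p, idx_p)]
        # conditional-Gaussian back-propagation into submap j-1
        K = P_p[:, idx_p] @ np.linalg.inv(P_p[np.ix_(idx_p, idx_p)])
        submaps[j - 1] = (x_p + K @ dx, P_p + K @ dP @ K.T)
    return submaps
```

After the pass, each submap's copy of the shared elements agrees with the newer neighbor, so the whole chain reflects the loop closing observations.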

### C. Revisiting a Map

When a submap is revisited, the robot state that performs the transition to the revisited submap has to be included as a common element between both maps in order to preserve the CI property. Fig. 5 (top) shows an example in which the robot returns to the first submap when it is at **x**_{6} and reobserves feature **f**_{2}. Including **x**_{6} as a common element of both maps preserves their CI, without introducing any approximation. A potential drawback of this approach is that the size of the common parts can grow without bound when revisiting the same environment indefinitely. However, if the number of times the submaps are revisited is bounded by a constant, the global map can still be obtained in *O*(*n*).

An alternative approximate solution that improves efficiency is to marginalize out the robot in the current map and relocate it in the revisited map, as shown in Fig. 5 (bottom). A similar technique is used in ESEIF [15] to maintain the sparsity of the information matrix. In this case, the odometry link is disregarded, and therefore, we lose some information (in the figure, node **u**_{5} has disappeared). Nevertheless, the information loss is minimal because we can use the features common to both maps to relocate the robot with good precision. In our pure monocular SLAM application, we do not even have odometry. Instead, we simply have a prediction of the camera location using a constant velocity model, whose accuracy is negligible compared with the accuracy of the visual observations.
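Marginalizing the robot out of the current map is particularly simple for a Gaussian state: it amounts to deleting the corresponding rows and columns of the mean and covariance. A minimal sketch (function name and interface hypothetical):

```python
import numpy as np

def marginalize_out(x, P, keep):
    """Marginalize a Gaussian state (x, P) down to the 'keep' indices.
    For a jointly Gaussian state this is exact: the marginal is obtained
    by dropping the rows/columns of the removed variables."""
    keep = np.asarray(keep)
    return x[keep], P[np.ix_(keep, keep)]
```

The robot is then relocated in the revisited map using the features common to both maps.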

## VI. Experimental Results

The proposed algorithm has been tested using real data obtained in an urban environment with a handheld monocular camera. The features extracted from the images are Harris points. Data association is performed by predicting the feature locations in the image and searching for them with normalized correlation [30]. The set of matched features is further verified using the JCBB algorithm [9], which has been demonstrated to add the robustness needed to build monocular maps in urban areas [32]. The method implemented to detect loop closing is based on the map-to-map matching algorithm proposed in [32]. Basically, this method uses unary constraints (in this case, the normalized correlation between feature patches) and binary constraints (the relative distances between feature points in space) to find the maximal subset of geometrically compatible matchings. To speed up the search, a specialized version of the geometric constraints branch and bound (GCBB) algorithm [39] is implemented. When a positive match is found, we obtain the subset of features in the current map that corresponds to a subset of features in a previous map.

The state vector of each submap **M**_{i} contains the final camera location **x**_{c}^{i} and the 3-D location of all features (**y**_{1}^{i}, …, **y**_{n}^{i}), with respect to the map base reference (absolute or local). For the feature representation, we use the inverse-depth model proposed in [31]
$$\eqalignno{{\bf x}^T &= ({\bf x}_c^T, {\bf y}_1^T, {\bf y}_2^T, \ldots, {\bf y}_n^T)&\hbox{(43)}\cr{\bf x}_c^T &= ({\bf r}^T, {\bf \Psi}^T, {\bf v}^T, {\bf w}^T)&\hbox{(44)}\cr{\bf y}_i &= (x_i \, y_i \, z_i \, {\theta}_i \, {\phi}_i \, {\rho}_i)^T.&\hbox{(45)}}$$

This feature model represents the feature state by the camera optical center location (*x*_{i}, *y*_{i}, *z*_{i}) at which the feature point was first observed, and the azimuth and elevation (θ_{i}, φ_{i}) of the ray from the camera to the feature point. Finally, the depth *d*_{i} along this ray is represented by its inverse, ρ_{i} = 1/*d*_{i}. The main advantage of the inverse-depth parametrization is that it allows consistent undelayed initialization of the 3-D point features, regardless of their distance to the camera.
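For illustration, converting an inverse-depth feature to a Euclidean 3-D point can be sketched as below. The directional-vector convention (azimuth about the vertical axis, elevation from the horizontal plane) is one common choice and is an assumption, as the paper does not spell it out here.

```python
import numpy as np

def inverse_depth_to_xyz(y):
    """Convert an inverse-depth feature y = (x, y, z, theta, phi, rho)
    to a Euclidean 3-D point: optical centre plus (1/rho) times the
    unit ray defined by azimuth theta and elevation phi.
    The angle convention below is an assumed (common) one."""
    x, yy, z, theta, phi, rho = y
    m = np.array([np.cos(phi) * np.sin(theta),   # unit ray direction
                  -np.sin(phi),
                  np.cos(phi) * np.cos(theta)])
    return np.array([x, yy, z]) + m / rho        # centre + depth * ray
```

A point first seen from the origin along the optical axis with ρ = 0.5 maps to a depth of 2 m along that axis.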

The camera state **x**_{c} contains the position of the camera in Cartesian coordinates **r**, its attitude in Euler angles **Ψ**, the linear velocity **v**, and the angular velocity **w**. The process model used for the camera motion is a constant velocity model with white Gaussian noise in the linear and angular accelerations. Using pure monocular vision, without any kind of odometry, the scale of the map is not observable. However, by choosing appropriate values for the initial velocities and the covariance of the process noise, the EKF-SLAM is able to obtain an approximate scale for the map.
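A minimal sketch of the constant velocity prediction is given below; the state layout and the simple additive Euler-angle update are assumptions made for illustration (the acceleration noise enters through the EKF process covariance and is not shown).

```python
import numpy as np

def predict_camera(xc, dt):
    """Constant-velocity prediction for the camera state
    xc = (r, psi, v, w): position, Euler angles, linear and angular
    velocity (12 components).  A simple additive Euler-angle update
    is assumed here instead of a proper rotation composition."""
    r, psi, v, w = xc[0:3], xc[3:6], xc[6:9], xc[9:12]
    r_new = r + v * dt        # integrate linear velocity
    psi_new = psi + w * dt    # integrate angular velocity (small-angle assumption)
    return np.concatenate([r_new, psi_new, v, w])
```

The velocities are carried over unchanged; in the EKF, their uncertainty grows through the process noise on the accelerations.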

The experiment has been carried out along a public square in our hometown. The trajectory was performed with the handheld camera looking to the right, closing a loop by following approximately the same path. The sequence contains 2700 images taken at 20 Hz along a path of around 140 m. During the map building process, approximately 500 salient features are extracted and tracked from the surrounding buildings and objects. Fig. 6 shows one of the images obtained in the experiment with the corresponding features extracted and the map that is being built. The process of building a sequence of conditionally independent local submaps can be seen in the accompanying video. In the images, the features are depicted at their predicted locations.

The characteristics of the experiment make it suitable to show the benefits of sharing information between maps. Since a feature can be seen from different local maps, the proposed technique turns out to be very useful, allowing us to reuse a feature without having to reinitialize it in each local map. In addition, the linear and angular velocities of the camera, **v** and **w**, can be consistently shared between consecutive submaps, avoiding significant scale changes, a problem that needs to be addressed in techniques that build independent submaps [32]. Fig. 7 shows an example of the advantages of our technique with respect to previous local mapping techniques. By sharing information with the first map, the second submap has the same scale and is more precise than in the case of independent maps. Actually, the second submap is suboptimal (up to the EKF linearization approximations). After backpropagation, the estimation of the features in the first submap is also improved to become suboptimal.

Fig. 8 compares the solutions obtained by a standard EKF algorithm and by our method with absolute submaps for the first 1000 steps of the experiment. The number of absolute submaps created along this trajectory is 6. The EKF and the absolute submaps are superimposed in the figures to facilitate the comparison. The left figure shows our solution before performing the *backpropagation*; notice that the maps present several differences, although the last absolute submap gives the same solution as the EKF since it is equally updated. In the right figure, we can see that after updating the previous maps with the *backpropagation*, both solutions are exactly identical, as expected from the theoretical analysis.

The running times of both algorithms in a MATLAB implementation are shown in Fig. 9. Notice that our algorithm runs in constant time since the size of the submaps is bounded, whereas the time of the standard EKF solution grows quadratically, being more than ten times slower than our method after the first 1000 steps. When the *backpropagation* is performed to update the previous maps, the extra time required turns out to be just 0.17 s, which has little effect on the time of the last submap, as can be seen in the figure.

For comparison purposes, the whole dataset has been processed building absolute submaps and local submaps. In both cases, the maximum number of features per map has been limited to 50, and the total number of local maps created is 15. Fig. 10 presents the results obtained by both algorithms. The top plots show the maps obtained up to the moment when the loop was detected by the map-to-map matching algorithm. The ellipsoids show the uncertainty in the camera position at the end of each local map, in absolute coordinates.

It can be noticed in the top left figure that the absolute submap technique gives an optimistic result. The uncertainty associated with the last camera position, around *x* = −8, *y* = 17, cannot explain the large gap that appears between the first estimate of the top wall and the new estimate obtained on the second pass. Nevertheless, the map matching algorithm used allows us to recognize that both walls are indeed the same, and the loop can be closed, as shown in the bottom left figure. However, as the estimate was inconsistent, the final map has to be slightly deformed to satisfy the loop constraint.

In contrast, the local submap technique achieves better consistency and, as a consequence, a better estimate of the map and the trajectory. This is noticeable in the larger size of the ellipsoids before the loop closure constraint is applied, which include the path traversed during the first pass. After imposing the loop closure, the map obtained is quite precise. Fig. 11 shows the final map superimposed on a satellite image of the environment obtained from Google Earth. The scale and the absolute position and orientation of the map, which are not observable with pure monocular SLAM, were adjusted by hand to draw the figure. Notice how the mapped feature points follow the shape formed by the surrounding buildings.

Using the implementation described in [32], both algorithms are able to build the sequence of submaps of up to 60 features in real time at 20 Hz, including all image processing. For this map size, the running time for a standard EKF-SLAM implementation would increase quadratically up to about 2 s per step. In our current MATLAB implementation, the whole process of loop detection, loop closing, and backpropagation takes 1.15, 0.3, and 1.8 s, respectively. We expect that an optimized C++ implementation will take a fraction of a second. To maintain real-time performance at video frequency, loop closing can be implemented on a separate lower priority thread.

## VII. Conclusion

In this paper, we have proposed a new technique that allows the use of submap algorithms, avoiding the constraint imposed by the requirement of probabilistic independence between them. Using this method, salient features of the environment or vehicle state components, such as velocity or global attitude, can be shared between local maps in a consistent manner. Our experiments show that this is extremely valuable to reduce the errors committed during the first steps of map initialization, especially for monocular vision.

Under the assumption that the size of the common parts between submaps is bounded by a constant, the backpropagation algorithm allows us to make updates from local map to local map in constant time. In addition, a loop closing algorithm that takes advantage of the structure of the conditionally independent maps has been proposed. By means of this algorithm, the loop closure can be performed with a computational cost that is linear in the number of local maps instead of quadratic in the total number of features. So, the global cost of our method is *O*(1) during exploration and *O*(*n*) during loop closing. Memory requirements are also *O*(*n*), because the whole covariance matrix is never computed.

Unlike many other techniques, this performance gain is not obtained by sacrificing precision. Our technique does not use sparsification or other approximations, apart from the intrinsic EKF linearizations. Using absolute submaps, the result obtained is the same as with the classical EKF-SLAM algorithm. Using local submaps, the inconsistencies introduced by the linearization errors are reduced, and the results obtained are much better, for a small fraction of the cost.

We believe that this paper opens the way for developing new efficient submapping algorithms. We plan to extend the technique to larger environments, where hierarchical map decomposition and nonlinear optimization techniques may be useful. The method presented here relies on the common part between submaps being small to achieve efficiency. Environments with more complicated topologies may require the development of new algorithms and perhaps approximations. Regarding applications, we have demonstrated real-time monocular SLAM in moderately large urban environments. For reliable loop detection in larger areas, appearance-based methods [40] or image-to-map matching techniques [41] will be investigated. We are also interested in large-scale SLAM with systems that include inertial or other sensors, where the proposed technique will allow us to consistently share global information or sensor biases across submaps. A work along these lines is [42], where the CI technique proposed here allows the consistent sharing of vehicle states and compass measurements between the local maps.

### Acknowledgment

The authors would like to thank J. Neira, L. M. Paz, J. M. M. Montiel, and J. Civera for fruitful discussions and their help with the experimental setup.