SIMULTANEOUS localization and mapping (SLAM) has been widely used to estimate the position of a robot in an initially unknown environment. In the original formulation [1], [2], [3], the state of the robot and the positions of a set of features extracted from observations of the environment are jointly estimated using an extended Kalman filter (EKF). The complexity of updating the filter after acquiring an observation is quadratic in the number of estimated features, which has motivated a large research effort to produce more scalable SLAM methods. Examples include partitioned updates [4] and submapping techniques [5], [6], [7].

Recently, there has been increasing interest in SLAM algorithms using an extended information filter (EIF), in which an observation update can be performed in constant time. Sparsification approximations can be used to ignore many near-zero elements of the information matrix in feature-based SLAM approaches [8], [9], while the information matrix is exactly sparse when past vehicle poses are maintained by the filter, such as in the viewpoint augmented navigation (VAN) framework [9], [10], [11]. Exploiting the sparsity of the information matrix can reduce both the computational complexity and memory requirements of the filter.

A related approach is the smoothing and mapping (SAM) framework [12], [13], which estimates the states of a set of features and a history of robot poses. Unlike EKF or EIF approaches in which linearization errors are permanently incorporated into the filter, the SAM algorithm can perform an iterative least-squares optimization process to converge to an optimal state estimate. The information matrix in the normal equations solved during each iteration possesses a similar sparsity structure to that of the VAN framework.

The main difficulty with information-form SLAM algorithms is the recovery of state estimates and covariances. State estimates are required in the EIF prediction, observation, and update operations, while state covariances are required for data association or loop-closure hypothesis generation. Efficient state estimate and covariance recovery is the main focus of this paper.

In a previous VAN implementation [9], [10], [11], state estimates and covariances were recovered using a Cholesky factor of the information matrix that was recalculated each time an image was acquired. The use of Cholesky factorization modifications to keep a factor up-to-date in a SAM application was previously proposed, but not implemented due to the complexity of the algorithms when applied to sparse matrices [12], [13]. In this paper, the use of Cholesky factorization modifications in the VAN framework is investigated, utilizing a recently developed implementation [14].

In parallel to the paper presented here, an incremental SAM approach has been developed [15], [16], in which a QR factorization of the SAM measurement Jacobian is updated using Givens rotations. The two approaches are closely related, since the upper triangular matrix **R** in a QR factorization of the SAM measurement Jacobian is a Cholesky factor of the information matrix [13].

This paper is organized as follows. Section II provides a justification for using the VAN framework for visual navigation applications. Section III summarizes the information-form VAN filtering process. Section IV describes the Cholesky factorization process, and the modifications used to maintain a factor of the VAN information matrix. Section V describes state estimate recovery methods. Section VI describes state covariance recovery methods. Section VII outlines the process to generate loop-closure hypotheses. Section VIII presents the results of the efficient VAN algorithm applied to data acquired by an autonomous underwater vehicle (AUV). Finally, Section IX provides concluding remarks.

## II. SLAM Frameworks and Visual Navigation

Two main SLAM frameworks have been proposed: feature-based and view-based algorithms. In feature-based SLAM [1], [2], [3], [4], [5], [6], [7], [8], the positions of features are estimated, and a loop closure is performed by observing a previously initialized feature. In view-based SLAM [9], [10], [11], [17], a set of vehicle poses at locations where sensor data was acquired is estimated. A loop closure is performed by registering two sets of sensor data to produce an observation of the relative pose between the vehicle locations where the data was acquired.

A disadvantage of the view-based method is the need to find pairs of previously unused sensor data to construct independent loop-closure observations. Two relative pose measurements created using common feature observations will be correlated, and ignoring these correlations will cause the filter to become inconsistent. Applying multiple relative pose observations to the filter simultaneously while considering the correlations is possible; however, it is impractical since loop-closure events involving a single pose may occur at multiple different times. In comparison, the feature-based approach has no such problem, since the filter maintains all correlations and observations can be applied individually.

The feature-based approach has the disadvantages of requiring the filter to estimate the feature states, and of needing to select which features will be used at the time they are first observed. In comparison, the view-based approach has the advantage that the selection of a subset of features used in a loop-closure observation can be delayed until the feature association process is performed.

As a result of these properties, feature-based approaches are more suitable for applications in which a small set of features can reliably be extracted and associated, while pose-based methods are more appropriate for applications in which large numbers of features can be extracted, particularly when it is uncertain which features can be associated in the future.

When evaluating the suitability of each framework for large-scale visual navigation, the properties of visual feature extraction and association algorithms need to be considered. A range of wide-baseline approaches suitable for loop-closure situations have been developed [18], [19], [20], [21]. Association of such features can typically be performed at high precision but low recall (incorrect feature associations are uncommon; however, the number of associations produced is small) [22], [23].

When used within a feature-based SLAM algorithm, the properties of wide-baseline visual feature extraction and association algorithms result in a difficult feature selection problem. Thousands of features can be extracted from an image; however, few will be matched in a loop-closure situation. Estimating the positions of all features is infeasible, yet if only a few are selected, a loop-closure observation becomes unlikely. The ability to use all the sensor data at a loop-closure event, rather than a sparse set of previously selected features, is a critical advantage of view-based SLAM algorithms in vision applications.

An additional benefit of the view-based approach for visual navigation applications is its ability to handle delayed observations. Visual feature extraction and association are time-consuming processes, so a delay is likely to occur between the time an image is acquired and a loop-closure observation is produced. In the view-based framework, a relative pose constraint can be applied between two previously augmented poses whenever the image analysis operations are complete.

Because it avoids the feature selection problem, inherently handles delayed observations, and is efficient in the information form, the view-based VAN framework is utilized in this paper.

### B. Estimation Process

The VAN estimation process uses the standard EIF three-step prediction, observation, and update cycle. The vehicle states are assumed to evolve according to a process model of the form
$${{\bf x}_{{v}}}\left(t_{k}\right) = {{\bf f}_{{v}}} \big[{{\bf x}_{{v}}}\left(t_{k-1}\right), {\bf u}\left(t_{k}\right) \big] + {\bf {w}}\left(t_{k}\right)\eqno{\hbox{(7)}}$$

in which **u**(*t*_{k}) is a vector of control inputs, and **w**(*t*_{k}) is an error vector from a zero-mean Gaussian distribution with covariance **Q**(*t*_{k}).

When propagating the vehicle states to a new timestep with a prediction operation, a decision on whether or not the current vehicle pose should be kept in the state vector is required. The current vehicle pose should be kept if it marks the location where data that may be used in future loop-closure observations were acquired.
$$\eqalignno{{{{\hat{{\bf y}}}}^{-}\left(t_{k}\right)} & =\left[\matrix{{{{\hat{{\bf y}}}}_{t}^{+}\left(t_{k-1}\right)} \cr{{{\hat{{\bf y}}}}_{v}^{+}\left(t_{k-1}\right)} - {{\bm\nabla}_{x}^{{\ssr T}} {{\bf f}_{{v}}}}\left(t_{k}\right) {{\bf Q}^{-1}}\left(t_{k}\right) \Big({{\bf f}_{{v}}}\left[{\hat{{\bf x}}_{{v}}^{+}\left(t_{k-1}\right)},{\bf u}\left(t_{k}\right) \right]-{{\bm \nabla}_{x} {{\bf f}_{{v}}}}\left(t_{k}\right) {\hat{{\bf x}}_{{v}}^{+}\left(t_{k-1}\right)} \Big) \cr{{\bf Q}^{-1}}\left(t_{k}\right) \Big({{\bf f}_{{v}}}\left[{\hat{{\bf x}}_{{v}}^{+}\left(t_{k-1}\right)},{\bf u}\left(t_{k}\right) \right] -{{\bm \nabla}_{x} {{\bf f}_{{v}}}}\left(t_{k}\right) {\hat{{\bf x}}_{{v}}^{+}\left(t_{k-1}\right)} \Big)}\right]&\hbox{(8)}\cr{{\bf Y}^{-}\left(t_{k}\right)} &=\left[\matrix{{{\bf Y}_{tt}^{+}\left(t_{k-1}\right)} & {{\bf Y}_{tv}^{+}\left(t_{k-1}\right)} & {\bm 0} \cr{{\bf Y}_{tv}^{+{\ssr T}}\left(t_{k-1}\right)} & {{\bf Y}_{vv}^{+}\left(t_{k-1}\right)} +{{\bm \nabla}_{x}^{{\ssr T}}{{\bf f}_{{v}}}}\left(t_{k}\right) {{\bf Q}^{-1}}\left(t_{k}\right){{\bm \nabla}_{x} {{\bf f}_{{v}}}}\left(t_{k}\right) & -{{\bm\nabla}_{x}^{{\ssr T}} {{\bf f}_{{v}}}}\left(t_{k}\right) {{\bf Q}^{-1}}\left(t_{k}\right) \cr{\bm 0} & - {{\bf Q}^{-1}}\left(t_{k}\right) {{\bm \nabla}_{x} {{\bf f}_{{v}}}}\left(t_{k}\right) & {{\bf Q}^{-1}}\left(t_{k}\right)}\right]&\hbox{(9)}}$$
$$\eqalignno{{{{\hat{{\bf y}}}}^{-}\left(t_{k}\right)} &=\left[\matrix{{{{\hat{{\bf y}}}}_{t}^{+}\left(t_{k-1}\right)} - {{\bf Y}_{tv}^{+}\left(t_{k-1}\right)}{{\bm \Omega}}^{-1}\left(t_{k}\right)\Big({{{\hat{{\bf y}}}}_{v}^{+}\left(t_{k-1}\right)} - {{\bm\nabla}_{x}^{{\ssr T}} {{\bf f}_{{v}}}}\left(t_{k}\right){{\bf Q}^{-1}}\left(t_{k}\right) {\bm \delta}\left(t_{k}\right) \Big)\cr{{\bf Q}^{-1}}\left(t_{k}\right) {{\bm \nabla}_{x} {{\bf f}_{{v}}}}\left(t_{k}\right) {{\bm \Omega}}^{-1}\left(t_{k}\right){{{\hat{{\bf y}}}}_{v}^{+}\left(t_{k-1}\right)} + {\bm\Psi}\left(t_{k}\right) {\bm \delta}\left(t_{k}\right)\cr}\right]&\hbox{(10)}\cr{{\bf Y}^{-}\left(t_{k}\right)} &=\left[\matrix{{{\bf Y}_{tt}^{+}\left(t_{k-1}\right)} - {{\bf Y}_{tv}^{+}\left(t_{k-1}\right)} {{\bm\Omega}}^{-1}\left(t_{k}\right) {{\bf Y}_{tv}^{+{\ssr T}}\left(t_{k-1}\right)} & {{\bf Y}_{tv}^{+}\left(t_{k-1}\right)} {{\bm\Omega}}^{-1}\left(t_{k}\right) {{\bm \nabla}_{x}^{{\ssr T}} {{\bf f}_{{v}}}}\left(t_{k}\right) {{\bf Q}^{-1}}\left(t_{k}\right) \cr{{\bf Q}^{-1}}\left(t_{k}\right) {{\bm \nabla}_{x} {{\bf f}_{{v}}}}\left(t_{k}\right) {{\bm \Omega}}^{-1}\left(t_{k}\right){{\bf Y}_{tv}^{+{\ssr T}}\left(t_{k-1}\right)} & {\bm\Psi}\left(t_{k}\right)}\right]&\hbox{(11)}}$$

For example, in the AUV application detailed in Section VIII in which loop-closure observations are produced from stereovision, prediction with augmentation (keeping the current pose) is performed after each stereo image pair is acquired. When propagating the filter forward from the time of a vehicle depth, attitude, or velocity observation, prediction without augmentation is performed.

Prediction with augmentation is performed as in [9], using (8) and (9) above, in which ∇_{x}**f**_{v}(*t*_{k}) is the Jacobian of the vehicle model with respect to the vehicle states.

The equations for prediction without augmentation can be obtained by marginalizing the previous pose from the augmented system of (8) and (9). The result is (10) and (11) above, in which three subterms are defined in (12)–(14).
$$\eqalignno{{\bm \delta}\left(t_{k}\right) & = {{\bf f}_{{v}}}\left[{\hat{{\bf x}}_{{v}}^{+}\left(t_{k-1}\right)},{\bf u}\left(t_{k}\right) \right]- {{\bm \nabla}_{x} {{\bf f}_{{v}}}}\left(t_{k}\right){\hat{{\bf x}}_{{v}}^{+}\left(t_{k-1}\right)}&\hbox{(12)} \cr{{\bm \Omega}}\left(t_{k}\right) & = {{\bf Y}_{vv}^{+}\left(t_{k-1}\right)} + {{\bm \nabla}_{x}^{{\ssr T}}{{\bf f}_{{v}}}}\left(t_{k}\right) {{\bf Q}^{-1}}\left(t_{k}\right){{\bm \nabla}_{x} {{\bf f}_{{v}}}}\left(t_{k}\right)&\hbox{(13)} \cr{\bm \Psi}\left(t_{k}\right) & = \big({\bf Q}\left(t_{k}\right) +{{\bm \nabla}_{x} {{\bf f}_{{v}}}}\left(t_{k}\right) \left[{{\bf Y}_{vv}^{+}\left(t_{k-1}\right)}\right]^{-1} {{\bm\nabla}_{x}^{{\ssr T}}{{\bf f}_{{v}}}}\left(t_{k}\right)\big)^{-1}.\cr&&\hbox{(14)}}$$

Observations are assumed to be made according to a model of the form
$${{\bf z}}\left(t_{k}\right) = {\bf h} \big[{\bf x}\left(t_{k}\right) \big] + {\bf v}\left(t_{k}\right)\eqno{\hbox{(15)}}$$

in which **z**(*t*_{k}) is an observation vector, and **v**(*t*_{k}) is a vector of observation errors with covariance **R**(*t*_{k}). The difference between the actual and predicted observations is the innovation
$${\bm \nu}\left(t_{k}\right) = {{\bf z}}\left(t_{k}\right) - {\bf h}\left[{\hat{{\bf x}}^{-}\left(t_{k}\right)} \right] .\eqno{\hbox{(16)}}$$

The innovation is used to update the information vector and matrix
$$\eqalignno{{{{\hat{{\bf y}}}}^{+}\left(t_{k}\right) } & = {{{\hat{{\bf y}}}}^{-}\left(t_{k}\right)} + {\bf i}\left(t_{k}\right)&\hbox{(17)}\cr{{\bf Y}^{+}\left(t_{k}\right) } & = {{\bf Y}^{-}\left(t_{k}\right)} + {\bf I}\left(t_{k}\right)&\hbox{(18)}}$$

in which
$$\eqalignno{ {\bf i}\left(t_{k}\right) & = {{\bm \nabla}_{x} {\bf h}}\left(t_{k}\right) {{\bf R}^{-1}}\left(t_{k}\right) \left({\bm \nu}\left(t_{k}\right) +{{\bm \nabla}_{x} {\bf h}}\left(t_{k}\right){\hat{{\bf x}}^{-}\left(t_{k}\right)} \right)\qquad\,\,& \hbox{(19)} \cr{\bf I}\left(t_{k}\right) & = {{\bm \nabla}_{x} {\bf h}}\left(t_{k}\right) {{\bf R}^{-1}}\left(t_{k}\right) {{\bm \nabla}_{x}^{{\ssr T}} {\bf h}}\left(t_{k}\right)&\hbox{(20)}}$$

where ∇_{x}**h**(*t*_{k}) is the Jacobian of the observation function with respect to the estimated states.

The observation and update process is efficient in the information form, since only the elements of the information vector and matrix corresponding to the observed states are modified.
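As a sketch of this sparsity property, the toy dense script below applies (16)–(20) with a hypothetical observation Jacobian that touches only one pose block, and checks that no other entries of the information vector or matrix change. The dimensions, noise model, and Jacobian are illustrative, not the paper's models; the standard convention **H** = ∂**h**/∂**x** is used, so the paper's ∇_{x}**h** corresponds to **H**^{T}.

```python
import numpy as np

n, m = 12, 2                            # 4 poses x 3 states; 2-D observation
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n))
Y = A @ A.T + n * np.eye(n)             # prior information matrix (toy)
yv = rng.standard_normal(n)             # prior information vector

# Hypothetical observation touching only pose 2 (states 6:9).
H = np.zeros((m, n))                    # H = dh/dx; the paper's grad_x h is H^T
H[:, 6:9] = rng.standard_normal((m, 3))
Rinv = np.eye(m)                        # inverse observation noise covariance
z = rng.standard_normal(m)

x_prior = np.linalg.solve(Y, yv)        # recovered prior state estimate
nu = z - H @ x_prior                    # innovation, (16)
i_vec = H.T @ Rinv @ (nu + H @ x_prior) # information vector increment, (19)
I_mat = H.T @ Rinv @ H                  # information matrix increment, (20)
y_post = yv + i_vec                     # (17)
Y_post = Y + I_mat                      # (18)

# Only entries belonging to the observed states are modified.
changed = np.argwhere(Y_post != Y)
assert changed.min() >= 6 and changed.max() < 9
```

Because **H** is zero outside the observed block, **H**^{T}**R**^{-1}**H** is nonzero only in that block's rows and columns, which is exactly why the update is constant time.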

The prediction operation equations (8) and (12) require the prior vehicle pose state estimate, while in the update operation, (16) and (19) require the prior estimates of the observed states. In addition, the prediction operations require the vehicle process model to be linearized at the prior vehicle state estimate, and the update operation requires the observation model to be linearized at the estimate of the observed states. Once the necessary state estimates have been recovered, prediction (with or without augmentation) and observations are constant-time operations independent of the number of estimated poses.
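The marginalization underlying prediction without augmentation, (10) and (11), is a Schur complement on the information form. The following minimal numpy check of that identity uses a hypothetical two-block Gaussian (not the VAN equations themselves): marginalizing in information form must agree with dropping states in moment form.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
Y = M @ M.T + 5 * np.eye(5)          # joint information matrix (toy)
y = rng.standard_normal(5)           # joint information vector
t, v = slice(0, 3), slice(3, 5)      # t = kept states, v = states to remove

# Marginalize v in information form: Schur complement of the v-block.
Yvv_inv = np.linalg.inv(Y[v, v])
Y_marg = Y[t, t] - Y[t, v] @ Yvv_inv @ Y[v, t]
y_marg = y[t] - Y[t, v] @ Yvv_inv @ y[v]

# Reference: convert to moment form, drop v, convert back.
P = np.linalg.inv(Y)
x = P @ y
Y_ref = np.linalg.inv(P[t, t])
y_ref = Y_ref @ x[t]
assert np.allclose(Y_marg, Y_ref) and np.allclose(y_marg, y_ref)
```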

### C. Structure of the Information Matrix

Elements of the VAN information matrix off the block diagonal are nonzero only if an observation relating the two corresponding poses has been applied to the filter. Fig. 1(a) shows an example of the information matrix sparsity pattern and Markov graph that result from dead reckoning (DR). Since each pose is related to the previous and next poses through odometry constraints, DR results in a block tridiagonal matrix. The Markov graph provides a visual representation of the relationships between the estimated variables, with an edge in the graph corresponding to a nonzero block in the information matrix.

When a loop-closure observation is applied to the filter, additional nonzero elements are created in the information matrix at the locations corresponding to the two observed poses. Fig. 1(c) displays the information matrix resulting from adding loop-closure observations between the first and last two poses.

The sparsity of the information matrix is important for the computational efficiency and storage requirements of the filter. In large-scale applications with many augmented poses, EKF-based approaches are infeasible due to the memory requirements of dense covariance matrices.
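The structure described above can be reproduced with a toy script: each relative constraint contributes an additive **J**^{T}**J**-style term, as in (20), so only the block pairs it relates become nonzero. The pose count, state dimension, and unit-information Jacobians below are illustrative assumptions.

```python
import numpy as np

n_poses, d = 6, 2                      # illustrative: 6 poses, 2 states each
N = n_poses * d
Y = np.zeros((N, N))

def add_constraint(Y, i, j):
    """Add a relative constraint between poses i and j (unit-information toy Jacobian)."""
    J = np.zeros((d, N))
    J[:, d * i:d * i + d] = -np.eye(d)
    J[:, d * j:d * j + d] = np.eye(d)
    return Y + J.T @ J                 # additive information update, cf. (20)

for k in range(n_poses - 1):           # odometry chain -> block tridiagonal
    Y = add_constraint(Y, k, k + 1)
Y = add_constraint(Y, 0, n_poses - 1)  # loop closure -> extra corner blocks

block_nnz = {(i, j) for i in range(n_poses) for j in range(n_poses)
             if np.any(Y[d * i:d * i + d, d * j:d * j + d])}
assert (0, n_poses - 1) in block_nnz   # loop-closure off-diagonal block
assert (0, 2) not in block_nnz         # unrelated poses stay exactly zero
```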

## IV. Cholesky Factorization and Modifications

The Cholesky factorization is commonly used to solve linear systems of the form
$${\bf A}{\bf X} = {\bf B}\eqno{\hbox{(21)}}$$

where **A** is a positive definite symmetric matrix and **X** is a matrix of unknowns.

In this SLAM application, a Cholesky factor of the information matrix will be used to recover state estimates and covariances. Relationships for the state estimate vector and covariance matrix in the form of (21) can be produced by rearranging (3) and (4) to obtain
$$\eqalignno{{{\bf Y}^{+}\left(t_{k}\right) }{\hat{{\bf x}}^{+}\left(t_{k}\right) } &={{{\hat{{\bf y}}}}^{+}\left(t_{k}\right) }&\hbox{(22)} \cr{{\bf Y}^{+}\left(t_{k}\right) }{{\bf P}^{+}\left(t_{k}\right)} &={\bf I}.&\hbox{(23)}}$$

The *LDL*^{T} form of the Cholesky decomposition of the matrix **A** is defined by
$${\bf A} = {\bf L} {\bf D} {{\bf L}^{{\ssr T}}}\eqno{\hbox{(24)}}$$

where **L** is a lower triangular matrix with all elements on the diagonal equal to one, and **D** is a diagonal matrix.

The solution to a system of equations in the form of (21) is calculated from the Cholesky factorization using a two-step forward and backward solve process. First, a forward solve step is performed on the lower triangular system
$${\bf L} {\bf Z} = {\bf B}\eqno{\hbox{(25)}}$$

to recover the rows of the forward-solve result **Z** in order from first to last. The solution **X** can then be recovered using a backward solve operation on the upper triangular system
$${\bf D} {{\bf L}^{{\ssr T}}} {\bf X} = {\bf Z}\eqno{\hbox{(26)}}$$

in which the rows of **X** are recovered in order from last to first.

The structure of the Cholesky factor **L** of a sparse matrix is related to the sparsity pattern of the original matrix **A**. Nonzero elements in the Cholesky factor are present at the locations of all nonzero elements in the original matrix; however, additional nonzeros known as “fill-in” are introduced. Fill-in is undesirable, since additional nonzero elements increase the computational complexity of the factorization and equation-solving processes.

Many algorithms to calculate the Cholesky decomposition exist [24]. The experiments presented in Section VIII use an efficient sparse up-looking algorithm [14]. In Fig. 2, the right-looking factorization algorithm is used to demonstrate the process of fill-in. During each iteration of the algorithm, the *j* th column of the factor is produced by dividing the *j* th column of the active submatrix by its element on the diagonal. The *j* th variable is then eliminated by marginalizing it from the remaining active submatrix. The fill-in produced in the Cholesky factor is equivalent to the additional edges produced by marginalizing a variable from the Markov graph, in which the neighbors of an eliminated node form a clique.

### A. Reducing Fill-in With Variable Reordering

Fill-in can be reduced by reordering the variables to change the sequence in which they are eliminated during the factorization process. Since finding the optimal permutation that produces minimal fill-in is NP-hard, heuristic-based approaches, such as the approximate minimum degree (AMD) algorithm, are typically used [24], [25].

In each iteration of a right-looking Cholesky factorization, the AMD algorithm employs the greedy strategy of selecting for elimination the variable corresponding to the graph node with the smallest degree (the number of neighbors) or, equivalently, the sparsest row of the remaining active submatrix to be factorized.

Fig. 2(b) illustrates the factorization process for the matrix previously decomposed in Fig. 2(a), using the variable ordering produced by AMD. The selected order in which the poses are eliminated is 5, 4, 1, 2, 3. The benefit of variable reordering can be observed by comparing the three blocks of fill-in produced with the natural ordering to the single block of fill-in with the AMD ordering.
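The effect of elimination order on fill-in can be reproduced symbolically. The sketch below uses a greedy exact-minimum-degree selection (AMD itself works with approximate degrees) on a hypothetical five-pose graph: an odometry chain 1-2-3-4-5 with loop closures 1-4 and 1-5, chosen so that natural-order elimination produces three fill edges, echoing the three fill blocks discussed above.

```python
import itertools

def eliminate(adj, order):
    """Symbolically eliminate nodes in `order`; return the number of fill edges."""
    g = {v: set(nb) for v, nb in adj.items()}
    fill = 0
    for v in order:
        for a, b in itertools.combinations(sorted(g[v]), 2):
            if b not in g[a]:              # neighbors of v must form a clique
                g[a].add(b); g[b].add(a); fill += 1
        for u in g[v]:
            g[u].discard(v)
        del g[v]
    return fill

def min_degree_order(adj):
    """Greedy exact-minimum-degree ordering (ties broken by node id)."""
    g = {v: set(nb) for v, nb in adj.items()}
    order = []
    while g:
        v = min(g, key=lambda u: (len(g[u]), u))
        for a, b in itertools.combinations(sorted(g[v]), 2):
            g[a].add(b); g[b].add(a)       # track fill to keep degrees correct
        for u in g[v]:
            g[u].discard(v)
        del g[v]
        order.append(v)
    return order

# Hypothetical pose graph: odometry chain 1-2-3-4-5 plus loop closures 1-4, 1-5.
adj = {1: {2, 4, 5}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3, 5}, 5: {1, 4}}
natural = eliminate(adj, [1, 2, 3, 4, 5])
reordered = eliminate(adj, min_degree_order(adj))
assert reordered < natural             # fewer fill edges after reordering
```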

### B. Scalability of the Factorization Process

The computational complexity of the Cholesky factorization for a dense *n*× *n* matrix is *O*(*n*^{3}). For sparse matrices, however, the complexity is dependent on the number of nonzeros in the Cholesky factor, which is influenced by the structure of the matrix being factorized and the variable ordering.

If the number of nonzeros in the Cholesky factor grows linearly with the number of estimated poses, as is the case for a DR VAN information matrix with a tridiagonal structure or a VAN system with a constant number of loop closures, the complexity of the Cholesky decomposition process is *O*(*n*). However, in general, where the Cholesky factor contains *O*(*n*^{2}) nonzeros, as can be expected in SLAM applications where the number of loop-closure observations grows linearly with the number of poses, the complexity of the factorization process is *O*(*n*^{3}).

### C. Modifying a Factor

If a previously factorized system of equations is changed, it is often possible to efficiently modify an existing factor instead of repeating the computationally expensive factorization process.

The complex equations and algorithms used to compute modified components of a sparse Cholesky factorization will not be presented here. Instead, the focus will be on illustrating which components of the factorization change, and the resulting complexity of the operation. Further details on the Cholesky modification algorithms can be found in [26] and [27], and the implementation used in the experiments presented in the paper is described in [14].

Four Cholesky factor modification operations are used: row additions, row deletions, updates, and downdates. The row addition and deletion operations allow the introduction of a new variable or removal of an existing variable from the system of linear equations. A two-step process of row deletion and addition can be used to perform an arbitrary change to a row of the factorized matrix.

Update and downdate operations allow a special case modification to the factorized system of equations. A modification of the form
$${\bar{{\bf A}}} = {\bf A} + {\bf W}{\bf W}^{{\ssr T}}, \quad {\bar{{\bf B}}} = {\bf B} + {{\bm \Delta}_{\bf B}}\eqno{\hbox{(27)}}$$

where **W** is an *n* × *k* matrix, is known as a rank-*k* update, while a modification of the form
$${\bar{\bf A}}= {\bf A} - {\bf W}{\bf W}^{{\ssr T}}, \quad {\bar{\bf B}} = {\bf B} + {\bm \Delta}_{\bf B}\eqno{\hbox{(28)}}$$

is a rank-*k* downdate.

Equations (27) and (28) include a change to the right-hand-side matrix **B** of the system of linear equations to enable the forward solve result **Z** to be modified in addition to the Cholesky factor.

Fig. 3 illustrates an update modification performed on the system of equations previously factorized in Fig. 2(b). For each of the modification operations, if an element in row *j* of the factorized matrix is modified (added, removed, updated, or downdated), the elements of the factor that are changed are limited to the columns *j* to *n.* Considering the Cholesky factorization process in Fig. 2, this is a logical result, since these columns of the Cholesky factor were previously produced after the modified variable was marginalized from the active submatrix.
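A dense rank-one update illustrating (27) can be written with the classical recurrence of Gill, Golub, Murray, and Saunders: **L** and **D** are modified so that the new factors reproduce **A** + **ww**^{T}. This is only a sketch of the arithmetic; sparse implementations such as [14] additionally restrict work to the affected columns, as discussed above.

```python
import numpy as np

def ldlt_update(L, d, w):
    """Rank-1 update: return factors of L diag(d) L^T + w w^T (dense sketch)."""
    L, d, w = L.copy(), d.copy(), w.copy()
    alpha = 1.0
    for j in range(len(d)):
        p = w[j]
        dj = d[j] + alpha * p * p          # new diagonal entry
        beta = alpha * p / dj
        alpha = d[j] * alpha / dj
        w[j + 1:] -= p * L[j + 1:, j]      # fold column j out of w
        L[j + 1:, j] += beta * w[j + 1:]   # correct the remaining column
        d[j] = dj
    return L, d

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)
C = np.linalg.cholesky(A)                  # A = C C^T
s = np.diag(C)
L0, d0 = C / s, s ** 2                     # convert to L D L^T form
w = rng.standard_normal(5)
L1, d1 = ldlt_update(L0, d0, w)
assert np.allclose(L1 @ np.diag(d1) @ L1.T, A + np.outer(w, w))
```

The loop only touches column *j* and the entries below it, consistent with the observation that modifying row *j* affects columns *j* to *n* of the factor.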

### D. Maintaining a Factor of the VAN Information Matrix

The information-form VAN filter operations of Section III-B can all be described using the row addition, row deletion, update, and downdate modifications.

The prediction with augmentation equations (8) and (9) can be implemented with row additions for the new pose variables, and an update on the previous pose states. Similarly, the prediction without augmentation equations (10) and (11) can be implemented with row removal and row addition operations to perform the changes to the current vehicle pose states, and a downdate on the previous pose states. The observation update equations (17) and (18) can simply be implemented with a single update modification.

Modifications are used to maintain an up-to-date factor after prediction and vehicle state observation operations. However, when a loop-closure observation is applied between past poses, the structure of the information matrix is significantly changed, causing the previous variable ordering to be ineffective in minimizing fill-in. Therefore, when a loop-closure observation is applied to the filter, a new variable ordering is found, and a new factor of the information matrix is calculated.

### E. Variable Ordering for Efficient VAN Operations

Prediction operations and observations of the current vehicle states are the most frequent procedures in a SLAM algorithm, with the number of loop-closure observations being relatively small. After considering the pattern of modified factor elements in Fig. 3, it is clear that ordering the vehicle states last will minimize the complexity of maintaining a factor of the VAN information matrix.

If the current vehicle states are ordered last, the number of elements in the factor that need to be recalculated is independent of the number of augmented poses, allowing the Cholesky factorization modifications for the prediction and vehicle state observation operations to be performed in constant time.

While ordering the vehicle states last may not result in the minimal amount of fill-in, the benefit of constant-time prediction and observation operations outweighs the additional computational cost of the extra fill-in caused by this constraint.

## V. State Estimate Recovery

### A. Complete State Recovery

The complete state estimate vector can be recovered by solving the relationship
$${{\bf Y}^{+}\left(t_{k}\right) }{\hat{{\bf x}}^{+}\left(t_{k}\right) }={{{\hat{{\bf y}}}}^{+}\left(t_{k}\right) }\eqno{\hbox{(29)}}$$

using the Cholesky factor of the information matrix and the process described in Section IV.

The efficiency of the forward and backward solve process used to solve (29) is dependent on the sparsity of the Cholesky factor. If the factor contains *O*(*n*) nonzero elements, as is the case for VAN systems with only odometry constraints or a constant number of loop-closure observations, the complete state estimate vector can be recovered in *O*(*n*) time. However, in general, where the Cholesky factor contains *O*(*n*^{2}) nonzeros, as can be expected in SLAM applications where the number of loop-closure observations grows linearly with the number of poses, the computational complexity of recovering the complete vector is *O*(*n*^{2}).

### B. Approximate Vehicle State Recovery

In a previous VAN implementation [9], [10], [11], approximate estimates of the current vehicle states were produced by partitioning the state vector into a “local” portion consisting of the states to be recovered, and the remaining “benign” states for which an approximate estimate is available. Using the subscript *l* for the local subvector and *b* for the benign states, the partitioned version of (29) is
$$\left[\matrix{{{\bf Y}^{+}_{{bb}}\left(t_{k}\right) } & {{\bf Y}^{+}_{{bl}}\left(t_{k}\right)} \cr{{\bf Y}^{{+{\ssr T}}}_{{bl}}\left(t_{k}\right) } & {{\bf Y}^{+}_{{ll}}\left(t_{k}\right)}}\right]\left[\matrix{{\hat{{\bf x}}_{{b}}^{+}\left(t_{k}\right) } \cr{\hat{{\bf x}}_{{l}}^{+}\left(t_{k}\right) }}\right]=\left[\matrix{{{{\hat{{\bf y}}}}^{+}_{{b}}\left(t_{k}\right) } \cr{{{\hat{{\bf y}}}}^{+}_{{l}}\left(t_{k}\right) }}\right].\eqno{\hbox{(30)}}$$

If the benign states have not changed significantly since they were last recovered, providing a good approximation (a tilde is used to denote approximate estimates), an approximate estimate of the local states can be calculated with
$${\tilde{{\bf x}}}_{{l}}\left(t_{k}\right) = \left[{{\bf Y}^{+}_{{ll}}\left(t_{k}\right) }\right]^{-1} \left({{{\hat{{\bf y}}}}^{+}_{{l}}\left(t_{k}\right) } - {{\bf Y}^{{+{\ssr T}}}_{{bl}}\left(t_{k}\right) } {\tilde{{\bf x}}}_{{b}}\left(t_{k}\right) \right).\eqno{\hbox{(31)}}$$

Only one block of **Y**^{+}_{bl}(*t*_{k}), corresponding to the previous-to-current pose cross-information submatrix, contains nonzero elements, allowing the approximate vehicle state estimate to be calculated in constant time.

The assumption underlying this approximation is that the past vehicle poses have not been significantly updated by observations applied to the filter since the estimates of the benign states were last recovered.

If an observation such as a loop closure or global positioning system (GPS) fix that provides a large correction to states with drifting estimates is applied to the filter, a significant correction will be propagated to the previous pose states. As a result, the accuracy of the approximation will be poor and the complete state vector including new estimates of the benign states would need to be recovered using the method of Section V-A.
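A numeric sketch of (31) on a hypothetical partitioned system confirms the behavior described above: a slightly stale benign estimate yields a correspondingly small error in the recovered local states. All dimensions and matrices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((8, 8))
Y = M @ M.T + 8 * np.eye(8)            # information matrix (toy)
yv = rng.standard_normal(8)
b, l = slice(0, 6), slice(6, 8)        # benign / local partition

x_exact = np.linalg.solve(Y, yv)
x_b_stale = x_exact[b] + 1e-6 * rng.standard_normal(6)  # slightly out of date

# (31): back-substitute the stale benign estimate into the local block.
x_l_approx = np.linalg.solve(Y[l, l], yv[l] - Y[l, b] @ x_b_stale)
assert np.allclose(x_l_approx, x_exact[l], atol=1e-4)
```

Were the benign perturbation large, as after a loop closure or GPS fix, the same formula would yield a poor local estimate, which is why a full recovery is then required.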

### C. Exact Vehicle State Recovery

In Section IV-C, it was shown that a Cholesky factor and the forward solve result can be efficiently modified to reflect changes to the original system of linear equations. The only remaining operation required to solve the modified system of linear equations is the backward solve of the upper-triangular system of (26), which has the form
$$\left[\matrix{{{\bf D}_{1}} & & \cr& \ddots & \cr& & {{\bf D}_{n}}}\right]\left[\matrix{{{\bf L}_{11}^{{\ssr T}}} & \ldots & {{\bf L}_{n1}^{{\ssr T}}} \cr& \ddots & \vdots \cr& & {{\bf L}_{nn}^{{\ssr T}}} \cr}\right]\left[\matrix{{{\bf X}_{1}} \cr\vdots \cr{{\bf X}_{n}} \cr}\right]=\left[\matrix{{\bf Z}_{1} \cr\vdots \cr{\bf Z}_{n} \cr}\right].\eqno{\hbox{(32)}}$$

The backward solve operation recovers the variables in reverse order from last to first row. The last block of the solution **X** can, therefore, be calculated by solving
$${{\bf D}_{{n}}}{{\bf L}_{nn}^{{\ssr T}}}{{\bf X}_{n}} = {\bf Z}_{n}.\eqno{\hbox{(33)}}$$

If the current vehicle pose variables are ordered last, and the forward substitution result is updated along with the Cholesky factor each time it is modified during a prediction or observation operation, this approach allows the current vehicle state estimates to be recovered in constant time. This is an important improvement over the method of Section V-B, since it allows prediction and observation operations to be performed without corrupting the filter with approximate estimates. As a result, the EIF will have the same optimality properties as an EKF solution.

## VI. Covariance Recovery

### A. Complete Inverse Recovery

Using the Cholesky decomposition of the information matrix, the complete covariance matrix can be recovered by solving the equation
$${{\bf Y}^{+}\left(t_{k}\right) }{{\bf P}^{+}\left(t_{k}\right)}={\bf I}.\eqno{\hbox{(34)}}$$

While an information matrix may be sparse, the corresponding covariance matrix is dense. Recovering the complete covariance matrix is only feasible for problems with small state vectors.

### B. Recovery of Columns of the Inverse

The *j* th column of the covariance matrix can be recovered by solving the equation
$${{\bf Y}^{+}\left(t_{k}\right) } {{\bf P}^{+}_{*j}\left(t_{k}\right)} = {\bf I}_{*j}\eqno{\hbox{(35)}}$$

where **P**^{+}_{*j}(*t*_{k}) is the *j* th column of the covariance matrix, and **I**_{*j} is the *j* th column of an identity matrix with the same dimensions as the information matrix.

If the Cholesky factor contains *O*(*n*) nonzero elements, which occur in VAN systems containing only odometry constraints or a constant number of loop closures, the computational complexity of recovering a column of the covariance matrix is *O*(*n*). However, in general, where the Cholesky factor contains *O*(*n*^{2}) nonzeros, the complexity of recovering a column is *O*(*n*^{2}).
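A minimal example of (35), written densely for clarity (a real system would reuse the sparse Cholesky factor and the solve process of Section IV):

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((6, 6))
Y = M @ M.T + 6 * np.eye(6)        # information matrix (toy)
j = 3
e_j = np.eye(6)[:, j]              # j-th column of the identity, cf. (35)
P_col = np.linalg.solve(Y, e_j)    # j-th column of the covariance matrix
assert np.allclose(P_col, np.linalg.inv(Y)[:, j])
```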

### C. Recovery of the Sparse Inverse

Recovering the joint pose distributions used for loop-closure hypothesis generation requires the covariances of the augmented poses, which are located on the block diagonal of the covariance matrix. The covariance recovery method of Section VI-B is inefficient for this task, since many irrelevant elements of the inverse are calculated.

An alternative recovery method [28], [29], [30], [13] can be derived from the Takahashi relationship
$${\bf A}^{-1} = ({\bf L}^{\rm T})^{-1} {\bf D}^{-1} - {\bf A}^{-1}({\bf L}-{\bf I}).\eqno{\hbox{(36)}}$$

If (36) is used to calculate the lower triangle of the inverse, the upper triangular component (**L**^{T})^{−1}, which contains ones on its diagonal, can be ignored. Individual elements of the lower triangle of the inverse can, therefore, be calculated using the recursive relationship

$$[{\bf A}^{-1}]_{ij} = [{\bf D}^{-1}]_{ij} - \sum_{k=j+1}^{n} [{\bf A}^{-1}]_{ik} {\bf L}_{kj}, \quad \hbox{for}\ i \ge j.\eqno{\hbox{(37)}}$$

In (37), an element of the inverse in column *j* is described in terms of other elements of the inverse in columns *j* to *n*, along with the Cholesky factorization components **L** and **D**. If the matrices **A** and **L** are sparse, not all elements of the inverse need to be recovered.

The set of elements of the inverse at the locations of nonzeros in the Cholesky factor is known as the “sparse inverse,” which is illustrated in Fig. 4. All elements of the sparse inverse can be calculated using only other members of the sparse inverse and the factorization components [29]. When applied to the factorization of a VAN information matrix, the sparse inverse includes the block diagonal, providing a method to recover the augmented pose covariances.

If the Cholesky factor contains *O*(*n*) nonzero elements, as occurs in VAN systems containing only odometry constraints or a constant number of loop closures, the sparse inverse can be recovered in *O*(*n*) time. In general, however, where the factor contains *O*(*n*^{2}) nonzeros, the complexity of recovering the sparse inverse is *O*(*n*^{3}).
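A minimal dense sketch of recursion (37), checked against a direct inverse, is given below. Here every element of the inverse is computed for verification; a sparse implementation would visit only the nonzero pattern of the factor:

```python
import numpy as np

# Hypothetical small SPD matrix A, factored as A = L D L^T.
rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)

C = np.linalg.cholesky(A)
d = np.diag(C) ** 2          # diagonal of D
L = C / np.diag(C)           # unit lower triangular factor

n = A.shape[0]
Ainv = np.zeros((n, n))
# Recursion (37): columns from last to first, rows from the bottom up.
# [D^{-1}]_{ij} is nonzero only on the diagonal; symmetry fills the
# upper triangle so later columns are available when needed.
for j in range(n - 1, -1, -1):
    for i in range(n - 1, j - 1, -1):
        val = (1.0 / d[i]) if i == j else 0.0
        val -= Ainv[i, j + 1:] @ L[j + 1:, j]
        Ainv[i, j] = Ainv[j, i] = val

assert np.allclose(Ainv, np.linalg.inv(A))
```

The evaluation order matters: each element (i, j) depends only on elements in columns to its right, which is what allows a sparse factor to restrict the computation to the sparse inverse.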

## VII. Generating Loop-Closure Hypotheses

Since visual feature extraction and association is computationally expensive, generating a small set of loop-closure hypotheses on which image analysis will be performed is critical for the efficiency of the VAN algorithm. Deciding if a pair of poses is accepted as a loop-closure hypothesis is performed by evaluating their joint distributions to estimate the likelihood that images acquired at each pose overlap.

Due to the computational complexity of recovering covariances from an information filter, a previous VAN implementation [9], [10], [11] used covariances recovered at previous timesteps to generate loop-closure hypotheses. Since the uncertainty of augmented past poses can only decrease, the use of old covariances is a conservative strategy. The filter is not corrupted, since no approximate values are used in any prediction or observation operation. However, the use of conservative covariances may increase the number of loop-closure hypotheses generated.

The conservative pose covariances can be used to create an approximation of the predicted joint distribution covariance of the form
$$\tilde{{\bf P}}_{(i,v)}\left(t_{k}\right) =\left[\matrix{\tilde{{\bf P}}_{ii}\left(t_{k}\right) & {\bf P}^{-}_{iv}\left(t_{k}\right) \cr {\bf P}^{-{\rm T}}_{iv}\left(t_{k}\right) & {\bf P}^{-}_{vv}\left(t_{k}\right)}\right]\eqno{\hbox{(38)}}$$

where $\tilde{{\bf P}}_{ii}(t_{k})$ is the conservative covariance of pose *i*, and **P**^{−}_{iv}(*t*_{k}) and **P**^{−}_{vv}(*t*_{k}) are the optimal past-to-current cross covariance and current pose covariance, which can be recovered from the vehicle columns of the covariance matrix using the method of Section VI-B.

To maintain the set of conservative past pose covariances, the current vehicle pose covariance is appended to the set each time a new pose is augmented to the state vector. When a loop-closure observation that significantly changes the past pose distributions is applied to the filter, the approximate covariances are updated.

In previous VAN applications [9], [10], [11], each time a loop-closure observation is applied to the filter, an EKF update is performed on the approximate joint distribution covariance to yield an updated covariance for the past pose. Since all of the estimated poses are correlated, a loop-closure observation will reduce the uncertainty of all trajectory states. This approach, however, only reduces the uncertainty in one of the maintained pose covariances, leaving the others highly conservative.

The sparse inverse recovery method of Section VI-C provides an alternative method to efficiently update all the augmented pose covariances. While this operation is more computationally complex than the single-pose EKF update, the reduction in the conservative pose uncertainties will cause fewer loop-closure hypotheses to be analyzed, and is likely to result in an overall improvement in efficiency.

The SLAM algorithm described in this paper has been applied to data acquired by the AUV *Sirius*, a modified version of the *SeaBED* AUV [31] developed at the Woods Hole Oceanographic Institution. DR is performed using a Doppler velocity log (DVL) that provides the velocity of the vehicle in three axes relative to the seafloor, a compass and a tilt sensor that observe the vehicle's orientation, and a pressure sensor to measure depth. A stereovision rig is used to provide loop-closure observations. Due to the accuracy of the DVL over short distances, the vision system is not used to provide odometry information.

In this application, loop-closure hypotheses are created using the simplified visibility model illustrated in Fig. 5, which is designed to be conservative and computationally efficient. In this model, the terrain is assumed to be planar, a conservative circular bound is used to approximate the stereo rig's field of view, and the vehicle is assumed to have zero roll and pitch (a reasonable approximation for the stable *Sirius* vehicle). Under these assumptions, image overlap occurs if the magnitude of the 2-D stereo rig displacement [*x*_{ij},*y*_{ij}] between poses *i* and *j* is less than the sum of the circular image footprint radii *r*_{i} and *r*_{j}. A distribution for the 2-D displacement is created from the conservative joint pose covariance in (38). The likelihood of image overlap is calculated by integrating the 2-D displacement distribution over the circular region defined by this overlap condition. In this experiment, an approximate integration is performed by sampling the 2-D displacement distribution on a 20 × 20 cell grid as demonstrated in Fig. 6, and pose pairs with an overlap likelihood greater than 0.005 are accepted as loop-closure hypotheses.
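Assuming a Gaussian 2-D displacement, the grid-based approximation can be sketched as follows (`overlap_likelihood` is a hypothetical helper; the 20 × 20 grid matches the experiment, while the means and covariances in the usage example are illustrative):

```python
import numpy as np

def overlap_likelihood(mu, Sigma, r_sum, ngrid=20):
    """Approximate P(||d|| < r_sum) for displacement d ~ N(mu, Sigma)
    by summing the Gaussian density over an ngrid x ngrid grid that
    covers the disc of radius r_sum = r_i + r_j."""
    xs = np.linspace(-r_sum, r_sum, ngrid)
    X, Y = np.meshgrid(xs, xs)
    pts = np.stack([X.ravel(), Y.ravel()], axis=1)
    inside = (pts ** 2).sum(axis=1) < r_sum ** 2   # cells in the disc
    diff = pts - np.asarray(mu, dtype=float)
    Sinv = np.linalg.inv(Sigma)
    dens = np.exp(-0.5 * np.einsum('ni,ij,nj->n', diff, Sinv, diff))
    dens /= 2.0 * np.pi * np.sqrt(np.linalg.det(Sigma))
    cell = (xs[1] - xs[0]) ** 2                    # cell area
    return float(np.sum(dens[inside]) * cell)

# Pose pairs whose footprints almost certainly overlap vs. clearly do not.
p_near = overlap_likelihood([0.0, 0.0], 0.25 * np.eye(2), r_sum=2.0)
p_far = overlap_likelihood([50.0, 0.0], 0.25 * np.eye(2), r_sum=2.0)
assert p_near > 0.9 and p_far < 0.005
```

Pairs whose computed likelihood exceeds the acceptance threshold would then be passed to the vision pipeline for verification.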

Loop-closure observations are created using a six degree-of-freedom stereovision relative pose estimation algorithm [32]. The SURF algorithm [20] is used to extract and associate visual features, and epipolar geometry [33] is used to reject inconsistent feature observations within each stereo image pair. Triangulation [33] is performed to calculate initial estimates of the feature positions relative to the stereo rig, and a redescending *M*-estimator [34], [35] is used to calculate a relative pose hypothesis that minimizes a robustified registration error cost function. Any remaining outliers with observations inconsistent with the motion hypothesis are then rejected. Finally, the maximum likelihood relative vehicle pose estimate and covariance are then calculated from the remaining inlier features. An example set of stereo image pairs and the visual features used to produce a loop-closure observation are presented in Fig. 7.

In a deployment to survey sea sponges in the Ningaloo Marine Park near Exmouth in Western Australia, the AUV traversed a grid pattern within a square region of 150 m × 150 m, collecting 2156 pairs of stereo images. The ocean depth at the survey site is approximately 40 m, and the AUV maintained an altitude of 2 m above the seafloor. The vehicle trajectory is approximately 2.2 km in length, and required approximately 75 min to complete.

A comparison of the estimated trajectories produced by DR and SLAM is shown in Fig. 8. A total of 111 loop-closure observations were applied to the SLAM filter, shown by the red lines joining observed poses. Applying the loop-closure observations results in a trajectory estimate that suggests the vehicle drifted approximately 30 m southwest of the desired survey area.

While no ground truth for the survey is available, arguments for the superiority of the SLAM solution can be created by considering the consistency of the final vehicle position estimates with GPS observations acquired after the vehicle surfaced at the end of the mission, and the self-consistency of each estimated trajectory.

Estimates of the final vehicle position at the end of the mission produced by DR, SLAM, and GPS are listed in Table I. The difference between the SLAM estimate and GPS is approximately half that of the DR solution. It is likely that a large portion of the error in the SLAM solution was accumulated in the descent to the seafloor and ascent to the surface, since during these times, no visual observations are available to correct drifting estimates.

The superior self-consistency of the SLAM solution can be observed in mosaics of images acquired at trajectory crossover points. Fig. 9 presents mosaics for the crossover points marked “A” and “B” within the DR and SLAM trajectory estimates in Fig. 8. The mosaic of the DR crossover point in Fig. 9(a) is inconsistent, since images hypothesized to overlap contain no common features. In contrast, the mosaic of Fig. 9(b), produced using vehicle pose estimates from SLAM, displays accurately registered overlapping images, demonstrating the correction of DR drift.

Table II lists the processing times for the SLAM algorithm with and without the use of Cholesky modifications, and using the approximate and exact vehicle state recovery methods. The exact vehicle state recovery process is slightly more efficient due to the complexity of the matrix inverse operation required by the approximate method. Using Cholesky factor modifications provides a significant advantage, since many computationally expensive factorization operations (worst case *O*(*n*^{3})) are replaced by constant-time modifications. When applied to larger datasets, the difference between modifying and recalculating the factor will be greater.
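As an illustration of factor modification, the classical rank-one Cholesky update below adjusts an existing factor for **A** + **vv**^{T} without refactorizing (a generic textbook routine, not necessarily the authors' exact modification scheme; the cost is *O*(*n*^{2}) for a dense factor, and proportional to the affected entries for a sparse one):

```python
import numpy as np

def chol_update(L, v):
    """Rank-one update of a lower-triangular Cholesky factor:
    given L with A = L @ L.T, modify L in place so that
    L @ L.T = A + v v^T, avoiding a full refactorization."""
    v = np.asarray(v, dtype=float).copy()
    n = v.size
    for k in range(n):
        r = np.hypot(L[k, k], v[k])        # updated diagonal entry
        c, s = r / L[k, k], v[k] / L[k, k]
        L[k, k] = r
        if k + 1 < n:
            L[k + 1:, k] = (L[k + 1:, k] + s * v[k + 1:]) / c
            v[k + 1:] = c * v[k + 1:] - s * L[k + 1:, k]
    return L

# Verify against refactorizing from scratch (hypothetical small matrix).
rng = np.random.default_rng(3)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)
v = rng.standard_normal(6)

L = np.linalg.cholesky(A)
chol_update(L, v)
assert np.allclose(L @ L.T, A + np.outer(v, v))
```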

The processing times listed in Table II do not include the time required to produce the loop-closure observations. In the current implementation, all vision processing is performed on the same CPU as the SLAM filter. For the Ningaloo dataset, the vision processing required an additional 4 min and 1 s of processing time. In the future, the computationally expensive image analysis operations, such as feature extraction, may be performed on a separate device such as a graphics processing unit.

In this application, exact vehicle state recovery provides little benefit in accuracy over the approximate method. The DVL, orientation, and depth sensors provide high frequency and accurate observations, resulting in only small corrections to the past vehicle pose states. If DR is performed using each vehicle state recovery method, the maximum difference in the vehicle position estimates for the Ningaloo survey is 9 cm. The benefits of the exact vehicle state recovery method may be greater in other applications, where observations provide larger corrections to the past pose estimates.

The superiority of the sparse inverse method to update the past pose conservative covariances is demonstrated in Fig. 10, where the trace of the covariances is used as a measurement of their uncertainty. For comparison, optimal (nonconservative) values were produced by recovering the true pose covariances at each timestep. For a survey pattern with few crossover points, applying a single-pose EKF update after each loop closure provides little benefit. The strategy of updating the conservative poses using the sparse inverse method after each loop closure produces near-optimal results. The numbers of loop-closure hypotheses and observations produced when using each conservative pose update strategy are listed in Table III. As expected, the sparse inverse method results in a significant reduction in the number of generated loop-closure hypotheses.

The final state vector for the Ningaloo Marine Park experiment contains 25 884 variables from 2157 poses. Each vehicle pose contains 12 states: three for position, three for orientation, three for velocity, and three for angular velocity.

The final information matrix is 99.86% sparse, and its lower triangle contains 482 706 nonzero elements. Most of the nonzero elements result from odometry constraints; however, each of the 111 loop-closure observations results in a block of nonzeros below the block tridiagonal.

If the natural variable ordering is used, the Cholesky factor of the final information matrix contains 5 165 838 nonzero elements. The AMD variable ordering produces a factor with 804 222 nonzeros (approximately one-sixth of the number produced by the natural ordering), resulting in significant computational efficiency advantages when performing state estimate and covariance recovery.

The growth in the number of nonzeros in the Cholesky factor for the Ningaloo experiment is shown in Fig. 11. In general, the number of nonzeros for SLAM is *O*(*n*^{2}) in the number of poses; however, due to the sparse set of crossover points in the Ningaloo experiment, the number of nonzeros caused by new poses and odometry constraints (which grow linearly) outnumbers those from loop-closure observations. As a result, in this case, the growth in the number of nonzeros is not much worse than linear.

The most computationally expensive operations in the SLAM algorithm are loop-closure observations. The processing times for each component of the loop-closure observations in the Ningaloo experiment are shown in Fig. 12. Updating the information matrix is an *O*(*n*) operation in this implementation due to the use of a compressed row storage format, which requires *O*(*n*) values to be shifted when a new nonzero element is inserted. Recalculating the Cholesky factor and recovering the sparse inverse to update the conservative pose covariances are the most time-consuming components. While the computational complexity of these operations is, in general, *O*(*n*^{3}), the growth of their processing times in this experiment is not much worse than linear due to the near-linear growth in the number of nonzeros in the Cholesky factor.

A 3-D reconstruction of the survey site has been produced by triangulating features in the stereo images, and registering the point clouds in a common reference frame using the SLAM-estimated vehicle trajectory. The source images have then been projected onto the resulting mesh, which can be observed in Fig. 13.

A SLAM algorithm using the VAN framework was presented and demonstrated using data acquired by the *Sirius* AUV at Ningaloo Marine Park in Western Australia.

The use of Cholesky factorization modifications to update a decomposition of the information matrix prevents the need to repeatedly perform the computationally expensive factorization process each time state estimates and covariances are recovered.

Through the selection of an appropriate variable ordering, recovery of the vehicle state estimates can be performed in constant time, allowing prediction and vehicle state observation operations to be performed without corrupting the filter with approximate vehicle state estimates.

Updating the conservative covariances of all past poses using the sparse inverse recovery method results in the generation of significantly fewer loop-closure hypotheses than the previously used single-pose update method.

Currently, all processing is performed offline on logged data. While the worst-case computational complexity of some filter operations is *O*(*n*^{3}) in the number of augmented poses, the results for a typical underwater survey with a sparse set of loop-closure events suggest that an online implementation is feasible for our application.

### Acknowledgment

The authors thank the Australian Institute of Marine Science (AIMS) for providing ship time aboard the R/V Cape Ferguson. In particular, they wish to thank A. Heyward, M. Rees, J. Collquhoun, and the crew of the Cape Ferguson for providing this opportunity and lending a hand whenever necessary. They also acknowledge the help of all those working behind the scenes to keep our AUV operational, including P. Rigby, J. Randle, the late A. Trinder, and B. Crundwell.