• Abstract

# A Kalman Filter-Based Algorithm for IMU-Camera Calibration: Observability Analysis and Performance Evaluation

Vision-aided inertial navigation systems (V-INSs) canprovide precise state estimates for the 3-D motion of a vehicle when no external references (e.g., GPS) are available. This is achieved bycombining inertial measurements from an inertial measurement unit (IMU) with visual observations from a camera under the assumption that the rigid transformation between the two sensors is known. Errors in the IMU-camera extrinsic calibration process cause biases that reduce the estimation accuracy and can even lead to divergence of any estimator processing the measurements from both sensors. In this paper, we present an extended Kalman filter for precisely determining the unknown transformation between a camera and an IMU. Contrary to previous approaches, we explicitly account for the time correlation of the IMU measurements and provide a figure of merit (covariance) for the estimated transformation. The proposed method does not require any special hardware (such as spin table or 3-D laser scanner) except a calibration target. Furthermore, we employ the observability rank criterion based on Lie derivatives and prove that the nonlinear system describing the IMU-camera calibration process is observable. Simulation and experimental results are presented that validate the proposed method and quantify its accuracy.

SECTION I

## Introduction

IN RECENT years, inertial navigation systems (INSs) have been widely used for estimating the motion of vehicles moving in a 3-D space such as airplanes, helicopters, automobiles, etc. [2]. At the core of most INS lies an inertial measurement unit (IMU) that measures linear accelerations and rotational velocities. By integrating these signals in real time, an INS is capable of tracking the position, velocity, and attitude of a vehicle. This deadreckoning process, however, cannot be used over extended periods of time because the errors in the computed estimates continuously increase. This is due to the noise and biases present in the inertial measurements. For this reason, current INSs rely on GPS in order to receive periodic corrections. In most cases, a Kalman filter estimator is used for optimally combining the IMU and GPS measurements [3]. One of the main limitations of the GPS-aided INS configuration is that it cannot be used when the GPS signals are not available (e.g., indoors, underground, underwater, in space, etc.), or their reliability is limited (e.g., in the vicinity of tall buildings and structures due to specular reflections and multipath error). Furthermore, high-precision GPS receivers are prohibitively expensive, and often, the acquired level of accuracy is not sufficient for certain applications (e.g., parallel parking a car within a tight space).

An alternative approach to provide corrections to an INS is via the use of visual sensors such as cameras. Cameras are small-size, light-weight, passive sensors that provide rich information for the surroundings of a vehicle at low cost. When observing a known scene, both the position and attitude of the camera can be computed [4]. Furthermore, by tracking visual features through sequences of images, the motion of the camera can be estimated [5], [6]. Cameras and IMUs are complementary in terms of accuracy and frequency response. An IMU is ideal for tracking the state of a vehicle over short periods of time when it undergoes motions with high dynamic profile. On the other hand, a camera is best suited for state estimation over longer periods of time and smoother motion profiles. Combining these two sensors to form a vision-aided INS (V-INS) has recently become a popular topic of research [7].

In order to fuse measurements from an IMU and a camera in a V-INS, the 6-DOF transformation between these two devices must be known precisely (cf. Fig. 1). Inaccuracies in the values of the IMU-camera relative pose (position and attitude) will appear as biases that will reduce the accuracy of the estimation process or even cause the estimator to diverge. In most cases, in practice, this unknown transformation is computed manually (e.g., from computer-aided design (CAD) plots) or through the use of additional sensors. For example, for the Mars Exploration Rover (MER) mission [8], a high precision 3-D laser scanner was employed to measure the location of the four corners of the IMU housing with respect to a checkerboard placed in front of the camera for calibration purposes. Although this method achieved subdegree relative attitude accuracy and less than 1 cm relative position error [9], it is prohibitive for many applications due to the high cost of the equipment (3-D laser scanner) involved. Additionally, every time one of the two sensors is removed (e.g., for service) and repositioned, the same process needs to be repeated, which requires significant time and effort. Automating this procedure will reduce the cost of deploying a V-INS, increase the accuracy of the computed state estimates during regular operation, and minimize the probability of failure due to bias-induced divergence.

In this paper, we present an extended Kalman filter (EKF)-based algorithm for determining the 6-DOF transformation between a single camera and an IMU using measurements only from these two sensors [1]. Contrary to existing approaches [10], [11] that rely on modified hand–eye calibration processes (e.g., [12], [13], [14]), our method takes into account the time correlation of the IMU measurements by explicitly modeling them using an augmented-state EKF [15], [16]. Additionally, our algorithm computes the uncertainty in the estimated quantities, or equivalently, the covariance. Furthermore, we do not separate the task of translation estimation from rotation estimation that prevents potential error propagation. Moreover, unlike existing approaches, the described method does not require any special testbed except a calibration pattern that is also needed for estimating the intrinsic parameters of the camera. Therefore, it offers the inherent capability of recalibrating the IMU-camera system as frequently as needed. Finally, a comprehensive observability analysis based on Lie derivatives [17], [18] is performed to ensure that the sensor measurements provide sufficient information for accurately estimating the IMU-camera transformation.

Fig. 1. Geometric relation between the known landmarks fi and the camera {C}, IMU {I}, and global {G} frames of reference. The unknown IMU-camera transformation is denoted by the position and quaternion pair . This transformation is determined using estimates of the IMU motion, , the projections of the landmarks' positions, Cpfi, on the camera frame (image observations), and the known positions of the landmarks, Gpfi, expressed in the global frame of reference.

The rest of this paper is structured as follows. Section II provides an overview of the related literature. Section III presents the proposed EKF-based algorithm, and Section IV investigates the observability of the nonlinear system describing the IMU-camera calibration process. Simulation and experimental results are provided in Section V, and finally, Section VI concludes the paper and suggests future research directions.

SECTION II

## Related Work

A well-known related process is the hand–eye calibration [12], whose objective is to estimate the 6-DOF transformation between a camera and a robot manipulator. Recently, there have been some attempts to modify existing hand–eye calibration algorithms to determine the IMU-camera alignment [10], [11]. Specifically, the rotation part of the hand–eye equation is solved in [11] using nonlinear optimization software under the assumption that the translation between the IMU and the camera is negligible. However, in most realistic situations, this assumption is not valid, and ignoring the translation introduces biases in estimation algorithms using these alignment parameters.

A different approach to this problem is proposed by Lobo and Dias [10], [19]. First, they obtain the vertical direction of the IMU and the camera frames by measuring the direction of gravity while viewing a vertically installed calibration pattern. Then, using Horn's method [20], they estimate the rotation between the IMU and the camera. Finally, they use a spin table to rotate the system about the IMU's center and zero out the linear acceleration of the IMU due to rotation. This process allows one to compute the translation between the camera and the IMU based only on the camera measurements. The main drawback of this approach is that it ignores the time correlation between the inertial measurements due to the IMU biases. Additionally, it does not provide any figure of merit of the achieved level of accuracy (e.g., covariance of the estimated quantities). Furthermore, this two-stage process decouples the computation of rotation and translation, and hence, allows error propagation from the rotation estimates to the translation estimates. Finally, this method requires fine adjustment of the IMU-camera system on a spin table that limits its applicability when recalibration is frequently needed.

The IMU-camera and hand–eye calibration problems require separate treatments due to the different noise characteristics of the IMU and shaft-encoder signals. Specifically, while the shaft-encoder measurements at different time instants are uncorrelated, consecutive IMU measurements are not. This is due to the IMU biases. Ignoring the time correlation of the inertial measurements limits the accuracy of the IMU-camera calibration process and can lead to inconsistent estimates.

To the best of our knowledge, the presented EKF-based algorithm is the first approach to the IMU-camera calibration problem that does not ignore the correlations between the IMU measurements and requires no specialized hardware. Furthermore, the uncertainty in the estimated alignment parameters is provided at every time step by computing their covariance. Finally, it is shown that it suffices to rotate the camera in place in order for these parameters to become observable.

SECTION III

## Description of the Algorithm

The IMU-camera calibration is achieved through a two-step process. First, camera images are processed in a batch algorithm to compute an initial estimate for the camera pose. Additionally, the approximate value of the unknown transformation (e.g., hand-measured or from CAD plots) is combined with the camera-pose estimate to compute an initial estimate for the IMU pose (Section III-A). In the next step, both these estimates are used to initialize the corresponding variables in the EKF estimator. By sequentially processing additional measurements from the camera and the IMU, the EKF is able to refine the initial estimate for the unknown transformation while simultaneously tracking the position, velocity, and attitude of the two sensors (Section III-BIII-E).

### A. Filter Initialization

The purpose of this process is to determine the initial estimate for the IMU pose , where GpI denotes the position of the IMU with respect to the global frame of reference and is the rotation quaternion between the IMU and the global frames.

We first compute an estimate for the camera pose using visual features (corners of the squares in the calibration pattern) whose positions, Gpfi, are known in global coordinates. Specifically, the initial estimates of the depth to these features are computed using Ansar's method [4], while the initial estimate for the camera pose is determined by employing Horn's method [20]. Finally, a least-squares algorithm refines the camera-pose estimate and additionally computes its covariance [21].

In the next step of the initialization process, we use an approximate estimate for the unknown IMU-camera transformation . This was determined manually in our case but it can also be found using the CAD plots showing the IMU-camera placement. We should note that the requirement for an approximate estimate for the initial IMU-camera transformation is not limiting since it can also be determined by employing any hand–eye calibration algorithm. An initial estimate for the IMU pose is then computed from the following relations (cf. Fig. 1): TeX Source \eqalignno{{}^{ {G}}{\bf p}_{ {I}}&= {}^{ {G}}{\bf p}_{ {C}}- {{\bf C}}^{ {T}}({}^{ {C}}\bar{q}_{ {G}}) {{\bf C}}^{ {T}}({}^{ {I}}\bar{q}_{ {C}}) {}^{ {I}}{\bf p}_{ {C}}\cr{}^{ {I}}\bar{q}_{ {G}}&= {}^{ {I}}\bar{q}_{ {C}}\otimes {}^{ {C}}\bar{q}_{ {G}}&\hbox{(1)}}where is the rotational matrix corresponding to quaternion and ⊗ denotes quaternion multiplication. Finally, after computing the corresponding Jacobians [by linearizing (1)] and considering the uncertainty (covariance) in the estimates of and , the covariance of the initial IMU pose estimate is readily found.

### B. Filter Propagation

The EKF estimates the IMU pose and linear velocity as well as the unknown transformation (rotation and translation) between the camera and the IMU. Additionally, the filter estimates the biases in the IMU signals.

#### Continuous-Time System Model

We first derive the linearized continuous-time system model that describes the time evolution of the errors in the state estimates. Discretization of this model will allow us to employ the sampled measurements of the IMU for state propagation. The filter state is described by the vector TeX Source $${\bf x} = \left[^{{I}}\bar{q}_{ {G}}^{{ {T}}}\quad {{\bf b}_g^{{ {T}}}}\quad{^{{G}} {\bf v}_{ {I}}^{{ {T}}}}\quad {{\bf b}_a^{{ {T}}}}\quad {^{{G}}{\bf p}_{{I}}^{{ {T}}}}\quad {^{{I}}\bar{q}_{ {C}}^{{ {T}}}}\quad {^{ {I}}{\bf p}_{ {C}}^{{{T}}}}\right]^{{ {T}}}\eqno{\hbox{(2)}}$$where and are the quaternions that represent the orientation of the global frame and the camera frame in the IMU frame, respectively. The position and velocity of the IMU in the global frame are denoted by GpI(t) and GvI(t), respectively. IpC(t) is the position of the camera in the IMU frame, and bg and ba are the 3 × 1 bias vectors affecting the gyroscope and accelerometer measurements, respectively. These biases are typically present in the signals of inertial sensors, and need to be modeled and estimated, in order to attain accurate state estimates. In our study, the IMU biases are modeled as random walk processes driven by the zero-mean white Gaussian noise vectors nwg and nwa, respectively.

The system model describing the time evolution of the IMU state and of the IMU-camera transformation is given by the following equations [22], [23]: TeX Source \eqalignno{{{ }^{{ {I}}}\dot{\bar{q}}_{{ {G}}}}(t) &= { 1\over 2} {\bm \Omega} ({\bm \omega} (t)){ }^{{ {I}}}{\bar{q}}_{{ {G}}}(t)&\hbox{(3)}\cr {{}^{ {G}}\dot{{\bf p}}_{ {I}}}(t) &= {}^{ {G}}{\bf v}_{ {I}}(t) \qquad {}^{ {G}}{\dot{{\bf v}}_{ {I}}}(t) = {}^{ {G}}{\bf a} (t) &\hbox{(4)}\cr {\dot{{\bf b}}_g} (t)&= {\bf n}_{wg}(t) \qquad {\dot{{\bf b}}_a} (t) = {\bf n}_{wa} (t)&\hbox{(5)}\cr {{ }^{{ {I}}}\dot{\bar{q}}_{{ {C}}}}(t) &= {\bf 0}_{3\times 1} \qquad {{}^{ {I}}\dot{{\bf p}}_{ {C}}}(t) = {\bf 0}_{3\times 1}.&\hbox{(6)}}In these expressions, ω(t) = [ωx ωy ωz]T is the rotational velocity of the IMU, expressed in the IMU frame, and TeX Source $${\bm \omega} ({\bm \omega}) = \left[\matrix{ -\lfloor \omega \, \times \rfloor & \omega \cr -\omega ^{ {T}} & 0 }\right] \quad \lfloor \omega \, \times \rfloor =\left[\matrix{ 0 &-\omega _z & \omega _y \cr \omega _z & 0 & -\omega _x \cr -\omega _y & \omega _x & 0 }\right]$$Finally, Ga is the acceleration of the IMU, expressed in the global frame.

The gyroscope and accelerometer measurements, respectively, which are employed for state propagation, are modeled as TeX Source \eqalignno{{\bm \omega} _m(t) &= {\bm \omega} (t) + {\bf b}_g(t) + {\bf n}_g(t)&\hbox{(7)}\cr {\bf a}_m(t) &= {\bf C}({ }^{{ {I}}}{\bar{q}}_{{ {G}}}(t)) ({}^{ {G}}{\bf a}(t) - {}^{ {G}}{\bf g}) + {\bf b}_a(t) + {\bf n}_a(t)&\hbox{(8)}}where ng and na are zero-mean white Gaussian noise processes and Gg is the gravitational acceleration.

By applying the expectation operator on both sides of (3)(6), we obtain the state estimates' propagation equations: TeX Source \eqalignno{{{ }^{{ {I}}}\dot{\hat{\bar{q}}}_{{ {G}}}}(t) &= { 1\over 2} {\bm \Omega} (\hat{\bm \omega}(t)) { }^{{ {I}}}\hat{\bar{q}}_{{ {G}}}(t)&\hbox{(9)}\cr {}^{ {G}}\dot{\hat{{\bf p}}}_{ {I}}(t) &= {}^{ {G}}\hat{{\bf v}}_{ {I}}(t) \qquad {}^{ {G}}\dot{\hat{{\bf v}}}_{ {I}}(t) = {\bf C}^{ {T}}({ }^{{ {I}}}\hat{\bar{q}}_{{ {G}}}(t))\hat{{\bf a}}(t) + {}^{ {G}}{\bf g}\qquad&\hbox{(10)}\cr \dot{\hat{{\bf b}}}_g(t) &= {\bf 0}_{3\times 1} \qquad \dot{\hat{{\bf b}}}_a (t) = {\bf 0}_{3\times 1}&\hbox{(11)}\cr {{ }^{{ {I}}}\dot{\hat{\bar{q}}}_{{ {C}}}}(t) &= {\bf 0}_{3\times 1} \qquad {}^{ {I}}\dot{\hat{{\bf p}}}_{ {C}}(t) = {\bf 0}_{3\times 1}&\hbox{(12)}}with TeX Source $$\hat{{\bf a}}(t) = {\bf a}_m(t) -\hat{{\bf b}}_a(t)\ \hbox{and}\\hat{{\bm \omega} }(t) = {\bm \omega} _m (t) - \hat{{\bf b}}_g (t).\eqno{\hbox{(13)}}$$

The 21 × 1 filter error-state vector is defined as TeX Source $$\widetilde{{\bf x}} {=} \left[\matrix{ {^{ {I}}{\bm \delta \theta} _{ {G}}^{{{T}}}}\quad {\widetilde{{\bf b}}_g^{{ {T}}}}\quad {^{ {G}}\widetilde{{\bf v}}^{{ {T}}}_{{ {I}}}}\quad {\widetilde{{\bf b}}_a^{{ {T}}} }\quad {^{{G}}\widetilde{{\bf p}}_{ {I}}^{{ {T}}}}\quad {^{ {I}}{\bm \delta \theta} _{{C}}^{{ {T}}}}\quad {^{ {I}}\widetilde{{\bf p}}_{{C}}^T}}\right]^{{ {T}}}\eqno{\hbox{(14)}}$$For the IMU and camera positions, and the IMU velocity and biases, the standard additive error definition is used (i.e., the error in the estimate of a quantity x is ). However, for the quaternions, a different error definition is employed. In particular, if is is the estimated value of the quaternion , then the attitude error is described by the error quaternion: TeX Source $$\delta {\bar{q}}= {\bar{q}}\otimes { }^{}\hat{\bar{q}}_{}^{-1}\simeq \left[\matrix{ { 1\over 2} {{\bm\delta}{\bm\theta} }^{{ {T}}} &1 }\right]^{{ {T}}} .\eqno{\hbox{(15)}}$$Intuitively, the quaternion describes the (small) rotation that causes the true and estimated attitude to coincide. The main advantage of this error definition is that it allows us to represent the attitude uncertainty by the 3 × 3 covariance matrix E {δ θ δ θT}. Since the attitude corresponds to 3 DOF, this is a minimal representation.

The linearized continuous-time error-state equation is TeX Source $$\dot{ \widetilde{{\bf x}}} = {\bf F}_c \widetilde{{\bf x}} + {\bf G}_c {\bf n}\eqno{\hbox{(16)}}$$where TeX Source \eqalignno{{\bf F}_c &{=} {\left[\matrix{ -\lfloor \hat{{\bm \omega} }\, \times \rfloor & -{\bf I}_{3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 9} \cr {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 9} \cr -{{\bf C}}^{ {T}}({ }^{{ {I}}}\hat{\bar{q}}_{{ {G}}})\lfloor \hat{{\bf a}}\, \times \rfloor & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & -{{\bf C}}^{ {T}}({ }^{{ {I}}}\hat{\bar{q}}_{{ {G}}}) & {\bf 0}_{3\times 9}\cr {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 9}\cr {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf I}_{3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 9}\cr {\bf 0}_{6\times 3} & {\bf 0}_{6\times 3} & {\bf 0}_{6\times 3} & {\bf 0}_{6\times 3} & {\bf 0}_{6\times 9} }\right]} \cr {\bf G}_c &{=} { \left[\matrix{ -{\bf I}_{3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} \cr {\bf 0}_{3\times 3} & {\bf I}_{3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3}\cr {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & -{{\bf C}}^{ {T}}({ }^{{ {I}}}\hat{\bar{q}}_{{ {G}}}) & {\bf 0}_{3\times 3}\cr {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf I}_{3}\cr {\bf 0}_{3\times 3} & {\bf I}_{3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3}\cr {\bf 0}_{6\times 3} & {\bf 0}_{6\times 3} & {\bf 0}_{6\times 3} & {\bf 0}_{6\times 3} }\right]} \qquad {\bf n} = \left[\matrix{{\bf n}_g \cr {\bf n}_{wg} \cr {\bf n}_a \cr {\bf n}_{wa} }\right]}and I3 is the 3 × 3 identity matrix. The covariance Qc of the system noise depends on the IMU noise characteristics and is computed offline according to [24] and [25].

#### Discrete-Time Implementation

The IMU signals ωm and am are sampled at 100 Hz (i.e., T = 0.01 s). Every time a new IMU measurement is received, the state estimate is propagated using the fourth-order Runge–Kutta numerical integration of (9)(12). In order to derive the covariance propagation equation, we evaluate the discrete-time state transition matrix TeX Source $${\bm\Phi} _{k} = {\bm\Phi} (t_{k}+T, t_k) = {\rm exp } \left(\int _{t_k}^{t_{k}+T} {\bf F}_c(\tau) \rm d\tau \right)\eqno{\hbox{(17)}}$$and the discrete-time system noise covariance matrix TeX Source $${\bf Q}_d = \int _{t_k}^{t_k+T} {\bm\Phi} (t_{k+1},\tau) {\bf G}_c {\bf Q}_c {\bf G}_c^T \Phi ^T (t_{k+1},\tau) {\rm d} \tau .\eqno{\hbox{(18)}}$$The propagated covariance is then computed as TeX Source $${\bf P}_{k+1\vert k} = {\bm\Phi}_{k} {\bf P}_{k\vert k} \Phi_{k}^{{ {T}}} +{\bf Q}_d.$$

### C. Measurement Model

The IMU camera moves continuously and records images of a calibration pattern. These are then processed to detect and identify point features whose positions Gpfi are known with respect to the global frame of reference (centered and aligned with the checker-board pattern of the calibration target). Once this process is completed for each image, a list of point features along with their measured image coordinates (ui, vi) is provided to the EKF, which uses them to update the state estimates.

The projective camera measurement model employed is TeX Source $${\bf z}_i = \left[\matrix{ u_i \cr v_i }\right]+ {\bm\eta}_i = \left[\matrix{ {x_i}/{z_i} \cr {y_i}/{z_i} }\right] + {\bm\eta}_i = {\bf h}_i ({\bf x},\;^{{G}}{\!\bf p}_{f_i}) + {\bm\eta}_i\eqno{\hbox{(19)}}$$where (cf. Fig. 1) TeX Source $${}^{ {C}}{\bf p}_{f_i}\! =\!\left[\matrix{ x_i\cr y_i\cr z_i }\right]\! =\! {{\bf C}}({}^{{ {C}}}{\bar{q}}_{{ {I}}}) {{\bf C}}({}^{{ {I}}}{\bar{q}}_{{ {G}}}) \left({}^{ {G}}{\bf p}_{f_i}\! -\! {}^{ {G}}{\bf p}_{ {I}}\right) - {{\bf C}}({}^{{ {C}}}{\bar{q}}_{{ {I}}}) {}^{ {I}}{\bf p}_{ {C}}$$and ηi is the feature-measurement noise with covariance R}i = σi2I2.

The measurement Jacobian matrix Hi is TeX Source $${\bf H}_i = {{\bf J}_{\rm cam}^i \left[\matrix{ {\bf J}_{{\bf \theta }_{ {G}}}^i & {\bf 0}_{3\times 9} & {\bf J}_{{\bf p}_{ {I}}}^i & {\bf J}_{{\bf \theta }_{ {C}}}^i & {\bf J}_{{\bf p}_c}^i}\right]}\eqno{\hbox{(20)}}$$with TeX Source \eqalignno{{\bf J}_{\rm cam}^i &= { 1\over \hat{z}_i^2}\left[\matrix{ {\hat{z}_i} & 0 & -{\hat{x}_i}\cr 0 & {\hat{z}_i} & -{\hat{y}_i} }\right]&\hbox{(21)}\cr{\bf J}_{{\bf \theta }_{ {G}}}^i &= {{\bf C}}({ }^{{ {C}}}\hat{\bar{q}}_{{ {I}}}) \lfloor {{\bf C}}({ }^{{ {I}}}\hat{\bar{q}}_{{ {G}}})(^{ {G}}{\bf p}_{f_i}-{ }^{{ {G}}}\hat{ {\bf p}}_{{ {I}}})\, \times \rfloor \cr{\bf J}_{{\bf \theta }_{ {C}}}^i &= -{{\bf C}}({ }^{{{C}}}\hat{\bar{q}}_{{ {I}}})\lfloor {{\bf C}}({ }^{{{I}}}\hat{\bar{q}}_{{ {G}}}) (^{ {G}}{\bf p}_{f_i}-{ }^{{ {G}}}\hat{{\bf p}}_{{ {I}}}) - { }^{{ {I}}}\hat{ {\bf p}}_{{ {C}}}\, \times\rfloor \cr{\bf J}_{{\bf p}_I}^i &= -{{\bf C}}({ }^{{ {C}}}\hat{\bar{q}}_{{ {I}}}) {{\bf C}}({ }^{{ {I}}}\hat{\bar{q}}_{{ {G}}}) \qquad {\bf J}_{{\bf p}_c}^i = - {{\bf C}}({ }^{{ {C}}}\hat{\bar{q}}_{{ {I}}}) \cr\left[\matrix{ \hat{x}_i\cr \hat{y}_i\cr \hat{z}_i }\right] &= {{\bf C}}({}^{{ {C}}}\hat{\bar{q}}_{{ {I}}}) {{\bf C}}({}^{{{I}}}\hat{\bar{q}}_{{ {G}}}) \left({}^{ {G}}{\bf p}_{f_i} - {}^{ {G}}\hat{\bf p}_{ {I}}\right) - {{\bf C}}({}^{{ {C}}}\hat{\bar{q}}_{{ {I}}}) {}^{ {I}}\hat{\bf p}_{ {C}} .}

When observations to N features are available concurrently, we stack these in one measurement vector z = [z1T⋅⋅⋅zNT]T to form a single batch-form update equation. Similarly, the batch measurement Jacobian matrix is defined as H = [H1T⋅⋅⋅HNT]T. Finally, the measurement residual is computed as TeX Source $${\bf r}\; {\buildrel \triangle \over =}\; {\bf z} - {\bf \hat{z}} \simeq {\bf H} \widetilde{{\bf x}} + {\bm\eta}\eqno{\hbox{(22)}}$$where η = [η1T…;ηNT]T is the measurement noise with covariance R = Diag(Ri), i = 1,…,N.

### D. Iterated EKF Update

In order to increase the accuracy and numerical stability in the face of the highly nonlinear measurement model, we employ the iterated EKF [26], [27] to update the state. The iterative scheme proceeds as follows.

At each iteration step j, the following steps are done.

1. Compute as a function of the current j th iterate using the measurement function (19).

2. Evaluate the measurement Jacobian matrix Hj [cf. (20)] using the current iterate .

3. Form the residual , and compute its covariance Sj = HjPk+1| kHjT+R.

4. Using the Kalman gain Kj = Pk+1| kHjT(Sj)−1, compute the correction TeX Source $$\Delta {\bf x}^j ={\bf K}^j({\bf r}^j+{\bf H}^j\!\Delta {\bf x}^{j-1})\eqno{\hbox{(23)}}$$with Δx0 = 021 × 1, necessary for determining the next iterate of the updated state estimate .

The iteration begins using the propagated state estimate as the zeroth iterate, which makes the first iteration equivalent to a regular EKF update. This process is repeated till the reduction in the cost function TeX Source $$J^j = {{\widetilde{{\bf x}}}}^{j^T} {\bf P}_{k+1\vert k}^{-1} {\widetilde{{\bf x}}}^{j} + {\bf r}^{j^{T}} {\bf R}^{-1} {\bf r}^j\eqno{\hbox{(24)}}$$with falls below the threshold τ = max (0.01,0. 001 × Jj−1), or when a maximum number of iterations is reached [28]. Finally, the covariance matrix for the current state is updated using the values for K and S from the last iteration: TeX Source $${\bf P}_{k+1\vert k+1} = {\bf P}_{k+1\vert k} - {\bf K} {\bf S} {\bf K}^T.\eqno{\hbox{(25)}}$$

### E. Outlier Rejection

Before using the detected features in the measurement update, we employ a Mahalanobis-distance test to detect and reject mismatches or very noisy observations. Every time a new measurement becomes available, we compute the following Mahalanobis distance: TeX Source $$\chi ^2 = \left({\bf z}_{i_{k}} - \hat{{\bf z}}_{i_{k}} \right)^T{\bf S}_i^{-1} \left({\bf z}_{i_{k}} - \hat{{\bf z}}_{i_{k}} \right).\eqno{\hbox{(26)}}$$In this equation, zik is the measurement of the i th landmark at time step k, is the expected measurement of the same landmark based on the latest state estimate, and Si = HiPk+1| kHi+Ri is the covariance of the corresponding measurement residual. A probabilistic threshold on χ2 is used to specify whether the measurement is reliable or not. Measurements that pass this test are processed by the iterative update procedure as described before.

SECTION IV

## Observability Analysis

A system is observable if its state at a certain time instant can be uniquely determined given a finite sequence of its outputs [29]. Intuitively, this means that the measurements of an observable system provide sufficient information for estimating its state. In contrast, the state vector of unobservable systems cannot be recovered regardless of the duration of the estimation process. In dynamic systems that contain a number of constant but unknown quantities in their state vector (e.g. , the IMU-camera 6-DOF transformation in our case), system observability results in an additional interesting property: given sufficient number of measurements, the unknown constants can be estimated with arbitrarily small uncertainty [30]. For this reason, it is important to study the observability of the system describing the IMU-camera calibration process.

The observability of linear time-invariant systems can be investigated by employing any of the well-known observability tests such as the rank of the Gramian matrix [15] or the Popov–Belevitch–Hautus (PBH) test [31]. However, due to the time-invariance requirement, the PBH test cannot be applied to (linearized) nonlinear systems since these are usually time varying (i.e., the current state used as the linearization point changes dynamically). On the other hand, application of the Gramian matrix criterion, which does not assume time invariance, to linearized systems in 2-D [32] often becomes intractable when extended to 3-D.

An alternative tool for studying the observability properties of nonlinear systems is the observability rank condition based on Lie derivatives [17]. Bonnifait and Garcia [33] were the first to employ this method for examining the observability of map-based bearing-only single-robot localization in 2-D. Later on, Martinelli and Siegwart [34] used Lie derivatives to analyze the observability of cooperative localization for pairs of mobile robots navigating in 2-D. In a related problem, Mariottini et al. [35] investigated the observability of 2-D leader-follower formations based on Lie derivatives and the observability rank condition. Recently, Lie derivatives were also used for examining the observability of the single-robot simultaneous localization and mapping (SLAM) in 2-D [36], and the camera-odometry extrinsic calibration process in 2-D [37]. However, to the best of our knowledge, there exists no similar analysis for sensors or robots navigating in 3-D. Furthermore, the observability of 3-D systems with additional unknown parameters, such as the 6-DOF IMU-camera transformation, has never been considered.

In this paper for the first time, we study the observability of the nonlinear system describing the IMU-camera calibration process. Specifically, after a brief review of the Lie derivatives and the observability rank condition (Section IV-A), in Section IV-B, we prove that the IMU-camera calibration system is locally weakly observable when at least two rotations about different axes are performed.

### A. Nonlinear Observability

Consider the state-space representation of the following infinitely smooth nonlinear system: TeX Source $$\cases{ \dot{{\bf x}} = {\bf f}({\bf x}, {\bf u}) \cr {\bf y} = {\bf h}({\bf x}) }\eqno{\hbox{(27)}}$$where is the state vector, is the vector of control inputs, and is the measurement vector, with yk = hk(x), k = 1, …, m.

If the process function, f, is input-linear [38], it can be separated into a summation of independent functions, each one corresponding to a different component of the control input vector. In this case, (27) can be written as TeX Source $$\cases{ \dot{{\bf x}} = {\bf f}_0({\bf x}) + {\bf f}_1({\bf x}) u_1 + \cdots + {\bf f}_l({\bf x}) u_l\cr {\bf y} = {\bf h}({\bf x}) }\eqno{\hbox{(28)}}$$where f0 is the zero-input function of the process model.

The zeroth-order Lie derivative of any (scalar) function is the function itself, i.e., . The first-order Lie derivative of function hk(x) with respect to fi is defined as TeX Source \eqalignno{{{\L}}_{{\bf f}_i}^{1} h_k({\bf x}) &= { \partial h_k({\bf x})\over \partial x_1} f_{i1}({\bf x}) + \cdots + { \partial h_k({\bf x})\over \partial x_n} f_{in}({\bf x}) \cr&= \nabla h_k({\bf x}) \cdot {\bf f}_i({\bf x})&\hbox{(29)}}where fi(x) = [fi1(x),…, fin(x)]T, ∇ represents the gradient operator, and '·' denotes the vector inner product. Considering that is a scalar function itself, the second-order Lie derivative of hk(x) with respect to fi is TeX Source $${{\L}}_{{\bf f}_i}^{2} h_k({\bf x}) = {{\L}}_{{\bf f}_i}^{1} \left({{\L}}_{{\bf f}_i}^{1} h_k({\bf x})\right) = \nabla {{\L}}_{{\bf f}_i}^{1} h_k({\bf x}) \cdot {\bf f}_i({\bf x}).\eqno{\hbox{(30)}}$$Higher order Lie derivatives are computed similarly. Additionally, it is possible to define mixed Lie derivatives, i.e., with respect to different functions of the process model. For example, the second-order Lie derivative of hk with respect to fj and fi, given its first derivative with respect to fi, is TeX Source $${{\L}}_{{\bf f}_j {\bf f}_i}^{2} h_k({\bf x}) = {{\L}}_{{\bf f}_j}^{1} \left({{\L}}_{{\bf f}_i}^{1} h_k({\bf x})\right) = \nabla {{\L}}_{{\bf f}_i}^{1} h_k({\bf x}) \cdot {\bf f}_j({\bf x}).\eqno{\hbox{(31)}}$$

Based on the preceding expressions for the Lie derivatives, the observability matrix is defined as the matrix with rows: TeX Source $${\cal O} {\buildrel \triangle \over =} \{{\nabla {{\L}}_{{\bf f}_i\cdots {\bf f}_j}^{\ell } h_k({\bf x}) \vert i, j=0,\ldots, l; k = 1,\ldots, m; \ell \in \hbox{IN} }\} .\eqno{\hbox{(32)}}$$

The important role of this matrix in the observability analysis of a nonlinear system is captured by the following proposition [17].

Proposition 1 (Observability rank condition): If the observability matrix [cf. (32)] of the nonlinear system defined in (27) is full rank, then the system is locally weakly observable.

Remark 1: Since the process and measurement functions [cf. (27)] are infinitely smooth, the observability matrix can have infinite number of rows. However, to prove that is full rank, it suffices to show that a subset of its rows are linearly independent.

In general, there exists no systematic method for selecting the suitable Lie derivatives and corresponding rows of when examining the observability of a system. Instead, this selection is performed by sequentially considering the directions of the state space along which the gradient of each of the candidate Lie derivatives provides information.

### B. Observability of IMU-Camera Calibration

In this section, we compute the observability matrix of the system describing the IMU-camera calibration process and prove that it is full rank.

First and in order to simplify the notation, we retain only few of the subscripts describing the variables in the system state vector [cf. (2)]: TeX Source $${{\bf x}(t)} = \left[\matrix{ \bar{q}_{ {I}}^{{ {T}}}\quad {\bf b}_g^{{ {T}}}\quad {\bf v}^{{ {T}}}\quad {\bf b}_a^{{ {T}}}\quad {\bf p}_{ {I}}^{{ {T}}}\quad \bar{q}_{ {C}}^{{ {T}}}\quad {\bf p}_{ {C}}^{{ {T}}} }\right]^{{ {T}}}.\eqno{\hbox{(33)}}$$

Then, we rearrange the nonlinear kinematic equations (3)(6) in a suitable format for computing the Lie derivatives: TeX Source $$\left[\!\matrix{\dot{\bar{q}}_{ {I}}\cr \dot{{\bf b}}_g\cr \dot{{\bf v}}\cr \dot{{\bf b}}_a\cr \dot{{\bf p}}_{ {I}}\cr \dot{\bar{q}}_{ {C}}\cr \dot{{\bf p}}_{ {C}}}\!\right] \!=\! \underbrace{ \left[\matrix{ -{ 1\over 2} {\bm \Xi} (\bar{q}_{ {I}}) {\bf b}_g \cr {\bf 0}_{3\times 1} \cr {\bf g}\! -\! {{\bf C}}^{ {T}}(\bar{q}_{ {I}}){\bf b}_a \cr {\bf 0}_{3\times 1} \cr {\bf v}\cr {\bf 0}_{3\times 1} \cr {\bf 0}_{3\times 1} }\right]}_{{\bf f}_0}\! +\! \underbrace{\left[\matrix{ { 1\over 2} {\bm \Xi} (\bar{q}_{ {I}}) \cr {\bf 0}_{3\times 3} \cr {\bf 0}_{3\times 3} \cr {\bf 0}_{3\times 3} \cr {\bf 0}_{3\times 3} \cr {\bf 0}_{3\times 3} \cr {\bf 0}_{3\times 3} }\right]}_{\underline{{\bf f}}_1} {{\bm \omega} _m }\! +\! \underbrace{\left[\matrix{ {\bf 0}_{3\times 3} \cr {\bf 0}_{3\times 3} \cr {{\bf C}}^{ {T}}(\bar{q}_{ {I}})\cr {\bf 0}_{3\times 3} \cr {\bf 0}_{3\times 3} \cr {\bf 0}_{3\times 3} \cr {\bf 0}_{3\times 3} }\right]}_{\underline{{\bf f}}_2}{\bf a}_m\eqno{\hbox{(34)}}$$where ωm and am are considered the control inputs, and TeX Source $${\bm \Xi} (\bar{q}) = \left[\matrix{ q_4 {\bf I}_{3\times 3} + \lfloor {\bf q}\, \times \rfloor \cr -{\bf q}^{ {T}}}\right] \hbox{with}\ \bar{q} =\left[\matrix{ {\bf q} \cr q_4 }\right].\eqno{\hbox{(35)}}$$Also, note that f0 is a 23 × 1 vector, while f1 and f2 are both compact representations of three vectors of dimension 23 × 1, i.e., TeX Source $$\underline{{\bf f}}_1 {{\bm \omega} _m } = {\bf f}_{11} \omega _{m1} + {\bf f}_{12} \omega _{m2} + {\bf f}_{13} \omega _{m3}$$where, for i = 1, …, 3, f1i denotes the i th column vector comprising f1 and ωmi is the i th scalar component of the rotational velocity vector.

A well-known result that we will use in the observability analysis of (34) is the following: When four or more 1 known features are detected in each calibration image processed by the filter, the camera pose is observable [39] and can be computed in closed-form [4]. Based on this fact, we replace the measurement equation [cf. (19)] with the following pair of inferred measurements of the camera pose expressed with respect to the global frame of reference: TeX Source \eqalignno{{}^{ {G}}\bar{q}_{ {C}}&= {\bm\xi}_1 ({\bf z}_1,{\bf z}_2,{\bf z}_3,{\bf z}_4) = {\bf h}_1^\ast ({\bf x}) = {\bf J} \bar{q}_{ {I}}\otimes \bar{q}_{ {C}}&\hbox{(36)}\cr {}^{ {G}}{\bf p}_{ {C}}&= {\bm\xi}_2 ({\bf z}_1,{\bf z}_2,{\bf z}_3,{\bf z}_4) = {\bf h}_2^\ast ({\bf x}) = {\bf p}_{ {I}}+ {{\bf C}}^{ {T}}(\bar{q}_{ {I}}) {\bf p}_{ {C}}&\hbox{(37)}}where is the rotational matrix corresponding to the quaternion , ⊗ denotes quaternion multiplication, and TeX Source $${\bf J} {\bar{q}}_{ {I}}= {\bar{q}}_{ {I}}^{-1}, {\bf J}\; {\buildrel \triangle \over =}\; \left[\matrix{ -{\bf I}_{3\times 3} & 0 \cr 0 & 1 }\right].\eqno{\hbox{(38)}}$$At this point, we should note that the functions ξ1 and ξ2 in (36) and (37) need not to be known explicitly. Instead, what is required for the observability analysis is their functional relation with the random variables, and pI, and the unknown parameters, and pC, appearing in the system's state vector.

Furthermore, we enforce the unit-quaternion constraints by employing the following additional measurement equations: TeX Source \eqalignno{ & h_3^\ast ({\bf x}) = \bar{q}_{ {I}}^{ {T}}\bar{q}_{ {I}}- 1 = 0&\hbox{(39)}\cr & h_4^\ast ({\bf x}) = \bar{q}_{ {C}}^{ {T}}\bar{q}_{ {C}}- 1 = 0.&\hbox{(40)}}

According to Remark 1, it suffices to show that a subset of the rows of the observability matrix [cf. (32)] are linearly independent. In the remaining of this section, we prove that the system described by (34) and (36)(40) is observable by computing among the candidate zeroth-, first-, and second-order Lie derivatives of h1∗, h2∗, and h3∗, the ones whose gradients ensure that is full rank.

1) Zeroth-Order Lie Derivatives (): By definition, the zeroth-order Lie derivative of a function is the function itself, i.e., TeX Source \eqalignno{&{{{\L}}}^0 {\bf h}_1^\ast = {\bf h}_1^\ast = \bar{q}_{{ {I}}}^{-1} \otimes \bar{q}_{{ {C}}}&\hbox{(41)}\cr&{{{\L}}}^0 {\bf h}_2^\ast = {\bf h}_2^\ast = {\bf p}_{{ {I}}} + {{\bf C}}^{T}(\bar{q}_{{ {I}}}){\bf p}_{{ {C}}}&\hbox{(42)}\cr&{{{\L}}}^0 h_3^\ast = h_3^\ast = \bar{q}_{ {I}}^{ {T}}\bar{q}_{ {I}}- 1.&\hbox{(43)}}Therefore, the gradients of the zeroth-order Lie derivatives are exactly the same as the Jacobians of the corresponding measurement functions: TeX Source \eqalignno{&\nabla {{{\L}}}^0 {\bf h}_1^\ast = \left[\matrix{ {\cal R}(\bar{q}_{{ {C}}}) {\bf J} & {\bf 0}_{4\times 12} & {\cal L}({\bf J} \bar{q}_{{ {I}}}) & {\bf 0}_{4\times 3} }\right]&\hbox{(44)}\cr &\nabla {{{\L}}}^0 {\bf h}_2^\ast = \left[\matrix{{\bm \Psi} (\bar{q}_{{ {I}}},{\bf p}_{{ {C}}}) & {\bf 0}_{3\times 9} & {\bf I}_{3\times 3} & {\bf 0}_{3\times 4} & {{\bf C}}^{ {T}}(\bar{q}_{{ {I}}}) }\right]\qquad&\hbox{(45)}\cr &\nabla {{{\L}}}^0 h_3^\ast = \left[\matrix{ 2{\bar{q}}_{ {I}}^{ {T}} & {\bf 0}_{1\times 19} }\right]&\hbox{(46)}}where, for a quaternion and a vector p, we define TeX Source \eqalignno{ & {{\cal L}({\bar{q}})\; {\buildrel \triangle \over =}\; \left[\matrix{ q_4 {\bf I}_{3\times 3} - \lfloor {\bf q}\, \times \rfloor & {\bf q} \cr -{\bf q}^{ {T}} & q_4 }\right]}&\hbox{(47)}\cr & {{\cal R}({\bar{q}}) \;{\buildrel \triangle \over =}\; \left[\matrix{ q_4 {\bf I}_{3\times 3} + \lfloor {\bf q}\, \times \rfloor & {\bf q} \cr -{\bf q}^{ {T}} & q_4 }\right]}&\hbox{(48)}}and TeX Source $$\bm\Psi ({\bar{q}},{\bf p})\ {\buildrel \triangle \over =}\ { \partial \over \partial {\bar{q}}} {{\bf C}}^{ {T}}({\bar{q}}) {\bf p}. \eqno{\hbox{(49)}}$$Also, note that for deriving (44), we have used the following identities [23]: TeX Source \eqalignno{\bar{q}_{{ {I}}}^{-1} \otimes \bar{q}_{{ {C}}} &= {\cal R} (\bar{q}_{{ {C}}}) \bar{q}_{{ {I}}}^{-1} = {\cal R} (\bar{q}_{{ {C}}}) {\bf J} \bar{q}_{{ {I}}}\cr &={\cal L} ({\bf J}\bar{q}_{{ {I}}}^{-1}) \bar{q}_{{ {C}}} = {\cal L}({\bf J} \bar{q}_{{ {I}}}) \bar{q}_{{ {C}}}} TeX Source $${\cal O} = \left[\matrix{ \nabla {{{\L}}}^{0} {\bf h}_1^\ast \cr \nabla{{{\L}}}^{0} {\bf h}_2^\ast \cr \nabla {{{\L}}}_{{\bf f}_0}^1 {\bf h}_1^\ast \cr \nabla {{{\L}}}_{{\bf f}_0}^1 {\bf h}_2^\ast \cr \nabla{{{\L}}}_{\underline{{\bf f}}_{1ij}}^1 {\bf h}_2^\ast \cr \nabla{{{\L}}}^0 h_3^\ast \cr \nabla {{{\L}}}_{{\bf f}_0}^2 {\bf h}_2^\ast}\right] = \left[\matrix{ {\cal R}(\bar{q}_{{ {C}}}) {\bf J} & {\bf0}_{4\times 3} & {\bf 0}_{4\times 3} & {\bf 0}_{4\times 3} & {\bf0}_{4\times 3} & {\cal L}({\bf J} \bar{q}_{{ {I}}}) & {\bf0}_{4\times 3} \cr {\bm\Psi} (\bar{q}_{{ {I}}},{\bf p}_{{ {C}}}) &{\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} &{\bf I}_{3\times 3} & {\bf 0}_{3\times 4} & {{\bf C}}^{{T}}(\bar{q}_{{ {I}}})\cr {\bf X}_1 & -{ 1\over 2} {\cal R}({\bar{q}}_{ {C}}) {\bf J} {\bm\Xi} ({\bar{q}}_{ {I}}) & {\bf0}_{4\times 3} & {\bf 0}_{4\times 3} & {\bf 0}_{4\times 3} & {\bf X}_2 & {\bf 0}_{4\times 3} \cr {\bf X}_3 & {\bf X}_4 & {\bf I}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf0}_{3\times 4} & {\bf X}_5\cr {\bm\Gamma} _{ij}({\bar{q}}_{ {I}},{\bf p}_{ {C}}) & {\bf 0}_{6\times 3} & {\bf 0}_{6\times 3} & {\bf0}_{6\times 3} & {\bf 0}_{6\times 3} & {\bf 0}_{6\times 4} & {\bm\Upsilon} _{ij} ({\bar{q}}_{ {I}})\cr 2{\bar{q}}_{ {I}}^{ {T}} & {\bf0}_{1\times 3} & {\bf 0}_{1\times 3} & {\bf 0}_{1\times 3} & {\bf0}_{1\times 3} & {\bf 0}_{1\times 4} & {\bf 0}_{1\times 3}\cr {\bf X}_6 & {\bf X}_7 & {\bf 0}_{3\times 3} & -{{\bf C}}^{{T}}({\bar{q}}_{ {I}}) & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 4} &{\bf X}_8 }\right]\eqno{\hbox{(50)}}$$

2) First-Order Lie Derivatives (): The first-order Lie derivatives of h1∗ and h2∗ with respect to f0 are computed as (50), shown at the bottom of the page TeX Source \eqalignno{&{{{\L}}}^1_{{\bf f}_0} {\bf h}_1^\ast = \nabla {{{\L}}}^{0} {\bf h}_1^\ast \cdot {\bf f}_0 = {\textstyle -{ 1\over 2} } {\cal R}({\bar{q}}_{ {C}}) {\bf J} {\bm\Xi} ({\bar{q}}_{ {I}}) {\bf b}_g&\hbox{(51)}\cr &{{{\L}}}^1_{{\bf f}_0} {\bf h}_2^\ast = \nabla {{{\L}}}^{0} {\bf h}_2^\ast \cdot {\bf f}_0 = {\textstyle -{ 1\over 2} } {\bm \Psi} ({\bar{q}}_{ {I}},{\bf p}_{ {C}}) {\bm\Xi} ({\bar{q}}_{ {I}}) {\bf b}_g + {\bf v}&\hbox{(52)}}while their gradients are given by TeX Source \eqalignno{&\nabla {{{\L}}}^1_{{\bf f}_0} {\bf h}_1^\ast = \left[\matrix{ {\bf X}_1 & -{ 1\over 2} {\cal R}({\bar{q}}_{ {C}}) {\bf J} {\bm\Xi} ({\bar{q}}_{ {I}}) & {\bf 0}_{4\times 9} & {\bf X}_2 & {\bf 0}_{4\times 3} }\right]\cr &\nabla {{{\L}}}^1_{{\bf f}_0} {\bf h}_2^\ast = \left[\matrix{ {\bf X}_3 & {\bf X}_4 & {\bf I}_{3\times 3} & {\bf 0}_{3\times 10} & {\bf X}_5 }\right].&\hbox{(53)}}In these last expressions, Xi, i = 1, …, 5, are matrices of appropriate dimensions (4 × 4 the first two, 3 × 4 the third one, and 3 × 3 the last two) which, regardless of their values, will be eliminated in the following derivations; hence, they need not be computed explicitly.

The next first-order Lie derivative of interest is that of h2∗ with respect to f1, i.e., . At this point, we remind the reader that f1 as defined in (34) is a compact representation of three column vectors. Similarly, we can also write the resulting Lie derivative in a compact form (i.e., a 3 × 3 matrix): TeX Source $${{{\L}}}^1_{\underline{{\bf f}}_1} {\bf h}_2^\ast = \nabla {{{\L}}}^{0} {\bf h}_2^\ast \cdot \underline{{\bf f}}_1 = { 1\over 2} {\bm\Psi} ({\bar{q}}_{ {I}},{\bf p}_{ {C}}) {\bm\Xi} ({\bar{q}}_{ {I}}).\eqno{\hbox{(54)}}$$The gradients of the three columns of stacked together give TeX Source $$\nabla {{{\L}}}^1_{\underline{{\bf f}}_{1}}{\bf h}_2^\ast = \left[\matrix{ {\bm \Gamma} ({\bar{q}}_{ {I}},{\bf p}_{ {C}}) & {\bf 0}_{9\times 16} & {\bm \Upsilon} ({\bar{q}}_{ {I}}) }\right]\eqno{\hbox{(55)}}$$where the matrices TeX Source $${{\bm\Gamma}} ({\bar{q}}_{ {I}},{\bf p}_{ {C}}) = \left[\matrix{{{\bm\Gamma}} _1 ({\bar{q}}_{ {I}},{\bf p}_{ {C}}) \cr{\bm\Gamma} _2 ({\bar{q}}_{ {I}},{\bf p}_{ {C}}) \cr{\bm\Gamma} _3 ({\bar{q}}_{ {I}},{\bf p}_{ {C}}) }\right] \qquad {\bm\Upsilon} ({\bar{q}}_{ {I}}) = \left[\matrix{ {\bm\Upsilon} _1({\bar{q}}_{ {I}}) \cr{\bm\Upsilon} _2({\bar{q}}_{ {I}}) \cr{\bm\Upsilon} _3({\bar{q}}_{ {I}}) }\right]\eqno{\hbox{(56)}}$$of dimensions 9 × 4 and 9 × 3, respectively, have block-row elements (for i = 1, …, 3) TeX Source \eqalignno{{\bm\Gamma} _i ({\bar{q}}_{ {I}},{\bf p}_{ {C}}) &= { \partial \over \partial {\bar{q}}_{ {I}}} \left[\left({{{\L}}}^1_{\underline{{\bf f}}_{1}}{\bf h}_2^\ast \right) {\bf e}_i \right] \cr{\bm\Upsilon} _i ({\bar{q}}_{ {I}}) &= { \partial \over \partial {\bf p}_{ {C}}} \left[\left({{{\L}}}^1_{\underline{{\bf f}}_{1}}{\bf h}_2^\ast \right) {\bf e}_i \right]}with e1 = [1 0 0]T, e2 = [0 1 0]T, and e3 = [0 0 1]T.

Note that inclusion of all the block-row elements of the gradient (55) in the observability matrix [cf. (50)] implies that all components of ωm are nonzero. However, as it will become evident later on, in order to prove observability, only two of the elements of ωm need to be nonzero. In such case, matrix will contain the block matrices TeX Source $${\bm\Gamma} _{ij} ({\bar{q}}_{ {I}},{\bf p}_{ {C}}) = \left[\matrix{ {\bm\Gamma} _i ({\bar{q}}_{ {I}},{\bf p}_{ {C}}) \cr {\bm\Gamma} _j ({\bar{q}}_{ {I}},{\bf p}_{ {C}}) }\right] \qquad {\bm\Upsilon} _{ij} ({\bar{q}}_{ {I}}) = \left[\matrix{ {\bm\Upsilon} _i({\bar{q}}_{ {I}}) \cr {\bm\Upsilon} _j({\bar{q}}_{ {I}}) }\right]\eqno{\hbox{(57)}}$$with i,j = 1, …, 3, ij, instead of and .

3) Second-Order Lie Derivative (): Finally, we compute the second-order Lie derivative of h2∗ with respect to f0: TeX Source \eqalignno{{{{\L}}}^2_{{\bf f}_0} {\bf h}_2^\ast &= {{{\L}}}^1_{{\bf f}_0}{{{\L}}}^1_{{\bf f}_0} {\bf h}_2^\ast = \nabla {{{\L}}}^1_{{\bf f}_0} {\bf h}_2^\ast \cdot {\bf f}_0 \cr &= -{ 1\over 2} {\bf X}_3 \, \bm\Xi ({\bar{q}}_{ {I}}) {\bf b}_g + {\bf g} - {{\bf C}}^{ {T}}({\bar{q}}_{ {I}}){\bf b}_a&\hbox{(58)}}and its gradient TeX Source $$\nabla {{{\L}}}^2_{{\bf f}_0} {\bf h}_2^\ast = \left[\matrix{ {\bf X}_6 & {\bf X}_7 & {\bf 0}_{3\times 3} & -{{\bf C}}^{ {T}}({\bar{q}}_{ {I}}) & {\bf 0}_{3\times 7} & {\bf X}_8}\right]\eqno{\hbox{(59)}}$$where the matrices X5, X6, and X7 (of dimensions 3 × 4 the first one and 3 × 3 the last two) will be eliminated in the ensuing derivations, and therefore, we do not need to compute them explicitly.

Stacking together all the previously computed gradients of the Lie derivatives, we form the observability matrix shown in (50).

In order to prove that the system described by (34) and (36)(40) is observable, we employ the result of Proposition 1 and show that matrix is full rank (i.e. , the state space of the system is spanned by the gradients of the Lie derivatives of the measurement functions [17], [18]). Before presenting the main result of this section (cf. Lemma 3), we first state the following two lemmas whose proofs are detailed in Appendix I.

Lemma 1: The matrix TeX Source $${\bf A}_{ij} = \left[\matrix{ {\bm\Gamma} _{ij} & {\bm\Upsilon} _{ij} \cr 2{\bar{q}}_{ {I}}^{ {T}} & {\bf 0}_{1\times 3} }\right]\eqno{\hbox{(60)}}$$formed by the fifth and sixth block-row elements of the first and last block-columns of the observability matrix [cf. (50)], with Γij and ϒij defined in (57), is full rank.

Corollary 1: The matrix described by (60) is full rank if the IMU-camera rig is rotated about at least two different axes.

Note that only two block rows of and [cf. (56)]—the ones corresponding to two nonzero components of ωm—are included in Aij [cf. (60)]. Therefore, the third component of ωm can be zero (i.e., no rotation around the corresponding axis) without affecting the observability properties of the system.

Lemma 2: For any unit-quaternions and , matrix is full rank.

Lemma 3: The observability matrix [cf. (50)] is full rank when the IMU-camera rig is rotated about at least two different axes.

Proof: Here, we provide a sketch of the proof based on block Gaussian elimination (for details please see [40]). We start by employing Lemma 1 and Corollary 1 to eliminate all the matrices in the first and last columns of . The next step is to eliminate X2 using , i.e., the (1,6) block element of in (50). Note that is unit quaternion and [cf. (38), (40), and (47)]. Finally, since is full rank (cf. Lemma 2), it can be used to eliminate X4 and X7. Following these steps, reduces to TeX Source $${\left[\matrix{ {\bf 0}_{4\times 4} & {\bf 0}_{4\times 3} & {\bf 0}_{4\times 3} & {\bf 0}_{4\times 3} & {\bf 0}_{4\times 3} & {\bf I}_{4\times 4} & {\bf 0}_{4\times 3} \cr {\bf 0}_{3\times 4} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf I}_{3\times 3} & {\bf 0}_{3\times 4} & {\bf 0}_{3\times 3}\cr {\bf 0}_{3\times 4} & {\bf I}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 4} & {\bf 0}_{3\times 3} \cr {\bf 0}_{1\times 4} & {\bf 0}_{1\times 3} & {\bf 0}_{1\times 3} & {\bf 0}_{1\times 3} & {\bf 0}_{1\times 3} & {\bf 0}_{1\times 4} & {\bf 0}_{1\times 3} \cr {\bf 0}_{3\times 4} & {\bf 0}_{3\times 3} & {\bf I}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 4} & {\bf 0}_{3\times 3}\cr {\bf I}_{4 \times 4} & {\bf 0}_{4\times 3} & {\bf 0}_{4\times 3} & {\bf 0}_{4\times 3} & {\bf 0}_{4\times 3} & {\bf 0}_{4\times 4} & {\bf 0}_{4\times 3}\cr {\bf 0}_{3\times 4} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 4} & {\bf I}_{3\times 3}\cr {\bf 0}_{3\times 4} & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 3} & -{{\bf C}}^{ {T}}({\bar{q}}_{ {I}}) & {\bf 0}_{3\times 3} & {\bf 0}_{3\times 4} & {\bf 0}_{3\times 3} }\right]}.\eqno{\hbox{(61)}}$$Considering that a property of a rotation matrix is that it is full rank (), it is easy to see that (61) is full rank, indicating that is also full rank.

Corollary 2: The system described by (34) and (36)(40) is observable regardless of the linear motion of the IMU-camera rig.

This is evident from the fact that for proving Lemma 3, we did not use any Lie derivatives with respect to f2 [cf. (34)]. Therefore, am, the measured linear acceleration can take arbitrary values without compromising the observability of the system. This observation has important practical implications when no significant linear motion is possible due to physical constraints (e.g., calibration of an IMU-camera rig in an indoor laboratory): the IMU-camera transformation can be accurately estimated even if no linear acceleration is exerted.

Remark 2: Since no noise is injected into the system along the directions of the IMU-camera transformation [cf. (6)], regardless of the observability of the system, the uncertainty of the IMU-camera transformation will never increase.

When the linearization of the IMU-camera calibration system is sufficiently accurate, this remark has the following important implication: running the estimation algorithm during periods when the observability conditions are not met (e.g., as a result of stopping the IMU-camera rig) will not decrease the accuracy of the IMU-camera estimates, although it might not improve their quality either. However, it is advisable to excite at least two degrees of rotational freedom for sufficiently long time at the beginning of the calibration process, so as to significantly reduce the error in the IMU-camera transformation and ensure the validity of the linear approximation.

SECTION V

## Simulation and Experimental Results

### A. Simulation Results

Fig. 2. Trajectory of the IMU-camera system for 15 s.
In order to validate the proposed EKF algorithm for estimating the IMU-camera transformation when ground truth is available, we have performed a number of simulation experiments. In our simulation setup, an IMU-camera rig moves in front of a calibration target containing 25 known features. These correspond to the vertices of a rectangular grid with 50 × 50 cm cell size, which is aligned with the yz plane (cf. Fig. 2). The camera is assumed to have 50° field of view. Additionally, the image measurements received at a rate of 10 Hz are distorted with noise of σ = 1 pixel. The IMU noise characteristics are the same as those of the ISIS IMU used in the real-world experiments (cf. Section V-B). The IMU measurements are received at 100 Hz.
Fig. 3. State-estimate error and 3 σ bounds for the IMU-camera transformation. Translation along axes x, y, and z. The initial error is cm.
Fig. 4. State-estimate error and 3σ bounds for the IMU-camera transformation. Rotation about axes x (roll), y (pitch), and z (yaw). The initial alignment errors are .

The initial alignment error for translation is set to cm with a standard deviation of 5 cm in each axis. The initial alignment error for rotation is set to [cf. (15)] with 3° standard deviation of uncertainty in each axis of rotation. Consequently, the filter state vector and error-state covariance matrix are initialized according to the process described in Section III-A.

Following the initialization step, the system performs a spiral motion within 3–5 m off the calibration pattern. The EKF processes the IMU and camera measurements, and concurrently estimates all the components of the state vector [cf. (2)]. The actual and estimated trajectories are shown in Fig. 2. For the duration of this simulation (only 15 s), 150 images were processed and, on the average, 21.7 landmarks were visible in each image. The state-estimate errors and their 3σ bounds for the 6-DOF transformation between the IMU and the camera in a typical simulation are shown in Figs. 3 and 4. As evident from these plots, even with a relatively large initial error for the IMU-camera transformation, the algorithm is still able to attain very accurate estimates of the calibration parameters. The final uncertainty (3σ) of the estimates is [0.96 0.84 0.90]T cm for translation and [0.072° 0.120° 0.120°]T for rotation.

#### General Motion Versus Rotation Only

In Section IV, we have shown that the system describing the IMU-camera calibration process is observable when the IMU-camera rig undergoes rotational motion even if no translation occurs. Hereafter, we examine the achievable accuracy for motions with and without translation after 100 s when the IMU-camera rig undergoes: 1) spiral motion (i.e., exciting all 6 DOF) and 2) pure rotation (i.e., exciting only the DOF corresponding to attitude). In all these simulations, the initial uncertainty of the IMU-camera translation and rotation are set to 15 cm and 9° deg (3σ) in each axis, respectively. A summary of these results is shown in Table I.

TABLE I Final Uncertainty (3σ) of the IMU-Camera Parameters After 100 s for Two Motion Scenarios

The third row of Table I, (xyzrpy), corresponds to motion with all 6 DOF excited. In this case, after sufficient time, the translation uncertainty is reduced to less than 2 mm (3σ) in each axis.

By comparing the results of Table I to those corresponding to Figs. 3 and 4, it is obvious that by allowing the EKF algorithm to run for longer period of time (i.e., 100 s instead of 15 s), we can estimate the calibration parameters more accurately. Additionally, as it can be seen in this particular example, the translational uncertainty along the x-axis is slightly higher than the uncertainty along the other two axes. This is a typical result observed in all simulations with similar setup. The main reason for this is the limited range of pitch and yaw rotations (i.e., about the y- and z-axes, respectively) required for keeping the landmarks within the field of view. On the other hand, the roll rotation (about the x-axis) is virtually unlimited, and it can span a complete circle without losing visual contact with the landmarks (note that the optical axis of the camera is aligned with the local x-axis).

The fourth row of Table I corresponds to a scenario where the motion is constrained to pure rotation. As expected, the system is still observable, and both the translation and the rotation between the IMU and the camera are accurately estimated [40]. The accuracy of the rotation estimation between the IMU and the camera in both scenarios (i.e., with or without translation) is shown in the last three columns of Table I. As evident, in all cases, the rotational parameters can be estimated extremely accurately, even when the system has not undergone any translation.

#### Monte Carlo Simulations

Finally, we have conducted Monte Carlo simulations to statistically evaluate the accuracy of the filter. We ran 100 simulations with a setup similar to the first simulation described in this section. The initial standard deviation of the IMU-camera transformation is set to 3 cm for translation and 3° for rotation. The initial values in each run are randomly generated according to a Gaussian probability distribution with these standard deviations. Each simulation is run for 15 s and the final calibration errors along with their estimated uncertainty are recorded. The ensemble mean of the recorded errors is [0.058 −0.002 0.044]T cm for translation and [−0.0038° 0.0013° −0.0009°]T for rotation. It can be seen that the mean error is at least one order of magnitude smaller than the typical error, demonstrating that the filter is indeed unbiased.

The standard deviations of the recorded errors are shown in the second row of Table II. The third row of this table shows the average of the standard deviations computed by the EKF at each realization of the experiment. Comparison of these two rows indicates consistency of the filter as the standard deviation of the actual error is smaller than or equal to the standard deviations computed by the EKF (i.e., the filter estimates are not overconfident).

TABLE II Monte Carlo Simulations: Comparison of the Standard Deviations of the Final IMU-Camera Transformation Errorerr) and the Average Computed Uncertainty of the Estimatesest)

### B. Experimental Results

Fig. 5. Testbed used for the experiments.

In order to demonstrate the validity of our EKF algorithm in realistic situations, we have conducted experiments using a testbed that consists of an ISIS IMU, a firewire camera, and a PC104 computer for data acquisition (cf. Fig. 5). The IMU and the camera are rigidly mounted on the chassis and their relative pose does not change during the experiment. The intrinsic parameters of the camera were calibrated prior to the experiment [41] and are assumed constant. The camera's field of view is 60° with a focal length of 9 mm. The resolution of the images is 1024 × 768 pixels. Images are recorded at a rate of 3.75 Hz while the IMU provides measurements at 100 Hz. The PC104 stores the images and the IMU measurements for postprocessing using our EKF algorithm. Furthermore, considering that the exact values of the IMU-camera transformation (ground truth) were not available in this experiment, a batch least squares (BLS) estimator was implemented to provide the best possible estimates of the alignment parameters by postprocessing all the collected IMU and image measurements concurrently (cf. Appendix II).

A calibration pattern (checker board) was used to provide 72 globally known landmarks that were placed 5.5–11 cm apart from each other. The bottom-left corner of this checker board was selected as the origin of the global reference frame, and the calibration pattern was aligned with the direction of the gravitational acceleration. The landmarks (i.e., the corners of the squares) were extracted using a least-squares corner detector. We have assumed that the camera measurements are corrupted by additive white Gaussian noise with standard deviation equal to 2 pixels. 2

Fig. 6. Estimated trajectory of the IMU for 50 s. The starting point is shown by a circle on the trajectory.

The hand-measured translation and rotation between the IMU and the camera was used as an initial guess for the unknown transformation. Additionally, the pose of the IMU was initialized as described in Section III-A. Finally, initialization of the gyro and the accelerometer biases was performed by placing the testbed in a static position for approximately 80 s. During this time, the EKF processed IMU and camera measurements while enforcing the static constraint (zero position and attitude displacement). The resulting state vector along with the error-state covariance matrix were then directly used to run the experiment.

Once the initialization process was complete, we started moving the testbed while the camera was facing the calibration pattern. For the duration of the experiment, the distance between the camera and the calibration pattern varied between 0.5 and 2.5 m in order to keep the corners of the checker board visible. Additionally, the testbed was moved in such a way so as to excite all degrees of freedom while at the same time keeping the landmarks within the camera's field of view.

During the motion of the testbed (∼ 50 s), 180 images were recorded, of which 24 were not processed due to motion blur. The EKF algorithm was able to estimate the IMU-camera transformation while keeping track of the IMU pose, velocity, and IMU biases. The estimated trajectory of the IMU is shown in Fig. 6.

Fig. 7. Time evolution of the estimated IMU-camera translation along the x-, y-, and z-axes (solid blue lines) and the corresponding 3σ bounds centered around the BLS estimates (dashed red lines).
Fig. 8. Time evolution of the estimated IMU-camera rotation about the axes x, y, and z (solid blue lines), and the corresponding 3σ bounds centered around the BLS estimates (dashed red lines).

The time evolution of the estimated calibration parameters along with their estimated 3σ bounds centered around the BLS estimates are depicted in Figs. 7 and 8. As evident from these plots, the calibration parameters converge to steady-state values after approximately 130 s (including the 80 s of the duration of the initialization process). The small inconsistencies observed during the initial transient period are due to the nonlinearities of the system and measurement models, and the imprecise initialization of the filter state vector. In particular, evaluating the Jacobians using the inaccurate state estimates available at the beginning of the calibration process causes the estimates to fluctuate significantly around their true values. As more feature observations become available, the accuracy of the state estimates improves, which subsequently increases the accuracy of the system Jacobian and eventually leads to convergence to the states' true value.

TABLE III Initial, EKF, and BLS Estimates of the IMU-Camera Parameters and Their Uncertainty for the Described Experiment
Fig. 9. [Calibrated IMU-Camera] Measurement residuals along with their 3σ bounds for the horizontal u (top plot) and vertical v (bottom plot) axes of the images.
Fig. 10. [Uncalibrated IMU-Camera] Measurement residuals along with their 3σ bounds for the horizontal u (top plot) and vertical v (bottom plot) axes of the images.

A summary of the results from this experiment is provided in Table III. It is worth mentioning that the initial uncertainty of 9 cm (3σ) in the translation parameters improves to less than 0.8 cm (3σ) for all axes. Additionally, the initial uncertainty of 6° (3σ) decreases to less than 0.1° (3σ) for each axis of rotation. Moreover, this table shows that the EKF estimator, which can run in real time, attains a level of accuracy close to that of the BLS. Also, note that the final accuracy of the EKF is consistent with that of the BLS, demonstrating that the EKF is not overconfident. A further indicator of the consistency of the EKF is provided in Fig. 9. As shown in these plots, the measurement residuals of the filter along the image axes (i.e., reprojection errors) lie within their estimated 3σ bounds.

In order to stress the importance of acquiring precise IMU-camera calibration estimates, we have also tested with the same experimental setup, an EKF-based estimator that does not estimate the calibration parameters online. Instead, this filter uses the initial guess for the unknown IMU-camera transformation to estimate the IMU pose, velocity, and biases. In this case, as evident from the camera measurement residuals shown in Fig. 10, the approximate values for the calibration parameters lead to large inconsistencies of the estimator.

SECTION VI

## Conclusion

In this paper, we have presented an EKF-based algorithm for estimating the transformation between an IMU and a camera rigidly attached on a mobile platform. To the best of our knowledge, this is the first approach to the IMU-camera calibration problem that appropriately accounts for the time correlation between the IMU measurements. Additionally and contrary to previous work on this subject, we do not separate the task of translation estimation from rotation estimation, and hence, prevent error propagation. Moreover, by treating the problem within the Kalman filtering framework, we are also able to compute the covariance of the estimated quantities as an indicator of the achieved level of accuracy. Therefore, by accounting for this uncertainty in the consequent estimation algorithm, we are able to explicitly model their impact. Last but not the least, an important feature of this algorithm is the ability to perform the calibration process without requiring any specific testbed (such as rotating table [10] or high precision 3-D laser scanner [8]) except the calibration pattern that is also needed when calibrating the intrinsic parameters of the camera. The derived estimator was tested both in simulation and experimentally, and it was shown to achieve accuracy in the order of millimeters and subdegrees, respectively, for the translational and rotational components of the IMU-camera transformation. Additionally and for the first time, the observability of the nonlinear system describing the IMU-camera calibration was investigated by employing the observability rank condition based on Lie derivatives. As presented, estimating the IMU-camera transformation requires exciting only two of the rotational DOF, while no translational motion is necessary.

Currently, we are investigating the possibility to extend this study to the most challenging case of the IMU-camera calibration, where instead of using a calibration pattern, we will consider naturally occurring visual features whose positions are unknown.

APPENDIX I

## PROOF OF LEMMAS 1 AND 2

Lemma 1: Matrix Aij defined in (60) is full rank.

Proof: We prove this lemma for the case of practical interest when the elements of pC = [p1 p2 p3]T (i.e., the vector denoting the position of the camera expressed with respect to the IMU frame) are nonzero. 3

For i,j = 1, …, 3, ij, we expand Aij as TeX Source \eqalignno{{\bf A}_{ij} = \left[\matrix{ {\bm\Gamma} _i & {\bm\Upsilon} _i \cr {\bm\Gamma} _j & {\bm\Upsilon} _j \cr 2{\bar{q}}_{ {I}}^{ {T}} & {\bf 0}_{1\times 3} }\right] \quad\matrix{ &\!\!\!\!\!\!\!\! \left.\right\} (1:3) \leftrightarrow \omega _{m\rm i}\cr &\!\!\!\!\!\!\!\! \left.\right\} (4:6) \leftrightarrow \omega _{m\rm j}\cr &\!\!\!\!\!\!\!\! \left.\right\} 7.\hfill}\qquad\hbox{(62)}}The variables on the right side of the matrix next to the row numbers specify the component of ωm = [ωm1 ωm2 ωm3]T that are excited in order for these rows to be included in the observability matrix [cf. (50)]. After considerable algebra, it can be shown that [40] TeX Source $$\det \left({\bf A}_{ij} \right) = 8(-1)^k p_k\left(p_j^2+p_i^2\right)\eqno{\hbox{(63)}}$$where i,j,k = 1, …, 3, ki, kj, and ij. We conclude the proof by noting that since all elements of pC are nonzero, the determinant of Aij in (63) is nonzero; hence, Aij is full rank.

Lemma 2: For any unit quaternions and , is full rank.

Proof: This can be readily proved by computing BTB, which is a 3 × 3 matrix: TeX Source $${\bf B}^{ {T}}{\bf B} = \bm\Xi ^{ {T}}(\bar{s}) {\bf J}^{ {T}}{\cal R}^{ {T}}({\bar{q}}) {\cal R}({\bar{q}}) {\bf J} {\bm\Xi} (\bar{s}) = {\bf I}_{3\times 3}.\eqno{\hbox{(64)}}$$Therefore, matrix B is full rank. For computing (64), we used the identities and [23].

APPENDIX II

In order to compare the results of the proposed EKF algorithm for estimating the 6-DOF IMU-camera transformation with the best achievable (offline) estimates, we compute the batch least-squares estimate, also known as bundle adjustment [42]. For this purpose, we minimize the following linearized cost function: TeX Source \eqalignno{{\cal C}_M^i & = { 1\over 2} \left({\bar{{\bf x}}}_0 - \hat{{\bf x}}_0^i \right)^T {\bf P}_0^{-1} \left({\bar{{\bf x}}}_0 - \hat{{\bf x}}_0^i\right) \cr &\quad +{ 1\over 2} \sum _ {k=0} ^ {M-1} \left(\widetilde{{\bf z}}_k - {\bf H}_k^i\widetilde{{\bf x}}^i_k \right)^T {\bf R}_k^ {-1} \left(\widetilde{{\bf z}}_k - {\bf H}_k^i \widetilde{{\bf x}}^i_k\right) \cr &\quad + { 1\over 2} \sum _ {k=0} ^ {M-1} \left(\widetilde{{\bf x}}^i_{k+1} - {\bm\Phi} _k^i \widetilde{{\bf x}}^i_k\right)^T {\bf Q}_k ^{-1}(\widetilde{{\bf x}}^i_{k+1} - {\bm\Phi} _k^i \widetilde{{\bf x}}^i_k)\qquad&\hbox{(65)}}where Φki [cf. (17)], Hki [cf. (20)], and [cf. (22)] are evaluated at , the i th iterate of the system's state-vector estimates at time step k, k = 0, …, M [cf. (2) and (14)]. Additionally, Rk represents the measurement noise covariance matrix (cf. Section III-C) and Qk is the discrete-time system noise covariance matrix [cf. (18)]. Furthermore, and P0 represent the initial state estimate and its covariance, respectively. Minimizing this cost function requires solving the following system of equations iteratively: TeX Source $${\bf M} \widetilde{\underline{{\bf x}}}^i = {\bm\epsilon}\eqno{\hbox{(66)}}$$where M and ∊ are computed as functions of , zk, Φki, Hki, P0, Rk, Qk, and [43]. In our implementation, we have initialized , with the results from the EKF proposed in this paper, and employed the sparse Cholesky factorization with symmetric approximate minimum degree permutation to solve (66) [44]. The resulting estimates are used as the EKF benchmarks in our experiments (cf. Section V).

## Footnotes

Manuscript received September 24, 2007; revised March 30, 2008. First published October 3, 2008; current version published nulldate. This paper was recommended for publication by Associate Editor P. Rives and Editor F. Park upon evaluation of the reviewers' comments. This work was supported in part by the University of Minnesota (DTC) and in part by the National Science Foundation under Grant EIA-0324864, Grant IIS-0643680, and Grant IIS-0811946. A preliminary version of this paper was presented at the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems [1].

The authors are with the Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: faraz@cs.umn.edu; stergios@cs.umn.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

1. If an initial estimate of the pose is available, then observation of only three known features is sufficient for uniquely determining the camera pose [39].

2. The actual pixel noise is less than 2 pixels. However, in order to compensate for the existence of unmodeled nonlinearities and imperfect camera calibration, we have inflated the noise standard deviation to 2 pixels.

3. Note that pC = 03 × 1 is not physically realizable since it means that the centers of the IMU and the camera coincide. Also, the case when one or more elements of pC are zero is extremely rare in practice, since it requires perfect position alignment of the camera and the IMU. However, the latter case is addressed in [40], where it is shown that the system is still observable when all three degrees of rotational freedom are excited.

## References

1. A Kalman filter-based algorithm for IMU-camera calibration

F. M. Mirzaei, S. I. Roumeliotis

San Diego, CA
Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 29–Nov. 2, 2007, 2427–2434

2. Fundamentals of High Accuracy Inertial Navigation

A. B. Chatfield

Reston, VA
American Institute of Aeronautics and Astronautics, Inc., 1997

D. Titterton, J. Weston

Piscataway, NJ
Strapdown Inertial Navigation Technology, IEE, 2005

4. Linear pose estimation from points or lines

A. Ansar, K. Daniilidis

IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, issue (5), p. 578–589, 2003-05

5. Recursive estimation of motion, structure, and focal length

A. Azarbayejani, A. Pentland

IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, issue (6), p. 562–575, 1995-06

6. Vision-aided inertial navigation for spacecraft entry, descent, and landing

A. I. Mourikis N. Trawny S. I. Roumeliotis A. E. Johnson A. Ansar, L. H. Matthies

IEEE Trans. Robot., in press

7. Special issue: 2nd workshop on integration of vision and inertial sensors

J. Dias M. Vinzce P. Corke J. Lobo

The International Journal of Robotics Research, vol. 26, Newbury Park, CA: SAGE Publications, 2007-06, no. 6, pp. 519–535

8. Field testing of the mars exploration rovers descent image motion estimation system

A. E. Johnson, R. Willson, J. Goguen, J. Alexander, D. Meller

Barcelona, Spain
Proc. IEEE Int. Conf. Robot. Autom., Apr. 18–22, 2005, 4463–4469

9. Vision guided landing of an autonomous helicopter in hazardous terrain

A. E. Johnson, J. F. Montgomery, L. H. Matthies

Barcelona, Spain
Proc. IEEE Int. Conf. Robot. Autom., Apr. 18–22, 2005, 3966–3971

10. Relative pose calibration between visual and inertial sensors

J. Lobo J. Dias

InerVis, presented at the Barcelona, Spain, 2005-04

11. Callibration of hybrid vision/inertial tracking systems

P. Lang A. Pinz

InerVis presented at the, Barcelona, Spain, 2005-04

12. A new technique for fully autonomous and efficient 3-D robotics hand/eye calibration

R. Tsai, R. Lenz

IEEE Trans. Robot. Autom., vol. 5, issue (3), p. 345–358, 1989-06

13. Finding the position and orientation of a sensor on a robot manipulator using quaternions

J. Chou, M. Kamel

Int. J. Robot. Res., vol. 10, issue (3), p. 240–254, 1991-06

14. Hand–eye calibration using dual quaternions

K. Daniilidis

Int. J. Robot. Res., vol. 18, issue (3), p. 286–298, 1999-03

15. Stochastic Models, Estimation and Control.

P. S. Maybeck

New York
Stochastic Models, Estimation and Control., Academic Press, 1979, Vol. 1

16. Generalized architecture for simultaneous localization, auto-calibration, and map-building

E. M. Foxlin

Lauzanne, Switzerland
Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Sep. 30–Oct. 4, 2002, 527–533

17. Nonlinear controlability and observability

R. Hermann, A. Krener

IEEE Trans. Autom. Control, vol. AC-22, issue (5), p. 728–740, 1977-10

18. Nonlinear Systems: Analysis, Stability, and Control

S. Sastry

New York
Nonlinear Systems: Analysis, Stability, and Control, Springer-Verlag, 1999

19. Relative pose calibration between visual and inertial sensors

J. Lobo, J. Dias

Int. J. Robot. Res., vol. 26, issue (6), p. 561–575, 2007-06

20. Closed-form solution of the absolute orientation using quaternions

B. K. P. Horn

J. Opt. Soc. Amer. A, vol. 4, p. 629–642, 1987-04

21. Pose estimation from corresponding point data

R. Haralick

IEEE Trans. Syst., Man, Cybern., vol. 19, issue (6), p. 1426–1446, 1989-11/12

22. Kalman filtering for spacecraft attitude estimation

E. J. Lefferts, F. L. Markley, M. D. Shuster

J. Guid., Control, Dyn., vol. 5, issue (5), p. 417–429, 1982-09/10

23. Indirect Kalman filter for 3-D pose estimation

N. Trawny S. I. Roumeliotis

Dept. Comput. Sci. & Eng., Univ. Minnesota, Minneapolis, MN, Tech. Rep., 2005-03

24. IEEE Standard Specification Format Guide and Test Procedure for Single-Axis Laser Gyros

The Institute of Electrical and Electronics Engineers, New York, IEEE Standard 647-2006

25. IEEE Standard Specification Format Guide and Test Procedure for Linear, Singleaxis, Nongyroscopic Accelerometers

The Institute of Electrical and Electronics Engineers, New York, IEEE Standard 1293-1998, 1999

26. Stochastic Models, Estimation and Control,

P. S. Maybeck

New York
Stochastic Models, Estimation and Control,, Academic Press, 1979, Vol. 2

27. Stochastic Processes and Filtering Theory (series Mathematics in Science and Engineering)

A. H. Jazwinski

New York
Stochastic Processes and Filtering Theory (series Mathematics in Science and Engineering), Academic Press, 1970, Vol. 64

28. Numerical Recipes in C

W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery

Cambridge, MA
Cambridge Univ. Press, 1992

29. Modern Control Theory

W. L. Brogan

Englewood Cliffs, NJ
Prentice-Hall, 1990

30. Fundamentals of Statistical Signal Processing: Estimation Theory

S. M. Kay

Prentice-Hall, 1993

31. Linear System Theory

W. J. Rugh

Linear System Theory, 2nd, Prentice-Hall, 1996

32. The effects of partial observability when building fully correlated maps

IEEE Trans. Robot., vol. 21, issue (4), p. 771–777, 2005-08

33. Design and experimental validation of an odometric and goniometric localization system for outdoor robot vehicles

P. Bonnifait, G. Garcia

IEEE Trans. Robot. Autom., vol. 14, issue (4), p. 541–548, 1998-08

34. Observability analysis for mobile robot localization

A. Martinelli, R. Siegwart

Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Aug. 2–6, 2005, 1471–1476

35. Vision-based localization of leader follower formations

G. L. Mariottini, G. Pappas, D. Prattichizzo, K. Daniilidis

Seville, Spain
Proc. IEEE Conf. Decision and Control, Dec. 12–15 2005, 635–640

36. On the observability and observability analysis of SLAM

K. W. Lee, W. S. Wijesoma, J. I. Guzman

Beijing, China
Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 9–15 2006, 3569–3574

37. Automatic self-calibration of a vision system during robot motion

A. Martinelli, D. Scaramuzza, R. Siegwart

Proc. IEEE Int. Conf. Robot. Autom., May 15–19 2006, 43–48

38. Nonlinear Dynamical Control Systems

H. Nijmeijer, A. van der Schaft

New York
Nonlinear Dynamical Control Systems, Springer-Verlag, 1990

39. Observability analysis of six-degree-of-freedom configuration determination using vector observations

D. Sun, J. L. Crassidis

J. Guid., Control, Dyn., vol. 25, issue (6), p. 1149–1157, 2002-11

40. IMU-camera calibration: Observability analysis. Dept. Comput. Sci. & Eng., Univ.

F. M. Mirzaei S. I. Roumeliotis

2007-01, Minnesota, Minneapolis, Tech. Rep., [Online]. Available: http://mars. cs.umn.edu/tr/reports/Mirzaei07.pdf

41. Camera calibration toolbox for matlab

J.-Y. Bouguet

2006, [Online]. Available: http://www.vision.caltech.edu/bouguetj/calibdoc/

42. Bundle adjustment for structure from motion

B. Triggs, P. McLauchlan, R. Hartley, Fitzgibbon

New York
Vision Algorithms: Theory & Practice, B., Triggs, A., Zisserman, R., Szeliski, Springer-Verlag, 2000

43. IMU-camera calibration: Bundle adjustment. Dept. Comput. Sci. & Eng., Univ. Minnesota

F. M. Mirzaei S. I. Roumeliotis

2007-08, Minneapolis, Tech. Rep., [Online]. Available: http:// mars.cs.umn.edu/tr/reports/Mirzaei07a.pdf

44. Direct Methods for Sparse Linear Systems

T. A. Davis

SIAM, 2006

## Cited By

No Citations Available

## Keywords

### IEEE Keywords

No Keywords Available

### More Keywords

No Keywords Available

No Corrections

## Media

No Content Available
This paper appears in:
IEEE Transactions on Robotics
Issue Date:
OCTOBER 2008
On page(s):
1143 - 1156
ISBN:
1552-3098
Print ISBN:
N/A
INSPEC Accession Number:
10301458
Digital Object Identifier:
10.1109/TRO.2008.2004486
Date of Current Version:
31 Oct, 2008
Date of Original Publication:
03 Oct, 2008