This paper investigates the problem of vision and inertial data fusion. It considers a sensor assembly consisting of one monocular camera, three orthogonal accelerometers, and three orthogonal gyroscopes. The first contribution of the paper is the analytical derivation of all the observable modes, i.e., all the physical quantities that can be determined using only the information in the sensor data acquired during a short time interval. Specifically, the observable modes are the speed and attitude (roll and pitch angles), the absolute scale, and the biases that affect the inertial measurements. This holds even when the camera observes only a single point feature. The derivation of these observable modes is based on a nonstandard observability analysis, which fully accounts for the system nonlinearities. The second contribution is the derivation of closed-form solutions, which analytically express all the observable modes in terms of the visual and inertial measurements collected during a very short time interval. This leads to a simple and powerful new method that simultaneously estimates all the observable modes without any initialization or a priori knowledge. Both the observability analysis and the derivation of the closed-form solutions are carried out in several different contexts: with biased and unbiased inertial measurements, with a single feature and with multiple features, and in the presence and absence of gravity. In addition, for each of these contexts, the minimum number of camera images necessary for observability is derived. The performance of the proposed approach is evaluated via extensive Monte Carlo simulations and real experiments.