Body-Centered Dynamically-Tuned Error-State Extended Kalman Filter for Visual Inertial Odometry in GNSS-Denied Environments

The use of visual odometry in autonomous land vehicle and unmanned ground vehicle (UGV) applications has shown promise for robust and accurate positioning and navigation in GNSS-denied environments. Although GNSS is the most accurate and reliable positioning source, its signals cannot be received indoors. Low-cost micro-electro-mechanical system (MEMS)-based inertial sensors can also be used for positioning, but they drift rapidly during prolonged GNSS outages. To address these limitations, this paper proposes an innovative method of integrating MEMS-based inertial sensors with a vision-based navigation system. The vision-based navigation relies on a visual odometry pipeline modified to replace the feature-matching step with Lucas-Kanade optical flow, while the inertial-based system is derived from the three-dimensional reduced inertial sensor system (3D-RISS) and modified for a body-centered update. The fusion of both systems is based on an Extended Kalman Filter (EKF) that estimates the position and heading information and corrects for inertial sensor errors. To assess the effectiveness of the proposed method, real indoor and road test trajectories were conducted using both a Husky A200 UGV and a sensor-equipped minivan. The results show that the body-centered inertial system, aided with reliable vision updates, provides a reliable positioning solution for an extended duration in GNSS-denied environments.


I. INTRODUCTION
Positioning and navigation of autonomous vehicles, unmanned ground vehicles (UGVs), human-operated, and teleoperated platforms have enormous potential for numerous applications. Therefore, developing a reliable navigation system is critical for creating a completely autonomous platform. Full autonomy requires a navigation system that works effectively even in challenging situations where global navigation satellite system (GNSS) signals may be unavailable or denied, such as indoor scenarios. This has led to increased interest in self-contained odometry systems [1]. Consequently, extensive research has been conducted to develop various positioning systems and techniques for mobile robots [2] and self-driving cars [3] using sensors [4], [5], [6] such as wheel odometry, ultrasonic odometry, GNSS, inertial navigation system (INS), and visual odometry (VO). It is also crucial to note that these methods are not mutually exclusive, as odometry can be multi-modal, i.e., measurements from different sensors can be fused in a single algorithm [7], [8]. Each system possesses certain strengths and weaknesses; for instance, despite being simple to implement, wheel odometry suffers from position drift due to wheel slippage [4]. An INS is highly susceptible to accumulated error, and a highly accurate INS is cost-prohibitive and not a practical option for many commercial applications [9]. The most common localization technique widely employed in autonomous platforms is GNSS, specifically the global positioning system (GPS). GPS can provide absolute position without error accumulation; however, it performs best in open-sky conditions [10]. GPS is susceptible to errors that reduce its reliability for precise self-localization, such as multipath [11], vulnerability to interference and jamming [12], and dependency on atmospheric conditions, to name a few. Additionally, GPS is not suitable for indoor navigation, as radio signals are obstructed by walls and other objects, making it unsuitable for indoor positioning and navigation of mobile robots. VO can offer high-precision motion estimates, particularly when integrated with other sensors like an INS. It is a cost-effective solution that can operate in real time. However, it is susceptible to noise and outliers in image data, which can lead to incorrect feature matching and pose estimation. Moreover, the accuracy of VO degrades as the number of features in the image data decreases.
This paper addresses the challenge of ensuring reliable and accurate navigation for mobile robots and autonomous vehicles in GNSS-denied environments. The work in this paper extends the work in [13] and [14] by introducing a wider range of indoor experimental trajectories and evaluation metrics. These trajectories are intentionally designed to be more challenging. Furthermore, a significant contribution of this paper is the presentation of a challenging scenario within a parking garage at the Yorkdale Shopping Centre in Toronto, Ontario, Canada. This carefully selected scenario aims to challenge the capabilities of the proposed algorithm and assess its performance in a complex setting. Additionally, within this particular trajectory, the positioning solution computed by the proposed algorithm is directly compared with a reference solution acquired from a GNSS NovAtel-SPAN ProPak6 and a tactical-grade KVH 1750 Inertial Measurement Unit (IMU). The use of the robot operating system (ROS) to process various sensory information onboard the moving platform is explored and exploited in terms of data acquisition. Improvements to state-of-the-art methods are introduced in handling vision and inertial-based navigation solutions and their integration. The developed algorithm and platform are implemented in real-world indoor trajectories to test the performance of the developed positioning method(s) in GNSS-denied environments. As it relies entirely on dead-reckoning (DR) technologies, the proposed integrated system provides reliable and accurate positioning and navigation solutions starting from a known position. The main components presented herein include:
• an optical flow-based semi-direct VO solution,
• a self-evaluation procedure to assess VO motion estimation reliability,
• an innovative body-centered three-dimensional reduced inertial sensor system (3D-RISS),
• a body-centered loosely-coupled error-state Extended Kalman Filter (EKF) employing the VO self-evaluation for dynamic tuning of the filter.
The remainder of this paper is structured as follows: the basics of INS, VO, and their fusion are explained in Section II. The proposed methodology is explained in Section III. Section IV illustrates the experimental setup of the indoor and road parking garage trajectories. Afterward, the results of the proposed technique in various scenarios are discussed in Section V. Finally, Section VI gives the conclusion and future work.

II. BACKGROUND
A. SENSOR FUSION FOR DEAD-RECKONING (DR)
DR is the most commonly used approach for positioning and navigation of moving platforms in GNSS-denied environments [15]. DR is a relative positioning approach that can be performed using onboard inertial sensors, machine vision through cameras, or any onboard perception system such as Light Detection and Ranging (LiDAR) [16], [17], Radio Detection and Ranging (RADAR) [18], or ultrasound [19] mounted on the moving platform. The vehicle and payload developed for this research rely on a well-established autonomous navigation system, the INS. A method for mitigating INS errors using a low-cost extra sensor, namely a camera, as an aiding source is developed and tested in this paper. Over the last decade, advancements in camera technology, both in hardware and in processing capabilities, have enabled cameras to operate in dynamic lighting environments and to detect rapid motion. Thus, the camera was our sensor of choice to correct for INS errors.

B. INERTIAL NAVIGATION SYSTEM (INS)
INS has a long history of use, starting from the German guidance system used in the V-2 missiles in 1942, as highlighted in [20]. The most robust feature of an INS is that it works under numerous circumstances and does not require a particular environment in order to operate. Since it is a DR system [21], the only requirement for an INS to operate is an initial estimate of its navigation states: position, velocity, and attitude. The proposed method incorporates a variant of INS proven efficient for wheeled platforms, 3D-RISS, with a novel body-centered formulation.

C. VISUAL ODOMETRY (VO)
The proposed vision-based subsystem relies on VO to estimate the ego-motion of a platform by analyzing the motion of visual features captured by onboard cameras. VO has garnered significant attention in recent years due to its potential applications in autonomous vehicles, drones, and augmented reality.
Semi-direct visual odometry [22] represents a variant of VO that combines aspects of both direct and indirect methods. It offers several advantages and limitations compared to traditional VO methods. Semi-direct VO balances accuracy and efficiency by leveraging direct methods, which directly compare pixel intensities, and indirect methods, which rely on feature detection and matching. This approach has been widely explored in the literature, with notable works such as [23] and [24] showcasing its effectiveness in various scenarios, mainly because semi-direct VO is known for its computational efficiency. This aspect is crucial in dynamic environments where rapid updates are required to maintain accurate motion estimation. The reduced computational load of semi-direct VO allows for faster processing and better responsiveness to dynamic changes [24].
The fundamental concept underlying VO is epipolar geometry, which involves solving for the essential matrix by considering the epipolar constraint and the epipolar line. The epipolar line is formed by the intersection of the epipolar plane, established by a point in the real world and the two positions of the camera's optical center, with each image plane. This is illustrated in Figure 1, where the epipolar plane, represented by the light blue triangle, intersects the two image planes along the epipolar lines.
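For reference, the epipolar constraint can be written compactly in the standard multiple-view-geometry form (generic notation, not reproduced from this paper's numbered equations): for normalized image coordinates \(\mathbf{x}_1\) and \(\mathbf{x}_2\) of the same world point seen from the two camera positions,

\[
\mathbf{x}_2^{\top} E \, \mathbf{x}_1 = 0, \qquad E = [\mathbf{t}]_{\times} R,
\]

where \(R\) and \(\mathbf{t}\) are the rotation and translation between the two camera poses and \([\mathbf{t}]_{\times}\) is the skew-symmetric matrix of \(\mathbf{t}\). Decomposing \(E\) recovers \(R\) and the direction of \(\mathbf{t}\); the translation magnitude (scale) must come from another source, which is why the odometer is used later in this paper.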
Numerous algorithms and techniques have been developed in the field of VO, including structure-from-motion (SFM) approaches [25] and bundle adjustment methods [26]. These techniques play a crucial role in estimating the camera's trajectory and the 3D structure of the environment based on the captured images.
Recent advancements have focused on addressing the challenges associated with VO, such as robustness to lighting changes, motion blur, occlusions, and scale drift. Techniques like feature tracking, robust feature detection, visual loop closure detection [27], and robust optimization [28] have enhanced the accuracy and reliability of vision-based systems.
Utilizing the principles of VO and integrating it with other sensors, such as inertial sensors, is the core of our proposed method. Instead of attempting to solve the challenges in a VO solution, we are assessing its reliability and using it to correct the errors within a well-modeled inertial system.

D. VISUAL-INERTIAL ODOMETRY (VIO)
In recent years, Visual-Inertial Odometry (VIO) has emerged as a pivotal technology in the field of robotics and autonomous navigation [29], [30]. Researchers have made significant strides in enhancing the performance and robustness of VIO systems. For instance, Sun et al. [31] presented a robust stereo VIO method tailored for fast autonomous flight. Their work addresses the challenges of high-speed motion and rapid scene changes, demonstrating the potential of VIO in dynamic environments. Moreover, Xu et al. [32] delved into the realm of underwater robotics, developing a robust inertial-aided localization system based on imaging sonar keyframes. This work showcases the adaptability of VIO techniques beyond traditional land-based applications, highlighting their versatility in various domains.
In the broader context of VIO research, Huang's concise review [33] offers a comprehensive overview of the state of the field. This review is a valuable resource for understanding the evolution of VIO techniques, their key challenges, and potential future directions. By systematically analyzing the literature and advancements in VIO, researchers gain insights into the latest innovations and the areas where further development is needed. Our proposed method addresses one of the major directions highlighted there: operating VIO reliably over extended periods of time. The synergy of vision and inertial sensing in VIO systems continues to drive advances in robotics and autonomous navigation. Leveraging insights from these studies, our approach builds upon this foundation, aiming to contribute to the growing body of knowledge in VIO technology. This paper presents a solution that benefits from the ability to measure the forward speed of a ground vehicle to limit the accumulation of errors.

III. METHODOLOGY
The proposed methodology includes the dataflow of each of the inertial and vision-based subsystems as well as the integration scheme utilized to compute a position estimate. The methodology focuses on integrating vision-based and inertial-based technologies for reliable positioning and navigation in GNSS-denied environments. Depending on a vision-based algorithm alone is difficult in situations with low illumination, and the algorithm's performance may be inadequate in areas where visual quality is degraded, affecting the subsequent positioning solution. Relying solely on inertial measurements causes the positioning solution to drift over time due to the inherent inertial sensor errors [34]. Thus, integrating a vision-based positioning solution with inertial sensors can provide more reliable performance and sustain positioning accuracy over extended periods of GNSS outage. Figure 2 shows the integrated system architecture, with inputs shown in baby blue, processing blocks shown in red, and positioning output information shown in green. The proposed system incorporates a camera, a gyroscope, and a three-axis accelerometer, in addition to the forward speed, to provide control input for the inertial solution. The position estimates from the vision and the inertial solutions are fused in a unique approach, producing an integrated positioning solution. This architecture is suitable for wheel-based vehicular platforms such as UGVs and autonomous vehicles operating in GNSS-denied environments and is used in this research to provide uninterrupted positioning and navigation.
Two reference frames are utilized in this research. The first is the body frame, which is centered on the vehicle's current location; the measurements obtained from the inertial sensors and the vision system are referenced to it. The second is the local-level navigation frame, in which the final position and attitude information are computed in terms of the East and North directions relative to the initial point of the trajectory.
The proposed system uses a body-centered approach for sensor fusion. Thus, instead of estimating the position relative to a fixed frame in each subsystem and then fusing those estimates into a position, an increment is estimated within each subsystem, fused, and then added to the previous position. This approach decouples the states of motion, avoiding the influence of errors in heading (attitude angles) on the position change components. The motion increment for each epoch is computed from each subsystem's ego-motion estimate, weighted through the sensor fusion algorithm, and then transformed to the local-level frame to obtain the current (updated) position. A sample planar motion of (P_x, P_y) is represented in Figure 3. At each instance of the trajectory, a three-dimensional frame is introduced at an arbitrary point, chosen here to be the front center of the vehicle. The motion is counted as the difference between those body-centered frames through two consecutive epochs. This frame is represented by x for the transversal direction, y for the forward direction, z for the up direction, and θ for the orientation angle about the vertical axis. It is crucial to note that if the vehicle is static, these values are zero; only actual motion between epochs creates non-zero increments in this representation.
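To make the body-centered bookkeeping concrete, the sketch below shows one way a fused per-epoch increment could be rotated into the local-level frame and accumulated. The function name, argument layout, and the sign convention of the rotation are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def accumulate_position(E, N, heading, dP_x, dP_y, d_theta):
    """Hypothetical sketch: accumulate a fused body-frame increment into
    the local-level (East-North) frame.

    E, N        -- current East/North position [m]
    heading     -- current heading (azimuth) about the vertical axis [rad]
    dP_x, dP_y  -- fused transversal/forward position increments in the
                   body-centered frame for this epoch [m]
    d_theta     -- fused heading increment for this epoch [rad]

    The exact axis/sign convention is an assumption for illustration.
    """
    heading_new = heading + d_theta
    # Rotate the body-frame increment into East/North using the updated heading.
    dE = dP_y * np.sin(heading_new) + dP_x * np.cos(heading_new)
    dN = dP_y * np.cos(heading_new) - dP_x * np.sin(heading_new)
    return E + dE, N + dN, heading_new
```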

A. BODY-CENTERED EXTENDED KALMAN FILTER-BASED INTEGRATION SCHEME
The subsystems are integrated through an inertial-based subsystem error model that estimates the INS stochastic drift parameters through regular vision-based subsystem updates. By removing the effect of that drift from the position increment, the drift is corrected before it accumulates. The integration scheme shown in Figure 4 relies on estimating the error in the position increment provided by the inertial system. It is designed to provide a more robust positioning solution than the standalone inertial system. This position increment calculation remains possible even though vision-based DR fails when the number of detected points falls below a certain threshold. A condition is added to avoid integrating the vision-based subsystem for the instances when the number of points in the scene falls below the critical threshold for essential matrix estimation. The threshold is set empirically based on analysis of the vision data. The EKF consists of two main steps: (i) prediction, which is triggered by a time update of the system model, and (ii) correction, which relies on using new sensor measurements to update the prior estimates of the model.
The body-centered extended Kalman filter is a variant of the EKF, serving as a recursive algorithm that estimates the state of a nonlinear system affected by Gaussian noise. It is an enhancement over the traditional EKF as it employs a body-centered coordinate system to represent the orientation of the platform [35]. The body-centered approach simplifies the mathematical model of motion increments in measurement space with respect to changes in position. The error state vector is detailed as follows:

δx = [δP_x, δP_y, δP_z, δv_x, δv_y, δv_z, δθ, δω_z, δa_od]^T

where δP_x is the error in the transversal displacement, δP_y is the error in the forward displacement, δP_z is the error in the upward displacement, δv_x is the error in the transversal velocity, δv_y is the error in the forward velocity, δv_z is the error in the upward velocity, δθ is the error in the vehicle's angular change, δω_z is the stochastic (drift) component of the gyroscope error, and δa_od is the stochastic odometer scale factor error.

1) MEASUREMENT MODEL
The error measurement model for the EKF is represented by Equation 1 [21]:

δz(k+1) = H δx(k+1) + η(k+1)    (1)

where H is the design matrix, η is the measurement noise vector (zero-mean, normally distributed white noise with variable covariance R), and δz(k+1) is the measurement vector at time (k+1). The measurement vector consists of the difference between the incremental 3D-RISS and VO measurements:

δz(k+1) = [ΔP_x^RISS − ΔP_x^VO, ΔP_y^RISS − ΔP_y^VO, ΔP_z^RISS − ΔP_z^VO, Δθ^RISS − Δθ^VO]^T

Also, given that VO measures only the increments in the three position components and the increment in the heading angle out of the error states, the design matrix is given by the following equation:

H = [ 1 0 0 0 0 0 0 0 0
      0 1 0 0 0 0 0 0 0
      0 0 1 0 0 0 0 0 0
      0 0 0 0 0 0 1 0 0 ]

Figure 5 describes the detailed steps of applying the EKF. These are detailed by Equation 5, representing the state vector prediction of the error model, and Equation 6, representing the covariance matrix prediction of the error model.
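Since Equations 5 and 6 themselves appear in Figure 5 rather than in this excerpt, the prediction step they describe is written below in its standard textbook error-state form, with Φ the state transition matrix, G the noise coupling matrix, and Q the process noise covariance (the exact matrices used in the paper are not reproduced here):

\[
\delta\hat{\mathbf{x}}(k+1\,|\,k) = \Phi(k)\,\delta\hat{\mathbf{x}}(k\,|\,k)
\]
\[
P(k+1\,|\,k) = \Phi(k)\,P(k\,|\,k)\,\Phi(k)^{\top} + G(k)\,Q(k)\,G(k)^{\top}
\]

The correction step then applies the Kalman gain \(K(k+1) = P(k+1|k)\,H^{\top}\big(H\,P(k+1|k)\,H^{\top} + R(k+1)\big)^{-1}\) to the measurement vector of Equation 1.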
2) VO AUTO-TUNING
VO auto-tuning is achieved by scaling the covariance matrix of the measurement noise (R). The covariance of the measurement noise R(k+1) is tuned based on the number of features used to estimate the update within the vision-based subsystem at time (k+1). The value changes at each epoch, depending on the number of points tracked in the most recent vision-based update. This dynamic R matrix tuning contrasts with the covariance of the process noise Q, which is tuned based on the Inertial Measurement Unit (IMU) grade and is fixed per system. The tuning of the covariance of the measurement noise is determined as a function of the number of tracked points: the term n_points is the number of points tracked in the most recent vision-based update, and c is a tuning constant for the filter.
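The explicit tuning equation is not reproduced in this excerpt, so the sketch below assumes one plausible form in which the measurement covariance shrinks as more points are tracked (R scaled by c / n_points); the function name, the baseline covariance R0, and this exact scaling law are illustrative assumptions rather than the paper's formula.

```python
import numpy as np

def tune_measurement_covariance(R0, n_points, c, min_points=15):
    """Hypothetical auto-tuning sketch: scale the baseline measurement
    covariance R0 by c / n_points, so the filter trusts the VO update
    more when many features are tracked.

    Returns None when n_points is below the discard threshold, signalling
    that the VO update should be skipped for this epoch (mirroring the
    discard condition described in the paper).
    """
    if n_points < min_points:
        return None  # too few points for a reliable essential-matrix estimate
    return (c / n_points) * R0

# Example usage with a 4x4 baseline covariance (dPx, dPy, dPz, dTheta).
R0 = np.diag([0.05, 0.05, 0.05, 0.01])
R_k = tune_measurement_covariance(R0, n_points=420, c=100.0)
```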

B. INERTIAL NAVIGATION SUBSYSTEM
For a wheel-based platform, such as a passenger car or a wheeled robot, 3D-RISS can be sufficient to provide a complete 3D position solution [36]. 3D-RISS utilizes the measurements from only one gyroscope, with its sensitive axis mounted along the moving platform's vertical direction, and three accelerometers, with their sensitive axes mounted along three mutually orthogonal directions. The pitch (p) and roll (r) angles are computed from the accelerometer measurements, assuming that land vehicles do not perform significant dynamics along the transversal and vertical directions [36], [37]. The computation of the pitch and roll angles is facilitated by incorporating the forward velocity obtained from the wheel encoder odometry. The main advantage of 3D-RISS lies in its capacity to achieve a complete navigation solution with a reduced number of sensors, and thereby a reduced cost. Furthermore, the vehicle's forward velocity is derived from the speedometer or odometer within 3D-RISS, thereby reducing the accumulated error of the dual integration process [38]. The developed subsystem dataflow is detailed through the equations listed below. This developed body-centered 3D-RISS mechanization relies on the same concepts introduced for integration with GNSS in [36], with a change to the frame of reference such that it is more suitable for integration with the vision-based navigation subsystem [14].
In these equations, V_Od is the forward velocity from the latest wheel encoder measurement in m/sec; V_y(k+1) is the forward velocity at time instance (k+1) in m/sec; V_x(k+1) is the transversal velocity at time instance (k+1) in m/sec; V_z(k+1) is the vertical velocity at time instance (k+1) in m/sec; θ is the angular change from the previous epoch of the system in radians (rad); ω_z is the reading of the vertical gyroscope in rad/sec; δω_z is an estimate of the deterministic noise component (bias offset) in the gyroscope reading, calculated when a stop is detected, in rad/sec; ω_e sin ϕ is the stationary vertical component of the Earth's rotation, where ω_e is the Earth's rotation rate about its spin axis (15 deg/hr); and ϕ is the latitude of the moving platform.
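Because the mechanization equations themselves are not reproduced in this excerpt, the following sketch illustrates how the listed quantities could combine into per-epoch body-frame increments, loosely following the standard 3D-RISS relationships from [36]; the pitch computation, the non-holonomic assumption, and the sign conventions are assumptions for illustration rather than the paper's exact body-centered formulation.

```python
import numpy as np

G = 9.80665            # gravity [m/s^2]
OMEGA_E = 7.292115e-5  # Earth rotation rate [rad/s] (~15 deg/hr)

def riss_body_increment(v_od, f_y, a_od, omega_z, d_omega_z, lat, dt):
    """Illustrative body-centered 3D-RISS epoch increment (a sketch under
    assumptions, not the paper's exact mechanization).

    v_od      -- odometer-derived forward speed [m/s]
    f_y       -- forward accelerometer specific force [m/s^2]
    a_od      -- odometer-derived forward acceleration [m/s^2]
    omega_z   -- vertical gyroscope reading [rad/s]
    d_omega_z -- estimated gyroscope bias (from stop detection) [rad/s]
    lat       -- latitude [rad]
    dt        -- epoch duration [s]
    """
    # Pitch from the forward accelerometer (standard 3D-RISS assumption).
    pitch = np.arcsin(np.clip((f_y - a_od) / G, -1.0, 1.0))

    # Heading change: bias-corrected gyro minus the Earth-rotation component.
    d_theta = (omega_z - d_omega_z - OMEGA_E * np.sin(lat)) * dt

    # Forward/vertical velocities and body-frame position increments.
    v_y = v_od * np.cos(pitch)
    v_z = v_od * np.sin(pitch)
    dP_y = v_y * dt
    dP_z = v_z * dt
    dP_x = 0.0  # non-holonomic assumption: no transversal slip
    return dP_x, dP_y, dP_z, d_theta
```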

C. VISION-BASED NAVIGATION SUBSYSTEM
The vision-based navigation subsystem relies on features visible in the surrounding environment to estimate the ego-motion between two consecutive image frames (epochs).
The developed algorithm for vision-based ego-motion estimation is shown in Figure 6 and is implemented through the following steps:

1) FEATURE TRACKING
is performed using optical flow applied in a pyramidal scheme (Lucas-Kanade algorithm) [39]. This method is employed to track the displacement of the features that have been recently extracted or tracked between consecutive camera image frames. By analyzing the motion of these features, the system can estimate the ego-motion of the platform.

2) FEATURE POINT DETECTION
involves a feature extractor that runs periodically to detect new features appearing in the camera's field of view, based on the Shi-Tomasi method [40], which extracts features (corners) in the scene that are favorable to track. This detector runs every few frames or when the number of tracked features drops below a certain threshold.

3) ESTIMATION OF THE ESSENTIAL MATRIX
is performed to provide the platform's motion in the world frame [25]. The decomposition of the essential matrix into rotation and translation parameters allows for the transformation of the motion into metric measurements. In the case of monocular cameras, the odometer provides the scale of the translation parameter. This estimation relies on Nistér's five-point algorithm [41], applied in an iterative manner using a random sample consensus (RANSAC) scheme. The determination of the discarding threshold (red line in the results) in the vision-based solution was influenced by the estimation of the essential matrix, specifically by considering it as a multiple of five points. This choice was made to facilitate outlier rejection. Conversely, the threshold for assessing the degradation (orange line in the results) of a vision-based motion estimate was informed by a statistical analysis of the detected point count in an average scene across all studied environments.
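As a concrete reference for how such a pipeline can be assembled, the sketch below uses OpenCV's Shi-Tomasi detector, pyramidal Lucas-Kanade tracker, and RANSAC-based five-point essential-matrix estimation; the parameter values, the grayscale-input assumption, and the odometer-based scaling step are illustrative choices rather than the paper's exact configuration.

```python
import cv2
import numpy as np

def vo_step(prev_gray, curr_gray, prev_pts, K, odometer_distance):
    """One illustrative VO epoch: track features, estimate the essential
    matrix with RANSAC, recover rotation/translation, and scale the unit
    translation with the odometer-reported distance."""
    # 1) Pyramidal Lucas-Kanade optical flow tracking of existing features.
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None, winSize=(21, 21), maxLevel=3)
    good_prev = prev_pts[status.flatten() == 1]
    good_curr = curr_pts[status.flatten() == 1]

    # 2) Re-detect Shi-Tomasi corners if too few features remain tracked.
    if len(good_curr) < 100:
        new_pts = cv2.goodFeaturesToTrack(
            curr_gray, maxCorners=1000, qualityLevel=0.01, minDistance=10)
        # (In practice, the new detections would be merged with tracked points.)

    # 3) Five-point essential matrix with RANSAC, then pose recovery.
    if len(good_curr) < 15:
        return None  # discard: not enough points for reliable estimation
    E, mask = cv2.findEssentialMat(
        good_prev, good_curr, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, good_prev, good_curr, K, mask=mask)

    # Monocular scale ambiguity: scale the unit translation by the odometer.
    t_metric = t * odometer_distance
    return R, t_metric, good_curr.reshape(-1, 1, 2)
```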

IV. FIELD TESTS EXPERIMENTAL SETUP
A. INDOOR UGV TRAJECTORIES
A Husky A200 UGV [42], shown in Figure 7, is used for all test trajectories. It is a medium-sized, 50 kg robotic development platform from Clearpath Robotics. The UGV dimensions are 99 cm in length, 67 cm in width, and 39 cm in height [42]. Data collection was achieved using a laptop connected through a USB cable to the micro-controller unit (MCU) of the robot to acquire all the sensors in the setup. The original Husky A200 UGV setup was modified as part of this project by adding a custom mount to hold the additional remote sensing sensors shown in Figure 7, partially exploiting the maximum 75 kg payload. The UGV travels at a maximum speed of 1 m/s (3.6 km/h). Synchronization of the different sensors was facilitated using an Ubuntu 16.04 Linux environment running the open-source ROS packages for each sensor. All the sensor measurements were time-stamped by the Linux operating system.

1) INERTIAL MEASUREMENT UNIT (IMU)
A low-cost MEMS-grade IMU (CHR-UM6) [43] was used to measure the accelerations and angular velocities at a rate of 20 Hz. The Husky setup comes equipped with this IMU mounted on one of the inner walls. A USB connector is used to power the IMU and collect its readings simultaneously. Some other IMUs were tested, but the rigid mount and system compatibility made replacing the IMU inefficient. Of note is the operating temperature range between −40 °C and +85 °C, which is suitable for most environments. However, the sensitivity change due to temperature may result in challenges in environments where the temperature fluctuates significantly, a consideration that is typically not an issue for standard indoor (building) environments.

2) CAMERA
A ZED stereo camera [44] was used to capture the right camera frames at 15 frames per second (fps). High-definition (HD) quality (720p) was used as a trade-off between image size and quality. The ZED camera hardware specifications are listed below.

3) HUSKY PLATFORM BASE
The Husky A200 UGV used for this research can be manually teleoperated by installing its designated ROS package on the control computer and connecting the wireless joystick's USB receiver.

4) WHEEL ENCODER
The Husky comes equipped with wheel encoders that operate at a frequency of 10 Hz. These encoders are linked to the MCU of the Husky, which sends and receives control signals via a USB connection. This connection provides updates on the movement of the Husky's wheels at a rate of over 200,000 pulses per minute.

5) DATA ACQUISITION COMPUTER
An Alienware m15 laptop was used for controlling the Husky robotic platform, equipped with one of Nvidia's RTX GPUs: an 8 GB GeForce RTX 2070. The operating system installed on the control computer was Ubuntu 16.04 LTS, and ROS Kinetic Kame, the tenth ROS distribution release (released on May 23rd, 2016), was set up to run the sensor-specific ROS packages. Kinetic supported all the sensors in the setup, most importantly the ZED camera [44]. ROS utilization enabled the time synchronization of the sensors, limiting errors due to the fusion of multi-sensor subsystems. Furthermore, ROS enabled data saving and processing using the bagfile format, which enabled repeating the calculations with different parameters on the same data.
Each sensor operates at a different rate, so the readings closest in time were used. The difference in rate was compensated for with the help of timestamps, by calculating the time difference between every two consecutive updates from a subsystem.
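A minimal sketch of this nearest-timestamp matching is shown below; the function name and data layout (parallel lists of timestamps and readings, as might be replayed from a ROS bagfile) are assumptions for illustration.

```python
import bisect

def nearest_reading(timestamps, readings, query_time):
    """Return the reading whose timestamp is closest to query_time.
    `timestamps` must be sorted ascending; `readings` is aligned with it."""
    i = bisect.bisect_left(timestamps, query_time)
    if i == 0:
        return readings[0]
    if i == len(timestamps):
        return readings[-1]
    before, after = timestamps[i - 1], timestamps[i]
    return readings[i] if (after - query_time) < (query_time - before) else readings[i - 1]

# Example: pick the IMU sample nearest to a camera frame's timestamp,
# and use the gap between consecutive updates as the integration interval dt.
imu_t = [0.00, 0.05, 0.10, 0.15]
imu_x = ["s0", "s1", "s2", "s3"]
sample = nearest_reading(imu_t, imu_x, query_time=0.12)  # -> "s2"
dt = imu_t[2] - imu_t[1]
```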

6) JOYSTICK
The Husky setup was accompanied by a Logitech F710 wireless gamepad. The joystick was used to manually move the robot during the field tests, and its ROS driver was installed to enable its use within the software environment on the control computer.

B. ROAD TRAJECTORY EXPERIMENTAL SETUP
In this section, the experimental framework utilized for the parking garage road trajectory is explained. In order to test the proposed methodology, an indoor road trajectory was conducted in the parking garage of the Yorkdale Shopping Centre in Toronto, Ontario, Canada, using the testing minivan shown in Figure 8. The sensor setup is split into two parts: an exterior testbed mounted on the vehicle and an internal testbed installed in the trunk. The exterior testbed contains the ZED camera (described in subsection IV-A2), which was employed for this scenario. The interior testbed is positioned at the rear of the minivan using a customized mount to ensure the sensors' stability and centralization. Figure 9 displays the interior testbed, which comprises three primary units: a reference GNSS NovAtel-SPAN ProPak6, a tactical-grade KVH 1750 IMU, and a low-cost MEMS-grade VTI IMU. The ProPak6 and KVH units were utilized to generate the reference solution using the NovAtel SPAN unit, while the VTI IMU was employed in the RISS mechanization.

V. RESULTS AND DISCUSSION
Three different indoor trajectories, two of which were conducted using the Husky UGV and one using the minivan setup described in Section IV-B, were used to assess the performance of the developed methodology and platform setup. Each trajectory is designed to pose various challenges to positioning an autonomous UGV in a GNSS-denied environment; such challenges affect vision-based and inertial-based solutions differently. The developed sensor fusion mechanism and its contribution to solution stability are exemplified herein. Results from each of the trajectories are presented. For the indoor trajectories conducted inside the Royal Military College of Canada (RMCC), loop closure errors are calculated due to the absence of a reliable reference system. The loop closure error, i.e., the distance between the final estimated position and the known starting point of a closed loop, is a common error measure used in vision-based positioning, mapping, and surveying [14]. On the other hand, in the road trajectory, a high-end commercial receiver system by NovAtel was utilized to provide the reference solution; thus, the RMSE metric is used to assess the performance of our developed solution.
A. TRAJECTORY 1
Trajectory 1 consists of two loops within the fifth floor of the Sawyer Building at RMCC. The trajectory incorporated a smaller loop that involves entering the eastern door of the robotics lab and exiting from the southern entrance, making an inner loop inside the outer one, as shown in Figure 10. The duration of this trajectory was around seven and a half minutes. This trajectory includes some challenging scenarios, for instance, few to no features inhibiting the vision-based solution and some sharp turns where the robot wheels suffer from slipping. The trajectory was conducted with an average forward velocity of 0.498 m/s (1.8 km/hr). Figures 10 and 11 demonstrate the performance of the proposed method for trajectory 1, where it can be seen that the RISS/VO solution (green) was able to keep the vehicle position within the corridor boundaries with minimal drift, with the exception of the final portion of the trajectory, where refraction caused by features seen through glass from inside a room resulted in a loop closure error of 3 m. It can also be seen that the standalone solutions based on 3D-RISS (blue) and VO (red) both experienced position drift that placed the positioning solution off the actual trajectory and outside the corridor boundaries. For instance, the standalone 3D-RISS positioning solution drifts to a 12 m loop closure error, while the standalone VO solution resulted in a 30 m loop closure error. In this trajectory, the VO standalone solution suffered from several degraded vision areas (highlighted in yellow in Figure 10). In Figure 11, it can be observed that the vision-based solution is discarded if the feature count drops below fifteen, as the five-point algorithm would not have enough redundancy to detect outliers in such cases. Also, a dampening factor was applied to the KF corrections of the body-centered RISS solution when the feature count drops below one hundred, representing ten percent of the thousand-feature target.

B. TRAJECTORY 2
This trajectory lasted around 9 minutes and incorporated repeated turns and an extended runtime to show the effect of drift on the inertial subsystem, along with challenging visual maneuvers and increased areas of degradation of the vision-based solution. This trajectory highlights the strength of the integrated navigation scheme and its ability to improve the solution significantly, as is evident in Figures 12 and 13. The trajectory was conducted with an average forward velocity of 0.42 m/s (1.5 km/hr); using the proposed method, a loop closure error of 1 m was observed. This trajectory highlights the strength of the proposed method, as the standalone 3D-RISS solution's loop closure error was 30 m and the standalone VO solution resulted in a 40 m loop closure error.

C. YORKDALE PARKING GARAGE TRAJECTORY
To comprehensively assess the proposed method, especially given the absence of a reference solution for the UGV trajectories, a trajectory involving a reference solution was conducted. The trajectory shown in Figure 14 covers a parking garage scenario of 90 seconds, representing the longest duration for which our reference solution was maintained.
The VO point density of this trajectory, shown in Figure 15, exhibits a different behavior in the VO point detection, especially in stopping scenarios: the detected point count never reaches the 1000 points typically detected while the UGV is standing still, due to the dark and featureless nature of the parking garage. For this trajectory, Figure 16 shows the root-mean-square error (RMSE) behavior and how our integrated solution offers a competitive edge over a costly high-end solution using two low-cost subsystems and a computationally efficient algorithm. The figure indicates the significant effect of our proposed method: the error persists with either standalone method, even with a body-centered implementation, whereas the integrated solution is capable of maintaining an error range within the confidence bounds of the NovAtel high-end reference.

VI. CONCLUSION
In conclusion, the method developed in this paper represents a significant advancement in the field of navigation and positioning. Notably, this study achieved a unique milestone by conducting both the RISS mechanization and the RISS/VO integration in the body frame. This approach yielded impressive results, with the competitive positioning solution exhibiting minimal drift, often less than one meter, in the majority of GNSS-denied field tests spanning up to ten minutes. One of the key strengths of our proposed method lies in its ability to harness the strengths of both vision-based and inertial-based systems. In particular, integrating vision-based information allowed for a more reliable navigation solution. It facilitated the correction of gyroscope bias drift and the resetting of RISS position and velocity errors through sensor fusion updates, enhancing overall accuracy.
During indoor testing scenarios, the vision-based navigation solution seamlessly collaborated with the RISS solution, limiting drift only when sufficient confidence in the vision-based solution was detected. Moreover, during periods of degraded vision environments, the RISS solution served as a robust bridge, ensuring continuous and reliable positioning and navigation. This integrated system effectively exploits the complementary characteristics of both vision-based and inertial-based solutions.
Looking forward, there is considerable potential for further improvements in positioning accuracy. We envision that incorporating machine learning techniques for dynamic tuning holds promise for enhancing the results achieved in this study. Such techniques can optimize system performance in real time, adapting to various environmental conditions and reducing positioning errors.
As part of our future work, we are committed to exploring the integration of machine learning into our approach, aiming to unlock even greater levels of precision and robustness instead of relying on manual trust thresholds within the VO solution. Specifically, our next study involves incorporating a reinforcement learning policy to learn the tuning between the vision-based and inertial-based estimations. The main constraint we are currently working to overcome is the amount of data needed to train a reliable policy.

FIGURE 1. Illustration of the two camera positions and the epipolar plane, formed by the light blue triangle, intersecting the two image planes along the epipolar line.

FIGURE 2. Integrated system architecture developed for the position and attitude estimation of a wheel-based vehicular system operating in a GNSS-denied environment.

FIGURE 3. Position change represented in the body frame. The position at instance 2 is measured with respect to the position at instance 1, and the new axes are introduced to measure the next iteration with respect to them.

FIGURE 4. Loosely-coupled open-loop Kalman filter integration scheme developed for the ego-motion estimation of the moving platform using the vision-based and inertial-based subsystems with a dynamic system error model.

FIGURE 5. Flow of the Kalman filter steps for prediction and correction.

FIGURE 6. Vision-based navigation subsystem dataflow, where the dashed lines indicate conditional steps.

FIGURE 7. Setup used in the experimental field tests showing the Husky robotic vehicle and its sensor payload.

FIGURE 8. The ZED stereo camera mounted on top of the minivan for the road trajectory.

FIGURE 9. Interior testbed installed inside the minivan's trunk showing the equipment used.

FIGURE 10. Trajectory 1: The integrated RISS/VO solution (green) is demonstrated compared to the standalone RISS (blue) and VO (red) solutions. Also, points where VO is degraded/discarded are highlighted.

FIGURE 11. Points detected and tracked in the vision-based subsystem for trajectory 1. The red line represents the threshold at which the vision-based solution is considered discarded, and the orange line the threshold at which the visual environment is considered degraded.

FIGURE 12. Trajectory 2: The integrated RISS/VO solution (green) is demonstrated compared to the standalone RISS (blue) and VO (red) solutions. Also, points where VO is degraded/discarded are highlighted.

FIGURE 13. Points detected and tracked in the vision-based subsystem for trajectory 2. The red line represents the threshold at which the vision-based solution is considered discarded, and the orange line the threshold at which the visual environment is considered degraded.

FIGURE 14. Parking garage trajectory: The integrated RISS/VO solution (green) is demonstrated compared to the standalone RISS (blue) and VO (red) solutions, while the NovAtel reference is shown in purple. Also, points where VO is degraded/discarded are highlighted.

FIGURE 15. Points detected and tracked in the vision-based subsystem for the parking garage trajectory. The red line represents the threshold at which the vision-based solution is considered discarded, and the orange line the threshold at which the visual environment is considered degraded.