Processing math: 0%
LiDAR-IMU Tightly-Coupled SLAM Method Based on IEKF and Loop Closure Detection | IEEE Journals & Magazine | IEEE Xplore

LiDAR-IMU Tightly-Coupled SLAM Method Based on IEKF and Loop Closure Detection


Abstract:

Simultaneous Localization and Mapping (SLAM) technology based on LiDAR can achieve real-time robot positioning and establish environmental maps in unknown environments. L...Show More
Topic: Forest observation and parameter retrieval from remote sensing data

Abstract:

Simultaneous Localization and Mapping (SLAM) technology based on LiDAR can achieve real-time robot positioning and establish environmental maps in unknown environments. LiDAR odometry can achieve accurate pose estimation in short distances or small-scale environments, but the accuracy will decrease with the accumulation of errors. At the same time, under scenes with insufffcient structural features, point cloud-based LiDAR SLAM will show degradation phenomena, leading to failures in localization and mapping. We propose a loop closure detection method based on fusion of Inertial Measurement Unit (IMU) and LiDAR information to reduce system cumulative errors and solve environment degradation problems. Firstly, by fusing LiDAR key feature and IMU information, an odometry calculation method is proposed based on Iterative Extended Kalman Filtering (IEKF) to improve the accuracy of FAST-LIO initial pose. Then in loop closure detection, by considering geometric and intensity information simultaneously, a new scan context global descriptor is constructed from dedistortion feature points of front-end IMU to enhance the accuracy of loop closure detection for original point cloud descriptors. Finally, GTSAM is used for global optimization and GPS constraint is introduced to reduce trajectory drifts, obtaining globally consistent trajectories and mapping. Compared with existing LiDAR SLAM on KITTI dataset and self-collected dataset, proposed method has smaller trajectory errors on KITTI sequences, which reduced by 15% than baseline FAST-LIO2, and average time consumption of loop closure detection is reduced by about 50% than SC-ALOAM, and mapping drifts is reduced, which enhanced mapping accuracy and robustness while ensuring global consistency of constructed maps.
Topic: Forest observation and parameter retrieval from remote sensing data
Page(s): 6986 - 7001
Date of Publication: 09 February 2024

ISSN Information:

Funding Agency:


SECTION I.

Introduction

Simultaneous localization and mapping (SLAM) is a crucial technology for unmanned autonomous operations, enabling effective environment perception and real-time localization and mapping of mobile robots. SLAM methods can be categorized as LiDAR SLAM and visual SLAM based on the sensors used [1]. Visual SLAM, relying on cameras, faces uncertainties in outdoor environments due to light and environmental changes, impacting its accuracy and robustness [2]. In contrast, LiDAR SLAM offers significant advantages in measurement accuracy, range, and resistance to environmental interference [3]. Consequently, LiDAR SLAM excels in mapping and localization, finding widespread applications indoors and outdoors.

With 360^{\circ } horizontal field of view, accurate distance measurement, insensitivity to environmental lighting and optical textures, as well as effective perception in dark environments, 3-D LiDAR finds wide application in diverse scenarios [4]. However, LiDAR-based methods encounter degradation in scenarios like long corridors, tunnels, and open roads [5]. This degradation leads to significant errors in LiDAR SLAM state estimation and mapping overlapping and intersecting. At the same time, in large-scale or complex environments, there are high requirements for data processing, real-time performance, and stability of the system, while a single sensor may not provide enough information to ensure high-precision SLAM. In view of the above problems, this article studies the laser SLAM scheme based on multisensor fusion. Inertial measurement unit (IMU) data and laser mileage count are fused in a tightly coupled way to improve the accuracy of positioning map construction. The research shows that the fusion of these two sensor data can improve the accuracy and robustness of vehicle positioning and navigation. However, due to the differences between 3-D LiDAR and IMU in frequency, precision and coordinate system, it is difficult to learn from each other, calibrate, and synchronize multisensor parameters.

On the one hand, loop closure detection presents a challenging issue in LiDAR SLAM as it prevents significant deviations in state estimation over time [6], which crucial for accurate and effective back-end mapping. LiDAR-based loop closure detection methods are typically classified into local and global descriptors [3], [7]. Global descriptors, proposed to address the instability of matching local key points, calculate similarity between the current frame and others to detect loop closures [8]. Dube et al. [9] proposed SegMatch, a segment-based scene recognition method, as a loop closure detection module for global optimization of vehicle pose [10], [11]. However, due to additional point cloud processing, it increases the computational cost of loop closure detection.

On the other hand, incorporating back-end nonlinear global optimization, such as GTSAM [12] and Levenberg–Marquardt (LM), trajectory is optimized to produce high-precision map on a larger scale. The main methods for back-end global optimization include filter-base and graph-based approaches. Due to computational complexity and limited robustness of former, graph-based optimization dominates the field and with the following four categories:

  1. least squares;

  2. random gradient descent;

  3. relaxation iteration;

  4. manifold iteration, which can conveniently construct and solve large-scale factor graph problems.

Aiming at inaccuracy of positioning in existing LiDAR SLAM [3], [10], [13], especially in large-scale environments, the data information of multiple sensors is huge and complex, requiring a large amount of computing resources and effective data processing schemes, and for long-term system operation, it may lead to sensor drift, cumulative errors and other problems, in this article, by introducing loop closure detection and back-end optimization, FAST-LIO2 [14] is improved to reduce the cumulative errors and improve the accuracy of localization and mapping. The main contributions are as follows.

  1. A tightly coupled localization method is proposed based on LiDAR key features and IMU. Curvature-based feature extraction scheme is proposed to enhance the spatial constraints of planar feature points. The initial pose estimation of LiDAR-Inertial Odometry is then obtained by an IEKF to improve the accuracy of the localization.

  2. A loop closure detection method based on height-intensity scan context (HISC) global descriptor is proposed. By integrating geometric and intensity information, HISC global descriptor is designed and an adaptive distance threshold is introduced to correct the accumulated errors over time in LiDAR-IMU system, improve the trajectory and mapping accuracy of large-scale environment.

  3. A feature matching is proposed for calculating the pose transformation between pairs of loop-point clouds, to reduce the computational cost of pose estimation. In addition, we introduce a trajectory global optimization method based on GTSAM, which constructs a factor graph using iSAM2 and incorporates GPS factors to impose absolute position constraints, thereby reducing mapping trajectory drift.

The rest of this article is organized as follows. In Section II, we discuss current research works of LiDAR SLAM methods. Section III describes the details of the proposed method, including LiDAR inertial odometry, loop closure detection, global optimization and mapping. The experiments are presented in Section IV. And the experimental results are discussed in Section V. Finally, Section VI concludes this article.

SECTION II.

Related Work

At present, the research of LiDAR SLAM has been very extensive. In this section, we focus on analyzing the research status of LiDAR SLAM, loop closure detection, and tightly coupled of LiDAR and IMU sensors.

A. LiDAR SLAM

The development of SLAM can be categorized into three stages: early (1986–2010), middle (2010–2014), and modern (2014–present). In the early stage, methods based on Kalman filter (KF) held prominence, followed by the emergence of methods based on extended Kalman filter (EKF) and particle filter, among others. In 2010, the introduction of Karto SLAM [15] marked the arrival of optimization-based SLAM, which demonstrated better performance compared to filter-based methods. In 2014, Zhang Ji [5] proposed Lidar Odometry and Mapping in Real-time (LOAM), which marked the basic maturity of LiDAR SLAM, and separated the localization and mapping into two algorithms. One performs high-frequency odometry with low accuracy (localization), while the other operates at a lower frequency to perform point cloud matching and registration (mapping and odometry correction). By combining these, a high-precision and real-time LiDAR odometry system is achieved. However, it also has certain limitations, such as reduced optimization efficiency in feature-rich environments due to the lack of a closed-loop detection function [16]. As a result, enhancing the robustness of SLAM has become a new research focus, leading to continuous improvements in LOAM's performance.

Based on Ceres Solver [17], Wang et al. [1] proposed Fast LiDAR Odometry and Mapping (F-LOAM), which improves the frame matching accuracy of LOAM by removing scan-to-scan pose estimation, retaining only scan-to-map pose optimization, and adopting a noniterative two-level distortion compensation method to reduce computational complexity and cost. According to the official evaluation criteria of KITTI dataset [18], although F-LOAM achieved the best accuracy in pose estimation, it lacks loop closure detection, leading to accumulated errors and reducing effectiveness in large-scale scenarios [3]. Shan [19] proposed Light-weight and Ground-Optimized LiDAR Odometry and Mapping (LeGO-LOAM), which incorporated loop closure detection and lightweight and ground optimization on feature extraction, by combining ICP and Euclidean distance to identify loop closure points. Compared to other methods, it achieved similar or better accuracy. However, the loop closure detection in LeGO-LOAM is unstable and may occasionally result in detection errors or missed identifications [13].

B. Optimization Methods for LiDAR SLAM

Many researchers have attempted to enhance the performance of SLAM systems by incorporating additional modules. Among them, loop closure detection is a critical module in SLAM. Over time, LiDAR odometry accumulates errors, leading to drift problems in long-term navigation and mapping. To address this issue in large-scale scenarios, integrating a loop closure detection module in the back-end can correct odometry drift [1]. Kim et al. [8] proposed a method using scan context global descriptor to reduce the dimensionality of point cloud data frames, which are stored in a 2-D matrix, the rows represent the distance of the divided ground regions (i.e., bins) from the LiDAR center, and the columns represent the angle of these regions relative to the x-direction. By using the global descriptor to calculate the similarity between current frame and others, loop closure detection results can be obtained (e.g., [13] and [20]).

However, existing 3-D loop closure detection methods often use local or global geometric descriptors while neglecting the intensity information in point clouds. To address this limitation, a global descriptor called intensity scan context (ISC) was proposed based on F-LOAM, which incorporates both geometry and intensity information [21]. Furthermore, to enhance the efficiency of loop closure detection, a two-stage hierarchical rerecognition method was introduced. Optimized-sc-f-loam [3] is also a loop closure detection method based on F-LOAM. Unlike [21], it employs a feature-point matching method instead of using the original LiDAR point cloud to calculate the pose transformation between the current frame and the closed-loop frame, resulting in reduced computation time. In the back-end, global optimization is based on GTSAM, and an adaptive distance threshold is utilized for more precise loop closure detection.

Integrating an IMU is another approach to enhance the performance of LiDAR SLAM system. The incorporation of IMU can significantly improve the accuracy and robustness of the LiDAR odometry and compensate motion distortions in LiDAR scans. With the rapid development of robotics technology, localization and mapping techniques are being applied to increasingly complex and dynamic scenarios [22]. The combination of LiDAR and IMU offers advantages such as high accuracy, fast speed, and immunity to environmental lighting conditions, which makes it widely employed [23].

At present, LiDAR and IMU fusion can be categorized into the loosely-coupled and the tightly-coupled. The former deal with two sensor information separately to infer their motion constraints which are fused later (e.g., [5], [19], [24], and [25]). The authors in [26] proposed combining IMU measurements with attitude estimates, which obtained from a LiDAR-based Gaussian particle filter and a prebuilt map. Generally speaking, the loosely-coupled fusion method is computationally effective and has good real-time performance. However, the motion constraints between LiDAR and IMU may lead to information loss [4], making it difficult to ensure accuracy in high-speed motion or degraded scenes.

Another approach is the tightly-coupled method that directly fuses LiDAR information and IMU data by joint optimization, which can be further divided into optimization-based [27], [28] and extended KF-based approaches [4], [29]. The authors in [30] proposed a graph optimization based tightly-coupled approach, which combined prior information from the LiDAR-IMU odometry and optimization method based on rotational constraints to further refine the odometry pose, it can obtain a globally consistent and robust mapping trajectory. However, constraints and batch optimization in constructing local map windows are time consuming and have poor real-time performance. Shan et al. proposed LIO-SAM [28], which is based on incremental smoothing [31] that introduced a global factor graph consisting of LiDAR odometry factors, IMU preintegration factors, GPS factors, and loop closure factors, and it achieved high-precision global consistency in mapping and motion estimation for the robot. The authors in [29] proposed a LiDAR-IMU fusion odometry framework with high computational efficiency and good robustness, which adopted the iterative extended Kalman Filter similar to [4], and used forward propagation to predict the state as well as backward propagation to correct motion distortion in LiDAR scanning, and proposed a new Kalman gain calculation formula to reduce the computational complexity. But the system lacked back-end optimization and only worked in small environment. The authors in [14] inherited FAST-LIO and introduced a new data structure called the incremental k-d tree (ikd-tree) [32], which supports incremental update (e.g., point insertion and deletion) as well as dynamic rebalancing, significantly reducing the amount of computation. This approach improves the accuracy and robustness of odometry and map generation. However, similar to [29], this method is also limited to small-scale environments.

In summary, 3-D LiDAR faces challenges in long-range environments with sparse structural features, leading to localization and mapping failures and inadequate construction of surrounding environment maps. The tightly-coupled of LiDAR and IMU incurs high computational costs, significant time consumption, and large memory usage, which making it difficult to ensure real-time and accurate operation of the system. Therefore, improving the precise localization and mapping of 3D point cloud maps in large-scale environments, both indoors and outdoors, is of great significance for applications such as mobile robotics and autonomous driving.

SECTION III.

Methodology

The proposed LiDAR SLAM method consists of three parts: 1) LiDAR-IMU odometry; 2) loop closure detection; and 3) global optimization and mapping, as in Fig. 1, to enable real-time six-DOF state estimation of the robot and establish a globally consistent map. To address the asynchronization issue between the LiDAR and IMU, the LiDAR point cloud data is first preprocessed. Considering challenges as the large quantity of raw LiDAR points, slow data processing speed, and computational complexity, local neighbor that based feature extraction is given to compute curvature and extracted feature points, including planar points and edge points. Next, the motion model provided by the IMU is employed to predict the state of feature points, which is fused and optimized using iterative Kalman Filter to estimate the robot's pose and position during motion. Simultaneously, the IMU backward propagation is employed to mitigate the motion distortion of the LiDAR, to obtain more accurate and robust localization results. Subsequently, a loop closure detection method based on the HISC is proposed to reduce the localization drift caused by cumulative errors in the LiDAR-IMU odometry. Then, an adaptive distance threshold is introduced to reduce the probability of wrong loop closure detection and the similarity between the current frame and historical frames is computed based on proposed global descriptor, which is constructed by the distortion-corrected feature points. Finally, by adding odometry factors, loop closure factors, and GPS factors to the iSAM2 factor graph, a global optimization-based GTSAM is presented to correct the accumulated drift errors and generate globally consistent trajectories and maps.

Fig. 1. - System framework overview.
Fig. 1.

System framework overview.

A. LiDAR-Inertial Odometry

1) Feature Extraction

Due to the large number of points contained in a single scan of LiDAR, traditional point cloud matching methods, such as ICP and NDT, have low computational efficiency. Therefore, it is necessary to process the raw point cloud data \mathcal {P}_{k}. In practical applications, feature-based matching methods have shown stronger robustness and efficiency compared to raw point cloud matching methods. To improve the speed, matching accuracy, and efficiency of the algorithm, this article extracts plane feature points and edge feature points [5] from the raw point cloud while discarding noisy or less significant points. As the 3-D mechanical rotating LiDAR scan produces sparser points in the vertical direction and denser point clouds in the horizontal direction, the algorithm focuses on points on the plane for each scan and calculates the local plane curvature \sigma as follows: \begin{align*} \sigma _{k}^{(m,n)}=\frac{1}{|S_{k}^{(m,n)}|}\sum_{\text{p}_{k}^{(m,j)}\in S_{k}^{(m,n)}} {||\text{p}_{k}^{(m,j)}-\text{p}_{k}^{(m,n)}||} \tag{1} \end{align*} View SourceRight-click on figure for MathML and additional features.where k\in {{\text{Z}}^{+}} is the scan number; {{\mathcal {P}}_{k}} is the point cloud obtained from the kth scan, each point is \text{p}_{k}^{(m,n)},m\in [1,M],n\in [1,N]; S_{k}^{(m,n)} is the local point cloud set formed by the neighboring points of \text{p}_{k}^{(m,n)} in the horizontal direction (along the clockwise and anticlockwise, respectively, choose five points as a local adjacent point cloud); |S_{k}^{(m,n)}| is the number of point clouds in the local point cloud set. Compared to the local search method, S_{k}^{(m,n)} can be obtained more quickly according to the index number n of the points, reducing the computational cost. For planar points, such as feature points on walls, they have smaller smoothness values. On the other hand, edge points have larger smoothness values. Therefore, for a given scan point cloud {{\mathcal {P}}_{k}}, point with larger \sigma value is edge point (\sigma >{{\sigma }_{e}}), while point with smaller \sigma value is planar point (\sigma < {{\sigma }_{s}}). {{\mathcal {E}}_{k}} is the set of edge points and {{\mathcal {S}}_{k}} is the set of planar points. {{\mathbb {F}}_{k}}=\lbrace {{\mathcal {E}}_{k}},{{\mathcal {S}}_{k}}\rbrace is the set of feature points. The planar and edge points extracted using this method on the KITTI dataset and self-collected dataset are shown in Fig. 2.

Fig. 2. - Extract feature points (yellow indicates edge points, green indicates planar points). (a) KITTI dataset. (b) Self-collected dataset.
Fig. 2.

Extract feature points (yellow indicates edge points, green indicates planar points). (a) KITTI dataset. (b) Self-collected dataset.

2) Tightly-Coupled LiDAR-IMU With IEKF

To address the issue of localization and mapping failures in scenes with insufficient structural features, this article inherits the FAST-LIO2 [14] by integrating IMU and LiDAR with IEKF. This approach aims to handle degraded and weak-textured scenes and improve system stability. It mainly includes forward propagation based on IMU measurements and iterative updating based on LiDAR scanning.

1) Forward Propagation Based on IMU

State definitions as follows: \begin{align*} x={{[{{p}^{T}},{{v}^{T}},{{R}^{T}},b_{\omega }^{T},b_{a}^{T},{{g}^{T}}]}^{T}} \tag{2} \end{align*} View SourceRight-click on figure for MathML and additional features.where p, R are the position and attitude of IMU, v is the velocity, g is the gravity vector, and {{b}_{\omega }} and {{b}_{a}} are the IMU bias.

Assume the optimal state estimate after fusing the last (i.e., k-1th) LiDAR scan is {{\bar{x}}_{k-1}} with covariance matrix {\bar{\text{P}}}_{k-1}. The forward propagation is performed upon the arrival of an IMU measurement. The state and covariance are propagated as follows: \begin{align*} {{\hat{x}}_{i+1}}&={{\hat{x}}_{i}}\boxplus \left(\Delta t\text{f}\left({{{\hat{x}}}_{i}},{{\mu }_{i}},0 \right) \right);{{\hat{x}}_{0}}={{\bar{x}}_{k-1}} \tag{3} \\ {{{\hat{\text{P}}}}_{i+1}}&={{{\text{F}}}_{{{{\tilde{x}}}_{i}}}}{{{\hat{\text{P}}}}_{i}}{{\text{F}}_{{{{\tilde{x}}}_{i}}}}^{T}+{{\text{F}}_{{{w}_{i}}}}{{Q}_{i}}{{\text{F}}_{{{w}_{i}}}}^{T};{{{\hat{\text{P}}}}_{0}}={{{\bar{\text{P}}}}_{k-1}} \tag{4} \end{align*} View SourceRight-click on figure for MathML and additional features.where i represent the index of IMU data, {{x}_{i}}, {{\hat{x}}_{i}}, and {{\bar{x}}_{i}} represent the ground-true, propagated, and updated value of {{x}_{i}}, respectively. Based on the definition of \boxplus and \boxminus in [29], the continuous state transition equation {{x}_{i+1}}={{x}_{i}}\boxplus (\Delta t\text{f}({{x}_{i}},{{\mu }_{i}},{{w}_{i}}))({{w}_{i}}=0) is discretized using the IMU sampling interval \Delta t to obtain (3). And in (4), the matrix {{{\hat{\text{P}}}}_{i+1}} is the covariance of the error of {{\tilde{x}}_{i+1}}={{x}_{i+1}}\boxminus {{\hat{x}}_{i+1}}={{\text{F}}_{{{{\tilde{x}}}_{i}}}}\tilde{x}+{{\text{F}}_{{{w}_{i}}}}{{w}_{i}}. Then, {{Q}_{i}} is the covariance of the noise {{w}_{i}} and the matrix {{\text{F}}_{{{{\tilde{x}}}_{i}}}} and {{\text{F}}_{{{w}_{i}}}} are computed as follows: \begin{align*} {{\text{F}}_{{{{\tilde{x}}}_{i}}}}&=\frac{\partial \left({{x}_{i+1}}\boxminus {{{\hat{x}}}_{i+1}} \right)}{\partial {{{\tilde{x}}}_{i}}}|{{\tilde{x}}_{i}}=0,{{w}_{i}}=0 \tag{5}\\ {{\text{F}}_{{{{\tilde{x}}}_{i}}}}&=\left[ \begin{matrix}\text{I} & \text{I}\Delta t & 0 & 0 & 0 & 0 \\ 0 & \text{I} & -\hat{R}{{ \lfloor {{a}_{m}}-{{b}_{a}} \rfloor }_{\wedge }} & 0 & -\hat{R}\Delta t & \text{I}\Delta t \\ 0 & 0 & \text{Exp}(-({{\omega }_{m}}-{{b}_{\omega }})\Delta t) & \text{-I}\Delta t & 0 & 0 \\ 0 & 0 & 0 & \text{I} & 0 & 0 \\ 0 & 0 & 0 & 0 & \text{I} & 0 \\ 0 & 0 & 0 & 0 & 0 & \text{I} \\ \end{matrix} \right] \tag{6}\\ {{\text{F}}_{{{w}_{i}}}}&=\frac{\partial ({{x}_{i+1}}\boxminus {{{\hat{x}}}_{i+1}})}{\partial {{w}_{i}}}|{{\tilde{x}}_{i}}=0,{{w}_{i}}=0 \tag{7}\\ {{\text{F}}_{{{w}_{i}}}}&=\left[ \begin{matrix}0 & 0 & 0 & 0 \\ -\text{I}\Delta t & 0 & 0 & 0 \\ 0 & -\text{I}\Delta t & 0 & 0 \\ 0 & 0 & -\text{I}\Delta t & 0 \\ 0 & 0 & 0 & -\text{I}\Delta t \\ 0 & 0 & 0 & 0 \\ \end{matrix} \right]. \tag{8} \end{align*} View SourceRight-click on figure for MathML and additional features.

The forward propagation continues until reaching the end time of the current kth scan where the propagated state and covariance are denoted as {{\hat{x}}_{k}}, {{{\hat{\text{P}}}}_{k}}, respectively, serving as the predicted state for the forward propagation and then continues to the next scan.

2) Residual Computation

Assume the estimate of state {{x}_{k}} at the current kth iterate update is \hat{x}_{k}{ }^{\mathrm{k}}, when {\mathit{k}}=0, {{\hat{x}}_{k}}{ }^\text{{k}}={{\hat{x}}_{k}}, where {{\hat{x}}_{k}} is the predicted state from the propagation in (3). Then, using the coordinate transformation defined in (9), the scan points {\text{{p}}_{j}}^{L} of the LiDAR, after distortion correction using IMU [29], are projected onto the global coordinate system. \begin{align*} {\hat{\text{p}}}{{_{j}^{\text{k}}}^{W}}={\hat{\text{T}}}{{_{{{I}_{k}}}^{\text{k}}}^{W}}{\hat{\text{T}}}{{_{{{L}_{k}}}^{\text{k}}}^{I}}{{\text{p}}_{j}}^{L} \tag{9} \end{align*} View SourceRight-click on figure for MathML and additional features.where L,I,W represent LiDAR coordinate system, IMU coordinate system, and global coordinate system, respectively; {\hat{\text{T}}}_{{{L}_{k}}}^{\text{k}} and {\hat{\text{T}}}_{{{I}_{k}}}^{\text{k}} are the state transition matrix of coordinate transformation.

For each LiDAR feature point, we assume that the nearest plane or edge defined by its neighboring feature points in the map represents its true location. The residual is defined as the distance between the estimated global coordinates {\hat{\text{p}}}{{_{j}^{\text{k}}}^{W}} of the feature point and the nearest plane (or edge) in the map. Let {{\mu }_{j}} be the normal vector of the corresponding plane or the direction of the corresponding edge, and let {{\text{q}}_{j}}^{W} be a point on the plane or edge. Then, the residual is defined as follows: \begin{align*} {{\text{z}}_{j}}^{\text{k}}&={{\mu }_{j}}^{T}({\hat{\text{p}}}{{_{j}^{\text{k}}}^{W}}-{{\text{q}}_{j}}^{W}) \tag{10} \\ {{\text{z}}_{j}}^{\text{k}}&={{ \lfloor {{\mu }_{j}} \rfloor }_{\wedge }}({\hat{\text{p}}}{{_{j}^{\text{k}}}^{W}}-{{\text{q}}_{j}}^{W}) \tag{11} \end{align*} View SourceRight-click on figure for MathML and additional features.where (10) is the residual of plane feature points, and (11) is the residual of edge feature points.

Substituting (9) into (10) and (11) yields the measurement model 0={{\text{h}}_{j}}({{x}_{k}},{{n}_{j}}^{L})={{G}_{j}}({\hat{\text{T}}}{{_{{{I}_{k}}}^{\text{k}}}^{W}}{\hat{\text{T}}}{{_{{{L}_{k}}}^{\text{k}}}^{I}}{{\text{p}}_{j}}^{L}-{{\text{q}}_{j}}^{W}) (when the point is a planer point {{G}_{j}}={{\mu }_{j}}^{T}, otherwise {{G}_{j}}={{ \lfloor {{\mu }_{j}} \rfloor }_{\wedge }}. Moreover, approximating the measurement equation by its first order approximation made at {{\hat{x}}_{k}}{ }^{\text{k}} leads to (12). \begin{align*} 0&=\mathrm{h}_{j}\left(x_{k}, n_{j}{ }^{L}\right) \\ & =\mathrm{h}_{j}\left(\hat{x}_{k}{ }^{\mathrm{k}} \boxplus \tilde{x}_{k}{ }^{\mathrm{k}}, n_{j}{ }^{L}\right) \\ & \simeq \mathrm{h}_{j}\left(\hat{x}_{k}{ }^{\mathrm{k}}, 0\right)+\mathrm{H}_{j}{ }^{\mathrm{k}} \tilde{x}_{k}{ }^{\mathrm{k}}+\mathrm{v}_{j} \\ & ={\mathrm{z}_{j}}^{\mathrm{k}}+\mathrm{H}_{j}{ }^{\mathrm{k}} \tilde{x}_{k}{ }^{\mathrm{k}}+\mathrm{v}_{j} \tag{12} \end{align*} View SourceRight-click on figure for MathML and additional features.where {{\tilde{x}}_{k}}{ }^{\text{k}}={{x}_{k}}\boxminus {{\hat{x}}_{k}}{ }^{\text{k}} (or equivalently, {{x}_{k}}={{\hat{x}}_{k}}{ }^{\text{k}}\boxplus {{\tilde{x}}_{k}}{ }^{\text{k}}); {{\text{z}}_{j}}^{\text{k}}={{\text{h}}_{j}}({{\hat{x}}_{k}}{ }^{\text{k}},0) is called the residual; {{\text{H}}_{j}}^{\text{k}} is the Jacobin matrix of {{\text{h}}_{j}}({{\hat{x}}_{k}}{ }^{\text{k}}\boxplus {{\tilde{x}}_{k}}{ }^{\text{k}},{{n}_{j}}^{L}) with respect to {{\tilde{x}}_{k}}{ }^{\text{k}}, evaluated at zero; {{\text{v}}_{j}}\in (0,{\text{R}_{j}}) is due to the raw measurement noise.

3) State Iterative Update

The propagated state {{\hat{x}}_{k}} and covariance {{{\hat{\text{P}}}}_{k}} from (1) impose a prior Gaussian distribution for the unknown state {{x}_{k}}. More specifically, {{{\hat{\text{P}}}}_{k}} represents the covariance of the error state in (13). \begin{align*} x_{k} \boxminus \hat{x}_{k}&=\left(\hat{x}_{k}{ }^{\mathrm{k}} \boxplus \tilde{x}_{k}{ }^{\mathrm{k}}\right) \boxminus \hat{x}_{k}\\ &=\hat{x}_{k}{ }^{\mathrm{k}} \boxminus \hat{x}_{k}+\mathrm{J}^{\mathrm{k}} \tilde{x}_{k}{ }^{\mathrm{k}} \sim \mathcal {N}\left(0, \widehat{\mathrm{P}}_{k}\right) \tag{13} \end{align*} View SourceRight-click on figure for MathML and additional features.where \mathrm{J}^{\mathrm{k}} is the partial differentiation of (\hat{x}_{k}{ }^{\mathrm{k}} \boxplus \tilde{x}_{k}{ }^{\mathrm{k}}) \boxminus \hat{x}_{k} with respect to \tilde{x}_{k}{ }^{\mathrm{k}} evaluated at zero. For the first iteration, \hat{x}_{k}{ }^{\mathrm{k}}=\hat{x}_{k}, \mathrm{J}^{\mathrm{k}}=\mathrm{I}.

Besides the prior distribution, the state distribution of (14) is computed based on the measurement model derived from (12). \begin{align*} -\mathrm{v}_{j}=\mathrm{z}_{j}{ }^{\mathrm{k}}+\mathrm{H}_{j}{ }^{\mathrm{k}} \tilde{x}_{k}{ }^{\mathrm{k}} \sim \mathcal {N}\left(0, \mathrm{R}_{j}\right). \tag{14} \end{align*} View SourceRight-click on figure for MathML and additional features.

Combining the prior distribution in (13) and the measurement model from (14) yields the posteriori distribution of the state {x}_{k} equivalently represented by \tilde{x}_{k}{ }^{\mathrm{k}} and its maximum a-posteriori estimate (MAP) in (15). \begin{align*} \min _{\tilde{x}_{k}^{\mathrm{k}}}\left(\left\Vert x_{k} \boxminus \hat{x}_{k}\right\Vert _{\hat{\mathrm{P}}_{k}}^{2}+\sum _{j=1}^{m}\left\Vert \mathrm{z}_{j}{ }^{\mathrm{k}}+\mathrm{H}_{j}{ }^{\mathrm{k}} \tilde{x}_{k}{ }^{\mathrm{k}}\right\Vert _{\mathrm{R}_{j}}^{2}\right). \tag{15} \end{align*} View SourceRight-click on figure for MathML and additional features.

This MAP problem can be solved by IEKF method as follows: \begin{align*} \text{K}&={{({{\text{H}}^{T}}{{\text{R}}^{-1}}\text{H+}{{\text{P}}^{-1}})}^{-1}}{{\text{H}}^{T}}{{\text{R}}^{-1}} \tag{16} \\ {{\hat{x}}_{k}}{ }^{\text{k+1}}&={{\hat{x}}_{k}}{ }^{\text{k}}\boxplus (-\text{K}{{\text{z}}_{k}}^{\text{k}}-(\text{I}-\text{KH}){{({{\text{J}}^{\text{k}}})}^{-1}}({{\hat{x}}_{k}}{ }^{\text{k}}\boxminus {{\hat{x}}_{k}})) \tag{17} \end{align*} View SourceRight-click on figure for MathML and additional features.where K is the Kalman gain, and {{\hat{x}}_{k}}{ }^{\text{k+1}} is the poststate estimation. \text{H}=[ {{\text{H}}_{1}}^{{{\text{k}}^{T}}},{{\text{H}}_{2}}^{{{\text{k}}^{T}}},{\ldots },{{\text{H}}_{m}}^{{{\text{k}}^{T}}} ] , \text{R}=\text{diag}[{{\text{R}}_{1}},{{\text{R}}_{2}},{\ldots },{{\text{R}}_{m}}], \text{P}=({{\text{J}}^{\text{k}}})^{-1} {{{\hat{\text{P}}}}_{k}} ({{\text{J}}^{\text{k}}})^{-T}, {{\text{z}}_{k}}^{\text{k}}={{[{{\text{z}}_{1}}^{{{\text{k}}^{T}}},{{\text{z}}_{2}}^{{{\text{k}}^{T}}},{\ldots },{{\text{z}}_{m}}^{{{\text{k}}^{T}}}]}^{T}}.

The above process repeats until convergence, then, the final optimal state and covariance estimates after convergence are given as follows: \begin{align*} {{\bar{x}}_{k}}&={{\hat{x}}_{k}}{ }^{\text{k+1}} \tag{18} \\ {{{\bar{\text{P}}}}_{k}}&=(\text{I-KH})\text{P}. \tag{19} \end{align*} View SourceRight-click on figure for MathML and additional features.

Subsequently, the optimal state estimate {{\bar{x}}_{k}} and covariance estimate {{{\bar{\text{P}}}}_{k}} are used as inputs for the next scan, and the previous operations are repeated to obtain the estimated LiDAR-Inertial odometry.

B. Loop Closure Detection

To address the issue of accumulated errors in the LiDAR-IMU system during long-term operation, this article introduced a loop closure detection method to reduce positioning drift. The stability is improved by correcting the trajectory and optimizing the map. The Scan Context [8] and Intensity Scan Context [21] algorithms reduced computational complexity by dimensionality reduction, mapping the 3-D point cloud data to a 2-D image. The global descriptor, as shown in Fig. 3(b), is calculated to measure the similarity between point cloud frames and determine if a loop closure is formed. However, it utilized the original point cloud from the LiDAR for constructing the global descriptor, resulting in high computational cost and a high rate of wrong loop closures. By using feature points after front-end IMU distortion corrected, we introduced a global descriptor to improve accuracy in measuring the similarity between point clouds. To better represent the characteristics of the point cloud, a combined consideration of intensity and height information is proposed, known as the HISC global descriptor. To address the limitation of fixed distance threshold, which may result in missing loop closures or false positives, an adaptive distance threshold is proposed to replace it to reduce the probability of false loop closure detection.

Fig. 3. - Using the top view of a point cloud from a 3-D scan (a), we partition ground areas into bins, that are divided according to azimuth (from 0 to $2\pi$ within the LiDAR frame) and radial (from center to maximum sensing range) directions. The yellow area as a ring $({N}_{r})$, the cyan area as a sector $({N}_{s})$, and the black-filled area as a bin. Scan Context is a matrix as in (b). The ring and sector described in (a) are represented by the same-colored column and row, respectively, in (b). In this article, we use the maximum Height and Intensity of points in a bin.
Fig. 3.

Using the top view of a point cloud from a 3-D scan (a), we partition ground areas into bins, that are divided according to azimuth (from 0 to 2\pi within the LiDAR frame) and radial (from center to maximum sensing range) directions. The yellow area as a ring ({N}_{r}), the cyan area as a sector ({N}_{s}), and the black-filled area as a bin. Scan Context is a matrix as in (b). The ring and sector described in (a) are represented by the same-colored column and row, respectively, in (b). In this article, we use the maximum Height and Intensity of points in a bin.

1) Global Descriptor of HISC

This article introduced a novel global descriptor, which constructed by height [8] and intensity [21] information. The height information provides an effective summary of the vertical structure of the surrounding buildings, eliminating the need for complex calculations to analyze the point cloud characteristics. The maximum height indicates the visible portion of the surrounding structure. This self-centered visibility allows for the analysis of place characteristics [8], making it useful for loop closure detection and verifying if the system passes through the same location. Moreover, objects exhibit different intensity values, and intensity information serves as a representation of the reflectance of surfaces in the environment. For instance, highly reflective materials like metal plates have higher intensity values, while concrete surfaces have lower values. Therefore, intensity information can effectively serve as a feature to assist laser odometry in location identification.

Assuming the LiDAR scan contains n points, denotes as P=\lbrace \mathrm{p}_{1}, \mathrm{p}_{2}, \ldots, \mathrm{p}_{n}\rbrace and the intensity value is \eta. [x,y,z] represents the spatial position coordinates of each point. And in Cartesian coordinate system, each point is represented as {{\text{p}}_{k}}=[{{x}_{k}},{{y}_{k}},{{z}_{k}},{{\eta }_{k}}] (where k is the {k}th point in the point cloud P). According to the method in [21], we first divide the 3-D scan point cloud into the azimuth and radial bin in the sensor coordinates, in an equally spaced manner as shown in Fig. 3(a). {{N}_{s}}, {{N}_{r}} are the number of sectors and rings, respectively. That is to say, if we set the maximum measurement range of the LiDAR sensor to {{L}_{\text{max}}} the radial clearance between the rings is {{L}_{\text{max}}}/{{N}_{r}}, and the center angle of the sector to be equal to {2\pi }/{{N}_{s}}, this article defines {{N}_{s}} = 60, {{N}_{r}} = 20, each frame point cloud is divided into the subspace {{S}_{i,j}}(i\in \lbrace 1,2,{\ldots },{{N}_{s}}\rbrace, j\in \lbrace 1,2,{\ldots },{{N}_{r}}\rbrace) of {{N}_{r}}\times {{N}_{s}}. For the same object, the intensity values are consistent, and the number of points in each subdivided subspace is much smaller than the total number of points in a frame. Therefore, it is assumed that the intensity values of LiDAR points are the same within each subspace, combined with the height information of point clouds in each subspace, the sum of the maximum height value and intensity value of all points in each subspace is employed to calculate the value of this subspace, is defined as a new descriptor, as follows: \begin{align*} \Omega (i, j)=\max _{\mathrm{p}_{k} \in S_{i, j}}\left(\eta _{k}+z_{k}\right). \tag{20} \end{align*} View SourceRight-click on figure for MathML and additional features.

According to the above process, HISC global descriptor is finally expressed as the matrix I of {{N}_{r}}\times {{N}_{s}} in (21). \begin{align*} I=\left(a_{i j}\right) \in \mathbb {R}^{N_{r} \times N_{s}}, a_{i j}=\Omega (i, j). \tag{21} \end{align*} View SourceRight-click on figure for MathML and additional features.

The matrix contains geometric information and intensity information of the environment. The Similarity Score between Scan Contexts and Two-phase Search Algorithm method [8] is employed to calculate the similarity between point cloud pairs according to I, so as to determine whether there is a closed loop.

2) Adaptive Distance Threshold

The original Scan Context method uses a fixed distance threshold to detect and exclude point cloud pairs that have high similarity but are far apart. Setting a small fixed distance threshold may result in undetected loops, while setting a large threshold may detect more false loops. To address this problem, an adaptive distance threshold [3] is introduced to replace it, which effectively detects closed loops and reduces the occurrence of false loop closures.

When a loop closure was detected, the matrix in (21) is computed to obtain a pair of similarity loop-closing points, denoted as {{\text{p}}_{k}}^{L} and {{\text{p}}_{\text{loop}}}^{W}. Using coordinate transformations, the relative pose transformations of the two-point cloud frames with respect to the world coordinate system, {{T}_{k}}^{W} and {{T}_{\text{loop}}}^{W}, are calculated. Then, the relative pose transformation between the two-point clouds, \tilde{T}_{k}^{\text{loop}}, is computed as follows: \begin{align*} \tilde{T}_{k}^{\text{loop}}={{T}_{\text{loop}}}^{{{W}^{-1}}}{{T}_{k}}^{W}. \tag{22} \end{align*} View SourceRight-click on figure for MathML and additional features.

The distance d between the two points is calculated based on the obtained transformations in (23). \begin{align*} d=\sqrt{\tilde{T}_{k}^{\text{loop}}.{{x}^{2}}+\tilde{T}_{k}^{\text{loop}}.{{y}^{2}}+\tilde{T}_{k}^{\text{loop}}.{{z}^{2}}} \tag{23} \end{align*} View SourceRight-click on figure for MathML and additional features.where pose {{T}_{k}}^{W} and {{T}_{\text{loop}}}^{W} are obtained from the front-end LiDAR-IMU odometry. \tilde{T}_{k}^{\text{loop}} is the relative pose transformation calculated based on the current and the loop-closed frame.

Then, a linear function is constructed using the number of keyframes k from the front-end odometry to define the threshold {d}_{\text{thre}}, as shown in (24). \begin{align*} {{d}_{\text{thre}}}=60+k/100. \tag{24} \end{align*} View SourceRight-click on figure for MathML and additional features.

If d is greater than the threshold {d}_{\text{thre}}, it indicates that no loop closure has been detected. This method helps to reduce the error rate in loop closure detection, improve the results of global optimization, and enhance the efficiency of the overall LiDAR SLAM algorithm.

C. Global Optimization and Mapping

In order to address the problem of system error accumulation over time and loop closure detection, we proposed back-end optimization to correct system errors. While local optimization only considered the relationship between consecutive frames, it is prone to noise and motion deviations. Therefore, global optimization is introduced, which utilizes global observation data and pose to correct errors. Scan-to-map constraints are constructed for all observed key frames using the feature point matching method [5], forming a large-scale nonlinear least squares problem. The LM algorithm is employed to minimize the error functions, eliminate errors, and correct trajectory drift. In addition, the GTSAM library based on factor graph is employed for global optimization. The iSAM2 algorithm [31] is utilized to incorporate odometry factors, loop closure detection factors, and GPS factors. By using joint optimization, errors are effectively eliminated, enabling the determination of an optimal trajectory and pose for each key frame. This leads to more accurate map and positioning results, enhancing the precision of map construction.

1) Pose Estimation Based on Feature Matching

The feature points {{\text{p}}_{k}}^{L} of the current frame, which are in the LiDAR coordinate system, are transformed to the global coordinate system as {{\text{p}}_{k}}^{W} using the coordinate transformation matrix {{({{T}_{\text{loop}}}^{W})}_{\text{new}}} optimized in the previous frame. A submap is then constructed in the global coordinate system, consisting of the loop closure point cloud {{\text{p}}_{\text{loop}}}^{W} and the neighboring point clouds. First, (25) is used to project the feature points from the LiDAR coordinate system to the global coordinate system. \begin{align*} \mathrm{p}_{k}{ }^{W}=R_{\text{loop }}^{W} *\left(R_{L}^{\text{loop }} * \mathrm{p}_{k}{ }^{L}+t_{L}^{\text{loop }}\right)+t_{\text{loop }}^{W} \tag{25} \end{align*} View SourceRight-click on figure for MathML and additional features.where {{\text{p}}_{k}}^{L} is the feature point in the LiDAR coordinate system. (R_{L}^{\text{loop}},t_{L}^{\text{loop}}) represents the external parameter between the LiDAR coordinate system and the carrier coordinate system. (R_{\text{loop}}^{W},t_{\text{loop}}^{W}) represents the transformation of the carrier coordinate system to the global coordinate system.

Then, a feature point matching-based method is used to match the planar points and edge points in the current point cloud frame with the submap. Point-to-line and point-to-plane positional constraints are then constructed as (26) and (27), respectively. \begin{align*} d_{\varepsilon _{k}}=\frac{\left|\left(\mathrm{p}_{k}{ }^{W}-\mathrm{p}_{l}{ }^{W}\right) \times \left(\mathrm{p}_{k}{ }^{W}-\mathrm{p}_{m}{ }^{W}\right)\right|}{\left|\mathrm{p}_{l}{ }^{W}-\mathrm{p}_{m}{ }^{W}\right|} \tag{26} \end{align*} View SourceRight-click on figure for MathML and additional features.where l and m represent the two nearest points to point k that are not on the same laser ring. These two points form a line, and d_{\varepsilon _{k}} is the distance from point k to the line lm. \begin{align*} d_{S_{k}}=\frac{\left|\begin{array}{c}\left(\mathrm{p}_{k}{ }^{W}-\mathrm{p}_{l}{ }^{W}\right) \\ \left(\mathrm{p}_{l}{ }^{W}-\mathrm{p}_{m}{ }^{W}\right) \times \left(\mathrm{p}_{l}{ }^{W}-\mathrm{p}_{j}{ }^{W}\right) \end{array}\right|}{\left|\left(\mathrm{p}_{l}{ }^{W}-\mathrm{p}_{m}{ }^{W}\right) \times \left(\mathrm{p}_{l}{ }^{W}-\mathrm{p}_{j}{ }^{W}\right)\right|} \tag{27} \end{align*} View SourceRight-click on figure for MathML and additional features.where l, m, and j are the three points closest to k and not collinear, forming a plane lmj, d_{S_{k}} is the distance from point k to the plane lmj.

Then, using (26) and (27), nonlinear equations are constructed, and the L-M nonlinear optimization method is used to iteratively solve for the pose transformation matrix {{T}_{k}}^{\text{loop}} between the current point cloud frame and the submap, as follows: \begin{align*} \min \left\lbrace \sum _{i=1}^{\mathrm{m}} d_{\varepsilon }\left(\mathrm{p}_{i}(X)\right)+\sum _{j=1}^{\mathrm{n}} d_{\mathcal {S}}\left(\mathrm{p}_{j}(X)\right)\right\rbrace \tag{28} \end{align*} View SourceRight-click on figure for MathML and additional features.where X is the vector composed of the estimated state variables, i.e., {{(\text{roll},\text{pitch},\text{yaw},\text{x},\text{y},\text{z})}^{T}}. m is the total number of successfully matched edge feature points, while n is the total number of successfully matched plane feature points. d_{\varepsilon }(\mathrm{p}_{i}(X)) represents the distance from the ith corner point to the line calculated from the map matching, i.e., the point-to-line distance. Similarly, d_{\mathcal {S}}(\mathrm{p}_{j}(X)) represents the distance from the point to the plane.

Finally, the pose transformation matrix {{T}_{k}}^{\text{loop}} is passed to the map construction and optimization module for building the global map.

2) Pose Graph Construction

This module takes the transformation matrix {{T}_{k}}^{\text{loop}} obtained from the pose estimation module in (28), as well as the {{T}_{k}}^{W} obtained from the LiDAR-IMU odometry module. We introduced GTSAM to construct a factor graph for global optimization. The received odometry poses {{T}_{k}}^{W} are added to the corresponding factor graph nodes, and {{T}_{k}}^{\text{loop}} is added to the edges connecting the nodes. In addition, we introduced GPS factors to construct absolute position constraints and reduce trajectory drift.

The constructed factor graph is then subjected to global optimization, resulting in optimized poses {{({{T}_{i}}^{W})}_{\text{new}}}, yielding a globally consistent trajectory. Simultaneously, the obtained poses are used to integrate newly acquired map information into the existing map, continuously improving the overall environment map. Finally, a complete 3-D point cloud map is obtained.

SECTION IV.

Experiment

To evaluate the performance of the proposed algorithm, we conducted tests on both public dataset and self-collected dataset. The KITTI dataset sequences 05, 06, 07, and 09 are used to validate the localization accuracy of the proposed algorithm, and compared with A-LOAM [5] and FAST-LIO2 [14]. In order to verify the time efficiency of the proposed loop closure detection algorithm, that compared with SC-A-LOAM on KITTI sequences. In order to demonstrate the performance of the proposed global descriptor, the proposed method, SC [8] and ISC [21] methods were tested by comparing the RMSE of the generated trajectory on KITTI dataset. For further evaluation, self-collected dataset by VLP-16 LiDAR and N100 IMU are named Park dataset and Street dataset. A comparison of trajectory results was performed against SC-A-LOAM [20], Optimized-sc-f-loam [3], and FAST-LIO2 [14]. Then, comparisons were made with SC-A-LOAM, Optimized-sc-f-loam, LIO-SAM,FAST-LIO2 on the KITTI sequence 07, the outdoor Park dataset, and indoor dataset to validate the effectiveness of the proposed algorithm in mitigating mapping drift. Finally, the CPU usage and memory usage of five different algorithms running three different data sets are analyzed. The experimental data in this article is the best result of multiple measurements.

A. Hardware Platform

The experimental hardware platform in this article consisted of a laptop with an Intel i5-12500H, 16 GB RAM, and an NVIDIA GeForce RTX 3050 graphics card. The operating system used was Ubuntu 20.04. The software framework employed was Robot Operating System, which version is Noetic, and the primary programming language used was C++. The sensors utilized in this study included the VLP-16 mechanical LiDAR and the WHEELTEC N100 IMU.

For the experimental evaluation, the publicly available KITTI dataset, which contains data from outdoor autonomous driving scenarios, was utilized. The dataset includes 22 different outdoor environment sequences, and for 11 of these sequences (00-10), ground-truth trajectories obtained from GPS are provided. Each data sequence consists of grayscale images, color images, and LiDAR point cloud data. In addition, a campus environment data acquisition platform was used. Fig. 4(a) illustrates the configuration of this data collection platform, and Fig. 4(b) shows the environment.

Fig. 4. - Self-data collection in the campus environment. (a) Shows the self-data collection platform consisting of LiDAR and IMU. (b) Shows the self-data collection environment and the experimental operating environment.
Fig. 4.

Self-data collection in the campus environment. (a) Shows the self-data collection platform consisting of LiDAR and IMU. (b) Shows the self-data collection environment and the experimental operating environment.

B. Localization Accuracy

1) KITTI Dataset

In this article, we use the EVO evaluation tool to assess the localization accuracy of the proposed algorithm. Specifically, we compared the performance of the proposed algorithm with the ground-truth on KITTI dataset sequences 05, 06, 07, and 09. These sequences contain loop closures, making them suitable for evaluating the effectiveness of the improved loop closure detection algorithm proposed in this article. The evaluation was based on trajectory visualization, Absolute Trajectory Error (ATE), and Relative Pose Error (RPE).

Fig. 5 illustrates the visual comparison results of the proposed algorithm, A-LOAM, and FAST-LIO2 running on KITTI dataset sequences 05, 06, 07, and 09, along with the ground truth trajectories.

Fig. 5. - Visualization of trajectory comparison between A-LOAM, FAST-LIO, and our method on KITTI dataset sequences. (a), (b), (c), and (d) correspond to sequences 05, 06, 07, 09, respectively. (a) 05 sequence. (b) 06 sequence. (c) 07 sequence. (d) 09 sequence.
Fig. 5.

Visualization of trajectory comparison between A-LOAM, FAST-LIO, and our method on KITTI dataset sequences. (a), (b), (c), and (d) correspond to sequences 05, 06, 07, 09, respectively. (a) 05 sequence. (b) 06 sequence. (c) 07 sequence. (d) 09 sequence.

Table I presents the ATE statistics comparing the estimated trajectories obtained by the three algorithms on the sequences of the KITTI dataset with the ground truth trajectories. From the comparison of the statistical values in the table, it can be observed that the proposed algorithm has smaller error values in sequences 05 and 07. The proposed algorithm also achieves the lowest average error (mean) and root mean square error (RMSE) across all four sequences. Compared to the A-LOAM algorithm, the RMSE of the proposed algorithm are reduced by 65.1%, 73.6%, 78.3%, and 61.3% in the four sequences, respectively. Similarly, compared to the FAST-LIO2 algorithm, the RMSE is reduced by 59.4%, 1.16%, 36.4%, and 12.0%. These results demonstrate that the proposed algorithm achieves smaller trajectory estimation errors compared to the ground truth trajectories in the KITTI sequences, indicating higher accuracy of the algorithm.

TABLE I ATE Evaluation for Different Sequences and Algorithms in KITTI Dataset
Table I- ATE Evaluation for Different Sequences and Algorithms in KITTI Dataset

The RPE is used to compare pose increments and describe the accuracy of the pose difference between two frames relative to the real pose under a fixed time difference. The error values of the three algorithms on the four sequences are shown in Table II. It can be observed that the algorithm proposed in this article does not achieve the minimum error in each case. However, for sequences 05, 06, and 09, the RMSE statistics of the proposed algorithm are the lowest among the three algorithms. In addition, the proposed algorithm also exhibits the majority of the lowest errors in other statistical values. Overall, the algorithm proposed in this article demonstrates superior performance.

TABLE II RPE Evaluation for Different Sequences and Algorithms in KITTI Dataset
Table II- RPE Evaluation for Different Sequences and Algorithms in KITTI Dataset

2) Self-Collected Dataset

The experimental platform shown in Fig. 4 was used to collect dataset within the campus of Northwest A&F University, including the Park dataset and Street dataset, which contain LiDAR and IMU information. The satellite map is shown in Fig. 6, and it was used for evaluating the algorithm. The algorithm proposed was compared with three other algorithms: SC-A-LOAM [20] and Optimized-sc-f-loam [3], these only use LiDAR data, and FAST-LIO2 [14] which utilize both LiDAR and IMU data. The trajectories obtained by running different algorithms are shown in Fig. 7, which include long-distance loop closures. Fig. 8 displays the trajectories in the x, y, and z directions. Due to different numbers of keyframes extracted, the length of the displayed trajectories are different.

Fig. 6. - Satellite map of campus dataset. (a) The dataset consists of a long-distance navigation around the campus garden. (b) The dataset depicts walking around the teaching building. (a) Park dataset. (b) Street dataset.
Fig. 6.

Satellite map of campus dataset. (a) The dataset consists of a long-distance navigation around the campus garden. (b) The dataset depicts walking around the teaching building. (a) Park dataset. (b) Street dataset.

Fig. 7. - Visual comparison of the trajectory projections on the x-y plane obtained by running four different algorithms on the campus environment data set. (a) Park dataset. (b) Street dataset.
Fig. 7.

Visual comparison of the trajectory projections on the x-y plane obtained by running four different algorithms on the campus environment data set. (a) Park dataset. (b) Street dataset.

Fig. 8. - Trajectories output in the x-y-z directions by running four different algorithms on the campus environment dataset. (a) Park dataset. (b) Street dataset.
Fig. 8.

Trajectories output in the x-y-z directions by running four different algorithms on the campus environment dataset. (a) Park dataset. (b) Street dataset.

As shown in Fig. 7, it can be observed that the algorithm proposed in [3] exhibited significant drift in the trajectories that obtained from both environments, indicating poor robustness of the algorithm. Although the method proposed in [20] obtained a complete trajectory in the Street dataset, suffered from severe drift and incomplete trajectory in Park dataset. The algorithm in [14] produced a complete trajectory but fails to form valid loop closures. As seen in Figs. 7 and 8, our method can generate complete trajectories and effectively detect loop closures, and the output trajectory in the z direction is smoother, indicating lower elevation errors.

C. Time Consumption of Loop Closure Detection

This module is compared with the A-LOAM algorithm with Scan Context (SC-A-LOAM) to evaluate the computational efficiency of the loop closure detection process using the publicly available KITTI dataset (05, 06, 07, 09), and the number of keyframes extracted by the algorithm is also recorded, as shown in Table III. According to the data, our method achieved the best computational time for four sequences. For each sequence, compared to the SC-A-LOAM algorithm, the time consumption of loop closure detection of the proposed algorithm is reduced by 64.4%, 59.4%, 78.3%, and 74.3%, respectively, achieving improved algorithm efficiency while ensuring algorithm accuracy.

TABLE III Comparison of Average Loop Closure Detection Time for Different Sequences and Algorithms in KITTI Dataset
Table III- Comparison of Average Loop Closure Detection Time for Different Sequences and Algorithms in KITTI Dataset

D. Performance Analysis of Global Descriptor

A new method for computing global descriptors was proposed. We used four sequences (05, 06, 07, 09) from the KITTI dataset to validate its effectiveness. The RMSE of the ATE and RPE between the trajectories obtained using three different global descriptor computation methods and the ground truth was compared using the EVO evaluation tool, and the results are shown in Table IV. It can be observed that for the 05 sequence, the ATE error of our is slightly higher than ISC. However, for the sequences 06, 07, and 09, our method achieved the lowest error values. Compared to the SC and ISC methods, our method shows an average improvement of approximately 16.18% and 8.97% for ATE, 0.23% and 0.21% for RPE, respectively.

TABLE IV Comparison of ATE and RPE for Different Sequences and Global Descriptor Algorithms in KITTI Dataset
Table IV- Comparison of ATE and RPE for Different Sequences and Global Descriptor Algorithms in KITTI Dataset

E. Analysis of Mapping

In the experiment of outdoor 3-D point cloud map reconstruction, we used the KITTI sequence 07 and self-collected dataset for analysis. Four LiDAR SLAM methods (SC-A-LOAM, Optimized sc-f-loam, LIO-SAM, FAST-LIO2) are compared to validate the mapping effectiveness of the proposed method in this article. The results are shown in Fig. 9–​12.

Fig. 9. - Visualization of the global map reconstruction for KITTI sequence 07. (a) for A-LOAM algorithm with Scan Context loop closure detection, (b) for F-LOAM algorithm with loop closure detection, (c) shows the LIO-SAM algorithm, (d) shows the mapping results of the FAST-LIO2 algorithm, (e) for the proposed method in this article. (a) SC-A-LOAM. (b) Optimized-sc-f-loam. (c) LIO-SAM. (d) FAST-LIO2. (e) Our.
Fig. 9.

Visualization of the global map reconstruction for KITTI sequence 07. (a) for A-LOAM algorithm with Scan Context loop closure detection, (b) for F-LOAM algorithm with loop closure detection, (c) shows the LIO-SAM algorithm, (d) shows the mapping results of the FAST-LIO2 algorithm, (e) for the proposed method in this article. (a) SC-A-LOAM. (b) Optimized-sc-f-loam. (c) LIO-SAM. (d) FAST-LIO2. (e) Our.

Fig. 10. - Visualization of the local map reconstruction for KITTI sequence 07 (showing the map reconstruction on straight roads and at corners). (a) SC-A-LOAM. (b) Optimized-sc-f-loam. (c) LIO-SAM. (d) FAST-LIO2. (e) Our.
Fig. 10.

Visualization of the local map reconstruction for KITTI sequence 07 (showing the map reconstruction on straight roads and at corners). (a) SC-A-LOAM. (b) Optimized-sc-f-loam. (c) LIO-SAM. (d) FAST-LIO2. (e) Our.

Fig. 11. - Global Map of Park dataset. (a) SC-A-LOAM. (b) Optimized-sc-f-loam. (c) LIO-SAM. (d) FAST-LIO2. (e) Our.
Fig. 11.

Global Map of Park dataset. (a) SC-A-LOAM. (b) Optimized-sc-f-loam. (c) LIO-SAM. (d) FAST-LIO2. (e) Our.

Fig. 12. - Loop closures detection of Park dataset. (a) and (b) show severe trajectory drift and the absence of proper loop closures. (c) The closed loop can be detected. (d) Cannot form a closed loop. (e) The proposed method performs loop closure correction. (a) SC-A-LOAM. (b) Optimized-sc-f-loam. (c) LIO-SAM. (d) FAST-LIO2. (e) Our.
Fig. 12.

Loop closures detection of Park dataset. (a) and (b) show severe trajectory drift and the absence of proper loop closures. (c) The closed loop can be detected. (d) Cannot form a closed loop. (e) The proposed method performs loop closure correction. (a) SC-A-LOAM. (b) Optimized-sc-f-loam. (c) LIO-SAM. (d) FAST-LIO2. (e) Our.

According to the global map in Fig. 9, which can be observed that the lane markings and directional contours are clearly visible in the mapping results of all three algorithms. However, in Fig. 9(a), the point cloud is dense and the generated map exhibits cumulative drift; In Fig. 9(b), the mapping result is blurry and obstacles on the lane cannot be clearly displayed. The environmental map established in Fig. 9(c) has blurred boundaries and cannot clear obstacles on the displayed road. The point cloud in Fig. 9(d) is dense, and the global map is blurry, but the local map Fig. 10(d) is clear and has clear boundaries. However, the map established in Fig. 9(e) of the algorithm in this article eliminates most of the “ghosts” generated by cumulative drift. Fig. 10 shows that the mapping results of our algorithm was neat on straight roads and corners, with no significant drift. It clearly displays obstacles such as cars on the road, indicating improved robustness of the algorithm.

Meanwhile, in order to verify the accuracy of the proposed algorithm on the outdoor self-built dataset, the global mapping effect of the above five algorithms running the Park dataset is shown in Fig. 11. It can be seen from the figure that the SC-A-LOAM algorithm and the Optimized sc-f-loam algorithm are fuzzy, with large cumulative errors and serious drift. FAST-LIO2 algorithm has clear map boundaries and obvious color contrast, but the point cloud is dense and cannot effectively detect the closed loop. Then, the map established by the LIO-SAM algorithm and this algorithm is clear and can effectively detect the closed loop (Fig. 12) and make trajectory correction. However, the previous map established by LIO-SAM algorithm will disappear when the running time is too long. The map established by the algorithm in this article eliminates most of the accumulated drift and has clear boundaries, which will have high robustness and positioning accuracy in actual scenes.

For the experiment of 3-D point cloud map reconstruction in indoor scenes, this article collects the datasets by VLP-16 and N100 inertial navigation in the indoor environment of the library. By comparing the four kinds of LiDAR SLAM method (SC-A-LOAM, Optimized-sc-f-loam, LIO-SAM, FAST-LIO2) to verify the effect of the proposed method.

Fig. 13 shows the indoor environment for data collection. Fig. 14 shows the mapping results obtained by five different algorithms. It can be seen from the figure that (a) the mapping is fuzzy and the outline of the building can be generally seen but not clearly. (b) The mapping effect of the algorithm cannot see obstacles and clear routes in the environment. (c) The contour of the algorithm is clear, and the indoor environment structure can be clearly seen. (d) The algorithm creates large shadows, dense point clouds, and cannot clearly distinguish indoor objects. (e) The map built by the algorithm in this article is relatively clear, there is no double shadow, and the environment structure can be seen, but the point cloud is relatively sparse, and the map is not dense enough. To sum up, LIO-SAM and the algorithm in this article have the best effect of indoor environment mapping.

Fig. 13. - Indoor data collection environment. Walk around the library to record data.
Fig. 13.

Indoor data collection environment. Walk around the library to record data.

Fig. 14. - Indoor library global map. (a) SC-A-LOAM. (b) Optimized-sc-f-loam. (c) LIO-SAM. (d) FAST-LIO2. (e) Our.
Fig. 14.

Indoor library global map. (a) SC-A-LOAM. (b) Optimized-sc-f-loam. (c) LIO-SAM. (d) FAST-LIO2. (e) Our.

F. CPU and Memory Usage Analysis

This section mainly analyzes the CPU and memory usage of different algorithms running different datasets by comparing five different algorithms running three different datasets (KITTI 07 sequence, outdoor park, and indoor library). The results are shown in Fig. 15, which show the maximum CPU and memory usage when running the dataset.

Fig. 15. - CPU and memory usage when running three dataset using five different algorithms. (a) KITTI 07 sequence, (b) self-collected outdoor park datasets, (c) self-collected indoor library datasets.
Fig. 15.

CPU and memory usage when running three dataset using five different algorithms. (a) KITTI 07 sequence, (b) self-collected outdoor park datasets, (c) self-collected indoor library datasets.

As can be seen from the data in the figure, for the KITTI dataset 07 sequence, the CPU usage was 82%, 48.0%, 71.7%, 100%, 59.8%, and the memory usage was 42.8%, 36.8%, 37.9%, 99.2%, 37.5%, respectively. Optimized-sc-f-loam is seemed low with poor accuracy, followed by the algorithm in this article. For the self-collected outdoor dataset, the CPU usage is 100%, 99.0%, 60.6%, 96.0%, 46.5%, and the memory usage is 45.0%, 54.7%, 40.9%, 88.9%, 63.3%, respectively. The lowest CPU usage is the algorithm in this article, with a little memory usage increasing. For the self-collected indoor dataset, the CPU usage was 98.0%, 49.0%, 48.0%, 53.5%, 39.8%, and the memory usage was 28.9%, 28.4%, 29.9%, 46.7%, 30.9%, respectively. The CPU usage was the lowest in this algorithm with a little memory usage increasing. According to the above analysis, it can be seen that the CPU usage of the proposed algorithm is small in different datasets, but for some outdoor datasets, our algorithm will produce a little memory usage increasing due to accurate mapping in complex environment.

SECTION V.

Discussion

In this article, we proposed a LiDAR-IMU tightly-coupled SLAM method with IEKF and loop closure detection to address the problems of decreased localization and mapping accuracy caused by the accumulation of errors in a wide range of indoor and outdoor scenes. A loop closure detection method based on HISC global descriptor and a trajectory global optimization method based on GTSAM are proposed. The trajectory accuracy was measured by comparing with the ground-truth value of the public KITTI dataset, which show that in most cases, the proposed algorithm has lower trajectory errors and is close to the ground truth, as well as achieve more accurate pose estimation and improve localization accuracy. Then, the Global Descriptor experimental results show that the characteristics of points can be displayed more comprehensively by combining the intensity information and the height information, which reducing the probability of error loop closure detection and improving the accuracy. At the same time, the backend global optimization reduces the drift of the mapping trajectory. When constructing the global descriptor, we set a threshold for dividing the point cloud into grids. From the experimental results of both public and self-built datasets, it can be seen that the consistency of intensity information in the same subspace is basically ensured. However, if in different lighting conditions, reflective surfaces, or occlusions, a smaller threshold may need to be set to ensure strength consistency, which may lead to a slight increasing in time consumption. Time comparison with SC-A-LOAM on KITTI dataset indicates that the positional constraints constructed on extracted feature points can address high point cloud volume, reduce estimation cost, and shorten system runtime. Through the introduction of loop closure detection and global optimization, the problems of motion trajectory drift and map overlap are solved.

Furthermore, to verify the generalization ability of the algorithm, we conducted experiments on the Velodyne dataset of FAST-LIO2. As ground truth was not provided, we compared the trajectory and mapping result of the benchmark framework and the improved network on this dataset, as shown in Figs. 16–​17. The trajectory results show that our algorithm coincides completely with the FAST-LIO2 trajectory. According to the mapping results, although the point cloud density of our drawing is not as dense as that of FAST-LIO2, the result of mapping is clear with less memory consumption and time consumption. Overall, the proposed algorithm can quickly, accurately and clearly reconstruct 3-D scenes in real time in the lowest CPU usage during runtime, with a little memory usage increasing.

Fig. 16. - Trajectory results of FAST-LIO2 and Our algorithms on the publicly available Velodyne dataset are shown in the figure. They are 2-D visualization results, x-y-z three-axis visualization results, and RPY visualization results.
Fig. 16.

Trajectory results of FAST-LIO2 and Our algorithms on the publicly available Velodyne dataset are shown in the figure. They are 2-D visualization results, x-y-z three-axis visualization results, and RPY visualization results.

Fig. 17. - Figure shows FAST-LIO2 and our algorithm for global and local mapping. (a) FAST-LIO2. (b) Our.
Fig. 17.

Figure shows FAST-LIO2 and our algorithm for global and local mapping. (a) FAST-LIO2. (b) Our.

Based on the above experimental results, the SC-A-LOAM algorithm and the Optimized sc-f-loam algorithm have poor trajectory accuracy and mapping effect on the outdoor park dataset, with severe trajectory drift and blurred mapping; the FAST-LIO2 algorithm cannot detect and form a closed loop when running on a dataset with loopback, and occupies a large amount of CPU and memory; The proposed algorithm and LIO-SAM have high mapping accuracy in indoor and outdoor environments, good real-time performance, can accurately identify and detect loops, with low CPU usage during runtime. The experiments have verified that our proposed algorithm has high accuracy of positioning map construction, which can be applied to urban roads, forest parks, and spacious indoor environments, and can build environmental maps in real time with low CPU usage and a little memory increasing.

SECTION VI.

Conclusion

In order to solve the problem of accuracy degradation caused by cumulative errors in a wide range of environments and the failure of localization map construction caused by LiDAR SLAM degradation in scenes with insufficient structural features, this article-based FAST-LIO2 proposed a LiDAR-IMU tightly-coupled SLAM method with IEKF and loop closure detection. The proposed method has been thoroughly evaluated in a variety of environments on both public datasets and data collected on self-built data acquisition platforms. The results show that compared with A-LOAM, FAST-LIO2, and LIO-SAM, the proposed method can achieve similar or better accuracy, and the CPU usage is low, and good real-time performance of the algorithm. Future work will continue to optimize back-end mapping and reduce memory usage.

References

References is not available for this document.