Submap-Based Indoor Navigation System for the Fetch Robot

In this paper, we present a novel navigation framework for the Fetch robot in a large-scale environment based on submapping techniques. This indoor navigation system is divided into a submap mapping part and an on-line localization part. For the mapping part, in order to deal with large environments or multi-story buildings, a submap mapping framework fusing two-dimensional (2D) laser scan and 3D point cloud from RGBD sensor is proposed using Google Cartographer. Meanwhile, several image datasets with corresponding poses are created from the RGBD sensor. Thanks to the submap framework, the error is limited corresponding to the size of the map, thus localization accuracy will be improved. For the on-line localization, so as to switch the submaps, the on-line images from the RGBD sensor are used to match the database images using DeepLCD, a deep learning based library for loop closure. Based on the information from DeepLCD and odometry, adaptive Monte Carlo localization (AMCL) is reinitialized to finish the localization task. In order to validate the result accuracy, reflectors and a motion capture system are used to compute the absolute trajectory error (ATE) and the relative pose error (RPE) based on the Gaussian-Newton (GN) algorithm. Finally, the proposed framework is tested on the Fetch simulator and the real Fetch robot, including both submap mapping and on-line localization.


I. INTRODUCTION
In recent years, indoor navigation has drawn increasing attention in the robotics research area. As the basis of a whole robot system, it provides the robot pose and the surrounding map in an unknown environment. Simultaneous localization and mapping (SLAM) is one of the commonly used methods in the indoor navigation [1]. Generally speaking, in order to solve the SLAM problem, it is divided into a front-end part, which mainly deals with the input sensor data, and a back-end part involving a non-linear optimization problem to create a consistent map or several submaps [2], [3]. Based on the different sensor types for the front-end part, the SLAM methods can be further divided into laser-based SLAM [4], [5], visual SLAM [6], sonar-based SLAM [7], and so on. For the laser-based SLAM, many systems, commonly combining with odometry and inertial measurement unit (IMU) The associate editor coordinating the review of this manuscript and approving it for publication was Hiram Ponce . information, create an occupancy grid map [9] using bundle adjustment (BA) [10] or a point cloud map using iterative closest point (ICP) [11]. As an important research direction, the SLAM problem can also combine with some other areas, such as path planning [12], [13] and control [14], called as active SLAM [15] and planning under uncertainty [16], to get more accurate results. This paper aims to solve the indoor navigation problem by obtaining the occupancy grid map and then performing the localization task in the obtained map.
Based on the fusion of data from a two-dimensional (2D) laser sensor and an RGB-D camera sensor, this work focuses on the navigation system of the Fetch robot using a deep learning-based SLAM system and multiple submaps. It provides mapping and localization information for the following planning and control system in the relatively large environment. We use Google Cartographer [4] to build multiple sequential submaps with the corresponding image library, then use DeepLCD [17] and the adaptive Monte Carlo localization (AMCL) Robot Operating System (ROS) package to VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ perform the on-line map switching and localization within multiple submaps.
As the state-of-the-art technology in the laser-based SLAM, Cartographer provides the real-time 2D/3D SLAM across multiple platforms and sensor configurations with a very good performance even in some challenging indoor environments. Similarly, AMCL, which implements the Kullback-Leibler distance (KLD)-sampling Monte Carlo localization approach, provides a 2D probabilistic localization system for the mobile robot. For the map switching, we use the DeepLCD C++ library, which is a convolutional autoencoder and widely used for online SLAM loop closure. A pre-trained model with a good image matching performance has been included in the DeepLCD C++ library, which is easy to use.
The main structure of this paper is shown as follows: In Section II, we review some literature to summarize the focused research area. We describe our focused problem and present our solution framework briefly in Section III. In Section IV, we describe the submap mapping part, including the sensor data fusion and submap mapping with the image databases. In Section V, we describe the on-line localization implementation using the deep learning based method and the AMCL method. Then, the performance evaluation methods for the proposed framework using the real Fetch robot are presented in Section VI. Following, the simulations and experiments are performed to verify the practicality of the proposed method in Section VII. Finally, in Section VIII, we make a brief conclusion for this paper.
The main contributions of this paper include: • The novel submap mapping framework with corresponding image databases.
• The application of DeepLCD for map switching and AMCL initialization.
• The novel error evaluation methods to get the absolute trajectory error (ATE) and the relative pose error (RPE) using reflectors and a motion capture system.
• The on-board implementation for the presented method using a Fetch robot.

II. RELATED WORK
At the early stage of the SLAM technology, the research topic focuses on the filter-based method, including Extended Kalman Filter (EKF) [18], [19], Particle Filter (PF) [20], Extended Information Filters (EIF) [21] and so on. In [22], the authors present a SLAM framework, based on the Kalman Filter (KF) and FastSLAM algorithm, which operates a Rao-Blackwellized Particle Filter (RBPF) to fuse the multiple sensors, including stereo cameras, inertial measurements, and robot odometry. In [23], the authors apply the RBPF method in the SLAM task of a restaurant service robot using a depth camera and a motion base with odometry and gyroscope. In the inertial navigation systems (INS)/wireless sensors networks (WSNs)-integrated navigation system, the authors present an adaptive iterated Extended Kalman Filter (AIEKF), which uses the noise statistics estimator in the iterated Extended Kalman Filter (IEKF) [24]. Recently, the state-of-the-art filter-based SLAM is the invariant EKF SLAM method [25], which uses a novel uncertainty representation to derive a right invariant error EKF and preserves the invariance property. In fact, this method is inspired by the deterministic nonlinear observer on Lie groups, for continuous-time systems with discrete observations [26]. The authors in [27] prove its convergence properties and compare it with the SO(3)-EKF, the Robocentric-EKF and the ''First Estimates Jacobian'' EKF. In [28], the authors present that its result accuracy is comparable even with the nonlinear least squares optimization based SLAM.
Because of the development of the sparse linear algebra, the SLAM technology develops from the filter-based method to the optimization-based method by exploring the sparsity of the factor graph. The optimization-based SLAM methods can roughly be divided into two categories: Ones use iterative nonlinear optimization methods, such as the Gauss-Newton method [29] or the gradient method, to get locally optimal solutions. And the other recent works show how to obtain globally optimal solutions of the SLAM problem via convex relaxations [30].
Due to the better consistency and efficiency, the optimization-based methods, especially for the pose-graph SLAM framework [31], are widely applied in the indoor navigation methods of multiple kinds of the robot systems. The authors in [32] present a mobile mapping system (MMS) of which the front-end is the map-based scan matching and back-end is the graph-based optimization. Recently, the authors in [33] develop a vertex-to-edge weighted closed-form algorithm to reduce camera drift for the dense RGB-D indoor SLAM using the factor graph optimization. Two popular state-of-the-art technologies in laserbased SLAM, Cartographer and RTAB-map [34], also use the pose-graph optimization method as their efficient back-ends.
It is easy to find that using only one kind of sensor is hard to offer the platform an accurate and robust solution. The data fusion between laser and camera is one of the popular solutions in multiple sensor SLAM. Based on posegraph optimization, the authors in [35] present a reliable global localization approach using both laser and camera sensors with the capability of addressing the kidnapped robot problem. Based on ICP and pose-graph optimization, the authors present a novel algorithm for fusing laser range finder and RGB-D data to accurately track the position of a robot and generate large 3D maps [36]. As the state-of-the-art technology, the authors in [37] present an EKF-based multisensor fusion framework to achieve practical autonomous navigation of UAVs in warehouse environments based on a highly-composed low-cost sensing system, which includes a 2D laser scanner, a 1D range sensor, three cameras (forward, upward and downward), and an IMU. In [38], using the relatively simple sensing system, the authors present a lidarinertial localization and mapping framework using the Gaussian preintegrated measurements.

III. PROBLEM DESCRIPTION AND SOLUTION FRAMEWORK
This paper aims to solve the indoor navigation problem for the Fetch robot in a large-scale unknown environment. The main task of the robotic indoor navigation problem is to map its surrounding environment and to estimate the robot poses during the moving process of the robot. The mapping and localization can be performed simultaneously or decoupling using the on-board sensing system. For Fitch robot, it is powered by a RGBD sensor on its head and a 2D laser on its base. In order to solve the real-world indoor navigation, we need to deal with the following challenges: • Many robust indoor navigation systems are based on the laser system. Compared with the 3D laser, the 2D laser is much cheaper. However, the indoor environment commonly includes some stereoscopic obstacles which are hard to be detected by the 2D laser.
• Getting a consistent map with high accuracy is difficult for large-scale indoor mapping. The accuracy of the mapping will directly affect the performance of the following localization.
• Even for some small scenarios, such as the multi-storey building with the elevators, to build a single global map is difficult.
• Evaluating the accuracy of the localization result is also necessary for the real-world implementation on the Fetch robot. The navigation system is the basis of the following planning and control steps. So as to deal with these difficulities, we present a complete and practical framework based on sensor data fusion, multiple submaps mapping, and deep learning based localization. The complete solution framework is shown in Fig. 1. We firstly combine the flattened RGBD data within the 2D laser data. By this way, we can partly map the stereoscopic obstacles using the 2D fused laser data, which can deal with the first challenge and help the robot to avoid these obstacles. Then, we use the fused data to build several submaps as well as several image databases, which include the poses of the robot corresponding to each library image. When the robot moves into an area corresponding to the new submap, by using the DeepLCD-based image matching, we can switch the submap. Then, based on the image databases and the pose transformation, we can initialize the AMCL particles in a relatively small area inside the new submap. Moreover, using AMCL, we can perform localization within the submap easily. Thanks to the submap mapping and localization, we can easily get the highly-accurate submaps and map the multi-storey building, which means that it helps us to overcome the second and third challenges. Finally, in this paper, we also present several practical evaluation systems to test the accuracy of the localization results, which is corresponding to the last challenge.

IV. SUBMAPPING BASED ON SENSOR FUSION AND IMAGE LIBRARY A. SENSOR DATA FUSION
In the indoor environment, because of the environment complexity in the vertical direction, using only a 2D laser is very dangerous for navigation. For many obstacles, including desks, chairs, beds and so on, only a small part will be detected by the laser scan, which makes the mapped obstacle areas seem to be obstacle-free. Different from the 2D laser, the RGBD camera sensor can offer a stereoscopic version for the robot navigation with a limited range. There is a SICK TIM571 laser and a Primesense Carmine 1.09 shortrange RGBD sensor locating at the bottom and head of the Fetch robot, respectively. We can easily read their output data from their corresponding ros-topics 'base_scan' and 'head_camera/depth_ downsampled/points'.
So as to solve these problems, we aim to merge the data from the 2D laser and the RGBD camera together for VOLUME 8, 2020 the mapping. The basic idea for sensor data fusion is to transform the 3D point cloud from the RGB-D camera into the frame of the laser, remove points that belong to the floor, convert the point cloud into a laser scan, and then combine the generated laser scan with the actual laser scan. The transform process can be written as: where X b is the transformed 3D point in the laser frame, R c b , and T c b are the rotation and translation between camera and laser frame, which are saved in the tf rostopic, p c is the original 3D point from the RGBD camera.
The 3D point cloud from the RGBD camera commonly includes the points on the floor, which may make the environment lack of free-space for the general laser mapping tool. In order to solve this problem, any transformed points lower than the level of the laser scanner will be deleted. We achieve this goal by transforming the points into the laser scanner frame and filtering any points with Z-axis values less than zero based on Point Cloud Library (PCL). The obtained 3D point cloud can be flattened into a pseudo laser scan so that it can be combined with the real laser scan. This operation created with the same field of view and step angle as the real laser scan, and these two are combined by saving the shorter range value into the combined laser scan data. The fusion process can be written as: where ρ is the selected fusion points, ρ 1 and ρ 2 are respectively the hit points of the laser point and the flattened RGBD camera point. dist( , x v ) means the distance between and the robot 'base_link' position. Fig. 2 shows this process and the fusion data. Finally, the generated data are published as the ROS topic 'base_scan_combined', and are subscribed by Cartographer for creating a 2D occupancy map.

B. SUBMAP MAPPING AND IMAGE LIBRARY
After obtaining the merged data, we will use it with Google Cartographer to build the 2D occupancy grid map. It is easy to know that if the mapping area is too big, with the growth of the map area, the consistency of the SLAM map will reduce greatly. So we prefer to use the submap to solve this problem and the obtained maps will have better accuracy compared with the global one [39]. When the Fetch robot moves more than L F (defined by user), we will build a new local submap based on the last pose of the previous submap. The last poses in the different maps will be saved as a list {P 2 1 , P 3 2 , P 4 3 , · · · , P n n−1 }. In every submap, we use the Google cartographer to subscribe to the ROS topic 'base_scan_combined'. As shown in Fig. 3, during these processes, we meanwhile subscribe to the RGBD images from the head camera when the robot meets some thresholds, such as the rotated angles θ r and the moved distance l r . These images will be saved as an image database for the submap switching during localization. At the same time, the local poses and the submap number corresponding to the images will also be recorded. In this process, we encounter an issue for the fused laser scan, due to the significant differences of the ranges and fields of view of the two sensors ( RGB-D camera: 3 m maximum range and 54 • horizontal field of view; Laser scanner: 25 m range and 220 • horizontal field of view). We find that the occupancy grids will be quickly erased once the obstacles are out of range of the RGB-D camera. In order to solve this problem, we successively build two occupancy grid maps to keep the point cloud obstacles by adjusting the parameter ''insert_free_space''. One map containing all obstacles but no unoccupied cells and the other one with ''insert_free_space'' setting to true. Finally, the two maps are combined to produce a map that contains both free space and all obstacles. For every pixel of these two maps, the combination process can be written as: where ψ, ψ 1 , and ψ 2 are respectively the grayscale of the corresponding pixels in the combined map, the map (''insert_free_space''= false), and the map (''insert_free_space''= true); H r1 and H r2 are the grayscale thresholds. Commonly, H r1 is close to 1, which means that the pixel is occupied, and H r2 is close to 0, which means that the pixel is obstacle-free. By this way, the missing occupancy grids corresponding to the 3D RGBD data will be added into the combined map, as shown in Fig. 4.

V. SUBMAP LOCALIZATION BASED ON DEEP LEARNING
By the above steps, several continuous submaps are built with an image database. Our following steps aim to deal with the submap switching and localization in multiple submaps.

A. SUBMAP SWITCHING BASED ON DeepLCD
Thanks to the image database, we use the deep learningbased method (DeepLCD) to deal with the map switching.
DeepLCD is a C++ library for the on-line SLAM loop closure. Our submap switching method uses the image matching, similar to the loop closure, to identify the number of the submaps. Based on the pre-trained model in DeepLCD, we can match the real-time image from the head camera with the image database. In this process, every image will be scored by the DeepLCD network. If more than N t obtained camera images are continuously matched with the image database, the submap will be switched and the map for localization will be set as the corresponding submap. As shown in Fig. 5, after enough matching between the camera images with the image database, the last matched image will be selected to output its corresponding pose (red one) p 1 v .

B. LOCALIZATION INITIALIZATION USING IMAGE LIBRARY AND POSE TRANSFORMATION
After the map switching, in order to use the AMCL method, we need to reinitialize the localization by sampling some particles in some areas. However, the rough global sampling is not robust and is also slow for the particles converging, so it is common to sample in a small area. This section aims to find a relatively accurate area for the AMCL initialization by the image libraries and the pose transformation. Besides this initialization pose , · · · , P n n−1 }. Using the AMCL, we can get the localization result of the robot: Because the last pose of the previous map is the same as the first pose of the latter map, this pose is switched by the j + 1-th submap, we can use the following equation: Even though the DeepLCD algorithm commonly obtains the good matched pose p 1 v , it is still hard to guarantee that we can get the robust switched poses. So, in order to get the more robust result, we combine the above two poses and initialized the area for the AMCL algorithm by the following equation: where x c is the center position of the initialized area for the AMCL algorithm, C is the index which means that when these two poses get closed to each other, we will use the coordinate of the transformed pose x j t , and otherwise, we will believe the matched pose p j v . Based on the obtained center point x c = (x c x , x c y ), we can define a big squared area, of which 4 vertexes are (x c , and (x c x +2, x c y +2), for the initialization of the AMCL method. The particles will be spread randomly in this squared area. Then, using the AMCL, the particles will converge after matching the laser scan with the mapped environment.

VI. PERFORMANCE EVALUATION
In order to verify the accuracy of the localization results in the submaps, we present two ways to evaluate the relative position error and the absolute pose error respectively based on the reflectors and the motion capture system. The laser reflector is a reflective material, which can increase the intensity of the laser scan. By reading the intensity and angle of the laser, we can compute the reflector coordinate based on the laser frame. The motion capture system is a tool to obtain the marks trajectory and the poses of the rigid body in its defined coordinate system. We use the industry product from Opti-Track company, which can help us to get the coordinates of the marks at a very high rate (more than 40 Hz). The relative position error is defined as the distance error of two poses, of which distance values are obtained by the AMCL localization results and the other localization results. The absolute pose error is defined as the absolute coordinate errors between the estimated poses based on the AMCL method and the ground truth based on the simulation or the highly accurate localization results using the motion capture system.

A. RELATIVE POSITION ERROR EVALUATION BASED ON REFLECTORS
For the relative position error evaluation, the reflectors, which can increase the intensity of the laser scan, are applied. As shown in Fig. 6, two reflectors are set using a tape measure. Because of the intensity change of the laser on the edge of the reflector, we can easily identify the order of the laser ray in a scan and further compute its coordinate based on the 'laser_link' frame of the robot. As shown in Fig. 6, the coordinates of the left edges of two reflectors are e k = (L k cos(θ k ), L k sin(θ k )) and e m = (L m cos(θ m ), L m sin(θ m )).
We can see the reflectors as the poses by also identifying the coordinates of two edges of the reflector, which can be written as: (e k , R k ) and (e m , R m ).
Setting the left edge of the reflector 1 as the original point, based on these two reflectors, we can further estimate the coordinate of the robot pose X ref by the following equations: where S is the distance between two left edges of the reflector. This equation aims to combine the coordinates of the robot obtained using two reflectors by the weighted way to improve the accuracy of the coordinate. As shown in the last figure of Fig. 6, we can move the robot to get two poses and their corresponding distances using the AMCL method and the reflectors. The uncertainty level of the distance results using the reflector is about 1cm, which can be seen as the ground truth. The other estimated distance by the AMCL method is defined as: Then, we can define the relative position error by |D amcl −D ref |, of which meaning is to obtain the distance error between two poses using the localization result and the ground truth.

B. ABSOLUTE POSE ERROR AND RELATIVE POSITION ERROR BASED ON MOTION CAPTURE SYSTEM
As a more accurate index, the absolute pose error requires the more accurate localization system and the synchronization method to get the better ground truth result. So we use the motion capture system to get the ground truth data. For the record, we can also simply calculate the obtained absolute results to get the relative position error.

1) COMPUTATION OF THE base_link FRAME
The motion capture can get the trajectories of the marks with 1mm accuracy. We lay 9 marks on the pedestal of the Fetch robot, as shown in Fig. 7. Using these 9 marks, we aim to estimate the accurate trajectory for the base_link frame of the Fetch robot. Because the rotation center is the same as the base_link frame of the Fetch robot, we firstly control the Fetch robot to move with several circles without any translation under the record of the motion capture system. By this way, theoretically, the trajectories of the marks FIGURE 7. 9 marks on the Fetch robot.
x mark i (t) = (x mark i (t), y mark i (t)) will locate at several concentric circles. However, because of the noises, we will use the least squares method to estimate the real trajectory of the base_link frame.
As shown in Fig. 7, assuming the trajectories of the marks are x mark i (t), i = 1, 2, · · · , 9, the rotation center of the robot is x g i and their corresponding radius are l mark i , i = 1, 2, · · · , 9, we can build the following optimization problem: where t ∈ Z means the way-point number of the marks.
In fact, the above optimization problem has a closed form solution, which is obtained by setting the derivation as 0.
The obtained x g = (x g , y g ) and l mark i meet: , Then, we can obtain all radius of these circle trajectories l mark i by (8). In the new experiments, we begin to move the robot by the random way instead of the circles. Because the relative positions of the rotation center and the marks are solid, in the moving process of the robot, we can use the relative distances to estimate the robot 'base_link' frame using the trajectories of the marks. Assuming the trajectories of the marks are x mark i (t), i = 1, 2, · · · , 9, with the known of the radius l mark i , we can get the robot trajectory of the robot 'base_link' frame x ref (t), which will be regarded as the ground truth, by the following optimization problem.
It is noted that, for this new optimization, its 'base_link' frame x ref (t) at t-th step is just related to the coordinates of the marks at t-th step. It is independent of the other steps. We can solve this weighted non-linear least squares problem by the Gaussian-Newton (GN) method.
Let's consider the following problem: For a measurement vector z = F(x) + W , which W meets W ∼ N (0, P z ), let's consider the following problem: To find x ∈ R n to minimize Its iteration solution at k-th step is shown as follows: where J k is the Jaccobian matrix corresponding to the vector x k at k-th step. Based on (10) and P z = I, we can obtain the solution x ref (t).

2) SYNCHRONIZATION
Even though we can get the ground truth data by the above section, because of the different coordinate systems, the different time-stamp, and the different frequency, we still need to deal with the synchronization problem. The goal of the synchronization is to make that the absolute pose error is close to 0, so we can formulate the following problem: In fact, it is noted that the normal ICP method does not require the same point number. The ICP result can be seen as the initialization of the above problem (11). VOLUME 8, 2020 For the time difference t, we also need to solve the interpolation problem. The frequencies of the AMCL result and the ground truth are different. Commonly, the frequency of the ground truth data is much higher, so we use the principle of contiguity to find the nearest data in the ground truth data x ref (t) after applying the ICP transformation result. Then all the potential time differences t ∈ (t a , t b ) are tested t a + kt 1 , k = 1, 2, · · · , [(t b − t a )/t 1 ] exhaustively with an interval t 1 = 0.1s, where [(t b − t a )/t 1 ] means the maximal integer smaller than (t b − t a )/t 1 . The objective function of the exhaustive method is to minimize the absolute pose error after applying the interpolation operation and the contiguity law. By this way, we can easily get the sub-optimal time difference t sub as the initialization for the following optimization on the absolute pose error. This process is shown in Fig. 8.   Fig. 9: Based on this framework, we can get the corresponding poses using the AMCL results and the motion capture system. Then, according to the definition of the absolute pose error, we can compute the absolute coordinate error of each pose along the whole trajectory. Then, we can also pick out several poses to compute the relative position error easily.

VII. SIMULATIONS AND EXPERIMENTS
In this section, we present several simulations and experiments using the Fetch simulator and the real Fetch platform to verify our proposed framework.

A. SIMULATION 1) SIMULATION IN A SMALL OFFICE ENVIRONMENT
We apply the Fetch simulator in a small indoor environment to test the data fusion result and the AMCL result in only one submap. The results are shown in Fig. 10. The top-right small sub-figure is the simulation environment with multiple 3D obstacles, including the Fitch simulator, a desk, a human, a cupboard, and several colorful objects.
As shown in Fig. 10, benefiting from the fusion of the camera data and the laser data, we can build a 2D occupancy grid map with some stereoscopic obstacles, which are hard to be detected by the 2D laser. We also test the AMCL in this environment. The small red vectors around the robot show the  estimated pose and its covariance. The results indicate that the absolute pose error is smaller than 10cm compared with ground truth data, which means that it can reach a good result accuracy.

2) SIMULATION IN A LARGE-SCALE 'Willow' ENVIRONMENT
In order to further test the mapping and localization performance of the presented framework, we test the Fetch simulator in a relatively large simulation called 'Willow'. The mapping and localization results are shown in Fig. 11 and Fig. 12. In Fig. 11, the left-down sub-figure shows the simulated environment, which is an office building, and the large one is the built map. In the left-hand side of Fig. 12, we present the compared trajectories, which include the black ground truth data, the blue AMCL estimated data, and the red  difference data. The right one shows the absolute pose errors varying with the simulation time.
Based on Fig. 11 and Fig. 12, we can get that the presented framework can get an accurate mapping and localization result in one submap. The largest absolute pose error is 6cm and the mean absolute pose error is smaller than 2cm. These indexes show that the localization accuracy is high in this relatively large-scale simulation environment. Meanwhile, the error figure also shows that the level of the error is consistent and limited.

B. EXPERIMENTS 1) MAPPING USING MULTIPLE SUBMAPS AND BUILT IMAGE DATABSE
In this sub-section, we will build multiple submaps in a real indoor environment, which is in University of Technology Sydney (UTS) Tech Lab. The area of the mapping space is more than 600 m 2 . Meanwhile, we will build a corresponding database during its moving process.
We sequentially build three maps and its image database, as shown in Fig. 13a, Fig. 13b, Fig. 13c, and Fig. 13d. The small figures on the left-hand side of the maps are their corresponding topographic maps. We can simply find that every built submap can well map the surrounding environment with the flatten obstacles. These highly-accurate submaps help the robot to obtain the good localization result. We also show the global map if we do not use the multiple submaps,  The above result shows that, based on our framework, we can build multiple submaps with an image database within several minutes.

2) LOCALIZATION IN MULTIPLE SUBMAPS
Using the built map and the image database, based on our presented framework, we can easily switch map and perform the localization process in the moving process of the Fetch robot.
Our method can switch the map by using DeepLCD-based image matching. And then initialize the AMCL particles within the switching area. The on-line map matching results are shown in Fig. 15 and Fig. 16. In every sub-figure, we can find two images. In Fig. 15, when the on-board camera gets the image and it is matched with the image library, our submap will be switched and the AMCL method will be applied in the new submap. We also output a result, shown in Fig. 17, to show that, after the map switching, the particles will converge successfully based on the given switching area.
The photos of the map switching and localization using the Fetch robot are shown in Fig. 18. The localization trajectories in submaps are shown in Fig. 19.   These results show that our framework can perform the online map switching and localization in the real-world environment successfully.

3) ERROR EVALUATION FOR THE EXPERIMENTS USING REFLECTORS
Firstly, we use two reflectors to get the relative position error. The experiment process is shown in Fig. 20. The reflectors are pasted on the wall based on the height of the 2D laser. Then, the robot moves following the trajectory similar to Fig. 6. Finally, we can output the estimated poses based on the coordinates of the reflectors.
Based on the reflectors and method presented in Section VI-A, in 3 experiments, our relative position errors  using the AMCL method in one of the submaps are 4.71cm, 4.19cm, and 3.58cm, respectively. It is easy to find that our framework in one of multiple submaps can reach a good localization performance. The results also show that the lowcost reflector-based evaluation is a simple and practical way to evaluate the localization accuracy of the Fetch robot.

4) ERROR EVALUATION USING THE MOTION CAPTURE SYSTEM
Then, in order to get the absolute pose error using the motion capture system, we need to estimate the distances between the marks and the robot rotation center. We firstly implement an experiment that makes the robot only perform the rotation action without any translation. By this way, we can record the trajectories of all marks. Using the method mentioned in (8), we can get that the distances between the marks and the center point are respectively 0.2149m, 0.2546m, 0.2249m, 0.2501m, 0.2293m, 0.2552m, 0.2591m, 0.2635, and 0.2609m. The results are shown in Fig. 21 and Fig. 22.  Based on the motion capture system, the trajectories of all marks (blue lines) are captured at a very high rate (more than 40Hz). Using these distances, we can get the ground truth data and the absolute pose error of the experiment 1 based on the trajectories of 9 marks using the synchronization technologies, as shown in Fig. 23 and Fig. 24. In the upper figure of Fig. 24, we show the compared result of the estimated poses and the ground truth trajectory after using the transformation. Their ATEs varying with the simulation time are shown in the lower one after applying the syncronization result.  Based on the results, we can see that the synchronization technologies help to match the trajectories and compute the absolute pose error accurately. It is easy to find that the maximal absolute pose error will be smaller than 12cm and its mean error is about 2.38cm, which means that our framework can get a good localization result in a submap. We also repeat multiple times to perform the ATE experiments using the motion capture, named experiment 2-4. The mean ATEs of the rest of these experiments (Experiment 2-4) are respectively 1.77cm, 2.49cm, and 2.01cm. The final statistical results using the box chart are shown in the following Fig. 25. For the box chart, the lines from the top to the bottom are respectively the minimal, one fourth, half, three fourth, and maximal ATE values except the outliers. The outliers are shown in the start points. After using the synchronization technology, all experiments based on the motion capture system verify that the accuracy of the AMCL localization results in submaps can meet the requirement of the indoor navigation system of the Fetch robot, which is commonly set about 10 to 15 cm.
Of course, we can also output the relative position error using the motion capture system. An example to obtain the  relative position error based on the synchronization results is shown in Fig. 26. In the result, we first obtain the synchronized trajectories, and then pick out several lines (red and blue lines) based on the poses of the AMCL results and the motion capture data. Finally, we compute the difference of the lengths of the lines and obtain the relative position error. The RPE of this example is 1.47cm. We test multiple times to obtain the RPEs for the different trajectories. The RPEs are respectively 2.11cm 0.98cm, and 1.27cm. The obtained RPEs further prove the accuracy of the localization results. All these results also prove that this motion capture based evaluation system is practical and suitable for the Fetch robot.

VIII. CONCLUSION
This paper presents a practical indoor navigation system for the Fetch robot based on submaps and DeepLCD using Cartographer and AMCL. The navigation system is divided into the mapping part and the on-line localization part. Several sequential submaps are mapped by the fusion data from the 2D laser scan and the RGBD camera sensor. Meanwhile, the image library is also built corresponding to the robot trajectory for the following image matching. Based on the deep learning based image matching results, the AMCL method is initialized and performed in every submap to achieve the highly-accurate localization result. The novel evaluation methods for the indoor navigation are presented and applied using the reflectors and the motion capture system. The simulation and experimental results, using the Fetch robot, demonstrate that the presented framework can complete the real-world mapping and the localization tasks with a robust and accurate performance.