A SLAM Map Restoration Algorithm Based on Submaps and an Undirected Connected Graph

Abstract: Many visual simultaneous localization and mapping (SLAM) systems have been shown to be accurate and robust, with real-time performance, on both indoor and ground datasets. However, these methods can be problematic when dealing with aerial frames captured by a camera mounted on an unmanned aerial vehicle (UAV), because the flight height of the UAV can be difficult to control and is easily affected by the environment. For example, the UAV may be shaken or experience a rapid drop in height due to a sudden strong wind, which may in turn lead to lost tracking. What is more, when photographing a large area, the UAV flight path is usually planned in advance, and the UAV does not generally return to previously covered areas, so if the tracking fails during the flight, many areas of the map will be missing. To cope with lost tracking, many visual SLAM systems employ a relocalization strategy: the tracking thread continues to work online by inspecting the connections between the subsequent new frames and the map generated before the tracking was lost, and the new frames are then localized under the coordinate system of the generated map if these connections are found. However, the relocalization strategy has a fatal drawback: the state of lost tracking persists until the camera moves back to a place observed before the tracking was lost and the tracking and mapping can be restarted. As a result, a part of the map will be missing, from the place where the tracking was lost to the place where the camera can be relocalized. To solve this missing map problem, which is an issue in many applications (e.g., 3D position information for emergency relief), we present a method, based on monocular visual SLAM, for reconstructing a complete global map of UAV datasets by sequentially merging the submaps via the corresponding undirected connected graph.
Specifically, submaps are repeatedly generated, from the initialization process to the place where the tracking is lost, and a corresponding undirected connected graph is built by considering these submaps as nodes and the common map points within two submaps as edges. The common map points are determined by the bag-of-words (BoW) method, and the submaps are merged if they are found to be connected with the online map in the undirected connected graph. To demonstrate the performance of the proposed method, we first investigated its performance on a UAV dataset, and the experimental results showed that, in the case of several tracking failures, the integrity of the mapping was significantly better than that of the current mainstream SLAM method. We also tested the proposed method on both ground and indoor datasets, where it again showed a superior performance.


Introduction
Simultaneous localization and mapping (SLAM) is a technique for obtaining 3D geometric information of an unknown environment and estimating the sensor pose in the corresponding environment. As such, the SLAM technique has a very wide application potential in automatic driving, augmented reality, virtual reality, mobile robots, and unmanned aerial vehicle (UAV) navigation [1]. With the ongoing development of sensor and computer vision technology, many kinds of sensors have been integrated in SLAM systems (such as LiDAR, GPS, and inertial measurement unit (IMU) sensors) [2]. However, SLAM based only on cameras has been actively studied because the sensor configuration is much simpler. SLAM based on stereo cameras has been widely used with both indoor and ground datasets [3]; however, this setup is not suitable for UAVs (especially micro aerial vehicles), since the length of the baseline is normally only 20 cm, which is quite short when the images are captured from a height of nearly 1 km [4]. SLAM based on a single camera, which is also called monocular visual SLAM, has received extensive attention and has been widely studied for its simplicity and cheapness. Thus, we developed the proposed method using only monocular visual SLAM. In monocular visual SLAM, without the assistance of any other sensors, lost tracking often occurs for many reasons, such as image blurring caused by the camera moving too fast (UAV platforms, in particular, are quite easily affected by random shaking caused by uncertain factors such as wind), illumination variation, and weak scene texture. Once the tracking of a monocular visual SLAM system (such as ORB-SLAM2) is lost, the motion of the subsequent video frames and the information of the corresponding map points cannot be generated, which leads to an incomplete map.
Figure 1 depicts a set of UAV data, where there were three interruptions during the flight. Figure 1a shows UAV data from Wuhan University. Figure 1b shows the flight trajectory of the UAV and the locations of the interruptions (marked by the red rectangles).
Figure 1c is the processing result of the ORB-SLAM2 system, where it can be seen that the result is only partial, due to the interruptions. To solve the lost tracking problem, the current visual SLAM systems employ a relocalization strategy, which makes the tracking thread continue working online by inspecting the connections between the new frames acquired after tracking was lost and the map generated before tracking was lost, and localizes the new frames under the coordinate system of the generated map if the corresponding connections are found. However, there is a crucial disadvantage to this relocalization strategy: the map information and the corresponding camera motion are not estimated and will be missing from the place where the tracking failed to the place where a new frame can be successfully relocalized.
To recover this missing information, including the corresponding map and camera motion when unexpected lost tracking occurs in monocular visual SLAM, and to provide a complete global motion trajectory and map, which is often required in many applications (e.g., 3D position information for emergency relief), we present a method of recovering the missing map parts using submaps and an undirected connected graph. This is integrated with the local mapping of the tracking or the loop closing thread, based on the widely known ORB-SLAM2 package [5]. A general overview of the proposed system is shown in Figure 2. Our main contributions are threefold. Firstly, when lost tracking occurs, we generate a corresponding submap, directly begin new map initialization, and continue the tracking and mapping.
Secondly, we present a method to reconstruct a complete global map and camera motion trajectory by using the generated submaps and the corresponding undirected connected graph.
Finally, we demonstrate the proposed method's performance via evaluation on UAV images, with ground and indoor datasets also tested to further show the method's capability. The rest of this paper is organized as follows. In Section 2, we review the related works.
Section 3 describes the proposed method of reconstructing a complete global map and camera motion trajectory. Section 4 presents the results of the experiments conducted on various datasets. Finally, Section 5 concludes the paper.

Related Works
Over recent decades, real-time visual SLAM has been broadly investigated in various fields, such as automation and robotics, computer vision, and photogrammetry. In this section, we review some of the state-of-the-art works in visual SLAM research. Specifically, we focus on studies of motion tracking and the solutions for lost tracking and submaps in SLAM.

Solutions for Motion Tracking
There has been a great deal of research on improving tracking performance and attempting to decrease the possibility of lost tracking.In visual SLAM, these works can be generally classified into three categories: feature-based methods, direct methods based on photometric consistency, and multi-sensor aided tracking methods.

Feature-based methods
Oriented FAST and rotated BRIEF (ORB) features are oriented multi-scale features from accelerated segment test (FAST) corners with 256-bit descriptors, which are widely used in feature-based SLAM, mostly because they are extremely fast to compute and have good invariance to the viewpoint [6]. However, the corresponding processes of feature point extraction and matching can easily be negatively affected by changes of illumination and view angle, as well as weak texture, which leads to lost tracking. In order to relieve this drawback and improve the tracking performance, other types of features, such as plane features and line features, have been integrated with visual SLAM.
For example, Lee et al. [7] used only plane features in the tracking, which is an approach that is mostly applicable to an environment dominated by plane features; Taguchi et al. [8] presented a method combining point and plane feature primitives to obtain a minimal set of primitives in a RANSAC framework to robustly compute the correspondences and estimate the sensor pose; Raposo et al. [9] adopted plane features as the primitives in visual odometry, and point features were only extracted when the plane features were insufficient to determine the sensor motion; and Concha et al. [10] proposed an approach based on superpixel region matching, which was shown to be more reliable in tracking than the traditional point-feature-based methods for indoor scenes with weak texture and rare features, but this approach suffers from a limited accuracy in complex outdoor scenes.
Some studies [11][12][13] have used different line segment parameterization approaches and have tried to use line features for solving the motion tracking, essentially employing the two endpoints to describe and track a line segment. Jeong et al. [14] used both 3D line and corner features as landmarks in tracking under an extended Kalman filter framework. Based on ORB-SLAM2, Pumarola et al. [15] presented a line segment detector method to achieve the extraction and matching of corresponding straight lines, combined with ORB features for the tracking, which turned out to be more reliable in scenes with abundant line features and rare point features. Lee et al. [16] extracted the missing points with lines to improve the tracking accuracy and robustness in indoor scenes.

Direct methods based on photometric consistency
The extraction and matching of feature points can be time-consuming, and in a scene with weak texture only very rare features may be detected, which may give a low accuracy or even failure in pose estimation. What is more, these feature points represent only a very small part of the image information, which can be improved by considering all the information given in the image. Thus, the direct methods based on global pixel information assume that the image intensity of the same spatial point should be consistent in the corresponding neighboring images, and the position and orientation of the camera are estimated through minimization of the photometric error. Direct sparse odometry (DSO) [17] combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all the model parameters, including the geometry, represented as the inverse depth in a reference frame, and the camera motion. In contrast to the feature-based methods, the direct methods can utilize all the information in the image and have a higher tracking accuracy and robustness in scenes with only very rare features. Dense tracking and mapping (DTAM) [18] involves selecting frames and then computing detailed textured depth maps to generate a surface patchwork and build dense maps with GPU acceleration, which are then used for the tracking by comparing the new frames with these dense maps. This approach effectively reduces the uncertainty of pose estimation, and many semi-dense algorithms based on edge and corner features have been proposed. For example, semi-direct visual odometry (SVO) [19] involves extracting the FAST feature points in the image and then estimating the camera pose transformation by the direct method, according to the information around the feature points. Instead of feature extraction, the large-scale direct SLAM method (LSD-SLAM) [20] computes the depth of the semi-dense points with abrupt gradient changes, such as edges and corners, on the basis of SVO, and it uses the same idea as DTAM for the tracking, which is improved by considering the geometric consistency and loop closure. As a result, LSD-SLAM can deal with scenes with a weaker texture and larger scale, and can be run on a CPU in real time.

Multi-sensor aided tracking methods
To improve the tracking performance, some methods have attempted to integrate multiple sensors into visual SLAM. For example, Leutenegger et al. [21] presented a vision and IMU data combination algorithm, in which the camera pose is computed and optimized by marginalization, which has contributed to the rapid development of multi-sensor fusion algorithms; the VINS-Mono method, developed by Qin et al. [22], involves embedding a low-cost IMU into visual-inertial odometry, where a tightly coupled, nonlinear optimization-based method is presented to fuse the IMU and feature observations, which can obtain absolute pose estimation and reduce the risk of lost tracking; and Bu et al. [23] proposed a real-time mosaicking system for UAV video fusing GPS data, so that the camera's pose in the WGS84 coordinate framework can be estimated without ground control points. However, GPS signals are easily obscured on the ground, so the improvement in robustness for SLAM tracking on the ground is still limited.

Solutions After Tracking is Lost
The above tracking methods normally work well; however, tracking can often be lost in practice, e.g., through rapid motion change, poor texture for the feature-based methods, illumination changes for the direct methods, and GPS-denied environments for methods using GPS or IMU signals. Currently, to deal with lost tracking and ensure that the tracking thread works for a long period of time, most of the monocular visual SLAM methods start relocalization detection after tracking is lost to determine the current pose of the camera [4,17,19,22,24]. Relocalization strategies, which are similar to the detection of loop closure, can be roughly divided into two types: image-to-image and image-to-map.
The image-to-image methods use a visual BoW model to describe the images by combining word bags and feature points, with the basic rationale being that corresponding images should share the same visual words. Thereby, the similarity between images can be efficiently determined, and after matching similar images, the corresponding relative poses can be solved by using either the five-point algorithm for an essential matrix or the eight-point algorithm for a fundamental matrix. The relocalization problem can thus be considered as an image retrieval problem (a similar problem exists in structure-from-motion (SfM) [25][26][27]), addressed by employing random forests or hash-based image retrieval methods. The image-to-image relocalization strategy is employed in filtering-based SLAM [28] and the state-of-the-art frame-based SLAM [24,29].
After similar images are found, the pose of the camera can be recovered only if there are enough correspondences between the current image and the previous ones, so that the system is able to continue tracking and mapping under the coordinate system of the previously reconstructed map.
The idea of image-to-map matching is to determine the connection between the reconstructed map and the current new frame. Specifically, they are connected if there are enough reconstructed map points that can be observed by the current new frame, and vice versa. In the filtering-based SLAM framework, Williams et al. [30] proposed a three-point-pose algorithm combined with the RANSAC algorithm to determine the position and pose of the current camera relative to the map. In the frame-based SLAM framework, Straub et al. [31] proposed to match the descriptors of the current frame with the descriptors associated with the map points stored in the map, to estimate the pose of the current frame. However, image-to-map matching involves a large amount of calculation and is slow. Therefore, Moteki et al. [32] proposed selecting an image-to-image or image-to-map method based on the geometric model between the current frame and the target frame, which is a more efficient approach.
The above-mentioned relocalization strategies have been used in some widely known visual SLAM systems, such as ORB-SLAM2, but they have obvious defects when dealing with tracking failure, as they require the current new frame to have a high similarity to the reconstructed map or the oriented frames. If a similar scene is not detected after the system tracking is lost, the system will remain in the lost state and cannot continue mapping and positioning until the relocalization is successful. As a result, the corresponding map information (consisting of the map from the place where the tracking was lost to the place where the relocalization was successfully solved) cannot be recovered.

Submaps
The concept of the "submap" for SLAM was developed by Ni et al. [33], and is defined as a local map with a local coordinate system and frames with known relative poses and 3D map points. A global map consists of several overlapping submaps covering different parts of the entire scene.
The submap was originally proposed to efficiently solve the problem of the high computational cost of global optimization with limited computational resources [34,35]. The global map is divided into several overlapping submaps. The submaps are first optimized individually, and then each submap is taken as a whole for the global optimization. This strategy of using submaps can effectively reduce the computational cost while obtaining near-optimal results [36][37][38].
In the proposed method, the submaps are generated to solve the problem of missing map information, where the map from the place of the previous initialization to the place where the lost tracking occurred forms a new submap. Specifically, when lost tracking occurs, the subsequent new frames are directly reinitialized, and the new frames are tracked in the new coordinate system. The map obtained before lost tracking occurs again is the corresponding new submap. In this way, submaps are recursively generated. The multiple submaps are then connected to form a global map describing the complete scene.
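This bookkeeping can be sketched as follows. The `Submap`/`SubmapManager` types here are hypothetical illustrations, not ORB-SLAM2 classes: when tracking is lost, the active map is pushed onto a stack of submaps and a fresh map is initialized in a new coordinate system.

```python
from dataclasses import dataclass, field

@dataclass
class Submap:
    frames: list = field(default_factory=list)      # keyframes with local poses
    map_points: list = field(default_factory=list)  # 3D points in the local frame

class SubmapManager:
    def __init__(self):
        self.stack = []          # previously generated submaps S1..Sn
        self.active = Submap()   # map currently being tracked (C)

    def on_tracking_lost(self):
        """Save the current map as a submap and reinitialize a new one."""
        if self.active.frames:
            self.stack.append(self.active)
        # tracking restarts under a new local coordinate system
        self.active = Submap()

mgr = SubmapManager()
mgr.active.frames.append("frame_0")
mgr.on_tracking_lost()
assert len(mgr.stack) == 1 and mgr.active.frames == []
```

Each saved submap later becomes a node of the undirected connected graph used for merging.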
To solve the problem of missing map information due to tracking failure and to provide a complete global map and motion trajectory, we present a map restoration fusion method based on the generated submaps and the corresponding undirected connected graph. The proposed method is based on the monocular ORB-SLAM2 framework. After a system tracking failure, even if the relocalization requirements cannot be met, the tracking and mapping threads can still continue, and the corresponding submaps are saved and eventually joined together into a complete map. In theory, the proposed method is applicable to all the previous methods (such as SVO, DSO, and ORB-SLAM), and can act as an effective supplement to the existing methods, to ensure the completeness and accuracy of the mapping. In this section, we introduce the details of the proposed algorithm. Because the motion trajectory of aerial photogrammetry is relatively simple and regular, we use a UAV video dataset to illustrate the proposed method, but the method is also applicable to ground data.

Method and Thought
The light blue band in Figure 3 represents the UAV's motion trajectory. The images in the figure represent the frames. The red rectangles indicate that the matching failed at these points and the tracking was lost. The green lines indicate that the images can be successfully matched and that there is a connected relationship. Figure 4 shows the processing results obtained by ORB-SLAM2. The frames covered by light red represent the missing maps due to tracking failure.
In this paper, the red rectangles in Figure 3 are used as intervals, and the frames are grouped according to the acquisition order to obtain Figure 5a. Each set of frames and their corresponding map points constitutes a local submap in the global scene map, represented as (A, B, C, D, E, F, G, H, I). The system is based on ORB-SLAM2, with the following processes added:

• When the tracking fails, the system is reinitialized and a new map is built.

• Retrieval of connected frames. For the current frame in the map, the BoW model is used to retrieve frames in the other submaps. Frames matching the current frame (meeting a certain threshold) are found, and the retrieved frames are called connected frames.

• The strength of the connections between the submaps is measured.

• The optimal connection based on the undirected connected graph is selected.

When the Tracking Fails, the System is Reinitialized to Build a New Map
The main task of the visual SLAM tracking module is to output the camera poses and determine the frames in real time, forming an unoptimized visual odometry. When the tracking fails, ORB-SLAM2 performs the relocalization operation. If the relocalization is not successful, all the frames that have not been successfully relocalized will be lost until the relocalization succeeds. In order to avoid missing maps, when the tracking fails, the previous map is saved as a submap, and the system is reinitialized immediately to build a new map.

Retrieval of the Connected Frames
For a newly created map, the system establishes whether the current frame forms a connected relationship with the other submaps. The specific operation is as follows, adopting an adaptive threshold selection method to retrieve the connected frames. Firstly, the words in the BoW model of the current frame are calculated. The number of words in common between the current frame and the adjacent frame is denoted as w0 (the adjacent frame refers to the previous frame of the current frame), the number of words in common between the current frame and the nearby frame whose overlap with the current frame is about 50% is denoted as w1 (the nearby frame refers to the frame closest to the current frame in the current map), and the proportionality coefficient kw = w1/w0 is calculated. Then, according to the number of common words of the BoW model, all the frames in the other submaps are searched to find the frames sharing words with the current frame. The maximum number of common words among these frames, scaled by kw, is set as the threshold, i.e., the frames with an overlap of more than about 50% are selected as candidate frames, as shown in Figure 6. Similarly, the BoW score between the current frame and the adjacent frame is denoted as s0, the BoW score between the current frame and the nearby frame whose overlap with the current frame is about 50% is denoted as s1, and the proportionality coefficient ks = s1/s0 is calculated. The highest BoW score between the current frame and the candidate frames, scaled by ks, is then set as the threshold, and the connected frames are selected based on the BoW score. The function that retrieves the loop frames in the ORB-SLAM2 loop closing thread is then called, with the original fixed thresholds (0.8 and 0.75) replaced by the previously calculated thresholds, and the frames that meet the conditions are finally selected as connected frames.
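The word-count stage of this adaptive-threshold retrieval can be sketched as follows (an illustrative sketch, not the ORB-SLAM2 API). Frames are represented as sets of BoW word ids, and the threshold is derived from the current frame's own neighbourhood instead of a fixed constant:

```python
def retrieve_candidate_frames(cur_words, adj_words, near50_words, other_frames):
    """other_frames: {frame_id: word_set} drawn from the other submaps."""
    w0 = len(cur_words & adj_words)     # words shared with the adjacent frame
    w1 = len(cur_words & near50_words)  # words shared with the ~50%-overlap frame
    if w0 == 0:
        return []
    k = w1 / w0                         # proportionality coefficient
    common = {fid: len(cur_words & ws) for fid, ws in other_frames.items()}
    if not common or max(common.values()) == 0:
        return []
    threshold = k * max(common.values())  # adaptive common-word threshold
    return [fid for fid, c in common.items() if c >= threshold]

cur = set(range(10))
adj = set(range(8))                 # 8 common words
near = set(range(4))                # 4 common words -> k = 0.5
others = {"A": set(range(6)), "B": {0, 1}}
assert retrieve_candidate_frames(cur, adj, near, others) == ["A"]
```

The same two-step scheme would then be repeated with BoW similarity scores in place of word counts before calling the loop-frame retrieval function.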

Measuring the Strength of the Connections Between the Submaps
After a connected frame is found, the map where the current frame is located can form a connected relationship with the submap where the connected frame is located. However, when connecting submaps, there may be multiple connected frames between the submaps. By comparing the geometric configuration of the connected frames, the connection relationship with the highest accuracy and the best reliability for coordinate transformation is selected, so as to combine the multiple submaps.
The geometric configuration of the connected frames mainly refers to the intersection angle between the connected frames. In aerial photogrammetry, the greater the intersection angle of a stereo image pair, the higher the accuracy. When the intersection angle is too small, the error in the direction of the vertical image plane will be very large. In many computer vision SfM 3D reconstruction open-source systems, such as COLMAP, the intersection angle must not be less than 16° when initializing the selected stereo image pair. According to the principle of photogrammetry [39], the smaller the intersection angle of the stereo image pair, the worse the accuracy in the depth direction of the 3D point coordinates obtained from the forward intersection of the stereo image pair, as shown in Figure 7.

Since the formulas of photogrammetry are more intuitive than the mathematical models in computer vision, we take the photogrammetric forward intersection formula as an example. The error in the depth direction is given in Equation (1):

m_h = m_x / tan θ, (1)

where θ represents the intersection angle of a stereo image pair, m_x represents the plane error, and m_h represents the median error in the depth direction.
The intersection angle is determined by the baseline and the flying height:

tan θ ≈ b/f = B/H, (2)

where b represents the image baseline, f represents the camera focal length, B represents the photography baseline, and H represents the photography height, as shown in Figure 8. It is known that if the photographic baseline is too short, the error ellipse will become extremely flat and the depth direction error will be large. Therefore, if the intersection angle between the connected frames is larger, the connection reliability of this part is better, the 3D coordinates of the calculated map points are more accurate, and the error of transforming the two submaps to the same coordinate system is smaller. Of course, the intersection angle should not be too large, and no more than 45° is appropriate. In order to better reflect the number of smaller intersection angles, the geometric configuration of the connected frames is described by the median of the intersection angles, rather than the average.
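A quick numeric check using the standard photogrammetric relation m_h = m_x / tan θ illustrates why small intersection angles are considered unreliable: the depth error grows rapidly as the angle shrinks.

```python
import math

def depth_error(m_x, theta_deg):
    """Depth-direction error for plane error m_x and intersection angle theta."""
    return m_x / math.tan(math.radians(theta_deg))

# plane error of 0.1 m evaluated at several intersection angles
for theta in (2, 8, 16, 45):
    print(f"theta = {theta:2d} deg -> m_h = {depth_error(0.1, theta):.3f} m")
```

At 2° the depth error is more than an order of magnitude larger than at 45°, consistent with COLMAP's 16° initialization requirement mentioned above.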
The number of existing connected frames between two submaps is denoted as F, and the number of map points contained in the connected frames is M. The median intersection angle between connected frames is θ, in degrees. In this paper, a factor C is proposed to measure the connection strength between submaps, as shown in Equation (3). In this study, the empirical values were set through experiments. The order of magnitude of F is usually about 10, the order of magnitude of M is usually about 100, and θ is usually about 10°. The effect of θ on the submap accuracy is not linear but curvilinear, so θ is set to the second power. The strength of the connections between the submaps is compared via the value of C:

C = F · M · θ², (3)
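Assuming the strength factor combines the three quantities multiplicatively, with the median angle squared as described above (the exact normalization may differ), a minimal sketch is:

```python
import statistics

def connection_strength(num_frames, num_points, angles_deg):
    """C = F * M * theta^2, with theta the median intersection angle (degrees)."""
    theta = statistics.median(angles_deg)
    return num_frames * num_points * theta ** 2

# typical magnitudes quoted above: F ~ 10, M ~ 100, theta ~ 10 degrees
assert connection_strength(10, 100, [8, 10, 12]) == 100000
```

Using the median rather than the mean keeps a few large angles from masking many weak, small-angle links between the two submaps.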

Selection of the Optimal Connection Based on the Undirected Connected Graph
When there are multiple connection paths between multiple submaps, the strongest connection path needs to be selected. This is essentially the problem of finding a connected path in an undirected connected graph. For the connection between submaps, the submaps form an undirected connected graph with weights equal to the strength of the different connection paths, as shown in Figure 9. Each submap is a node in the undirected connected graph, the connection relationship between submaps is an edge, and the connection strength corresponds to the weight of each edge. As shown in Figure 5a, the UAV's flight path is A-B-C-D-E-F-G-H-I, and Figure 5b shows the undirected connected graph formed after the submaps are connected. For an undirected connected graph containing n nodes, solving the minimum spanning tree means finding a set of n−1 edges that connects all n nodes while minimizing the sum of the edge weights (i.e., the cost). For the connection of the submaps, the higher the value of C in Equation (3), the higher the connection strength. Therefore, the negative value of C was chosen as the weight of the undirected connected graph in this study.
Kruskal's algorithm can be used to solve the minimum spanning tree.
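Since the edge weights are −C, the minimum spanning tree keeps the strongest connections. A minimal sketch of Kruskal's algorithm with a union-find structure (node ids and weights here are illustrative):

```python
def kruskal(n_nodes, edges):
    """edges: list of (weight, u, v); returns the MST edges as (u, v) pairs."""
    parent = list(range(n_nodes))

    def find(x):                       # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):      # ascending weight: strongest link (-C) first
        ru, rv = find(u), find(v)
        if ru != rv:                   # adding this edge creates no cycle
            parent[ru] = rv
            tree.append((u, v))
    return tree

# three submaps: the 0-1 and 1-2 links are stronger than the direct 0-2 link
edges = [(-50, 0, 1), (-40, 1, 2), (-10, 0, 2)]
assert kruskal(3, edges) == [(0, 1), (1, 2)]
```

The weak 0-2 edge is discarded, so each submap is transformed into the global frame through its most reliable chain of connections.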

Algorithm Flow
Each time the system tracking fails, the construction of the map is restarted, and the previously established map is called a submap. The proposed method uses the idea of graph theory to connect these submaps into a complete map: each submap is taken as a node, and the connected relationship between submaps is taken as an edge. Connectivity detection is used to determine whether a connected relationship exists between submaps. The map currently being tracked is denoted as C, and the n submaps S1 ... Sn generated by the previous tracking failures are stored in stack L. The algorithm flow is as follows:
1. Track the current map C and denote the current frame as F.
2. Determine whether the current map C is successfully initialized. If not, continue to initialize C; if so, go to the next step.
3. Determine whether the orientation of F in C is successful. If the orientation is successful, proceed to the next step. If the orientation fails, determine whether the system tracking failure conditions are met (several consecutive frames fail to be oriented). If the conditions are not met, discard the currently read F, read the next frame, record it as F, and restart step 3. If the tracking failure condition is met, suspend the tracking of C, denote C as Sn+1 and save it in L, create a new map to start tracking, denote it as C, and go back to step 1.
4. Determine whether there is an unrecovered submap Si (i ∈ [1, n]) in the missing submap stack L. If Si exists, go to the next step; if not, all the submaps in L have been retrieved or no submap exists in L, so go back to step 1 and read the new frame.
5. Determine whether Si and C are connected through F. If so, proceed to the next step; if not, go back to step 1 and read the new frame.
6. Determine whether the connection strength of the two submaps reaches the threshold. If so, convert Si to the C coordinate system, merge the two submaps, remove Si from L, continue to track C, and go back to step 1; if not, continue tracking C and go back to step 1.
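The steps above can be condensed into a toy loop, with tracking success, connectivity, and connection strength supplied as simple callables (all names are hypothetical, not the real system's API):

```python
def run(frames, tracks_ok, is_connected, strong_enough):
    current, lost, merged = "map0", [], []
    fail_streak, n = 0, 0
    for f in frames:
        if not tracks_ok(f):               # step 3: orientation of f failed
            fail_streak += 1
            if fail_streak >= 3:           # tracking-failure condition met
                lost.append(current)       # save current map as a submap (stack L)
                n += 1
                current = f"map{n}"        # reinitialize a new map
                fail_streak = 0
            continue
        fail_streak = 0
        for S in list(lost):               # steps 4-6: try to merge each submap
            if is_connected(S, current, f) and strong_enough(S, current):
                merged.append((S, current))  # transform S into current's frame
                lost.remove(S)
    return current, lost, merged

# frames 3-5 fail (one tracking loss); frame 8 re-establishes the connection
cur, lost, merged = run(range(10),
                        tracks_ok=lambda f: f not in (3, 4, 5),
                        is_connected=lambda S, C, f: f == 8,
                        strong_enough=lambda S, C: True)
assert cur == "map1" and lost == [] and merged == [("map0", "map1")]
```

Unlike plain relocalization, tracking never waits in a lost state: a new map is started immediately, and the orphaned submap is merged back as soon as a strong enough connection is found.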
The algorithm flowchart is shown in Figure 10. We take Figure 5 as an example to illustrate the process of merging submaps. The new submap is added to stack L1, and the currently tracked map to stack L2. As shown in Table 1, map A is first tracked, and A joins L2; when A fails to track, the system is reinitialized, and A, as a submap, is moved to stack L1. The entries in Table 1 indicate that the two corresponding submaps currently have a connected relationship.

Experimental Results and Analysis
In order to verify the method proposed in this paper, a UAV dataset, a ground dataset, and two indoor datasets were used in the experiments.

UAV Dataset Experiment
The first dataset is a video shot by a UAV, as shown in Figure 11. There were three interruptions during the flight. Figure 12a is the flight trajectory of the UAV. Figure 12b is the processing result of ORB-SLAM2. Figure 12c is the processing result of the proposed system (with the letters representing the submaps). Figure 12d is a schematic diagram of the undirected connected graph.

Outdoor Street Dataset Experiment
The second group of data is videos taken by handheld cameras, as shown in Figure 13.
The video resolution is 1920 × 1080, with a total of 24,690 frames. The camera model is a GoPro Hero 6 action camera, with a frame rate of 30 frames per second. The camera orientation is complex and the angle of view changes dramatically. The scenes captured in this dataset are mainly residential streets. The scenes are complex, including moving objects and areas where it is difficult to extract ORB features. In the process of data collection, tracking failure was caused by rapid camera rotation at the street corners, and there are four interruptions at the corners of the street. Figure 14a is the ground running trajectory. Figure 14b is the processing result of ORB-SLAM2; the construction of the map was stopped at the breaks, resulting in many missing maps. Figure 14c is the processing result of the proposed system (with the letters representing the submaps). Figure 14d is a schematic diagram of the undirected connected graph.

Indoor room dataset

The third set of data was captured indoors. If the camera is shaken violently, the field of view changes greatly, leading to failure of the tracking, so there were multiple interruptions in the trajectory. In order to improve the readability, it is assumed that there are M submaps in total, in which the blue polygon box contains multiple submaps (this can be seen in Figure 16c). Figure 16a is the camera's running trajectory. Figure 16b is the processing result of ORB-SLAM2, which only reconstructs half of the room. Figure 16c is the processing result of the proposed system (with the numbers representing the submaps), which reconstructs the entire room. Figure 16d is a schematic diagram of the undirected connected graph.

Indoor Corridor Dataset Experiment
The fourth set of data has the same collection conditions as the third set, as shown in Figure 17. The video resolution is 1920 × 1080, with a total of 17,880 frames. The scenes captured in this dataset are mainly of an indoor corridor. There were two interruptions in the trajectory. Figure 18a shows the camera's running trajectory. Figure 18b shows the processing result of ORB-SLAM2, which only reconstructs half of the corridor. Figure 18c shows the processing result of the proposed system (with the letters representing the submaps), which reconstructs the entire corridor. Figure 18d is a schematic diagram of the undirected connected graph. When ORB-SLAM2 processed this set of data, it was unable to solve the position of the current frame in the world coordinate system due to the tracking failure between submaps A and B, so submap B was lost. Submap C was successfully relocalized and connected to submap A, but the final map only includes submaps A and C. After the tracking failure between submaps A and B occurred, the proposed system reinitialized a new world coordinate system and continued tracking. At the same time, the system checked whether a connected relationship existed between the submaps.
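Once a connection between two submaps is established, the later submap has to be expressed in the coordinate system of the earlier one. The following is a minimal sketch with NumPy, assuming the similarity transform (scale `s`, rotation `R`, translation `t`) has already been estimated from matched map points; the function name is illustrative. In monocular SLAM the scale factor is required because each submap is reconstructed only up to its own arbitrary scale.

```python
import numpy as np

def merge_points(points_b, s, R, t):
    """Map submap B's 3-D points into submap A's coordinate system
    with a similarity transform: p_a = s * R @ p_b + t.
    points_b: array-like of shape (N, 3); R: (3, 3) rotation matrix."""
    points_b = np.asarray(points_b, dtype=float)
    # Row-vector convention: (N, 3) @ R.T applies R to each point.
    return s * points_b @ R.T + t

# Example: submap B's scale is half of A's (s = 2) and B's origin
# is offset by one unit along x in A's frame.
merged = merge_points([[1.0, 1.0, 1.0]], 2.0, np.eye(3),
                      np.array([1.0, 0.0, 0.0]))
```

In the real system this transform would be applied to all map points and keyframe poses of the later submap before the two maps are fused into one.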

Analysis of the Experimental Results
Fast motion, sharp perspective changes, or a shooting area lacking texture can lead to tracking failure, at which point ORB-SLAM2 will usually stop tracking and use the relocalization method to try to restore the camera pose. However, the system cannot resume tracking until the camera returns to the location of the tracking failure. After a tracking failure, the proposed system keeps the previous submap in the stack and reinitializes the tracking of the newly added video frames to form a new submap. When the system detects a connection between submaps, it transforms them into a whole connected map in the same coordinate system, making the reconstructed map more complete. As shown in Table 2, the trajectory integrity of the proposed system is far higher than that of ORB-SLAM2, which verifies the effectiveness of the proposed system. As shown in Table 3, in terms of trajectory error, the proposed system basically stays close to the level of the original ORB-SLAM2. To sum up, in this study we conducted four groups of experiments aimed at different scenarios, with the mainstream monocular visual SLAM framework ORB-SLAM2 used as an open-source comparison. The experimental results show that, in the case of tracking failure, the proposed system can rebuild a more complete scene map, confirming the effectiveness of the proposed SLAM map restoration algorithm based on submaps and an undirected connected graph.
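The tracking-failure handling described above can be sketched as follows; `Submap` and `SubmapTracker` are hypothetical stand-ins for the corresponding ORB-SLAM2 tracking modules, with a toy tracker in place of real feature tracking.

```python
class Submap:
    """Toy submap: stores the frames it has successfully tracked."""
    def __init__(self):
        self.frames = []

    def initialize(self, frame):
        # In ORB-SLAM2, initialization actually needs a pair of frames
        # with sufficient parallax; one frame is enough for this sketch.
        self.frames.append(frame)

    def track(self, frame):
        # Toy tracker: a frame tracks unless it is flagged as lost.
        if frame.get("lost"):
            return None
        self.frames.append(frame)
        return frame["pose"]


class SubmapTracker:
    """On tracking failure, push the current submap onto the stack of
    finished submaps and re-initialize tracking in a new local
    coordinate system, instead of waiting for relocalization."""
    def __init__(self):
        self.finished = []        # stack of completed submaps
        self.current = Submap()   # submap currently being tracked

    def process_frame(self, frame):
        pose = self.current.track(frame)
        if pose is None:                      # tracking failure
            self.finished.append(self.current)
            self.current = Submap()           # new local coordinate system
            self.current.initialize(frame)
        return pose
```

The key difference from plain relocalization is that no frames are discarded while waiting for the camera to revisit a known place: every frame after the failure contributes to a new submap that can later be merged back.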

Conclusion
In this paper, we have proposed a map information restoration algorithm based on submaps and an undirected connected graph for monocular visual SLAM. When the system fails to track multiple times and generates multiple submaps, as long as there is connectivity between the submaps, it is possible to merge them into a single coordinate system. The proposed method retains more complete map information than the existing relocalization methods.
The proposed system starts a new submap after a tracking failure occurs and initializes the newly read video frames to track the new submap. However, because ORB-SLAM2 needs to select a pair of video frames with sufficient parallax in the initialization phase, the system may need to consume more video frames before it can be initialized successfully. To some extent, this "wastes" some video frames. Therefore, in future work, we will consider reusing the video frames consumed during the initialization process to make the reconstructed map even more complete.

Figure 1. UAV dataset experimental results. (a) The UAV data from Wuhan University. (b) The flight trajectory of the UAV and the locations of the interruptions (the red rectangles mark the interruptions). (c) The processing result of ORB-SLAM2.

Figure 2. System threads and modules.

Figure 3. UAV trajectory and the matching relationship between images.

Figure 4. The processing result of ORB-SLAM2 (the frames covered by light red represent the missing maps due to tracking failure).

Figure 5.

Figure 6. The practical significance of the adaptive threshold coefficient. From left to right, the two red frames are the frame that overlaps with the current frame by about 50% and the nearest frame, corresponding to the actual overlapping range between the nearby frame and the adjacent frame in the threshold selection. This relationship is considered instead of simply setting a fixed absolute parameter.

Figure 8. The relationship between the intersection angle and the accuracy in the depth direction.

Figure 9. An undirected connected graph with weights.
first submap, is added to L1, and we start tracking the second map B. L2 is updated to B; after B fails to track, B joins L1, and we start tracking the third map C. L2 is updated to C; after C fails to track, C joins L1, and we start tracking D. L2 is updated to D; when tracking D, the system finds that D is connected to C and merges them into D-C. C is removed from L1, and L2 is updated to D-C. After a while, D-C is found to be connected to B, and they are merged into D-C-B. B is removed from L1, L2 is updated to D-C-B, and so on. The submaps are finally merged into I-H-G-F-E-D-C-B-A, which is a complete map. The red letters in Table
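The bookkeeping in this walkthrough can be sketched with two variables: L1 holds the finished submaps that have not yet been merged, and L2 is the map currently being tracked and grown. In the sketch below, `edges` stands in for the connections discovered by the system; it is assumed known in advance for illustration, whereas in the real system connections are discovered over time, so merges may happen later than this sketch performs them.

```python
def merge_submaps(order, edges):
    """order: submap names in creation order, e.g. "ABCD";
    edges: set of frozensets, each an observed connection between
    two submaps. Returns (L2, L1) after all submaps are processed."""
    def connected(map1, map2):
        return any(frozenset((a, b)) in edges for a in map1 for b in map2)

    L1, L2 = [], None
    for name in order:
        if L2 is not None:
            L1.append(L2)        # previous map's tracking failed
        L2 = [name]              # start tracking a new submap
        changed = True
        while changed:           # merge while any connection is found
            changed = False
            for sub in L1:
                if connected(L2, sub):
                    L2 = L2 + sub    # merge into one coordinate system
                    L1.remove(sub)
                    changed = True
                    break
    return L2, L1
```

With the connections D-C, C-B, and B-A, the four submaps A-D end up merged into the single map D-C-B-A, matching the ordering in the walkthrough; a submap with no connection to the others simply remains in L1.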

Figure 11.
The video resolution is 1920 × 1080, with a total of 16,380 frames. The camera model is the onboard camera of a DJI Phantom 4 drone, and the frame rate is 30 frames per second. The camera orientation is relatively constant, and the angle of view does not change dramatically, being mainly vertically downward. The motion track of the camera is S-shaped. The scenes shot in this dataset are mainly residential areas and vegetation, with abundant feature points.

Figure 12. UAV dataset experimental results. (a) The flight trajectory of the UAV. (b) The processing result of ORB-SLAM2. (c) The processing result of the proposed system (with the letters representing the submaps). (d) The undirected connected graph.

Figure 14. Outdoor street dataset experimental results. (a) The ground running trajectory. (b) The processing result of ORB-SLAM2. (c) The processing result of the proposed system (with the letters representing the submaps). (d) The undirected connected graph.

4.3. Indoor Dataset Experiment

4.3.1. Indoor Office Dataset
The third group of data is a video taken by a handheld camera, as shown in Figure 15. The video resolution is 1920 × 1080, with a total of 12,670 frames. The camera model is a GoPro Hero 6 action camera, with a frame rate of 30 frames per second. The camera motion is more complex; some perspective changes are dramatic, while others are simpler. The scenes captured in this dataset are mainly of an indoor office. The scenes are relatively complex, with large white walls, which is not conducive to ORB feature extraction, and the light changes are also quite drastic.

Figure 16. Indoor office dataset experimental results. (a) The camera's running trajectory. (b) The processing result of ORB-SLAM2. (c) The processing result of the proposed system (with the numbers representing the submaps). (d) The undirected connected graph.

Figure 18. Indoor corridor dataset experimental results. (a) The camera's running trajectory. (b) The processing result of ORB-SLAM2. (c) The processing result of the proposed system (with the letters representing the submaps). (d) The undirected connected graph.

Table 2. Comparison of the trajectory integrity.

Table 3. Comparison of the trajectory error.