MGC-VSLAM: A Meshing-Based and Geometric Constraint VSLAM for Dynamic Indoor Environments

Visual Simultaneous Localization and Mapping (VSLAM) is considered a fundamental capability for autonomous mobile robots. However, most existing VSLAM algorithms adopt a strong scene-rigidity assumption for analytical convenience, ignoring the influence that independently moving objects in real environments have on the accuracy of the SLAM system. Hence, this paper proposes MGC-VSLAM (Meshing-based and Geometric Constraint VSLAM), a novel VSLAM algorithm for dynamic indoor environments built on the RGB-D mode of ORB-SLAM2, which addresses the problems of uniform ORB feature distribution and dynamic feature filtering. In detail, to address the over-uniform distribution of feature points extracted by the quadtree-based algorithm in ORB-SLAM2, a meshing-based feature uniform distribution algorithm is proposed: meshes are divided at each layer of the image pyramid, and a specific number of features in each mesh are retained according to their Harris response values. In addition, to reduce the impact of features extracted from dynamic targets on the SLAM system, a dynamic feature filtering method is proposed. First, a stable matching relationship is established through a feature matching constraint method; then a novel geometric constraint method is used to filter out the dynamic feature points in the scene. Only the remaining static features are retained to achieve accurate camera pose estimation in dynamic environments. Experiments on the Oxford dataset and the public TUM RGB-D dataset were conducted to evaluate the proposed approach. The results reveal that MGC-VSLAM can effectively improve the positioning accuracy of ORB-SLAM2 in highly dynamic scenarios.


I. INTRODUCTION
Simultaneous Localization and Mapping (SLAM), as a core technology of intelligent mobile robots, refers to a robot simultaneously localizing itself and building a map of the surrounding environment without any prior environmental information [1]-[2]. A SLAM system that uses only visual sensors to obtain external information is called a visual SLAM (VSLAM) system. Some existing advanced VSLAM algorithms have achieved satisfactory results [3]-[6]. However, some problems in VSLAM have not been well solved until now [7]-[8]. For example, ORB features extracted by the standard ORB algorithm tend to concentrate in strongly textured regions. As a consequence, these features cannot represent the whole image well, and the number of matched features drops significantly when the concentrated regions of two adjacent frames differ [9]. This problem makes the SLAM system unstable and, in severe cases, causes tracking loss [10]. Besides, most existing algorithms assume a static external environment, ignoring the impact of dynamic targets on the accuracy of the SLAM algorithm in real environments [11]-[12]. In the real environment, independently moving objects often appear in the scene. They introduce errors into the visual odometry estimation, and moving objects are recorded in the resulting map, which makes the built maps unsuitable for subsequent intelligent grasping, navigation, and other complex robot tasks [13]. Therefore, how to eliminate the negative impact of dynamic objects on the SLAM system is a critical challenge for VSLAM.
ORB-SLAM2 [3], [4] is one of the best VSLAM algorithms currently in use, with highly consistent positioning and map-building accuracy. In ORB-SLAM2, ORB features [4], [14] are used for tracking. Since ORB features are prone to concentrate in strongly textured regions, a quadtree-based method is adopted by ORB-SLAM2 to distribute features uniformly. It iteratively segments every layer of the image pyramid, extracting the feature with the largest Harris response value in each child node [4]. However, features extracted by the quadtree-based algorithm tend to be over-uniformly distributed, which retains some weak features that are not conducive to tracking, and the number of iterations affects the algorithm's efficiency. The UR-SIFT algorithm proposed by Sedaghat et al. [15] can adaptively control the number of SIFT feature points in different regions and distributes SIFT feature points comparatively uniformly. Paul and Pati [16] presented a modified UR-SIFT, which effectively generates sufficiently robust, reliable, and uniformly distributed aligned feature points. Yao et al. proposed an improved quadtree-based ORB algorithm in which different quadtree depths are set according to the expected number of features [17]. However, this algorithm still suffers from over-uniform distribution, and the matching performance of the extracted features is not tested. So far, most feature uniform distribution algorithms proposed in the literature have high computational complexity, and their matching performance is not ideal.
These obstacles motivate the search for a novel VSLAM algorithm for dynamic indoor scenes. How to resolve the motion ambiguity caused jointly by the camera's own motion and by independently moving objects is the fundamental challenge of motion removal. Because the camera itself is moving, both the foreground and the background move in the image, so classic motion segmentation methods such as background subtraction and frame differencing are ineffective [18]. Researchers have proposed many excellent motion removal methods. Berta Bescos et al. [19] add a dynamic target detection module to ORB-SLAM2, which can accurately identify and segment dynamic targets through the combination of multi-view geometry and deep learning. Similarly, Chao Yu et al. [20] filter the feature points extracted on dynamic targets through the epipolar constraint and the SegNet network. However, the fundamental matrix used for the epipolar constraint has a large error, which affects the performance of the algorithm. Recently, deep learning networks, as a burgeoning method, have achieved good results in SLAM systems due to their strong scene adaptability. However, so far, high-precision deep learning segmentation networks require high-performance hardware to support real-time processing, which significantly increases application costs. Emrah [21]-[22] proposed a novel trigonometry-based kinematic positioning scheme, using a vertically suspended monocular camera and a novel geometric method to track the robot pose. The system performs well in the experimental environment and provides a new idea for the positioning of mobile robots in indoor environments. Wangsiripitak and Murray [23] avoid moving outliers by tracking known 3D dynamic objects. The work of Moo et al. [24] uses two single Gaussian models, which can effectively represent the foreground and the background. Sun et al. [11] extend the idea of using the intensity difference image to identify the boundaries of dynamic objects. However, the huge amount of computation hinders real-time applicability. Kim et al. [12] propose to obtain the static parts of the scene by computing the difference between consecutive depth images projected onto the same plane. Similarly, Zhao et al. [25] also use depth images to detect dynamic objects. However, these methods are prone to being affected by the uncertainty of depth images. More recently, Li and Lee [26] use depth edge points with an associated weight indicating their probability of belonging to a dynamic object.
To improve the stability and robustness of ORB-SLAM2 in dynamic environments, a novel meshing-based uniform distribution approach is proposed, which solves the over-uniform distribution of features and the problem that a manually set FAST extraction threshold cannot suit different environments. Moreover, a modified geometric constraint method is proposed to filter out dynamic features, which reduces the impact of dynamic objects on the SLAM positioning accuracy. The remainder of this paper is organized as follows: Section 2 introduces the framework and the detailed process of the proposed improved SLAM algorithm; Section 3 evaluates the proposed method experimentally; conclusions and future work are described in the last part of the paper.

II. METHOD
In this section, MGC-VSLAM will be introduced from three main aspects: system framework, feature uniform distribution algorithm, and dynamic feature point filtering algorithm.

A. SYSTEM OVERVIEW
The overview of the proposed MGC-VSLAM system is shown in Fig. 1. It can be divided into three main threads: tracking, local mapping, and loop closing. First, a meshing-based feature uniform distribution algorithm is used to extract stable ORB features on the current frame. Then, the dynamic feature points extracted on moving targets are filtered by a geometric constraint method. Only the static feature points are input into the tracking thread for pose estimation. The other two threads remain the same as in ORB-SLAM2.

B. IMPROVED ORB FEATURE POINT UNIFORM DISTRIBUTION ALGORITHM
In this section, a meshing-based feature uniform distribution method is introduced, which aims to solve the problem of the over-uniform distribution of features in ORB-SLAM2. The meshing model of the meshing-based algorithm is shown in Fig. 2; its parameters are illustrated in detail as follows.

1) MESH MODEL
First, the image pyramid is constructed to give the ORB algorithm scale invariance. The scaling ratio S_i of each layer can be expressed as follows, where n is the total number of layers of the image pyramid and SF is the scale factor between adjacent layers.
The ith layer I_i of the image pyramid is obtained by downsampling the (i-1)th layer I_{i-1} by a factor of SF.
After the image pyramid is obtained, each of its layers is divided into meshes. To distribute the extracted feature points reasonably across layers, the number of features to extract on each layer is set as follows, where the total desired number of ORB features is denoted as N, DesiredF_i is the number of feature points required for the ith layer, and InvSF is the reciprocal of the scale factor, InvSF = 1/SF. Feature points near the image edge are easily lost between two consecutive frames, causing matching to fail. Thus, a boundary is set on each layer of the pyramid image, as shown in Fig. 2(a).
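The exact allocation formula is not reproduced in the source, so the following sketch assumes the geometric allocation scheme used by ORB-SLAM2-style systems: layer i receives a share proportional to InvSF**i, with the series normalized so the shares sum to N. The function name and rounding policy are illustrative.

```python
def features_per_layer(N, n, SF):
    """Allocate the desired feature count DesiredF_i to each pyramid layer.

    Assumed geometric scheme: layer i gets a share proportional to
    InvSF**i, so finer (larger) layers receive more features.
    """
    InvSF = 1.0 / SF
    # Normalizing constant from the geometric series sum over n layers.
    base = N * (1.0 - InvSF) / (1.0 - InvSF ** n)
    desired = [round(base * InvSF ** i) for i in range(n)]
    # Put any rounding remainder on the coarsest layer so the total is N.
    desired[-1] += N - sum(desired)
    return desired
```

With N = 1000, n = 8, and SF = 1.2 this yields a decreasing sequence over the layers whose total is exactly N.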
Hence, the width ew_i and the height eh_i of the feature point extraction area of the ith layer of the image pyramid can be expressed as follows, where the width and height of the image on the ith layer are denoted as w_i and h_i, respectively, and EdgeTH is the boundary threshold.
Each mesh should be as close to equal in area as possible to make the distribution of features more uniform. Therefore, the number of mesh rows and columns is set as in equation (5).
where t is the mesh division coefficient controlling the number of meshes, and the aspect ratio of the ith layer of the image pyramid is denoted as ImRat_i. The area of each mesh on the image pyramid can then be determined, where minX_mn(i), minY_mn(i), maxX_mn(i), and maxY_mn(i) are the four boundary points of the mesh in row m and column n, as shown in Fig. 2(a). Th is the mesh expansion coefficient, which helps better extract the features on mesh edges. Gw_i and Gh_i are the width and height of the actual feature extraction region of each mesh. ORB features are then extracted after each mesh is divided. The desired number of features for each mesh of the ith layer of the image pyramid is set as in equation (11) to make the distribution of feature points in each mesh as uniform as possible.
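Since equations (5)-(11) are not reproduced in the source, the sketch below only follows their stated intent: roughly t near-square meshes per layer, with row/column counts respecting the image aspect ratio and an even per-mesh feature budget. All names and the exact split rule are assumptions.

```python
import math

def mesh_layout(w, h, edge_th, t, desired_i):
    """Sketch of the mesh division for one pyramid layer.

    w, h      : image width/height at this layer
    edge_th   : boundary threshold EdgeTH kept free of features
    t         : mesh division coefficient (approximate mesh count)
    desired_i : DesiredF_i, features wanted on this layer
    Returns (rows, cols, cDesF) with cDesF the per-mesh feature budget.
    """
    ew, eh = w - 2 * edge_th, h - 2 * edge_th   # extraction area
    ratio = ew / eh                             # aspect ratio ImRat_i
    rows = max(1, round(math.sqrt(t / ratio)))  # assumed near-square split
    cols = max(1, round(rows * ratio))          # cols/rows tracks the ratio
    cDesF = math.ceil(desired_i / (rows * cols))  # even per-mesh budget
    return rows, cols, cDesF
```

For a 640x480 layer with EdgeTH = 19, t = 12, and DesiredF_i = 300, this gives a 3x4 mesh with 25 features budgeted per mesh.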

2) ADAPTIVE THRESHOLD FAST FEATURE EXTRACTION IN THE MESH
In the standard ORB algorithm and in ORB-SLAM2, the FAST extraction threshold is usually set to an empirical value, which cannot adapt well to real environments. For example, a high threshold cannot extract enough features in image regions with weak texture. Hence, a threshold that adapts to image changes is used in this method, and the initial threshold iniTH is set as follows, where I(x) is the grey value of each pixel in the image, κ is the average grey value of the image, and the total number of image pixels is represented as ni. FAST features are extracted with the initial threshold iniTH in each mesh. The threshold is reduced by a quarter when the number of features extracted with the initial threshold is less than cDesF_i.
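The exact formula for iniTH is missing from the source; given the listed symbols (I(x), the mean κ, and the pixel count ni), the mean absolute deviation of the grey values from the image mean is assumed below. The fallback step follows the stated "reduce the threshold by a quarter" rule; `detect` is a hypothetical FAST detector passed in by the caller.

```python
def adaptive_fast_threshold(gray):
    """Assumed initial FAST threshold: mean |I(x) - kappa| over the image."""
    pixels = [p for row in gray for p in row]   # flatten the grey image
    ni = len(pixels)
    kappa = sum(pixels) / ni                    # average grey value
    iniTH = sum(abs(p - kappa) for p in pixels) / ni
    return max(iniTH, 1.0)                      # keep the threshold positive

def extract_with_fallback(detect, gray, cDesF_i):
    """Retry with a 25% lower threshold when too few features are found."""
    th = adaptive_fast_threshold(gray)
    kps = detect(gray, th)
    if len(kps) < cDesF_i:
        kps = detect(gray, th * 0.75)           # reduce threshold by a quarter
    return kps
```

A flat image yields the floor value, while a high-contrast image yields a proportionally higher threshold.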

3) ORB FEATURE POINTS RETENTION STRATEGY IN THE MESH
First, the ORB features are sorted by their Harris response values from large to small. Then the first cDesF (the desired number for each mesh) FAST features in the mesh are reserved as the final ORB feature points. The detailed retention strategy is shown in Fig. 3, where Kp is the final set of reserved features extracted by the proposed method, n is the total number of pyramid layers, m(i) is the number of meshes in the ith layer of the image pyramid, nf(j) is the number of features in the jth mesh, and TemKp is a sequence for saving temporary data.
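The per-mesh retention step above can be sketched as follows; the data layout (a mapping from mesh index to (response, keypoint) pairs) is an assumption for illustration, not the paper's actual data structure.

```python
def retain_features(mesh_features, cDesF):
    """Keep only the strongest cDesF features in every mesh.

    mesh_features maps a mesh index to a list of (harris_response, keypoint)
    pairs; within each mesh, features are sorted by Harris response from
    large to small and only the first cDesF are reserved.
    """
    Kp = []                                   # final reserved features
    for feats in mesh_features.values():
        TemKp = sorted(feats, key=lambda f: f[0], reverse=True)
        Kp.extend(TemKp[:cDesF])              # keep the strongest cDesF
    return Kp
```

A mesh with more than cDesF candidates therefore drops its weakest responses, while sparse meshes keep everything they have.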

C. DYNAMIC FEATURE POINTS FILTERING ALGORITHM
How to filter out dynamic features is a huge challenge for feature-based VSLAM systems used in the real world. A novel geometry-based dynamic feature filtering algorithm is introduced in this part. First, ORB feature points are extracted by the meshing-based algorithm for each frame; then the feature points between adjacent frames are matched based on BRIEF descriptors [27], accelerated through the bag-of-words model [28]. These steps are similar to the processing flow of the ORB-SLAM2 [3], [4] system and are not repeated here.

1) FEATURE POINT MATCHING ACCURACY CONSTRAINT METHOD (MAC)
Feature pairs matched using BRIEF descriptors contain mismatches, which disturb the dynamic feature point judgment method. The Hamming distance quantifies the quality of a feature match. In view of this, a matched point pair whose nearest Hamming distance d1 is far smaller than the next-nearest Hamming distance d2 is considered a reliable match. In detail, a pair is considered a correct match when it satisfies the relationship d1/d2 < α, where α is the precision constraint coefficient, which controls the strictness of matching.
To better illustrate the matching state, a matching accuracy score is defined as follows: the smaller the score, the higher the probability that the matched pair is correct. This score is used in the following dynamic feature detection.
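The MAC test is a Lowe-style ratio test. Since the exact score formula is not reproduced in the source, the ratio d1/d2 itself is assumed as the matching accuracy score (smaller = better), consistent with the text; the function name is illustrative.

```python
def mac_filter(matches, alpha=0.7):
    """Matching accuracy constraint (MAC): a ratio test on Hamming distances.

    matches is a list of (d1, d2) pairs: the nearest and second-nearest
    Hamming distances for each candidate match.  A pair is accepted when
    d1/d2 < alpha; the ratio is kept as the matching accuracy score.
    """
    kept = []
    for i, (d1, d2) in enumerate(matches):
        score = d1 / d2 if d2 > 0 else 1.0    # degenerate case: reject
        if score < alpha:
            kept.append((i, score))           # index and accuracy score
    return kept
```

An ambiguous match (d1 close to d2) is rejected, while a clearly best match passes with a low score.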

2) FEATURE POINTS GEOMETRIC CONSTRAINT METHOD (GC)
The feature points have a relatively stable matching relationship after being filtered by the MAC method. However, dynamic matched pairs are also retained. In view of this, an image feature point geometric constraint model is proposed.
For example, the three matched pairs in Fig. 4 construct two triangles p1p2p3 and q1q2q3 on the query image I1 and the target image I2, respectively. I1 and I2 are two adjacent frames, and the time interval between them is very small, so the projective distortion due to the camera pose change is tiny. If there is no dynamic target in the scene, the lengths of corresponding edges of the two triangles should differ only within a small interval. To better describe this change of corresponding edges, a geometric constraint score function is defined, where d(a, b) denotes the Euclidean distance between feature points a and b, and A(i, j) represents the average distance between two features. If a dynamic target appears in the scene, for example, q1 is extracted on this dynamic target and moves to a new position on the target image I2. As a consequence, the geometric constraint score q_g calculated from formula (15) rises abnormally. However, it is hard to decide which feature is dynamic, because the value of q_g is shared by the two features of the edge.
In view of this, the following method, called the two-way scoring rule, is adopted to identify the dynamic features. In detail, when an edge has an abnormal distance, one point is added to both feature points on that edge; that is, a feature's abnormal score represents how many other features judge it to be dynamic. Therefore, after all features have scored each other, the abnormal score of a real dynamic feature point will be unusually large. In the mathematical expression of the abnormal score, s(i, j) indicates the abnormal score increment, β is the scale factor for the geometric score, which controls the strictness of the geometric constraint, and AS represents the average geometric score, where M is the number of matched feature point pairs in the image. The weight w_{i,j} is defined with a geometric score threshold θ_th: when the sum of a feature's abnormal score and matching accuracy score is larger than this threshold, it is not used to calculate AS. Equation (20) largely reduces the influence of the abnormal scores of dynamic features on AS. After all the abnormal scores q_ab are calculated, an adaptive threshold method is used to filter out the dynamic features. For example, 450 features were extracted in Fig. 5(b), and their abnormal scores are shown in Fig. 5(a). The red line in Fig. 5(a) represents the segmental threshold: if a feature's abnormal score q_ab exceeds this threshold, it is determined to be a dynamic feature. In detail, the segmental threshold is set as γM, where M is the total number of features; recall that the abnormal score represents how many features judge one feature to be dynamic. γ is set as 60% in Fig. 5, which means that a feature is filtered out if 60% of the features determine it to be abnormal. The filtered dynamic features are shown in red in Fig. 5(b). Further tests of this algorithm are presented in the experimental part of this paper.
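The two-way scoring rule and the segmental threshold can be sketched as below. Since equations (15)-(20) are omitted from the source, a simplified edge-consistency test |d1 - d2| > beta * mean(d1, d2) stands in for the paper's exact score function, and the AS-weighting of equation (20) is not modeled; all parameter values are illustrative.

```python
import math

def gc_abnormal_scores(pts1, pts2, beta=1.5):
    """Two-way scoring sketch of the geometric constraint (GC) method.

    pts1[i] and pts2[i] are matched 2D points in two adjacent frames.
    For every pair of matches (i, j), the lengths of the corresponding
    edges are compared; when they differ too much relative to the mean
    edge length, both endpoints receive one abnormal point.
    """
    M = len(pts1)
    score = [0] * M
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    for i in range(M):
        for j in range(i + 1, M):
            d1 = dist(pts1[i], pts1[j])   # edge length in frame 1
            d2 = dist(pts2[i], pts2[j])   # corresponding edge in frame 2
            avg = 0.5 * (d1 + d2)
            if avg > 0 and abs(d1 - d2) > beta * avg:
                score[i] += 1             # both endpoints are suspects
                score[j] += 1
    return score

def filter_dynamic(scores, gamma=0.6):
    """Segmental threshold gamma*M: flag features most peers accuse."""
    M = len(scores)
    return [i for i, s in enumerate(scores) if s > gamma * M]
```

For three static points shifted by a pure translation plus one point that jumps far away, only the jumping point accumulates a high abnormal score and is flagged.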
VOLUME 8, 2020

D. EXPERIMENTAL SETUP
Three experimental tests were performed to verify the effectiveness and robustness of the proposed algorithm: the improved feature uniform distribution algorithm test, the dynamic feature point filtering algorithm test, and the improved VSLAM system performance test.
In the first experiment, the Oxford datasets [29] are selected to test the uniformity and matching accuracy. The public TUM dataset [30], which contains color and depth images with scale and rotation changes, is used to test the last two algorithms. This data is intended to evaluate the accuracy and robustness of SLAM algorithms in indoor scenes with fast-moving dynamic objects. The ground-truth trajectory is obtained by a motion capture system consisting of 8 high-speed (100 Hz) cameras, ensuring its reliability. The dataset contains several typical types of camera self-motion. In the sitting and walking sequences, the camera has four types of self-motion: halfsphere, rpy, static, and xyz, where halfsphere indicates that the camera moves along a hemispherical trajectory; rpy indicates that the camera rotates about the roll, pitch, and yaw axes; static indicates that the camera is basically fixed in place; and xyz indicates that the camera moves along the x, y, and z axes.
All experiments were run on a computer with a 3.40 GHz Intel(R) i7-6700 CPU, 20 GB of memory, and the Ubuntu 14.04 operating system. Each test was run 5 times, and the average value is reported.

III. EXPERIMENTAL RESULTS
Three experiments were conducted. The first investigates the feature uniformity and matching accuracy of the feature point uniform distribution method. The dynamic feature filtering performance is tested in the second experiment. The last experiment, the SLAM test, is designed to verify the positioning accuracy of the improved VSLAM in dynamic environments.

A. IMPROVED ORB FEATURE POINT UNIFORM DISTRIBUTION ALGORITHM TEST 1) FEATURE POINTS UNIFORMITY TEST
The uniform degree [31] of the three algorithms is given in Table 1. The smaller the uniform degree value, the more uniform the feature points. For convenience of recording, the feature point uniform distribution algorithm in ORB-SLAM2 is denoted Qtree_ORB in the tables, and the proposed method is denoted Grid_ORB.
The experimental results show that the average uniform degree value of Qtree_ORB is smaller than those of Grid_ORB and the ORB algorithm, meaning that the feature points extracted by Qtree_ORB are more uniform than those of the other two algorithms.
The feature extraction results of the three algorithms in the same indoor scene are shown in Fig. 6. It can be seen that the feature point uniform distribution effect of the Grid_ORB algorithm is more obvious than that of the traditional methods. Fig. 6(b) shows that a large number of weak feature points are extracted by Qtree_ORB. These weak features are located in regions where gradient changes are not significant, such as the screen and the floor. In contrast, more high-quality feature points are reserved by Grid_ORB.

2) FEATURE POINTS MATCHING PERFORMANCE TEST
The three algorithms were tested on the Oxford dataset in this section. The SLAM problem usually involves scale changes, brightness changes, viewpoint changes, and blur changes of the scene, so the experimental results on these data are analyzed in detail. The matching results are shown in Fig. 7. The detailed experimental data and analysis follow:

a: SCALE CHANGE
In terms of scale change, Fig. 7(a), (b) and (c) are the matching results of the three algorithms on the bark data. The detailed experimental data is shown in Table 2.
Table 2 shows that the Grid_ORB algorithm obtains a higher matching number than the other two algorithms. In terms of correct matching ratio (CMR), the Grid_ORB algorithm is about 5% better than the standard ORB algorithm. It can be seen from the query image of Fig. 7(a) that the feature points extracted by the standard ORB algorithm are mostly concentrated in the bottom-right corner, which has strong texture. A large number of feature points in this aggregation region were not matched because the region does not appear in the target image. The matching quantity of Qtree_ORB is significantly impacted because it retains many weak feature points.

b: BRIGHTNESS CHANGE
The leuven data is used to verify the illumination robustness of the algorithms, and the matching results are shown in Fig. 7(d), (e), and (f). The detailed test data of each algorithm is shown in Table 3. Table 3 shows that Grid_ORB is superior to the other two algorithms in both matching quantity and CMR; its CMR is about 3% higher than those of the other two algorithms. It can be seen from Fig. 7(d) that the ORB algorithm cannot extract feature points in the weakly and strongly illuminated areas because it extracts feature points with a fixed threshold. Conversely, Qtree_ORB and Grid_ORB are based on adaptive thresholds, so their extracted features are relatively uniform. However, the final matching quality of Qtree_ORB is affected because its features tend to be over-uniform, and a large number of weak feature points are preserved, such as the feature points on the car glass and the ground.

c: VIEW ANGLE CHANGE
The graf and wall datasets are used to test the impact of viewpoint change on the algorithms. The experimental results are shown in Fig. 7(g)-(l), Table 4 and Table 5.
Fig. 7(g) shows that the feature points extracted by the standard ORB algorithm are mostly concentrated in the center of the image, whereas those of Qtree_ORB and Grid_ORB are relatively uniform. Table 4 shows that the Grid_ORB algorithm has higher accuracy on the graf and wall data, improving CMR by about 3% compared with the ORB algorithm and about 1.6% compared with the Qtree_ORB algorithm.

d: FUZZY CHANGES
In terms of fuzzy changes, the matching results of the three algorithms on the bikes data are shown in Fig. 7 (m), (n) and (o), and the detailed test data of each algorithm is shown in Table 6.
As shown in Fig. 7(m), the feature points extracted by the ORB algorithm are mostly concentrated on the car, while the distributions of the features extracted by the other two algorithms are relatively uniform. In terms of matching accuracy, the Grid_ORB algorithm is higher than the ORB algorithm but lower than the Qtree_ORB algorithm. The reason is that a few feature points extracted by the Grid_ORB algorithm are too close together, which reduces descriptor dissimilarity and ultimately leads to mismatching.
In terms of efficiency, a comparison of the running time of each algorithm is shown in Table 7. Qtree_ORB and Grid_ORB are slower than the ORB algorithm because they add a feature point selection module.

B. DYNAMIC FEATURE POINT FILTERING ALGORITHM TEST
In this section, the dynamic feature filtering algorithm is tested on the TUM dataset, and the results are shown in Fig. 8 and 9.
For convenience of recording, the fr3_walking_xyz sequence in the TUM dataset is represented as w_xyz, and the fr3_walking_halfsphere sequence is denoted as w_half; the quadtree-based method is noted as ''Q'', the meshing-based method is represented as ''G'', and the feature geometric constraint method is noted as ''GC''. The format ''sequence/frame/method'' indicates which data, at which frame, and with which method. The red boxes indicate regions where the feature points change significantly; they do not mark all the changed regions.
The analysis will be performed from the scene including fast-moving objects and slow-moving objects.

1) THE ENVIRONMENT CONTAINS FAST-MOVING OBJECTS
The first-column images in Fig. 8 are feature extraction results from Qtree_ORB. It can be seen that a large number of feature points are extracted on the moving object. Conversely, only a few features are extracted by Grid_ORB, as can be seen in the second column of Fig. 8. The reason for the different results of the two algorithms is the motion blur caused by fast-moving objects. In detail, the descriptors of feature points extracted in the motion-blurred region have small dissimilarity with others, and these feature points have low Harris response values, so they are weak features. Qtree_ORB must reserve at least one feature point in each child node, so some weak features on the moving object are reserved. Grid_ORB, however, preferentially extracts high-quality features, so it extracts few features on the moving objects. It is worth noting that this still works when the camera rotates.
In terms of dynamic feature removal, the dynamic features extracted by Qtree_ORB are well filtered by the GC method, as can be seen from the third-column images in Fig. 8. These results largely prove the effectiveness of the GC method. Conversely, Grid_ORB extracts few dynamic features, so the feature filtering effect is less obvious.

2) THE ENVIRONMENT CONTAINS SLOW-MOVING OBJECTS
It can be seen from the first two columns of Fig. 9 that both algorithms extract dynamic features. The reason is that slow-moving objects have little impact on the image: the number of low-quality features caused by moving objects is small in this scene, so Grid_ORB also extracts some dynamic features. In general, the dynamic features extracted by Grid_ORB are fewer than those extracted by Qtree_ORB. However, in Fig. 9(d), Grid_ORB extracts more dynamic features than Qtree_ORB. In this scene, two people are chatting; the person on the left is stationary, and the person on the right is slowly lifting an arm. As shown in the red box, the background of the moving target's arm is a computer screen, and the grey difference along the hand edge changes significantly, so Grid_ORB extracts many robust features on the moving object.
In terms of dynamic filtering results, the filtering effect of the ''G+GC'' method is overall more obvious than that of the ''Q+GC'' method. The reason is that Qtree_ORB extracts a large number of dynamic features, which damage the relationships between features and cause the filtering of dynamic features to fail.

C. SLAM SYSTEM PERFORMANCE TEST
In this section, the proposed method is fused to the ORB-SLAM2 front-end as a preprocessing stage to filter out the dynamic features.

1) SLAM EXPERIMENT IN DYNAMIC ENVIRONMENTS
Quantitative evaluation was performed using the absolute trajectory error (ATE) and the relative pose error (RPE). Tables 8-10 present the quantitative results. In the tables, MB-VSLAM denotes ORB-SLAM2 with only the Grid_ORB method fused in, while MGC-VSLAM denotes ORB-SLAM2 with both the Grid_ORB and GC methods fused in. IMPROVEMENT represents the percentage improvement of the test algorithm over the standard algorithm.
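For reference, the ATE RMSE metric used in these tables can be computed as below. The trajectories are assumed already time-associated and rigidly aligned (the TUM benchmark tools perform a least-squares alignment first, which is omitted here for brevity).

```python
import math

def ate_rmse(gt, est):
    """RMSE of the absolute trajectory error over matched positions.

    gt and est are equal-length lists of 3D positions (ground truth and
    estimate); the per-frame translational error is the Euclidean distance
    between corresponding positions.
    """
    sq = [sum((g - e) ** 2 for g, e in zip(gp, ep))   # squared error per frame
          for gp, ep in zip(gt, est)]
    return math.sqrt(sum(sq) / len(sq))               # root mean square
```

A perfect estimate yields 0; a single frame displaced by 5 m over a two-frame trajectory yields sqrt(25/2) ≈ 3.54 m.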
RMSE is highlighted in the tables because it is sensitive to large or occasional errors [32] and therefore better represents the robustness of the SLAM system. STD is also highlighted because it represents the stability of the system. In the sitting sequences, the only dynamic motion is the local limb movement of two people, so these are low-dynamic motion sequences, denoted by '' * '' in the tables.
The absolute trajectory error results are shown in Table 8, and the trajectories of each algorithm on the two test sequences are shown in Figs. 10 and 11. Table 8 shows that the performance of the improved system is significantly better than that of the original system on the highly dynamic test sequences. The reason is that the improved system effectively filters out the features extracted on dynamic targets, avoiding their influence on the SLAM pose estimation. The test data also show that the positioning accuracy of ORB-SLAM2 can be effectively improved simply by integrating the Grid_ORB method. MGC-VSLAM reduces the average RMSE of ORB-SLAM2 on highly dynamic sequences by about 93%. This proves that the improved system can effectively improve the performance of ORB-SLAM2 in highly dynamic scenarios.
However, the accuracy of ORB-SLAM2 is reduced by MB-VSLAM on the low-dynamic sequences. The reason is that Grid_ORB extracts a large number of dynamic features, which introduce errors into the SLAM system; this defect is compensated by the GC method. More high-quality feature points on the background are extracted by the Grid_ORB method, which provides a stable matching relationship between image frames and enables better geometric constraints. Moreover, thanks to its local map mechanism, the ORB-SLAM2 system can already filter out most dynamic feature points in low-dynamic sequences and achieves high accuracy, leaving very little room for improvement. Hence, the MGC-VSLAM system has performance comparable to ORB-SLAM2 on low-dynamic sequences.
It is worth noting that ORB-SLAM2 performed unstably on the w_rpy sequence. This sequence contains fast-moving targets as well as camera rotation and translation. Analysis shows that the w_rpy sequence contains a large number of images with motion blur and weak texture information; Qtree_ORB retains too many weak and dynamic feature points, resulting in unstable feature matching and, in turn, system disruption. Conversely, the improved system is relatively stable.
The performance of the visual odometry is shown in Tables 9 and 10. The proposed method can effectively improve the visual odometry when fast-moving objects appear in the scene. Unfortunately, MGC-VSLAM was found to degrade the performance of ORB-SLAM2 on the sitting sequence.

2) COMPARISONS WITH THE STATE-OF-THE-ART SLAM SYSTEMS
MGC-VSLAM is compared with state-of-the-art SLAM systems for dynamic environments: DynaSLAM [19], DS-SLAM [20], Detect-SLAM [33], and the improved system proposed by Lin et al. [34], all excellent VSLAM systems proposed in recent years. The above systems are all built upon ORB-SLAM2, and they adopt the RMSE of the ATE as a quantitative metric. However, we notice that the results reported for ORB-SLAM2 on the same sequence differ between these papers and ours, which may be caused by differences in experimental conditions. Therefore, the relative accuracy improvement of each system over ORB-SLAM2 is adopted as the evaluation metric; it is shown in Table 11. In terms of relative accuracy improvement, MGC-VSLAM is lower only than Lin's system on the w_static sequence and DynaSLAM on the w_rpy sequence. This is largely because the proposed Grid_ORB and GC methods can effectively filter the features extracted on dynamic objects.
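The relative accuracy improvement used in Table 11 can be expressed as a simple ratio; the sketch below uses illustrative RMSE values, not figures from the paper:

```python
def relative_improvement(rmse_baseline, rmse_system):
    """Relative accuracy improvement of a system over the ORB-SLAM2
    baseline.  Because both RMSE values come from the SAME experimental
    run, differences in experimental conditions between papers cancel
    out, making cross-paper comparison meaningful."""
    return (rmse_baseline - rmse_system) / rmse_baseline
```

For instance, a baseline RMSE of 0.40 m reduced to 0.028 m corresponds to a 93% relative improvement.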
Lin's system outperforms MGC-VSLAM on the w_static sequence because the proposed GC method is not sensitive to slow-moving objects, so some retained dynamic features may impact the accuracy of MGC-VSLAM. Lin's system, in contrast, adopts a semantic method to detect a priori dynamic objects, avoiding the impact of some dynamic objects on the system and thereby improving its accuracy on the w_static sequence. The w_rpy sequence contains many frames with large areas of motion blur; DynaSLAM's higher accuracy may be due to its deep learning method, which adapts better to such scenes, whereas the many weak features extracted by Grid_ORB impact the decision of the GC method and lower the accuracy of MGC-VSLAM.

D. DISCUSSION
MGC-VSLAM is designed to improve the robustness of ORB-SLAM2 in dynamic environments by improving the feature extraction method and adding a dynamic feature point filtering module to ORB-SLAM2. According to the tests above, the feature points extracted by the proposed Grid_ORB are more uniform than those extracted by the standard ORB [9] and do not suffer from the over-uniform distribution problem of Qtree_ORB [3], [4]. Moreover, the FAST extraction threshold in the standard ORB and ORB-SLAM2 algorithms is set according to an empirical engineering value, which cannot adapt well to real environments; Grid_ORB adopts an adaptive threshold, which suits different environments better. In terms of efficiency, the running efficiency of Qtree_ORB suffers severely because each time a node is split, it must be determined to which child node each feature point in the parent node belongs, and, more seriously, the number of iterations is large. Grid_ORB extracts feature points in each mesh directly, so its computational efficiency is higher than that of Qtree_ORB.
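The per-mesh retention step at the heart of this idea can be sketched as follows. This is a deliberately simplified, hypothetical illustration, not the paper's implementation: it takes already-detected keypoints as `(x, y, response)` tuples and keeps only the strongest Harris responses in each fixed-size grid cell, avoiding both clustering and the per-split bookkeeping of a quadtree:

```python
def grid_filter(keypoints, cell_size=40, keep_per_cell=2):
    """Meshing-based uniform distribution (simplified sketch of the
    Grid_ORB idea): bucket keypoints into fixed grid cells and keep
    only the `keep_per_cell` strongest Harris responses per cell.

    keypoints: iterable of (x, y, response) tuples for one pyramid level.
    """
    cells = {}
    for x, y, resp in keypoints:
        key = (int(x) // cell_size, int(y) // cell_size)  # cell index
        cells.setdefault(key, []).append((x, y, resp))
    kept = []
    for pts in cells.values():
        pts.sort(key=lambda p: p[2], reverse=True)  # strongest first
        kept.extend(pts[:keep_per_cell])
    return kept
```

Unlike the quadtree, each keypoint is assigned to its cell once (a single integer division), so the cost is linear in the number of keypoints with no iterative splitting.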
The proposed dynamic feature filtering method, called GC, can effectively filter the features extracted by Grid_ORB and Qtree_ORB in high-dynamic scenes, and it remains useful when the camera undergoes rotation and translation. However, GC is not sensitive to slow-moving dynamic objects, so some dynamic features may be retained and impact the accuracy of the SLAM system. The main reason is that the projection distortion is larger than the image transformation caused by the slow-moving object.
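The geometric test underlying this kind of constraint can be sketched with the epipolar residual: a truly static point must lie close to the epipolar line induced by the camera motion between two frames, while an independently moving point generally does not. The code below is a minimal stand-in for the GC decision rule, assuming a fundamental matrix `F` has already been estimated from the stable matches (e.g. by RANSAC); the threshold value is illustrative:

```python
import numpy as np

def epipolar_distance(F, p1, p2):
    """Distance (in pixels) from point p2 in frame 2 to the epipolar
    line of point p1 in frame 1, given fundamental matrix F.
    Points are (x, y) tuples."""
    x1 = np.array([p1[0], p1[1], 1.0])
    a, b, c = F @ x1  # epipolar line in frame 2: a*x + b*y + c = 0
    return abs(a * p2[0] + b * p2[1] + c) / np.hypot(a, b)

def split_static_dynamic(F, matches, threshold=1.0):
    """Classify matches: small epipolar residual -> static; a large
    residual suggests independent motion (simplified GC-style rule)."""
    static, dynamic = [], []
    for p1, p2 in matches:
        (static if epipolar_distance(F, p1, p2) < threshold
         else dynamic).append((p1, p2))
    return static, dynamic
```

This also makes the stated limitation visible: a slow-moving point produces a residual smaller than the threshold (or smaller than the projection distortion of genuinely static points), so it is wrongly kept as static.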
MGC-VSLAM was tested on the public TUM dataset to validate its positioning performance; it reduces the average RMSE of the ATE of ORB-SLAM2 [3], [4] on the high-dynamic sequences by approximately 93%. Unfortunately, the stability is reduced by MGC-VSLAM when slow-moving objects appear in the scene. Moreover, the state-of-the-art SLAM systems DS-SLAM [20], DynaSLAM [19], Detect-SLAM [33], and Lin's system [34] were compared with MGC-VSLAM. The results show that the relative accuracy improvement of MGC-VSLAM is lower only than that of Lin's system on the w_static sequence and that of DynaSLAM on the w_rpy sequence. On the low-dynamic sequences, the performance of MGC-VSLAM is better than that of all compared SLAM systems. It is worth noting that all four systems compared with MGC-VSLAM are based on deep learning methods, which require high-performance graphics cards for support; MGC-VSLAM does not, so it has lower operating costs.
There is still ongoing work on MGC-VSLAM, which may be improved in two aspects. First, the dynamic feature point detection in MGC-VSLAM only uses two consecutive frames; further improvement may be achieved by using more frames, which would provide more redundant information for filtering dynamic feature points. Second, MGC-VSLAM currently adopts a hard decision to determine whether slow-moving objects appear in the scene. We are considering using deep learning methods to identify a priori dynamic objects and then filter the features on those objects.

IV. CONCLUSION
In this paper, a novel meshing-based and geometric constraint VSLAM, i.e., MGC-VSLAM, is proposed to overcome the degeneration of the SLAM system in high-dynamic environments. MGC-VSLAM uses a new approach called Grid_ORB to extract more uniform and stable ORB features. A dynamic feature point filtering method, GC, is then added to the system to filter out dynamic feature points. The following conclusions are obtained through experimental verification: 1) The proposed Grid_ORB can effectively solve the over-uniform distribution problem of the features extracted in ORB-SLAM2. Moreover, its average matching accuracy and efficiency are higher than those of Qtree_ORB and standard ORB.
2) The proposed GC method can effectively filter out the dynamic features in high-dynamic scenes. However, GC is not sensitive to features extracted on low-dynamic objects.
3) On the high-dynamic test sequences, MGC-VSLAM decreases the average RMSE of the ATE by about 93% compared with ORB-SLAM2, although its stability is reduced on the low-dynamic sequences. In addition, the comparison with the four state-of-the-art SLAM systems in dynamic environments shows that MGC-VSLAM achieves the highest relative RMSE reduction with respect to ORB-SLAM2.

RUI LI received the M.S. and Ph.D. degrees in mechanical engineering from Xi'an Jiaotong University, Xi'an, China, in 2013 and 2019, respectively. She is currently working with the Xi'an University of Technology. Her research interests include intelligent robotics, brain-computer interfaces, and brain-controlled prostheses.
DEXIN LI was born in Yantai, Shandong, China, in 1965. He received the Ph.D. degree in mechanics from Xi'an Jiaotong University, Xi'an, China, in 2003. From 1992 to 2019, he was with the Xi'an University of Technology, China. Since 2002, he has been an Associate Professor with the School of Mechanical and Precision Instrument Engineering, Xi'an University of Technology. His current research interests include mechanical computer-aided design and manufacturing.