A GMS-Guided Approach for 2D Feature Correspondence Selection

Feature correspondence selection, which aims to seek as many true matches (i.e., inliers) as possible from a given putative set while minimizing false matches (i.e., outliers), is crucial to many feature-matching based tasks in computer vision. Dealing with putative sets with low inlier ratios remains a challenging problem. To address this problem, we propose a novel correspondence selection strategy guided by Grid-based Motion Statistics (GMS). We first adopt GMS to generate a small correspondence set with a high inlier ratio. Then, an accurate geometric model is built from this correspondence set. Finally, the geometric model is used to filter the given putative correspondence set to obtain true correspondences. The experimental results on benchmark datasets demonstrate that our proposed approach outperforms the state-of-the-art approaches for putative sets with various inlier ratios, especially for cases with low inlier ratios.


I. INTRODUCTION
Establishing reliable feature correspondences (i.e., matches) between two image feature sets is both a prerequisite and a critical step in many computer vision tasks, such as Simultaneous Localization and Mapping (SLAM) [1], object tracking [2], stereo matching [3], image stitching [4], and Structure-From-Motion (SFM) [5]. Generally, finding good feature correspondences follows a two-stage strategy carried out in the context of feature matching. In the first stage, a set of putative correspondences (i.e., a putative set) is constructed by using similarity constraints of local image descriptors [6] (e.g., SIFT [7], ORB [8]), where ORB is widely used in real-time applications for its speed. However, the putative set inevitably contains a large number of false matches (i.e., outliers) due to ambiguities of the local descriptors and large variations in illumination, perspective and clarity. These outliers can cause many high-level vision tasks to fail. Therefore, correspondence selection is widely adopted during the second stage, e.g., [9]-[12], to eliminate the outliers from the constructed putative set. The main goal of correspondence selection is to seek as many true matches (i.e., inliers) as possible from a given putative set while minimizing false matches.
The associate editor coordinating the review of this manuscript and approving it for publication was Huazhu Fu.
During the past few decades, a variety of methods have been proposed to deal with the mismatch removal problem, yet handling putative sets with low inlier ratios remains challenging. Almost all of the existing methods perform well on putative sets with higher inlier ratios but poorly on putative sets with lower inlier ratios. To address this problem, the classic pipelines boost the inlier ratio of the putative set using the Nearest Neighbor Similarity Ratio (NNSR) [7] to prune poor-quality correspondences before applying the correspondence selection approach. However, NNSR is not generally applicable to descriptors other than SIFT [13]. As shown in Fig. 1(b), when the putative set is generated from ORB features, only a small portion of the true matches, together with a large portion of false matches, is preserved after using NNSR. Therefore, as shown in Fig. 1(c), only a small number of true matches can be obtained when the putative set is further filtered by the correspondence selection method. This is problematic when the image pairs themselves contain very few true matches, as when matching low-overlapping images in image stitching. Therefore, it is necessary to develop an outlier removal method that can handle putative sets with low inlier ratios and is also applicable to descriptors other than SIFT.
To solve the above-mentioned issue, we develop a new strategy that can preserve most of the existing true matches when the putative set contains a large number of outliers or few inliers. Inspired by Grid-based Motion Statistics (GMS) [14], which is a simple, efficient (as efficient as NNSR) and highly robust correspondence selection method, our strategy is referred to as a GMS-guided (GMS-G) approach. For an outlier removal method based on a geometric constraint, the more accurate the geometric model is, the better the correspondence selection performance is. However, an accurate geometric model cannot be constructed from a putative set with a high outlier ratio. Furthermore, if we want to retain the maximum number of inliers, the putative set passed to the correspondence selection method should include as many true matches as possible. Thus, in our approach, instead of building the geometric model directly from the putative set, we propose to construct the geometric model from a smaller but more reliable correspondence set, which is obtained by filtering the initial putative set with GMS. The resulting model is then applied to the initial putative set, which retains all possible true matches, to yield the largest number of inliers. As demonstrated in comparative experiments, our strategy can significantly boost true matches and remove false matches, and effectively handle putative sets with extremely low inlier ratios.
The contributions made in this paper are in the following three aspects. (i) We propose a GMS-guided correspondence selection framework that can greatly boost the true matches while rejecting false matches, improving both precision and recall; (ii) Our method is especially advantageous when dealing with putative correspondence sets involving a large number of outliers or a small number of inliers; (iii) Our approach is highly efficient and can be applied to real-time applications.

II. RELATED WORK
Existing 2D feature correspondence selection works can be generally classified into three major categories, i.e., parametric approaches, non-parametric approaches and learning based approaches.

A. PARAMETRIC APPROACHES
The parametric model based methods try to seek consistent correspondences grounded on parametric geometric models. The classic Random Sample Consensus algorithm (RANSAC) [15] and its variants (e.g., PROSAC [16], USAC [17], GC-RANSAC [18], MAGSAC [19]) estimate a geometric model (e.g., a homography matrix or fundamental matrix) to distinguish between inliers and outliers. They follow a hypothesize-and-verify strategy, aiming to draw the smallest possible outlier-free subset to estimate a pre-defined transformation by random resampling [20]. Correspondences consistent with the geometric model are considered as inliers; the others are considered as outliers.

B. NON-PARAMETRIC APPROACHES
The non-parametric methods are independent of parametric model assumptions. Some non-parametric model-based methods use the feature similarity constraint or geometric constraint to search the corresponding inliers, such as the Nearest Neighbor Similarity Ratio (NNSR) [7] and Locality Preserving Matching (LPM) [21], [22]. The NNSR [7], using the feature similarity constraint, assigned a penalty equal to the ratio of the closest to the second-closest feature distance to each correspondence, where correspondences with low ratios were treated as inliers. The LPM [21], [22] introduced a mathematical model to select inlier correspondences having similar local neighborhood structures. There are also some constraint-independent non-parametric methods, such as Vector Field Consensus (VFC) [23], [24], Manifold Regularization-based Robust Point Matching (MR-RPM) [25], Robust Feature Matching based on Spatial Clustering Algorithm with Noisy samples (RFM-SCAN) [26] and Grid-based Motion Statistics (GMS) [14]. The VFC assumed that the noise around inliers and outliers fell in different distributions and estimated the probability of inliers by maximum likelihood estimation for parameters in the mixture probabilistic model. The MR-RPM approach [25] enforced the motion field to be smooth under manifold regularization and conquered the matching problem from a robust motion field interpolation perspective. It has shown a promising performance on addressing the deformable matching problem. The RFM-SCAN approach [26] can address image pairs undergoing any transformation models. This method cast the matching problem into spatial clustering with outliers, and customized the classic density-based spatial clustering of applications with noise (DBSCAN) [27] to automatically determine the number of clusters and eliminate the outliers simultaneously. 
The GMS approach [14] rejected false matches by counting the number of matches in small neighborhoods and has achieved real-time performance with an efficient grid-based score estimator. Our work in this paper is inspired by this approach.
VOLUME 8, 2020

C. LEARNING BASED APPROACHES
Some new approaches based on deep learning techniques have also been well studied and have achieved great success in recent years, such as the LFGC [28], LMR [29], and NM-Net [30]. The LFGC [28] first proposed a learning approach to find good correspondences by finding geometrically consistent correspondences with a deep network. The LMR [29] cast mismatch removal as a two-class classification problem, and learned a general classifier to determine the correctness of an arbitrary putative match. The NM-Net [30] proposed a deep classification network that fully mined compatibility-specific locality for correspondence selection.
The approach proposed in this paper belongs to the parametric approaches. Aiming to address the challenges of putative sets with low inlier ratios and inspired by the GMS approach [14], we first generate an accurate geometric model from a reliable correspondence set with a high inlier ratio. Our approach can greatly boost the true matches while avoiding false matches, improving both precision and recall.

A. THE FRAMEWORK OF GMS-GUIDED STRATEGY
The framework of our GMS-guided strategy for correspondence selection is shown in Fig. 2, where the modules enclosed in the red dash box indicate our contribution.
In this strategy, using the GMS algorithm, an accurate and small correspondence set can first be obtained by filtering the putative set. Then, with the RANSAC estimator, these accurate correspondences are used to construct an accurate geometric model. Finally, this accurate geometric model is adopted to filter the initial putative set to obtain inliers, where almost all of the true matches are retained with very few mismatches.

B. THE GMS ALGORITHM
Now, we briefly review the popular feature matching method GMS [14] used in our GMS-guided strategy.
The GMS algorithm proves that the number of features also contributes to the quality of correspondences besides feature descriptiveness. It shows that the quantity of correspondences in a small neighborhood around a true match is larger than that around a false match under the smooth motion.
Suppose that there is an image pair $\{I_a, I_b\}$, $C = \{c_1, c_2, \ldots, c_N\}$ is the set of all nearest-neighbor feature matches from $I_a$ to $I_b$, and $C$ has cardinality $|C| = N$. In the GMS algorithm, $I_a$ and $I_b$ are first divided into $20 \times 20$ non-overlapping cells (grids). Each cell $i$ in $I_a$ is paired with the cell $j$ in $I_b$ that contains the maximum amount of its correspondences. $S_{ij}$ is a measure of neighborhood support in the cell-pair $(i, j)$ and its small neighborhood (eight cell-pairs), which can be estimated as:
$$S_{ij} = \sum_{k=1}^{9} \left| C_{i^k j^k} \right|,$$
where $|C_{i^k j^k}|$ is the amount of correspondences in the cell-pair $(i^k, j^k)$. All correspondences in $(i, j)$ are considered as inliers if $S_{ij}$ exceeds a threshold:
$$\text{cell-pair } (i, j) \in \begin{cases} T, & \text{if } S_{ij} > \tau_i = \alpha \sqrt{n_i} \\ F, & \text{otherwise,} \end{cases}$$
where $T$ indicates true correspondences (inliers) and $F$ indicates false correspondences (outliers), $\tau_i$ is a threshold approximated by $\alpha \sqrt{n_i}$, $\alpha$ is a given parameter, and $n_i$ is the average (over the nine cell-pairs) amount of correspondences.
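The grid statistic reviewed above can be illustrated with a minimal pure-Python sketch. It is not the reference GMS implementation: it omits the rotation and scale handling and the shifted grids of [14], the value `alpha=6.0` is an assumed default, and the names `cell_of`, `block_3x3` and `gms_filter` are hypothetical. Each match is a pair of points `((x, y), (x', y'))`, and image sizes are `(width, height)`.

```python
from collections import Counter
from math import sqrt


def cell_of(pt, size, grid=20):
    """Map a point (x, y) in an image of size (width, height) to a cell id."""
    col = min(int(pt[0] * grid / size[0]), grid - 1)
    row = min(int(pt[1] * grid / size[1]), grid - 1)
    return row * grid + col


def block_3x3(cell, grid=20):
    """The cell and its eight neighbours in a fixed order (None if outside)."""
    row, col = divmod(cell, grid)
    return [(row + dr) * grid + (col + dc)
            if 0 <= row + dr < grid and 0 <= col + dc < grid else None
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def gms_filter(matches, size_a, size_b, alpha=6.0, grid=20):
    """Keep matches whose cell-pair satisfies S_ij > alpha * sqrt(n_i)."""
    pairs = [(cell_of(a, size_a, grid), cell_of(b, size_b, grid))
             for a, b in matches]
    pair_count = Counter(pairs)                # |C_ij| for every cell-pair
    cell_count = Counter(i for i, _ in pairs)  # matches leaving each cell of I_a
    best = {}                                  # cell i -> cell j holding most matches
    for (i, j), n in pair_count.items():
        if i not in best or n > pair_count[(i, best[i])]:
            best[i] = j
    accepted = set()
    for i, j in best.items():
        # Pair up the 3x3 neighbourhoods of i and j, skipping cells off the grid.
        block = [(a, b) for a, b in zip(block_3x3(i, grid), block_3x3(j, grid))
                 if a is not None and b is not None]
        s_ij = sum(pair_count[p] for p in block)
        n_i = sum(cell_count[a] for a, _ in block) / len(block)
        if s_ij > alpha * sqrt(n_i):
            accepted.add((i, j))
    return [m for m, p in zip(matches, pairs) if p in accepted]
```

In this sketch, a dense cluster of matches that moves coherently from one cell to another is accepted, while an isolated match is rejected even if it is correct, which is the behavior that motivates filtering the full putative set in the final step of our strategy.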

C. THE GEOMETRIC MODEL ESTIMATION
In this section, we provide the details of the geometric model estimation.
There are several geometric models in two-view image geometry theory [31], such as affine, homography and epipolar geometry. For 2D correspondence selection, we are interested in estimating the homography, which suits more general scenes than other geometric models.
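Assume that images are obtained by a perspective pinhole camera, and points are represented by Cartesian coordinates. Suppose that $x = (u, v)^T$ and $x' = (u', v')^T$ are the two points of a correspondence. A planar projective transformation, or homography, that maps $x$ to $x'$ can be expressed in homogeneous coordinates by:
$$\begin{pmatrix} u' \\ v' \\ 1 \end{pmatrix} \simeq H \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix},$$
where $H$ is a non-singular $3 \times 3$ matrix and $\simeq$ denotes equality up to a scale factor.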
Since the transformation is defined up to a scaling factor, it can be normalized by scaling h 33 = 1, and as such H can be parameterized by eight parameters. Typically, homography is estimated between images by finding feature correspondences in those images, and can be computed from four correspondences.
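To make the four-correspondence statement concrete, the following sketch solves for the eight parameters with $h_{33} = 1$ by writing two linear equations per correspondence. It is an illustrative direct solve in pure Python, not the estimator used in our experiments, and the function names `solve_linear`, `homography_from_4` and `apply_h` are hypothetical. It assumes the four points are in general position (no three collinear).

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def homography_from_4(src, dst):
    """Estimate H (with h33 = 1) from four point pairs (u, v) -> (u', v')."""
    A, b = [], []
    for (u, v), (up, vp) in zip(src, dst):
        # From u' = (h11 u + h12 v + h13) / (h31 u + h32 v + 1), and likewise v'.
        A.append([u, v, 1, 0, 0, 0, -u * up, -v * up]); b.append(up)
        A.append([0, 0, 0, u, v, 1, -u * vp, -v * vp]); b.append(vp)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_h(H, p):
    """Map a Cartesian point through H and divide by the homogeneous scale."""
    u, v = p
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return ((H[0][0] * u + H[0][1] * v + H[0][2]) / w,
            (H[1][0] * u + H[1][1] * v + H[1][2]) / w)
```

With more than four correspondences, or with outliers present, a robust estimator such as RANSAC is needed on top of this minimal solver.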
Our goal of building an accurate geometric model to distinguish between outliers and inliers is now converted to estimating the homography matrix H from a putative set. Obviously, if we can successfully estimate the homography matrix H , the outliers can be easily identified. Furthermore, if we can calculate H from a putative set with a high inlier ratio, we can get a more accurate geometric model, which can help us to better remove outliers.

D. THE GMS-GUIDED CORRESPONDENCE SELECTION
In this section, we explain the process of our GMS-guided correspondence selection strategy in steps.

1) STEP 1: GENERATING A PUTATIVE SET
The ORB feature is widely favored in real-time applications due to its excellent trade-off between robustness and efficiency. In our GMS-guided correspondence selection strategy, ORB features are extracted from a given image pair $\{I_a, I_b\}$. Moreover, because brute-force matching compares all descriptors, it always finds the best correspondence for each feature, and it can be accelerated significantly with modern GPU hardware. Therefore, the putative set $C$ introduced in Part B is built with brute-force Hamming distance comparisons (nearest-neighbor matching), which can be described as:
$$C = \{c_i\}_{i=1}^{N} = \{(x_i, x'_i)\}_{i=1}^{N},$$
where $x_i$ and $x'_i$ are the two feature points of the correspondence $c_i$, and $N = |C|$. Note that this putative set usually contains a large number of matches, covering possibly all true matches. Meanwhile, it has a low inlier ratio and contains many false matches due to the limited discrimination ability of the binary ORB features and external disturbances such as repetitive patterns and noise.
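As a small illustration of nearest-neighbor matching under the Hamming distance, the sketch below represents each binary descriptor as a Python integer, which is an assumption made for brevity (ORB descriptors are 256-bit strings in practice). The names `hamming` and `brute_force_match` are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def brute_force_match(desc_a, desc_b):
    """For every descriptor of I_a, find its nearest neighbour in I_b.

    Returns one tuple (index_in_a, index_in_b, distance) per descriptor of I_a,
    so the putative set has as many entries as there are features in I_a.
    """
    matches = []
    for i, da in enumerate(desc_a):
        j = min(range(len(desc_b)), key=lambda k: hamming(da, desc_b[k]))
        matches.append((i, j, hamming(da, desc_b[j])))
    return matches
```

Because every feature in $I_a$ receives a match regardless of how poor the best candidate is, the resulting set is large and noisy, which is why the subsequent steps are needed.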
2) STEP 2: IMPROVING THE ACCURACY OF THE PUTATIVE SET BY GMS ALGORITHM
As aforementioned, our goal is to estimate an accurate geometric model to distinguish between outliers and inliers. Nevertheless, the putative set $C$ contains a large number of outliers.
As the most commonly used robust estimation method for homographies, the RANSAC algorithm [15] has been widely adopted to build geometric models. However, building a geometric model with RANSAC directly from a putative set with a high outlier ratio can result in an inaccurate geometric model or even in failure. Therefore, in our GMS-guided strategy, we propose to use the GMS process (detailed in Part B) to boost the inlier ratio of the putative set $C$ and obtain a reliable correspondence set $C_m$ as:
$$C_m = \{(x_i, x'_i)\}_{i=1}^{M},$$
where $M$ is the number of correspondences in $C_m$, which contains a relatively higher proportion of inliers than $C$.
In order to accelerate the convergence of the RANSAC algorithm and build a more accurate geometric model in the next step, we rank $C_m$ by similarity and select the $L$ top-ranking correspondences from $C_m$ (supposing that $L$ is not larger than $M$) to obtain a new set as:
$$C_s = \{(x_i, x'_i)\}_{i=1}^{L}.$$
Here, $L$ is empirically set to 500 in this paper and $C_s$ is a smaller and more accurate correspondence set.
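The ranking step can be sketched as follows, assuming that each correspondence is stored as a tuple whose third element is the descriptor distance, so that a smaller distance means a higher similarity. The tuple layout and the name `select_reliable` are hypothetical.

```python
def select_reliable(C_m, L=500):
    """Return C_s: the L most similar correspondences of C_m.

    Each correspondence is (index_in_a, index_in_b, distance). If C_m has
    no more than L entries, it is returned unchanged.
    """
    if len(C_m) <= L:
        return list(C_m)
    return sorted(C_m, key=lambda c: c[2])[:L]
```

For example, `select_reliable([(0, 5, 40), (1, 2, 10), (2, 9, 25)], L=2)` keeps the two correspondences with distances 10 and 25.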

3) STEP 3: ESTIMATING THE GEOMETRIC MODEL
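In this step, we use the RANSAC algorithm [15] to estimate an accurate geometric model (i.e., the homography matrix $H$). The RANSAC algorithm first randomly samples several correspondences (at least 4) from its input set, denoted $\hat{C}$ here to distinguish it from the initial putative set $C$ ($\hat{C} = C_s$ if the number of correspondences in $C_m$ is larger than 500; otherwise, $\hat{C} = C_m$), and generates the model hypothesis $H_i$ for those samples at the $i$-th iteration. Then, the hypothesis $H_i$ is verified via the following objective function:
$$f(H_i) = \sum_{(x_j, x'_j) \in \hat{C}} h(x_j, x'_j, H_i),$$
where $h(\cdot)$ is a binary function defined by:
$$h(x_j, x'_j, H_i) = \begin{cases} 1, & \text{if } d(x'_j, y_j) < t_{ransac} \\ 0, & \text{otherwise,} \end{cases}$$
where $y_j = H_i x_j$ is the reprojection of $x_j$ under $H_i$, $d(\cdot, \cdot)$ is the Euclidean distance, and $t_{ransac}$ is the reprojection error threshold with a default value of 3 pixels. After $n_{ransac}$ iterations, the model with the maximum objective function value is selected as the final model $H$.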
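The hypothesize-and-verify loop of this step can be sketched in pure Python as below. This is a simplified illustration rather than the OpenCV estimator used in our experiments: it has no early termination and no final refinement on the consensus set, and the helper names (`solve_linear`, `fit_homography`, `reprojection_error`, `ransac_homography`) and the `seed` argument are hypothetical. Correspondences are pairs of points `((u, v), (u', v'))`.

```python
import random
from math import hypot


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting; None if (near-)singular."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_homography(sample):
    """Model hypothesis H_i (h33 = 1) from four sampled correspondences."""
    A, b = [], []
    for (u, v), (up, vp) in sample:
        A.append([u, v, 1, 0, 0, 0, -u * up, -v * up]); b.append(up)
        A.append([0, 0, 0, u, v, 1, -u * vp, -v * vp]); b.append(vp)
    h = solve_linear(A, b)
    return None if h is None else [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def reprojection_error(H, x, xp):
    """Euclidean distance d(x', y) between x' and the reprojection y = H x."""
    u, v = x
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        return float("inf")
    yu = (H[0][0] * u + H[0][1] * v + H[0][2]) / w
    yv = (H[1][0] * u + H[1][1] * v + H[1][2]) / w
    return hypot(xp[0] - yu, xp[1] - yv)


def ransac_homography(corr, n_iter=10000, t_ransac=3.0, seed=0):
    """Return the hypothesis with the largest inlier count and that count."""
    rng = random.Random(seed)
    best_H, best_score = None, -1
    for _ in range(n_iter):
        H = fit_homography(rng.sample(corr, 4))
        if H is None:  # degenerate sample, e.g. collinear points
            continue
        score = sum(1 for x, xp in corr
                    if reprojection_error(H, x, xp) < t_ransac)
        if score > best_score:
            best_H, best_score = H, score
    return best_H, best_score
```

The probability that a random four-point sample is outlier-free falls with the fourth power of the inlier ratio, which is why the loop is run on the GMS-filtered set rather than on the raw putative set.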

4) STEP 4: REMOVING OUTLIERS USING THE GEOMETRIC MODEL
Finally, the established geometric model $H$ is used to filter the putative set $C$ to remove the outliers. For each correspondence $(x_i, x'_i)$ in the putative set $C$, the Euclidean distance, denoted as $d(x'_i, y_i)$, between points $y_i$ and $x'_i$ is calculated. Specifically, the correspondence is identified as an inlier if the distance is less than a pre-defined inlier threshold $t$, i.e.,
$$d(x'_i, y_i) < t,$$
where $y_i = H x_i$ is the reprojection of $x_i$ under the geometric model $H$, $d(\cdot, \cdot)$ is the Euclidean distance between the two points, and the inlier threshold $t$ is set to 2.5 following the practice of [32]. It should be noted that, in this step, we filter the putative set $C$ instead of $C_m$ to yield inliers, which is the key difference between our approach and the GMS+RANSAC approach. Because the reliable correspondence set $C_m$ is generated by the GMS algorithm, which selects true matches by simply counting the neighborhood correspondences, it inevitably discards isolated true matches that have no or few supporting neighborhood correspondences. Hence, a portion, and even a large portion, of the true matches will be missing from $C_m$ if the putative matches are sparsely distributed. So, if we filter $C_m$ to produce inliers, we will not obtain an inlier set including the maximum number of true matches. In contrast, $C$ is generated by brute-force matching and contains almost all of the true matches. Therefore, filtering $C$ can significantly boost the true matches by enlarging the putative set, and benefits those matching problems where the image pairs involve very few true matches.
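The difference between filtering $C$ and filtering $C_m$ can be illustrated with a short sketch. The names `reproject` and `filter_with_model` and the toy data in the usage note are hypothetical; correspondences are pairs of points `((u, v), (u', v'))` and `H` is a nested 3 x 3 list.

```python
from math import hypot


def reproject(H, x):
    """Return y = H x in Cartesian coordinates, or None if the scale vanishes."""
    u, v = x
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        return None
    return ((H[0][0] * u + H[0][1] * v + H[0][2]) / w,
            (H[1][0] * u + H[1][1] * v + H[1][2]) / w)


def filter_with_model(C, H, t=2.5):
    """Keep every correspondence (x, x') of C with d(x', H x) < t."""
    inliers = []
    for x, xp in C:
        y = reproject(H, x)
        if y is not None and hypot(xp[0] - y[0], xp[1] - y[1]) < t:
            inliers.append((x, xp))
    return inliers
```

As a usage example, take a pure translation `H = [[1, 0, 10], [0, 1, -5], [0, 0, 1]]` and a putative set with three true matches and one false match. If GMS kept only two of the true matches in $C_m$, filtering $C_m$ can return at most those two, whereas filtering $C$ also recovers the isolated third one.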

Algorithm 1 GMS-Guided Algorithm
Input: Image pair {I_a, I_b}; parameter L
1: Extract ORB features from I_a and I_b;
2: Generate the putative set C by brute-force matching;
3: Obtain the reliable correspondence set C_m using GMS to filter putative set C;
4: if the number of correspondences in C_m > L then
5:   Select the first L correspondences with the highest similarity, forming C_s = {(x_i, x'_i)}, i = 1, ..., L, from C_m;
6:   Calculate a Homography matrix H using C_s;
7: else
8:   Calculate the Homography matrix H using C_m;
9: end if
10: Obtain the geometric model matrix H;
11: Filter C using H;
12: if the Euclidean distance d(x'_i, y_i) < 2.5 then
13:   Correspondence (x_i, x'_i) is identified as an inlier.
14: end if
Output: Inlier set

We summarize our GMS-guided strategy in Algorithm 1. Because the geometric model constructed by this procedure is very accurate, our GMS-guided method can retain most of the correct matches while admitting few mismatches.

IV. EXPERIMENTS
In this section, we evaluate the performance of our proposed GMS-guided approach and compare it with the state-of-art approaches on benchmark datasets.

A. EXPERIMENT SETUP
1) IMPLEMENTATION
In our experiments, we use the open-source toolbox OpenCV 3.4 to extract ORB features, and brute-force matching is adopted to generate the putative set uniformly. All experiments are conducted on a laptop with a 2.2 GHz Intel Core i7 CPU and 16 GB of memory.
We compare our algorithm with the state-of-the-art approaches including RANSAC [15], GMS [14], GMS+ RANSAC (referred to as 'GMS-R'), VFC [23], LFGC [28], LMR [29], and LPM [21]. The GMS+RANSAC method adopts GMS to obtain a putative set, and then uses RANSAC algorithm to remove outliers from the obtained putative set, which is listed for comparison to demonstrate the superiority of our strategy.
In the RANSAC algorithm, the number of iterations $n_{ransac}$ is set to 10,000 to get a good balance between performance and speed. In order to optimize experimental results, the scale and rotation variables in the GMS algorithm are both set to True. All other parameters not mentioned here are either default values in OpenCV functions or as described in the original publications, and are consistent throughout all experiments.
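The settings above can be assembled into a rough end-to-end sketch with the OpenCV Python bindings. This is an illustration under stated assumptions, not our released implementation: `matchGMS` lives in the `xfeatures2d` module of the opencv-contrib package, the function name `gms_guided_select` and the `n_features` default are hypothetical, and whether a given OpenCV release honors a `maxIters` value as large as 10,000 should be checked against its documentation.

```python
def gms_guided_select(img_a, img_b, n_features=10000, L=500, t=2.5):
    """Return the matches of the putative set C that agree with H."""
    # Imported lazily so that this module can be loaded without OpenCV installed.
    import cv2
    import numpy as np

    # Step 1: ORB features and brute-force Hamming matching give the putative set C.
    orb = cv2.ORB_create(nfeatures=n_features)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return []
    C = cv2.BFMatcher(cv2.NORM_HAMMING).match(des_a, des_b)

    # Step 2: GMS (rotation and scale enabled) gives C_m; keep the L best as C_s.
    size_a = (img_a.shape[1], img_a.shape[0])  # (width, height)
    size_b = (img_b.shape[1], img_b.shape[0])
    C_m = cv2.xfeatures2d.matchGMS(size_a, size_b, kp_a, kp_b, C,
                                   withRotation=True, withScale=True)
    C_s = sorted(C_m, key=lambda m: m.distance)[:L]
    if len(C_s) < 4:
        return []

    # Step 3: RANSAC homography from the reliable set (3-pixel threshold).
    src = np.float32([kp_a[m.queryIdx].pt for m in C_s]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in C_s]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0, maxIters=10000)
    if H is None:
        return []

    # Step 4: filter the full putative set C (not C_m) with H.
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in C]).reshape(-1, 1, 2)
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in C])
    proj = cv2.perspectiveTransform(pts_a, H).reshape(-1, 2)
    dist = np.linalg.norm(proj - pts_b, axis=1)
    return [m for m, d in zip(C, dist) if d < t]
```

The sketch would be called with two grayscale images loaded through `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`; the returned list contains `cv2.DMatch` objects that index into the ORB keypoints of each image.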
2) DATASETS
The VGG dataset [33] is a hybrid dataset and contains 48 images of eight scenes. These eight scenes cover a range of specific nuisances such as blur ('bike', 'tree'), viewpoint change ('graffiti', 'wall'), zoom and rotation ('bark', 'boat'), lighting change ('leuven'), and JPEG compression ('ubc'). Each scene consists of six images, and the first image in each scene serves as the reference image for the other images. Therefore, both the generality across different conditions and the robustness to specific nuisances can be assessed from the experimental results on this dataset. The Heinly dataset [32] comprises 40 images with dense or sparse viewpoint change, illumination change, and pure large-scale zoom or rotation. So, it can be used to evaluate the performance under geometric structure deformation (pure zoom or rotation).
The Symbench dataset [34] is composed of 46 image pairs, and each pair shows the same object under lighting changes or different rendering styles. These changes cause image quality variations and give rise to potential errors in the putative set, so the performance under image quality variation can be specifically evaluated. Moreover, all three datasets provide ground truth.

3) EVALUATION CRITERIA
Same as in [35], Precision, Recall and F-measure are used to measure the performance of the evaluated algorithms. In the following formulas, the correspondence set selected by an algorithm, the ground-truth inlier set of the putative set, and the set of correctly selected correspondences are represented as $C_{inlier}$, $C_{inlier}^{GT}$, and $C_{inlier}^{correct}$ respectively. Then, the evaluation metrics, i.e., Precision, Recall and F-measure, are defined as:
$$Precision = \frac{|C_{inlier}^{correct}|}{|C_{inlier}|},$$
$$Recall = \frac{|C_{inlier}^{correct}|}{|C_{inlier}^{GT}|},$$
and
$$F\text{-}measure = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall},$$
where $|\cdot|$ denotes the cardinality of a set. A correspondence $(x_i, x'_i)$ is taken as a ground-truth inlier if $d(x'_i, y_i) < t$, where $y_i = H_{gt} x_i$ is a reprojected feature point of $x_i$, and $H_{gt}$ is the ground-truth homography matrix.
TABLE 1. The Inlier Ratio (IR) of the putative set and the comparison of performance, i.e., Precision (P), Recall (R) and F-measure (F), obtained with six state-of-the-art correspondence selection methods and our GMS-guided approach on the seven typical image pairs selected from the benchmark datasets shown in Fig. 3. Results highlighted in red and boldface are the best.
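The three criteria can be computed with a few lines of set arithmetic. In this sketch, correspondences are identified by hashable ids (for example, their indices in the putative set), and the function name `evaluate` as well as the convention of returning 0 for an empty denominator are assumptions.

```python
def evaluate(selected, ground_truth):
    """Return (precision, recall, f_measure) for a selected correspondence set.

    `selected` holds the ids chosen by an algorithm and `ground_truth` holds
    the ids of the true inliers of the putative set.
    """
    selected, ground_truth = set(selected), set(ground_truth)
    correct = selected & ground_truth
    precision = len(correct) / len(selected) if selected else 0.0
    recall = len(correct) / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For instance, selecting the ids {1, 2, 3, 4} when the true inliers are {2, 3, 4, 5, 6} gives three correct selections, hence a Precision of 3/4, a Recall of 3/5, and an F-measure of 2/3.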

1) PERFORMANCE ON PUTATIVE SETS WITH LOW INLIER RATIOS
In this part, to demonstrate the performance of the correspondence selection from putative sets with low inlier ratios, experiments comparing six classic and state-of-the-art correspondence selection methods including RANSAC [15], GMS [14], GMS+RANSAC, LPM [21], VFC [23], LMR [29] and our GMS-guided algorithm are conducted.
Note that the selected typical image pairs contain different types of large variations or multiple types of changes, such as rendering and noise ('Archeasy' and 'Townsquare'), significant lighting change ('Daynight' and 'Metz'), large viewpoint change ('Graffiti'), large scale change ('Venice'), and zoom and rotation ('Boat'). Moreover, the ORB feature adopted in our approach has relatively lower discrimination ability than floating-point features such as SIFT. This is true especially for the image pairs 'Archeasy', 'Townsquare' and 'Metz', where the variation of the rendering style makes it more challenging to maintain the features' descriptiveness. Furthermore, we set a relatively strict inlier threshold (2.5 pixels) to identify the inliers. Therefore, the obtained putative sets in our experiments have extremely low inlier ratios. Table 1 presents the Precision, Recall, and F-measure statistics of the seven compared approaches on the seven typical image pairs, along with the initial inlier ratios of the putative sets. As it shows, the inlier ratios of the seven typical image pairs ('Archeasy', 'Daynight', 'Graffiti', 'Venice', 'Boat', 'Townsquare' and 'Metz' from top to bottom respectively) are only 1.38%, 6.15%, 4.93%, 6.65%, 16.27%, 1.12% and 1.2%, respectively. These extremely low inlier ratios of the putative sets make the correspondence selection task very challenging. Fig. 3 shows some examples of the correspondence selection results obtained by our GMS-guided approach and two other approaches on the seven image pairs. In the figure, the blue lines indicate the true matches and the red lines indicate the false matches. Note that, due to space limitations, only the results of the two methods that are most relevant to ours (i.e., GMS [14] and RANSAC [15]) and of our GMS-guided algorithm are shown in this figure.
Several observations can be made from the results shown in Table 1 and Fig. 3. First, except for our GMS-guided approach, the performance of all correspondence selection algorithms deteriorates dramatically because of the low inlier ratios. For instance, the VFC algorithm fails on four image pairs (i.e., 'Archeasy', 'Townsquare', 'Metz', and 'Venice'), and the RANSAC algorithm becomes ineffective on three image pairs (i.e., 'Archeasy', 'Townsquare', and 'Metz'). The main reason is that both the number of inliers and the inlier ratio in the putative sets of these images are very small. In other words, even many high-performing methods are unable to select correct correspondences effectively in putative sets with low inlier ratios.
Second, in contrast, our GMS-guided algorithm handles putative sets with extremely low inlier ratios well. For example, our GMS-guided algorithm has selected all correct matches without any false matches on the image pair 'Archeasy', where the inlier ratio in the putative set is only 1.38% (see the first row of Fig. 3), and has also achieved almost perfect performance on the image pair 'Daynight' (with an inlier ratio of 6.15%). In addition, over all seven image pairs, which have an average inlier ratio of 5.39%, our GMS-guided approach has achieved an average Precision, Recall and F-measure of 92.96%, 92.47%, and 92.62% respectively, significantly exceeding those of the comparative approaches.
Last, among the methods compared, GMS and GMS+RANSAC have both achieved relatively good performance. But GMS has yielded relatively low Precision because it retains a fraction of false matches when removing the mismatches using the motion smoothness constraint (as shown in the third column of Fig. 3). The Precision of GMS+RANSAC is higher, but its Recall is lower, because this method limits false matches at the cost of eliminating many of the true matches at the GMS stage. In contrast, the excellent results of our GMS-guided approach indicate that it can better solve the problem of selecting correct correspondences effectively in putative sets with low inlier ratios.

2) ROBUSTNESS
To assess the robustness of the evaluated algorithms, Fig. 4 shows the trends of the Precision, Recall and F-measure curves of RANSAC [15], GMS [14], GMS+RANSAC, LPM [21], VFC [23], LMR [29] and our GMS-guided algorithm on the VGG [33] and Heinly [32] datasets. In the experiments, different numbers of ORB feature points (2,000, 4,000, 6,000, 8,000 and 10,000) are selected to get the corresponding numbers of putative correspondences. The average inlier ratio in putative sets is 34.28% on the VGG dataset and 33.62% on the Heinly dataset. From Fig. 4, we can see that except for the Recall on the VGG dataset, our GMS-guided approach has achieved the highest performance in terms of Precision, Recall, and F-measure on both datasets. The RANSAC algorithm produced satisfactory results because we set a sufficiently large number of iterations and the inlier ratios of the putative sets are relatively high in our testing. The performance of LPM and LMR is not outstanding because they are more suitable for putative sets with higher inlier ratios. Therefore, in our experiments, they have not shown any competitive advantages. VFC and GMS both have high Recall but low Precision. In contrast, our method has the best Precision-Recall trade-off.
Another thing worth noting is that the curves associated with Precision, Recall, and F-measure fluctuate slightly with the increase of the number of ORB features in Fig. 4. On the VGG dataset, the inlier ratio in the putative set reaches its maximum (35.37%) when the number of ORB features is 4,000, then drops to its minimum (33.2%) when the number of ORB features is 10,000. On the Heinly dataset, the inlier ratios in the putative sets change only slightly and tend to decrease as the number of ORB features increases.
It can be seen clearly that on both datasets, the Recall curve of GMS rises with the increasing number of ORB features, indicating that GMS needs more matches for statistics to select the true matches. On the Heinly dataset (in Fig. 4(b)), we can see that with the increase of the ORB feature number, all three curves of VFC and LPM rise at first, reach the highest values when the ORB feature number is 6000, and then drop. This shows that, for LPM and VFC, a larger number of matches or true matches will lead to good performance to some extent, and 6000 matches is an optimal value in our experiments.
On the VGG dataset (in Fig. 4(a)), the trend of the three curves of RANSAC is consistent with the trend of the initial inlier ratio when the number of ORB features increases, showing that its performance is affected more by the initial inlier ratio. If we only consider the influence of the number of ORB features, LMR is the most robust, and our method has the best performance among all methods.
To further test the robustness of our GMS-guided approach, some specific nuisances (e.g., zoom, rotation, blur, viewpoint change, light change, and JPEG compression) are considered. Table 2 compares the average results obtained with different algorithms in terms of Precision, Recall and F-measure against various nuisances. As can be seen from Table 2, our GMS-guided approach has obtained the highest Precision and F-measure under all nuisances except viewpoint change. Also, among all the tested algorithms, only our algorithm has achieved both high Precision and high Recall. LMR shows excellent performance under three nuisances, i.e., zoom, rotation, and viewpoint change. VFC always has a high Recall under various nuisances. All in all, the results in Table 2 and Fig. 4 show that our GMS-guided algorithm has the best robustness compared with the state-of-the-art methods.

3) GENERALITY OF DIFFERENT FEATURE DESCRIPTORS
To assess the generality of our approach with respect to feature descriptors, we evaluate the putative sets constructed from different feature descriptors such as SIFT [7], ORB [8] and SURF [36] on the VGG dataset. The average values of F-measure of the eight competitors are summarized in Table 3.
It should be noted that we add the LFGC for comparison in this table, that some of the data are cited from the literature [29], and that we select the best result (LMR-RF-10) from the LMR paper for comparison.
As shown in Table 3, the inlier ratios of the correspondence sets are 52.24%, 88.10%, and 57.61% respectively. Our GMS-guided algorithm can achieve the best performance with both SIFT and ORB feature descriptors and the second best performance with SURF feature descriptors. As a learning based approach, the LFGC method performs poorly because it is designed for large baseline image matching and aims to accurately recover the transformation matrix.
Thus, we can draw a conclusion that our GMS-guided approach does not rely on any specific feature descriptors, and can work well on putative sets with both high and low inlier ratios.

4) TIME EFFICIENCY
To give a complete picture, both the selection performance and the efficiency are taken into consideration. Fig. 5(a) and (b) present the efficiency vs. F-measure plots on the VGG and Heinly datasets respectively. Note that the run time does not include the time of initial matching.
As shown in Fig. 5, our GMS-guided algorithm has achieved the best F-measure performance and good computing speed. In fact, for the GMS series (i.e., the GMS, GMS-guided, and GMS+RANSAC algorithms), there is little difference in their speeds. So, our GMS-guided algorithm can achieve a good balance between the selection performance and efficiency. Besides, our GMS-guided algorithm only requires dozens of milliseconds for mismatch removal from thousands of putative matches. Therefore, our proposed GMS-guided algorithm outperforms the state of the art in both effectiveness and efficiency when dealing with correspondence sets of both high and low inlier ratios.

V. CONCLUSION
In this paper, we have proposed a GMS-guided correspondence selection strategy to handle the extreme outlier problem, which significantly boosts true matches without sacrificing accuracy. The comparative results on benchmark datasets for correspondence selection have demonstrated that our GMS-guided algorithm can effectively improve the correspondence selection performance while maintaining good computing speed. Therefore, it can be applied in real-time Visual Simultaneous Localization and Mapping (VSLAM) systems. The results on putative sets with low inlier ratios also indicate that our method is effective in this difficult case, which means that it can also be used for matching low-overlapping images in image stitching.