Image Feature Matching Via Parallax Mapping for Close Range Photogrammetric Application

Image feature matching is an important step in close range photogrammetric applications, and implementing a fast, accurate and robust feature-based image matching technique is a challenging task. To address the problems of traditional image feature matching algorithms, such as their low accuracy and long processing times, we present a parallax mapping-based matching (PMM) method that improves the computational efficiency, accuracy and robustness of feature-based image matching for close range photogrammetric applications. First, the disparity of the initial corresponding points is calculated, and the mismatches are initially removed via local disparity clustering. Then, the local coordinates of the matching points are constrained by an image division grid. To ensure the correctness of the matching points, the fast polynomial transform is used as a quadratic constraint. The method can extract high-accuracy matching points from the original low-accuracy coarse matching points while preserving the true matching points to the greatest extent. Using a variety of experimental datasets and the current mainstream feature extraction algorithms, we designed experiments comparing the proposed method with commonly used feature-based image matching algorithms. The experimental results show that the proposed method is simple but effective, meets real-time calculation requirements, and outperforms the current state-of-the-art methods.


I. INTRODUCTION
Image feature matching is one of the basic research directions in the fields of close range photogrammetry and computer vision. It has been widely used in camera calibration [1], 3D reconstruction [2], and pattern recognition [3]. The usual feature matching algorithm follows a two-step strategy [4], [5]. First, the initial corresponding points are calculated using feature similarity constraints, such as the fast approximate nearest neighbors (FANN) [6] and brute force (BF) methods. However, due to the inherent defects of the feature description algorithms, there are usually mismatches. Then, a secondary image matching or matching purification method is performed to reject the mismatches [7]. The main issue at this step is to remove as many false matches as possible while keeping the true matches.
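As a minimal sketch of this first step, the brute-force nearest-neighbor search combined with the second-nearest-neighbor distance ratio test can be written in a few lines of NumPy; the toy descriptors and the `ratio` value below are illustrative only, not part of the paper's method:

```python
import numpy as np

def brute_force_match(desc_a, desc_b, ratio=0.8):
    """Two-step putative matching: nearest neighbor by descriptor
    distance, then the ratio test to discard ambiguous matches."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        j, k = int(order[0]), int(order[1])   # best and second-best
        if dists[j] < ratio * dists[k]:       # ratio test
            matches.append((i, j))
    return matches

# toy descriptors: each row is a feature vector
a = np.array([[0.0, 0.0], [5.0, 5.0]])
b = np.array([[0.1, 0.0], [5.0, 5.1], [10.0, 10.0]])
print(brute_force_match(a, b))   # -> [(0, 0), (1, 1)]
```

Mismatches that survive this test are exactly the inputs that the secondary purification step discussed below must remove.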
The secondary image matching method mainly uses the corresponding points model to solve the problem by optimizing the model parameters and applying point constraints [8]. Therefore, the matching algorithm can be developed in two directions [9]: one is the establishment of the corresponding points model, and the other is the optimization of the algorithm that estimates the model parameters. (The associate editor coordinating the review of this manuscript and approving it for publication was Paolo Napoletano.) Commonly used constraint models mainly include coplanar geometric homography matrix models, epipolar geometric fundamental matrix models, linear transformation models, affine transformation models, and projection transformation models [10], [11]. Commonly used model parameter solution algorithms include the M-estimation algorithm [12], the least median of squares (LMedS) algorithm [13], and the random sample consensus (RANSAC) algorithm [14].
Corresponding points models are mostly implemented using a priori constraints between corresponding image points. The constraint model can be divided into two categories [7], [8]: the strict geometric constraint model (parametric model) and the probabilistic constraint model (nonparametric model). Strict geometric models implement strict geometric relationship constraints between corresponding points, such as polar line geometric constraints and coplanar geometric constraints [15]. Probabilistic constraint models implement local probability constraints between corresponding points, such as motion consistency constraints and local composition consistency constraints [16].
Although the strict geometric model constraints can strictly perform the purification operation between initial corresponding points, they require solving high-precision model parameters, and the model parameters often have a certain locality and cannot be well adapted to matching points in complex scenes [9]. When using the homography matrix or the fundamental matrix iterative solution to match points, due to the existence of image distortion and the inconsistency of the scene depth, the solved homography matrix or fundamental matrix cannot satisfy the threshold constraint for all correctly matched points. Thus, the calculation time is too long, or correctly matched points are lost due to the parameter solution problem, as shown in Ref. [14]. The probabilistic constraint model uses a statistical method to match points. Generally, the statistical rules of the local matching points are used to constrain each other, and rich matching points can be obtained. However, the constraint model has too many parameters to solve, which can consume too much calculation time, as shown in Ref. [17]. Algorithms for estimating model parameters can generally be divided into two categories: robust regression estimation and random parameter estimation [18]. Robust regression estimation iteratively eliminates mismatches to obtain robust model parameters, but it only applies to cases with high original accuracy rates, such as M-estimation [12]. Random parameter estimation uses a small number of randomly selected point sets to estimate the model parameters by continuously trying to obtain the optimal solution. This method can be applied to cases where the original accuracy rate is low, such as RANSAC [14].
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Among them, the RANSAC algorithm can robustly find the inliers in a point set that has a large number of outliers and has been widely used. Several improved algorithms have been derived, such as preemptive RANSAC (P-RANSAC) [19], randomized RANSAC (R-RANSAC) [20], and USAC [21]. However, the RANSAC algorithm needs to select a suitable initial set of inliers, and the matching result can easily fall into a local optimal solution, resulting in the omission of correctly matching points. Furthermore, when the mismatch rate increases, the number of iterations also increases, resulting in a sharp increase in the computation time [9].
In recent years, some new probabilistic constraint models or nonparametric models have also been developed. The identifying point correspondences by correspondence function (ICF) [22] algorithm converts the mismatch removal problem into a correspondence function pair that maps the points in one image to their corresponding points in the other image. The vector field consensus (VFC) [23], [24] algorithm rejects mismatches by learning a smooth field to fit the potential inliers, and it estimates a consensus inlier set. Similar to the VFC, the L2-minimizing estimate-based method (L2E) [25] also uses a nonparametric spatial transformation to deal with outliers. Ref. [26] describes a dual decomposition optimization technique for graph matching, in which the matching task is formulated as a complex cost function to remove mismatches. Ref. [27] scores candidate matches using the most promising neighbors, and uses them to update the neighbors for graph matching. Progressive graph matching (PGM) [28] alternately performs graph probabilistic progression and graph matching steps. Ref. [29] formulates image matching as a clustering problem, and presents a linkage model and dissimilarity metric using hierarchical agglomerative clustering (ACC). Ref. [16] proposes an algorithm called Locality Preserving Matching (LPM) that uses the difference between the local neighborhood structures of inliers and outliers to identify mismatches among the given putative image feature correspondences. Ref. [30] casts feature matching as a spatial clustering problem with outliers, customizes the classic density-based spatial clustering of applications with noise (DBSCAN) method to the context of feature matching, and proposes a robust feature matching algorithm based on spatial clustering, which is called RFM-SCAN. Ref. [31] proposes a robust point matching approach based on manifold regularization, which is called robust point matching using manifold regularization (MR-RPM).
The virtual line descriptor and semilocal matching (K-VLD) [32] method uses the local composition consistency of matching points, defines the virtual line descriptor, and designs a semiglobal matching strategy to match and purify points on the basis of an initial matching. The method obtains rich matching points and optimizes the accuracy and sufficiency, but its computational time is expensive. The coherence based decision boundaries (CODE) [17] algorithm uses motion consistency theory, combines it with the characteristics of affine-invariant scale-invariant feature transform (ASIFT) [33] descriptors, and solves the motion consistency constraint function to optimize the matching points based on the initial matching. It can obtain richer matching points than traditional methods; however, only ASIFT feature descriptors are considered, and the computation time is too long. The grid-based motion statistics (GMS) [34] algorithm uses the local consistency of matching points and simple grid statistics for matching, can obtain rich matching points, is greatly optimized in terms of its efficiency, and outperforms other real-time matching algorithms. As mentioned earlier, these methods do not use parametric geometric models as strict constraints; thus, relatively low-precision noisy matches may be difficult to separate from high-precision correct matches.
Although some matching methods work well, each has its own advantages and application scope, and it is difficult to achieve robustness, accuracy and real-time performance simultaneously. To solve these problems, this paper utilizes motion consistency theory, parallax mapping and image calibration theory, and presents a parallax mapping-based matching (PMM) method for close range photogrammetric applications in the rigid case. The parallax between the initial corresponding points is calculated, and the local motion consistency is reflected by the cumulative connected domain calculation between the parallax coordinates. Then, the local coordinates of the matching points are constrained by an image division grid. To ensure the correctness of the matching points, the fast polynomial transform is used as the quadratic constraint. Our method is effective and robust, meets the requirements of real-time calculation, and outperforms the state-of-the-art methods in terms of accuracy. The process is shown in Fig. 1.
In summary, our contributions are as follows:
• The local motion consistency is transformed into two consistency discriminant criteria. We show that these constraints can robustly achieve feature matching;
• An effective image feature matching method based on parallax mapping is designed, which can be used in real-time feature matching applications;
• A complete experiment is designed, and the performances of the current mainstream feature extraction and matching algorithms are compared and analyzed;
• PMM is demonstrated to be significantly more accurate than RATIO (second nearest neighbor distance ratio test) [35], RANSAC [14], K-VLD [32] and the recently proposed GMS [34], LPM [16], and RFM-SCAN [30] methods.
The structure of the rest of this paper is as follows: a detailed description of the proposed method is given in Section II, the details of our experiments are given in Section III, the discussion of the results is given in Section IV, and a summary and our expected future work are given in Section V.

II. METHOD
In this section, the PMM feature matching method is introduced. This method takes close range photogrammetric images (the rigid case) as the main matching objective. First, we construct the initial feature matching set, which is obtained using the BF or FANN algorithm. Then, we consider the local motion consistency, calculate the parallax map, and perform local disparity clustering, which effectively removes the outliers from a large number of initial matches. Next, the image is divided into grids, and the local coordinates of the matching points are constrained by the image division grid. Finally, the polynomial constraint model is applied to complete the image feature matching.

A. MOTION CONSISTENCY THEORY
The theory of motion consistency refers to the consistency of the local-pixel motion of an image and describes the basic law of the image content changes in a scene [36]. The image motion field characterizes the variation in the local-pixel motion of an image. The motion-smoothing function describes the consistency of the local-motion field, and it is introduced below.

1) MOTION FIELD
At a certain moment, a point p_0 on the surface of an object corresponds to a point p_i on an image, and the geometric relationship between the two points is determined by the projection equation. For pinhole camera photography, the photographic light follows the perspective projection equation; that is, the object point, the image point and the projection center lie on the same ray of photographic light. Suppose the object point p_0 moves at a velocity v_0 relative to the camera. The motion of object point p_0 causes the corresponding image point p_i to move at a velocity v_i on the image plane. Suppose that during the time period δt, the point on the surface of the object moves by δr_0 = v_0 δt; then, the point p_i on the corresponding image moves by δr_i = v_i δt. Therefore, the velocities of the points are defined as
v_0 = dr_0/dt, v_i = dr_i/dt (1)
where the relationship between r_0 and r_i is determined by the perspective projection equation
r_i = (f/z) r_0 (2)
where f is the focal length and z is the depth of the object point. Differentiating Eq. (2) with respect to t, we obtain
v_i = (f/z) v_0 − (f v_z/z^2) r_0 (3)
where v_z is the component of v_0 along the optical axis. Thus, through Eq. (3), a velocity vector can be assigned to each image point p_i, and all of these vectors constitute the motion field.
Locally adjacent points on an object have similar velocities, and so the following inferences can be obtained:
Inference 1. In most areas of an image, the motion field obtained by Eq. (3) is continuous;
Inference 2. Exceptions occur only in the image areas corresponding to the contours of objects; that is, the motion field is discontinuous in these areas.

2) MOTION FIELD SMOOTHING FUNCTION
The motion field in most areas of an image is continuous, and so every point in the motion field {p_i, v_i} can be described using a function f(p). The function f(p) can be linearly combined from a series of smoothing functions {f_k(p)}, k = 1, 2, ..., K [17], [37], [38]:
v_i = f(p_i) = Σ_{k=1}^{K} a_k(p_i) f_k(p_i) (4)
where a_k(p) is the corresponding weight function. The observation data v̂_i corresponding to the image point p_i contain noise, and so Eq. (4) can be corrected to
v̂_i = f(p_i) + n_i (5)
where n_i is the noise. According to the research in Ref. [28], the series smoothing function f_k(·) can be composed of the following two items:
f_k(p) = H_k + φ_k(p) (6)
where H_k is the offset and φ_k(p) is the smoothing function under the motion consistency constraint. For the entire motion field function f(p), the following energy minimization condition should be met:
min Σ_i ρ(v̂_i − f(p_i)) + λ Σ_k ||φ_k||^2 (7)
where λ is the corresponding weight value, ρ(·) is the Huber function, and ||φ_k||^2 is the motion consistency constraint imposed on the function φ_k(·) to suppress the noise n in the observation data v̂:
||φ_k||^2 = ∫ |φ̂_k(ω)|^2 / ĝ(ω) dω (8)
where φ̂_k(·) is the Fourier transform of the function φ_k(·), and ĝ(·) is the Fourier transform of the Gaussian function with standard deviation γ. The motion field smoothing function describes the consistency of the local-pixel motion of the image. Therefore, the motion field smoothing function can be used to apply motion constraints to the image points: part of the image points are used to solve the motion field smoothing function, and then the solved function is used to constrain all of the image points so that the mismatches can be rejected.

B. OUR APPROACH
The motion field smoothing function mentioned earlier is complex; if we directly use it to perform point matching, the cost of solving the model parameters is high. Therefore, further consideration needs to be given to how to simplify the overall solution process and obtain a practical solution. In this section, the parallax mapping method is used to skip the model-solving step directly and refine the matching points.

1) CORRESPONDING POINT PARALLAX
Consider a simplified stereo camera imaging model and assume that the optical axes of the two cameras are parallel to each other and perpendicular to the photographic baseline b, with the origin of the coordinate system at the midpoint of the baseline. For an object point whose coordinates are (x, y, z)^T in the scene, the corresponding image points are (x_l, y_l) and (x_r, y_r), respectively. Then, the following can be obtained from the perspective projection equation:
x_l = f(x + b/2)/z, y_l = f y/z (10)
and
x_r = f(x − b/2)/z, y_r = f y/z (11)
where f is the focal length. The following can be obtained by combining Eq. (10) and Eq. (11):
z = f b/(x_l − x_r), x = b(x_l + x_r)/(2(x_l − x_r)), y = b(y_l + y_r)/(2(x_l − x_r)) (12)
The transformation relation between the object point coordinates and the image point coordinates in the scene is described in Eq. (12), where the shift Δx = x_l − x_r is called the parallax. More generally, for ordinary stereo images, the coordinate difference Δx = x_l − x_r between two image points is the left-right parallax and Δy = y_l − y_r is the up-down parallax. The image parallax is related to the object coordinates when the photographic baseline b and the photographic focal length f remain unchanged: the smaller the parallax is, the farther the object points are from the camera; and the larger the parallax is, the closer the object points are to the camera. The change in the parallax reflects the change in the object points and the change in the scene motion field. Therefore, the parallax of the image can be used to represent the overall motion field of the scene.
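Under the assumptions above (parallel optical axes, baseline b, focal length f), the inverse relation between depth and parallax can be illustrated with a small sketch; the numeric values of f and b below are illustrative only:

```python
# With baseline b and focal length f fixed, depth is inversely
# proportional to the left-right parallax dx = x_l - x_r (Eq. (12)).
f, b = 1000.0, 0.5   # focal length (px) and baseline (m), illustrative values

def depth_from_parallax(x_l, x_r):
    dx = x_l - x_r   # left-right parallax in pixels
    return f * b / dx

print(depth_from_parallax(520.0, 480.0))   # dx = 40  -> z = 12.5 (far point)
print(depth_from_parallax(600.0, 400.0))   # dx = 200 -> z = 2.5  (near point)
```

The small-parallax point is indeed the farther one, matching the qualitative statement above.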

2) PARALLAX MAPPING-BASED REJECTING MISMATCHES
The essence of matching purification between images is to use known constraints to eliminate mismatches. The parallax (Δx, Δy) between the corresponding points (p_l, p_r) reflects the motion of the object relative to the camera and characterizes the overall motion field v of the image. There is a consistency constraint between the motion fields of the image, as described by the motion-smoothing function f(p). Therefore, the motion-smoothing function f(p) can be used to refine the matching points. However, the motion-smoothing function is complex and difficult to solve. Ref. [17] simplifies the motion-smoothing function and designs an algorithm to solve it, but the time consumption cannot be ignored. The purpose of solving the motion-smoothing function is to constrain the matching points and achieve matching purification; that is, the purified matching points realize the optimal solution of the motion-smoothing function. Therefore, we can find the matching points that satisfy the conditions of the motion-smoothing function, skip the function-solving step, and directly purify the matching points. Finding sufficient conditions or approximate equivalence conditions for the solution of the motion-smoothing function achieves the matching purification of the matching points. Combining the motion-smoothing function f(p) and the image parallax, the motion consistency constraint of the local pixels in the image can be converted into the following two conditions.
Condition 1. The parallax of the matching points should have a certain continuity.
For two adjacent correct matching points (x_1, y_1) and (x_2, y_2), the corresponding matching coordinates are (x'_1, y'_1) and (x'_2, y'_2), and the corresponding parallax coordinates are (Δx_1, Δy_1) and (Δx_2, Δy_2). Condition 1 can be expressed as follows:
|Δx_1 − Δx_2| ≤ ε_1, |Δy_1 − Δy_2| ≤ ε_1 (13)
This condition characterizes the continuity between the motion fields and describes the consistency of the motion between images. It is reflected in the corresponding parallax map having a certain continuity or local-clustering characteristics. The parallax map of the image is formed by mapping the left-right parallax values of the matching points to the abscissae and the up-down parallax values to the ordinates.
Condition 2. The coordinates of the matching points should have a certain continuity.
For the above two adjacent correct matching points, Condition 2 can be expressed as follows:
|x'_1 − x'_2| ≤ ε_2, |y'_1 − y'_2| ≤ ε_2 (14)
This condition characterizes the continuity between the object points, confining the image points within the local area. It is expressed as the corresponding pixel coordinates having a certain continuity or local clustering; that is, there should be no abnormal outliers.
In Fig. 2, the red dots and the green dots represent feature points. For the object points with local consistency, the image point coordinates have a certain continuity, and the corresponding parallax map also has a certain continuity.
The above two conditions approximate the motion consistency constraint and provide approximate solution conditions for the motion-smoothing function. By using these two conditions as constraints, the purification of the matching points can be achieved. Next, we focus on how to integrate these two conditions into the image matching process.
The existing feature extraction and description operators can obtain rich original matching points, and most of these points have good coordinate continuity. Therefore, it is possible to apply the constraint from Condition 1 first and then the constraint from Condition 2. To impose the motion consistency constraint on these matching points, the continuity of the parallax values is judged first, and then the continuity of the point coordinates is judged, as follows.
1) After the initial matching, calculate the parallax values of the initial matching points of one image with respect to the other image by using one image as a reference.
2) Map the parallax values onto the parallax map, accumulating the number of matching points that fall on each parallax coordinate.
3) Perform connected domain analysis on the parallax map and take the matching points belonging to the connected domains whose accumulated numbers are greater than the set threshold α as the rough purification result.
4) Determine whether the coordinates of the matching points in the selected cumulative connected domains have a certain continuity and eliminate the outliers to obtain the final purification result.
In the above steps, the connected domain analysis in step 3) characterizes the continuity of the parallax values and satisfies Condition 1. The connected domain statistics consider the gray values of the parallax map, where the gray value is the cumulative number of mapped parallax points. In step 4), the judgment of the coordinate continuity of the matching points satisfies Condition 2; since the values of the matching points are discrete, the continuity judgment may in practice be performed according to whether a point is a clustering point. The nonclustering points are eliminated as outliers to complete the constraint purification of the matching points. The overall process is shown in Fig. 3.
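The rough purification via parallax accumulation can be sketched as follows; for brevity, the connected-domain analysis is approximated here by accumulating parallaxes into coarse cells and thresholding the cell counts, and the `cell` size and threshold `alpha` are illustrative parameters, not the paper's values:

```python
import numpy as np

def parallax_filter(pts_l, pts_r, cell=4, alpha=3):
    """Rough purification sketch (Condition 1): map each match to its
    parallax (dx, dy), accumulate the parallaxes on a coarse grid, and
    keep only the matches whose parallax cell gathers >= alpha votes."""
    d = pts_l - pts_r                       # parallax of every match
    bins = np.floor(d / cell).astype(int)   # quantized parallax cells
    keys = [tuple(b) for b in bins]
    counts = {}
    for k in keys:
        counts[k] = counts.get(k, 0) + 1
    return np.array([counts[k] >= alpha for k in keys])

# five consistent matches with parallax ~(10, 0) and one gross outlier
pts_r = np.array([[10., 10.], [12., 11.], [14., 9.], [11., 12.], [13., 10.], [50., 50.]])
pts_l = pts_r + np.array([10., 0.])
pts_l[-1] = [0., 0.]                        # outlier: parallax (-50, -50)
print(parallax_filter(pts_l, pts_r))        # the outlier is rejected
```

The isolated parallax cell of the outlier never reaches the vote threshold, so only the consistent matches survive.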

3) OPTIMIZATION OF THE MISMATCHES REJECTION VIA GENERALIZED POLYNOMIAL MAPPING
Parallax mapping can be used to achieve fast feature matching. To achieve higher accuracy, further geometric constraints can be used to perform secondary matching purification between the image points. Commonly used geometric constraints include the translational transformation, the affine transformation, the perspective transformation, epipolar constraints, and homography constraints. The essence of geometric constraints is to establish a suitable geometric model to constrain the matching points. However, the existing geometric constraint relationships are mostly designed for a particular situation and cannot be adapted to a wider range of situations. For example, in the case of epipolar constraints, an epipolar model between images is needed, but there is no strict epipolar constraint relationship between the images acquired by a line sensor. The homography constraint is used to constrain the image points in the same plane, but it is not applicable to cases where the scene depth change is complicated. Therefore, this section attempts to design a more generalized polynomial mapping model to achieve matching for multiple types of images.
Consider the motion consistency constraint. For the corresponding points (x_l, y_l) and (x_r, y_r), there is a locally continuous parallax (Δx, Δy), resulting in
x_l = x_r + Δx, y_l = y_r + Δy (15)
According to Weierstrass's first approximation theorem [39], any continuous function on a closed interval can be approximated by a polynomial. A polynomial approximation can therefore be used to describe the relationship between a local point and its parallax. Then, we can describe the locally continuous parallax (Δx, Δy) as a cubic function of the image point (x_r, y_r):
Δx = a_0 + a_1 x_r + a_2 y_r + a_3 x_r y_r + a_4 x_r^2 + a_5 y_r^2 + a_6 x_r^2 y_r + a_7 x_r y_r^2 + a_8 x_r^3 + a_9 y_r^3
Δy = b_0 + b_1 x_r + b_2 y_r + b_3 x_r y_r + b_4 x_r^2 + b_5 y_r^2 + b_6 x_r^2 y_r + b_7 x_r y_r^2 + b_8 x_r^3 + b_9 y_r^3 (16)
Substituting Eq. (16) into Eq. (15), the constraint relationship between the local corresponding points (x_l, y_l) and (x_r, y_r) can be obtained. Thus, the generalized geometric constraint model can be defined by motion consistency theory as follows:
x_l = ā_0 + ā_1 x_r + ā_2 y_r + ā_3 x_r y_r + ā_4 x_r^2 + ā_5 y_r^2 + ā_6 x_r^2 y_r + ā_7 x_r y_r^2 + ā_8 x_r^3 + ā_9 y_r^3
y_l = b̄_0 + b̄_1 x_r + b̄_2 y_r + b̄_3 x_r y_r + b̄_4 x_r^2 + b̄_5 y_r^2 + b̄_6 x_r^2 y_r + b̄_7 x_r y_r^2 + b̄_8 x_r^3 + b̄_9 y_r^3 (17)
Eq. (17) defines the motion consistency relationship between local corresponding points. The essence of this relationship is to simplify the motion-smoothing function and describe it as a relatively simple local polynomial, which allows for a more stringent purification optimization of the matching points. Since the matching points have already been preliminarily purified, determining the polynomial optimization solution is fast and easy. The overall process is shown in Fig. 4.
Therefore, the overall PMM process is as follows. First, the initial feature matching is performed for the input image pairs. Then, the parallax map is calculated, and a fast connected domain analysis is performed using the two-pass algorithm. Only the matching points corresponding to the connected domains whose accumulated numbers are greater than the selected threshold are retained, which implements the Condition 1 constraint.
Then, the image is divided into grids, and the image points falling in each grid are counted. If the accumulated number in a grid is greater than the threshold, the image points in that grid are considered to have clustering characteristics, which realizes the Condition 2 constraint. Finally, using the polynomial constraint model, the matching points in each connected domain are constrained, the error matrix is computed, the polynomial parameters are solved via singular value decomposition, and the fast purification of the points in each connected domain is achieved, thereby completing the feature matching. The overall process is shown in Fig. 5.
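A minimal sketch of the polynomial constraint step follows, assuming a least-squares fit of the cubic model of Eq. (17) (NumPy's `lstsq` solves it via SVD); matches with large residuals would then be rejected against a threshold. The grid of points and the planted mismatch are illustrative:

```python
import numpy as np

def cubic_design(pts):
    # monomial basis of Eq. (17): 1, x, y, xy, x^2, y^2, x^2*y, x*y^2, x^3, y^3
    x, y = pts[:, 0], pts[:, 1]
    return np.stack([np.ones_like(x), x, y, x * y, x**2, y**2,
                     x**2 * y, x * y**2, x**3, y**3], axis=1)

def polynomial_residuals(pts_l, pts_r):
    """Fit the cubic mapping (x_r, y_r) -> (x_l, y_l) by least squares
    (np.linalg.lstsq uses SVD internally) and return per-match residuals."""
    A = cubic_design(pts_r)
    coef, *_ = np.linalg.lstsq(A, pts_l, rcond=None)
    return np.linalg.norm(A @ coef - pts_l, axis=1)

# 14x14 grid of right-image points mapped by a pure translation,
# with one planted mismatch at index 0
g = np.arange(0.0, 140.0, 10.0)
pts_r = np.array([(x, y) for x in g for y in g])
pts_l = pts_r + np.array([5.0, -2.0])
pts_l[0] += 30.0
r = polynomial_residuals(pts_l, pts_r)
print(int(np.argmax(r)))   # index of the planted mismatch
```

In practice this fit would be applied per connected domain and iterated, re-fitting after each rejection, as described above.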

C. IMPORTANT DESIGN CONSIDERATIONS AND PROPERTIES
Parallax map clustering. In practical applications, a large number of initial matching points should be obtained so that relatively continuous local feature points or parallax maps can be formed. However, when the extracted matching points are relatively discrete, a good mapping result may not be formed on the parallax map. Therefore, in this case, the parallax mapping can be performed by means of region expansion. For example, if the point (x, y) has the parallax mapping coordinates (Δx, Δy), normally only the point (Δx, Δy) in the parallax map is accumulated during the mapping; but to extend the parallax continuity, the neighborhood around the point (Δx, Δy) can also be accumulated, as shown in Fig. 6. This kind of regional extension mapping, which is similar to the dilation operation in image morphology, can better extend the local continuity of the parallax map. Similarly, when the Condition 2 constraint is imposed and the extracted feature point coordinates are not continuous, the point clustering method can be used to eliminate the outliers and improve the overall operational robustness, as shown in Fig. 7.
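The region-expansion mapping can be sketched as follows, assuming each parallax votes for a small square neighborhood of cells (the map size and expansion radius are illustrative):

```python
import numpy as np

def accumulate_with_expansion(parallaxes, size=64, radius=1):
    """Region-expansion mapping sketch: each parallax votes not only for
    its own cell in the parallax map but also for the cells in a small
    neighborhood, analogous to a morphological dilation."""
    pmap = np.zeros((size, size), dtype=int)
    for dx, dy in parallaxes:
        for u in range(dx - radius, dx + radius + 1):
            for v in range(dy - radius, dy + radius + 1):
                if 0 <= u < size and 0 <= v < size:
                    pmap[u, v] += 1
    return pmap

pmap = accumulate_with_expansion([(10, 10), (12, 10)], radius=1)
print(pmap[11, 10])   # -> 2: the gap between the two votes is bridged
```

Without the expansion, the two votes would occupy disjoint cells and could be split into separate connected domains.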
Quadratic purification via local generalized polynomial constraints. Matching based on parallax mapping already performs well and can meet the requirements of most applications. However, in applications where the accuracy requirements are more stringent, such as camera calibration, polynomial quadratic optimization can be performed to improve the robustness of the matching results. The local generalized polynomial is essentially a simplification of the motion-smoothing function in a local image region. In practice, generalized polynomial constraints can be applied to the different connected domains in the parallax map to reflect the locality. In addition, because the matching and purification operations have already been performed via Condition 1 and Condition 2, respectively, the remaining matching points have a high accuracy rate; the generalized polynomial solution process is therefore very fast and consumes relatively few computing resources. In this paper, a cubic polynomial is used as the polynomial function: when the degree of the polynomial is too high, the Runge phenomenon easily occurs; and when the degree is too low, the polynomial cannot adequately describe the constraints between the matching points.

D. IMPLEMENTATION DETAILS
Condition 1 constraint implementation details. To achieve the Condition 1 constraint and to speed up the connectivity of the parallax map after the mapping, we use the two-pass method [40] to conduct fast analysis of the connected domain.
Condition 2 constraint implementation details. When the Condition 2 constraint is implemented, we use the fast coordinate-grid judgment method to determine whether the points are continuous due to the discreteness of the point coordinates. The local consistency judgment is performed on the image points after the Condition 1 constraint is implemented. Because of the discreteness of the matching points, the image is divided into a grid. If the number of image points falling within the grid is greater than the threshold, then the local image points falling within the grid are considered to have continuity; otherwise, the rejection is performed.
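A minimal sketch of the coordinate-grid judgment follows, assuming a uniform n x n grid over the image and an illustrative count threshold:

```python
import numpy as np

def grid_filter(pts, img_w, img_h, n=4, min_count=3):
    """Condition 2 sketch: divide the image into an n x n grid and keep
    only the points whose cell contains at least min_count matches;
    lone points are rejected as outliers."""
    cols = np.minimum((pts[:, 0] / img_w * n).astype(int), n - 1)
    rows = np.minimum((pts[:, 1] / img_h * n).astype(int), n - 1)
    counts = np.zeros((n, n), dtype=int)
    for r, c in zip(rows, cols):
        counts[r, c] += 1
    return counts[rows, cols] >= min_count

pts = np.array([[10., 10.], [12., 15.], [11., 9.], [300., 300.]])
print(grid_filter(pts, 400, 400))   # the isolated point is rejected
```

The grid size and threshold trade off rejection strength against the risk of discarding sparse but correct matches.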
Polynomial constraint implementation details. When the polynomial fitting constraint is applied, correctly matched points constitute the majority of the input. Here, the singular value decomposition method is used for rapid iterative optimization. The specific algorithm comes from our previous research [9].

III. EXPERIMENTAL RESULTS
Image matching is mainly used to analyze the correspondence between image pairs in the same scene. Image matching has three main functions [41] in close range photogrammetric applications: similarity measurement, which is mainly used for image retrieval, target retrieval, target recognition and geolocation; geometry estimation, which is mainly used for camera calibration and other aspects; and data association, which is mainly used for 3D reconstruction, simultaneous localization and mapping (SLAM), image stitching, and visual tracking. Based on these functions, to discover the practical application requirements of the image matching algorithm, we used a variety of datasets and multiple indicators for the evaluation and analysis.

A. DATASETS
Four datasets were used for the evaluation, including ordered video images and unordered images. Ordered video images include ordered indoor video images and ordered outdoor video images; and the unordered images include unordered single-source images and Internet images. The ordered indoor video images, ordered outdoor video images, and unordered single-source image data are provided in Ref. [41], and the unordered Internet image data are downloaded from Flickr. The specific dataset information is shown in Table 1 and Fig. 8.
Among them, the Office data are taken from the TUM dataset [42], which are ordered indoor video image data; the Street data are taken from the KITTI benchmark [43], which are ordered outdoor video image data; the Castle data are taken from the Strecha dataset [44], which are unordered single-source image data; and the Church data are downloaded from Flickr, with complex and unordered sources. In the image matching experiments, the image pairs were constructed using the method from Ref. [41]. For the ordered Office data and Street data, the original video sequences were divided into n nonoverlapping fragments, and each fragment contained k frames. The first image of each fragment is set as the reference image, which is matched to the next k − 1 frames. The Office data original video frame rate is 30 fps, and k is set to 15; the Street data original video frame rate is 10 fps, and k is set to 5. For the unordered Castle data and Church data, a pairwise image traversal matching method is used to construct the image pairs for the matching experiments. These four datasets can effectively cover multiple close range photogrammetric image matching applications, and the experiment can be used to comprehensively characterize the performance of the image matching algorithms in practical applications.
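The fragment-based pairing of the ordered sequences can be sketched as a few lines of Python; the 10-frame example below is illustrative:

```python
def build_pairs(num_frames, k):
    """Split an ordered sequence into non-overlapping fragments of k
    frames and match each fragment's first frame (the reference) to
    its remaining k - 1 frames."""
    pairs = []
    for start in range(0, num_frames - k + 1, k):
        pairs += [(start, start + j) for j in range(1, k)]
    return pairs

# e.g. a 10-frame sequence with k = 5 yields two fragments
print(build_pairs(10, 5))
# -> [(0, 1), (0, 2), (0, 3), (0, 4), (5, 6), (5, 7), (5, 8), (5, 9)]
```

With the paper's settings, each Office fragment contributes 14 pairs (k = 15) and each Street fragment contributes 4 pairs (k = 5).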

B. EVALUATION METRICS
Regarding the evaluation metrics, considering the accuracy, sufficiency, and efficiency requirements of the algorithm, the matching algorithm used in the overall image matching process is evaluated by three metrics: the correct rate, the loss rate and the time consumption.
The correct rate is the percentage of correctly matched points out of all final matching points. The loss rate is the percentage of originally correct matching points that are ultimately lost. The time consumption is the total time required for the overall matching process. By these definitions, the higher the correct rate, the lower the loss rate, and the shorter the time consumption, the better the performance of the algorithm. The key to evaluating performance is being able to distinguish the correctly matched points. Since the number of matching points in the experiments is large, traditional manual identification is impractical. Here, a triangulated back-projection error constraint is used to determine whether the matching points are correct.
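The two accuracy metrics can be computed directly from their definitions. The sketch below is our own illustration (the match representation and the `is_correct` predicate, which would in practice be the back-projection error test, are assumptions):

```python
def correct_rate(final_matches, is_correct):
    """Percentage of correctly matched points among all final matching points."""
    if not final_matches:
        return 0.0
    return 100.0 * sum(is_correct(m) for m in final_matches) / len(final_matches)

def loss_rate(initial_correct, final_matches):
    """Percentage of originally correct matches lost by the purification step."""
    if not initial_correct:
        return 0.0
    final_set = set(final_matches)
    kept = sum(1 for m in initial_correct if m in final_set)
    return 100.0 * (1.0 - kept / len(initial_correct))
```

For example, if 4 initial matches are correct and the purified result keeps 2 of them plus 1 mismatch, the correct rate is 66.7% and the loss rate is 50%.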
Given the camera intrinsic parameters, rotation vector and translation vector of the two images, the matching points (x_l, y_l) and (x_r, y_r) can be triangulated to calculate the object point coordinates (x̄, ȳ, z̄). The calculated object point is then back-projected onto each image to obtain the back-projected image points (x̄_l, ȳ_l) and (x̄_r, ȳ_r). Next, the differences between the back-projected and original image point coordinates are calculated: δx_l = x̄_l − x_l, δy_l = ȳ_l − y_l, and δx_r = x̄_r − x_r, δy_r = ȳ_r − y_r. Thus, the triangulated back-projection error of a pair of matching points is

δr = max( √((δx_l)² + (δy_l)²), √((δx_r)² + (δy_r)²) )    (18)

Under ideal conditions, the back-projection error of the matching points should be zero, but due to matching errors, parameter errors and calculation errors, the real back-projection error is merely small. Therefore, it is possible to determine whether the matching points are correct by setting different triangulated back-projection error thresholds ε. In the experiments, the Office, Street, and Castle datasets have ground-truth camera intrinsic parameters, rotation vectors, and translation vectors that can be used directly. The Church dataset consists of Internet images, and its camera intrinsic parameters, rotation vectors and translation vectors were obtained by COLMAP [45], [46] bundle adjustment.
In the experiments, RATIO, GMS, K-VLD, RANSAC, LPM, RFM-SCAN and our algorithm were used to purify the initial matching points. Our algorithm is divided into four variants: PMM1, in which only the Condition 1 consistency constraint is used for matching; PMM2, in which all consistency constraint conditions are used for matching; PMM3, in which the Condition 1 consistency constraint and the polynomial optimization are used to reject mismatches; and PMM4, in which all consistency constraint conditions and the polynomial optimization are used to reject mismatches.
We set the cumulative number threshold in the parallax map connected domain to α = 10. Points are considered continuous clustering points if the number of local clusters is greater than 2. The polynomial error threshold is set to 3, the triangulated back-projection error threshold is set to ε = 3, and the RATIO threshold is set to 0.8. RANSAC uses the fundamental matrix model with an error threshold of 3. GMS, K-VLD, LPM and RFM-SCAN use their default parameters. For all feature extraction and matching algorithms, the triangulated back-projection error calculations are implemented using OpenCV functions with their default thresholds. For the ORB feature extraction parameters, the maximum number of detected features is set to 100,000, effectively removing this limit to ensure a fair comparison. The experimental platform is configured with an Intel i7 CPU and 16 GB of memory, and our algorithm is implemented in C++ and Matlab. Due to the memory limitations of the experimental platform, the K-VLD algorithm is not used in the experiments on the fourth dataset when the number of original matching points exceeds 20,000.

D. EXPERIMENTAL RESULTS
The experiments were performed using the four datasets, and the results are shown in Figs. 9 and 10 and Tables 3 through 6. Fig. 9 shows the matching results of the 10 feature matching algorithms combined with the 15 feature algorithms on the four datasets. To facilitate visual statistical analysis, the matching results of all image pairs constructed in each dataset are averaged and then visualized. Figs. 9(a) through 9(d) show the experimental results for datasets 1 through 4, respectively. In each panel of Fig. 9, the left figure shows the average number of final matching points, the middle figure shows the average correct rate of the final matching points, and the right figure shows the average loss rate of the final matching points; in each case, the horizontal axis lists the 15 feature algorithms. Each value in Fig. 9 is the average over all constructed image pairs in the corresponding dataset for the given combination of feature matching method and feature algorithm. Fig. 10 shows the visual results on the four datasets using PMM1 with the SIFT feature algorithm. Tables 3 through 6 report the time consumption statistics of the GMS, K-VLD, RANSAC, LPM, RFM-SCAN and PMM feature matching algorithms with the different feature algorithms on the four datasets.
The first column of each table lists the feature algorithms, the second column gives the average number of initial matching points over all image pairs in the corresponding dataset, and the third column gives the average initial correct rate over those image pairs. The fourth through tenth columns give the average time consumption of each feature matching algorithm.

IV. DISCUSSION
In Fig. 9 and Tables 3 through 6, the results are averages over all image pairs constructed for each dataset, which increases the robustness of the analysis and gives the experimental results more practical reference value. Fig. 9 shows that, across the different datasets, the feature matching algorithms exhibit the same trend in the number of matching points obtained under the different feature algorithms. For the ordered video images (datasets 1 and 2), the ORB, FAST, and AGAST feature algorithms yield more final matching points, and the STAR feature algorithm yields the fewest. In general, RFM-SCAN, LPM, K-VLD and PMM1 retain many matching points, while PMM3 and PMM4 retain relatively few. In the unordered image matching experiments (datasets 3 and 4), the SURF, ORB, and AGAST feature algorithms obtain more final matching points. In general, RFM-SCAN, LPM, GMS, PMM1 and PMM2 retain many final matching points, while RANSAC and RATIO retain relatively few.
In terms of the correct rate of the matching points, Fig. 9 shows that, for the ordered video image datasets, the PMM series algorithms outperform the other algorithms, with PMM3 and PMM4 achieving the highest correct rates; the RATIO and RFM-SCAN algorithms have relatively low correct rates. For the unordered image datasets, the correct rates of the PMM series algorithms are clearly better than those of the other algorithms, and the RATIO and RFM-SCAN algorithms have the lowest correct rates. Fig. 10 shows that after applying the PMM1 algorithm, the image matching result has a high correct rate with no obvious mismatches.
In terms of the loss rate of the matching points, Fig. 9 shows that for the first ordered video image dataset (dataset 1), the K-VLD and RFM-SCAN algorithms have significantly lower loss rates, and the differences among the remaining algorithms' loss rates are not obvious across the feature algorithms. In dataset 2, the loss rates of the PMM3 and PMM4 algorithms are significantly higher. For the unordered image datasets, the RFM-SCAN algorithm has a relatively low loss rate, and the RANSAC, PMM3, and PMM4 algorithms have relatively high loss rates.
In terms of the overall time consumption of the matching, Tables 3 through 6 show that the GMS, PMM1 and PMM2 algorithms consistently maintain relatively low time consumption across all experiments, whereas the K-VLD and RFM-SCAN algorithms consume much more time than the other algorithms.
Tables 3 through 6 show that the ordered video image datasets have fewer initial matching points than the unordered image datasets, but their initial correct rates are much higher. This occurs because the unordered image datasets have higher image resolution and richer texture, so more feature points can be extracted, while the content difference between their image pairs is relatively large; the content difference between the image pairs constructed from the ordered datasets is relatively small, so their initial matching correct rate is relatively high. This is also visible in Fig. 8. The number of initial matching points increases sequentially from dataset 1 through dataset 4. For dataset 1, the time consumption of the PMM1 algorithm is lower than that of the GMS algorithm. As the number of initial matching points increases, the time consumption of GMS grows relatively slowly while that of PMM1 grows relatively fast, so the GMS time consumption is lower than that of PMM1 for dataset 4. However, it is worth noting that the overall calculation rate of PMM1 still exceeds 25 Hz on dataset 4, which meets real-time calculation requirements. Furthermore, the time consumption of the GMS algorithm and the PMM series algorithms has little correlation with the initial correct rate, which is determined by the characteristics of these algorithms. The time consumption of the RANSAC algorithm is clearly related to the number and correct rate of the initial matching points and fluctuates greatly with them, which is determined by its random sampling characteristics. The time consumption of the LPM algorithm also increases with the number of matching points; its overall time consumption in the experiments is at a medium level.
The time consumption of the RFM-SCAN algorithm increases sharply as the number of matching points increases; when the number of matching points exceeds 20,000 in dataset 4, its time consumption reaches tens of seconds. The K-VLD algorithm has obviously high time consumption, which is determined by the complexity of its calculation process. Moreover, in the experiments on dataset 4, memory overflow occurs when K-VLD is applied to more than 20,000 initial matching points.
Overall, the PMM algorithm proposed in this paper achieves relatively high matching accuracy with low time consumption. The performances of PMM1 and PMM2 are similar: PMM2 is slightly better than PMM1 in terms of the correct rate but slightly worse in terms of the loss rate and the calculation time. Likewise, PMM4 is slightly better than PMM3 in terms of the correct rate but slightly worse in terms of the loss rate and the calculation time. The overall correct rates of PMM3 and PMM4 are better than those of PMM1 and PMM2, but PMM3 and PMM4 are worse in terms of the loss rate and the calculation time. Good matching performance can be achieved using Condition 1 alone (PMM1); the polynomial constraint calculation significantly increases the time consumption while improving the correct rate, so its cost-effectiveness is relatively low. Therefore, the PMM1 algorithm is recommended for general applications, while the PMM4 algorithm is recommended for applications requiring high accuracy, such as camera calibration.
Figs. 9(a) and (b) show that the PMM series algorithms have significantly lower loss rates when using the ORB algorithm, which is caused by the cumulative statistics principle of the parallax map in our method: when the extracted feature points are dense, the full performance of the PMM algorithm can be exploited. Conversely, when the contents of an image pair differ greatly, the extracted matching points are relatively sparse. In that case, when parallax mapping is performed, it is difficult for the locally correct matching points to form an effective connected domain, which ultimately causes the motion consistency constraint, Condition 1, to fail. This phenomenon is also visible in the unordered image matching loss rates of Figs. 9(c) and (d).
In practical applications, this deficiency can be compensated by a parallax mapping region expansion, as mentioned earlier.
In general, the PMM algorithm proposed in this paper relies on the results of the initial feature extraction and matching algorithm. For practical applications under challenging conditions, such as changing lighting and low contrast, current mainstream algorithms such as ORB can still obtain a large number of initial matches, and our algorithm then achieves good matching purification results. Scenes with very sparse structures or ''featureless'' regions remain a difficult open challenge: the initial matching results of current mainstream algorithms are very sparse there, and our algorithm will fail; addressing this is a direction for future research. Overall, in terms of the correct rate, the loss rate and the time consumption, ORB-PMM1 is better than existing algorithms for ordered video image matching. For real-time calculation, the proposed algorithm outperforms the RATIO, GMS, K-VLD, RANSAC, LPM and RFM-SCAN algorithms in terms of accuracy, achieving state-of-the-art results.

V. CONCLUSION
Starting from the principle of local motion consistency and combining it with parallax mapping, the motion consistency constraint on matching points is divided into two constraint conditions. On top of these strict motion consistency constraints, a quadratic constraint is implemented using local polynomials. Extensive feature matching experiments were designed using a variety of datasets and current mainstream feature matching algorithms, and the experimental results effectively verify the feasibility of our algorithm. For real-time calculation, the proposed algorithm outperforms the RATIO, GMS, K-VLD, RANSAC, LPM and RFM-SCAN algorithms in terms of accuracy. In practical ordered video image matching applications, the ORB-PMM1 algorithm combination is recommended to achieve good performance. Our algorithm also has certain deficiencies: in the matching of unordered image datasets, when the extracted feature points are relatively sparse, the loss rate may be relatively high. Solving this problem is the future direction of our research.