Graph-Cut RANSAC: Local Optimization on Spatially Coherent Structures

Abstract—We propose Graph-Cut RANSAC, GC-RANSAC in short, a new robust geometric model estimation method where the local optimization step is formulated as energy minimization with binary labeling, applying the graph-cut algorithm to select inliers. The minimized energy reflects the assumption that geometric data often form spatially coherent structures; it includes both a unary component representing point-to-model residuals and a binary term promoting a spatially coherent inlier-outlier labeling of neighboring points. The proposed local optimization step is conceptually simple, easy to implement, and efficient, with a globally optimal inlier selection given the model parameters. Graph-Cut RANSAC, equipped with "the bells and whistles" of USAC and MAGSAC++, was tested on a range of problems using a number of publicly available datasets for homography, 6D object pose, fundamental and essential matrix estimation. It is more geometrically accurate than state-of-the-art robust estimators, fails less often, and runs faster than or at a speed similar to less accurate alternatives. The source code is available at https://github.com/danini/graph-cut-ransac.


INTRODUCTION
The RANdom SAmple Consensus (RANSAC) algorithm proposed by Fischler and Bolles [1] in 1981 has become the most widely used robust estimator in computer vision. RANSAC and its variants have been successfully applied to a wide range of vision tasks, e.g., motion segmentation [2], short baseline stereo [2], [3], wide baseline matching [4], [5], [6], pose-graph initialization for structure-from-motion pipelines [7], [8], detection of geometric primitives [9], image mosaicing [10], and to perform [11] or initialize multi-model fitting algorithms [12], [13]. In brief, RANSAC repeatedly selects minimal random subsets of the input data points and fits a model, e.g., a line to two 2D points, a fundamental matrix to seven 2D point correspondences, or a 6D pose to three 2D-3D correspondences. Next, the quality of the model is measured, for instance, by the cardinality of its support, i.e., the number of inlier data points. Finally, the model with the highest quality, polished, e.g., by least-squares fitting of all inliers, is returned. In this paper, we propose a new local optimization technique for RANSAC considering the fact that real-world data often form spatially coherent structures.
Since the introduction of RANSAC, a number of modifications have been proposed replacing the components of the original algorithm. For instance, improving the sampler impacts the speed of the robust estimation procedure via selecting a good sample early and, thus, triggering the termination criterion. NAPSAC [14] assumes that inliers are spatially coherent and therefore it draws samples from a hypersphere centered at the first, randomly selected, location-defining point. If this point is an inlier, the points sampled in its proximity are more likely to be inliers than the ones outside the ball. While NAPSAC exploits the observation that inliers tend to be "closer" to each other than outliers, the GroupSAC algorithm [15] assumes that inliers are often "similar" to each other and, therefore, data points can be separated into groups according to their similarities. PROSAC [16] exploits an a priori predicted inlier probability rank of each point and starts the sampling with the most promising ones. Progressively, samples that are less likely to lead to the sought model are drawn. P-NAPSAC [17] merges the advantages of local and global sampling by drawing samples from gradually growing neighborhoods. Gradually, the algorithm changes from the fully localized NAPSAC to the global PROSAC sampling. NG-RANSAC [18] predicts the inlier probability of each point via deep learning.
Regarding speeding up the robust estimation process, one way of avoiding unnecessary calculations is the early termination of the verification of models which are unlikely to be more accurate than the current so-far-the-best. A number of preemptive model verification strategies have been proposed. For example, when using the T(d,d) test [19], the model verification is first performed on d randomly selected points (where d ≪ n). The remaining n − d points are evaluated only if the first d points are all inliers to the verified model. The test was extended by the so-called bail-out test [20]. Given a model to be scored, a randomly selected subset of d points is evaluated. If the inlier ratio within this subset is significantly smaller than the current best inlier ratio, it is unlikely that the model will yield a larger consensus set than the current maximum and, thus, it is discarded. In [21], [22], an optimal randomized model verification strategy was described. The test is based on Wald's theory of sequential testing [23]. Wald's SPRT test is the solution of a constrained optimization problem, where the user supplies acceptable probabilities for errors of the first type (rejecting a good model) and the second type (accepting a bad model), and the resulting optimal test is a trade-off between the time to decision and the errors committed.
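The SPRT decision rule above can be sketched as follows. This is a simplified illustration: the decision threshold A and the hypothesized inlier probabilities under the good-model and bad-model hypotheses (eps_good, eps_bad) are fixed illustrative values here, whereas the actual method [21], [22] estimates them adaptively.

```python
def sprt_verify(residuals, threshold, eps_good=0.5, eps_bad=0.1, A=100.0):
    """Sequentially evaluate points against a model; stop early if the
    likelihood ratio of the bad-model hypothesis exceeds threshold A.

    Returns (accepted, inlier_count_seen_so_far)."""
    lam = 1.0  # running likelihood ratio: P(data | bad model) / P(data | good model)
    inliers = 0
    for r in residuals:
        if r < threshold:               # observation consistent with the model
            inliers += 1
            lam *= eps_bad / eps_good   # inliers are less likely under a bad model
        else:                           # observation inconsistent with the model
            lam *= (1.0 - eps_bad) / (1.0 - eps_good)
        if lam > A:                     # bad-model hypothesis accepted: bail out
            return False, inliers
    return True, inliers
```

With the values above, a model supported by most points is verified fully, while a contaminated model is rejected after only a handful of evaluations.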
To improve the accuracy by better modelling the noise in the data, different model quality calculation techniques have been investigated. For instance, MLESAC [24] estimates the model quality by a maximum likelihood procedure with all its beneficial properties, albeit under certain assumptions about the data point distributions. In practice, MLESAC results are often superior to the inlier counting of plain RANSAC, and they are less sensitive to the manually set inlier-outlier threshold. In MAPSAC [25], the robust estimation is formulated as a process that estimates both the parameters of the data distribution and the quality of the model in terms of maximum a posteriori.
There are also methods to reduce the dependency on the user-defined inlier-outlier threshold. For example, MINPRAN [26] assumes that the outliers are distributed uniformly and finds the model where the inliers are least likely to have occurred randomly. Moisan et al. [27] proposed a contrario RANSAC, selecting the most likely noise scale for each model candidate. Barath et al. [17], [28] proposed the Marginalizing Sample Consensus method (MAGSAC) and its recent improvement (MAGSAC++), marginalizing over the noise scale σ to eliminate the threshold from the model quality calculation.
Observing that RANSAC requires in practice more samples than what theory predicts, Chum et al. [29] identified the problem that not all all-inlier samples are "good", i.e., lead to a model accurate enough to distinguish all inliers, e.g., due to poor conditioning of the selected random all-inlier sample. They address the problem by introducing the locally optimized RANSAC (LO-RANSAC) that augments the original approach with a local optimization step applied to the so-far-the-best models. In the original LO-RANSAC paper [29], the local optimization is implemented as an iterated least-squares model re-fitting with a progressively shrinking inlier-outlier threshold inside an inner RANSAC applied only to the inliers of the current model. In the reported experiments, LO-RANSAC is superior to plain RANSAC both in terms of geometric accuracy and number of iterations. It is shown that the number of local optimizations is close to the logarithm of the iteration number and, therefore, it usually does not yield a significant overhead in the processing time. However, Lebeda et al. [30] showed that, for models with many inliers, the local optimization becomes a computational bottleneck due to the iterated least-squares model fitting, where the processing time is a function of the number of used points. In [30], it is proposed to consider only a subset of the inliers in the local optimization. Only the final model polishing process is applied to the whole inlier set.
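The iterated least-squares refitting with a shrinking threshold can be sketched on 2D line fitting. The starting multiplier and the halving schedule below are illustrative choices, not the exact constants of [29], [30].

```python
def fit_line(points):
    # least-squares line y = a*x + b through the given points
    n = len(points)
    sx = sum(p[0] for p in points); sy = sum(p[1] for p in points)
    sxx = sum(p[0] * p[0] for p in points); sxy = sum(p[0] * p[1] for p in points)
    denom = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b

def residual(line, p):
    # point-to-line distance for the line a*x - y + b = 0
    a, b = line
    return abs(a * p[0] - p[1] + b) / (a * a + 1.0) ** 0.5

def local_optimization(points, line, threshold, iters=4, start_mult=4.0):
    """Iterated least squares with a progressively shrinking threshold,
    in the spirit of the LO-RANSAC inner refinement (illustrative sketch)."""
    mult = start_mult
    for _ in range(iters):
        inliers = [p for p in points if residual(line, p) < mult * threshold]
        if len(inliers) >= 2:
            line = fit_line(inliers)
        mult = max(1.0, mult / 2.0)  # shrink toward the base threshold
    return line
```

Starting from a rough model, the wide initial threshold gathers generous support; each refit tightens the model, and the shrinking threshold then prunes the remaining outliers.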
In this paper, we propose a new local optimization procedure considering that, in real-world applications, data points often form spatially coherent structures. In the large body of RANSAC-related literature, the inlier-outlier decision has always been a function of the point-to-model residual, calculated individually for each data point. In practice, both inlier and outlier points often are spatially coherent and, therefore, a point near an outlier or inlier is likely to be an outlier or inlier, respectively. Spatial coherence, described by, e.g., the Potts model [31], has been exploited in a number of vision problems, for instance, in segmentation [32], multi-model fitting [12], [13], [33], [34], [35], [36] or sampling [14], [17] in RANSAC-like techniques. Directly formalizing the model verification in RANSAC as a graph-cut problem such that it considers spatial coherence is computationally prohibitive. However, when applied as the local optimization step, as in [29], just to each so-far-the-best model, the number of graph-cuts is only the logarithm of the number of sampled and verified models, and it can be performed efficiently.
The proposed Graph-Cut RANSAC, GC-RANSAC in short, is a locally optimized RANSAC alternating graph-cut and model fitting as the local optimization step. It is superior to the original LO-RANSAC in a number of aspects. The contributions are: 1. GC-RANSAC is capable of exploiting the spatial coherence of points; see Fig. 1 for an example. The LO step is conceptually simple, easy to implement, and its inlier selection is a globally optimal and efficient graph-cut with only a few intuitive and learnable parameters, unlike the ad hoc, iterative and complex LO steps of [29]. 2. We propose a new energy term which models the spatial coherence of geometric data. Experiments show that the proposed term is more suitable for robust geometric model estimation than the traditionally used Potts model [31]. 3. We combine GC-RANSAC with the bells and whistles of USAC [37] and MAGSAC++ [17]. It is shown experimentally that the proposed algorithm is superior to the state-of-the-art LO-RANSAC variants included in USAC [37] in terms of accuracy and failure ratio on a wide range of vision problems (i.e., homography, essential and fundamental matrix, and 6D pose estimation).

Remark. Isack's and Boykov's PEARL [12] was the first method to introduce spatial coherence to geometric model fitting. However, PEARL cannot be directly used for the problems solved by RANSAC, since the user has to manually set the number of hypotheses tested for the worst case, i.e., the lowest possible inlier ratio. In the first iteration of PEARL, the α-expansion step executes the graph-cut as many times as the number of hypotheses tested. This number is calculated from the worst-case scenario and is typically orders of magnitude higher than the number of iterations which the adaptive RANSAC termination criterion determines. Moreover, in GC-RANSAC, applying the local optimization to only the so-far-the-best models ensures that the graph-cut runs only very few times, paying only a small penalty.
A preliminary version of the GC-RANSAC algorithm was published at CVPR 2018 [38]. This paper extends and improves it by (i) proposing a new spatial coherence model, (ii) adding the USAC components and MAGSAC++ scoring, and (iii) providing a number of new experiments on homography, fundamental matrix, relative and 6D object pose estimation.

RANSAC VERIFICATION REFORMULATED
The inlier selection of RANSAC is formulated as an energy minimization problem. The novel formulation makes it possible to include additional constraints when selecting the inliers of a given model.

RANSAC as Energy Minimization
To facilitate understanding of the connection to energy minimization, we start by reformulating the original top-hat loss function of RANSAC; see Fig. 2. Then continuous loss functions, e.g., the truncated L2 loss, will be considered.
Suppose that we are given a set P ⊆ R^{d_p} (d_p > 0) of n points and a model represented by a parameter vector θ ∈ R^{d_m} (d_m > 0), where d_p and d_m are, respectively, the dimensions of a data point and the model. The residual function measuring the point-to-model assignment cost is φ : R^{d_p} × R^{d_m} → R. For the standard RANSAC scheme, which applies a top-hat fitness function (0 for close, 1 for far), the implied unary energy is E_{0,1}(L) = Σ_{p ∈ P} ‖L_p‖_{0,1}, where ‖L_p‖_{0,1} = 0 if either (L_p = 0 and φ(p, θ) < ε) or (L_p = 1 and φ(p, θ) ≥ ε), and ‖L_p‖_{0,1} = 1 otherwise. Parameter L ∈ {0, 1}^n is a labeling, ignored in standard RANSAC, L_p ∈ L is the label of point p ∈ P, and ε ∈ R_+ is the user-defined inlier-outlier threshold. Labels 0 and 1 are the inlier and outlier labels, respectively. Solving the problem L* = arg min_L E_{0,1}(L) leads exactly to the RANSAC solution, since E_{0,1} assigns zero cost in only two cases: (i) when p is labeled inlier and it is closer to the model than the threshold, or (ii) when p is labeled outlier and it is farther from the model than ε. This is exactly what RANSAC does when selecting the inliers.
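The equivalence between minimizing the top-hat energy and plain RANSAC inlier counting can be checked directly: with no pairwise term, the labels decouple, and the minimizer is simply the per-point threshold test. A minimal sketch:

```python
def tophat_cost(label, resid, eps):
    """Unary cost ||L_p||_{0,1}: zero iff the label agrees with the
    threshold decision, one otherwise."""
    inside = resid < eps
    if label == 0:                       # labeled inlier
        return 0.0 if inside else 1.0
    return 0.0 if not inside else 1.0    # labeled outlier

def ransac_labeling(residuals, eps):
    # With no pairwise term, each point is labeled independently:
    # exactly the per-point threshold test of plain RANSAC.
    return [0 if r < eps else 1 for r in residuals]

def tophat_energy(labels, residuals, eps):
    return sum(tophat_cost(l, r, eps) for l, r in zip(labels, residuals))
```

Flipping any label away from the threshold decision adds exactly one unit of energy, so the thresholded labeling is the unique minimizer whenever no residual equals ε.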
A number of papers [17], [24], [25], [28] discussed replacing the {0, 1} loss with a continuous function f, e.g., the truncated L2 loss of MSAC [25], to improve the estimation accuracy. Considering a general robust loss function, the energy term is written as E_f(L) = Σ_{p ∈ P} f(L_p, p). For example, when using the MSAC-like truncated L2 loss, f_MSAC(L_p, p) becomes f_MSAC(L_p, p) = min(φ²(p, θ)/ε², 1) if L_p = 0, and f_MSAC(L_p, p) = 1 − min(φ²(p, θ)/ε², 1) if L_p = 1.

MAGSAC++ Loss
To use the state of the art in robust model fitting, we consider the loss function of MAGSAC++ [17], which was designed so that it does not require a strict inlier-outlier decision. The loss proposed for MAGSAC++ is g(θ, P) = Σ_{p ∈ P} ρ(φ(p, θ)), where function ρ is defined piecewise. For 0 ≤ r ≤ kσ_max, where σ_max is a user-defined maximum noise scale, it depends on the constant C(d_p) = (2^{d_p/2} Γ(d_p/2))^{−1}, where Γ is the gamma function (defined for a > 0), d_p is the dimension of the Euclidean space in which the residuals are calculated, and on a function τ(σ); the full definition is given in [17].

Fig. 2. Example loss functions used for robust model fitting: RANSAC [1], MSAC [25], MLESAC [24], MAGSAC++ [17].
For r > kσ_max, the loss is constant. Here, γ(a, x) = ∫_0^x t^{a−1} exp(−t) dt is the lower incomplete gamma function. Weight w(r) in (3) can be calculated efficiently by storing the values of the complete and incomplete gamma functions in a lookup table.
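As a sketch of the lookup-table idea, the lower incomplete gamma function can be tabulated once by numerical integration; per-residual weight evaluation then reduces to a table read. The trapezoidal rule and the grid size below are illustrative, and the boundary handling assumes a > 1 (the integrand vanishes at t = 0); a production implementation would use a dedicated special-function routine.

```python
import math

def lower_incomplete_gamma(a, x, steps=2000):
    """gamma(a, x) = integral_0^x t^(a-1) exp(-t) dt, approximated with the
    trapezoidal rule (numerical sketch; assumes a > 1 at the t = 0 boundary)."""
    if x <= 0.0:
        return 0.0
    h = x / steps
    def f(t):
        return t ** (a - 1.0) * math.exp(-t) if t > 0 else 0.0
    s = 0.5 * (f(0.0) + f(x)) + sum(f(i * h) for i in range(1, steps))
    return s * h

def build_lookup(a, x_max, n=256):
    # Precompute gamma(a, x) on a uniform grid so that per-residual weights
    # reduce to a table read (plus interpolation, omitted here).
    return [lower_incomplete_gamma(a, x_max * i / (n - 1)) for i in range(n)]
```

For a = 2, the closed form γ(2, x) = 1 − e^{−x}(1 + x) provides a convenient correctness check.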
The loss f_{M++}(L_p, p) implied by MAGSAC++ given a binary labeling is defined analogously, where ε_max is the maximum threshold which noise scale σ_max implies.

Spatial Coherence in RANSAC
In geometric model fitting, real-world data often form spatially coherent structures. This observation inspired a number of approaches, e.g., for sampling [14], [17] in robust methods or multi-model fitting techniques [12], [13], [34], [36]. To the best of our knowledge, there has been no attempt to exploit this property in the local optimization step of RANSAC. Due to formalizing the inlier selection as energy minimization via a binary labeling, additional energy terms can be straightforwardly considered. The problem is still solvable efficiently and globally via the graph-cut algorithm. To model the point-to-point proximity in the energy, the Potts model [31] is usually a justifiable choice: E_Potts(L) = Σ_{(p,q) ∈ E} [L_p ≠ L_q], where [·] is the Iverson bracket and (p, q) ∈ E is an edge connecting points p and q in a pre-calculated neighborhood graph A = (P, E). When minimizing energy E_Potts(L), the neighboring points are encouraged to have the same label, formalizing the assumption that close points likely belong to the same model.
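The Potts term is a simple sum over the neighborhood edges; a minimal sketch:

```python
def potts_energy(labels, edges):
    """Potts pairwise energy: each neighboring pair carrying different
    labels contributes one unit of cost."""
    return sum(1 for p, q in edges if labels[p] != labels[q])
```

On a chain of four points, only the single label change in the middle is penalized, which is why minimizing this term smooths the labeling along connected structures.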
In our experiments, we saw that the Potts model fails to act as expected, i.e., to spread the inlier label along a structure. An example line fitting is shown in Fig. 3a. Each column shows the results of the binary labeling using a different weight λ ∈ [0, 1] for the spatial coherence term. Due to the outliers being considered similarly structured as the inliers, and the model, the 2D line, being too inaccurate to select the sought inliers, the spatial coherence term forces all points in the structure to be outliers even if their point-to-model residuals are small, i.e., they are close to the line. Other examples are in the supplementary material, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2021.3071812.
The expected behaviour is to label as inliers all points which are closer to the model than the threshold and, also, points which are in the same spatial structure as the points close to the model. We achieve this behaviour by breaking condition L_p = L_q of (5) down into two cases. The property of the Potts model of not penalizing two neighboring points p, q if both are inliers, L_p = L_q = 0, should still be kept. The L_p = L_q = 1 case, i.e., when both points are outliers, should depend on the point-to-model residual. Otherwise, the term may force points with small residuals, but in the neighborhood of outliers, to be labeled outliers. This can be seen in the last plot of Fig. 3a, where points close to the line are labeled outliers due to being in a structure consisting mostly of outliers.

Fig. 3. The top row shows the labeling results when the Potts model (5) is applied. The bottom one shows the labeling results of a single graph-cut run when using the proposed spatial coherence model (6). The inlier-outlier threshold is shown by green dashed lines. The edges of the neighborhood graph are grey line segments.
The proposed spatial coherence term E_GC, which fixes the mentioned issues of the Potts model, keeps the zero cost of the Potts model when both points are labeled inliers, makes the cost of the joint outlier label depend on the point-to-model residuals, and penalizes differing labels as before; the per-edge costs are given in Algorithm 1. When using E_GC, the points closer than the threshold are penalized for jointly being labelled outliers. Farther than the inlier-outlier threshold, only the L_p ≠ L_q case is penalized, thus leading to the same effect as the Potts model. This can be imagined as a "bumpy", non-uniform inlier-outlier threshold.
The bottom row of Fig. 3 shows the effect of the proposed term with different weights λ ∈ {0, 0.3, 0.6, 0.9}. Optimizing E_GC leads to the desired effect: the inlier label is spread along the spatial structure while points close to the model are not affected by the surrounding outliers. Note that this spatial coherence model leads to accurate results on the geometric problems investigated in this paper. However, if different assumptions hold, the energy can be straightforwardly modified and used within GC-RANSAC.
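The behaviour described above can be sketched as a per-edge cost function. This is a behavioural illustration only: the constant 1/2 for differing labels and the truncated quadratic loss are illustrative stand-ins, not the exact MAGSAC++-based costs used by the paper.

```python
def gc_pairwise(lp, lq, rp, rq, eps):
    """Sketch of the proposed spatial coherence behaviour for one edge.
    Both-inlier pairs are free; both-outlier pairs pay more the closer the
    points are to the model, so near points resist being dragged to the
    outlier label; differing labels pay a Potts-like constant."""
    def loss(r):  # truncated quadratic loss, bounded in [0, 1] (illustrative)
        return min((r / eps) ** 2, 1.0)
    if lp == 0 and lq == 0:
        return 0.0                               # both inliers: no penalty
    if lp == 1 and lq == 1:
        return 1.0 - 0.5 * (loss(rp) + loss(rq)) # both outliers: cheap only if far
    return 0.5                                   # differing labels: Potts-like
```

Near the model, the joint outlier label is expensive; far from it, the term degenerates to the plain Potts behaviour, matching the "bumpy threshold" intuition.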

Graph-Cut Energy
The energy E(L) minimized in the proposed Graph-Cut RANSAC is a linear combination of the data (unary) and spatial coherence (pair-wise) terms, E(L) = E_{M++}(L) + λ E_GC(L), where λ ∈ R is a parameter balancing the terms. The globally optimal labeling L* = arg min_L E(L) can easily be determined in polynomial time using the graph-cut algorithm [40].
To balance between the energy terms, it is important to have each term normalized. It can be easily seen that E_{M++} ≤ n, since the robust loss satisfies f_{M++}(L_p, p) ≤ 1 for each data point p. To ensure that E_GC(L) has the same scale, it has to be divided by the number of edges in the neighborhood graph and multiplied by n. Therefore, let us define the energy to minimize as Ê(L) = E_{M++}(L) + λ (n/|E|) E_GC(L), where |E| is the number of edges in A. It can be easily seen that E_GC is sub-modular and, thus, Ê is as well.
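The balancing can be sketched with a bounded unary loss and a Potts-style pairwise term; the n/|E| rescaling puts both sums on the same scale before mixing. The truncated quadratic unary loss used here is illustrative, not the MAGSAC++ loss itself.

```python
def truncated_loss(r, eps):
    # bounded by 1, so the unary sum is at most n (illustrative unary loss)
    return min((r / eps) ** 2, 1.0)

def balanced_energy(labels, residuals, edges, eps, lam):
    """E_hat(L) = E_unary(L) + lam * (n / |E|) * E_pair(L).
    Each per-point loss is at most 1, hence E_unary <= n; rescaling the
    pairwise Potts sum by n/|E| puts both terms on the same scale."""
    n = len(labels)
    unary = sum(truncated_loss(r, eps) if l == 0 else 1.0 - truncated_loss(r, eps)
                for l, r in zip(labels, residuals))
    pair = sum(1 for p, q in edges if labels[p] != labels[q])
    return unary + lam * (n / len(edges)) * pair
```

Without the n/|E| factor, a dense neighborhood graph would let the pairwise sum dominate regardless of λ; with it, λ retains the same meaning across problem sizes.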

GRAPH-CUT RANSAC (GC-RANSAC)
In this section, the described energy minimization-based inlier selection is used to propose a new locally optimized RANSAC. Benefiting from the proposed approach, the LO step is conceptually simpler and cleaner than that of the original LO-RANSAC.

Energy-Based Labeling
The construction of the problem graph G, which is fed into the graph-cut procedure using the unary and pair-wise terms of Eqs. (4) and (6), is shown in Algorithm 1. Functions AddTerm1 and AddTerm2 add, respectively, unary (4) and binary (6) costs to the problem graph. Such a graph construction procedure is covered in depth in [39] (Section 4). The graph-cut algorithm is applied to G, determining the globally optimal labeling L* which considers the spatial coherence of the points and their point-to-model residuals given the current so-far-the-best model. The edge-cost steps of Algorithm 1 are: 7: c01, c10 ← … ⊳ Loss of p, q with different labels. 8: c00 ← 0. ⊳ Loss: p, q being inliers. 9: c11 ← ½ Σ_{s ∈ {p,q}} ρ(φ(s, θ)). ⊳ Loss: p, q being outliers. 10: G ← AddTerm2(G, p, q, c00, c01, c10, c11).
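For intuition, the labeling problem that the graph-cut solves can be stated explicitly and, on a tiny instance, minimized exhaustively. A real implementation builds an s-t graph and runs a min-cut, which finds the same global optimum in polynomial time for sub-modular energies; brute force is used here only to keep the sketch dependency-free.

```python
from itertools import product

def edge_cost(lp, lq, c00, c01, c10, c11):
    # select the pairwise cost matching the two labels
    return (c00, c01, c10, c11)[lp * 2 + lq]

def min_energy_labeling(unary, edges):
    """Exhaustively minimize E(L) = sum_p unary[p][L_p]
    + sum over (p, q, (c00, c01, c10, c11)) of the pairwise cost.
    A graph-cut solver returns the identical global optimum efficiently."""
    n = len(unary)
    best, best_labels = float("inf"), None
    for labels in product((0, 1), repeat=n):
        e = sum(unary[p][labels[p]] for p in range(n))
        e += sum(edge_cost(labels[p], labels[q], *c) for p, q, c in edges)
        if e < best:
            best, best_labels = e, list(labels)
    return best_labels, best
```

In the usage below, the middle point marginally prefers the outlier label on its own, but the Potts-style edge costs pull it to agree with its inlier neighbor.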

Graph-Cut in Local Optimization
The original LO step of LO-RANSAC consists of an inner RANSAC, applied locally to the inliers of the current best model, and an iterative model refitting, which uses inliers selected in each step by a progressively shrinking inlier-outlier threshold.
In the proposed GC-RANSAC algorithm, the inner RANSAC is a necessary step. The reason is that the least-squares (LS) fitting, which is applied to all inliers, minimizes the point-to-model residuals, i.e., the unary term. Minimizing this loss on points which are labeled inliers solely due to being in a spatial structure leads to inaccurate results in most cases. An intuitive example is shown in the right plot of Fig. 3b, where the sought inliers are found but, also, points which are outliers of the ground truth model are labeled inliers. In this case, applying LS fitting fails to return the sought model parameters, since LS is not robust and, thus, is extremely sensitive to outlying points. Instead, we apply an inner RANSAC to the points labeled inliers. In this case, the configuration of the last plot of Fig. 3b leads to an inner RANSAC applied to a point set with a very high inlier ratio. Consequently, the sought model is found easily in a few iterations.
Each step of the inner RANSAC selects a 7m-sized sample from the points labeled inliers, where m is the size of a minimal sample, e.g., m = 4 for homographies. The sample size 7m was proposed in [30] and works well in our experiments. The LS fitting is always applied to points which are inliers due to the unary term, i.e., their point-to-model residuals are smaller than the inlier-outlier threshold. A detailed explanation of the steps of the proposed local optimization is given in Algorithm 2. Function ShouldTerminate implements either a fixed iteration number or the standard RANSAC termination criterion. In the experiments, we used a fixed iteration number set to 20 to achieve fast performance.
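The inner RANSAC of the LO step can be sketched generically over a fit/residual function pair; the 1D mean model in the usage example is purely illustrative (in GC-RANSAC the model would be, e.g., a homography and the fit a least-squares estimator).

```python
import random

def inner_ransac(labeled_inliers, fit, resid, threshold, m, iters=20):
    """Inner RANSAC of the LO step (sketch): repeatedly draw a
    min(7m, all)-sized sample from the points currently labeled inlier,
    least-squares fit it, and keep the model with the largest
    thresholded support among the labeled inliers."""
    sample_size = min(7 * m, len(labeled_inliers))
    best_model, best_support = None, -1
    for _ in range(iters):
        sample = random.sample(labeled_inliers, sample_size)
        model = fit(sample)
        support = sum(1 for p in labeled_inliers if resid(model, p) < threshold)
        if support > best_support:
            best_model, best_support = model, support
    return best_model

def fit_mean(values):
    # trivial 1D "model": the mean of the sampled values (illustrative)
    return sum(values) / len(values)
```

Because the input set already has a very high inlier ratio, a handful of 7m-sized draws almost surely contains a clean sample, which is why a fixed, small iteration count suffices here.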

GC-RANSAC
The Graph-Cut RANSAC algorithm is shown in Algorithm 3 in depth. To achieve state-of-the-art results, we combine the proposed graph-cut-based local optimization with the components discussed in USAC [37]. We consider four popular vision problems, i.e., fundamental matrix, homography, 6D object pose (i.e., the PnP problem), and relative pose (i.e., essential matrix) estimation. The included components for each problem are as follows:

1. Sample degeneracy. The degeneracy tests on minimal samples reject clearly bad samples to avoid the sometimes expensive model estimation. For homographies, samples consisting of collinear points are rejected. For 6D object pose estimation, samples where the area of the triangle formed by the three selected points is smaller than a predefined threshold are not used.

2. Sample cheirality. The test rejects samples based on the assumption that both cameras observing a 3D surface must be on its same side. For homography fitting, we check whether the ordering of the four point correspondences along their convex hulls is the same in both images. If not, the sample is rejected.

3. Model degeneracy. The purpose of this test is to reject models early to avoid verifying them unnecessarily. For fundamental matrices, DEGENSAC [41] is applied to determine if the epipolar geometry is affected by a dominant plane. For relative pose and 6D object pose estimation, improper rotation matrices [42], i.e., the ones with negative determinant, are rejected.

4. Model cheirality. The test rejects models considering that the cameras must be on the same side of the observed surface. For fundamental and essential matrix estimation, we apply the oriented epipolar constraint [43]. For 6D object pose estimation, we assume that the object is in front of the camera and, thus, the z coordinate of the translation must be positive.

5. Sampling. We use the PROSAC sampler [16]. It requires an a priori determined ordering of the input data points. For point correspondence-based methods, we used the scoring coming from the standard SNN ratio test [44]. For 6D object pose estimation, the points are ordered by their confidence values provided by deep learning [45] in the used datasets.

6. Preemptive model verification. We use the Sequential Probability Ratio Test (SPRT) [22] to interrupt the model verification if the probability of the model being better than the current so-far-the-best one falls below a threshold.

7. Scoring. We use the scoring of MAGSAC++ [17] to calculate the model quality. Even though MAGSAC++ does not require a single inlier-outlier threshold, the other components of the algorithm (e.g., local optimization, SPRT, DEGENSAC) do. Therefore, we set the upper bound of the threshold in MAGSAC++ to σ_max = 10ε, where ε is the manually set inlier-outlier threshold.

8. Final model polishing. The algorithm finishes with an iteratively re-weighted least-squares model refitting on all inlier points for all problems to polish the final model parameters.
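The sample cheirality check for homographies (the convex-hull ordering test above) can be approximated by verifying that every point triple keeps its orientation sign across the two images; this variant, a practical stand-in rather than the paper's exact implementation, also rejects collinear, i.e., degenerate, samples.

```python
from itertools import combinations

def orientation(a, b, c):
    # sign of the cross product (b - a) x (c - a): +1 ccw, -1 cw, 0 collinear
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def sample_passes_cheirality(src, dst):
    """Reject a 4-correspondence homography sample if the relative
    orientation of any point triple differs between the two images,
    or if any triple is collinear (degenerate sample)."""
    for i, j, k in combinations(range(4), 3):
        o1 = orientation(src[i], src[j], src[k])
        o2 = orientation(dst[i], dst[j], dst[k])
        if o1 == 0 or o2 == 0 or o1 != o2:
            return False
    return True
```

A mirrored correspondence set, which no physically valid plane-induced homography between two cameras on the same side of the plane can produce, is rejected by the sign flip.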

EXPERIMENTAL RESULTS
We tested Graph-Cut RANSAC on fundamental matrix, relative pose, homography, and 6D object pose estimation using publicly available real-world datasets. The compared methods are GC-RANSAC with MSAC [25] and MAGSAC++ [17] scoring techniques, vanilla RANSAC [1], MSAC [25], and USAC [37]. USAC was applied with local optimization [29] and with the same modules as GC-RANSAC, i.e., the SPRT test [22], degeneracy and cheirality tests, MSAC scoring, and PROSAC sampling [16]. Since relative pose and 6D object pose estimation are not included in the available USAC implementation, we copied the corresponding parts from our GC-RANSAC code. Also, we included NG-RANSAC [18] in the comparison for fundamental matrix and relative pose estimation. All compared methods are implemented in C++.
The part of NG-RANSAC predicting inlier probabilities is implemented in Python and runs on a GPU. The other parts, e.g., the one doing the robust estimation, are in C++. All methods were run on a computer with an Intel Core i7-8700K CPU and two GeForce RTX 2080 Ti GPUs. To provide a neighborhood graph, we used FLANN [46] in the 4D correspondence space, using a hypersphere with radius 20 to assign neighbors to points. FLANN calculates the distances in this feature space and assigns, on average, 3-4 neighbors to most points. Parameter λ was set to 0.975. These values lead to accurate results on all tested problems.
If not stated otherwise, the required confidence in the solution was set to 0.99 and the maximum iteration number to 5000 for all methods. The maximum iteration number is an upper bound for the iteration number; the robust estimation finishes in two cases: (i) the termination criterion is triggered, or (ii) the iteration number exceeds the maximum. For each method and problem, we chose the threshold maximizing the accuracy. For homography fitting, it was as follows: USAC, MSAC and GC-RANSAC (5.0 pixels); RANSAC (3.0 pixels). For fundamental and essential matrix fitting, it was as follows: USAC, RANSAC, MSAC, NG-RANSAC, and GC-RANSAC (0.75 pixels). For 6D object pose estimation, the threshold was set to 1 pixel. We note that since NG-RANSAC is a deep learning-based sampler and GC-RANSAC is a local optimization technique, they can be straightforwardly combined. We did not include MAGSAC [28] and MAGSAC++ [17] in the comparison since their improvements are orthogonal to those of GC-RANSAC. The algorithms can indeed be combined beyond merely adopting the MAGSAC++ scoring function. However, that is out of this paper's scope. Comparing the methods would give the false impression that they are competitors.
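The adaptive termination criterion mentioned above follows the standard RANSAC sample-count bound: the smallest k such that at least one all-inlier minimal sample is drawn with the required confidence. A minimal sketch:

```python
import math

def required_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Smallest k with 1 - (1 - w^m)^k >= confidence, where w is the
    inlier ratio and m the minimal sample size, i.e., the number of
    samples needed to hit an all-inlier sample with the given confidence."""
    w_m = inlier_ratio ** sample_size
    if w_m <= 0.0:
        return float("inf")   # no inliers: the bound diverges
    if w_m >= 1.0:
        return 1              # every sample is all-inlier
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - w_m))
```

During the run, the bound is re-evaluated with the inlier ratio of the so-far-the-best model, so the estimation stops as soon as either this criterion or the maximum iteration number is reached.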

Fundamental Matrix Estimation
Fundamental matrix estimation is evaluated on the benchmark of [47]. The benchmark includes scenes from the datasets TUM, KITTI, Tanks and Temples, and Community Photo Collection. TUM [48] consists of videos of indoor scenes. Each video is of resolution 640 × 480. KITTI [49] consists of consecutive frames of a camera mounted to a moving vehicle. The images are of resolution 1226 × 370. Both in KITTI and TUM, the image pairs are short-baseline and, thus, the epipolar geometry estimation is relatively easy, usually with a high inlier ratio. Tanks and Temples (T&T) [50] provides images of real-world objects for image-based reconstruction and, thus, contains mostly wide-baseline pairs. The images are of size from 1080 × 1920 up to 1080 × 2048. Community Photo Collection (CPC) [51] contains images of various sizes of landmarks collected from Flickr. The benchmark defines 1000 randomly selected image pairs from each dataset. SIFT [44] correspondences are detected, filtered by the SNN ratio test [44] and, finally, used for estimating the epipolar geometry. The used error metric is the symmetric geometric distance [52] (SGD) in pixels, which compares two fundamental matrices by iteratively generating points on the borders of the images and, then, measuring their epipolar distances. Example image pairs from the datasets are shown in Fig. 6.
In Fig. 8, the cumulative distribution functions (CDF) of the SGD errors (in pixels) are shown. The probability (vertical axis) is plotted as the function of the error (horizontal axis). For all datasets, GC-RANSAC is the most geometrically accurate method, no matter whether MSAC or MAGSAC++ scoring is used. The best performance is achieved by using the MAGSAC++ scoring.
The failure ratio (in percentage) and the average and median errors are reported in Table 1. A test is considered a failure if the error of the estimated model is greater than 1 percent of the image diagonal. The average values are calculated from the successful tests. The best values are shown in red, the second best ones in blue. On three out of the four datasets, GC-RANSAC with MAGSAC++ scoring is superior to the competitor algorithms in terms of failure ratio, average and median errors. On the Tanks and Temples dataset, GC-RANSAC with MSAC scoring leads to the best accuracy by a small margin, while MAGSAC++ scoring leads to a significantly lower failure rate. NG-RANSAC performs competitively on the datasets TUM and KITTI. On Tanks and Temples and CPC, it performs poorly, worse than any tested method. This behaviour is probably related to the properties of the data it was trained on. GC-RANSAC with MAGSAC++ scoring outperforms NG-RANSAC in all cases.
In Fig. 7a, the log10 SGD errors (left plot) and the processing times (right; in milliseconds) are plotted as the function of the maximum iteration number. For these tests, the confidence was set to 0.99. GC-RANSAC leads to the most accurate results and it is the least sensitive one to the maximum iteration number. MAGSAC++ scoring leads to better accuracy than MSAC while having similar processing time. In Fig. 9a, the log10 SGD errors (left plot) and the processing times (right) are plotted as the function of the required confidence. For these tests, the maximum iteration number was set to 1,000,000. We excluded NG-RANSAC from this test since it has no confidence parameter. It can be seen that GC-RANSAC leads to the most accurate results and it is the least sensitive method to the confidence parameter. The processing time implied by the two tested scoring techniques is similar.

Fig. 6. Example image pairs from the datasets used for epipolar geometry estimation, with inlier correspondences visualized.

Fig. 7. Maximum iteration number study. The avg. log10 error (px) and the run-time (ms) on manually selected inliers are plotted as the function of the max. iteration number. The confidence was set to 0.99.

Fig. 8. Fundamental matrix fitting. The cumulative distribution functions of the SGD errors (in pixels) on four datasets, each consisting of 1000 image pairs. Being accurate is interpreted as a curve close to the top-left corner. The confidence and maximum iteration number were set to 0.99 and 5000, respectively.

Homography Estimation
For homography estimation, we downloaded the homogr (16 pairs) and EVD (15 pairs) datasets [30]. They consist of image pairs of different sizes, from 329 × 278 up to 1712 × 1712, with point correspondences provided. The homogr dataset contains mostly short baseline stereo images, whilst the pairs of EVD undergo an extreme view change, i.e., wide baseline or extreme zoom. In both datasets, inlier correspondences of the dominant planes are selected manually. All algorithms applied the normalized four-point algorithm [53] for homography estimation and were repeated 1000 times on each image pair. To measure the quality of the estimated homographies, we used the RMSE re-projection error (in pixels) calculated from the provided ground truth inliers in the reference image. Example image pairs are shown in Fig. 4. The CDFs of the errors are in Fig. 11. On homogr, GC-RANSAC with MSAC scoring is slightly more accurate than the second best algorithm, i.e., GC-RANSAC with MAGSAC++ scoring. On EVD, the curve of GC-RANSAC with MAGSAC++ scoring goes the highest, i.e., it is the most accurate method. The avg. and median errors and the failure ratio are reported in Table 1. For GC-RANSAC, the avg. and median errors are fairly similar for both scoring techniques, with a maximum difference of 0.12 px. On EVD, MAGSAC++ scoring leads to a significant improvement, approximately 6%, in the failure ratio.

In Table 1, for homography and fundamental matrix fitting, the avg. and median pixel errors are shown besides the failure rate. For relative and 6D object pose estimation, the avg. rotation (in degrees) and translation (in millimeters) errors are shown. The errors were calculated from the successful tests. The last three rows report the average error, failure ratio and processing time (in milliseconds) over all datasets. The inlier-outlier thresholds were set to maximize the accuracy. The confidence was 0.99 and the maximum iteration number 5000. The best values in each column are shown in red and the second best ones in blue. The additional time reported for NG-RANSAC is the time of the model loading.
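The RMSE re-projection error used above can be computed as follows. A minimal pure-Python sketch, assuming the homography maps ground truth inlier points of the reference image into the other view (the function names are ours):

```python
import math

def project(H, pt):
    """Apply a 3x3 homography H (list of rows) to a 2D point."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def rmse_reprojection_error(H, inliers):
    """RMSE (in pixels) over ground truth inlier correspondences
    [(p1, p2), ...], with H mapping p1 to p2."""
    sq_sum = 0.0
    for p1, p2 in inliers:
        q = project(H, p1)
        sq_sum += (q[0] - p2[0]) ** 2 + (q[1] - p2[1]) ** 2
    return math.sqrt(sq_sum / len(inliers))

# A pure translation homography moving points by (10, 5):
H = [[1, 0, 10], [0, 1, 5], [0, 0, 1]]
pairs = [((0, 0), (10, 5)), ((3, 4), (13, 9))]
print(rmse_reprojection_error(H, pairs))  # exact model -> 0.0
```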
The effect of changing the maximum iteration number and required confidence is shown, respectively, in Figs. 7b and 9b. It can be seen that GC-RANSAC with MAGSAC++ scoring is the least sensitive to these two tested parameters and leads to the most accurate results. Its processing time is marginally higher than that of the fastest method, i.e., GC-RANSAC with MSAC scoring.

Fig. 10. The reported values are the average errors over 4000 scenes from datasets TUM, KITTI, T&T, and CPC. The compared methods are the proposed Graph-Cut RANSAC combined with MSAC [24] and MAGSAC++ [17] scoring techniques, MSAC [24], RANSAC [1], USAC [37], and NG-RANSAC [18]. In the bottom-right plot, the time of NG-RANSAC goes up to 3.4 seconds. In addition, NG-RANSAC model loading takes 1.4 seconds on average.

Fig. 11. Homography fitting. The cumulative distribution functions of the re-projection errors (in pixels) on two datasets. Being accurate is interpreted as a curve close to the top-left corner. The confidence and maximum iteration number were set to 0.99 and 5000, respectively.

Relative Pose Estimation
The relative pose, i.e., essential matrix, estimation is tested on the same datasets (TUM, KITTI, Tanks and Temples, and Community Photo Collection) as those used for fundamental matrix estimation, since the intrinsic camera matrices and the ground truth relative poses are provided for all scenes.
The reported rotation and translation errors are measured in degrees (°). The rotation error is calculated as ε_R = cos⁻¹((tr(R̂ Rᵀ) − 1) / 2), where R̂ is the estimated and R is the ground truth rotation matrix. The translation error ε_t is the angular difference between the estimated translation t̂ and the ground truth translation t.
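Both error metrics can be sketched in a few lines: the rotation error follows the formula above, and the translation error is the angle between the two direction vectors (pure Python; the function names are ours):

```python
import math

def rotation_error_deg(R_est, R_gt):
    """Angular difference (degrees) between two 3x3 rotation matrices,
    eps_R = acos((tr(R_est * R_gt^T) - 1) / 2)."""
    # tr(R_est R_gt^T) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    c = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for numeric safety
    return math.degrees(math.acos(c))

def translation_error_deg(t_est, t_gt):
    """Angular difference (degrees) between two translation directions."""
    dot = sum(a * b for a, b in zip(t_est, t_gt))
    na = math.sqrt(sum(a * a for a in t_est))
    nb = math.sqrt(sum(b * b for b in t_gt))
    c = max(-1.0, min(1.0, dot / (na * nb)))
    return math.degrees(math.acos(c))

# A 90-degree rotation around the z-axis versus the identity:
Rz = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(rotation_error_deg(Rz, I3))                   # 90.0
print(translation_error_deg([1, 0, 0], [0, 1, 0]))  # 90.0
```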
The CDFs of the rotation and translation errors are shown in Figs. 14a and 14b, respectively. It can be seen that GC-RANSAC obtains the most accurate rotations and translations. NG-RANSAC leads to similar accuracy.
The failure ratio and the avg. rotation and translation errors are reported in Table 1. An estimation is considered a failure if the errors are greater than 45°. Note that using a different threshold does not change the ordering of the methods. The most accurate results are obtained by GC-RANSAC and NG-RANSAC, which have similar accuracy. However, GC-RANSAC is two orders of magnitude faster. The effect of varying the confidence (top row) and the maximum iteration number (bottom) is shown in Fig. 10. The average translation (left) and rotation (middle) errors and the processing time (right) are plotted as the function of the tested parameter. The most accurate results are achieved by GC-RANSAC with MAGSAC++ scoring and by NG-RANSAC.
6D Object Pose Estimation
The 6D object pose estimation is tested on the LM-O, YCB-V, and T-LESS datasets. LM-O provides test images with the ground truth for eight, mostly textureless objects from LM [57] captured in a cluttered scene under various levels of occlusion. YCB-V includes 21 objects, which are both textured and texture-less, and 900 test images showing the objects with occasional occlusions and limited clutter. T-LESS contains 30 objects with no significant texture or discriminative color, and with symmetries and mutual similarities in shape and/or size. It includes 1000 test images from 20 scenes with varying complexity, including challenging scenes with multiple instances of several objects and with a high amount of clutter and occlusion. To get 2D-3D correspondences, we applied the EPOS method [45]. The tested robust estimators were applied to the obtained correspondences and the estimated 6D pose was compared to the ground truth one. Example images from the datasets are shown in Fig. 5.
The rotation errors are measured in degrees (°) and calculated similarly as for relative pose estimation. The translation errors are measured in millimeters (mm) as ε_t = ‖t̂ − t‖₂, where t̂ is the estimated and t is the ground truth translation vector.
The CDFs of the rotation (left column) and translation errors (right) are shown in Fig. 12. It can be seen that, on all tested datasets, GC-RANSAC leads to the most accurate results both in terms of rotation and translation accuracy. The failure ratio and the average rotation and translation errors are reported in Table 1. An estimation is considered a failure if the rotation error is greater than 45°. Note that using a different threshold does not change the ordering of the methods. GC-RANSAC always leads to the most accurate results with the lowest failure ratio. On YCB-V and T-LESS, MAGSAC++ scoring leads to the best results both in terms of accuracy and failure rate. On LM-O, MSAC scoring leads to the most accurate results by a small margin while being the second best in terms of failure ratio.
In the top row of Fig. 13, the average translation (left) and rotation (middle) errors and the processing times (right) are plotted as the function of the required confidence. For these tests, the maximum iteration number was set to 1,000,000. It can be seen that GC-RANSAC leads to the most accurate results and is the least sensitive to the confidence parameter. GC-RANSAC with MAGSAC++ scoring is the fastest method.
In the bottom row of Fig. 13, the average translation (left) and rotation (middle) errors and the processing times (right) are plotted as the function of the maximum iteration number. For these tests, the confidence was set to 0.99. GC-RANSAC again leads to the most accurate results and is the least sensitive to the maximum iteration number. GC-RANSAC with MSAC scoring is the fastest method, slightly faster than with MAGSAC++ scoring.

Effect of Spatial Coherence Weight
In Fig. 15, the effect of the spatial coherence weight λ is shown. For each problem, the relative average error, i.e., divided by the maximum average error, is plotted as the function of λ ∈ [0, 1]. For relative and 6D object pose fitting, the rotation (R) and translation (t) errors are shown by different curves. For all problems, the most accurate results are obtained when λ is relatively high. Interestingly, the most notable improvement is achieved for homography fitting, while the gain on the other tested problems is around 10-15 percent. In all experiments, λ is set to 0.975 since that leads to accurate results on all problems and all datasets. It is, however, straightforward to tune λ whenever GC-RANSAC is applied, to reflect the spatial coherence properties of data in a particular domain.
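To illustrate why a high λ helps, consider a simplified version of the minimized energy: a unary term from the point-to-model residuals plus λ times a Potts smoothness term over neighboring points (the paper's model (6) additionally blends residuals into the pairwise term). For a handful of points the globally optimal labeling can be found by brute force; GC-RANSAC obtains the same optimum efficiently with a graph cut, since the energy is submodular. The names and the toy residuals below are ours:

```python
from itertools import product

def energy(labels, residuals, edges, threshold, lam):
    """Simplified GC-RANSAC-style energy: residual-based unary term
    plus a Potts smoothness term weighted by lam."""
    unary = 0.0
    for label, r in zip(labels, residuals):
        if label == 1:                      # inlier: quadratic residual cost
            unary += (r / threshold) ** 2
        else:                               # outlier: constant penalty
            unary += 1.0
    pairwise = sum(labels[p] != labels[q] for p, q in edges)
    return (1.0 - lam) * unary + lam * pairwise

def best_labeling(residuals, edges, threshold, lam):
    """Brute-force global optimum over all binary labelings."""
    n = len(residuals)
    return min(product((0, 1), repeat=n),
               key=lambda L: energy(L, residuals, edges, threshold, lam))

# Five chained neighbors; the middle point's residual slightly
# exceeds the inlier-outlier threshold.
residuals = [0.1, 0.2, 1.2, 0.15, 0.1]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(best_labeling(residuals, edges, threshold=1.0, lam=0.0))  # (1, 1, 0, 1, 1)
print(best_labeling(residuals, edges, threshold=1.0, lam=0.5))  # (1, 1, 1, 1, 1)
```

With λ = 0 the labeling degenerates to per-point thresholding and the middle point is rejected; with spatial coherence switched on, its inlier neighbors pull it back into the inlier set, mirroring the behavior in Figs. 1 and 3.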

Summary of the Experiments
The average errors and failure ratios on all datasets and problems are shown in the last rows of Table 1. On average, Graph-Cut RANSAC leads to the most accurate results and the fastest robust estimation on all datasets of the four vision problems. On 10 out of the 14 datasets, it fails to find the sought model parameters the least often. In the remaining cases, it fails only marginally more often than the best method. While NG-RANSAC shows comparable accuracy on relative pose fitting, its processing time is two orders of magnitude higher than that of GC-RANSAC. Using MAGSAC++ scoring inside GC-RANSAC leads to a significant improvement, in terms of accuracy and failure rate, on almost all datasets. For all the tested datasets and problems, setting the required confidence to 0.99 and the maximum iteration number to approximately 3000 leads to the most accurate results. We included the results of [58], [59] in the supplementary material, available online. Additional experiments on different datasets can be found in [60].

CONCLUSION
We presented the Graph-Cut RANSAC algorithm which combines the strands of robust model fitting and energy minimization. GC-RANSAC is capable of modelling spatially coherent point distributions and exploits this property in a local optimization procedure. It is more geometrically accurate than state-of-the-art robust estimators. It runs in real-time for many problems, at a speed similar to that of its less accurate alternatives. It is much simpler to implement in a reproducible manner than many of the competitors (RANSACs with local optimization). The inlier selection in the local optimization, given the so-far-the-best model, is globally optimal. Two new parameters are introduced in GC-RANSAC, the neighborhood size and the weight λ, which are easy to set. If λ = 0 or the neighborhood size is too small, the algorithm acts as a well-implemented LO-RANSAC. Otherwise, if λ ∈ (0, 1) and a reasonable neighborhood size is used, the results are superior to LO-RANSAC and USAC. On the tested problems and datasets, λ = 0.975 leads to the best performance with a neighborhood size assigning 3-4 neighbors, on average, to each point.
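The recommended neighborhood size is easy to verify on one's own data: connect points closer than a radius and grow the radius until the average degree reaches 3-4. A brute-force sketch (a grid or k-d tree would be used in practice; the names are ours):

```python
import math
import random

def neighborhood_graph(points, radius):
    """Edges between all point pairs closer than the given radius."""
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < radius:
                edges.append((i, j))
    return edges

def average_degree(points, radius):
    """Average number of neighbors per point; each edge adds one
    neighbor to both of its endpoints."""
    return 2.0 * len(neighborhood_graph(points, radius)) / len(points)

random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
# Grow the radius until each point has, on average, at least 3 neighbors.
radius = 0.01
while average_degree(points, radius) < 3.0:
    radius *= 1.1
print(round(radius, 3), round(average_degree(points, radius), 2))
```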
The C++ and Python implementations of Graph-Cut RANSAC are available at https://github.com/danini/graph-cut-ransac, including all components tested in the paper and examples for homography, fundamental matrix, relative and 6D object pose estimation.

Fig. 1. Inlier correspondences (green dots) of a rigid motion model, i.e., a fundamental matrix, initialized by a minimal sample (a). Inliers obtained by (b) standard thresholding of the residual; (c) the proposed graph-cut-based selection considering spatial coherence. All other points are marked by gray circles. The graph-cut-based selection (c) returns more inliers than the traditional thresholding (b).

Fig. 3. The effect of the spatial coherence weight λ on the inlier selection of a 2D line. The inliers (red points) of a line (green) initialized by a minimal sample (blue crosses) are shown. The top row shows the results of a single graph-cut run using different values of λ when the Potts model (5) is applied. The bottom one shows the labeling results of a single graph-cut run when using the proposed spatial coherence model (6). The inlier-outlier threshold is shown by green dashed lines. The edges of the neighborhood graph are grey line segments.

Fig. 4. Example image pairs from the datasets used for homography estimation evaluation; with inlier correspondence visualization.

Fig. 5. Example scenes from the datasets used for 6D pose estimation. (Left) The input images passed to the EPOS method [45]. EPOS returns a set of 2D-3D correspondences and object masks. (Right) The 3D objects rendered using the poses estimated by GC-RANSAC from the predicted 2D-3D correspondences. Courtesy of T. Hodan.

Fig. 9. Confidence study. The avg. log10 error (in pixels) and the run-time (in milliseconds) are plotted as the function of the required confidence. The max. iteration number was 1,000,000.

Fig. 12. 6D object pose estimation. The cumulative distribution functions of the rotation (left column; in degrees) and translation (right; in millimeters) errors on three datasets (rows) are shown. Being accurate is interpreted as a curve close to the top-left corner. The confidence and max. iteration number were set to 0.99 and 5000, respectively.

Fig. 13. 6D object pose estimation. The average translation (left) and rotation (middle) errors and the processing times (right) are plotted as the function of the required confidence (top row; max. iteration number 1,000,000) and of the maximum iteration number (bottom row; confidence 0.99).

Fig. 14. Relative pose fitting. The cumulative distribution functions of the rotation and translation errors (both in degrees) on four datasets are shown. Being accurate is interpreted as a curve close to the top-left corner. The confidence and maximum iteration number were set to 0.99 and 5000, respectively.

Fig. 15. The relative average errors, i.e., divided by the maximum avg. error, are plotted as the function of λ. The values are calculated from all datasets. The vertical dashed line denotes the λ value, i.e., 0.975, where the average error summed over the problems is the lowest.

TABLE 1. The Errors and Failure Ratios (in Percentage) Reported for All Methods (1st Row) on All Problems (1st Col.) and Datasets (2nd Col.)