Practical, Fast and Robust Point Cloud Registration for 3D Scene Stitching and Object Localization

3D point cloud registration ranks among the most fundamental problems in remote sensing, photogrammetry, robotics and geometric computer vision. Due to the limited accuracy of 3D feature matching techniques, outliers may exist, sometimes even in very large numbers, among the correspondences. Since existing robust solvers suffer from either high computational cost or restricted robustness, we propose a novel, fast and highly robust solution, named VOCRA (VOting with Cost function and Rotation Averaging), for the point cloud registration problem with extreme outlier rates. Our first contribution is to employ the Tukey's Biweight robust cost to introduce a new voting and correspondence sorting technique, which proves rather effective in distinguishing true inliers from outliers even at extreme (99%) outlier rates. Our second contribution is a time-efficient consensus maximization paradigm based on robust rotation averaging, serving to seek inlier candidates among the correspondences. Finally, we apply Graduated Non-Convexity with Tukey's Biweight (GNC-TB) to estimate the correct transformation with the inlier candidates obtained, which is then used to find the complete inlier set. Both standard benchmarking and realistic experiments with application to two real-data problems are conducted, and we show that our solver VOCRA is robust against over 99% outliers and more time-efficient than the state-of-the-art competitors.


I. INTRODUCTION
With the development of 3D measurement and scanning technologies (e.g. LiDAR scanners, 3D sensors), point cloud registration, which seeks to estimate the best rigid transformation (including rotation and translation) between multiple 3D point clouds or scans, has become an increasingly important building block in remote sensing, photogrammetry, robotic perception and computer vision, and has found extensive applications in 3D reconstruction [1]-[3], object recognition and localization [4], [5], SLAM [6], medical imaging [7], etc.
To address the point cloud registration problem, Iterative Closest Point (ICP) [8] has been a well-known solver, but its main downside is its heavy dependence on the initial guess of the rigid transformation. If the initial guess is poor, it is likely to converge to a local minimum and fail. Hence, correspondence-based registration methods, which require no initial guess, are growing increasingly popular. They consist in first matching keypoints between point clouds to construct putative correspondences, and then estimating the best transformation using robust estimators.
It is well known that 3D keypoint matching (e.g. FPFH [9], ISS [10]), unlike its 2D counterparts (e.g. SIFT [11], SURF [12]), is very challenging in practice, since low texture, noisy environments, partiality, density variation, and cluttered or repetitive patterns in 3D space may yield huge numbers of outliers lurking in the established correspondence set. According to [13], extreme outlier rates (e.g. over 95%) are not uncommon in real-world scenes. Thus, various robust estimators [13]-[20] are employed to reject outliers, but unfortunately, many of them suffer from issues like high computational cost or limited robustness.
RANSAC [14] and Branch-and-Bound (BnB) [16], [21] are two well-known paradigms which, in essence, achieve robust estimation by maximizing the consensus set. However, both solvers have rapidly growing runtime: the former scales exponentially with the outlier rate, while the latter has worst-case exponential time w.r.t. the input size. Thus, their limited efficiency is a fatal defect for practical use. More recently, Graduated Non-Convexity (GNC) has emerged as an effective global method for robust outlier rejection, with typical examples including FGR [18] and GNC-TLS/GM [17]. Unfortunately, GNC solvers usually fail beyond 90% outliers, so they are too limited in robustness for high-outlier practical problems. In addition, GORE [13] is an outlier removal approach that guarantees to remove only true outliers, but it still requires long runtime even in the low-outlier regime and may also take exponential time due to its probable use of BnB internally. Though TEASER [19], [20] is a current highly robust solver, it still requires external maximal clique algorithms that may be slow to run without parallel programming, especially when the putative correspondences are abundant.
In this work, our goal is to design a new solver which can tolerate extreme outliers (e.g. over 99%), has promising time-efficiency even at high outlier rates, and does not need any additional information (e.g. an initial guess). To this end, we draw inspiration from two aspects: (a) the line voting technique in [22] for sorting the correspondences, and (b) the fast robust single rotation averaging solver [23], and propose a novel robust registration solver named VOCRA (VOting with Cost function and Rotation Averaging).
The contributions of this paper include: (a) we introduce the Tukey's Biweight cost function in combination with the concept of GNC (GNC-TB) and propose a voting technique based on it, which proves more effective in the extreme-outlier regime than simple 0-1 voting; (b) we present a novel consensus maximization framework using robust rotation averaging to rapidly seek the inlier consensus set; (c) we apply GNC-TB for further robust optimization over the consensus set and use its solution to find the complete inlier set.
These three contributions lead to our robust solver VOCRA. Finally, we comparatively evaluate VOCRA in benchmarking and real-data experiments with applications to real-world problems against existing state-of-the-art robust registration solvers.
The rest of this paper is organized as follows: Section II provides a concise review of related methods for robust registration, Section III provides the preliminaries and mathematical notation for the registration problem and our method, Section IV introduces GNC-TB for both robust estimation and weighted voting, Section V elaborates on the proposed method VOCRA, and Section VI evaluates the performance of VOCRA through multiple experiments, followed by the conclusions in Section VII.

II. RELATED WORK
We review the related prior works w.r.t. the two key components of our solver: consensus maximization and M-estimation, as summarized in Table I.

A. Consensus Maximization
RANSAC [14] is probably the most popular robust estimator; it maximizes the consensus set using an iterative hypothesize-and-test framework: first making a hypothesis from a minimal random sample and then testing the quality of its consensus. Variants of RANSAC have been proposed to enhance its performance against outliers (e.g. local optimization [15], [24], correspondence sorting [25]). Nonetheless, the computational cost of RANSAC-family solvers grows exponentially with the outlier rate, making them unsuitable for high-outlier problems. BnB [16], [21] is another consensus maximization approach, which searches the parameter space to solve the optimization globally. Though it carries an optimality guarantee, BnB's runtime is exponential in the input size (the correspondence number), too slow for large problems. Moreover, GORE [13] realizes consensus maximization via guaranteed outlier removal, but it may also encounter long runtime due to its possible use of BnB as a subroutine. Our solver shares some similarity with RANSAC in that both estimate minimal models and iteratively update the best consensus set. Unlike RANSAC, however, VOCRA maximizes the consensus by fast rotation averaging over minimal rotations through enumeration, which is deterministic and tremendously faster in practice since: (a) the scale-invariant constraint is introduced for subset filtering, saving much computational cost during consensus building, and (b) the enumeration process is based on the inlier reliability of the correspondences rather than being random.

B. M-Estimation
M-estimation consists in diminishing the effect of outlying data by optimizing over robust cost functions (also called loss functions). Iterative local solvers [26]-[28] were applied to optimize M-estimation problems in earlier times, but they require an initial guess and may get trapped in local minima. More recently, Graduated Non-Convexity (GNC) [17] has been revisited and developed to work in conjunction with Black-Rangarajan duality [29] to reject outliers by adopting standard non-minimal solvers in multiple geometric vision problems. GNC does not require any initial guess, but its downside is that it generally succeeds only with no more than 90% outliers, hence it is brittle in harsh outlier situations. Our solver adopts the theory of GNC for two purposes: line voting for pre-processing, and consensus refining (robustifying estimates in the low-outlier regime) for post-processing.

III. PRELIMINARIES

A. Problem Formulation
Given two corresponding point sets P = {p_i}_{i=1}^N and Q = {q_i}_{i=1}^N, where p_i ↔ q_i constitutes a point correspondence such that q_i = R·p_i + t + ε_i for inliers (with noise ‖ε_i‖ ≤ ε), the point cloud registration problem seeks to find the best rotation R ∈ SO(3) and translation t ∈ R^3 that align P and Q. This problem can be formulated as a consensus maximization problem such that

(R*, t*) = argmax_{R ∈ SO(3), t ∈ R^3} |I(R, t)|,  I(R, t) = { i : ‖q_i − R·p_i − t‖ ≤ ξ },   (1)

where ξ denotes the inlier threshold or noise bound satisfying ξ ≥ ε, which is related to the standard deviation σ of the noise and is used to differentiate inliers from outliers, and I* = I(R*, t*) is the consensus set of the optimal transformation (R*, t*).
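As an illustration of the consensus maximization formulation above, the following short sketch counts the consensus set of a candidate transformation. The helper name `consensus_set` is ours, for illustration only, and is not part of the VOCRA solver.

```python
import numpy as np

def consensus_set(P, Q, R, t, xi):
    """Indices i with ||q_i - (R p_i + t)|| <= xi, i.e. the consensus set
    of candidate (R, t). P and Q are N x 3 arrays of corresponding points."""
    residuals = np.linalg.norm(Q - (P @ R.T + t), axis=1)
    return np.flatnonzero(residuals <= xi)

# Toy check: identity transform, last correspondence is an outlier.
P = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]])
Q = P.copy()
Q[2] += 5.0  # corrupt the third correspondence
idx = consensus_set(P, Q, np.eye(3), np.zeros(3), xi=0.1)  # -> [0, 1]
```

The consensus-maximizing (R*, t*) is then the candidate whose `consensus_set` is largest.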

B. Graduated Non-Convexity and Black-Rangarajan Duality
TABLE I: Comparison of representative robust solvers.

Solver | Type | Strategy | Property
RANSAC [14] | Consensus Maximization | Random sampling and model fitting | Fast with low outliers, but slow with high outliers
BnB [16], [21] | Consensus Maximization | Parameter-space searching | Globally optimal, but slow with large input size
GORE [13] | Consensus Maximization | Guaranteed outlier removal | Globally optimal, but slow with large input size
GNC [17], FGR [18] | M-Estimation | Non-minimal, alternating and iterative optimization | Fast, but limited in robustness
VOCRA (ours) | Consensus Maximization & M-Estimation | Voting, consensus maximizing, and then M-estimation | Fast and highly robust

In [17], Graduated Non-Convexity (GNC) is employed to develop a practically feasible and time-efficient solution for robust estimation with an outlier process (that is, Black-Rangarajan duality [29]). We summarize this technique as follows:

Lemma 1 (Black-Rangarajan Duality with GNC): Consider the following correspondence-based robust estimation problem with outlier process [29]:

min_{x ∈ K, {ω_i}} Σ_{i=1}^N [ ω_i · r_i^2(p_i, q_i, x) + Ψ(ω_i) ],   (2)

where x ∈ K denotes the variables within domain K, p_i ↔ q_i (i = 1, 2, ..., N) is the correspondence, ω_i is the weight for the residual error r_i(p_i, q_i, x) w.r.t. this correspondence (abbreviated as r_i), and Ψ(ω_i) is the outlier process corresponding to a certain robust cost function ρ(r_i), serving as a penalty over the weight ω_i. We then introduce a GNC-based surrogate function for ρ(r_i), written ρ_{µ,ξ}(r_i) and jointly defined by a controlling parameter µ and the inlier threshold ξ.
In this case, as µ gradually changes (usually monotonically increasing or decreasing), the surrogate function ρ_{µ,ξ}(r_i) starts from an approximately convex shape and gradually approaches the original ρ(r_i), becoming increasingly non-convex. The optimization is operated alternately through iterations: in each iteration, we first fix ω_i and optimize x with a non-minimal solver, and then fix x and solve for ω_i (usually in closed form). As µ changes continuously, pushing ρ_{µ,ξ}(r_i) to recover ρ(r_i), the optimal solution x and its corresponding weights ω_i are approximated iteratively.

C. Robust Single Rotation Averaging
Robust single rotation averaging is a well-known problem in geometric vision, broadly applied to Structure-from-Motion [30], attitude estimation [31], etc, which seeks to estimate the best (average) rotation from a group of rotations including possible outliers.
Lee et al. [23] provided an efficient chordal-distance-based robust approach to single rotation averaging, whose problem can be formulated as:

R̂ = argmin_{R ∈ SO(3)} Σ_{i=1}^M ρ( dis_chord(R_i, R) ),   (3)

where M is the total number of given rotations (including outliers), and dis_chord denotes the chordal distance between two rotations:

dis_chord(R_1, R_2) = ‖R_1 − R_2‖_F = 2√2 · sin( ∠_geo(R_1, R_2) / 2 ),   (4)

where ‖·‖_F denotes the Frobenius norm and ∠_geo(·, ·) is the geodesic distance [32] between rotations:

∠_geo(R_1, R_2) = ‖Log(R_1^T R_2)‖,   (5)

with Log being the logarithm map of SO(3). The explicit solver can be found in Algorithm 2 of [23]; it runs in milliseconds with hundreds of rotations, rapid enough for practical use. In the remainder of this paper, we use the function robustLeeChordal(R) to represent this solver, where R denotes the given group of rotations taken as input.
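To make the chordal-distance machinery concrete, the sketch below implements the distance and a simple Weiszfeld-style L1 chordal average; this is a simplified stand-in for Lee's Algorithm 2 in [23], not the authors' exact solver, and the helper names (`chordal_dist`, `project_so3`, `robust_chordal_average`) are ours.

```python
import numpy as np

def chordal_dist(R1, R2):
    # Chordal distance ||R1 - R2||_F = 2*sqrt(2)*sin(geodesic_angle / 2).
    return np.linalg.norm(R1 - R2)

def project_so3(M):
    # Nearest rotation to M in the Frobenius sense, via SVD.
    U, _, Vt = np.linalg.svd(M)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ S @ Vt

def robust_chordal_average(rotations, n_iter=50, eps=1e-9):
    """Weiszfeld-style L1 averaging in the chordal embedding: iteratively
    reweight each rotation by the inverse of its distance to the current
    estimate, so far-away (outlying) rotations get small influence."""
    R = project_so3(sum(rotations) / len(rotations))  # L2 mean as init
    for _ in range(n_iter):
        w = [1.0 / max(chordal_dist(R, Ri), eps) for Ri in rotations]
        M = sum(wi * Ri for wi, Ri in zip(w, rotations)) / sum(w)
        R = project_so3(M)
    return R
```

With a majority of consistent rotations and a few outliers, the L1 average stays close to the consistent cluster, which is the behavior the consensus maximizer in Section V-B relies on.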
IV. GNC-TB: FROM ROBUST ESTIMATOR TO WEIGHTED VOTING OPERATOR

We now introduce a GNC-based Black-Rangarajan duality formulation for robust estimation by adopting the Tukey's Biweight (TB) robust cost function as the outlier process, named GNC-TB, which is derived based on [17], [29]. First, according to Figure 25 in [29], the TB cost can be written as

ρ(x) = (c^2/3)·[1 − (1 − x^2/c^2)^3]  if x^2 ≤ c^2,  and  ρ(x) = c^2/3  otherwise,   (6)

and its robust objective with outlier process is writable with the penalty

Ψ(z) = (c^2/3)·(1 − 3z + 2z^{3/2}),   (7)

where the explicit definitions of x and z can be found in [29], and we replace 'σ' in [29] with c here to avoid confusion with the standard deviation of noise σ in this paper. Based on ρ(x) and Ψ(z), we can define GNC-TB as follows:

Proposition 1 (GNC-TB): Following the robust problem (2), we introduce a surrogate function for the TB cost by adopting µ as the controlling parameter and ξ as the inlier threshold such that

ρ_{µ,ξ}(r_i) = (µξ^2/3)·[1 − (1 − r_i^2/(µξ^2))^3]  if r_i^2 ≤ µξ^2,  and  ρ_{µ,ξ}(r_i) = µξ^2/3  otherwise.   (8)

We initialize µ with a sufficiently large value to make ρ_{µ,ξ}(r_i) approximately convex, indicating that the tolerance to residual errors is very lenient at first, and then we continuously diminish µ to gradually decrease the convexity and make the filtration of residuals stricter. When µ approaches 1, the original non-convex TB cost (with c = ξ) is recovered. Minimizing function (8) is equivalent to the following outlier process (with µ and ξ):

Ψ_{µ,ξ}(ω_i) = (µξ^2/3)·(1 − 3ω_i + 2ω_i^{3/2}).   (9)

Besides, we can update the weights ω_i in each iteration in closed form such that

ω_i = (1 − r_i^2/(µξ^2))^2  if r_i^2 ≤ µξ^2,  and  ω_i = 0  otherwise.   (10)

Proof 1: Following the traditional TB function in (6), in order to derive its GNC-based formulation, we replace the fixed c^2 with the varying µξ^2 to build our surrogate function (8) controlled by µ. Hence, the objective w.r.t. the surrogate function can be written as

E(r_i, µ, ξ, ω_i) = ω_i · r_i^2 + Ψ_{µ,ξ}(ω_i),   (11)

where Ψ_{µ,ξ} is given in (9). We can then derive its gradient w.r.t. ω_i:

∂E(r_i, µ, ξ, ω_i)/∂ω_i = r_i^2 − µξ^2·(1 − ω_i^{1/2}),   (12)

and by setting this gradient to zero, we have:

ω_i = (1 − r_i^2/(µξ^2))^2,   (13)

which serves as the updating principle of the weights ω_i (valid for r_i^2 ≤ µξ^2; otherwise the minimum is attained at ω_i = 0).

Algorithm 1: votingTB
We derive GNC-TB for two purposes: (a) the TB function underlies our voting process for correspondence sorting, being the cost function that yields the best results for weighted voting in Section V-A, and (b) GNC-TB serves as the robust outlier rejection approach in Section V-C.
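The closed-form TB weight update at the heart of GNC-TB can be sketched as follows. This is a sketch using the classical Tukey's Biweight weight with the fixed c^2 replaced by the varying µ·ξ^2; the paper's exact normalization may differ.

```python
import numpy as np

def tb_weights(residuals, mu, xi):
    """Closed-form GNC-TB weight update: classical Tukey's Biweight weights
    with the fixed scale c^2 replaced by the varying mu * xi^2.
    (A sketch; the paper's exact normalization may differ.)"""
    r2 = np.asarray(residuals, dtype=float) ** 2
    c2 = mu * xi ** 2
    return np.where(r2 <= c2, (1.0 - r2 / c2) ** 2, 0.0)

# As mu shrinks toward 1, the tolerance tightens: a residual of 2*xi keeps
# a nonzero weight at mu = 100 but is cut off entirely once mu < 4.
print(tb_weights([0.0, 2.0], mu=100, xi=1.0))  # approx. [1.0, 0.92]
print(tb_weights([0.0, 2.0], mu=2, xi=1.0))    # second weight becomes 0
```

Running this update inside the alternating loop of Lemma 1 gives the graduated behavior described above: lenient weighting at the start, hard rejection near convergence.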

V. THE PROPOSED METHODOLOGY
The proposed method VOCRA mainly consists of three key steps in the following three subsections: (a) line voting with the TB robust cost and sorting correspondences according to the probability to be inliers (Section V-A) , (b) maximizing the consensus set based on robust rotation averaging (Section V-B), and (c) finally removing outliers and seeking true inliers with GNC-TB (Section V-C).

A. To Sort the Correspondences: Voting with Robust Cost Function
After the 3D descriptors or feature matchers have established the putative correspondences between point clouds, we can start by raising a question: is it possible to sort these correspondences according to their respective quality (i.e. the probability of being true inliers) without requiring any extra information?
The answer is yes. Line voting [22] is a simple voting technique based on the invariance of the scale (pairwise distance) between point correspondences, and it has been used to roughly reject outliers in various previous works [17], [19], [33]-[35]. In line voting, each pair of correspondences (p_i, q_i) and (p_j, q_j) is evaluated by the condition | ‖p_i − p_j‖ − ‖q_i − q_j‖ | ≤ 2ξ, and if this condition is satisfied, each of the two correspondences gets 1 vote. Through the voting process over every single correspondence pair, we are then able to sort the correspondences according to their respective total numbers of votes, because the more votes a correspondence gets, the more likely it is to be a true inlier.
However, this strategy only performs 0 or 1 voting for the correspondences, which poorly reflects the realistic distribution of noise (usually assumed to be Gaussian [17]-[20], [36]). As a result, though line voting works well at relatively low outlier rates, it becomes brittle once the outlier rate grows extreme (e.g. more than 95%), since the dominant outliers are then likely to get even more votes than the inliers, which are only in the minority. Therefore, our goal is to design a more effective voting strategy that increases the votes of true inliers while truncating those of outliers, so as to make the correspondence sorting process reliable even at extreme outlier rates.
To this end, we put forward a new idea: voting with robust cost functions, typically including Leclerc (Welsch), Cauchy (Lorentzian), Geman-McClure (GM), Truncated Least Squares (TLS), Tukey's Biweight (TB), etc., which have been widely applied to reject outliers in conjunction with M-estimation or GNC [17]. (The explicit expressions of these cost functions can be found in [29], [37].) These functions endow the correspondences with weights based on their residual errors: they assign small or even zero weights to correspondences with high residual errors and large weights to those with low residual errors, which has empirically proved to suit the Gaussian noise distribution encountered in practice.
Before formally discussing the voting technique, we first introduce an inequality condition built upon the invariance of the scale (i.e. the distance) between pairwise point correspondences, as in [22].
Proposition 2 (Pairwise Scale-Invariant Condition): Any pair of correspondences (p_i, q_i) and (p_j, q_j) that are both inliers must satisfy the condition

S_ij = | ‖p_i − p_j‖ − ‖q_i − q_j‖ | ≤ 2ξ,   (14)

where ξ is still the noise bound with ε ≤ ξ as in (1).

Proof 2: For inliers we have q_i = R·p_i + t + ε_i with ‖ε_i‖ ≤ ε. Based on the triangle inequality and the fact that the norm is invariant to the rotation R, we can derive

| ‖q_i − q_j‖ − ‖p_i − p_j‖ | = | ‖R(p_i − p_j) + ε_i − ε_j‖ − ‖p_i − p_j‖ | ≤ ‖ε_i − ε_j‖ ≤ 2ε,

and since ε ≤ ξ, it is apparent that S_ij ≤ 2ξ as in (14).
The next step is to compute S for every pair of correspondences. Let v_i and v_j denote the numbers of votes w.r.t. correspondences i and j, respectively. When S_ij is obtained, v_i and v_j are simultaneously updated based on the TB function such that

v_i ← v_i + w_ij,  v_j ← v_j + w_ij,  with  w_ij = (1 − S_ij^2/(2uξ)^2)^2  if S_ij ≤ 2uξ,  and  w_ij = 0  otherwise,   (15)

where we generally set u = 1.5 to allow a slightly more lenient inlier threshold here. When all correspondence pairs have been involved in the voting, we can sort the correspondences according to their respective numbers of votes v. The implementation details of our voting technique are specified in Algorithm 1.

Besides the voting process discussed above, one additional point in Algorithm 1 is worth discussing. We supplement a Boolean (0 or 1) parameter eIn (abbreviation for 'enough inliers') to greatly shorten the runtime of our solver in the case of low outlier rates. Specifically, when one correspondence obtains over 0.2N votes within the first 20 iterations, indicating that inliers are abundant, we immediately stop voting, since further voting is no longer necessary. In this case, we can directly feed the correspondences to our consensus maximizer (Section V-B), because the inliers are not sparse and can be found easily even without the correspondences being sorted. This early-termination operation saves much time in practice, making our solver run in milliseconds at no more than 80% outliers (shown in Section VI).
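The voting step of Algorithm 1 (votingTB) can be sketched as follows. This is a simplified sketch: every correspondence pair votes with a Tukey's Biweight weight on S_ij using the lenient bound 2uξ, and the early-termination (eIn) logic is omitted for clarity; the function name `tb_vote_sort` is ours.

```python
import numpy as np

def tb_vote_sort(P, Q, xi, u=1.5):
    """Sketch of TB-based line voting: each pair (i, j) is scored by
    S_ij = | ||p_i - p_j|| - ||q_i - q_j|| |, and both correspondences
    receive a Tukey's Biweight vote; returns indices sorted by
    descending vote count (most inlier-like first)."""
    N = len(P)
    votes = np.zeros(N)
    bound = 2.0 * u * xi
    for i in range(N):
        for j in range(i + 1, N):
            s = abs(np.linalg.norm(P[i] - P[j]) - np.linalg.norm(Q[i] - Q[j]))
            if s <= bound:
                w = (1.0 - (s / bound) ** 2) ** 2
                votes[i] += w
                votes[j] += w
    return np.argsort(-votes)
```

On a toy set where the first four correspondences are exact inliers and the rest are gross outliers, the sorted order places the inliers first, which is exactly the property the consensus maximizer of Section V-B exploits.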
1) Why Choose Tukey's Biweight?: Readers may wonder why we choose only the TB cost for voting. This is because TB generally displays the best voting and sorting results among the robust cost functions. We now provide an empirical demonstration of its superiority. Figure 2 shows several examples of the indices of the true inliers among the sorted correspondences using various robust cost functions: the traditional 0-1 scheme as in [22], Geman-McClure (GM), Leclerc, Cauchy, Truncated Least Squares (TLS), and TB, respectively. For example, if a true inlier is the third element in the sorted correspondence set d after voting, then its index is 3. The experimental setup is the same as the benchmarking in Section VI-A, and we test different situations with varied correspondence numbers N and outlier rates, conducting 10 random trials for each situation.
We can clearly observe that with N = 1000 and a 95% outlier rate, most of the cost functions yield promising voting results, whereas when the outlier rate increases to 97% and even up to 99%, some cost functions fail to make the true inliers rank at the top of set d. For instance, 0-1 voting yields bad sorting results in trial 9 of Figure 2(e) and trial 4 of Figure 2(f), since the true inliers are distributed far from the top elements of set d. Through TB-based voting, however, true inliers generally show the greatest tendency to lie at the forefront (top) of set d. In other words, TB voting most often generates the lowest and shortest box in these trials, and is therefore capable of rendering the most reliable sorting results.
In sum, 0-1 line voting is undoubtedly sufficient for low-outlier cases; however, when the outlier rate becomes extreme as in Figure 2(d-f), 0-1 voting can no longer sort the correspondences reliably, since there are so many outliers that some of them may even get more votes than true inliers. Hence, we introduce the TB cost into voting so as to maximize the credibility of the voting and sorting results.

Algorithm 2: maxRotConsensus

B. To Maximize the Consensus Set: Robust Single Rotation Averaging
Solving the point cloud registration problem with outliers, in essence, consists in maximizing the correspondence set in which all the correspondences can reach a consensus on the model (to be specific, the rigid transformation: R and t), as discussed in (1).
The popular consensus maximizer RANSAC usually suffers from two disadvantages: (a) the probability of selecting an all-inlier subset can be rather low when the outlier rate is high, and (b) the residual errors w.r.t. all the correspondences must be computed to build the consensus set in each sampling iteration, which takes much time for large problems.
Fortunately, in Section V-A we managed to sort the correspondences according to their inlier reliability with the TB cost function, so the probability of obtaining all-inlier subsets is significantly increased if we sample in the order of d.
Therefore, in this subsection, we propose a novel consensus maximization method for robust registration on the basis of robust rotation averaging and the scale-invariant constraint (14), which brings two benefits: (a) the number of minimal rotation computations is greatly reduced, and (b) building the consensus set becomes much less time-consuming than traditional random sampling and model fitting.
We introduce our consensus maximization method with pseudocode given in Algorithm 2.
1) Description of Algorithm 2: First of all, we evaluate the eIn obtained from Algorithm 1. If eIn = 0, meaning that the full correspondence set has been sorted as d, the consensus is sought and maximized within the top 0.2N correspondences in d. If eIn = 1, meaning that we have enough inliers and the sorting was not fully conducted, the consensus is sought directly from the full correspondence set of size N (lines 1-5). Note that since our consensus maximization method generally runs much faster than the correspondence sorting in Section V-A, sacrificing some probability of finding an all-inlier subset here to reduce the sorting time is worthwhile.
We enumerate a pair of correspondences (say i and j) following the order of d (lines 7-8) and compute its S_ij. If S_ij satisfies condition (14), we continue to sample a third correspondence (say k, also in the order of d) and check whether S_ik and S_jk both fulfill condition (14). If yes, we form a 3-point set with i, j and k, from which the rotation R_ijk is estimated minimally using Horn's 3-point triad-based solver [38] (lines 9-12). Then, for correspondences i and j, we consecutively sample k from d and examine the resulting pairs with (14); for every k that passes the tests, we store its minimal rotation R_ijk and index k in R and K, respectively (line 13). Intuitively, this strategy uses the pairwise scale-invariant condition to screen the 3-point sets during enumeration, so that only the eligible ones are forwarded to consensus maximization, which saves plenty of time in (a) minimal rotation estimation and (b) building the consensus set.
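The core of this step can be sketched as follows. This is a simplified stand-in, with two substitutions hedged explicitly: a least-squares SVD (Kabsch) solver stands in for Horn's 3-point triad solver [38], and a plain chordal mean stands in for Lee's robust averaging solver [23]; the function names are ours.

```python
import numpy as np

def kabsch(Ps, Qs):
    """Least-squares rotation aligning centered Ps to Qs via SVD
    (a stand-in for Horn's 3-point triad-based solver)."""
    A = (Qs - Qs.mean(0)).T @ (Ps - Ps.mean(0))
    U, _, Vt = np.linalg.svd(A)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ S @ Vt

def rot_consensus_for_pair(P, Q, order, i, j, xi, theta):
    """For a fixed pair (i, j): collect 3-point minimal rotations over every
    eligible k (scale-invariant condition (14)), average them, and return
    the k's whose rotation lies within chordal distance theta of the
    average -- the rotation-based consensus set for this pair."""
    def ok(a, b):
        return abs(np.linalg.norm(P[a] - P[b]) - np.linalg.norm(Q[a] - Q[b])) <= 2 * xi
    if not ok(i, j):
        return []
    Rs, Ks = [], []
    for k in order:
        if k in (i, j) or not (ok(i, k) and ok(j, k)):
            continue  # condition (14) screens out this 3-point set
        Rs.append(kabsch(P[[i, j, k]], Q[[i, j, k]]))
        Ks.append(k)
    if not Rs:
        return []
    U, _, Vt = np.linalg.svd(sum(Rs) / len(Rs))  # plain chordal-mean average
    Ravg = U @ np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))]) @ Vt
    return [k for k, R in zip(Ks, Rs) if np.linalg.norm(R - Ravg) <= theta]
```

The outer loop of Algorithm 2 then keeps the (i, j) pair whose consensus set is largest, stopping early once the expected minimum inlier number is reached.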
Note that the rotation-based consensus set is established by finding all rotations in R whose chordal distance to the averaged rotation R° is lower than θ. As for the proper choice of θ, we derive it based on the geodesic error. Assume a minimally solved rotation R whose correspondences in the minimal subset are perturbed by noise, written as R* = R · Exp([ε_R]×), where the noise ε_R ∈ R^3 has standard deviation σ, [·]× denotes the skew-symmetric matrix of a size-3 vector, and Exp is the exponential map. According to the properties of the geodesic distance in [32], we can derive

∠_geo(R, R*) = ‖ε_R‖ ≤ θ_geo,

where θ_geo is the geodesic threshold. For an empirical choice of θ_geo, we usually let ε_R = δ·y, where y ∈ R^3 is a random unit vector and usually δ = 10σ/D. We can then obtain the equivalent threshold θ in chordal distance according to (4):

θ = 2√2 · sin(θ_geo / 2).

For a certain correspondence pair (i, j), once we obtain over I − 3 eligible k, meaning that sufficiently many 3-point sets pass the scale-invariant condition (14), we use Lee's chordal-distance solver [23] to robustly average the rotation group R and find all rotations in R that are in consensus with the averaged rotation R° (lines 14-16). Subsequently, if the size of this consensus set exceeds the so-far-best consensus size, we update the best consensus set with the current one (lines 17-18). Note that we do not always need to enumerate all possible 3-point sets within the top 0.2N correspondences in d. Usually, once we obtain the minimum expected inlier number, which is set to I when eIn = 0 and to 1.5I when eIn = 1 (line 4), we directly stop the enumeration and return the current best K° plus i and j as the inlier set candidate I (lines 19-22).
Typically, we set I = max(0.05N, 5) for N ∈ (0, 200), I = 0.04N for N ∈ [200, 300), I = 0.03N for N ∈ [300, 500), I = 0.02N for N ∈ [500, 1000), and I = 0.01N for N ≥ 1000. This strategy terminates our solver earlier when the outlier rate is not extremely high, while if the actual inlier number is smaller than I, the solver continues and returns the maximum consensus just as expected.

C. To Further Prune Outliers: GNC-TB
After consensus maximization, outliers may still exist in the inlier set candidate I, though only in small numbers. Thus, we can apply our GNC-TB framework (Proposition 1) to find the true inliers within set I. First, we compute the centroids of the two point sets, respectively, as

p̄ = (1/|I|) Σ_{i∈I} p_i,  q̄ = (1/|I|) Σ_{i∈I} q_i,

where i ∈ I, so that we can derive the translation-free objective function to minimize:

min_{R∈SO(3)} Σ_{i∈I} ω_i · ‖ (q_i − q̄) − R·(p_i − p̄) ‖^2,   (20)

where we adopt the Singular Value Decomposition (SVD) non-minimal solver [41] to solve for R efficiently in closed form. Subsequently, we tailor the SVD solver to GNC-TB in order to robustly reject the outliers in the inlier set candidate I, operated as in Algorithm 3.
In Algorithm 3, we first initialize the weights ω_i and set the controlling parameter µ to be large (e.g. µ = 100). Then, we start the GNC iterations. In each iteration, with the current ω_i, we first estimate the rotation R̂ based on objective (20) using SVD, and then update the weights ω_i for the next iteration according to the computed residual errors. After that, we decrease µ by µ ← µ/1.2. We stop iterating once the residual errors (or the objective) converge or µ drops below 1.
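The alternation just described can be sketched as follows: a weighted SVD rotation solver in the inner step, the Tukey's Biweight weight update (with c^2 = µξ^2) in the outer step, and the µ ← µ/1.2 schedule. This is a sketch, not the authors' exact Algorithm 3; function names are ours, and the convergence test is simplified to the µ < 1 stopping rule alone.

```python
import numpy as np

def weighted_svd_rotation(P, Q, w):
    """Closed-form weighted rotation via SVD on weighted-centered points."""
    w = w / w.sum()
    pc = (w[:, None] * P).sum(0)
    qc = (w[:, None] * Q).sum(0)
    A = ((Q - qc) * w[:, None]).T @ (P - pc)
    U, _, Vt = np.linalg.svd(A)
    return U @ np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))]) @ Vt

def solve_gnc_tb(P, Q, xi, mu0=100.0, div=1.2):
    """Sketch of GNC-TB refinement: alternate the SVD rotation solver with
    the closed-form TB weight update while shrinking mu toward 1."""
    w = np.ones(len(P))
    mu = mu0
    R = np.eye(3)
    while mu >= 1.0:
        R = weighted_svd_rotation(P, Q, w)
        wn = w / w.sum()
        pc = (wn[:, None] * P).sum(0)
        qc = (wn[:, None] * Q).sum(0)
        r2 = np.sum(((Q - qc) - (P - pc) @ R.T) ** 2, axis=1)
        c2 = mu * xi ** 2  # graduated TB scale
        w = np.where(r2 <= c2, (1.0 - r2 / c2) ** 2, 0.0)
        mu /= div
    return R, w > 0  # rotation estimate and surviving-inlier mask
```

On a candidate set that is mostly inliers, a few graduated iterations drive the weights of the residual outliers to zero while the rotation converges on the inliers, which is exactly the low-outlier regime this post-processing step is designed for.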

D. Main Algorithm: VOCRA
Up to now, we can render the pseudocode for the main algorithm of VOCRA in Algorithm 4. To summarize, VOCRA is implemented as follows: (a) using votingTB (Algorithm 1) to sort the correspondences, (b) using maxRotConsensus (Algorithm 2) to maximize the consensus set, (c) using solveGNCTB (Algorithm 3) to obtain a correct estimate, and (d) finding the complete inlier set (line 4).

VI. EXPERIMENTS AND APPLICATIONS
In this section, we test and evaluate our solver VOCRA in both standard benchmarking and realistic application experiments. All the experiments are implemented in Matlab on a laptop with an i7-7700HQ CPU and 16GB of RAM. We compare VOCRA against multiple existing state-of-the-art robust registration solvers, including RANSAC [14], FLO-RANSAC [24] (a variant of RANSAC with speeded-up local optimization, also known as LO+-RANSAC elsewhere), FGR [18], GNC-TLS [17], ADAPT [40], GORE [13] and GORE+RANSAC (using RANSAC to further refine the remaining correspondences after the outlier removal of GORE). BnB [16] is not adopted for evaluation since it runs in hours with 1000 correspondences, and TEASER [19], [20] is also not used because there is no standard maximal clique solver in Matlab. Note that all the solvers are implemented with a single thread and no parallel programming is applied.
In order to quantify the accuracy of estimation, we adopt the following errors for rotation and translation:

E_R = ∠_geo(R_gt, R̂),  E_t = ‖t_gt − t̂‖,

where (·)_gt denotes the ground-truth values and (̂·) denotes the optimal values solved.
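These two metrics are straightforward to compute; a minimal sketch (helper names ours) uses the standard trace identity for the geodesic angle:

```python
import numpy as np

def rotation_error_deg(R_gt, R_est):
    """Geodesic rotation error in degrees:
    angle = arccos((trace(R_gt^T R_est) - 1) / 2)."""
    c = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))  # clip guards rounding

def translation_error(t_gt, t_est):
    """Euclidean translation error ||t_gt - t_est||."""
    return np.linalg.norm(np.asarray(t_gt) - np.asarray(t_est))
```

For example, a 30-degree z-axis rotation against the identity yields E_R = 30, and t_gt = (0,0,0) against t_est = (3,4,0) yields E_t = 5.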

A. Standard Benchmarking over Real Point Clouds
We first comprehensively benchmark all the robust solvers in a fair experimental environment to comparatively evaluate their robustness against outliers, accuracy of estimation and efficiency.
1) Environmental Setup: We adopt bunny, armadillo and dragon from the Stanford 3D Scanning Repository.

2) Benchmarking Results: From the results in Figures 3, 4 and 5, we find that the non-minimal robust solvers (FGR, GNC-TLS and ADAPT) as well as the RANSAC solvers (RANSAC and FLO-RANSAC) all fail at 95% outliers, while GORE (and GORE+RANSAC) and our VOCRA are highly robust against as many as 99% outliers. More importantly, VOCRA has the highest accuracy (in line with GORE+RANSAC) at all outlier rates, and is 2 to 3 orders of magnitude faster than GORE when the outlier rate is not high (≤80%) and several times faster than GORE when the outlier rate is extreme (>95%), showing the best overall performance. Specifically, with 99% outliers, VOCRA returns accurate results in 3 seconds.
3) On-Surface Benchmarking: In addition, since in real-world applications the outliers are highly likely to lie on the surface of the registered 3D objects, we supplement another benchmarking experiment where all the outliers are generated on the surface rather than randomly in the 3D sphere, as demonstrated in Figure 6. We can see that the results are similar to those in the standard benchmarking, and our VOCRA is still the most outstanding solver overall.

Fig. 9: Quantitative results of both normal and partial registration over the 10 point clouds using RANSAC [14], FLO-RANSAC [24], FGR [18], GNC-TLS [17], ADAPT [40], GORE [13], GORE+RANSAC and VOCRA, corresponding to Figures 7 and 8, respectively.
B. Realistic Point Cloud Registration

1) Normal Registration: In addition to benchmarking the solvers with artificial outliers and noise, we conduct realistic registration experiments by using the 3D feature descriptor FPFH [9] (function extractFPFHFeatures in Matlab) to generate the correspondences.
Apart from the three point clouds in Section VI-A, we also include the cheff, chicken, rhino, parasauro and Trex models from Mian's dataset [43], [44], and the city and castle scans from the ETHZ LiDAR dataset [45]. For each point cloud, we conduct 10 independent runs; in each run we transform the cloud with a new random rigid transformation and use FPFH to establish the correspondences. Then, the raw putative correspondences are fed directly to the different solvers, with the parameters fixed at σ = 0.003 and θ = 0.15.
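The transformation step of this per-run protocol (sampling a random rigid transformation and applying it to the cloud before FPFH matching) can be sketched as follows. This is our own minimal numpy illustration, not the authors' code; the uniform-rotation sampler and the translation range are assumptions.

```python
import numpy as np

def random_rigid_transform(rng):
    """Sample a uniformly random rotation (via QR of a Gaussian matrix)
    and a random translation vector."""
    A = rng.standard_normal((3, 3))
    Q, R = np.linalg.qr(A)
    Q = Q @ np.diag(np.sign(np.diag(R)))  # make the QR decomposition unique
    if np.linalg.det(Q) < 0:              # ensure a proper rotation (det = +1)
        Q[:, 0] = -Q[:, 0]
    t = rng.uniform(-1.0, 1.0, size=3)
    return Q, t

rng = np.random.default_rng(0)
R, t = random_rigid_transform(rng)
points = rng.standard_normal((100, 3))  # stand-in for a point cloud
transformed = points @ R.T + t          # apply x' = R x + t to every point
```

FPFH is then run on the original and transformed clouds to produce the putative (and typically heavily outlier-contaminated) correspondences.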

Algorithm 4: VOCRA
Input: N correspondences, noise level σ, thresholds ξ1 ← 3σ and ξ2 ← 5σ, chordal threshold θ, and minimum inlier number I;
Output: best rotation R and inlier set I_in;
1 Vote with the TB cost function and sort the correspondences;
2 Maximize the consensus set using rotation averaging;
3 Make a correct estimate R̂ for the rotation using GNC-TB;
4 Use R̂ to find the final consensus (inlier) set I_in from all N correspondences and compute the best rotation R with I_in using SVD;
5 return best rotation R and inlier set I_in;

Figure 8 shows examples (one for each point cloud) of qualitative registration results, obtained by projecting the initial point cloud onto the transformed point cloud with the transformation estimated by the respective solvers. VOCRA is able to estimate the exact transformations, making the projected initial point cloud overlap with the transformed one so well that no deviation can be observed. Besides, quantitative statistics over the 10 runs are provided in Figure 9(a), where VOCRA always renders the lowest estimation errors and the shortest runtime, just as in the benchmarking experiments.
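The final step of Algorithm 4, fitting the best rotation to the recovered inlier set via SVD, is the standard orthogonal Procrustes (Kabsch) solution. A minimal numpy sketch, as our own illustration rather than the authors' implementation:

```python
import numpy as np

def best_rotation_svd(P, Q):
    """Closed-form least-squares rotation aligning P to Q (Kabsch):
    minimizes sum_i ||R p_i - q_i||^2 over all rotations R."""
    Pc = P - P.mean(axis=0)       # center both point sets
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                 # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # reflection-safe correction so that det(R) = +1
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    return Vt.T @ D @ U.T

# usage: recover a known rotation from noise-free inlier correspondences
rng = np.random.default_rng(1)
R_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] = -R_true[:, 0]
P = rng.standard_normal((50, 3))
Q = P @ R_true.T + np.array([0.1, -0.2, 0.3])  # translation removed by centering
R_est = best_rotation_svd(P, Q)
```

With noisy inliers the same closed form returns the least-squares optimal rotation; the translation then follows from the difference of the (rotated) centroids.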
2) Registration with Low Overlapping: In real-world applications, partiality, i.e., a low overlapping ratio between point clouds, is a common but challenging issue. Hence, we supplement partial-overlap registration experiments to further evaluate the solvers. This time we preserve only a small portion of the initial point cloud, fixing the overlapping ratio at ≈30%, and then use FPFH to establish correspondences, which is highly likely to trigger high outlier rates.
Even though the outlier rates are high (more than 98% in some cases) due to the low overlapping ratio, we can see that VOCRA remains the most (or at least one of the most) robust and efficient solvers throughout the partial-overlap experiments.

C. Real-Data Applications
To validate the practicality of VOCRA in real scenes, we test it on two application problems: scan matching and 3D object localization over real-world datasets.
1) Scan Matching: Scan matching (also called scene stitching) is an important problem in 3D reconstruction and loop closure detection (SLAM). We evaluate our VOCRA over the kitchen scene from the Microsoft 7-scenes dataset [42] in comparison with a RANSAC-based solver FLO-RANSAC, a non-minimal solver GNC-TLS, and a combination solver GORE+RANSAC (the most robust solver excluding VOCRA in the benchmarking).
Each time we select one pair of scenes (with overlapping regions), downsample them with a box grid filter of size 0.02, and then use FPFH to establish the correspondences. Since GORE generally gets slower as the number of correspondences increases, we cap the correspondence number at N = 1500. After that, we feed these correspondences (images in the first column of Figure 10) to the four solvers, with the noise level fixed at σ = 0.01.
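The box grid filter used above replaces all points falling in the same voxel by a single representative. The centroid-per-voxel averaging below is an assumption about the filter variant, and the code is our own numpy sketch, not the pipeline's implementation:

```python
import numpy as np

def box_grid_downsample(points, voxel_size=0.02):
    """Box-grid (voxel) filter: replace all points falling in the same
    cubic voxel of side `voxel_size` by their centroid."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    # group points by voxel index; `inverse` maps each point to its voxel
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n_voxels = inverse.max() + 1
    sums = np.zeros((n_voxels, 3))
    counts = np.zeros(n_voxels)
    np.add.at(sums, inverse, points)   # accumulate per-voxel point sums
    np.add.at(counts, inverse, 1.0)    # accumulate per-voxel point counts
    return sums / counts[:, None]      # per-voxel centroids

rng = np.random.default_rng(2)
cloud = rng.uniform(0, 1, size=(5000, 3))
down = box_grid_downsample(cloud, voxel_size=0.02)
```

Downsampling both scans before FPFH keeps the correspondence count manageable, which matters here because GORE's runtime grows with N.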
We report the qualitative scan matching results over 6 scene pairs in Figure 10, where we show the FPFH correspondences, the inliers found and the registration results by the different solvers. We also provide the corresponding quantitative data in Table II, where we specify the N for each scene pair, and the number of inliers recalled, the registration status and the runtime of each solver. For the registration status, Large Error means that the registration is almost acceptable but has relatively high estimation errors, Fail means that the registration is completely wrong, and Succeed means that the registration is successful with good accuracy. This status is judged manually, since the ground-truth relative poses between the scene pairs are not given.
According to Figure 10 and Table II, VOCRA succeeds in stitching all the scene pairs, while every other competitor fails at least once. In Scene 1-2, only 13 inliers out of the 1500 correspondences (an outlier rate of over 99.1%) are successfully recalled by VOCRA, demonstrating that VOCRA remains highly robust in practical applications.
2) 3D Object Localization: In addition, we test VOCRA on the practical problem of localizing a 3D object within an RGB-D scene, adopting the RGB-D Scenes dataset [46]. We make use of the provided ground-truth labels to pick out and build the point cloud of the target object from the RGB-D scene, for three differently-shaped objects: cereal box, cap and table. Then we impose a random transformation on the object to generate an independent object in 3D space, and employ FPFH to build correspondences between the scene and the transformed target object. Afterwards, FLO-RANSAC, GNC-TLS, GORE+RANSAC and VOCRA are used to estimate the pose (transformation) between the object and the scene, with the noise level set to σ = 0.001.
We show the putative correspondences and the qualitative registration results (reprojecting the object back into the scene with the solved transformation) in Figure 11, and the supplementary quantitative results (estimation errors and runtime) in Table III. It can be clearly observed that only GORE+RANSAC and VOCRA render correct results in all scenes, while VOCRA always has the highest accuracy and most often the best efficiency, which fully reflects the practicality of VOCRA in real scenes.

Fig. 11: Qualitative results on the RGB-D Scenes [46] dataset. The left-most column shows the correspondences matched by FPFH [9], where inliers are in green and outliers are in red. From left to right, we show the qualitative reprojection results by FLO-RANSAC, GNC-TLS, GORE+RANSAC and VOCRA, respectively.

VII. CONCLUSION

In this paper, a novel, fast and robust correspondence-based point cloud registration solver, named VOCRA, is presented. First, we sort the correspondences via voting based on the TB cost function, which proves to be superior to traditional line voting. Then, we perform rapid consensus maximization using robust single rotation averaging, circumventing the construction of a consensus set in each iteration. Finally, we use GNC-TB to prune outliers and find the true inlier set efficiently.
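For reference, the standard Tukey's Biweight cost underlying the voting step is quadratic near zero and saturates for large residuals, which is what caps the influence of gross outliers. A minimal numpy sketch (our own illustration; the tuning constant c here is generic, whereas the paper ties its thresholds to the noise level σ):

```python
import numpy as np

def tukey_biweight(r, c):
    """Standard Tukey's Biweight (TB) robust cost:
    rho(r) = (c^2/6) * (1 - (1 - (r/c)^2)^3)  for |r| <= c,
    rho(r) = c^2/6                            otherwise (saturated)."""
    r = np.asarray(r, dtype=float)
    inside = np.abs(r) <= c
    rho = np.full_like(r, c**2 / 6.0)         # saturated value for outliers
    u = (r[inside] / c) ** 2
    rho[inside] = (c**2 / 6.0) * (1.0 - (1.0 - u) ** 3)
    return rho

r = np.linspace(-3, 3, 7)
costs = tukey_biweight(r, c=1.0)
```

Because the cost is bounded by c²/6, a correspondence with an arbitrarily large residual contributes no more than a barely-rejected one, which is the property that makes TB-based voting informative even at extreme outlier rates.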
We evaluate our VOCRA in varied experiments on different datasets, and based on the results we can conclude that: (a) VOCRA is able to tolerate extremely high outlier rates (over 99%); (b) VOCRA is significantly faster than other state-of-the-art solvers (e.g. RANSAC, GORE); (c) VOCRA is applicable to, and retains its high robustness in, real-world application problems including scan matching and 3D object localization.
The main limitation of the proposed solver is that the time complexity of the voting process (Section V-A) is O(N²), so for a large problem size (correspondence number N), VOCRA requires a longer runtime.