A Polyhedral Annexation Algorithm for Aligning Partially Overlapping Point Sets

Point set registration aims to find a spatial transformation that best aligns two point sets. Algorithms that can handle partial overlap and are invariant to the corresponding transformations are particularly desirable. To this end, we first reduce the objective of the robust point matching (RPM) algorithm to a function of a low-dimensional variable. The resulting function, however, is only concave over a finite region containing the feasible region, which prohibits the use of the popular branch-and-bound (BnB) algorithm. To address this issue, we propose to use the polyhedral annexation (PA) algorithm for optimization, which has the merit of operating only within the concavity region of the objective function. The proposed algorithm needs no regularization on the transformation and is therefore invariant to the corresponding transformation. It is also approximately globally optimal and is thus guaranteed to be robust. Moreover, its most computationally expensive subroutine is a linear assignment problem, which can be solved efficiently. Experimental results demonstrate better robustness of the proposed method over state-of-the-art algorithms. Our method's matching error is on average 44% (resp. 65%) lower than that of Go-ICP in 2D (resp. 3D) synthesized tests. It is also efficient when the number of transformation parameters is small.


I. INTRODUCTION
Point set registration aims to find a spatial transformation that best aligns two point sets to a common coordinate system. It has extensive applications in 3D reconstruction, 3D localization, pose estimation, panorama stitching and medical imaging. Disturbances such as non-rigid deformation, positional noise, partial overlap and pose variation make this problem difficult to solve. One way of achieving point set registration is by minimizing the objective of the robust point matching (RPM) algorithm [5]. By eliminating the transformation variable, [21] reduces the objective of RPM to a concave quadratic function of point correspondence with a low-rank Hessian matrix. Then the branch-and-bound (BnB) algorithm with a low-dimensional branch space is employed for optimization. But the method requires that one point set can be embedded in the other point set, which may not be satisfied in many applications.
By relaxing this requirement, [19] reduces the objective of RPM to a concave function of point correspondence, which, albeit not quadratic, still has a low-rank structure. [19] then employs the normal simplex algorithm, a variant of the BnB algorithm, for optimization. But this algorithm requires that the function to be optimized is concave over a set of simplexes whose union includes the feasible region, whereas the function at hand is not necessarily concave outside the feasible region. To address this issue, [19] enlarges the concavity region of the objective function by enforcing regularization on the transformation, where prior information about the values of the transformation needs to be supplied. Consequently, the method tends to yield transformation parameters biased towards the prior values. So the method is not invariant to the corresponding transformation and cannot be used in applications where, e.g., rotation invariance is required.
To address this issue, the BnB based approach of [20] utilizes the constraints of 2D/3D rigid transformations to develop a new objective function of point correspondence which is always concave. Thus, no regularization on the transformation is needed and the method is invariant to the corresponding transformation. Computation of the lower bound is the most time-consuming subroutine of this algorithm. To render this subroutine efficient, [20] relaxes the original lower bounding problem into a linear assignment problem by dropping the branching constraints. The latter can be efficiently solved. However, this has the negative effect that the computed lower bound is loose and the method fails to converge to the upper bound. Consequently, early stopping is adopted and the method is only heuristic.
Aiming at aligning partially overlapping point sets, in this paper, we propose an alternative approach to optimizing the objective of RPM. Instead of employing the BnB algorithm for optimization as in [19], [20] which can lead to various types of difficulties as described previously, we employ the polyhedral annexation (PA) algorithm [11] for optimization. It can directly optimize the RPM objective function by only operating within the concavity region of the function. Therefore, no regularization on transformation is needed and the proposed method is invariant to the corresponding transformation. Also, unlike the BnB algorithm where branching constraints need to be taken into account to ensure tightness of the generated lower bounds, in the proposed PA algorithm, no such constraints exist and the core subroutine has a linear assignment formulation which can be efficiently solved. Finally, the proposed method is approximately globally optimal and thus is guaranteed to be robust. Our contributions are summarized as follows: 1) It is the first time that PA is introduced into the computer vision field. 2) Our method can directly optimize within the concavity region of the RPM objective function.
3) The core subroutine of the proposed method has a linear assignment formulation which can be efficiently solved. 4) Our method is versatile. It can handle partial overlap and is invariant to the corresponding transformation. It is also approximately globally optimal. 5) Experiments on 2D/3D synthetic and real data demonstrate favorable robustness of the proposed method over state-of-the-art approaches. The rest of the paper is organized as follows: In Sec. II, we review related work. In Secs. III and IV, we discuss our objective functions under two different types of transformations. In Sec. V, we present the PA algorithm. In Sec. VI, we present experimental results.

II. RELATED WORK
A. HEURISTIC METHODS
ICP based methods. ICP [2], [40] iterates between estimating point correspondence and transformation. It is fast but prone to being trapped in local minima due to the discrete nature of point correspondence. RPM [5] improves on ICP by relaxing point correspondence to fuzzy values and uses deterministic annealing (DA) for optimization. But the employed DA is biased towards aligning the mass centers of the two point sets. CDC [28] uses the covariance matrix of the transformation parameters to help recover point correspondence, resulting in better robustness to missing or extraneous structures. But the method is limited to simple transformations, since the size of the covariance matrix grows quadratically with the number of transformation parameters.
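For concreteness, the ICP iteration sketched above (nearest-neighbour correspondence alternated with a closed-form rigid fit) can be written in a few lines. This is a minimal illustration, not the implementation of [2], [40]; the point sets and iteration count are hypothetical:

```python
import numpy as np

def icp_step(X, Y):
    """One ICP iteration: nearest-neighbour correspondence, then a
    least-squares rigid fit (Procrustes via SVD)."""
    # correspondence: for each model point, its nearest scene point
    d = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    Yc = Y[d.argmin(axis=1)]
    # rigid fit R, t minimizing sum ||R x_i + t - y_i||^2
    mx, my = X.mean(0), Yc.mean(0)
    H = (X - mx).T @ (Yc - my)
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(X.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflection
    R = Vt.T @ D @ U.T
    t = my - R @ mx
    return R, t

def icp(X, Y, iters=30):
    """Accumulate the per-step rigid transforms into a single (R, t)."""
    R_tot, t_tot = np.eye(X.shape[1]), np.zeros(X.shape[1])
    Xc = X.copy()
    for _ in range(iters):
        R, t = icp_step(Xc, Y)
        Xc = Xc @ R.T + t
        R_tot, t_tot = R @ R_tot, R @ t_tot + t
    return R_tot, t_tot
```

The discrete argmin over correspondences is exactly what makes ICP fast yet prone to local minima when the initial pose is poor.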
GMM-based methods. These methods are robust to noise and outliers since they align distributions. CPD [24] casts point matching as the problem of fitting a Gaussian mixture model (GMM) representing one point set to the other point set. This method was later extended in a Bayesian setting in [10]. GMMREG [15] uses GMMs to represent both point sets and minimizes the L2-norm distance between the resulting GMMs. A support vector technique is used to generate Gaussian mixtures from point sets in [3], resulting in sparse Gaussian components. The efficiency of GMM-based methods is improved by using filtering to solve the correspondence problem in [8].
The density variation problem of point sets is addressed in [17] by modeling the structure of the scene as well. A hierarchical Gaussian mixture representation is used in [6] to improve the speed and accuracy of registration.

B. GLOBAL OPTIMIZATION METHODS
BnB based methods. BnB is a popular global optimization technique widely used in computer vision. It is used to align 3D shapes based on Lipschitz optimization theory [18], but the method assumes no occlusion or outliers. Go-ICP [14] uses BnB to optimize the ICP objective by exploiting the geometric structure of 3D rigid motions. The fast rotation search (FRS) method [1], [25] recovers the rotation between 3D point sets by using stereographic projections; general six-degree-of-freedom rigid registration is accomplished with a nested BnB algorithm. GOGMA registers two point sets by aligning GMMs constructed from the original point sets [4]. A new type of rotation-invariant feature was proposed in [22], leading to a more efficient BnB based registration algorithm. A Bayesian nonparametric point set representation and a new way of tessellating rotation space were proposed in [29]. BnB is used to optimize the RPM objective in [26], where branching over the correspondence variable and over the transformation variable are both considered. But these methods do not scale well due to the lack of a good structure for optimization.
Non-BnB based methods. FGR [41] optimizes a robust objective based on the Black-Rangarajan duality between robust estimation and line processes. Mixed integer programming [13] and semidefinite programming [16], [23] have also been proposed for 2D/3D registration.

C. DEEP LEARNING METHODS
Correspondences-free methods. PointNetLK [37] uses the Lucas-Kanade (LK) algorithm to align global features of two point sets generated via a PointNet [27] network. In a similar vein, FMR [12] uses an encoder to generate global features of the two point sets and a decoder to convert the features back into point sets; LK is used to align the features, and fidelity of the features is ensured by minimizing the Chamfer distances between the decoded point sets and the original point sets.
Correspondences-based methods. DCP [33] uses DGCNN [35] and Transformer [32] to extract features from each point set. Similarities of the features are used to determine point correspondences and SVD is used to recover the rigid transformation. PRNet [34] uses keypoint-based solutions to handle partial overlap, but its ability to handle partial overlap relies highly on the quality of keypoint detection. RPM-Net [38] uses a differentiable Sinkhorn layer and annealing to obtain soft point correspondences from hybrid features learned from both spatial coordinates and local geometry. DeepGMR [39] designs a neural network to extract pose-invariant correspondences between point sets and GMM parameters.

III. CASE ONE: TRANSFORMATION IS LINEAR W.R.T. ITS PARAMETERS
Given two point sets X = {x_i, i = 1, ..., n_x} and Y = {y_j, j = 1, ..., n_y} in R^{n_d} to be registered, where the column vectors x_i and y_j denote the coordinates of points, following [19], we model point set registration as a mixed linear assignment-least squares problem:

min_{P, θ} E(P, θ) = Σ_{i=1}^{n_x} Σ_{j=1}^{n_y} p_ij ||y_j − T(x_i|θ)||^2,   (1)
s.t. P 1_{n_y} ≤ 1_{n_x}, P^⊤ 1_{n_x} ≤ 1_{n_y}, 1_{n_x}^⊤ P 1_{n_y} = n_p, p_ij ∈ {0, 1},   (2)

where the transformation T(x_i|θ) = J(x_i) θ is chosen to be linear w.r.t. its parameters θ. Here J(x_i) is called the Jacobian matrix of x_i. I_{n_d} denotes the n_d-dimensional identity matrix, 1_{n_x} denotes the n_x-dimensional vector of all ones, ⊗ denotes the Kronecker product, and diag(·) converts a vector into a diagonal matrix. For the correspondence matrix P, the element p_ij = 1 indicates that there is a correspondence between x_i and y_j, and p_ij = 0 otherwise. The constraint 1_{n_x}^⊤ P 1_{n_y} = n_p fixes the number of matches to n_p, a predefined positive integer.
Given P, E is a convex quadratic function of θ. Thus, the optimal θ minimizing E can be obtained by solving ∂E/∂θ = 0. After substituting the optimal θ into E, θ is eliminated and we obtain a function of only one variable P, where the matrix J ≜ [J(x_1)^⊤, ..., J(x_{n_x})^⊤]^⊤ and the vectors y ≜ [y_1^⊤, ..., y_{n_y}^⊤]^⊤ and ỹ ≜ [||y_1||^2, ..., ||y_{n_y}||^2]^⊤. To facilitate optimization of E(P), P needs to be vectorized. We define the vectorization of a matrix as the concatenation of its rows into a column vector, denoted by vec(·). Let p ≜ vec(P). To obtain a concise form of E(p), new notations need to be introduced; let n_θ denote the dimension of θ. For any multiplicable matrices M_1, M_2 and M_3, we have the row-major vectorization identity vec(M_1 M_2 M_3) = (M_1 ⊗ M_3^⊤) vec(M_2) [19]. Here e_i^d denotes the d-dimensional column vector whose only nonzero element is a 1 at the i-th entry. W_{m,n_d} is a large but sparse matrix and can be implemented using the function speye in Matlab.
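The row-major vectorization identity used in this derivation, vec(M_1 M_2 M_3) = (M_1 ⊗ M_3^⊤) vec(M_2), is easy to verify numerically (a quick sanity check with arbitrary random matrices, not data from the paper):

```python
import numpy as np

# row-major vec: concatenate the rows of a matrix into one long vector
vec = lambda M: M.reshape(-1)

rng = np.random.default_rng(0)
M1, M2, M3 = rng.random((2, 3)), rng.random((3, 4)), rng.random((4, 5))

# vec(M1 M2 M3) = (M1 kron M3^T) vec(M2) under row-major vectorization
lhs = vec(M1 @ M2 @ M3)
rhs = np.kron(M1, M3.T) @ vec(M2)
assert np.allclose(lhs, rhs)
```

Note that the more commonly quoted form (M_3^⊤ ⊗ M_1) vec(M_2) applies to column-major vectorization; the Kronecker factors swap under the row-major convention adopted here.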
With the above preparation, E can be expressed as a function of p, where the operator mat(·) reconstructs a symmetric matrix from the vector obtained by applying vec(·) to it; thus mat(·) can be viewed as the inverse of the operator vec(·). Since 1_{n_x n_y}^⊤ p = n_p is a constant, for rows of B containing identical elements, the result of multiplying them by p can be replaced by constants. Besides, redundant rows can be removed: since mat(Bp) is a symmetric matrix, B contains redundant rows. Based on the above analysis, we denote by B_2 the matrix formed by removing such rows from B (please refer to Sec. VI for examples of B_2). Consequently, E(p) can be rewritten in a form where the nonzero elements of the constant matrix ∆ correspond to the rows of B containing identical elements, and the elements of the constant matrix K take values in {0, 1} and record the correspondences between the rows of B_2 not containing identical elements and the rows of B.
In view of the form of E(p), we can see that E(p) is determined by the quantity [B_2^⊤, A^⊤, a]^⊤ p = Γ^⊤ Q^⊤ p, which in turn is determined by the low-dimensional variable t ≜ Q^⊤ p. Here QΓ denotes the QR decomposition of the matrix [B_2^⊤, A^⊤, a], with Γ being an upper triangular matrix and the columns of Q being orthonormal vectors. In the resulting form of E in terms of the variable t, (Γ^⊤ t)_{B_2} denotes the vector formed by the elements of the vector Γ^⊤ t with indices equal to the row indices of the submatrix B_2 in the stacked matrix [B_2^⊤, A^⊤, a]^⊤. The vectors (Γ^⊤ t)_A and (Γ^⊤ t)_ρ are similarly defined.
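The dimensionality reduction at the heart of this section can be sketched in a few lines: since the stacked matrix M ≜ [B_2^⊤, A^⊤, a]^⊤ has only n_t rows, writing M^⊤ = QΓ lets every product Mp be recovered from the low-dimensional variable t = Q^⊤ p alone. The dimensions below are illustrative, not those of a real registration instance:

```python
import numpy as np

rng = np.random.default_rng(1)
N, nt = 200, 5                 # N plays the role of nx*ny (large), nt is small
M = rng.random((nt, N))        # stands in for the stacked [B2; A; a^T]

# QR decomposition of M^T: M^T = Q @ G, Q is N x nt with orthonormal
# columns, G is nt x nt upper triangular
Q, G = np.linalg.qr(M.T)

p = rng.random(N)              # a (relaxed) correspondence vector
t = Q.T @ p                    # the low-dimensional variable

# M p = G^T Q^T p = G^T t: the objective only sees p through t
assert np.allclose(M @ p, G.T @ t)
```

This is why the subsequent optimization can operate in the n_t-dimensional space of t rather than the n_x n_y-dimensional space of p.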

IV. CASE TWO: 3D SIMILARITY TRANSFORMATION
If the framework presented in the previous section is directly applied to the 3D registration problem, a 3D affine transformation needs to be employed. However, its large number of parameters would render the algorithm rather inefficient. To address this issue, we instead focus on the 3D similarity transformation, owing to its relatively small number of parameters, and derive the corresponding objective function.
Following [20], we model point matching as a linear assignment-least squares problem, where Ω denotes the feasible region of p, as determined by (2). Here the spatial transformation T(x_i|s, R, b) = sRx_i + b is chosen to be a similarity transformation with scaling factor s, rotation matrix R and translation b. The matrices X ≜ [x_1, ..., x_{n_x}] and Y ≜ [y_1, ..., y_{n_y}] stack the point coordinates. Given P, s and R, E is a convex quadratic function of b. Thus the optimal b minimizing E can be obtained by solving ∂E/∂b = 0. After substituting the optimal b into E, b is eliminated and we obtain a new function without the variable b. Thus, the minimization of E(P, s, R) boils down to the minimization of E(P). Here the optimal R can be computed by first computing the singular value decomposition USV^⊤ of the relevant cross-covariance matrix, where S is a diagonal matrix and the columns of U and V are orthonormal vectors. Then the optimal R is R* = U diag(1, ..., 1, det(UV^⊤)) V^⊤ [24]. By substituting R* back into (14), R is eliminated and we get a (possibly concave) quadratic program in the single variable s. Given the range s ≤ s ≤ s̄, one can easily solve this univariate quadratic program by comparing the function values at the two boundary points s, s̄ and at the extreme point to obtain the optimal s. E(P) is characterized by a proposition whose proof can be found in [20].
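The closed-form rotation update cited from [24] is the classical Procrustes solution; the sketch below illustrates it, with the matrix M standing in for the cross-covariance whose SVD is taken (M here is a hypothetical test input, not a quantity from the paper):

```python
import numpy as np

def best_rotation(M):
    """argmax over R in SO(3) of tr(R^T M): with M = U S V^T,
    the optimum is R* = U diag(1, ..., 1, det(U V^T)) V^T."""
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))   # det(U V^T) = +/-1
    return U @ np.diag([1.0, 1.0, d]) @ Vt
```

The diag(1, ..., 1, det(UV^⊤)) factor replaces the naive UV^⊤ solution exactly when UV^⊤ would be a reflection, ensuring the result is a proper rotation.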
To facilitate optimization of E(P), E(P) needs to be expressed in terms of the vector p; to obtain a concise form, new notations need to be introduced. Based on formula (6), E(P) can then be rewritten in terms of p. In view of the form of E(p) in (21), we can see that E(p) is determined by a low-dimensional variable t ≜ Q^⊤ p, where QΓ denotes the corresponding QR decomposition with Γ being an upper triangular matrix and the columns of Q being orthonormal vectors. In the specific form of E in terms of t, (Γ^⊤ t)_{B_1} denotes the vector formed by the elements of Γ^⊤ t with indices equal to the row indices of the submatrix B_1, and (Γ^⊤ t)_{a_2} is similarly defined. One can deduce that the dimension of t is n_d^2 + 2n_d + 2.

V. OPTIMIZATION
The analysis in the previous section indicates that E(p) is linear w.r.t. the subspace L ≜ {p | Q^⊤ p = 0}. For the case when the transformation is linear w.r.t. its parameters, by Proposition 1 in [19], E(p) is concave over the spectrahedron Ψ ≜ {p | mat(KB_2 p) + ∆ ⪰ 0} ⊃ Ω.
For the case when the transformation is a 3D similarity, E(p) is always concave. Based on the above analysis, it is natural to use a decompositional version of the PA algorithm [31], a global optimization algorithm suited to functions that have low-rank structure and are concave over the feasible region, to optimize E(p).
To use this algorithm, we first give the definition of polars.
Definition 1. For any set D in R^n, the set D° ≜ {y ∈ R^n | ⟨y, x⟩ ≤ 1 for all x ∈ D} is called the polar of D.
One can deduce that the polar of a subspace is the same as its orthogonal complement. Some properties of polars are listed below.
Please refer to [31] for the proofs. Figure 1 illustrates the polar of a triangle.
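Definition 1 becomes concrete for a polytope: since ⟨y, x⟩ ≤ 1 need only be checked at the vertices, membership in the polar reduces to finitely many inequalities. A toy 2D example (the unit square is a hypothetical illustration, not a set from the paper):

```python
import numpy as np

def in_polar(y, V):
    """y is in the polar of D = conv(rows of V) iff <y, v> <= 1
    for every vertex v of D."""
    return bool(np.all(V @ y <= 1 + 1e-12))

# D = square with vertices (+/-1, +/-1); its polar is the diamond
# {y : |y1| + |y2| <= 1}
V = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], float)

assert in_polar(np.array([0.5, 0.5]), V)       # 0.5 + 0.5 = 1: on the boundary
assert not in_polar(np.array([0.8, 0.4]), V)   # 0.8 + 0.4 > 1: outside
```

The same finitely-many-inequalities view is what makes the polars D_k in the PA algorithm computationally tractable.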
The set C is defined in (24) for the case when the transformation is linear w.r.t. its parameters and in (25) for the case when the transformation is a 3D similarity; in both cases, C is essentially the superlevel set {p | E(p) ≥ E(p̄)}, intersected with Ψ in the former case, where p̄ is the best feasible solution found so far. By concavity and continuity of E(p) (over Ψ in the case when the transformation is linear w.r.t. its parameters), C is a closed convex set. Since 0 ∈ int C, C° is bounded. The essential idea of PA is to construct, starting from a polyhedron D_1 ⊃ C°, a nested sequence of polyhedra D_1 ⊃ D_2 ⊃ ... ⊃ C° that eventually yields either a polyhedron D_k ⊂ Ω° or a point p_k ∈ Ω \ C. In the former case, C° ⊂ Ω° and hence Ω ⊂ C, so the optimal solution has been found. In the latter case, C is updated by resetting p̄ = p_k and the above procedure is restarted. Let µ(v) = max{⟨v, p⟩ | p ∈ Ω}; we have the following result.
In the latter case, C is updated by resetting p̄ = p_k.
We next construct the polytope D_{k+1} by pruning v_k from D_k. Denote by p̂_k the point where the half-line from 0 through v_k meets the boundary of C. Then ⟨p̂_k, r⟩ ≤ 1 holds for all r ∈ C° but not for r = v_k. Therefore, the cut ⟨p̂_k, r⟩ ≤ 1 removes v_k from D_k while retaining C°, and we set D_{k+1} = D_k ∩ {r | ⟨p̂_k, r⟩ ≤ 1}.

A. INITIAL POLYTOPE
In order to exploit the low-rank structure of E(p), the initial polyhedron D_1 is chosen in such a way that D_1 ⊂ L⊥, where L⊥ denotes the orthogonal complement of the linear subspace L. If this can be done, then all D_k ⊂ L⊥ since D_k ⊂ D_1. Thus, the whole procedure operates in L⊥, which permits considerable computational savings since generally dim L⊥ = n_t ≪ n_x n_y. A polyhedron D_1 satisfying the above requirement can be built as follows: We first construct a full-dimensional polytope S_1 in the subspace L⊥ such that S_1 ⊂ C and 0 ∈ ri S_1, where ri(·) denotes the interior of S_1 relative to aff S_1, the affine hull of S_1. We then let D_1 = (S_1 + L)°. Since 0 ∈ ri S_1 and S_1 ⊂ L⊥, we have 0 ∈ int(S_1 + L). Hence D_1 is compact.

B. THE PA ALGORITHM FOR MINIMIZING E(P)
Based on the above preparation, we can now present the PA algorithm [31] for minimizing E(p), whose pseudo-code is listed in Algorithm 1; please also refer to Figure 2 for an illustration. Note that in Algorithm 1, the problem can be solved in two steps. First, we solve a semidefinite program via solvers such as SeDuMi [30]. Then, we solve a univariate optimization problem via solvers such as fminbnd in Matlab (by minimizing |E(θ p_k) − E(p̄)|). Note that the search range can be further restricted so that θ_k will not be a solution of (31). Parallel speedup: in step 1, the linear assignment problem for each new vertex is solved independently and repeatedly. Thus, we can achieve speedup by running these problems in parallel. In this work, the parallel implementation is done on a GTX 1080 Ti graphics card.
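For intuition about the core subroutine, µ(t) = max{⟨v, p⟩ | p ∈ Ω}, where Ω contains the partial assignments matching exactly n_p pairs, can be written out by brute force at toy sizes (a real instance would use a Hungarian/Jonker-Volgenant linear assignment solver instead; the cost matrix below is hypothetical):

```python
from itertools import combinations, permutations

def mu(C, n_p):
    """max_{p in Omega} <vec(C), p>: brute-force search over partial
    assignments matching exactly n_p model points to n_p distinct
    scene points (only viable for tiny problems)."""
    nx, ny = len(C), len(C[0])
    best, best_match = float("-inf"), None
    for rows in combinations(range(nx), n_p):       # which model points match
        for cols in permutations(range(ny), n_p):   # to which scene points
            s = sum(C[i][j] for i, j in zip(rows, cols))
            if s > best:
                best, best_match = s, list(zip(rows, cols))
    return best, best_match
```

Because the optimum of a linear objective over the assignment polytope is attained at an integral vertex, this combinatorial maximum coincides with the LP value over the relaxed Ω, which is what makes the subroutine both exact and fast with a proper solver.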

C. CHOICE OF THE INITIAL POLYTOPE
It is apparent that if the polytope S_1 is chosen to be a good approximation of C (so that D_1 is a good approximation of C°), the PA algorithm will require fewer iterations to converge. In [31], S_1 is chosen as a simplex, which has few facets and is thus a poor approximation of C, causing the resulting algorithm to converge slowly. To address this issue, in this work we choose S_1 as a polytope with more facets, constructed as follows.
We first compute the vectors p_±^i = arg max{±⟨q_i, p⟩ | p ∈ Ω}, i = 1, ..., n_t (32) by solving linear assignment problems. The projection of p_j^i, 1 ≤ i ≤ n_t, j ∈ {+, −}, onto the subspace L⊥ is p̃_j^i ≜ QQ^⊤ p_j^i. Since E is linear w.r.t. the subspace L, we have E(p̃_j^i) = E(p_j^i). After the above computation, we define the set C according to (24) or (25) by choosing p̄ as the best p_j^i over 1 ≤ i ≤ n_t, j ∈ {+, −}.
We next define α_j^i to be the positive numbers such that z_j^i = α_j^i p̃_j^i ∈ ∂C. We now let the polytope S_1 = conv{z_j^i, i = 1, ..., n_t, j = −, +}, where conv(·) denotes the convex hull of a point set. It is apparent that S_1 is full-dimensional in the subspace L⊥, S_1 ⊂ C and 0 ∈ ri S_1. One can see that any facet of S_1 passes through the vertices z_{j_1}^1, ..., z_{j_{n_t}}^{n_t}, where j_k ∈ {+, −}. Thus, there are a total of 2^{n_t} facets, which is many more than the n_t + 1 facets of a simplex. Therefore, S_1 chosen in this way better approximates C than a simplex does.

Algorithm 1: PA algorithm for minimizing E(p)
1 Initialization: Let p̄ be the best feasible solution so far available. Define the convex set C according to Eq. (24) or (25). Let q_1, ..., q_{n_t} be the columns of Q (so that L⊥ is the subspace generated by these vectors). Let D̃_1 ≜ π(D_1) be a polytope in the space R^{n_t}, where π : L⊥ → R^{n_t} denotes the natural linear coordinate map.
3 For every new t = (t_1, ..., t_{n_t}) ∈ V_k, solve the linear assignment problem to obtain its optimal value µ(t) and a basic optimal solution p(t).
4 Let t_k ∈ arg max{µ(t) | t ∈ V_k}. If µ(t_k) ≤ 1, terminate: p̄ is an optimal solution.
5 If µ(t_k) > 1 and p_k ≜ p(t_k) ∉ C, update the current best feasible solution and the set C by resetting p̄ = p_k.
6 Compute the corresponding cut, in the case that the transformation is linear w.r.t. its parameters or in the case that it is a 3D similarity, and define the next polytope accordingly.
Denote by D_1 the polar of S_1 + L. Since 0 ∈ ri S_1 and S_1 ⊂ L⊥, we have 0 ∈ int(S_1 + L); thus D_1 is bounded. The vertices of D_1 admit a closed-form expression in terms of the z_{j_k}^k, where j_k ∈ {+, −}; based on this formula, all the vertices of D_1 can be recovered.

D. VERTEX ENUMERATION
In step 7 of the PA algorithm, we need to compute the intersection of a hyperplane with a polytope. This is the vertex enumeration (VE) problem [11], which is defined as follows.

Let V_− (resp. V_+) denote the set of vertices of the current polytope D lying on one side of the cutting hyperplane H (resp. on the other side).
Without loss of generality, let us assume |V_−| ≤ |V_+|, where |·| denotes the cardinality of a set. For each t ∈ V_−, denote by J(t) the set of constraints of D that are active at t. Because of the way D is constructed, the vertex t is nondegenerate; thus |J(t)| = n_t and the corresponding system of linear equations (36) is linearly independent. Moreover, t has n_t neighboring vertices in D; that is, n_t edges of D are incident with t. Each line through t in the direction of such an edge is the solution set of a system of n_t − 1 linear equations obtained from (36) by dropping one equation. The set of new vertices in D ∩ H adjacent to t contains the intersection points of these lines with the hyperplane H. Without loss of generality, for simplicity of notation, we assume J(t) = {1, ..., n_t}. Then, for each t ∈ V_−, we have to consider the n_t systems (37) of n_t linear equations which arise as l runs from 1 to n_t. When a system in (37) has a solution, we have to check whether this solution satisfies the remaining inequalities of D.
Instead of directly solving (37), which is cumbersome, we employ the simplex pivoting algorithm. It works by introducing slack variables s ∈ R_+^{n_t} to write the binding inequalities of J(t) in the form (38) and the equation of H in the form (39). One can then transform (38) into a simplex tableau and transform (39) (by adding to (39) multiples of the rows of (38)) into 0^⊤ t + a^⊤ s = b, (41), from which all possible new vertices neighboring t can be obtained by pivoting on the current nonbasic variables s in row (41).
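The geometric content of the VE step can also be seen directly: a new vertex of D ∩ H lies on each edge of D whose endpoints fall on opposite sides of H. The toy example below illustrates this edge-crossing view (an actual implementation would use the pivoting scheme above; the square and hyperplane are hypothetical):

```python
import numpy as np

def cut_vertices(verts, edges, h):
    """Vertices of D intersected with {t : <h, t> = 1}: intersect each
    edge of D whose endpoints lie on opposite sides of the hyperplane."""
    new = []
    for a, b in edges:
        fa, fb = verts[a] @ h - 1, verts[b] @ h - 1
        if fa * fb < 0:                       # endpoints on opposite sides
            lam = fa / (fa - fb)              # interpolation parameter
            new.append(verts[a] + lam * (verts[b] - verts[a]))
    return np.array(new)
```

For example, cutting the unit square with the hyperplane {t : 2 t_1 = 1} (i.e. t_1 = 1/2) yields the two points where that vertical line crosses the bottom and top edges.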

E. TERMINATION CRITERION
The termination criterion max_j µ_j ≤ 1 guarantees that the PA algorithm is globally optimal. Nevertheless, meeting this criterion is generally computationally expensive and often unnecessary, since satisfactory solutions can be generated before the condition is met. We therefore relax this criterion to max_j µ_j ≤ 1 + ε, where ε is a preset small positive value. Consequently, our algorithm becomes approximately globally optimal.
Since a higher-dimensional space of t tends to lead to slower convergence, instead of directly setting ε, we let ε = n_t ε_0, thereby taking into account the dimension n_t of the space of t, and set ε_0 instead.

VI. EXPERIMENTS
We implement our method in Matlab 2019b and compare it with other methods on a PC with a 3.5 GHz CPU and 32 GB RAM. For comparison methods which only output point correspondences, the generated correspondences are used to find the best affine transformations between the two point sets. We define the error as the root mean squared difference between the coordinates of the transformed ground-truth model inliers and those of their corresponding scene inliers. For our algorithm, we set the parameter ε_0 = 0.3.

A. 2D SYNTHESIZED DATASETS
We compare our method with RPM-CAV [20] and Go-ICP [14]. Both of them are based on global optimization, can handle partial overlap and allow arbitrary rotation and translation between two point sets, making them good candidates for comparison.
2D similarity and affine transformations are respectively considered for our method; both formulations follow the framework of Sec. III. Synthetic data is easy to obtain and can be used to test specific aspects of an algorithm. We conduct 4 categories of tests to evaluate different methods' robustness to various types of disturbances. i) Deformation test: an increasing degree of non-rigid deformation is applied to the prototype shape to generate the scene point set; the non-rigid deformation is generated using Gaussian radial basis functions. ii) Noise test: an increasing degree of positional noise is applied to the prototype shape to generate the scene point set; the noise follows the normal distribution. iii) Outlier test: normally distributed random outliers are added to different sides of the prototype shape to generate the two point sets. iv) Occlusion + outlier test: first, equal degrees of occlusion are applied to the prototype shape to generate the two point sets, respectively. We simulate occlusion by first finding the shortest Hamiltonian cycle of the prototype point set (by solving a traveling salesman problem) and then retaining a segment of the cycle starting at a random point. Then, normally distributed random outliers (the outlier-to-data ratio is fixed to 0.5) are added to different sides of the two point sets to simulate outlier disturbance. For all the above tests, disturbances of random rotation and of scaling within the range [0.5, 1.5] are also added when generating the two point sets. Fig. 4 illustrates these tests.
(Fig. 5 caption: matching errors of our method, RPM-CAV [20] and Go-ICP [14] with different n_p values, chosen from 1/2 to 1/1 the ground-truth value, over 100 random trials for the 2D deformation, noise, outlier and occlusion+outlier tests.)
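The Gaussian-RBF deformation used in the deformation test can be sketched as follows. This is one plausible implementation; the number of centers, kernel width and amplitude below are hypothetical choices, not the paper's settings:

```python
import numpy as np

def grbf_deform(X, n_centers=3, sigma=0.5, amp=0.1, seed=0):
    """Non-rigid deformation via a Gaussian radial basis function field:
    each point is displaced by a weighted sum of Gaussian bumps centered
    at randomly chosen data points."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), n_centers, replace=False)]      # RBF centers
    W = amp * rng.standard_normal((n_centers, X.shape[1]))   # displacement weights
    # kernel matrix: K[i, k] = exp(-||x_i - c_k||^2 / (2 sigma^2))
    K = np.exp(-((X[:, None, :] - C[None, :, :]) ** 2).sum(-1) / (2 * sigma**2))
    return X + K @ W
```

Increasing amp increases the degree of deformation, which matches the "increasing degree of non-rigid deformation" protocol of the test; amp = 0 leaves the shape unchanged.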
The matching errors of the different methods are presented in Fig. 5. Our method using either transformation performs better than Go-ICP, particularly in the occlusion+outlier test, where there is a large margin between the errors of the proposed method and those of Go-ICP. In terms of different choices of transformation, our method performs similarly with the similarity and the affine transformation. In terms of different choices of n_p, our method with n_p close to the ground truth performs slightly better, indicating the relative insensitivity of the proposed method to the choice of n_p. The average matching errors across all degrees and types of disturbance are 0.23 for ours (simi, 1/1), 0.36 for ours (simi, 1/2), 0.22 for ours (aff, 1/1), 0.37 for ours (aff, 1/2), 0.14 for RPM-CAV (1/1), 0.19 for RPM-CAV (1/2), 0.52 for Go-ICP (1/1) and 0.68 for Go-ICP (1/2). Here 1/1 (resp. 1/2) means that n_p is chosen as 1/1 (resp. 1/2) of the ground-truth value.
The average running times (in seconds) of the different methods are listed in Table 1. Our method using the similarity transformation runs quite efficiently across all types of disturbances. In comparison, Go-ICP runs slowly in the outlier test. This demonstrates the high efficiency of the proposed method using the similarity transformation. In terms of different choices of transformation, our method using the affine transformation is slower than with the similarity transformation, particularly in the outlier and occlusion+outlier tests. This is because the affine transformation has more parameters, causing our algorithm to converge more slowly.

B. 2D POINT SETS EXTRACTED FROM IMAGES
Point sets extracted from images are a more realistic type of data for testing methods. We test different methods on 2D point sets extracted via the Canny edge detector from several images in the Caltech-256 [9] and VOC2007 [7] datasets, as illustrated in Fig. 6. To test a method's ability to handle rotations, model point sets are rotated by 180 degrees before being matched to scene point sets. The registration results are presented in Fig. 6. Our method using the similarity transformation performs much better than our method using the affine transformation. This is because the data contain significant clutter, causing transformations with many degrees of freedom (e.g., affine) to easily lead to unconstrained registration results. Another reason is that for our method, the tolerance ε = n_t ε_0 for the affine transformation is effectively larger than that for the similarity transformation, given that ε_0 is the same for both types of transformation, whereas n_t is large for the affine transformation and small for the similarity transformation. Go-ICP's performance lies between those of our method using the similarity and the affine transformation.

C. 3D SYNTHESIZED DATASETS
In this section, we compare our method with Go-ICP [14], FRS [25] and FGR [41], all of which are based on global optimization and allow arbitrary rigid transformations between two point sets, making them good candidates for comparison. We also compare with PRNet [34], which can handle partial overlap; PRNet is pretrained on ModelNet40 [36].
Similar to the experimental setup in Sec. VI-A, we conduct 4 categories of tests to evaluate different methods' robustness to various types of disturbances: i) Deformation test, ii) Noise test, iii) Outlier test and iv) Occlusion + Outlier test. Fig. 7 illustrates these tests.
The matching errors of the different methods are presented in Fig. 8. Overall, the proposed method performs much better than the other methods. It is insensitive to different degrees of non-rigid deformation and positional noise. It is also less sensitive to the choice of n_p than Go-ICP. In comparison, Go-ICP performs well only when the n_p value is close to the ground truth. FGR and PRNet perform poorly in all the tests. The reason for FGR's poor performance is that our dataset contains significant non-rigid deformation, to which FGR is quite sensitive. The reason for PRNet's poor performance is that our dataset contains arbitrary rotations between the two point sets, which PRNet cannot handle. The average matching errors across all degrees and types of disturbance are 0.11 for our method (1/1), 0.17 for our method (1/2), 0.17 for Go-ICP (1/1), 0.30 for Go-ICP (1/2), 0.17 for FRS, 0.31 for FGR and 0.29 for PRNet.
The average running times (in seconds) of the different methods are listed in Table 2. Our method's running time is higher than those of the other methods. This is because the 3D similarity transformation has a relatively large number of parameters, causing our method to converge slowly.

VII. CONCLUSION
We proposed an approximately globally optimal algorithm for aligning partially overlapping point sets with the property that it is invariant to the corresponding transformation. The method works by reducing the RPM objective to a function of a low-dimensional variable and then using the PA algorithm to optimize the objective over its concavity region. Two cases of transformation, transformations linear w.r.t. their parameters and the 3D similarity transformation, are discussed, to deal with 2D and 3D registration problems respectively. Experiments on 2D/3D data sets demonstrated better robustness of the proposed method over state-of-the-art algorithms for tasks involving various types of disturbances. It is also efficient when the number of transformation parameters is small.
(Fig. 8 caption: matching errors of our method and Go-ICP [14] with different n_p values, chosen from 1/2 to 1/1 the ground-truth value, FRS [25], FGR [41] and PRNet [34] over 100 random trials for the 3D deformation, noise, outlier and occlusion+outlier tests.)