Point Cloud Normal Estimation by Fast Guided Least Squares Representation

Normal estimation is an essential task for scanned point clouds in various CAD/CAM applications. Among state-of-the-art methods, GLSRNE, which is based on guided least squares representation (GLSR), strikes a good balance between speed and quality. First, it segments each neighborhood into multiple sub-neighborhoods. For some neighborhoods, the segmentation is obtained by GLSR, an efficient subspace segmentation model widely applied in other applications; the segmentation of the remaining neighborhoods is inferred via the subspace structure propagation (SSP) algorithm. Then, each sub-neighborhood is fitted by a plane, and the plane closest to the current point is selected for the final normal estimation. We improve effectiveness and efficiency in three aspects. First, to speed up GLSR, we propose a novel iterative algorithm that reduces the computational complexity from <inline-formula> <tex-math notation="LaTeX">$O(n^{3})$ </tex-math></inline-formula> to <inline-formula> <tex-math notation="LaTeX">$O(n^{2})$ </tex-math></inline-formula> with its convergence guaranteed theoretically, where <inline-formula> <tex-math notation="LaTeX">$n$ </tex-math></inline-formula> represents the number of data points; this algorithm should also be useful in other applications. Second, we add a normal constraint to SSP to improve accuracy. Third, when selecting a plane to estimate the final normal, we consider the match between the plane and all neighbors, whereas GLSRNE only considers the match between the plane and the current point. Experiments show that our method is faster than GLSRNE and more effective than GLSRNE and other state-of-the-art methods.


I. INTRODUCTION
With the recent proliferation of commodity 3D sensors, such as RGB-D cameras and LiDAR, there is an increasing demand for using point clouds to understand the geometry and even the semantics of scanned objects and scenes. This raw representation is challenging to handle since it lacks connectivity information or structure. When available, additional local geometric information, such as the surface normal at each point, induces a partial local structure and benefits many point cloud processing tasks, including point-based rendering [1], surface reconstruction [2], classification [3], semantic segmentation, and depth prediction [4]. However, fast and reliable normal estimation remains an intriguing challenge since sharp features, noise, and sampling anisotropy are inevitable in raw point clouds. There is a considerable body of work on this topic [5]-[7]. (The associate editor coordinating the review of this manuscript and approving it for publication was Songwen Pei.) Existing methods can be categorized into six types: Delaunay/Voronoi based methods, mollification based methods, regression based methods, statistics based methods, network based methods, and segmentation based methods.
Delaunay/Voronoi based methods [8]-[10] all employ the Voronoi diagram or Delaunay triangulation. These methods only perform well on point clouds with little noise, and none of them can handle point clouds with outliers and sharp features [11]. Mollification based methods [12]-[15] are inspired by feature-preserving image filters. They improve preliminary normals but fail to recover sharp features if the preliminary normals do not preserve sharp features sufficiently. Under the assumption that all neighbors of a point are sampled from one smooth surface, regression-based methods [16]-[20] use all neighbors to estimate the normal. However, when a point is near sharp features, its neighbors may be sampled from multiple surfaces; hence, these methods tend to smooth out the sharp features. Ideally, only the neighbors on the same surface as the current point p should contribute to the normal estimation, while neighbors belonging to different surfaces should be discarded. Statistics based methods [11], [21]-[25] introduce robust statistics techniques to reduce the influence of neighbors lying on different surfaces. However, these methods do not address sampling anisotropy [6]. Recently, network based methods learn the role of each neighbor in the normal estimation by deep networks [26]-[28]. These methods are robust to noise and outliers and can use multiple neighborhood sizes simultaneously. However, the estimated normals of points near sharp features are still somewhat smooth, and users have to retrain the network to adapt it to specific input data. Segmentation based methods [29]-[31] employ the low-rank representation method to segment each neighborhood into several sub-neighborhoods, each enclosing only the points located on the same surface. Then each sub-neighborhood is fitted by a plane, and the normal is estimated from the plane closest to the current point. These methods achieve the best results for normal estimation with sharp features and anisotropic sampling, but at the price of computation speed.
Because the structure of some real-world data can be well approximated by a union of subspaces, extensions of sparse subspace clustering (SSC) [32] and low-rank representation (LRR) [33] are widely applied in computer graphics and image processing, e.g., co-segmentation of 3D shapes [34], shape descriptor extraction [35], image segmentation [36], face recognition [37], and so on. However, SSC and LRR can only provide a theoretical guarantee for independent subspaces, i.e., when the dimension of the direct sum of the subspaces equals the sum of their individual dimensions. An arrangement of independent subspaces is a special case of disjoint subspaces. Hence, methods designed for disjoint subspaces have great potential in computer graphics and image processing applications. Zhang et al. [29] propose guided low-rank representation (GLRR) for normal estimation. It is worth noting that GLRR can provide a theoretical guarantee for disjoint subspaces with a predefined weight matrix [38]; this may be the main reason that it outperforms the state-of-the-art. Zhang et al. [29] use GLRR to segment each neighborhood into multiple sub-neighborhoods and select one sub-neighborhood to estimate the normal. Their method handles anisotropic sampling well and obtains faithful normals for points near sharp features, but suffers from a huge computational burden. To overcome this limitation, Liu et al. [30] propose GLSRNE, which provides improvements in two aspects. First, GLSRNE proposes the SSP algorithm: predicting the segmentation of some neighborhoods from the results of already segmented neighborhoods reduces the number of segmentation tasks completed by GLRR, which improves computation speed. Second, GLSRNE replaces GLRR with their proposed GLSR to analyze the structure of each neighborhood. Although the segmentation results of GLSR and GLRR are almost the same, the numerical algorithm of GLSR is far faster than that of GLRR. Since this work, GLSR has been widely applied in other applications [39], [40]. However, GLSR still needs to compute the inverse of $n \times n$ matrices, where $n$ denotes the number of data points. Hence, the scalability of GLSR is very limited for large data sets.
In this paper, we address the problem of point cloud normal estimation. To overcome the limitation of GLSR, we propose a novel iterative algorithm to solve the GLSR model with its convergence guaranteed theoretically. Its computational complexity is $O(n^2)$, whereas that of the original algorithm for GLSR is $O(n^3)$, where $n$ represents the number of data points. Hence, we call our method Fast Guided Least Squares Representation (FGLSR). In addition, we make two further improvements for normal estimation. First, to improve the accuracy of SSP, we add a normal constraint to it. Second, when selecting one plane fitted by a sub-neighborhood to estimate the final normal, we consider the match between the plane and all neighbors, whereas GLSRNE only considers the match between the plane and the current point. Extensive experimental results confirm the efficiency and effectiveness of our novel method for normal estimation.

II. ALGORITHM
GLSRNE provides a viable normal estimation framework for point clouds with sharp features. We will briefly go through the general framework and then introduce some improvements to generate a more effective and efficient normal estimation algorithm.

A. GLSRNE
Given a noisy point cloud $\{p_i\}_{i=1}^{N}$ as input, GLSRNE takes three steps to estimate normals that preserve sharp features.
Step 1: Estimating initial normals and detecting candidate feature points. For each point $p_i$, GLSRNE computes an initial normal $n_i^0$ and a feature coefficient $w_i$ by eigenvalue analysis of the covariance matrix
$$C_i = \sum_{p_j \in N_i} (p_j - \bar{p}_i)(p_j - \bar{p}_i)^T,$$
where $N_i$ is the neighborhood of $p_i$ with size $S$ and $\bar{p}_i$ is the centroid of $N_i$. Let $\lambda_0 \le \lambda_1 \le \lambda_2$ denote the eigenvalues of $C_i$. The initial normal $n_i^0$, computed as in the classical normal estimation method PCA [16], is the eigenvector corresponding to $\lambda_0$. The coefficient $w_i = \lambda_0/(\lambda_0 + \lambda_1 + \lambda_2)$ measures the confidence that $p_i$ is close to sharp features. If $w_i$ is larger than a threshold $w_t$, $p_i$ is regarded as a candidate feature point whose neighborhood may be sampled from several smooth surfaces.
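Step 1 can be sketched in a few lines of NumPy (a minimal illustration; the function name is ours):

```python
import numpy as np

def initial_normal_and_feature(neighbors):
    """PCA normal estimation for one neighborhood (an S x 3 array).

    Returns the unit normal (eigenvector of the smallest covariance
    eigenvalue) and the feature coefficient w = l0 / (l0 + l1 + l2).
    """
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]                   # direction of least variance
    w = eigvals[0] / eigvals.sum()
    return normal, w
```

For a neighborhood sampled from a single plane, $\lambda_0 \approx 0$ and hence $w \approx 0$; near a sharp edge the residual variance along the normal grows and $w$ increases.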
Step 2: Segmenting the neighborhoods of candidate feature points. For each candidate feature point $p_i$, a larger neighborhood $N_i^*$ with size $S^* > S$ is selected. GLSRNE segments $N_i^*$ into several sub-neighborhoods that enclose only points located on the same surface. The neighborhoods of candidate feature points are divided into two parts: a small part is segmented by GLSR, and the segmentation of the remaining neighborhoods is inferred from those results via SSP.

Neighborhood segmentation by GLSR. First, for a candidate feature point $p_i$, the data matrix $X_i$ and the guiding matrix $\Gamma_i$ are computed. $\Gamma_i(j,k)$ should have a larger value when the $j$-th and $k$-th neighbors of $p_i$ lie on different surfaces. Based on the observation that two neighbors sampled on the same surface have similar initial normals, GLSRNE defines $\Gamma_i(j,k)$ from the angle between the initial normals $n_j^0$ and $n_k^0$, saturated at a parameter $\tau_{\max}$, where $\langle\cdot,\cdot\rangle$ denotes the inner product of two vectors. To improve the reliability of $\Gamma_i$, GLSRNE employs two strategies used by [29]. First, the neighborhood segmentation results are stored and used to update the guiding matrices of neighborhoods segmented later. Second, when $p_i^j$ or $p_i^k$ is a candidate feature point, the value of $\Gamma_i(j,k)$ is decreased. For more details, please refer to Section 4.3.2 of [29]. Then an optimal representation coefficient matrix is computed by solving the GLSR model
$$\min_{Z} \; \lambda \|X - XZ\|_F^2 + \|W \odot Z\|_F^2 + \beta \|Z\|_F^2, \qquad (4)$$
where $X \in \mathbb{R}^{d \times n}$ is the data matrix with each column corresponding to a data point sampled in a $d$-dimensional space; $\odot$ is the Hadamard product, defined as element-wise multiplication; $\lambda$ and $\beta$ are balancing parameters; $W \in \mathbb{R}^{n \times n}$ is a predefined weight matrix whose elements are all nonnegative and in which $W(i,j)$ is expected to be larger when $x_i$ and $x_j$ are in different categories; and $Z \in \mathbb{R}^{n \times n}$ is the representation matrix. Both $|Z(i,j)|$ and $|Z(j,i)|$ can be understood as the similarity between $x_i$ and $x_j$.
For each candidate feature point $p_i$, we set $X = X_i$, $W = \Gamma_i$, $n = S^*$, $d = 6$, and denote the solved representation coefficient matrix as $Z_i$. Next, the affinity matrix $S_i = (|Z_i| + |Z_i^T|)/2$ is formed, and $N_i^*$ is segmented into several sub-neighborhoods by applying $S_i$ to Normalized Cuts [41]. The number of sub-neighborhoods is determined by the iterative segmentation process described in [29]. (The parameter $\tau_{\max}$ is selected automatically from the distribution of $\{d(n_j, n_k)\}_{j,k=1}^{S^*}$.)

Neighborhood segmentation by SSP. After some neighborhoods are segmented by GLSR, GLSRNE designs SSP to infer the segmentation of the remaining neighborhoods. Specifically, GLSRNE stores the neighborhood segmentation results obtained by GLSR, and the correlations between neighbors are defined from these results. If a candidate feature point $p_i$ and its neighborhood $N_i^*$ are well covered by neighborhoods already segmented by GLSR, the segmentation of $N_i^*$ is obtained by a region-growing algorithm driven by the correlation.
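The affinity construction and the spectral cut can be sketched as follows (a simplified stand-in: the paper applies Normalized Cuts [41] with an automatically chosen number of clusters, whereas we fix a two-way split via the Fiedler vector of the normalized Laplacian; function names are ours):

```python
import numpy as np

def affinity_from_representation(Z):
    """Symmetric affinity S = (|Z| + |Z^T|) / 2 from the representation matrix."""
    return (np.abs(Z) + np.abs(Z).T) / 2.0

def spectral_bipartition(S):
    """Two-way cut from the Fiedler vector of the normalized Laplacian.

    A minimal approximation of Normalized Cuts; the paper recursively
    chooses the number of sub-neighborhoods, which we fix at two here.
    """
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]                 # second-smallest eigenvector
    return (fiedler > 0).astype(int)        # 0/1 cluster labels
```

On a block-structured affinity (two groups of mutually similar points), the sign pattern of the Fiedler vector recovers the two groups.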
Step 3: Normal estimation. If $p_i$ is not a candidate feature point, its neighbors are assumed to be sampled from one smooth surface, and the normal of $p_i$ is defined as the initial normal $n_i^0$ estimated by PCA. If $p_i$ is a candidate feature point, $N_i^*$ has been segmented into several sub-neighborhoods. For each sub-neighborhood, a plane is fitted by PCA and the distance between $p_i$ and the plane is computed. The normal of $p_i$ is defined as the normal of the plane with the minimal distance.
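The plane-selection rule of Step 3 can be sketched as (a minimal illustration; the function name is ours):

```python
import numpy as np

def estimate_normal(p, sub_neighborhoods):
    """Fit a PCA plane to each sub-neighborhood and return the normal of
    the plane closest to the current point p (Step 3 of GLSRNE)."""
    best_normal, best_dist = None, np.inf
    for pts in sub_neighborhoods:
        centroid = pts.mean(axis=0)
        centered = pts - centroid
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        normal = vt[-1]                       # least-variance direction
        dist = abs((p - centroid) @ normal)   # point-to-plane distance
        if dist < best_dist:
            best_normal, best_dist = normal, dist
    return best_normal
```

For a point lying on one of two intersecting planes, the fitted plane containing that point has (near-)zero distance, so its normal is returned.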

B. FAST SOLVER FOR GLSR
Although the original algorithm for GLSR is usually faster than those of most related works, it still requires computing the inverse of $n \times n$ matrices, whose computational complexity is $O(n^3)$. The rapid growth of data sizes motivates us to design a novel algorithm that reduces this computational burden.
The original algorithm of GLSR computes the representation matrix $Z$ column-wise. To derive a faster algorithm, we instead update $Z$ row-wise. In the following, $z_i$ and $x_i$ denote the $i$-th row of $Z$ and the $i$-th column of $X$, respectively. Hence,
$$XZ = \sum_{i=1}^{n} x_i z_i. \qquad (5)$$
Denote $z_i^k$ as the value of $z_i$ in the $k$-th iteration. We adopt the strategy of alternately computing each $z_i$ while fixing the others, for $i$ from $1$ to $n$. In this way, when computing $z_i^{k+1}$, the rows $z_j$, $j = 1, \dots, i-1$, have already been updated. Employing Eq. (5), we formulate $X - XZ$ as
$$X - XZ = E_i - x_i z_i, \qquad \text{where } E_i = X - \sum_{j<i} x_j z_j^{k+1} - \sum_{j>i} x_j z_j^{k}.$$
Hence, the subproblem for $z_i$ is
$$\min_{z_i} \; \lambda \|E_i - x_i z_i\|_F^2 + \|w_i \odot z_i\|_2^2 + \beta \|z_i\|_2^2, \qquad (9)$$
where $w_i$ is the $i$-th row of $W$. The closed-form solution of model (9) is obtained by setting the derivative to zero, which gives
$$z_i^{k+1} = \left(\lambda x_i^T E_i\right) \div \left(\lambda x_i^T x_i \mathbf{1} + w_i \odot w_i + \beta \mathbf{1}\right),$$
where $\div$ denotes element-wise division and $\mathbf{1}$ is the row vector of all ones. According to the above analysis, we summarize our novel algorithm, named FGLSR, in Algorithm 1.
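The row-wise sweep can be sketched in NumPy (a minimal illustration of the update rule above; the function name and the fixed number of sweeps are ours, and we warm-start with the closed-form solution for an all-ones weight matrix):

```python
import numpy as np

def fglsr(X, W, lam=1.0, beta=4.0, n_iter=300):
    """Row-wise coordinate descent for the GLSR model:
        min_Z  lam*||X - XZ||_F^2 + ||W .* Z||_F^2 + beta*||Z||_F^2.

    Each sweep updates one row z_i in closed form with the others fixed,
    maintaining the residual R = X - XZ incrementally.
    """
    n = X.shape[1]
    # Warm start: closed-form solution when all weights equal one.
    Z = lam * np.linalg.solve(lam * X.T @ X + (1 + beta) * np.eye(n), X.T @ X)
    R = X - X @ Z                              # running residual X - XZ
    W2 = W * W
    for _ in range(n_iter):
        for i in range(n):
            x = X[:, i]
            E = R + np.outer(x, Z[i])          # residual with row i removed
            z_new = (lam * (x @ E)) / (lam * (x @ x) + W2[i] + beta)
            R = E - np.outer(x, z_new)         # restore the full residual
            Z[i] = z_new
    return Z
```

Since the objective is strongly convex (for $\beta > 0$), the sweeps converge to the unique minimizer, which can be cross-checked against the column-wise closed form $(\lambda X^T X + \mathrm{diag}(W(:,j)^2) + \beta I)\,Z(:,j) = \lambda (X^T X)(:,j)$ used by the original GLSR solver.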
It can be seen that the heaviest computation in each step of Algorithm 1 is the multiplication of an $n$-dimensional vector with an $n \times n$ matrix, whose computational complexity is only $O(n^2)$, whereas that of GLSR is $O(n^3)$. In addition, we provide a detailed theoretical analysis of FGLSR.
Theorem 1: Denote $Z^k$ as the value of $Z$ in the $k$-th iteration. Then the objective of problem (4) is monotonically non-increasing:
$$\lambda \|X - XZ^{k+1}\|_F^2 + \|W \odot Z^{k+1}\|_F^2 + \beta \|Z^{k+1}\|_F^2 \le \lambda \|X - XZ^{k}\|_F^2 + \|W \odot Z^{k}\|_F^2 + \beta \|Z^{k}\|_F^2.$$
Theorem 1 implies that FGLSR converges at least to a local optimum of problem (4). Because the GLSR model is convex, the local optimum is the global optimal solution; hence FGLSR converges to the global optimal solution of problem (4). In our experience, by initializing $Z$ as $\lambda(\lambda X^T X + (1+\beta)I)^{-1} X^T X$, Algorithm 1 usually converges very fast. FGLSR is far faster than GLSR, especially when the number of data points is very large. Next, we perform a simple experiment on synthetic data to confirm this. We randomly generate $n$ data points, distributed evenly over three lines in $\mathbb{R}^2$ with slopes $\tan(\pi/6)$, $\tan(5\pi/6)$, and $\tan(3\pi/2)$, respectively. In this situation, the three subspaces are disjoint. Hence, $W$ is constructed as a binary matrix according to the theorem about disjoint subspaces in [38]. We compare the average computation time in seconds over ten repetitions for each choice of $n$. All these experiments are performed on a computer with an Intel Core i7-9700K CPU and 16 GB of memory. The results for $n$ ranging from 600 to 3000 are reported in Table 1: FGLSR is far faster than GLSR, and the gap widens as $n$ increases. We also report the difference between the representation matrices computed by FGLSR and GLSR in these experiments, measured by the infinity norm $\|\cdot\|_\infty$, i.e., the maximal absolute value of the entries. The results are shown in Table 2. The average difference is usually below 0.01, meaning that FGLSR converges well. This synthetic experiment demonstrates our improvement in computation speed.

C. NORMAL CONSTRAINED SSP
After some neighborhoods are segmented by GLSR, GLSRNE uses SSP to propagate the segmentation to the remaining neighborhoods. To improve accuracy, we add a normal constraint to the propagation.
First, we want GLSR to segment the neighborhoods that cover the candidate feature points well and are likely to yield faithful results. A point $p_i$ with a larger $w_i$ is usually near a complex structure, which makes the segmentation challenging. Conversely, a point with a smaller $w_i$ may be far from the sharp features and cannot cover the candidate feature points well. Therefore, we rank the candidate feature points by $w_i^d = |w_i - w_{ave}|$, where $w_{ave}$ is the average feature coefficient of all candidate feature points. We select the $k = r \cdot N_f$ neighborhoods of the points with the smallest $w_i^d$ and segment them by GLSR, where $r \in [0,1]$ is a parameter and $N_f$ is the number of candidate feature points. The selected $k$ points are denoted as $P_T = \{p_{t_1}, p_{t_2}, \dots, p_{t_k}\}$.
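The ranking-and-selection step can be sketched as (a minimal illustration; the function name and the rounding of $k$ are our assumptions):

```python
import numpy as np

def select_for_glsr(w, candidate_idx, r=0.1):
    """Rank candidate feature points by |w_i - w_ave| and pick the
    fraction r with the smallest deviation for direct GLSR segmentation."""
    w = np.asarray(w, dtype=float)
    w_cand = w[candidate_idx]
    dev = np.abs(w_cand - w_cand.mean())        # w_i^d = |w_i - w_ave|
    k = max(1, int(r * len(candidate_idx)))     # k = r * N_f (rounded down)
    order = np.argsort(dev)
    return [candidate_idx[i] for i in order[:k]]
```

Points whose feature coefficient sits near the candidate average are selected first, since they tend to sit close to, but not on top of, the sharp features.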
Then, given one of the remaining candidate feature points $p_i$ and its neighborhood $N_i^*$, we need to determine whether $p_i$ and $N_i^*$ are well covered by the neighborhoods segmented by GLSR. Specifically, we select a smaller neighborhood $N_i^s$ with size $S^s$. If $N_i^s \cap P_T = \emptyset$, we consider $p_i$ and $N_i^*$ not well covered and segment $N_i^*$ by GLSR. Otherwise, we segment $N_i^*$ by normal constrained SSP. Specifically, we use two matrices $R$ and $N$ to store the segmentation results of GLSR: $R(j,k)$ and $N(j,k)$ count how many times $p_i^j$ and $p_i^k$ have been grouped onto the same surface and onto different surfaces, respectively. The correlations between neighbors are defined from these counts together with a normal term with $\sigma = \pi/4$. Then the point $p_i^l$ with the smallest feature coefficient is chosen as the seed point, and the point set $N_i^1$, representing the points on the same surface as $p_i^l$, is initialized from the seed. The next step iteratively adds one point to the set $N_i^1$: at each iteration, we choose the point with the largest relation to $N_i^1$, where the relation between a point $p$ and the set $N_i^1$ aggregates the correlations between $p$ and the members of $N_i^1$, weighted by the density weight $\gamma_i^k$, defined as the average distance from $p_i^k$ to its $k = 10$ nearest neighbors. The process terminates when no remaining point achieves a sufficiently large relation. Next, we repeat the process for the rest of the neighbors.
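The region-growing loop can be sketched as follows. This is a heavily simplified stand-in: the paper's exact relation involves the counts $R$ and $N$, the normal term with $\sigma = \pi/4$, and density weights, which we abstract into a single precomputed correlation matrix `corr`; the function name and the fixed `threshold` are ours.

```python
import numpy as np

def grow_region(seed, neighbor_ids, corr, threshold=0.5):
    """Greedy region growing: starting from the seed, repeatedly absorb
    the unassigned neighbor with the largest average correlation to the
    current set, stopping when that relation drops below `threshold`."""
    region = {seed}
    remaining = set(neighbor_ids) - region
    while remaining:
        best, best_rel = None, -np.inf
        for q in remaining:
            rel = np.mean([corr[q, m] for m in region])
            if rel > best_rel:
                best, best_rel = q, rel
        if best_rel < threshold:
            break
        region.add(best)
        remaining.discard(best)
    return region
```

Neighbors strongly correlated with the seed's surface are absorbed; weakly correlated ones (points on a different surface across the edge) are left for the next growing pass.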

III. RESULTS
To evaluate the performance of the proposed normal estimation method, we test it on a variety of point clouds with sharp features. We compare our method with some classic and state-of-the-art methods: PCA [16], robust normal estimation (RNE) [11], robust randomized Hough transform (RRHT) [5], convolutional neural networks based on the Hough transform (HTCNNs) [26], GLSRNE [30], and pair consistency voting (PCV) [6]. Precision and speed are evaluated on models perturbed by centered Gaussian noise with deviation set to 30%, 40%, 50%, and 60% of the average distance between points. In this section, all experiments are performed with an Intel Core i7-9700K CPU and 16 GB of memory. The parameters $S$, $S^*$, $S^s$, $r$, $\lambda$, and $\beta$ are the same as in GLSRNE: $S = 70$, $S^* = 120$, $S^s = 30$, $r = 0.1$, $\lambda = 1$, and $\beta = 4$. We use the same neighborhood size $S^*$ for PCA, RNE, and RRHT. HTCNNs considers five neighborhood sizes (32, 64, 128, 256, 512) simultaneously. All other parameters, if any, are set to their defaults. We compare the algorithms by the root mean square measure with threshold ($RMS_\tau$), which has been used in [5], [29]. Its definition is
$$RMS_\tau = \sqrt{\frac{1}{|P|}\sum_{p \in P} f\big(\angle(n_{p,ref}, n_{p,est})\big)^2}, \qquad f(\theta) = \begin{cases}\theta, & \theta < \tau,\\ \pi/2, & \text{otherwise},\end{cases}$$
where $n_{p,ref}$ and $n_{p,est}$ are the ground truth and estimated normals at $p$, respectively. As proposed by [5], [29], we take $\tau = 10$ degrees. An algorithm with lower $RMS_\tau$ and less run time is desired.
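The $RMS_\tau$ measure can be computed as follows (a sketch; treating the angular error as unsigned, i.e. ignoring normal orientation, is our assumption):

```python
import numpy as np

def rms_tau(n_ref, n_est, tau_deg=10.0):
    """RMS_tau: per-point angular error, clamped to pi/2 once it exceeds
    the threshold tau, then root-mean-squared over the cloud."""
    tau = np.deg2rad(tau_deg)
    cos = np.abs(np.sum(n_ref * n_est, axis=1))   # unsigned angle between unit normals
    ang = np.arccos(np.clip(cos, -1.0, 1.0))
    ang = np.where(ang < tau, ang, np.pi / 2)
    return np.sqrt(np.mean(ang ** 2))
```

The clamp makes the measure count any error beyond $\tau$ as a full failure, so $RMS_\tau$ effectively penalizes the fraction of badly estimated normals rather than averaging away large errors.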
A. COMPARISON ON SYNTHESIZED DATA
Fig. 1 shows the $RMS_\tau$ of PCA, RNE, RRHT, HTCNNs, GLSRNE, PCV, and our method on Cylinder (60K), Fandisk (77K), Mechpart (92K), Octahedron (65K), and Tetrahedron (60K). The visualization of the computed normals is shown in Fig. 2. For PCA, the estimated normals are overly smooth on and near sharp features, and it achieves a higher $RMS_\tau$ than the other methods. The results of RNE are slightly better than PCA, but sharp features are still smoothed out. The performance of RRHT and HTCNNs is comparable. HTCNNs estimates normals using multiple neighborhood sizes simultaneously; hence, as the deviation of the Gaussian noise increases, the $RMS_\tau$ of HTCNNs changes gently (see Fig. 1 f). But it fails to correctly estimate normals near sharp features (see Fig. 2). RRHT does somewhat better for points near the sharp features (see the second row in Fig. 2), but when the noise is large or the neighborhood structure is complex, its results are still smooth. GLSRNE and PCV preserve sharp features better, but in the vicinity of sharp features some erroneous normals may persist, as shown in Fig. 2. Our method obtains the lowest $RMS_\tau$ and preserves sharp features better in the presence of noise.
Since PCA cannot preserve sharp features, it produces a large number of bad points near sharp features. RNE is devised to preserve sharp features, but it does not address the variation of density near edges; therefore it is severely affected by non-uniform sampling, and its bad points on the upper and lower faces significantly outnumber those on the side face. Both the sampling strategy of RRHT and the segmentation method used by GLSRNE are insensitive to density variation, so they perform better than PCA and RNE. However, near the sharp features there are still some erroneous normals. In the visualization of bad points shown in Fig. 3, the results of PCV and our method are comparable and more precise than those of the other methods. But our method achieves the minimal number of bad points (NBP) and the lowest $RMS_\tau$.

C. COMPUTATION TIMES
We report the speed of PCA, RNE, RRHT, HTCNNs, GLSRNE, and our method in Table 3. PCA is the fastest, even with a MATLAB implementation, but at the price of a higher $RMS_\tau$. The methods with C/C++ implementations (RNE, RRHT, and HTCNNs) are fast, but the normals they obtain are far worse than those of GLSRNE, PCV, and our method. The speed of our method is acceptable and faster than GLSRNE and PCV. A parallel C++ implementation could improve our performance remarkably.

D. RESULTS OF REAL DATA
We further demonstrate the effectiveness of our method on real scanned point clouds, in which typical imperfections, such as noise, outliers, and sampling anisotropy, are common and the sharp features are usually corrupted by these imperfections. Fig. 5 shows the results of our method on the raw scans Genus2, Taichi, Nyon, and Shutter. The edges of these raw scans are preserved well. Detail 1 of Genus2 shows the results on nearby surface sheets, which always challenge normal estimation; our method recovers the features faithfully. Sampling anisotropy is ubiquitous (see detail 2 of Genus2, details 1 and 3 of Shutter, and detail 1 of Taichi and Nyon), and our method generates faithful normals for these points. Tiny structures, as shown in detail 2 of Shutter, persistently challenge normal estimation; our method copes with such cases and recovers the features.

IV. CONCLUSION
In this paper, we propose a novel method for estimating faithful normals for point clouds with multiple types of noise, sharp features, and anisotropic sampling. Our method follows the framework of GLSRNE with three improvements that make it more efficient and effective. Although our method generates quality normals, it produces jagged features on sparse point clouds, as some existing methods do. For future work, it will be interesting to try global labeling techniques to smooth out these jagged feature lines. Furthermore, an adaptive neighborhood or multiple neighborhood sizes would help to improve accuracy and reduce the number of parameters. Finally, GLSR is widely applied in other applications [39], [40]. We have shown that the time gap between FGLSR and GLSR widens as the number of data points increases. Therefore, it would be interesting to apply FGLSR to more computer vision and computer graphics applications involving large data sets.

A. PROOF OF THE CONVERGENCE
Proof: According to Algorithm 1, $z_i^{k+1}$ is the optimal solution of subproblem (9) with the other rows fixed. Hence, updating row $i$ cannot increase the objective:
$$\lambda\|E_i - x_i z_i^{k+1}\|_F^2 + \|w_i \odot z_i^{k+1}\|_2^2 + \beta\|z_i^{k+1}\|_2^2 \le \lambda\|E_i - x_i z_i^{k}\|_F^2 + \|w_i \odot z_i^{k}\|_2^2 + \beta\|z_i^{k}\|_2^2.$$
Applying this inequality successively for $i = 1, \dots, n$ and summing the contributions of all rows, we obtain
$$\lambda\|X - XZ^{k+1}\|_F^2 + \|W \odot Z^{k+1}\|_F^2 + \beta\|Z^{k+1}\|_F^2 \le \lambda\|X - XZ^{k}\|_F^2 + \|W \odot Z^{k}\|_F^2 + \beta\|Z^{k}\|_F^2,$$
which is the claim of Theorem 1.

XIUPING LIU received the B.Sc. degree from Jilin University, Changchun, China, in 1990, and the Ph.D. degree from the Dalian University of Technology, Dalian, China, in 1999. From 1999 to 2001, she conducted research as a Postdoctoral Scholar with the School of Mathematics, Sun Yat-sen University, Guangzhou, China. She is currently a Professor with the Dalian University of Technology. She is the author of two books and more than 70 articles. Her research interests include geometric processing, deep learning, and shape modeling and analysis.