3D Large-Scale Point Cloud Semantic Segmentation Using Optimal Feature Description Vector Network: OFDV-Net

Efficient semantic segmentation of large-scale 3D point clouds is a fundamental capability for real-time intelligent systems such as autonomous driving and augmented reality. High-dimensional feature vectors and complex network structures are two major constraints on utilizing large-scale point clouds. This paper proposes an optimal feature description vector network (OFDV-Net) for 3D point cloud semantic segmentation. First, a multiscale point cloud feature extraction structure is constructed to generate an initial feature description vector (IFDV). Then, the IFDV is refined by a feature selection unit to obtain the optimal feature description vector (OFDV). The OFDV encapsulates the best 3D feature set of each point and serves as the input of the deep neural network for training and testing. Finally, OFDV-Net was applied to the standard public outdoor large-scale point cloud datasets Semantic3D and NPM3D, achieving overall segmentation accuracies of 88.3% and 87.7%, respectively. Moreover, OFDV-Net requires less training time, which indicates that the algorithm can obtain high-precision semantic segmentation results on outdoor large-scale point clouds while reducing model training time.


I. INTRODUCTION
With the development of 3D sensors such as lidar and stereo cameras, point cloud acquisition has become increasingly easy and convenient. Efficient and accurate large-scale point cloud semantic segmentation is therefore particularly important and has become a substantial topic in many areas, such as geomatics and spatial information technology, navigation and positioning, computer vision, and pattern recognition [1]. In a point cloud, points are independent and lack contextual information, making it difficult to infer semantic information directly. Many traditional point cloud semantic segmentation methods first compute geometric features [2] of the point cloud based on local neighborhoods [3] and then use machine learning algorithms (random forests, support vector machines) to classify the points [4]. These methods achieve good point cloud semantic segmentation results, but their precision needs to be improved.

(The associate editor coordinating the review of this manuscript and approving it for publication was Gerardo Di Martino.)
In recent years, with the rapid development of deep learning, many attempts have been made to apply deep learning to point cloud semantic segmentation, but numerous challenges remain. The scattered nature of 3D point clouds and the lack of a clear structure akin to the regular grid arrangement of images mean that deep learning cannot be applied to point cloud segmentation directly. Previous approaches mainly transform 3D point clouds into 2D images or regular voxel grids. However, converting point clouds into 2D formats results in a loss of information, and voxelization often discards small details and leads to heavy computation and low efficiency due to the sparsity of point clouds. PointNet [5] was the first work to directly process 3D point clouds; however, it failed to capture the local features of point clouds. To overcome this drawback, PointNet++ [6] was proposed. Inspired by this algorithm, many attempts have emerged to apply deep learning to point cloud semantic segmentation. However, existing point cloud semantic segmentation methods do not make full use of features that are common in point cloud processing, such as curvature and roughness. If some semantic information with advanced prior knowledge can be extracted before neural network training, a better segmentation result can be obtained. Inspired by this idea, we propose a large-scale 3D point cloud semantic segmentation network using an optimal feature description vector (OFDV-Net). The network consists of three units. First, in the initial feature description vector (IFDV) preprocessing unit, we extract a series of features by setting different neighborhood sizes after random sampling and construct the IFDV. Then, in the feature selection unit, the IFDV is refined into the OFDV.

(VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.)
Finally, in the semantic segmentation unit, the OFDV is input into a fully connected neural network to obtain the semantic segmentation results. Our network structure is very simple but achieves high-precision semantic segmentation while reducing model training time.
The main contributions of this study are as follows:
• We designed a new network, OFDV-Net, which fully incorporates traditional feature extraction methods, feature selection methods, and deep learning for highly accurate point cloud semantic segmentation while reducing model training time. We first extract initial features from 3D points, which usually cannot be 'seen' directly by neural networks, and refine them into optimal features using feature selection methods. The OFDV is then input into a fully connected neural network to obtain the semantic segmentation results.
• We constructed the optimal feature description vector (OFDV) by refining each feature vector in the initial feature description vector (IFDV). The OFDV encapsulates the best 3D feature set of each point and can be used as the input of a deep neural network for training and testing, which can serve as a guideline for similar work.

II. RELATED WORK
Traditional point cloud semantic segmentation methods are based on features extracted from local neighborhoods and use machine learning methods to perform the segmentation. The precision of this scheme mainly depends on the quality of the extracted features. Therefore, a feature set with rich characterization capability is needed to describe the local geometry of a point's neighborhood.
Demantke et al. [4] first proposed features based on the local covariance matrix for point cloud semantic segmentation. Weinmann et al. [9] added descriptors such as verticality and height distribution on this basis. Chehata et al. [10] realized airborne point cloud semantic segmentation using random forests with 21-dimensional features. Kragh et al. [11] used a support vector machine model to perform point cloud semantic segmentation with 13-dimensional features. Thomas et al. [12] extracted multi-scale point cloud features by setting different neighborhood sizes and used random forest classifiers to achieve point cloud segmentation, demonstrating that multi-scale spherical neighborhoods of 3D point clouds can significantly improve the precision of 3D scene semantic segmentation.
With the development of deep learning, an increasing number of attempts have been made to apply it to point cloud semantic segmentation. According to the point cloud conversion form, we divide these approaches into three groups. (1) Multiview-based approaches: Inspired by 2D images, Su et al. [13] proposed an image-based multiview convolutional neural network (MVCNN), which projects 3D point clouds into 2D images from multiple perspectives and then uses 2D convolutions for classification and segmentation. Boulch et al. [14] proposed the SnapNet network, which uses a 2D neural network to process pairs of RGB and depth images to construct 3D scene images. (2) Voxelization-based approaches: Wu et al. [15] converted the 3D point cloud into voxel representations and extended 2D convolution to 3D convolution to handle classification and segmentation. Graham [16] designed a sparse convolution network and applied it to 3D point cloud segmentation. Tchapmi et al. [17] subdivided the large-scale point cloud into voxel grids and used trilinear interpolation and a conditional random field for post-processing to obtain the semantic segmentation results. Liu et al. [18] developed a 3D object classification system using a broad learning system (BLS) with a feature extractor called VB-Net.
(3) Point-based approaches: Qi et al. [5] pioneered PointNet, a deep learning network that directly processes unstructured point cloud data to obtain classification and segmentation results. However, this work failed to capture the local features of point clouds. To address this drawback, Qi et al. proposed an upgraded version, PointNet++ [6], which adds a local area division module on top of PointNet and reuses PointNet to extract local features. Li et al. [19] proposed PointCNN, a convolutional network suited to point cloud models, which uses the X-conv operator to extract permutation-invariant features, feeds them into a 2D convolutional network, and finally obtains classification results through fully connected layers and a classifier. Landrieu and Simonovsky [20] divided the input point cloud into geometrically simple shapes called superpoints and constructed a graph neural network called the superpoint graph (SPG) to handle the large-scale segmentation problem. Liu et al. [21] proposed a simple yet effective Point Context Encoding (PointCE) module to capture semantic context features of a point cloud; their module can be integrated into any point cloud segmentation network to improve its precision with only marginal overhead.
The above work has made positive contributions to deep-learning-based point cloud classification and segmentation, inspired more scholars to invest time in this field, and achieved good precision [19], [22]. However, the network structures of these methods are generally complicated, and the basic feature information of the point cloud is not fully utilized, so a longer training time is required. This paper establishes an efficient large-scale 3D point cloud semantic segmentation network, OFDV-Net, based on the optimal feature description vector. The network generates the optimal feature description vector so that the neural network has advanced semantic information before training, which reduces model training time and increases the efficiency of point cloud semantic segmentation.

III. METHODOLOGY
In this section, we mainly describe the details of our proposed OFDV-Net framework. First, we summarize the architecture of our framework; then the IFDV preprocessing unit and feature selection unit are introduced. Finally, a 6-layer fully connected neural network is employed to perform point cloud semantic segmentation.

A. OVERVIEW OF THE PROPOSED FRAMEWORK
Our proposed framework includes three stages: the IFDV preprocessing unit, the feature selection unit, and the semantic segmentation unit. As shown in Figure 1, we first subsample the raw input point cloud P_r into a sparse point cloud P_s and construct a multi-scale spherical neighborhood to extract the neighborhood features F_n. We concatenate F_n with the raw point features to form the initial feature description vector (IFDV). Next, we design a two-step feature selection unit to convert the IFDV into the optimal feature description vector (OFDV). Lastly, a 6-layer fully connected neural network is constructed to train on the input OFDV and perform high-precision large-scale point cloud semantic segmentation.
The IFDV preprocessing unit and the feature selection unit are designed to address a limitation of point-based deep learning methods. The feature vectors of point-based frameworks, e.g., PointNet and PointWise CNN, are composed of XYZ coordinates, RGB color, and normalized XYZ coordinates; they do not make full use of curvature, roughness, and other features commonly used in point cloud processing. If some semantic information with advanced prior knowledge can be extracted before neural network training, a better segmentation result can be obtained. Thus, we design the IFDV preprocessing unit to extract neighborhood features at different neighborhood sizes, and refine the IFDV into the optimal feature description vector (OFDV) through the feature selection unit. The OFDV encapsulates the best 3D feature set of each point and can be used as the input of the deep neural network for training and testing.

B. IFDV PREPROCESS UNIT

1) RANDOM SAMPLING OF POINT CLOUD
Considering the huge data volume of outdoor large-scale 3D point clouds, the raw point cloud is sampled first. Hu et al. [23] analyzed the time and memory consumption of existing point sampling approaches on outdoor large-scale point clouds and demonstrated that random sampling outperforms the others in both respects. Inspired by this, we selected random sampling to process the raw point cloud. A comparison of the point cloud data volume before and after sampling is shown in Figure 2. The large-scale point cloud is subsampled by three orders of magnitude, but its basic features are still retained.
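As a hedged sketch of this subsampling step (the array size and the keep ratio are illustrative, not taken from the paper), reducing a cloud by three orders of magnitude amounts to keeping a random 0.1% of the indices:

```python
import numpy as np

def random_subsample(points, ratio=0.001, seed=0):
    """Randomly keep a fixed fraction of the points; ratio=0.001 corresponds
    to the three-orders-of-magnitude reduction described in the paper."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(len(points) * ratio))
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]

cloud = np.random.default_rng(1).random((100000, 3))  # stand-in raw XYZ cloud
sparse = random_subsample(cloud)
print(sparse.shape)                                   # (100, 3)
```

Sampling without replacement keeps each retained point unique, and fixing the seed makes the preprocessing reproducible across runs.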

2) MULTI-SCALE FEATURE CALCULATION
Multi-scale point cloud feature calculation is mainly divided into two steps: neighborhood selection and feature extraction. Neighborhood selection can be achieved by limiting the number of neighboring points (KNN [9]) or by restricting the spatial extent of the neighborhood (spherical neighborhood [24]). The KNN and spherical neighborhood constructions under different point cloud densities are shown in Figure 3. Compared with the KNN neighborhood, the spherical neighborhood is less affected by point cloud density; thus, we adopted the spherical neighborhood for neighborhood building. Let C ⊂ R^3 be a point cloud; the spherical neighborhood of a point p_0 is defined by

ϑ_r(p_0, C) = { p ∈ C : ||p − p_0|| ≤ r },

where r ∈ R is the radius of the spherical neighborhood. During feature extraction, if the radius r is too small, the neighborhood contains too few points and the calculated features are not discriminative; if r is too large, neighborhood computation becomes inefficient and the calculated features lose their clear descriptive meaning. Considering this, we selected two surface features and six covariance matrix features with r values of 0.05 m, 0.1 m, 0.25 m, 0.5 m, and 1 m. The surface features are roughness (R) and Gaussian curvature (Gau). Roughness is the ratio of a surface patch's area to its projected area. Gaussian curvature is the ratio of the determinant of the second fundamental form, LN − M^2, to the determinant of the first fundamental form, EG − F^2, and reflects the degree of curvature of the surface. The covariance matrix can be used as the structure tensor of the 3D point cloud; it reflects the spatial distribution characteristics of the point set and effectively distinguishes urban features [23]. For a neighborhood N with centroid p̄, the covariance matrix is defined by

Cov(N) = (1 / |N|) Σ_{p ∈ N} (p − p̄)(p − p̄)^T.
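The spherical neighborhood query above can be implemented efficiently with a KD-tree. A minimal sketch with SciPy on a synthetic stand-in cloud, using the paper's five radii:

```python
import numpy as np
from scipy.spatial import cKDTree

RADII = [0.05, 0.1, 0.25, 0.5, 1.0]     # the five scales used in the paper

cloud = np.random.default_rng(0).random((2000, 3))   # synthetic stand-in cloud
tree = cKDTree(cloud)

# Spherical neighborhood of the first point at every scale; query_ball_point
# returns exactly {p in C : ||p - p0|| <= r} as a list of point indices.
neighborhoods = {r: tree.query_ball_point(cloud[0], r) for r in RADII}
sizes = [len(neighborhoods[r]) for r in RADII]
print(sizes)                            # non-decreasing: larger r, more neighbors
```

The query point itself always lies inside its own neighborhood (distance 0 ≤ r), and the neighborhoods are nested across scales, which is what makes multi-scale feature stacks consistent.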
By computing the covariance matrix of the neighborhood, the eigenvalues λ_1, λ_2, λ_3 ∈ R (λ_1 > λ_2 > λ_3) and the corresponding eigenvectors e_1, e_2, e_3 ∈ R^3 can be obtained. From the eigenvalues, we compute six covariance matrix features: omnivariance (O), planarity (P), linearity (L), surface variation (Sur), sphericity (Sp), and verticality (Ve). The detailed calculation formulas are shown in Table 1. Example visualizations of the 8 feature calculation results for a spherical neighborhood radius r of 1 m are shown in Figure 4.
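Assuming the commonly used eigenvalue-feature definitions from the literature (Table 1 itself is not reproduced in the text, so these formulas are assumptions to be checked against it), the six covariance features might be computed as:

```python
import numpy as np

def covariance_features(neighborhood):
    """Eigenvalue features of one point neighborhood (an N x 3 array).
    The formulas follow the standard literature definitions; they are
    assumptions here, since Table 1 is not reproduced in the text."""
    centered = neighborhood - neighborhood.mean(axis=0)
    cov = centered.T @ centered / len(neighborhood)
    evals, evecs = np.linalg.eigh(cov)             # ascending eigenvalues
    l3, l2, l1 = np.maximum(evals, 1e-12)          # lambda1 >= lambda2 >= lambda3
    normal = evecs[:, 0]                           # eigenvector of smallest eigenvalue
    return {
        "linearity":         (l1 - l2) / l1,
        "planarity":         (l2 - l3) / l1,
        "sphericity":        l3 / l1,
        "omnivariance":      (l1 * l2 * l3) ** (1.0 / 3.0),
        "surface_variation": l3 / (l1 + l2 + l3),
        "verticality":       1.0 - abs(normal[2]), # low for horizontal surfaces
    }

# A thin, nearly horizontal patch should score high on planarity:
patch = np.random.default_rng(0).random((200, 3)) * np.array([1.0, 1.0, 0.01])
feats = covariance_features(patch)
print(feats["planarity"], feats["linearity"], feats["verticality"])
```

Flooring the eigenvalues at a small epsilon guards against negative round-off values, which matters at the 0.05 m scale where neighborhoods can be nearly degenerate.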

C. FEATURES SELECTION UNIT
In this section, we introduce the feature selection unit, which performs two-step screening on the IFDV produced by the IFDV preprocessing unit to obtain the OFDV. The overall information flow is shown in Figure 5: the input IFDV is screened in the first step to obtain the IFDV subset T2, and T2 is then screened in the second step to obtain the OFDV. We first used the XGBoost algorithm to rank the feature importance of the IFDV and removed the features with an F-score lower than N (N = 500) to obtain the IFDV subset T2. Then, we used the sequential backward selection (SBS) algorithm to iteratively fit a random forest classifier whose hyperparameters had been tuned on the IFDV, removing the lowest-ranked feature from the remaining feature set each time. When the evaluation function reaches its best value, the feature set at that point is the optimal feature description vector (OFDV).

1) XGBOOST IMPORTANCE RANKING INITIAL SCREENING
The XGBoost (Extreme Gradient Boosting) algorithm [25] is a boosting algorithm over classification and regression tree models, derived from the gradient boosting decision tree. Compared with the gradient boosting decision tree, which uses only first-order derivative information in the optimization process, XGBoost performs a second-order Taylor expansion on the cost function and uses both the first-order and second-order derivatives, which yields good results. The XGBoost algorithm has two main characteristics: (1) good resistance to overfitting and (2) high computational efficiency. In addition, XGBoost can compute an F-score importance for each feature variable, from which a feature importance ranking is obtained. The F-score of feature i is defined by

F(i) = Σ_j (x̄_i^(j) − x̄_i)^2 / Σ_j [ 1/(n_j − 1) Σ_k (x_(k,i)^(j) − x̄_i^(j))^2 ],

where x̄_i is the average value of feature i over all samples, x̄_i^(j) is its average over category j, x_(k,i)^(j) is the value of feature i for sample k of category j, and n_j is the number of samples in category j.
In the feature screening unit, we first used the XGBoost algorithm to rank the importance of the IFDV features and removed those with an F-score lower than 500 to obtain the IFDV subset T2, as shown in Figure 6.
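The screening step ranks features by an importance score and drops everything below a cutoff. As a hedged, self-contained sketch on synthetic data, scikit-learn's RandomForestClassifier importances stand in for the paper's XGBoost F-scores, and a relative cutoff (mean importance) stands in for the paper's absolute threshold of 500:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 47-dimensional IFDV.
X, y = make_classification(n_samples=1000, n_features=47, n_informative=10,
                           random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

importances = clf.feature_importances_   # stand-in for XGBoost F-scores
threshold = importances.mean()           # illustrative cutoff (paper: F-score < 500)
t2 = [i for i, v in enumerate(importances) if v >= threshold]
print(len(t2), "of", X.shape[1], "features survive the initial screening")
```

The surviving index list plays the role of T2 and would be passed to the second-stage SBS screening.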

2) SEQUENTIAL BACKWARD SELECTION WITH ITERATIVE RANDOM FOREST FITTING
First, we took the IFDV as input, used the random forest classifier of the scikit-learn library to perform point cloud segmentation, and performed cross-validation over the following five hyperparameters: number of base classifiers (n_estimators), maximum depth (max_depth), minimum number of samples required to split an internal node (min_samples_split), minimum number of samples per leaf node (min_samples_leaf), and maximum number of features (max_features). The hyperparameter tuning results are shown in Table 2 and were used for subsequent random forest model training.
The sequential backward selection (SBS) algorithm removes one feature from the current feature set at each step, such that after the removal the evaluation function value is optimal [26]. To ensure the reliability and stability of the model, we used 11 of the 15 training stations published with the Semantic3D dataset [27] as the training set and the remaining 4 stations as the validation set. For the NPM3D dataset [28], we used 3 of the 4 published training stations as the training set and the remaining station as the validation set; the specific composition of each dataset is shown in Table 3. Following the XGBoost-based feature importance ranking, we used SBS to iteratively fit the random forest classifier with the tuned hyperparameters and finally obtained the OFDV, as shown in Figure 7.
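Under the assumption of synthetic data and illustrative hyperparameters (Table 2 is not reproduced in the text), the importance-guided SBS loop described above might be sketched as follows:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the screened feature subset T2.
X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0)

# The importance ranking decides which feature is dropped at each SBS step
# (least important first), mirroring the paper's XGBoost-guided ordering.
order = np.argsort(rf.fit(X, y).feature_importances_)

remaining = list(range(X.shape[1]))
best_subset = list(remaining)
best_score = cross_val_score(rf, X, y, cv=3).mean()
for feat in order[:-1]:                       # always keep at least one feature
    remaining.remove(feat)
    score = cross_val_score(rf, X[:, remaining], y, cv=3).mean()
    if score >= best_score:                   # keep the best subset seen so far
        best_subset, best_score = list(remaining), score
print(len(best_subset), round(best_score, 3))
```

The subset that maximizes the cross-validated score plays the role of the OFDV; in the paper the evaluation is done on held-out validation stations rather than by cross-validation folds.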

D. SEMANTIC SEGMENTATION UNIT
We took the OFDV obtained from the feature selection unit as input and constructed a 6-layer fully connected neural network to perform point cloud semantic segmentation, as shown in Figure 8. The model adopts the cross-entropy loss function, defined by

Loss = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{K} y_{ik} log(p_{ik}),

where N is the total number of training samples and K is the number of categories to be classified; y_{ik} equals 1 if the category of training sample i is k and 0 otherwise, and p_{ik} ∈ (0, 1] is the predicted probability that training sample i belongs to category k.
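The cross-entropy loss can be sketched in a few lines of NumPy; the 6-layer network itself is omitted here, and the probability matrix is a made-up example:

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy loss: probs is an (N, K) matrix of predicted class
    probabilities p_ik; labels holds the ground-truth class index of each of
    the N samples (the one-hot y_ik picks out one probability per sample)."""
    n = len(labels)
    return -np.mean(np.log(probs[np.arange(n), labels]))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
print(cross_entropy(probs, labels))     # -(log 0.7 + log 0.8) / 2 ~= 0.29
```

Because y_{ik} is one-hot, the double sum collapses to one log term per sample, which is what the fancy-indexed lookup implements.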
To evaluate the results, we used the evaluation indicators given by the Semantic3D and NPM3D official websites, namely, average intersection over union (A_IoU) and overall accuracy (OA). The metrics are computed by

IoU_i = c_ii / (c_ii + Σ_{j≠i} c_ij + Σ_{j≠i} c_ji),   OA = Σ_i c_ii / Σ_i Σ_j c_ij,

where c is an L × L confusion matrix of the semantic segmentation method and c_ij is the number of samples from ground-truth class i that are predicted as class j.
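These definitions can be checked with a small NumPy sketch (the confusion matrix values are made up for illustration):

```python
import numpy as np

def iou_and_oa(c):
    """Per-class IoU, mean IoU, and overall accuracy from an L x L confusion
    matrix c, where c[i, j] counts ground-truth class i predicted as class j."""
    tp = np.diag(c).astype(float)
    fn = c.sum(axis=1) - tp          # ground truth i, predicted as something else
    fp = c.sum(axis=0) - tp          # predicted i, ground truth something else
    iou = tp / (tp + fp + fn)
    oa = tp.sum() / c.sum()
    return iou, iou.mean(), oa

c = np.array([[50,  5],
              [10, 35]])
iou, miou, oa = iou_and_oa(c)
print(iou, miou, oa)                 # OA = 85 / 100 = 0.85
```

IoU penalizes both missed points (false negatives) and wrongly claimed points (false positives), which is why a class can have high accuracy but low IoU when it over-absorbs neighboring classes.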

IV. MATERIALS AND EXPERIMENTS

A. DATASETS AND CONFIGURATION
To verify the effectiveness and robustness of the proposed OFDV-Net, we used two large-scale standard public point cloud datasets, Semantic3D [27] and NPM3D [28], to evaluate the approach, as shown in Table 4. The ground-truth visualizations of the files are shown in Figure 9. All experiments were implemented with Python 3.6 on the TensorFlow platform; the hardware environment was a workstation with an Intel Xeon W-2145 @ 3.70 GHz CPU, 64 GB RAM, and an NVIDIA TITAN RTX 24 GB GPU. The Semantic3D benchmark dataset is composed of dense point clouds obtained by a static terrestrial laser scanner. We used 11 of the 15 stations with ground-truth training data as the training dataset and 4 stations as the validation dataset to adjust the parameters of our model; we then tested on the 4 online test stations without ground truth to obtain the predicted labels, and uploaded the labels to the Semantic3D website to obtain the segmentation precision. The Semantic3D dataset consists of 8 semantic classes, and the distribution of each class is shown in Table 5.
The NPM3D benchmark dataset is generated by a mobile laser scanning system that accurately scanned two cities in France (Paris and Lille). We used 3 of the 4 stations with ground truth as the training data and 1 station as the validation dataset to adjust the parameters of our model. We then tested on the 3 online test stations without ground truth to obtain the predicted labels and uploaded them to the NPM3D website to obtain the segmentation precision. The NPM3D dataset consists of 9 semantic classes, and the specific distribution of each class is shown in Table 6.

B. EXPERIMENTS AND ANALYSIS
First, we constructed a multi-scale spherical neighborhood for the Semantic3D and NPM3D datasets, calculated the 8 features at the 5 scale radii, and added the basic coordinate information X, Y, Z, the color information R, G, B, and the intensity information. These 47-dimensional features constitute the IFDV. We then ranked the IFDV features by XGBoost importance; the feature importance scores are shown in Figure 10.
As Figure 10 shows, the F-score of surface variation at r = 1 m in the NPM3D dataset (107) is less than a quarter of that of surface variation at 0.25 m (515) and nearly 19 times lower than the linearity score at 1 m (1971), indicating that the importance of a feature differs significantly across scales. According to the F-scores in Figure 10, we eliminated features with F-scores below 500 to construct the IFDV subset T2, and then sequential backward selection with random forest (SBS-RF) was used to fine-screen T2. The relationship between accuracy and the number of features in the feature set is shown in Figure 11: in the Semantic3D dataset, the precision peaks at 22 features (83.80%), and in the NPM3D dataset at 17 features (80.21%). These feature subsets are carried forward as the OFDV.
The specific features contained in the IFDV, T2, and OFDV were analyzed and compared; the IFDV is the raw feature set containing the 8 features at 5 scale radii. The feature types and scale information of T2 and the OFDV are shown in Table 6.
For the Semantic3D dataset, in the T2 obtained by the first-step XGBoost screening, all features with a spherical radius of 1 m were retained, as was the verticality at every scale. In the OFDV, the verticality at every scale was retained, and only the surface variation at the 1 m radius was eliminated. We conclude that verticality is highly important for this dataset; therefore, its importance was enhanced in the OFDV by multiplying the verticality values by two. For the NPM3D dataset, in the T2 obtained by the first-step XGBoost screening, all features at the 0.05 m radius were eliminated, mainly because the NPM3D point cloud is sparser than Semantic3D, so most features at the 0.05 m radius cannot be computed. Compared with the other features, linearity is retained at all four remaining radii, so its importance is enhanced in the OFDV by multiplying the linearity values by two.
To compare models trained on different feature subsets, the IFDV, T2, and OFDV were each used as the input of the neural network constructed in Section III-D. The trained models were then tested on the test dataset, and the test labels were uploaded to the official websites of the two benchmarks to obtain the final results. The results on the Semantic3D dataset are shown in Table 8, and the segmentation results of OFDV-Net on Semantic3D are visualized in Figure 12.
According to Table 8, after the feature screening module, the accuracy is significantly improved, and the OFDV achieves the highest accuracy. The main reasons are as follows: (1) The optimal features obtained after feature selection characterize the real scene better than the initial features. As can be seen in Figure 5, not all features are discriminative, and the features retained by the feature selection unit were shown to have strong characterization capability. (2) If semantic information with advanced prior knowledge is extracted before neural network training, a better segmentation result can be obtained; the optimal features therefore achieve higher segmentation accuracy thanks to their better prior knowledge.
Comparing the per-class metrics, our method performs best on three classes: ''man-made terrain'', ''high vegetation'', and ''building''. The segmentation accuracy for these three classes was above 94%, and the accuracy for high vegetation reached 97.74%. The main reasons are as follows: (1) The elevation and intensity information are highly discriminative for these three classes, and the multi-scale features calculated with multi-scale spherical neighborhoods further enhance the model's ability to recognize them. (2) These three classes contain relatively many points compared with the other classes, so during neural network training more parameters can be devoted to distinguishing them; in addition, misclassifying the same number of points has little effect on the accuracy of these large classes. Our method performs poorly on ''artefacts'', mainly because the artefact training samples were too few for the network to learn their differences from the other classes. Although ''low vegetation'' and ''hardscape'' have sufficient training samples compared with artefacts, our method also segments these two classes poorly, mainly because their multi-scale features are highly similar to those of ''natural terrain'' and ''building''. Low vegetation and hardscape were often mistakenly classified as natural terrain and building, which reduced the segmentation precision.
To verify the efficiency of the proposed method, the time required to train on each of the three feature sets was measured. As shown in Table 9, training on the IFDV took about 80 minutes, whereas the OFDV obtained by feature screening completed training in about 50 minutes, a 37% efficiency gain over the IFDV.
We systematically evaluated the overall efficiency of OFDV-Net on real-world large-scale point clouds for semantic segmentation. In particular, we evaluated OFDV-Net on the Semantic3D-reduced dataset, measuring the total time consumption of our framework, and measured the time consumption of recent representative works on the same dataset. Note that all experiments were conducted on the same machine with an Intel Xeon W-2145 @ 3.70 GHz CPU and an NVIDIA TITAN RTX 24 GB GPU. Table 10 quantitatively shows the preprocessing time, training time, and total time of the different approaches. It can be seen that (1) SPG has an expensive preprocessing cost due to its geometric partitioning and super-graph construction; (2) RandLA-Net takes the least time to preprocess the point cloud but is still very time-consuming to train; and (3) our OFDV-Net has the lowest training time (3097 seconds on the Semantic3D-reduced dataset).
To compare the segmentation results of OFDV-Net with others on the Semantic3D dataset, the semantic segmentation results based on the IFDV, the IFDV subset, and the OFDV are denoted IFDV-Net, IFDVSub-Net, and OFDV-Net, respectively. The comparison in Table 11 shows that our method achieves the highest overall segmentation accuracy and average intersection over union (IoU).
As shown in Table 12, the overall accuracy and average IoU of OFDV-Net on the NPM3D dataset reached 87.7% and 47.5%, respectively. The visualization results for the NPM3D dataset are shown in Figure 12.

V. CONCLUSION
In this paper, we proposed a semantic segmentation network, OFDV-Net, for large-scale 3D point clouds. OFDV-Net consists of three components. In the first unit, the initial feature description vector is constructed by setting different neighborhood sizes after random sampling; the IFDV encapsulates the multiscale neighborhood features of each point. In the second unit, the IFDV is refined into the OFDV through a two-step feature selection method, which yields the optimal features of each point. Finally, in the semantic segmentation unit, the OFDV is input into a fully connected neural network to obtain the semantic segmentation results. Experimental results on two public outdoor large-scale point cloud datasets (Semantic3D and NPM3D) show that OFDV-Net obtains better semantic segmentation results. The overall accuracies for the Semantic3D and NPM3D datasets were 88.3% and 87.7%, and the average intersection over union values were 57.3% and 47.5%, respectively. Compared with existing deep learning networks, the overall accuracy was 2.1% higher, and it was 1.7% higher than before feature screening, while the training time was shortened by 37%.
However, several problems remain to be solved in our network. Although our method obtains high-precision segmentation results while reducing model training time, the IFDV preprocessing unit is computationally heavy because of the multiscale spherical neighborhood queries and the multi-feature calculations. In the future, it is worth considering how to accelerate the preprocessing stage of our network. In addition, the convolutional layer is not well suited for deep feature extraction in our segmentation unit, so optimizing the neural network structure to achieve better semantic segmentation results is a worthwhile direction for further research.
JIAN LI received the Ph.D. degree in photogrammetry and remote sensing from the School of Remote Sensing and Information Engineering, Wuhan University, China, in 2012. He is currently an Associate Professor with the School of Geo-Science and Technology, Zhengzhou University, where he is also the Director of the Geographic Information System Department. His research interests include 3D point cloud data processing and application, UAV remote sensing applications, and artificial intelligence and deep learning applications in geoscience research.