Self-supervised Segmentation for Terracotta Warrior Point Cloud (EGG-Net)

At present, our team focuses on research into cultural relic restoration and fragment splicing. While studying terracotta warrior splicing, we found that the existing calibrated fragment data is too scarce for the related research. We therefore need to calibrate and segment the different parts of intact terracotta warrior data and extract the data needed for future work. However, we are currently short of human resources, and manual calibration would take so much time that it would hinder that work. We thus hope to design a method that automatically calibrates the terracotta warrior dataset from a small amount of calibrated data. Existing 3D neural network research mainly focuses on supervised classification and segmentation and on unsupervised reconstruction. We could not find enough schemes to refer to, and the existing methods do not perform well on our terracotta warrior dataset. Therefore, in this article, we propose EGG-Net to solve this problem. EGG-Net is an end-to-end self-supervised model consisting of three modules. The first module is an encoder based on dynamic graphs and edge convolution, which extracts point cloud features. The second module, called the segmenter, is based on a multi-layer perceptron; it assigns a label to each point and thereby segments the point cloud. The third module is a point refinement process, which uses superpoints to adjust the cluster labels estimated by the neural network. EGG-Net is trained by backpropagation through this third module. We evaluate EGG-Net on the terracotta warrior data and ShapeNet Part by measuring accuracy and latency. The experimental results show that EGG-Net outperforms the state-of-the-art methods.


I. INTRODUCTION
Nowadays, our team focuses on cultural relics restoration and fragment splicing research. In the terracotta warrior fragment study, the main problem is the lack of calibrated fragment data, and it is not easy to conduct further research without enough calibrated data. However, we are currently short of human resources. If we want to calibrate the terracotta warrior dataset manually, it will take much time, which will bring trouble to our future work. Therefore, we hope to design a method to achieve automatic calibration on the terracotta warrior dataset with a small amount of calibrated data.
To improve the efficiency of part labeling on terracotta warrior models, we propose a self-supervised method called EGG-Net for the terracotta warrior dataset, which is based on a convolutional neural network. Our end-to-end model can automatically segment an intact terracotta warrior into different parts (hands, heads, feet, and others) from only a tiny number of labeled segmentation examples. EGG-Net significantly improves efficiency and accuracy compared with traditional manual labeling and state-of-the-art methods.
Our terracotta warrior dataset is saved in the OBJ format, which was created by Wavefront Technologies. OBJ is an open data format that is widely used by other 3D graphics application providers. It is a simple format that represents only 3D geometry, such as the position of each vertex, the UV position of each texture coordinate vertex, vertex normals, faces, and texture vertices. By default, vertices are stored in counter-clockwise order, so there is no need to declare face normals explicitly. OBJ coordinates have no units, but OBJ files can contain scale information in a human-readable form.
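As a rough illustration of the format, the sketch below reads vertex positions (`v` records) and vertex normals (`vn` records) from OBJ text. The function name `parse_obj_points` and the sample data are hypothetical and not part of our pipeline; texture coordinates, faces, and all other records are simply skipped.

```python
def parse_obj_points(text):
    """Collect vertex positions ('v') and vertex normals ('vn') from OBJ text."""
    vertices, normals = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        if parts[0] == "v":
            vertices.append(tuple(float(c) for c in parts[1:4]))
        elif parts[0] == "vn":
            normals.append(tuple(float(c) for c in parts[1:4]))
        # 'vt', 'f' and any other records are ignored in this sketch
    return vertices, normals


# Hypothetical sample with two vertices and one normal.
sample = """# tiny example
v 0.0 1.0 2.0
v 1.5 0.0 -1.0
vn 0.0 0.0 1.0
vt 0.5 0.5
f 1//1 2//1 1//1
"""
vertices, normals = parse_obj_points(sample)
print(len(vertices), len(normals))
```

Dropping the face records in this way is what turns a mesh into a bare point cloud with optional per-point normals.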
VOLUME 4, 2016

FIGURE 1. Simplified Terracotta Warrior Point Clouds
To decide which data format to study, we compare different data formats. Recently, much research has focused on voxelizing the point cloud so that the points are evenly distributed in a regular 3D space and then applying a 3D-CNN to the result. However, voxelization brings high space and time complexity. Besides, quantization errors may arise during voxelization, which results in low accuracy. Compared with other data formats, the point cloud is a data structure well suited to 3D scene computation on terracotta warrior data. We therefore choose to segment terracotta warrior data in the form of point clouds (see samples in Fig. 1).
We think that annotating the different parts of terracotta warrior data involves two steps. The first step is to segment the 3D models. The second step is to add labels to the different parts. We can therefore regard this problem as a segmentation problem. Our terracotta warrior data is in the form $\{x_n \in \mathbb{R}^p\}_{n=1}^{N}$, where $\mathbb{R}^p$ is the feature space, $x_n$ denotes the features of one point (such as the XYZ coordinates and the normal vector), and $N$ is the number of points in one terracotta warrior 3D object.
Our goal is to design a function $f : \mathbb{R}^p \to L$, where $L$ denotes the set of segmentation labels and $\{c_n \in L\}_{n=1}^{N}$, so that $c_n$ is the label of point $n$ after segmentation. Different from our previous work SRG-Net, only a few $c_n$ are fixed; the mapping function $f$ is trainable, and the remaining $c_n$ change according to $f$. To solve the self-supervised segmentation problem, we split it into two parts. Firstly, to predict the optimal labels, we need to design a network that extracts features and uses them to segment the point cloud. Secondly, we need to design an appropriate loss model to evaluate the predicted segmentation results and train the network.
In designing the neural network, we find that an auto-encoder can help extract the global features of the point cloud. In addition, we find that a dynamic graph can learn the local features well. Inspired by these two methods, we design EGG-Net so that it learns the features of terracotta warrior data better. In Section III, we describe the design of the network in detail.
In designing the loss, we propose a method to evaluate the segmentation results. We think that a good point cloud segmentation should work the way human beings do. Points with similar semantic features are more likely to be classified as the same kind of points. In 2D images, pixels with similar color and texture are generally considered spatially continuous; in 3D space, we consider points with similar normal vectors, color, and texture to be spatially continuous. In addition, the Euclidean distance between points with the same label should not be very long. To sum up, we think that an excellent segmentation result $\{c_n\}$ has the following two characteristics:
• Points with similar spatial features are desired to be given the same label.
• The Euclidean distance between spatially continuous points should not be very long.
Inspired by [1], we combine the segmentation result predicted by EGG-Net with the superpoints in the refining process, where the superpoints meet the above two requirements. We then calculate the loss between the refined result and the segmentation predicted by the neural network. Please refer to Section III-B for details.
In Section IV, we compare EGG-Net with other methods and show the superiority of our method both visually and quantitatively.
To sum up, the critical contributions of our work are as follows:
• Inspired by the dynamic graph and the auto-encoder structure, we propose EGG-Net to learn local and global features with lower latency and higher accuracy.
• We propose a new loss model suitable for self-supervised segmentation of 3D point clouds to obtain more accurate results.
• Our end-to-end model achieves good results not only on the terracotta data but also on the ShapeNet Part dataset.

II. RELATED WORK
Segmentation is a typical task in both 2D image and 3D point cloud processing. In image processing, segmentation assigns labels to all pixels in an image and clusters them by their features. Similarly, point cloud segmentation assigns labels to all points in the point cloud, and the expected result is that points with similar characteristics receive the same label. K-means is a classical segmentation method in both 2D and 3D. It divides N observations into K clusters, assigning each observation to the cluster with the nearest mean, and is popular in data mining. Graph-based methods, such as Prim and Kruskal [2], are another popular family; they realize segmentation through simple greedy decisions. The methods above focus on global rather than local differential features, so they cannot obtain satisfactory results in complex contexts. Among self-supervised deep learning methods, many learn features with generative approaches, such as [3]-[5]. They follow the model of neuroscience, where each neuron represents a specific semantic meaning. Meanwhile, CNNs are widely used in supervised and unsupervised image segmentation. For example, in [1], Kanezaki combines the superpixel method [6] with a CNN and employs superpixels in backpropagation to tune the unsupervised segmentation results. Besides, [7] uses a spatial continuity loss as an alternative to settle the limitation of the former work [6]; this method is also quite valuable for 3D point cloud feature learning.
In the field of 3D point cloud segmentation, state-of-the-art 2D image methods are not suitable for direct use on point clouds. A 3D point cloud segmentation method needs to understand both the global features and the geometric details of each point. We can classify 3D point cloud segmentation problems into semantic segmentation, instance segmentation, and object segmentation. Semantic segmentation works at the scene level, instance segmentation works at the object level, and object segmentation works at the part level.
Semantic segmentation aims at separating a point cloud into several parts according to the semantic meaning of each point. There are four main semantic segmentation paradigms: projection-based methods, discretization-based methods, point-based methods, and hybrid methods. Projection-based methods project a 3D point cloud to 2D images, such as multi-view [8], [9] and spherical [10], [11] images. Discretization-based methods usually convert a point cloud into a discrete representation, such as volumetric [12] and sparse permutohedral lattices [13], [14]. Instead of learning a single feature on 3D scans, several methods try to learn different parts from 3D scans, such as [13]-[15].
Point-based networks can directly learn features on a point cloud and separate it into several parts. Point clouds are irregular, unordered, and unstructured. PointNet [16] can directly learn features from the point cloud and retains permutation invariance through a symmetric function such as the maximum or the summation function. PointNet learns point-wise features by combining several MLP layers with a max-pooling layer. It is a pioneer in learning directly on the point cloud, and a series of point-based networks has been proposed on top of it. However, PointNet can only learn features of each point rather than of the local structure. PointNet++ is therefore presented to capture local structure from the neighborhood with a hierarchical network [17]. PointSIFT [18] is proposed to encode orientation and reach scale awareness. Instead of using K-means to cluster and KNN to generate neighborhoods, as in the grouping method of PointNet++, PointWeb [19] is proposed to capture the relations between all the points of a local fully-connected web. Among convolution-based methods, RS-CNN takes a local point cloud subset as its input and maps the low-level relation to a high-level relation to learn the features better. PointConv [20] uses a Monte Carlo estimation to define the convolution. PointCNN [21] uses the χ-conv transformation to convert the point cloud into a latent and canonical order. Among point convolution methods, the Parametric Continuous Convolutional Neural Network (PCCN) [22] is based on parametric continuous convolution layers, whose kernel function is parameterized by MLPs and spans the continuous vector space. Graph-based methods can better learn features such as shapes and geometric structures in point clouds.
Graph Attention Convolution (GAC) [23] can learn several relevant features from local neighborhoods by dynamically assigning attention weights to points in different neighborhoods and feature channels. Dynamic Graph CNN (DG-CNN) [24] constructs several dynamic graphs in the neighborhood, concatenates the local and global features to extract better features, and dynamically updates each graph after each layer of the network. FoldingNet uses the auto-encoder structure to encode the N × 3 point cloud to a 1 × 512 code and decode it to M × 3, with the aid of a chamfer loss, to construct the auto-encoder network.
Part segmentation is more complex than semantic and instance segmentation because there are significant geometric differences between points with the same label, and the number of parts with the same semantic meaning may differ. Z. Wang et al. [25] propose VoxSegNet to achieve promising part segmentation results on 3D voxelized data; it presents a Spatial Dense Extraction (SDE) module to extract multi-scale features from volumetric data. Synchronized Spectral CNN (SyncSpecCNN) [26] is proposed to achieve fine-grained part segmentation on irregular and non-isomorphic shape graphs with convolution. [27] is proposed to segment unorganized noisy point clouds automatically by extracting clusters of points on the Gaussian sphere. [28] uses three shape indexes based on a fuzzy parameterization: the smoothness indicator, the shape index, and the flatness index. [29] presents a segmentation method for conventional engineering objects based on local estimation of various geometric features. The Branched AutoEncoder network (BAE-NET) [30] is proposed to perform unsupervised and weakly-supervised 3D shape co-segmentation. Each branch of the network learns the features of one particular part shape, with a representation based on the auto-encoder structure.

III. PROPOSED METHOD
In this paper, our input data for the terracotta warrior is in the form of 3D point clouds (see samples in Fig. 1). Point cloud data is represented as a set of 3D points $\{P_i \mid i = 1, 2, 3, \ldots, n\}$, where each point is a vector containing the coordinates x, y, z and other features such as normal and color. Our method contains three steps:
1) If the point cloud only has the three-dimensional coordinates x, y, z, we need to estimate the normal vectors from xyz.
2) We use our pointwise CNN, called EGG-Net, to perform self-supervised segmentation of the point cloud.
3) We design a refinement process to calculate the loss and use it for back-propagation.
There are many effective normal vector estimation methods, such as [31], which uses integral images for efficient boundary and covariance estimation, and [32] [33] [34] [35], which use neural networks for the estimation. In our method, we use the simplest method [36] because it has lower time complexity and good accuracy.
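To make the normal estimation step concrete, the following is a minimal sketch of one common simple approach: fitting a plane to the k nearest neighbours of each point with PCA and taking the direction of least variance as the normal. This is an assumption made for illustration and is not necessarily the method of [36]; the function name `estimate_normals`, the value of `k`, and the test grid are hypothetical.

```python
import numpy as np


def estimate_normals(points, k=8):
    """Estimate one unit normal per point from its k nearest neighbours (PCA plane fit)."""
    points = np.asarray(points, dtype=float)
    # Brute-force pairwise squared distances; fine for a small illustrative cloud.
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    knn = np.argsort(dist2, axis=1)[:, :k]  # each row includes the point itself
    normals = np.empty_like(points)
    for i, idx in enumerate(knn):
        cov = np.cov(points[idx].T)          # 3x3 covariance of the neighbourhood
        _, eigvecs = np.linalg.eigh(cov)     # eigenvalues come in ascending order
        normals[i] = eigvecs[:, 0]           # direction of least variance
    return normals


# Hypothetical test cloud: a noise-free 5 x 5 grid on the plane z = 0.
grid = np.array([[x, y, 0.0] for x in range(5) for y in range(5)])
normals = estimate_normals(grid, k=8)
```

The sign of each estimated normal is ambiguous in this sketch; a real pipeline would typically orient the normals consistently, for example towards the scanner position.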

A. EGG-NET
Inspired by dynamic graph and auto-encoder, we propose our EGG-Net. Unlike the classical graph CNN, our graph layer is dynamic and auto-updated at every layer of the network. Compared with the methods that only focus on the relationship between points, we also propose an auto-encoder structure to re-express the features of the whole point cloud, aiming at learning the whole structure of the point cloud and learning from a small number of samples. The structure of our network is shown in Figure 2. It consists of two parts. The first part is an encoder that generates features from the dynamic graph and the whole point cloud, and the second part is a decoder segmentation network. We can also call it segmenter.
Next, we explain the symbols used in the paper. We denote the point cloud as S. We use lower-case letters to represent vectors, such as x, and upper-case letters to represent matrices, such as A. We call a matrix m × n if it has m rows and n columns. The terracotta warrior point cloud data consists of N points with 6 features $x, y, z, N_x, N_y, N_z$ (the xyz coordinates and the normal values). We also denote each point as x, so $X = \{x_1, x_2, x_3, \ldots, x_n\} \subseteq \mathbb{R}^6$.

1) Encoder Architecture
The EGG-Net encoder follows a design similar to that of [37]; its structure is shown in Fig. 2. Compared with [37], our encoder concatenates several multi-layer perceptrons (MLP) and several dynamic graph-based max-pooling layers. The dynamic graphs are constructed by applying KNN to the point clouds. Different from our previous work [38], we removed the STN module because we found that this module increased the latency greatly but did not have a great impact on the accuracy of the experiment 1.
We compute a spatial transformer network for the entire point cloud and obtain a 3 × 3 transformation matrix to maintain invariance under transformations. For the transformed point cloud, we then compute three dynamic graphs and obtain the corresponding graph features. In the graph feature extraction process, we adopt the edge convolution of [4] to compute the graph feature of each layer. It uses the asymmetric edge function of Eq. (1), which combines the coordinates of the neighborhood center $x_i$ with the difference $x_i - x_j$ between the center point and a neighborhood point to capture both local and global information of the neighborhood. We then define our operation in Eq. (2), where µ and ω are parameters and Θ is a ReLU function; Eq. (2) is implemented as a shared MLP with Leaky ReLU. Finally, we define our max-pooling operation in Eq. (3), where $N(i)$ denotes the neighborhood of point $i$.
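The edge convolution and max-pooling described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not our PyTorch implementation: the helper names `knn_indices` and `edge_conv`, the weight shapes, the Leaky ReLU slope, and the random data are all hypothetical, and the edge feature is assumed to be a linear map of $x_i - x_j$ plus a linear map of $x_i$, as in standard dynamic-graph edge convolution.

```python
import numpy as np


def knn_indices(feats, k):
    """Indices of the k nearest neighbours of every row (excluding the row itself)."""
    diff = feats[:, None, :] - feats[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist2, np.inf)
    return np.argsort(dist2, axis=1)[:, :k]


def edge_conv(feats, k, w_local, w_center, slope=0.2):
    """One edge-convolution layer: shared linear map, Leaky ReLU, max over neighbours."""
    idx = knn_indices(feats, k)                      # (n, k) graph built from current features
    center = feats[:, None, :]                       # (n, 1, d): x_i
    offset = center - feats[idx]                     # (n, k, d): x_i - x_j
    edge = offset @ w_local + center @ w_center      # (n, k, d_out), weights shared by all edges
    edge = np.where(edge > 0, edge, slope * edge)    # Leaky ReLU
    return edge.max(axis=1)                          # max-pooling over the neighbourhood N(i)


rng = np.random.default_rng(0)
cloud = rng.normal(size=(32, 6))                     # hypothetical cloud: xyz + normals
layer1 = edge_conv(cloud, k=8,
                   w_local=rng.normal(size=(6, 16)),
                   w_center=rng.normal(size=(6, 16)))
# The second layer rebuilds its KNN graph from layer1, which is what makes the graph dynamic.
layer2 = edge_conv(layer1, k=8,
                   w_local=rng.normal(size=(16, 8)),
                   w_center=rng.normal(size=(16, 8)))
```

Because the neighbours are recomputed in feature space at every layer, points that are far apart in xyz can still become neighbours in later layers.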
The graph feature extraction layer computes the bottleneck. The structure is shown in Fig. 2. First, we compute the 3 × 3 covariance matrix for every point and vectorize it to 1 × 9. The n × 3 matrix of point coordinates is then concatenated with the n × 9 covariance matrix into an n × 12 matrix. We feed this matrix into a 3-layer perceptron and pass the output of the perceptron to two subsequent graph layers. In each graph layer, max-pooling is applied to the neighborhood of each node. At last, we apply a 3-layer perceptron to the former output and obtain the final output. The whole process of the graph feature extraction layer is summarized in Eq. (4), where X is the input matrix to the graph layer and K is a feature mapping matrix. $I_{max}(X)$ is given in Eq. (5), where Θ is a ReLU function and $N(i)$ is the neighborhood of point $i$. The max-pooling operation in Eq. (5) extracts local features based on the graph structure. The graph feature extraction layer can therefore capture not only local neighborhood features but also global features.
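The construction of the n × 12 input matrix can be sketched as below. The text does not fix how the per-point covariance is computed, so this sketch assumes it is taken over the k nearest neighbours of each point; the function name `covariance_features`, the value of `k`, and the random points are hypothetical.

```python
import numpy as np


def covariance_features(xyz, k=8):
    """Return an (n, 12) matrix: xyz coordinates plus the flattened local 3x3 covariance."""
    xyz = np.asarray(xyz, dtype=float)
    diff = xyz[:, None, :] - xyz[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    knn = np.argsort(dist2, axis=1)[:, :k]                     # (n, k), includes the point itself
    nbrs = xyz[knn]                                            # (n, k, 3)
    centered = nbrs - nbrs.mean(axis=1, keepdims=True)
    cov = np.einsum("nki,nkj->nij", centered, centered) / k    # (n, 3, 3)
    return np.concatenate([xyz, cov.reshape(len(xyz), 9)], axis=1)


pts = np.random.default_rng(1).normal(size=(20, 3))           # hypothetical points
feats12 = covariance_features(pts, k=6)
print(feats12.shape)
```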

2) Segmenter Architecture
The segmenter takes the dynamic graph features and the bottleneck as input and assigns a label to each point to segment the whole point cloud. The structure of the segmenter is shown in Fig. 2. First, the bottleneck is replicated N times as in Eq. (6), where N is the number of points in the point cloud and B is the bottleneck. The output of the replication is concatenated with the dynamic features as in Eq. (7), where $D_1$, $D_2$, $D_3$ represent the dynamic graph features. At last, we feed the output of the concatenation to a multi-layer perceptron to segment the point cloud as in Eq. (8), where Ψ and Ω represent the parameters of the linear function and Θ represents a ReLU function.
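One way to write the three segmenter steps described above is sketched below. This is a plausible form consistent with the surrounding definitions rather than a verbatim statement of Eqs. (6)-(8): the symbols $B_N$, $F$, $f_n$, $y_n$, and $d_B$ are introduced here for illustration only, and a single perceptron layer is shown where the segmenter stacks several.

```latex
\begin{align}
B_N &= \mathbf{1}_N B, \qquad B \in \mathbb{R}^{1 \times d_B},\quad B_N \in \mathbb{R}^{N \times d_B} \tag{6}\\
F &= \left[\, B_N,\; D_1,\; D_2,\; D_3 \,\right] \tag{7}\\
y_n &= \Theta\!\left(\Psi f_n + \Omega\right), \qquad n = 1, \ldots, N \tag{8}
\end{align}
```

Here $\mathbf{1}_N$ is an $N \times 1$ column of ones, so every point receives the same global bottleneck vector, and $f_n$ is the $n$-th row of $F$ written as a column vector.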

B. REFINEMENT
To optimize the results estimated by the neural network and carry out back-propagation, we design a point refinement process that achieves better segmentation results.
In this section, we describe how we train our network for self-supervised point segmentation. We can divide this problem into two sub-problems:
1) Estimate the cluster labels using the existing network parameters.
2) Use the cluster labels estimated by the neural network to train the neural network.
For the first sub-problem, we use the auto-encoder network of Section III-A1 and Section III-A2 to implement the forward process. The second sub-problem is a backward process based on gradient descent. Next, we describe the second sub-problem in detail.
We need to calculate the loss of the network predicted labels and refine the predicted labels in self-supervised segmentation. In point cloud segmentation, we assume that points assigned the same label are spatially continuous (just as clusters of image pixels should be spatially continuous in 2D images). To better cluster the point cloud, we add additional restrictions on the points in a neighborhood. First, we use the region growing method to extract $K'$ superpoints $S_k$ from the input point cloud. Since our neural network can learn local and global features, we do not need to set $K'$ very large in EGG-Net. To reduce the time complexity, we choose $K' = K$ to calculate the superpoints, where $K$ is generally set to the number of segments in the few-shot samples. Then we give all the points in one superpoint the same label. Among the cluster labels estimated by the neural network, we select the most frequent cluster label $c_{max}$, where $|c_{max}|_{n \in S_k} \geq |c_n|_{n \in S_k}$ for all $c_n \in \{1, \ldots, q\}$. The cluster labels are replaced by $c_{max}$ for $n \in S_k$; these are called the refined predicted labels.
The seed-region-growing method used to calculate the superpoints is described in detail below. Unlike 2D images, not all point cloud data has features such as color and normal. For example, our terracotta warrior 3D object does not have any color feature. Normal vectors can be calculated from the point coordinates, as described in Section III. It is worth noting that the color feature in 2D and the normal feature in 3D have both similarities and differences. For the color feature in a 2D image, if pixels are semantically continuous, the color in the neighborhood generally does not change. For the normal feature of a 3D point cloud, in contrast, the normal values of points within a neighborhood often differ. However, even though the points in a neighborhood have different normal values, these values usually do not change much unless the points are not semantically continuous. We therefore use seed-region-growing to cluster the point cloud and obtain the superpoints.
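The label refinement step, in which every point of a superpoint receives the most frequent predicted label $c_{max}$ of that superpoint, can be sketched as follows. The function name `refine_labels` and the toy inputs are hypothetical; ties are broken in favour of the label seen first, which is an arbitrary choice of this sketch.

```python
from collections import Counter


def refine_labels(pred_labels, superpoint_ids):
    """Replace each predicted label by the most frequent label within its superpoint."""
    votes = {}
    for label, sp in zip(pred_labels, superpoint_ids):
        votes.setdefault(sp, Counter())[label] += 1
    winner = {sp: counts.most_common(1)[0][0] for sp, counts in votes.items()}
    return [winner[sp] for sp in superpoint_ids]


pred = [0, 0, 1, 2, 2, 2, 1]          # hypothetical network predictions c_n
superpoints = [0, 0, 0, 1, 1, 1, 1]   # hypothetical superpoint index per point
refined = refine_labels(pred, superpoints)
print(refined)
```

The refined labels then serve as the targets against which the network predictions are compared when the loss is computed.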
First, we apply KNN to the point cloud to obtain the nearest neighbors of each point. Then we initialize a random point as the starting seed and add it to the list of available points to start the algorithm. We take the first seed from the available list and examine the points in its neighborhood. If the difference in normal values and the Euclidean distance are within the thresholds we set, we consider the two points semantically continuous and group them into one cluster. The outline of SRG is given in Algorithm 1.
Algorithm 1 (excerpt)
    if N(a, neighbour) ≤ e_th ∧ neighbour ∉ S then
        append neighbour to S
        remove neighbour from neighbours
    else
        remove neighbour from neighbours
    add a to S
After obtaining the network predicted labels and the refined predicted labels, we calculate the loss between them, and we iterate this process T times to obtain the final prediction of cluster labels $r_n$.
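The seed-region-growing procedure described above can be sketched in plain Python as follows. This is an illustrative sketch, not a transcription of Algorithm 1: the function name `region_growing`, the thresholds `dist_th` and `angle_th`, and the toy data are hypothetical, normals are assumed to be unit vectors, and seeds are visited in index order rather than at random so that the result is deterministic.

```python
import math
from collections import deque


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def region_growing(points, normals, k=2, dist_th=1.5, angle_th=0.3):
    """Group points into superpoints by growing regions over k-nearest-neighbour links."""
    n = len(points)
    neighbours = []
    for i in range(n):  # brute-force KNN, adequate for a small illustrative cloud
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: euclidean(points[i], points[j]))
        neighbours.append(order[:k])

    labels = [-1] * n   # -1 marks a point not yet assigned to a superpoint
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        seeds = deque([start])
        while seeds:
            a = seeds.popleft()
            for b in neighbours[a]:
                if labels[b] != -1:
                    continue
                cos = sum(p * q for p, q in zip(normals[a], normals[b]))
                angle = math.acos(max(-1.0, min(1.0, cos)))
                if euclidean(points[a], points[b]) <= dist_th and angle <= angle_th:
                    labels[b] = current   # semantically continuous: same superpoint
                    seeds.append(b)
        current += 1
    return labels


# Six collinear points; the normal direction changes between the third and fourth point.
pts = [(float(x), 0.0, 0.0) for x in range(6)]
nrm = [(0.0, 0.0, 1.0)] * 3 + [(1.0, 0.0, 0.0)] * 3
superpoint_of = region_growing(pts, nrm)
```

In this toy example the points are evenly spaced, so only the normal criterion separates the two superpoints.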
Unlike general supervised learning, where the target labels are fixed, we need to perform batch normalization on each dimension to obtain reasonable labels $r_n$. In parameter adjustment, we found that setting the learning rate to 0.1 and the momentum to 0.9 gives the best results. For a comparison of different parameters, please see Section IV.

IV. EXPERIMENTS
We conduct experiments on the terracotta warrior dataset and the ShapeNet dataset. We implement the pipeline using PyTorch and Python 3.7. All the results are based on experiments run on an RTX 2080 Ti and an i9-9900K. The performance of each method is evaluated by its accuracy (mIoU) and its latency.
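For reference, the mIoU metric for a single shape can be computed as in the sketch below. The function name `mean_iou` and the toy labels are hypothetical, and the convention of skipping parts that are absent from both the prediction and the ground truth is an assumption of this sketch; benchmark toolkits may handle that case differently.

```python
def mean_iou(pred, target, num_parts):
    """Mean intersection-over-union over the parts that occur in pred or target."""
    ious = []
    for part in range(num_parts):
        inter = sum(1 for p, t in zip(pred, target) if p == part and t == part)
        union = sum(1 for p, t in zip(pred, target) if p == part or t == part)
        if union:  # skip parts absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)


pred_parts = [0, 0, 1, 1, 2, 2]     # hypothetical predicted part labels
true_parts = [0, 0, 1, 2, 2, 2]     # hypothetical ground-truth part labels
score = mean_iou(pred_parts, true_parts, num_parts=3)
```

In this toy case the per-part IoUs are 1, 1/2, and 2/3, so the mean is 13/18.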

A. EXPERIMENTS ON TERRACOTTA WARRIOR
We use Artec Eva [39] to collect 500 intact terracotta warrior models, and we take 400 of the 500 models as the training set and 100 as the validation set. Each model consists of about 2 million points and includes xyz coordinates, vertex normals, triangle meshes, and RGB data. Before the experiment starts, we discard the triangle meshes and RGB data of the original models and retain the xyz coordinates and vertex normals. Moreover, we uniformly sample each point cloud down to 10,000 points and use the result as the experimental input. In reality, the terracotta warriors are generally unearthed in the form of limb fragments.
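The downsampling step can be sketched as below, assuming that "uniformly sample" means drawing points uniformly at random without replacement; a spatially uniform scheme would need a different routine. The function name `sample_points`, the fixed seed, and the synthetic cloud are hypothetical.

```python
import random


def sample_points(points, target=10_000, seed=0):
    """Draw `target` points uniformly at random without replacement."""
    if len(points) <= target:
        return list(points)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(points)), target))  # keep the original point order
    return [points[i] for i in keep]


# Hypothetical stand-in for a scanned model: 50,000 points along a line.
cloud = [(float(i), 0.0, 0.0) for i in range(50_000)]
sampled = sample_points(cloud)
print(len(sampled))
```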
According to the description in [40], we can divide the terracotta warrior 3D model into six parts: head, body, left hand, right hand, left leg, and right leg. To implement self-supervised learning better, we calibrate 5 to 10 terracotta warrior 3D models. Unlike the supervised problem, our self-supervised method solves two sub-problems: using the existing network parameters to estimate the cluster labels, and using the predicted cluster labels to train the network. The former sub-problem is solved in Section III-A, and the latter in Section III-B.
To show that EGG-Net achieves more detailed results than other self-supervised methods, we replace the neural network encoder in EGG-Net with PointNet, PointNet2, DG-CNN, and PointHop++ while keeping the rest of the structure the same as in EGG-Net. To show that EGG-Net achieves more correct results than unsupervised methods, we also compare EGG-Net with our previous unsupervised segmentation method SRG-Net and with the comparison methods of SRG-Net (SRG-DGCNN, SRG-PointNet2, and SRG-PointNet). We select SGD as the optimizer and set the learning rate to 0.005. We set the momentum parameter to 0.1 and the number of iterations T to 500.
The visualization results are shown in Figure 1. EGG-Net obtains more correct results than the unsupervised methods (such as SRG-Net, SRG-DGCNN, and SRG-PointNet2), which fail to produce correct segmentations. The self-supervised methods (PointNet-EGG, PointNet2-EGG, DG-EGG, and PointHop2-EGG) can segment the point cloud correctly, but EGG-Net is more accurate in the details. For example, EGG-Net can get a more accurate result in the segmentation of the hands of the terracotta warrior 3.
The quantitative results are shown in Table 1. Because the STN module is removed, EGG-Net reduces the latency by 27.3% compared with SRG-Net, while the accuracy improves by about 8.2%. Compared with methods of similar architecture (DG-EGG, PointNet2-EGG, PointHop2-EGG), our method also performs well. Compared with DG-EGG, our solution reduces the latency by 40% and improves the accuracy by 9.8%. The accuracy of our method is also much better than that of PointNet2-EGG (by 17.1%) and PointHop2-EGG (by 12.7%), and its latency is about 65.8% and 63.1% of the latency of these two methods, respectively. In summary, we can draw the following conclusions from the experimental results:
1) Compared with unsupervised methods (such as SRG-Net, SRG-DGCNN, SRG-PointNet, and SRG-PointHop2), our network obtains more accurate results with less latency.
2) Compared with self-supervised methods with similar structures (such as PointNet-EGG, PointNet2-EGG, DG-EGG, and PointHop2-EGG), our EGG-Net obtains more refined results.
3) In general, EGG-Net has obvious advantages in accuracy and latency on our terracotta warrior dataset.

B. EXPERIMENTS ON SHAPENET
In this section, we conduct experiments on ShapeNet Part to evaluate the robustness of EGG-Net. ShapeNet Part is a consistent, large-scale 3D object dataset annotated with fine-grained, instance-level, and hierarchical 3D part information. The dataset consists of 573585 part instances, covering 26671 3D models of 24 object categories. It acts as a catalyst for many tasks, such as shape analysis, dynamic 3D scene modeling, simulation, and affordance analysis. ShapeNet established three benchmark tasks for evaluating 3D part recognition: fine-grained semantic segmentation, hierarchical semantic segmentation, and instance segmentation. Among these tasks, ShapeNet Part is commonly used for instance segmentation.
The quantitative results of our experiments are shown in Table 4. EGG-Net outperforms all previous models. It improves on the overall accuracy of DGCNN-EGG by 8.2%, and the margin is even larger when compared with PointNet2-EGG and PointNet-EGG. Notably, our method outperforms DG-EGG on all categories, with a 5% gain in accuracy on the knife category. Overall, our method achieves better accuracy on ShapeNet than the other methods. Some visualization results are shown in Fig. 5, where EGG-Net achieves quite good results on bag, earphone, faucet, hat, refrigerator, and vase.

C. ABLATION STUDY
To show the influence of the different modules and of the number of epochs in our method, we conduct an ablation study of the method described in Section III-A on our terracotta warrior dataset. The variants are evaluated by overall accuracy and mIoU.
The influences of different modules. As shown in Table 2, the results of the pipeline without the graph convolution (row 1) show that the network does not learn the topological features of the local neighborhood of the point cloud well. The results of the version without edge convolution (row 2) demonstrate that, without edge convolution, the network does not understand the relationship between points well. The third variant is a pipeline without refinement (row 3), in which we calculate the loss between the labels of the current epoch and those of the previous epoch. Its results reveal that the pipeline cannot set labels reasonably based on the point cloud content, because the number of unique cluster labels should adapt to the context. The results show that the refinement operation improves the accuracy of EGG-Net by 15.6%.
The influences of different epochs. To visualize the influence of different epochs, we set the number of epochs to 1000. The segmentation results of different iterations in one terracotta warrior model are shown in Fig. 6. We can find that when the number of iterations reaches 500, the

V. CONCLUSION
This paper provides an end-to-end model called EGG-Net for self-supervised segmentation of terracotta point clouds. Our idea comes from the process of researching the terracotta warrior dataset. The existing calibrated data is insufficient for related research, and we are currently short of human resources. Manual calibration would cost much time and would limit future terracotta warrior restoration work. Therefore, we hoped for a method that can automatically calibrate a large number of terracotta warrior 3D models from a small amount of calibrated data. To address these problems, we designed EGG-Net, an end-to-end self-supervised model. Our model contains three sub-modules. The first module is an encoder based on dynamic graphs and edge convolutions, which extracts the features of our 3D point cloud well. Appended to the encoder is a segmenter based on a multi-layer perceptron. Finally, we designed a point refinement process. We calculate the superpoints with the seed region growing method and use them to adjust the cluster labels calculated by the neural network, which allows us to carry out backpropagation.
Finally, we evaluate our method on the terracotta warrior dataset and compare it with the latest and classical methods. The quantitative and visual results show that our EGG-Net has higher accuracy and lower latency. In addition, we also carried out experiments on ShapeNet Part and achieved good results, which shows that our method is robust on the general dataset. We also conducted an ablation study for different modules to show the rationality of the EGG-Net network structure by using different encoders and different refinement methods. Finally, we research the number of iterations, which shows the rationality of our chosen parameters.
Our work still has some limitations. For example, it is not so convenient to deploy our model and use it. We will try our best to solve this problem in the future. We hope our work can be helpful to the research of terracotta warriors in archaeology and other point cloud work of other researchers.

VI. ACKNOWLEDGMENTS
This work is equally and mainly supported by the Na-