Data-Driven Morphological Feature Perception of Single Neuron With Graph Neural Network

Clarifying the morphological characteristics of neurons can promote the understanding of brain function. However, traditional morphometrics fail to capture the modeling of each point in reconstructed neurons, leading to limited ability to distinguish massive nerve fibers and restricted application scenarios. To address these challenges, we propose MorphoGNN, a single neuron morphological embedding based on a graph neural network in this study. MorphoGNN learns the point-level structure information of reconstructed nerve fibers by considering their nearest neighbors on each hidden layer. This enables MorphoGNN to capture the lower-dimensional representation of a single neuron through an end-to-end model. In order to meet the requirements of various tasks, both supervised and self-supervised training strategies are designed to learn the characteristics that fit artificial semantics or the morphological patterns of neurons, respectively. We quantitatively compare our embeddings with other features in neuron classification and retrieval tasks and demonstrate cutting-edge performance. Additionally, we introduce our embeddings to the task of reconstruction quality classification and neuron clustering, where they can help detect reconstruction errors and obtain similar subtyping results to existing work. Furthermore, our method can be handily combined with other modal features, such as microscopic image features and traditional morphometrics. Ablation and robustness tests are also conducted to analyze the impact of several network components and low-quality reconstructed neurons on the performance of our method. The code is available at https://github.com/fun0515/MorphoGNN.


I. INTRODUCTION
THE rapid advances in brain-wide optical microscopy [1], [2], [3] have enabled researchers to obtain huge amounts of brain slice images, and a single mouse brain can generate several terabytes of data [4]. Digital tracing methods [5], [6] can then be used to reconstruct the whole structure of a single neuron from brain-wide three-dimensional images. As neuronal reconstructions are non-Euclidean data, they are typically represented in the form of SWC files [7]. These files contain a set of three-dimensional points connected in the brain space, which describe the tree-like morphological structure of the neurons. Although no two neurons have exactly the same morphology, neurons with similar structural characteristics often possess similar functions. Therefore, extracting the morphological and structural features of neurons plays a crucial role in the study of brain function [8], [9], [10]. Dozens of morphometrics [11], [12] have been proposed to extract neuronal morphological features, including the total neuron fiber length, number of sections, branch order, radial distance, and density maps.
Classifiers can use these measurements to categorize neurons, but they are challenged by massive neuron data. Each measurement only describes a certain local morphological characteristic of neurons. To describe the structure of neurons completely, as many parameters as possible are often used together. Recently, researchers have focused on more complex morphological features beyond these metrics. For example, the topological morphology descriptor [13] combines the topology of neuronal fibers with their spatial location, enabling it to distinguish neuron trees from randomly generated tree structures. However, designing morphological characteristics based on the experience of neuroscientists is not only cumbersome but also makes incomplete use of morphological information. This deficiency limits the ability to identify neurons. Thus, learnable methods have been proposed to capture neuron features at an advanced level. Cellular morphology neural networks [14] can convert reconstructed neurons into lower-dimensional vector representations. Neurons are described by a series of two-dimensional images from multiple viewpoints, and the representative features are extracted from the images by convolutional neural networks. This method uses image features to represent neuron morphology indirectly, which may introduce additional interference from the visualization stage. MorphVAE [15] samples walks from neuron structures and learns the hidden representation at the branch level through a variational seq2seq autoencoder. This learning method directly processes the topological structures of neurons, but does not use the structural information between points. Approaches that learn neuron morphology at the point level remain to be studied. Moreover, there is also a lack of quantitative comparison of multiple morphological features under the same framework.
We have noticed that reconstructed neuron data presents a structure similar to that of a point cloud, and several point cloud-based processing and analysis methods [16], [17] have been proposed in recent years. Multi-view convolutional neural networks [18] have demonstrated that a collection of two-dimensional views can accurately describe three-dimensional objects. Graph neural networks (GNNs) [19], [20] capture the dependency relationships in a graph by passing information between nodes, and they have also been introduced into the field of point cloud processing. Wang et al. [21] dynamically update the graph structure between layers to learn lower-dimensional features of the point set and apply them to classification and segmentation tasks. These point cloud methods provide learning techniques for three-dimensional spatial points, which lay the foundation for extracting point-level morphological features of neurons and thus fully exploiting the neural structure information.
In this study, we propose MorphoGNN to learn the morphological representation of a single neuron. The main contributions are as follows:
• To the best of our knowledge, we are the first to propose a comprehensive framework that employs a GNN model to directly learn the morphological information of a single neuron, covering the stages of preprocessing, analysis, utilization, and evaluation. Through a data-driven approach, we capture the point-level geometric characteristics of neurons, leading to improved performance on a variety of morphological tasks.
• We design both supervised and self-supervised training strategies to obtain morphological features that are suitable for multiple tasks. For diverse neural data, we incorporate dense connections, a double-pooling operator, and a joint loss function into the network architecture. Their effects are analyzed through ablation experiments, and network robustness is also studied to clarify the impact of low-quality neurons on model performance.
• Our method is quantitatively assessed on neuron classification and retrieval tasks, demonstrating superior performance compared to traditional morphometrics and other baseline methods. In reconstruction quality classification, our method achieves an accuracy of 75.5% in judging the quality of reconstructed samples. By utilizing our extracted features, we are able to obtain neuron clustering results similar to those reported in previous studies. Experimental results also indicate that our morphological embedding is highly compatible and easily combinable with other features.
The basic principle, formula derivation, and network architecture of MorphoGNN are expounded in section III. Section IV introduces the experimental datasets and processing details, and presents the results of several morphological applications. The ablation study and network robustness are discussed in section IV-H. Finally, the limitations of our method and possible follow-up work are discussed in section V.

A. Morphometrics
Brain-wide and high-resolution optical microscopes [1], [2] enable the acquisition of complete morphological images of neuronal axons and dendrites, providing a rich source of morphometrics for describing neuronal topology. Along with fundamental morphometrics such as total neuron fiber length, number of sections, and density maps [11], [12], recent morphological studies have concentrated on more refined and intricate features. For example, Gillet et al. [22], [23] encode the branches of axons and dendrites of nerve fibers, and calculate the similarity between trees by comparing the correspondence between branches. Kanari et al. [13] and Sizemore et al. [24] describe neuronal topology based on persistent homology and fiber evolution in brain space, respectively. Le Gao et al. [25] propose a method for measuring the similarity between two neurons based on the distance from each point of one neuron to the point it maps to on the other neuron. While this method cannot be used as a morphological description of individual neurons, it can be used to measure differences between the projection patterns of paired neurons. Laturnus et al. [12] construct a binary classifier based on logistic regression and PCA preprocessing, systematically comparing the effects of density maps, persistence images, morphological statistics, and morphological distributions. Typically, these morphometrics focus on specific characteristics of neuron morphology, and researchers often combine them based on their experience to study a particular task. The development of a method capable of automatically designing morphological features suited to the application could greatly assist neuroscience research. Additionally, the comprehensive and systematic comparison of morphological representations is also an important area for future research.

B. Learning Based Morphometrics
Learning-based methods have made significant contributions to the biomedical field, including brain region segmentation [26], [27], medical image registration [28], [29], and neuron reconstruction [30], [31]. However, these successful applications are based on regular data, such as medical images. The irregularity of neurons presents a challenge for applying deep learning techniques to morphological data, resulting in less work on learning-based morphological features. Inspired by [18], Schubert et al. [14] use a set of images to describe a section of nerve fibers. They analyze the images through a two-dimensional convolutional neural network to extract morphological features indirectly. This learning-based method has an advantage over traditional morphometrics, as neuroscientists no longer need to consider feature design and can instead focus on the tasks they want to perform. The network automatically learns appropriate features according to the set targets. Schubert's method is tested on a series of automatic tasks, such as glia detection on electron microscope data. However, the multi-view based method is vulnerable to interference during the visualization stage of neurons, which can affect the learning of morphological information. MorphVAE [15] operates on walks within the tree structure of a neuron, and it can generate new morphologies by sampling new walks from the latent space. This approach processes reconstructed neurons at the branch level. However, the fixed length and number of walks may limit its flexibility in processing complex morphologies. An approach that directly learns the morphological characteristics of a single neuron at the point level is therefore still needed.

III. METHOD
The overall framework of our proposed morphological embedding for a single neuron is shown in Fig. 1. After normalization and farthest point sampling (see section IV-B), the morphology of each neuron can be represented as a point cloud $P = \{ p_i = (x_i, y_i, z_i) \in \mathbb{R}^3 \mid i = 1, 2, \ldots, n \}$, where $x_i$, $y_i$, and $z_i$ represent the three-dimensional coordinates of these nodes. Then a graph neural network is employed to generate morphological features of reconstructed neurons. To increase the applicability of our network, we offer both supervised and self-supervised learning strategies: (i) fitting existing classification labels or (ii) decoding neuron shapes from hidden features. This section details our methodology, including the principles behind learning point-level morphological features, the refined network structure, and the joint loss function employed.

A. GNN Encoder
Inspired by [21] and [16], we design a graph neural network that learns the geometric relationship between reconstructed points and then directly generates the representation embedding of the neuronal fibers, instead of relying on manually designed morphometrics.
The problem addressed in this paper can be described by (1):
$$h = f_n(P) \tag{1}$$
where $f_n$ represents an embedding function. The function is supposed to capture the global features $h$ of the reconstructed irregular neural fiber $P$.
Specifically, we update the graphs dynamically by the K-nearest neighbors (KNN) method and continuously lift node features to higher dimensions. Node features at different hidden layers are linked to avoid vanishing gradients. Then a double-pooling operator is utilized to capture the global features of neuronal fibers. The following are detailed explanations of the key elements of the proposed method.

B. Local Graph Update
Each point in the cloud forms a local graph with its K nearest points, and new features of the central node are then learned from its neighbor nodes. As node features are updated, the graphs are dynamically recomputed between layers. A local graph $G_l$ composed of vertices $V$ and edges $E$ at layer $l$ is defined in (2), where $C$ and $K$ represent the dimension of node features and the number of neighbors, respectively. We define the geometric relationship between the center node and its neighbor nodes as edges, which are calculated by (3), where $e_i$ is the directed edge from one of the neighbor nodes $p_i$ to the updating node $p_c$.
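The local graph construction described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes Euclidean distance in the current feature space, excludes the center node from its own neighborhood, and takes an edge to be the concatenation of $s_c$ and $s_i - s_c$, which matches the description of $e_i$ in the next subsection. The function name `knn_edges` is hypothetical.

```python
import math


def knn_edges(features, k):
    """Build a local graph around every node.

    features: list of per-node feature vectors (initially xyz coordinates).
    Returns, for each centre node c, the indices of its k nearest
    neighbours and the edge vectors (s_c, s_i - s_c). The edge form is
    an assumption made for this sketch.
    """
    n = len(features)
    graph = []
    for c in range(n):
        s_c = features[c]
        # Rank all other nodes by Euclidean distance to the centre node.
        order = sorted(
            (i for i in range(n) if i != c),
            key=lambda i: math.dist(s_c, features[i]),
        )
        neighbours = order[:k]
        edges = [
            list(s_c) + [a - b for a, b in zip(features[i], s_c)]
            for i in neighbours
        ]
        graph.append((neighbours, edges))
    return graph
```

Because the neighbors are recomputed from the current features, calling the function again on the output of each layer reproduces the dynamic graph update between layers.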

C. Local Feature Update
The initial feature of each node for a single neuron is a three-dimensional vector $(x_i, y_i, z_i)$, and the local feature is updated to a higher dimension via the EdgeConv [21] layers. For a local graph, the local feature of the updating point $p_c$ is generated by (4), whose weights are trainable parameters, while $g(e_i)$ represents the hidden vector of the edge $e_i$ from the neighbor node $p_i$ to $p_c$. The new local feature $s'_c$ is obtained through a max-pooling operation over all edge vectors. Since $e_i$ combines the old local feature $s_c$ (initially the coordinates) with the morphological information of the neighborhood $s_i - s_c$, $s'_c$ can be considered as the morphological features of neurons in another representation.
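The feature update can be illustrated as follows. This is a sketch under stated assumptions: $g$ is taken to be a single linear map followed by ReLU, in the spirit of EdgeConv [21], and the names `edge_conv`, `weight`, and `bias` are hypothetical. Only the channel-wise max-pooling over edges is taken directly from the text.

```python
def edge_conv(edges, weight, bias):
    """Update one centre node from its K edge vectors.

    edges:  list of K edge vectors, each of length 2C.
    weight: C_out x 2C matrix (list of rows), trainable in a real model.
    bias:   list of length C_out.
    g(e_i) = ReLU(W e_i + b) is an assumed form of the edge function.
    """
    hidden = []
    for e in edges:
        h = [
            max(0.0, sum(w * x for w, x in zip(row, e)) + b)
            for row, b in zip(weight, bias)
        ]
        hidden.append(h)
    # Channel-wise max-pooling over all edges gives the new feature s'_c.
    return [max(channel) for channel in zip(*hidden)]
```

The max over edges makes the update invariant to the order in which neighbors are listed, which suits unordered point sets.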

D. Double-pooling Operator
The dimension $C$ increases rapidly over several updating iterations on the node features $S$. To improve the utilization of local features, the global feature of a neuron is computed by applying two kinds of pooling operators to the node features, as shown in (5):
$$h = \max_{i=1,\ldots,N} s_i \;\|\; \frac{1}{N} \sum_{i=1}^{N} s_i \tag{5}$$
where $h$ and $N$ denote the features and the number of points of a single neuron, respectively, and $\|$ represents the concatenation operator. Max-pooling captures the prominent features, while average-pooling preserves the overall feature strength. Their outputs are concatenated, so the final shape of $h$ is $1 \times 2C$.
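A minimal sketch of the double-pooling operator, assuming the node features of one neuron are given as a list of N vectors of length C. The function name `double_pool` is hypothetical.

```python
def double_pool(node_features):
    """Concatenate channel-wise max-pooling and average-pooling.

    node_features: list of N vectors of length C.
    Returns a global feature of length 2C.
    """
    channels = list(zip(*node_features))
    max_part = [max(ch) for ch in channels]
    avg_part = [sum(ch) / len(ch) for ch in channels]
    return max_part + avg_part
```

For three nodes with two channels each, the output has four entries: the two channel maxima followed by the two channel means.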

E. Encoder Network Architecture
The network architecture of the encoder is depicted in Fig. 2. We begin by increasing the dimension of neuron node features from 3 to 256 through four EdgeConv layers. These features are then aggregated into the global features of nerve fibers using a double-pooling operator. To reduce vanishing gradients and enhance feature utilization, node features at different levels are densely connected: the input of each EdgeConv layer, except for the first, is derived from the outputs of all previous EdgeConv layers. To obtain rich global features, we pass the output of the last EdgeConv layer through a maximum-pooling layer and an average-pooling layer, respectively. Their outputs are then concatenated to form the final feature vector of nerve fibers.

F. Supervised Classifier
To fit different category labels, MorphoGNN can independently learn neuron features suitable for different tasks, such as morphological classification or quality classification, instead of relying on morphometrics designed manually from the experience of neuroscientists. For classification, we aim to find a function $f_c$ that maps the features into a probability distribution $P_d$, as defined in (6):
$$P_d = f_c(h) \tag{6}$$
In some cases, such as quality classification in section IV-F, we need to classify jointly with the corresponding microscopic image features, which can be described by (7):
$$P_d = f_c(h \,\|\, h') \tag{7}$$
where $h'$ denotes image features learned by other methods.

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

TABLE I INFORMATION OF THE 7-CLASS NEURONAL MORPHOLOGICAL DATASET

In addition to the cross-entropy function commonly applied in classification tasks, we also use triplet loss [32] to deal with complex and diverse neurons. Considering the morphological embedding $h_i$ of a single neuron, $h_p$ and $h_n$ are defined to represent the embeddings of the same and different categories in the training batch, respectively. The triplet loss $L_t$ can then be calculated by (8):
$$L_t = \max\big(d(h_i, h_p) - d(h_i, h_n) + \mathrm{margin},\ 0\big) \tag{8}$$
where $d$ is a function that calculates the Euclidean distance between two embeddings, and margin is a constant that is always greater than 0, making $d(h_i, h_p)$ smaller and $d(h_i, h_n)$ larger. As described in (8), triplet loss shortens the distance between similar morphological embeddings and lengthens the distance between heterogeneous morphological embeddings, so as to identify neurons with minor inter-class differences. The total criterion $L$ can be expressed as the weighted sum of the two loss functions, as in (9):
$$L = w_1 L_c + w_2 L_t \tag{9}$$
where $L_c$ represents the cross-entropy loss function, and $w_1$ and $w_2$ are the weights of the two loss functions.
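The joint criterion in (8) and (9) can be illustrated with a short sketch. It assumes the standard hinge form of triplet loss from [32] with Euclidean distance; the margin value of 1.0 and the function names are hypothetical, since the text only states that the margin is positive.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on Euclidean distances.

    The margin default is a placeholder, not a value from the paper.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


def cross_entropy(probabilities, label):
    """Cross-entropy of one sample given its predicted class probabilities."""
    return -math.log(probabilities[label])


def joint_loss(l_c, l_t, w1=1.0, w2=1.0):
    """Weighted sum of cross-entropy (l_c) and triplet loss (l_t)."""
    return w1 * l_c + w2 * l_t
```

The loss is zero once the negative sample is farther from the anchor than the positive sample by at least the margin, so well-separated triplets stop contributing to the gradient.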

G. Self-Supervised Decoder
Certain neural tasks, such as unsupervised neuron clustering, necessitate unsupervised training, and in some cases acquiring precise neuron labels can prove challenging. In such situations, MorphoGNN can adopt a self-supervised approach to learn the morphological features of a single neuron. We achieve self-supervised learning by decoding the spatial structure of neurons from their hidden features $h$, and then computing the reconstruction log-likelihood [33] with respect to the input neurons $P$. The reconstruction loss $L_{recon}$ is described in (10), where $E_\phi$ and $D_\psi$ are the encoder and decoder of MorphoGNN, and $\phi$ and $\psi$ represent the trainable parameters of the encoder and decoder, respectively. The network architecture of the decoder follows the flow-based model [34], [35], [36] and is composed of an autoregressive layer, an invertible 1×1 convolution layer, and an actnorm layer.

IV. EXPERIMENT
This section outlines the data and training process for the proposed MorphoGNN and presents the model's quantitative performance, as well as that of other baseline models, across various tasks. We also provide a detailed report on the results of the network ablation and robustness tests in this section.

A. Data Description
Three datasets are utilized to evaluate our model's performance: a 7-class morphological dataset, a dataset comprising 6357 long-range mouse neurons, and a neuron reconstruction blocks dataset.These datasets are employed to assess morphological classification and retrieval, clustering, and reconstruction quality classification, respectively.
The 7-class dataset contains 1393 reconstructed neurons collected from NeuroMorpho.Org [37], and we denote this dataset as Neuro7. These neurons are labeled with their morphological category, so they can be used for supervised learning. However, the raw morphological data is non-standardized because it comes from many different contributors. As shown in Table I, we selected seven representative types of mouse nerve cells, each of which was contributed by multiple research groups.
In addition to the Neuro7 dataset, we also utilize an open-source dataset consisting of 6,357 long-range neurons mapped from the mouse prefrontal cortex, which have previously been classified into three large subtypes and sixty-four small groups based on their projection patterns [25]. In this paper, we attempt to cluster these same neurons using our proposed morphological embedding and compare our results with the existing ones.
Moreover, we introduce a reconstructed neuron dataset consisting of microscopic images collected from a collaborative platform, where neural segments were manually reconstructed by researchers from three-dimensional optical microscopic image blocks. The reconstructions were subsequently verified by other researchers to ensure their accuracy. This dataset consists of 3,000 paired samples, with an equal proportion of correct and incorrect samples. Each neuron segment comprises 1024 points, corresponding to a 64 × 64 × 64 image block. These three datasets are randomly split into training and validation sets at a 7 : 3 ratio.

B. Morphological Data Preprocessing
Prior to network training, morphological data undergo a two-step preprocessing procedure that transforms them into regular point clouds.
Farthest point sampling is used to address the issue of inconsistent point numbers per sample and to accelerate training by enabling minibatches of differently sized neurons to be grouped together. We iteratively select the farthest point from the existing set of sampling points to obtain neurons of any size. Additionally, to ensure that graphs used for training contain the same number of points, we fill in some small reconstructed neurons with zero points (0, 0, 0) until they reach the same size as the other neurons.
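The sampling and padding steps can be sketched as below. This is a minimal sketch, assuming the first sampled point is chosen by a fixed index (the text does not say how the initial point is picked); the function name and the `start` parameter are hypothetical.

```python
import math


def farthest_point_sampling(points, n_samples, start=0):
    """Reduce or pad a neuron to exactly n_samples points.

    points: list of (x, y, z) tuples.
    Small neurons are padded with zero points (0, 0, 0); larger ones are
    subsampled by repeatedly taking the point farthest from the points
    selected so far. The starting index is an assumption of this sketch.
    """
    if len(points) <= n_samples:
        padding = [(0.0, 0.0, 0.0)] * (n_samples - len(points))
        return list(points) + padding

    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_samples:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return [points[i] for i in selected]
```

Every neuron leaves this step with the same number of points, which is what allows neurons of different sizes to be stacked into one minibatch.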
Normalization is another necessary step in preprocessing neuronal morphological data. The absolute location of neurons in the brain space can lead to a wide difference between the coordinates of the reconstructed points. Furthermore, there is no unified standard for the brain spatial location of neurons that have not been uniformly registered. The neuron point coordinates are normalized by (11), where $v_{max}$ and $v_{min}$ are the maximum and minimum initial coordinate values $v_i$ over a single neuron, respectively, and $V$ represents the normalized coordinate set. We identify the smallest cube that can surround the original neuron and use its center as the new origin for normalization. Fig. 3 shows that the normalized neuron retains its original shape, with its center serving as the spatial origin, and that the coordinate value of each point falls within the range (−1, 1).
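One way to realize this normalization is sketched below. The exact form of (11) is not reproduced here, so the sketch makes two assumptions: the origin is moved to the center of the axis-aligned bounding box, and all three axes are divided by the same half-extent (the largest one) so that the shape is preserved. The function name `normalize` is hypothetical.

```python
def normalize(points):
    """Centre a neuron at the origin and scale it into [-1, 1].

    points: list of (x, y, z) tuples.
    Uses one common scale for all axes so the shape is not distorted;
    this isotropic scaling is an assumption of the sketch.
    """
    mins = [min(p[a] for p in points) for a in range(3)]
    maxs = [max(p[a] for p in points) for a in range(3)]
    centre = [(lo + hi) / 2 for lo, hi in zip(mins, maxs)]
    half_extent = max(hi - lo for lo, hi in zip(mins, maxs)) / 2
    if half_extent == 0:
        # Degenerate neuron with a single distinct point.
        return [(0.0, 0.0, 0.0) for _ in points]
    return [
        tuple((p[a] - centre[a]) / half_extent for a in range(3))
        for p in points
    ]
```

Note that this step discards absolute size and brain location, which is relevant when the embedding is later combined with simple morphometrics.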

C. Implementation Details
Here we explain the experimental setup, especially the training process of our network. For supervised training, we employ the Adam optimizer [51] with a learning rate of $10^{-3}$ and momentum of 0.9 to train the entire network. The network is trained for 300 epochs with a batch size of 16, and the learning rate is halved every 20 epochs. The cross-entropy and triplet loss are used jointly, with a weight ratio of 1 : 1. For self-supervised training, we set the initial learning rate of the Adam optimizer to $2 \times 10^{-3}$ and halve it every 500 epochs. The network is trained for 3500 epochs, using a batch size of 32. All experiments are conducted on a single NVIDIA RTX 5000 GPU with 16GB of graphics memory, as well as four NVIDIA Tesla V100 graphics cards, each with 16GB of memory.
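The two step-decay schedules can be written as one small function. This is only an illustration of the schedule arithmetic, assuming epochs are counted from zero and the halving takes effect at the start of each new block of epochs; the function name is hypothetical.

```python
def learning_rate(epoch, initial_lr, step):
    """Step decay: halve the learning rate every `step` epochs.

    epoch is zero-based, which is an assumption of this sketch.
    """
    return initial_lr * 0.5 ** (epoch // step)


# Supervised setting: 1e-3, halved every 20 epochs, 300 epochs in total.
supervised_final = learning_rate(299, 1e-3, 20)
# Self-supervised setting: 2e-3, halved every 500 epochs, 3500 epochs in total.
self_supervised_final = learning_rate(3499, 2e-3, 500)
```

Under these assumptions the supervised run is halved 14 times by its last epoch and the self-supervised run 6 times.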

D. Morphological Classification
We first evaluate the performance of our proposed features and traditional morphometrics on the 7-class neuron dataset. Sixteen basic neuronal morphometrics [52] are used to form the feature vector of each neuron, including total length, number of tips, number of sections, number of segments, number of bifurcation points, maximum branch order, maximum neurite length, maximum radial distance, and maximum/median section tortuosity, as well as maximum/median/minimal section length and maximum/median/minimal remote bifurcation angle. These measurements are then used for neuron classification via a simple multi-layer perceptron network with the same settings as other methods. We trained MorphVAE via supervised learning on our dataset to achieve its best performance. Additionally, since reconstructed neurons are transformed into standard point clouds after preprocessing, we include three recent point cloud networks for comparison. Table II shows the overall accuracy (OA) and mean class accuracy (MA) of these methods.
Our network achieves the highest overall accuracy (85.58%) and mean class accuracy (79.45%), which are about 2.5% and 6.8% higher than those of PointNet++. The mean class accuracy of each method is significantly lower than the overall accuracy due to the less obvious differences between classes and imbalanced class populations. Interestingly, the performance of morphometrics is close to, and even surpasses, that of some learning-based features, including PointNet++ and DGCNN. This could be attributed to the loss of some morphological information, such as the absolute size of neurons, in the preprocessing step, which limits the performance of learning-based methods. However, this drawback can be overcome by combining them with morphometrics. When our embedding and the 16 morphological statistics are concatenated for classification, the overall accuracy and mean class accuracy are further improved by 5.8% and 6.6%, respectively. We use UMAP [53] to visualize the features extracted by these methods (see Fig. 4), where each point represents the feature of a single neuron after dimensionality reduction. The features of the same class extracted by our method are close to each other and far away from other types. However, some morphologically similar categories, such as spiny and aspiny, are still not clearly distinguished.
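The gap between overall accuracy and mean class accuracy under class imbalance is easy to see in a small example. The sketch below assumes that mean class accuracy is the unweighted average of per-class recall; the function name is hypothetical.

```python
from collections import defaultdict


def overall_and_mean_class_accuracy(y_true, y_pred):
    """Return (OA, MA) for two equally long label sequences.

    OA is the fraction of correct predictions; MA averages the
    per-class accuracy so that every class counts equally.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    oa = sum(correct.values()) / len(y_true)
    ma = sum(correct[c] / total[c] for c in total) / len(total)
    return oa, ma
```

With eight samples of a majority class all classified correctly and one of two minority samples misclassified, OA is 0.9 while MA drops to 0.75, mirroring the pattern reported for every method.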

E. Retrieval on Morphological Dataset
Current large neuronal morphological databases [37], [54] typically support search based on meta-information such as contributors and species. However, relying on such meta-information can result in a loss of valuable morphological details. Here we demonstrate the effectiveness of direct query based on embedding similarity between neurons.
The networks trained in section IV-D are first used to extract features from each neuron to build the feature databases. Fig. 7(c) shows the time required to build the database using different features. The PointNet++ model, with its complex multi-scale learning, takes the longest time to build a feature database, approximately 25 times longer than PointNet. The time consumption of traditional morphometrics is the second longest, approximately three times that of DGCNN. To measure only the performance of the features, we retrieve similar neurons by simply calculating the cosine similarity between paired features. The accuracy of each retrieval operation is defined as the proportion of neurons of the same type among the ten most similar neurons retrieved. We searched for each neuron in the Neuro7 dataset in turn and recorded the overall accuracy and mean class accuracy. Fig. 7(a) shows the five most similar neuron fibers retrieved for the same query neuron fiber by the six methods. The retrieval results of other methods are mixed with various kinds of neurons, while our method retrieved five neurons of the same class. As shown in Fig. 7(b), PointNet++ and MorphoGNN achieve the best retrieval performance. The retrieval performance of traditional morphometrics is between that of DGCNN and MorphVAE. Among them, our method achieves the highest overall accuracy (84.19%) and mean class accuracy (77.41%).
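The retrieval metric can be sketched as follows. This is a minimal sketch, assuming the query neuron itself is excluded from the candidates (the text does not say whether it is); the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieval_accuracy(query_idx, features, labels, top_k=10):
    """Share of same-class neurons among the top_k most similar ones.

    The query is removed from the candidate list, which is an
    assumption of this sketch.
    """
    scores = [
        (cosine_similarity(features[query_idx], f), i)
        for i, f in enumerate(features)
        if i != query_idx
    ]
    scores.sort(key=lambda s: s[0], reverse=True)
    top = scores[:top_k]
    if not top:
        return 0.0
    hits = sum(labels[i] == labels[query_idx] for _, i in top)
    return hits / len(top)
```

Averaging this value over all queries gives the overall retrieval accuracy, and averaging within each class first gives the mean class accuracy.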

F. Reconstruction Quality Classification
Our research investigates the potential of combining our morphological embedding method with optical microscopic images in neuron reconstruction. Quality control has always been an important part of neuron reconstruction, traditionally achieved through multi-person collaboration where one team reconstructs nerve fibers and another team evaluates the accuracy of the reconstruction. While effective, this approach is labor-intensive and relies heavily on human resources. To reduce costs, we explore the possibility of automated methods that can assist in certain steps of the process. To this end, we leverage a collaborative reconstruction platform (see section IV-A) that provides labeled data for training and testing our proposed method. Fig. 5 shows some data samples used in the reconstruction quality classification. One criterion for judging the quality is whether the reconstructed nerve fibers match the signals in the microscopic images.
Fig. 6 displays the quality classification framework, which comprises two channels: MorphoGNN and 3D U-Net [55]. These channels model the morphology of neurons and their corresponding microscopic images, respectively. We concatenate the features of these two modalities for quality classification. Table III summarizes the results of the reconstruction quality classification, divided into two types: using only the neuronal channel and using both channels. When only the morphology of neurons is modeled, the precision of the network for correct samples is low (60.95%), while the recall rate is high (90.27%). This indicates that the network tends to judge all samples as correct. However, when both channels are combined, the judgment results for positive and negative samples are relatively balanced, resulting in an average precision of 75.57%. This method can automatically judge simple samples and identify uncertain ones that require manual assessment. Overall, our proposed method shows promise in reducing the labor required for neuron reconstruction and providing accurate judgments of reconstruction quality.
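The reasoning about low precision and high recall can be made concrete with a small sketch. It uses the standard definitions of precision and recall for the "correct" class; the function name and the toy labels are illustrative only and are not data from the paper.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall of the positive ("correct") class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# A classifier biased toward "correct" (1) on a balanced toy set:
# it finds every correct sample but also accepts most incorrect ones.
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 1, 1, 1, 0]
toy_precision, toy_recall = precision_recall(toy_true, toy_pred)
```

On a balanced set, a model that labels almost everything as correct reaches recall near 1 while its precision falls toward 0.5, which is the pattern seen when only the neuronal channel is used.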

G. Neuron Clustering
Morphological subtyping is a top-down process. With the refinement of criteria, neurons with similar anatomical characteristics can be clustered into the same subtype. Identifying neuron subtypes advances the understanding of brain structure and function, so it has always been a hot spot in neuroscience research [56], [57]. The difference between neuron clustering and neuron classification is that the former is an unsupervised task, where the subtypes of neurons are not known in advance. Current neuron clustering mainly relies on manual features designed by neuroscientists that reflect the fields they are interested in. For instance, [25] mainly considers the information of brain spatial location and morphology to calculate the pairwise similarity of 6357 long-range neurons reconstructed in the mouse prefrontal cortex. These individual neurons are precisely classified into three large subtypes, namely intratelencephalic (IT) neurons, corticothalamic (CT) neurons, and pyramidal tract (PT) neurons, and further into 64 small groups. Here we compare the three major subtypes of these prefrontal cortical neurons obtained by hierarchical clustering of our morphological embeddings with those of [25], using intersection over union (IoU) and accuracy as evaluation metrics. IoU describes the overlap between subtypes. When the existing clustering results are taken as the reference, accuracy can also be calculated. MorphoGNN is trained through self-supervised learning (see section III-G). We construct the objective function by decoding the three-dimensional structure of neurons from hidden representations. Fig. 8(a) illustrates the reconstruction performance for neurons seen and unseen in the training stage. Our hidden representations can restore an approximate structure of neurons, which is hard to achieve with traditional morphometrics. However, there is still a gap between reconstructed structures and real neurons, and designing special constraint terms that enforce a skeleton-like structure may reduce this gap.
As shown in Fig. 8(c), the mean IoU and accuracy between the two clustering results are only 91.15% and 96.70% when our morphological embeddings alone are used for hierarchical clustering. This is because the two methods focus on different aspects. Our model focuses on the relative morphology of neurons and lacks the information on absolute size and brain spatial location that is central to [25]. Hence we introduce some additional information to strengthen our features. Specifically, we concatenate several simple morphometrics, including soma location, radial distance, and total length, with a weight to compensate for this shortcoming (see Fig. 8(b)). The influence of the weight is shown in Fig. 8(d). After the simple morphometrics are joined, the mean IoU and accuracy increase to 96.49% and 98.62%, respectively (see Fig. 8(c)). This clear increase shows that our morphological embedding is an open architecture that is easy to combine with other features. When more abundant additional information is added, the clustering results may be further improved. Overall, our study demonstrates the potential of using morphological embeddings for neuron clustering and the importance of incorporating additional information to improve the accuracy of the clustering results.
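The evaluation and the weighted concatenation can be sketched as below. The sketch assumes that each cluster has already been matched to one reference subtype, so both inputs use the same label names, and that the weight simply scales the morphometric part before concatenation. All function names are hypothetical.

```python
def subtype_iou(reference, predicted, subtype):
    """IoU between the reference and predicted member sets of one subtype."""
    ref = {i for i, r in enumerate(reference) if r == subtype}
    pred = {i for i, p in enumerate(predicted) if p == subtype}
    union = ref | pred
    return len(ref & pred) / len(union) if union else 1.0


def mean_iou(reference, predicted):
    """Average IoU over all reference subtypes."""
    subtypes = sorted(set(reference))
    return sum(subtype_iou(reference, predicted, s) for s in subtypes) / len(subtypes)


def accuracy(reference, predicted):
    """Fraction of neurons assigned to their reference subtype."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def weighted_concat(embedding, morphometrics, weight):
    """Append morphometrics, scaled by weight, to a learned embedding."""
    return list(embedding) + [weight * m for m in morphometrics]
```

A larger weight lets the simple morphometrics dominate the pairwise distances used by hierarchical clustering, while a weight of zero removes their influence on distances entirely.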

H. Ablation and Robustness Test
We further investigate the effect of several specially designed network components on the final performance. Table IV shows the classification performance on the Neuro7 dataset of multiple versions of MorphoGNN that differ in three components: dense connection, double-pooling operation, and triplet loss.
Dense connection has a large impact on network performance, improving overall accuracy from 82.47% to 85.58% and mean class accuracy from 74.07% to 79.45%. The double-pooling operation also plays a notable role: replacing it with either single pooling operation reduces the mean class accuracy by up to 2.8%. Moreover, max-pooling performs better than average-pooling when capturing global features. The triplet loss function brings an
In the biomedical domain, unsupervised learning has become increasingly important. Therefore, we further investigate the effects of the self-supervised training proposed in this study. Specifically, we add self-supervised training to the morphological classification task in Section IV-D and compare the result with the original classification accuracy to determine whether self-supervised training can improve the performance of supervised classification tasks. Table V shows that the added self-supervised training slightly improves the overall accuracy of PointNet (1.00%) and DGCNN (1.36%), but at the cost of about 20 million more network parameters. However, the overall accuracy of MorphoGNN decreases by 1.21% after adding self-supervised training. To clarify this issue, we conduct further ablation experiments on the loss functions of MorphoGNN. Table VI illustrates that the reconstruction loss of self-supervised training conflicts with the triplet loss. While triplet loss and reconstruction loss can each improve the overall accuracy by 4.09%, the overall accuracy improves by only 2.99% when they are used together. These results indicate that self-supervised training does not always improve the performance of supervised tasks, and that combining the two requires care.
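To make the interaction between the loss terms concrete, the sketch below shows one way a triplet term and a reconstruction term could be added to a classification loss. It is a hedged illustration, not the formulation used by MorphoGNN: the triplet term is the standard margin-based form, the Chamfer distance is a common choice for point-cloud reconstruction that is assumed here, and the weights `w_tri` and `w_rec` as well as all function names are hypothetical.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss on embedding vectors."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def chamfer_distance(points_a, points_b):
    """Symmetric mean squared nearest-neighbour distance between point sets.

    Assumed stand-in for the reconstruction loss; O(n * m) on purpose.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


def total_loss(cls_loss, tri_loss, rec_loss, w_tri=1.0, w_rec=1.0):
    """Weighted sum of the three terms (the weighting scheme is an assumption)."""
    return cls_loss + w_tri * tri_loss + w_rec * rec_loss


# The negative is far from the anchor, so the margin is already satisfied.
easy = triplet_loss([0.0, 0.0], [3.0, 4.0], [6.0, 8.0], margin=1.0)
# The negative is closer than the positive, so the loss is positive.
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0], margin=1.0)
# Two single-point clouds that are 2 units apart along z.
rec = chamfer_distance([[0.0, 0.0, 0.0]], [[0.0, 0.0, 2.0]])
loss = total_loss(0.5, hard, rec, w_tri=0.1, w_rec=0.01)
```

Because both auxiliary terms shape the same hidden representation, their gradients can pull in different directions, which is one plausible reading of the conflict reported in Table VI.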
Due to inevitable defects in imaging and reconstruction, incomplete and inaccurate morphological data are often obtained. Discarding all these samples is costly, so algorithms are expected to be robust to low-quality reconstructed neurons. In this section, we also evaluate the deep networks mentioned in this paper against two attacks: point dropping and point-wise Gaussian noise. The former randomly drops points in a single neuron to simulate incomplete reconstruction, while the latter adds point-wise noise so that the reconstructed shape deviates from reality. The neurons in the test set of Neuro7 are attacked to varying degrees, and the classification accuracy of the pretrained networks is recorded. Fig. 9 shows the overall accuracy and mean class accuracy of the deep networks under different degrees of point dropping and Gaussian noise. The classification accuracy of all networks shows a downward trend, indicating that network performance is generally weakened when the morphology of neurons is corrupted. For instance, when MorphoGNN is attacked by Gaussian noise with a variance of 4 × 10⁻⁴, its overall accuracy and mean class accuracy decrease by around 19% and 23%, respectively. When 40% of the points are lost, these two metrics of MorphoGNN are reduced by 14% and 18%, respectively. Network performance degrades more slowly under point dropping than under random noise. When the variance of the Gaussian noise increases to 2.5 × 10⁻⁴, the overall accuracy and mean class accuracy of DGCNN rapidly drop to less than 80% of those of the unattacked network. However, the performance of the same network hardly decreases as the ratio of dropped points gradually increases to 35%. The same trend is observed for PointNet++. The better robustness to point dropping may be because some zero points are filled into the neuron data during preprocessing (see Section IV-B). Although the dropping operation is applied before the zero points are filled, these filled points can be regarded as a disturbance to the number of effective points that is already present in the training stage, which makes networks less sensitive to point dropping than to random noise. By contrast, a lightweight deep network such as PointNet remains almost stable under different degrees of attack. Overall, the experimental results show that low-quality neurons significantly weaken network performance. Therefore, improving network robustness is a necessary direction for future work.
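The two attacks described above can be sketched as simple perturbations of a point set. This is a minimal illustration under stated assumptions, not the evaluation code of this study: the function names are hypothetical, the noise is assumed to be isotropic and applied to already normalized coordinates, and zero padding back to a fixed size is assumed to follow the dropping step, as the text indicates.

```python
import random


def drop_points(points, ratio, rng):
    """Randomly discard a fraction `ratio` of the points, keeping their order."""
    n_keep = len(points) - int(round(ratio * len(points)))
    kept = sorted(rng.sample(range(len(points)), n_keep))
    return [points[i] for i in kept]


def add_gaussian_noise(points, variance, rng):
    """Add zero-mean Gaussian noise with the given variance to every coordinate."""
    sigma = variance ** 0.5
    return [[c + rng.gauss(0.0, sigma) for c in p] for p in points]


def pad_with_zeros(points, n_points):
    """Fill with zero points up to a fixed size (assumed preprocessing step)."""
    if len(points) > n_points:
        raise ValueError("more points than the target size")
    padding = [[0.0, 0.0, 0.0] for _ in range(n_points - len(points))]
    return [list(p) for p in points] + padding


rng = random.Random(0)  # fixed seed so that the perturbation is repeatable
neuron = [[0.01 * i, 0.0, 0.0] for i in range(100)]  # toy straight fiber

# Point dropping at a ratio of 40%, followed by zero filling to 100 points.
dropped = pad_with_zeros(drop_points(neuron, 0.4, rng), 100)

# Point-wise Gaussian noise with a variance of 4e-4.
noisy = add_gaussian_noise(neuron, 4e-4, rng)
```

Sweeping `ratio` and `variance` over a grid and recording the accuracy of a pretrained classifier on the perturbed test set would produce curves of the kind summarized in Fig. 9.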

V. CONCLUSION
We present MorphoGNN, a novel method for capturing the morphological embeddings of reconstructed neuron fibers based on a proposed deep network, aiming to make better use of neuronal morphological information than existing approaches through a data-driven approach. By learning point-level geometric information, our proposed network can capture the features of a single neuron, which can be quantitatively compared with other existing morphological features. We demonstrate that the network can be trained via both supervision and self-supervision. Moreover, the learned morphological embeddings can be combined with features from microscopic images as well as traditional morphometrics, thereby adapting to multiple tasks. We demonstrate the effectiveness of our approach in neuron classification, retrieval, reconstruction quality classification, and neuron clustering on three morphology datasets at the scale of thousands of neurons.
Limitations and future work: We construct a self-supervised objective function by reconstructing the shape of neurons from hidden representations, but our current model cannot accurately reconstruct the neuron structure because it lacks a constraint function that reflects prior knowledge of neuron structures. Our method does not consider that the branches of neurons are usually slender and directional in order to transmit electrical signals, which may account for the imprecise reconstruction. Furthermore, the shape generated by our method is a point cloud rather than a topological tree structure, so further work is needed to predict the connections between points. Additionally, our method does not incorporate information on spatial location in the brain, so the cluster results for mouse prefrontal cortex neurons differ slightly from those obtained in previous research. Although combining our approach with existing morphometrics can alleviate this deficiency, a gap still remains. Therefore, the integration of multimodal information, such as morphology, brain location, genomics, and proteomics, is necessary to fully comprehend the structure and function of neurons and ultimately achieve a census of all brain cells.

Fig. 1. Overview of the proposed MorphoGNN. This paper proposes a data-driven model for learning the morphological embedding of reconstructed neurons. Through the preprocessing steps, the nerve fiber is converted into a standard point cloud format. We employ a graph neural network to model the geometric relationship between points and then capture the point-level morphological features of neurons. Two training strategies are provided to satisfy different morphological tasks. By fitting different labels, the model can learn, under supervision, neuron characteristics suited to different scenarios. When labels are unavailable, or when the neurons' own morphological patterns need to be described, the model can be trained via self-supervision, with the training target of recovering neuronal shapes from hidden representations.

Fig. 2. Architecture of the GNN encoder. Our encoder is a modular graph neural network that takes the three-dimensional coordinates of n points as input and outputs the global feature of neurons. EdgeConv{N} denotes that node features are expanded to N dimensions after this EdgeConv layer. MLP stands for multi-layer perceptron. Point features at different levels are densely connected to mitigate vanishing gradients. The morphological embedding is then formed by combining the outputs of a max-pooling layer and an average-pooling layer.

Fig. 3. Sampling and normalization for neurons. From left to right: the original neuron, the neuron sampled to 2048 points, and the neuron sampled to 1024 points.

Fig. 6. Dual-channel network for reconstruction quality classification. We model neurons and their corresponding microscopic images separately, and then concatenate their features for classification.

Fig. 7. Retrieval results based on different methods. (a) An example of the five most similar neurons retrieved with different features for a randomly selected query neuron. (b) The retrieval accuracy of six methods, defined as the proportion of neurons of the same type among the ten most similar neurons retrieved. (c) The time consumed by different methods to establish the feature database.

Fig. 8. Neuron clustering based on self-supervised learning features. (a) The shapes reconstructed by the self-supervised model from the hidden features of neuron samples seen and not seen during the training stage. (b) The process of enriching the description of neuronal information by combining our morphological embedding with traditional morphometrics. (c) The cluster similarity between our features and the existing work. The evaluation metric is intersection over union (IoU). (d) The influence of the weight on the result of neuron clustering when our features are combined with traditional morphometrics.

Fig. 9. Influence of two attacks on the classification performance of learning-based methods. This experiment is conducted on the Neuro7 dataset. The abscissa of (a) is the ratio of discarded points, while that of (b) is the variance of Gaussian noise.

TABLE II
RESULTS OF MORPHOLOGICAL CLASSIFICATION ON NEURO7

Fig. 4. UMAP visualization of different features.

TABLE III
RESULTS OF RECONSTRUCTION QUALITY CLASSIFICATION

TABLE IV
ABLATION TEST OF NETWORK COMPONENTS ON NEURO7. HERE ✓ INDICATES THIS COMPONENT IS APPLIED IN THE NETWORK