Multi-Level Graph Neural Network With Sparsity Pooling for Recognizing Parkinson’s Disease

Parkinson’s disease (PD) is a neurodegenerative disease of the brain associated with motor symptoms. With the maturation of machine learning (ML), especially deep learning, ML has been used to assist in the diagnosis of PD. In this paper, we explore graph neural networks (GNNs) to implement PD prediction using MRI data. However, most existing GNN models suffer from the efficiency of graph construction on MRI data and the problem of overfitting on small data. This paper proposes a novel multi-layer GNN model that incorporates a fast graph construction method and a sparsity-based pooling layer with an attention mechanism. In addition, graph structure sparsity is plugged into the graph pooling layer as prior knowledge to mitigate overfitting in model training. Experimental results on real-world datasets demonstrate the effectiveness of the proposed model and its superiority over baseline methods.


Xiaobo Zhang, Member, IEEE, Yuxin Zhou, Zhijie Lu, Donghai Zhai, Haonan Luo, Tianrui Li, Senior Member, IEEE, and Yang Li, Senior Member, IEEE

I. INTRODUCTION
Parkinson's disease (PD) is an age-related degenerative brain disorder that affects nerve cells in the parts of the brain responsible for planning and controlling body movement. Among people aged over 65 years, 2%-3% are affected [1]. Owing to factors such as human exposure to industrialized environments and advances in detection technologies [2], the number of PD patients is increasing year by year in modern aging societies. Its prevalence may double by 2030 [3].
Although PD has both non-motor and motor symptoms, the clinical diagnostic criteria for PD mainly follow medical history and physical examination [2]. Non-motor symptoms such as cognitive impairment occur in the early stages of PD. However, adopting non-motor symptoms as an independent diagnostic criterion [4] calls for more effort due to objective differences between patients and subjective differences in assessment. Physical examination criteria mainly include motor signs and pathological examination.
MRI data are usually collected as pathological evidence to help doctors diagnose PD. In recent years, advanced deep learning methods and imaging techniques have explored MRI data to assist in the diagnosis of PD [2]. In this paper, we delve into graph neural networks (GNNs) for PD prediction/recognition using brain MRI data collected from the Parkinson's Progression Markers Initiative (PPMI) [5].
The mainstream methods for PD prediction using MRI data include SVM, CNN [4], GNN, etc. GNNs show promising performance in this task [6], [7]. MRI data come in two forms: MRI sequence data and slice MRI data. GNN-based methods usually utilize FreeSurfer [8] for data processing, with the graph structure and node features constructed and extracted from MRI sequence data. However, preprocessing an MRI sequence takes more than 8 hours, as shown in [7]. Such a time-consuming process limits their application to large-scale data. Slice MRI data are also available for PD prediction [9] and require somewhat less preprocessing time than MRI sequence data. The problem, however, is how to construct graph inputs from slice MRI data for a GNN model. In addition, most existing GNN-based models overfit on small data.
In this paper, we propose a novel GNN model for PD prediction with slice MRI data to tackle the above-mentioned challenges. For fast graph construction on MRI data, we design an image-to-graph construction method based on keypoint detection, which transforms PD prediction into graph classification. To address model overfitting on small medical datasets, we propose a multi-level GNN model based on sparsity pooling, called SparsityATopK. We further design a sparsity-based pooling layer that uses node neighborhoods as prior knowledge and incorporates an attention mechanism to push the model to focus on informative graph structures. Experimental results show that our model outperforms existing GNN-based models. In addition, to address the lack of experimental reproducibility in Parkinson's machine learning research [4], this paper provides a detailed account of the model parameter settings and training settings. We will also release the code of our model.
The main contributions of this paper are as follows:
• A multi-level GNN model called SparsityATopK is proposed, which combines prior information with a learning mechanism to realize multi-level learning, making it suitable for PD prediction on small datasets, where it performs better than baseline methods.
• A novel pooling layer called SparsityPool is proposed. SparsityPool uses prior knowledge, namely sparsity information, as a multi-level information extraction strategy. In this way, the pooling layer preferentially retains nodes with more neighborhood information.
• A model sensitivity analysis is provided to improve the reproducibility of this paper and to offer interested researchers a reference for parameter settings.

II. RELATED WORK
The essence of GNNs is to propagate node information according to certain rules. For graph classification tasks, GNN models need to generate graph-level representations. Pooling is an information screening mechanism that plays an important role in multi-level learning and in generating graph-level representations. GNN models also show potential for PD prediction; there are two studies applying GCNs to PD prediction based on MRI data.

A. GNNs
Wu et al. divide GNNs into recurrent graph neural networks (RecGNNs), convolutional graph neural networks (ConvGNNs), graph autoencoders (GAEs) and spatiotemporal GNNs (STGNNs) according to their node propagation rules and network structures [10]. The characteristic of RecGNNs is to make node states converge through some recurrent mechanism, such as a node distance shrinkage function [11], a randomly initialized encoder [12], or a gated recurrent unit [13]. ConvGNNs are similar to RecGNNs in that they generate node representations from a node and its neighbors, but they use a fixed number of network layers with a different parameterization in each layer for node updates.
ConvGNNs are mainly divided into spectral-based and spatial-based ConvGNNs, which address the graph learning problem from the perspectives of graph signal processing and node signal propagation, respectively. The design of the former has a solid mathematical foundation, and extension work on spectral-based ConvGNNs mainly focuses on reducing the computational complexity of obtaining filters [14], [15].
The essence of spatial-based ConvGNNs is to propagate node information through edges. For example, Diffusion CNN [16] propagates information to adjacent nodes with a certain probability, Tran et al. [17] increase the contribution of far neighbors through the shortest distance, and Gilmer et al. [18] use k-step message propagation to expand the range of message passing. Yang et al. [19] studied how to prevent node propagation from becoming over-smooth and proposed a Graph Representation Learning (GRL) framework that constrains neighboring nodes to have similar representations. He et al. [20] introduced block modeling into the GCN framework to automatically learn aggregation rules corresponding to different categories of neighbors. Besides, methods such as preserving the orthogonality of feature transformations [21], utilizing attention mechanisms [22], and using multiple losses to constrain model training [23] have improved model performance. From the perspective of data sources, multi-view learning that integrates more types of features [24], [25] and dynamic graph learning that updates graphs during training [26] expand the application domain of graph neural networks and improve model effectiveness.
In graph classification tasks, graph-level representations need to be generated after feature learning. Downsampling, or pooling, avoids problems such as model overfitting and is somewhat similar to the readout mechanism used to generate graph-level representations [10]. Many researchers have proposed pooling methods based on ideas such as attention mechanisms [27], ranking node importance [28], dual-view learning and fusion [29], and node merging [30].

B. PD Prediction Method Based on Deep Learning
Due to the particular pathology of PD, clinical medicine currently has no effective measure to prevent the occurrence and progression of the disease. On the one hand, PD diagnosis can rely on reliable prognostic markers for potential diagnosis and estimation of prognostic value [31], among which combinations of multiple cerebrospinal fluid biomarkers have become accurate diagnostic and prognostic models. On the other hand, PD prediction realized by machine learning, especially deep learning, has attracted wide attention and achieved state-of-the-art performance [32].
PD prediction methods based on machine learning can be divided by data source into those based on speech patterns, motion data, handwriting patterns, MRI, and other data [4]. Among them, research based on speech patterns and motion data occupies a large proportion [4]. For example, Guo et al. [33] proposed a sparse adaptive graph convolutional network (SA-GCN) to evaluate Parkinson's motor symptoms from fine-grained skeleton sequences in motion videos.
Most PD prediction studies based on MRI data use SVM or CNN models. Shinde et al. [34] utilized a CNN to extract features from neuromelanin-sensitive MRI as prognostic and predictive biomarkers for PD. Yagis et al. [35] used a CNN model and T2-weighted MRI data of PD patients to verify the effect of dataset partitioning on model accuracy, pointing out that the data splitting method greatly affects model performance. Besides, Kumar et al. [36] proposed a capsule network to identify early signs of PD.
Currently, two studies use ConvGNNs for PD prediction. McDaniel and Quinn [7] and Zhang et al. [6] conducted similar studies; the difference is that Zhang et al. [6] used sMRI to build graph structures, while McDaniel and Quinn [7] used aMRI. Both studies used multiple brain tractography algorithms to obtain multi-view graph data. Beyond PD prediction itself, there is further predictive analysis: Li et al. [37] proposed a local interpreter method to explain the prediction results after using a random forest for PD prediction.
The utility of GNNs remains to be explored in PD prediction. In existing GCN-based research, image preprocessing takes a long time [7], model learning is insufficient on small datasets, and there is no targeted improvement to the model architecture. In this paper, a multi-level GNN model using sparsity pooling is proposed to address these problems.

III. METHODOLOGY
SparsityATopK utilizes the sparsity of node neighborhood information to achieve multi-level learning and complete graph classification. The sparsity information is used in SparsityPool to obtain node scores for node selection. The classification problem, along with the basic notation, is defined in Section III-A. The architecture of SparsityATopK is described in Section III-B. Finally, the details of SparsityPool are presented in Section III-C.

A. Notations and Problem Definitions
A graph consists of a set of n nodes and m edges, which can be represented as G(F, E, N). Here, F = [f_ij ∈ R]^{n×d} is the d-dimensional feature matrix of the n nodes, and E = [e_ij ∈ R]^{n×n} is the adjacency matrix of the graph, which represents the connections between nodes. N = (node_1, node_2, ..., node_n) denotes the set of graph nodes. In matrix E, e_ij ≠ 0 indicates that there is an edge between nodes i and j. The s graphs constitute the graph set Gs = (G_1, G_2, ..., G_s), in which graphs can be divided into k categories. The problem is then to obtain a mapping f that maps each graph to the probability space, f(F_i, E_i) → p⃗, so as to obtain the prediction result. The proposed SparsityATopK is the desired mapping f.
Within f, in order to capture the information of local substructures of the graph, a pooling layer is designed to obtain a multi-level graph. The strategy of the pooling layer is to obtain node importance scores through a scoring function f_n and to select the retained nodes according to these scores.
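As a minimal illustration of the notation (a hypothetical toy graph, not data from the paper):

```python
import numpy as np

# Hypothetical toy graph in the paper's notation: n = 4 nodes with
# d = 3 features each; e_ij != 0 in the adjacency matrix E marks an edge.
n, d = 4, 3
F = np.ones((n, d))                        # feature matrix F in R^{n x d}
E = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # adjacency matrix E in R^{n x n}

# The node set N and the edge list are implied by the nonzero entries of E.
edges = np.argwhere(E != 0)
```

The mapping f (SparsityATopK) then takes such an (F, E) pair and outputs a probability vector over the k categories.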

B. Model Architecture
The key idea of SparsityATopK is to use SparsityPool to construct a multi-layer GNN model that hierarchically extracts graph information. The architecture of SparsityATopK with two pooling layers is shown in Fig. 1. The graph representation after propagation is treated as global information, and the pooled nodes processed by the attention layer are treated as local information. Finally, the global and local information are concatenated and sent to two fully connected layers to produce the prediction.
In SparsityATopK, we adopt the information propagation rule proposed in k-GNNs [38]. This method extends the node propagation range to sets of k nodes and their neighborhoods, so that information can be extracted at both coarse and fine granularity, which yields better performance in graph classification.
1) Input: The inputs of the model are the adjacency matrix E and the feature matrix F of the graph, which represent the connection status and the characteristics of the nodes, respectively. The node set N can be obtained from E.
2) Global Information: In layer t, the propagated information can be accessed by

f^(t)(e) = σ( f^(t−1)(e) W1^(t) + Σ_{u ∈ N(e)} f^(t−1)(u) W2^(t) )    (1)

where f represents the features (in the first layer, f^(0) = F), e is a subset of N^(t−1) with g nodes, and W1 and W2 are the learnable parameters. N(e) is the neighborhood of the node set e, defined in [38] as

N(e) = { u ⊆ N^(t−1) : |u| = g, |u ∩ e| = g − 1 },

which means that information is propagated through subgraphs rather than individual nodes. In (1), σ is a nonlinear activation function, implemented as ReLU(·) = max(0, ·).
Finally, the global representation of this layer, F_g^t = (1/n) Σ_{i=1}^{n} f_i^t, is obtained by an average pooling module.
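The propagation and readout above can be sketched as follows. This is a minimal NumPy sketch of the simplified k = 1 case of the rule in [38]; the function names and the toy inputs are ours, not the paper's implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def propagate(F, E, W1, W2):
    """One propagation layer in the spirit of (1): each node combines its own
    transformed features with the sum of its neighbors' transformed features
    (the k = 1 special case; the paper uses the k-GNN rule of [38])."""
    return relu(F @ W1 + E @ F @ W2)

def global_readout(F):
    """Average-pooling readout: F_g = (1/n) * sum_i f_i."""
    return F.mean(axis=0)

rng = np.random.default_rng(0)
F = rng.random((5, 4))                      # 5 nodes, 4-dim features
E = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    E[i, j] = E[j, i] = 1.0                 # a simple path graph
W1, W2 = rng.random((4, 8)), rng.random((4, 8))

H = propagate(F, E, W1, W2)                 # hidden features, shape (5, 8)
Fg = global_readout(H)                      # global representation, shape (8,)
```

Each level of the model contributes one such readout vector to the final representation.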
3) Local Information: Local information is obtained by the SparsityPool layer. In the SparsityPool layer, after calculating node sparsity, the sequence idn^t of the top n_t nodes to be retained is obtained. Before producing the pooling output, the features are weighted with semantic information obtained from a propagation layer with an output depth of 1, so that the retained features carry a clearer significance. This propagation layer serves as the attention layer and is denoted attn; the local information F_l^t is obtained by weighting the features with the scores produced by attn. From the weighted feature matrix F_l^t and the adjacency matrix N^t, the new adjacency and feature matrices are retrieved according to the pooling sequence idn^t and serve as the input to the next layer.
4) Classification: To aggregate the information of each graph level, all representations are merged before classification. The concatenation (cat) splices the representations of the different layers into a (d_1 + d_2 + ... + d_n) × 1 dimensional tensor. Finally, the resulting graph representation is sent to two fully connected layers for classification, yielding the class probabilities p of the graph.
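The local-information step (score nodes, keep the top ones, weight their features by a 1-channel attention coefficient, and slice the adjacency) can be sketched as below. The softmax attention and the 0.5 pooling ratio are illustrative assumptions standing in for the learned attn layer:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sparsity_pool(F, E, scores, ratio=0.5):
    """Output step of the pooling layer: keep the top-ceil(ratio * n) nodes by
    score (the pooling sequence idn), weight their features with a 1-channel
    attention coefficient, and slice the adjacency to the kept nodes.
    The softmax here is an illustrative stand-in for the learned attn layer."""
    n = F.shape[0]
    k = max(1, int(np.ceil(ratio * n)))
    idn = np.argsort(scores)[::-1][:k]      # pooling sequence idn^t
    a = softmax(scores[idn])                # attention weights for kept nodes
    F_l = F[idn] * a[:, None]               # weighted local features
    E_l = E[np.ix_(idn, idn)]               # induced sub-adjacency
    return F_l, E_l, idn

F = np.ones((6, 4))
E = np.eye(6)
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.7])
F_l, E_l, idn = sparsity_pool(F, E, scores)
```

Here `scores` would be the node sparsity scores computed by SparsityPool, and the returned matrices feed the next propagation layer.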

C. SparsityPool
To mitigate overfitting, we introduce prior knowledge into the pooling layer. SparsityPool draws on the idea of inpainting priority from image inpainting [39]. In traditional image inpainting, the pixels of the known area and the patches formed by their neighborhoods are combined with weights to repair the missing areas. The structural sparsity of edge patches in missing regions is used to select patches with more significant structure as the patches to be repaired first. Structural sparsity here refers to the pixel similarity between a patch and its adjacent patches: more similar patches indicate more salient texture features in the area, while less similar patches indicate more obvious structural features, so that structure and texture can be distinguished.
Analogous to the idea of using sparsity to find more important blocks to generate inpainted sequences in image inpainting, we describe the sparsity of graph structures and use this as a criterion to generate pooled node sequences.
The pooling layer is applied after the aggregation layer. After the t-th aggregation layer, the feature matrix F^t and the adjacency matrix E^t are obtained, and the node neighborhood is defined as the first-order neighborhood. Within a node neighborhood, the more similar the nodes are, the less effective information the central node can obtain. For each node, before evaluating node sparsity, the pooling layer calculates the feature similarity between the central node and its adjacent nodes:

w_{node_i, node_j} = (1 / Z(w)) · exp( −dis(f_i, f_j) / mu ),

where dis is the distance function, mu is set to 5 in the implementation, and Z(w) is the normalization constant such that Σ_j w_{node_i, node_j} = 1. The p neighboring nodes constitute the sparsity vector w⃗_i = [w_{node_i, node_j}]_{p×1} of the central node. The node sparsity is then calculated from w⃗_i, following the structural sparsity of [39], essentially as the norm of w⃗_i scaled by the relative neighborhood size. The process of node selection is shown in Fig. 2. As can be seen there, sparsity pooling preferentially selects nodes with richer neighborhood information, that is, nodes with less repetition between the central node and its neighborhood. Similar nodes in the neighborhood imply duplicated information, so such a node receives a low score. In addition, the normalization applied when calculating sparsity also gives nodes with only one neighbor a low score, thereby shrinking peripheral and similar nodes, while central nodes with more neighborhood nodes tend to receive higher scores. Under different pooling rates, nodes with different information richness are discarded.
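A sketch of the sparsity scoring, under stated assumptions: the exponential-kernel similarity and the norm-times-relative-size score are our reading of the omitted equations, reconstructed from the constraints above and the structural sparsity idea of [39]:

```python
import numpy as np

def node_sparsity(F, E, mu=5.0):
    """Sparsity score per node. For each central node, similarity weights to its
    first-order neighbors are normalized to sum to 1 (the Z(w) constant), and
    the score combines the norm of the weight vector with the relative
    neighborhood size. The exponential-kernel weight and the exact score form
    are assumptions reconstructed from the constraints stated in the text."""
    n = F.shape[0]
    scores = np.zeros(n)
    deg = (E != 0).sum(axis=1)
    p_max = max(1, int(deg.max()))
    for i in range(n):
        nbrs = np.flatnonzero(E[i])
        if nbrs.size == 0:
            continue                         # isolated node: score stays 0
        w = np.exp(-np.linalg.norm(F[nbrs] - F[i], axis=1) / mu)
        w = w / w.sum()                      # normalization Z(w): sum_j w_j = 1
        scores[i] = np.linalg.norm(w) * np.sqrt(nbrs.size / p_max)
    return scores

F = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
E = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (0, 3), (3, 4)]:
    E[i, j] = E[j, i] = 1.0
s = node_sparsity(F, E)
```

The sqrt(p / p_max) factor is what pulls single-neighbor nodes down, matching the behavior described for peripheral nodes.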

D. SparsityATopK Algorithm
The deep representation learning algorithm of the proposed SparsityATopK is shown in Algorithm 1. There are three levels of loops. Loop 1: steps 1 through 16 represent the learning process of the model. Loop 2: steps 3 through 11 represent the process of obtaining the hierarchical representations. Loop 3: steps 5 through 8 represent the process of obtaining node sparsity.
It should be noted that (1) is used three times in the algorithm. Step 2 ensures that the model carries out feature propagation at least once. Step 4 is the feature propagation before obtaining a hierarchical representation. Step 9 computes the pooled feature matrix F_l^t by (4) and step 10 the pooled adjacency matrix N_l^t by (5), repeated until the hierarchical information has been extracted n times. Step 12 carries out one more feature propagation by (1) after the last hierarchical representation is obtained, and step 13 computes the global representation F by (6). The process of selecting the top-k scoring nodes in the pooling layer and of obtaining the attention scores using (1) is omitted from the listing.

IV. EXPERIMENTS
In this section, we evaluate the effectiveness and superiority of the proposed SparsityATopK in PD prediction. The proposed model is compared with several baselines on the transformed PD MRI datasets to demonstrate its superiority. In addition, a model sensitivity analysis is provided to improve the reproducibility of the method.

A. Experiment Settings
The experiments are performed on a GPU server (Intel(R) Xeon Gold 6140 CPU @ 2.30 GHz, 256 GB RAM, and an NVIDIA Tesla P100 GPU with 16 GB memory). The model is implemented in Python, using PyTorch 1.4.0 and torch-geometric 2.0.4.
On the MRI dataset, three keypoint detection methods, Oriented FAST and Rotated BRIEF (ORB) [40], KAZE [41], and Accelerated-KAZE (AKAZE) [42], are applied to construct the graph datasets, yielding three datasets. The models are trained on the three views separately to compare the effect of the different keypoint detection methods. In the experiments, the model trains on 80% of the data, validates on 10% during training, and is finally tested on the remaining 10%, with the test results used to evaluate performance. Ten independent disjoint splits of the dataset are performed, and training is repeated on each split. Due to class imbalance, the per-class loss weights are included in the parameter search. The datasets obtained under the different keypoint detection methods have different parameter settings; the specific settings are given in Section IV-D. The model is compared with SortPool [28], SAGPool [43], ASAP [44], GIN0 [45], and GlobalAttentionNet [13] on three evaluation metrics.
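The split-and-weight protocol can be sketched as follows. Inverse-frequency weighting is shown here as one common choice; the paper instead searches the class weights as hyperparameters:

```python
import numpy as np

def split_indices(n, rng, train=0.8, val=0.1):
    """One 80/10/10 train/validation/test split; the paper repeats training over
    ten independent disjoint splits of the dataset."""
    idx = rng.permutation(n)
    n_tr, n_va = int(train * n), int(val * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

def class_weights(labels):
    """Per-class loss weights to counter class imbalance. Inverse-frequency
    weighting is one common choice, not the paper's searched values."""
    counts = np.bincount(labels)
    return counts.sum() / (len(counts) * counts)

rng = np.random.default_rng(0)
tr, va, te = split_indices(100, rng)
w = class_weights(np.array([0, 0, 0, 1]))   # rarer class gets a larger weight
```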

B. Datasets
In this paper, the data used for model training is the MPRAGE-sequence sagittal brain MRI dataset downloaded from the PPMI [5]. The dataset covers four groups ranging from suspected to confirmed and normal, namely the prodromal group, the SWEDD group, the PD group, and the control group. 48, 63, 366, and 163 individuals are sampled from the four groups to obtain a preliminary dataset. After cleaning out low-contrast, blurry, and small-size data, the final numbers per category are 85, 87, 677, and 183, as shown in Table I. These data are finally converted into graph datasets through preprocessing.

C. Image to Graph
Although we could treat the pixels as the nodes of the graph and construct the graph without information loss, this would greatly increase the amount of computation and make the model unable to process the data. Therefore, we construct the graph using extracted features as nodes.
We use Python's cv2 [46] package for image preprocessing and image feature extraction. To remove noise and focus on more salient features for subsequent keypoint detection, the data is first denoised: non-local means denoising is applied with the fastNlMeansDenoisingColored function with a filter strength of 10. After converting the denoised image to grayscale, the three feature point detection methods implemented in the cv2 [46] package, ORB [40], KAZE [41], and AKAZE [42], are used for keypoint detection. A feature point detection algorithm mainly comprises keypoint detection and keypoint description. It takes points with sharp pixel changes as keypoints and obtains the corresponding features by computing the grayscale changes in the areas surrounding the keypoints. That is, feature detection reflects the object structure and the features surrounding that structure.
Different keypoint detection methods focus on different features, as shown in Fig. 3. The obtained keypoints are sorted by feature importance, and the first 300 points are selected as the nodes of the graph. The features of each node are obtained by computing the intensity pattern around the point. According to the coordinates of the selected points, the 8 nearest neighbors of each central node are taken as its neighborhood, from which the adjacency matrix is constructed. The image processing flow is shown in Fig. 4. On a device with an Intel(R) Core(TM) i7-6500U CPU @ 2.50 GHz and 16 GB memory, converting an MRI image into a graph takes less than 1 second, far below the 8 hours required by the image preprocessing of McDaniel and Quinn [7]. In the following, the three datasets obtained by the different keypoint detection methods are called P-ORB, P-KAZE, and P-AKAZE.
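The keypoint-to-graph step can be sketched as below. In practice the coordinates and descriptors would come from a cv2 detector such as cv2.ORB_create(nfeatures=300).detectAndCompute; plain arrays are used here so the sketch stays self-contained:

```python
import numpy as np

def keypoints_to_graph(coords, feats, k=8):
    """Turn detected keypoints into a graph: each keypoint becomes a node, its
    descriptor becomes the node feature, and each node is linked to its k
    nearest neighbors by image coordinates to form the adjacency matrix."""
    n = coords.shape[0]
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)             # never pick a node as its own neighbor
    E = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[:min(k, n - 1)]:
            E[i, j] = E[j, i] = 1.0          # symmetric adjacency
    return feats, E

rng = np.random.default_rng(0)
coords = rng.random((20, 2)) * 100.0         # keypoint (x, y) positions
feats = rng.random((20, 32))                 # keypoint descriptors
F, E = keypoints_to_graph(coords, feats)
```

Because neighborhoods are symmetrized, a node may end up with more than 8 neighbors, but never fewer.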

D. Experimental Results
Table II shows the model parameter settings on the different datasets and the tuning ranges, where hidden denotes the hidden feature dimension.
• The proposed model has the highest average accuracy on the P-ORB dataset. But for P-ORB, the scores of b and c are relatively low, indicating that the dataset obtained by the ORB detection method increases the category imbalance.

TABLE II PARAMETER SETTINGS
• The proposed model has the highest precision on the P-KAZE dataset and the highest F1 score on the P-AKAZE dataset. The overall performance on P-KAZE and P-AKAZE is better than on P-ORB, indicating that KAZE [41] and AKAZE [42] are more suitable for PD prediction tasks.
Meanwhile, we compare the pooling layer and the model with existing methods. The effectiveness of the model design is demonstrated by ablating the pooling layer and the attention layer of SparsityATopK. Furthermore, to show the effect of the proposed pooling layer and attention module, we evaluate the performance of the degraded models. The observations are as follows.
• Our proposed SparsityATopK is markedly better than the other baselines, achieving better performance on all datasets and metrics. Specifically, on the P-ORB dataset, SparsityATopK improves on the other models by at least one percentage point on all three indicators. On the P-AKAZE dataset, precision is at least on par and the other indicators improve by at least one percentage point. On the P-KAZE dataset, the improvement in average accuracy is pronounced.
• As can be seen from Tables III to V, the degraded models of SparsityATopK show no obvious advantage over the baselines and even perform slightly worse than some of them, while the full SparsityATopK is generally superior. The proposed modules are therefore important for PD prediction tasks.
• From Fig. 6, it can be seen that adding the pooling layer and the attention mechanism improves model performance; the precision and F1 score on the P-AKAZE and P-KAZE datasets improve particularly. This shows that the pooling layer improves performance and that combining it with the attention mechanism further improves the prediction quality of the model.
• From the comparison of pooling structures and model architectures, it can be seen that the proposed SparsityATopK is more effective for PD prediction.

E. Parameter Sensitivity Analysis
The parameter study is an important part of the experiments. In preliminary experiments, a network with three propagation layers outperformed networks with four or more layers, and for the loss function, cross-entropy was better than alternatives such as MSE loss. The parameters studied include the feature dimension, pooling rate, drop rate, loss function weights, learning rate, learning rate decay factor (decay factor), learning rate decay step (decay step), number of training iterations (epochs), batch size, and optimizer. Random search is used to tune the parameters within given ranges. Table II shows the tuning ranges and steps; L denotes a logarithmic distribution and F denotes the feature depth. For example, the pooling rate is a random value in L[0.2, 0.8], sampled from a logarithmic distribution, and the learning rate is a random value in L[0.01, 0.0001]. The batch size ranges over [32, 256] with a step of 32, so its possible values are [32, 64, 96, ..., 256]. To balance model performance across the three indicators, the root of the sum of squares of the three indicators is used as the optimization target. 300 parameter-search trials are performed on each dataset.
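One random-search draw under this scheme might look like the following sketch (ranges taken from the text; the dictionary keys are our own names, and only the ranges quoted above are shown):

```python
import numpy as np

def sample_config(rng):
    """One random-search draw. Log-uniform sampling is used for the ranges that
    Table II marks with L."""
    return {
        "pool_ratio": 10 ** rng.uniform(np.log10(0.2), np.log10(0.8)),
        "lr": 10 ** rng.uniform(np.log10(1e-4), np.log10(1e-2)),
        "batch_size": int(rng.choice(np.arange(32, 257, 32))),
    }

def selection_score(acc, prec, f1):
    """Optimization target: the root of the sum of squares of the three metrics."""
    return float(np.sqrt(acc ** 2 + prec ** 2 + f1 ** 2))

rng = np.random.default_rng(0)
cfg = sample_config(rng)    # one of the 300 trials per dataset
```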
After the parameter study, the model sensitivities on the three datasets are obtained according to model performance, as shown in Fig. 7. On the P-ORB dataset, the three most important parameters are the optimizer, batch size, and epochs; for P-KAZE, they are batch size, decay factor, and optimizer. On the P-AKAZE dataset, the ranking differs again, as shown in Fig. 7.

V. DISCUSSION
We have proposed a method based on GNNs and structural sparsity rules for building predictive models for PD recognition. MRI data have been fed to the predictive models for training and evaluation. The advantages of the proposed method are that it can quickly construct graph data from MRI images and that a sparse multi-level graph neural network model is designed for the small amount of available data. In previous studies of graph neural networks based on MRI data [6], [7], graph data preprocessing often takes a long time but retains and fuses more effective information. Although the keypoint detection method reduces the data dimension and obtains key information, the lost information still affects model accuracy. Meanwhile, the information in a single view also limits prediction accuracy; processing multi-view data is a direction for further research.
At the same time, since the focus of the model design is to use prior information in the data to cope with a shortage of training data, this study is also relevant to other graph classification tasks. We conducted tests on three additional datasets and comparison models. The model performance is shown in Table VI, the dataset details are shown in Table VII, and a visualization of Table VI is given in Fig. 11. SparsityATopK has better classification performance on smaller datasets such as MUTAG, owing to its stable performance combined with good classification ability. On larger datasets such as REDDIT-MULTI-5K, the performance of SparsityATopK is slightly weaker on the precision metric. This further shows that SparsityATopK is more suitable for small-scale datasets, and it also indicates that making the model more suitable for large datasets is one of our future research directions.

VI. CONCLUSION
This paper proposed a sparsity-based multi-level graph learning framework (denoted SparsityATopK) for PD prediction. SparsityATopK leverages the properties of GCNs, sparsity pooling, attention mechanisms, and feature fusion. Brain MRI data from PPMI are first transformed into graph datasets using keypoint detection techniques (ORB, KAZE, AKAZE). SparsityATopK performs excellently in screening prodromal PD, SWEDD, confirmed PD, and control groups, and the designs of the pooling layer and the network model are shown to be beneficial for PD prediction. Extensive experimental results show that the proposed model outperforms other existing GNNs, and that AKAZE and KAZE are more suitable than ORB for PD prediction tasks. In the future, we will investigate a principle for determining a more appropriate graph construction method and incorporate multimodal data to further improve PD prediction accuracy.

Fig. 1. Model architecture of SparsityATopK. A multi-level GNN consists of a stack of propagation layers and pooling layers and obtains a global representation of each level through average pooling, as shown in the upper box. In multi-level learning, the graph structure changes, and the data flow shows the propagation scope and the pooling process of node information.
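The stacked propagate–pool–readout pattern of Fig. 1 can be sketched in plain NumPy. The pooling score used here (row sum of features) is a placeholder for the learned sparsity/attention score, and all function names are illustrative rather than the authors' implementation:

```python
import numpy as np

def propagate(adj, x):
    """One GCN-style step: average each node's features with its
    neighbours' (self-loops included, row-normalised)."""
    a_hat = adj + np.eye(len(adj))
    deg = a_hat.sum(1, keepdims=True)
    return (a_hat @ x) / deg

def topk_pool(adj, x, scores, ratio=0.5):
    """Keep the top ratio*N nodes by score; restrict the graph to them."""
    k = max(1, int(np.ceil(ratio * len(x))))
    idx = np.argsort(scores)[::-1][:k]
    return adj[np.ix_(idx, idx)], x[idx]

def multilevel_readout(adj, x, levels=2):
    """Stack propagation + pooling; average-pool the node features at
    each level and concatenate the per-level summaries (upper box of Fig. 1)."""
    reps = []
    for _ in range(levels):
        x = propagate(adj, x)
        reps.append(x.mean(0))                       # per-level global readout
        adj, x = topk_pool(adj, x, scores=x.sum(1))  # placeholder score
    return np.concatenate(reps)

# toy 4-node graph with 2-D node features
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
X = np.arange(8, dtype=float).reshape(4, 2)
g = multilevel_readout(A, X, levels=2)   # one 2-D summary per level
```

Concatenating the per-level averages is what lets coarser pooled graphs contribute alongside the full graph in the final classification.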

Fig. 2. Node selection. (a) is the initial node state; the numbers in the circles represent the node features, and the dotted circles outline the current node and the neighboring nodes of three nodes. (b) shows the similarity scores between nodes. (c) is the sparsity score of each node; the numbers on the edges indicate normalized similarities. (d) is the pooling result under different pooling ratios.
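The selection steps of Fig. 2 can be mirrored in a short sketch: compute similarities between neighbouring nodes (b), normalise them per node (c), and keep the top-scoring fraction of nodes (d). The concrete score below (maximum normalised cosine similarity, as a measure of how concentrated a node's neighbourhood is) is an illustrative stand-in for the paper's sparsity score:

```python
import numpy as np

def sparsity_scores(adj, x):
    """Score each node by the concentration of its normalised
    neighbour similarities (illustrative, not the exact paper score)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True) + 1e-12
    sim = (x / norms) @ (x / norms).T       # (b) pairwise cosine similarity
    sim = sim * adj                         # keep neighbour pairs only
    row_sum = sim.sum(1, keepdims=True) + 1e-12
    sim_norm = sim / row_sum                # (c) normalise per node
    return sim_norm.max(1)                  # concentration of each row

def select_nodes(adj, x, ratio=0.5):
    """(d) keep the top ratio*N nodes ranked by sparsity score."""
    k = max(1, int(np.ceil(ratio * len(x))))
    return np.argsort(sparsity_scores(adj, x))[::-1][:k]

# toy 3-node graph: nodes 0 and 1 share a feature direction, node 2 does not
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
kept = select_nodes(A, X, ratio=0.5)
```

With ratio = 0.5 the two mutually similar nodes survive the pooling step, matching the intuition behind panel (d).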

Fig. 3. Feature detection results. From left to right: the original image, ORB detection results, KAZE detection results, and AKAZE detection results.

Fig. 5. Experimental results. Comparison of the model's indicators on the three datasets.
TABLE III EVALUATION RESULTS ON THE P-ORB DATASET

• Among the five comparison models, SortPool, ASAP, and SAGPool use the same model architecture as the proposed model; the difference lies in the pooling layer, so these three models serve as comparisons for the pooling layer. This setting tests the effectiveness of the proposed pooling layer for PD prediction tasks.
• GIN0 and GlobalAttentionNet serve as comparisons of model architectures. These two models have the same number of propagation layers as SparsityATopK. This setting tests the effectiveness of adding a pooling layer to the PD prediction task.
• S-GCN is the version of the model without the pooling layer, and SparsityTopK is the version without the attention module. S-GCN is set to check the validity of the pooling layer and the semantic module, and SparsityTopK is set to test the effectiveness of adding semantic modules to the pooling layer.

Tables III to V show the test results of the proposed model, the degradation models, and the five comparison models on the three datasets. Fig. 6 shows the performance of the proposed model and the degradation models. From Tables III to V and Fig. 6, we have the following observations.

Fig. 6. Model performance of the degenerate models. ORB, KAZE, and AKAZE represent the three datasets, respectively. G stands for S-GCN, S stands for SparsityTopK, and AS stands for SparsityATopK.

Fig. 7. Model sensitivity. Sensitivity ratio of each parameter on the three datasets.

Fig. 11. Visualization of the comparison with graph classification methods on three graph datasets using three evaluation indexes.
Manuscript received 22 June 2023; revised 16 September 2023; accepted 31 October 2023. Date of publication 6 November 2023; date of current version 14 November 2023. This work was supported in part by the National Natural Science Foundation of China under Grant 61961038 and Grant 61976247, in part by the Central Guiding Local Science and Technology Development Fund under Grant 23ZYZYTS0189, in part by the Key Research and Development Program in Sichuan Province of China under Grant 2023YFS0404, in part by the Fundamental Research Funds for the Central Universities under Grant 2682022KJ045, and in part by the Open Research Fund Program of Data Recovery Key Laboratory of Sichuan Province under Grant DRN2203. (Corresponding author: Donghai Zhai.) Xiaobo Zhang, Yuxin Zhou, Donghai Zhai, Haonan Luo, and Tianrui Li are with the School of Computing and Artificial Intelligence, the Institute of Artificial Intelligence, and the National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu 611756, China, and also with the Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756, China (e-mail: dhzhai@swjtu.edu.cn). Zhijie Lu is with the Department of Neurology, General Hospital of Western Theater Command, Chengdu 610083, China. Yang Li is with the School of Automation Science and Electrical Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100191, China.

TABLE I SUMMARY OF MRI DATA IN SAGITTAL VIEW

TABLE IV EVALUATION RESULTS ON THE P-AKAZE DATASET

TABLE V EVALUATION RESULTS ON THE P-KAZE DATASET

TABLE VI PERFORMANCE OF THE MODEL ON GENERAL DATASETS