EEG-Based Graph Neural Network Classification of Alzheimer’s Disease: An Empirical Evaluation of Functional Connectivity Methods

Alzheimer’s disease (AD) is the leading form of dementia worldwide. AD disrupts neuronal pathways and is thus commonly viewed as a network disorder. Many studies demonstrate the power of functional connectivity (FC) graph-based biomarkers for automated diagnosis of AD using electroencephalography (EEG). However, various FC measures are commonly utilised, as each aims to quantify a unique aspect of brain coupling. Graph neural networks (GNN) provide a powerful framework for learning on graphs. While a growing number of studies use GNN to classify EEG brain graphs, it is unclear which method should be utilised to estimate the brain graph. We use eight FC measures to estimate FC brain graphs from sensor-level EEG signals. GNN models are trained in order to compare the performance of the selected FC measures. Additionally, three baseline models based on the literature are trained for comparison. We show that GNN models perform significantly better than the baseline models. Moreover, using FC measures to estimate brain graphs improves the performance of GNN compared to models trained using a fixed graph based on the spatial distance between the EEG sensors. However, no FC measure performs consistently better than the other measures. The best GNN reaches 0.984 area under the sensitivity-specificity curve (AUC) and 92% accuracy, whereas the best baseline model, a convolutional neural network, has 0.924 AUC and 84.7% accuracy.


I. INTRODUCTION

ALZHEIMER'S disease (AD), a neurodegenerative disease, is the most common form of dementia. AD patients exhibit progressive deterioration of memory and other cognitive functions. From a neuroscience perspective, AD leads to synaptic loss and cellular death, which progressively occurs over multiple brain regions [1]. Disruption of communication pathways amongst brain regions is observed in AD [2], [3], [4]. Due to this distributed nature of AD, it can be recognised as a network disorder. Thus, graph theory is well suited for analysing and classifying AD, as it provides a general framework to study the interactions of various pathological processes across multiple spatiotemporal scales.

Functional connectivity (FC) is one of the methods to construct and study brain graphs. The edges of FC graphs represent the statistical dependencies between brain regions rather than the physical connectome, i.e. structural connectivity. FC brain graphs can be constructed from any functional brain imaging modality, such as EEG, magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), or positron emission tomography (PET). In this paper, we focus on EEG. EEG has been shown to be an effective tool for studying the changes in brain activity in AD cases [5], [6], [7]. Compared to other modalities, EEG is economical, non-invasive, easy to administer, and has a superior temporal resolution. On the other hand, it suffers from a low spatial resolution, as the activity is measured by electrodes placed on the subject's scalp.

Emerging evidence shows large-scale alterations in functional connectivity (FC) in AD, such as increased connectivity in the low-frequency bands [8], [9], [10].
Graph-based studies show that AD is characterised by reduced complexity [6] and loss of small-world organisation, assessed by the clustering coefficient and characteristic path length [11], [12], [13], [14]. However, there are multiple FC methods commonly used within the literature.

Some studies use a minimum spanning tree algorithm (MST) to produce sparse brain graphs. This is in contrast to threshold-based edge filtering, as MST can select edges with various edge weights and ensures that the resulting graph is connected. Additionally, Zhong et al. [26] utilise a learnable mask in order to learn the optimal graph structure for a specific classification task without relying on any FC measure.

In this study, we systematically evaluate the effects of using various FC methods to infer EEG brain graphs in training GNN for the classification of AD patients. Two types of edge filtering are used to induce graph sparsity in order to improve the performance of GNN. To compare and evaluate the classification performance of various FC-based GNNs, a GNN-based baseline is trained using a fixed graph structure for all brain graphs, represented by the Euclidean distance between spatial positions of EEG sensors. Three additional baseline models are established: two SVM baselines fitted on node strength (SVM-NS) and vectorised adjacency matrix (SVM-AM), respectively, and a CNN trained on images of adjacency matrices (Fig. 1).

The EEG dataset consists of 20 AD patients and 20 healthy control participants (HC) below 70 years. A subset of this dataset has been previously used in Blackburn et al. [5]. All AD participants were recruited in the Sheffield Teaching Hospital memory clinic. AD participants were diagnosed between one month and two years before data collection, and all were in the mild to moderate stage of the disease at the time of recording.
Age- and gender-matched HC participants with normal neuropsychological tests and structural MRI scans were recruited. The EEG data used in this study was approved by the Yorkshire and The Humber (Leeds West) Research Ethics Committee (reference number 14/YH/1070). All participants gave their informed written consent.

EEG was acquired using an XLTEK 128-channel headbox and Ag/AgCl electrodes with a sampling frequency of 2 kHz, using a modified 10-10 system overlapping the 10-20 international electrode placement system, with a referential montage with a linked earlobe reference. The recordings lasted 30 minutes, during which the participants were instructed to rest and not to think about anything specific. Within each recording, there were two-minute-long epochs during which the participants had their eyes closed (alternating with equal-duration eyes-open epochs, not used in this work).

The adjacency matrix using the absolute values of Pearson's correlation coefficients between signals x and y is given by:

$$ A^{corr}_{xy} = \left| \frac{\sum_t (x(t)-\bar{x})(y(t)-\bar{y})}{\sqrt{\sum_t (x(t)-\bar{x})^2}\,\sqrt{\sum_t (y(t)-\bar{y})^2}} \right| $$

where x(t) is the value of signal x at time t, and $\bar{x}$ is the mean of x. The absolute value is calculated as we are only interested in the coupling magnitude. Next, the adjacency matrix of coherence is given by:

$$ G^{coh}_{xy}(f) = \frac{|C^S_{xy}(f)|^2}{C^S_{xx}(f)\, C^S_{yy}(f)} $$

where $C^S_{xy}$ and $C^S_{xx}$ are the cross-spectral and auto-spectral densities, respectively, at frequency f. The coherence within a frequency band B is then calculated as the mean of $G^{coh}_{xy}$ over B. The imaginary part of coherency (iCOH) measures phase consistency similarly to coherence and accounts for volume conduction effects. The adjacency matrix using iCOH is computed as:

$$ A^{iCOH}_{xy} = \left| \mathrm{Im}\left( \frac{C^S_{xy}(f)}{\sqrt{C^S_{xx}(f)\, C^S_{yy}(f)}} \right) \right| $$

The phase lag index (PLI) quantifies the asymmetry of the distribution of instantaneous phase differences and is insensitive to zero-lag phase locking [27]. The adjacency matrix using PLI is defined as:

$$ A^{PLI}_{xy} = \left| \left\langle \mathrm{sign}\big(\varphi_x(t) - \varphi_y(t)\big) \right\rangle \right| $$
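As a concrete illustration, the absolute-Pearson adjacency described above can be sketched in NumPy. This is our own minimal sketch, not the authors' implementation; the function name and toy data are illustrative only:

```python
import numpy as np

def pearson_adjacency(X):
    """Absolute Pearson correlation between every pair of channels.

    X: (n_channels, n_samples) array of EEG signals.
    Returns an (n_channels, n_channels) weighted adjacency matrix
    with zeros on the diagonal (no self-loops).
    """
    A = np.abs(np.corrcoef(X))   # |r_xy| for every channel pair
    np.fill_diagonal(A, 0.0)     # remove self-connections
    return A

# toy example: 3 channels, 1000 samples
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 1000))
X[1] = 0.8 * X[0] + 0.2 * X[1]   # make channel 1 correlated with channel 0
A = pearson_adjacency(X)
```

The correlated pair receives a much larger edge weight than the independent pairs, which is the property the FC graph is meant to capture.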

where $\varphi_x$ is obtained using Eq. 5. The weighted phase lag index (wPLI) is an extension of PLI, which aims to remove the effects of amplitude and volume conduction by maximally weighting the ±90 deg phase differences and thus omitting uniformly driven differences [28]. The adjacency matrix using wPLI is computed as:

$$ A^{wPLI}_{xy} = \frac{\left| \left\langle \mathrm{Im}\, S_{xy}(t) \right\rangle \right|}{\left\langle \left| \mathrm{Im}\, S_{xy}(t) \right| \right\rangle} $$

where $S_{xy}$ is the cross-spectrum of signals x and y.
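The two phase-lag measures can be sketched from the analytic (Hilbert) signal. This is a simplified single-pair illustration under our own naming, not the authors' code; the imaginary part of the cross-spectrum of the analytic signals carries the (amplitude-weighted) sine of the phase difference:

```python
import numpy as np
from scipy.signal import hilbert

def pli_wpli(x, y):
    """PLI and wPLI for one channel pair, using the analytic signal."""
    zx, zy = hilbert(x), hilbert(y)
    s = zx * np.conj(zy)          # cross-spectrum of analytic signals
    # sign(Im s) = sign(sin(phase difference)) -> PLI
    pli = np.abs(np.mean(np.sign(s.imag)))
    # wPLI weights by |Im s|, down-weighting near-zero-lag samples
    wpli = np.abs(np.mean(s.imag)) / np.mean(np.abs(s.imag))
    return pli, wpli

# toy example: two 10 Hz sinusoids with a constant 90-degree lag
t = np.linspace(0, 1, 1000, endpoint=False)
x = np.sin(2 * np.pi * 10 * t)
y = np.sin(2 * np.pi * 10 * t - np.pi / 2)
pli, wpli = pli_wpli(x, y)
```

For a consistent non-zero lag such as this, both measures approach 1; for purely zero-lag (e.g. volume-conducted) coupling they approach 0.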

Phase locking value (PLV) is another approach to quantify the consistency of phase differences between signals, and its associated adjacency matrix is computed as:

$$ A^{PLV}_{xy} = \left| \left\langle e^{i(\varphi_x(t) - \varphi_y(t))} \right\rangle \right| $$

Finally, mutual information (MI) quantifies the amount of information shared between a pair of signals. The adjacency matrix using MI is calculated as:

$$ A^{MI}_{xy} = \sum_{x,y} P_{XY}(x, y) \log \frac{P_{XY}(x, y)}{P_X(x)\, P_Y(y)} $$

where $P_{XY}$ and $P_X$ are the joint and marginal probability distributions, respectively.

1) Edge Filtering Methods: It is worth noting that we did not use any corrections for false positives. Thus, the true brain graph structure might be masked by noise due to spurious coupling. Traditionally, a surrogate threshold might be used to control such spurious edges. However, such a procedure is computationally expensive, as it requires re-computing the connectivity measure on multiple random surrogate versions of the original signals to estimate a null surrogate distribution. Instead, we implement two edge-filtering methods to select only important edges and thus produce sparse graphs. Compared to the surrogate threshold method, edge filtering is a fast and efficient, albeit naive, method to deal with potentially noisy brain graphs. We also utilise the fully connected graphs, i.e. without any edge selection, in the classification models in order to test the effect of edge filtering.

The first edge-filtering method is an FC-strength-based top-k% filter (k ∈ {10, 20, 30}), which selects only the top k% strongest edges of the given graph and removes the rest. This approach assumes that edge weight, i.e. the connectivity strength, is directly related to the importance of an edge. However, this assumption might not be valid.
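The top-k% filter can be sketched as follows (our own illustrative implementation, assuming a symmetric weighted adjacency matrix with a zero diagonal):

```python
import numpy as np

def top_k_percent_filter(A, k=20):
    """Keep only the top-k% strongest edges of a symmetric
    weighted adjacency matrix; all other edges are set to zero."""
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)             # each undirected edge once
    w = A[iu]
    n_keep = max(1, int(round(len(w) * k / 100)))
    thresh = np.sort(w)[-n_keep]             # weight of the k-th strongest edge
    mask = np.zeros_like(A, dtype=bool)
    keep = w >= thresh
    mask[iu[0][keep], iu[1][keep]] = True
    mask = mask | mask.T                     # keep the matrix symmetric
    return np.where(mask, A, 0.0)

# toy example: 5-node graph (10 possible edges), keep the top 20% (2 edges)
rng = np.random.default_rng(1)
A = rng.random((5, 5)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)
A_sparse = top_k_percent_filter(A, k=20)
```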

A minimum-spanning-tree-based filter (MST-k), also known as the orthogonal minimum spanning tree [29], addresses this concern, as it selects a mix of edge weights and always produces a connected graph, i.e. a path exists among all nodes. Briefly, the MST algorithm [30] aims to extract a backbone of a graph with N nodes by selecting N − 1 edges such that the sum of weights is minimised. We use Prim's algorithm for computing the MST [30]. In the case of brain graphs, a stronger edge weight implies a higher degree of coupling; thus, we use an inverted MST algorithm which maximises the sum of weights instead. When k = 1, MST-k is equal to a single iteration of the MST algorithm. For k > 1, the edges selected by the previous iterations are removed from the graph, and the MST algorithm is re-run. Thus, the MST-k filter selects k(N − 1) edges.
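A sketch of the MST-k filter is shown below. Note that we use SciPy's generic MST routine (negating the weights to obtain a *maximum* spanning tree) rather than the Prim's implementation used in the paper; the function name and toy graph are our own:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_k_filter(A, k=1):
    """Orthogonal MST filter: run a maximum spanning tree k times,
    removing the selected edges from the graph after each round.
    Keeps k*(N-1) undirected edges of A."""
    W = A.astype(float).copy()
    keep = np.zeros_like(A, dtype=bool)
    for _ in range(k):
        # a minimum spanning tree of -W is a maximum spanning tree of W
        tree = minimum_spanning_tree(-W).toarray() != 0
        tree = tree | tree.T
        keep |= tree
        W[tree] = 0.0            # orthogonality: drop already-selected edges
    return np.where(keep, A, 0.0)

# toy example: fully connected 6-node graph, single MST -> N-1 = 5 edges
rng = np.random.default_rng(2)
A = rng.random((6, 6)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)
A_mst = mst_k_filter(A, k=1)
```

Because the tree spans all nodes, the filtered graph is guaranteed to be connected, unlike a pure threshold filter.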

B. Graph Neural Network Classification
A graph neural network (GNN) is an extension of an artificial neural network that is capable of learning on graph-structured data. Specifically, we implement a graph convolutional network (GCN) for a graph classification task (Fig. 1A). The input to the GCN classifier is in the form of a graph G = {N, E, F}, where N, E, and F are the sets of nodes, edges and node features, respectively. The nodes are fixed in our case, as this is the number of EEG electrodes. The set of edges E is given by the adjacency matrix A computed by the FC measures introduced in the previous section. Finally, the node feature matrix F is an N × D matrix where each row encodes a D-dimensional feature for the corresponding node. Specifically, power spectral density (PSD) is computed over 1 Hz increments in an interval between 0 and 100 Hz, forming a 100-dimensional node feature vector (i.e. D = 100).

GCN is based on the message-passing framework, which assumes that neighbouring nodes should have similar node features. Briefly, a GCN layer updates the node features by aggregating the features of neighbouring nodes; the GCN layer is implemented on a node level as follows [31]:

$$ x^{L}_{i} = \sigma\left( \sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{e_{ji}}{\sqrt{\hat{d}_j \hat{d}_i}}\, \Theta\, x^{L-1}_{j} \right) $$

where $x^{L}_{i}$ is the output of the L-th GCN layer for the i-th node, $e_{ji}$ is the edge weight between nodes j and i, $\hat{d}_i$ is the weighted degree of node i including the self-loop, $\Theta$ is a learnable weight matrix, and $\sigma$ is a nonlinear activation function.
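A minimal NumPy sketch of this symmetrically normalised GCN update (self-loops added, ReLU activation) is shown below; it is an illustration of the layer mechanics under our own naming, not the trained model:

```python
import numpy as np

def gcn_layer(A, X, Theta):
    """One GCN-style layer on a weighted graph.

    A:     (N, N) weighted adjacency matrix
    X:     (N, D_in) node feature matrix
    Theta: (D_in, D_out) learnable weight matrix
    """
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)                    # weighted node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    # symmetric normalisation: D^{-1/2} (A + I) D^{-1/2} X Theta
    H = D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ Theta
    return np.maximum(H, 0.0)                # ReLU activation

# toy example: 4 nodes, 3-dim input features -> 2-dim node embeddings
rng = np.random.default_rng(3)
A = rng.random((4, 4)); A = (A + A.T) / 2; np.fill_diagonal(A, 0)
X = rng.standard_normal((4, 3))
Theta = rng.standard_normal((3, 2))
H = gcn_layer(A, X, Theta)
```

Stacking two such layers lets each node aggregate information from its 2-hop neighbourhood, which is the depth the hyper-parameter search in this paper selects.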

In summary, the GNN used in this study has several hyper-parameters, as shown in Table I.

In order to enable a fair assessment of the advantages of using graph-based learning (i.e. the GNN), four baseline classifiers are trained and compared. These baseline models utilise the same graph-structured input data extracted using different FC measures, frequency bands and edge filters, and the same evaluation process. Thus, we argue this to be a fair comparison of models. The hyper-parameters of the baseline models are also listed in Table I. Additionally, in order to select an appropriate kernel for the SVMs, we include two kinds of kernels as hyper-parameters: radial and polynomial (up to 3rd order). Both of the SVM-based baseline models are trained on manually extracted features. All features are first normalised to zero mean and unit standard deviation.

The SVM-NS is trained on node strengths (Fig. 1B). Node strength is defined as the sum of the edge weights of one node and can be interpreted as a measure of node importance. Thus, each brain graph is represented by a 23-dimensional feature vector NS = (ns_1, ns_2, . . . , ns_N), where N is the number of nodes (N = 23).
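The node-strength feature vector is straightforward to compute; a small sketch (our own illustration):

```python
import numpy as np

def node_strengths(A):
    """Node strength: sum of the edge weights incident to each node."""
    return A.sum(axis=1)

# toy 3-node weighted adjacency matrix
A = np.array([[0.0, 0.5, 0.2],
              [0.5, 0.0, 0.1],
              [0.2, 0.1, 0.0]])
ns = node_strengths(A)   # one strength value per node
```

For the 23-channel graphs in this study, this yields the 23-dimensional SVM-NS input.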

The SVM-AM is trained on vectorised weighted adjacency matrices (Fig. 1D). As we use only undirected FC measures, the N × N adjacency matrix of a brain graph is symmetric; thus, only the unique entries of the upper triangle need to be included in the feature vector. The hyper-parameters of the SVM-AM are listed in Table I.
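Vectorising a symmetric adjacency matrix can be sketched as follows (our own illustration; for a symmetric matrix with a zero diagonal, the strict upper triangle holds the N(N − 1)/2 unique edge weights):

```python
import numpy as np

def vectorise_adjacency(A):
    """Flatten a symmetric adjacency matrix into a feature vector
    by taking only the strict upper triangle."""
    iu = np.triu_indices(A.shape[0], k=1)
    return A[iu]

# toy 3-node example
A = np.array([[0.0, 0.3, 0.7],
              [0.3, 0.0, 0.5],
              [0.7, 0.5, 0.0]])
v = vectorise_adjacency(A)
```

With N = 23 channels this gives a 253-dimensional feature vector per brain graph.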

The best-performing models are selected using the area under the sensitivity-specificity curve (AUC), i.e. one model per combination of FC measure and model type. In order to assess the stability of the selected models, 50-times-repeated CV is performed. The performance errors are computed using the maximum difference between the mean and the 5th and 95th quantiles. This approach does not assume a normal distribution and results in conservative error estimates.
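The quantile-based error bar described above can be sketched as (our own illustration; the toy AUC scores are synthetic, not results from the paper):

```python
import numpy as np

def conservative_error(scores):
    """Distribution-free error bar: maximum distance of the
    5th/95th quantiles from the mean of the scores."""
    m = np.mean(scores)
    q05, q95 = np.quantile(scores, [0.05, 0.95])
    return max(m - q05, q95 - m)

# toy example: 50 repeated-CV AUC values around 0.95
rng = np.random.default_rng(4)
aucs = 0.95 + 0.02 * rng.standard_normal(50)
err = conservative_error(aucs)
```

Taking the larger of the two one-sided distances makes the reported error conservative for skewed score distributions.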

The CNN and GNN models are trained using an Adam optimiser with an exponential learning rate decay (controlled by the gamma hyper-parameter) and a cross-entropy loss function. The models are trained for 300 epochs with early stopping after 15 epochs if the loss stops decreasing.
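The two training-loop mechanics mentioned here — exponential decay (lr_t = lr_0 · gamma^t) and patience-based early stopping — can be sketched framework-free as follows (our own simplified illustration; the hyper-parameter values and toy loss curve are assumptions, not the paper's settings):

```python
def train_schedule(losses, lr0=1e-3, gamma=0.98, patience=15):
    """Exponential learning-rate decay with early stopping.

    losses: per-epoch losses. Returns the learning rate used at each
    epoch and the epoch at which training stopped."""
    best, wait = float("inf"), 0
    lrs = []
    for epoch, loss in enumerate(losses):
        lrs.append(lr0 * gamma ** epoch)   # lr_t = lr0 * gamma^t
        if loss < best:
            best, wait = loss, 0           # loss improved: reset patience
        else:
            wait += 1
            if wait >= patience:           # loss stopped decreasing
                return lrs, epoch
    return lrs, len(losses) - 1

# toy example: loss plateaus after epoch 5, so training stops 15 epochs later
losses = [1.0, 0.8, 0.6, 0.5, 0.45, 0.44] + [0.44] * 30
lrs, stop_epoch = train_schedule(losses)
```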

Brain graphs were inferred for each 3-second-long EEG segment using several commonly used FC measures, which aim to quantify both the linear and nonlinear coupling between pairs of brain signals. The brain graphs were then used as input to train the GNN brain-graph classifier. Moreover, four baseline models were trained on these brain graphs in order to demonstrate which type of classifier performs best. AUC is used to select the best model.

Table II reports the AUC values and the 95% confidence intervals of the SVM-NS, SVM-AM and CNN baseline models and the GNN across the 8 FC measures. Note that the MLP baseline is not included here, since it does not utilise the FC brain graphs. Additionally, the performance of the baseline GNN using the Euclidean distance between spatial positions of EEG sensors (GNN-euclid) is reported in Table II as well. The hyper-parameter values of the best models from their respective categories are reported in Table IV. The averaged sensitivity-specificity curves of these models are shown in Fig. 2.

All baseline models perform worse than all of the GNN models across all FC measures, as shown in Table II. From Table II, we can also see that the GNN models trained using FC-based brain graphs perform better than GNN-euclid, which was trained using a static graph structure.

Furthermore, we report the effect of frequency bands and edge-filtering methods on the performance of the trained models in the supplementary materials. Figure S3 and Tables S1-S3 report the effects of frequency bands. Figure S4 and Tables S4-S6 report the effects of edge-filtering methods.

The results suggest that the GNN outperforms all baseline models across all FC measures (Table II). Moreover, the neural-network-based models outperform the classical machine learning baselines (SVM-AM and SVM-NS) that rely on manually engineered features. We argue that the relatively low performance of the machine learning approaches is caused by the inability to remove noise-contaminated information from the input features. This is likely exacerbated by the lack of false-positive control during the brain graph inference, which would limit the number of edges caused by spurious coupling. We suggest that the neural-network-based models can solve this issue by using weight regularisation and dropout layers, designed to learn generalisable features insensitive to noise.

It could be argued that the GNN models perform better than the CNN and MLP because they are trained using two sources of input information, i.e. the FC-weighted brain graph and the node feature matrix with power spectral density. This is a unique property of the GNN, as it can aggregate information from both inputs. Moreover, to the best of our knowledge, GNN is the only model architecture that can process these two inputs simultaneously.

The CNN and MLP baseline models offer an interesting comparison to the GNN, since each is trained using one of the two sources of input information. The CNN and MLP baselines show the individual predictive power of the FC-based brain graph and the node feature matrix, respectively. The results suggest that the node feature matrix provides a slightly better source of information in the classification task (Table III). However, the GNN performs significantly better, and we argue that the comparison with the CNN and MLP baselines highlights the power of GNN in brain-graph classification.

The relatively poor performance of the CNN also demonstrates the shortcomings of treating the adjacency matrix of a brain graph as an image. Each pixel of an image has an equal number of neighbouring pixels, and the content of the image depends on the specific spatial ordering of its pixels. Therefore, convolution can be applied to patches of pixels to extract features automatically. This assumption is invalid for a graph, where each node can be connected to an arbitrary number of neighbours and no meaningful ordering of nodes exists. In contrast, graph convolution generalises the convolution to efficiently solve this issue by utilising order-invariant operations to aggregate information from neighbouring nodes.

Moreover, the hyper-parameter optimisation has identified a GNN model with two graph convolutional layers as the optimal GNN architecture (Table IV). This means that the GNN aggregates information not only from the nodes connected by an edge directly (i.e., the 1-hop neighbours) but also from the 2-hop neighbours. This suggests the importance of global graph properties in diagnosing AD accurately, in addition to the local properties, which could likely be learned with a single layer. This is in line with the reported loss of small-world properties of AD brain graphs [11], [12], [13], [14].

Next, the results demonstrate that the FC-based GNNs also outperform the GNN-euclid model, which utilises a static graph structure (Table II). This suggests that it is preferable to utilise FC-based brain graphs rather than the distance-based static graphs previously used for EEG-GNN tasks [23], [24]. However, it seems that no FC measure offers clearly superior performance.

Surprisingly, the GNN-euclid model achieves relatively high accuracy despite utilising a fixed graph structure (Table III).
The Euclidean brain-graph structure highlights the spatially local relationships between the EEG channels. In contrast, long-range edges have only a low weight. Therefore, we argue that the Euclidean brain graph biases the GNN model to learn predominantly local graph features. On the other hand, the FC-based brain graphs may contain both local and long-range relationships. Previous research suggests that AD-related differences are observed in long-range pathways and global graph properties [10], [11], [13]. In our opinion, the FC-based GNNs outperform GNN-euclid since they can better capture both the local and global differences at the graph level.

To further investigate the differences between FC measures on the graph level, we compute an average adjacency matrix for each FC measure across both groups and frequency bands (Figure S1). In Figure 3, we show these matrices for the α and θ frequency bands, as these are utilised by the best-performing models (Table IV). The brain graphs are relatively similar across the FC measures. In the θ band, increased connectivity can be observed in AD compared to HC. In contrast, the connectivity seems to be decreased in AD in the α band. These differences are well documented in the literature [8], [9], [10]. Interestingly, all FC measures detect a well-defined cluster containing mostly parietal and occipital EEG channels. The strength of this cluster distinguishes AD from HC consistently across FC measures. We speculate that this cluster contributes most of the predictive information for the classification models. However, since the GNN architecture is a black-box model, it would be difficult to confirm our speculation.

Next, the optimised model architectures suggest that using band-specific brain graphs is beneficial, as the best-performing models utilise the θ and α bands (Table IV), suggesting that frequency-centred brain graphs should be preferred over the full-frequency-range brain graphs.

The selection of these frequency bands is not surprising, as they are both well known to be altered in patients with AD [8], [9], [10]. In contrast, the effect of edge filtering is not so apparent, as only the GNN and SVM-NS models use edge filtering, with top-20% and MST-3, respectively. On the other hand, the CNN and SVM-AM use unfiltered brain graphs. We expect that a sparse graph is preferable for the GNN, since there are fewer messages to aggregate while updating the node embeddings. These messages are also less likely to be a product of false-positive brain interactions, thus leading to better node and graph embeddings.

It could be argued that the GNN uses only the topological information provided by the graph structure to enable message-passing (Table III), but the FC is not fully reflected in the node embeddings and, by extension, the graph embeddings. Nevertheless, we believe that the FC information is utilised to some extent by the GNNs, since these models perform better than the GNN-euclid, which arguably utilises merely the topological information (Table II). However, the extent to which the information provided by the FC measures is contained within the learned graph embedding remains unclear. One can merely speculate without introducing an additional mechanism into the GNN architecture, which is beyond the scope of this paper.

GNN is an effective model for learning on graph-structured data, such as FC-based EEG brain graphs. However, in the absence of consensus about the ideal FC measure for estimating EEG brain graphs, the effect of an FC measure on the performance of GNN classifiers is unclear. In this paper, we have selected eight common FC measures to investigate this effect. First, we demonstrated that GNN models are superior to classical machine learning and CNN models for brain graph classification.
Unfortunately, the utilised GNN architecture is a black-box model. Thus, future work should focus on implementing interpretable GNN architectures that achieve similar performance but additionally offer interpretability, such as identifying which nodes, i.e. brain regions, drive the prediction. Besides providing an opportunity for experts to validate such models, interpretable predictions might also serve in the development of GNN-informed targeted treatment.

Finally, we showed that utilising FC measures to define the brain graph results in improved performance of GNN models compared to a fixed graph structure (i.e. the Euclidean distance between EEG electrodes). While using an FC measure improves the performance, no concrete FC measure can be recommended as the ideal choice. Thus, in future research, the choice of a suitable FC measure should be carefully evaluated in the context of the given research question. Alternatively, focusing on fusion methods might lead to the development of a novel composite measure of FC.

ACKNOWLEDGMENT
The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.