EAG-RS: A Novel Explainability-guided ROI-Selection Framework for ASD Diagnosis via Inter-regional Relation Learning

Deep learning models based on resting-state functional magnetic resonance imaging (rs-fMRI) have been widely used to diagnose brain diseases, particularly autism spectrum disorder (ASD). Existing studies have leveraged the functional connectivity (FC) of rs-fMRI, achieving notable classification performance. However, they have significant limitations, including the lack of adequate information while using linear low-order FC as inputs to the model, not considering individual characteristics (i.e., different symptoms or varying stages of severity) among patients with ASD, and the non-explainability of the decision process. To cover these limitations, we propose a novel explainability-guided region of interest (ROI) selection (EAG-RS) framework that identifies non-linear high-order functional associations among brain regions by leveraging an explainable artificial intelligence technique and selects class-discriminative regions for brain disease identification. The proposed framework includes three steps: (i) inter-regional relation learning to estimate non-linear relations through random seed-based network masking, (ii) explainable connection-wise relevance score estimation to explore high-order relations between functional connections, and (iii) non-linear high-order FC-based diagnosis-informative ROI selection and classifier learning to identify ASD. We validated the effectiveness of our proposed method by conducting experiments using the Autism Brain Imaging Database Exchange (ABIDE) dataset, demonstrating that the proposed method outperforms other comparative methods in terms of various evaluation metrics. Furthermore, we qualitatively analyzed the selected ROIs and identified ASD subtypes linked to previous neuroscientific studies.

Autism spectrum disorder (ASD) is a neurological disability associated with brain development. Patients with ASD experience social communication and interaction difficulties in multiple contexts, and exhibit limited or repetitive behavioral patterns, interests, or activities [1]. Although patients with ASD incur considerable average medical expenses over their lifetime (e.g., at least one million dollars per patient [2]), accurate clinical curative treatments are not available, forcing them to suffer from lifelong illnesses [3]. Therefore, it is crucial to identify the emergence of the disease as early as possible for accurate treatment [4].
Over the past few decades, several approaches based on resting-state functional magnetic resonance imaging (rs-fMRI) have been proposed to diagnose various brain diseases, including ASD [5], schizophrenia [6], and Alzheimer's disease [7]. Rs-fMRI is a non-invasive technique that identifies spatiotemporal scales of regional brain activation by measuring blood-oxygen-level-dependent (BOLD) signals [8]. Most existing rs-fMRI studies utilize raw time signals [9]-[11] or low-order brain functional connectivity (FC) [12], [13]. FC is typically constructed in a statistical manner based on the temporal correlation between spatially remote brain regions, referred to as regions of interest (ROIs) [14], [15]. Therefore, FC not only provides information about functional communication in the human brain [16] but also serves as a biotype for disease diagnosis [17].
With recent advances in deep learning (DL), brain disease diagnostic methods based on rs-fMRI have garnered significant attention in neuroimaging research. Raw time signals of rs-fMRI have been used as inputs in recurrent neural networks (RNNs) [9] and graph convolutional neural networks (GCNs) [18], among others, and FC has been used as input in variants of the auto-encoder (AE) architecture [19]-[25] to develop methods for diagnosing brain diseases. In addition, several approaches have utilized more discriminative features for improving diagnostic performance [21], [23], [26], [27]. Feature selection (FS) methods involve (i) ranking-based methods, in which all features are ranked and then curated based on their specific criteria [21], [26], and (ii) subset-based methods, which select features by optimizing a definite objective function [23], [27]. Moreover, FS methods help explore the pathology of brain diseases by considering the selected features as biomarkers [28].
Although these diagnostic methods exhibit remarkable classification performance, they continue to suffer from certain limitations. First, most existing methods use (partial) Pearson's correlation as their input FC, which represents the linear correlation of brain regions as connectivity strength and contains low-order information within brain regions or voxels. However, merely considering low-order information is not sufficient for capturing subtle changes in signal between normal and patient groups [29]. The authors of [30] and [31] proposed a method for constructing high-order FC networks based on the similarity between the topographical profiles of pairs of FCs, which is referred to as "correlation's correlation". Nevertheless, existing studies on FC typically calculate low-order or high-order FCs separately, even though pairs of FC levels may exhibit intriguing relationships or functional associations. In this context, [32] integrated three types of FCs, encompassing low-order FC, high-order FC, and the inter-level associated FC, and proposed a hybrid high-order FC network for brain disease diagnosis tasks. Their method exhibited higher accuracy than methods based on a single type of FC. Based on the findings reported in prior studies, our study aims to enhance diagnostic performance by leveraging a combination of low-order FC and high-order features.
Second, ranking-based FS approaches focus on single levels of contribution and therefore do not consider complementary information between multiple features [28]. Subset-based FS methods, on the other hand, investigate the importance of various groups of features simultaneously but do not consider the individual characteristics of patients with ASD. Finally, a few other FS methods use refined inputs and ignore local-global structural information in terms of the entire population, making their decision-making processes difficult to explain. In practice, explainability is vital in the medical field (especially in neuroimaging) to improve reliability.
To address the aforementioned issues, we propose a novel explainability-guided ROI selection (EAG-RS) framework that selects informative features dynamically at the ROI level for brain disease diagnosis. To this end, we estimate high-order information of FC based on a high-level representation obtained from the layer-wise relevance maps. We leverage the estimated information in conjunction with low-order information for ASD diagnosis learning. Further, prior studies [19], [24] have primarily focused on learning low-level inter-regional FC relationships based on computer vision tasks rather than from a neuroscientific perspective. Our earlier work [25] introduced a novel approach from a neuroscientific perspective to complement these limitations, emphasizing brain region-level considerations. We designed and used random ROI-level masking to facilitate robust and expressive feature learning. Given an ROI-masked FC, a stacked AE (SAE) inherently learns non-linear relations among remaining ROI connections to reconstruct or infer masked connections. Following model training, connection-wise relevance score estimation is performed based on the pre-trained SAE with layer-wise relevance propagation (LRP) [33] to explore the high-order relations between functional connections. The LRP transmits the output of the trained network back to the input level using a decomposition rule, which enables the identification of input connection features that contribute to the restoration of masked connections either positively or negatively. Thus, we estimate non-linear high-order relations among seed-based networks (i.e., ROIs) in FC via the trained inter-regional non-linear relational learning model. Finally, given the estimated non-linear high-order FC, the pre-trained encoder and classifier are trained to discover ASD-informative ROIs at the sample level.
The proposed method is verified to exhibit classification performance superior to that of comparative methods on the publicly available Autism Brain Imaging Database Exchange (ABIDE) dataset [34]. The impacts of individual components of the proposed method are estimated using ablation studies, and post-hoc analysis is performed to identify ASD subtypes. The main contributions of our study can be summarized as follows:
• We propose a novel method to derive the high-order information of FC using a connection-wise relevance score estimation module between each masked seed ROI and other neighboring ROIs.
• Two types of representative vectors (i.e., mean and count), statistically estimated based on the connection-wise relevance score, contribute to the selection of disease-related ROIs at the individual level.
• Our proposed framework achieves state-of-the-art diagnosis performance on the ABIDE dataset. The neuroscientific analysis is also conducted using our framework.
This study is an extension of our previous work [25], in which we introduced a self-supervised learning framework that considers inter-regional non-linear relations for rs-fMRI. In this study, the proposed framework is supplemented using connection-wise relevance score estimation and a diagnosis-informative ROI selection network, thereby improving its ASD diagnosis performance. We further conducted an ablation study to verify the capabilities of the constituent modules in our proposed framework. In addition, we utilized selected ROIs at the sample level for cluster subtyping of autism and analyzed them to acquire neuroscientifically reliable results by performing group-wise ROI comparisons.

A. DL-based Brain Disease Diagnosis Methods in rs-fMRI
In recent decades, DL has been utilized to diagnose brain diseases using rs-fMRI [9], [11], [18], [19], [21]-[23], [35]. In [9] and [11], diagnosis tasks were performed using the raw time signals of rs-fMRI. In particular, [9] first trained a long short-term memory network (LSTM) to capture greater amounts of temporal information. [11] focused on extracting more disease-relevant intermediate features by combining a self-attention mechanism with mutual information maximization. Another approach is to use the FC of the rs-fMRI as the input for DNNs. For example, [19] and [22] learned the hidden representations of FC by training AE variants and using them for classification.
Fig. 1. Overview of the EAG-RS framework comprising a three-step learning strategy: i) inter-regional relation learning using seed (ROI)-based network masking that generates masked input FCs for ROIs, ii) estimating connection-wise relevance scores via LRP to investigate high-order information between functional connections, and iii) extracting ROI-level representative vectors from the estimated relevance scores for simultaneous diagnosis-informative ROI selection and brain disease diagnosis. Slashed modules (i.e., pre-trained SAE) represent modules with frozen parameters (i.e., fixed parameters). Given a masked FC, the SAE is trained to discern non-linear relations among low-order FC connections. Henceforth, we estimate connection-wise relevance scores using the pre-trained SAE combined with LRP. These scores signify the importance of a given connection in restoring other connections. Lastly, we integrate low-order and high-order FCs to individually select informative ROIs for ASD diagnosis.

Several studies have demonstrated that FS-based methods enhance classification performance. [21] proposed the DNN-FS method, in which the authors ranked outputs of multiple stacked sparse AEs in terms of their Fisher scores to determine more discriminative features. [23] devised support vector machine recursive feature elimination (SVM-RFECV). Further, [35] selected features based on a combination of elastic net and manifold regularization, referred to as MTFS-EM.
However, existing diagnosis methods result in suboptimal performance because they rely on single-level contributions of low-order information and do not consider individual characteristics. In contrast, our proposed method estimates the high-order FC information dynamically and explores more discriminative features by incorporating these estimations and explainable relevance maps.

B. Explainability for fMRI
Deep neural networks (NNs) suffer from a black-box problem, wherein their outcomes cannot be easily explained because of complex non-linear mechanisms. To address this issue, various explainable artificial intelligence (XAI) methods have been applied to neuroimaging models [36]-[39]. Among these, LRP is widely used to identify group-discriminative features/patterns by quantifying their relevance to the model outcome [25], [36], [38]. For example, [38] analyzed disease classification results by searching for the most group-discriminative patterns using LRP.
Besides providing neuroscientific explanations, LRP can also be used to construct novel FC-determination techniques. For example, a novel brain connectivity measurement based on a trained network with BOLD signals was reported using LRP [40]. The objective of that work was to explain the relevance of one region in influencing another in a regression task using NNs.
Similar to [40], we explore brain connectivity using LRP in this paper. The main differences between [40] and our study are as follows: (i) We use a seed-based network mask as the input of the FC network, instead of BOLD signals, which include self-seed-ROI influences. Therefore, we focus on the inter-regional non-linear relationships to estimate the contributions of neighbors inherently restoring a seed-based network. (ii) We apply LRP results to explore high-order relations between functional connections using connection-wise relevance score estimation and leverage them to select ASD-discriminative ROIs dynamically during training. This is unlike any technique in any previously published LRP-based study.

III. METHODS
The overall framework of the proposed EAG-RS is illustrated in Fig. 1. It comprises three phases: i) inter-regional relation learning, ii) connection-wise relevance score estimation via LRP, and iii) brain disease diagnosis based on diagnosis-informative ROI selection. Given an FC randomly masked at the seed-based network level, the SAE is trained to learn a feature representation that reflects non-linear relations between low-order FC connections. The proposed framework estimates the connection-wise relevance score, which represents the relevance of a connection to the restoration of other connections, using the trained SAE model and LRP. This framework helps analyze FCs between spatially distinct regions. Then, ASD-informative ROIs are selected at an individual level based on statistical measures of the relevance scores (i.e., mean and count). Finally, we diagnose ASD by considering only ASD-informative ROIs in FC.

A. ROI connection masking
Given an input FC matrix $\mathbf{X} \in \mathbb{R}^{R \times R}$, where $R$ denotes the number of ROIs, we generate a mask matrix $\mathbf{M}$ at the ROI level, where the set $\mathcal{R}_q$ includes randomly selected ROI indexes based on a $q$-ratio. The mask matrix $\mathbf{M}$, initialized as an all-ones matrix of size $R \times R$, is updated using the following rule: $\mathbf{M}(i, :) = 0$ and $\mathbf{M}(:, i) = 0$ for $i \in \mathcal{R}_q$, where $(i, :)$ and $(:, i)$ denote all elements of the $i$-th row and all elements of the $i$-th column, respectively. Then, the masked input FC matrix $\tilde{\mathbf{X}}$ is obtained via element-wise multiplication of the input FC matrix $\mathbf{X}$ with the mask matrix $\mathbf{M}$, i.e., $\tilde{\mathbf{X}} = \mathbf{M} \odot \mathbf{X}$, where $\odot$ represents an element-wise multiplication operator. The masked FC matrix $\tilde{\mathbf{X}}$ is flattened to a one-dimensional vector $\tilde{\mathbf{x}} \in \mathbb{R}^{D}$, where $D$ indicates the number of elements in the upper triangle of the FC matrix without diagonal elements ($D = R(R-1)/2$). In this procedure, masks are generated arbitrarily for each input during each iteration. This enables augmentation of the training samples and helps in learning robust and enriched feature representations, thereby preventing overfitting [41].
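The masking rule above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the released implementation: the helper names `roi_mask` and `flatten_upper`, the rounding of $q \cdot R$ to an integer number of masked ROIs, and the random toy FC matrix are all assumptions made for the example.

```python
import numpy as np

def roi_mask(num_rois, q, rng):
    """Build an R x R mask with a random q-fraction of ROIs zeroed out."""
    n_masked = int(round(q * num_rois))            # assumed rounding rule
    masked = rng.choice(num_rois, size=n_masked, replace=False)
    mask = np.ones((num_rois, num_rois))
    mask[masked, :] = 0.0                          # M(i, :) = 0
    mask[:, masked] = 0.0                          # M(:, i) = 0
    return mask, masked

def flatten_upper(fc):
    """Flatten the strict upper triangle: D = R * (R - 1) / 2 elements."""
    rows, cols = np.triu_indices(fc.shape[0], k=1)
    return fc[rows, cols]

rng = np.random.default_rng(0)
R = 110
A = rng.uniform(-1.0, 1.0, size=(R, R))
X = (A + A.T) / 2.0                                # toy symmetric "FC" matrix
np.fill_diagonal(X, 1.0)

M, masked = roi_mask(R, q=0.1, rng=rng)
X_masked = M * X                                   # element-wise product
x_flat = flatten_upper(X_masked)
print(x_flat.shape)
```

Calling `roi_mask` anew for every sample and iteration reproduces the per-iteration random masking described in the text.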

B. Inter-regional relation learning
In contrast to our previous work [25], in this paper, we focus on inter-regional relation learning without considering classification, as the first step for examining high-order information of FCs. Initially, the flattened masked FC $\tilde{\mathbf{x}}$ is embedded into a hidden space with $\mathbf{h}_1 \in \mathbb{R}^{D_1}$, as follows:

$$\mathbf{h}_1 = E_1(\tilde{\mathbf{x}}) = \sigma(\mathbf{W}_1 \tilde{\mathbf{x}} + \mathbf{b}_1), \tag{1}$$

where $E_1$ denotes the first layer of the encoder with a weight matrix $\mathbf{W}_1 \in \mathbb{R}^{D_1 \times D}$ and a bias vector $\mathbf{b}_1 \in \mathbb{R}^{D_1}$, and $\sigma$ denotes the activation function. The first hidden representation $\mathbf{h}_1$ is trained to reconstruct the original FCs $\mathbf{X}$ using the corresponding layer of the decoder (i.e., generator), $G_1(\mathbf{h}_1) = \hat{\mathbf{X}}$, by minimizing the reconstruction loss between $\mathbf{X}$ and $\hat{\mathbf{X}}$ over the training set (Eq. (2)), where $N$ denotes the total number of training samples; $\hat{\mathbf{X}}$ denotes the reconstructed samples, which are the outputs of $G_1$; and $\mathbf{W}'_1 \in \mathbb{R}^{D \times D_1}$ and $\mathbf{b}'_1 \in \mathbb{R}^{D}$ denote the weight matrix and the bias vector of $G_1$, respectively. Note that $\Theta_E$ and $\Theta_G$ denote the learnable parameters of the encoder and the generator, respectively.

During the training process, to learn high-level feature representations of FCs, we sequentially train the encoder ($E_\ell$) and generator ($G_\ell$) pair for each layer $\ell \in \{2, \ldots, L\}$ in the network. We freeze the previous layer(s) of both the encoder and the generator while estimating the $\ell$-th level representation of the FC input, $\mathbf{h}_\ell$, which corresponds to the $(\ell+1)$-th level non-linear relations among ROIs. Subsequently, this is transmitted to $E_\ell$. The encoder $E_\ell$ and the subsequent generator are trained by minimizing the sum of the reconstruction losses of $\{\mathbf{h}_\ell\}_{\ell}^{L}$ and $\mathbf{X}$. In this regard, the proposed SAE is trained to reconstruct the removed connections as well as estimate the remaining connections based on relations inherently present in the neighboring connections. In this step, the proposed model learns an inter-regional non-linear representation that encompasses first-order connections of rs-fMRI as well as high-level relations among ROIs.
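As a rough illustration of the first encoder/generator pair, the following NumPy sketch runs one forward pass and evaluates a reconstruction loss against the unmasked input. Several choices here are assumptions made only for the example: the loss is taken to be the mean squared error, Tanh is used for both layers, and the layer sizes, initialization, and helper names (`init_layer`, `encode`, `generate`, `reconstruction_loss`) are hypothetical. No gradient step is shown.

```python
import numpy as np

def init_layer(d_in, d_out, rng):
    """Random weights of shape (d_out, d_in) and a zero bias (toy init)."""
    w = rng.standard_normal((d_out, d_in)) * np.sqrt(1.0 / d_in)
    return w, np.zeros(d_out)

def encode(x, w1, b1):
    """h_1 = sigma(W_1 x + b_1), applied row-wise to a batch."""
    return np.tanh(x @ w1.T + b1)

def generate(h, w1p, b1p):
    """x_hat = sigma(W'_1 h_1 + b'_1), mapping back to D dimensions."""
    return np.tanh(h @ w1p.T + b1p)

def reconstruction_loss(x, x_hat):
    """Assumed form: squared error summed over features, averaged over N."""
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))

rng = np.random.default_rng(0)
N, D, D1 = 8, 45, 20                       # toy sizes (D = 10 * 9 / 2)
x = rng.uniform(-1.0, 1.0, size=(N, D))    # original flattened FCs
x_masked = x.copy()
x_masked[:, :9] = 0.0                      # stand-in for ROI-level masking

w1, b1 = init_layer(D, D1, rng)
w1p, b1p = init_layer(D1, D, rng)
h1 = encode(x_masked, w1, b1)
x_hat = generate(h1, w1p, b1p)
loss = reconstruction_loss(x, x_hat)       # target is the unmasked input
print(h1.shape, x_hat.shape, loss)
```

The key design point the sketch preserves is that the input is masked while the target is the original FC, so the network must infer removed connections from the remaining ones.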

C. Connection-wise relevance score estimation
After training the SAE using the process stated in Section III.B, we utilize the LRP technique to estimate connection-wise relevance scores in the pre-trained SAE. The relevance score represents the influence of each connection on other connections. The LRP traces back from the final output layer to the input connection layer to calculate these scores. Specifically, we define the relevance score $S^{(\ell+1)}_j$ of a hidden unit $j$ in the $(\ell+1)$-th layer. The relevance score $S^{(\ell+1)}_j$ is determined based on the contribution of all hidden units in the $\ell$-th layer that affect the activation of the hidden unit $j$ in the subsequent layer $\ell+1$. This ensures that the total relevance per layer is conserved [29] as $\sum_i S^{(\ell,\ell+1)}_{i \leftarrow j} = S^{(\ell+1)}_j$. Given the original FC matrix $\mathbf{X}$, which includes a set of $R$ seed-based networks, we generate a masked FC $\tilde{\mathbf{X}}$ by removing one of the seed-based networks (i.e., ROI), resulting in a set of $(R-1)$ seed-based networks. To achieve this, the $r$-th seed-based networks are masked in the sequence of ROI indexes. Subsequently, the masked FC matrix is transmitted to the pre-trained SAE to reconstruct the original FC matrix, denoted by $\hat{\mathbf{X}}$. Via this reconstruction process, the masked seed-based network is reconstructed based on the remaining $(R-1)$ non-masked seed-based networks, as follows:

$$\hat{\mathbf{x}}_r = G(E(\tilde{\mathbf{x}}_r)), \tag{3}$$

where $E$ and $G$ represent the encoding and decoding layers of the pre-trained SAE, respectively. This process is repeated $R$ times, once for each seed-based network in the FC, resulting in $\hat{\mathbf{X}} = \{\hat{\mathbf{x}}_r^\top\}_{r=1,\ldots,R}$. Subsequently, we use the LRP technique to estimate the relevance scores, representing the contributions of other connections to the masked seed-based networks.
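The layer-wise conservation property can be checked numerically on a single dense layer. The sketch below uses the common LRP-$\epsilon$ redistribution rule on a bias-free layer; the text does not state which LRP rule is used, so the rule, the function name `lrp_dense`, and the toy sizes are assumptions made for illustration only.

```python
import numpy as np

def lrp_dense(a, w, relevance_out, eps=1e-9):
    """Redistribute relevance from a dense layer's outputs to its inputs.

    a: input activations, shape (d_in,)
    w: weights, shape (d_out, d_in); the layer is assumed bias-free
    relevance_out: relevance of each output unit, shape (d_out,)
    """
    z = w @ a                                        # pre-activations z_j
    z_stab = z + eps * np.where(z >= 0, 1.0, -1.0)   # avoid division by zero
    s = relevance_out / z_stab
    return a * (w.T @ s)                             # R_i = a_i * sum_j w_ji s_j

rng = np.random.default_rng(0)
a = rng.standard_normal(6)
w = rng.standard_normal((4, 6))
rel_out = np.abs(rng.standard_normal(4))

rel_in = lrp_dense(a, w, rel_out)
print(rel_in.sum(), rel_out.sum())                   # approximately equal
```

Applying such a rule layer by layer, from the reconstruction output back to the input connections, yields one relevance value per input connection while keeping the total relevance approximately constant.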
The reconstructed FC $\hat{\mathbf{X}} = \{\hat{\mathbf{x}}_1^\top, \ldots, \hat{\mathbf{x}}_r^\top, \ldots, \hat{\mathbf{x}}_R^\top\}$ is utilized to estimate the connection-wise relevance score $\mathbf{S} = \{\mathbf{s}_1, \ldots, \mathbf{s}_r, \ldots, \mathbf{s}_R\}$ via LRP. The relevance score represents the contributions of non-masked neighboring ROIs in restoring masked seed-ROI connections. To obtain this, we define the function $\phi(\cdot)$ (Eq. (4)), which assigns a value of $\mathbf{0}$ to masked regions (i.e., the $i$-th row) and applies LRP to the remaining non-masked regions, where $i$ and $j$ represent the corresponding ROI indexes and $\dim(\mathbf{0}) = 1 \times R$. The original dimension of the LRP outcomes, obtained when the entire FC matrix is provided as input to the pre-trained SAE without masking, is $R \times R$. However, since we mask the $i$-th seed-based network before transmitting it to the pre-trained SAE, the masked regions in the LRP outcomes can be disregarded. Therefore, the dimension of $\mathrm{LRP}(\hat{\mathbf{x}}^\top_{i,j})$ in Eq. (4) is $(R-1) \times R$.
The connection-wise relevance score $\mathbf{s}_r$, where $r \in \{1, \ldots, R\}$, for reconstructing the $r$-th ROI based on other connections is obtained by applying the $\phi(\cdot)$ function and concatenating the results (Eq. (5)), with $\dim(\mathbf{s}_r) = R \times R \times R$. Note that the self-connections corresponding to the seed-ROI are excluded during the calculation of the relevance score. To estimate the local contributions of the $r$-th ROI, we simply aggregate the relevance scores for various connections [42] (Eq. (6)), which yields the aggregated relevance score $\mathbf{s}'_r \in \mathbb{R}^{R \times R}$. Moreover, a global explanation can be represented by aggregating all the connections from the perspective of the $r$-th ROI.
By increasing the ROI index $r$ from 1 to $R$ in Eqs. (5) and (6), we obtain the global explanation set $\mathbf{S}$ (Eq. (7)). The detailed procedure is outlined in Algorithm 1. Finally, the ROI selection network takes the mean of the set $\mathbf{S}$ as the input.

D. ROI selection network and diagnostic classifier
Algorithm 2: Formulating ROI-level vectors (input: dataset $\{\mathbf{X}, \mathbf{S}, \mathbf{Y}\}$).

1) ROI selection network ($\psi$): We perform statistical analysis to distinguish individual impacts and identify the most important effects [42]. Therefore, given the averaged relevance scores $\mathbf{S} \in \mathbb{R}^{R \times R}$, we reformulate the ROI-level representative vectors (i.e., $\mathbf{f}_v \in \mathbb{R}^{R \times 1}$ and $\mathbf{f}_c \in \mathbb{R}^{R \times 1}$) using statistical measures such as mean and count, as referenced in Algorithm 2.
Subsequently, these vectors are concatenated channel-wise as $\mathbf{f} = [\mathbf{f}_v \| \mathbf{f}_c] \in \mathbb{R}^{R \times 2}$ and transmitted into the ROI selection network $\psi$. The joint training of the ROI selection network and classifier reveals discriminative features for diagnosis. To maintain the information corresponding to each ROI, we use a convolutional layer (Conv1D) with a learnable $2 \times 1$ kernel, a stride of one in each dimension, and zero padding. Based on these configurations, the ROI selection network is defined in terms of the weight matrices $\mathbf{W}_{\psi_1}$, $\mathbf{W}_{\psi_2}$, and $\mathbf{W}_{\mathrm{Conv1D}}$ and the bias vectors $\mathbf{b}_{\psi_1}$ and $\mathbf{b}_{\psi_2}$, where $\sigma$ denotes a Rectified Linear Unit (ReLU) activation function and $\Theta_\psi$ denotes the set of learnable parameters.
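To make the two representative vectors concrete, the sketch below derives a per-ROI mean vector and a per-ROI count vector from an averaged relevance matrix and concatenates them channel-wise. The body of Algorithm 2 is not reproduced in this text, so the specific statistics are assumptions: the mean is taken over each row excluding the self-connection, and the count is the number of off-diagonal entries in the row whose relevance exceeds a threshold (zero by default). The function name `roi_vectors` is hypothetical.

```python
import numpy as np

def roi_vectors(s_mean, threshold=0.0):
    """Return f = [f_v || f_c] of shape (R, 2) from an (R, R) relevance map."""
    r = s_mean.shape[0]
    off_diag = ~np.eye(r, dtype=bool)                 # drop self-connections
    # f_v: mean relevance over the (R - 1) off-diagonal entries of each row
    f_v = np.where(off_diag, s_mean, 0.0).sum(axis=1, keepdims=True) / (r - 1)
    # f_c: number of off-diagonal entries above the (assumed) threshold
    f_c = ((s_mean > threshold) & off_diag).sum(axis=1, keepdims=True)
    return np.concatenate([f_v, f_c.astype(float)], axis=1)

S = np.array([[0.0, 1.0, -1.0],
              [2.0, 0.0, 2.0],
              [-3.0, -3.0, 0.0]])
f = roi_vectors(S)
print(f)
```

The resulting array keeps one row per ROI, which is what allows a Conv1D layer over the two channels to preserve ROI-wise information.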
2) Diagnostic classifier ($C$): Information of individually selected ROIs, $\mathbf{f} \in \mathbb{R}^{R \times 1}$, is reshaped and multiplied with the original FC $\mathbf{X}$. To this end, we perform the following operation: $\bar{\mathbf{f}} = \mathbf{f} \odot \mathbf{1}^\top$, where $\mathbf{1}^\top \in \mathbb{R}^{1 \times R}$ represents an all-ones vector of size $R$. In addition, to reflect the symmetrical characteristics of FCs, we apply a symmetrizing operation involving the identity matrix $\mathbf{I}$, which yields $\mathbf{f}' \in \mathbb{R}^{R \times R}$. Subsequently, the original FC $\mathbf{X}$ and the information of individually selected ROIs $\mathbf{f}'$ are element-wise multiplied and transmitted to the encoders ($E$) of the pre-trained SAE and the prediction network ($C$) for the brain disease diagnosis task. Note that we remove the bias vector corresponding to the encoder layers to retain connections of zero values and prevent it from affecting other connections. The diagnostic classifier is trained to predict the clinical status $\hat{y}$ by minimizing the cross-entropy loss between $\hat{y}$ and $y$ over the training set, where $N$ and $y$ denote the number of training samples and the class labels, respectively.
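The reshaping of the ROI-level selection vector into a matrix that can gate the FC can be sketched as follows. The broadcast step follows the operation with the all-ones vector described above; the symmetrization formula itself is not reproduced in this text, so the example substitutes an assumed stand-in (the element-wise product of the broadcast matrix with its transpose), which keeps a connection only when both of its ROIs are selected. The function name `roi_selection_mask` and the toy data are hypothetical.

```python
import numpy as np

def roi_selection_mask(f):
    """Expand an (R,) selection vector into a symmetric (R, R) gate."""
    f = np.asarray(f, dtype=float).reshape(-1, 1)     # (R, 1)
    f_bar = f * np.ones((1, f.shape[0]))              # broadcast: row i holds f_i
    return f_bar * f_bar.T                            # assumed symmetrization

rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, size=(4, 4))
X = (A + A.T) / 2.0                                   # toy symmetric FC
f = np.array([1.0, 0.0, 1.0, 1.0])                    # ROI 1 is not selected

gate = roi_selection_mask(f)
X_sel = gate * X                                      # element-wise gating
print(X_sel)
```

Because every connection of an unselected ROI becomes exactly zero, a bias-free first encoder layer receives no contribution from those connections, which matches the motivation given above for removing the encoder biases.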

E. Optimization
The objective function corresponding to each step comprises different losses: Step 1 and Step 3 each minimize their own objective, where $\alpha$ is a hyperparameter used to control the ratio between two losses. The parameters of the encoder and generators are optimized by minimizing the combination of the reconstruction losses in Step 1. None of the parameters are updated during relevance score estimation via LRP in Step 2. In Step 3, the proposed model performs a classification task. To this end, we use a cross-entropy loss to train the pre-trained encoder of the SAE, the ROI selection network, and the classifier.

A. Dataset & Pre-processing
We use pre-processed rs-fMRI data collected from the publicly available ABIDE dataset [34]. The ABIDE dataset includes previously collected structural MRI, rs-fMRI, and phenotypic data for use by the broader scientific community. It consists of 1,112 subjects, including 539 individuals with ASD and 573 with typical development (TD) (ages 7-64 years, median 14.7 years across groups), from 17 international sites. The ROI time series of all sites are downloaded from the pre-processed ABIDE dataset, which was processed with the configurable pipeline for the analysis of connectomes (CPAC) using band-pass filtering (0.01-0.1 Hz) and no global signal regression, and parcellated using the Harvard-Oxford (HO) atlas. After downloading the pre-processed data, 110 ROIs are acquired using the HO atlas. At this stage, samples with missing filenames and incomplete data are excluded, and the remaining 880 samples across 17 international sites are utilized, which include 418 ASD subjects and 478 TD subjects. The Pearson correlation coefficient is used to estimate FC.
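For reference, estimating FC from ROI time series with the Pearson correlation coefficient and flattening it to the $D = R(R-1)/2$ input features can be sketched as below. The random array stands in for a real subject's ROI time series, and the number of time points (176) and the function names are arbitrary choices for the example.

```python
import numpy as np

def pearson_fc(time_series):
    """time_series: (T, R) array of ROI signals -> (R, R) Pearson FC matrix."""
    return np.corrcoef(time_series, rowvar=False)

def upper_triangle(fc):
    """Vectorize the strict upper triangle (diagonal excluded)."""
    rows, cols = np.triu_indices(fc.shape[0], k=1)
    return fc[rows, cols]

rng = np.random.default_rng(0)
ts = rng.standard_normal((176, 110))    # toy data: 176 time points, 110 ROIs
fc = pearson_fc(ts)
x = upper_triangle(fc)
print(fc.shape, x.shape)
```

With the 110 HO-atlas ROIs used here, the flattened vector has $110 \times 109 / 2 = 5995$ elements.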

B. Experimental Settings
To ensure a fair comparison, stratified five-fold cross-validation is conducted over all samples in the ABIDE dataset, where one fold is used for the validation set, another for the test set, and the remaining folds for the training set. Average performance is estimated in terms of the area under the receiver operating characteristic curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPEC). The proposed method as well as all competing methods are implemented using PyTorch and trained using a Titan RTX GPU on Ubuntu 18.04. All code used in the experiments is available in a repository.
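The four evaluation metrics can be computed from labels, hard predictions, and prediction scores as in the following sketch. It assumes that ASD is coded as the positive class (1) and TD as the negative class (0), which is not stated explicitly in the text; the AUC is computed with the pairwise-ranking (Mann-Whitney) formulation, and the function names and toy scores are hypothetical.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on class 1), specificity (recall on class 0)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {"ACC": (tp + tn) / len(y_true),
            "SEN": tp / (tp + fn),
            "SPEC": tn / (tn + fp)}

def auc_score(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg)))

y_true = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.6, 0.2, 0.1])
y_pred = (scores >= 0.5).astype(int)

metrics = binary_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
print(metrics, auc)
```

In a cross-validation setting, these values would be computed on each test fold and then averaged.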
1) Training Settings: In the proposed SAE architecture, the encoder $E$ comprises two fully-connected layers ($L = 2$) with {9000, 1800} units. The generator, $G$, comprises two fully-connected layers with the numbers of hidden units reversed from those of the encoder. For the non-linear activation function ($\sigma$), the scaled exponential linear unit (SELU) is used only for the first intermediate layer in the encoder, and hyperbolic tangent (Tanh) is used for the remaining layers. The diagnosis-informative ROI selection network $\psi$ comprises one Conv1D layer and three fully-connected layers with {512, 1650, 110} units. The classifier $C$ comprises two fully-connected layers with {10, 2} hidden units. For $\psi$ and $C$, the ReLU activation function is used for all intermediate layers, and the sigmoid and softmax functions are used as the activation functions of the last layers of $\psi$ and $C$, respectively.
In Step 1, 10% of the ROIs ($q = 0.1$) are randomly masked during every training iteration, and the SAE is trained using the Adam optimizer [43] with a learning rate of $10^{-3}$ and a mini-batch size of 50 over 300 epochs. In addition, $\ell_2$ regularization is applied with a coefficient of $5 \times 10^{-5}$. We set $\alpha$ to 0.5. All trainable parameters in Step 3 are optimized using the same settings except for the learning rate ($10^{-4}$). The Gumbel-softmax temperature is set to 0.01. Note that we adopt a grid search strategy for hyperparameter selection and select the best parameters based on the validation set results.
2) Competing Methods: The following six comparative methods are considered to evaluate the proposed method. First, a basic AE, dAE [44], and SAE are trained without any masking methods; they share the same architecture as that of EAG-RS. Further, EAG-RS is compared with the AE with M, and SAE with Gaussian noise [45]. Henceforth, we denote these two baselines by AE (M) and SAE (G). To validate the effectiveness of ROI-level masking, the SAE is trained using random FC connection masking, SAE (FC-M), inspired by [46]. Additionally, we assess the statistical significance of the differences between our proposed EAG-RS and competing methods based on McNemar's test [47]. We also compare EAG-RS with other simple feature selection methods, including ranking-based approaches, such as the t-test ($p < 0.05$) [26] and recursive feature elimination (RFE) [48], as well as the subset-based approach LASSO [27]. In the case of these three methods, we utilize a linear SVM, which is a commonly used classifier in brain disease diagnosis [23]. Here, we select the hyperparameter $C$ for SVM and $\lambda$ for LASSO from the sets $\{10^{-3}, 10^{-2}, \ldots, 10^{3}\}$ and $\{0.001, 0.002, \ldots, 0.01\}$, respectively. In the case of RFE-SVM, RFE iteratively ranks the features by the SVM weights, which reflect their importance to the brain disease diagnosis, and eliminates the least informative and redundant features. Further, ASD-DiagNet [22], which is a state-of-the-art approach that employs an autoencoder-based architecture with joint single-layer perceptron (SLP) training, is also considered. For feature selection in ASD-DiagNet, the 1/4 largest and 1/4 smallest Pearson's correlation values are used as input features based on the training data. The hyperparameter ranges for our experiments are derived from the values reported in [22].
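McNemar's test compares two classifiers on the same test samples using only the cases where exactly one of them is correct. The sketch below uses the chi-square approximation with a continuity correction; whether the experiments use this variant or an exact binomial version is not stated, so treat that choice, the function name `mcnemar`, and the toy predictions as assumptions.

```python
import math

def mcnemar(y_true, pred_a, pred_b):
    """Return (statistic, p_value) for paired predictions of models A and B."""
    # b: A correct and B wrong; c: A wrong and B correct
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    if b + c == 0:
        return 0.0, 1.0                       # the models never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)    # continuity-corrected statistic
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

y_true = [1] * 10
pred_a = [1] * 10                 # model A: all correct
pred_b = [0] * 6 + [1] * 4        # model B: wrong on the first six samples
stat, p_value = mcnemar(y_true, pred_a, pred_b)
print(stat, p_value)
```

Because the test depends only on the discordant pairs, two models with similar accuracy can still differ significantly if they err on different subjects, and vice versa.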
In addition, we re-implement and compare the results of the state-of-the-art methods.First, we select BrainNetCNN [49], a convolutional neural network (CNN)-based model comprising edge-to-edge, edge-to-node, and node-to-graph convolutional filters, thereby utilizing the topological locality of brain network structures.Next, BrainGNN [50] is considered, which uses FC as a node feature and selects the top 10% positive partial correlations as edge features.The architecture of BrainGNN consists of ROI-aware graph convolutional layers and ROI-selection pooling layers, along with a regularization loss term that softens the distribution of the node pooling scores, facilitating the prediction of neurological biomarkers.Finally, BrainNetTF [51] is also considered.It exhibits a transformer-based architecture with an orthonormal clustering readout function that accounts for the similarity of ROIs within functional modules underlying brain regions.The hyperparameter configurations reported in our manuscript are adopted for each comparative method.

1) AE-based classification results:
The comparative methods, such as AE, AE (M), and dAE, use different masking methods and are trained in an end-to-end manner. On the other hand, the SAE-based methods adopt greedy layer-wise training strategies [52], and their masking configurations are different from those of the AE-based methods. Experimental results are summarized in Table I. SAE without a mask (SAE) is observed to outperform AE-based methods in terms of all metrics except for specificity. Meanwhile, SAE (G) outperforms SAE in terms of all metrics. In addition, SAE (FC-M) outperforms all competing methods in terms of AUC, ACC, and SPEC, but not sensitivity (SEN). Importantly, SAE with a random seed-based network mask (M), which was proposed in our previous work [25] and is referred to as Baseline in this paper, is observed to outperform SAE (FC-M) in terms of all metrics. However, it did not adequately outperform the comparative methods in terms of SEN. Finally, our proposed EAG-RS is observed to outperform all competing methods in terms of all metrics.
2) Comparison of feature selection performance: Table II indicates that the subset-based method (i.e., LASSO) outperforms the ranking-based methods (i.e., t-test and RFE). However, ASD-DiagNet performed better than conventional FS approaches, except in terms of SEN. Although comparative FS methods are used to select important features and remove redundant ones to improve classification performance, their performances are still lower than that of the proposed method without feature selection (EAG-RS w/o $\psi$). The proposed method with the FS module (EAG-RS) outperformed all other methods in terms of all metrics. Based on these promising results, we conclude that the steps adopted in the proposed EAG-RS play pivotal roles.

We further report and compare the classification results obtained from state-of-the-art methods on the ABIDE dataset to demonstrate the superiority of our proposed EAG-RS, as illustrated in Table III and Table IV. In Table IV, a fair comparison is ensured by re-implementing all the methods using the same experimental configurations as those used in our study.

A. The ratio of random ROI-level masking
First, the ratio of ROI-level masking is varied from 0 to 0.9 at intervals of 0.1. The corresponding performance results are presented in Fig. 2. When the ROI-level masking ratio is q = 0.1 or q = 0.2, the performance is better than that obtained without ROI-level masking, indicating a positive influence of ROI-level masking on diagnostic performance. For this reason, q = 0.1 is selected for subsequent experiments, as it yields the highest performance.
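The sweep over the masking ratio can be illustrated with the following minimal sketch. It is not the masking procedure of the proposed framework: here an FC matrix is assumed to be a square list of lists, a masked ROI is assumed to have its entire row and column set to zero, and the names `mask_rois` and `ratios` are hypothetical.

```python
import random


def mask_rois(fc, q, rng=None):
    """Zero the rows and columns of a random fraction q of ROIs.

    fc is a square matrix (list of lists); the input is left unchanged.
    Returns the masked copy and the sorted indices of the masked ROIs.
    """
    rng = rng or random.Random()
    n = len(fc)
    n_masked = round(q * n)  # number of ROIs hidden at ratio q
    masked = set(rng.sample(range(n), n_masked))
    out = [
        [0.0 if (i in masked or j in masked) else fc[i][j] for j in range(n)]
        for i in range(n)
    ]
    return out, sorted(masked)


# Ratios examined in the sweep: 0, 0.1, ..., 0.9.
ratios = [k / 10 for k in range(10)]
```

With q = 0 the matrix is returned unchanged, which corresponds to the setting without ROI-level masking.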

B. Ablation Study
We conduct additional experiments to validate the effectiveness of the proposed framework. We estimate features via six ablation cases in the context of a classification task. In Case I, the ROI selection network is removed and only FC features are used to classify the brain diseases using the multi-layer perceptron (MLP) (Case I = Baseline). In Case II, the ROI selection network is removed and an estimated ROI representative vector, f_v, is used to classify brain diseases. Under this setting, the number of dimensions is different from that in the original FC; thus, brain diseases are classified using an SVM. In Case III, the other estimated ROI representative vector, f_c, is used for classification. In Case IV, the original FC is used together with the concatenation of the two ROI representative vectors (i.e., f = [f_v || f_c]). In the remaining cases, we use an ROI selection network. Specifically, the original FC is used along with f_v (Case V), f_c (Case VI), and the concatenation of the two ROI representative vectors, f (EAG-RS).
As reported in Table V, the proposed framework achieves the best diagnostic performance among all ablation cases. Cases without an ROI selection network (Cases I, II, and III) are observed to exhibit lower classification performance. When independent representative vectors, f_v (Case II) and f_c (Case III), are used, slightly lower performance than that of the original FC (Case I) is observed. Therefore, the features are combined using concatenation to confirm the effectiveness of the representative features (Case IV), which improves the performance. However, the performance is observed to be degraded when each representative vector is used individually with an ROI selection network (Cases V and VI). Thus, the count and value information together aid the extraction of diagnosis-informative ROI information, which improves classification performance by removing redundant and irrelevant features of the original FC.

C. Analysis of ROI Selection Network
The best-performing model on the validation dataset is analyzed further. On average, 22 ROIs (median, 23) are selected for the ASD group, and 40 ROIs (median, 43) for the TD group. To visualize the selection ratio (SR) of each ROI, the ROIs selected by the ψ module are enumerated and the total number is divided by the number of ROIs and the number of subjects in each group (Fig. 3). In particular, ROIs with SR exceeding 0.5 are considered (Fig. 3(a)). These ROIs are further categorized into two cases: i) 0.5 < SR < 0.75 (Fig. 3(b)) and ii) SR > 0.75 (Fig. 3(c)). The majority of selected brain regions are associated with ASD. As depicted in Fig. 3(b), 12 brain regions lie on the left side of the red vertical line with slightly higher SR in the TD group, while eight regions lie on the right side of the vertical line with higher SR in the ASD group. In Fig. 3(c), five brain regions lie in the left portion with slightly higher SR in the TD group, while nine regions lie in the middle portion with higher SR in the ASD group. The SR patterns reveal that 12 specific brain regions are always selected, with six brain regions in the right portion (Fig. 3(c)) chosen consistently in both the TD and ASD groups. In the TD group, the "l.PHIPpos" region is selected 100% of the time, while in the ASD group, five specific brain regions ("r.CAU," "r.LIN," "l.ITGpos," "r.STGant," and "r.LOCinf") are chosen consistently.
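The SR computation and the two SR ranges can be sketched as follows. This is a minimal illustration under one assumed reading of the normalization: the SR of an ROI is taken to be the fraction of subjects in a group for which that ROI is selected, so that an SR of 1.0 corresponds to an ROI selected 100% of the time. The function names and the binary input format are hypothetical.

```python
def selection_ratio(selections):
    """Per-ROI selection ratio for one group.

    selections holds one binary list per subject (1 = ROI selected).
    Returns, for each ROI, the fraction of subjects that selected it.
    """
    n_subjects = len(selections)
    n_rois = len(selections[0])
    return [sum(subj[r] for subj in selections) / n_subjects for r in range(n_rois)]


def bin_by_sr(sr, names):
    """Split ROI names into the two ranges used in the text."""
    mid = [n for n, v in zip(names, sr) if 0.5 < v < 0.75]
    high = [n for n, v in zip(names, sr) if v > 0.75]
    return mid, high
```

Note that, with the ranges exactly as stated, an ROI whose SR equals 0.75 falls into neither bin; the sketch mirrors the stated ranges rather than resolving that boundary.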
Group analysis is performed based on these selected ROIs to conduct a neuroscientific analysis of the ASD and TD groups at the ROI level. Subsequently, we identify nine brain regions with significantly different selection frequencies between the two groups (p < 0.05). Remarkably, the proposed framework, trained without prior knowledge, is observed to identify brain regions that are closely related to the findings of existing neuroscience studies. Specifically, it identifies the following brain regions: 'l.SupraCAL,' 'r.PHIPant,' 'r.TFUSant,' 'r.ITGant,' 'r.PHIPpos,' 'l.ParaCG,' 'r.IFGtriang,' 'r.PUT,' and 'l.POper.' These regions are marked with asterisks (*) in Fig. 3(b). Notably, the left supracalcarine cortex (l.SupraCAL), situated in the visual cortex, is involved in various visual processes such as discerning object shape, size, and color, and motion perception [54]. The right inferior temporal gyrus (r.ITGant) plays a similar role [55]. In addition, the right temporal fusiform cortex (r.TFUSant) in the temporal lobe is responsible for facial processing and recognition. It distinguishes facial features and supports social interactions and recognition skills [56]. The right parahippocampal gyrus (r.PHIPant and r.PHIPpos) in the medial temporal lobe is vital to memory, spatial navigation, and emotional processing [57]. The left paracingulate gyrus (l.ParaCG), which is a part of the cingulate cortex, contributes to various cognitive and emotional functions. The right putamen (r.PUT), a basal ganglia structure, participates in motor control, procedural learning, reward processing, attention, and cognition. The left parietal operculum cortex (l.POper) has broad involvement in somatosensory processing, language, and multisensory integration, supporting sensory perception, communication, and body awareness. Finally, the right inferior frontal gyrus triangular (r.IFGtriang) region contains Broca's area, essential for language processing such as production and comprehension, and for sophisticated inhibitory control. This region's crucial role in language-related functions and cognitive processes is well-documented [58]. This confirmation of the biological relevance and interpretability of our findings further supports the conclusion that the proposed framework is effective in identifying important brain regions associated with target disorders.
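A between-group comparison of selection frequencies for a single ROI can be sketched as follows. The text does not state which statistical test is used, so the two-proportion z-test below is a swapped-in, plausible choice rather than the procedure of this study; the function name and the group sizes in the usage line are hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two selection frequencies.

    x1 of n1 subjects in one group and x2 of n2 subjects in the other
    selected the ROI. Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        # Both groups selected the ROI never or always: no detectable difference.
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value


# Hypothetical counts: ROI selected in 30 of 40 subjects vs. 15 of 40 subjects.
z, p = two_proportion_z_test(30, 40, 15, 40)
significant = p < 0.05
```

When many ROIs are tested in this way, a multiple-comparison correction would normally be considered; whether one is applied here is not stated in the text.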

D. Clustering Subtypes in Autism Spectrum Disorder
ASD is known for its diverse and heterogeneous nature, with various characteristics associated with ASD-related brain regions, and varying symptom severity levels and comorbidities [59]. As a result, several previous studies on ASD have aimed to identify distinct behavioral subtypes within ASD populations. In this study, we consider such heterogeneity among ASD groups during ROI selection. To explore this heterogeneity further, subtype analysis is performed using hierarchical clustering with Ward linkage, a commonly used method in neuroimaging. The results reveal three subtypes of ASD (Fig. 4), each with unique characteristics and functional connectivity patterns. Further, we investigate demographic information, such as age and gender, of each identified ASD subtype using statistical tests (Table VI). In Table VI, the average age of TD individuals is observed to be 15.5 years. Among the clustered ASD subtypes, ASD1 exhibits an average age of 12.9 years; ASD2, 15.7 years; and ASD3, 15.2 years. In the case of gender distribution, the TD group exhibits a female proportion of 23.1%. For the identified ASD subtypes, ASD1 exhibits a female proportion of 18.2%; ASD2, 7.1%; and ASD3, 12.5%. Further, in terms of Full-Scale Intelligence Quotient (FIQ), Verbal Intelligence Quotient (VIQ), and Performance Intelligence Quotient (PIQ), TD individuals exhibit average scores of 113.2, 114.1, and 109.6, respectively. In comparison, the ASD1 subtype exhibits average scores of 98.0 (FIQ), 101.1 (VIQ), and 98.6 (PIQ); ASD2, 99.6 (FIQ), 99.9 (VIQ), and 100.4 (PIQ); and ASD3, 94.8 (FIQ), 96.7 (VIQ), and 92.8 (PIQ). This analysis provides insight into the potential relationships between the identified subtypes and various demographic factors, as well as their associations with psychological and educational assessments.
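Hierarchical clustering with Ward linkage, followed by a distance cut to obtain subtypes, can be sketched with SciPy as below. This is a minimal sketch, not the analysis pipeline of this study: the per-subject feature representation, the function name `cluster_subjects`, and the toy data and cut value in the usage lines are assumptions. In particular, the threshold of 0.3 reported for Fig. 4 may refer to a differently scaled axis, so the cut is left as a parameter.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_subjects(features, threshold):
    """Ward-linkage clustering of subjects, cut at a given linkage distance.

    features: array of shape (n_subjects, n_features), e.g. one row per
    subject describing its selected ROIs (assumed representation).
    Returns the linkage matrix and one integer cluster label per subject.
    """
    z = linkage(features, method="ward")  # Ward linkage on Euclidean distances
    labels = fcluster(z, t=threshold, criterion="distance")
    return z, labels


# Toy data: three well-separated pairs of subjects (purely illustrative).
toy = np.array([[0.0, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.0, 5.1],
                [10.0, 0.0], [10.0, 0.1]])
linkage_matrix, subtype = cluster_subjects(toy, threshold=1.0)
```

The linkage matrix can be passed to `scipy.cluster.hierarchy.dendrogram` to draw a tree of the kind shown in Fig. 4.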
The selected ROIs depicted in Fig. 3(b) are sorted based on their SRs for each subtype of ASD (Fig. 5(a)). In addition, the significant regions associated with ASD subtypes are visualized in Fig. 5(b). Two common regions are observed: the right posterior parahippocampus (r.PHIPpos) and the right temporal occipital fusiform cortex (r.TOFus). These are known to be associated with ASD, as depicted in Fig. 5(a) and Fig. 5(b). Interestingly, as presented in Fig. 5, we observed different SRs for different ASD subtypes in these regions. Finally, to provide a more intuitive understanding, the SRs corresponding to each ASD subtype are mapped onto 3D brain images and visualized in Fig. 6. In Fig. 6, the brain regions are represented by circular markers, and the color of each marker corresponds to the range of SR values for that specific region. These results provide beneficial insights into ASD subtyping from a neuroscientific viewpoint. The proposed ROI selection network successfully identifies and differentiates specific brain regions associated with different ASD subtypes. We believe that this capability enhances our understanding of the biological basis of the heterogeneity within the ASD population and may have significant implications for the advancement of subtype analysis in autism research.

VI. CONCLUSION
In this work, we propose a novel explainability-guided ROI selection (EAG-RS) framework that dynamically selects informative features at the ROI level for brain disease diagnosis. Our EAG-RS framework learns inter-regional relationships using random seed-based network masking to estimate non-linear relationships, representing other neighboring connections to restore masked seed-ROI connections. We also estimated connection-wise relevance scores to explore high-order relationships between FCs using LRP. Finally, we utilized the estimated non-linear high-order FCs to select diagnosis-informative ROIs and diagnose brain disease simultaneously. To demonstrate its validity, ASD diagnosis was performed using the proposed EAG-RS framework on the ABIDE dataset. Furthermore, the cluster subtypes for ASD were identified based on individually selected ROIs. The results demonstrate that our EAG-RS framework provides new neuroscientific insights into ASD subtypes and their biomarkers.
However, this study suffers from the following practical limitations. First, regarding the architectural design, MLPs were used for the encoder-decoder structure in inter-regional relation learning and the ROI selection network. Although MLPs provide flexibility and expressiveness, they involve a high number of tunable parameters. Given the scarcity of neuroimaging data and labels, optimization of such MLP-based architectures may be challenging, requiring strong regularization. To address this limitation, alternative approaches may be explored, e.g., incorporating convolutional neural networks and transformers into each module of the proposed framework. Second, as the sole focus of this study was the use of FC for ASD diagnosis, it did not incorporate other neuroimaging modalities or clinical information. Integrating multiple modalities and clinical data could potentially provide complementary insights and improve the accuracy and robustness of ASD diagnosis. Addressing the abovementioned limitations in the future will contribute to a more comprehensive understanding of the application of FC in ASD diagnosis and enhance the effectiveness and interpretability of the proposed framework.

Fig. 2. Effectiveness of ROI-masking ratio, q, of the proposed framework on the ABIDE dataset.

Fig. 3. (a) Visualization of the selection ratio (SR) of ROIs for each group at the ROI level. Although 110 ROIs are used for this analysis, a few had values of zero (i.e., not selected). In addition, we set 0.5 as the threshold to consider the general patterns in SR. (b) The list of brain regions in each group (0.5 < SR < 0.75). The left and right sides of the red vertical line correspond to the TD and ASD groups, respectively. (c) The list of brain regions in each group (SR > 0.75). The left, middle, and right portions of the red vertical line correspond to the TD, ASD, and common groups, respectively.

Fig. 4. The Y-axis represents the similarity between patients, i.e., the shorter the distance, the greater the similarity between patients; the similarity threshold is set at 0.3. The yellow, green, and red lines represent the clustered subtypes of ASD.

Fig. 5. (a) Visualization of the SR of ROIs corresponding to the three subtypes of ASD at the ROI level. The list of brain regions in the ASD group corresponds to 0.5 < SR < 0.75. (b) Statistical test results for the analysis of the three subtypes of ASD using ROIs selected by the ROI selection network. Only statistically significant ROIs are visualized. Note that * and ** represent p < 0.05 and p < 0.01, respectively. (c) The list of common brain regions captured between (a) and (b).

TABLE III
COMPARISONS OF ASD CLASSIFICATION PERFORMANCES OF THE PROPOSED METHOD AND STATE-OF-THE-ART METHODS ON THE ABIDE DATASET. NOTE THAT THE ENTRIES CORRESPONDING TO EACH METHOD ARE BASED ON THE RESULTS REPORTED IN THEIR RESPECTIVE MANUSCRIPTS.

TABLE V
AVERAGED CLASSIFICATION PERFORMANCES FOR ASD AND TD IN THE ABLATION STUDY.

TABLE VI
CHARACTERISTICS OF DEMOGRAPHIC, PSYCHOLOGICAL, AND EDUCATIONAL ASSESSMENTS FOR TD AND ASD SUBTYPES.