GK-BSC: Graph Kernel-Based Brain States Construction With Dynamic Brain Networks and Application to Schizophrenia Identification

The dynamic brain network can reflect time-varying changes in BOLD signal fluctuations, and has been widely used in research on brain disease identification. This network consists of a set of connection matrices, where each connection matrix represents the relationships between brain regions during a certain time period. Researchers often convert these matrices into vectors, and then use the K-means clustering method to divide the matrices into different brain states according to their vector-based distances. By analyzing these states, they can identify brain abnormalities. However, simply using vector-based distances leads to two problems: 1) it ignores the topological properties and underlying mechanisms of brain networks, and 2) it fails to consider individual differences between subjects. Hence, to solve these two problems, we propose a novel method, called GK-BSC, for constructing brain states with dynamic brain networks. Specifically, we first use a graph kernel rather than vector-based distances to measure the similarities between connection matrices of the dynamic brain network. Then, we aggregate these matrices to generate brain states based on the calculated similarities. This aggregation operation is repeated several times for each subject, whereby each subject is represented by a set of hierarchical brain states. Finally, we extract features from these states, and feed them into the multi-instance support vector machine (MI-SVM) for identifying patients. Experiments on a real schizophrenia dataset suggest that our method not only improves the performance of schizophrenia identification, but also accurately locates brain abnormalities.


I. INTRODUCTION
Schizophrenia is a challenging mental disorder that usually affects how a person thinks, feels, and behaves [1], [2]. To make a diagnosis, doctors perform physical exams and conduct thorough reviews of a person's medical, psychiatric, and family history [3], [4]. However, this process relies entirely on the doctor's experience and lacks biomarkers to support the diagnosis. Recent studies have shown that neuropsychiatric disorders are related to abnormal brain structure [5]. Hence, researchers are trying to find potential neuroimaging biomarkers related to schizophrenia. For example, Liang et al. [6] used the voxel-based morphometry method
to analyze the diffusion tensor imaging (DTI) data, and found that patients with first-episode schizophrenia had lower white matter volume in the right temporal-occipital region. Leroux et al. [7] jointly analyzed the functional magnetic resonance imaging (fMRI) and DTI, and found that abnormal frontal-temporal pathways in the brain of schizophrenic patients may be one of the factors leading to the disease.
Recently, constructing dynamic brain networks (DBNs) has been suggested to be an effective way for brain disease diagnosis [8]. DBNs are commonly obtained by sliding a time window along the time series at a regular interval (typically one step). Compared with the static brain network, DBNs add a time dimension and reflect the dynamic fluctuation of functional connectivity (FC). Several studies have suggested that these dynamic changes in FC can offer potential biomarkers for characterizing brain diseases [9]. Rabany et al. [10] compared the temporal patterns of three groups (i.e., schizophrenia, autism spectrum disorder (ASD), and healthy controls), and found that these temporal patterns can help distinguish schizophrenia participants from both ASD and healthy individuals. Guo et al. [11] measured the instability of dynamic changes, and found a high degree of instability anchored on the precuneus in patients with schizophrenia compared to healthy controls. These studies provide evidence that DBNs can be used to identify schizophrenia.
There is increasing interest in extracting effective features from dynamic brain networks. In DBN analysis, researchers usually extract these features at two scales: node measurements and brain-state measurements. Specifically, the most commonly used node measurement is temporal variability [12]. This measurement can be obtained by calculating the average correlation coefficient among all functional connections of a brain region across different time windows. Sun et al. [13] found that this measurement can reveal differences in spontaneous thought, attention, and cognitive control between individuals. Jie et al. [14] extracted these measurements from DBNs, and used them to identify mild cognitive impairment (MCI). However, it is worth noting that such node measurements can only describe the dynamic changes with a fixed scalar, which inevitably loses dynamic information because the brain is often in different states [15]. Brain-state-based methods can overcome this shortcoming. In these methods, the connection matrices of DBNs are first clustered into several states according to their similarities; in other words, brain states summarize the continuously evolving dynamics of widespread networks. By analyzing these brain states, researchers can reveal structural changes caused by brain diseases. For example, Damaraju et al. [16] used K-means clustering to characterize five brain states, and found that patients with schizophrenia show significantly stronger connectivity compared to healthy controls. Bonkhoff et al. [17] found that patients with severe stroke spent significantly more time in highly segregated brain states. Also, these brain states can be used to identify schizophrenia [18].
However, there are two major drawbacks in existing brain states construction methods. First, researchers only use the vector-based distance, which is generated from the upper triangular elements of the matrix, to measure the similarity between connection matrices, thus ignoring the topology of the brain network. Generally, brain networks have the topological properties of graphs: each vertex represents a brain region, and each edge represents the relationship between two brain regions [19]. Therefore, it is difficult to describe the structural similarity between networks using the vector-based distance. Second, current methods construct brain states on the whole dataset without considering individual-specific characteristics. In other words, researchers use the entire dataset to construct a finite number of brain states (usually 10 to 15) to represent the dynamic information of all subjects. However, individual differences in brain functional networks are ubiquitous, since subjects differ in characteristics such as health, age, and ability [20]. Therefore, these group-level brain states cannot accurately describe the activity information of each subject, let alone be used to identify brain diseases.
To address these problems, we propose a novel brain states construction method, called GK-BSC. The proposed method is implemented for a single subject, and constructs brain states by using the graph kernel method. The framework of the proposed method is shown in Figure 1. Specifically, we first construct DBNs of each subject by using the sliding window correlation. Then, we set a threshold to get sparse DBNs. For these sparse networks of each subject, we use the graph kernel method to obtain the similarity matrix. Based on this similarity matrix, we aggregate the similar connection matrices in the DBNs to generate brain states. This process can be repeated several times, from which we can obtain hierarchical brain states. Finally, we extract features of these brain states for each subject, and integrate feature vectors into bag-level features. These bag-level features can be further fed into the multi-instance classifier to identify schizophrenia.
The main contributions of this paper are as follows: 1) We propose a novel brain states construction method, which uses the graph kernel instead of the vector-based distance to measure the similarity between connection matrices of the DBNs. 2) The brain state construction is implemented for a single subject rather than all subjects. It generates unique brain states for each subject, and is more sensitive in capturing abnormal structures caused by brain diseases. 3) We treat the brain states of the same subject as multi-instance data rather than simply combining them. In the framework of multi-instance learning, it is easier to find the key brain states with significant discriminant structures. To the best of our knowledge, our proposed method is the first to apply a multi-instance learning algorithm to DBN analysis. 4) We evaluate the proposed method on a real schizophrenia dataset. The experimental results demonstrate the effectiveness of the proposed approach compared with several state-of-the-art methods.

II. MATERIALS AND PREPROCESSING

A. PARTICIPANTS
In this paper, we used the schizophrenia dataset contributed by the Center for Biomedical Research Excellence (COBRE) to evaluate the proposed method. This dataset comprises fMRI data of 53 schizophrenia patients (11 females and 42 males, mean age: 36.7) and 67 normal controls (21 females and 46 males, mean age: 34.8). There is no significant difference in age (p-value = 0.39) between the two groups. Patients were diagnosed by experienced doctors. All data were collected with an echo time (TE) of 29 ms and a repetition time (TR) of 2 s. The fMRI data comprise over 150 brain volumes per subject. Each brain volume consists of an acquisition of 32 brain slices with dimension 64 × 64, where the voxel size is 3 × 3 × 4 mm³. Table 1 reports the demographic and clinical information of the studied subjects.

B. PREPROCESSING
The fMRI data are preprocessed using the DPARSF toolbox version 2.0 [21]. This process includes the removal of the first ten image volumes, realignment, slice-timing correction, and spatial normalization into the Montreal Neurological Institute (MNI) space. The data are resliced to 3 mm × 3 mm × 3 mm voxels and smoothed with a Gaussian kernel (FWHM = 5 mm). After the preprocessing, we define 90 nodes of the functional connectivity network for each subject according to the automated anatomical labeling (AAL) template [22].

III. METHOD
Our proposed GK-BSC method consists of three parts, including dynamic brain network construction (described in section III-A), graph kernel based brain states construction (described in section III-B), and brain states based disease classification (described in section III-C).

A. DYNAMIC BRAIN NETWORK CONSTRUCTION
We construct dynamic brain networks by using the sliding window approach, which divides a time series into a number of windows. In particular, given the R-fMRI time series $x_i \in \mathbb{R}^N$ of brain region $i$, where $N$ is the length of the time series (i.e., the number of temporal image volumes), the connection weight between region $i$ and region $j$ on the static brain network is the Pearson correlation over the full series:

$$w_{ij} = \frac{\sum_{t=1}^{N}\big(x_i(t)-\bar{x}_i\big)\big(x_j(t)-\bar{x}_j\big)}{\sqrt{\sum_{t=1}^{N}\big(x_i(t)-\bar{x}_i\big)^2}\,\sqrt{\sum_{t=1}^{N}\big(x_j(t)-\bar{x}_j\big)^2}}$$

Then, the connection weight between region $i$ and region $j$ on the dynamic brain network is the same correlation restricted to each time window:

$$w_{ij}(m) = \frac{\sum_{t=m}^{m'}\big(x_i(t)-\bar{x}_i^{(m)}\big)\big(x_j(t)-\bar{x}_j^{(m)}\big)}{\sqrt{\sum_{t=m}^{m'}\big(x_i(t)-\bar{x}_i^{(m)}\big)^2}\,\sqrt{\sum_{t=m}^{m'}\big(x_j(t)-\bar{x}_j^{(m)}\big)^2}}$$

where $m \in [1, N-L+1]$ is the starting point of the time window, $L$ is the length of the window, $m' = m + L - 1$ is its end point, and $\bar{x}_i^{(m)}$ is the mean of $x_i$ within the window.
According to the above equations, we can see that the static brain network uses a single value to reflect the connection weight between brain regions, which cannot reflect the dynamic information of connections. On the contrary, the dynamic brain network uses a set of values to characterize time-varying connections generated by the sliding window approach. In this way, we can discover dynamic changes to identify brain diseases.
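The sliding-window construction described above can be sketched in Python with NumPy. The function name and the toy dimensions below are our own illustrations, not the authors' code; the paper's actual window settings are given in the experiment section.

```python
import numpy as np

def dynamic_brain_network(ts, window_len, step=1):
    """Sliding-window correlation matrices from a regional time series.

    ts: (N, R) array with N time points for R brain regions.
    Returns a list of (R, R) Pearson correlation matrices, one per window.
    """
    N, R = ts.shape
    matrices = []
    for start in range(0, N - window_len + 1, step):
        window = ts[start:start + window_len]      # (L, R) segment
        matrices.append(np.corrcoef(window.T))     # (R, R) correlations
    return matrices

# Toy example: 170 time points, 5 regions, window of 20 TRs, step 10
rng = np.random.default_rng(0)
ts = rng.standard_normal((170, 5))
dbn = dynamic_brain_network(ts, window_len=20, step=10)
print(len(dbn), dbn[0].shape)  # 16 windows of shape (5, 5)
```

Each matrix in the returned list corresponds to one time window; stacking them gives the subject's DBN.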

B. GRAPH KERNEL BASED BRAIN STATES CONSTRUCTION
In this paper, we use the graph kernel method to construct brain states, which consists of two steps: 1) similarity matrix construction and 2) connection matrices aggregation. The similarity matrix is used to measure the structural similarity between connection matrices on each subject. To describe this similarity accurately, we use the graph kernel based method rather than the vector-based distance to calculate this matrix. The aggregation is used to merge connection matrices into brain states according to their similarities. Next, we will introduce these two steps in detail.

1) SIMILARITY MATRIX CONSTRUCTION
Graph kernels have been employed in brain network analysis [23], [24]. It is worth noting that they were designed for static brain networks, and have not been used in dynamic brain network analysis. In this paper, we choose a simple graph kernel, called the shortest path kernel, to measure similarities between connection matrices in DBNs. Considering that this graph kernel is designed for unweighted graphs, we first transform the connection matrices into unweighted sparse graphs. In this paper, we retain only the top 25% of connections for each connection matrix according to their weights. For convenience of representation, these sparse connection matrices are denoted as $A^{(k)} = \{A^{(k)}[1], A^{(k)}[2], \ldots, A^{(k)}[M]\}$, where $M$ is the number of connection matrices in each DBN.
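A minimal sketch of this sparsification step, assuming the threshold is taken over the absolute off-diagonal weights and that ties at the threshold are all kept (the function name `sparsify` is ours):

```python
import numpy as np

def sparsify(conn, keep_ratio=0.25):
    """Keep the strongest `keep_ratio` fraction of off-diagonal connections
    (by absolute weight) and binarize, yielding an unweighted undirected graph."""
    conn = np.asarray(conn, dtype=float)
    R = conn.shape[0]
    w = np.abs(conn)
    np.fill_diagonal(w, 0.0)                     # ignore self-connections
    iu = np.triu_indices(R, k=1)                 # each undirected edge once
    weights = w[iu]
    k = max(1, int(round(keep_ratio * weights.size)))
    thresh = np.sort(weights)[-k]                # weight of the k-th strongest edge
    keep = weights >= thresh                     # ties at thresh may keep a few extra
    adj = np.zeros((R, R), dtype=int)
    adj[iu[0][keep], iu[1][keep]] = 1
    adj = adj + adj.T                            # symmetrize
    return adj

conn = np.array([[0.0, 0.9, 0.1, 0.2],
                 [0.9, 0.0, 0.3, 0.4],
                 [0.1, 0.3, 0.0, 0.8],
                 [0.2, 0.4, 0.8, 0.0]])
adj = sparsify(conn)   # keeps the 2 strongest of the 6 candidate edges
```

The result is a binary, symmetric adjacency matrix, as required by the unweighted shortest path kernel.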
We use the shortest path graph kernel [25] to measure the similarity between these sparse connection matrices. The shortest path graph kernel compares two graphs by counting the number of shortest paths with the same length. Therefore, the similarity between two connection matrices $G$ and $G'$ of a subject can be written as:

$$K(G, G') = \sum_{(v_i, v_j) \in V \times V} \; \sum_{(u_k, u_l) \in V' \times V'} \delta\big(d(v_i, v_j),\, d(u_k, u_l)\big)$$

where $d(v_i, v_j)$ is the length of the shortest path between nodes $v_i$ and $v_j$ on the graph $G$, $d(u_k, u_l)$ is the length of the shortest path between nodes $u_k$ and $u_l$ on the graph $G'$, and $\delta(a, b) = 1$ when the two shortest paths are equal in length and $0$ otherwise. The kernel value is obtained by comparing all pairs of shortest paths in the two networks, and measures the structural similarity between the two graphs. After obtaining the pairwise similarities of the $M$ sparse networks in the set $A^{(k)}$, we get the similarity matrix $K^{(k)} \in \mathbb{R}^{M \times M}$ for each subject.
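The shortest path kernel with a delta comparison of path lengths can be sketched as follows. This is our own sketch, not the authors' implementation; it counts each unordered node pair once and drops disconnected pairs, which are two choices the paper does not specify.

```python
from collections import Counter

import numpy as np
from scipy.sparse.csgraph import shortest_path

def sp_kernel(adj1, adj2):
    """Shortest path (delta) kernel between two unweighted graphs:
    counts pairs of node pairs whose shortest-path lengths are equal."""
    def path_length_counts(adj):
        d = shortest_path(adj, unweighted=True)       # all-pairs shortest paths
        iu = np.triu_indices(adj.shape[0], k=1)       # each node pair once
        lengths = d[iu]
        lengths = lengths[np.isfinite(lengths)]       # drop disconnected pairs
        return Counter(lengths.astype(int))
    c1, c2 = path_length_counts(adj1), path_length_counts(adj2)
    return sum(c1[l] * c2[l] for l in c1)             # matched-length pairs

P3 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])   # path graph 0-1-2: lengths {1: 2, 2: 1}
```

In practice one may normalize the kernel, e.g. $K(G,G')/\sqrt{K(G,G)\,K(G',G')}$, so that self-similarity equals 1; the paper does not state whether normalization is used.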

2) CONNECTION MATRICES AGGREGATION
The purpose of aggregation is to combine similar connection matrices to generate brain states. In the previous step, we generated sparse connection matrices, which retain important connections but inevitably lose some structural information. Therefore, aggregation can enhance the ability of brain states to describe structural information. In this paper, we aggregate the sparse connection matrices $A^{(k)} = \{A^{(k)}[1], A^{(k)}[2], \ldots, A^{(k)}[M]\}$ to generate brain states $C^{(k)} = \{C^{(k)}[1], C^{(k)}[2], \ldots, C^{(k)}[H]\}$ ($H \le M$) by using a simple additive procedure:

$$C^{(k)}[h] = \mathbb{1}\Big(\sum_{\alpha=1}^{M} \delta_{h,\alpha}\, A^{(k)}[\alpha] > 0\Big)$$

where $\delta_{h,\alpha} = 1$ when the connection matrix $A^{(k)}[\alpha]$ is aggregated into the brain state $C^{(k)}[h]$ and $0$ otherwise, and $\mathbb{1}(\cdot)$ binarizes the result so that each brain state remains an unweighted graph.
In order to obtain a hierarchical representation of brain states, we aggregate the two most similar connection matrices at a time. Specifically, we first find the most similar connection matrix for each connection matrix according to the similarity matrix $K^{(k)}$. Then, we aggregate these matrices in pairs to generate brain states. Through this aggregation, the number of connection matrices is reduced by half. For the resulting brain states, we can continue to compute the similarity matrix and perform further aggregations until the preset number of aggregations is reached. It is worth noting that, for each aggregation, we only keep the connection information, that is, each brain state remains an unweighted and undirected graph.
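One level of this pairwise aggregation can be sketched as follows. We use a greedy pairing (process graphs in index order and match each with its most similar still-unpaired partner), which is our own reading of "aggregate these matrices in pairs"; the paper does not specify how conflicts between mutual best matches are resolved.

```python
import numpy as np

def aggregate_once(graphs, kernel):
    """One aggregation level: greedily pair each graph with its most similar
    unpaired partner (per `kernel`) and merge each pair by edge union,
    roughly halving the number of graphs."""
    M = len(graphs)
    K = np.array([[kernel(g, h) for h in graphs] for g in graphs], dtype=float)
    np.fill_diagonal(K, -np.inf)                 # never pair a graph with itself
    unused = set(range(M))
    states = []
    while len(unused) >= 2:
        i = min(unused)
        j = max((b for b in unused if b != i), key=lambda b: K[i, b])
        merged = ((graphs[i] + graphs[j]) > 0).astype(int)   # union of edges
        states.append(merged)
        unused -= {i, j}
    if unused:                                   # odd count: carry the leftover
        states.append(graphs[unused.pop()])
    return states

# Toy example: two identical pairs collapse to two states
g_a = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
g_b = g_a.copy()
g_c = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]])
g_d = g_c.copy()
states = aggregate_once([g_a, g_b, g_c, g_d],
                        kernel=lambda g, h: -np.abs(g - h).sum())
```

Applying `aggregate_once` repeatedly (recomputing similarities each round, e.g. with the shortest path kernel) yields the hierarchical brain states; with M = 16, two rounds give H = 4 as in the paper.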

C. BRAIN STATES BASED DISEASE CLASSIFICATION
After the above steps, each sample (i.e., subject) can be represented by several brain states. Based on these states, we can identify brain diseases. Specifically, we first extract features from these states. In this paper, we use the clustering coefficient as the feature of each state:

$$cc_i = \frac{2\,t_i}{k_i (k_i - 1)}$$

where $k_i$ is the degree of node $i$ and $t_i$ is the number of triangles around node $i$. The clustering coefficient measures the closeness between a node and the community to which it belongs, and is a commonly used measurement. In this way, we can construct a bag-level feature for each sample: $d^{(k)} = \{cc^{(k)}(1), cc^{(k)}(2), \ldots, cc^{(k)}(p)\}$, where $cc^{(k)}(h)$ denotes the clustering-coefficient feature vector extracted from the $h$-th brain state. Each bag-level feature has a label $y^{(k)} \in \{\pm 1\}$, corresponding to the healthy or the patient.
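The per-node clustering coefficient above can be computed directly from the binary adjacency matrix, using the fact that the diagonal of $A^3$ counts each triangle through a node twice (the helper name is ours):

```python
import numpy as np

def clustering_coefficients(adj):
    """Per-node clustering coefficient cc_i = 2*t_i / (k_i*(k_i - 1)) for an
    unweighted undirected adjacency matrix (0 for nodes of degree < 2)."""
    A = np.asarray(adj, dtype=float)
    k = A.sum(axis=1)                       # node degrees
    t = np.diag(A @ A @ A) / 2.0            # triangles through each node
    denom = k * (k - 1)
    cc = np.zeros_like(k)
    mask = denom > 0
    cc[mask] = 2.0 * t[mask] / denom[mask]
    return cc

K3 = np.ones((3, 3)) - np.eye(3)            # triangle graph: all cc equal 1
print(clustering_coefficients(K3))
```

Concatenating the clustering-coefficient vectors of a subject's H brain states gives the bag of instances fed to the multi-instance classifier.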
After getting the features of each subject, we use the multi-instance support vector machine (MI-SVM) [26] to predict the label. The MI-SVM assumes that the classification model is a linear model $f(d) = w^{\top}\phi(d)$, where $\phi$ is a feature map generated by a kernel. The objective of MI-SVM is to minimize the structural risk:

$$\min_{f}\; \eta\,\Omega(f) + \sum_{k} l\Big(y^{(k)} \max_{h} f\big(d^{(k)}[h]\big)\Big)$$

where $\Omega(\cdot)$ is a monotonically increasing function of the model complexity, $l(\cdot)$ is the loss function, and $\eta$ is a regularization parameter. It is worth noting that $d^{(k)}[h^*] = \arg\max_{h} f\big(d^{(k)}[h]\big)$ can be regarded as the most discriminating instance (i.e., brain state) in a positive sample.
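A common way to optimize MI-SVM is the heuristic of Andrews et al., which alternates between training a standard SVM and re-selecting the highest-scoring instance (the "witness") in each positive bag. The following is a minimal sketch of that heuristic with scikit-learn; the function names and the toy data are ours, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

def mi_svm(bags, labels, n_iter=10, C=2.0):
    """Heuristic MI-SVM: alternate between fitting a linear SVM and
    re-selecting the highest-scoring instance of each positive bag.
    Negative bags contribute all of their instances."""
    neg_X = np.vstack([b for b, y in zip(bags, labels) if y == -1])
    pos_bags = [b for b, y in zip(bags, labels) if y == 1]
    witnesses = [b.mean(axis=0) for b in pos_bags]   # initial representatives
    clf = SVC(kernel="linear", C=C)
    for _ in range(n_iter):
        X = np.vstack([np.vstack(witnesses), neg_X])
        y = np.r_[np.ones(len(witnesses)), -np.ones(len(neg_X))]
        clf.fit(X, y)
        # re-select the most discriminating instance of each positive bag
        new = [b[np.argmax(clf.decision_function(b))] for b in pos_bags]
        if all(np.allclose(a, b) for a, b in zip(witnesses, new)):
            break
        witnesses = new
    return clf

def predict_bag(clf, bag):
    """Bag label = sign of the best instance score."""
    return 1 if clf.decision_function(bag).max() > 0 else -1

# Toy bags: each positive bag hides one discriminative instance near (5, 5)
rng = np.random.default_rng(1)
pos = [np.vstack([rng.normal(0, 0.1, (3, 2)), rng.normal(5, 0.1, (1, 2))])
       for _ in range(5)]
neg = [rng.normal(0, 0.1, (4, 2)) for _ in range(5)]
bags, labels = pos + neg, [1] * 5 + [-1] * 5
clf = mi_svm(bags, labels)
preds = [predict_bag(clf, b) for b in bags]
```

The witness selected on convergence plays the role of the most discriminating brain state $d^{(k)}[h^*]$ described above.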

IV. EXPERIMENTS

A. METHODS FOR COMPARISON
We first compare the proposed method with four methods, including 1) clustering coefficient based method (denoted as CC), 2) temporal variability based method (denoted as TV) [12], 3) spatial-temporal variability based method (denoted as STV) [14], and 4) group temporal dynamics based method (denoted as GTD) [27]. We also compare our method with two variants of the proposed method, including 5) L2-norm based MI-SVM (denoted as L2 + MISVM), and 6) multi-instance support vector machine based method (denoted as MI-SVM). Now we briefly summarize these methods as follows: 1) CC based method: This method first constructs the static brain network for each subject, and then sets a threshold for these networks according to their weights on edges. For these sparse networks, we extract the clustering coefficient as features of these brain networks. These features will be fed into the SVM to identify patients.
2) TV based method [12]: This method extracts temporal variability as features of dynamic brain networks. In this way, we can construct a feature vector for each subject. These features also can be fed into the SVM to identify patients.
3) STV based method [14]: This method extracts spatial and temporal variability as features of dynamic brain networks. In this way, the feature dimension constructed by this method is twice that of the TV method. For these features, we use the sparse-learning based method to reduce the feature dimension, and then feed these features into the SVM to identify patients. 4) GTD based method [27]: This method first uses the group graphical lasso model to construct brain states, and then extracts the clustering coefficient as features of these brain states. These features will be fed into the SVM to identify patients. 5) L2 + MISVM based method: Compared with our method, this method uses the L2 distance rather than the graph kernel to measure the similarity between connection matrices of the DBNs. The other settings are consistent with our method. 6) MI-SVM based method: This method directly extracts features from the DBNs instead of brain states. It extracts the upper triangular elements of all connection matrices of the DBNs as features, and then uses the MI-SVM for classifications.

B. EXPERIMENT SETTINGS
The datasets are divided into training and test sets by five-fold cross-validation, that is, the samples are divided into five parts of nearly equal size, four of which are used as the training set, and the remaining one is used as the test set. This process is repeated five times, and all experiments take the average result as the final result. The effectiveness of the method is measured by three measurements: accuracy (ACC), specificity (SPE), and sensitivity (SEN). These measurements are calculated as follows:

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}, \quad SEN = \frac{TP}{TP + FN}, \quad SPE = \frac{TN}{TN + FP}$$

where $TP$, $TN$, $FP$, and $FN$ refer to the number of correctly identified patients, correctly identified normal controls, normal controls misidentified as patients, and patients misidentified as normal controls, respectively. According to these equations, it is easy to see that ACC measures the overall recognition performance of the model, SPE measures the ability to correctly identify normal controls, and SEN measures the ability to correctly identify patients.
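These three measurements follow directly from the confusion-matrix counts; a small helper (the function name and the example counts are ours) makes the definitions concrete:

```python
def classification_metrics(tp, tn, fp, fn):
    """ACC, SEN, SPE from confusion-matrix counts (patients = positive class)."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # overall recognition performance
    sen = tp / (tp + fn)                    # ability to correctly identify patients
    spe = tn / (tn + fp)                    # ability to correctly identify controls
    return acc, sen, spe

# Hypothetical counts for 53 patients and 67 controls
acc, sen, spe = classification_metrics(tp=40, tn=50, fp=17, fn=13)
print(acc, sen, spe)  # 0.75, 40/53, 50/67
```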
In this paper, we set the sliding window length as 20 TR, and then construct 16 connection matrices for each subject (i.e., M = 16). We do the process of network aggregation twice, and generate 4 brain states (i.e., H = 4). For all methods, we use linear support vector machines with default parameters (i.e., C = 2).

C. CLASSIFICATION RESULTS
The experimental results are shown in Table 2 and Figure 2. From Table 2, we can see that our method achieves the best performance on all measurements. Compared with the best comparison method (i.e., L2 + MISVM), our method improves accuracy, specificity, and sensitivity by 10%, 11%, and 8%, respectively. Also, from the receiver operating characteristic (ROC) curves shown in Figure 2, we can see that our proposed method consistently outperforms the competing methods in the classification task. Our proposed method yields an Area Under the Curve (AUC) score of 81.05%, while the best AUC of the competing methods is 73.87%. Besides, we can make three interesting observations. First, the performance of the methods based on dynamic information is better than that of the method based on static information. The accuracy of the CC-based method is only 59%, which is nearly 6% lower than that of the TV-based method (which uses dynamic information). This observation is also consistent with previous research [28]. Second, compared with the method using the traditional measurement of similarity (i.e., the L2 distance), the graph kernel used in our method achieves better performance. Specifically, when using the traditional measurement, the accuracy of our method drops by at least 10%. This result also suggests that, when measuring the similarity between dynamic brain networks, it is important to consider topological information to improve classification performance. Third, constructing brain states is beneficial for improving the accuracy of identifying patients. Compared with the MI-SVM method, our method improves the classification accuracy by at least 12% by constructing brain states. The possible reason for this result is that brain states can eliminate some redundant information, which helps to find the discriminative information.

D. BRAIN STATES ANALYSIS
To analyze differences in brain states between schizophrenic patients and normal controls, we first choose the most discriminative brain state (i.e., the most discriminating instance), and then perform statistical tests on these states. The results of these tests are shown in Figure 3. From Figure 3, we can see that most discriminant connections are concentrated in the default mode network. To show this result more clearly, we also list the top-10 discriminant connections of the brain state in Table 3. From Table 3, we can find that these connections are mostly related to core regions of the default mode network, such as the calcarine and amygdala. The changes of connections in the default mode network have been shown to be closely related to schizophrenia [29], so this result also suggests that the brain states constructed by our method have the ability to identify schizophrenic patients.

V. DISCUSSION

A. COMPARISON WITH PREVIOUS STUDIES
In this paper, we propose a novel brain states construction method, and apply it to identify schizophrenia. In general, there are at least two major differences between our method and traditional dynamic brain network analysis methods [30], [31]. First, our method uses the graph kernel to measure the similarity of connection matrices in the DBNs, rather than the simple vector-based distance. The graph kernel can naturally use the topological properties of the brain network to measure similarities, thereby improving the effectiveness of brain states construction. Also, from Table 2, we can find that, compared to our method, using the L2 distance reduces the accuracy by at least 10%. Second, our method constructs brain states for each subject, rather than for all subjects. According to previous research [32], [33], brain states have individualized patterns that can help to identify brain diseases. Hence, constructing subject-level brain states can further improve the accuracy of diagnosis.

B. THE INFLUENCE OF THE AGGREGATION NUMBER
To evaluate the influence of the aggregation number, we test the classification performance of our method under different values of H (i.e., H = 1, 2, 4, 8, 16), which correspond to four, three, two, one, and zero aggregations, respectively. Table 4 reports these results. From Table 4, we can see that our method does not perform well when the H value is too high or too low. A high H corresponds to few aggregations. In this situation, the structural information contained in the bag-level features is limited and cannot reflect the changes caused by the disease, due to the sparseness of the brain states. A low H corresponds to many aggregations. Because the brain states have been aggregated many times, they become particularly dense, and the discriminative information is not easily detected among a lot of useless information. It is worth noting that our method achieves good performance when H = 2, 4, 8, all of which are better than the comparison methods reported in Table 2.

C. DISCRIMINANT REGIONS ANALYSIS
According to the most discriminative brain state, we can find some discriminant regions that reveal the pathological mechanism of schizophrenia. Specifically, we first choose the most discriminative brain state, and then extract the features (i.e., clustering coefficients) of this brain state. For these features, we further use statistical tests to find discriminant regions that show significant differences between schizophrenic patients and normal controls. We show these results in Figure 4 and Table 5. From Figure 4 and Table 5, we can find that these results are consistent with the brain states analysis. It is easy to see that most regions are concentrated in the default mode network. Besides, most of the selected brain regions have been suggested to be related to schizophrenia by previous studies [34], [35].

D. LIMITATIONS AND FUTURE WORK
Although our proposed GK-BSC method shows significant improvement in schizophrenia diagnosis over existing brain states construction methods, several technical issues need to be considered in the future. Firstly, we do not use the weight information of DBNs in our method. Several studies suggest that these weights can also reflect disease-related changes [36]. Therefore, a weighted graph kernel will be studied in the future. Secondly, we use the clustering coefficient as the feature of brain states. Although the clustering coefficient is simple to calculate, it cannot fully describe the structural information of a brain state. It would be interesting to design a more effective feature for brain states. Thirdly, brain state construction and classification are two separate processes in our method. A unified framework for joint brain state construction and classifier training could further improve the recognition accuracy of the model.

VI. CONCLUSION
In this paper, we propose a novel brain states construction method, and also design a framework that uses the constructed brain states to identify schizophrenia. Our method uses the graph kernel instead of the traditional vector-based distance to measure the similarity between connection matrices in DBNs, which can better capture the topological properties of the brain network. Also, we extract features from the brain states of each subject, rather than using group-level brain states, which can better characterize structural abnormalities caused by diseases. Experimental results on schizophrenia classification indicate that our proposed method can not only improve the accuracy of brain disease classification, but also find some potential biomarkers of schizophrenia.
XINYAN YUAN received the M.S. degree from Jiangsu University. She is currently an Assistant Professor with the Jiangsu Vocational College of Business. Her research interests include machine learning and image processing.
LINGLING GU received the M.S. degree from Nanjing Forestry University. She is currently a Lecturer with the Jiangsu Vocational College of Business. Her research interests include image processing and data mining.
JIASHUANG HUANG received the Ph.D. degree from the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, in 2020. He is currently a Lecturer with the School of Information Science and Technology, Nantong University. From 2018 to 2019, he was a Visiting Scholar with the University of Wollongong (UOW), Wollongong, NSW, Australia. His research interests include brain network analysis and machine learning.

VOLUME 10, 2022