Sparse Representation and Dictionary Learning Model Incorporating Group Sparsity and Incoherence to Extract Abnormal Brain Regions Associated With Schizophrenia

Schizophrenia is a complex mental illness whose mechanism is currently unclear. Analyzing functional magnetic resonance imaging (fMRI) data with a sparse representation and dictionary learning (SDL) model is currently a popular approach for exploring the mechanism of the disease. The SDL method decomposes the fMRI data into a sparse coding matrix <inline-formula> <tex-math notation="LaTeX">${ {X}}$ </tex-math></inline-formula> and a dictionary matrix <inline-formula> <tex-math notation="LaTeX">${ {D}}$ </tex-math></inline-formula>. However, traditional SDL methods overlook the group structure information in <inline-formula> <tex-math notation="LaTeX">${ {X}}$ </tex-math></inline-formula> and the coherence between the atoms in <inline-formula> <tex-math notation="LaTeX">${ {D}}$ </tex-math></inline-formula>. To address this problem, we propose a new SDL model incorporating group sparsity and incoherence, namely GS2ISDL, to detect abnormal brain regions. Specifically, GS2ISDL uses the group structure information defined on the fMRI dataset by the AAL anatomical template as a prior to achieve inter-group sparsity in <inline-formula> <tex-math notation="LaTeX">${ {X}}$ </tex-math></inline-formula>. At the same time, the <inline-formula> <tex-math notation="LaTeX">$L_{1}$ </tex-math></inline-formula>-norm is enforced on <inline-formula> <tex-math notation="LaTeX">${ {X}}$ </tex-math></inline-formula> to achieve intra-group sparsity. In addition, our algorithm imposes an incoherence constraint on the dictionary matrix <inline-formula> <tex-math notation="LaTeX">${{D}}$ </tex-math></inline-formula> to reduce the coherence between the atoms in <inline-formula> <tex-math notation="LaTeX">${{D}}$ </tex-math></inline-formula>, which can ensure the uniqueness of <inline-formula> <tex-math notation="LaTeX">${{X}}$ </tex-math></inline-formula> and the discriminability of the atoms.
To validate the proposed GS2ISDL model, we compared it with the IK-SVD and SDL algorithms on an fMRI dataset collected by the Mind Clinical Imaging Consortium (MCIC). The results show that the accuracy, sensitivity, specificity and MCC values of GS2ISDL are 93.75%, 95.23%, 80.50% and 88.19%, respectively, outperforming both IK-SVD and SDL. The ROIs extracted by the GS2ISDL model (such as the precentral gyrus, hippocampus and caudate nucleus) are further verified against the schizophrenia literature and have clear biological significance.


I. INTRODUCTION
Schizophrenia is a complex mental illness characterized by abnormal thinking, speech, and behavior [1]. The diagnosis of schizophrenia remains a difficult problem, and there is no gold standard. Although interviews and medical history are the key factors in determining the diagnosis, medical experts may make a wrong judgment [2]. Because fMRI has many advantages, such as non-invasiveness and high spatial resolution, it is often used for diagnosis. In particular, extracting brain regions of interest (ROIs) that are significantly related to schizophrenia is a meaningful way to assist the diagnosis.
The associate editor coordinating the review of this manuscript and approving it for publication was Mohamad Forouzanfar.
Canonical correlation analysis (CCA) [3], independent component analysis (ICA) [5] and sparse representation and dictionary learning (SDL) [6] are the most commonly used methods in neuroimaging data analysis. CCA usually treats fMRI data as the endophenotype and single nucleotide polymorphism (SNP) data as the genotype, and finds the maximum correlations between them. CCA-based methods include SCCA [7], FL-SCCA [8], Group SCCA [9], Joint SCCA [1] and AGN-SCCA [10]. ICA assumes that the signals in fMRI are a mixture of independent sources. However, ICA-based methods do not incorporate prior information such as sparsity [11] or task paradigm information [12]. Moreover, recent studies show that the basic independence assumption of the model is not guaranteed in practice [12]-[14].
SDL has been shown to be efficient in learning adaptive, over-complete and diverse features by decomposing the observed signals into sparse representations over a set of bases [12]. It outperforms traditional methods, including principal component analysis (PCA) and ICA, in the extraction of activity patterns [12], [15]. As a result, it has been successfully applied in many fields, including computer vision [16], image processing [17], machine learning [18] and bioinformatics [19]. In neuroscience, the SDL method has attracted increasing attention. Alexander and Baumgartner [20] used an SDL algorithm to denoise fMRI data. Zhao et al. [12], Lv et al. [19], Leonardi et al. [21] and Li et al. [22] used SDL-based methods to identify brain functional networks. Jiang et al. [23] used a cortical folding pattern guided model to analyze functional brain networks reconstructed from SDL. Jiang et al. [24] explored the temporal dynamics of functional brain networks reconstructed from SDL in each time window. Jiang et al. [25] applied SDL to 'grayordinate' data, a special organization of fMRI data, to reconstruct functional networks. These SDL methods share the basic assumption that the signal matrix Y can be decomposed into the product of a dictionary matrix D and a sparse coding matrix X. Each column of Y can be sparsely represented by a few atoms in D, and its representation coefficients form the corresponding column of X. However, the traditional SDL model only uses the $L_1$-norm to make each column of X sparse, overlooking the rich anatomical structure information in fMRI data. In neuroanatomy, the human brain can be divided into many regions; for example, the widely used AAL template divides the brain into 116 regions. Furthermore, it is useful to have a highly discriminative dictionary matrix.
In SDL algorithms such as K-SVD [6], orthogonal matching pursuit [26] and online dictionary learning [27], the constraint on the dictionary matrix is the normalization of its columns. When the coherence between atoms is high, the corresponding sparse coding matrix X is not unique. Therefore, the coherence between the atoms should be taken into account. For example, the methods in [28]-[30] all improve model performance by reducing the coherence of the dictionary matrix.
Herein, we propose a new sparse representation and dictionary learning algorithm incorporating both group sparsity and incoherence, namely GS2ISDL, for fMRI analysis. Specifically, the GS2ISDL algorithm imposes group-norm and $L_1$-norm constraints on the sparse coding matrix X so that X attains both intra-group and inter-group sparsity. The group information is obtained from the fMRI data according to the AAL template. The group-norm is a constraint defined by this group information that guides X toward inter-group sparsity, while the $L_1$-norm is used to achieve intra-group sparsity in X. In addition, our algorithm reduces the coherence between atoms by imposing an incoherence constraint on the dictionary matrix D (i.e., the atoms are as orthogonal as possible). This property helps ensure the uniqueness of X and the discriminability of the atoms.
To validate our model, we compared the GS2ISDL, IK-SVD [28] and SDL [6] algorithms for feature extraction on the fMRI dataset collected by the Mind Clinical Imaging Consortium (MCIC). The results show that the accuracy, sensitivity, specificity and MCC values of GS2ISDL are 93.75%, 95.23%, 80.50% and 88.19%, respectively, outperforming both IK-SVD and SDL. Compared with the IK-SVD algorithm, GS2ISDL improves the accuracy, sensitivity, specificity and MCC values by 5.5%, 9.51%, 5.28%, and 9.06%, respectively. Compared with the SDL algorithm, GS2ISDL improves the accuracy, sensitivity, specificity and MCC values by 6.24%, 14.52%, 7.65%, and 10.73%, respectively. Moreover, the ROIs extracted by the GS2ISDL algorithm are verified against the schizophrenia literature.
The rest of the paper is organized as follows: In Section II, we review the basic SDL model and propose our method. In Section III, the validity of GS2ISDL model is tested on real fMRI data. Section IV summarizes the work.

II. METHODS
A. OVERVIEW
Figure 1 shows the framework of the proposed GS2ISDL model for identifying abnormal brain regions associated with schizophrenia. In Figure 1(a), $Y_1$ represents the fMRI data of patients and $Y_2$ represents the fMRI data of healthy controls. Both $Y_1$ and $Y_2$ are of size m × n, where m is the number of samples and n is the number of features. According to the AAL template, the columns of $Y_1$ and $Y_2$ can each be divided into 116 groups. We combine $Y_1$ and $Y_2$ into a matrix Y of size m × 2n, then use GS2ISDL to decompose Y into the product of a dictionary matrix D of size m × K and a sparse coding matrix X of size K × 2n. Combining $Y_1$ and $Y_2$ into Y ensures that the fMRI data from healthy controls and patients are mapped onto the same dictionary matrix. In Figure 1(b), because group-norm and $L_1$-norm constraints are imposed on X, the resulting X is sparse both within and between groups. In addition, our approach reduces the coherence between atoms in D by minimizing $\|D^T D - I\|_F^2$. In Figure 1(c), we use the ROA coefficient (see Section III-B for details) to evaluate the significance of each atom in D. The larger the ROA value, the more significant the atom. If the ROA value of the i-th atom in D is the largest, we take the non-zero elements of the patient part of the i-th row of X as the features to extract. Based on the extracted features, we use an SVM to classify the fMRI data. Accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC) are used to evaluate the significance of these features. Moreover, we review the relevant schizophrenia literature to further analyze the ROIs corresponding to these features.
FIGURE 1. (b) We use the GS2ISDL algorithm to decompose Y into the dictionary matrix D and the sparse coding matrix X. (c) Finding the significant atom $d_i$ in D, which is its i-th column. We take the non-zero elements of the patient part of the i-th row of X as important features, use an SVM model to analyze these features quantitatively, and combine prior knowledge from neuroscience to analyze them qualitatively.

B. REVIEW OF SPARSE REPRESENTATION AND DICTIONARY LEARNING
Assume that an input vector $y \in \mathbb{R}^{m \times 1}$ can be reconstructed as a linear combination of a few atoms $\{d_i\}_{i=1}^{K}$ of the dictionary matrix $D \in \mathbb{R}^{m \times K}$, i.e., $y = Dx$, where the reconstruction coefficient vector x is of size K × 1. To learn a dictionary matrix D of size m × K and a sparse coding matrix X of size K × N that reconstruct N vectors $Y = \{y_i\}_{i=1}^{N}$, the problem can be defined as follows:
$$\min_{D, X} \|Y - DX\|_F^2 + \lambda \sum_{i=1}^{N} \|x_i\|_0 \tag{1}$$
where $x_i$ represents the i-th column of X, $\|\cdot\|_F$ represents the Frobenius norm of a matrix, and $\|\cdot\|_0$ represents the $\ell_0$-norm, which is the number of non-zero elements in a vector. λ is a regularization parameter used to control the sparsity of $x_i$. Since the $\ell_0$-norm is included in Eq. (1), this is an NP-hard problem and cannot be solved directly for large-scale data.
To make this problem tractable, the $\ell_0$-norm is usually relaxed to the $\ell_1$-norm, and the problem is defined as follows:
$$\min_{D, X} \|Y - DX\|_F^2 + \lambda \sum_{i=1}^{N} \|x_i\|_1 \tag{2}$$

C. OUR PROPOSED METHOD
Neuroimaging data often contain group structure information. Many studies [1], [10], [31] have shown that incorporating structural information (such as overlapping groups, trees, and graphs) into the model can improve the classification accuracy.
In this work, to improve the performance of the SDL model, we incorporate the group information in the fMRI data into the model as a prior. The group information is provided by the AAL template [32]: the voxels in each brain mask are associated with a brain region and grouped together. We define these grouping structures as follows:
$$\{g_1, g_2, \ldots, g_S\} \tag{3}$$
where $g_i$ is the set of feature columns belonging to the i-th group in the matrix $Y_1$ or $Y_2$ (shown as ROI 1, ROI 2, ..., ROI S of $Y_1$ or $Y_2$ in Figure 1(a)). The GS2ISDL algorithm decomposes Y into the product of D and X, and the group information in Y is inherited by X (shown as ROI 1, ROI 2, ..., ROI 2S of X in Figure 1(b)). We use this group information as a prior and impose a group sparsity constraint on each row of X to improve feature extraction by the SDL model. The group-norm of X is defined as follows:
$$\|X\|_g = \sum_{i=1}^{K} \sum_{s=1}^{2S} |g_s| \, \|X_{is}\|_2 \tag{4}$$
where $\|\cdot\|_g$ represents the group-norm, $X_{is}$ represents the elements in the i-th row and s-th group of the matrix X, and $|g_s|$ represents the number of elements in the s-th group. While the group lasso can select significant groups, it cannot identify important features within each group. To make X sparse both within and between groups, we combine the group lasso and the $L_1$-norm. The mathematical formulation of the model is as follows:
$$\min_{D, X} \sum_{j=1}^{2n} \|Y_{\cdot j} - D X_{\cdot j}\|_2^2 + \lambda_1 \sum_{j=1}^{2n} \|X_{\cdot j}\|_1 + \lambda_2 \sum_{i=1}^{K} \sum_{s=1}^{2S} |g_s| \, \|X_{is}\|_2 \tag{5}$$
where $Y_{\cdot j}$ and $X_{\cdot j}$ represent the j-th column of Y and X, respectively. In Eq. (5), $\sum_{j=1}^{2n} \|X_{\cdot j}\|_1$ is the $L_1$-norm constraint, which makes X intra-group sparse, and $\sum_{i=1}^{K} \sum_{s=1}^{2S} |g_s| \, \|X_{is}\|_2$ is the group sparsity constraint imposed on X, which makes X inter-group sparse.
In SDL-based models, it is desirable for the coherence between dictionary atoms to be as low as possible, since this improves the discriminative capability of the atoms. Eq. (5) does not consider the coherence between atoms in D. Inspired by [28], we use the Gram matrix to measure the correlation between atoms. The Gram matrix is defined as $G = D^T D$ (each column of D has been normalized to unit length). G is a symmetric matrix, and the absolute values of its off-diagonal elements represent the correlation between pairs of atoms in D. Therefore, we want the off-diagonal elements of G to be as small as possible. We reduce the coherence between atoms by minimizing $\|D^T D - I\|_F^2$ and combine it with Eq. (5) to form the following model:
$$\min_{D, X} \sum_{j=1}^{2n} \|Y_{\cdot j} - D X_{\cdot j}\|_2^2 + \lambda_1 \sum_{j=1}^{2n} \|X_{\cdot j}\|_1 + \lambda_2 \sum_{i=1}^{K} \sum_{s=1}^{2S} |g_s| \, \|X_{is}\|_2 + \gamma \|D^T D - I\|_F^2 \tag{6}$$
where I is the identity matrix of size K × K and K is the number of atoms in D. γ, $\lambda_1$ and $\lambda_2$ are regularization parameters greater than zero. γ controls the coherence of the atoms in D: the larger γ is, the lower the coherence between the atoms, and vice versa. $\lambda_1$ is the number of atoms used to linearly represent each column of Y, and its range is $1 \le \lambda_1 \le K$. When $\lambda_1 = 1$, each column of Y is represented by only one atom in D. When $\lambda_1 = K$, each column of Y is represented by all K atoms. $\lambda_2$ is the number of groups kept in each row of X, and its range is $1 \le \lambda_2 \le 2S$. When $\lambda_2 = 1$, only one group in each row of X is retained and the elements of the other groups are zero. When $\lambda_2 = 2S$, all groups are kept in each row of X.
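As a small illustration of the incoherence term, the following NumPy sketch normalizes the columns of a dictionary, forms the Gram matrix $G = D^T D$, and reports the penalty $\|D^T D - I\|_F^2$ together with the largest absolute off-diagonal entry of G. The function name `coherence_stats` and the toy matrices are assumptions made for illustration; they are not taken from the paper's implementation.

```python
import numpy as np

def coherence_stats(D):
    """Return (penalty, max_offdiag) for a dictionary D of shape (m, K)."""
    # Unitize each atom (column) so that the diagonal of G equals 1.
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    G = D.T @ D                                   # Gram matrix, K x K
    K = G.shape[0]
    penalty = np.linalg.norm(G - np.eye(K), "fro") ** 2
    off_diag = np.abs(G - np.diag(np.diag(G)))    # zero out the diagonal
    return penalty, off_diag.max()

# Orthonormal atoms: no coherence at all.
p_orth, c_orth = coherence_stats(np.eye(3))

# Two identical atoms: maximal coherence between them.
p_dup, c_dup = coherence_stats(np.array([[1.0, 1.0],
                                         [0.0, 0.0]]))
```

Note that for an over-complete dictionary (K > m) the Gram matrix has rank at most m, so the penalty can be reduced but cannot reach zero.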
Eq. (6) is not jointly convex in D and X, but when one of them is fixed, it is convex in the other. Therefore, we can design an alternating least squares (ALS) algorithm to solve for the optimal D and X in Eq. (6). The optimization of Eq. (6) consists of two stages: sparse coding and dictionary updating.
Sparse coding includes two steps. First, we use Orthogonal Matching Pursuit (OMP) to solve for the sparse coding matrix X. This step extracts the important features in each group (i.e., intra-group sparsity). Second, we use $|g_s| \, \|X_{is}\|_2$ in Eq. (6) to calculate the group-norm of each group in each row of X, then use the soft thresholding algorithm to make each row of X sparse at the group level (i.e., inter-group sparsity).
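The second sparse-coding step can be sketched as a row-wise group soft-thresholding operation. The version below is a minimal NumPy sketch under stated assumptions: `groups` is a hypothetical list of column-index arrays (one per ROI), and the threshold for group s is taken as `lam * len(g_s)`, mirroring the $|g_s|$ weight in the group-norm. The exact thresholding rule is not spelled out in the text, so this is illustrative only.

```python
import numpy as np

def group_soft_threshold(X, groups, lam):
    """Shrink each (row, group) block of X toward zero.

    A block whose Euclidean norm is below its threshold is set to zero;
    otherwise it is scaled by (1 - threshold / norm).
    """
    X_out = np.zeros_like(X, dtype=float)
    for cols in groups:
        block = X[:, cols].astype(float)                   # K x |g_s|
        norms = np.linalg.norm(block, axis=1, keepdims=True)
        thresh = lam * len(cols)                           # assumed weight |g_s|
        # max(0, 1 - thresh / norm), guarding against division by zero
        scale = np.maximum(0.0, 1.0 - thresh / np.maximum(norms, 1e-12))
        X_out[:, cols] = scale * block
    return X_out

# Toy matrix with two rows (atoms) and two groups of two columns each.
X = np.array([[3.0, 4.0, 0.1, 0.1],
              [0.2, 0.0, 6.0, 8.0]])
groups = [np.array([0, 1]), np.array([2, 3])]
X_sparse = group_soft_threshold(X, groups, lam=0.5)
```

In this toy case the weak blocks (row 0, group 2 and row 1, group 1) are removed entirely, while the strong blocks are only shrunk, which is the group-level sparsity pattern the second step aims for.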
The dictionary updating stage includes three steps. First, we use the K-SVD algorithm to update the atoms in D column by column. This step optimizes each atom in D and also adjusts the non-zero elements in X. Second, we reduce the coherence between the atoms in D by minimizing $\|D^T D - I\|_F^2$ in Eq. (6). The partial derivative of $f = \|D^T D - I\|_F^2$ with respect to D can be expressed as follows:
$$\nabla_D f = 4 D (D^T D - I) \tag{7}$$
We use the gradient descent method $D \leftarrow D - \xi \nabla_D f$ to update D:
$$D^{(k+1)} = D^{(k)} - \gamma D^{(k)} \left( D^{(k)T} D^{(k)} - I \right) \tag{8}$$
where γ = 4ξ > 0 is the step size used to control the convergence speed of the algorithm and k is the number of iterations. We design a variable step size $\gamma_k$ = $\gamma_0$ (1 − α) (1 − α k ). The smaller α is, the more $\gamma_k$ changes, and vice versa. Third, we normalize each column of D to unit length. We iterate the two stages of sparse coding and dictionary updating until the termination condition |e t+1 … is met. The pseudo code for solving the GS2ISDL model is shown in Algorithm 1. To reduce the risk of f(D, X) becoming trapped in a local optimum, we randomly initialize 100 D matrices and run Algorithm 1 on each of them. Finally, the experimental results corresponding to the minimum of $\|Y - DX\|_F$ over these 100 runs are used for further analysis.
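The incoherence update can be sketched in a few lines of NumPy. The sketch uses the gradient $4D(D^T D - I)$ of $f = \|D^T D - I\|_F^2$, a fixed step size `gamma` that absorbs the factor 4 (as in γ = 4ξ), and column normalization after every step. The function names, the fixed step, and the random toy dictionary are assumptions for illustration; the variable step-size schedule and the K-SVD step are not reproduced here.

```python
import numpy as np

def incoherence_step(D, gamma):
    """One gradient step on f(D) = ||D^T D - I||_F^2, then unitize columns."""
    K = D.shape[1]
    D_new = D - gamma * D @ (D.T @ D - np.eye(K))   # gamma = 4 * xi
    return D_new / np.linalg.norm(D_new, axis=0, keepdims=True)

def incoherence_penalty(D):
    """Value of ||D^T D - I||_F^2 for the current dictionary."""
    K = D.shape[1]
    return np.linalg.norm(D.T @ D - np.eye(K), "fro") ** 2

# Toy over-complete dictionary: m = 8 samples, K = 10 atoms.
rng = np.random.default_rng(0)
D0 = rng.standard_normal((8, 10))
D0 /= np.linalg.norm(D0, axis=0, keepdims=True)

D = D0.copy()
for _ in range(20):                                 # 20 small steps
    D = incoherence_step(D, gamma=0.005)
```

Comparing `incoherence_penalty(D0)` with `incoherence_penalty(D)` shows how much the small steps reduce the coherence while every atom stays at unit length.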

A. EXPERIMENT MATERIALS
The fMRI and SNP data used in this paper are from the Mind Clinical Imaging Consortium (MCIC), which collected data from 208 subjects, including 116 healthy controls (age: 32 ± 11, 44 females) and 92 schizophrenia patients (age: 34 ± 11, 22 females). All subjects signed an agreement to participate in the experiment. Healthy participants had no medical, neurological, or mental illness and no history of drug abuse. Based on the clinical interview of patients for DSM-IV-TR Disorders or the Comprehensive Assessment of Symptoms and History, patients met the criteria for DSM-IV-TR schizophrenia. After excluding subjects with missing data, 183 subjects were retained (79 schizophrenia patients and 103 healthy controls) [33]. Because the GS2ISDL model combines the patient data $Y_1$ (of size $m_1 \times n$) and the healthy control data $Y_2$ (of size $m_2 \times n$) into one matrix Y, the sample dimensions $m_1$ and $m_2$ must be equal. We therefore randomly retained only 79 healthy controls. Following the data preprocessing steps of Lin et al. [31], 41,236 fMRI voxels and 722,177 SNP loci are used for further analysis.

1) fMRI DATA COLLECTION
The fMRI data were collected during a sensorimotor task, a block-design motor response to auditory stimulation. During the on-block, 200 ms tones were presented with a 500 ms stimulus onset asynchrony (SOA). A total of 16 different tones were presented in each on-block, with frequencies ranging from 236 Hz to 1318 Hz. The fMRI images were acquired on Siemens 3T Trio scanners and a 1.5T Sonata with echo-planar imaging (EPI) sequences using the following parameters: TR = 2000 ms, TE = 30 ms (3.0 T)/40 ms (1.5 T), field of view = 22 cm, slice thickness = 4 mm, 1 mm skip, 27 slices, acquisition matrix 64 × 64, flip angle = 90° [34]. The raw fMRI data were pre-processed with SPM5 (http://www.fil.ion.ucl.ac.uk/spm): they were realigned, spatially normalized and resliced to 3 × 3 × 3 mm, smoothed with a 10 × 10 × 10 mm³ Gaussian kernel, and analyzed by multiple regression with the stimulus and its temporal derivatives plus an intercept term as regressors. Finally, the stimulus-on versus stimulus-off contrast images were extracted with 53 × 63 × 46 voxels, and all voxels with missing measurements were excluded [1], [31]. After this preprocessing, 41,236 voxels were obtained. According to the automated anatomical labeling (AAL) [32] brain atlas, the 41,236 voxels are divided into 116 ROIs.

2) SNP DATA COLLECTION
The SNP data were obtained from each subject's blood sample. Genotyping for all participants was performed at the Mind Research Network using the Illumina Infinium HumanOmni1-Quad assay covering 1,140,419 SNP loci. Bead Studio was used to make the final genotype calls. The PLINK software package (http://pngu.mgh.harvard.edu/~purcell/plink) was used to perform a series of standard quality control procedures, after which 777,365 SNP loci were retained. For each SNP, the genotypes "BB" (no minor allele), "AB" (one minor allele) and "AA" (two minor alleles) were coded as 0, 1 and 2, respectively [31].

B. SPECIFIC PATTERN EXTRACTION
The dictionary matrix D containing K atoms is obtained by the GS2ISDL algorithm described in Section II-C. Each atom represents one pattern of the brain, and the state of the brain is equivalent to a linear combination of these patterns. The sparse coding process can be regarded as using these patterns to reconstruct the state of the brain. Compared with healthy controls, schizophrenia patients show abnormal patterns in the brain, which can help us to better understand schizophrenia. Therefore, we need to compare the distribution of each atom between schizophrenia patients and healthy controls. To quantify this distribution, we introduce the ROA coefficient, which is defined as follows:
$$\mathrm{ROA} = \frac{\|X(i, j_s)\|_0}{\|X(i, j_h)\|_0} \tag{9}$$
where $j_s$ indexes the columns belonging to schizophrenia patients and $j_h$ indexes the columns belonging to healthy controls. The numerator is the number of times that atom i appears in schizophrenia patients, and the denominator is the number of times that atom i appears in healthy controls. Atoms with higher ROA values are more likely to correspond to patient brain patterns, while atoms with lower ROA values are more likely to correspond to healthy brain patterns.
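As a concrete reading of Eq. (9), the ROA of atom i is the count of non-zero entries in row i of X over the patient columns divided by the count over the healthy-control columns. A minimal sketch follows; the argument names `patient_cols` and `control_cols`, the toy matrix, and the small `eps` guard against division by zero are assumptions added for illustration.

```python
import numpy as np

def roa(X, patient_cols, control_cols, eps=1e-12):
    """Per-atom ratio of non-zero counts: patients over healthy controls."""
    used_by_patients = np.count_nonzero(X[:, patient_cols], axis=1)
    used_by_controls = np.count_nonzero(X[:, control_cols], axis=1)
    return used_by_patients / (used_by_controls + eps)

# Toy sparse coding matrix: 2 atoms, 3 patient columns, 3 control columns.
X = np.array([[1.0, 0.0, 2.0, 0.0, 0.0, 3.0],   # atom 0: 2 patient uses, 1 control use
              [0.0, 0.5, 0.0, 1.0, 1.0, 1.0]])  # atom 1: 1 patient use, 3 control uses
patient_cols = np.arange(0, 3)
control_cols = np.arange(3, 6)

scores = roa(X, patient_cols, control_cols)
top_atom = int(np.argmax(scores))               # atom with the largest ROA
```

The atom with the largest score is the one whose row of X is then inspected for patient-specific features.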

C. MODEL PARAMETER SELECTION
There are four parameters to be tuned: K, $\lambda_1$, $\lambda_2$ and γ.
K represents the number of atoms in the dictionary matrix D. Currently, there is no gold standard for selecting this parameter, so we set K empirically. A meaningful and over-complete dictionary $D \in \mathbb{R}^{m \times K}$ should satisfy K > m and K ≪ n [27], which means that the lower bound of the dictionary size is m. The dictionary should not be too large, in order to limit redundant information. According to [12] and [19], a dictionary size satisfying m < K < 2m usually gives good results. In our experiment, the range of the dictionary size is therefore 79 < K < 158. To reduce the coherence between atoms in D, K should be as close to 79 as possible in our framework. Based on our observations and the work of Groves et al. [35], better experimental results are obtained with K = 100.
$\lambda_1$ is the number of atoms used to linearly represent each column of Y (i.e., the number of non-zero elements in each column of X). Given the dictionary size in our framework, each column of Y can be represented by up to 100 atoms, so the range of $\lambda_1$ is $1 \le \lambda_1 \le 100$. If $\lambda_1$ is too small, the reconstruction error of each column of Y will be relatively large. If $\lambda_1$ is too large, the columns of X will not be sparse. Therefore, $\lambda_1$ should be as small as possible while minimizing the reconstruction error of Y given in Eq. (10), which is computed from $y_j$ and $x_j$, the j-th columns of Y and X, respectively. We fixed $\lambda_2$ as a constant and varied $\lambda_1$ from 1 to 100 with a step of 1. We substituted each $\lambda_1$ value into Eq. (6) to calculate D and X, then calculated the corresponding reconstruction error according to Eq. (10). The experimental results are shown in Figure 2. The reconstruction error of Y changes only within a small range when $\lambda_1 \ge 78$, so we set $\lambda_1 = 78$ in this paper.
$\lambda_2$ represents the sparsity at the group level in each row of X (i.e., the number of groups to be retained in each row of X). In our framework, X consists of two parts: healthy controls and schizophrenia patients. Each part can be divided into 116 brain regions, so the range of $\lambda_2$ is $1 \le \lambda_2 \le 232$. If $\lambda_2$ is too small, the reconstruction error increases. If it is too large, sparsity at the group level is not achieved. Because $1 \le \lambda_2 \le 232$ is a very large range, an exhaustive search is very time-consuming. Following the previous step, we fixed $\lambda_1 = 78$ and varied $\lambda_2$ from 10 to 200 with a step of 10. We substituted each $\lambda_2$ value into Eq. (6) to calculate D and X, then calculated the corresponding reconstruction error according to Eq. (10). The experimental results are shown in Figure 3. Our principle for choosing $\lambda_2$ is that the reconstruction error should be as small as possible and the group-level sparsity of X as high as possible (i.e., $\lambda_2$ as small as possible). As can be seen from Figure 3, the reconstruction error decreases very slowly when $\lambda_2 \ge 50$. According to this selection principle, we set $\lambda_2 = 50$ in this paper.
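The selection rule used for $\lambda_1$ and $\lambda_2$ (pick the smallest value beyond which the reconstruction error barely changes) can be automated in a simple way. The sketch below is one possible heuristic and not the procedure used in the paper, where the values were read from Figure 2 and Figure 3: `tol` is a hypothetical relative tolerance, and the error values are made-up numbers for illustration only.

```python
def smallest_stable_value(values, errors, tol=0.01):
    """Return the smallest parameter value after which every further
    drop in error is below `tol` times the initial error."""
    base = errors[0]
    for i in range(len(values)):
        # Error decreases between consecutive settings from position i onward.
        drops = [errors[j] - errors[j + 1] for j in range(i, len(errors) - 1)]
        if all(d < tol * base for d in drops):
            return values[i]
    return values[-1]

# Illustrative (made-up) sweep of a parameter from 10 to 60 in steps of 10.
values = [10, 20, 30, 40, 50, 60]
errors = [100.0, 60.0, 35.0, 20.0, 19.5, 19.2]
chosen = smallest_stable_value(values, errors)
```

A rule like this makes the trade-off explicit: a smaller tolerance favors lower reconstruction error, while a larger one favors a sparser X.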
γ is used to control the incoherence between atoms. We let $\gamma_k$ = $\gamma_0$ (1 − α) (1 − α k ) adjust automatically with the number of iterations, where k is the number of iterations. Following [28], we set k = 20 with a step size of 1 in this paper. $\gamma_0$ and α control how fast $\gamma_k$ changes. When $\gamma_0$ is fixed, the smaller α is, the faster $\gamma_k$ changes, and vice versa. $\gamma_0$ has the opposite effect: when α is constant, the larger $\gamma_0$ is, the faster $\gamma_k$ changes. There is no uniform standard for determining $\gamma_0$ and α. In our framework, $\gamma_0$ and α reduce the coherence between the atoms in D by changing the value of $\gamma_k$ during the iterations. Based on our experiments and observations, we set $\gamma_0 = 0.005$ and α = 0.1.

D. QUANTITATIVE COMPARISON OF EXPERIMENTAL RESULTS
We use accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC) to quantitatively evaluate the performance of GS2ISDL, SDL [6] and IK-SVD [28].
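For reference, the four measures can be computed from the confusion-matrix counts with their standard definitions, treating patients as the positive class. The sketch below is illustrative: the function name and the toy counts are assumptions and are not taken from the paper's results.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc

# Toy counts (not from the paper): 8 patients and 8 controls in one test fold.
acc, sen, spe, mcc = classification_metrics(tp=7, fn=1, tn=6, fp=2)
```

Unlike accuracy, the MCC uses all four counts symmetrically, which makes it informative even when sensitivity and specificity differ markedly.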

1) CLASSIFICATION RESULTS ON fMRI DATA
Based on the regularization parameters selected above, we use the GS2ISDL algorithm to decompose $Y \in \mathbb{R}^{79 \times 82472}$ into a dictionary matrix D of size 79 × 100 and a sparse coding matrix X of size 100 × 82472. We calculated the ROA coefficient of each atom in D according to Eq. (9); the result is shown in Figure 4. The ROA coefficient of the 5th atom is significantly higher than that of the other atoms. Therefore, we recorded the patient part of row 5 of X (i.e., X(5, 41237:82472)) as $x_s$, then stored the indices of the non-zero elements of $x_s$ in a new vector $I_s$. According to $I_s$, we took out the corresponding columns in $Y_1$ and $Y_2$ to form $Y_s$. We used 10-fold cross-validation to randomly divide $Y_s$ into 10 subsets. Each subset is used in turn for testing, and the remaining 9 subsets are used for training the SVM model. We use the accuracy, sensitivity, specificity and MCC obtained by the GS2ISDL algorithm on the testing set as the final experimental results. The performance measures of the SDL and IK-SVD algorithms on the fMRI dataset are obtained in the same way. Table 1 shows the accuracy, sensitivity, specificity, and MCC obtained by GS2ISDL, IK-SVD and SDL on the fMRI dataset. As Table 1 shows, the accuracy, sensitivity, specificity and MCC values of GS2ISDL on the fMRI dataset are 93.75%, 95.23%, 80.50% and 88.19%, respectively, outperforming both IK-SVD and SDL. Compared with the IK-SVD algorithm, GS2ISDL improves the accuracy, sensitivity, specificity and MCC values by 5.5%, 9.51%, 5.28%, and 9.06%, respectively. GS2ISDL performs better than IK-SVD because it adds the group information of the fMRI data to the IK-SVD model as a prior. As in [11], [31], [36], model performance is improved by adding a group-norm constraint.
In addition, compared with the SDL algorithm, GS2ISDL improves the accuracy, sensitivity, specificity and MCC values by 6.24%, 14.52%, 7.65%, and 10.73%, respectively. GS2ISDL performs better than SDL because it adds incoherence and group-norm constraints to the SDL model. Applying an incoherence constraint to the dictionary matrix not only increases the discriminability of the atoms but also makes the sparse coding matrix unique, as confirmed in [28], [37]. The features extracted by GS2ISDL from the fMRI data yield better classification performance, which indicates that these features are significantly related to schizophrenia. The ROIs corresponding to these features in the AAL template give us a deeper understanding of schizophrenia and may help clinicians in the auxiliary diagnosis and treatment of the disease.
We further compared the performance of the three methods by calculating the correlation between the atoms in the dictionary matrix D, using the Pearson correlation coefficient. To show the correlation between atoms more intuitively, we used a heat map of the absolute values of the correlations. In Figure 5, (a) represents the correlation between atoms in the initialized D, and (b), (c) and (d) represent the correlation between atoms in D obtained by the GS2ISDL, IK-SVD, and SDL algorithms on the fMRI dataset, respectively. Since the SDL algorithm imposes no incoherence constraint on D, the correlation between its atoms is higher than for the other two algorithms. Both GS2ISDL and IK-SVD impose incoherence constraints on D, so the correlation between their atoms is low. Furthermore, the results in Figure 5(b) are slightly better than those in Figure 5(c). This is because the GS2ISDL algorithm not only imposes an incoherence constraint on D but also imposes group sparsity constraints on X.

2) CLASSIFICATION RESULTS ON SNP DATA
To verify the effectiveness of our algorithm on a different dataset, we applied GS2ISDL, IK-SVD and SDL to the SNP dataset. Table 2 shows the accuracy, sensitivity, specificity, and MCC obtained by GS2ISDL, IK-SVD and SDL on the SNP dataset. As Table 2 shows, the accuracy, sensitivity, specificity and MCC values of GS2ISDL on the SNP dataset are 98.13%, 96.67%, 98.32% and 96.46%, respectively, which still outperforms both IK-SVD and SDL. Compared with the IK-SVD algorithm, GS2ISDL improves the accuracy, sensitivity, specificity and MCC values by 0.63%, 1.11%, 1.46%, and 1.18%, respectively. Compared with the SDL algorithm, GS2ISDL improves the accuracy, sensitivity, specificity and MCC values by 1.97%, 2.35%, 2.26%, and 1.64%, respectively. In addition, we analyzed the correlation between the atoms in the dictionary matrix D. As can be seen from Appendix I, the correlation between the atoms in D obtained by our algorithm is the lowest of the three algorithms, which indicates that our algorithm is slightly better than the IK-SVD and SDL algorithms on the SNP dataset.

E. QUALITATIVE ANALYSIS OF EXPERIMENTAL RESULTS
When the GS2ISDL algorithm is applied to the fMRI dataset (as shown in Section III-B), the ROA coefficient of the 5th atom is the largest. Therefore, we recorded the patient part of row 5 of X as $x_s$. We took out the voxels for which the absolute value of $x_s$ is greater than 1 and found the corresponding ROIs of these voxels in the AAL template. The results are listed in Table 3. In Table 3, the second, third, and fourth columns give the numbers of the selected ROIs in the AAL template, the volumes of the selected ROIs in the left and right brain, and the corresponding Brodmann areas, respectively. A total of 6 ROIs are selected by our algorithm: the precentral gyrus, hippocampus, caudate nucleus, thalamus, superior temporal gyrus and lenticular nucleus. Specifically, dysfunction of the precentral gyrus has long been thought to play a role in the impairment of voluntary movement associated with schizophrenia, and recent studies have shown that its functional activity is significantly reduced in patients with schizophrenia [38]-[40]. The hippocampus is currently recognized as a brain region associated with schizophrenia, and its reduced volume can cause patients to experience rest, hearing and memory dysfunction [41], [42]. Compared with healthy controls, the caudate nucleus and the cortical regions connected to it show markedly abnormal hemispheric specialization in patients with schizophrenia [43], [44]. The thalamus plays a key role in sensory information processing and has attracted much attention in the study of schizophrenia. Its structural abnormalities can cause cognitive impairment [45], [46]. The superior temporal gyrus is closely related to hallucinations in the pathophysiology of schizophrenia [47]. Nerve damage in the lenticular nucleus is linked to tardive dyskinesia in schizophrenia [48]. In addition, the lenticular nucleus has been shown in the literature [49] to be associated with the early onset of schizophrenia. To illustrate the locations of the abnormal brain regions on the brain surface, we use BrainNetViewer to plot the abnormal ROIs selected by GS2ISDL, as shown in Figure 6. Different colors of the spheres indicate different ROIs, labeled with the corresponding ROI numbers in the AAL template. The larger the sphere, the larger the volume of the extracted ROI in the left or right brain.
FIGURE 6. Abnormal brain regions associated with schizophrenia selected by GS2ISDL algorithm.
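The voxel-selection step described at the start of this subsection (keep the voxels whose coefficient magnitude in $x_s$ exceeds 1, then look up their ROIs) can be sketched as follows. The `voxel_to_roi` lookup is a hypothetical stand-in for the AAL labeling, and the coefficients are made-up values for illustration.

```python
from collections import Counter

def select_rois(x_s, voxel_to_roi, threshold=1.0):
    """Count, per ROI label, the voxels whose |coefficient| exceeds `threshold`."""
    selected = [i for i, v in enumerate(x_s) if abs(v) > threshold]
    return Counter(voxel_to_roi[i] for i in selected)

# Toy example with made-up coefficients and a hypothetical voxel-to-ROI lookup.
x_s = [0.0, 1.5, -2.0, 0.3, 1.2, 0.0]
voxel_to_roi = ["Thalamus", "Hippocampus", "Hippocampus",
                "Thalamus", "Thalamus", "Caudate"]
roi_counts = select_rois(x_s, voxel_to_roi)
```

The per-ROI voxel counts obtained this way play the role of the ROI volumes reported in Table 3.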

IV. CONCLUSION
In this paper, based on sparse representation and dictionary learning, we proposed a novel GS2ISDL algorithm to extract abnormal brain regions associated with schizophrenia. GS2ISDL improves the SDL algorithm in two aspects. First, we introduced both group-norm and $L_1$-norm constraints on the sparse coding matrix X to make X both intra-group and inter-group sparse. Specifically, we grouped ROIs according to the AAL template and then used this grouping information as a prior to guide the sparse coding matrix toward inter-group sparsity. At the same time, we used the $L_1$-norm to make it intra-group sparse. Second, we reduced the coherence between the atoms in the dictionary matrix D by minimizing $\|D^{T}D - I\|^{2}$.

He is currently a Doctoral and a Postdoctoral Supervisor of electronic and control engineering with Chang'an University. His research interests include traffic data mining, robot control, and intelligent control.

He is currently an Associate Professor and a Master's Supervisor with the School of Electronic and Control Engineering, Chang'an University, Xi'an. His research interests include bioinformatics, parallel computing, and machine learning.
KAIMING WANG was born in Shaanxi, China, in 1974. She received the B.S. degree in mathematics from North West University, Xi'an, China, in 1997, and the M.S. and Ph.D. degrees in applied mathematics from Xi'an Jiaotong University, Xi'an, in 2013.
Since 1997, she has been a Teacher then an Associate Professor with the School of Science, Chang'an University, Xi'an. She has authored about ten articles. Her research interests include the stability of nonlinear systems, integration on image genetics, and modeling on multi-modal big data.
SUYING JIANG was born in Shangnan, Shaanxi, China, in 1990. She received the B.S. degree in electrical engineering and automation from Xi'an Technological University, in 2013, and the M.S. degree in control theory and control engineering from the Shaanxi University of Science and Technology, Xi'an, Shaanxi, in 2016. She is currently pursuing the Ph.D. degree in traffic information engineering and control with Chang'an University, Xi'an. Since 2016, she has been a Teacher with Baoji University. Her research interests include vehicle-road collaboration, wireless positioning, and data mining.

He is currently a Professor of biomedical engineering, and biostatistics and bioinformatics with the School of Science and Engineering, Tulane University and the Tulane University School of Public Health and Tropical Medicine. He is also a member of the Tulane Center of Bioinformatics and Genomics, the Tulane Cancer Center, and the Tulane Neuroscience Program. His research interests include computer vision, signal processing, and machine learning with applications to biomedical imaging and bioinformatics, where he has over 250 peer-reviewed publications. He has served on numerous program committees and NSF/NIH review panels, and served as an editor for several journals, such as the IEEE TRANSACTIONS ON MEDICAL IMAGING, the IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, and the Journal of Neuroscience Methods.