A Novel EEG Correlation Coefficient Feature Extraction Approach Based on Demixing EEG Channel Pairs for Cognitive Task Classification

This paper presents a novel feature extraction method for electroencephalogram (EEG)-based cognitive task classification based on the correlation coefficients of EEG channel pairs by introducing preprocessing of the EEG signals. The preprocessing attempts to optimally demix each pair of EEG channels using a two-dimensional rotation matrix in order to mitigate the interference between channel pairs and, consequently, to enhance the resulting correlation coefficient features for cognitive task classification. For the optimization, the following criteria are proposed, with an optimal rotation angle approximated for each criterion: i) maximum inter-class correlation coefficient distance (ICCD); ii) minimum within-class correlation coefficient distance (WCCD); and iii) maximum Fisher ratio (FR), which is the ratio of ICCD to WCCD. Performance evaluation based on cognitive task datasets, namely datasets IV and Ib in BCI competition II and Keirn and Aunon's dataset, shows that ICCD optimization with the 'above the mean' and 1.5 interquartile range (IQR) feature selection method yields the best classification performance in comparison with other existing cognitive task classification methods.


I. INTRODUCTION
Brain-computer interfaces (BCIs) enable direct communication between the computer and the human brain through neural signals [1]-[3]. Recently, electroencephalogram (EEG)-based BCIs have been studied in the fields of brain science, neuroscience, and rehabilitation [4], [5]. EEG is one of the most studied neural signals for detecting a user's intention. EEG signals are composed of electrical potentials arising from several sources and have the advantages of high temporal resolution and non-invasiveness [6], [7]. However, the direct use of EEG signals is problematic, as several sources can be activated simultaneously and the electrical potentials from task-irrelevant sources are superimposed on the measured EEG signal, which is known as the volume conduction effect [8]. Therefore, various methods to extract the task-related information buried in EEG signals have been developed in BCI systems.
The associate editor coordinating the review of this manuscript and approving it for publication was Naveed Ur Rehman.
In particular, motor imagery (MI) is widely used in EEG-based BCIs, as the power of EEG signals from the cerebral cortex decreases or increases while imagining the movements of specific parts of the body, which is called event-related desynchronization (ERD)/event-related synchronization (ERS). The common spatial pattern (CSP) approach [9] is the most commonly used MI feature extraction method based on ERD/ERS. CSP produces output powers maximizing the difference between two MI tasks by using optimal spatial filters that indirectly suppress the volume conduction effect [10]. Various extensions of the CSP algorithm, e.g., regularized versions of CSP [11]-[13] and a frequency-optimized version of CSP [14], have been studied with successful results. Moreover, CSP algorithms based on MI-task-related EEG channel selection have been proposed in [15], [16]. In contrast to CSP, attempts have also been made to develop a direct method to mitigate the volume conduction effect. The adaptive spatial filtering (ASF) approach [17] partially enhances each EEG channel signal by mitigating the interference from only the non-significantly correlated channels using noise-canceling adaptive filters.
Although MI-EEG classification approaches have been used successfully for BCI systems, several limitations have recently been noted [8]. For example, the extracted EEG features based on temporal, spectral, and spatial characteristics for MI may not provide enough information for BCI systems and no successful communication has been established for certain types of subjects [8]. Therefore, there is increasing interest in BCI systems based on brain connectivity (neuronal activation patterns of separated brain regions) triggered by general cognitive tasks.
In recent years, several studies have been conducted to measure brain connectivity to obtain informative features for BCI systems based on cognitive tasks [18]-[25]. These studies developed various algorithms based on various features to characterize the EEG-based interactions of brain regions. The phase lag index (PLI), defined as the absolute mean value of the sign of the difference between the instantaneous phases of EEG channel pairs, is used in [18]. The partial directed coherence (PDC) method [19] uses coherence, i.e., the magnitude of the normalized cross-spectral density between EEG channel pairs. The phase locking value (PLV), defined as the absolute value of the mean phase difference between EEG channel pairs, is used in [20] with the Fisher score, which plays a role in selecting good PLV features (hereafter referred to as the PLV-FS method), and in [22]. The method using PLV with recursive feature elimination (RFE) is referred to as the PLV-RFE method [27]. The correlation coefficient, a normalized covariance of EEG channel pairs, is considered in [21], [22]. Finally, the amplitude envelope correlation (AEC), the correlation coefficient of the envelopes of Hilbert-transformed EEG channel pairs, is used in [23].
Among these various brain connectivity related features, the algorithms based on correlation coefficient and PLV features show superior performance compared to those based on other brain connectivity features [24], [25]. Furthermore, correlation coefficient features have several advantages over PLV features, such as insensitivity to frequency band optimization and noise-robust characteristics, making them suitable for EEG signals with a low signal-to-noise ratio (SNR) [22], [26]. Despite these advantages, the large number of correlation coefficient features generated from all pairs of EEG channels requires a sophisticated feature selection method to yield satisfactory classification performance. The Gaussian curve (GC) approach described in [21] selects the EEG channel pairs that produce significant statistical differences between the two tasks by approximating the feature distribution as Gaussian. The t-statistic approach proposed in [25] selects the EEG channel pairs with 'good' correlation coefficient features using the t-score. Correlation coefficient feature selection using RFE is proposed in [22] with various modified support vector machines (SVMs).
However, merely selecting good correlation coefficients has limitations with regard to improving performance. For example, the connectivity/time-frequency mask (CM/TFM) algorithm demonstrates that the integration of time-frequency power features and carefully selected correlation coefficient features can yield better performance than that produced by existing correlation coefficient-based algorithms [25]. Therefore, fundamental approaches to enhance the quality of correlation coefficient features are required.
In this paper, preprocessing of raw EEG signals is proposed to enhance correlation coefficient features, in contrast to conventional correlation coefficient-based methods. Ideally, a perfect source separation of the EEG signals that makes EEG channels statistically independent would dramatically enhance the correlation coefficients as features representing brain connectivity. However, such perfect source separation is not feasible, and we attempt a two-step sub-optimal signal processing instead. Adaptive spatial filtering (ASF) [17] is first applied to all channels to mitigate mild interference among the weakly correlated channels. Then, instead of global demixing of EEG signals, we consider feasible demixing of the EEG signals of each channel pair with a two-dimensional rotation matrix that has a single parameter, the rotation angle. Furthermore, the demixing matrix is adjusted according to a tangible criterion to directly enhance classification accuracy. The following three demixing matrix optimization criteria are proposed and the optimal rotation angles are investigated: i) maximum distance of averaged correlation coefficients between two cognitive tasks, termed the inter-class correlation coefficient distance (ICCD); ii) minimum sum of the variances of correlation coefficients for each cognitive task, termed the within-class correlation coefficient distance (WCCD); and iii) maximum Fisher ratio (FR) of the correlation coefficients for two cognitive tasks, i.e., the ICCD to WCCD ratio. The closed-form expression for the approximated optimum demixing matrix for each criterion is derived in terms of the rotation angle. Furthermore, feature selection methods for the optimized correlation coefficients, i.e., the selection of good correlation coefficient features, are thoroughly investigated. Four prevailing feature selection methods, i.e., the 'above the mean' and 1.5 interquartile range (IQR) method [28], the GC approach [21], the t-statistic method [25], and the RFE method [27], are considered with respect to each criterion. The classification performances of the proposed schemes are evaluated for three cognitive task EEG datasets: datasets Ib and IV in BCI competition II [31] and Keirn and Aunon's dataset [32]. The ICCD optimization with the 'above the mean' and 1.5 IQR rule shows the best performance.
The paper is organized as follows. In Section II, we introduce the system model and the proposed method. In Section III, the experimental settings using datasets Ib and IV in BCI competition II and Keirn and Aunon's dataset are explained. Section IV analyzes the experimental results. Finally, Section V concludes the paper.

A. SYSTEM MODEL
Let us consider an EEG signal dataset recording binary cognitive tasks using $K$ channels. The EEG signal corresponding to channel $k$ is denoted as the vector $\mathbf{x}^{(k)} = [x^{(k)}(1), x^{(k)}(2), \ldots, x^{(k)}(N)]^T$, where $k = 1, 2, \ldots, K$ and $N$ is the number of temporal samples per channel. We assume that $M$ trials of EEG signals are available as a training dataset and that each trial consists of $N$ sample points; the signal of channel $k$ in the $i$-th trial is denoted by $\mathbf{x}^{(k)}_i \in \mathbb{R}^{N \times 1}$, where $i = 1, \ldots, M$. As we consider binary cognitive task classification, each trial belongs to one of two index sets, $I_1$ and $I_2$ ($I_1 \cup I_2 = \{1, 2, \ldots, M\}$), corresponding to the two cognitive tasks. We assume that $\mathbf{x}^{(k)}_i$ is already band-pass filtered.
For the cognitive task classification, we use sample correlation coefficients between selected EEG channel pairs after preprocessing. The (sample) correlation coefficient of the i-th trial EEG channel pair k and p is given by:
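As an illustration of this feature, the following minimal Python sketch computes the standard sample (Pearson) correlation coefficient, i.e., the covariance of two channel signals normalized by the square root of the product of their powers, for every channel pair of one trial. The function names `corrcoef` and `pairwise_corr` and the list-of-lists trial layout are assumptions made for this example and do not come from the paper.

```python
import math
from itertools import combinations

def corrcoef(x, y):
    """Sample (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    px = sum((a - mx) ** 2 for a in x)
    py = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(px * py)

def pairwise_corr(trial):
    """trial: list of K channel signals (each a list of N samples).
    Returns {(k, p): rho} for all K(K-1)/2 channel pairs with k < p."""
    return {(k, p): corrcoef(trial[k], trial[p])
            for k, p in combinations(range(len(trial)), 2)}
```

For a trial with K channels the returned dictionary has K(K-1)/2 entries, one per channel pair.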

B. COST FUNCTIONS OF CORRELATION COEFFICIENTS
1) INTER-CLASS CORRELATION COEFFICIENT DISTANCE
The inter-class correlation coefficient distance (ICCD) for each EEG channel pair $(k, p)$, denoted by $D^{(k,p)}$, is the distance between the averaged correlation coefficients for the two tasks [28]. The mean correlation coefficient for task $c$ is denoted by $\bar{\rho}^{(k,p)}_c = \frac{1}{|I_c|} \sum_{i \in I_c} \rho^{(k,p)}_i$, where $|I_c|$ denotes the number of training trials for task $c$. For a training dataset $\{\mathbf{x}^{(k)}_i\}$, the ICCD can be computed as follows:

2) WITHIN-CLASS CORRELATION COEFFICIENT DISTANCE
The within-class correlation coefficient distance (WCCD) for EEG channel pair $(k, p)$, denoted by $W^{(k,p)}$, is the sum of the variances of the correlation coefficients for the two tasks. Using the training dataset, WCCD is given by:
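The two cost functions can be sketched in a few lines of Python. This is a sketch under stated assumptions rather than the paper's exact formulas: the ICCD is taken here as the absolute difference of the class means, and the variances in the WCCD use the 1/|I_c| normalization. Both choices, and the function names, are assumptions.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    # Population variance with 1/|I_c| normalization (an assumption).
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def iccd(rho_1, rho_2):
    """Distance between the class-mean correlation coefficients.
    The absolute difference is assumed as the distance measure."""
    return abs(mean(rho_1) - mean(rho_2))

def wccd(rho_1, rho_2):
    """Sum of the per-class variances of the correlation coefficients."""
    return variance(rho_1) + variance(rho_2)
```

Here `rho_1` and `rho_2` hold the per-trial correlation coefficients of one channel pair for the trials in $I_1$ and $I_2$, respectively.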

3) FISHER RATIO
The Fisher ratio (FR) for the correlation coefficient of EEG channel pair $(k, p)$, denoted by $F^{(k,p)}$, is defined as the ratio of ICCD to WCCD [29]:

$$F^{(k,p)} = \frac{D^{(k,p)}}{W^{(k,p)}}$$

Fig. 1 depicts the block diagram of the proposed algorithm. The EEG signals are preprocessed using the ASF method [17] for interference mitigation. For each pair of ASF-filtered EEG signals, a rotation matrix is applied to optimize one of the above criteria: maximize ICCD, minimize WCCD, or maximize FR. Various feature selection schemes are examined with the resulting correlation coefficients.

Fig. 2 illustrates the block diagram of the adaptive spatial filtering algorithm for EEG channel signals. For each channel $k = 1, 2, \ldots, K$, the ASF method [17] first determines a reference set of EEG channels, denoted by $H^{(k)}$, consisting of the channels whose correlation coefficients with channel $k$, computed using training data from both classes, are below a predetermined threshold, $\rho_{thr}$, as follows:

C. ADAPTIVE SPATIAL FILTERING
Once $H^{(k)}$ is set, ASF attempts to reduce the residual signal components of the channels in $H^{(k)}$ that are present in $x^{(k)}(n)$ by minimizing the power of the resulting signal, denoted by $y^{(k)}(n)$, with adaptive interference canceling filters [17]. Here, $\mathbf{w}^{(k,h)}(n) = [w^{(k,h)}_0(n), \ldots, w^{(k,h)}_{D-1}(n)]^T$ denotes the canceling filter for the channel $h \in H^{(k)}$, initialized to zeros. The goal of ASF is to minimize the power of $y^{(k)}(n)$ with the optimal filter $\mathbf{w}^{(k,h)}$, in the expectation that the power contributed by interference is thereby minimized. This is equivalent to finding the filter that minimizes the cost function $E\big((y^{(k)}(n))^2\big)$, where $E(\cdot)$ denotes the expectation operation. To minimize the cost function, a stochastic gradient descent algorithm [30] can be used, where $\mu$ is the step size. This adaptive algorithm is equivalent to the least mean square (LMS) algorithm [30]. To guarantee the stability of the algorithm, we apply the normalized LMS (NLMS) algorithm [30], in which the LMS update is normalized by the power of the input signal, for each $h \in H^{(k)}$ (12). After applying the ASF method, we denote the ASF-filtered $\mathbf{x}^{(k)}_i$ by $\mathbf{y}^{(k)}_i$.
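To make the interference canceling step concrete, the following Python sketch implements a textbook NLMS noise canceller for one target channel and its reference channels. It is a sketch under assumptions, not the paper's exact update: the tap-delay input vector, the per-reference normalization, the regularizer `eps`, and the function name `nlms_cancel` are all chosen for illustration.

```python
def nlms_cancel(x_k, refs, D=3, mu=0.0001, eps=1e-8):
    """x_k: samples of the target channel.
    refs: list of sample lists, one per reference channel in H(k).
    Returns the residual y(n) after subtracting the filtered references."""
    w = [[0.0] * D for _ in refs]      # canceling filters, initialized to zeros
    y = []
    for n in range(len(x_k)):
        # Tap-delay vectors [x_h(n), ..., x_h(n-D+1)], zero-padded at the start
        # (the tap-delay form is an assumption).
        u = [[r[n - d] if n - d >= 0 else 0.0 for d in range(D)] for r in refs]
        estimate = sum(wi * ui for wh, uh in zip(w, u) for wi, ui in zip(wh, uh))
        e = x_k[n] - estimate          # residual sample y(n)
        y.append(e)
        for wh, uh in zip(w, u):
            power = eps + sum(ui * ui for ui in uh)   # input power for normalization
            for d in range(D):
                wh[d] += mu * e * uh[d] / power       # normalized LMS update
    return y
```

The defaults `D=3` and `mu=0.0001` mirror the values reported for dataset IV; a larger step size converges faster on short synthetic signals.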

D. EEG CHANNEL PAIR OPTIMIZATION USING ROTATION MATRICES
After processing with ASF, we optimize the correlation coefficients of each EEG channel pair according to one of the following three criteria: maximize ICCD, minimize WCCD, or maximize FR. We apply a two-dimensional rotation matrix to the ASF-processed signals from each channel pair $(k, p)$, where $\theta$ denotes the rotation angle. The correlation coefficient of the rotated signals $\hat{y}$, $\rho^{(k,p)}_i(\theta)$, can be expressed as a function of the rotation angle $\theta$ in terms of the powers, $P(\cdot)$, and the covariance, $C(\cdot)$, of the original signals $y$ (14). However, it is difficult to find the optimal angles $\theta$ that satisfy each of the three optimization criteria because of the denominator terms in $\rho^{(k,p)}_i(\theta)$, which can be rewritten in terms of a function $G^{(k,p)}_i(\theta)$. For EEG signals, we can assume that the sum of powers, $P(y^{(k,p)}_{i,+})$, is significantly greater than the difference, $|P(y^{(k,p)}_{i,-})|$, and the covariance, so that $G^{(k,p)}_i(\theta)$ is dominated by the $P(y^{(k,p)}_{i,+})$ term and fluctuates due to the $\sin^2(2\theta + \psi)$ term. Therefore, $G^{(k,p)}_i(\theta)$ is periodic with period $\pi/2$ and reaches its maximum value when $\sin^2(2\theta + \psi) = 1$ and its minimum value when $\sin^2(2\theta + \psi) = 0$. As such, we approximate $G^{(k,p)}_i(\theta)$ by its value at $\sin^2(2\theta + \psi) = 1/2$, i.e., at $\theta = \pi/8 - \psi/2$, and denote it by $\hat{G}^{(k,p)}_i$. Based on these assumptions, $\rho^{(k,p)}_i(\theta)$ can be approximated accordingly. Fig. 3 plots an example of the approximation for the (C4, CP5) channel pair of the BCI competition II dataset IV. Fig. 3(a) shows $G^{(k,p)}_i(\theta)$.

1) OPTIMAL ROTATION ANGLE FOR ICCD MAXIMIZATION
The optimal angle maximizing $D^{(k,p)}(\theta)$ is approximately given in closed form (refer to Appendix A for the detailed derivation).
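The dependence of the correlation coefficient on the rotation angle can be checked numerically. The Python sketch below rotates a channel pair and evaluates the correlation coefficient of the rotated pair directly from the powers and covariance of the original pair, using standard double-angle identities. The counterclockwise sign convention of the rotation matrix and all function names are assumptions; the expressions are derived for this convention and are not copied from the paper's equation (14).

```python
import math

def rotate_pair(y_k, y_p, theta):
    """Apply the rotation [[cos, -sin], [sin, cos]] (assumed convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return ([c * a - s * b for a, b in zip(y_k, y_p)],
            [s * a + c * b for a, b in zip(y_k, y_p)])

def moments(y_k, y_p):
    """Return (P_k, P_p, C): sample powers and covariance after mean removal."""
    n = len(y_k)
    mk, mp = sum(y_k) / n, sum(y_p) / n
    P_k = sum((a - mk) ** 2 for a in y_k) / n
    P_p = sum((b - mp) ** 2 for b in y_p) / n
    C = sum((a - mk) * (b - mp) for a, b in zip(y_k, y_p)) / n
    return P_k, P_p, C

def rho_of_theta(y_k, y_p, theta):
    """Correlation coefficient of the rotated pair, computed from the
    moments of the original pair without rotating the signals."""
    P_k, P_p, C = moments(y_k, y_p)
    p_sum, p_diff = P_k + P_p, P_k - P_p
    c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
    cov = C * c2 + 0.5 * p_diff * s2
    var_k = 0.5 * p_sum + 0.5 * p_diff * c2 - C * s2
    var_p = 0.5 * p_sum - 0.5 * p_diff * c2 + C * s2
    return cov / math.sqrt(var_k * var_p)
```

Under this convention, shifting the angle by $\pi/2$ swaps the two rotated powers and flips the sign of the covariance, so the denominator repeats with period $\pi/2$, in line with the periodicity noted above.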

2) OPTIMAL ROTATION ANGLE FOR WCCD MINIMIZATION
The WCCD can be written as a function of $\theta$, $W^{(k,p)}(\theta)$. The optimal angle minimizing $W^{(k,p)}(\theta)$ is approximately given in closed form (refer to Appendix B for the detailed derivation).

3) OPTIMAL ROTATION ANGLE FOR FR MAXIMIZATION
The FR is computed using ICCD and WCCD, i.e., $F^{(k,p)}(\theta) = D^{(k,p)}(\theta) / W^{(k,p)}(\theta)$, and the optimal angle maximizing $F^{(k,p)}(\theta)$ is approximately given by (28) (refer to Appendix C for the detailed derivation), where $\gamma^{(k,p)} = 4H^{(k,p)} \cos(2\psi$
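As a numerical counterpart to the approximate closed-form angles, the three criteria can also be evaluated on a grid of rotation angles. The Python sketch below is such a brute-force alternative, not the paper's derivation. It assumes the counterclockwise rotation convention, the absolute difference of class means for the ICCD, the 1/|I_c| variance normalization for the WCCD, and a nonzero WCCD; the function names are illustrative.

```python
import math

def corr(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    pa = sum((x - ma) ** 2 for x in a)
    pb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(pa * pb)

def criteria(trials_1, trials_2, theta):
    """trials_c: list of (y_k, y_p) signal pairs for class c.
    Returns (ICCD, WCCD, FR) of the rotated pair at angle theta."""
    c, s = math.cos(theta), math.sin(theta)

    def rhos(trials):
        return [corr([c * a - s * b for a, b in zip(yk, yp)],
                     [s * a + c * b for a, b in zip(yk, yp)])
                for yk, yp in trials]

    r1, r2 = rhos(trials_1), rhos(trials_2)
    m1, m2 = sum(r1) / len(r1), sum(r2) / len(r2)
    D = abs(m1 - m2)                                   # assumed distance
    W = (sum((r - m1) ** 2 for r in r1) / len(r1)
         + sum((r - m2) ** 2 for r in r2) / len(r2))   # assumed normalization
    return D, W, D / W                                 # assumes W > 0

def grid_search(trials_1, trials_2, n_grid=90):
    """Evaluate the criteria at n_grid angles in [0, pi/2) and return the
    best grid angle for each criterion."""
    thetas = [0.5 * math.pi * g / n_grid for g in range(n_grid)]
    vals = [criteria(trials_1, trials_2, t) for t in thetas]
    return {
        "iccd": thetas[max(range(n_grid), key=lambda g: vals[g][0])],
        "wccd": thetas[min(range(n_grid), key=lambda g: vals[g][1])],
        "fr": thetas[max(range(n_grid), key=lambda g: vals[g][2])],
    }
```

The search range $[0, \pi/2)$ suffices under these assumptions because shifting the angle by $\pi/2$ only flips the sign of every correlation coefficient, which leaves all three criteria unchanged.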

E. FEATURE SELECTION RULES
We have considered three optimization criteria for the correlation coefficients. The number of correlation coefficients generated from the $K$-channel EEG signals is $K(K-1)/2$. From these overabundant correlation coefficients, a few 'good' EEG channel pairs are selected using training data during the training stage to determine the final output features. There are several feature selection methods for correlation coefficients. In this paper, four feature selection methods, namely the 'above the mean' and 1.5 IQR method [28], the GC method [21], the t-statistic method [25], and RFE [27], are tested with the correlation coefficients optimized under each of the three criteria. The results presented in Section IV show that ICCD optimization with the 'above the mean' and 1.5 IQR method yields the best performance.

1) 'ABOVE THE MEAN' AND 1.5 IQR METHOD
The 'above the mean' rule selects the correlation coefficients whose values are above the mean in terms of the criterion for which the rotation matrix is optimized [28]. For example, when the rotation matrix is optimized for ICCD maximization, the channel pairs whose resulting ICCD is greater than the average of the ICCDs of all EEG channel pairs are selected; the selected set of EEG channel pairs is denoted by $Z$. When the rotation matrix is optimized with respect to WCCD minimization, we instead select the channel pairs whose WCCDs are lower than the averaged WCCD. For a small $K$, the 'above the mean' rule yields a suitable number of EEG channel pairs. However, for a larger $K$, the size of $Z$ may still be large. In such a case, we further apply the 1.5 IQR rule [28] to select more discriminative EEG channel pairs, since the 1.5 IQR rule is an effective method for selecting features from a large number of candidates. The IQR thresholds for the optimization criteria, $D_{thr}$, $W_{thr}$, and $F_{thr}$, are defined in terms of $Q_1$ and $Q_3$, which denote the middle value between the minimum and the median and the middle value between the median and the maximum, respectively, with respect to the metric of interest (ICCD, WCCD, or FR) among the pairs in the 'above the mean' set $Z$. We then refine the selected channel pairs with respect to the threshold: for ICCD optimization, only the pairs in $Z$ that pass $D_{thr}$ are retained, and for WCCD optimization, only those that pass $W_{thr}$.
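A possible implementation of this two-stage selection is sketched below in Python. The exact thresholds are not reproduced here, so the sketch assumes Tukey-style fences: $Q_3 + 1.5(Q_3 - Q_1)$ when larger values are better (ICCD, FR) and $Q_1 - 1.5(Q_3 - Q_1)$ when smaller values are better (WCCD). The quartile convention of `statistics.quantiles` and the function names are likewise assumptions.

```python
import statistics

def mean_rule(scores, larger_is_better=True):
    """scores: {(k, p): criterion value}. Keep the pairs on the favorable
    side of the mean over all channel pairs."""
    m = statistics.fmean(list(scores.values()))
    if larger_is_better:
        return {pair: v for pair, v in scores.items() if v > m}
    return {pair: v for pair, v in scores.items() if v < m}

def iqr_refine(selected, larger_is_better=True):
    """Refine the 'above the mean' set with an assumed 1.5 IQR fence.
    Requires at least two pairs in `selected`."""
    q1, _, q3 = statistics.quantiles(list(selected.values()), n=4)
    iqr = q3 - q1
    if larger_is_better:
        thr = q3 + 1.5 * iqr        # assumed upper fence (ICCD, FR)
        return {pair: v for pair, v in selected.items() if v > thr}
    thr = q1 - 1.5 * iqr            # assumed lower fence (WCCD)
    return {pair: v for pair, v in selected.items() if v < thr}
```

In this sketch, `iqr_refine(mean_rule(scores))` would be applied only when the first stage still leaves many pairs, as described for larger $K$.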

2) THE GAUSSIAN CURVE METHOD
The GC method approximates the distribution of the correlation coefficients of an EEG channel pair for a given class as Gaussian with the sample mean and sample variance [21]. Hence, the probability of a misclassification error for the correlation coefficient of the channel pair $(k, p)$, denoted by $A^{(k,p)}$, is given as the intersection area of the two approximated Gaussian probability density functions corresponding to the two classes, which can be computed easily from these density functions. The selected EEG channel pair set $Z$ consists of the EEG channel pairs with misclassification errors lower than a threshold, denoted by $O_{thr}$, as follows:

$$Z = \{(k, p) : A^{(k,p)} < O_{thr}\}$$
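The intersection area of two fitted Gaussians is available in the Python standard library, which allows a compact sketch of this selection rule. The function names `gc_error` and `gc_select` and the dictionary layout are assumptions made for illustration; the fit uses the sample mean and sample standard deviation of each class.

```python
from statistics import NormalDist

def gc_error(rho_1, rho_2):
    """Overlap area of the Gaussians fitted to the per-class correlation
    coefficients of one channel pair (each class needs >= 2 trials with
    nonzero spread)."""
    g1 = NormalDist.from_samples(rho_1)
    g2 = NormalDist.from_samples(rho_2)
    return g1.overlap(g2)

def gc_select(rhos_by_pair, o_thr):
    """rhos_by_pair: {(k, p): (rho_1, rho_2)}.
    Keep the pairs whose overlap area is below the threshold o_thr."""
    return {pair for pair, (r1, r2) in rhos_by_pair.items()
            if gc_error(r1, r2) < o_thr}
```

An overlap near 0 indicates well-separated classes, whereas an overlap of 1 means the two fitted distributions coincide.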

3) THE T -STATISTIC METHOD
The t-statistic method applies a statistical test based on Student's t-distribution to select discriminative EEG channel pairs [25]. The t-score for the correlation coefficients, i.e., the normalized difference of the averaged correlation coefficients between the two classes, approximately follows the t-distribution, and the significance of the t-score, denoted by $P^{(k,p)}$, is measured using the approximated t-distribution. The selected EEG channel pair set $Z$ consists of the EEG channel pairs with $P^{(k,p)}$ values higher than the threshold $P_{thr}$, as follows:

$$Z = \{(k, p) : P^{(k,p)} > P_{thr}\}$$
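A rough Python sketch of this rule is given below. It rests on several assumptions that are not taken from the paper: the t-score uses the Welch (unequal variance) standard error, the significance is taken as one minus the two-sided p-value, and the t-distribution is replaced by a standard normal approximation, which is only reasonable for a large number of trials. The function names are illustrative.

```python
import math
from statistics import NormalDist, fmean, variance

def t_score(rho_1, rho_2):
    """Difference of the class means divided by its standard error
    (Welch form, an assumption)."""
    se = math.sqrt(variance(rho_1) / len(rho_1) + variance(rho_2) / len(rho_2))
    return (fmean(rho_1) - fmean(rho_2)) / se

def significance(rho_1, rho_2):
    """One minus the two-sided p-value, using a normal approximation
    to the t-distribution (an assumption)."""
    t = abs(t_score(rho_1, rho_2))
    return 2.0 * NormalDist().cdf(t) - 1.0

def t_select(rhos_by_pair, p_thr):
    """Keep the channel pairs whose significance exceeds p_thr."""
    return {pair for pair, (r1, r2) in rhos_by_pair.items()
            if significance(r1, r2) > p_thr}
```

For small trial counts, an exact t-distribution from a statistics package would be the more faithful choice.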

4) RECURSIVE FEATURE ELIMINATION (RFE)
The RFE method is a feature selection method that determines the ranks of features by iteratively eliminating the features with the lowest contribution to the SVM classifier [27]. Then, a certain number of top features, $N_Z$, are used for the SVM. In our experiments, we optimized $N_Z$ based on cross-validation.

III. DATA AND EXPERIMENTS
A. DATA DESCRIPTION
1) BCI COMPETITION II DATASET IV
The goal of the public benchmark dataset IV from BCI competition II is to predict the laterality of upcoming finger movements [31]. This dataset is widely used to evaluate the performance of cognitive task classification algorithms. During the recording, the subject was requested to press keys in a self-chosen order and with self-chosen timing, so that the movement of the left or right hand could be predicted. This dataset was recorded without a feedback session. The EEG signal was recorded using 28 channels (K = 28) and bandpass filtered between 0.05 and 200 Hz. The sampling rate was 1000 Hz. The dataset consisted of 416 trials (316 training trials and 100 test trials).

2) BCI COMPETITION II DATASET IB
The public benchmark dataset Ib from BCI competition II was taken from an artificially respirated amyotrophic lateral sclerosis (ALS) patient [31]. The subject was asked to move a cursor up and down on a computer screen while his cortical potentials were monitored. The EEG signal was recorded using seven channels: A1-Cz, A2-Cz, 2 cm frontal of C3, 2 cm parietal of C3, vEOG, 2 cm frontal of C4, and 2 cm parietal of C4. The sampling rate was 256 Hz. The dataset consisted of 380 trials (200 training trials and 180 test trials).

3) KEIRN AND AUNON'S DATASET
This public benchmark dataset was originally reported by Keirn and Aunon [32]. The data were recorded from seven subjects aged 21-48 years who performed five distinct mental tasks [32]: i) the baseline task, in which the subjects were asked to relax as much as possible; ii) the letter task, in which the subjects were instructed to mentally compose a letter to a friend or relative without vocalizing; iii) the math task, in which the subjects were asked to solve nontrivial multiplication problems without vocalizing or making any other physical movements; iv) the visual counting task, in which the subjects were asked to imagine a blackboard and to visualize numbers being written on the board sequentially; and v) the rotation task, in which the subjects were asked to visualize a particular three-dimensional block figure being rotated about an axis. The EEG data were recorded for 10 s at a sampling rate of 250 Hz using six channels (C3, C4, P3, P4, O1, and O2).

B. DATA PROCESSING
1) BCI COMPETITION II DATASET IV
For the experiments with dataset IV, the EEG data were bandpass filtered using a fourth-order Butterworth filter operating at 10-33 Hz. The time segment of EEG data from 0.15 to 0.5 s after the cue was extracted. We set $\rho_{thr}$ to 0.65 in the ASF algorithm. The step size ($\mu$) for the NLMS algorithm is set to 0.0001 by 10 × 10 cross-validation, and the filter length ($D$) is set to 3. For feature selection, the 'above the mean' and 1.5 IQR rule, the GC method, the t-statistic method, and RFE are used. For the GC method and RFE, we select 30 EEG channel pairs. For the t-statistic method, the probability threshold is set to 0.995.

2) BCI COMPETITION II DATASET IB
For the experiment with dataset Ib, we use six channels, excluding the vEOG channel (K = 6). A fourth-order bandpass filter operating at 0.5-7.5 Hz was applied, as the difference in EEG spectral power between the two classes of dataset Ib is concentrated in the low frequency band [25]. The time segment of EEG data from 1 to 3 s after the cue was extracted. The $\rho_{thr}$ for the ASF algorithm is set to 0.6. The step size ($\mu$) is set to 0.0001 by 10 × 10 cross-validation, and the filter length ($D$) is set to the same value as in the experiments with dataset IV. For feature selection, the 'above the mean' rule, the GC method, the t-statistic method, and RFE are used. Eight EEG channel pairs are selected using the GC method and RFE, and the probability threshold is set to 0.9 for the t-statistic method.

3) KEIRN AND AUNON'S DATASET
For the experiment with Keirn and Aunon's dataset, we use four subjects, as the other three subjects had fewer than 10 sessions or some errors in the recording [33]. As we consider binary classification, we test our proposed method on two mental tasks, the letter task and the math task. The EEG signal for each mental task was divided into 10 segments of length 1 s. The $\rho_{thr}$ is set to 0.6. For the NLMS algorithm, $\mu$ is set to 0.0001 by 10 × 10 cross-validation and $D$ is set to 3. For feature selection, the 'above the mean' rule, the GC method, the t-statistic method, and RFE are used. For the GC method and RFE, we select eight EEG channel pairs. For the t-statistic method, the probability threshold is set to 0.9.
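The cross-validation protocol can be sketched as a generator of train/test index splits. The Python example below interprets "10 × 10 cross-validation" as 10 repetitions of shuffled 10-fold cross-validation, which is an assumption, as are the function name `repeated_kfold` and the fixed seed.

```python
import random

def repeated_kfold(n_trials, n_folds=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for n_repeats shuffled rounds of
    n_folds-fold cross-validation over trial indices 0..n_trials-1."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_trials))
        rng.shuffle(idx)
        folds = [idx[f::n_folds] for f in range(n_folds)]   # near-equal folds
        for f in range(n_folds):
            test = sorted(folds[f])
            train = sorted(i for g in range(n_folds) if g != f for i in folds[g])
            yield train, test
```

Parameters such as the step size would then be chosen by averaging the validation accuracy over all yielded splits.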

A. PERFORMANCE COMPARISON OF PROPOSED THREE OPTIMIZATION SCHEMES
We compared the performance of the three correlation coefficient optimization schemes (ICCD, WCCD, and FR optimization) combined with the four feature selection methods ('above the mean'/1.5 IQR rule, GC method, t-statistic method, and RFE) using the three EEG datasets described above. First, Table 1 presents the classification accuracies for BCI competition II dataset IV. ICCD optimization with the 'above the mean'/1.5 IQR rule outperforms the other combinations of optimization criterion and feature selection method. Table 2 shows the performance of the three optimization schemes with the feature selection methods for BCI competition II dataset Ib. For this dataset, ICCD optimization with the 'above the mean'/1.5 IQR rule is also better than the other schemes. Table 3 shows the results for Keirn and Aunon's dataset using 10 × 10 cross-validation. ICCD optimization with the 'above the mean'/1.5 IQR rule yields the highest classification accuracy on average, although the performance varies among subjects. Overall, the results for the three datasets show that ICCD optimization with the 'above the mean'/1.5 IQR rule performs best in terms of cognitive task classification.

Fig. 4 shows the EEG channel pairs selected by ICCD optimization with the 'above the mean'/1.5 IQR rule for BCI competition II dataset IV. Four EEG channel pairs, C4-C3, C4-CP5, C4-CP3, and C4-CP1, are selected, and channel C4 is included in all selected channel pairs. Table 4 lists the ICCD values and the corresponding optimal rotation angles of the EEG channel pairs selected by the proposed algorithm based on ICCD optimization with the 'above the mean'/1.5 IQR rule. The ICCD values for raw EEG pairs are also listed, and the classification accuracies with and without the proposed preprocessing are reported. The classification performance is enhanced by 25%. Fig. 5 shows the correlation coefficient topography of channel C4 for BCI competition II dataset IV, indicating the changes in the correlation coefficient pattern due to ICCD optimization. Fig. 5(a) illustrates the topography of the raw EEG signals and Fig. 5(b) shows that of the ICCD-optimized EEG signals.

Fig. 6 shows the final EEG channel pairs selected using ICCD optimization with the 'above the mean'/1.5 IQR rule for BCI competition II dataset Ib. Six EEG channel pairs are selected, and '2 cm frontal of C4' is included in four of the six selected EEG channel pairs. Table 5 shows the ICCD values and optimal rotation angles of the EEG channel pairs selected using ICCD optimization with the 'above the mean'/1.5 IQR rule. The ICCD values and classification accuracy for raw EEG pairs are compared with the optimized ICCD values, and the resulting performance improvement is 6.67%.

B. COMPARISON OF THE PERFORMANCE OF THE PROPOSED METHOD WITH EXISTING CONNECTIVITY BASED METHODS
To confirm the performance of the proposed method, we compare it with various brain connectivity based methods, namely correlation coefficient based methods (GC [21], CC-RFE [22]), PLV feature based methods (PLV-FS [20], PLV-RFE [22]), a coherence feature based method (PDC [19]), and the CM/TFM method [25], for the three EEG datasets. For the proposed method, we use ICCD optimization with the 'above the mean'/1.5 IQR rule, which yields the best classification accuracy. Table 6 lists the classification accuracies of the proposed method and the existing methods. The proposed method achieves the best classification accuracy, 90%, among the connectivity based methods. Table 7 lists the classification accuracies of the proposed method and the existing methods. The classification accuracy for dataset Ib is lower than that for BCI competition II dataset IV, as the former was recorded from a paralyzed patient with ALS (low SNR) [31]. Our proposed method generates enhanced correlation coefficient features by preprocessing the EEG signals with the optimal rotation and shows the highest classification accuracy among the methods examined. Table 8 shows the 10 × 10 cross-validation results of the proposed method and the existing connectivity based methods. The proposed method shows the best performance for subjects 1 and 3 and also yields the highest classification accuracy on average. A two-tailed paired t-test shows that the performance of the proposed method is significantly better than that of the GC and PDC methods (p < 0.1) and is comparable to that of the CM/TFM and PLV based methods, which utilize both correlation coefficient and time-frequency information or phase information.

V. CONCLUSION
We describe a novel feature extraction method for cognitive task classification that uses correlation coefficient features improved by preprocessing the EEG signals. We first mitigate the mild interference among EEG signals using ASF. The ASF-filtered EEG channel pairs are then demixed using a rotation matrix optimized according to one of three criteria, i.e., ICCD maximization, WCCD minimization, or FR maximization. Various feature selection methods are applied to select 'good' correlation coefficient features. The method employing ICCD optimization with the 'above the mean'/1.5 IQR rule performs substantially better than existing brain connectivity based feature methods.

A. ICCD MAXIMIZATION
With the approximation of $\rho^{(k,p)}_i$, the ICCD, $D^{(k,p)}(\theta)$, can be written as a function of $\theta$, from which the optimal angle maximizing $D^{(k,p)}(\theta)$ is approximately obtained. The derivation of $F^{(k,p)}(\theta)$ is given in (49).
The optimal angle maximizing $F^{(k,p)}$ is approximately given by (50).