Discriminative Feature Selection-Based Motor Imagery Classification Using EEG Signal

Achieving a reliable classification of motor imagery (MI) tasks is a major challenge in brain–computer interface (BCI) implementation. The set of relevant and discriminative features plays an important role in the classification scheme. This paper presents a supervised approach to select discriminative features for the enhancement of MI classification using multichannel electroencephalography (EEG) signal. The dimension of multiband feature space is reduced using the feature selection method. Each trial of the multichannel EEG signal representing MI tasks is decomposed into a finite set of narrowband signals. The common spatial pattern-based features are extracted from each subband. The features obtained from the multiple subbands are combined to derive a high-dimensional feature vector. The neighborhood component analysis-based feature selection method is implemented to select the features that are relevant in performing an accurate classification. It is a nearest-neighbor-based approach to learn the feature weights with regularization by maximizing the average leave-one-out classification accuracy over the labeled training data. The selected features are used to train the support vector machine for classification. The features relatively irrelevant to the classification task are discarded, yielding a reduction of feature dimension. The evaluation of the proposed method is performed using BCI Competition III dataset 4a and IV dataset 2b. Both are publicly available datasets and are used as types of benchmark data to evaluate the MI classification algorithm to implement BCI. The obtained simulation results confirm the superiority of the proposed method compared to the recently developed algorithms.


I. INTRODUCTION
A brain-computer interface (BCI) decodes movement imagination, also called motor imagery (MI), from brain activity to issue a command without any peripheral nerve or muscle activity [1]. It has potential applications in neuroscience and neuro-engineering. MI has been used to encourage neuroplasticity in a patient's brain after a stroke [2]. Thus, recent applications of BCI with appropriate feedback offer neurorehabilitation to assist stroke patients in restoring their impaired motor functions [3], [4]. The prosthetics, robots, and other electronic devices used in neurorehabilitation tasks are fully controlled by motor imagination [1], [5]. Kinesthetic, auditory, or visual feedback to the subject is used to stimulate the response of the brain after a stroke. Non-invasive electroencephalography (EEG) is a comfortable and relatively easy method for BCI implementation, and the BCI user's brain activity is typically measured using EEG [6]. The design of an EEG-based BCI application is an extremely challenging task [7]. There are two types of MI-based BCI: asynchronous and synchronous. In asynchronous BCI, the subject controls the task and its timing without any external cues [8]. It is more appropriate for real-time BCI applications, but such a BCI system requires continuous processing of the brain signals. In synchronous BCI, the cue is provided for a fixed time duration in which the subject needs to perform a mental task [9]. In this paper, the experimental setup is confined to implementing synchronous BCI.
The associate editor coordinating the review of this manuscript and approving it for publication was Juan Wang.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The principal goal of BCI is to recognize the correct intention of brain activity during MI tasks, leading to a translation of the intention into an equivalent control command. To achieve this goal, suitable features are extracted and an appropriate machine learning scheme is employed to classify the MI tasks recorded by multichannel EEG. Thus, discriminative feature extraction from the EEG signal is one of the key stages in designing an MI-based BCI application. The common spatial pattern (CSP) is one of the methods used to extract potential features from multichannel EEG [10]. It derives optimal spatial filters from the training samples of recorded EEG signals. CSP derives a weighting matrix for the electrodes based on their significance in the classification task. Later, the performance of the MI-based BCI (MI-BCI) is improved with CSP by accomplishing different physiological events, such as event-related desynchronization and lateralized readiness potential [11]. Instead of using only the spatial pattern, the common patterns of the spatial, time, and frequency domains are jointly considered to extract the potential features from EEG [12]. The features of the spatio-temporal discrepancy signal are derived from the EEG to classify two classes of MI tasks [13]. The deep learning approach with variational autoencoder is employed in [14] for EEG classification. The images obtained by time-frequency representation of EEG signals are used as the input to the deep neural network. The features obtained by regularized Riemannian transformation are used to classify MI tasks with reduced calibration time of EEG [15]. The narrowband signals containing the significant information about movement imagination enhance the MI classification performance. However, none of the methods [12]-[15] use subband filtering to extract the signal components representing the MI tasks.
The performance of MI-BCI significantly depends on the selection of frequency bands of the EEG signal from which the features are extracted [7]. The mu and beta rhythmic components evoked in response to the imagination of different movement tasks are promising sources of the features used in MI-BCI [16]. These rhythms are observed in the sensorimotor area. The movement imagination of the hand and the foot changes the mu rhythm at different brain regions [11]. It is also observed in [17] that the effective fluctuations in brain activity occur in the low-frequency components of EEG. Moreover, the changes in mu and beta rhythms occur due to the subject's voluntary movements. To capture the changes within a narrow frequency band, CSP is implemented in each subband of the EEG signals for discriminative feature extraction [18]. The selection of relevant features plays an important role in MI classification for BCI implementation. However, no feature selection approach is implemented in the previous studies [11], [16]-[18].
In MI-BCI studies, a number of methods including filter bank CSP (FBCSP) [19], [20], subband CSP [21], sparse filter-bank CSP [22], and discriminative filter bank CSP [23] have been proposed to extract the features from the narrowband EEG signals for MI classification. The sparse representation of CSP feature is implemented in [24] for two classes of MI discrimination. These works promote the implementation of subband CSP to obtain the discriminative features, thereby yielding a reliable classification of MI tasks. Therefore, the subband approach with CSP method is implemented in this work to extract the effective features from the narrowband EEG. In addition to the narrowband signals, the wideband signal can contain some apparent features to enhance MI classification. Without considering this issue, the narrowband signals are used for feature extraction in the subband CSP-based methods.
In the subband-based method, feature extraction is performed in the individual subbands and the results are combined to yield the feature vector. Thus, the derived feature space has a relatively high dimension [25]. A number of features included in the feature vector might not be relevant, which can confuse the machine learning algorithm and degrade its performance [26]. An appropriate subgroup of the obtained features should be selected to perform a reliable classification of MI tasks. The others need to be removed to limit the degradation of performance and to reduce the overfitting and training time of the machine learning algorithm.
There are two types of feature selection techniques: unsupervised and supervised. An unsupervised algorithm selects relevant features without using the label information of the dataset available for training [25], whereas a supervised method requires the proper labels of the training data for discriminative feature selection. Usually, the supervised method quantifies the relation between the features and the labels of the training set by measuring mutual information, correlation, etc. [26]. The supervised technique is effective when the label information is available for the training data. The label information is available with the data used in the experiments of this research and, thus, a supervised feature selection algorithm is implemented.
In this paper, subband CSP features are used and neighborhood component analysis (NCA)-based [27] supervised feature selection method is introduced to separate the highly discriminative features to enhance the performance of MI task classification. The multichannel EEG signal is recorded to represent the mental imagination in terms of MI task. The multichannel EEG is passed through a subband decomposition scheme to generate a finite set of narrowband signals. It localizes the signal components that are effective for MI classification within the subbands. The spatial features are extracted from each of the subbands by applying CSP. Then, the features obtained from the individual subbands are combined to yield a high-dimensional feature vector. Discriminative feature selection is performed by using NCA. Thus, the selected feature vector is used for the classification of MI tasks with the support vector machine (SVM). The effects of the number of features on classification accuracy and the performance of the different classifiers are evaluated. The results of the experimental evaluation are compared with those obtained using the recently developed methods.
Regarding the organization of this paper, Section II describes the datasets used in the experiments, Section III details the methodology, Section IV illustrates the experimental results and discussion and, finally, Section V presents the conclusions.

II. DATA DESCRIPTION
Two publicly available BCI competition datasets are used to evaluate the performance of the proposed method. The datasets are described below.

A. BCI COMPETITION III DATASET 4a
The data were obtained from five healthy subjects, denoted as 'aa', 'al', 'av', 'aw', and 'ay' [11]. Each subject was between 24 and 25 years of age. Proper instructions were provided to the subjects, such that the experimental conditions were fulfilled. They sat on a comfortable chair and avoided eye movement. The visual stimulus was presented for 3.5 s. During that time, each participant was asked to perform one of three MI tasks, i.e., right hand, left hand, or right foot movement. The two MI tasks of the right hand and the right foot were taken into consideration for classification. A total of 280 trials of EEG with 118 channels were recorded for each subject while they performed the MI tasks according to the instruction. In particular, 168, 224, 84, 56, and 28 trials out of 280 were designated as training data for the subjects 'aa', 'al', 'av', 'aw', and 'ay', respectively. The other trials were kept for testing. The training data available with proper labels for the individual subjects are used in this study to evaluate the performance. The recorded signals are band-pass filtered between 0.05 and 200 Hz, sampled at 1000 Hz, and quantized with 16-bit resolution. The EEG signal is downsampled to 100 Hz for further processing. The details of the experimental setup are provided in [11]. In this study, a 2 s segment (0.5-2.5 s) of each EEG trial is extracted to obtain the meaningful features that are used in the classification. It is considered that the first 0.5 s (0-0.5 s) and the last 0.5 s (3.5-4.0 s) are the durations for pre- and post-imagination, respectively.
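The trial windowing described above can be sketched as follows. This is only a minimal illustration, assuming that the continuous recording is available as a NumPy array of shape (channels, samples) already downsampled to 100 Hz and that the cue onsets are known as sample indices. The function name `extract_epochs`, the array names, and the synthetic data are hypothetical and are not part of the dataset distribution.

```python
import numpy as np

def extract_epochs(eeg, cue_onsets, fs=100, t_start=0.5, t_end=2.5):
    """Cut the window [t_start, t_end) seconds after each cue.

    eeg        : array of shape (channels, samples)
    cue_onsets : cue positions in samples (assumed to be known)
    returns    : array of shape (trials, channels, window_samples)
    """
    first = int(round(t_start * fs))
    last = int(round(t_end * fs))
    epochs = [eeg[:, onset + first: onset + last] for onset in cue_onsets]
    return np.stack(epochs)

# Synthetic stand-in for a recording: 30 channels, 60 s at 100 Hz (not real EEG).
rng = np.random.default_rng(0)
eeg = rng.standard_normal((30, 6000))
cue_onsets = [500, 1500, 2500]   # hypothetical cue positions in samples
epochs = extract_epochs(eeg, cue_onsets)
```

With a 100 Hz sampling rate, each 2 s window contains 200 samples per channel.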
Channel Selection: The dataset is recorded with 118 EEG channels, some of which carry irrelevant signals, because not all of the channels are required to discriminate the two MI tasks. Selecting a minimal number of relevant channels is effective in terms of computational cost. The relevant zone of motor activity is the motor cortex region, including the primary, supplementary, and premotor cortex areas [28]. The electrodes placed in these areas should be selected. Several studies use a selected number of channels to design the MI-BCI [29]-[31]. The 18 channels from the area of the sensorimotor cortex are used in [29], while 30 channels are selected in [30], [31] to classify two MI tasks. Following the previous studies [30], [31], 30 channels in the sensorimotor cortex area are selected for MI task classification, as indicated in Fig. 1. Throughout this paper, the multichannel EEG refers to the signals recorded from the 30 selected electrodes.

B. BCI COMPETITION IV DATASET 2b
The EEG signals were recorded with three channels (C3, Cz, and C4), sampled at 250 Hz, and band-pass filtered in the frequency range of 0.5-100 Hz. Nine subjects, namely, B01, B02, . . . , B09, participated in an experiment on MI by performing two different tasks (left hand and right hand). Five sessions were recorded for each subject. The first two sessions were designed for training without visual feedback and the third session for training with visual feedback. The last two sessions (4th and 5th) were recorded for testing. The EEG data of the three training sessions are used in this study to evaluate the performance of the proposed method. A 2 s segment of each trial (0.5-2.5 s after the start of the stimulus) is extracted to conduct the experiments. More details about the BCI Competition IV (2b) dataset can be found in [32].

III. METHODOLOGY
A multiband approach with dominant feature selection is implemented here to enhance the classification accuracy of MI tasks for BCI application. A block diagram of the proposed method is shown in Fig. 2.
Subband CSP features are extracted from the EEG signals. A selected subset of features is used to classify the motor imageries with the SVM [33] classifier. The steps to implement the method are as follows:
(i) The multichannel EEG signal is decomposed into subbands.
(ii) CSP is used for feature extraction from each subband.
(iii) The features obtained from the individual subbands are combined to derive a feature vector.
(iv) Discriminative features are selected using neighborhood component analysis-based feature selection (NCFS).
(v) The SVM classifier is trained with the selected features of the labeled training dataset and classification is performed using the test dataset.
(vi) Finally, the command is generated on the basis of MI classification for BCI implementation.

A. SUBBAND DECOMPOSITION
The multichannel raw EEG is often contaminated by electrophysiological noise. Sometimes, the power of such noise is stronger than that of the EEG signal. Moreover, some narrowband components of the EEG signal have a stronger response to a specific MI task. Therefore, the proper selection of subbands would intuitively provide a more accurate classification of MI tasks than using the full bandwidth of the EEG. Related studies claim that most of the brain activities related to MI tasks exist within the frequency band of 7-30 Hz [34]. Based on the experiments, four subbands are used within the frequency range of 8-35 Hz in this study. The subband decomposition is accomplished by applying a zero-phase Butterworth band-pass filter. The full band (8-35 Hz), mu band (8-13 Hz), low beta (13-22 Hz), and high beta with low gamma (22-35 Hz) are used here as the usable subband signals. The CSP-based features are extracted from each subband and combined to construct a feature vector with high dimensionality.
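The subband decomposition can be illustrated with a short sketch based on SciPy. It is a hedged example rather than the implementation used in the experiments: the filter order is not stated in the text and is assumed to be 4 here, and the function name `decompose_subbands` and the synthetic test signal are hypothetical. Zero-phase behaviour is obtained by forward-backward filtering with `sosfiltfilt`.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Band edges in Hz: full band, mu, low beta, high beta with low gamma.
SUBBANDS = [(8, 35), (8, 13), (13, 22), (22, 35)]

def decompose_subbands(trial, fs, bands=SUBBANDS, order=4):
    """Band-pass a (channels, samples) trial into each subband.

    Returns an array of shape (n_bands, channels, samples).
    The filter order is an assumption; the text does not specify it.
    """
    out = []
    for low, high in bands:
        sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
        # Forward-backward filtering cancels the phase distortion.
        out.append(sosfiltfilt(sos, trial, axis=-1))
    return np.stack(out)

# Synthetic check signal: a 10 Hz tone on three identical channels (not EEG).
fs = 100
t = np.arange(0, 2, 1 / fs)
trial = np.tile(np.sin(2 * np.pi * 10 * t), (3, 1))
bands = decompose_subbands(trial, fs)
```

In this toy case the 10 Hz tone is retained in the full band and the mu band and is strongly attenuated in the 22-35 Hz band.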

B. FEATURE EXTRACTION
The extraction of potential features is one of the most crucial stages in the field of BCI. Recent studies have generally investigated how to modify existing methods or develop novel techniques for feature extraction because of the direct influence of the features on the performance of the BCI system [1], [6]. One of the most successful and well-known methods in BCI application to extract features from multichannel EEG is CSP [15], [16]. It decomposes a multichannel EEG into a number of additive components. Basically, it is a linear transformation that projects a high-dimensional EEG signal into a low-dimensional spatial subspace with a projection matrix. Each row of the projection matrix consists of the weights of the EEG channels. It performs the simultaneous diagonalization of the covariance matrices derived from both of the classes [35]. The spatial filter is designed such that the variance of the filtered data from one class is maximized while that of the other class is minimized. The resultant features minimize the intra-class variance while maximizing the inter-class variance, which increases the separation between the two classes in terms of variance [34]. This attribute makes CSP an effective spatial filter to classify MI tasks using multichannel EEG. The first CSP-based spatial filter was implemented in [36] to effectively classify movement-related EEG for BCI implementation.
Let $E_{i,1}$ and $E_{i,2} \in \mathbb{R}^{K \times L}$ denote the $i$th EEG training trials of the two classes, where $K$ represents the number of channels and $L$ is the number of discrete samples. The CSP method finds a spatial filter $w \in \mathbb{R}^{K}$ to transform the EEG data such that the ratio of variance between the two classes is maximized [35]:
$$J(w) = \frac{w^{T} \Sigma_{1} w}{w^{T} \Sigma_{2} w}, \qquad (1)$$
where $\Sigma_{c} = \frac{1}{Y_{c}} \sum_{i=1}^{Y_{c}} E_{i,c} E_{i,c}^{T}$ is the spatial covariance matrix of class $c$ and $Y_{c}$ is the number of trials belonging to class $c$ ($c = 1, 2$). The optimal solution of Eq. (1) can be obtained by solving the generalized eigenvalue problem $\Sigma_{1} w = \lambda \Sigma_{2} w$.
A matrix $W = [w_{1}, w_{2}, \ldots, w_{2M}] \in \mathbb{R}^{K \times 2M}$ including the spatial filters is formed by the eigenvectors corresponding to the $M$ largest and the $M$ smallest eigenvalues. For a given EEG sample $E$, the feature vector $x = [x_{1}, \ldots, x_{2M}]$ is constructed as
$$x_{m} = \log\left(\mathrm{var}\left(w_{m}^{T} E\right)\right), \quad m = 1, \ldots, 2M,$$
where var(.) represents the variance. The log transformation is applied in order to normalize the elements $x_{m}$. A selected number of features are extracted from each subband using CSP. All the features obtained from the subbands are combined (simple concatenation) to generate the feature vector for the respective trial.
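A minimal NumPy sketch of CSP filter estimation and log-variance feature extraction is given below for illustration. It assumes that the class covariance is the plain average of $E E^{T}$ over the trials of a class (no trace normalization), and it solves the equivalent problem $\Sigma_{1} w = \mu (\Sigma_{1} + \Sigma_{2}) w$ by whitening, which yields the same eigenvectors and the same ordering as the generalized eigenvalue problem of the two class covariances. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def csp_filters(trials_1, trials_2, M=4):
    """Estimate 2M CSP filters from trials of shape (n_trials, K, L)."""
    S1 = np.mean([E @ E.T for E in trials_1], axis=0)
    S2 = np.mean([E @ E.T for E in trials_2], axis=0)
    # Whitening transform P such that P.T @ (S1 + S2) @ P = I.
    evals, evecs = np.linalg.eigh(S1 + S2)
    P = evecs / np.sqrt(evals)
    # Diagonalise class 1 in the whitened space (eigenvalues ascending).
    mu, B = np.linalg.eigh(P.T @ S1 @ P)
    W = P @ B                                  # columns are spatial filters
    idx = np.concatenate([np.arange(M), np.arange(len(mu) - M, len(mu))])
    return W[:, idx]                           # shape (K, 2M)

def csp_features(E, W):
    """Log-variance of the spatially filtered trial E of shape (K, L)."""
    return np.log(np.var(W.T @ E, axis=1))

# Synthetic two-class data (not EEG): class 1 is strong on channel 0,
# class 2 is strong on channel 1.
rng = np.random.default_rng(0)
K, L, n = 6, 200, 20
scale_1 = np.ones((K, 1)); scale_1[0] = 3.0
scale_2 = np.ones((K, 1)); scale_2[1] = 3.0
trials_1 = rng.standard_normal((n, K, L)) * scale_1
trials_2 = rng.standard_normal((n, K, L)) * scale_2

W = csp_filters(trials_1, trials_2, M=2)
f1 = csp_features(trials_1[0], W)
f2 = csp_features(trials_2[0], W)
```

Applying `csp_features` to each subband of a trial and concatenating the results with `np.concatenate` gives the combined multiband feature vector described in the text.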

C. FEATURE SELECTION
Not all of the features in a high-dimensional feature vector are effective for classification. In fact, some features often degrade the performance of the machine learning algorithm. The aim of feature extraction is to provide appropriate discriminative information to enhance the object classification performance. Hence, in machine learning approach, feature selection is an important part for choosing the best set of features from all that are available. Another objective of feature selection method is to suppress the irrelevant features with minimization of information loss. The discriminative features are selected in this study from the raw features space using NCA [27]. It is a supervised learning method for classifying multivariate data into distinct classes according to a given distance metric over the data [37]. It is non-parametric, that is, it does not require any parameter or assumption about the statistical distribution of the samples. It ranks the features with regularization to learn the feature weights for minimization of an objective function that measures the average leave-one-out (LOO) classification loss on the labeled training data.
The set of training samples is defined as $S = \{(x_{i}, y_{i}),\ i = 1, \ldots, N\}$, where $x_{i} \in \mathbb{R}^{d}$ is a $d$-dimensional feature vector, $y_{i} \in \{1, \ldots, C\}$ represents its corresponding class label, $C$ is the number of classes, and $N$ is the number of training samples. The goal of the feature selection algorithm is to find a weighting vector $w$ that lends itself to selecting features by optimizing the nearest neighbor classification. In terms of the weighting vector $w$, the weighted distance between two samples $x_{i}$ and $x_{j}$ is defined by:
$$D_{w}(x_{i}, x_{j}) = \sum_{l=1}^{d} w_{l}^{2}\, |x_{il} - x_{jl}|,$$
where $w_{l}$ is the weight associated with the $l$th feature. The LOO technique is considered to maximize the classification accuracy on the training set $S$. The reference point for classification is selected from $S$ according to a probability distribution. Here, the probability of $x_{i}$ selecting $x_{j}$ as its reference point is given by:
$$p_{ij} = \begin{cases} \dfrac{\tau(D_{w}(x_{i}, x_{j}))}{\sum_{k \neq i} \tau(D_{w}(x_{i}, x_{k}))}, & i \neq j, \\ 0, & i = j, \end{cases}$$
where $\tau(z) = e^{-z/\alpha}$ is a kernel function and its width $\alpha$ is an input parameter. The kernel width influences the probability of each point being selected as the reference point. There are two limiting cases of $\alpha$ ($\alpha \to 0$ and $\alpha \to +\infty$). If $\alpha \to 0$, the term $(-z/\alpha)$ becomes undefined and the reference points cannot be selected in a probabilistic way. The nearest neighbor of the query point can be selected as the reference to resolve this exceptional case. For $\alpha \to +\infty$, $p_{ij} \to 1/N_{c}$ (except for $i = j$; $N_{c}$ is the number of candidate points) and all the candidate points apart from the query point have the same chance of being selected as the reference point. Then, the probability of correct classification of the query point $x_{i}$ is given by:
$$p_{i} = \sum_{j} y_{ij}\, p_{ij},$$
with $y_{ij} = 1$ only for $y_{i} = y_{j}$, and 0 otherwise. Therefore, the objective function can be defined as [37]:
$$\phi(w) = \zeta(w) - \eta \sum_{l=1}^{d} w_{l}^{2},$$
where $\zeta(w) = \sum_{i} \sum_{j} y_{ij}\, p_{ij}$ is the approximate LOO classification accuracy. If $\alpha \to 0$, $\zeta(w)$ becomes the true LOO classification accuracy. In order to perform feature selection and manage overfitting, a regularization parameter $\eta$ (> 0) is introduced. The value of $\eta$ can be tuned by cross-validation.
Since $\phi(w)$ is differentiable, its derivative with respect to $w_{l}$ can be computed as:
$$\frac{\partial \phi(w)}{\partial w_{l}} = 2 \left( \frac{1}{\alpha} \sum_{i} \left( p_{i} \sum_{j \neq i} p_{ij}\, |x_{il} - x_{jl}| - \sum_{j} y_{ij}\, p_{ij}\, |x_{il} - x_{jl}| \right) - \eta \right) w_{l}.$$
The above-mentioned derivative leads to the corresponding gradient-based update equation. The obtained weight vector is then used to select the features. Each feature is ranked using its corresponding weight and the desired number of top-ranked features is selected to be used in the classification. The steps of the NCFS method are illustrated in Algorithm 1 [37].

IV. EXPERIMENTAL RESULTS AND DISCUSSION
The well-known publicly available datasets BCI Competition III (4a) and IV (2b) are used to conduct the experiments to evaluate the performance of the proposed method. Each trial of the datasets is decomposed into four subband signals and CSP-based features are extracted from each subband. The features of all subbands are combined to derive a high-dimensional feature vector. The NCFS-based supervised method is used to select the discriminative features. Thus, the obtained features are used for training SVM with linear kernel, followed by the evaluation of the classification performance with test data. For each subject, each trial of 2 s duration is extracted from the EEG data. The details of the data extraction method are described in Section II.
The full band (8-35 Hz) and the three other subbands of channel C4 for subject 'aa' selected from the dataset BCI Competition III (4a) are shown in Fig. 3. CSP is applied to each frequency band to extract the spatial features. Four pairs of spatial filters resulting in eight features are selected from each subband. The CSP features obtained from each of the four bands are combined to form a 32-dimensional (= 4 × 8) feature vector for each trial. Then, the NCFS-based supervised feature selection algorithm is applied to the high-dimensional feature space. It uses the label information of the training data and assigns a weight to each feature. The features are ranked based on the weights determined by the NCFS method. A number of high-ranked features are selected according to the obtained ranking. The UDFS, which is an unsupervised approach, is introduced in a previous work [35]. The class labels are available in the dataset BCI Competition III (4a). Thus, the supervised method for feature selection fits this experimental setup well. The values of the parameters used in the NCFS algorithm are set as: γ = 2, α = 1, η = 1/N, δ = 1.0e-06. The SVM is trained using the obtained features of reduced dimension. The features of the same indices (as defined in the training set) are selected from the test dataset to evaluate the MI classification performance of the proposed method. The classification accuracy of each subject is measured by the k-fold (here k = 5) cross-validation approach. For an individual subject, the dataset is divided randomly into k equal groups. Of these, (k-1) groups are assigned for training and one is designated for testing. The process is repeated k times so that each group is used once for testing. The classification accuracy is obtained by averaging the results of the k repetitions. The performance of the classification is evaluated by $A_{cc} = 100 \times (T_{C} / T_{N})$, where $T_{N}$ is the number of trials in the test dataset and $T_{C}$ is the number of trials correctly recognized out of $T_{N}$.
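The evaluation protocol can be sketched as below. The sketch shows the accuracy measure $A_{cc}$ and a plain k-fold loop; the classifier is passed in as a callable. To keep the example dependency-free, a nearest-class-mean rule is used as a stand-in and is not the linear SVM used in the experiments. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def accuracy_percent(y_true, y_pred):
    """A_cc = 100 * T_C / T_N."""
    return 100.0 * np.mean(np.asarray(y_true) == np.asarray(y_pred))

def kfold_accuracy(X, y, fit_predict, k=5, seed=0):
    """Average accuracy over k random folds; each fold is tested once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        y_pred = fit_predict(X[train], y[train], X[test])
        scores.append(accuracy_percent(y[test], y_pred))
    return float(np.mean(scores))

def nearest_mean(X_train, y_train, X_test):
    """Stand-in classifier (not the SVM of the paper): closest class mean."""
    classes = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(X_test[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

# Synthetic 32-dimensional features (not EEG) with one informative dimension.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.standard_normal((40, 32))
X[:, 0] += 10.0 * y

acc = kfold_accuracy(X, y, nearest_mean, k=5)
```

In a full pipeline, the feature ranking would be computed from the training folds only and the selected indices applied to the held-out fold inside `fit_predict`.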
Different experiments are conducted with BCI Competition III dataset 4a to illustrate the effectiveness of the proposed feature selection approach. The performances in terms of classification accuracy of MI-BCI using simple CSP-SVM (without feature selection), UDFS-based [35] feature selection, and the proposed NCFS-based feature selection methods for the five subjects are presented in Fig. 4. The CSP-SVM method is implemented with a full feature space. None of the features are discarded from the feature vector. The performance of CSP-SVM is always lower than that of the other two methods. It is observed that the dominant feature selection approach improves performance, whereas NCFS exhibits superior performance for all the subjects as well as on average across the subjects. The average MI classification accuracy of NCFS is 2.33% and 7.38% higher than that of UDFS and CSP-SVM, respectively. Using cross-validation procedure, individual repetition may produce a slightly different result. The average accuracy across all of the repetitions is taken as the final result for every subject.
After measuring the performance in terms of classification accuracy, Friedman's one-way analysis of variance (ANOVA) is performed to study the significance level. Friedman's ANOVA is a non-parametric test [38] performed to detect the differences between the methods, including the proposed NCFS. According to the result of Friedman's ANOVA, the methods have a significant main effect on accuracy (p < 0.006). To test the statistical significance of the methods, the Tukey-Kramer-based post-hoc test is performed [1]. From the results of the post-hoc test, the NCFS-based method achieves a more significant
An important reason for the performance improvement of the proposed method is the effective selection of the features that dominate in the correct classification. The raw features of the two classes and the features selected using UDFS and NCFS for the same trial are shown in Fig. 5. The raw features include all of the 32 features, whereas the top-ranked 15 features are presented to illustrate the effectiveness of feature selection. It is observed that the top-ranked features are more separable from one class to another and, thus, are more discriminative than the raw features. The features selected using NCFS have higher disjointedness between the classes than those selected using UDFS (as observed in Fig. 5).
The number of features selected for the MI classification is one of the vital factors that affect accuracy. The performances of the individual subjects as a function of the number of features selected using NCFS are illustrated in Fig. 6, together with the mean values across the subjects. The accuracies vary with the dimension of the selected features, and the maximum accuracies of the individual subjects are achieved with different numbers of selected features. The number of features corresponding to the maximum average accuracy across the subjects is taken as the feature dimension to conduct the rest of the experiments. The comparison of average accuracy between the UDFS [35] and the proposed NCFS as a function of the number of selected features is shown in Fig. 7. It is observed that the maximum classification accuracies of UDFS and NCFS are achieved by using 8 and 10 selected features, respectively. The average accuracy of the NCFS-based method is higher than that of the UDFS method for the low-dimensional feature space.
The MI classification accuracies of the proposed feature selection with different classifiers, namely, SVM [33], linear discriminant analysis (LDA) [39], and k-nearest neighbor (KNN) [40], are studied. The results for the BCI Competition III dataset 4a are presented in Table 1. Overall, SVM performs better than LDA and KNN. Although LDA performs best for subject 'aa' with NCFS, the average accuracy of SVM is significantly higher than that of the other classifiers. The proposed NCFS method with any classifier performs better than UDFS. The Tukey-Kramer-based post-hoc test is performed to verify the significance of SVM with NCFS compared to LDA and KNN. From the results of this test, the SVM classifier with the proposed NCFS achieves a significant improvement of classification accuracy for MI-BCI over the subjects compared with the other classifiers (NCFS-SVM vs. NCFS-LDA: p < 0.05; NCFS-SVM vs. NCFS-KNN: p < 0.04).
The BCI Competition III dataset 4a is also used to evaluate the MI classification performance of several recently reported methods [41]-[44]. The classification accuracies of the proposed method and the recently developed algorithms are compared in Table 2. The average classification accuracy over all subjects of the proposed approach is 92.20%. The performance of this method is compared with the methods implemented using regularized Riemannian features (RRF) [15] and the sparse group representation model (SGRM) of the CSP features [24]. The average classification accuracies of RRF and SGRM with dataset III (4a) are 87.21% and 77.70%, respectively. It is noted that the Riemannian manifold-based feature is used in the RRF [15] method rather than CSP. The attractor metagene-based feature selection is used in [41] with proper parameter optimization of SVM (AM-SVM) to implement the MI classification for BCI application with an average accuracy of 85.00%. The NCFS-based method achieves a noticeable improvement of accuracy using the effective method for discriminative feature selection. The SSCSP method [42] uses sparse CSP to obtain an accuracy of 73.36%. The spatial regularization of CSP is implemented in SRCSP [43] with a classification accuracy of 76.37% using the BCI Competition III (4a) dataset. The transfer kernel common spatial pattern (TKCSP) is introduced by Dai et al. [44]. The proposed method outperforms TKCSP by 13.44% in terms of classification accuracy. Moreover, the average performance across all subjects of our previous work involving UDFS [35] is 2.33% less than the accuracy of the proposed NCFS-based method.
TABLE 2. Classification accuracy (%) on BCI Competition III dataset 4a. The performance of the proposed method is compared with that of seven recently developed algorithms. For each of the five subjects, the best result is marked in boldface.
Friedman's ANOVA is performed to study the significance level. The test is performed to detect the differences in the performances of the various methods, including NCFS. According to the result of Friedman's ANOVA, the methods have a significant main effect on classification accuracy (p < 0.05). To test the statistical significance of the methods mentioned in Table 2, the Tukey-Kramer-based post-hoc test is performed. Based on the results of this statistical test, the NCFS-based proposed method achieves a more significant improvement of performance for MI-BCI over the subjects than other methods (NCFS vs. RRF: p < 0.04; NCFS vs. SGRM: p < 0.01; NCFS vs. SSCSP: p < 0.02; NCFS vs. SRCSP: p < 0.03; NCFS vs. TKCSP: p < 0.03; NCFS vs. AM-SVM: p < 0.03; NCFS vs. UDFS: p < 0.05).
The MI classification accuracies of all nine subjects of BCI Competition IV dataset 2b obtained by the proposed NCFS method are illustrated in Table 3. The values of the parameters used in the NCFS algorithm are kept similar to those used for BCI Competition III dataset 4a. To obtain the maximum average MI classification accuracy on BCI Competition IV dataset 2b, the eight top-ranked features are used. The average accuracy over the subjects achieved by NCFS is 81.52%. The results are compared with three recently developed methods, namely, deep learning with variational autoencoder (DLVA) [14], SGRM [24], and UDFS [35]. The average accuracies over the nine subjects derived by DLVA [14], SGRM [24], and UDFS [35] are 78.19%, 78.24%, and 78.40%, respectively. On BCI Competition IV dataset 2b, the average MI classification accuracy of the proposed method exceeds that of the three mentioned algorithms by at least 3.12%. The Tukey-Kramer-based post-hoc test is performed to test the statistical significance of the differences between the methods. Based on the results of this test, the proposed NCFS-based method achieves a statistically significant improvement in MI classification accuracy over the subjects compared with most of the mentioned methods (NCFS vs. UDFS: p < 0.01; NCFS vs. SGRM: p < 0.03; NCFS vs. DLVA: p = 0.324). Although the performance improvement of NCFS over DLVA [14] is not statistically significant, the average classification accuracy of the proposed NCFS-based method is 3.33% higher than that of DLVA.
TABLE 3. Classification accuracy (%) on BCI Competition IV dataset 2b. The performance of the proposed method (NCFS) is compared with that of three recently developed algorithms (DLVA [14], SGRM [24], and UDFS [35]). For each of the nine subjects, the best result is marked in boldface.
The BCI Competition IV dataset 2b is also used in [13], [18]. In both of these methods, each EEG trial is divided into segments that are 2 s in length with a 1.9 s overlap, so each successive segment contributes only 0.1 s of new data. Then, the maximum Kappa value over the time course is selected and employed as the evaluation criterion. This way of evaluating MI classification differs somewhat from the traditional approach, and it yields higher classification accuracies. In the proposed method and in the methods mentioned in Table 4, each trial is considered as a single sample for classification. Then, the classification accuracy (%) is measured as the number of correctly recognized samples divided by the total number of test samples.
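The two evaluation protocols described above can be contrasted with a short Python sketch. It is only illustrative: the function names are hypothetical, the sampling rate of 250 Hz and the 4 s trial length are assumed for the example, and the labels are toy values rather than results from either dataset.

```python
from collections import Counter

def segment_starts(n_samples, fs, win_s=2.0, overlap_s=1.9):
    """Start indices of the overlapping segments that fit inside one trial."""
    win = round(win_s * fs)
    step = win - round(overlap_s * fs)  # 0.1 s of new data per segment
    return list(range(0, n_samples - win + 1, step))

def accuracy_percent(y_true, y_pred):
    """Trial-wise accuracy: correctly recognized samples over all test samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for the agreement expected by chance.

    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(y_true)
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_exp = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_obs - p_exp) / (1.0 - p_exp)

# Assumed example: a 4 s trial sampled at 250 Hz (1000 samples).
starts = segment_starts(n_samples=1000, fs=250)
print(len(starts), starts[:3])  # 21 [0, 25, 50]

# Toy labels for four test trials (1 and 2 denote the two MI classes).
y_true = [1, 1, 2, 2]
y_pred = [1, 1, 2, 1]
print(accuracy_percent(y_true, y_pred))  # 75.0
print(cohen_kappa(y_true, y_pred))       # 0.5
```

Under the segment-based protocol, a classifier output and a Kappa value are obtained at each of the 21 segment positions and the maximum is reported, whereas the trial-based protocol yields a single decision per trial.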
Feature selection has a vital role in MI classification. The features that are relevant to the classification are selected, while the irrelevant or less important features that contribute little to predicting the class label are removed, in order to achieve better classification accuracy. Irrelevant or partially relevant features can negatively impact model performance. Thus, the feature selection method has certain advantages in improving the classification performance. The mean accuracy (over all subjects) without feature selection is much lower than that of the methods with feature selection. The reason is that the method without feature selection uses additional features that are not relevant and that decrease the performance of the classifier. The features extracted from the different narrowband signals also have a significant role in the improvement of classification accuracy. The features selected from the different subbands for subjects 'av' and 'ay' of BCI Competition III dataset 4a are illustrated (single trial) in Fig. 8. The 10 desired features are selected from all subbands for subject 'av', whereas none of the 10 features are selected from the 22-35 Hz subband for 'ay'. It is noticed that different narrowband signals (subbands) contribute to building the subset of discriminative features using the NCFS-based method. The evaluation results validate that the use of selected features improves the classifier performance. It is also observed that the proposed NCFS-based approach outperforms the recently reported algorithms.
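The weighting idea behind the selection can be sketched in Python with NumPy as follows. This is a simplified gradient-ascent version of neighborhood component feature selection written for illustration, not the implementation used in this work: the function name, the step size, the number of iterations, the regularization value, and the synthetic data (two informative features followed by four noise features) are all assumptions.

```python
import numpy as np

def ncfs_weights(X, y, lam=0.01, sigma=1.0, lr=0.1, n_iter=200):
    """Learn one relevance weight per feature by gradient ascent on the
    regularized average leave-one-out nearest-neighbor accuracy."""
    n, d = X.shape
    diff = np.abs(X[:, None, :] - X[None, :, :])     # |x_il - x_jl|, shape (n, n, d)
    same = (y[:, None] == y[None, :]).astype(float)  # 1 where labels agree
    w = np.ones(d)
    for _ in range(n_iter):
        dist = diff @ (w ** 2)                       # weighted L1 distances, shape (n, n)
        np.fill_diagonal(dist, np.inf)               # leave-one-out: exclude the point itself
        dist -= dist.min(axis=1, keepdims=True)      # shift rows for numerical stability
        k = np.exp(-dist / sigma)
        p = k / k.sum(axis=1, keepdims=True)         # probability that x_i picks x_j as reference
        p_i = (p * same).sum(axis=1)                 # probability that x_i is classified correctly
        all_term = np.einsum("ij,ijl->il", p, diff)          # sum_j p_ij |x_il - x_jl|
        same_term = np.einsum("ij,ijl->il", p * same, diff)  # same sum over same-class j only
        grad = 2.0 * w * ((p_i[:, None] * all_term - same_term).sum(axis=0) / (sigma * n) - lam)
        w += lr * grad                               # ascent step on the objective
    return np.abs(w)

# Synthetic two-class data (assumption): features 0 and 1 carry the class
# information, features 2..5 are pure noise.
rng = np.random.default_rng(0)
n_trials = 80
y = np.repeat([0, 1], n_trials // 2)
X = rng.normal(size=(n_trials, 6))
X[:, :2] += np.where(y[:, None] == 1, 1.0, -1.0)

weights = ncfs_weights(X, y)
top_k = np.argsort(weights)[::-1][:2]  # indices of the two top-ranked features
X_selected = X[:, top_k]               # reduced feature matrix passed to the classifier
print(np.round(weights, 3), sorted(top_k.tolist()))
```

In this toy setting the weights of the informative features grow while the regularization term shrinks the weights of the noise features, so ranking the weights and keeping the top-ranked ones discards the irrelevant dimensions before the classifier is trained.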
The proposed NCFS-based method has some limitations. A fixed time window starting at the same temporal location of each recorded EEG trial is used in this study. However, the latency in responding to the stimuli and the duration of MI are subject dependent. Therefore, the use of such a time window is not compliant with the concept of BCI. In addition, the CSP-based features used in this work are suitable only for binary classification and cannot be extended directly to multiclass MI classification problems. Future work can extend this study to overcome these drawbacks.

V. CONCLUSIONS
A supervised feature selection method is implemented in this paper for MI classification using EEG signals. The experimental evaluation is performed using the publicly available BCI Competition III dataset 4a and BCI Competition IV dataset 2b. With the first dataset, 30 out of 118 channels are used to represent a two-class (right hand and right foot movement) MI task for EEG classification in the BCI paradigm. The multichannel EEG is decomposed into four subbands, namely mu, low beta, high beta, and the fullband, within the frequency range of 8-35 Hz. Four pairs of CSP features are extracted from each subband and then combined to derive a high-dimensional feature vector. Not all of the features are relevant for classification. The proper elimination of irrelevant and redundant features makes the feature vector more discriminative and, thus, improves the classification performance. The proposed NCFS method effectively selects the discriminative features.
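The construction of the multiband feature vector summarized above can be sketched in Python with NumPy. The sketch rests on several assumptions that are not taken from this work: the mu and low-beta band edges, the 100 Hz sampling rate, the brick-wall FFT band-pass used in place of designed bandpass filters, the function names, and the random toy trials with 10 channels. For brevity the CSP filters are fitted on all trials, whereas in practice they would be fitted on training trials only.

```python
import numpy as np

# Band edges in Hz. The 22-35 Hz subband and the 8-35 Hz fullband appear in
# the text; the mu and low-beta edges are assumed for illustration.
BANDS = {"mu": (8, 13), "low_beta": (13, 22), "high_beta": (22, 35), "fullband": (8, 35)}

def fft_bandpass(trials, fs, low, high):
    """Brick-wall band-pass along the time axis (stand-in for a designed filter)."""
    spec = np.fft.rfft(trials, axis=-1)
    freqs = np.fft.rfftfreq(trials.shape[-1], d=1.0 / fs)
    spec[..., (freqs < low) | (freqs > high)] = 0
    return np.fft.irfft(spec, n=trials.shape[-1], axis=-1)

def mean_cov(trials):
    """Average trace-normalized spatial covariance over trials."""
    return np.mean([t @ t.T / np.trace(t @ t.T) for t in trials], axis=0)

def csp_filters(trials_a, trials_b, n_pairs=4):
    """Return 2 * n_pairs spatial filters (rows) for a two-class problem."""
    ca, cb = mean_cov(trials_a), mean_cov(trials_b)
    evals, evecs = np.linalg.eigh(ca + cb)
    whiten = (evecs / np.sqrt(evals)).T            # P with P (Ca + Cb) P^T = I
    _, b = np.linalg.eigh(whiten @ ca @ whiten.T)  # eigenvalues in ascending order
    w = b.T @ whiten
    return np.vstack([w[:n_pairs], w[-n_pairs:]])  # both ends of the spectrum

def log_var_features(trials, filters):
    """Normalized log-variance of the spatially filtered signals."""
    var = (filters @ trials).var(axis=-1)          # shape (n_trials, 2 * n_pairs)
    return np.log(var / var.sum(axis=1, keepdims=True))

def multiband_features(trials, labels, fs, n_pairs=4):
    """Concatenate the CSP features of all subbands into one vector per trial."""
    blocks = []
    for low, high in BANDS.values():
        xb = fft_bandpass(trials, fs, low, high)
        w = csp_filters(xb[labels == 0], xb[labels == 1], n_pairs)
        blocks.append(log_var_features(xb, w))
    return np.hstack(blocks)

# Toy data (assumption): 40 trials, 10 channels, 3 s at 100 Hz.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
trials = rng.normal(size=(40, 10, 300))
trials[labels == 1, 0] *= 2.0                      # class 1 has more power in channel 0
features = multiband_features(trials, labels, fs=100)
print(features.shape)  # (40, 32)
```

With four subbands and four pairs of filters per subband, each trial is described by 4 x 8 = 32 features, which is the high-dimensional vector that the feature selection step subsequently reduces.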
An unsupervised feature selection method was implemented in our previous work [35]. Given that label information is available in both BCI Competition III dataset 4a and BCI Competition IV dataset 2b, the supervised approach is more suitable. The proposed supervised feature selection method outperforms the unsupervised approach UDFS [35], as illustrated in Tables 2 and 3. The proposed method is non-parametric, that is, it does not require any information about the statistical distribution of the samples. Along with reducing the amount of data used in machine learning, it alleviates the problem of high dimensionality and thereby improves the generalization performance of the algorithms. In this study, a single optimal number of features is determined and used for all subjects in a dataset. The feature selection method is extendable to multiclass problems of MI classification in the BCI paradigm.
Instead of using a filter bank to separate the rhythmic components, a number of bandpass filters are designed to extract the narrowband signals containing the components suitable for movement-related MI classification. In addition to the subband signals, the CSP-based features are also extracted from the fullband (8-35 Hz) EEG signals. The inclusion of the fullband signal has a vital role in the discrimination of MI tasks. This becomes clear when the proposed feature selection approach is implemented: the discriminative features are selected using NCFS from the different subbands as well as from the fullband EEG signals. Different experimental evaluations are conducted for the two-class MI-based EEG classification problem. The obtained results are compared with different recently developed algorithms. The experimental results establish that NCFS-based supervised feature selection with an SVM classifier outperforms the recently developed algorithms. Thus, the proposed combination of fullband and subband signals, as well as the implementation of the feature selection approach, enhances the MI classification accuracy, as presented in Table 3.