Statistical Evaluation of Factors Influencing Inter-Session and Inter-Subject Variability in EEG-Based Brain Computer Interface

Cognitive alterations in the form of diverse mental states have a significant impact on the performance of electroencephalography (EEG) based brain computer interfaces (BCI). Such alterations include changes in concentration level, commonly recognized as being indicated by the alpha rhythm, and drowsiness or mental fatigue that occurs during EEG signal acquisition. Changes in mental state give rise to variability in EEG characteristics across sessions and subjects. Consequently, this variability contributes to a low intention detection rate (IDR) that renders BCI performance unreliable. This study investigates the impact of multiple factors that lead to the poor performance of EEG-based BCI. The effects of five factors on the IDR are examined: 1) concentration level; 2) selection of independent components (ICs); 3) inter-session variability; 4) inter-subject variability; and 5) classification methods. The alpha rhythm, as the indicator of concentration level, is validated, and the relationship between the alpha rhythm and the IDR is studied among sessions. In addition, ICs are examined to determine their effects on the IDR across sessions. The possibility that two sessions contain similar EEG characteristics is also examined, where both sessions are acquired from the same subject on different days. Moreover, the possibility that two different subjects exhibit similar EEG characteristics is examined. To overcome the challenge of variability in EEG dynamics, a feature transfer learning (TL) approach is proposed in this study. Furthermore, three classification methods (TL, K-NN and NB) are compared to determine whether multi-source neural information can improve the classification accuracy of individual sessions or subjects. Three EEG datasets acquired using different paradigms are used for the experiments: a steady state motion visual evoked potential (SSMVEP) dataset, a motor imagery (MI) dataset and the BCI competition IV-a dataset.
Experimental results show that the selection of independent components affects the IDR. IC-2 and IC-11 achieved the lowest and highest accuracies of 51% and 100% for the SSMVEP datasets, while IC-9 and the double component (IC-2 and IC-13) achieved the lowest and highest accuracies of 40% and 69% for the MI datasets, respectively. The second experiment demonstrated that a higher alpha rhythm, accompanied by a lower IDR, corresponds to a lower concentration level, while a lower alpha rhythm, accompanied by a higher IDR, corresponds to a higher concentration level. Moreover, variability within sessions can significantly deteriorate the intention detection rate across sessions: during the inter-session experiment, a decline in accuracy from 82% to 61% and from 56% to 44% was observed across SSMVEP and MI sessions, respectively. Integration of samples from different sessions of the same subject resulted in highest accuracies of 65%, 59% and 40% for the SSMVEP, MI and BCI competition datasets, respectively. Integration of samples from different subjects resulted in highest accuracies of 65%, 44% and 48% for the SSMVEP, BCI competition and MI datasets, respectively. When the three classifiers were compared to determine whether multi-source neural information can improve the classification accuracy of individual sessions, subjects or domains, K-NN and NB achieved highest accuracies of 59% and 52%, respectively, while TL showed a significant increase, with an accuracy of 98% achieved using SSMVEP sessions. In a similar manner, K-NN and NB achieved highest accuracies of 49% and 42%, respectively, using SSMVEP subjects, while TL showed a significant increase, with an accuracy of 64%. Furthermore, when the 9 MI subjects from the BCI competition dataset were used, K-NN and NB achieved highest accuracies of 68% and 65%, respectively, while a significant increase in accuracy was observed when TL was used, with an accuracy of 99%.
In conclusion, the change in alpha rhythm magnitude among sessions significantly affects the IDR across sessions. Component selection across sessions also has significant effects, owing to the non-linear and non-stationary nature of EEG signals. Moreover, merging ICs from different sessions and the inter-subject factor introduce overfitting challenges that result in a low IDR. The classification method is also found to be critical, because some advanced classification methods can improve the classification accuracy.


I. INTRODUCTION
using a pairwise performance associativity technique, while there was a 31% decrease in classification accuracy across sessions during the inter-session experiment [14]. Reference [16] evaluated the inter-subject variability of BCI performance between paradigms and sessions, whereby three experimental paradigms (MI, ERP and SSVEP) were considered to determine performance variations across sessions. Highest average classification accuracies of 72.2%, 96.6% and 95.5% were achieved for MI, ERP and SSVEP, respectively [16]. For the MI paradigm, EEG signals were filtered using a 5th-order Butterworth band-pass filter (8-30 Hz), from which log-variance features were extracted using common spatial pattern (CSP) and classified using a linear discriminant analysis (LDA) classifier to predict two MI classes [17]. For the ERP paradigm, EEG signals were filtered using a 5th-order Butterworth band-pass filter (0.5-40 Hz) [18], from which spatio-temporal features were extracted by calculating the mean amplitudes in 10 discriminant time intervals and classified using an LDA classifier to predict the target character of a 36-symbol ERP speller [19]. Moreover, canonical correlation analysis (CCA) was used to detect four SSVEP stimuli in the form of four frequencies (5.45 Hz, 6.67 Hz, 8.57 Hz and 12 Hz) for the SSVEP BCI system [20]. Reference [21] investigated the feasibility of adding samples from different days to the training set to improve the generalization of an emotion classifier, whereby five sessions on five different days were recorded for each subject. In this case, an equal number of training samples came from day 1, 2, 3 or day 4. The intervals between two consecutive sessions of a subject were randomly selected from four intervals (one day apart, two days apart, one week apart and two weeks apart) [21]. Consequently, average classification accuracies of 64.9%, 68.7%, 70.9% and 73% were achieved for day 1, 2, 3 and day 4, respectively. To achieve the objectives of that investigation, EEG signals were transformed into independent components, from which the power spectral density was estimated for six frequency bands (theta, alpha, beta1, beta2, delta1 and delta2); an SVM was then used to classify three emotion states (neutral, positive and negative) [21], [22], [23].

In this study, we investigate five factors that contribute to a low intention detection rate across sessions and sub-

The rest of the paper is organized as follows. Section II gives a detailed description of the methods used to achieve the objectives of this study, namely EEG signal acquisition, signal pre-processing, feature extraction, feature selection and feature classification. Section III gives a detailed description of the experimental procedures. Section IV presents the results of the IC selection, concentration level, inter-session variability, inter-subject variability and classification method experiments. Section V discusses the results of all five experiments. Finally, conclusions are provided in Section VI.

In this paper, we address the challenge of variability in EEG characteristics that contributes to a low intention detection rate in BCI. As is common practice, the raw EEG signals first go through independent component analysis (ICA). Five factors are investigated: changes in concentration level, selection of ICs, feature classification, inter-session variability and inter-subject variability. The first experiment investigates the impact of concentration level on intention detection

[24]. In this case, our own datasets were filtered using a 50 Hz notch filter, a 0.5-60 Hz band-pass filter and a CAR filter, while the BCI competition IV-a dataset was pre-filtered using a 50 Hz notch filter and a 0.5-100 Hz band-pass filter [25], [26]. The runICA algorithm in BCI2000 was then used to transform the EEG signals into 16 and 22 ICs, respectively [27]. Three sets of features (wavelet, band-power and statistical features) were extracted from artifact-free components, whereby statistical computation, WPT and FFT algorithms were used [5]. The DEFS algorithm was then used to select relevant features with high predictive capacity, while NB and K-NN classifiers were used to predict both SSMVEP and MI classes [23].
BCI competition IV is a publicly available database that includes EEG dataset II-a, acquired in a controlled environment [28]. The dataset consists of four MI classes (left hand, right hand, both feet and tongue) recorded using twenty-two Ag/AgCl electrodes. EEG signals were recorded at a sampling rate of 250 Hz [29]. The electrodes were positioned on the scalp 3.5 cm apart according to the 10-20 positioning system. Subjects were seated facing a computer screen, and a fixation cross '+' was displayed on the screen at t = 0 s to signify the start of a trial. An external stimulus in the form of a beeping sound at t = 2 s was also used at the beginning of a trial. A visual cue, represented by an arrow pointing in one of four directions, was then displayed on the screen for 1.25 s. From t = 3.25 s until t = 6 s, subjects were required to perform the motor imagery task, as illustrated in the experimental paradigm in Figure 2 [29].
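As a rough illustration of the trial timing above, the motor imagery window can be converted into sample indices at the stated 250 Hz sampling rate. This is only a sketch: the function names and the channel-by-sample list layout are assumptions made for illustration, not part of the dataset's tooling.

```python
import math


def mi_window_samples(fs_hz=250.0, start_s=3.25, end_s=6.0):
    """Return (first, last_exclusive) sample indices of the MI window.

    Defaults follow the timing quoted in the text: imagery runs from
    t = 3.25 s to t = 6 s, sampled at 250 Hz.
    """
    first = math.floor(start_s * fs_hz)
    last = math.floor(end_s * fs_hz)
    return first, last


def extract_mi_epoch(trial, fs_hz=250.0):
    """Slice the MI window out of one trial.

    `trial` is assumed to be a list of channels, each a list of samples
    that starts at t = 0 s (the fixation cross).
    """
    first, last = mi_window_samples(fs_hz)
    return [channel[first:last] for channel in trial]
```

With the default values the window spans samples 812 to 1499, i.e. 688 samples per channel.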

II. CLASSICAL BCI IMPLEMENTATION
2) OUR OWN RECORDED EEG DATASETS
Two of our own EEG datasets were recorded from five subjects using a g.tec EEG recording system. None of the participants had BCI training prior to the experiment. Sixteen electrodes positioned on the scalp according to the 10-20 positioning system were used [19].

in radians per second, and the filter order is denoted by n [33].

Moreover, a notch filter at 50 Hz was also applied to both datasets [29], [34].
Furthermore, common average reference (CAR) was applied to reduce the impact of noise from the electrodes and improve the signal-to-noise ratio of the EEG signals. CAR was computed using (2), whereby the potential between the ith EEG channel and the reference is denoted by

In this case, WPT was computed using (5), in which a Daubechies mother wavelet of order 4 (db4) was utilized to transform ICs into seven decomposition levels, i.e., a wavelet packet tree [41], [42]. The output of each decomposition level represents approximation and detail coefficients.
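The common average reference step mentioned above can be sketched in a few lines. This uses the textbook form of CAR, in which the instantaneous mean over all channels is subtracted from every channel; it is an illustrative assumption and not necessarily identical to the paper's equation (2). The function name and the list-of-lists layout are hypothetical.

```python
def common_average_reference(eeg):
    """Apply a textbook common average reference.

    `eeg` is a list of channels, each a list of samples of equal length.
    At every time step the mean across channels is subtracted from each
    channel, so the re-referenced channels sum to zero at that instant.
    """
    n_channels = len(eeg)
    n_samples = len(eeg[0])
    means = [
        sum(channel[t] for channel in eeg) / n_channels
        for t in range(n_samples)
    ]
    return [[channel[t] - means[t] for t in range(n_samples)] for channel in eeg]


# Toy example with three channels and two samples (arbitrary units).
referenced = common_average_reference([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

In the toy example the channel means are 3.0 and 4.0, so the middle channel becomes all zeros and the outer channels become -2.0 and +2.0.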

A wavelet function denoted by ϕ(t) was used to decompose EEG signals into detail coefficients, while a scaling function

Equation (9), representing a distribution factor FD_{j,g}, was utilized to prevent features from being selected more than once in the same feature vector, in which the total number of features is denoted by NF. A suitably chosen positive constant, denoted by a_1, reflects the significance of features in PD. In this case, PD_j represents subsets with a lower fitness than the average fitness of the entire population, DNF represents the desired number of features to be selected, and ND_j represents subsets with a higher fitness than the average fitness of the entire population.

Equation (10) was utilized to compare the previous iteration with the current iteration in order to determine the features that have made a substantial improvement and to grant higher weights to the improved features, which are then utilized in the next iteration. In this case, FD represents the distribution factor [47].

Equation (11) was utilized to determine the number of times a specific feature was utilized within each iteration based on the updated distribution factor [47].

Consequently, several relevant parameters were assigned for the DEFS algorithm to select relevant feature subsets. The desired number of features (DNF) was set to 80 and the population size (PSIZE) was set to 150, while the number of generations (GEN) was set to 1000 as the terminating condition [47].

A transfer learning approach is proposed in this section and compared with two supervised machine learning algorithms, namely K-NN and NB, mainly to investigate whether multi-source neural information can improve the classification accuracy of individual sessions or subjects.

In this case, the same procedure was applied for the remaining two conditions (IC-10(Se1-Se2) and IC-11(Se1-Se2)).

It should be noted that ICA does not have an ordering/sorting mechanism; therefore, the order of an ICA component has no physical meaning. In this paper, we used specific ICA components to discuss their impact on EEG-BCI performance, mainly to demonstrate that different ICA component selections can lead to different performance and, therefore, that ICA component selection is a factor to be considered.

The fourth experiment investigates whether two different subjects possess similar EEG characteristics. Samples from one IC acquired from one subject (S1(IC-9)) were used for training and validation, while samples from the same IC acquired from another subject (S2(IC-9)) were used for testing during feature classification [14], [15]. The same procedure was repeated for the remaining two conditions (IC-10(S1-S2) and IC-11(S1-S2)).

For MI sessions, Se2(IC2) and Se2(IC13) achieved the highest accuracy of 69%, while Se1(IC2) and Se1(IC13) achieved accuracies of only 57% and 48%, respectively, as illustrated in TABLE 1. In this instance, a 12% variation in IDR occurred between Se1(IC2) and Se2(IC2), while a 21% difference was observed between Se1(IC13) and Se2(IC13). Moreover, Se1(IC9) achieved the lowest accuracy of 40%, while Se2(IC9) achieved an accuracy of 62% in MI session 2, a 22% variation.

TABLE 1 shows that individual ICs for SSMVEP sessions yielded significant variation in IDR, from component to component and from session to session. In this case, a significant decline in accuracy between sessions was observed for components IC2, IC5, IC8, IC10 and IC15. Moreover, ICs in both MI sessions achieved a high success rate in most cases, although it varies from IC to IC. Some significant variations across sessions were observed (in this case for IC4, IC9 and IC13). Based on these results, we can conclude that the selection of ICs significantly affects the performance of both MI and SSMVEP based BCI.

The same process was repeated using NB classifier as illustrated in Figure 4

From these experiments, one finds that higher alpha rhythm magnitudes (representing lower concentration levels) were generated in Se1 and Se4 and resulted in lower IDRs. However, lower alpha rhythm magnitudes, corresponding to higher concentration levels, were observed in Se2 and Se3 and resulted in higher IDRs. This indicates a negative correlation between the alpha rhythm magnitude and the IDR, i.e., a higher concentration level is associated with a higher IDR, for both SSMVEP and MI based BCIs. Therefore, the concentration level is a significant factor for the IDR.
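To make the notion of an alpha rhythm magnitude more concrete, the following sketch estimates the power of a signal inside a frequency band with a plain discrete Fourier transform. It is an illustration under stated assumptions: the 8-13 Hz band edges, the 250 Hz sampling rate of the toy signal and the function name are chosen here for the example, and the exact procedure behind the reported µV² values may differ.

```python
import cmath
import math


def band_power(signal, fs_hz, f_lo=8.0, f_hi=13.0):
    """Estimate the mean-square power of `signal` within [f_lo, f_hi] Hz.

    A direct DFT is evaluated only at the bins that fall inside the band.
    The factor 2 accounts for the mirrored negative-frequency bins
    (valid for bins strictly between 0 Hz and the Nyquist frequency).
    If the samples are in microvolts, the result is in µV².
    """
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs_hz / n
        if f_lo <= freq <= f_hi:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)
            )
            power += 2.0 * abs(coeff) ** 2 / n ** 2
    return power


# Toy check: one second of a 10 Hz sine with amplitude 2 sampled at 250 Hz.
fs = 250.0
alpha_like = [2.0 * math.sin(2 * math.pi * 10.0 * t / fs) for t in range(250)]
beta_like = [2.0 * math.sin(2 * math.pi * 20.0 * t / fs) for t in range(250)]
```

For the 10 Hz toy signal the estimate equals the mean-square value of the sine (amplitude squared divided by two), while the 20 Hz signal contributes essentially nothing to the 8-13 Hz band.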

These experiments also reveal that the SSMVEP is not only related to the frequency of flashing, as commonly recognized, but is also affected by the concentration level of the user.

A motor imagery (MI) task is a mental process in which a subject performs imagined limb movements; as such, MI task modulation is highly dependent on concentration [39]. This section investigates whether the concentration levels across sessions affect the IDR of MI based BCIs. Our own recorded MI dataset was used to facilitate the investigation. The highest alpha rhythm magnitude of 232.39 µV², associated with 49% IDR, was observed in the third session (Se3) using the K-NN classifier, while the lowest alpha rhythm magnitude of 202.824 µV² is linked to 62% IDR in the second session (Se2), as illustrated in Figure 5(A). Moreover, similar alpha rhythm magnitudes of 222.04 µV² and 222.47 µV² were observed in the first (Se1) and the fourth (Se4) sessions, where similar IDRs were achieved. Both the lowest alpha rhythm magnitude and the highest IDR were captured in Se2.

Moreover, the NB classifier was further employed to evaluate the relationship between the alpha rhythm magnitudes and IDRs in MI sessions, as illustrated in Figure 5(B). The alpha rhythm magnitude of 232.39 µV², associated with 48% IDR, was observed in the third session (Se3), while the lowest alpha rhythm magnitude of 202.824 µV² is linked to 60% IDR in the second session (Se2). Moreover,

VOLUME 10, 2022

increased to 95%. When the NB classifier is employed, an alpha rhythm magnitude of 27.99 µV² is associated with 88% IDR in the first session (Se1), while after the alpha rhythm magnitude decreased to 23.86 µV² in Se2, the IDR increased to 94%, as illustrated in Figure 6(B). Furthermore, when the alpha rhythm magnitude further decreased to 22.46 µV² in Se3, the IDR increased to 98%.
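The inverse relationship reported above can be checked numerically with a Pearson correlation over the three NB sessions quoted with Figure 6(B). This is a minimal sketch: the helper name is illustrative, and with only three points the coefficient indicates direction rather than statistical significance.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Values quoted in the text for Se1, Se2 and Se3 (NB classifier).
alpha_uv2 = [27.99, 23.86, 22.46]   # alpha rhythm magnitude in µV²
idr_percent = [88.0, 94.0, 98.0]    # intention detection rate in %

r = pearson_r(alpha_uv2, idr_percent)
```

For these three sessions the coefficient is strongly negative, which is consistent with a lower alpha rhythm magnitude accompanying a higher IDR.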

From these experiments, one finds that a lower alpha rhythm magnitude corresponds to a higher concentration level, accompanied by a higher IDR. This further validates the negative correlation between alpha rhythm magnitudes and IDRs for the MI tasks.

Figure 7(B). The first dataset in this instance yielded significant results as compared to the second dataset. Moreover, a larger decline in accuracy was observed when the K-NN classifier is employed as compared to the NB classifier. However, IDRs vary across sessions and across datasets acquired from the same subject on different days. Based on these results, we can conclude that session-to-session variability does affect the performance of SSMVEP based BCI [10], [14], [16].

This section further investigates the effect of session-to-session variability on BCI performance with a focus on the MI task. Two MI datasets (MI dataset-1 and MI dataset-2), recorded from the same subject on different days, are used to facilitate the investigation, in which the IDRs across inter-sessions and intra-sessions are evaluated. MI dataset-1 is used to extract the features and to train and test the classifier. Similarly, MI dataset-2 is used to extract the features and to train and test the classifier. An average IDR of 56% is achieved in the first dataset, while average IDRs of 44% and 52% are achieved in the second dataset for the K-NN and NB classifiers, respectively. The IDR variation across sessions due to multiple factors thus resulted in a 12% and a 4% drop in IDR between the two datasets using the K-NN and NB classifiers, respectively. In this case, Se3 achieved the lowest accuracies of 49% and 48%, while Se2 and Se4 achieved the highest accuracies of 62% and 64% across all four sessions in MI dataset-1 using the K-NN and NB classifiers, respectively; i.e., the highest declines in accuracy of 13% and 12% were observed between Se2 and Se3 for the K-NN and NB classifiers, respectively, as illustrated in Figure 8(A). Furthermore, Se2 achieved the highest accuracies of 47% and 54%, while Se4 achieved the lowest accuracies of 37% and 48% across all four sessions in MI dataset-2 for the K-NN and NB classifiers, respectively. Consequently, the highest declines in accuracy of 9% and 5% were observed between Se3 and Se4 for the K-NN and NB classifiers, respectively, as illustrated in Figure 8(B). Moreover, sessions in the first dataset yielded higher IDRs than sessions in the second dataset. However, IDRs fluctuate across sessions, which in turn affects the overall success rate. As such, we can conclude that session-to-session variability does affect the performance, in the form of success rate, of MI based BCI [16], [21].

Moreover, the same experiment was repeated using the NB algorithm to classify both (Se1-Se1) and (Se1-Se2) samples as illustrated in

Samples from IC9 acquired from the second subject, denoted as IC9(S2), are used for testing the classifier, using the same method and parameters to extract features from IC9(S2) as were used for IC9(S1). This procedure is denoted by CS, representing the cross-subject experiment, while WS represents the within-subject experiment. The resulting IDRs are denoted as IC9(S1-S2), indicating that IC9 of the first subject is used to classify IC9 of the second subject. IC10 and IC11 are also considered for the same reason. Therefore, the same procedure is repeated on IC10 and IC11 for both the SSMVEP and MI datasets. In this case, the K-NN algorithm was used to classify both (S1-S1) and (S1-S2) samples.

TABLE 3 shows the results, from which one finds that IC11(S1-S2) achieved the highest IDR of 65% in the inter-subject experiments for the SSMVEP dataset, while IC9(S1-S2) and IC10(S1-S2) achieved only 30% and 38%, respectively. Moreover, IC10(S1-S2) achieved the highest accuracy of 48%, while IC9(S1-S2) and IC11(S1-S2) achieved 35% and 41%, respectively, for the MI dataset.

Similarities in EEG characteristics between different subjects are further explored using the BCI competition IV-a dataset for inter-subject experiments, where IC9, IC10 and IC11 are used as examples to study the effects of inter-subject EEG characteristics on IDRs. TABLE 3 also shows these results, where IC9(S1-S2) achieved the highest accuracy of 44%, while IC10(S1-S2) and IC11(S1-S2) achieved 28% and 34%, respectively.
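The within-subject (WS) and cross-subject (CS) protocol described above can be sketched with a small K-NN classifier. Everything in this example is hypothetical: the two-dimensional toy features stand in for features extracted from one IC, the value k = 3 is an assumption, and the resulting accuracies only illustrate how a shift between subjects can lower the CS score; they are not the paper's results.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test samples classified correctly."""
    hits = sum(
        knn_predict(train_x, train_y, q, k) == y for q, y in zip(test_x, test_y)
    )
    return hits / len(test_y)


# Hypothetical features from one IC of subject 1 (training set).
s1_train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
s1_train_y = [0, 0, 0, 1, 1, 1]

# WS: held-out samples from the same subject.
s1_test_x, s1_test_y = [(0.05, 0.1), (1.0, 0.95)], [0, 1]

# CS: samples from subject 2, whose feature distribution is shifted.
s2_test_x = [(0.9, 0.8), (1.0, 1.1), (2.0, 2.0), (1.9, 2.1)]
s2_test_y = [0, 0, 1, 1]

ws_accuracy = accuracy(s1_train_x, s1_train_y, s1_test_x, s1_test_y)
cs_accuracy = accuracy(s1_train_x, s1_train_y, s2_test_x, s2_test_y)
```

In this toy setting the WS accuracy is 1.0, whereas the CS accuracy drops to 0.5 because the shifted class-0 samples of the second subject fall next to the first subject's class-1 cluster.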

Based on these results, it is worth noting that the SSMVEP based BCI yielded a higher inter-subject IDR when IC11(S1-S2) is used. However, there was a 29% decline in IDR when compared to IC11(S1-S1), where (S1-S1) represents training, validation and testing samples from a single subject. A 21% decline in IDR was observed when IC11(S1-S2) is compared with IC11(S1-S1) for the MI based BCI, whereby IC11(S1-S1) achieved an accuracy of 69%. Moreover, a 46% decline in IDR was observed when IC9(S1-S2) is compared with IC9(S1-S1) for the BCI competition IV-a dataset, whereby IC9(S1-S1) achieved an accuracy of 90%. Using

classifier. In a similar manner, TL achieved highest accuracies of 95% and 89% when Se4 and Se8 are the target domains, respectively. Moreover, accuracies of 37% and 52% were observed for NB, while 44% and 55% were observed for K-NN across Se4 and Se8. An accuracy of 88% was obtained for TL with Se2 and 69% with Se5 as target domains. However, a decline in accuracy was observed, whereby K-NN achieved accuracies of 40% and 55%, while NB achieved accuracies of 26% and 49% across Se2 and Se5, respectively. Furthermore, accuracies of 48% and 50% were obtained for TL when Se1 and Se9 are the target domains; accuracies of 52% and 43% were observed for NB, while K-NN achieved accuracies of 57% and 46%, respectively. Moreover, the lowest accuracy of 24% was obtained when Se3 is the target domain, while the rest of the sessions (Se1∼Se2 and Se4∼Se9) are source domains. In this case, noise, a low concentration level or mental fatigue can affect transferability across domains, which in turn contributes to the low IDR when Se1, Se3 and Se9 are considered as target domains.

Based on these results, it is worth noting that the transfer learning approach can significantly improve classification accuracy, as compared to using samples from multiple sessions to train the classifier and samples from a single session to test it. In this case, a significant decline in accuracy was observed for both K-NN and NB when samples from 8 different sessions are used to train and 1 session to test the classifier. However, a significant increase in accuracy was observed when knowledge from multiple source domains is used to enhance the learning performance of the target domain, as depicted in Figure 9.

In a similar manner, TL achieved an accuracy of 53% when S3 and S4 are each the target domain. Moreover, accuracies of 25% and 37% were observed for NB, while 45% and 40% were observed for K-NN across S3 and S4, respectively.

The lowest accuracy of 50% was achieved when S1 is the target domain and the other subjects (S2∼S5) are source domains. A decline in accuracy was also observed when samples from S1 are used to test and samples from (S2∼S5) are used to train the classifier: K-NN and NB achieved accuracies of 40% and 37%, respectively.

In this case, evaluating commonalities across different subjects by using samples from multiple subjects to train the classifier and samples from another single subject to test it resulted in a significant decline in accuracy, as depicted in Figure 10. However, an increase in IDR was observed when TL is employed to transfer knowledge from the source domains to a target domain, while the IDR deteriorated for both K-NN and NB when samples from five subjects are used to train and samples from a single subject to test the classifiers. Moreover, based on these results, it is also worth noting that all 5 subjects were BCI illiterate and had no BCI training prior to the experiment; as such, a feeling of tiredness was observed during SSMVEP signal acquisition, which may in turn affect the quality of the EEG signal and result in a low prediction rate.

FIGURE 11. Classification results using nine MI subjects.
To further investigate whether neural information from var-

In a similar manner, TL achieved highest accuracies of 91% and 94% when S1 and S8 are the target domains, respectively.

was observed across S6, whereby NB achieved an accuracy of 42% and K-NN an accuracy of 51%, while an increase in accuracy was observed when S6 is the target domain, in which case TL achieved an accuracy of 69%. Moreover, the lowest accuracy of 52% was obtained when S2 is the target domain, while the rest of the subjects (S1 and S3∼S9) are source domains.

From these results, it is worth noting that learning tasks in unrelated domains during BCI training tend to show intense individual variations across subjects. However, a significant increase in accuracy or learning performance was observed across domains when TL is employed to transfer features from the source domains to the target domains. In this case, the TL approach shows a significant increase in classification accuracy (CA) as compared to both K-NN and NB when samples from 8 subjects are used to train and samples from a different single subject are used to test the classifier. Moreover, significant variations in CA across all 9 target domains were observed, and based on these results it is also worth noting that variability in neural dynamics across different subjects can result in negative transfer, which in turn deteriorates CA across target domains.

The performance of all three classifiers is further evaluated to determine whether features from different sources can enhance the prediction rate of individual sessions/subjects.

A significant decline in IDR was observed when EEG signals acquired from different subjects were merged. The highest accuracy of 65% was observed for SSMVEP tasks when samples from IC12 of the first subject (S1) were incorporated with samples from IC12 of the second subject (S2). A maximum accuracy of 48% was observed for MI tasks when samples from IC11 of the first subject (S1) were incorporated with samples from IC11 of the second subject (S2). This is interpreted as a significant adverse effect of inter-subject variability on IDRs; meanwhile, the existence of common inter-subject characteristics is also supported by the experimental results, although such characteristics might be weak.

Using neural information from multiple sources to enhance the learning performance of the target domain resulted in a significant increase in CA, while a decline in CA was observed when samples from multiple sources are used to train the classifiers and samples from a single source to predict. Transfer learning in this instance yielded a significantly higher accuracy than both the K-NN and NB classifiers using SSMVEP sessions, whereby the highest accuracy of 98% was achieved when Se6 is the target domain and the remaining sessions (Se1∼Se5 and Se7∼Se8) are source domains, as depicted in Figure 9. In this case, K-NN and NB achieved accuracies of 59% and 52%, which are significantly lower than that of TL. In a similar manner, TL yielded the highest accuracy of 64% as compared to both K-NN and NB using SSMVEP subjects, when S5 is the target domain and (S1∼S4) are source domains. K-NN yielded its highest accuracy of 49% when samples from (S1∼S4) are used to train and S5 to predict, while NB achieved its highest accuracy of 37% when samples from (S2∼S5) are used to train and S1 to predict, as demonstrated in Figure 10. Furthermore, TL achieved the highest accuracy of 99% as compared to both K-NN and NB using the BCI competition IV-a dataset, when S3 is the target domain and (S1∼S2 and S4∼S9) are source domains. K-NN obtained its highest accuracy of 68% when samples from (S1∼S2 and S4∼S9) are used to train and S3 to predict, while NB achieved its highest accuracy of 65% when samples from (S1∼S7 and S9) are used to train and S8 to predict, as depicted in Figure 11.

This study investigated the impact of five factors contributing to the low IDR of EEG based BCIs: the selection of ICs, changes in concentration level, inter-session variability, inter-subject variability and feature classification. Three datasets (BCI competition IV-a and our own recorded SSMVEP and MI datasets) were used to facilitate the investigation. After ICA, 16 independent components were obtained and used in all experiments. Significantly varying IDRs were captured, corresponding to the following factors: (i) alpha rhythm magnitude, as the indicator of concentration level, (ii) variability within components, (iii) variability within sessions, (iv) variability within subjects, and (v) classification methods.