Eliminating or Shortening the Calibration for a P300 Brain–Computer Interface Based on a Convolutional Neural Network and Big Electroencephalography Data: An Online Study

A brain-computer interface (BCI) measures and analyzes brain activity and converts it into computer commands to control external devices. Traditional BCIs usually require full calibration, which is time-consuming and makes BCI systems inconvenient to use. In this study, we propose an online P300 BCI spelling system with zero or shortened calibration based on a convolutional neural network (CNN) and big electroencephalography (EEG) data. Specifically, three methods are proposed to train CNNs for the online detection of P300 potentials: (i) training a subject-independent CNN with data collected from 150 subjects; (ii) adapting the CNN online via a semisupervised learning/self-training method based on unlabeled data collected during the user’s online operation; and (iii) fine-tuning the CNN with a transfer learning method based on a small quantity of labeled data collected before the user’s online operation. Note that the calibration process is eliminated in the first two methods and dramatically shortened in the third method. Based on these methods, an online P300 spelling system is developed. Twenty subjects participated in our online experiments. Average accuracies of 89.38%, 94.00% and 93.50% were obtained by the subject-independent CNN, the self-training-based CNN and the transfer learning-based CNN, respectively. These results demonstrate the effectiveness of our methods, and thus, the convenience of the online P300-based BCI system is substantially improved.


I. INTRODUCTION
A brain-computer interface (BCI) provides a direct human-machine interaction pathway between the brain and external devices without relying on the peripheral nervous system and muscles [1]. It acquires brain signals and translates them into computer commands to control external devices. Electroencephalography (EEG)-based BCIs are some of the most commonly used BCIs. They mainly include P300-based BCIs, steady-state visual evoked potential (SSVEP)-based BCIs, and motor imagery (MI)-based BCIs. In this study, we mainly focus on P300-based BCIs.
A BCI usually requires a subject-specific calibration phase, during which the user performs a specific task while labeled EEG data are recorded for training a subject-specific EEG decoding model. However, the calibration phase is generally tedious and time-consuming, making BCIs inconvenient to use. Some attempts have been made to completely eliminate the calibration phase and build BCIs that can be operated instantly. Such BCIs are usually called zero-calibration/training BCIs or calibration-free BCIs. To build a zero-calibration BCI, a natural idea is to employ a subject-independent model for EEG decoding. Researchers have conducted various offline studies to build subject-independent P300 detection models. These models are usually obtained using one of two approaches, i.e., the pooled approach and the ensemble approach [2]. The pooled approach involves training a model such as a convolutional neural network (CNN) [3], [4], [5], [6] or a hierarchical recurrent network [7] on a pool of data derived from multiple subjects to extract patterns that are invariant across subjects and then using the obtained model to directly predict for new users. The ensemble approach combines a committee of weak models learned from the EEG data of a pool of subjects or a single subject to create a subject-independent model [8], [9]. Previous studies on building subject-independent models generally achieved accuracies of approximately 60%-90% in offline analyses. In addition to these offline models, in [10], an online zero-calibration P300 spelling system was developed based on a CNN trained with a large dataset containing EEG data from 55 subjects. Another idea for building a zero-calibration BCI is to apply semisupervised learning methods.
Researchers first trained a subject-independent model, for example, one based on a support vector machine (SVM) [11] or a linear discriminant analysis (LDA) classifier [12], [13], and then adapted the model based on EEG data recorded from the user and the corresponding labels predicted by the model. In this way, the labeled data for model pretraining are entirely collected from other users, and a subject-specific calibration phase for the user is not needed. However, to our knowledge, such online P300 systems have rarely been implemented.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In addition to eliminating the calibration process, other approaches aim to shorten the calibration time. When a training set containing EEG signals collected from a pool of subjects is available, researchers typically use this training set along with a small quantity of subject-specific calibration data to build models based on transfer learning methods. For instance, a classifier based on an xDAWN filter [14], CNNs [15], [16], [17], and a reinforcement learning model [18] were trained on data acquired from a pool of subjects and then adapted with subject-specific labeled data via incremental training or model fine-tuning. In probabilistic frameworks, each parameter of the subject-specific models shared a prior learned from a pool of subjects and was optimized using subject-specific data [19], [20]. Riemannian geometry methods apply an affine transformation to the covariance matrices of different subjects to re-center them with respect to a reference covariance matrix and then classify them using minimum-distance-to-mean (MDM) classifiers [21], [22], [23], [24]. In [25], a small quantity of user data was added to each of several training datasets, each containing data from one subject, and the model was trained by an ensemble method. These methods generally obtain accuracies of approximately 75%-90% in offline analyses. For online implementation, several P300 spelling systems [26], [27], [28] and a robot control system [29] based on transfer learning were proposed, achieving accuracies of approximately 80%-90%. When a training set containing EEG signals collected from a pool of subjects is unavailable, some other studies trained their models on a small quantity of subject-specific labeled data together with unlabeled data recorded during use. These studies were mainly based on semisupervised learning algorithms.
For instance, a model was initially trained with a small quantity of subject-specific data and then adapted with unlabeled data [30], [31]. In [32], two models were first trained with a small quantity of labeled calibration data, and then the models taught each other to build a final classifier with unlabeled data using a cotraining algorithm. In [33], the relationship between unlabeled data and labeled data was used to define a penalty term for a regularized discriminant analysis model. Most of these semisupervised learning-based models have achieved accuracies of approximately 80%. Beyond these offline analyses, Gu et al. extended this line of research to online practice and achieved accuracies above 85% [34], [35]. Although existing studies have shown that various methods can build models or BCI systems with zero or shortened calibration, such studies are still in their infancy. Most previous studies did not use large training datasets, which are more likely to capture individual diversity and make it possible to learn brain patterns that are invariant across subjects. Additionally, most studies only evaluated their models via offline analyses, which still require online validation. The performance of the existing online BCI systems with zero or shortened calibration also needs further improvement. Therefore, most existing studies can hardly meet practical requirements.
In this study, based on a CNN and big EEG data, an online P300 BCI spelling system with zero-calibration or shortened calibration is developed. Specifically, three methods for training cross-subject P300 detection models are proposed, including (i) training a subject-independent CNN with a dataset containing EEG signals collected from 150 subjects; (ii) adapting the CNN trained in (i) online via a self-training algorithm based on unlabeled data collected during the user's online operation; and (iii) fine-tuning the CNN trained in (i) through a transfer learning method with a small quantity of labeled data, which are collected during a calibration phase before the user's online operation. Based on these methods, an online P300 BCI spelling system is developed. Twenty healthy subjects participated in our online experiments. The experimental results demonstrated that with the help of a CNN and a training dataset collected from a large pool of subjects, an online P300 BCI with zero-calibration or shortened calibration can be established, which will substantially improve the convenience of the use of P300 BCIs.
The remainder of this paper is organized as follows. Section II presents the utilized methods, including those for data acquisition, P300 detection model establishment, and online decision making. The experimental implementation and results are presented in Section III, and a discussion is provided in Section IV. Finally, the conclusion in Section V reviews the approach developed in this paper.

A. Equipment
During the experiment, EEG signals were collected at a sampling rate of 1,000 Hz with a 30-channel EEG cap (LT 37) following the extended 10-20 system and referenced to the right mastoid. A SynAmps2 amplifier (Compumedics, Neuroscan, Inc., Australia) was used to collect EEG signals. All electrode impedances were maintained below 5 kΩ during the experiment.

B. Subjects
Twenty healthy subjects (14 males and 6 females, aged between 21 and 41 years, average age 25.85 years) participated in all online experiments, which are detailed in Section III-A. The study was approved by the Ethics Committee of Guangzhou First People's Hospital, China. Written informed consent was obtained from all subjects.

C. Graphical User Interface
The graphical user interface (GUI) of the proposed P300 spelling system is shown in Fig. 1. A 4 × 10 button matrix of characters was presented to each subject for stimulus presentation. The paradigm was the same as that employed in our previous study [5]. Specifically, in each trial corresponding to one character input, none of the buttons were intensified during the 3 s before stimulus onset, giving the subject time to prepare. Upon onset, all buttons started to flash successively in a random order. Each flash lasted for 100 ms, and the interval between the onsets of two successive flashes was 30 ms; thus, any pair of successive flashes overlapped by 70 ms. Each of the 40 buttons flashed once in each round, and 10 rounds of button flashes formed a trial, with no pause between adjacent rounds. Therefore, it took [(400 − 1) × 30 + 100] ms = 12.07 s to complete the 400 flashes in a trial.
During each trial, to input a character, the subject was instructed to focus his/her attention on the flashes of the character he/she intended to input (i.e., the target) and to keep a running mental count of the number of flashes.
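The trial timing above can be checked with a short script. This is a minimal sketch, not the authors' stimulus code: the helper names are hypothetical, and reshuffling the 40 buttons independently in every round is an assumption, since the text only states that the buttons flash in a random order and that each button flashes once per round.

```python
import random

FLASH_MS = 100   # duration of each flash (ms)
SOA_MS = 30      # interval between onsets of successive flashes (ms)
N_BUTTONS = 40   # N_c
N_ROUNDS = 10    # N_r


def flash_schedule(n_buttons=N_BUTTONS, n_rounds=N_ROUNDS, seed=None):
    """Return (onset_ms, button_index) pairs for one trial.

    Assumption: every round flashes each button exactly once, in a freshly
    shuffled order, with no pause between rounds.
    """
    rng = random.Random(seed)
    schedule = []
    k = 0  # running flash counter across rounds
    for _ in range(n_rounds):
        order = list(range(n_buttons))
        rng.shuffle(order)
        for button in order:
            schedule.append((k * SOA_MS, button))
            k += 1
    return schedule


def trial_duration_ms(n_buttons=N_BUTTONS, n_rounds=N_ROUNDS):
    """Time from the first flash onset to the end of the last flash."""
    n_flashes = n_buttons * n_rounds
    return (n_flashes - 1) * SOA_MS + FLASH_MS


print(trial_duration_ms())   # 12070 ms = 12.07 s
print(FLASH_MS - SOA_MS)     # 70 ms overlap between successive flashes
```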

D. A Subject-Independent CNN Model
In this study, a subject-independent CNN, which was established for an offline analysis in our previous study [5], was applied as one of the three P300 detection models. We briefly review the method for training the CNN model in this section for the sake of the completeness of this paper.
1) Training Set Construction: We applied a large EEG dataset collected in our previous study [5] as a training set. To build this dataset, we recruited 150 subjects (128 males and 22 females between 18 and 32 years of age) in an experiment to collect training data. Each subject performed 60 character input trials. During this phase, the target of each trial was randomly specified by the system rather than freely determined by the subject.
2) Data Preprocessing: The EEG signals were first bandpass filtered at 0.5-10 Hz using a fourth-order Butterworth filter. After that, epochs from 0 to 600 ms after the onset of each button flash were extracted and then downsampled by a factor of 24. Consequently, each trial contained $N_c \cdot N_r$ epochs, and each epoch contained $1000\ \text{Hz} \times 0.6\ \text{s} \times \frac{1}{24} = 25$ sampling points for each channel. Here, $N_c$ and $N_r$ are the numbers of buttons (40 in this study) and rounds (10 in this study), respectively. Finally, the signals of each epoch were normalized as follows:

$$\tilde{f}_{i,j} = \frac{f_{i,j} - \bar{f}_i}{\sigma_i},$$

where $f_{i,j}$ and $\tilde{f}_{i,j}$ are the unnormalized and normalized signals of channel $i$ at sampling point $j$, respectively, and $\bar{f}_i$ and $\sigma_i$ are the mean and standard deviation of the signal of channel $i$ in the epoch, respectively. After preprocessing, the data of each epoch formed a $30 \times 25$ matrix denoted as $F_{n_s,n_t,n_r,n_c}$, where $n_s$ is the subject index (ranging from 1 to $N_s$), $n_t$ is the trial index (ranging from 1 to $N_t$), $n_r$ is the flash round index (ranging from 1 to $N_r$), and $n_c$ is the character index (ranging from 1 to $N_c$). Herein, $N_r = 10$ and $N_c = 40$.
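The preprocessing chain can be sketched in Python with NumPy and SciPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses zero-phase filtering (`scipy.signal.sosfiltfilt`), whereas the text does not say whether the online filter was causal; it passes `order=4` to `scipy.signal.butter`, which for a band-pass design yields a filter of order 8 overall; and it downsamples by keeping every 24th sample, which is reasonable here only because the signal has already been band-limited to 10 Hz. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 1000        # sampling rate (Hz)
EPOCH_MS = 600   # epoch length after flash onset (ms)
DECIM = 24       # downsampling factor: 600 samples -> 25 points


def bandpass(eeg, low=0.5, high=10.0, order=4, fs=FS):
    """Zero-phase Butterworth band-pass; eeg has shape (channels, samples)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=-1)


def extract_epoch(filtered, onset_sample):
    """Cut 0-600 ms after a flash onset and keep every 24th sample."""
    n = FS * EPOCH_MS // 1000                      # 600 samples
    seg = filtered[:, onset_sample:onset_sample + n]
    return seg[:, ::DECIM]                         # (channels, 25)


def normalize(epoch):
    """Per-channel z-score within the epoch: (f - mean) / std."""
    mean = epoch.mean(axis=1, keepdims=True)
    std = epoch.std(axis=1, keepdims=True)
    return (epoch - mean) / std


# Usage on synthetic data: 30 channels, 20 s of noise standing in for EEG.
rng = np.random.default_rng(0)
raw = rng.standard_normal((30, 20_000))
x = normalize(extract_epoch(bandpass(raw), onset_sample=5_000))
print(x.shape)   # (30, 25)
```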
To reduce the influence of the low EEG signal-to-noise ratios (SNRs) and the short interstimulus intervals (ISIs) of the stimuli, we averaged the preprocessed signals corresponding to the first $n_r$ ($n_r = 1, 2, \ldots, N_r$) rounds in a trial as follows:

$$X_{n_s,n_t,n_r,n_c} = \frac{1}{n_r} \sum_{k=1}^{n_r} F_{n_s,n_t,k,n_c}.$$

In our online study, only $X_{n_s,n_t,N_r,n_c}$ ($n_s = 1, 2, \ldots, N_s$; $n_t = 1, 2, \ldots, N_t$; $n_c = 1, 2, \ldots, N_c$) were used for both model training and online prediction.
A sample $X_{n_s,n_t,n_r,n_c}$ was labeled as a positive sample if and only if its corresponding character $n_c$ was the target of the current trial; otherwise, it was labeled as a negative sample.
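The round averaging and labeling rule can be written compactly with a cumulative sum. In this sketch the epochs of one trial are assumed to be stored as an array of shape (N_r, N_c, channels, points), a hypothetical layout chosen for illustration; `X[n_r - 1]` then holds $X_{n_s,n_t,n_r,n_c}$ for all characters of the trial at once.

```python
import numpy as np


def cumulative_round_average(F):
    """F: (N_r, N_c, channels, points) preprocessed epochs of one trial.

    Returns X of the same shape, where X[n_r - 1] is the mean of rounds 1..n_r.
    """
    csum = np.cumsum(F, axis=0)
    counts = np.arange(1, F.shape[0] + 1).reshape(-1, 1, 1, 1)
    return csum / counts


def make_labels(target_index, n_chars=40):
    """1 for the target character's sample, 0 for the other N_c - 1 samples."""
    y = np.zeros(n_chars, dtype=int)
    y[target_index] = 1
    return y


# Usage with random data in place of real epochs.
rng = np.random.default_rng(1)
F = rng.standard_normal((10, 40, 30, 25))
X = cumulative_round_average(F)
X_online = X[-1]            # average over all N_r = 10 rounds, shape (40, 30, 25)
y = make_labels(target_index=7)
```

Each trial thus yields one positive and 39 negative samples, which is the 1:39 class imbalance mentioned in the training description.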
3) CNN Architecture: We built a CNN with the architecture shown in Fig. 2 for cross-subject P300 detection. This network architecture is similar to the one used in [36]. It contains three convolutional layers and two fully connected layers. All layers except FC5 use the rectified linear unit (ReLU) activation function, while FC5 uses the logistic sigmoid function. The network takes the preprocessed data $X_{n_s,n_t,n_r,n_c}$ as its input, and its output can be regarded as the modeled probability of the presence of a P300 potential, $P(y = 1 \mid X_{n_s,n_t,n_r,n_c}; M)$, where $y$ is the binary label indicating the presence (1) or absence (0) of a P300 potential and $M$ is the model for P300 potential detection.
4) Subject-Independent CNN Model Training: The subject-independent CNN model was the same as the model employed in our previous offline study [5]. It was established by training a CNN with the architecture described in Section II-D.3 offline using the large training set described in Section II-D.1. The convolutional kernels and weights of the network were initialized with the Xavier initialization method [37]. The model was trained using adaptive moment estimation (Adam) [38] to optimize the mean-squared error (MSE). Since the ratio of positive to negative samples in the training set was 1 : 39, the loss function was weighted by multiplying the loss terms of the positive samples by 39. The model was trained on an NVIDIA GeForce GTX 1080 Ti GPU with CUDA 9.0 and cuDNN v7 using TensorFlow [39].
5) Online Decision Making: In this study, the subject-independent CNN was employed online as a P300 detection model for the proposed system. Specifically, in each trial, once the system stopped the stimulus presentation process, each preprocessed signal segment was input into the model, whose output was regarded as the probability of the presence of a P300 potential, $P(y = 1 \mid X_{n_s,n_t,n_r,n_c}; M)$. The system output the character with the maximum probability of P300 potential presence as the predicted target, i.e.,

$$\hat{n}_c = \arg\max_{n_c \in \{1, \ldots, N_c\}} P(y = 1 \mid X_{n_s,n_t,n_r,n_c}; M).$$

With the subject-independent model, users could operate the system instantly without subject-specific calibration.
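The weighted loss and the decision rule can be illustrated in NumPy. This is a sketch, not the training code used in the study: the exact normalization of the weighted MSE is not given in the text, so averaging over all samples is an assumption. In TensorFlow/Keras, a comparable weighting could be applied by passing per-sample weights through the `sample_weight` argument of `Model.fit`.

```python
import numpy as np

POS_WEIGHT = 39.0  # one target vs. 39 non-targets per round -> 1:39 imbalance


def weighted_mse(y_true, y_pred, pos_weight=POS_WEIGHT):
    """MSE with the squared error of each positive sample scaled by pos_weight."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.where(y_true == 1, pos_weight, 1.0)
    return float(np.mean(w * (y_true - y_pred) ** 2))


def decide(probs):
    """Index of the character whose sample has the largest P(y=1 | X; M)."""
    return int(np.argmax(probs))


# Usage: 40 model outputs for one trial with the target at index 3 (toy values).
y_true = np.zeros(40)
y_true[3] = 1
probs = np.full(40, 0.1)
probs[3] = 0.8
loss = weighted_mse(y_true, probs)
choice = decide(probs)   # 3
```

With this weighting, the single positive sample contributes as much to the loss as the 39 negatives combined when their errors are equal.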

E. A Subject-Specific CNN Model Adapted Online by Self-Training
In the following, we propose a semisupervised learning/self-training method to adapt the CNN model online and improve its performance. Specifically, the user operated the BCI at the beginning without calibration, and the subject-independent CNN was employed as the P300 detection model. After 10 character input trials, the model was automatically adapted online based on the subject-independent model and the data derived from the 10 trials by using the self-training algorithm presented in Algorithm 1. In the next 10 trials, the updated model was employed instead of the subject-independent model for P300 detection and target character identification. Then, the model was adapted online once again based on data recorded in trials 11-20 using the self-training method, and the obtained model was used in the remaining trials.
Algorithm 1 Self-Training
1: repeat
2: Apply the CNN model to the data from N trials. For each trial, obtain a predicted label as well as a probability indicating the confidence of the prediction.
3: Select the 2n trials with the largest probabilities (n is the index of the current iteration).
4: Retrain the CNN model using the data from the selected 2n trials with the predicted labels.
5: until the maximum number of iterations (5 in this study) is reached.

F. A Subject-Specific CNN Model Fine-Tuned by Transfer Learning
We further propose a transfer learning method to adapt the CNN model and improve its performance. Specifically, before the online operation, the user performed a calibration task containing five character input trials. During the calibration process, the target character for each trial was cued by the computer. The subject-independent CNN was fine-tuned using the labeled calibration data, and the fine-tuned CNN was used for online prediction. As described in Section II-C, each trial involved 12.07 s of stimulus presentation. Therefore, the calibration task took approximately 1 min (5 × 12.07 s ≈ 60 s) for each user, which is much shorter than a full calibration process.
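Algorithm 1 (Section II-E) can be sketched in a model-agnostic way. The sketch below is hypothetical: `predict_proba` and `retrain` are placeholder hooks standing in for CNN inference and retraining. The retraining hook receives the current model so that a real implementation could fine-tune from it, while the toy usage replaces the CNN with a linear template matcher that simply refits. With N = 10 trials and five iterations, 2n grows to 10, so the last retraining step uses all trials with their predicted labels.

```python
import numpy as np


def self_train(model, predict_proba, retrain, trials, n_iter=5):
    """Sketch of Algorithm 1: self-training with predicted (pseudo) labels.

    model         : initial P300 detector, e.g. the subject-independent model
    predict_proba : hook(model, epochs) -> P(y=1 | X; M) for the N_c epochs
                    of one trial, shape (N_c,)
    retrain       : hook(model, X, y) -> updated model (hypothetical)
    trials        : unlabeled round-averaged epochs, shape (N, N_c, ...)
    """
    n_trials, n_chars = trials.shape[:2]
    for n in range(1, n_iter + 1):
        # Step 2: predicted target and its probability for every trial.
        probs = np.stack([predict_proba(model, t) for t in trials])  # (N, N_c)
        pred = probs.argmax(axis=1)
        conf = probs.max(axis=1)
        # Step 3: keep the 2n trials with the largest probabilities.
        k = min(2 * n, n_trials)
        chosen = np.argsort(conf)[::-1][:k]
        # Step 4: retrain on those trials; the predicted target epoch of each
        # trial is labeled 1 and its other N_c - 1 epochs are labeled 0.
        X = trials[chosen].reshape(k * n_chars, *trials.shape[2:])
        y = np.zeros((k, n_chars))
        y[np.arange(k), pred[chosen]] = 1
        model = retrain(model, X, y.ravel())
    return model  # Step 5: stop after n_iter iterations


# Toy usage: a linear template matcher stands in for the CNN.
rng = np.random.default_rng(0)
D = 50
pattern = rng.standard_normal(D)
targets = rng.integers(0, 40, size=10)
trials = rng.standard_normal((10, 40, D))
trials[np.arange(10), targets] += 2.0 * pattern  # P300-like response on targets


def toy_proba(w, epochs):
    return 1.0 / (1.0 + np.exp(-(epochs @ w) / np.sqrt(D)))


def toy_retrain(w, X, y):
    # Ignores the previous w; a real CNN would be fine-tuned from it.
    return X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)


w0 = pattern + rng.standard_normal(D)  # imperfect "pretrained" detector
w = self_train(w0, toy_proba, toy_retrain, trials)
predicted = [int(np.argmax(toy_proba(w, t))) for t in trials]
```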

A. Experiments
Twenty subjects participated in three online experiments. The order of the experiments was random for each subject. Experiments I, II and III correspond to spelling tasks in which the subject-independent CNN, the self-training-based CNN and the transfer learning-based CNN, respectively, were employed.
Experiment I: An online test was conducted for the system with the subject-independent model. Specifically, each subject performed a spelling task involving the spelling of the following 40 characters: "THE FIVE BOXING WIZARDS JUMP QUICKLY. -510641?".
Experiment II: An online test was conducted for the system with the self-training-based model. Each subject spelled the same characters as those in Experiment I. The experiment, containing 40 character spelling trials, was divided into three stages. The first stage (trials 1-10) employed the subject-independent model, while the second stage (trials 11-20) and the third stage (trials 21-40) employed the models adapted once (using the data from trials 1-10) and twice (first using the data from trials 1-10 and then using the data from trials 11-20), respectively. We calculated the performance achieved in each stage, and the performance attained during the last stage was regarded as the performance of the self-training-based model.
Experiment III: An online test was conducted for the system with the transfer learning-based model. Specifically, each subject performed a calibration task involving the spelling of five characters cued by the computer. The model was fine-tuned with the data recorded during the calibration process and was then employed for online decision making. After that, each subject spelled the same characters as those in Experiment I.

B. Results of the Online Experiments
In this study, accuracy, defined as the ratio of the number of correctly spelled characters to the total number of spelled characters, was adopted as a performance metric. Moreover, the information transfer rate (ITR) was applied to evaluate the ability of the system to balance accuracy and spelling speed. The ITR (in bits/min) is defined by

$$\mathrm{ITR} = \frac{1}{T}\left[\log_2 N_c + a \log_2 a + (1 - a)\log_2 \frac{1 - a}{N_c - 1}\right],$$

where $a$ is the accuracy of target character prediction, $N_c$ is the number of characters in the GUI, and $T$ is the time needed to spell one character. Herein, $N_c = 40$ and $T = \frac{1}{60}(1.2 N_r + 0.07)$ min. The results of online Experiments I-III are presented in Table I. As shown in the table, with the subject-independent CNN, the self-training-based CNN and the transfer learning-based CNN, average accuracies of 89.38%, 94.00% and 93.50% were achieved, respectively. These results demonstrated that with the subject-independent CNN, the system was able to achieve satisfactory performance. The performance was further improved when the self-training or transfer learning method was applied.
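A small helper reproduces the ITR definition and the expression for T. It is a sketch assuming the formula above, with T covering only the stimulus presentation (the 3-s preparation period is not part of the expression for T). Evaluating it at an average accuracy is not the same as averaging per-subject ITRs, so its output need not match the table values.

```python
import math


def trial_time_min(n_rounds):
    """T = (1.2 * N_r + 0.07) / 60 minutes (stimulus presentation only)."""
    return (1.2 * n_rounds + 0.07) / 60.0


def itr_bits_per_min(acc, n_classes=40, n_rounds=10):
    """ITR in bits/min for accuracy acc among n_classes selectable characters."""
    if not 0.0 <= acc <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    bits = math.log2(n_classes)
    if acc > 0:                      # a * log2(a) -> 0 as a -> 0
        bits += acc * math.log2(acc)
    if acc < 1:                      # (1 - a) * log2(...) -> 0 as a -> 1
        bits += (1 - acc) * math.log2((1 - acc) / (n_classes - 1))
    return bits / trial_time_min(n_rounds)


# Usage: ITR at 94% accuracy with N_r = 10 rounds per character.
itr_94 = itr_bits_per_min(0.94)
```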
It is worth noting that the results of Experiment II in Table I were obtained from the last 20 online trials, where the updated CNN model was applied. To explore the difference between the performance achieved before and after the online adaptation process based on self-training, we present the average accuracies obtained across all subjects in the three stages of Experiment II in Table II. Note that the results of trials 1-10, trials 11-20, and trials 21-40 were obtained with the subject-independent CNN model, the CNN model updated based on the data of trials 1-10, and the CNN model updated based on the data of trials 1-20, respectively. Table II shows that the average accuracy increased gradually across the stages. After online adaptation based on the unlabeled data collected from 20 character input trials, the system performed significantly better in trials 21-40 than in trials 1-10 (p = 0.034), with the average accuracy improving from 87.00% to 94.00%.
In the online experiments, the number of flash rounds $N_r$ for each trial was 10. To explore how the accuracy and the ITR depend on $N_r$, we conducted an offline test on the changes in the accuracy and the ITR with respect to $N_r$. As shown in Fig. 3, the average accuracy increased monotonically with the number of flash rounds for all models. However, the average ITRs increased at first, reached their maximum values at approximately 2-3 rounds, and then gradually decreased. The best average ITR was 51.72 bits/min, achieved at 2 rounds of button flashes when the self-training-based CNN was applied.
2) The Performance of the Transfer Learning-Based Models With Respect to the Number of Calibration Trials: In Experiment III, for each subject, the CNN model was fine-tuned using five trials of calibration data and outperformed the subject-independent CNN validated in Experiment I. We further conducted an offline analysis to explore the relationship between model performance and the quantity of calibration data used to fine-tune the CNN model. Specifically, for each subject, the subject-independent CNN was fine-tuned with the number of calibration trials varying from one to five, and the fine-tuned models were validated with the data collected in Experiment III. The average accuracy and ITR are shown in Fig. 4. As the number of trials used to fine-tune the CNN increases from one to five, both the average accuracy and the ITR increase gradually.
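The fine-tuning step with a varying number of calibration trials can be sketched as follows. This is a hypothetical illustration, not the authors' method: a pretrained logistic detector stands in for the CNN, and fine-tuning is shown as a few extra steps of gradient descent on the same weighted MSE (positive samples weighted by 39), starting from the pretrained parameters. The learning rate, the number of steps, and all function names are arbitrary choices made for the sketch.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def weighted_mse(y, p, pos_weight=39.0):
    w = np.where(y == 1, pos_weight, 1.0)
    return float(np.mean(w * (y - p) ** 2))


def calibration_set(trials, targets, n_cal):
    """Flatten the first n_cal cued calibration trials into (X, y).

    trials: (n_trials, N_c, n_features); targets: cued character per trial.
    """
    n_chars, n_feat = trials.shape[1], trials.shape[2]
    X = trials[:n_cal].reshape(n_cal * n_chars, n_feat)
    y = np.zeros((n_cal, n_chars))
    y[np.arange(n_cal), targets[:n_cal]] = 1
    return X, y.ravel()


def fine_tune(w, b, X, y, lr=0.01, steps=50, pos_weight=39.0):
    """Gradient descent on the weighted MSE, starting from pretrained (w, b)."""
    sw = np.where(y == 1, pos_weight, 1.0)
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        # derivative of mean(sw * (p - y)^2) with respect to the logits
        g = 2.0 * sw * (p - y) * p * (1.0 - p) / len(y)
        w = w - lr * (X.T @ g)
        b = b - lr * g.sum()
    return w, b


# Toy usage: five cued calibration trials with 20 features per epoch.
rng = np.random.default_rng(2)
D = 20
pattern = rng.standard_normal(D)
targets = rng.integers(0, 40, size=5)
trials = rng.standard_normal((5, 40, D))
trials[np.arange(5), targets] += pattern
w_pre, b_pre = np.zeros(D), 0.0   # stand-in for the pretrained model
for n_cal in range(1, 6):          # one to five calibration trials
    X, y = calibration_set(trials, targets, n_cal)
    w_ft, b_ft = fine_tune(w_pre, b_pre, X, y)
```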

3) Results of an Offline Analysis Conducted on a Dynamic Stopping Strategy: The above experiments were all based on a system with a fixed number of flash rounds for all trials. To seek a better balance between accuracy and spelling speed, we further conducted an offline analysis in which a dynamic stopping strategy was used in each trial, i.e., the number of flash rounds was adaptive. The dynamic stopping strategy was described in our previous study [5], and we briefly review it here. First, we empirically set the minimum and maximum numbers of flash rounds for each trial to 4 and 9, respectively. Second, at the end of each round from the fourth round onward, we fed the data into the CNN model and obtained a predicted target character as well as a probability indicating the confidence of the prediction. If the probability was larger than a preset threshold or the number of flash rounds reached 9, the system output the predicted target character; otherwise, the next round of button flashes proceeded. The threshold for each round was set by applying leave-one-subject-out cross-validation to the training set (collected from 150 subjects). Specifically, the data of 149 subjects were used to train a CNN model, and the data of the remaining subject were used for testing. To set the threshold for the fourth round, for each trial of the test subject, the data from the first four rounds were averaged and fed into the CNN, yielding a predicted target character and the probability indicating the confidence of the prediction. These probabilities were averaged over all trials of the test subject. Repeating this procedure for each of the 150 subjects yielded a distribution of probabilities, and the threshold for the fourth round was set such that the top 20% of the probabilities were larger than it (0.9811 in this study).
By using the same method, we set the thresholds for the fifth to the eighth rounds according to the top 40%, 60%, 80%, and 100% of the probability values in the distributions obtained after the fifth to the eighth rounds.
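The stopping rule and the percentile-based thresholds can be sketched as below. This is an illustrative sketch under assumptions: thresholds are taken from the per-subject probability distribution with `numpy.quantile` (linear interpolation), the comparison is strict as in the text, the probability distributions in the usage are made-up numbers, and all names are hypothetical. With the 100% setting for the eighth round, the threshold equals the smallest value in the distribution.

```python
import numpy as np

MIN_ROUNDS, MAX_ROUNDS = 4, 9
# Share of the per-subject probabilities that should exceed each round's threshold.
TOP_FRACTION = {4: 0.2, 5: 0.4, 6: 0.6, 7: 0.8, 8: 1.0}


def threshold_for(subject_probs, top_fraction):
    """Threshold exceeded by roughly the top `top_fraction` of the values."""
    return float(np.quantile(subject_probs, 1.0 - top_fraction))


def dynamic_stop(round_outputs, thresholds):
    """round_outputs[r - 1] = (predicted_char, probability) after averaging rounds 1..r.

    Returns (predicted_char, rounds_used).
    """
    for r in range(MIN_ROUNDS, MAX_ROUNDS + 1):
        char, prob = round_outputs[r - 1]
        if r == MAX_ROUNDS or prob > thresholds[r]:
            return char, r
    raise ValueError("round_outputs must cover MAX_ROUNDS rounds")


# Toy usage with made-up numbers (not the values from the study).
rng = np.random.default_rng(5)
loso_probs = {r: rng.uniform(0.5, 1.0, size=150) for r in TOP_FRACTION}
thresholds = {r: threshold_for(loso_probs[r], f) for r, f in TOP_FRACTION.items()}
outputs = [(0, 0.1)] * 3 + [(5, 0.3)] + [(5, 0.999)] * 5
char, used = dynamic_stop(outputs, thresholds)
```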
The results of the offline analysis based on a dynamic stopping strategy are shown in Table III. Note that the results of the self-training-based CNN were obtained from the last 20 online trials, where the updated CNN model was applied. Comparing the results shown in Tables I and III, we can see that the dynamic stopping strategy improved the spelling speed with acceptable average accuracies and thus improved the ITR.

IV. DISCUSSION
In this study, we developed a CNN- and big EEG data-based online P300 BCI spelling system with zero or shortened calibration. Specifically, three methods were proposed to train cross-subject P300 detection models, including (i) training a subject-independent CNN using data collected from 150 subjects, (ii) adapting the CNN online based on a self-training method and the unlabeled data collected during the user's online operation, and (iii) fine-tuning the CNN based on a transfer learning method and a small quantity of labeled data. The experimental results demonstrated the effectiveness of our system.
Fig. 5. Spatial filter obtained with the subject-independent CNN. The spatial filters are obtained by averaging the absolute values of the weights in layer C1 across the ten kernels.
Fig. 6. Spatial filters obtained with the self-training-based CNNs. Compared with the spatial filter obtained with the subject-independent CNN shown in Fig. 5, the spatial filters obtained with the self-training-based CNNs change slightly for most subjects.
The online P300 spelling system developed in this study achieved good performance, with accuracies near or above 90% for all three models. This is probably due to the following reasons. First, deep neural networks have excellent data-fitting abilities, and our dataset included data from a relatively large number of subjects compared with those in existing works. As demonstrated in [5], these two factors made it possible for the model to extract subject-independent features. Second, we adapted the CNN by performing self-training or transfer learning during or before the online operation to further improve its performance. Third, because the P300 spelling system was implemented online, users received feedback on the spelling results during operation and could adjust their mental states in real time to better complete the spelling task. Note that the same subject-independent CNN was employed for both the offline analysis (see our previous study [5]) and the online test, and the average accuracies were 83.74% and 89.38%, respectively. The fact that the online test yielded better performance than the offline analysis is probably due to the effect of the feedback presented to the subjects.
To further explore which spatial and temporal features are important for EEG classification and how parameter updates affect the models, we visualize the models before and after adaptation from two aspects. (i) We first visualize the convolutional kernels of the first convolutional layer, C1, which performs spatial filtering. Specifically, for each trained model, the absolute values of the weights in layer C1 are averaged across the ten kernels, resulting in a 30-dimensional weight vector in which each entry represents the discriminative power of the corresponding channel. We use this weight vector to generate a topographic map showing the importance of each channel to the classification result. The topographic maps of the subject-independent CNN, the self-training-based CNNs, and the transfer learning-based CNNs are shown in Figs. 5, 6, and 7, respectively. The figures show that after model adaptation, the weights in layer C1 change for both the self-training-based CNN and the transfer learning-based CNN, reflecting interindividual variability. For instance, the self-training-based CNNs for Subjects 13 and 17 and the transfer learning-based CNNs for Subjects 11, 17, and 18 show relatively large weight changes in the spatial filters, while the weight changes for the other subjects are slight. (ii) We then use the gradient-weighted class activation mapping (Grad-CAM) algorithm [40] to produce a coarse localization map highlighting the time intervals of the signals that are important for EEG classification. Specifically, for each trial, the EEG signal corresponding to the target character is fed into the subject-independent CNN and the self-training/transfer learning-based CNN to obtain a heatmap for each model. For each subject and each model, the EEG waveforms and the heatmaps are averaged across the 40 trials.
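The channel-importance vector for layer C1 reduces to a mean of absolute weights. The sketch assumes the C1 weights can be arranged as an array of shape (10 kernels, 30 channels); the conversion from a Keras-style kernel of shape (30, 1, 1, 10), i.e., a 30 × 1 spatial kernel with 10 output filters, is a hypothetical layout, since the exact kernel shapes are not stated in the text.

```python
import numpy as np


def channel_importance(c1_weights):
    """Mean absolute C1 weight per channel.

    c1_weights: (n_kernels, n_channels), e.g. (10, 30).
    Returns a vector of length n_channels (here 30).
    """
    return np.abs(c1_weights).mean(axis=0)


def from_keras_kernel(kernel):
    """Hypothetical layout: Conv2D kernel (30, 1, 1, 10) -> (10, 30)."""
    n_channels, n_kernels = kernel.shape[0], kernel.shape[-1]
    return kernel.reshape(n_channels, n_kernels).T


# Usage: compare a model before and after adaptation (random toy weights).
rng = np.random.default_rng(3)
before = rng.standard_normal((10, 30))
after = before + 0.1 * rng.standard_normal((10, 30))
change = channel_importance(after) - channel_importance(before)
most_changed = int(np.argmax(np.abs(change)))  # channel with the largest shift
```

The resulting 30-dimensional vector is what a topographic-map plotting routine would take as input, together with the electrode positions.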
Several averaged waveforms from EEG channel Oz and the corresponding heatmaps obtained with the subject-independent CNN and the self-training/transfer learning-based CNNs are presented in Figs. 8 and 9, respectively. Note that the subject-independent CNN is applied to all subjects, and its heatmaps show some consistency across subjects. After model adaptation, the models become subject-specific, and the important time intervals vary by subject. This is probably because the discriminative components of the EEG signals that are effective for classification vary by subject. For instance, as shown in Fig. 8, the time intervals where typical event-related potential (ERP) components (such as the N200 or P300) occur are coarsely marked as important for the subject-independent CNN, while the self-training-based CNNs utilize more components in different time intervals for Subjects 2 and 12 and focus more on the P300 component for Subject 15. Similarly, in Fig. 9, classification with the subject-independent CNN mainly relies on a single time interval containing typical ERP components, while the transfer learning-based CNNs additionally utilize the signal at approximately 400 ms after stimulus onset for Subject 17 and focus more on the time interval where the P300 component occurs for Subjects 3 and 15.
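Given the feature maps of a convolutional layer and the gradients of the P300 output with respect to them (e.g., obtained with `tf.GradientTape` in TensorFlow), the Grad-CAM heatmap [40] is a ReLU of the gradient-weighted sum of the maps, with each map weighted by its average gradient. The sketch below is a NumPy illustration of that formula for temporal feature maps; the choice of layer and the linear stretching of the map onto the 25 input samples (roughly 24 ms per sample) are assumptions.

```python
import numpy as np


def grad_cam_1d(activations, gradients, out_len=25):
    """Grad-CAM heatmap over time for one input.

    activations, gradients: (K, T) feature maps of a convolutional layer and
    the gradient of the P300 output with respect to those maps.
    Returns a length-out_len map scaled to [0, 1].
    """
    alpha = gradients.mean(axis=1)  # one weight per map (average-pooled gradients)
    cam = np.maximum(0.0, (alpha[:, None] * activations).sum(axis=0))  # ReLU
    # Stretch the T time steps linearly onto the out_len input samples.
    src = np.linspace(0.0, 1.0, cam.size)
    dst = np.linspace(0.0, 1.0, out_len)
    cam = np.interp(dst, src, cam)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy check: one map fires at the middle time step with positive gradients.
A = np.zeros((2, 5))
A[0, 2] = 1.0
G = np.ones((2, 5))
heat = grad_cam_1d(A, G)

# Averaging heatmaps across trials, as done per subject and model in the text.
rng = np.random.default_rng(4)
maps = [grad_cam_1d(rng.random((4, 6)), rng.standard_normal((4, 6))) for _ in range(40)]
avg_heat = np.mean(maps, axis=0)
```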
Compared with existing online BCI systems, the system developed in this study has the following advantages. First, the calibration phase is completely eliminated or dramatically shortened, which improves the convenience of the BCI system. With the subject-independent model or the self-training-based model, the system is plug-and-play: new users can operate it without a calibration phase. Although the self-training-based model requires online adaptation, the data it uses are entirely unlabeled data collected during user operation, and the adaptation process does not suspend the user's operation of the system. With the transfer learning-based model, the system requires users to perform a short calibration task of approximately 1 min, which is much shorter than the subject-specific calibration of traditional P300 BCIs, which usually takes more than 10 min. Eliminating or shortening the calibration also reduces the users' mental load; in our experiments, most subjects did not feel obvious fatigue. Second, with a zero-calibration CNN (the subject-independent CNN or the self-training-based CNN) as the P300 detection model, this system achieves performance comparable to that of traditional P300 BCIs with full calibration [41], [42], [43]. Additionally, to the best of our knowledge, few studies have implemented zero-calibration models online. In [10], an online spelling system was developed, and an average accuracy of 85% was achieved after 33 s of button flashes per trial. Although this was a valuable attempt toward zero-calibration BCIs, its performance still needs further improvement. In our study, two zero-calibration CNNs were implemented online.
An average accuracy of 89.38% was achieved after 12 s of button flashes for each trial when the subject-independent model was applied, and the accuracy improved to 94.00% after the model was adapted by selftraining. Finally, this study improves the online performance of the BCI system using a transfer learning-based model. Existing studies based on traditional transfer learning have obtained accuracies of approximately 80%-85% in online experiments [26], [28]. Among the several existing P300 BCI studies based on deep transfer learning, almost all of them only provided offline analyses with accuracies of 70%-90%, and their models needed further online validation [15], [16], [17].
Our experimental results showed that by using the transfer learning-based CNN, the online accuracy could be improved from 89.38% (obtained with the subject-independent CNN) to 93.50%. In addition, all subjects achieved accuracies of 80% or above.
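Because the self-training-based model adapts using only unlabeled data, the labels needed for retraining have to come from the system's own outputs. This section does not detail that step, so the sketch below shows one common pseudo-labeling scheme under stated assumptions: each flash highlights a set of buttons, the CNN assigns each flash a P300 score, the decoded button is the one whose flashes received the highest summed score, and every flash containing that button is labeled as a target (1) while all other flashes are labeled as non-targets (0). The flash representation and the names `decode_trial`, `pseudo_label_trial` and `build_adaptation_set` are hypothetical, and the CNN retraining step itself is omitted.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical representation: the set of buttons highlighted by one flash.
Flash = FrozenSet[str]


def decode_trial(flashes: Sequence[Flash], scores: Sequence[float]) -> str:
    """Return the button whose flashes received the highest summed P300 score."""
    if len(flashes) != len(scores):
        raise ValueError("one score per flash is required")
    totals: Dict[str, float] = defaultdict(float)
    for flash, s in zip(flashes, scores):
        for button in flash:
            totals[button] += s
    return max(totals, key=lambda b: totals[b])


def pseudo_label_trial(
    flashes: Sequence[Flash], scores: Sequence[float]
) -> Tuple[str, List[int]]:
    """Decode the trial, then label each flash 1 if it contained the decoded button."""
    decoded = decode_trial(flashes, scores)
    labels = [1 if decoded in flash else 0 for flash in flashes]
    return decoded, labels


def build_adaptation_set(
    trials: Sequence[Tuple[Sequence[object], Sequence[Flash], Sequence[float]]]
) -> List[Tuple[object, int]]:
    """Pair every epoch of every (epochs, flashes, scores) trial with its pseudo-label."""
    dataset: List[Tuple[object, int]] = []
    for epochs, flashes, scores in trials:
        _, labels = pseudo_label_trial(flashes, scores)
        dataset.extend(zip(epochs, labels))
    return dataset
```

The resulting pseudo-labeled epochs would be accumulated during operation and used to retrain the CNN periodically, which is consistent with adaptation running alongside, rather than interrupting, the user's spelling.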
Three models are available in our BCI system, and the user can select one of them according to the following strategy. (i) If the available computing resources are insufficient to support CNN retraining, the subject-independent CNN can be applied directly. (ii) If the available computing resources are sufficient for retraining the CNN but a calibration phase is not possible, for instance, because the user does not know how to collect calibration data or is unwilling to perform the shortened calibration phase, the self-training method can be used. Note that self-training requires a period of model adaptation (4 min in this study), which does not suspend the user's operation of the BCI system; during this period, the performance of the CNN model improves step by step. (iii) If the available computing resources are sufficient for retraining the CNN and a short calibration period is allowed, the transfer learning-based CNN is a good choice, since it achieves performance comparable to that of a fully calibrated BCI model with a much shorter calibration period.
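The three-way selection strategy above reduces to a simple decision rule. The sketch below encodes it directly; the two boolean inputs (whether CNN retraining is feasible and whether a short calibration is acceptable) and the enum and function names are illustrative assumptions, not part of the described system.

```python
from enum import Enum


class DetectionModel(Enum):
    SUBJECT_INDEPENDENT = "subject-independent CNN"
    SELF_TRAINING = "self-training-based CNN"
    TRANSFER_LEARNING = "transfer learning-based CNN"


def choose_model(can_retrain: bool, short_calibration_allowed: bool) -> DetectionModel:
    """Select a P300 detection model following strategies (i)-(iii)."""
    # (i) Not enough computing resources to retrain: use the pretrained model as-is.
    if not can_retrain:
        return DetectionModel.SUBJECT_INDEPENDENT
    # (ii) Retraining is possible but no calibration phase: adapt online on unlabeled data.
    if not short_calibration_allowed:
        return DetectionModel.SELF_TRAINING
    # (iii) Retraining is possible and a short calibration is acceptable: fine-tune.
    return DetectionModel.TRANSFER_LEARNING
```

The retraining check comes first because, when the CNN cannot be retrained, the user's calibration preference does not affect the outcome.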

V. CONCLUSION
This study developed an online P300 BCI spelling system with zero or shortened calibration based on a CNN and big EEG data. Specifically, three methods to train CNNs for the online detection of P300 potentials were proposed: training a subject-independent CNN with data collected from 150 subjects, adapting the CNN online with a self-training method based on unlabeled data collected during the user's online operation, and fine-tuning the CNN with a transfer learning method based on a small quantity of labeled data collected before the user's operation. Based on these methods, an online P300 spelling system was developed. Average accuracies of 89.38%, 94.00% and 93.50% were achieved with the subject-independent CNN, the self-training-based CNN and the transfer learning-based CNN, respectively. These experimental results indicate that an online P300 BCI with zero or shortened calibration can be built based on a CNN and big EEG data. In future studies, we will extend this system to patients, such as those with strokes or spinal cord injuries, to help them improve their self-care ability.