A Novel Multi-Module Neural Network for EEG Denoising

In this paper, a novel multi-module neural network (MMNN) is proposed to remove ocular artifacts (OAs) and myogenic artifacts (MAs) from noisy single-channel electroencephalogram (EEG) signals. This network is a deep learning (DL) architecture consisting of multiple denoising modules connected in parallel. Each denoising module is built using one-dimensional convolutions (Conv1Ds) and fully connected (FC) layers, and it estimates not only clean EEG signals but also artifacts. The proposed MMNN has two main advantages. First, the multiple denoising modules can purify noisy input EEG signals by continuously removing artifacts in the forward propagation. Second, the parallel architecture allows the parameters of each denoising module to be updated concurrently in the backpropagation, thereby improving the learning capacity of the network. We tested the denoising performance of the network using a recent public database, namely, EEGdenoiseNet. The results revealed that the proposed network reduced the temporal relative root mean square error (T-RRMSE) and spectral relative root mean square error (S-RRMSE) by at least 6% and enhanced the correlation coefficient (CC) by at least 3% over the state-of-the-art approaches. These performance improvements were confirmed by observing the deviation distribution between the denoised and clean signals. Furthermore, the proposed network achieved a performance similar to that of the existing DL models with only 60% of the training data.


I. INTRODUCTION
Electroencephalography (EEG) is a safe, reliable, and relatively non-invasive measurement tool to study human brain activity. EEG signals are used in various fields, such as the diagnosis and treatment of diseases [1] [2] [3], investigations of the brain's neurobiological mechanisms [4] [5] [6], and brain-computer interface (BCI) systems [7] [8]. Related studies strongly rely on whether EEG data accurately represent brain activity. However, noise and artifacts are always contained in EEG signals, and they are entangled with brain activity.
Eye movements [9] and facial muscle activity [10] are two common causes of noise and artifacts in EEG epochs. Eye movements distort the electric field around the eyes and over the scalp, thus causing ocular artifacts (OAs) [11]. Facial muscle activity responds to pressure changes in the upper airway, generating electrical amplitude signals, called myogenic artifacts (MAs) [12]. Recently, many approaches, such as regression [13] [14] [15], adaptive filtering [16] [17], blind source separation (BSS) [18] [19] [20] [21], and empirical mode decomposition (EMD) [22], have been proposed to remove OAs and MAs from EEG epochs. A high-performance denoising approach should be able to accurately remove artifacts in real time without distorting the signal of interest and be sufficiently robust to reconstruct EEG data in various formats, especially signals recorded using only a few electrodes. However, these methods do not fully satisfy these criteria. The regression and adaptive filtering techniques are not efficient for real-time applications, because they need to estimate the transfer and filtering coefficients before removing the artifacts. BSS and EMD are not functional for single-channel signals, because they remove artifacts by decomposing and reconstructing the EEG signals in the time and frequency domains, and this decomposition relies on the independence between channels, which single-channel signals do not have.
One of the key aspects of DL models is that artifact removal strategies are not designed by human engineers but are learned from data, so the model performance is greatly influenced by the training data [38]. However, in many cases, particularly in real applications, it is highly expensive to collect high-quality training data [39]. Therefore, there is value in studying the DL model performance on limited data. On the other hand, DL models usually run as black boxes [40]. The lack of transparency may hinder DL applications in the medical field because it is difficult for humans to verify whether a complex DL model has expert medical or signal-processing knowledge. Thus, DL models that provide explanations for their mechanisms deserve to be further explored.
In this study, we propose a novel multi-module neural network (MMNN) for EEG denoising. This network can be implemented in real time and applied to single-channel EEG data. Our contributions are as follows: 1) Artifact removal is defined as the detachment of pure EEG signals from signals containing additive noise; therefore, we create a network flow that can constantly decompose and assemble EEG information using the proposed denoising modules. 2) We designed the denoising modules using Conv1Ds and FC layers, aiming to customize a solution specialized in separating OAs or MAs from noisy EEG signals by network learning. Conv1Ds were used to extract and generalize the informative features of brain activity, and FC layers were used to reconstruct the clean signals and artifacts. Their combination acted as an end-to-end trainable filter. 3) Referring to the work of EEGdenoiseNet [37], which provided a publicly available structured database for EEG denoising studies, we compared the proposed network with the existing DL and conventional techniques under the same conditions. 4) The denoising performance of the model with different amounts of training data was explored, and the visualization of its components was discussed.
The remainder of this paper is structured as follows. The description of the experimental materials is given in Section II; Section III presents the proposed MMNN; the experiments and results are given in Section IV; and the discussion and conclusion are presented in Section V and Section VI, respectively.

II. MATERIALS
The database used in this study is summarized in Table 1. This database provides large-scale clean EEG and artifact epochs, involving 4514 clean EEG epochs, 3400 EOG epochs, and 5598 EMG epochs. In the previous DL denoising studies [36] [37], these epochs were used to synthesize the training and testing data. Their extraction process is briefly described as follows:

1) Clean EEG data
The EEG data are composed of 4514 clean EEG epochs, and each epoch is a single-channel EEG segment of 2 s. As described in [36], 64-channel EEG epochs were collected from a public database of motor-imagery BCI [41]. These epochs were then band-pass filtered between 1 and 80 Hz, notch-filtered (50 Hz), detrended, and processed using independent component analysis on ICLabel [42]. Finally, the processed epochs were sampled at 256 and 512 Hz, respectively, cut into single-channel epochs, and manually checked to ensure that each one was clean.

2) Electrooculogram (EOG) data
The EOG data contain 3400 single-channel OA epochs with a sampling rate of 256 Hz. These epochs were extracted from previous studies [43] [44] [45] [46] [47]. As described in [36], the data were band-pass filtered between 0.3 and 10 Hz, notch-filtered (50 Hz), and detrended. The extracted OAs were subsequently segmented into epochs of 2 s and visually checked by experts.

3) Electromyography (EMG) data
The EMG data consist of 5598 MA epochs, and each epoch is a single-channel EEG epoch with a duration of 2s and a sampling rate of 512 Hz. These MA epochs were collected from [48] and band-pass filtered between 1 and 120 Hz. Afterwards, these epochs were notch-filtered (50 Hz), detrended, and visually checked by experts.

III. METHODS
This section describes the proposed MMNN in detail. We first define the EEG denoising problem and then describe the denoising module, which serves as the basic component in our model, followed by the model structure. Finally, we introduce the synthesis process of noisy EEG signals, as well as the training and testing data for OA and MA removals.

… [50]. OAs and MAs are generated independently of the clean signals; therefore, the relationship among clean signals, OAs or MAs, and noisy signals in EEG recordings can be expressed as [51] [52]:

$$y = x + a \tag{1}$$

where $x$, $a$, and $y$ denote clean signals, ocular or myogenic artifacts, and noisy signals, respectively.
The essence of EEG denoising is to estimate the clean signals $x$ using the noisy signals $y$. For DL denoising models, it is challenging to use the prior knowledge learned from the data distribution to filter $y$ [53].

B. DENOISING MODULE
In our design, the denoising module is constructed from four Conv1Ds with rectified linear units (ReLUs), a residual connection, and two FC layers, as shown in Figure 1. Table 2 provides the details of the hyperparameters, and the tuning process of the number of output channels (c) and the kernel size (k) is given in Section IV. Notably, the proposed denoising module outputs both clean signals and artifacts.

1) Conv1Ds with ReLUs
The objective of Conv1Ds is to decompose noisy EEG signals, and their parameters need to be learned from training data. Within each Conv1D, the number of output channels and the kernel size determine the computational complexity and filter length of feature extraction, respectively. The zero-padding operation maintains the structural consistency between the inputs and outputs. ReLUs improve the model's nonlinearity and help avoid the vanishing gradient problem in the learning stage. Conv1D-A first dismantles noisy single-channel EEG signals into features in multiple dimensions. Conv1D-B, C, and D are subsequently used to continuously generalize the extracted features. The combination of multiple Conv1Ds with ReLUs can build complex mappings, thus dismantling EEG signals more finely. (The discussion regarding the number of Conv1Ds is provided in Appendix A.)

2) Residual connections
Residual connections can accelerate network convergence and improve the model's learning ability.

3) FC layers
The function of the FC layers is to reconstruct clean signals and artifacts by connecting the generalized features. The output clean signals and artifacts have the same data size as the input (data size: 1 × T).
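To make the layer arrangement concrete, the following is a minimal PyTorch sketch of one denoising module. It is not the authors' implementation: the placement of the residual connection (around Conv1D-B to D), the use of the two FC layers as separate heads for the clean signal and the artifact, and the class and argument names (`DenoisingModule`, `T`, `c`, `k`) are assumptions made for illustration, since Figure 1 and Table 2 are not reproduced here.

```python
import torch
from torch import nn


class DenoisingModule(nn.Module):
    """Illustrative module: four Conv1Ds with ReLUs, one residual connection, two FC layers."""

    def __init__(self, T: int, c: int = 32, k: int = 25):
        super().__init__()
        # An odd kernel with zero padding k // 2 keeps the output length equal to T.
        assert k % 2 == 1, "this sketch assumes an odd kernel size"
        pad = k // 2
        self.conv_a = nn.Conv1d(1, c, k, padding=pad)  # single channel -> c feature channels
        self.conv_b = nn.Conv1d(c, c, k, padding=pad)
        self.conv_c = nn.Conv1d(c, c, k, padding=pad)
        self.conv_d = nn.Conv1d(c, c, k, padding=pad)
        self.relu = nn.ReLU()
        # Assumption: one FC layer per output, each mapping the features back to 1 x T.
        self.fc_clean = nn.Linear(c * T, T)
        self.fc_artifact = nn.Linear(c * T, T)

    def forward(self, y: torch.Tensor):
        # y has shape (batch, 1, T)
        h = self.relu(self.conv_a(y))
        r = self.relu(self.conv_b(h))
        r = self.relu(self.conv_c(r))
        r = self.relu(self.conv_d(r))
        features = (h + r).flatten(start_dim=1)  # assumed position of the residual connection
        x_hat = self.fc_clean(features).unsqueeze(1)     # estimated clean signal, (batch, 1, T)
        a_hat = self.fc_artifact(features).unsqueeze(1)  # estimated artifact, (batch, 1, T)
        return x_hat, a_hat
```

With `T = 512`, `c = 32`, and `k = 25`, each FC layer in this sketch maps 16384 features to 512 outputs, so the FC layers dominate the parameter count.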

C. NETWORK STRUCTURE
The proposed MMNN is built using multiple denoising modules, whose number is flexible and can be adjusted according to different denoising tasks. An MMNN assembled using $n$ denoising modules (MMNN-n) is shown in Figure 2, where the inputs of the denoising modules are $y$, $y - \hat{a}_1$ to $y - \hat{a}_{n-1}$, and their outputs are $\hat{x}_1, \hat{a}_1$ to $\hat{x}_n, \hat{a}_n$. The final estimation of clean signals is the sum of $\hat{x}_1$ to $\hat{x}_n$. In our designed structure, the proposed MMNN constantly purifies the inputs for each denoising module by removing the artifact estimation. Specifically, $y - \hat{a}_{i-1}$ replaces $y$ itself as the input for the $i$-th denoising module. According to (1), the former is a purer EEG signal than the latter. Therefore, there is a high probability that the outputs of the $i$-th denoising module, $\hat{x}_i$ and $\hat{a}_i$, are closer to the ground truth of EEG signals and artifacts in theory. The related discussion is presented in Section V.
Based on the above, a workflow of multiple denoising modules can constantly improve the network performance in theory. However, multi-stacking structures may lead to the vanishing gradient problem during backpropagation [54]. To hedge this risk, the proposed model is designed as a parallel architecture, thus allowing the parameters of each denoising module to be updated synchronously. The network architecture is expressed as:

$$\hat{x} = \mathrm{MMNN}(y, n) = \sum_{i=1}^{n} \hat{x}_i \tag{2}$$

$$\hat{x}_i, \hat{a}_i = \mathcal{F}_i(y - \hat{a}_{i-1}), \quad i = 1, \dots, n \tag{3}$$

where $\hat{x}$ is the final estimation of the clean signal; $\mathrm{MMNN}$ is the proposed MMNN, $y$ is the input noisy signal, and $n$ is the number of the denoising modules; $\mathcal{F}_i$ indicates the $i$-th denoising module, $\hat{x}_i$ and $\hat{a}_i$ are the reconstructed clean signals and artifacts, respectively, and $\hat{a}_0 = 0$.
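The data flow described in this subsection can be sketched in a few lines of plain Python. The sketch assumes, following the text, that each module receives the noisy input minus the artifact estimate of the preceding module (zero for the first module) and that the final estimate is the sum of the per-module clean-signal estimates. The names `mmnn_forward` and `toy_module` are hypothetical, and `toy_module` is only a stand-in for a trained denoising module.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
Module = Callable[[Signal], Tuple[Signal, Signal]]


def mmnn_forward(y: Sequence[float], modules: Sequence[Module]) -> Signal:
    """Sum the clean-signal estimates of all modules for one noisy epoch y."""
    a_prev = [0.0] * len(y)  # artifact estimate before the first module is zero
    x_hat = [0.0] * len(y)
    for module in modules:
        # Each module sees the noisy input with the previous artifact estimate removed.
        purified = [yi - ai for yi, ai in zip(y, a_prev)]
        x_i, a_i = module(purified)
        x_hat = [total + xi for total, xi in zip(x_hat, x_i)]
        a_prev = a_i
    return x_hat


def toy_module(signal: Signal) -> Tuple[Signal, Signal]:
    # Stand-in for a trained module: attributes half of its input to each output.
    half = [0.5 * s for s in signal]
    return half, half
```

For example, `mmnn_forward([2.0, 4.0], [toy_module, toy_module])` passes `[2.0, 4.0]` to the first module and `[1.0, 2.0]` to the second, and returns the sum of the two clean-signal estimates.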

D. NOISY SIGNAL SYNTHESIS
Using the clean EEG, EOG, and EMG epochs from the mentioned database, we synthesized noisy EEG epochs for model training and testing. The synthesized noisy EEG epochs and clean EEG epochs were the data and labels, respectively. In the training stage, the Adam optimizer [55] was adopted to minimize the mean squared error (MSE) [56] between the model outputs and labels. The details of the noisy signal synthesis are as follows.
To synthesize noisy signals with different noise levels, the signal-to-noise ratio (SNR) is first given as a reference, as shown in (4). It describes the ratio of the true signal to the background noise and is widely used to evaluate noise levels.

$$\mathrm{SNR} = 10 \log_{10} \frac{\mathrm{RMS}(x)}{\mathrm{RMS}(a)} \tag{4}$$

where $x$ and $a$ are the discrete-time clean EEG signal and artifact, respectively, and $\mathrm{RMS}$ is the root mean squared value, defined as:

$$\mathrm{RMS}(g) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} g_i^2} \tag{5}$$

where $i$ indicates the discrete time point in an epoch $g$, and $N$ is the number of time points in the epoch. For signal synthesis with the given database, $x$ denotes any clean single-channel EEG epoch, and $a$ is expressed as $\lambda \times e$, where $e$ is any single-channel EOG or EMG epoch and $\lambda$ is the parameter used to control the SNR of the noisy signal. By (4), $\lambda$ can be derived as:

$$\lambda = \frac{\mathrm{RMS}(x)}{\mathrm{RMS}(e) \cdot 10^{\mathrm{SNR}/10}} \tag{6}$$

According to (1), a clean EEG epoch and an EOG or EMG epoch can be combined into a noisy EEG epoch with any SNR level, as shown in (7):

$$y = x + \lambda \times e \tag{7}$$
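The synthesis procedure can be illustrated with a short, dependency-free Python sketch. It assumes the SNR is defined as ten times the base-10 logarithm of the ratio between the RMS of the clean epoch and the RMS of the scaled artifact, which is the convention used by EEGdenoiseNet; the function names (`rms`, `artifact_scale`, `synthesize`) are hypothetical.

```python
import math
from typing import List, Sequence


def rms(g: Sequence[float]) -> float:
    """Root mean squared value of one epoch."""
    return math.sqrt(sum(v * v for v in g) / len(g))


def artifact_scale(clean: Sequence[float], artifact: Sequence[float], snr_db: float) -> float:
    """Scaling factor for the artifact epoch that yields the requested SNR (in dB)."""
    # Assumed convention: SNR = 10 * log10(RMS(clean) / RMS(scale * artifact))
    return rms(clean) / (rms(artifact) * 10 ** (snr_db / 10))


def synthesize(clean: Sequence[float], artifact: Sequence[float], snr_db: float) -> List[float]:
    """Noisy epoch = clean epoch + scaled artifact epoch."""
    scale = artifact_scale(clean, artifact, snr_db)
    return [c + scale * a for c, a in zip(clean, artifact)]
```

A lower SNR value gives a larger scaling factor, so an epoch synthesized at -7 dB is more heavily contaminated than one synthesized at 2 dB.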

E. TRAINING DATA AND TESTING DATA
Previous studies [57] [58] [59] have demonstrated that the SNR values of OAs and MAs are commonly between -7 and 2 dB; thus, the noisy signals were synthesized within this SNR range. The OA removal task was implemented using 30000 pairs and 4000 pairs of training and testing samples, respectively. They were synthesized using 3400 clean EEG epochs (randomly selected from the 4514 clean EEG epochs) and all 3400 EOG epochs, according to (6) and (7). For the noisy EEG synthesis of the training set, the SNR values followed a uniform distribution from -7 to 2 dB, and the clean and artifact epochs were 3000 of the 3400 clean EEG epochs and EOG epochs, respectively. The testing set was synthesized using the remaining 400 pairs of EEG epochs and EOG epochs, and the SNR values ranged from -7 dB to 2 dB at an interval of 1 dB (-7 dB, -6 dB, -5 dB, -4 dB, -3 dB, -2 dB, -1 dB, 0 dB, 1 dB, 2 dB).

… …
In the MA removal task, all 4514 clean EEG epochs and 5598 EMG epochs were utilized. We randomly copied 1084 clean EEG epochs and added them to the original EEG epochs, thus producing 5598 clean EEG epochs. Finally, we used 5000 of the 5598 EEG epochs and EMG epochs to construct 50000 pairs of training samples, and the remaining 598 pairs were used to construct 5980 pairs of testing samples. The synthesis process followed that of the OA removal task. Figure 3 briefly summarizes the above process.
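The sample counts above can be checked with a small Python sketch. It assumes that every pair of clean and artifact epochs is mixed at ten SNR values, which is stated for the testing sets (one sample per level from -7 dB to 2 dB) and only inferred for the training sets from the reported totals. The helper names are hypothetical.

```python
# SNR levels of the testing sets: -7 dB to 2 dB in 1 dB steps (from the text)
TEST_SNRS_DB = list(range(-7, 3))


def n_samples(n_epoch_pairs: int, snrs_per_pair: int = len(TEST_SNRS_DB)) -> int:
    """Number of (noisy, clean) samples if each epoch pair is mixed at several SNR values."""
    return n_epoch_pairs * snrs_per_pair


def epochs_to_copy(n_clean: int, n_artifact: int) -> int:
    """Clean epochs that must be duplicated so both pools have the same size."""
    return max(0, n_artifact - n_clean)


# OA removal: 3000 training pairs and 400 testing pairs of epochs
oa_train, oa_test = n_samples(3000), n_samples(400)
# MA removal: 5000 training pairs and 598 testing pairs of epochs
ma_train, ma_test = n_samples(5000), n_samples(598)
```

Under this assumption, the counts reproduce the 30000 and 4000 samples of the OA task, the 50000 and 5980 samples of the MA task, and the 1084 copied clean epochs.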

IV. EXPERIMENTS AND RESULTS
In this section, we first present the experimental hardware and evaluation metrics. Then, the hyperparameter tuning process of the denoising module is described. Finally, we compare the proposed model with other DL and conventional approaches through scoring and visualization.

A. HARDWARE AND EVALUATION METRICS
All the experiments were implemented using PyTorch [60] and two GeForce GTX 1080 GPUs in a Linux system. The evaluation metrics included the temporal relative root mean square error (T-RRMSE), spectral relative root mean square error (S-RRMSE), and correlation coefficient (CC), as shown in (8), (9), and (10).
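$$\mathrm{T\text{-}RRMSE} = \frac{\mathrm{RMS}(f(y) - x)}{\mathrm{RMS}(x)} \tag{8}$$

$$\mathrm{S\text{-}RRMSE} = \frac{\mathrm{RMS}(\mathrm{PSD}(f(y)) - \mathrm{PSD}(x))}{\mathrm{RMS}(\mathrm{PSD}(x))} \tag{9}$$

$$\mathrm{CC} = \frac{\mathrm{Cov}(f(y), x)}{\sqrt{\mathrm{Var}(f(y)) \, \mathrm{Var}(x)}} \tag{10}$$

where $x$ and $y$ are the clean EEG epoch and input noisy EEG epoch, respectively; $f$ indicates the proposed model; $\mathrm{PSD}$ is the power spectral density function; $\mathrm{Cov}$ and $\mathrm{Var}$ are short for the covariance function and variance function. In general, the smaller the T-RRMSE and S-RRMSE values and the closer the CC value is to 1, the better the performance.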
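The three metrics can be sketched in dependency-free Python as follows. The sketch assumes the usual EEGdenoiseNet-style definitions (RMS of the error relative to the RMS of the clean epoch, in the time domain and on the power spectral density, plus the Pearson correlation). The spectral density here is a plain periodogram computed with a naive discrete Fourier transform, which is a simplification and not necessarily the estimator used in the experiments; all function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def rms(g: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in g) / len(g))


def t_rrmse(denoised: Sequence[float], clean: Sequence[float]) -> float:
    """Temporal relative RMS error."""
    return rms([d - c for d, c in zip(denoised, clean)]) / rms(clean)


def periodogram(g: Sequence[float]) -> List[float]:
    """Simple one-sided periodogram via a naive DFT (stand-in for a PSD estimator)."""
    n = len(g)
    return [
        abs(sum(g[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def s_rrmse(denoised: Sequence[float], clean: Sequence[float]) -> float:
    """Spectral relative RMS error, computed on the periodograms."""
    p_d, p_c = periodogram(denoised), periodogram(clean)
    return rms([a - b for a, b in zip(p_d, p_c)]) / rms(p_c)


def cc(denoised: Sequence[float], clean: Sequence[float]) -> float:
    """Pearson correlation coefficient between the denoised and clean epochs."""
    n = len(clean)
    m_d, m_c = sum(denoised) / n, sum(clean) / n
    cov = sum((d - m_d) * (c - m_c) for d, c in zip(denoised, clean)) / n
    var_d = sum((d - m_d) ** 2 for d in denoised) / n
    var_c = sum((c - m_c) ** 2 for c in clean) / n
    return cov / math.sqrt(var_d * var_c)
```

A perfect reconstruction gives T-RRMSE and S-RRMSE of 0 and a CC of 1. Note that CC is insensitive to amplitude scaling, which is why it is reported together with the two error metrics.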

B. HYPERPARAMETER TUNING
To select the hyperparameters (c, k) of the denoising module, we performed a 10-fold cross-validation on the given 30000 and 50000 pairs of training samples in the OA and MA removals, respectively. Compared with a single fixed validation set, a cross-validation strategy avoids the problems caused by an unreasonable division of the dataset.
The CC results of the 10-fold cross-validation from MMNN-1 to MMNN-6 are shown in Figures 4 and 5. When the number of output channels exceeded 32, the model performance did not significantly improve, but the computational complexity increased, in both the OA and MA removal tasks. Moreover, the filter lengths of 0.1 s (kernel size = 25) and 0.2 s (kernel size = 103) clearly improved the model performance with fewer training parameters in the OA and MA removals, respectively. Therefore, we chose (32, 25) and (32, 103) as the module's hyperparameters for removing OAs and MAs, respectively.
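As an illustration of the cross-validation protocol, the following Python sketch splits sample indices into ten folds and yields one training/validation split per fold. It is a generic k-fold splitter rather than the authors' code; the function name `k_fold_indices` and the fixed random seed are assumptions.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, n_folds: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of the n_folds splits."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are not ordered by SNR or subject
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        validation = folds[held_out]
        train = [j for m, fold in enumerate(folds) if m != held_out for j in fold]
        yield train, validation
```

For each candidate pair (c, k), a model would be trained on `train` and scored (for example, by CC) on `validation`, and the scores of the ten folds would then be averaged.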

C. RESULTS OF OA AND MA REMOVALS
We performed EEG denoising using MMNN-1 to MMNN-6. The reference DL models (Appendix B) included the fully connected neural network (FCNN), simple convolutional neural network (Simple CNN), complex convolutional neural network (Complex CNN), and recurrent neural network (RNN) from [36], and the novel convolutional neural network (Novel CNN) from [37]. To fairly compare the model performance under the same conditions, we trained and tested the models using the same amount of data as the references [36] [37], as shown in Figure 3. The learning rate and batch size were 0.0001 and 128, respectively. The trained parameters of our models at the 10th iteration were used to test the denoising performance.
Tables 3 and 4 show the denoising performance on the 4000 pairs of testing samples in the OA removal and the 5980 pairs of testing samples in the MA removal. The results show that the scores of the proposed model improve steadily when using one to four denoising modules, whereas more than four denoising modules do not significantly enhance its performance. Moreover, compared to the best results of the reference models, the proposed model (MMNN-4) reduced the T-RRMSE and S-RRMSE by at least 6.3% and 6.4%, respectively, and improved the CC by at least 3.5% when removing OAs. In the MA removal, it reduced the T-RRMSE and S-RRMSE by at least 6.2% and 6.4%, respectively, and improved the CC by at least 3.3%. These results illustrate that the proposed model performs well on the given database.
Subsequently, through the visualization of the denoised results, we compared the robustness between the proposed model and the top-scoring reference models for OA and MA removals (Complex CNN and Novel CNN). As shown in Figure 6 and Figure 7, we presented the signal deviation in the time and frequency domains between the denoised results and the sample labels by calculating the absolute values of the noise-free epoch minus the denoised epoch. From the deviation results of the OA and MA removals within the 95% confidence interval, we can observe that the signal deviation of the proposed model (MMNN-4) is closer to the horizontal axis (noise-free situation) and exhibits a smaller range of deviation than the other competitors in both the time and frequency domains, which confirms the relative robustness of the proposed model.

D. PROPOSED MODELS VS CONVENTIONAL MODELS
We further compared the proposed model with three conventional models: Regression [14], ICA [61], and SSP [62]. These models are classic EEG denoising approaches implemented in the MNE toolbox [63]. Figure 8 and Figure 9 show the score distributions of the OA removal (4000 testing epochs) and MA removal (5980 testing epochs), respectively, where the proposed model achieves higher CC scores and smaller T-RRMSE and S-RRMSE scores than the conventional ones. According to the ANOVA results with Holm-Bonferroni correction, the performance differences between the proposed model and the classical approaches are significant (all p-values < 0.001) in both the OA and MA removals.
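For readers unfamiliar with the Holm-Bonferroni correction mentioned above, the step-down procedure can be sketched in plain Python. This is a generic implementation of the standard method, not the analysis script of this study; the function name `holm_bonferroni` is hypothetical, and the p-values in the usage note are invented for illustration only.

```python
from typing import List, Sequence, Tuple


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> Tuple[List[float], List[bool]]:
    """Return Holm-adjusted p-values and reject decisions, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest to largest p-value
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so a larger raw p-value never gets a smaller adjusted value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject
```

For the illustrative inputs `[0.01, 0.04, 0.03]`, the adjusted values are approximately 0.03, 0.06, and 0.06, so only the first comparison remains significant at the 0.05 level.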

E. OA AND MA REMOVALS ON LIMITED TRAINING DATA
In the applications of DL-based EEG denoising, sufficient high-quality training data are usually unavailable. Therefore, we investigated the robustness of DL models when using limited training data, as shown in Table 5 and Table 6. The proposed MMNN-4 was compared with the top-scoring reference models for the OA and MA removals (Complex CNN and Novel CNN, respectively), where 10% to 100% of the training data were separately selected from the given database for network learning, and the training iterations and parameters were consistent with the former settings. The results show that the proposed model always outperforms its competitors when using the same amount of training data for both the OA and MA removals. Notably, our model can reach scores similar to those of the reference models using only 60% of the training data when removing OAs and MAs.

V. DISCUSSION
In this paper, we proposed a novel DL-based EEG denoising model called MMNN. This model achieved smaller T-RRMSE and S-RRMSE scores and higher CC scores than the other models when removing OAs and MAs. It can reach a performance similar to that of the reference DL models with only 60% of the training data. Through the visualization of the signal deviation distribution, the performance differences between the reference and the proposed models are clearly observed in the time and frequency domains.
Overall, the proposed model has a superior performance compared to the reference DL models. There are two reasons for this. First, the proposed model constantly provides more purified input signals for the denoising modules. As shown in Figure 10 and Figure 11, we present the signal deviation distribution between the inputs of four denoising modules and clean signals. In Figure 10, the OAs in the range of 0-80 Hz were rapidly suppressed using two denoising modules, and the OAs above 80 Hz were gradually reduced when more modules were used. In Figure 11, the MAs of the input signals were suppressed by degrees from the first to the fourth module. Second, the parallel architecture of the proposed model allows the gradients to flow through each denoising module directly in the backpropagation, thereby avoiding the vanishing gradient problem and enhancing the network learning ability. For further clarification, the parallel and series mechanisms of the denoising modules are presented in Figure 12. The training and testing losses of the two mechanisms (batch size = 128 and learning rate = 0.0001) using the given database are shown in Figure 13 and Figure 14.
The following were observed: 1) both models converge within 10 iterations; 2) the training and testing losses of the parallel mechanism are smaller than those of the series mechanism when more than two denoising modules are assembled within our model, which illustrates that the parallel mechanism of the proposed model possesses a stronger learning capacity; 3) for the series mechanism, the network learning capacity is weakened when more denoising modules are stacked in the model, which is possibly caused by the vanishing gradient problem in the learning process; 4) in contrast with the series mechanism, the parallel mechanism can improve the learning capacity when more denoising modules are used. However, there is a limit to the improvement of network learning. The training and testing losses of MMNN-4, MMNN-5, and MMNN-6 are almost the same for the parallel model, which explains their similar scores in the experiment. Furthermore, the loss comparison between the proposed MMNN-4 and the other DL models is given in Figure 15, where our model has smaller training and testing losses and converges faster than the others, which is possibly the reason why it performs well with less training data in both the OA and MA removals.
In the future, several challenges are worth exploring using the proposed model. Specifically, we used denoising modules with different filter sizes in the OA and MA removals. Whether the suitable filter size for feature extraction is determined by differences in the noise features should be further studied. Moreover, OAs and MAs are entangled with motion artifacts in a real EEG epoch; however, for the mixed signals, there is … [64]. Given that the proposed model offers significant advantages over the conventional and DL models in this study, the related research is within the scope of future work.

VI. CONCLUSION
A novel MMNN (multi-module neural network) is proposed in this study, which is a parallel architecture assembled with multiple denoising modules. The results revealed that the proposed model can automatically remove OAs and MAs from single-channel noisy EEG signals. Compared to the existing models, it achieved higher signal reconstruction accuracy and reached this goal with less training data. In the future, we expect that our model will play a critical role in EEG denoising applications.

APPENDIX A: CONFIGURATION OF DENOISING MODULES
Figure 16 shows the different configurations of the denoising modules, in which the number of parameters and the running time increased with the number of Conv1Ds. However, the module performance reaches its limit when using four Conv1Ds. Therefore, we configured four Conv1Ds for the denoising module in our model.

APPENDIX B: REFERENCE MODELS
Figure 17 presents the architecture of the reference models, including FCNN, Simple CNN, Complex CNN, and RNN from [36], and Novel CNN from [37]. These models decomposed noisy EEG signals using different combinations of Conv1D, FC, and LSTM blocks, and then reconstructed clean signals using an FC block.