EEG-Based Neonatal Sleep-Wake Classification Using Multilayer Perceptron Neural Network

Objective: To classify sleep-wake states from multichannel electroencephalography (EEG) data in a way that works reliably for neonates. Methods: A deep multilayer perceptron (MLP) neural network is developed to classify sleep-wake states from multichannel bipolar EEG signals. It takes an input vector of size 108 containing the joint features of 9 channels. The network avoids any post-processing step so that it can work as a full-fledged real-time application. For training and testing the model, 3525 30-second EEG segments from 19 neonates (postmenstrual age of 37 ± 05 weeks) are used. Results: For sleep-wake classification, the mean Cohen’s kappa between the network estimate and the ground truth annotation by human experts is 0.62. The maximum mean accuracy reaches 83%, which, to date, is the highest accuracy reported for sleep-wake classification.


I. INTRODUCTION
Sleep is an important human function, characterized by a sequence of changes in brain activity. Neonates spend most of their time asleep. Sleep ontogenesis is an active process in the maturation of the brain and the central nervous system. Clinically, sleep-wake cycling (SWC) is the main hallmark of brain development in neonates [1], [2]. In a neonatal intensive care unit (NICU) in particular, neonatal sleep should be protected and promoted.
Polysomnography (PSG) is considered the gold standard for monitoring sleep and diagnosing sleep disorders [3]. In the past decade, many studies have demonstrated the feasibility of automated sleep staging algorithms with PSG signals, among which EEG is considered the most reliable signal for both adults [4]-[6] and infants [7]-[9].
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Feng.
Hans Berger recorded the first human EEG in 1924 [10]. The brain's electrical activity takes place via electrical impulses and can be measured from the scalp of the patient. Electrodes are placed according to the international 10-20 system for electrode placement [11]. Neurologists have established clear EEG patterns in SWC from 30 weeks postmenstrual age [12]. In 1937, Loomis et al. presented the first EEG-based study of human sleep patterns [16]. Following the pioneering work of Loomis, multiple algorithms have been proposed for adult sleep staging using machine learning [17]-[20] and deep learning. Deep learning algorithms for sleep staging include the convolutional neural network (CNN) [21], the recurrent neural network (RNN) [22], the combination of CNN and RNN [23], [24], and Long Short-Term Memory (LSTM) [25], [26].
EEG patterns differ between neonates and adults. The neonatal EEG has a smaller magnitude than the adult EEG, and multiple maturational changes occur within the first three years [27]. For this reason, multiple automatic neonatal sleep staging algorithms based on EEG have been proposed. To the best of our knowledge, most existing algorithms described in previous studies do not characterize 'wake' as a distinct state. Other algorithms classified sleep stages based on different characteristics of the EEG signal, i.e., low voltage irregular (LVI), Active Sleep II (AS II), high voltage slow (HVS) and Trace Alternant (TA)/Trace Discontinue (TD) [9], [13]-[15]. The brain maturation process initiates during AS and wake, whereas existing frameworks amalgamate 'AS' and 'wake' into a single LVI state. The bio-insights of the sleep and wake stages are illustrated in Table 1.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this paper, we present a sleep-wake classification algorithm based on a multilayer perceptron (MLP) neural network. It attempts to resolve the intermixing of different sleep stages by classifying wake and sleep as distinct states. Our work is divided into two main parts: feature extraction and classification. Twelve features were extracted from each channel of the multichannel EEG, and the MLP was then trained and tested on them. The remainder of this paper is arranged as follows: Section II introduces related work. Section III presents the materials and methods. Sleep-wake classification results using the proposed method are reported and discussed in Sections IV and V, respectively. Finally, Section VI concludes the study.

II. RELATED WORK
For EEG-based neonatal sleep stage classification, different features have been proposed to measure, for example, EEG (dis)continuity [8], frequency content [28], proportional duration of bursts [29], and the frequency content of bursts [30]. Some maturational abnormalities in the neonatal brain are only apparent in QS, reflecting more alterations in brain function [32]-[36]. Turnbull et al. [37] detected a specific discontinuous EEG pattern, known as TA, as mentioned above. While reliable for TA detection, this approach was shown to be insufficient to infer QS over a wide age range, as QS also contains HVS signals.
DeWel et al. [38] proposed a supervised algorithm for sleep state classification across a wide range of neonates (27-42 weeks postmenstrual age), using an LS-SVM classifier and multi-scale entropy features. The algorithm first estimates the postmenstrual age (PMA) using four complexity features. A sleep state classifier was then developed using these features to identify quiet sleep from neonatal EEG data. In 2017, Dereymaeker et al. [7] proposed CLuster-based Adaptive Sleep Staging (CLASS) to automatically detect quiet sleep (QS). They highlighted the benefit of QS detection in brain maturation. Another algorithm, based on an SVM with a radial basis function, was proposed by Koolen et al. [8]; it can detect QS with an accuracy of 85%.
To classify all stages of neonatal sleep, Pillay et al. [9] proposed an algorithm based on a generative modeling approach. Hidden Markov models (HMMs) and Gaussian mixture models (GMMs) were trained using EEG features extracted from 16 EEG recordings. For four stages (LVI, TA/TD, AS II, HVS), HMMs were shown to be superior, with a Cohen's kappa of 0.62. Recently, a CNN-based algorithm outperformed the state-of-the-art sleep stage classification algorithms, with kappa ranging from 0.66 to 0.70 [39]. However, none of these algorithms considered wake as a separate state in neonates. As stated before, current methods for sleep-wake classification in neonates are limited. Fraiwan and Lweesy [31] proposed an algorithm based on auto-encoders to classify neonatal sleep stages. It comprised two main steps: feature extraction and classification. Twelve EEG features were extracted from neonatal EEG for training and testing, yet its accuracy in detecting the wake state was only 17%.

III. MATERIALS AND METHODS
In this section, we present the complete description of the proposed MLP neural network. Figure 1 shows the block diagram of the proposed algorithm.

B. VISUAL SLEEP SCORING
EEG segments were visually annotated by two trained doctors. The primary rater (CL) labelled the start and end time of each stage, i.e., wake, sleep, and artifactual regions in which no clear sleep stage could be assigned. The secondary rater (LW) verified the regions annotated by the primary rater and also annotated the regions on which CL was undecided. Sleep and wake stages were identified using both EEG and non-cerebral characteristics. Videos of the NICU were also considered during annotation. The annotated regions were divided into three main categories: wake, sleep and artifacts.

C. PRE-PROCESSING
During recording and processing, EEG recordings become contaminated with noise and artifacts, which should be removed before further processing. EEG recordings were processed at their original sampling frequency of 500 Hz. Our pre-processing is divided into three parts:
1) Filtering the EEG signal using a FIR filter [41].
2) Segmenting the multichannel EEG into 30 sec epochs and assigning a label to each epoch.
3) Removing, after segmentation, any epoch in which artifacts constitute 20% of the epoch. Artifacts were removed based on the annotation provided by the professional doctors.
After eliminating artifacts, we were left with 3525 segments for training and testing.
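The segmentation and artifact-rejection steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `segment_epochs`, the boolean `artifact_mask`, and the choice to reject an epoch when its artifact fraction is at or above 20% are assumptions made for the example. FIR filtering is omitted because the filter parameters are not given in the text.

```python
import numpy as np

FS = 500                       # sampling rate in Hz (from the text)
EPOCH_SEC = 30                 # epoch length in seconds (from the text)
MAX_ARTIFACT_FRACTION = 0.20   # assumption: reject at or above this fraction


def segment_epochs(eeg, artifact_mask, fs=FS, epoch_sec=EPOCH_SEC,
                   max_artifact_fraction=MAX_ARTIFACT_FRACTION):
    """Split a (n_channels, n_samples) recording into non-overlapping epochs.

    artifact_mask is a boolean array of length n_samples that is True where
    the annotators marked an artifact. Returns the kept epochs with shape
    (n_kept, n_channels, fs * epoch_sec) and the indices of the kept epochs.
    """
    samples_per_epoch = fs * epoch_sec
    n_epochs = eeg.shape[1] // samples_per_epoch   # drop the incomplete tail
    usable = n_epochs * samples_per_epoch

    # Reshape to (n_epochs, n_channels, samples_per_epoch).
    epochs = eeg[:, :usable].reshape(eeg.shape[0], n_epochs, samples_per_epoch)
    epochs = epochs.transpose(1, 0, 2)

    # Fraction of artifact samples inside each epoch.
    artifact_fraction = artifact_mask[:usable].reshape(
        n_epochs, samples_per_epoch).mean(axis=1)
    keep = artifact_fraction < max_artifact_fraction
    return epochs[keep], np.flatnonzero(keep)


# Toy recording: 9 channels, 95 s of noise, so 3 complete epochs.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((9, FS * 95))
mask = np.zeros(eeg.shape[1], dtype=bool)
mask[FS * 30:FS * 40] = True   # 10 s of artifact inside the second epoch
kept, kept_indices = segment_epochs(eeg, mask)
```

In this toy run the second epoch contains one third artifact samples and is dropped, so two epochs of 15000 samples per channel remain.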

D. FEATURE EXTRACTION
After pre-processing, we extracted 12 features from each EEG channel and combined them to form an input vector of size 108 (12 features × 9 channels). The feature set includes 8 time-domain and 4 frequency-domain features. Frequency-domain features were extracted by taking the Fast Fourier Transform (FFT) of the EEG segments. After the FFT, mean frequencies were extracted from each band (alpha, delta, theta, and beta). Table 2 shows the list of the extracted features.
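A rough sketch of the frequency-domain step and of how the per-channel features combine into a 108-element vector is given below. Several details are assumptions: the band edges are conventional values that the text does not state, the band feature is taken as the mean FFT amplitude, the names `band_mean_amplitudes` and `build_input_vector` are hypothetical, and the eight time-domain features are replaced by zeros as placeholders because their definitions live in Table 2.

```python
import numpy as np

# Assumed conventional band edges in Hz; the text names the bands only.
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_mean_amplitudes(epoch, fs=500):
    """Mean FFT amplitude per band for one epoch of shape (n_channels, n_samples).

    Returns an array of shape (n_channels, 4), ordered as in BANDS.
    """
    n_samples = epoch.shape[1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    amplitude = np.abs(np.fft.rfft(epoch, axis=1)) / n_samples
    features = []
    for low, high in BANDS.values():
        in_band = (freqs >= low) & (freqs < high)
        features.append(amplitude[:, in_band].mean(axis=1))
    return np.stack(features, axis=1)


def build_input_vector(time_features, frequency_features):
    """Concatenate per-channel features and flatten them into one vector."""
    per_channel = np.concatenate([time_features, frequency_features], axis=1)
    return per_channel.reshape(-1)


# Toy epoch: a 10 Hz sine on each of 9 channels, 30 s at 500 Hz.
t = np.arange(500 * 30) / 500.0
epoch = np.tile(np.sin(2 * np.pi * 10.0 * t), (9, 1))
freq_features = band_mean_amplitudes(epoch)   # shape (9, 4)
time_features = np.zeros((9, 8))              # placeholders, not real features
x = build_input_vector(time_features, freq_features)
```

For the 10 Hz test signal the alpha column dominates, and the flattened vector has 9 × 12 = 108 entries, matching the input size stated above.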

E. MULTILAYER PERCEPTRON NEURAL NETWORK
The MLP neural network has advantageous properties such as smaller training set requirements [42], [43], easy implementation and fast operation. It consists of three primary types of layer: an input layer, hidden layer(s), and an output layer. The hidden layers are responsible for processing and transmitting the input information to the output layer. An MLP is a mapping between two Euclidean spaces ($\mathbb{R}^{n_1}$, $\mathbb{R}^{n_2}$). The mapping is defined as a sequence of Euclidean spaces $\mathbb{R}^{n_1}, \mathbb{R}^{n_2}, \ldots, \mathbb{R}^{n_L}$ and the mappings ($F$) connecting them:
$$F_l : \mathbb{R}^{n_l} \rightarrow \mathbb{R}^{n_{l+1}}, \quad l = 1, \ldots, L-1$$
where $L$ is the total number of MLP layers. In the MLP neural network, each neuron $j$ in the hidden layer sums its input signals $x_i$ after multiplying them by the strengths of the respective connection weights $w_{ij}$ and computes its output $y_j$ as a function of the sum. Mathematically,
$$y_j = f\left(\sum_i w_{ij} x_i\right)$$
Here $w_{ij}$ are the weights, which are updated according to the gradient descent algorithm, whereas $O$ symbolizes the hidden layers.
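The weighted-sum rule for a single layer can be illustrated with a few lines of NumPy. This is a sketch only: the sigmoid activation and the helper name `layer_output` are assumptions, and no bias term is included because the text does not mention one.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def layer_output(x, w, activation=sigmoid):
    """Compute y_j = f(sum_i w[i, j] * x[i]) for every neuron j in a layer.

    x has shape (n_inputs,), w has shape (n_inputs, n_neurons).
    """
    return activation(x @ w)


x = np.array([1.0, 2.0, -1.0])     # three input signals x_i
w = np.array([[0.5, -0.5],
              [0.25, 0.0],
              [1.0, 1.0]])         # weights w_ij for two neurons
y = layer_output(x, w)

# The same result written as the explicit weighted sum for neuron j = 0.
y0_explicit = sigmoid(sum(w[i, 0] * x[i] for i in range(len(x))))
```

The matrix product evaluates the sum for all neurons at once; the explicit loop for neuron 0 gives the same value and mirrors the per-neuron description in the text.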

F. GRADIENT DESCENT ALGORITHM
For a neural network, efficiency of learning is important. One efficient algorithm for training a neural network is the standard gradient descent algorithm (GDA) [44]. GDA works by taking the derivative of the error function with respect to the weights at a specific position on the loss function and then updating the weights in the direction of the negative gradient. Initially, the weights for the GDA are selected randomly. Training stops as soon as the maximum number of iterations is reached. The main objective of the gradient descent algorithm is to minimize the loss function. In the loss function, $N$ is the total number of iterations, whereas $q_m$ and $y_m$ are the expected and desired values, respectively.
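The update rule described above can be sketched on a one-weight toy problem. The mean squared error loss, the learning rate of 0.1, the iteration limit and the function name `gradient_descent` are all assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def gradient_descent(x, target, learning_rate=0.1, max_iterations=200, seed=0):
    """Fit prediction = w * x by gradient descent on a mean squared error loss.

    The squared-error form of the loss is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal()                 # random initial weight
    losses = []
    for _ in range(max_iterations):           # stop at the iteration limit
        prediction = w * x
        losses.append(np.mean((target - prediction) ** 2))
        gradient = np.mean(-2.0 * (target - prediction) * x)   # dLoss/dw
        w = w - learning_rate * gradient      # step along the negative gradient
    return w, losses


x = np.array([1.0, 2.0, 3.0, 4.0])
target = 3.0 * x                              # targets generated with w = 3
w_fit, losses = gradient_descent(x, target)
```

On this toy problem each step halves the distance to the true weight, so the loss shrinks steadily. A much larger step size would make the iterates overshoot and diverge instead.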

G. PROPOSED MLP NEURAL NETWORK ARCHITECTURE
In this study, an efficient yet simple deep MLP neural network is proposed. The deep MLP has a densely connected architecture with an input layer of size 256, 3 hidden layers of size 256-128-64 and an output layer with a sigmoid activation function, resulting in a 5-layer MLP neural network. The RMSProp algorithm [45] was used for training the neural network with a learning rate of 0.001. Two other deep neural network algorithms, CNN and RNN, were implemented for performance comparison. The parameters of these other networks were tuned, and the configuration giving the best accuracy is reported for comparison. MLPs are universal function approximators [46]; therefore, they can be used for creating mathematical models by regression analysis. This is the main reason behind the success of the MLP neural network in this particular case. The network parameters of the different neural networks implemented are shown in Table 3. Figure 3 shows the proposed MLP neural network architecture. The input is a combination of features extracted from the 9 bipolar EEG channels. A total of 12 features are extracted from each bipolar EEG recording, consisting of 8 time-domain features and 4 frequency-domain features, as described in Table 2 of the feature extraction subsection. The learning parameters of the proposed MLP neural network are shown in Table 4.
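To make the layer sizes concrete, the following framework-free NumPy sketch builds a dense stack with the stated widths and counts its parameters. It is not the authors' Keras implementation. It assumes that the first size-256 layer is a dense layer applied to the 108-element input, that every layer has a bias, that the hidden layers use ReLU, and that the sigmoid output is a single unit; none of these details are confirmed by the text.

```python
import numpy as np

INPUT_SIZE = 108                       # 12 features x 9 channels (from the text)
LAYER_SIZES = [256, 256, 128, 64, 1]   # assumption: one sigmoid output unit


def init_params(input_size=INPUT_SIZE, layer_sizes=LAYER_SIZES, seed=0):
    """Create (weights, bias) pairs for each densely connected layer."""
    rng = np.random.default_rng(seed)
    params, fan_in = [], input_size
    for fan_out in layer_sizes:
        weights = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
        params.append((weights, np.zeros(fan_out)))
        fan_in = fan_out
    return params


def forward(x, params):
    """Forward pass: ReLU on all but the last layer (assumed), sigmoid output."""
    for weights, bias in params[:-1]:
        x = np.maximum(0.0, x @ weights + bias)
    weights, bias = params[-1]
    return 1.0 / (1.0 + np.exp(-(x @ weights + bias)))


params = init_params()
n_parameters = sum(w.size + b.size for w, b in params)
batch = np.zeros((4, INPUT_SIZE))      # four dummy feature vectors
probabilities = forward(batch, params)
```

Under these assumptions the network has 134913 trainable parameters, most of them in the first two layers, which is small by deep learning standards.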

H. EVALUATION
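To assess the performance of the proposed scheme, multiple performance metrics have been used. These metrics are mean accuracy, mean Cohen's kappa, sensitivity and specificity. For comparison between different NNs, we report mean kappa and accuracy. Mathematically, these performance parameters are given as:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
where TP are true positives, TN are true negatives, FP are false positives, FN are false negatives, $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance. In addition to these, the confusion matrix is also reported.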
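These metrics can be computed directly from the four confusion-matrix counts, as in the sketch below. The function name `binary_metrics` is hypothetical, and the counts in the example are illustrative values, not results from the study.

```python
def binary_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity, specificity and Cohen's kappa."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Agreement expected by chance, from the row and column totals.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "kappa": kappa}


# Illustrative counts only, not results from the study.
metrics = binary_metrics(tp=40, tn=45, fp=5, fn=10)
```

Kappa discounts the agreement that two raters would reach by chance, which is why it can be zero even when accuracy looks respectable on an imbalanced dataset.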
In order to validate the proposed algorithm during the design process, the neonatal sleep data were split into 4 folds and the holdout method was repeated 4 times. Each time, one of the 4 subsets is used as the test set and the other 3 subsets are put together to form the training set. The average error across all 4 trials is then computed. The advantage of this method is that the result depends less on how the data happen to be divided. In total, 3525 segments from 19 subjects were used for training and testing the neural network. The final results are reported as the mean ± standard deviation across the stratified 4-fold splits.
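A stratified 4-fold split of this kind can be sketched in plain Python as follows. The helper `stratified_k_fold` is hypothetical, the labels are toy data, and the sketch splits individual segments without regard to which subject they came from, which is an assumption since the text does not say how subjects were distributed across folds.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=4, seed=0):
    """Yield (train_indices, test_indices) for k stratified folds."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold keeps the class balance.
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    for test_fold in range(k):
        test = sorted(folds[test_fold])
        train = sorted(i for f, fold in enumerate(folds) if f != test_fold
                       for i in fold)
        yield train, test


labels = ["sleep"] * 12 + ["wake"] * 8     # toy labels, not the study data
splits = list(stratified_k_fold(labels))
```

Every segment appears in exactly one test fold, and each fold preserves the sleep-to-wake ratio of the full set, so no fold is dominated by one class.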
The learning rate, also known as the step size, is one of the most important tuning parameters in an optimization algorithm. Usually, the learning rate has a small value within the range 0-1. It determines the step size of every iteration while moving towards the minimum of the loss function. For this reason, different learning rates were used to evaluate the performance of the proposed algorithm. Processing time is also an important parameter for assessing the quality of the proposed algorithm. Computational cost is directly proportional to the processing time. We therefore also measured and analyzed the time used for training and testing the neural network.
To the best of our knowledge, there is only one proposed algorithm that classifies awake as a separate state [31]. For this reason, we applied different machine learning and deep neural network algorithms to our dataset and compared the results accordingly. The same training and testing datasets were used for each reported algorithm, and the same features were used as the input for each reported network. This allows a fair comparison between the different algorithms.

IV. RESULTS
All the networks were trained and tested on an Intel Core i5-8400 with 16 GB of RAM and a GTX 1050ti. The proposed neural network was implemented using Keras and TensorFlow. The features were obtained using MATLAB 2019b. The testing results give the highest accuracy of 82.53 ± 1.63% (standard error = 0.82) for sleep-wake classification using the 5-layer MLP neural network. Table 5 shows the confusion matrix for sleep-wake classification using the MLP neural network. Table 6 shows the overall test performance derived from the confusion matrix. Accuracy, kappa, sensitivity and specificity are calculated using the confusion matrix shown in Table 5.
The results of the MLP neural network using different numbers of hidden layers are illustrated in Table 7. The proposed network is optimal with 5 layers, with the highest accuracy of 82.53%, which decreases when the number of layers is either increased or decreased. If we increase the number of layers, the network starts overfitting, whereas with fewer hidden layers the network underfits (Table 7).
The accuracy of the neural network also changes with the learning rate. Table 8 shows the results of the proposed MLP neural network using different learning rates. With a learning rate of 0.001, the MLP neural network gives the highest reported accuracy. It is important to note that for learning rates of 0.1 and 0.01, the kappa is zero, i.e., the sleep-wake classification is no better than chance. Selecting a high learning rate caused undesirable divergent behavior in our loss function.
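As a quick consistency check, the standard error reported for the 5-layer MLP matches the standard deviation across folds if it is assumed to have been computed as the standard deviation divided by the square root of the number of folds, with $k = 4$:

```latex
\mathrm{SE} = \frac{\sigma}{\sqrt{k}} = \frac{1.63}{\sqrt{4}} = 0.815 \approx 0.82
```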
The test results of the proposed network architecture are compared with other neural network architectures. The results are illustrated in Table 9. It is important to note that wake and sleep both contain LVI signals, so it is challenging to distinguish the two stages. Given this, the proposed MLP neural network achieved very promising results in classifying sleep-wake cycling with EEG recordings. Furthermore, deep neural networks are usually computationally intensive. In this regard, the proposed network architecture is simple and efficient and has a low computational time (Table 9). Therefore, the MLP neural network has the potential to be used for real-time sleep-wake state classification, as the network does not use any post-processing.
Not every NICU is equipped with a large number of EEG electrodes; therefore, we also investigated the results of the MLP neural network with a smaller number of EEG electrodes. In all cases, we used bipolar EEG recordings. Table 10 shows the results of the MLP neural network using different numbers of channels. In the proposed network architecture, we used 9 bipolar EEG channels. Reducing the number of channels decreases the network accuracy. With 4 channels, the accuracy is 74.7%, and with only one channel, the accuracy drops to 71%.
Machine learning algorithms such as SVM, K Nearest Neighbors (KNN), and Decision Tree mostly give better results when handcrafted features are used. For this reason, we compared the results of the MLP neural network with those of machine learning algorithms. Table 11 gives the comparison of the MLP neural network against the machine learning algorithms. The MLP neural network performs better than the machine learning algorithms. For each reported algorithm, results are shown with the parameters giving the best results. It is also important to note that the same training and testing data were used for every algorithm.

V. DISCUSSION
The MLP has proven to be a very successful classifier for different applications, but to the best of our knowledge this is the first time an MLP has been used for sleep staging. A number of algorithms classify neonatal sleep stages, but none of them classifies wake as a separate stage. Mostly, wake is combined with AS I to form an LVI stage, which results in the intermixing of two stages. In this paper, we classified sleep and wake as distinct stages with an accuracy of 82.53%.
We extracted 12 EEG features from each 30 sec EEG segment. They are divided into two categories: time domain and frequency domain. These features were selected because we noticed highest accuracy with these features. If we change the number of features, the accuracy decreases gradually. The most prominent features are the frequency domain features. These features are calculated by taking the Fourier transform of the EEG segment. After applying Fourier transform, we calculated the mean amplitude of the given bands i.e. alpha, beta, theta and delta. By adding these features, the accuracy increases by 15%.
CNNs often outperform other neural networks. To the best of our knowledge, a CNN works well with raw data that has a spatial relationship, from which it extracts its own features. In our case, however, we engineered our own features, which were best suited for sleep-wake classification. The engineered features do not possess the spatial relationship that a convolutional kernel in a CNN assumes. An MLP uses a different weight for each feature, rather than the shared weights of a convolutional kernel, and therefore assumes no spatial relation; this matches our features and hence provides better classification performance.
Table 12 reports the overall performance of the proposed study alongside the existing algorithm for sleep-wake classification, providing the complete evaluation for the metrics discussed in Section III-H. In addition, Table 12 provides information on the epoch length selection and other evaluation metrics. From this, we can conclude that the MLP performs better than the existing algorithm, i.e., deep learning autoencoders.
There are two main limitations of this study that should be taken into consideration. First, the data used in this study come from only 19 subjects, which is a very small sample. A larger dataset is likely to improve the performance and robustness of the proposed algorithm. Second, there is no algorithm with which we can directly compare our proposed algorithm. For this reason, we applied different machine learning and deep learning algorithms to our dataset and compared the results with the MLP neural network. The MLP neural network outperforms all of these algorithms.
In the proposed study, artifacts were not classified and were removed manually during pre-processing. In the NICU, these artifacts can contaminate the EEG recordings and decrease the network performance. An automatic method for artifact removal would be advantageous, so that the proposed method could be used directly in an NICU.
As future work, we aim to classify further sleep stages: AS, QS, IS and wake. In addition, more data will be used in the future study to increase the performance. More consideration will be given during training and testing to improving the performance with a smaller number of EEG channels, as this may help in practical usage.

VI. CONCLUSION
In this study, we proposed a low-cost, efficient, and simple deep MLP neural network for sleep-wake classification using multichannel EEG signals. Eight time-domain and 4 frequency-domain features were extracted from each channel of the neonatal EEG recordings and combined to form an input of size 108 to the neural network. This is the first reported automatic sleep-wake classification algorithm. The proposed neural network does not use any post-processing technique, which strengthens its suitability for real-time sleep-wake classification. For comparison, different neural networks were applied to the same dataset. The results show that the proposed MLP neural network performs better for sleep-wake classification. To conclude, properties such as real-time processing, low computational cost and easy implementation make the MLP neural network feasible for neonatal sleep staging. More importantly, this network can help to quantify abnormalities in neonatal brain development. Future work aims to develop a neural network algorithm for the classification of more sleep stages.
Since October 2015, she has been a Full Professor and the Director of the Center for Intelligent Medical Electronics (CIME) with the Department of Electronic Engineering, School of Information Science and Technology, Fudan University. Her research interests include patient health monitoring, medical monitoring system design using wearable sensors, sleep monitoring, brain activity monitoring, wireless body area networks, ambient intelligence, personalized and smart environment, smart sensor systems, and signal processing. She is an Associate Editor of IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING (TNSRE), and IEEE JOURNAL ON BIOMEDICAL HEALTH INFORMATICS (JBHI), and the Managing Editor of IEEE REVIEWS IN BIOMEDICAL ENGINEERING (R-BME).