Single-Channel EEG Data Analysis Using a Multi-Branch CNN for Neonatal Sleep Staging

Neonatal sleep staging is crucial for understanding infant brain development and assessing neurological health. This study explores the optimal electrode configuration to reduce technical complexity and the risk of skin irritation to neonates during data collection. A Multi-Branch Convolutional Neural Network (CNN) is used to categorize neonatal sleep states from single-channel electroencephalography (EEG) data. The proposed model was trained and tested on 16,803 30-second segments from 64 infants at the Children's Hospital of Fudan University, all at a post-menstrual age between 36 and 43 weeks. A total of 74 extracted time and frequency domain linear and non-linear features are used to improve the performance of the Multi-Branch CNN-based classification model. Additionally, feature selection and feature importance analysis using principal component analysis (PCA) are applied to identify the most informative features. Notably, the F3 channel outperforms the other single channels, achieving accuracy and kappa values of 74.27±0.80% and 0.61, respectively. Furthermore, a combination of four left-side electrodes yields slightly better classification accuracy (75.36±0.57%) than the four right-side electrodes (74.76±1.10%), with corresponding kappa values of 0.63 and 0.62. In addition to providing insights into optimal electrode configuration using single-channel and multi-channel EEG data, the results highlight the critical role of specific EEG channels in sleep stage classification. This research has the potential to enhance neonatal care and sleep monitoring, enabling early detection of sleep-related abnormalities such as sleep disorders. Furthermore, the proposed approach effectively captures information from a single channel, reducing computational load while maintaining commendable performance.
Additionally, integrating time and frequency domain linear and non-linear features into neonatal sleep staging can enhance accuracy and provide deeper insight into the complex dynamics and irregularities of newborns' sleep patterns.


I. INTRODUCTION
Sleep is a natural, repetitive period of relaxation and unconsciousness necessary for the body and mind's proper functioning [1]. As the body goes through several stages during sleep, each stage offers a distinct benefit and has a significant impact on a variety of physical and psychological functions, such as consolidation of memory, cognitive function, regulation of mood, and restoration of physical abilities. Generally, sleep is characterized by the reduction of consciousness and awareness of the surroundings, reduced activation of voluntary muscles, decreased metabolism, and a periodic and reversible nature. Lack of sleep can impair cognitive function, weaken the immune system, and increase the chances of chronic illnesses such as obesity, diabetes, heart disease, and hypertension [2], [3], [4]. In order to maintain good health and well-being, adults should sleep between 7-9 hours at night [1]. Infants are more likely to experience unpredictable sleep patterns because of shorter sleep cycles compared to adults. Infants sleep for 16 to 17 hours a day, although sleep durations differ from baby to baby.
Like adults, infants also have different stages of sleep [5]. However, infants sleep differently from adults. Infants sleep in two main stages: Active Sleep (AS) and Quiet Sleep (QS). Infants in active sleep have a high heart rate, irregular breathing, and rapid eye movements. This is the stage of sleep where infants may move, make facial expressions, and even suckle while asleep. AS is important for infants' brain development and learning. During QS, infants' hearts beat more slowly, their breathing is regular, and they make very little or no movement. QS plays a key role in physical development and growth. In addition to AS and QS, infants also undergo a third, transitional stage of sleep, which is a combination of both AS and QS. AS is further divided into Active Sleep 1 (AS1) and Active Sleep 2 (AS2). AS1 and AS2 differ mainly in the level of brain activity and eye movement. AS1 is characterized by irregular brain waves and frequent eye movements, while AS2 is characterized by less frequent eye movements and regular brain activity. Likewise, QS can be subdivided into Quiet Sleep 1 (QS1) and Quiet Sleep 2 (QS2). Body movements and brain waves differ significantly between QS1 and QS2. QS1 is more active, characterized by an irregular brain wave pattern and some body movement. In contrast, QS2 is a quieter state, characterized by more regular brain activity and minimal body activity.
Main Contributions: The main aim of this study is to assess the feasibility of using single-channel and multi-channel EEG data to distinguish neonates' sleep states. One reason for using single-channel EEG data is to explore the optimal electrode configuration, reducing technical complexity and the potential risk of skin irritation to neonates during data collection. A Multi-Branch CNN categorizes neonatal sleep into three states based on the integration of different features extracted from EEG data. This research is primarily divided into five parts: 1) Extraction of different time and frequency domain linear and non-linear features. 2) Incorporation of advanced features, such as DFA, Multiscale Fluctuation Entropy, and the Lyapunov Exponent, as state-of-the-art non-linear approaches for EEG-based sleep staging. 3) Feature normalization and feature selection using PCA. 4) Classification of sleep states using one channel at a time, and then using different combinations of multiple channels. 5) Exploration of the optimal EEG electrode configuration for three-state classification, in terms of the number of channels and their placement on different sides of the head. Aiming to reduce the complexity, potential risk of skin irritation, and cost of EEG monitoring in neonatal sleep studies, this study evaluated sleep stage classification accuracy using various electrode setups. The article is structured as follows: Section II reviews related work, Section III presents the proposed methodology, Section IV reports the classification results using the proposed method, and Section V discusses the findings and limitations of the proposed work. Section VI concludes the study.

II. RELATED WORK
EEG was first applied to study human sleep behavior in 1937 by Loomis et al. [6]. Since then, numerous algorithms have been developed to classify adult sleep using deep and machine learning techniques [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17]. Pillay et al. developed a model based on Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs) to automatically classify neonatal sleep from multichannel EEG recordings; the HMMs achieved a Cohen's Kappa of 0.62, which was superior to that of the GMMs [15]. Additionally, they used a CNN to classify two and four sleep states [16]. The wake state was not included as a separate stage in any of these algorithms. Awais et al. proposed support vector machines (SVMs) to classify neonatal sleep and wake using pre-trained CNNs as feature extractors [18]. This study found that a pre-trained model did not suffice for highly reliable sleep and wake categorization in newborns, as it achieved 65.3% accuracy, 69.8% specificity, and 61.0% sensitivity. Using video facial expressions, the authors in [19] developed a model to classify infant sleep and wake states by combining deep convolutional neural networks (DCNN) and SVM. The accuracy of the video EEG classification was 93.8 ± 2.2% and the F1-score was 0.93 ± 0.3. However, since infants' faces and voices appear in the video data, privacy concerns arise.
To classify non-contact sleep and wake in infants, Lee et al. used impulse-radio ultra-wideband radar in 2021 and achieved an accuracy of 75.2% [20]. In another study that analyzed EEG data to classify quiet sleep, the mean Kappa was 0.77 ± 0.01 (for an EEG with eight channels) and 0.75 ± 0.01 (for an EEG with one bipolar channel) [21]. Abbasi et al. developed an algorithm based on MLP neural networks using EEG recordings from 19 neonates and achieved a Cohen's Kappa of 62.5% and an accuracy of 82.5% [22]. Using the same data, they performed three-state classification with bagging and stacking ensemble techniques in 2022 and achieved an accuracy of 81.99% [23]. Yu et al. [24] classified neonatal sleep patterns into W, N1, N2, and N3 using publicly available single-channel EEG datasets. To classify sleep patterns, the multi-resolution attention sleep network (MRASleepNet) was used; it was composed of three modules: feature extraction, multi-resolution analysis, and a gated multilayer perceptron (MLP). Emad et al. performed active and quiet sleep classification using an adaptive boosting (AdaBoost) classifier and achieved an accuracy of 81% over 10-fold cross-validation [25]. Siddiqa et al. compared 18 different machine learning models based on AutoML for predicting the neonatal sleep-wake state, obtaining an overall mean accuracy of 84.78% and a kappa of 69.63% using an AutoML-based Random Forest estimator [26]. In recent years, Ansari et al. proposed an 18-layer CNN for neonatal QS sleep stage detection using multi-channel EEG data [27]. Recently, in 2023, a Multi-Scale Hierarchical Neural Network (MS-HNN) was designed by Hangyu et al. for the automatic classification of neonatal sleep states using two, four, and eight channels [28]. They used multi-scale convolutional neural networks (MSCNN) to extract features, including temporal information. For three-stage classification, they achieved 75.4% accuracy with a single channel and 76.5% with eight combined channels. Supratak et al. used DeepSleepNet to perform neonatal sleep state classification and achieved an accuracy of 69.8% [29]. Eldele et al. proposed AttnSleep, an attention-based deep learning approach for sleep stage classification [30]. Rather than applying an RNN, AttnSleep makes use of multi-head attention (MHA) to learn temporal information between sleep stages.
Existing methods for identifying neonatal sleep stages have several drawbacks, including limited classifications, privacy concerns, long training times, and insufficient accuracy. Many of these methods do not take into account additional sleep stages, such as wakefulness, that are crucial to the classification of neonatal sleep. Moreover, DFA, Multiscale Fluctuation Entropy, and the Lyapunov Exponent are examples of non-linear features not typically included in existing methods for neonatal sleep staging. In addition, these methods rely on multi-channel EEG data, which can disrupt a neonate's skin and cause discomfort, emphasizing the need for less invasive approaches. Consequently, there is a need for a reliable and privacy-preserving method that accurately distinguishes three-stage sleep patterns in infants, while ensuring high accuracy and minimizing potential adverse effects.

III. MATERIALS AND METHODS
In this paper, a CNN is presented for the classification of neonatal sleep stages. A comprehensive description of the proposed design is presented in this section. Figure 1 depicts the step-by-step flowchart of the proposed methodology. The method is elaborated in the following steps:

A. DATASET
EEG recordings from 64 infants were collected at the neonatal intensive care unit (NICU) of the Children's Hospital of Fudan University (CHFU), China. This study was approved by the Research Ethics Committee of the Children's Hospital of Fudan University (Approval No. (2017) 89). Training and testing of the proposed model were conducted using these EEG recordings. The data were recorded while neonates were observed at different times, during which different sleep cycles were observed. A complete 10-20 electrode installation system includes 17 electrodes: ''FP1'', ''FP2'', ''F3'', ''F4'', ''F7'', ''F8'', ''C3'', ''C4'', ''P3'', ''P4'', ''T3'', ''T4'', ''T5'', ''T6'', ''O1'', ''O2'', and ''Cz''. Each letter represents a specific location or lobe of the brain: the letters FP, F, T, P, O, and C correspond to the prefrontal, frontal, temporal, parietal, occipital, and central regions, respectively. The EEG recordings used in this study include the following eight channels: ''C3'', ''C4'', ''F3'', ''F4'', ''P3'', ''P4'', ''T3'', and ''T4''. This multichannel EEG was recorded using the NicoletOne EEG system. NicoletOne EEG systems are designed with lightweight electrode caps that securely hold scalp electrodes in place, ensuring accurate signal acquisition. With NicoletOne EEG, detailed EEG signal capture is possible at high sample rates up to 2 kHz and wide bandwidths of 0.053 to 500 Hz. For additional physiological measurements, there are 9 auxiliary pairs, a glow-in-the-dark overlay on the passive head box for electrode placement in low-light conditions, SpO2 measurement capability, a built-in impedance display, a data transfer interface via an Ethernet amplifier, and 12 DC inputs that can be used to record auxiliary information. With its advanced signal processing capabilities and high-quality amplifiers, researchers can collect accurate EEG data efficiently. According to the 10-20 system, Figure 2 illustrates the locations of the eight electrodes used in this study. Nz represents the root of the nose, while Iz indicates the inion (occipital protuberance).

B. VISUAL SLEEP SCORING OF DATASET
The EEG segments were visually categorized by professional doctors into five main categories: Wake, Active Sleep 1, Active Sleep 2, Quiet Sleep 1, and Quiet Sleep 2. When identifying sleep stages, non-cerebral characteristics were used along with the EEG. Further, the doctors considered NICU videos during the annotation process. Table 1 gives detailed information about the dataset.

D. FEATURES EXTRACTION
Feature extraction from EEG data is crucial for classification, because it helps distinguish between different sleep states or conditions based on patterns and characteristics. A detailed interpretation of EEG data can be challenging due to the numerous time-varying signals generated by brain electrical activity. By extracting features, this study reduces the dimensionality of the data and highlights information that can be used for classification, such as time-frequency distributions [31]. A total of 74 features were extracted from each channel using different techniques, including:

1) TIME DOMAIN FEATURES
• Lyapunov Exponent: The Lyapunov exponent, a non-linear feature, quantifies how sensitive a dynamical system is to initial conditions [33]. In the field of EEG feature extraction, it provides significant insight into the predictability and stability of neural dynamics. Rosenstein's algorithm is used to estimate the Lyapunov exponent from EEG data. Using the algorithm, parameters are defined for embedding the data, tangent vectors are initialized, and Jacobian matrix calculations are performed. Through QR decomposition, the tangent vectors are orthogonalized and then normalized, which measures the sensitivity of the system to perturbations. To obtain the Lyapunov exponent, the logarithms of the Jacobians are divided by the number of iterations and tangent vectors. In addition to informing sleep stage classification in EEG analysis, the Lyapunov exponent values reveal the complexity and predictability of neonatal sleep dynamics.
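The estimation procedure described above can be sketched in code. The following is a minimal, simplified variant of Rosenstein's method that tracks the divergence of initially close state vectors rather than the full Jacobian/QR formulation; the embedding dimension, lag, and other parameters are illustrative assumptions, not the settings used in this study.

```python
import numpy as np

def largest_lyapunov_rosenstein(x, emb_dim=5, lag=2, min_tsep=20, t_max=30):
    """Simplified Rosenstein-style estimate of the largest Lyapunov exponent:
    delay-embed the signal, pair each state with its nearest (temporally
    separated) neighbour, and fit the slope of the mean log-divergence."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (emb_dim - 1) * lag
    # Delay embedding: each row is one reconstructed state vector.
    emb = np.array([x[i:i + (emb_dim - 1) * lag + 1:lag] for i in range(n)])
    usable = n - t_max
    # Pairwise distances between state vectors.
    d = np.linalg.norm(emb[:usable, None, :] - emb[None, :usable, :], axis=2)
    # Exclude temporally close points from the nearest-neighbour search.
    for i in range(usable):
        d[i, max(0, i - min_tsep):i + min_tsep + 1] = np.inf
    nn = np.argmin(d, axis=1)
    # Mean log-distance between each pair as both evolve k steps forward.
    divergence = []
    for k in range(1, t_max):
        dk = np.linalg.norm(emb[np.arange(usable) + k] - emb[nn + k], axis=1)
        divergence.append(np.mean(np.log(dk[dk > 0])))
    # The slope of the divergence curve approximates the largest exponent.
    slope, _ = np.polyfit(np.arange(1, t_max), divergence, 1)
    return slope

rng = np.random.default_rng(0)
sig = np.sin(np.linspace(0, 20 * np.pi, 600)) + 0.1 * rng.standard_normal(600)
lyap = largest_lyapunov_rosenstein(sig)
print(lyap)
```

A positive slope indicates exponentially diverging trajectories (chaotic, less predictable activity), while a value near zero suggests regular dynamics.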

2) FREQUENCY DOMAIN FEATURES
In EEG signal analysis, frequency domain features are essential for diagnosing neurological disorders or monitoring brain activity during cognitive tasks. This study calculated the following frequency domain features:
• Spectral Statistics of EEG Bands to Identify Central Tendency Features: Spectral statistics of four bands (delta, theta, alpha, and beta) of EEG data can be used to determine central tendency features (Mean, Median, Mode, Variance, Standard Deviation, Kurtosis, Skewness, Minima, and Maxima) [26]. The central tendency of a set of data refers to its tendency to cluster around the central or average value. Measures of central tendency provide information about the typical or representative values within a dataset. They can be used to describe and summarize data distributions.
To calculate central tendency features using spectral statistics, the power spectral density (PSD) of the EEG data was computed first. The PSD was then divided into four frequency bands: delta (0.5-3 Hz), theta (4-7 Hz), alpha (8-12 Hz), and beta (13-30 Hz). After that, 32 central tendency features were calculated from the spectral statistics across the frequency bands.
• Fast Fourier Transform (FFT): Using the FFT, the time-domain EEG signal can be analyzed in terms of its frequency components by converting it into the frequency domain. In this step, the FFT of the input EEG data was computed and the top 10 frequencies with the highest FFT magnitudes were selected.
• Other Features: The other frequency domain features calculated include the normalized power, average frequency, and maximum power for each of the four frequency bands. The power ratios between the delta and theta bands and between the alpha and beta bands are also calculated. Table 3 lists all extracted time and frequency domain features. These features can be combined with one another to develop automated sleep staging algorithms that could improve the detection and treatment of neonatal sleep disorders.
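The band-wise spectral statistics and the top-10 FFT frequencies described above can be sketched as follows. The sampling rate (250 Hz) and the synthetic 30-second epoch are illustrative assumptions; only a subset of the central tendency statistics is shown.

```python
import numpy as np
from scipy.signal import welch

fs = 250  # assumed sampling rate in Hz; adjust to the recording hardware
rng = np.random.default_rng(1)
epoch = rng.standard_normal(30 * fs)  # one synthetic 30-second EEG segment

# Power spectral density via Welch's method.
freqs, psd = welch(epoch, fs=fs, nperseg=fs * 2)

bands = {"delta": (0.5, 3), "theta": (4, 7), "alpha": (8, 12), "beta": (13, 30)}
features = {}
for name, (lo, hi) in bands.items():
    band_psd = psd[(freqs >= lo) & (freqs <= hi)]
    # Central-tendency statistics of the in-band PSD values.
    features[f"{name}_mean"] = band_psd.mean()
    features[f"{name}_median"] = np.median(band_psd)
    features[f"{name}_var"] = band_psd.var()
    features[f"{name}_std"] = band_psd.std()
    features[f"{name}_min"] = band_psd.min()
    features[f"{name}_max"] = band_psd.max()

# Top-10 frequencies by FFT magnitude.
spectrum = np.abs(np.fft.rfft(epoch))
fft_freqs = np.fft.rfftfreq(len(epoch), d=1 / fs)
top10 = fft_freqs[np.argsort(spectrum)[-10:][::-1]]
print(len(features), top10[:3])
```

Running the same loop over each channel and epoch yields a per-segment feature vector that can be concatenated with the time-domain features.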

E. FEATURE IMPORTANCE AND FEATURE SELECTION
Feature importance and selection techniques are very significant for EEG-based sleep state classification. By using these techniques, we can identify the most informative time and frequency domain features that help distinguish sleep stages. Better performance and more accurate results can be achieved with machine learning models by making use of feature importance and selection. In this study, Principal Component Analysis (PCA) is used for feature importance and selection. By analyzing the variance in the dataset, PCA determines which principal components capture the most variance. To reduce the dimensionality of the data, a subset of principal components can be selected based on the explained variance ratio, which indicates how much information each principal component captures. By keeping this variance, the most relevant information in the dataset is preserved, while the less relevant data are eliminated. The designed PCA selected the most informative features that explain 95% of the EEG variance.
To obtain the principal components that capture most of the variance in the dataset, the dataset was first scaled and then PCA was performed. 95% of the variance is retained by a small number of principal components. As a result, a new dataframe is created with the selected columns and the target variable, in accordance with the number of principal components. By doing so, the information most relevant for predicting the target variable is preserved, while the dimensionality of the data is reduced. The dataset originally contained 74 features extracted from the preprocessed EEG data. After PCA, only 19 features were retained for further analysis. The resulting dataframe is used as input to the Multi-Branch CNN for three-state sleep classification.
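The scale-then-reduce step above can be sketched with scikit-learn; the feature matrix here is synthetic (500 segments × 74 features as a stand-in for the extracted features), so the number of retained components will differ from the paper's 19.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Stand-in for the (segments x 74 features) matrix; values are synthetic.
X = rng.standard_normal((500, 74)) @ rng.standard_normal((74, 74))

# Scale first, then keep the components explaining 95% of the variance.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)  # float in (0, 1) = target explained variance
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```

`PCA(n_components=0.95)` picks the smallest number of components whose cumulative explained variance ratio reaches 95%, which matches the selection rule described in the text.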

F. MULTI-BRANCH CNN MODEL
CNNs are deep learning models designed to process structured grid-like data, such as images and time series [37], [38]. They are widely used in classification, object detection, and image segmentation [38]. By incorporating multiple parallel branches into the network architecture, multi-branch CNNs extend the capabilities of traditional CNNs. As each branch processes the input data differently, the network can learn diverse representations and capture different aspects of it. The convolution operation, activation function, pooling operation, fully connected layer operation, and loss function are some of the mathematical components involved in the model. These components are defined as follows:

1) CONVOLUTION OPERATION
The convolution operation involves sliding a filter/kernel over the input data and computing the element-wise product between the filter and the corresponding region of the data [39]. The results are summed to produce an output feature map.
(f ∗ g)(x, y) = Σ_i Σ_j f(i, j) · g(x − i, y − j)

Here, f is the input, g is the filter/kernel, and (x, y) are the coordinates of the output feature map.

2) ACTIVATION FUNCTION
Activation functions introduce non-linearity into the model [39]. One common activation function is the Rectified Linear Unit (ReLU), defined as:

ReLU(x) = max(0, x)

3) POOLING OPERATION
Pooling layers are used to downsample the spatial dimensions of the feature maps, reducing computation and memory requirements. Max pooling takes the maximum value from a local region.
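The convolution, ReLU, and max-pooling operations above can be illustrated with a small NumPy example; the toy signal and kernel are arbitrary, not the model's actual filters.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """'Valid' 1-D convolution: slide the kernel and sum the products."""
    out_len = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel)
                     for i in range(out_len)])

def relu(x):
    """Rectified Linear Unit: max(0, x), applied element-wise."""
    return np.maximum(0, x)

def max_pool1d(x, pool_size, stride):
    """Max pooling: take the maximum over each local window."""
    return np.array([x[i:i + pool_size].max()
                     for i in range(0, len(x) - pool_size + 1, stride)])

x = np.array([1.0, -2.0, 3.0, 0.5, -1.0, 2.0])
k = np.array([1.0, 0.0, -1.0])

feat = conv1d_valid(x, k)                        # feature map
act = relu(feat)                                 # non-linearity
pooled = max_pool1d(act, pool_size=2, stride=1)  # downsampled map
print(feat, act, pooled)
```

Each stage shrinks or reshapes the signal exactly as a Conv1D → ReLU → MaxPooling1D stack does inside the model.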

4) FULLY CONNECTED LAYER OPERATION
Fully connected layers are traditional neural network layers where each neuron is connected to every neuron in the previous and subsequent layers [39].Matrix multiplication is used to compute the outputs of these layers.
Y = WX + b

where Y is the output, X is the input, W is the weight matrix, and b is the bias vector.

5) LOSS FUNCTION
The choice of loss function depends on the task (e.g., classification, regression). Cross-entropy loss is used for this classification task. Mathematically, the categorical cross-entropy loss can be represented as [40]:

L = −(1/N) Σ_{i=1}^{N} Σ_{j=1}^{C} y_ij log(ŷ_ij)

Here, N represents the number of samples, C represents the number of classes, y_ij represents the true label (0 or 1) for sample i and class j, and ŷ_ij represents the predicted probability for sample i and class j.
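The loss above can be computed directly; the labels and predicted probabilities below are made-up values for three segments and three sleep classes.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -(1/N) * sum_i sum_j y_ij * log(yhat_ij)."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Three samples, three sleep classes (e.g. Wake, AS, QS), one-hot labels.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=float)
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7]])

loss = categorical_cross_entropy(y_true, y_pred)
print(loss)
```

Only the probability assigned to the true class contributes to each sample's term, so confident correct predictions drive the loss toward zero.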

G. PROPOSED MODEL ARCHITECTURE
Detailed descriptions of the mathematical model of the Multi-Branch CNN, its architecture, and all parameters used in the proposed Multi-Branch CNN model for neonatal sleep staging are presented in this subsection. The Multi-Branch CNN architecture proposed in this study consists of five branches, each performing different convolutional operations. This architecture enables the model to capture diverse spatial information from the input data. An overview of the model's architecture is shown in Figure 3. A tensor of shape (batch_size, num_features, 1) is passed as the input in the EEG analysis, where batch_size indicates how many samples are processed at once and num_features indicates how many features are examined. Each branch uses a set of convolutional layers with different filter sizes and activation functions. The output tensor of the first branch is obtained by applying a 1D convolutional layer with 128 filters and a kernel size of 1. The second branch also starts with a 1D convolutional layer with 128 filters and a kernel size of 1, followed by another 1D convolutional layer with 256 filters and a kernel size of 5; the ReLU activation function is applied after each convolutional layer. The third branch begins with a 1D convolutional layer with 128 filters and a kernel size of 1, followed by a 1D convolutional layer with 256 filters and a kernel size of 5, again with ReLU activation after each convolutional layer. In the fourth branch, a 1D convolutional layer with 128 filters and a kernel size of 1 is likewise followed by a 1D convolutional layer with 256 filters and a kernel size of 5, with ReLU activation after every convolutional layer. The last branch consists of a max-pooling layer with a pool size of 4 and a stride of 1, followed by a convolutional layer with 256 filters and a kernel size of 5; the ReLU activation function is applied after the convolutional layer.
All five output tensors are then concatenated along the channel axis using the concatenate operation, resulting in a tensor known as concat. Next, the concatenated tensor is flattened with the Flatten layer into a 1D feature vector. The flattened tensor is then passed through two dense layers, each with 256 units and a ReLU activation function. A regularization parameter of 0.0001 is applied to the weights of these two dense layers. To prevent overfitting, dropout layers with a dropout rate of 0.3 are added after each of the two dense layers. Finally, a dense layer with softmax activation is applied to the output of the last dense layer, producing class probabilities. In this output layer, the number of units is determined by the number of classes in the classification task. An Adam optimizer is used to optimize the model's parameters during training, and the model is trained with a categorical cross-entropy loss function, suitable for classification tasks. Table 4 presents details of all other hyper-parameters used in the proposed Multi-Branch CNN, while the details of the added layers are given in Table 5.
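The architecture described above can be sketched in Keras as follows. Filter counts, kernel sizes, dropout rate, and the 0.0001 regularization parameter follow the text; `padding="same"` and the use of L2 weight regularization are assumptions made so the branch outputs align for concatenation, not confirmed details of the original implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_multibranch_cnn(num_features=19, num_classes=3):
    inp = layers.Input(shape=(num_features, 1))

    # Branch 1: a single 1D convolution with 128 filters, kernel size 1.
    branches = [layers.Conv1D(128, 1, activation="relu", padding="same")(inp)]

    # Branches 2-4: 128-filter 1x1 convolution, then 256 filters, kernel 5.
    for _ in range(3):
        b = layers.Conv1D(128, 1, activation="relu", padding="same")(inp)
        b = layers.Conv1D(256, 5, activation="relu", padding="same")(b)
        branches.append(b)

    # Branch 5: max pooling (pool 4, stride 1), then 256 filters, kernel 5.
    b5 = layers.MaxPooling1D(pool_size=4, strides=1, padding="same")(inp)
    b5 = layers.Conv1D(256, 5, activation="relu", padding="same")(b5)
    branches.append(b5)

    # Concatenate all branch outputs along the channel axis, then flatten.
    x = layers.Flatten()(layers.Concatenate()(branches))
    for _ in range(2):
        x = layers.Dense(256, activation="relu",
                         kernel_regularizer=regularizers.l2(1e-4))(x)
        x = layers.Dropout(0.3)(x)
    out = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_multibranch_cnn()
print(model.output_shape)
```

With 19 PCA-selected features and three sleep states, the model maps a (batch, 19, 1) tensor to a (batch, 3) vector of class probabilities.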

IV. RESULTS
The proposed scheme is tested and evaluated via different performance metrics, such as the confusion matrix, accuracy, Cohen's kappa, recall, precision, Matthews correlation coefficient, and F1-score. Based on these metrics, the study analyzes the classification model's ability to identify EEG patterns accurately and their relevance when applied to EEG analysis.

A. CONFUSION MATRIX
A confusion matrix is used to analyze the quality of a classification model. This matrix illustrates the actual and predicted classification information. Figure 4 shows the general three-class confusion matrix [41]. In Figure 4, TP represents true positives, where the classifier correctly predicts the positive class as positive; TN indicates true negatives, where the classifier correctly identifies the negative class as negative; FP represents false positives, where the classifier incorrectly predicts the negative class as positive; and FN represents false negatives, where the classifier incorrectly identifies the positive class as negative. For single-channel EEG data, F3 and C3 show the highest values in their confusion matrices. Figure 5 shows confusion matrices for all single channels, the combined four left-side channels, and the four right-side channels.
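A three-class confusion matrix of the kind shown in Figure 4 can be built with scikit-learn; the labels below are hypothetical examples, not the study's predictions.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for three sleep states: 0 = Wake, 1 = AS, 2 = QS.
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0, 2, 1]

# Row i, column j counts segments of true class i predicted as class j;
# the diagonal holds the correctly classified segments of each class.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)
```

For a given class, TP is its diagonal cell, FN the rest of its row, FP the rest of its column, and TN everything else, which is how the per-class metrics in the following subsections are derived.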

B. ACCURACY
Machine Learning (ML) algorithms are commonly evaluated based on their accuracy. Accuracy is expressed as the percentage of correctly classified measurements. It can be calculated using the formula [42]:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

The computed accuracy values are presented in Tables 6 and 7.

C. COHEN'S KAPPA
To estimate the measure of agreement between two raters, Cohen's Kappa is commonly used. It can also be used to evaluate a classifier's performance. Using the cells of a confusion matrix, it is calculated as follows [43]:

κ = (p_o − p_e) / (1 − p_e)

where p_o is the observed agreement and p_e is the agreement expected by chance. A Kappa value of −1 is the worst, whereas a value of +1 is the best. The estimated values of Cohen's kappa are presented in Tables 6 and 7.

D. RECALL
In machine learning, recall measures how accurately a model identifies a class from a set of data samples. It is also referred to as the true positive rate. Mathematically, it can be expressed as [44]:

Recall = TP / (TP + FN)

The estimated values of recall are presented in Tables 6 and 7.

E. PRECISION
Precision measures the fraction of items predicted as relevant that are actually relevant. Accordingly, it can be written as [44]:

Precision = TP / (TP + FP)

The calculated values of precision are presented in Tables 6 and 7.

F. MATTHEWS CORRELATION COEFFICIENT
The Matthews Correlation Coefficient (MCC) is a statistical measure for evaluating models. In general, it measures how different the predicted values are from the actual ones. Using a confusion matrix, the MCC is calculated as follows [43]:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

The worst value of MCC is −1, while the best value is +1. The estimated values of MCC are presented in Tables 6 and 7.

G. F1-SCORE
The F1-score combines recall and precision into a single value, which makes it a strong metric. Mathematically, it is calculated as [44], [45]:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

The calculated values of the F1-score are presented in Tables 6 and 7.
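All of the metrics in subsections B-G are available in scikit-learn and can be computed together; the labels below are the same hypothetical three-class example, with macro averaging as an illustrative choice for the multiclass case.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             recall_score, precision_score,
                             matthews_corrcoef, f1_score)

# Hypothetical labels for three sleep states: 0 = Wake, 1 = AS, 2 = QS.
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0, 2, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("kappa    :", cohen_kappa_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("mcc      :", matthews_corrcoef(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred, average="macro"))
```

Macro averaging weights each sleep state equally, which is useful here because the three states are not equally frequent in neonatal recordings.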

H. ACCURACY LINE GRAPH
The accuracy line graph shows the trend of accuracy over a range of values or iterations. It helps compare models, select thresholds, and make decisions based on a model's performance. A line graph of accuracy is shown in Figure 6 for all single channels, the combined four left-side channels, and the four right-side channels. In Figure 6, the x-axis represents the fold number, and the y-axis represents the corresponding accuracy values.

V. DISCUSSION
The architecture of the proposed Multi-Branch CNN is shown in Figure 3. Table 4 contains details of all parameters and their values, while Table 5 contains details of all layers with their types and parameters. The kernel sizes, numbers of filters, and activation functions were selected based on empirical evaluations. Several experiments were conducted to optimize the model's performance. After testing various combinations to determine their impact on classification accuracy, the final choices were made based on a balance between model complexity and generalization ability. In the performance evaluation step, a 10-fold cross-validation procedure was employed. The data were shuffled randomly beforehand to avoid bias. Ten subsets of data were used in this methodology, with one subset serving as the testing set and the remaining nine as training sets. In this way, the generalization performance of the proposed model was assessed while minimizing leakage between the training and testing phases. The experimentally computed results of three-state neonatal sleep classification using all single-channel and four-channel EEG data are presented in Tables 6 and 7. The maximum mean accuracy and kappa for single-channel three-state classification are achieved by the F3 and C3 channels. The accuracy and kappa values for F3 are 74.27±0.80% and 0.61, respectively; for the C3 channel, these values are 73.75±0.86% and 0.61. Moreover, the maximum mean accuracy and kappa for four channels are achieved by the combination of the four left-side channels (F3, C3, P3, and T3), with accuracy and kappa values of 75.36±0.57% and 0.63, respectively. For the right-side electrode combination (F4, C4, P4, and T4), these values are 74.76±1.10% and 0.62. To visualize the performance of the proposed model and its learning progress, accuracy line curves are presented in Figure 6, and the confusion matrices for the three states are shown in Figure 5. As can be seen in Table 6, channels C4, P3, P4, T3, and T4 contribute less to the classification of the three-state sleep stages in neonates, whereas F3, F4, and C3 perform well. In the case of four channels, the left-side channels performed better than the right-side channels. Based on the performance parameters presented in Tables 6 and 7, it can be clearly seen that performance remains favorable even with fewer channels. Through such sleep analysis, it is possible to enhance neonatal care and monitor sleep more effectively, enabling early detection of sleep-related abnormalities such as sleep disorders.
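The shuffled 10-fold evaluation described above can be sketched as follows. The feature matrix and labels are synthetic (19 features matching the post-PCA dimensionality), and a trivial majority-class rule stands in for the Multi-Branch CNN, since only the cross-validation mechanics are being illustrated.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(3)
# Stand-ins for the PCA-reduced feature matrix and three-state labels.
X = rng.standard_normal((200, 19))
y = rng.integers(0, 3, size=200)

accuracies = []
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    # Fit a model on X[train_idx]; a majority-class dummy stands in here.
    majority = np.bincount(y[train_idx]).argmax()
    acc = float(np.mean(y[test_idx] == majority))
    accuracies.append(acc)

print(f"{np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}")
```

Reporting the mean and standard deviation across the ten folds gives results in the same "accuracy ± deviation" form used in Tables 6 and 7.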
Table 8 shows a comparison of existing and proposed methods. The datasets used in the proposed study are several times larger than those used in [16] and [27]. Underfitting was observed in these models when applied to the dataset used in this research. The sleep models in [29] and [30] were developed for adult sleep. Due to the different sleep characteristics of infants and adults, these models exhibit convergence problems and overfitting. Consequently, directly transferring a model for adult sleep staging to neonatal sleep data is difficult; the model must be adjusted to suit neonatal sleep characteristics. Moreover, in [28], the model architecture uses serial recurrent neural networks (RNNs) in the TIL module, which results in long training times and inefficiency.
It is necessary to identify limitations and future directions based on these experiments. A primary objective of this paper is to examine the reliability and feasibility of the proposed scheme using only EEG signals as input. This study did not use signals such as electrooculography (EOG), electromyography (EMG), or ECG; they can be utilized in the future to study the effect of a variety of input signals on neonatal sleep staging. Additionally, an effective learning technique such as the Transformer [46] could be used instead of a CNN to further enhance accuracy. Furthermore, this study only explored three-state neonatal sleep classification; five-class sleep staging could be addressed in the future by dividing the AS state into ASI and ASII and the QS state into QSI and QSII. The data in this study were shuffled across all subjects before being split into training and test sets; future research may adopt subject-independent training and testing sets to provide a more realistic assessment of neonatal sleep stage classification. Finally, it would be useful to compare the performance of MFE with the Multiscale Dispersion Entropy (MDE) and Multiscale Fluctuation Dispersion Entropy (MFDE) methods in the context of sleep staging; in the recent literature, these methods have been shown to be superior at detecting meaningful patterns [35], [36]. Future research can further enhance sleep staging algorithms by incorporating and evaluating these techniques.
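The subject-independent split suggested above differs from record-level shuffling in that all epochs from a held-out infant stay in the test set. A minimal sketch, with invented subject and epoch counts:

```python
import numpy as np

# Sketch of a subject-independent split: epochs from the same infant never
# appear in both training and test sets. Counts here are illustrative only.
rng = np.random.default_rng(1)
subjects = np.repeat(np.arange(8), 25)   # 8 hypothetical infants, 25 epochs each

unique_subjects = rng.permutation(np.unique(subjects))
held_out = set(unique_subjects[:2].tolist())   # hold out 2 infants entirely

test_mask = np.isin(subjects, list(held_out))
train_idx = np.where(~test_mask)[0]
test_idx = np.where(test_mask)[0]

# No infant contributes epochs to both sides of the split.
assert set(subjects[train_idx]).isdisjoint(set(subjects[test_idx]))
print(len(train_idx), len(test_idx))
```

This protocol typically yields lower but more realistic accuracy estimates, since the model cannot exploit subject-specific EEG signatures shared between training and test epochs.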

FIGURE 1. Step-wise flowchart of the proposed methodology.

FIGURE 2. Electrode locations for the 8 electrodes used in this study.

• Multiscale Fluctuation Entropy (MFE): In this work, MFE values are calculated for each epoch of EEG data to quantify signal complexity and irregularity [34]. The process involves several steps. The EEG signal is first segmented according to a scale factor. Each segment's fluctuation is then calculated by comparing its standard deviation to its mean, and the resulting values form a fluctuation series. Finally, Shannon's entropy formula is applied to this fluctuation series to obtain the entropy value. Computing MFE values provides insight into the complexity and irregularity of neural activity at various scales; it contributes specifically to the analysis of EEG data, where understanding fluctuation and complexity patterns can provide valuable insight into the underlying brain activity. Additionally, multiscale fluctuation dispersion entropy (MFDE) and multiscale dispersion entropy (MDE) have been demonstrated to be effective in detecting meaningful patterns in recent studies.
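The steps above can be sketched directly. This follows the prose description (segment, per-segment fluctuation as standard deviation relative to the mean, Shannon entropy of the fluctuation series) rather than the exact formulation of [34], and the sampling rate and scale factors are assumptions:

```python
import numpy as np

# Loose MFE sketch: segment the signal by a scale factor, compute one
# fluctuation value per segment, then take the Shannon entropy of the
# resulting fluctuation series via a normalized histogram.
def multiscale_fluctuation_entropy(signal, scale, n_bins=10):
    n_segments = len(signal) // scale
    segments = signal[:n_segments * scale].reshape(n_segments, scale)
    # Fluctuation of each segment: standard deviation relative to its mean.
    fluctuation = segments.std(axis=1) / (np.abs(segments.mean(axis=1)) + 1e-12)
    hist, _ = np.histogram(fluctuation, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))   # Shannon entropy in bits

rng = np.random.default_rng(2)
epoch = rng.normal(size=3000)               # one synthetic 30-s epoch (100 Hz assumed)
mfe = [multiscale_fluctuation_entropy(epoch, s) for s in (5, 10, 20)]
print(mfe)
```

Evaluating the entropy over several scale factors is what makes the measure "multiscale": one value per scale is obtained for each epoch.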

FIGURE 3. The complete architecture of the proposed multi-branch CNN.

FIGURE 4. General confusion matrix for three-state classification.
HAFZA AYESHA SIDDIQA received the B.Sc. and master's degrees in electrical engineering from the Department of Electrical Engineering, HITEC University Taxila, Pakistan, in 2015 and 2019, respectively. She is currently pursuing the Ph.D. degree in biomedical engineering with Fudan University, China. Her research interests include machine learning, deep learning, fuzzy control, signal processing, EEG monitoring, sleep study, biomedical signal processing, neonatal health monitoring, and chaos.

ZHENNING TANG received the bachelor's degree from Tianjin University, in 2022. He is currently pursuing the Ph.D. degree with the Center for Biomedical Engineering, Fudan University. His research interests include biomedical engineering, focusing on wearable sensor systems, biomedical signal processing, and sleep regulation.

YAN XU received the medical degree from Southern Medical University, in 2008. She is currently a Neurophysiologist with the Children's Hospital of Fudan University. She was a neonatologist for about ten years and became a paediatric neurophysiologist seven years ago, working mainly on neonatal neurophysiology. To date, she has analyzed thousands of neonatal EEGs.

TABLE 1. A detailed description of the dataset.

TABLE 2. A list of all neonatal three-state sleep states, their annotations, and the number of epochs.
• Statistical Features of Signal and Its Derivatives: In order to analyze and summarize the key statistical properties of a signal and its derivatives, it is convenient to extract time-domain features from EEG data for neonatal sleep staging [26]. For clinical and research purposes, time-domain feature extraction is a useful and practical method of analyzing EEG signals. First, five statistical features (mean, standard deviation, minimum, maximum, and range) of the signal are calculated. Next, the same five statistics are calculated for the first and second derivatives of the signal.
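The 15 statistics described above (five for the signal plus five each for its first and second derivatives) can be computed in a few lines. The function name and epoch length are illustrative:

```python
import numpy as np

# Sketch of the time-domain feature set: mean, std, min, max, and range of
# the raw signal and of its first and second differences (15 values total).
def time_domain_features(x):
    feats = []
    for series in (x, np.diff(x), np.diff(x, n=2)):
        feats.extend([series.mean(), series.std(), series.min(),
                      series.max(), np.ptp(series)])
    return np.array(feats)

rng = np.random.default_rng(3)
epoch = rng.normal(size=3000)        # one synthetic 30-s epoch
f = time_domain_features(epoch)
print(f.shape)                       # 15 features per epoch
```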
• Detrended Fluctuation Analysis (DFA): To quantify long-range correlations or self-similarity in EEG signals, DFA, a non-linear feature, is calculated. The signal is integrated and detrended at different time scales, and DFA measures how far its fluctuations deviate from a straight line [32]. The result is a scaling exponent describing the relationship between fluctuation amplitude and time scale. Higher DFA values indicate stronger long-range correlations or self-similarity, suggesting a more organized and predictable signal, while lower DFA values suggest weaker correlations or more random behavior. These DFA features give insight into the signal's dynamics and complexity and are useful for a variety of applications, including signal processing, time series analysis, and biomedical research.
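A minimal DFA sketch following the steps just described (integrate, detrend per window, fit the scaling exponent on a log-log plot); the window sizes are assumptions, not the study's actual settings:

```python
import numpy as np

# Minimal DFA: build the integrated profile, split it into windows of each
# scale, remove a linear trend per window, measure the RMS fluctuation per
# scale, and fit the scaling exponent alpha in log-log coordinates.
def dfa_exponent(x, scales=(16, 32, 64, 128)):
    y = np.cumsum(x - x.mean())                 # integrated (profile) series
    fluctuations = []
    for s in scales:
        n_win = len(y) // s
        sq_residuals = []
        for w in range(n_win):
            seg = y[w*s:(w+1)*s]
            t = np.arange(s)
            coeffs = np.polyfit(t, seg, 1)      # linear detrending
            sq_residuals.append(np.mean((seg - np.polyval(coeffs, t)) ** 2))
        fluctuations.append(np.sqrt(np.mean(sq_residuals)))
    alpha, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return float(alpha)

rng = np.random.default_rng(4)
white = rng.normal(size=4096)
alpha = dfa_exponent(white)    # close to 0.5 for uncorrelated noise
print(alpha)
```

The exponent behaves as described in the text: uncorrelated noise yields a value near 0.5, while strongly persistent signals push it toward 1 and above.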

TABLE 3. List of all extracted time and frequency domain features.

TABLE 5. Details of the layers of the multi-branch CNN.

TABLE 6. Classification results for three-state classification using single-channel EEG data.

TABLE 7. Classification results for three-state classification using four-channel EEG data.
An overview of the proposed Multi-Branch CNN is already presented in Section III. A total of 74 features extracted from each EEG channel are fed into the model to classify sleep states. The combined features from the four left-side channels and from the four right-side channels are likewise used to classify the neonate's sleep states.
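The multi-branch idea (parallel branches with different receptive fields whose outputs are merged before a classifier) can be illustrated with a toy forward pass in plain NumPy. Branch count, kernel sizes, pooling, and layer widths here are invented for illustration; the paper's actual architecture is the one specified in Table 5:

```python
import numpy as np

# Toy forward pass of a multi-branch design: three parallel 1-D convolution
# branches with different kernel sizes, concatenated and fed to a dense
# three-way classifier. Weights are random; nothing here is trained.
rng = np.random.default_rng(5)

def branch(x, kernel_size):
    k = rng.normal(size=kernel_size)
    h = np.maximum(np.convolve(x, k, mode="valid"), 0.0)  # conv + ReLU
    return np.array([h.mean(), h.max()])                  # crude global pooling

x = rng.normal(size=74)                                   # one 74-feature epoch vector
features = np.concatenate([branch(x, ks) for ks in (3, 5, 7)])  # merge branches

W = rng.normal(size=(3, features.size))                   # dense layer: 3 sleep states
logits = W @ features
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                      # softmax over the states
print(probs)
```

The design intent is that branches with different kernel sizes capture patterns at different temporal scales, and the merged representation is richer than any single branch alone.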

TABLE 8. Comparison of existing and proposed methods.