Quantification of Hypsarrhythmia in Infantile Spasmatic EEG: A Large Cohort Study

Infantile spasms (IS) is a neurological disorder causing mental and/or developmental retardation in many infants. Hypsarrhythmia is a typical pattern in the electroencephalography (EEG) signals of IS patients. Long-term EEG/video monitoring is most frequently employed in clinical practice for IS diagnosis, but manual screening for hypsarrhythmia is time consuming and lacks sufficient reliability. This study aims to identify potential biomarkers for automatic IS diagnosis by quantitative analysis of the EEG signals. A large cohort of 101 IS patients and 155 healthy controls (HC) was involved. Typical hypsarrhythmia and non-hypsarrhythmia EEG signals were annotated, and normal EEG segments were randomly picked from the HC. Root mean square (RMS), Teager energy (TE), mean frequency, sample entropy (SamEn), multi-channel SamEn, multi-scale SamEn, and nonlinear correlation coefficient were computed in each sub-band of the three types of EEG signals, and then compared using either a one-way ANOVA or a Kruskal-Wallis test (depending on their distribution) and the receiver operating characteristic (ROC) curves. The effects of infant age on these features were also investigated. For most of the employed features, significant ($p < 0.05$) differences were observed between hypsarrhythmia EEG and non-hypsarrhythmia EEG or HC, and these differences seem to increase with infant age. RMS and TE produce the best classification in the delta and theta bands, while the entropy features yield the best performance in the gamma band. Our study suggests that RMS and TE (delta and theta bands) and the entropy features (gamma band) are promising biomarkers for automatic detection of hypsarrhythmia in long-term EEG monitoring. The findings of our study indicate the feasibility of automated IS diagnosis using artificial intelligence.


I. INTRODUCTION
INFANTILE spasms (IS), also known as West syndrome [1], is a unique neurological disorder affecting infants aged mainly between one week and three years [2], [3]. The average incidence and prevalence rates of IS are reported to be 0.31 and 0.25 per 1000 children, respectively [2], [3], [4]. Higher values are observed in regions at higher geographic latitudes, such as Finland, Sweden, and Denmark [5], [6], [7]. Despite the low incidence and prevalence rates, severe and frequent epileptic seizures can permanently impair the cognitive, learning, and language functions of the brain in most infants [8], [9], [10], [11].
Timely medical intervention may prevent IS-caused mental and/or developmental retardation [12]. In fact, since 1958, a number of studies have reported the effectiveness of adrenocorticotropic hormone, corticosteroids, and more traditional anticonvulsants in the treatment of IS [13], [14], [15]. However, proper medical intervention relies strongly on timely diagnosis of this disorder. IS is characterized by three main types of motor spasms, i.e., flexor, extensor, and mixed extensor-flexor [16]. The intensity of the spasms may vary from a massive contraction of many muscle groups to a minimal contraction of isolated muscle groups. Typical clinical manifestations of IS include sudden contractions of the trunk and limbs followed by brief episodes of rigidity, e.g., repeated bowing and relaxing [1]. Unfortunately, some subtle spasms may easily go undetected by casual observation.
Several studies have associated electroencephalography (EEG) features with IS, and coined the term hypsarrhythmia to describe the interictal pattern in IS patients [17], [18], [19]. Long-term video/EEG monitoring has therefore been proposed for the diagnosis of this disorder [20]. Brain imaging technologies such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) have also been proposed to detect the brain abnormalities associated with IS, providing not only helpful information for distinguishing cryptogenic from symptomatic patients but also some insights into possible pathophysiologic mechanisms of this disorder [21], [22], [23], [24].
Among these technologies, long-term video/EEG monitoring is most frequently adopted in clinical practice for IS diagnosis [25], [26], [27]. Unfortunately, the detection of hypsarrhythmia in the EEG signals is currently performed by manual screening due to the lack of automatic tools. This procedure is labor intensive and time consuming, particularly for long-term, e.g., 24-hour, monitoring [27]. Besides, although the definition of hypsarrhythmia is straightforward, i.e., random high-voltage slow waves and spikes [1], manual EEG screening lacks precision and is usually subjective, leading to low inter-rater reliability even among experienced pediatric electroencephalographers and sometimes to missed diagnosis of this disorder [25], [28]. Automatic hypsarrhythmia detection from long-term EEG recordings is therefore needed for efficient and precise diagnosis of IS.
Quantitative analysis of the EEG signals is essential for identifying reliable biomarkers for accurate hypsarrhythmia detection. Unfortunately, due mainly to the rarity of this disease, only a few studies have investigated the quantitative characteristics of hypsarrhythmia in IS patients [29], [30], [31], [32], [33], [34], [35]. Smith et al. quantified the amplitude and power spectral features in hypsarrhythmia EEG [30]. Some other studies identified high frequency oscillations during interictal periods as an objective biomarker for IS [31], [34]. Chu et al. proposed multiscale entropy as a biomarker for abnormal EEG patterns in IS, and investigated its variation between pre- and post-treatment [32]. Zheng et al. compared the functional connectivity of the brain in three different states, i.e., pre-, during, and post-spasms in IS patients [33].
These studies show clear differences in the proposed features between hypsarrhythmia and non-hypsarrhythmia EEG. However, only a few features have been investigated, with one in each study. In fact, a number of linear and nonlinear features have been employed to analyze biomedical signals such as EEG, ECG, and EMG [36], [37], but they have, unfortunately, never been investigated for the characterization of hypsarrhythmia. Besides, the statistical significance of the few existing hypsarrhythmia studies is limited, due mainly to the small cohorts of enrolled infants, i.e., between 15 and 30 [30], [31], [32], [33], [34].
The aim of the present study is, therefore, to quantify reliable and statistically significant EEG features as biomarkers for the discrimination between hypsarrhythmia and non-hypsarrhythmia or normal EEG based on a large cohort of IS infants. The adopted dataset consists of 101 IS patients and 155 healthy infants with 16-channel 24-hour scalp EEG recordings. A number of features, including amplitude and spectral features, entropy features, and some nonlinear features, are estimated in each sub-band of the EEG signals, i.e., delta, theta, alpha, beta, and gamma. Dedicated statistical analysis is performed on each feature in order to examine its statistical differences between hypsarrhythmia and non-hypsarrhythmia or normal EEG. The receiver operating characteristic (ROC) curves and the area under the curve (AUC) are computed for individual features to explore their classification power in the detection of hypsarrhythmia. Furthermore, the effect of infant age on the EEG features is investigated by dedicated comparison among sub-groups with different ages, i.e., 0-6 months, 6-12 months, 12-24 months, and older than 24 months.

A. Dataset and Preprocessing
The present study involved a large cohort of 101 infants who were suspected of having infantile spasms and underwent overnight video/EEG evaluation at Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine between January 2018 and November 2022. In addition, 155 healthy infants were included as healthy controls (HC). The demographic information of the enrolled infants is reported in Table I. This study strictly followed the guidelines of the 'Declaration of Helsinki: ethical principles for medical research involving human subjects'. The Ethics Committee at Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine approved the use of human subjects and waived the requirement for informed consent as this is a retrospective study.
For both IS patients and HC, overnight scalp EEG was recorded according to the 10-20 international system [38] using 10-mm golden plate electrodes. The detected EEG signals were amplified by a 32-channel NicoletOne™ EEG System (Natus, USA) with a sampling frequency of 500 Hz. A clinical expert manually annotated the hypsarrhythmia segments in all 101 IS infants. A 15-s hypsarrhythmia segment and a non-hypsarrhythmia segment of the same length were selected by the expert for each patient. Infants who could not produce typical hypsarrhythmia and non-hypsarrhythmia EEG signals of sufficient length (15 s) were excluded from this study, leaving 92 infants for subsequent analyses. Similarly, for the HC group, a 15-s segment was randomly picked from each subject. As a consequence, three datasets, i.e., hypsarrhythmia, non-hypsarrhythmia, and HC, were generated and considered in the present study. An example of the three EEG signals is shown in Fig. 1.
For each EEG recording, the 16 channels placed at Fp1, F3, C3, P3, O1, F7, T3, T5, Fp2, F4, C4, P4, O2, F8, T4, and T6 were used for analysis. The EEG signals were re-referenced with respect to the common average and then band-pass filtered between 0.5 and 70 Hz using a third-order Butterworth filter. A second-order IIR notch filter at 50 Hz was then applied to each channel to remove possible power-line interference. For both filters, forward-backward filtering was implemented in order to avoid phase shift. Finally, each EEG signal was divided into five sub-bands, i.e., delta (0-4 Hz),
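The preprocessing steps described above (common average re-referencing, zero-phase band-pass filtering, and zero-phase notch filtering) can be sketched as follows. This is a minimal illustration that assumes NumPy and SciPy are available; the function name `preprocess_eeg`, the notch quality factor of 30, and the synthetic test signal are assumptions made for the example and are not reported in the study.

```python
import numpy as np
from scipy import signal

FS = 500.0  # sampling frequency in Hz, as reported in the text


def preprocess_eeg(eeg, fs=FS, notch_q=30.0):
    """Re-reference, band-pass, and notch-filter a (channels, samples) array.

    notch_q is an assumed quality factor; the study does not report one.
    """
    eeg = np.asarray(eeg, dtype=float)
    # Common average reference: subtract the across-channel mean per sample.
    car = eeg - eeg.mean(axis=0, keepdims=True)
    # Third-order Butterworth band-pass, 0.5-70 Hz, as second-order sections.
    sos = signal.butter(3, [0.5, 70.0], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering cancels the phase shift.
    band = signal.sosfiltfilt(sos, car, axis=-1)
    # Second-order IIR notch at 50 Hz against power-line interference.
    b, a = signal.iirnotch(50.0, notch_q, fs=fs)
    return signal.filtfilt(b, a, band, axis=-1)


# Synthetic 16-channel, 15-s example: 10 Hz activity plus 50 Hz interference,
# with channel-specific gains so that re-referencing does not cancel them.
rng = np.random.default_rng(0)
t = np.arange(int(15 * FS)) / FS
eeg_gain = rng.uniform(0.5, 1.5, size=(16, 1))
line_gain = rng.uniform(0.2, 0.8, size=(16, 1))
raw = eeg_gain * np.sin(2 * np.pi * 10 * t) + line_gain * np.sin(2 * np.pi * 50 * t)
clean = preprocess_eeg(raw)
```

Second-order sections are used for the band-pass stage because the 0.5 Hz lower cutoff is very small relative to the sampling frequency, a situation in which transfer-function coefficients can become numerically fragile.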

B. Feature Extraction
A number of features were extracted from each sub-band of the EEG signals, including amplitude and spectral features, entropy features, and the nonlinear correlation coefficient. These features are commonly used for the analysis of biomedical signals, quantifying information such as the energy, periodicity, complexity, and nonlinearity of the data [36], [37]. The estimation of these features is briefly summarized hereafter.
1) Root Mean Square (RMS): RMS is a statistical measure of signal energy. For a discrete EEG signal $x[n]$ with a length of $N$, RMS is calculated as

$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{n=0}^{N-1} x^2[n]}.$$

2) Teager Energy (TE): Teager energy is an estimate of instantaneous energy with an excellent time resolution [39], and has been employed for the analysis of many biomedical signals [40]. For a discrete EEG signal $x[n]$, the instantaneous TE is estimated as

$$\mathrm{TE}[n] = x^2[n] - x[n-1]\,x[n+1].$$

In the present study, the average value of $\mathrm{TE}[n]$ over all time instants is considered for each channel.
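As a small illustration of these two amplitude features, the sketch below computes RMS and the time-averaged Teager energy for a single channel in plain Python. The function names and the 5 Hz test sine are assumptions chosen for the example; the Teager operator is the standard three-sample discrete form, averaged over the interior samples for which both neighbours exist.

```python
import math


def rms(x):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mean_teager_energy(x):
    """Average of TE[n] = x[n]^2 - x[n-1] * x[n+1] over the interior samples."""
    te = [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
    return sum(te) / len(te)


# Unit-amplitude 5 Hz sine sampled at 500 Hz for 1 s (illustrative values).
fs, f0 = 500, 5
sine = [math.sin(2 * math.pi * f0 * n / fs) for n in range(fs)]
rms_value = rms(sine)                # close to 1/sqrt(2) over whole periods
te_value = mean_teager_energy(sine)  # close to sin^2(2*pi*f0/fs) for a pure sine
```

For a pure sinusoid the Teager energy depends on both amplitude and frequency, which is why it responds to fast, high-amplitude transients more strongly than RMS does.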

3) Median Frequency (MF): The MF is computed from the power spectral density (PSD) $P[f]$ of a single-channel EEG signal $x[n]$, with $f_s$ denoting the sampling frequency.

4) Sample Entropy (SamEn): Approximate entropy (ApEn) and SamEn have been widely used to assess the irregularity of a time series [37], [41]. Given that the estimation procedures for ApEn and SamEn are quite similar, only SamEn was considered in the present study, estimated in each sub-band of the 15-s EEG segment. A higher value of SamEn indicates a larger degree of irregularity. The 15-s (length $N = 7500$) signal $x[n]$ is divided into epochs of length $m$ ($m = 2$ in the present study), producing $L = N - (m + 1) \times \Delta$ vectors, denoted as $\mathbf{x}_m$ and given by

$$\mathbf{x}_m[p] = \big[\,x[p],\; x[p+\Delta],\; \ldots,\; x[p+(m-1)\Delta]\,\big], \quad p = 0, 1, \ldots, L-1,$$

where $\Delta$ is an integer time delay, set to 1 in the present study in order to assess the entropy measures without downsampling [41].
The number of epochs $\mathbf{x}_m[q]$ ($q = 0, 1, \ldots, L - 1$; $q \neq p$) whose distance from a fixed epoch $\mathbf{x}_m[p]$ is smaller than a pre-defined tolerance $r$ is then counted as $B_p$. In the present study, the Euclidean distance $\|\mathbf{x}_m[p] - \mathbf{x}_m[q]\|_2$ is adopted as the distance measure and the tolerance $r$ is set to 0.2 times the standard deviation of the data. After obtaining $B_p$, the empirical probability $C_p^m(r)$ that an epoch $\mathbf{x}_m[q]$ is within $r$ of $\mathbf{x}_m[p]$ can be estimated as

$$C_p^m(r) = \frac{B_p}{L - 1}.$$

With the definition

$$C^m(r) = \frac{1}{L}\sum_{p=0}^{L-1} C_p^m(r),$$

the SamEn is calculated as [41]

$$\mathrm{SamEn}(m, r, N) = -\ln\frac{C^{m+1}(r)}{C^m(r)}.$$

5) Multi-Channel Sample Entropy (McSamEn): Multi-channel SamEn is used to assess the spatiotemporal irregularity of all the 16-channel EEG recordings [42]. The signal in each channel is first divided into length-$m$ epochs, in the same way as for SamEn. For a fixed epoch $p$ in channel $i$, i.e., $\mathbf{x}_{im}[p]$, the number of epochs in channel $j$ ($j = 1, \ldots, 16$), i.e., $\mathbf{x}_{jm}[q]$, $q \neq p$, whose distance from $\mathbf{x}_{im}[p]$ is smaller than $r$ is counted as $B_p^j$. The empirical probability is then estimated from $B_p^j$, from which the McSamEn is calculated.

6) Multi-Scale Entropy (MsSamEn): Multi-scale entropy has been shown to be more suitable for analyzing complex physiological signals [32] and is, therefore, adopted in the present study. MsSamEn builds on SamEn by first deriving a set of coarse-grained time series

$$x_\tau[k] = \frac{1}{\tau}\sum_{n=k\tau}^{(k+1)\tau-1} x[n], \quad k = 0, 1, \ldots, \lfloor N/\tau \rfloor - 1.$$

Sample entropy is then calculated on $x_\tau[k]$. Given the length of the original signal $x[n]$, ten different scales, i.e., $\tau = 1, 2, \ldots, 10$, were considered in the present study, and the average result over all scales was adopted as the complexity index of the signal.
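The single-channel entropy measures can be illustrated with a short, unoptimized sketch in plain Python. It follows the description in the text (unit delay, Euclidean distance, tolerance of 0.2 times the standard deviation) but makes several assumptions that the text does not state: the same number of templates, `len(x) - m`, is used for lengths `m` and `m + 1`; the tolerance in the multi-scale case is fixed from the original signal rather than recomputed per scale; and all function names are hypothetical.

```python
import math
import random
import statistics


def _match_fraction(x, m, r, n_templates):
    """Fraction of template pairs (p != q) of length m closer than r (Euclidean)."""
    r2 = r * r
    matches = 0
    for p in range(n_templates):
        for q in range(p + 1, n_templates):
            d2 = sum((x[p + k] - x[q + k]) ** 2 for k in range(m))
            if d2 < r2:
                matches += 1
    return matches / (n_templates * (n_templates - 1) / 2)


def sample_entropy(x, m=2, r=None, r_factor=0.2):
    """SamEn = -ln(C^{m+1}(r) / C^m(r)) with unit delay."""
    if r is None:
        r = r_factor * statistics.pstdev(x)
    n_templates = len(x) - m  # assumption: same count for lengths m and m + 1
    c_m = _match_fraction(x, m, r, n_templates)
    c_m1 = _match_fraction(x, m + 1, r, n_templates)
    if c_m == 0 or c_m1 == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(c_m1 / c_m)


def coarse_grain(x, tau):
    """Average consecutive, non-overlapping windows of tau samples."""
    n = len(x) // tau
    return [sum(x[k * tau:(k + 1) * tau]) / tau for k in range(n)]


def complexity_index(x, scales=range(1, 11), m=2, r_factor=0.2):
    """Mean SamEn over the coarse-grained series at each scale."""
    r = r_factor * statistics.pstdev(x)  # assumption: tolerance from original signal
    values = [sample_entropy(coarse_grain(x, tau), m=m, r=r) for tau in scales]
    return sum(values) / len(values)


# Short synthetic signals keep the O(N^2) pair search fast.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]
sine = [math.sin(2 * math.pi * n / 25) for n in range(400)]
se_noise = sample_entropy(noise)   # irregular signal: higher entropy
se_sine = sample_entropy(sine)     # regular signal: lower entropy
ci_noise = complexity_index(noise, scales=range(1, 4))
```

The pairwise search is quadratic in the signal length, so a 7500-sample segment would call for a vectorized or tree-based implementation in practice.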

7) Nonlinear Correlation Coefficient (NCC):
The nonlinear correlation coefficient is used to describe the nonlinear dynamic relationship between two signals. A higher NCC indicates a higher nonlinear correlation between the signals. NCC is calculated from the nonlinear regression of one signal on the other, where $f(x)$ is a piecewise-linear approximation of the nonlinear regression curve.

C. Statistical Analysis
For each subject, McSamEn was estimated among all the 16 EEG channels. RMS, TE, MF, SamEn, and MsSamEn were calculated in individual channels, and the average results over the 16 channels were considered for each subject. Besides, NCC was calculated between two channels, and the average result over the 120 (16 × 15/2) different pairs was taken into account.
Each feature extracted from the three datasets, i.e., hypsarrhythmia, non-hypsarrhythmia, and HC, was expressed as mean ± standard deviation. A Kolmogorov-Smirnov test was employed to examine the distribution of each feature. For normally distributed features, a one-way ANOVA was adopted to assess the global difference in each feature among the three datasets. Besides, the differences in each feature between paired groups, i.e., hypsarrhythmia vs. non-hypsarrhythmia, hypsarrhythmia vs. HC, and non-hypsarrhythmia vs. HC, were also tested using multiple comparisons with the Tukey-Kramer criterion. The significance level was set to 0.05. For non-normally distributed features, a non-parametric method, i.e., the Kruskal-Wallis test, was used for the statistical analysis.
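The choice between the parametric and non-parametric global test can be sketched as follows, assuming NumPy and SciPy are available. The helper name `compare_groups`, the standardization of each group before the Kolmogorov-Smirnov test, and the synthetic feature values are assumptions made for illustration; the pairwise Tukey-Kramer comparisons are not included.

```python
import numpy as np
from scipy import stats


def compare_groups(groups, alpha=0.05):
    """Pick one-way ANOVA or Kruskal-Wallis from a per-group KS normality check."""
    normal = True
    for g in groups:
        g = np.asarray(g, dtype=float)
        # Assumption: standardize each group, then test against N(0, 1).
        z = (g - g.mean()) / g.std(ddof=1)
        if stats.kstest(z, "norm").pvalue < alpha:
            normal = False
    if normal:
        name, result = "anova", stats.f_oneway(*groups)
    else:
        name, result = "kruskal", stats.kruskal(*groups)
    return name, result.pvalue


# Synthetic feature values for the three datasets (group sizes follow the text).
rng = np.random.default_rng(0)
hyps = rng.normal(3.0, 1.0, 92)
non_hyps = rng.normal(1.0, 1.0, 92)
hc = rng.normal(0.0, 1.0, 155)
test_name, p_value = compare_groups([hyps, non_hyps, hc])
```

Estimating the mean and standard deviation from the same sample makes the Kolmogorov-Smirnov test conservative, so a dedicated normality test may be preferable when the decision is borderline.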
Furthermore, the ability of each feature to provide correct classification between hypsarrhythmia and non-hypsarrhythmia EEG was also assessed by the area under the receiver operating characteristic (ROC) curve, derived over the full dataset of the remaining 92 IS patients using a threshold procedure. In addition, a 5-fold cross-validation was performed in order to evaluate the sensitivity, specificity, and accuracy of each feature for the classification between hypsarrhythmia and non-hypsarrhythmia EEG. To this end, for each feature, both the hypsarrhythmia and non-hypsarrhythmia datasets were subdivided into 5 groups. An optimal threshold was then determined by ROC curve analysis (point closest to the upper left corner) on 4 groups and applied to the remaining group to evaluate the classification performance, rotating until every group had been used for testing. This procedure was repeated for 10 random subdivisions into 5 groups.
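The threshold-based evaluation can be sketched in plain Python as below. The sketch assumes that larger feature values indicate hypsarrhythmia (as for RMS); for features that are lower during hypsarrhythmia, such as the entropy measures, the values can be negated first. All function names and the synthetic data are assumptions for illustration.

```python
import random


def roc_points(pos, neg):
    """(FPR, TPR, threshold) triples for the rule 'value >= threshold -> positive'."""
    points = []
    for thr in sorted(set(pos) | set(neg)):
        tpr = sum(v >= thr for v in pos) / len(pos)
        fpr = sum(v >= thr for v in neg) / len(neg)
        points.append((fpr, tpr, thr))
    return points


def auc(pos, neg):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(pos, neg):
    """Threshold whose ROC point lies closest to the upper-left corner (0, 1)."""
    return min(roc_points(pos, neg), key=lambda t: t[0] ** 2 + (1 - t[1]) ** 2)[2]


def cross_validate(pos, neg, k=5, seed=0):
    """Mean accuracy, sensitivity, and specificity over k folds of one random split."""
    rng = random.Random(seed)
    pos, neg = pos[:], neg[:]
    rng.shuffle(pos)
    rng.shuffle(neg)
    acc = sen = spe = 0.0
    for i in range(k):
        test_p, test_n = pos[i::k], neg[i::k]
        train_p = [v for j, v in enumerate(pos) if j % k != i]
        train_n = [v for j, v in enumerate(neg) if j % k != i]
        thr = best_threshold(train_p, train_n)  # fitted on the 4 training folds
        tp = sum(v >= thr for v in test_p)
        tn = sum(v < thr for v in test_n)
        sen += tp / len(test_p)
        spe += tn / len(test_n)
        acc += (tp + tn) / (len(test_p) + len(test_n))
    return acc / k, sen / k, spe / k


# Synthetic feature values for 92 hypsarrhythmia and 92 non-hypsarrhythmia segments.
rng = random.Random(1)
hyps = [rng.gauss(2.0, 1.0) for _ in range(92)]
non_hyps = [rng.gauss(0.0, 1.0) for _ in range(92)]
auc_value = auc(hyps, non_hyps)
accuracy, sensitivity, specificity = cross_validate(hyps, non_hyps)
```

Repeating `cross_validate` with ten different `seed` values and averaging the results would mirror the 10 random subdivisions described above.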

III. RESULTS
Figure 2 shows the mean and standard deviation of each feature extracted from the three datasets, calculated in the sub-bands of the EEG signals. As indicated by RMS, the average signal energy is concentrated in the delta band for all three datasets, and it decreases dramatically with increasing frequency band. Besides, in all frequency bands, the RMS extracted from the hypsarrhythmia dataset is significantly ($p < 0.05$) higher than that from the non-hypsarrhythmia dataset or HC, indicating significantly larger signal energy during hypsarrhythmia. This difference also decreases with increasing frequency band. No significant difference between non-hypsarrhythmia and HC is observed in RMS. A similar trend is also observed in NCC.
Unlike RMS and NCC, the entropy features increase with increasing frequency band for all three datasets. However, similar to RMS and NCC, all entropy features, including SamEn, McSamEn, and MsSamEn, extracted from the hypsarrhythmia EEG are significantly different (lower) from those extracted from non-hypsarrhythmia EEG or HC, suggesting more regular patterns in the EEG signals during hypsarrhythmia as compared to the rest state. Besides, this difference seems to increase with increasing frequency band up to beta. In addition, in some frequency bands, significant differences in the entropy features are also observed between the non-hypsarrhythmia EEG of the IS patients and the normal EEG of the HC.
The difference in MF among the three datasets is less pronounced. MF extracted from the hypsarrhythmia EEG is significantly different from that extracted from the non-hypsarrhythmia EEG only in the beta and gamma bands. A significant difference between the non-hypsarrhythmia EEG and HC is also observed in the delta, theta, and gamma bands.
The ROC curves of each feature discriminating between hypsarrhythmia and non-hypsarrhythmia EEG, derived from the full IS datasets, are shown in Fig. 3. In the delta band, RMS and TE produce the best classification while the entropy features yield the poorest classification. However, with increasing frequency band, the classification power of RMS and TE decreases gradually while that of the entropy features increases gradually. Eventually, in the gamma band, the entropy features, particularly MsSamEn, produce the best discrimination between hypsarrhythmia and non-hypsarrhythmia EEG. The classification power of NCC and MF seems to be less promising, except for NCC in the delta band. The AUC values of each feature in each EEG band are reported in Table II.
Accuracy, sensitivity, specificity, and $F_1$ score of the 5-fold cross-validation are also reported in Table II. Similarly, RMS and TE produce the best performance in the delta band, with performance gradually decreasing with increasing frequency band. The entropy features produce gradually increasing performance with increasing frequency band up to beta. Although the performance of the entropy features drops slightly in the gamma band as compared to beta, they are the best among all the features in this band. MF and NCC produce less impressive cross-validation results in all frequency bands.
The effects of infant age on the extracted features are shown in Fig. 4. For RMS, significant differences between hypsarrhythmia and non-hypsarrhythmia EEG are found in almost all age sub-groups, the exception being one age group (0-6 m) in the gamma band. Besides, the observed difference seems to increase with infant age. A similar trend can also be observed in the TE feature. All the entropy features extracted from the hypsarrhythmia EEG are significantly lower than those from non-hypsarrhythmia EEG in most of the sub-bands and age sub-groups. These differences also seem to increase with infant age. Such an age effect can also be determined in the NCC feature in all EEG bands except for gamma. No age-related increase in the difference between hypsarrhythmia and non-hypsarrhythmia is observed for MF.

IV. DISCUSSION
The primary goal of this study was to explore whether quantitative EEG features might be employed for the discrimination between hypsarrhythmia and non-hypsarrhythmia EEG in IS patients, and thus to identify them as potential biomarkers for automatic hypsarrhythmia detection in long-term EEG monitoring. RMS, TE, MF, SamEn, McSamEn, MsSamEn, and NCC are widely used features in the analysis of physiological signals [36], [37] and were therefore analysed in the present study.
Our results show that the hypsarrhythmia EEG produces large amplitude (RMS) in the delta and theta bands. This is due to the fact that hypsarrhythmia EEG consists mainly of high-voltage slow waves [1], [13]. On the other hand, the non-hypsarrhythmia and HC EEG are recorded during sleep and therefore also consist mainly of slow waves. Yet, a significant difference in RMS between hypsarrhythmia and non-hypsarrhythmia or HC EEG is observed in the delta and theta bands, suggesting that RMS, particularly when computed in the lower EEG bands, is an excellent biomarker for automatic hypsarrhythmia detection.
Note that RMS computed in the lower EEG bands is very sensitive to motion artifacts, which may compromise its ability to distinguish between hypsarrhythmia and non-hypsarrhythmia EEG. TE computed in the delta band produces good classification performance between hypsarrhythmia and non-hypsarrhythmia EEG and, therefore, may also be a promising biomarker for hypsarrhythmia detection. In fact, TE offers an excellent time resolution and is robust to white noise and tonal interference [43]. Unfortunately, due to its short support width, TE is sensitive to transient signals and hence susceptible to transient noise such as motion artifacts.
Apart from the slow waves, it is reported that hypsarrhythmia may also contain sharp waves and spikes [1], [13]. Consequently, as shown in our results, RMS, SamEn, MsSamEn, and McSamEn computed in the higher EEG bands also show significant differences between the hypsarrhythmia EEG and the other two. Furthermore, the entropy features, particularly MsSamEn, computed in the gamma band produce the best classification between hypsarrhythmia and non-hypsarrhythmia. MsSamEn in the gamma band can therefore be considered the most powerful biomarker for hypsarrhythmia detection with minimal disturbance from motion artifacts, since motion artifacts usually appear in the low frequency bands and may be expected to be largely absent from the gamma band.
Besides, studies have reported that scalp-recorded high frequency oscillations, i.e., > 80 Hz, may serve as an objective EEG biomarker of infantile spasms [31], [34]. We did observe a significant difference in MF, computed in the beta and gamma bands, between hypsarrhythmia and non-hypsarrhythmia EEG. However, we did not investigate the high frequency components beyond 70 Hz because, for scalp-recorded EEG, these components may suffer from an extremely low signal-to-noise ratio and are, therefore, not reliable for identifying biomarkers of IS. In fact, most previous studies investigating high frequency oscillations utilized invasive EEG measurements rather than scalp EEG [44], [45], [46].
It is interesting to note that, for most of the computed features, the difference between hypsarrhythmia and non-hypsarrhythmia EEG increases with infant age. This observation may be ascribed to the fact that the infant brain undergoes rapid development during the first couple of years. It may also suggest that automatic detection of hypsarrhythmia EEG in infants younger than 6 months is much more challenging. More effort is therefore required to improve automatic detection of hypsarrhythmia in early infancy.
It is also worth noting that, for TE and many of the entropy features, a significant difference is also observed between the non-hypsarrhythmia EEG and the EEG of the HC. This may, on the one hand, be explained by inter-subject variability. On the other hand, it may indicate inherently different EEG patterns in IS infants, even without hypsarrhythmia, as compared to the HC. Nevertheless, this does not undermine the identification of TE and the entropy features as objective hypsarrhythmia biomarkers, since TE, SamEn, MsSamEn, and McSamEn extracted from hypsarrhythmia EEG are significantly different from those extracted from both the non-hypsarrhythmia EEG and the HC (Fig. 2).
As discussed, RMS, TE, and the entropy features may be complementary to each other. It is therefore reasonable to expect improved classification performance from combining these features using dedicated machine learning algorithms. However, the present study is a proof of principle, focusing mainly on the identification of individual biomarkers of hypsarrhythmia. Combining different features with dedicated machine learning algorithms for automatic hypsarrhythmia detection is an interesting and important direction for our future studies.
It is also worth noting that, in the present study, all the data used for analysis were selected during sleep. However, EEG signals can be more sensitive to noise during the daytime. The applicability of the identified biomarkers during the daytime may therefore be compromised, and thus needs more extensive evaluation. Yet, 24-hour EEG recording is a standard procedure for the diagnosis of infantile spasms in clinical practice, and hypsarrhythmia detection during sleep is sufficient for clinical decision making.

V. CONCLUSION
In the present study, we quantitatively analysed several EEG features in a large cohort of 101 IS patients and 155 healthy controls in order to identify possible biomarkers of hypsarrhythmia. Our results suggest that RMS and TE computed in the delta and theta bands and the entropy features computed in the gamma band may be considered promising biomarkers for automatic hypsarrhythmia detection. Besides, infant age may influence the difference in these features between hypsarrhythmia and non-hypsarrhythmia. Automatic identification of hypsarrhythmia in early infancy is challenging and therefore requires more effort and investigation in future studies.

Fig. 2. Features extracted from the hypsarrhythmia and non-hypsarrhythmia EEG for different age groups.

Fig. 3. ROC for discrimination between hypsarrhythmia and non-hypsarrhythmia EEG derived from the full IS datasets. FPR: false positive rate; TPR: true positive rate.

Fig. 4. Features extracted from the hypsarrhythmia and non-hypsarrhythmia EEG for different age groups.

TABLE II
FEATURES EXTRACTED FROM THE FULL DATASET OF THE HYPSARRHYTHMIA AND NON-HYPSARRHYTHMIA EEG TOGETHER WITH THE RESULTS OF THE 5-FOLD CROSS VALIDATION