Automatic Detection of Scalp High-Frequency Oscillations Based on Deep Learning

Scalp high-frequency oscillations (sHFOs) are a promising non-invasive biomarker of epilepsy. However, the visual marking of sHFOs is a time-consuming and subjective process, and existing automatic detectors based on single-dimensional analysis have difficulty accurately eliminating artifacts and thus do not provide sufficient reliability to meet clinical needs. Therefore, we propose a high-performance sHFOs detector based on deep learning. An initial detection module was designed to extract candidate high-frequency oscillations. Then, one-dimensional (1D) and two-dimensional (2D) deep learning models were designed to classify the candidates. Finally, a weighted voting method was used to combine the outputs of the two models. In experiments, the precision, recall, specificity and F1-score were 83.44%, 83.60%, 96.61% and 83.42%, respectively, on average and the kappa coefficient was 80.02%. In addition, the proposed detector showed stable performance on multi-centre datasets. Our sHFOs detector demonstrated high robustness and generalisation ability, which indicates its potential applicability as a clinical assistance tool. The proposed detector thus provides an accurate and robust deep learning-based method for sHFOs detection.


I. INTRODUCTION
High-frequency oscillations (HFOs) are considered a promising biomarker for epileptogenic zone (EZ) localisation and are highly correlated with a good surgical prognosis [1], [2], [3], [4]. HFOs are defined as at least four continuous oscillations between 80 and 500 Hz that are significantly higher than the background brain activity [5], [6]. According to the spectral range, HFOs can be subdivided into ripples (80-200 Hz) and fast ripples (200-500 Hz) [7], [8]. However, intracranial EEG can be difficult to obtain. Conversely, scalp EEG has the advantages of non-invasiveness, low cost and repeatable recording [9]; it can be used to detect scalp HFOs (sHFOs) [10], [11], [12] as a promising non-invasive biomarker of epilepsy [13]. These sHFOs originate from small cortical areas [14], [15] and show up in the scalp EEG signal as continuous oscillations. However, the thickness and high electrical resistance of the skull reduce intracranial electrical signal conduction, so sHFOs have a lower amplitude than intracranial HFOs, which often causes an isolated-island waveform to appear in the high-frequency band (>80 Hz) of time-frequency maps. Recent studies have applied sHFOs to diagnosing epilepsy [16], predicting the risk of a seizure [17], evaluating the effectiveness of treatments [18] and evaluating the postoperative situation [19].
The gold standard for sHFOs detection is visual marking by experienced electrophysiologists [20], [21], but this is highly time-consuming and subjective, which has limited the clinical application of sHFOs [22]. To provide a more objective tool and reduce clinical workload, many studies have explored automatic sHFOs detection. Ellenrieder et al. [23] designed an sHFOs detector based on extracting two EEG features: the band signal amplitude ratio and the absolute narrowband signal amplitude. Their detector could detect >95% of sHFO events, but only 40% of the detected events were true positives. Chu et al. [24] applied the Hilbert transform to the filtered data and computed the amplitude envelope as the threshold. However, their approach is only semiautomatic and artifacts still need to be removed manually. Wang et al. [18] adjusted the parameters of the maximum distribution peak point algorithm for detecting intracranial HFOs and applied it to detecting sHFOs. They reported a sensitivity and specificity of 82.666% ± 5.428% and 63.352% ± 10.424%, respectively. Thus, existing sHFOs detectors have achieved a high detection rate, but the false detection rate is still very high. This indicates that they cannot effectively identify and remove artifacts such as EMG, harmonic signals or sharp transients [25].
In recent years, deep learning has shown strong performance in EEG data processing [26], [27], [28] and has achieved state-of-the-art performance in intracranial HFO detection. The ability of a convolutional neural network (CNN) to extract features from time-frequency images and the ability of long short-term memory (LSTM) to integrate information from temporal EEG data make deep learning-based HFO detectors superior to traditional detection algorithms. Zuo et al. [29] converted 1D intracranial EEG (iEEG) into a 2D image signal and proposed a CNN-based intracranial HFO detector. Zhao et al. [30] applied the Morlet wavelet transform to convert candidate HFOs (c-HFOs) into time-frequency maps as the input for a CNN classifier. Lai et al. [31] combined short-time energy and a CNN for HFO detection. Both the time-domain features (EEG) and frequency-domain features (time-frequency maps) of sHFOs contain important information. However, most existing deep learning-based detectors only focus on data from a single dimension and do not integrate the two-dimensional characteristics of sHFOs. Artifacts due to muscle activity or bad electrode connections generally show continuous oscillations and a high amplitude in the time domain similar to those of sHFOs, as well as a similar frequency range in the frequency domain. Therefore, a detector is needed that combines the two-dimensional information of sHFOs to eliminate artifacts in scalp EEG data.
In this paper, we propose an automatic sHFOs detector that addresses the above issues. It begins with an initial detection module that extracts candidate HFOs (c-HFOs) from long-term EEG data and plots the corresponding time-frequency map. Then, 1D-CNN+LSTM and attention-based 2D-CNN deep learning models are applied to sHFOs detection. The 1D model takes the c-HFOs as input data to extract the time-domain characteristics and the 2D model takes the time-frequency map that is plotted by a continuous Morlet wavelet transform as the input data. Finally, weighted soft voting is applied to ensemble the prediction probabilities of the two models and to obtain the final classification results.

A. Data Acquisition
In this study, scalp EEG data were collected from three clinical hospitals. Dataset   Hospital. The scalp EEG monitoring data were obtained with a 10-20 system at a sampling rate of 1000 Hz. Approval for this study was obtained from the Ethical Committee of Shenzhen Children's Hospital (reference number 201904102). Dataset 3 included 11 patients (six males, aged 1-5 years old) diagnosed with epilepsy from March 2016 to September 2021 at Beijing Tiantan Hospital. The scalp EEG monitoring data were obtained with a 10-20 system at a sampling rate of 500 or 1000 Hz. Approval for this study was obtained from the Ethical Committee of Beijing Tiantan Hospital (reference number KY2022-016-02).
The datasets were divided as follows. For Dataset 1, scalp EEG data were recorded once for 37 patients. Then, 28 of 35 cases were selected for the training set and the remaining seven cases were selected for the test set. In addition, scalp EEG data were recorded twice for eight patients within an interval of less than 2 months. The first recording was selected for the training set and the second recording was selected for the test set. Overall, the training set had 36 cases and the test set had 15 cases. Datasets 2 and 3 were used as external datasets to test the robustness of the proposed detector. The details are presented in Table I.

B. Data Pre-Processing
The raw EEG data were pre-processed by using EEGLAB (https://sccn.ucsd.edu/eeglab/index.php), which is an open-source MATLAB toolbox that is widely applied to processing EEG data. The sHFOs were mainly detected as ripples (80-200 Hz) [10] because scalp EEG can only record fast ripples (200-500 Hz) in a few patients [32], [33]. We used a finite impulse response filter for 80-200 Hz band-pass filtering of all data. The data were then transformed to a bipolar montage of 18 bipolar channels for each patient. To obtain the spectral characteristics of sHFOs, the time-frequency map of the 80-200 Hz bandpass-filtered signal was plotted by a continuous Morlet wavelet transform (Fig. 1).
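As an illustration of the time-frequency step, the following minimal sketch computes a Morlet wavelet power map for one filtered segment in plain Python. It is not the EEGLAB/MATLAB pipeline used in this study. The wavelet parameters are not given in the text, so the seven-cycle wavelet, the truncation at three standard deviations and the 10-Hz frequency grid are assumptions, and the function names `morlet_power` and `time_frequency_map` are hypothetical.

```python
import cmath
import math


def morlet_power(signal, fs, freq, n_cycles=7.0):
    """Return the power of `signal` at `freq` (Hz) for every sample."""
    sigma_t = n_cycles / (2.0 * math.pi * freq)   # Gaussian envelope width in seconds (assumed 7 cycles)
    half = int(round(3.0 * sigma_t * fs))         # truncate the wavelet at +/- 3 sigma
    taps = []
    for k in range(-half, half + 1):
        t = k / fs
        envelope = math.exp(-t * t / (2.0 * sigma_t ** 2))
        taps.append(envelope * cmath.exp(2j * math.pi * freq * t))
    norm = sum(abs(w) for w in taps)              # amplitude normalisation of the envelope
    n = len(signal)
    power = []
    for i in range(n):
        acc = 0j
        for j, w in enumerate(taps):
            idx = i + j - half
            if 0 <= idx < n:                      # samples outside the segment count as zero
                acc += signal[idx] * w.conjugate()
        power.append(abs(acc / norm) ** 2)
    return power


def time_frequency_map(signal, fs, freqs):
    """One row of power values per analysis frequency."""
    return [morlet_power(signal, fs, f) for f in freqs]


fs = 1000                                         # Hz
# Synthetic 250-sample segment containing a pure 100-Hz oscillation.
segment = [math.sin(2.0 * math.pi * 100.0 * k / fs) for k in range(250)]
freqs = list(range(80, 201, 10))                  # hypothetical 10-Hz grid over the ripple band
tf_map = time_frequency_map(segment, fs, freqs)
centre_column = [row[125] for row in tf_map]      # power spectrum at the segment centre
peak_freq = freqs[centre_column.index(max(centre_column))]
```

For this synthetic segment the power at the segment centre is concentrated in the 100-Hz row, which corresponds to the isolated high-frequency island expected for a ripple.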

C. Visual Marking of sHFOs
For each patient, 5-10-min segments of scalp EEG data were selected in the non-rapid eye movement (non-REM) sleep phase [34], [35], [36]. Two experienced electrophysiologists visually marked sHFOs independently and the kappa coefficient was used to evaluate the consistency of the visual marking. The two electrophysiologists visually marked the scalp EEG data for the first minute and calculated the kappa coefficient [29], [37], [38]. If kappa > 0.5, one of the viewers visually marked the remaining 4 min of scalp EEG data. If kappa < 0.5, the two viewers visually marked the data jointly again to establish a consensus.
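The agreement check can be sketched as follows, assuming that each reviewer's marking of the first minute is reduced to one binary label per candidate event (1 = sHFO, 0 = not sHFO). This unit of comparison is an assumption, the function name `cohen_kappa` is hypothetical and the label vectors are invented for illustration.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two binary label sequences of equal length."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    # Observed agreement: fraction of items on which the two reviewers agree.
    p_observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Chance agreement from each reviewer's marginal rate of positive labels.
    rate_a = sum(labels_a) / n
    rate_b = sum(labels_b) / n
    p_chance = rate_a * rate_b + (1.0 - rate_a) * (1.0 - rate_b)
    if p_chance == 1.0:
        return 1.0                                # both reviewers used a single identical label
    return (p_observed - p_chance) / (1.0 - p_chance)


# Invented example: ten candidate events marked by two reviewers.
reviewer_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
reviewer_2 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
kappa = cohen_kappa(reviewer_1, reviewer_2)
single_reviewer_continues = kappa > 0.5           # decision rule described above
```

In this invented example the reviewers disagree on one of ten events, giving a kappa of about 0.78, so a single reviewer would mark the remaining data.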

D. Proposed Detector
Fig. 2 shows the framework of the proposed detector, which includes the initial detection and deep learning models.
1) Initial Detection: Some studies have used root mean square or short-time energy detectors for the initial detection of HFOs, but the reported detection rate is less than 90% [37], [39], which implies that more than 10% of HFOs will be missed. To improve the initial detection performance, we designed a threshold-filtering module to extract c-HFO segments from long-term scalp EEG data. The module uses a sliding window to calculate a dynamic threshold and to locate and extract c-HFO segments (250 ms) that exceed the target threshold. Because sHFOs have a duration of 30-100 ms, we selected a length of 250 ms to extract a complete sHFO and sufficient background information from the EEG data.
The procedure is as follows. Two sliding windows with lengths of 1000 and 50 ms are applied, each with an overlap of 50%. Each 1000-ms sliding window contains 39 50-ms sliding windows (with 50% overlap). The 1000-ms sliding window is taken as the baseline and the ratio of the standard deviations ($SD_{50\,\mathrm{ms}}/SD_{1000\,\mathrm{ms}}$) is calculated for each 50-ms sliding window. A c-HFO event is defined as a run of at least one and at most seven consecutive 50-ms sliding windows whose ratio $SD_{50\,\mathrm{ms}}/SD_{1000\,\mathrm{ms}}$ exceeds the target threshold (1.7 SD) (the actual duration of the consecutive sliding windows is less than 200 ms; see Fig. 3(B)). Then, the centre position of the consecutive ratios is calculated. Finally, the two 125-ms EEG segments before and after the centre position are extracted as a c-HFO segment (250 ms × 1) and the time-frequency map (250 ms × 250 Hz) is extracted at the corresponding position. Fig. 3 shows the threshold-filtering module.
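The procedure above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this study: the function name `extract_candidates` is hypothetical, the handling of events seen by two overlapping baseline windows (a centre closer than 50 ms to an accepted centre is skipped) is an assumption that the text does not specify, and the test signal is synthetic. With a 50-ms window and a 25-ms step, a 1000-ms baseline holds (1000 − 50)/25 + 1 = 39 short windows, as stated above.

```python
import math
import random


def _std(values):
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def extract_candidates(signal, fs=1000, ratio_thr=1.7, max_run=7):
    """Return (centre_sample, 250-ms segment) pairs for candidate events."""
    base_len = fs                  # 1000-ms baseline window
    sub_len = fs // 20             # 50-ms window
    sub_step = sub_len // 2        # 50% overlap between 50-ms windows
    half_seg = fs // 8             # 125 ms on each side of the centre
    centres = []
    for b0 in range(0, len(signal) - base_len + 1, base_len // 2):
        base = signal[b0:b0 + base_len]
        sd_base = _std(base)
        if sd_base == 0.0:
            continue
        starts = range(0, base_len - sub_len + 1, sub_step)
        above = [_std(base[s:s + sub_len]) / sd_base > ratio_thr for s in starts]
        i = 0
        while i < len(above):
            if not above[i]:
                i += 1
                continue
            j = i
            while j < len(above) and above[j]:
                j += 1
            if j - i <= max_run:   # 1 to 7 consecutive supra-threshold windows
                first = b0 + starts[i]
                last = b0 + starts[j - 1] + sub_len
                centre = (first + last) // 2
                # Assumed de-duplication across overlapping baseline windows.
                if all(abs(centre - c) > sub_len for c in centres):
                    centres.append(centre)
            i = j
    segments = []
    for c in centres:
        if c - half_seg >= 0 and c + half_seg <= len(signal):
            segments.append((c, signal[c - half_seg:c + half_seg]))
    return segments


# Synthetic 3-s signal: unit-variance noise plus a 60-ms, 120-Hz burst at 1.47 s.
random.seed(0)
fs = 1000
signal = [random.gauss(0.0, 1.0) for _ in range(3 * fs)]
for k in range(60):
    signal[1470 + k] += 6.0 * math.sin(2.0 * math.pi * 120.0 * k / fs)
candidates = extract_candidates(signal, fs)
```

Each returned segment has 250 samples at 1000 Hz, which matches the 250 ms × 1 input size of the 1D model.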

TABLE II ARCHITECTURE OF THE 1D-CNN+LSTM MODEL
feature maps are concatenated into a 2D matrix. Then, two stacked bidirectional LSTM layers with 256 cells are used to extract temporal features. Finally, three fully connected (FC) layers are employed to integrate features and the last FC layer has a sigmoid activation function to obtain the final prediction results.
3) Attention-Based 2D-CNN Model: Fig. 4(B) shows the attention-based 2D-CNN model. The detailed parameters are presented in Table III. ResNet18 is applied as the backbone model to extract the frequency-domain characteristics of HFO events [40]. The core feature of ResNet18 is the stacking of residual blocks, which use skip connections to reduce the influence of the vanishing gradient problem on the network. Let $H(x)$ be the desired underlying mapping to be fitted by a few stacked layers; the residual block instead fits the residual mapping

$$\mathcal{F}(x) = H(x) - x \tag{1}$$

where $x$ denotes the inputs to the first of these layers, so that the block output is $\mathcal{F}(x) + x$. Each residual block comprises two stacked 3 × 3 convolutional layers and the number of block stacks is [2, 2, 2, 2].

TABLE III ATTENTION-BASED 2D-CNN MODEL
We added a convolutional block attention module (CBAM) to each residual block to capture texture details in the feature maps [41]. As shown in Fig. 4(C), CBAM is a lightweight attention mechanism module comprising a channel attention module and a spatial attention module. As shown in Fig. 4(D), the channel attention module analyses the information between feature maps by using average pooling and max pooling. Then, the two pooled features are passed through a multilayer perceptron (MLP), and the outputs are added and activated by a sigmoid function to obtain the attention-weighted feature map. The channel attention is computed as follows:

$$M_c(F) = \sigma\left(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\right) \tag{2}$$

where $\sigma$ denotes the sigmoid function and $F$ is the input feature map.
As shown in Fig. 4(E), the spatial attention module is calculated by global average pooling and global max pooling. Then, the output features are concatenated and a convolution operation is used to reduce the feature map dimensions to one. Finally, the sigmoid activation function is used to obtain the spatial weighted feature map. The spatial attention is computed as follows:

$$M_s(F) = \sigma\left(f^{7\times7}\left([\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)]\right)\right) \tag{3}$$

where $\sigma$ denotes the sigmoid function, $f^{7\times7}$ represents a convolution operation with a kernel size of 7 × 7 and $F$ is the input feature map.
4) Implementation: All deep learning models were implemented in Python 3.7 and TensorFlow 2.2 and were trained on an Intel(R) Xeon(R) Gold 5118 CPU and an NVIDIA Quadro P5000 GPU. The positive samples (sHFOs) of the training set were obtained from the visual marking results, while the negative samples (artifacts) were extracted from EEG segments that do not include sHFOs. To ensure that the ratio of positive to negative samples was 1:1, we performed data augmentation on the positive samples by shifting the midpoint of each high-frequency oscillation by 0-50 ms to extract multiple positive samples with variation. Then, we randomly selected 90% of the samples for training and 10% of the samples for validation. The batch size was 32 with 35 epochs for the 1D-CNN+LSTM model and 32 with 50 epochs for the attention-based 2D-CNN model. We applied the Adam optimiser [42] at a learning rate of 1e−4. Binary cross-entropy was used for the loss function:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right] \tag{4}$$

where $y_i \in \{0, 1\}$ is the actual label, $\hat{y}_i$ is the predicted output and $N$ is the batch size. We used five-fold cross-validation to evaluate the stability and generalisation ability of the 1D-CNN+LSTM and attention-based 2D-CNN models.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
5) Weighted Soft Voting: The 1D-CNN+LSTM and attention-based 2D-CNN models were each trained and used to extract features from the time and frequency domains of scalp EEG data for sHFOs detection. The two models focus on different features, so we selected weighted soft voting to combine their outputs [43]. The model outputs were given different weights and combined linearly to obtain the final predicted detection probability. Weighted soft voting is formulated as follows:

$$\mathrm{Predict}_{\mathrm{voting}} = w \times \mathrm{Predict}_{1D} + (1 - w) \times \mathrm{Predict}_{2D} \tag{5}$$

where $w$ and $\mathrm{Predict}_{1D}$ are the weight and output, respectively, of the 1D-CNN+LSTM model and $(1 - w)$ and $\mathrm{Predict}_{2D}$ are the weight and output, respectively, of the attention-based 2D-CNN model. $w$ has a range of 0-1.
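The weighted soft voting rule can be illustrated with a short sketch. The function name `weighted_soft_vote` is hypothetical, the decision threshold of 0.5 applied to the fused probability is an assumption that the text does not state, and the probabilities below are invented. The default weight of 0.65 for the 1D model is the value reported in the results.

```python
def weighted_soft_vote(p_1d, p_2d, w=0.65, threshold=0.5):
    """Fuse per-candidate probabilities from the 1D and 2D models."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    # Linear combination of the two model outputs for each candidate.
    fused = [w * a + (1.0 - w) * b for a, b in zip(p_1d, p_2d)]
    # Assumed decision rule: label a candidate as sHFO when the fused probability reaches 0.5.
    labels = [int(p >= threshold) for p in fused]
    return fused, labels


# Invented probabilities for three candidate events.
p_1d = [0.9, 0.4, 0.2]   # hypothetical 1D-CNN+LSTM outputs
p_2d = [0.7, 0.8, 0.1]   # hypothetical attention-based 2D-CNN outputs
fused, labels = weighted_soft_vote(p_1d, p_2d)
```

In the second invented candidate the two models disagree (0.4 versus 0.8); the fused probability of 0.54 shows how the 2D model can still overturn a weak negative vote from the more heavily weighted 1D model.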

E. Statistical Analysis
For sHFOs detection, we evaluated the performance of sHFOs detectors using the c-HFO segments obtained from the initial detection; each c-HFO was marked as an sHFO or an artifact based on the visual marking result. We selected the accuracy, precision, recall, specificity and F1-score as the main evaluation indicators:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where a true positive ($TP$) is a c-HFO that is detected by the proposed detector and visually marked as an sHFO; a true negative ($TN$) is a c-HFO that is rejected by the proposed detector and not visually marked as an sHFO; a false positive ($FP$) is an artifact that is detected as an sHFO; and a false negative ($FN$) is a visually marked sHFO that is missed by the proposed detector. We applied the kappa coefficient to evaluate the consistency of the visual marking and the proposed detector, where we defined kappa < 0.5 as agreement due to chance, kappa > 0.5 as excellent consistency and kappa = 1 as complete agreement. The Pearson correlation coefficient was used to analyse the correlation between the automatic detection and visual marking results and it is expressed as follows:

$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$

where $n$ is the total number of bipolar channels in the test set, $X_i$ and $Y_i$ are the numbers of sHFOs obtained by automatic detection and visual marking, respectively, for the $i$-th bipolar channel, and $\bar{X}$ and $\bar{Y}$ are their means. A correlation coefficient ($r$) of 0.8-1 indicates a strong correlation and P < 0.05 was considered significant.
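The evaluation indicators can be computed from confusion counts as in the following sketch. The function names and the counts are hypothetical and are not results of this study; they only illustrate the arithmetic.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, specificity and F1-score from confusion counts."""
    def ratio(num, den):
        return num / den if den else 0.0          # avoid division by zero on empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2.0 * precision * recall, precision + recall),
    }


def pearson_r(x, y):
    """Pearson correlation between per-channel automatic and visual sHFO counts."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented counts: 80 true positives, 900 true negatives, 20 false positives, 10 false negatives.
metrics = classification_metrics(tp=80, tn=900, fp=20, fn=10)
# Invented per-channel counts (automatic versus visual) for four bipolar channels.
r = pearson_r([12, 3, 7, 20], [11, 4, 6, 22])
```

Because artifacts greatly outnumber sHFOs among the candidates, accuracy and specificity are dominated by the negative class, which is why precision, recall and F1-score are reported alongside them.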

B. Initial Results
We applied the threshold-filtering module to automatically extract c-HFOs from long-term EEG data. The module extracted about 96% of the visually marked sHFOs on average, but a large number of artifacts exceeding the threshold were also extracted and the number of artifacts increased exponentially as the threshold decreased. Fig. 5(A) shows the initial results for the test set with different thresholds. Based on the elbow method [44], a target threshold of 1.7SD was selected to balance the number of sHFOs and artifacts. Fig. 5(B) shows the initial results for the test set with a threshold of 1.7SD. In addition, we compared the initial detection results of the threshold-filtering module and the RMS method [39]. The threshold-filtering module had a higher detection rate and a lower number of artifacts. Table IV shows the comparison of the initial detection results.
We applied weighted soft voting to combine the predicted detection probabilities of the two models. We selected weights of 0.65 for the 1D-CNN+LSTM model and 0.35 for the attention-based 2D-CNN model. Fig. 6 shows the test results with different weight combinations. Combining the two models achieved an accuracy of 94.39% (precision = 83.44%, recall = 83.60%, specificity = 96.61%, F1-score = 83.42%). Table V presents the weighted soft voting results. The automatic detection and visual marking results had a kappa coefficient of 0.8001, a Pearson correlation coefficient of 0.9901 and a significance level of less than 0.0001. Fig. 7 compares the automatic detection and visual marking results for 15 patients in the test set. Our proposed detector not only achieved a high sHFOs detection rate but also kept the false positive rate low. However, the proposed detector still had some false detections and missed detections (Fig. 8). The missed detections can be attributed to the small amplitude of the HFO events (<3 µV). The false detections included many irregular waveforms and events with only three continuous oscillations.
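A weight sweep of the kind summarised in Fig. 6 can be sketched as follows. The criterion used to select the weights is not stated in the text, so maximising the F1-score over a grid with a step of 0.05 is an assumption; the function names, the 0.5 decision threshold, the labels and the probabilities are likewise hypothetical.

```python
def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sweep_weights(y_true, p_1d, p_2d, n_steps=20, threshold=0.5):
    """Return the first weight on the grid that gives the highest F1-score."""
    best_w, best_f1 = 0.0, -1.0
    for k in range(n_steps + 1):
        w = k / n_steps                           # weight of the 1D model, 0 to 1
        fused = [w * a + (1.0 - w) * b for a, b in zip(p_1d, p_2d)]
        preds = [int(p >= threshold) for p in fused]
        score = f1_score(y_true, preds)
        if score > best_f1:                       # strict comparison keeps the first maximum
            best_w, best_f1 = w, score
    return best_w, best_f1


# Invented example in which each model alone misclassifies two of four candidates.
y_true = [1, 1, 0, 0]
p_1d = [0.9, 0.3, 0.1, 0.6]
p_2d = [0.4, 0.9, 0.7, 0.1]
best_w, best_f1 = sweep_weights(y_true, p_1d, p_2d)
```

In this invented case neither model alone exceeds an F1-score of 0.5, while intermediate weights classify all four candidates correctly, which illustrates why fusing the two outputs can outperform either model.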

D. Comparison With Other Detectors
At present, few algorithms are available for the reliable and automatic detection of sHFOs. We compared our proposed detector with previous detectors that have been applied to sHFOs detection, including the max-distributed peak points detector [18] and the statistical method detector [45]. We tested the above two detectors on Dataset 1 and Table VI compares the results.

E. Ablation Study
To evaluate the impact of different modules in the proposed detector, we performed ablation experiments by removing the initial detection module, the 1D-CNN+LSTM model, or the attention-based 2D-CNN model. Table VII presents the influence of each module. When the initial detection module was removed, the input data were extracted as 250-ms segments from the long-term EEG (80-200 Hz) data without overlap. The application of the initial detection module effectively located each sHFO event, which substantially improved the recall of the classification.

F. External Dataset Results
We tested our proposed detector on datasets 2 and 3 from Shenzhen Children's Hospital and Beijing Tiantan Hospital, respectively. For Dataset 2, the proposed detector achieved an

IV. DISCUSSION
For epilepsy diagnosis and treatment, sHFOs are a promising non-invasive biomarker. However, visual marking of sHFOs in EEG data is time-consuming and existing detectors have demonstrated unreliable performance, which has limited their clinical application. We propose an automatic sHFOs detector based on a deep learning framework that extracts EEG features in both the time and frequency domains. Compared with existing algorithms, our detector achieved better performance with a higher detection rate and a lower false positive rate. The results of our detector showed a high level of consistency with the visual marking results and an excellent performance with external datasets, which indicates its potential for clinical application.
We designed a threshold-filtering module for the initial detection of sHFOs (Fig. 3). However, this module also extracts a large number of artifacts, which affects the detection performance. We selected a target threshold of 1.7SD to balance the sHFOs detection rate and the number of artifacts (Fig. 5). At this threshold, about 4% of the sHFOs were missed by the initial detection, including some low-amplitude sHFOs that did not exceed 1.7SD. In addition, artifacts with a high amplitude before and after an sHFO event may have also resulted in a missed detection. In the ablation experiments, the initial detection module was shown to be essential to the proposed detector. Setting an appropriate target threshold is essential to eliminating most artifacts and improving the detector performance.
Many sHFOs detectors only extract features in the time domain or frequency domain [18], [22], [24], [46]. However, the complex recording environment of scalp EEG makes it difficult for these detectors to effectively identify artifacts with similar characteristics to sHFOs, which results in a high false detection rate. Our proposed detector combines a 1D-CNN+LSTM model to extract time-domain features from the EEG data and an attention-based 2D-CNN model to extract frequency-domain features from the time-frequency maps. To improve the reliability of the deep learning model in classifying artifact segments, when processing the training set, we retained the abnormal channels and the data segments with electromyographic (EMG) artifacts and abnormal waveforms. To avoid data leakage, we chose an independent test set to evaluate the proposed model. Table V indicates that the proposed detector performed better (precision = 83.44%, recall = 83.60%, F1-score = 83.42%) than the 1D-CNN+LSTM model or attention-based 2D-CNN model alone. The visual marking and automatic detection results had an average kappa coefficient of 0.8001 and a Pearson correlation of 0.9901. This indicates that our proposed detector shows close agreement with visual marking, which is the gold standard for sHFOs detection. Many automatic detectors have been reported with an excellent detection performance for a single dataset. However, few researchers have evaluated the performance of their detectors with external datasets. To evaluate the robustness and generalisation ability of the proposed detector, we tested it on external datasets collected from two other clinical hospitals. Tables VIII and IX present the results of the two datasets. Our detector achieved a high level of performance and showed good generalisation and robustness. Our detector had kappa coefficients of over 0.79 for all three test sets, which indicates a high level of consistency with the visual marking results.
There are several limitations to our research. Scalp EEG signals are the sum of the cerebral cortex's activity, which implies that sHFOs may be recorded by adjacent scalp electrodes. We visually examined the scalp EEG data of patients with sHFOs and found that, even in low-density EEG data (18 bipolar channels), sHFOs with a high amplitude (>5 µV) can be recorded by adjacent electrodes because of the volume conduction effect [47]. The spatial propagation of HFOs has been mentioned in some studies [48], [49], [50]. In the future, we plan to design a module to identify sHFOs recorded by adjacent electrodes and evaluate the impact on the results. Moreover, classification of physiological and pathological HFOs in the ripple band is challenging. Some studies have shown that there are differences between the two types of HFOs in terms of time-frequency features, spatial distribution, and low-frequency coupling [51], [52], [53], and classification methods have been designed based on machine learning algorithms [53], [54], but deep learning-based methods are lacking. The most common approach is to classify HFOs as physiological if they occur within the normal zone [49], [55], which allows physiological and pathological HFOs to be labelled. The patients included in this study were children and the primary treatment was antiepileptic drugs. There were no plans to treat these patients with surgery, so we did not obtain further clinical information, which made it difficult to label physiological and pathological sHFOs. The lack of labels limits the training of deep learning-based detectors. In the future, we plan to collect scalp EEG data from patients undergoing epileptogenic resection surgery to explore the recognition of physiological sHFOs based on deep learning algorithms. In addition, because of the limitations of the hospital equipment, we collected low-density EEG data (18 channels), which is sufficient for evaluating the effectiveness of antiepileptic drug treatments and postoperative prognoses but is difficult to apply to accurate EZ evaluation. In future studies, we plan to include the data of children and adolescents who have undergone epilepsy surgery and to collect high-density scalp EEG data for evaluating the applicability of sHFOs to preoperative auxiliary localisation and postoperative prognosis evaluation.

TABLE IX WEIGHTED SOFT VOTING RESULT OF DATASET 3

V. CONCLUSION
Herein, we propose an automatic deep learning algorithm for sHFO detection. Despite the presence of EMG artifacts and noise in EEG recordings, our algorithm can accurately locate and identify sHFOs from scalp EEG, yielding excellent detection results. The false positive rate was considerably lower than that of existing automatic detection algorithms. Meanwhile, the automatic detection results of our algorithm were highly consistent with visual marking. In addition, our detector exhibited stable performance when applied to datasets obtained from different centres, which supports its reliability for clinical application.

Fig. 2. Framework of the proposed sHFOs detector. The initial detection extracts candidate HFOs (c-HFOs) from 80-200 Hz bandpass-filtered EEG data and a time-frequency map is plotted. The processed data are input into the 1D-CNN+LSTM and attention-based 2D-CNN models. Finally, the classification results are obtained using the weighted soft voting method.

Fig. 4(A) shows the 1D-CNN+LSTM model of the deep learning framework. The detailed parameters are presented in Table II. This model combines a 1D-CNN and LSTM to extract and analyse the time-domain characteristics of the EEG data. The c-HFO segments with a size of 250 × 1 are used as the input. The first two layers of the model are multiscale convolutional layers that extract waveform and amplitude features from the 1D raw data. Four different convolution kernels are used with sizes of 1 × 16, 1 × 32, 1 × 64 and 1 × 128 and zero-padding is applied to ensure the same output size at each scale. Four different scale

Fig. 5. Initial detection results. (A) Relationship between artifacts and the sHFOs detection rate at different thresholds (1.5-2SD). (B) sHFOs detection rate for the test set at a threshold of 1.7SD.

Fig. 6. Weighted soft voting result. The weight ranges from 0 to 1. The relationship between the 1D and 2D model weights is w : (1 − w). The selected weights are 0.65 for the 1D model and 0.35 for the 2D model.

Fig. 7. Comparison between number of sHFOs with automatic detection and visual marking.

TABLE VIII WEIGHTED SOFT VOTING RESULT OF DATASET 2