Predicting Ventricular Fibrillation Through Deep Learning

Ventricular fibrillation (VF) is a type of cardiac arrhythmia in which chaotic cardiac electrical activity causes the heart to quiver instead of pumping normally. To date, early cardiopulmonary resuscitation (CPR) and defibrillation are the only effective VF treatments. Acute myocardial infarction is the most common cause of VF, but cardiomyopathy, myocarditis, electrolyte imbalance, cardiotoxic medication, and even ion channel abnormality can also cause VF. Physicians have attempted to identify specific patterns in electrocardiography (ECG) that might predict VF in the short term. For example, ST segment changes might imply coronary artery occlusion with myocardial ischemia, increasing VF risk. However, in most cases, VF occurs abruptly without any early warning. Machine learning can extract information that human readers usually overlook. In deep learning, a cascade of multiple layers of processing is used to extract features. Machine learning has been used to classify different types and outcomes of cardiac arrhythmias that are difficult to recognize directly. In this study, we developed a new deep learning method to predict the onset of VF. ECG signals from the MIT-BIH databases were used as the training and validation data sets; the prediction results showed that the proposed two-dimensional short-time Fourier transform (2D STFT)/continuous wavelet transform (CWT) convolutional neural network (CNN) model can reach a recall of 99% and an accuracy of 97%. We also compared the proposed 2D model with 1D and 2D time-domain CNN models. The results showed that the 1D CNN and 2D time-domain CNN models achieved accuracies of 60.5% and 56%, respectively.


I. INTRODUCTION
Ventricular fibrillation (VF) is one of the most common cardiac arrhythmias in patients with sudden cardiac death (SCD) [1]. Rapid, chaotic electrical activity results in the failure of synchronized ventricle contraction, leading to the immediate loss of cardiac output. Electrical defibrillation remains the only method to terminate VF. Prolonged VF results in a decreasing waveform amplitude, with progression from initial coarse VF to fine VF, and VF ultimately degenerates into asystole due to the progressive depletion of myocardial energy stores, leading to myocardial cell death. Thus, the probability of successful defibrillation decreases after a prolonged duration of VF. The chance of survival to hospital discharge gradually decreases after VF without prompt defibrillation, and it can be lower than 15% if defibrillation is given after 6 minutes from the onset of fatal ventricular arrhythmia. Extensive efforts have been made to shorten the response time for delivering defibrillation, such as public access defibrillation (PAD) programs in communities and rapid response teams (RRTs) in hospital settings. However, reasons such as the inability to monitor VF and the time taken to retrieve the defibrillator still make immediate defibrillation impossible for most patients. Currently, if patients survive the first VF event, they may be candidates for receiving implantable cardioverter defibrillators (ICDs), which is the only means to deliver defibrillation immediately outside hospitals to date.
The associate editor coordinating the review of this manuscript and approving it for publication was Mohammad Zia Ur Rahman.
Although the detailed mechanism of VF is not fully understood, structural heart diseases, such as myocardial ischemia, cardiomyopathies, and some congenital heart diseases, are believed to be the major causes of VF. Acute myocardial infarction is the most common cause of VF. In most cases, hypoxic myocardial cells become hyperirritable, normal membrane potential is lost, and VF finally occurs. Channelopathies, such as acquired or congenital long QT syndrome, may also cause VF. Occasionally, prolonged monomorphic ventricular tachycardia (VT) may precede VF and may result in VF by causing myocardial ischemia, free radical production, and intracellular calcium release. In myocardial ischemia-related VF, clinicians have noticed that VF may be preceded by unique electrocardiographic (ECG) findings, such as multiple premature ventricular contractions (PVCs), ST segment changes, R on T phenomenon, pauses, QT prolongation, VT, supraventricular arrhythmias, and even the very common sinus tachycardia. However, none of these findings is specific enough to predict the development of VF.
Conventional machine learning methods, such as support vector machine (SVM) [2] or multilayer perceptron (MLP) neural network [3], have been used to predict life-threatening cardiac arrhythmias. These methods utilize certain explicitly recognizable features (such as the RR interval) [2] derived from standard clinical ECG analysis. Some new features, such as frequency domain information, have also been used to train the model; however, these features are still assigned by researchers according to the current knowledge on signal analysis, and the richness of features would be limited. In this study, we developed a new deep learning method based on the convolutional neural network (CNN) to effectively predict the development of VF before its onset. Our method utilized short-time Fourier transform (STFT) and continuous wavelet transform (CWT) to introduce information on the frequency domain of the original signal. We applied the method for both the clinical settings of the emergency medical service/emergency room and the intensive care unit. According to the desired alert time before VF onset, we achieved 99% recall and 97% accuracy to predict the development of VF.

II. RELATED WORK
In classical cardiac electrophysiology, physicians have attempted to identify the patterns of arrhythmia that may lead to sudden ventricular arrhythmia. For example, frequent PVCs may precede ventricular arrhythmia development [4]. Researchers have also attempted to reveal the electrophysiological nature of ventricular arrhythmias [5]. Recently, new electrocardiographic markers, such as periodic repolarization dynamics (PRD) [6] and the Tpeak-Tend interval (which predicts reperfusion VF in STEMI patients) [7], have been found to predict SCD [6]. However, the exact mechanism through which these arrhythmias are related to fatal ventricular arrhythmia is still unclear.
In the past few years, researchers have attempted to solve the same problem from another aspect. Computer analysis has been used to extract hidden information in electrocardiograph signals. Heart rate variability (HRV) of ECG signals, which is widely used to analyze many other arrhythmias, is very commonly used in related studies [8]-[11]; HRV is accepted as a predictor of severe ventricular arrhythmia and SCD [12]. Machine learning algorithms, such as SVM or MLP neural network, are usually applied to classify prediction results. For example, Ebrahimzadeh et al. [13] extracted 24 features from ECG recordings in the Sudden Cardiac Death Holter database of the MIT-BIH database [3], including linear time-domain features (such as the mean, standard deviation, and square root of the mean squared difference of all RR intervals), linear frequency-domain features (such as the ratio of power spectral densities of low- and high-frequency bands), time-frequency-domain features (such as energy extracted in different time-frequency regions), and nonlinear features (such as parameters extracted through detrended fluctuation analysis [DFA]). The MLP neural network was then used to predict SCD episodes. They reported a high accuracy of predictions at 1 (99.73%) and 2 (96.52%) minutes before the episodes, but the 4-minute accuracy decreased to 83.93%; thus, the prediction model is clinically useful in very limited hospital settings, such as intensive care units. Fujita et al. [2] achieved a high accuracy for 4-minute prediction of SCD by extracting more nonlinear features, such as Renyi entropy, Fuzzy entropy, Hjorth's parameters (activity, mobility, and complexity), Tsallis entropy, and energy features of discrete wavelet transform (DWT) coefficients; these features were then input into classifiers, such as the k-nearest neighbor (KNN), decision tree (DT), and SVM.
Lee et al. [14] developed an early-prediction model that can predict VT (another type of fatal ventricular tachyarrhythmia) 1 hour before its onset. They used 14 parameters obtained from HRV and respiratory rate variability (RRV) analyses of signals obtained from cardiovascular intensive care unit patients at a medical center, and the prediction model was created using the artificial neural network. They achieved a sensitivity of 0.88, a specificity of 0.82, and an AUC of 0.93. Au-Yeung et al. [15] also extracted features from HRV data obtained from the analysis of ECG signals from 788 patients enrolled in the ICD arm of the Sudden Cardiac Death-Heart Failure Trial (SCD-HeFT) [16]. These features included principal components identified by principal component analysis (PCA), mean N-N interval, Hjorth complexity and Hjorth mobility, short-term and long-term fractal scaling exponents yielded by DFA, and frequency-domain features extracted using the Lomb periodogram algorithm. Machine learning algorithms, including random forest (RF) and SVM, were trained on these features to predict the occurrence of ventricular tachyarrhythmia. Both RF and SVM methods achieved a mean AUC of 0.81 and 0.87-0.88 for 5-minute and 10-second prediction, respectively.
These recent approaches exhibit a high accuracy. However, most of them started with identifying as many features as possible (e.g., HRV and RR interval), which limited the amount of information that could be extracted from original signals. The accuracy can be further improved if information neglected by humans is used to build the prediction model. Moreover, due to the rarity of ventricular tachyarrhythmia in clinical settings, statistical power is limited when only dozens of segments are used in the analysis.
VOLUME 8, 2020
Amezquita-Sanchez et al. [17] also developed a new methodology to predict SCD based on ECG signals by employing the wavelet packet transform (WPT), homogeneity index (HI), and enhanced probabilistic neural network classification algorithm. Compared with other methodologies, HI achieved a higher accuracy as a single nonlinear feature; thus, lower computational resources would be required. Segments of 60-second ECG signals were extracted 1 to 20 minutes prior to the VF event and were compared with signals from healthy individuals. Using 20 cases from the MIT-BIH SCD Holter database [18], they were able to predict the risk of a VF event up to 20 minutes prior to the onset with a high accuracy of 95.8%. However, signal segments from the same patient during a relatively stable time and prior to the event were not compared.
Similar to the prediction of ventricular arrhythmias, various feature extraction methods combining different classifiers have been used to detect ventricular arrhythmia episodes in SCD patients. Variational mode decomposition (VMD) with the RF classifier was used by Tripathy et al. to detect shockable ventricular arrhythmias, and the accuracy, sensitivity, and specificity of this method were 97.23%, 96.54%, and 97.97%, respectively [19]. The least-squares support vector machine (LS-SVM) classifier was later used in a similar study [20]. ECG signals were first decomposed with digital Taylor-Fourier transform (DTFT) and then introduced into LS-SVM. The accuracy, sensitivity, and specificity of the proposed method for the classification of non-VF and VF episodes were 89.81%, 86.38%, and 93.97%, respectively. In recent years, convolutional neural network (CNN)-based approaches have been used to detect ventricular arrhythmias by using ECG signals [21]. Panda et al. [22] decomposed ECG signals by using fixed frequency range empirical wavelet transform (FFREWT) into various modes and input them into a novel deep CNN to detect shockable ventricular cardiac arrhythmias. The proposed approach achieved an accuracy of 99.036%, 99.800%, and 81.250% for the classification of shockable versus nonshockable, VF versus non-VF, and VT versus VF, respectively. Deep CNN showed higher classification performance.

A. WORKFLOW
The workflow of the methods proposed in this article is illustrated in Fig. 1. First, segments of ECG signals were extracted according to the clinical setting. Second, the ECG segments underwent different preprocessing methods. Then, the processed data were input to the CNN we designed. Finally, the performance of different methods was evaluated.

B. DATA ACQUISITION AND SIGNAL EXTRACTION
The records of VF signals from Creighton University Ventricular Tachyarrhythmia Database (CUDB) [3], [23] were used in this study. CUDB contains 35 records of 8-minute single-channel ECG data. VF occurrences are annotated in each record [3].
VF-predicting and non-VF-predicting rhythms were extracted from the records in the selected databases (as described below) into segments of the length of an assigned analysis window. The starting point of each VF episode was annotated by a specialist in the database. A clinically rational predefined range of warning period (alert interval) was determined by an experienced emergency physician. In this study, we considered two types of settings with different warning period definitions.
Setting 1: In the first setting, VF-predicting and non-VF-predicting segments were extracted from different ECG signals. For VF ECG signals, we defined the minimum alert interval before VF occurrence and calculated the latest alert point as the VF occurrence time minus the minimum alert interval. Then, a VF-predicting segment was extracted by choosing its ending point at random before the latest alert point. For each VF episode, a number of VF-predicting segments were extracted. Non-VF-predicting segments were extracted from another database with normal ECG, that is, the MIT-BIH Normal Sinus Rhythm Database [3]. The method is illustrated in Fig. 2. As shown in Fig. 2, before VF occurrence, the ECG signal is normal, and after VF, the ECG signal becomes noisy. The straight line denotes the latest alert point (before VF occurrence), and the horizontal bars (red) show two extracted segments for VF prediction. ECG signals from 35 patients in CUDB and 46 patients in the MIT-BIH Normal Sinus Rhythm Database were used. Of these 81 patients, the data of 65 patients were chosen randomly as the training data set, and the data of the remaining 16 patients were included in the validation data set.
Setting 2: In the second setting, both VF-predicting and non-VF-predicting segments were extracted from VF ECG signals. We defined the minimum and maximum alert intervals, which denote the latest and earliest alert points, respectively. For each VF-predicting segment, the ending point was chosen randomly between the minimum and maximum alert intervals. For each VF episode, a number of VF-predicting segments were extracted. Non-VF-predicting segments were extracted from the same signal, but the ending points were chosen as random points before the earliest alert point. Fig. 3 illustrates the method. As shown in Fig. 3, the two straight lines represent the earliest and latest alert points (before VF occurrence). The horizontal bars (red) between the two straight lines show two extracted segments for VF prediction, and the horizontal bars (green) before the earliest alert point show extracted non-VF-predicting segments. All the ECG signals were obtained from the 35 patients in CUDB. Moreover, the data of 28 of these patients were chosen randomly as the training data set, and the data of the remaining seven patients were included in the validation data set.
By randomly choosing the ending points, we could extract multiple data segments for each VF occurrence. Thus, sufficient data segments could be generated for the training and validation of the proposed methods with a limited number of VF cases. This also reflects an important characteristic of "prediction": the event to be predicted may occur anywhere within an interval rather than at a specific point. The settings in this study were designed considering clinical needs. In Setting 1, the ECG signal segments of a predefined length at least 1, 2, 3, 4, or 5 minutes (minimum alert interval) prior to VF episodes from CUDB were compared with random signal segments of normal cardiac rhythm from the MIT-BIH Normal Sinus Rhythm Database. The maximum alert interval is 8 minutes, which is the length of each ECG signal in CUDB. The onset points of each VF episode were confirmed by a senior emergency physician. This prediction system is useful in overcrowded emergency departments (EDs), where patients may be placed in the hallway, and some time may be required for receiving emergency care from the physician. It can also be applied to out-of-hospital settings, such as for sports activities and home health monitoring, where emergency defibrillation cannot be performed immediately. This minimum alert interval increases the probability of the patient being rescued earlier, leading to a higher survival probability.
Setting 2 was designed for intensive care units. Different ECG signal segments of critically ill patients in the same database (CUDB) were compared. ECG segments of a predefined length were acquired randomly within 20, 30, 40, 50, and 60 seconds (but at least 10 seconds) prior to VF onset. Other segments of the same length were acquired from ECG signals in the same database, but the aforementioned intervals were excluded. Critically ill patients have a high risk of developing VF at any time, and frequent warnings may cause alarm fatigue in medical staff. Because treatment with manual defibrillators and care from medical staff can be rapidly provided to patients, more precise prediction of the development of VF just before its onset is desired.
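The random choice of segment ending points in the two settings can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the hypothetical VF onset time, the segment length of 1500 samples, and the number of segments per episode are all assumptions made for the example; only the 250 Hz sampling frequency and the alert-interval definitions come from the text.

```python
import random

FS = 250  # sampling frequency in Hz, as stated in the preprocessing section


def sample_end_points(vf_onset, seg_len, min_alert, max_alert=None, n=5, rng=random):
    """Draw n random end points (in samples) for VF-predicting segments.

    Each end point lies at least `min_alert` samples before VF onset
    (Setting 1) and, if `max_alert` is given, at most `max_alert` samples
    before it (Setting 2). Hypothetical helper, not from the paper.
    """
    latest = vf_onset - min_alert                    # latest alert point
    earliest = seg_len                               # segment must start at sample >= 0
    if max_alert is not None:
        earliest = max(seg_len, vf_onset - max_alert)  # earliest alert point
    if earliest > latest:
        return []                                    # record too short for this setting
    return [rng.randint(earliest, latest) for _ in range(n)]


def sample_negative_end_points(vf_onset, seg_len, max_alert, n=5, rng=random):
    """Setting 2 only: non-VF-predicting segments end before the earliest alert point."""
    earliest_alert = vf_onset - max_alert
    if earliest_alert <= seg_len:
        return []
    return [rng.randint(seg_len, earliest_alert - 1) for _ in range(n)]


# Example: a hypothetical VF onset 400 s into an 8-minute record
onset = 400 * FS
setting1 = sample_end_points(onset, seg_len=1500, min_alert=60 * FS)
setting2 = sample_end_points(onset, seg_len=1500, min_alert=10 * FS, max_alert=60 * FS)
negatives = sample_negative_end_points(onset, seg_len=1500, max_alert=60 * FS)
```

Each end point `e` then defines the segment `signal[e - seg_len:e]`, so several overlapping segments can be drawn from a single VF episode.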

C. ECG SIGNAL PREPROCESSING
The sampling frequency of the ECG data was 250 Hz. A high-pass filter with a cutoff frequency of 1 Hz was applied to remove baseline wander. The ECG signal segments were then transformed into spectral images through the following two methods.

1) TWO-DIMENSIONAL FREQUENCY-DOMAIN SIGNAL WITH STFT (2DSTFT)
In the first method, we explored the frequency-domain characteristics of the ECG signal. Fourier transform is an important signal analysis tool that provides frequency spectrum and phase measurements. Moreover, in the study of biomedical signals, frequency content variations are important. In this study, we applied STFT, which captures how the frequency content of the signal changes over time.
The STFT operation is briefly described as follows: A window is considered from the starting point of the signal, and discrete Fourier transform (DFT) is applied to the window. Then, the window moves, with a given overlap with the previous window location, and DFT is performed again to obtain the frequency components of the second time interval. The operation is repeated until the end of the signal. Thus, a (sampled) spectrum with both frequency and time information is obtained. Regarding the window length, choosing a narrow window might result in a poor frequency resolution at low frequencies, whereas using a wide window produces a poor time resolution at high frequencies. The fact that we cannot find a time-frequency representation with perfect accuracy in both time and frequency domains can be interpreted as a type of uncertainty principle. The STFT operation can be formulated as follows:
$$X(m, \omega) = \sum_{n=-\infty}^{\infty} x[n]\, w[n - m]\, e^{-j\omega n}$$
where x[n] and w[n] denote the signal and window sequences, respectively, m is the time index of the window position, and ω denotes the sampled frequency points, as most typical applications of STFT are performed on a computer using fast Fourier transform (FFT). We adopt a Hamming window w[n] with length M = 256, which is defined as
$$w[n] = 0.54 - 0.46 \cos\left(\frac{2\pi n}{M - 1}\right), \quad 0 \le n \le M - 1$$
Compared with standard Fourier transform, STFT provides time-localized frequency information for situations in which the frequency components of a signal vary over time, such as ECG signals. The unique property of STFT is that the instantaneous frequency as well as the instantaneous amplitude of localized waves with time-varying characteristics can be explored; thus, STFT is useful in other types of ECG analysis, such as in arrhythmia classification [24]. The dimension of STFT output is determined by the 1D signal length, FFT size, and time-domain shift. In this article, we consider the ECG signal as 1500 samples, FFT size as 256 points, and time-domain shift as one-fourth of the FFT size (i.e., 64 points) in STFT calculation. The resulting 2D signal has the dimensions of (129, 20).
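The STFT preprocessing and its output dimensions can be checked with a short sketch. It is written in plain Python for self-containment and is not the authors' implementation: it assumes the standard Hamming window, keeps only full windows (no padding at the signal ends), and takes the magnitude of the one-sided spectrum, which the text does not specify. Under these assumptions, a 1500-sample input with a 256-point FFT and a 64-sample shift gives 129 frequency bins and 20 time frames.

```python
import cmath
import math


def hamming(M):
    """Standard Hamming window: 0.54 - 0.46 cos(2 pi n / (M - 1)), 0 <= n < M."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]


def stft_magnitude(x, n_fft=256, hop=64):
    """Return |STFT| as n_fft // 2 + 1 rows (frequency) by n_frames columns (time)."""
    w = hamming(n_fft)
    n_bins = n_fft // 2 + 1                       # one-sided spectrum of a real signal
    n_frames = (len(x) - n_fft) // hop + 1        # full windows only (assumption)
    # Precomputed DFT twiddle factors e^{-j 2 pi k / n_fft}
    tw = [cmath.exp(-2j * math.pi * k / n_fft) for k in range(n_fft)]
    spec = [[0.0] * n_frames for _ in range(n_bins)]
    for m in range(n_frames):
        frame = [x[m * hop + n] * w[n] for n in range(n_fft)]
        for k in range(n_bins):
            acc = sum(frame[n] * tw[(k * n) % n_fft] for n in range(n_fft))
            spec[k][m] = abs(acc)
    return spec


# Synthetic 10 Hz sinusoid sampled at 250 Hz (a stand-in for a 6-second ECG segment)
fs = 250
x = [math.sin(2 * math.pi * 10 * n / fs) for n in range(1500)]
S = stft_magnitude(x)
print(len(S), len(S[0]))  # frequency bins, time frames
```

In practice a library routine such as an FFT-based STFT would replace the explicit DFT loop; the loop is used here only to keep the sketch dependency-free.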

2) TWO-DIMENSIONAL FREQUENCY-DOMAIN SIGNAL WITH CWT (2DCWT)
In the second method, we transform ECG signals into different spectrograms by using continuous wavelet transform (CWT). In contrast to STFT, CWT allows arbitrarily high localization in the time of high-frequency signal features. Different localized waveforms can also be employed as the analyzing function in CWT. The wavelet transform of a continuous time signal, x(t), is defined as
$$T(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt$$
where ψ*(t) is the complex conjugate of the analyzing wavelet function ψ(t), a is the dilation (scaling) parameter of the wavelet, and b is the location parameter of the wavelet. CWT is based on different wavelets. In this article, we consider the Morlet wavelet, which is one of the most commonly used wavelets in ECG signal analysis [25], as its shape resembles that of ECG signals. The (real) Morlet wavelet used in this article, in its standard form, is defined as follows:
$$\psi(t) = e^{-t^{2}/2} \cos(5t)$$
In the CWT preprocessing method, we consider the ECG signal with a length of 250 samples and scale (a) from 1 to 32. The output is subsampled by a factor of 2; thus, the CWT-preprocessed 2D signal has the dimensions of (125, 32).
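A direct, unoptimized sketch of the CWT preprocessing is given below. It assumes the common real Morlet form exp(-t^2/2) cos(5t), the usual 1/sqrt(a) normalization, time measured in samples, and truncation of the wavelet beyond four scale widths; none of these implementation details are confirmed by the text, and the function names are illustrative. With a 250-sample input, scales 1 to 32, and every second time shift kept, the output has 125 rows and 32 columns.

```python
import math


def morlet(t):
    """Real Morlet wavelet in its common form: exp(-t^2 / 2) * cos(5 t)."""
    return math.exp(-0.5 * t * t) * math.cos(5.0 * t)


def cwt_morlet(x, scales=range(1, 33), step=2):
    """Discrete approximation of the CWT at integer scales.

    One row is produced per evaluated time shift b (every `step`-th sample)
    and one column per scale a.
    """
    n = len(x)
    out = []
    for b in range(0, n, step):
        row = []
        for a in scales:
            half = 4 * a                           # wavelet is negligible beyond |t - b| > 4a
            lo, hi = max(0, b - half), min(n, b + half + 1)
            acc = sum(x[t] * morlet((t - b) / a) for t in range(lo, hi))
            row.append(acc / math.sqrt(a))         # 1/sqrt(a) normalization
        out.append(row)
    return out


# Synthetic 25 Hz sinusoid sampled at 250 Hz (a stand-in for a 1-second ECG segment)
fs = 250
x = [math.sin(2 * math.pi * 25 * n / fs) for n in range(250)]
C = cwt_morlet(x)
print(len(C), len(C[0]))  # time shifts, scales
```

For this test signal the response is strongest near scale 8, because the wavelet's center frequency of 5 rad per unit time corresponds to roughly 0.8 * 250 / a Hz at scale a.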

D. DEEP LEARNING STRUCTURE
We designed a CNN for the deep learning structure. The CNN structure is shown in Fig. 4. The learning structure is composed of the following key enabling components.

1) TWO CONVOLUTION LAYERS WITH ACTIVATION AND POOLING
Each convolution layer is followed by an activation layer and a pooling layer. The three operations are explained below.
Conv2D: The convolutional layer applies sliding-window convolution to a two-dimensional input (ECG spectrum images), exploiting the correlation between nearby points in the time and frequency domains. The kernel parameters are learned during training. One convolution layer consists of multiple channels, and each channel extracts a feature. Max-pooling is applied after convolution to reduce the size of the input to the next step.
ReLU: The rectified linear unit applies the activation max(0, x) to each element, which introduces nonlinearity into the otherwise linear convolution output.
Maxpooling2D: The maximum pooling layer considers the largest value in each 2D region of the input signal.
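The three operations can be illustrated on a tiny single-channel input. The sketch below is a plain-Python toy, not the trained network: the 5 x 5 input and the identity-like kernel are made up for the example, the convolution is unpadded, and the pooling keeps partial windows at the borders, which is an assumption about the pooling mode.

```python
def conv2d_valid(img, kernel):
    """Unpadded sliding-window convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    H, W = len(img), len(img[0])
    return [[sum(img[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(W - kw + 1)]
            for i in range(H - kh + 1)]


def relu(fm):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fm]


def max_pool2d(fm, size=2):
    """Maximum over each size x size region; partial border regions are kept."""
    H, W = len(fm), len(fm[0])
    return [[max(fm[i + di][j + dj]
                 for di in range(size) for dj in range(size)
                 if i + di < H and j + dj < W)
             for j in range(0, W, size)]
            for i in range(0, H, size)]


# Toy 5 x 5 input and a kernel that simply copies the center pixel
img = [[i * 5 + j - 12 for j in range(5)] for i in range(5)]
kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
features = max_pool2d(relu(conv2d_valid(img, kernel)))
```

In the real network each channel has its own learned 3 x 3 kernel, so the same three steps are repeated 32 times in the first layer and 64 times in the second.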
Taking the STFT method as an example, we now explain dimension changes at each layer. The input of the STFT method has the dimensions of (129, 20). After the first Conv2D operation with a kernel size of (3, 3), 32 channels, and max-pooling of (2, 2), the output from the first hidden layer has the dimensions of (64, 9, 32). The second hidden layer has a similar structure but with 64 channels, and the output after activation and max-pooling has the dimensions of (31, 4, 64).

2) FLATTENING AND FULLY CONNECTED LAYERS
A flattening layer is introduced after the second convolution and pooling stage, and it is used to flatten the multidimensional input to a 1D output. In our STFT example, it transforms the (31, 4, 64) output from the second hidden layer into an array with a size of 7936. Then, two fully connected layers with the output sizes of 120 and 84 are applied.
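The layer dimensions quoted for the STFT example can be traced with simple arithmetic. The sketch below reproduces the stated sizes under two assumptions that the text does not spell out: the 3 x 3 convolutions are unpadded, and the 2 x 2 pooling rounds up when a dimension is odd. The helper names are illustrative.

```python
import math


def conv_out(shape, kernel=(3, 3)):
    """Spatial size after an unpadded convolution with stride 1."""
    return tuple(s - k + 1 for s, k in zip(shape, kernel))


def pool_out(shape, pool=(2, 2)):
    """Spatial size after pooling, rounding up for partial windows (assumption)."""
    return tuple(math.ceil(s / p) for s, p in zip(shape, pool))


def trace(input_shape, channels=(32, 64)):
    """Return the output shape of each conv+pool stage and the flattened length."""
    shapes = []
    hw = input_shape
    for c in channels:
        hw = pool_out(conv_out(hw))
        shapes.append(hw + (c,))
    flat = hw[0] * hw[1] * channels[-1]
    return shapes, flat


shapes, flat = trace((129, 20))
print(shapes, flat)
```

With the (129, 20) STFT input this yields (64, 9, 32), then (31, 4, 64), and a flattened vector of 7936 elements, matching the values given in the text. Other padding or rounding conventions would give different sizes.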

3) OUTPUT LAYER
The output from the fully connected layer is input into a softmax activation layer, and we determine whether VF is predicted.

IV. EXPERIMENT EVALUATION

A. EVALUATION METRICS
The performance of the learning scheme was evaluated based on recall and accuracy. Recall (more commonly called sensitivity in clinical studies) is defined as the ratio of true positives (TP) to the sum of TP and false negatives (FN):
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Accuracy is the ratio between the number of correctly classified samples and the total number of test samples. Its mathematical expression is as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TN and FP denote true negatives and false positives, respectively. Recall is lower when VF actually occurs but the learning-based predictor fails to predict its occurrence. Compared with the commonly used criterion of accuracy, recall emphasizes the seriousness of a predictor that gives no alert before a fatal event. Accuracy, in contrast, can remain high when the event is rare and the predictor simply predicts that all the results are negative (true negative). In clinical applications, high recall is important because VF events cannot be missed.
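The difference between the two metrics on rare events can be made concrete with a small example. The counts below are invented for illustration and are not results from this study.

```python
def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 when there are no positive samples."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, tn, fp, fn):
    """(TP + TN) / all samples."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical test set: 5 VF-predicting segments among 100.
# A predictor that always answers "no VF" misses every event.
always_negative = dict(tp=0, tn=95, fp=0, fn=5)
print(accuracy(**always_negative))                           # 0.95
print(recall(always_negative["tp"], always_negative["fn"]))  # 0.0
```

The always-negative predictor looks good on accuracy but has zero recall, which is why recall is reported alongside accuracy.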

B. RESULTS
The recall and accuracy of different methods are shown in Table 1. In Setting 1, we adjusted the minimum alert time before VF occurrence from 1 minute to 5 minutes. In Setting 2, the minimum alert time was fixed as 10 seconds, and the maximum alert time was adjusted from 20 seconds to 60 seconds.
From Setting 1, we observed that both 2D frequency domain methods had high recall. STFT preprocessing slightly outperformed CWT.
We observed that performance was slightly weaker in Setting 2 than in Setting 1, perhaps because both the VF and non-VF input segments were extracted from patients who developed VF eventually. Nonetheless, recall above 76% when the alert time was 10 seconds to 60 seconds can still warn medical staff about a patient's need for emergent management. Preprocessing with CWT did not show a higher accuracy compared with STFT. However, it still showed high recall.

C. ACCURACY VERSUS EPOCHS
The accuracy value curves are presented in Figs. 5-8. The batch size was fixed to 128.

D. COMPARISON WITH 1D CNN AND 2D TIME-DOMAIN CNN
In addition to the two methods based on 2D transform (i.e., STFT and CWT), we also considered two time-domain methods:

1) ONE-DIMENSIONAL SIGNAL (1D TIME)
The 1D ECG signal of length L samples is considered as input to the proposed learning scheme. The 2D convolutional layers in the proposed learning scheme are replaced with 1D layers, and the dimensions of each layer are modified accordingly. In Setting 1, this 1D CNN model achieved an accuracy of 60.5% with minimal alert time 2 minutes before the onset of VF. Recall was 20.2%. In Setting 2, the model achieved an accuracy of 56% with the maximum alert time 50 seconds before VF onset. Recall was 63.5%.

2) TWO-DIMENSIONAL SIGNAL WITH TIME-DOMAIN COMPONENTS (2D TIME)
The 1D signal of length L samples is now piled into the 2D signal with the width of W samples and the height of H samples, where L = W × H. The input signal is still in the time domain; such a simple rearrangement allows interaction not only between nearby samples but also between farther samples. The architecture of the learning scheme is the same, but the dimensions are modified accordingly. In Setting 1, this 2D time-domain CNN model achieved an accuracy of 56% with minimal alert time 4 minutes before the onset of VF. Recall was 10.1%. In Setting 2, the model achieved an accuracy of 47.1% with maximum alert time 20 seconds before VF onset. Recall was 50.4%.
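The rearrangement of a 1D segment into a 2D time-domain input can be sketched as follows. The text does not give the values of W and H, so the width of 50 samples used here (giving H = 30 for L = 1500) is purely an assumption for illustration.

```python
def pile_into_2d(signal, width):
    """Stack consecutive chunks of `width` samples as rows, so that L = W x H."""
    if len(signal) % width != 0:
        raise ValueError("signal length must be a multiple of the row width")
    height = len(signal) // width
    return [signal[r * width:(r + 1) * width] for r in range(height)]


# Hypothetical example: L = 1500 samples, W = 50, hence H = 30
signal = list(range(1500))
grid = pile_into_2d(signal, width=50)
print(len(grid), len(grid[0]))  # H, W
```

After this step, vertically adjacent entries such as `grid[0][0]` and `grid[1][0]` are W samples apart in the original signal, which is how a 3 x 3 kernel comes to combine samples that are far apart in time.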

V. REAL CLINICAL VALIDATION
Although most VF episodes occur outside the hospital and result in SCD, they occasionally occur in the emergency room. Here we present two VF cases in the emergency room of a medical center in northern Taiwan. Our model was validated with the initial ECG signals of these patients.
A 65-year-old woman without a history of cardiac disease, diabetes mellitus (DM), hypertension, or dyslipidemia visited the ER in the evening with complaints of vomiting and dyspnea. The initial ECG (Fig. 9) showed hyperacute T waves in the precordial leads, which could be easily missed by inexperienced clinicians. A defibrillator/monitor was attached to the patient immediately, and VF developed several minutes later (cardiac arrest). The patient received manual defibrillation immediately and returned to sinus rhythm without the need for cardiopulmonary resuscitation (CPR). Primary percutaneous coronary intervention (PCI) was conducted and showed left anterior descending coronary artery (LAD) stenosis. A stent was placed, and the patient was discharged from the hospital 6 days later in a stable condition. We input the lead II segment into the model in Setting 1, and the model predicted the VF episode successfully (minimal alert time: 5 minutes).
The second patient, a 50-year-old man without significant medical history, visited the ER at around midnight after experiencing left chest pain for 20 minutes. The initial ECG (Fig. 10) was taken immediately at triage as a part of the ER acute coronary syndrome (ACS) protocol and showed only slight ST depression in the precordial lead. The patient collapsed while the emergency physician was taking history (several minutes after the ECG was taken), and VF was identified after a defibrillator/monitor was attached. The patient immediately received manual defibrillation and returned to sinus rhythm. Primary PCI was conducted and showed 100% stenosis at the proximal segment of LAD and 70% stenosis at the obtuse marginal branch of the left circumflex coronary artery (LCx). A stent was placed in the LAD, and the patient was discharged from the hospital 3 days later in a stable condition. We input the lead II segment into the model in Setting 1, and the model also predicted the VF episode successfully (minimal alert time: 5 minutes).

A. CLINICAL ASPECTS
In this study, our proposed method could identify patients who may develop fatal VF in a short time, with higher recall (sensitivity) and accuracy than previous methods with SVM.
In clinical practice, VF has a poor prognosis. No tool exists for predicting the development of VF to date. Patients with a high risk of fatal events, such as acute myocardial infarction, are admitted to the ICU, so that defibrillation can be conducted as early as possible if VF develops. In conventional ECG analysis, some arrhythmias are thought to precede VF. The "R-on-T phenomenon," which denotes PVCs that begin at or near the apex of the T wave (termed the vulnerable period, during which the energy threshold for VF is reduced) [4], [26]-[28], is thought to be associated with VF. Coupled PVCs are also related to fatal ventricular arrhythmias. Congenital cardiac sodium channel disease, such as Brugada syndrome, also exhibits a specific ECG abnormality and is associated with a high incidence of fatal ventricular arrhythmia in patients with structurally normal hearts. However, it is still very difficult to predict the development of VF in the near future. Thus, implantable cardioverter-defibrillator (ICD) remains the definitive treatment for these patients.
Our approach was motivated by the current clinical prediction methods. If some types of PVCs or some waveform abnormalities (such as the coved ST segment in V1-V3 that characterizes Brugada syndrome) are associated with VF, there should be more information in the waveform that can be discovered. Ideally, meaningful features can be extracted to predict the development of VF.
Our study has some potential limitations. First, even though the database contains long ECG recordings of many patients, VF episodes are still quite rare. To solve this problem, we extracted multiple segments prior to each VF episode. Further data collection requires multicenter collaboration. Second, all patients were critically ill and treated at intensive care units. Concomitant diseases could not be clearly identified from the database, and their influence on ECG signals was unclear. Patients could not be divided into different groups. However, with the deployment of cloud-based monitoring and medical record systems in hospitals, more data will be available for the future improvement of the learning accuracy. Third, out-of-hospital SCD is more fatal, unpredictable, and difficult to rescue on time. VF episodes account for mortality in a large proportion of these patients. Although no ECG database is available, wearable devices that can sense and record cardiac signals may help collect a large amount of data in a short time.

B. LEARNING MECHANISM
Our experiments showed significant improvement in terms of recall when the frequency-domain method was applied. Our interpretation is as follows: 1) In the time domain, the ECG signal several seconds before VF occurrence can be quite different even if the alert interval is fixed. 2) In practice, a prediction mechanism with a fixed alert time length cannot be adopted. When the alert interval varies, the time-domain signal becomes even harder to track. 3) In the frequency domain, because any signal can be decomposed into frequency components, specific frequency components may appear before VF development. Even with different alert intervals, similar frequency components may be captured. 4) The signal is time variant. By constructing a 2D preprocessed signal using STFT, the input to the CNN carries information for both the time and frequency domains. Then, the 2D convolution operation correlates the frequency components at similar time intervals, the amplitude of the same frequency component at different times, and different frequency components at different times. Thus, the CNN learns the variation of frequency components and uses it to predict the occurrence of VF.
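As an illustration of point 4), the following minimal sketch builds a 2D time-frequency input from a 1D signal with a short-time Fourier transform. It is not the preprocessing code used in this study: the sampling rate, frame length, hop size, Hann window, and synthetic test signal are all assumptions chosen for illustration, and the helper name `stft_magnitude` is hypothetical.

```python
import numpy as np


def stft_magnitude(signal, frame_len=64, hop=16):
    """Return a (freq_bins, frames) magnitude spectrogram.

    frame_len and hop are illustrative values, not the paper's settings.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice overlapping frames and apply the window to each one.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One real FFT per frame: rows are time frames, columns are frequency bins.
    spectrum = np.fft.rfft(frames, axis=1)
    # Transpose so that frequency runs along axis 0 and time along axis 1.
    return np.abs(spectrum).T


# Synthetic stand-in for an ECG segment (assumed 250 Hz sampling rate):
# a 5 Hz component throughout, plus a 20 Hz component in the second half.
fs = 250
t = np.arange(1000) / fs
toy = np.sin(2 * np.pi * 5 * t) + (t >= 2.0) * np.sin(2 * np.pi * 20 * t)

spec = stft_magnitude(toy, frame_len=64, hop=16)
print(spec.shape)  # (frequency bins, time frames)
```

In this toy signal, the 20 Hz component is absent from the early columns of `spec` and present in the late ones, so a 2D kernel sliding over the array sees how a frequency component changes over time. This is the kind of variation that point 4) suggests the CNN can learn.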
The advantages and contributions of our work are as follows: 1) Two clinically meaningful settings were examined. Setting 1 showed promising results for predicting the onset of VF, enabling prompt defibrillation. Setting 2 showed that it is possible to predict VF precisely right before its onset. 2) Real clinical cases were included for external validation. The results are encouraging, and we believe that further training will improve the accuracy of prediction. 3) Low computation cost. The proposed learning scheme has approximately 136,000 parameters in total. One reason for the low complexity is our preprocessing methods. With proper settings (e.g., the time-domain shift in the STFT method or the number of scales in the CWT method), both methods generate 2D input with reasonable dimensions. In addition, we did not use a very deep learning scheme; thus, complexity was low. Even so, we still obtained a high accuracy. This relatively simple structure might be an advantage when developing wearable device applications for real-time VF prediction in the future.
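The link between the preprocessing settings and the model size in point 3) can be made concrete with a small calculation. The sketch below is illustrative only: the layer sizes, kernel size, pooling factor, and segment length are hypothetical and do not describe the architecture used in this study, so the resulting counts are not expected to match the approximately 136,000 parameters reported above. It shows how a larger STFT time shift yields fewer time frames and hence fewer weights in the first fully connected layer.

```python
def stft_input_shape(n_samples, frame_len, hop):
    """(freq_bins, frames) of a one-sided STFT without padding."""
    freq_bins = frame_len // 2 + 1
    frames = 1 + (n_samples - frame_len) // hop
    return freq_bins, frames


def conv2d_params(in_ch, out_ch, k_h, k_w):
    """Weights plus one bias per output channel."""
    return (k_h * k_w * in_ch + 1) * out_ch


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return (n_in + 1) * n_out


def count_params(input_hw, conv_channels=(8, 16), kernel=3, pool=2,
                 hidden=32, classes=2):
    """Trainable parameters of a hypothetical small CNN.

    Assumes 'same'-padded convolutions, each followed by pool x pool
    max pooling (floor division), then one hidden dense layer.
    """
    h, w = input_hw
    in_ch, total = 1, 0
    for out_ch in conv_channels:
        total += conv2d_params(in_ch, out_ch, kernel, kernel)
        h, w = h // pool, w // pool  # pooling shrinks the feature map
        in_ch = out_ch
    total += dense_params(h * w * in_ch, hidden)  # flatten -> hidden
    total += dense_params(hidden, classes)        # hidden -> output
    return total


# Assumed 1000-sample segment and 64-sample frames; only the hop changes.
for hop in (16, 32):
    shape = stft_input_shape(1000, frame_len=64, hop=hop)
    print(hop, shape, count_params(shape))
```

Under these assumptions, doubling the time shift roughly halves the number of time frames, and most of the saving comes from the first dense layer, which dominates the parameter count in such a shallow network.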

VII. CONCLUSION
VF is one of the main causes of SCD. With timely resuscitation, the patient's life can be saved. In this study, a new methodology based on the Fourier transform and a CNN was proposed for VF prediction by using ECG signals. Two clinical settings, emergency care and intensive care, were evaluated. In Setting 1, we utilized ECG signal segments from 1 to 5 minutes prior to the onset of VF. This lead time allows physicians in the emergency department and emergency responders in out-of-hospital settings to apply the defibrillator to the patient and deliver life-saving defibrillation in time. In Setting 2, ICU staff can be prepared for a VF episode that may occur 20 to 60 seconds later in critically ill patients. We also validated our model with two patients in the emergency room, and VF development was successfully predicted in both cases.

VIII. FUTURE WORK
Although VF episodes were successfully predicted in real clinical cases in this study, larger data sets should be used to build a more accurate model. Future work should acquire additional data from conventional and portable hospital vital sign monitors and wearable devices. The applications of our prediction model include medical settings and ambulatory health monitoring. In medical settings, it can be incorporated into the warning system of current cardiac monitors, including wired or wireless monitors. In the ambulance and the emergency room, medical staff can be alerted a few minutes before the onset of VF and can prepare for manual defibrillation and resuscitation. In the ICU or coronary care unit (CCU), for patients admitted with suspected myocardial ischemia (in whom VF can develop at any time), medical staff can receive a more accurate prediction of VF less than 1 minute before its onset.
For ambulatory and home health monitoring, the current applications of other ambulatory arrhythmia monitoring systems should be considered. Accurate atrial fibrillation detection by transmitting ECG signals from a wireless patch device to a server, with processing using recurrent neural networks (RNNs), has been shown to be effective in ambulatory patients [29]. A 1D CNN with an active learning model was also used to classify ECG signals acquired from a textile wearable device [30]. A Kohonen Self-Organizing Map (KSOM) was used to recognize the pathologic QRS complex from features extracted in the frequency domain of ECG signals acquired from a wearable system [31]. Researchers have also proposed arrhythmia classification methods with the KNN classifier for ECG signals acquired from commercial 3-lead devices, with a high accuracy [32]. Smartwatches with sensors and software that provide information for identifying cardiac arrhythmias are also currently available [33], [34]. These over-the-counter devices may also help predict fatal arrhythmias in the general population. The computational power of these devices should be evaluated in future work. Transmitting signals to more powerful mobile devices (such as smartphones) for computation is an alternative. Recently, researchers have also tried to develop energy-efficient processors to detect cardiovascular diseases on smartphones [35]. A method to reduce the power and cycle requirements for the FFT of ECG signals through low-level arithmetic optimizations was also proposed [36]. These technologies may help develop devices to predict fatal arrhythmia and may improve the chance of survival among patients with sudden cardiac arrest.