Generalization of Convolutional Neural Networks for ECG Classification Using Generative Adversarial Networks

Electrocardiograms (ECGs) play a vital role in the clinical diagnosis of heart diseases. An ECG record of the heart signal over time can be used to discover numerous arrhythmias. Our work is based on 15 different classes from the MIT-BIH arrhythmia dataset. However, the MIT-BIH dataset is strongly imbalanced, which impairs the accuracy of deep learning models. We propose a novel data-augmentation technique using generative adversarial networks (GANs) to restore the balance of the dataset. Two deep learning approaches (an end-to-end approach and a two-stage hierarchical approach) based on deep convolutional neural networks (CNNs) are used to eliminate hand-engineered features by combining feature extraction, feature reduction, and classification into a single learning method. Results show that training the proposed techniques on the original imbalanced dataset augmented with generated heartbeats yields better ECG classification performance than training the same techniques on the original dataset alone. Furthermore, we demonstrate that augmenting the heartbeats using GANs outperforms other common data augmentation techniques. Our experiments with these techniques achieved overall accuracy above 98.0%, precision above 90.0%, specificity above 97.4%, and sensitivity above 97.7% after the dataset had been balanced using GANs, results that outperform several other ECG classification methods.


I. INTRODUCTION
An ECG is a standard tool for measuring the electrical activity of the heart and for diagnosing cardiac arrhythmias. Recording an ECG involves placing electrodes on the surface of the body (such as the chest, neck, and arms) in order to detect electrical changes in the heart. An ECG record primarily consists of several distinctive waveforms, such as the P wave, the QRS complex, and the T wave. The P wave shows atrial contractions; the QRS complex shows ventricular contractions; the T wave shows the electrical activity produced as the ventricles are recharged for the next contraction [1]. Study of these complex waves and the cardiac activities they represent is vital for diagnosis of various arrhythmias [2]. It is difficult for a cardiologist to correctly analyze a large number of ECG records given their complexity and the amount of time required to analyze them [3]. Yet life-threatening arrhythmias need to be detected early and accurately [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.
Arrhythmias can be grouped into two main categories: life-threatening and non-life-threatening. Life-threatening arrhythmias such as tachycardia and ventricular fibrillation cause heart attacks and sudden death [5], [6]. Non-life-threatening arrhythmias, which are the focus of this study, require attention in order to prevent deterioration of the heart muscle [3]. The category of the arrhythmia can be determined by recognizing the classes of consecutive heartbeats [7]. Manual beat-by-beat classification can be very time-consuming and too difficult in many scenarios. It is crucial to automate ECG analysis so that cardiac disorders can be discovered and treated as quickly as possible in clinical situations where speed in providing medical aid is essential.
Medical datasets like the MIT-BIH arrhythmia dataset are often very limited. They usually suffer from a data imbalance problem: they over-represent normal classes and common diseases and only sparsely represent rare diseases. Collecting medical data is a complex and expensive process that requires the collaboration of cardiologists and researchers [8]. Imbalances in the datasets can make training the models, especially deep learning models, technically challenging, and the models will tend to be biased in favor of classes that contain a large number of samples. The classification methods used in most studies tend not to perform well on minor classes because their aim is to optimize overall accuracy without giving appropriate consideration to the relative accuracy of each class [9]. Yet the cost of misclassifying minor classes in medical datasets is often much greater than the cost of misclassifying major classes, since the arrhythmias of high-risk patients usually fall in the minor classes of arrhythmia disease. The need for a good data augmentation technique for medical datasets is thus an urgent one.
One way to overcome the imbalance is to artificially create synthetic data by modifying the original training set using such classical data augmentation methods as translation, flipping, and rotation, which have become an essential step in computer vision tasks [10]. Although these modifications may result in marginal gains in diversity, they may also disrupt relevant orientation-related features, especially in such one-dimensional signals as those of an ECG.
The goal of generative models, the most promising models for data synthesis, is to learn the distribution of the training set and then to generate new samples from the learned distribution. In this paper, a specific kind of generative model called a Generative Adversarial Network (GAN), which has proved its effectiveness in synthesizing high-quality images in several domains [11], is used to generate synthetic heartbeats and thereby restore the balance of each class of the MIT-BIH dataset. The model generates realistic heartbeats that are very similar to actual heartbeats. Recently, GANs have been applied to balance a variety of medical imaging datasets, including generating MRI slices [12], retinal images [13], data for chest pathology [14], and data for bone lesion pathology [15]. To the best of our knowledge, this is the first attempt to apply GANs to one-dimensional medical data.
In this paper, we propose a novel data augmentation technique based on the combination of real and synthetic heartbeats using GAN to improve the classification of ECG heartbeats of 15 different classes from the MIT-BIH arrhythmia dataset. In addition, two approaches based on CNN are proposed. The first approach (an end-to-end approach) classifies the heartbeats as one of the 15 classes in a direct way. The second approach (a two-stage hierarchical approach) classifies the heartbeats under one of the five main categories in the first stage, and each heartbeat is classified into one of the classes that belongs to that category in the second stage. To show the superiority of the GAN, we compared the results of the end-to-end approach after the dataset had been balanced using GAN and using other common data augmentation techniques. The proposed approaches are applied to lead 1 only from the MIT-BIH dataset to reduce the computational time. The contributions of this study are as follows: 1) the synthesis of high-quality heartbeats using GAN and 2) design of two deep-CNN approaches with superior performance compared with other studies.
The remainder of this paper is organized as follows: The related work is reviewed in section II. The proposed technique and methodologies are discussed in section III. The experimental results are presented in section IV. Finally, the conclusion and the future work are provided in section V.

II. RELATED WORK
The traditional approach to ECG classification is to develop an algorithm to extract the important features from the input signal and then choose an appropriate classifier to be used in the classification stage. The conventional approaches in most studies involve preprocessing, feature extraction, feature reduction, and classification. Many researchers have conducted studies on using such an approach for ECG classification [4], [31]-[44]. In these studies, the researchers considered different classes and utilized several methods in the feature extraction stage, such as independent component analysis (ICA), discrete wavelet transform (DWT), discrete cosine transform (DCT), principal component analysis (PCA), Gaussian mixture models (GMMs), higher order spectra (HOS), and the one-dimensional hexadecimal local pattern (1D-HLP). In the classification stage, they utilized several algorithms, including FNN, PNN, SVM, and BNN algorithms.
Although the studies described above achieved acceptable ECG classification performance, they have some disadvantages. For instance, the conventional approaches require developing a feature extractor and then reducing the extracted features to a set of optimal features that can be fed into an appropriate classifier. Computer-aided design (CAD) models developed using the above workflow show low performance when validated on a separate dataset and often suffer from overfitting [45]. Deep learning approaches have the capability of learning the most important features automatically from the input signals. Hence, the essential steps that are required in the traditional approaches, namely, feature extraction, feature reduction, and classification, are learned jointly in the deep learning approaches without needing to be explicitly defined. Recently, studies have applied several deep learning approaches for ECG classification. Zhang et al. [27] proposed a CNN model consisting of six layers, comprising two convolutional layers, two pooling layers, and two fully connected layers. The model classifies five classes of the MIT-BIH dataset (Normal, Atrial Premature Contraction, Ventricular Premature Contraction, Right and Left Bundle Branch Blocks), and an overall accuracy of 97.50% is achieved. In [29], the authors explored the use of a DWT layer with bidirectional LSTM for ECG classification of five types of heartbeats obtained from the MIT-BIH arrhythmia dataset; they achieved an overall accuracy of 99.39%.
Acharya et al. [28] proposed a nine-layer CNN to classify five different categories of the MIT-BIH arrhythmia dataset. To overcome the imbalance in the number of heartbeats in the five (N, S, V, F, Q) categories, they generated synthetic heartbeats by varying the mean and standard deviation of the Z-score that was calculated from the original data. The number of heartbeats in each of the remaining categories was increased to match the number of heartbeats in the N category. The researchers achieved an overall accuracy of 94.03% using the augmented data, and the overall accuracy was reduced to 89.07% when the model was trained only with the original data. The generated heartbeats were used in both the training and testing phases. Tuncer et al. [31] proposed the use of DWT coupled with the 1D-HLP technique for automated arrhythmia detection. Ten-second segments of 17 ECG classes from the MIT-BIH dataset were considered, and an overall accuracy of 95.0% was obtained using a 1-nearest-neighbor (1NN) classifier.
In [46], the researchers utilized 1,000 ECG fragments from the MIT-BIH arrhythmia dataset from lead 1, where each fragment was 10 seconds long. A deep one-dimensional (1D)-CNN consisting of 16 layers was proposed to classify 15 different classes, and an accuracy of 92.51% was achieved, whereas an accuracy of 91.33% was obtained when they considered 17 classes. Oh et al. [30] proposed a combination of CNN and LSTM for diagnosing five classes with variable length segments from the MIT-BIH arrhythmia dataset. The architecture consisted of six convolutional and pooling layers, followed by an LSTM layer and two fully connected layers. LSTM was used to extract the temporal information from the feature maps resulting from the convolutional layers. An accuracy of 98.10% was obtained.
Pławiak and Acharya [47] proposed a deep ensemble of classifiers for ECG classification based on deep learning approaches and genetic algorithms; the researchers used 10-second ECG segments from 29 people in the MIT-BIH arrhythmia dataset. Hence, they did not utilize the whole dataset, and only 744 segments from 29 out of 48 records were considered. The achieved overall accuracy for the utilized segments of the dataset was 95.00%. In [48], the authors proposed a convolutional autoencoder-LSTM system to automatically recognize five different types of arrhythmia heartbeats. They utilized the autoencoder to compress a large amount of ECG signals with minimum loss and then classified the compressed signals using an LSTM network. The classification of the five classes from the MIT-BIH dataset achieved an accuracy of 99.21%.
Most studies have only considered overall accuracy. Yet, the overall accuracy as a parameter is not enough for measuring the robustness of the model because it is biased in favor of classes that contain large numbers of samples, while neglecting precision in the minor classes. Furthermore, the heartbeats of the minor classes need to be classified precisely because high-risk patients usually belong to these classes. In contrast, although [3], [43] considered the precision for the classes, they utilized data from leads 1 and 2, in addition to developing a fusion step to improve the results, which increased the computational time. Most studies also do not handle the imbalance problem in the MIT-BIH dataset, which negatively affects the achieved accuracy for classes with few heartbeats. In this study, we propose a novel approach to balance the classes of the utilized dataset using GAN, and we present two deep learning approaches to overcome the handcrafted methods of feature extraction and reduction in the literature. The approaches classify 15 different classes from the MIT-BIH dataset using data only from lead 1, and multiple evaluation methods are considered.

III. METHODOLOGY
The proposed methods for preprocessing, data augmentation, and classification are discussed in this section. A detailed description of each method is introduced in the subsections below.

A. PREPROCESSING
The raw ECG signals are preprocessed to eliminate noise and improve the classification accuracy. The noise is eliminated by removing both high and low frequencies from the acquired ECG signal. A Butterworth bandpass filter with a range of 0.5-40 Hz is applied to extract the most valuable information from the ECG signal [49]. After that, ECG records are segmented into multiple heartbeats using the R-peak locations associated with the dataset; each heartbeat contains a P wave, QRS complex, and T wave.
Fixed segmentation methods are usually applied due to the difficulty of detecting the beginning and ending of each heartbeat [3], [4]. Yet fixed segmentation is not always reliable because it cannot account for the variability of the heart rate. Hence, a dynamic segmentation strategy is used to overcome the heart rate variability, as proposed in [20]. To be invariant to the variability of the heart rate, the number of samples before and after each R peak is calculated according to the duration between the current and previous R peak (RR previous interval), as well as the duration between the current and next R peak (RR next interval). Thereafter, the number of samples in the larger interval is divided into thirds; the first third is taken before the R peak, while the other two-thirds are taken after the R peak. Finally, the amplitudes of the heartbeats are normalized between 0 and 1, and each heartbeat is resized to 300 samples to unify the number of samples for each heartbeat. The result of the preprocessing stage is shown in Figure 1.
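The segmentation rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `segment_beat`, the use of linear interpolation for resizing, and the synthetic sine-wave signal are all hypothetical choices made for the example.

```python
import math


def segment_beat(signal, r_prev, r_cur, r_next, out_len=300):
    """Cut one heartbeat around r_cur and return out_len samples scaled to [0, 1]."""
    # Use the larger of the two neighbouring RR intervals as the beat length.
    window = max(r_cur - r_prev, r_next - r_cur)
    before = window // 3        # first third precedes the R peak
    after = window - before     # remaining two-thirds follow it
    start = max(0, r_cur - before)
    stop = min(len(signal), r_cur + after)
    beat = signal[start:stop]

    # Min-max normalisation to [0, 1]; a flat segment maps to all zeros.
    lo, hi = min(beat), max(beat)
    span = (hi - lo) or 1.0
    beat = [(v - lo) / span for v in beat]

    # Resize by linear interpolation (an assumed resizing method).
    if len(beat) == 1:
        return beat * out_len
    step = (len(beat) - 1) / (out_len - 1)
    resized = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), len(beat) - 2)
        frac = min(1.0, pos - left)
        resized.append(beat[left] * (1 - frac) + beat[left + 1] * frac)
    return resized


# Toy signal sampled at 360 Hz (the MIT-BIH rate); R-peak indices are made up.
fs = 360
signal = [math.sin(2 * math.pi * 1.2 * n / fs) for n in range(3 * fs)]
beat = segment_beat(signal, r_prev=100, r_cur=400, r_next=760)
```

Here the larger RR interval is 360 samples, so 120 samples are taken before the R peak and 240 after it, and the 360-sample segment is then resampled to 300 values.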

B. DATA AUGMENTATION USING GAN
A GAN consists of two neural networks, a generator and a discriminator, that compete against each other. The generator network learns to map a noise vector to the distribution of the data it wants to generate; the goal of the generator is to produce data samples similar to the samples in the original dataset. In contrast, the discriminator network receives data samples from either the generated (fake) samples or the original (real) samples, and it is responsible for determining whether the received samples are real or fake. Figure 2 describes the training process of the GAN.
Two main problems arise when the GAN is trained with samples from all classes. The first problem is as follows: If the generator network in the GAN is trained to fool the discriminator network by generating realistic heartbeats, it will focus on the generation of the dominant classes to optimize the loss function of the network while collapsing away the other modes of the minor classes; this is known as the mode collapse problem [11]. However, such a problem may be partially tackled by using some advanced techniques, as proposed in [50]. The second problem is incurred if the GAN is trained by using all heartbeats and somehow generates a diversity of fake samples; in such a case, the labels of the generated heartbeats cannot be determined precisely because some classes are highly similar. Hence, the GAN is trained using the heartbeats of each class independently to generate synthetic heartbeats to balance the training set for each class. The training of the GAN is terminated after the loss of the networks begins to saturate.
Synthetic heartbeats are generated after the segmentation stage. The generator network is trained on the segmented beats for each class, except the N class, because this is the dominant class. The number of samples in the other classes is increased to match the number of samples in the training set of the N class. Batch normalization [51] is used in the generator network to improve the performance and stability of the network and add diversity to the generated samples.
The generator network consists of four fully connected layers; it receives a vector of 100 random numbers sampled from the standard uniform distribution as an input and outputs a heartbeat of size 300 × 1. The discriminator network consists of five fully connected layers; it takes a heartbeat of size 300 × 1 as an input and outputs a decision on whether the heartbeat is real or fake. According to the decision, the parameters of the networks are tuned to minimize the loss of the networks according to (1) and (2) using the Adam optimizer [52], where m is the number of samples per minibatch, D is the discriminator network, G is the generator network, x is the real samples, and z is a noise vector. Figure 3 shows the proposed architecture of the generator and discriminator networks. Two post-processing steps are applied after generating the heartbeats to improve the results. In the first step, a Savitzky-Golay filter [53] is used to smooth the heartbeats and enhance their quality. After that, the amplitude is normalized between 0 and 1 to match the amplitude of the real heartbeats. In the second step, random sample consensus (RANSAC) is used to identify and remove the outliers among the generated samples and to ensure that the remaining samples come from the same distribution. Figure 4 shows samples of real and synthetic heartbeats of different classes from the MIT-BIH dataset.
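Equations (1) and (2) are not reproduced in the text above. Assuming they take the standard minibatch form of the GAN updates in [11], which uses exactly the symbols m, D, G, x, and z defined here, they would read as follows. The parameter symbols \theta_d and \theta_g, and the assignment of (1) to the discriminator and (2) to the generator, are assumptions.

```latex
\begin{align}
\nabla_{\theta_d}\,\frac{1}{m}\sum_{i=1}^{m}
  \Big[\log D\big(x^{(i)}\big) + \log\Big(1 - D\big(G\big(z^{(i)}\big)\big)\Big)\Big] \tag{1}\\
\nabla_{\theta_g}\,\frac{1}{m}\sum_{i=1}^{m}
  \log\Big(1 - D\big(G\big(z^{(i)}\big)\big)\Big) \tag{2}
\end{align}
```

Under this reading, the discriminator parameters are updated to increase the quantity in (1), which is equivalent to minimizing its negative, and the generator parameters are updated to decrease the quantity in (2).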
Although there are slight differences between the synthesized and original heartbeats in Figure 4, this is the intention and goal of the GAN. In this study, the aim is to generate heartbeats that have the main features of the original ones, not to generate identical versions of them. The same idea is adopted when using any data augmentation technique: The original samples are modified slightly to obtain diversity in the training set.

C. CLASSIFICATION STAGE
We propose two approaches based on deep CNNs to classify 15 arrhythmias from the MIT-BIH dataset that are distinct from other recent classification approaches; no significant feature extraction of ECG data is needed to achieve strong classification performance. The first approach is an end-to-end architecture that classifies the heartbeats in a direct way. The second approach is a two-stage hierarchical process that determines the category of the heartbeats in the first stage and classifies the class belonging to that category in the second stage. The details of each approach are discussed in the next subsections.

1) END-TO-END APPROACH
In this approach, the model takes the heartbeat as an input and classifies it as one of the 15 classes in an end-to-end way. The proposed architecture is inspired by the inception network [54] for the following reason: in this study, the positions of the waves (P, QRS, and T wave) are not fixed, and the length of the waves is not the same for all heartbeats. The proposed approach therefore calculates the features using multiple kernel sizes so that the extracted wave features are invariant to the length of each wave. The proposed architecture consists of three inception modules followed by three fully connected layers. Each inception module consists of multiple convolutional layers that operate on the same level; each layer has a number of filters with a specific kernel size, and padding is applied to unify the output sizes so that they can be concatenated. The outputs are concatenated, and the size of the concatenated feature maps is reduced by applying a max-pooling operation. Figure 5 shows the components of the proposed inception module, and Figure 6 describes the proposed architecture of the first approach.
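The structure of one inception module (parallel convolutions with different kernel sizes, padding to equal length, concatenation, then max pooling) can be illustrated with a small sketch. The kernel sizes 3, 5, and 7, the fixed averaging weights, the pooling size of 2, and all function names are hypothetical; the actual module uses many learned filters per branch, as specified in Figure 5.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding so output length equals input length."""
    k = len(kernel)
    pad_left = (k - 1) // 2
    padded = [0.0] * pad_left + list(signal) + [0.0] * (k - 1 - pad_left)
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal))]


def max_pool(channel, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(channel[i:i + size])
            for i in range(0, len(channel) - size + 1, size)]


def inception_module(signal, kernels, pool=2):
    """Apply every kernel in parallel, stack the outputs as channels, then pool."""
    branches = [conv1d_same(signal, k) for k in kernels]  # equal lengths via padding
    return [max_pool(b, pool) for b in branches]           # one pooled channel per branch


# A 300-sample toy heartbeat and three averaging kernels of different widths.
signal = [float(i % 10) for i in range(300)]
kernels = [[1 / 3] * 3, [1 / 5] * 5, [1 / 7] * 7]
features = inception_module(signal, kernels)
```

Because every branch is padded to the input length, the three outputs can be stacked directly; pooling then halves the length, so `features` holds 3 channels of 150 values each.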

2) TWO-STAGE HIERARCHICAL APPROACH
According to ANSI/AAMI EC57: 1998 standard, the 15 classes of the MIT-BIH dataset are mapped into five main categories as shown in Table 1. The classification is done based on two stages in this approach. In the first stage, the heartbeats are classified into one of the five main categories. Subsequently, each heartbeat in the second stage is classified into one of the classes that belongs to that category. The proposed approach is shown in Figure 7.
The architecture of each CNN in Figure 7 is similar to the architecture in Figure 6 except for the number of fully connected layers and the number of neurons in each layer. Category F has only one class, so no classification network is needed for it in stage 2. Only the correctly classified heartbeats in stage 1 will be passed to the second stage.
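The routing between the two stages can be sketched as follows. This is only an illustration of the control flow at inference time: the classifiers are stand-in functions rather than trained CNNs, `classify_two_stage` is a hypothetical name, and the label used for the single class in category F is a placeholder, since Table 1 is not reproduced here.

```python
def classify_two_stage(beat, stage1, stage2, single_class):
    """Return (category, class) for one heartbeat.

    stage1: callable mapping a beat to a category label.
    stage2: dict mapping a category to a callable that returns a class label.
    single_class: dict mapping a category to its only class (no stage-2 network).
    """
    category = stage1(beat)
    if category in single_class:
        # Categories with one class need no second classifier.
        return category, single_class[category]
    return category, stage2[category](beat)


# Stand-in classifiers (hypothetical); real ones would be the trained CNNs.
stage1 = lambda beat: "F" if max(beat) > 0.9 else "N"
stage2 = {"N": lambda beat: "LBBB" if beat[0] > 0.5 else "Normal"}
single_class = {"F": "F-class"}  # placeholder name for the only class in F

result_n = classify_two_stage([0.6, 0.2], stage1, stage2, single_class)
result_f = classify_two_stage([0.1, 0.95], stage1, stage2, single_class)
```

During evaluation, as described above, only beats whose stage-1 category is correct are forwarded to stage 2; the sketch shows the deployment-time path, where the predicted category selects the second network.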
The data augmentation process is slightly different in this approach. It is simple in the end-to-end approach because each class will contain 9,660 samples, matching the number of heartbeats in the Normal Class, but in this approach, the data augmentation is done across two stages. In the first stage, GAN is applied to the classes of categories S, V, F, and Q to match the number of heartbeats in category N; after augmentation, each category will have a training set containing 15,904 heartbeats. In the second stage, each category is balanced using GAN separately based on the major class in each category. For instance, the major class in the first category is the Normal Class, with 9,660 heartbeats. Based on this, LBBB, RBBB, NE, and AE are balanced to have a training set containing 9,660 heartbeats for each class.

IV. EXPERIMENTAL RESULTS

A. DATASET
The MIT-BIH dataset [55] is the most popular dataset for arrhythmias, and it is used for arrhythmia detection in most studies. It contains 48 records of individuals of different genders and ages; each record is a 30-minute-long recording of heartbeat signals, with a sampling frequency of 360 Hz. The heartbeats and R-peak locations have been annotated by experts and associated with the dataset; these annotations and locations have been utilized as the ground truth in the training and evaluation phases. Only ECG data from lead 1 has been considered. According to the ANSI/AAMI EC57:1998 standard [56], only 44 records can be utilized because there are four paced records. Hence, 15 arrhythmias are considered in this study.
In this study, the beats of the utilized records, from lead 1 only, were divided into training and testing sets. For comparison's sake, the data division in [3] and [43] has been followed. The percentages of training and testing sets were not the same for all classes because the numbers of beats for the classes were not equally distributed. The training set consisted of 13% of the total beats from the Normal Class, which contains tens of thousands of beats; 40% of the total beats from the classes with a large number of beats; and 50% of the total beats from the classes with a small number of beats. The division of the beats is described in Table 2.
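A per-class random split with class-dependent training fractions can be sketched as below. The fractions (13%, 40%, 50%) come from the text; the threshold separating large from small classes, the toy class sizes, and the function names are assumptions, since the actual division is given in Table 2.

```python
import random


def split_class(beats, train_fraction, seed=0):
    """Randomly split one class's beats into disjoint training and testing lists."""
    rng = random.Random(seed)
    indices = list(range(len(beats)))
    rng.shuffle(indices)
    n_train = round(len(beats) * train_fraction)
    train = [beats[i] for i in indices[:n_train]]
    test = [beats[i] for i in indices[n_train:]]
    return train, test


def train_fraction(label, n_beats, large_threshold=1000):
    """13% for the Normal class, 40% for large classes, 50% for small classes.

    large_threshold is a hypothetical cut-off, not a value from the paper.
    """
    if label == "Normal":
        return 0.13
    return 0.40 if n_beats >= large_threshold else 0.50


# Toy class sizes; each beat is represented by an integer identifier.
toy = {"Normal": list(range(10000)), "LBBB": list(range(2000)), "AE": list(range(16))}
splits = {label: split_class(beats, train_fraction(label, len(beats)))
          for label, beats in toy.items()}
```

Each beat appears in exactly one of the two lists for its class, which mirrors the requirement that the training and testing sets share no beats.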

C. RESULTS OF THE END-TO-END APPROACH
In this approach, the training set is selected randomly for each class according to the data division in Table 2, and the other beats are used in the testing set. There is no duplication between the training and testing sets. The heartbeats generated by the GAN were used to enlarge the training set for every class except the Normal Class. The final training set after data augmentation had 144,900 beats, with each class having 9,660 training beats. The Adam optimizer [52] was utilized to tune the parameters, and the network weights were initialized with random values from the standard normal distribution. The proposed approach was applied only to lead 1 from the MIT-BIH arrhythmia dataset, and 15 arrhythmia classes were considered. The proposed model in Figure 6 was trained with the same techniques and hyperparameters on both the original imbalanced dataset and the GAN-augmented dataset to observe the effect of balancing the dataset. The generated heartbeats were utilized only in the training phase, and the testing set contained unseen real heartbeats. The confusion matrix of this approach is shown in Table 3.
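The balancing arithmetic can be checked with a short sketch. The target of 9,660 beats per class and the total of 144,900 come from the text; the per-class counts of real training beats below are hypothetical placeholders, and `synthetic_needed` is an illustrative name.

```python
def synthetic_needed(real_train_counts, target):
    """Number of generated beats each class needs to reach `target` training beats."""
    return {label: max(0, target - n) for label, n in real_train_counts.items()}


TARGET = 9660      # training beats in the Normal class (from the text)
NUM_CLASSES = 15

# Hypothetical real training counts for three of the 15 classes.
real_counts = {"Normal": 9660, "LBBB": 3200, "AE": 8}
needed = synthetic_needed(real_counts, TARGET)

# Every class ends at the same size, so the balanced set has 15 * 9,660 beats.
total_after_balancing = NUM_CLASSES * TARGET
```

The dominant class needs no synthetic beats, while a class with only a handful of real beats is filled almost entirely with generated ones.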
The precision, sensitivity, and specificity for each class before and after data augmentation are shown in Table 4. Although increasing the number of training samples decreased the precision slightly for some classes, such as APC, AP, and VF, the GAN had a great effect on the minor classes and significantly increased the precision for these classes, such as AE, UN, and NE. It is worth mentioning that the dangerous and rare diseases usually fall in the minor classes of arrhythmia disease. However, UN segments contain distortions in one or more of the three main waves (P, QRS, and T), so they cannot be recognized as a specific heart disease.
Generating synthetic heartbeats using GAN and adding them to the training sets improved the results. After data augmentation, the precision increased by 8.64%, achieving 90.0%, and the specificity increased by 0.76%, achieving 99.23%; the overall accuracy increased by 0.5%, achieving 98.3%. The effect of the GAN does not appear clearly in the overall accuracy because this measure is biased toward the major classes, which already contain large numbers of samples and can be recognized easily; the effect appears instead in the precision of the minor classes, which need to be balanced because they contain only tens of samples.
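The gap between overall accuracy and minor-class precision can be made concrete with a small sketch that derives the per-class measures from a confusion matrix. The two-class matrix below is a made-up example, not data from Tables 3 or 4, and `per_class_metrics` is an illustrative name.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of beats of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c beats predicted elsewhere
        fp = sum(cm[r][c] for r in range(n)) - tp   # other beats predicted as class c
        tn = total - tp - fn - fp
        metrics.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        })
    accuracy = sum(cm[c][c] for c in range(n)) / total
    return accuracy, metrics


# Toy case: a major class with 1,000 test beats and a minor class with 10.
cm = [[990, 10],
      [2, 8]]
accuracy, metrics = per_class_metrics(cm)
```

In this toy case the overall accuracy is about 98.8%, yet the precision of the minor class is only 8/18, or about 44%, because the ten misclassified major-class beats outnumber the eight correct minor-class predictions.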

D. RESULTS OF THE BASELINE DATA AUGMENTATION TECHNIQUES IN THE END-TO-END APPROACH
To illustrate the efficiency of the GAN, we compared our results after data augmentation in the end-to-end approach against other common data augmentation techniques, such as random oversampling, the synthetic minority oversampling technique (SMOTE), and adaptive synthetic (ADASYN) sampling. Random oversampling randomly replicates the samples of the minor classes to match the number of samples in the dominant class. However, it increases the likelihood of overfitting. SMOTE [57] generates synthetic data based on the similarities of the feature space that exist in the samples of the minor classes. It randomly selects one of the neighbors of each sample in the minor classes and generates new samples by linear interpolation between the sample and the selected neighbor. In contrast, He et al. [58] proposed another technique called ADASYN to generate synthetic samples based on the density distributions of the training data. Moreover, we trained and evaluated the proposed model with the original unbalanced dataset using the weighted loss strategy. Table 5 shows the results of the end-to-end approach using ten folds for each technique compared with the results of the GAN. It demonstrates the efficiency of the GAN compared to the other common data augmentation techniques.
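The two simplest baselines can be sketched directly from their descriptions. This is a simplified illustration rather than the reference SMOTE algorithm of [57]: the neighbor count `k`, the Euclidean distance, the fixed random seed, and the function names are assumptions, and the four two-dimensional points stand in for 300-sample heartbeats.

```python
import random


def random_oversample(minority, target, rng=None):
    """Replicate randomly chosen minority samples until `target` samples exist."""
    rng = rng or random.Random(0)
    extra = [list(rng.choice(minority)) for _ in range(target - len(minority))]
    return [list(s) for s in minority] + extra


def smote_sample(minority, k=3, rng=None):
    """Create one synthetic sample between a minority sample and a near neighbour.

    Requires at least two samples in `minority`.
    """
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    others = [s for s in minority if s is not base]
    # Sort the remaining samples by squared Euclidean distance to the base sample.
    others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
    neighbour = rng.choice(others[:k])
    lam = rng.random()  # interpolation weight in [0, 1)
    return [a + lam * (b - a) for a, b in zip(base, neighbour)]


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
oversampled = random_oversample(minority, target=10)
synthetic = smote_sample(minority)
```

Random oversampling only repeats existing beats, which is why it raises the risk of overfitting, whereas the interpolated sample always lies on the line segment between two real minority samples.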

E. RESULTS OF THE TWO-STAGE HIERARCHICAL APPROACH
In this approach, the heartbeats are classified into one of the five main categories in the first stage, whereas in the second stage, the heartbeats that were correctly classified in the first stage are classified into the classes belonging to that category; the overall accuracy is measured based on the misclassifications in the two stages.
The proposed model in Figure 7 was trained with the same techniques and hyperparameters using the original imbalanced dataset and the augmented dataset using GAN to see the effect of balancing the dataset in this approach. The confusion matrix for the first stage is shown in Table 6, whereas the results of the first stage before and after data augmentation are shown in Table 7. The effect of the GAN is clear from the precision of minor categories, such as the F and Q categories, because these categories had few samples before data augmentation. We also observed that the GAN has no effect on the N and V categories because they are the dominant categories.
The confusion matrices for the categories in the second stage (except category F, which contains only one class) are shown in Figure 8, whereas the results of the second stage before and after data augmentation using the same techniques and hyperparameters are shown in Table 8. The achieved overall accuracy after data augmentation across the two stages is 98.0%, while the average precision, sensitivity, and specificity for the classes in each category are 93.95%, 97.71%, and 97.41%, respectively. In this approach, the GAN increased the precision significantly by 8.65% and increased the overall accuracy by 1.45%. The precision is slightly higher in this approach than it is in the end-to-end approach. Moreover, the main category is also known, not only the disease class. In contrast, the sensitivity and specificity in this approach are reduced by nearly 2% compared with the end-to-end approach. It is also observed that the GAN has more effect in stage 2 than in stage 1 because stage 2 contains the minor classes, whereas stage 1 contains the categories. For instance, the precision of the AE class increased in stage 2 after data augmentation from 20% to 87.5%.
As mentioned in Section II, not all studies consider the precision or provide a confusion matrix for the considered classes. The comparison between our work and other studies that consider precision in addition to the overall accuracy is given in Table 9. Most studies consider only a few classes with utterly different beats, resulting in high overall accuracy and precision. However, the average precision decreases when more classes are considered. In contrast, in this study, the proposed approaches using only data from lead 1 achieved better results than other studies did, demonstrating the robustness of the proposed approach. It is worth mentioning that other studies, such as [3] and [43], achieved their average precision by using data from leads 1 and 2, as well as developing a fusion step to make an accurate final decision, which dramatically increased the computational time.
GAN can generate heartbeats that are similar to real ones and significantly improve the results compared with the original data and the other data augmentation techniques. The advantages of the GAN can be summarized as follows: 1) It is an unsupervised method; GAN does not require the data to be labeled and can be trained using unlabeled data. However, as mentioned in this study, we trained the GAN on the classes independently to be able to precisely determine the label of each generated sample; 2) It can generate highly realistic heartbeats that are indistinguishable from real ones; 3) It has the ability to learn the distribution of the data, even if it is complicated; and 4) It can even be trained using a small number of samples. However, GAN also has some limitations, which can be summarized as follows: 1) The generated heartbeats are not as smooth as real ones; a post-processing step using a smoothing filter needs to be applied to enhance the quality of the heartbeats; and 2) It occasionally generates distorted samples. To increase the precision of the results, outlier removal should be applied to discard these samples before the generated samples are used.

V. CONCLUSION AND FUTURE WORK
After the input signal was filtered to reduce noise, a dynamic heartbeat segmentation technique was utilized because it is invariant to heart rate variability. Thereafter, a novel data augmentation technique was proposed for ECG data using GAN to solve the imbalance problem in the MIT-BIH arrhythmia dataset. Two deep learning approaches were used to classify different heartbeats into 15 classes of the MIT-BIH dataset. The end-to-end approach classifies the heartbeats in a direct way, whereas the two-stage hierarchical approach recognizes the category in the first stage and determines the exact class that falls in that category in the second stage.
Adding synthetic heartbeats has impacted the minor classes and increased their precision significantly. An overall accuracy of 98.30% and precision of 90.0% are achieved by the first approach. The second approach has achieved an overall accuracy of 98.00% and precision of 93.95%, which means that the deep CNNs succeeded in learning the most important features automatically, without any handcrafted features. The results are superior and have been achieved using only the data of lead 1, unlike other existing studies, which increase the computation significantly by utilizing data from two leads and adding a fusion step to increase the overall accuracy and average precision if considered. Moreover, we show that balancing the dataset by augmenting the heartbeats using GAN achieved better results than augmenting using other common techniques.
The experiments were run on a single Tesla K80 GPU with 2,496 CUDA cores and 12 GB of GDDR5 VRAM. The training times were 47 minutes in the end-to-end approach and 61 minutes in the two-stage approach, while the testing times for classifying one heartbeat were 0.104 milliseconds in the end-to-end approach and 0.262 milliseconds in the two-stage approach. This means that classifying one 30-minute record of a patient, such as record number 100, will take only 0.235 seconds in the end-to-end approach and 0.590 seconds in the two-stage approach, which indicates that both approaches are highly efficient and can be implemented in real-time monitoring systems.
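The per-record figures can be cross-checked against the per-beat figures with simple arithmetic. The sketch below uses only the timings reported above; the beat count of record 100 is not stated in the text, so it is back-computed here rather than assumed.

```python
per_beat_ms = {"end-to-end": 0.104, "two-stage": 0.262}    # reported per-beat test time
per_record_s = {"end-to-end": 0.235, "two-stage": 0.590}   # reported time for record 100
record_length_s = 30 * 60                                   # each record lasts 30 minutes

summary = {}
for name in per_beat_ms:
    # Beats implied by dividing the per-record time by the per-beat time.
    implied_beats = per_record_s[name] / (per_beat_ms[name] / 1000)
    # How many times faster than the recording duration the classification runs.
    realtime_factor = record_length_s / per_record_s[name]
    summary[name] = (implied_beats, realtime_factor)
    print(f"{name}: about {implied_beats:.0f} beats, "
          f"{realtime_factor:.0f} times faster than real time")
```

Both approaches imply roughly 2,250 to 2,260 beats in the record, so the reported figures are mutually consistent, and classification of a 30-minute record runs thousands of times faster than the recording itself.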
The work in this study can be used in two clinical applications. The first usage is to deploy the models in real-time lightweight wearable devices, as proposed in [59], using an application program interface (API). The models can also be deployed in real-time monitoring using ECG devices in the hospitals. Our future work will develop different variants of the GANs, apply different classification architectures, utilize different sampling rates, and deploy the proposed models in real-time monitoring and classification systems.