Application of CNN for Detection and Localization of STEMI using 12-Lead ECG Images

STEMI is the most severe type of myocardial infarction and can cause death or disability. Previous studies among physicians and paramedics have shown that the accuracy of STEMI diagnosis from the 12-lead ECG is not sufficient. We therefore propose a 2D-CNN model that detects and localizes STEMI signals in 12-lead ECG images. The 2D-CNN model is trained as a binary classifier with 540 ECG images (270 STEMI cases and 270 other ECG images) and achieves 96.3% accuracy, 96.2% sensitivity, 89.4% precision, a 0.926 F1-score, and a 0.962 ROC-AUC score on 537 testing images. The proposed model is compared with 10 different transfer learning models and attains the best accuracy, sensitivity, F1-score, and ROC-AUC score. The Grad-CAM technique is used to localize STEMI signals in the ECG images; according to these comparisons, the proposed model is the most reliable at localizing the STEMI signals. This localization builds trust in the model: the CNN is no longer a black box, because we can see where it looks when it decides. The localization results can also be used for teaching inexperienced physicians and paramedics. Finally, the proposed model can support the accurate diagnosis of STEMI with a quick response time in clinical practice.


I. INTRODUCTION
Ischemic heart disease (IHD) is the leading cause of death in the world. In 2019, IHD affected 182 million people and resulted in 9.14 million deaths, 16.7% of all deaths, making it the most common cause of death globally. According to Global Burden of Disease 2019 data, IHD has ranked first among all causes of death in the world from 1990 to 2019 [1]. Acute Myocardial Infarction (MI), commonly known as heart attack, is one of the most significant types of IHD causing death. MI is caused by blocked coronary arteries, which limit or completely stop the blood flow to a part of the heart. Because of the reduced or stopped blood flow, oxygen cannot reach the blocked part of the heart, so the muscle suffers and begins to die. MI falls into two categories: ST Elevation Myocardial Infarction (STEMI) and Non-ST Elevation Myocardial Infarction (NSTEMI). A STEMI occurs when the coronary artery is completely blocked, while an NSTEMI involves a partial blockage of the artery. A STEMI heart attack carries the greatest risk of death or disability, so it is vital that the blocked artery be reopened as quickly as possible to save the patient. Thus, the treatment of STEMI requires more time-critical decisions than NSTEMI. The current STEMI guideline recommends diagnosing STEMI with an acquired 12-lead electrocardiogram (ECG) within 10 minutes of the first medical contact [2].
The ECG is a signal generated by the heart muscle and plays an important role in monitoring the activity of the heart. Electrodes are used to detect the small electrical changes (in millivolts) of the heart muscle. Conventionally, the 12-lead ECG is used to detect MI; it consists of the I, II, III, aVL, aVR, aVF, V1, V2, V3, V4, V5, and V6 leads. The ECG consists of a series of waves (P, Q, R, S, and T): the P wave is followed by the QRS complex and then the T wave, as shown in Fig. 1. The area between the S and T waves is called the ST segment. STEMI is diagnosed by examining the ST segment of the ECG: in a STEMI case, the ST segment is more elevated than in a normal ECG. The difference between the ST segments of STEMI and normal ECGs is shown in Fig. 2.

FIGURE 1. A series of waves in an ECG image. Source: [3]

In a case of emergency, physicians and paramedics must diagnose the patient as quickly as possible according to the guideline. Under this pressure, they may make false-negative predictions that can be fatal for the patient. In a previous study, 99 physicians interpreted ECG records, and their overall sensitivity and specificity for detecting STEMI were 76.9% and 65%, respectively [4]. Another study surveyed 124 physicians who interpreted a total of 4392 ECG records; the overall sensitivity and specificity for detecting STEMI were 65% and 79%, respectively [5]. These results show that around one-fourth of patients are misdiagnosed and might not be treated properly in time. Thus, we develop an Artificial Intelligence (AI) model that automatically detects whether a patient has a STEMI type of heart attack from the ECG image alone. We adopt a Convolutional Neural Network (CNN) to construct our AI model and focus on 12-lead ECG images as the dataset. The model is designed to assist physicians and paramedics in making fast and accurate decisions in a short period of time. In addition, this study does not only aim at developing an AI model for prediction but also at creating a visual explanation of the 12-lead ECG image that shows where the AI model actually "looks" when predicting the case. This will guide physicians on where to look when classifying whether the patient has STEMI. As a result, the developed AI model both predicts the case and locates the STEMI signals. Combining these two results can greatly assist physicians and, ultimately, patient survival.
The rest of the article is organized as follows: In Section 2, we describe other studies related to IHD and STEMI. In Section 3, we introduce our dataset and some public datasets. Also, we introduce our data preprocessing method and the proposed architecture of our AI model. We explain the results in Section 4 and conclude the study in Section 5.

II. RELATED WORK
In recent years, with the development of AI frameworks, AI has become more accessible to many different scientific fields, and there have been many studies applying AI in the medical field as well. These studies are broadly categorized into machine learning (ML) and deep learning (DL) approaches. The ML-based studies mostly rely on hand-crafted feature extraction. Some common ML algorithms are K-nearest neighbors (KNN), support vector machines (SVM), decision trees, and boosting methods such as XGBoost. Sharma et al. applied KNN for the detection of normal and MI ECG signals [6]. Acharya et al. applied KNN to the same task, providing both detection and localization of normal and MI ECG signals [7]. Sharma and Sunkaria applied KNN and SVM for the detection of normal and inferior MI ECG signals [8]. Weng et al. applied SVM to detect MI and non-MI ECG signals [9]. Dahore et al. applied principal component analysis to 12-lead ECG signals to reduce the number of features in the dataset from 220 to 14, then used these 14 features for the detection of MI and normal ECG signals with an SVM classifier [10]. Tao et al. developed an IHD detection and localization method using ensemble learning: two SVMs and one XGBoost classifier for the detection task, and an XGBoost model for the localization task [11].
Unlike ML methods, DL-based studies generally do not require manual feature extraction. The most common DL algorithms for ECG records are the 1-dimensional convolutional neural network (1D-CNN), the 2D-CNN, and the recurrent neural network (RNN). While 1D-CNN and RNN algorithms are mostly used for ECG waveforms, the 2D-CNN is used for ECG images. Baloglu et al. used a 1D-CNN to classify MI with 12-lead ECG signals, classifying MI into eleven different groups based on the presence of MI ECG perturbations [12]. Reasat and Shahnaz used only 3 leads (lead II, lead III, and lead aVF) from the ECG signals to classify inferior MI and healthy signals, feeding each lead into an inception block built from 1D-CNN layers [13]. Liu et al. proposed a different approach targeting lightweight mobile healthcare applications, called sub 2D-CNN, which uses 1D kernels shared among the different leads to generate locally optimal features. They use ECG signals to classify Generalized Anterior MI (GAMI) versus healthy controls, where the GAMI class includes anterior MI, anteroseptal MI, and anterolateral MI [14]. Feng et al. proposed a combination of a 1D-CNN and a long short-term memory (LSTM) network to distinguish MI from normal records using single-lead ECG signals [15]. Cao et al. proposed the Multi-Channel Lightweight CNN, which combines 4 leads (V2, V3, V5, and aVL) from ECG signals to detect anterior MI; their model is designed to be suitable for mobile devices for remote MI monitoring [16]. Fu et al. proposed a mechanism called MLA-CNN-BiGRU to detect and localize MI in 12-lead ECG signals [17]. Han and Shi proposed a multi-lead residual neural network architecture to detect and localize MI in 12-lead ECG signals, using the same model for both tasks by outputting 2 and 6 classes in the last layer, respectively [18]. Acharya et al. applied a 1D-CNN to detect MI and normal ECG signals without any denoising, feature extraction, or feature selection techniques [19].
Makimoto et al. prepared their dataset by extracting ECG images from raw ECG signals and used a 2D-CNN model for the classification of MI and non-MI ECG images. They also extracted a heatmap of the last convolutional layer activations to visualize their model and found that it achieved higher accuracy on MI ECG images than physicians [20].
All the previous studies discussed above address the detection and localization of MI vs. non-MI ECG signals. In addition, all of them use raw ECG signals as the main dataset, which are not easy to obtain. Generally, ECG devices are designed to print an ECG record after receiving the signals from the patient's body; asking for the raw signals from the machine requires business-to-business agreements between the ECG device company and the hospital. Hong et al. provide a broad review of studies published between January 1, 2010 and February 29, 2020 on the use of ECG data in deep learning [21]. They analyzed 191 different papers and categorized them by task, model, and data. Most of the papers in the review use open-source raw-signal ECG data. Among these, the only study related to STEMI is by Park et al. [22]. They implemented a 1D-CNN model for detecting STEMI in 12-lead ECG signals, preprocessing the raw signals with a notch filter and a high-pass filter to reduce noise, and segmenting pulses from the ECG to focus on the ST segment. Their overall results are 0.932, 0.896, and 0.943 for sensitivity, specificity, and ROC-AUC, respectively.
To the best of our knowledge, our study is the first work that uses 12-lead ECG images for prediction and localization of STEMI cases.

III. MATERIALS AND METHODS

A. DATASET
The ECG data are obtained from Hualien Tzu Chi Hospital. The dataset is composed of 1137 ECG images, each belonging to a different patient. All the ECG images were labeled by the physicians in Hualien Tzu Chi Hospital after the patients were diagnosed. Each ECG image is a digital copy of a conventional pink ECG grid paper and contains 12 leads (I, II, III, aVR, aVL, aVF, V1, V2, V3, V4, V5, V6). In total, 431 of the 1137 ECG images contain STEMI signals. The dataset is randomly split into training, validation, and testing sets: the training set is fed to the CNN model, the validation set is used to monitor accuracy during training, and the testing set is used to calculate the general classification accuracy and the other evaluation metrics of the model. Table 1 describes the dataset. All the ECG images pass through the pre-processing step discussed in Section 3-D before being used by the CNN model.
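The random split described above can be sketched as follows. The 540-image training set and 537-image testing set come from the figures reported in this paper; the 60-image validation set is inferred from the 1137-image total, and the fixed seed is an arbitrary choice for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed is an arbitrary choice

n_total = 1137               # total ECG images in the dataset
n_train, n_test = 540, 537   # training/testing sizes reported in this study
n_val = n_total - n_train - n_test  # remaining 60 images assumed as validation set

# Randomly permute patient indices, then carve out the three subsets
indices = rng.permutation(n_total)
train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]
```

Because each image belongs to a different patient, splitting by image index is equivalent to splitting by patient, so no patient appears in more than one subset.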

B. DATA AVAILABILITY
The data that support the findings of this study are not publicly available because they contain information that could compromise the privacy of the research participants.

C. PUBLIC DATASETS
Various public datasets can be found from PhysioNet [23].
PhysioNet is an archive of biomedical signals for research communities. They collect, characterize, and document databases of multiparameter signals from healthy subjects and from patients with various conditions (for example, epilepsy, sleep apnea, myocardial infarction, and movement disorders). The datasets from PhysioNet come with raw biomedical signals from the recording devices. Two PhysioNet datasets are similar to the dataset of this study, namely the Long-Term ST Database (LTST) [24] and the PTB Diagnostic ECG Database (PTB-ECG) [25]. The LTST dataset contains 86 ECG recordings of 80 human subjects, each between 21 and 24 hours in duration. It covers a variety of ST segment changes, including ischemic ST episodes, axis-related non-ischemic ST episodes, episodes of slow ST level drift, and episodes containing mixtures of these phenomena. However, the dataset contains only two or three leads; since the dataset used in this study consists of 12-lead ECGs, the LTST dataset is not suitable for our experiment.
The PTB-ECG dataset is a collection of 549 records from 290 subjects. Each ECG record contains 12 leads and 3 Frank leads, and the records are classified into 9 diagnostic classes. Although one of the diagnostic classes is Myocardial Infarction, it does not contain any subclass information about the type of myocardial infarction. Since we cannot confirm exactly which records are STEMI, this dataset is not suitable for our experiment either.

D. DATA PRE-PROCESSING
An accurate pre-processing technique has a great impact on the subsequent feature extraction and classification layers. We apply a fully automatic pre-processing pipeline to the received images and summarize the steps in Fig. 3. The fundamental idea of the pre-processing is to create an image that contains only the 12 leads with no background noise. This process gives us a black-and-white image that contains only the signals, which is the region of interest (ROI) for the AI model. Python 3.8 with the OpenCV library is used for the pre-processing steps. The steps are as follows:

1) All the ECG images received from Hualien Tzu Chi Hospital are converted to grayscale.
2) The grayscale images are padded by 10 pixels from the edges. This is necessary to detect the edges of the ROI. Two output images are extracted from this image:
   • An image thresholding operation is performed on the grayscale image. This image is used later as a reference for cropping; we call this output the thresholded image.
   • The grayscale image is inverted for further operations; we call this output the inverted image.
3) A thresholding operation is applied to the inverted image. This step prepares the image for the morphological operation.
4) A morphological closing with a 3x3 kernel is applied to the image from step 3. This operation makes the gridded area completely white. A larger kernel such as 5x5 or 7x7 could also be used; however, kernels larger than 3x3 cause noisy areas to merge with the gridded area, which should be avoided.
5) The contours of the image are computed. For this case, we only need to extract the rectangular gridded area.
6) The OpenCV contour function returns the coordinates of all detected contours; the coordinates of the rectangular area are extracted from these automatically.
7) These coordinates are used to crop the thresholded image (step 2, first output). This cropped image is the ROI used by the AI model.
Pre-processing steps are summarized in Fig. 3 and the effects can be seen in Fig. 4.
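The steps above can be sketched compactly. The sketch below uses numpy only, so it is self-contained; the actual pipeline uses OpenCV thresholding, morphological closing, and contour detection, and here the bounding box of the dark grid region stands in for the rectangular contour. The threshold value is an assumption for illustration:

```python
import numpy as np

def extract_roi(gray, thresh=128):
    """Simplified numpy-only sketch of the pre-processing pipeline.
    (The actual implementation uses OpenCV: thresholding, a 3x3
    morphological closing, and contour extraction; the closing is
    omitted here and the bounding box replaces the contour step.)"""
    padded = np.pad(gray, 10, constant_values=255)          # step 2: pad 10 px
    thresholded = (padded > thresh).astype(np.uint8) * 255  # step 2, output 1
    inverted = 255 - padded                                 # step 2, output 2
    mask = inverted > (255 - thresh)                        # step 3: threshold inverted image
    # steps 5-6: bounding box of the gridded area (stand-in for contours)
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    r0, r1 = rows[0], rows[-1] + 1
    c0, c1 = cols[0], cols[-1] + 1
    return thresholded[r0:r1, c0:c1]                        # step 7: crop thresholded image
```

On a synthetic grayscale image with a darker rectangular grid region on a white background, the function returns a binary crop covering exactly that region, mirroring how the real pipeline isolates the gridded ECG area.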

E. PROPOSED ARCHITECTURE
The input size of the network is a rectangular 250x500. Table 3 summarizes the neural network architecture. The network has only 10 layers with fewer than 13,000 parameters, which means that the computational complexity of the model is small and it can run on low-end computers. For future accessibility, the model could even be deployed on wearable devices or small microcontrollers in the medical field.
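As a rough sanity check on the parameter budget, the trainable parameters of a 2D convolutional layer are (k_h * k_w * c_in + 1) * c_out, i.e., one kernel per output channel plus a bias. The layer sizes below are hypothetical (the actual architecture is given in Table 3); they only illustrate how a stack of narrow 3x3 convolutions stays well under 13,000 parameters:

```python
def conv2d_params(k_h, k_w, c_in, c_out):
    """Trainable parameters of a 2D convolution layer:
    one (k_h x k_w x c_in) kernel plus one bias per output channel."""
    return (k_h * k_w * c_in + 1) * c_out

# Hypothetical channel widths (NOT the paper's Table 3):
small_stack = (
    conv2d_params(3, 3, 1, 8)      # 80 parameters
    + conv2d_params(3, 3, 8, 16)   # 1168 parameters
    + conv2d_params(3, 3, 16, 32)  # 4640 parameters
)                                  # 5888 in total
```

Pooling and global-average layers add no parameters, so only the convolutional and dense layers count toward the budget.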

IV. RESULTS

A. PROPOSED MODEL
The proposed model is trained for 50 epochs with a batch size of 8. The Adam optimizer with a learning rate of 0.001 is used, and the loss is categorical cross-entropy. The training accuracy and loss for each epoch are shown in Fig. 6. The training accuracy and the validation accuracy converge to similar values around 0.96-0.98, which indicates that the model fits the provided data well without an apparent overfitting problem.
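The categorical cross-entropy loss used for training can be written in a few lines of numpy; the two-class probabilities below are made-up values for illustration only:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch.
    y_true: one-hot labels; y_pred: softmax probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

# Two-class (STEMI vs. other) example with made-up probabilities:
y_true = np.array([[1, 0], [0, 1]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = categorical_crossentropy(y_true, y_pred)
```

The loss shrinks as the predicted probability mass concentrates on the correct class, which is what Fig. 6 tracks over the 50 training epochs.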

FIGURE 5. Confusion matrix of the testing dataset
The evaluation of the testing dataset is summarized in the confusion matrix in Fig. 5. According to the confusion matrix, only 20 out of 537 images are mispredicted, which corresponds to an accuracy of 96.3%. The results of the performance evaluation metrics are summarized in Table 2.
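The metrics reported in Table 2 follow directly from the confusion-matrix counts. A sketch with hypothetical counts (the actual counts for this study are those of Fig. 5):

```python
def metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)   # a.k.a. recall
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, precision, f1

# Hypothetical counts for illustration only:
acc, sens, prec, f1 = metrics(tp=90, fp=10, fn=5, tn=95)
```

Since a false negative here means a missed STEMI, sensitivity is the most safety-critical of these metrics for clinical use.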

B. TRANSFER LEARNING MODELS
Transfer learning (TL) is a method in which a model developed for one task is reused for a different task. It is a popular approach in deep learning, especially for image classification. In our study, we compare the proposed model with transfer learning models. We use the pre-trained convolutional layers of the VGG [26], ResNetV2 [27], Xception [28], InceptionV3 [29], InceptionResNetV2 [30], MobileNetV2 [31], and DenseNet [32] models, which were trained on the ImageNet dataset [33]. The inherited pre-trained convolutional layers serve as feature extractors for the transfer learning models. On top of each feature extractor, we build the same type of classifier: an artificial neural network (ANN) that applies 2D global average pooling to the extracted feature maps, connects the result to a layer of 32 neurons, which is in turn connected to 2 output neurons. The comparison of the models is summarized in Table 4. According to the results, the proposed model achieves the best score for every performance evaluation metric except precision. Transfer learning with VGG16 (VGG16-TL) achieves the best results among the transfer learning models; VGG16-TL also beats the proposed model on the precision metric, but the proposed model prevails on all the other metrics.
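The classifier head described above (2D global average pooling, 32 neurons, 2 output neurons) can be sketched in numpy. The feature-map shape and the random weights below are placeholders: in the real models the feature maps come from the frozen pre-trained backbone and the dense weights are learned during training:

```python
import numpy as np

def global_average_pool_2d(feature_maps):
    """Average each channel over its spatial dimensions.
    feature_maps: (batch, height, width, channels) -> (batch, channels)"""
    return feature_maps.mean(axis=(1, 2))

def dense(x, w, b):
    """Fully connected layer: x @ w + b (activation omitted in this sketch)."""
    return x @ w + b

# Classifier head sketch: GAP -> 32 neurons -> 2 output neurons.
rng = np.random.default_rng(0)
features = rng.normal(size=(1, 7, 15, 512))   # placeholder backbone output
x = global_average_pool_2d(features)          # (1, 512)
x = dense(x, rng.normal(size=(512, 32)), np.zeros(32))
logits = dense(x, rng.normal(size=(32, 2)), np.zeros(2))  # 2-class output
```

Global average pooling makes the head independent of the backbone's spatial output size, which is why the same classifier can sit on top of all 10 feature extractors.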

C. LOCALIZATION OF STEMI SIGNALS
We applied the Gradient-weighted Class Activation Mapping (Grad-CAM) method to localize the STEMI signals in the ECG images. Grad-CAM produces a heatmap that highlights the important regions of the image by using the gradient information flowing into the last convolutional layer of the CNN [34]. This method is applied to all the models discussed in this study. Fig. 7 shows the differences between the models for a randomly selected STEMI ECG image. Even though the Grad-CAM results of the models differ, all the models predicted the sample image correctly as STEMI. Our proposed model [Fig. 7(a)] predicted the STEMI case by locating leads V1, V2, V3, and V4. VGG16-TL [Fig. 7(b)] located the STEMI signals only on lead V4. On the other hand, VGG19-TL [Fig. 7(c)] predicted the STEMI case by locating the wrong leads of the image, which are not part of leads V1, V2, V3, and V4. Xception-TL [Fig. 7(d)], ResNet50V2-TL [Fig. 7(e)], and DenseNet121-TL [Fig. 7(k)] predicted the STEMI case by locating leads V2 and V3, but they were not able to locate the STEMI signals on leads V1 and V4. ResNet101V2-TL [Fig. 7(f)] predicted the STEMI case by locating the end of lead aVL and the beginning of lead V2. As a result, the proposed model is the only model that successfully located all the STEMI signals in this case. Another observation concerns how coarse the heatmap results are. Since the heatmap size depends on the size of the last convolutional layer, the resulting image is directly affected by it. The size of the sample image used here is 400x800. When a VGG-TL or DenseNet-TL model is used, the heatmap is upscaled from 7x15 to 400x800, i.e., by more than a factor of 50, to overlay it on the input image. Similarly, the upscaling for the ResNet-TL and MobileNet-TL models is from 8x16, and for the Inception-TL models (including InceptionResNet-TL) from 6x14, while it is from 31x62 for the proposed model.
Thus, the proposed model gives fine-grained heatmap results while the transfer learning models give noticeably coarser ones. In conclusion, the proposed model not only achieves the best scores in terms of the evaluation metrics but also gives a fine-grained heatmap with correct localization.
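The Grad-CAM computation described above can be sketched in a few lines of numpy. In practice the activations and gradients come from the deep learning framework's automatic differentiation at the last convolutional layer; the arrays used below are toy placeholders:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from last-conv-layer activations and gradients.
    activations, gradients: (height, width, channels) arrays;
    in practice the gradients come from the framework's autodiff."""
    weights = gradients.mean(axis=(0, 1))  # alpha_k: global-average-pooled gradients
    cam = np.tensordot(activations, weights, axes=([2], [0]))  # weighted channel sum
    return np.maximum(cam, 0)              # ReLU keeps positively contributing regions

def upscale_nearest(cam, factor):
    """Nearest-neighbour upscaling, standing in for the interpolation
    used to overlay the coarse heatmap on the input image."""
    return np.kron(cam, np.ones((factor, factor)))

# Toy example at the proposed model's heatmap resolution (31x62): this map
# needs far less upscaling than a 7x15 VGG-TL map to cover a 400x800 image.
cam = grad_cam(np.ones((31, 62, 4)), np.ones((31, 62, 4)))
```

This also makes the coarseness argument concrete: the heatmap resolution is fixed by the last convolutional layer, and everything beyond that is interpolation.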

V. CONCLUSION
The main aim of this study is to predict and localize STEMI signals in 12-lead ECG images of STEMI and non-STEMI cases using a 2D-CNN. To this end, we used the dataset provided by Hualien Tzu Chi Hospital, which contains STEMI and non-STEMI cases. Pre-processing steps are introduced to prepare the ECG images for the CNN model. The proposed CNN model is compared with 10 different transfer learning models. According to the results, the proposed model has the best performance in terms of total parameters, trainable parameters, training accuracy, testing accuracy, sensitivity, F1-score, and ROC-AUC score. Among the transfer learning models, only VGG16-TL achieves a better result than the proposed model in terms of precision with 93.80%, while the proposed model ranks second with 89.40%. Localization of the STEMI signals using Grad-CAM is also introduced. The resulting images show that the proposed model outputs a fine-grained heatmap while the transfer learning models output noticeably coarser heatmaps. As a result, the proposed model is more suitable than any of the other experimented models. With the fine-grained heatmap, the localization results can also be used to train inexperienced physicians and paramedics.