QRS Complex Detection Using Novel Deep Learning Neural Networks

Objective: Accurate QRS complex detection is essential for electrocardiogram (ECG)-based diagnosis. Many proposed algorithms do not perform satisfactorily on noisy and arrhythmic ECGs. The purpose of this study is to develop a noise-resistant and generalizable method for accurate QRS complex detection. Methods: Two deep learning models based on multi-dilated convolutional blocks are proposed. One model (CNN) is mainly composed of convolutional blocks and Squeeze-and-Excitation networks (SENet). The other model (CRNN) is a hybrid convolutional and recurrent neural network. Using a 5-fold cross-validation approach, the models are trained and tested on four open-access ECG databases: the China Physiological Signal Challenge (2019) database (CPSCDB), the MIT-BIH Noise Stress Test Database (NSTDB), the MIT-BIH Arrhythmia Database (MITDB) and the QT Database (QTDB). Results: The F1 scores of the CNN model on CPSCDB, NSTDB, MITDB and QTDB are 0.9929, 0.9892, 0.9994 and 0.9998, respectively. The F1 scores of the CRNN model on these four databases are 0.9947, 0.9953, 0.9995 and 0.9998, respectively. The ensemble of both models won first place in the China Physiological Signal Challenge (2019). Conclusion: The proposed models achieve state-of-the-art performance in QRS complex detection and generalize well across databases. This work may help improve ECG diagnosis.


The associate editor coordinating the review of this manuscript and approving it for publication was Jingchang Huang.

I. INTRODUCTION
Cardiovascular diseases (CVD) are the leading cause of death globally, taking around 17.8 million lives each year [1]. The electrocardiogram (ECG) is the most widely used diagnostic tool for CVD. It is easy to perform, noninvasive and can provide immediate information. About 3 million ECGs are recorded each day throughout the world [2]. With the development of wearable devices, more and more ECGs are being generated for analysis. Automated diagnostic methods are required to process ECGs generated by wearable devices and to reduce doctors' workload. Many diagnostic methods rely on accurate QRS complex detection. QRS complexes mark the beat positions and provide information about rhythm and intraventricular conduction. Normally they are the most prominent parts of the ECG and can be easily identified by eye. Many algorithms for automatic QRS complex detection have been developed over the past several decades.
Common QRS complex detectors share a two-stage structure consisting of a preprocessing stage and a decision stage [3]. The preprocessing stage uses linear filtering and nonlinear transformation to enhance QRS complexes and attenuate other waves, noise and artifacts. The decision stage establishes the peak detection logic and additional decision rules to optimize the detection results. Popular methods include digital filtering [4], [5], wavelet transform [6]-[9], empirical mode decomposition [10], Hilbert transform [5], [8], and machine learning [11], [12]. A recent study tested ten widely used QRS detection algorithms on six ECG databases with varying degrees of noise and found that these algorithms achieved very high detection accuracy on high-quality ECG databases but poor accuracy on low-quality ECG signals [13]. In long-term ECG monitoring, intermittent strong noise is unavoidable because of patient movement, muscle activity or even loose lead contact. Accurately locating QRS complexes in noisy arrhythmic ECGs therefore remains a challenge.
Deep learning has recently been very successful in computer vision, natural language processing and speech recognition. A deep neural network has been reported to outperform the average cardiologist in classifying 12 ECG rhythm classes [14], and a few deep learning approaches to QRS complex detection have already been proposed. Wang et al. proposed two parallel deep neural networks resembling residual neural networks (ResNet) and achieved a positive predictive value of 99.98% and a sensitivity of 99.92% on ECG data from the MIT-BIH Arrhythmia Database (MITDB) [15]. However, they discarded 2 records and the last 4.44 seconds of each remaining record, which might weaken the model's generalization. Xiang et al. constructed a two-level convolutional neural network (CNN) and obtained a positive predictive value of 99.91% and a sensitivity of 99.77% on MITDB data [16]. Yang et al. converted the one-dimensional ECG into two-dimensional images and used a Faster R-CNN (region-based CNN) model to detect QRS complexes. Tested on 24-h wearable ECG recordings, the model achieved a sensitivity of 98.76% and a positive predictive value of 98.52% [17]. These results are comparable to state-of-the-art approaches and show the promise of deep learning for QRS complex detection. However, Habib et al. found that a CNN model did not generalize well when the test database differed from the training database [18]. A more generalizable and robust QRS detector is required for real-world applications.
The aim of this study is to propose a noise-resistant deep learning method that achieves state-of-the-art performance in QRS complex detection and generalizes well across ECG databases. Our algorithms won first place in CPSC2019 and achieved state-of-the-art accuracy on three other common ECG databases.

II. DATABASES
The China Physiological Signal Challenge (2019) database (CPSCDB) consists of 5232 single-lead ECG recordings which were collected from patients with CVD [19]. It contains many noisy ECG excerpts together with various arrhythmia patterns. All recordings are sampled at 500 Hz and each is 10 s long. The training set has 2000 recordings and was used for training. The test set has 3232 recordings and was used for algorithm performance evaluation by the challenge committee. This database can be accessed at http://2019.icbeb.org/Challenge.html.
MITDB consists of 48 half-hour two-lead ECGs sampled at 360 Hz [20]. It is the most widely used standard database for testing QRS detection algorithms. Because only one record contains ventricular flutter segments, training and testing on such segments cannot be reasonably arranged. Following others [7], [10], [21], we excluded the ventricular flutter segments in record 207 and used all 109494 beats in this study.
NSTDB has 12 half-hour ECG recordings, created by adding calibrated amounts of noise to two clean records (118 and 119) from MITDB. The noise was added intermittently after the first 5 min of each record, and the signal-to-noise ratios (SNR) of the noisy segments are 24, 18, 12, 6, 0 and −6 dB [22].
QTDB contains 105 fifteen-minute two-lead recordings with various QRS and ST-T morphologies. Twenty-three records have no annotations, and the remaining 82 records were selected for our research. These recordings are sampled at 250 Hz. To be consistent with most reports, only the first lead of each recording in MITDB, NSTDB and QTDB was used in this study.

III. METHODS
The flowchart of our proposed method is shown in Figure 1. The raw ECG is preprocessed and then fed into a deep learning model, whose output is processed by decision rules to obtain the final result.

A. PREPROCESSING
Each isolated spike whose amplitude exceeds 20 mV is detected and replaced by the normal sample immediately preceding it (Figure 2). Such spikes exist only in CPSCDB, so the spike removal algorithm leaves the recordings in the other three databases unchanged. All ECG recordings are resampled to 500 Hz using the fast Fourier transform (FFT) method. To achieve better model generalization, the mean of the signal values is subtracted from each recording. No further preprocessing, such as differentiation, normalization or noise filtering, is required.
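The preprocessing steps above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code: the function names are hypothetical, amplitudes are assumed to be in mV, and a spike at the very first sample (which has no predecessor) is set to 0.0 by assumption. The FFT resampler zero-pads or truncates the spectrum in the same way as `scipy.signal.resample`, written out here so that only NumPy is needed.

```python
import numpy as np

SPIKE_MV = 20.0  # amplitude threshold from the text; signal assumed to be in mV


def remove_spikes(x, thresh=SPIKE_MV):
    """Replace each sample with |amplitude| > thresh by the preceding (cleaned) sample."""
    y = np.asarray(x, dtype=float).copy()
    for i in range(len(y)):
        if abs(y[i]) > thresh:
            # Assumption: a spike at index 0 has no predecessor, so it becomes 0.0.
            y[i] = y[i - 1] if i > 0 else 0.0
    return y


def fft_resample(x, fs_in, fs_out):
    """Resample by zero-padding (up) or truncating (down) the real spectrum."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = int(round(n * fs_out / fs_in))
    if m == n:
        return x.copy()
    X = np.fft.rfft(x)
    Y = np.zeros(m // 2 + 1, dtype=complex)
    short = min(n, m)
    Y[: short // 2 + 1] = X[: short // 2 + 1]
    if short % 2 == 0:
        # The shorter length has a Nyquist bin: split it when upsampling,
        # fold the mirrored half back in when downsampling.
        Y[short // 2] *= 0.5 if m > n else 2.0
    # irfft divides by m; rescale so that amplitudes are preserved.
    return np.fft.irfft(Y, n=m) * (m / n)


def preprocess(x, fs_in, fs_out=500):
    """Spike removal, FFT resampling to fs_out, then mean subtraction (no filtering)."""
    y = fft_resample(remove_spikes(x), fs_in, fs_out)
    return y - y.mean()
```

For a 2 s MITDB excerpt at 360 Hz, `preprocess(x, 360)` returns 1000 samples at 500 Hz. Where SciPy is available, `scipy.signal.resample(x, num)` implements the same Fourier method and could replace `fft_resample`.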

B. PROPOSED DEEP LEARNING MODELS
The proposed CNN model is shown in Figure 3a. Three parallel dilated CNN blocks follow the input layer, and each convolutional block contains six 1D convolution layers (Figure 3c). The first convolution layer has a kernel size of 11. The second and third convolution layers are stacked and have a kernel size of 7. The remaining three convolution layers are also stacked, with a kernel size of 5. Each dilated block uses a different set of dilation rates for its convolution layers. A dilation rate of 2 means that a convolution takes every other point as its input. Combining different dilation rates gives the output neurons of the three blocks different receptive fields. The details are shown in Table 1. Block 1 has the smallest receptive field, equivalent to 0.18 s of the original samples. The receptive fields of blocks 2 and 3 are 0.97 s and 3.97 s of ECG samples, respectively. The batch normalization layers connected to the convolution layers speed up training and improve generalization. The max pooling layers following the batch normalization layers down-sample the features while keeping important information. The features extracted by the convolutional blocks are then concatenated and fed into the squeeze-and-excitation network (SENet), followed by three fully connected layers. The last layer uses sigmoid activation to predict QRS complexes. The output layer is one-eighth the size of the input layer, so every output point represents 0.016 s (8 samples at 500 Hz) of the original signal, and each QRS complex is expected to correspond to 7 output points, i.e., 0.112 s. The proposed CRNN model is basically the same as the CNN model except that two stacked LSTM layers are added before the SENet (Figure 3b). LSTM layers are well suited to time series data: they can extract temporal features, whereas convolution layers can only extract local features.
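Dilation and receptive-field growth can be illustrated with a short NumPy sketch. `dilated_conv1d` shows that a kernel with dilation d reads inputs d samples apart (Keras exposes the same idea through the `dilation_rate` argument of `Conv1D`), and `receptive_field` applies the standard recurrence for stacked stride-1 convolutions and non-overlapping pooling. The layer ordering in the hypothetical helper `block`, with 2x max pooling after the first, third and sixth convolutions, is an assumption chosen so that three poolings give the stated 1/8 output length; the actual dilation rates and layer layout are those in Table 1 and Figure 3c.

```python
import numpy as np


def dilated_conv1d(x, w, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation, as in deep-learning layers)."""
    x = np.asarray(x, dtype=float)
    k = len(w)
    span = (k - 1) * dilation + 1  # input samples covered by one output
    return np.array([sum(w[j] * x[i + j * dilation] for j in range(k))
                     for i in range(len(x) - span + 1)])


def receptive_field(layers):
    """Receptive field and output stride, in input samples, of a layer stack.

    layers: list of ("conv", kernel_size, dilation) or ("pool", pool_size, 1);
    convolutions use stride 1, pooling uses stride == pool_size.
    """
    rf, jump = 1, 1  # jump = distance between adjacent outputs, in input samples
    for kind, k, d in layers:
        if kind == "conv":
            rf += (k - 1) * d * jump
        elif kind == "pool":
            rf += (k - 1) * jump
            jump *= k
        else:
            raise ValueError(kind)
    return rf, jump


def block(dilations):
    """Hypothetical block: kernels 11, 7, 7, 5, 5, 5 with 2x max pooling
    after the first, third and sixth convolutions (placement assumed)."""
    kernels = [11, 7, 7, 5, 5, 5]
    pool_after = {0, 2, 5}
    layers = []
    for idx, (k, d) in enumerate(zip(kernels, dilations)):
        layers.append(("conv", k, d))
        if idx in pool_after:
            layers.append(("pool", 2, 1))
    return layers


FS = 500
rf, jump = receptive_field(block([1] * 6))
print(rf, rf / FS, jump / FS)  # receptive field (samples, s) and output spacing (s)
```

With all dilation rates equal to 1, this hypothetical layout gives a receptive field of 90 samples (0.18 s at 500 Hz) and one output point every 8 samples (0.016 s). Raising the dilation rates enlarges the receptive field without adding trainable parameters, which is how blocks 2 and 3 can reach much longer contexts with the same kernel sizes.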
The input layers of both models accept inputs of variable length. In this study, 10 s segments from the training sets were fed into the models during training to take full advantage of GPU parallelism, whereas the original, unsegmented recordings in the test sets were used for fast model inference.
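The 1/8 output resolution and the 7-point width of each QRS complex suggest how training targets and 10 s segments could be built. The sketch below is one plausible construction under stated assumptions, not the authors' labelling code: the 7 positive points are centred on the annotated R-peak index divided by 8, segments do not overlap, a trailing remainder shorter than 10 s is dropped, and each recording is assumed to be at least 10 s long. The function names are hypothetical.

```python
import numpy as np

FS = 500         # sampling rate after resampling (Hz)
SEG_S = 10       # training segment length (s)
DOWN = 8         # model output is 1/8 of the input length
HALF_WIDTH = 3   # 7 output points per QRS: centre +/- 3 (centring assumed)


def make_target(n_samples, r_peaks):
    """Binary target at output resolution with 7 ones around each annotated QRS."""
    n_out = n_samples // DOWN
    y = np.zeros(n_out, dtype=np.float32)
    for r in r_peaks:
        c = int(r) // DOWN
        lo, hi = max(c - HALF_WIDTH, 0), min(c + HALF_WIDTH + 1, n_out)
        y[lo:hi] = 1.0
    return y


def segment_recording(x, r_peaks, fs=FS, seg_s=SEG_S):
    """Cut a recording (>= seg_s long) into non-overlapping segments with targets."""
    seg_len = fs * seg_s
    r_peaks = np.asarray(r_peaks)
    segs, targets = [], []
    for start in range(0, len(x) - seg_len + 1, seg_len):
        local = r_peaks[(r_peaks >= start) & (r_peaks < start + seg_len)] - start
        segs.append(np.asarray(x[start:start + seg_len], dtype=float))
        targets.append(make_target(seg_len, local))
    return np.stack(segs), np.stack(targets)
```

Test recordings would skip `segment_recording` and be passed whole, since the models accept variable-length input.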

C. MODEL TRAINING
The models are built using Keras, a user-friendly Python library for deep learning. Adam is selected as the training optimizer, with a learning rate between 1e-3 and 1e-4. The models are trained for 60 to 100 epochs with a batch size of 200. Data augmentation, namely adding a random amount of Gaussian noise, adding a sinusoidal signal with random initial phase and amplitude [23], randomly shifting the baseline and turning the signal upside down, is applied to the input data on the fly. As a result, the models almost never see the same input twice during training, which helps improve their performance and robustness.
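The on-the-fly augmentations listed above can be sketched as a NumPy function applied to each segment before it enters a batch. The noise level, the sinusoid's frequency range, all amplitude ranges and the 0.5 inversion probability are hypothetical choices, not values from this study. The QRS targets can stay unchanged because none of these transformations moves a QRS complex in time.

```python
import numpy as np


def augment(x, rng, fs=500):
    """Return a randomly perturbed copy of one ECG segment (ranges are hypothetical)."""
    y = np.asarray(x, dtype=float).copy()
    t = np.arange(len(y)) / fs
    # Gaussian noise with a random standard deviation (mV).
    y += rng.normal(0.0, rng.uniform(0.0, 0.1), size=len(y))
    # Sinusoid with random amplitude and initial phase; the frequency range is assumed.
    amp = rng.uniform(0.0, 0.5)
    freq = rng.uniform(0.05, 1.0)
    phase = rng.uniform(0.0, 2 * np.pi)
    y += amp * np.sin(2 * np.pi * freq * t + phase)
    # Random constant baseline shift.
    y += rng.uniform(-0.5, 0.5)
    # Turn the signal upside down with probability 0.5.
    if rng.random() < 0.5:
        y = -y
    return y
```

A Python generator feeding Keras could call `augment(segment, rng)` for every segment of every batch with a single `np.random.default_rng()` instance, so that each epoch sees new perturbations.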

D. PEAK LOCALIZATION
The decision stage localizes the QRS complexes by finding the peaks in the output of the deep learning model. A fixed threshold of 0.5 is applied to the output to determine whether each output point belongs to a QRS complex. A second threshold of 64 ms on the duration of each cluster of consecutive positive points eliminates some false predictions: if a cluster lasts longer than 64 ms, its midpoint is taken as a QRS complex candidate. After all candidates are determined, the distances between adjacent candidates are calculated. If two candidates are less than 100 ms apart, the one with the lower confidence score is removed, and the search is repeated until all adjacent candidates are more than 100 ms apart. This algorithm was used in CPSC 2019. However, it may miss paced beats. To address this, a further search is performed for intervals in which adjacent QRS complexes are more than 1200 ms apart. Within such an interval, if at least one point is greater than 0.5, the cluster-duration threshold is reduced in steps of 16 ms until a new QRS candidate is found or the threshold decreases to zero.
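The decision rules can be sketched in NumPy as below. Several details are assumptions rather than facts from the text: a cluster's duration is its number of output points times 16 ms, its confidence score is the maximum model output inside it, positions are returned in milliseconds from the cluster midpoints, and the paced-beat search reconsiders only clusters that failed the initial 64 ms test and lie between the two QRS complexes bounding the long gap. The function names are hypothetical.

```python
import numpy as np

STEP_MS = 16.0  # one output point = 8 input samples at 500 Hz


def find_clusters(prob, thr=0.5):
    """Runs of consecutive output points above thr, as (start, end_exclusive) pairs."""
    above = np.concatenate(([False], np.asarray(prob) > thr, [False])).astype(int)
    edges = np.flatnonzero(np.diff(above))
    return list(zip(edges[0::2], edges[1::2]))


def localize_qrs(prob, min_ms=64.0, refractory_ms=100.0, gap_ms=1200.0):
    """QRS positions (ms) from a model output with one point every 16 ms."""
    prob = np.asarray(prob, dtype=float)
    # Each cluster -> (midpoint ms, duration ms, confidence = max output, assumed).
    cands = [((s + e - 1) / 2 * STEP_MS, (e - s) * STEP_MS, prob[s:e].max())
             for s, e in find_clusters(prob)]
    kept = [c for c in cands if c[1] > min_ms]
    # Repeatedly drop the lower-confidence member of adjacent pairs closer than 100 ms.
    while True:
        close = [i for i in range(len(kept) - 1)
                 if kept[i + 1][0] - kept[i][0] < refractory_ms]
        if not close:
            break
        i = close[0]
        kept.pop(i if kept[i][2] < kept[i + 1][2] else i + 1)
    # Paced-beat search: in gaps longer than gap_ms, relax the duration threshold.
    extra = []
    for a, b in zip(kept, kept[1:]):
        if b[0] - a[0] <= gap_ms:
            continue
        inside = [c for c in cands if a[0] < c[0] < b[0] and c[1] <= min_ms]
        thr = min_ms - STEP_MS
        while inside and thr >= 0:
            found = [c for c in inside if c[1] > thr]
            if found:
                extra.extend(found)
                break
            thr -= STEP_MS
    return sorted(float(c[0]) for c in kept + extra)
```

Dividing a returned position by 16 ms gives the output index, and multiplying that index by 8 gives the sample index at 500 Hz, which is why the final locations are quantized to 16 ms.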

E. PERFORMANCE EVALUATION
For CPSCDB, the accuracy of QRS localization and of heart rate (HR) estimation is used for performance evaluation. The evaluation algorithm, provided by the challenge committee, compares the reference and predicted QRS locations of every test recording between 0.5 s and 9.5 s; the first and last half second are omitted. A predicted location is deemed accurate if it lies within 75 ms of the reference location. When all predicted and annotated locations match, the recording scores one point. If the prediction contains only one false positive (FP) or only one false negative (FN), the recording scores 0.7 or 0.3 points, respectively. Otherwise, the recording scores 0 points. The QRS scoring rule for a single recording i is therefore

$$\mathrm{QRS}_{score}^{(i)} = \begin{cases} 1, & \mathrm{FP} = 0,\ \mathrm{FN} = 0 \\ 0.7, & \mathrm{FP} = 1,\ \mathrm{FN} = 0 \\ 0.3, & \mathrm{FP} = 0,\ \mathrm{FN} = 1 \\ 0, & \text{otherwise} \end{cases}$$

and the final QRS score is calculated as

$$\mathrm{QRS}_{acc} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{QRS}_{score}^{(i)} \times 100\%$$

where N is the number of test recordings. HR is calculated between 5.5 s and 9.5 s of each recording, and each recording receives an HR score, HR_score, according to the agreement between HR_ref and HR_test, where HR_ref is the heart rate computed from the reference QRS locations and HR_test is the heart rate computed from the predicted QRS locations. The final HR score is calculated as

$$\mathrm{HR}_{acc} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{HR}_{score}^{(i)} \times 100\%$$

where N is the number of test recordings.
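The QRS scoring rule can be sketched as follows. This is a simplified re-implementation for illustration, not the challenge committee's evaluation script: it matches each reference beat greedily to the nearest unused prediction within 75 ms, and it simply discards reference and predicted locations outside 0.5 s to 9.5 s. Times are in seconds, the function names are hypothetical, and the HR part is not sketched.

```python
import numpy as np


def match_beats(ref, pred, tol):
    """Greedy one-to-one matching of beat times (s); returns (TP, FP, FN)."""
    pred = np.sort(np.asarray(pred, dtype=float))
    used = np.zeros(len(pred), dtype=bool)
    tp = 0
    for r in np.sort(np.asarray(ref, dtype=float)):
        d = np.abs(pred - r)
        d[used] = np.inf  # each prediction may match at most one reference
        if d.size and d.min() <= tol:
            used[d.argmin()] = True
            tp += 1
    return tp, len(pred) - tp, len(ref) - tp


def cpsc_qrs_score(ref, pred, tol=0.075, start=0.5, end=9.5):
    """Per-recording QRS score: 1, 0.7 (one FP), 0.3 (one FN) or 0."""
    ref = [t for t in ref if start <= t <= end]
    pred = [t for t in pred if start <= t <= end]
    tp, fp, fn = match_beats(ref, pred, tol)
    if fp == 0 and fn == 0:
        return 1.0
    if fp == 1 and fn == 0:
        return 0.7
    if fp == 0 and fn == 1:
        return 0.3
    return 0.0


def qrs_acc(refs, preds):
    """Average per-recording score over N recordings, in percent."""
    scores = [cpsc_qrs_score(r, p) for r, p in zip(refs, preds)]
    return 100.0 * sum(scores) / len(scores)
```

The asymmetry between 0.7 and 0.3 means that, under this rule, a single extra detection is penalized less than a single missed beat.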
Sensitivity (Se), positive predictive value (PPV), error rate (ER) and F1 are calculated for all databases. These metrics are defined as

$$\mathrm{Se} = \frac{TP}{TP + FN}, \qquad \mathrm{PPV} = \frac{TP}{TP + FP}, \qquad \mathrm{ER} = \frac{FP + FN}{TP + FN}, \qquad F_1 = \frac{2\,\mathrm{Se}\cdot\mathrm{PPV}}{\mathrm{Se} + \mathrm{PPV}}$$

where TP, FP and FN are the numbers of true positives, false positives and false negatives, respectively. The standard grace period of 150 ms is used for beat-by-beat comparison [24].
Three model evaluation strategies are used on NSTDB, MITDB and QTDB. First, in cross-database testing, the models are trained on CPSCDB and then tested on the other databases. Second, 5-fold cross-validation is performed on MITDB and QTDB. Because NSTDB recordings are essentially two MITDB records mixed with electrode motion (EM) artifact and other noise, 5-fold cross-validation on NSTDB would lead to data leakage. Instead, following Jia et al.'s method [6], we built a training set by adding the unused part of the EM noise in NSTDB to the last 20 records in MITDB and then tested the models on the whole NSTDB. Third, in fine-tuning, the models are pre-trained on CPSCDB and then retrained and evaluated on MITDB and QTDB with 5-fold cross-validation, or on NSTDB with the method described above. When 5-fold cross-validation is used, the recordings of a database are randomly split into 5 folds. The test fold is kept unsegmented, whereas the recordings in the remaining folds are cut into 10 s segments that are fed into the model during training.
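The metric definitions and the record-level cross-validation split can be sketched as below. The beat counts are assumed to come from a 150 ms matching step (not shown), the random seed is arbitrary, and `record_folds` and `cv_splits` are hypothetical helpers. Splitting by recording rather than by segment keeps segments of one recording from appearing in both the training and the test folds.

```python
import numpy as np


def beat_metrics(tp, fp, fn):
    """Se, PPV, ER and F1 from beat-level counts."""
    se = tp / (tp + fn)
    ppv = tp / (tp + fp)
    er = (fp + fn) / (tp + fn)
    f1 = 2 * se * ppv / (se + ppv)
    return se, ppv, er, f1


def record_folds(record_ids, k=5, seed=0):
    """Randomly assign whole recordings (never segments) to k folds."""
    rng = np.random.default_rng(seed)
    ids = np.asarray(record_ids)
    return [ids[idx].tolist() for idx in np.array_split(rng.permutation(len(ids)), k)]


def cv_splits(record_ids, k=5, seed=0):
    """Yield (train_ids, test_ids). Training recordings would then be cut into
    10 s segments; test recordings are kept whole."""
    folds = record_folds(record_ids, k, seed)
    for i, test in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        yield train, test
```

Because ER divides by the number of reference beats (TP + FN), it is roughly the sum of the miss rate and the false-detection rate, which is why it is close to (1 − Se) + (1 − PPV) in the results below.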

IV. RESULTS
Figure 4 shows two examples of QRS detection with the proposed method. The upper panel shows our CNN model identifying QRS complexes in an excerpt of arrhythmic ECG with a drifting baseline. The lower panel shows how the dynamic threshold on the duration of clusters of positive samples in our decision stage prevents a paced heartbeat from being missed.

A. CPSCDB
To optimize the deep learning architecture for QRS detection, we removed different dilated CNN blocks or the SENet from the proposed models and evaluated each variant using 5-fold cross-validation (Table 2). The CNN model showed good predictive ability: its QRS_acc, HR_acc, Se, PPV, ER and F1 were 91.23%, 94.67%, 99.26%, 99.31%, 1.42% and 0.9929, respectively. When any one of the dilated blocks was removed from the proposed CNN model, performance decreased on all these metrics. Among the three blocks, block 3 was the most important, and performance fell most significantly after it was removed, whereas removing block 2 had the least impact on overall performance. When two of the blocks were removed, performance decreased further, and retaining block 3 performed slightly better than retaining either of the other blocks. The SENet helped the model achieve better overall performance. The CRNN model, which has two more stacked LSTM layers than the CNN model, significantly improved QRS detection performance, especially the QRS and HR scores. The ensemble of the CNN and CRNN models, which averages their outputs, performed best on all metrics except HR_acc. It was our entry to CPSC2019 and won first place. As shown in Table 3, our final QRS_acc and HR_acc reached 92.14% and 94.89%, which were 0.59% and 0.60% higher, respectively, than those of the second-place entry.
VOLUME 8, 2020

B. NSTDB
Table 4 shows the Se and PPV of the different training strategies on NSTDB. In all four cases, signals with high SNR were well identified, whereas performance differed significantly for signals with low SNR. Table 5 compares the overall performance of our methods with that of others. The cross-database performance of both models is comparable to most published results. The CNN model trained from scratch performed better, and the fine-tuned CNN model improved further. The state-of-the-art result was obtained by the fine-tuned CRNN model, which was pre-trained on CPSCDB and then retrained for NSTDB as described in Section III-E.

C. MITDB
Table 6 reports the performance of our models and of some recently published methods on MITDB. In cross-database testing, the CNN model recognized QRS complexes better than the CRNN model, with Se, PPV, ER and F1 of 99.91%, 99.90%, 0.19% and 0.9991, respectively. Cross-validation of the CNN model gave similar results. The fine-tuned CNN model outperformed other published methods on all four metrics. The fine-tuned CRNN model achieved the best overall performance, with Se of 99.94%, PPV of 99.97%, ER of 0.09% and F1 of 0.9995.

D. QTDB
Table 7 compares the performance of our methods with that of others. In cross-database testing, the CRNN model outperformed only the Pan-Tompkins algorithm, whereas the CNN model was comparable to other reported results. In the other settings, both our CNN and CRNN models reached a new best F1 of 0.9998, surpassing known published results from various methods.

V. DISCUSSION
In this report, we introduced two novel deep learning models that perform accurate, robust and noise-resistant QRS complex detection. Unlike many algorithms whose performance decreases significantly when tested on different databases [13], our methods were validated on a challenge database and three commonly used databases and showed good generalization. Deep learning models usually achieve better results when trained with more data. However, a model's generalization capacity does not increase simply by adding more similar ECG samples [18]. This is because an ECG is composed of many repeated patterns; once the model has learned them, more similar data do not improve its performance. Data from a diverse range of subjects, by contrast, are useful [18] because they contain new patterns that improve the model's robustness. Because CPSCDB contains noisier data and abundant arrhythmic patterns, it is quite different from the other databases, so the model's performance can be further improved when the model is trained on a new database together with CPSCDB. We believe that with more dynamic arrhythmic ECG data, our models can achieve even better performance in QRS complex detection.
Besides the data, the deep learning architecture is critical to the results. The key part of our algorithm is the three parallel dilated CNN blocks. They have the same configuration except for the dilation rates, which determine the receptive field of an output neuron. A small receptive field ties each output more precisely to the corresponding input, whereas a large receptive field lets each output summarize a longer stretch of the original signal. Table 2 shows that the combination of three different receptive fields gives the best results. Removing the medium-sized receptive field affects model performance less than removing the small or the large one, because the model can still take into account both local features and enough nearby information when block 2 is removed. If only one block is kept, a larger receptive field is better, because QRS complexes in noisy ECG cannot be recognized directly from local morphology and require a longer stretch of data for comprehensive consideration. Consequently, the models with three parallel convolutional blocks showed robust and outstanding performance on various ECG databases. SENet, which won first place in the ILSVRC 2017 classification challenge [27], further improves the results. It applies a channel-wise attention mechanism to the output features of the convolutional blocks and improves the models' performance at minimal additional computational cost.
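The squeeze-and-excitation operation can be sketched in NumPy for a single feature map of shape (time, channels). In the standard SENet formulation [27], the features are averaged over time (squeeze), passed through a bottleneck of two fully connected layers with ReLU and sigmoid activations (excitation), and the resulting per-channel weights rescale the features. The weight shapes and the reduction ratio of 4 used in the check are assumptions, since the text does not give them.

```python
import numpy as np


def se_block(features, w1, b1, w2, b2):
    """Squeeze-and-excitation over a (time, channels) feature map.

    w1: (C // r, C), b1: (C // r,), w2: (C, C // r), b2: (C,).
    """
    z = features.mean(axis=0)                 # squeeze: global average over time -> (C,)
    h = np.maximum(0.0, w1 @ z + b1)          # bottleneck dense layer + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))  # dense layer + sigmoid -> weights in (0, 1)
    return features * s                       # rescale each channel by its weight
```

The extra cost is two small matrix products per feature map, which is consistent with the minimal computational overhead noted above.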
The fine-tuned CRNN model showed a strong ability to identify QRS complexes in different ECG databases. Compared with the fine-tuned CNN model, it increased F1 by 0.62% on NSTDB and by 0.18% on CPSCDB, but improved performance only marginally on MITDB and QTDB. The reason is probably related to the noise level of each database: NSTDB has the noisiest recordings and CPSCDB contains many recordings of low signal quality, whereas MITDB and QTDB contain relatively clean ECG data. For difficult ECG excerpts, it is hard to identify QRS complexes from their morphology alone; instead, their locations can be inferred from adjacent heartbeats or from more distant signal segments of high quality. The stacked LSTM layers in the CRNN model are good at handling long sequential data because an LSTM unit retains a memory of previous data by controlling the information flow through three gates: an input gate, a forget gate and an output gate [28]. Convolutional layers, in contrast, can only extract local morphological features and lack information from distant parts of the signal. Therefore, the CRNN model is superior to the CNN model in identifying QRS complexes in noisy ECG data. However, cross-database testing showed that the CRNN model did not generalize as well as the CNN model, which suggests that the CRNN model is more prone to overfitting when trained on a single database. Another disadvantage of the CRNN model is that it has 12 times as many trainable parameters as the CNN model, which slows both training and inference. Table 8 shows the models' average inference time per second of ECG. Wall time is the elapsed real time and is strongly influenced by the number of CPUs and threads used to run the program; CPU time is the total time spent by all CPU cores running the program.
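The three-gate update described above can be written out as a single LSTM time step in NumPy. This follows the standard LSTM equations [28]; the stacked gate layout (input, forget, output, candidate) and the weight shapes are choices made for this sketch and do not necessarily match the internal weight ordering of Keras's LSTM layer.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,); gate order i, f, o, g."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much memory to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate content
    c_t = f * c_prev + i * g     # updated cell memory
    h_t = o * np.tanh(c_t)       # hidden state passed to the next step / layer
    return h_t, c_t


def run_lstm(xs, W, U, b, H):
    """Run the cell over a (T, D) sequence and return the (T, H) hidden states."""
    h, c = np.zeros(H), np.zeros(H)
    outs = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
        outs.append(h)
    return np.stack(outs)
```

When the forget gate is close to 1, the cell memory c_t is carried almost unchanged across many steps, which is what allows information from distant beats to influence the output at a noisy position.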
The CNN model took 15.7 ms of CPU time to process one second of ECG from CPSCDB, and less than half of that for MITDB and QTDB, whereas the CRNN model took 17 to 32 times as long as the CNN model. The CNN model's wall time for processing a 30-min MITDB recording is around 1.26 s, which is comparable to that of some conventional algorithms [29]. The CNN model can therefore be used for real-time heartbeat monitoring and ECG analysis, while the CRNN model is better suited to deployment on a workstation for static ECG analysis.
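Wall time and CPU time, in the sense used for Table 8, can be measured with Python's standard `time.perf_counter` and `time.process_time`. The sketch below is hypothetical: `model_fn` stands for any inference callable (for example, a wrapper around a Keras model's `predict`), and the result is normalized to milliseconds per second of ECG. Because `process_time` sums the CPU time of all threads in the process, CPU time can exceed wall time on a multithreaded run.

```python
import time

import numpy as np


def time_per_second(model_fn, ecg, fs=500, repeats=3):
    """Average wall and CPU time (ms) per second of ECG for a callable model_fn."""
    seconds = len(ecg) / fs
    wall0, cpu0 = time.perf_counter(), time.process_time()
    for _ in range(repeats):
        model_fn(ecg)
    wall = (time.perf_counter() - wall0) / repeats
    cpu = (time.process_time() - cpu0) / repeats
    return 1000.0 * wall / seconds, 1000.0 * cpu / seconds
```

As a check on the figures above, processing a 30-min (1800 s) recording in about 1.26 s of wall time corresponds to roughly 0.7 ms per second of ECG, i.e., about 1400 times faster than real time.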
Our method has several limitations. First, the predicted QRS locations only approximate the R-peak positions, because the model output is one-eighth the length of the model input and the final QRS locations are obtained by multiplying the peak positions in the output by eight. Second, our models cannot recognize ventricular flutter; we simply excluded the flutter segments of MITDB when testing our models. Third, the threshold on the models' output confidence is arbitrarily set at 0.5 in the decision stage. Other values or a dynamic threshold may further improve the models' performance.

VI. CONCLUSION
In this paper, we proposed two models with multi-dilated convolutional blocks and tested them on various ECG databases. The CNN model runs fast, achieves high performance and generalizes well. The CRNN model achieves new state-of-the-art performance on several databases but is computationally expensive. Both models show the strong potential of artificial intelligence in ECG analysis.
He is currently an Associate Professor with the University of Shanghai for Science and Technology, China. His research interests include bio-signal processing, biomedical image processing, and pattern recognition.
DANQIN HU received the B.Eng. degree in biomedical engineering from Southwest Medical University, Sichuan, China. She is currently pursuing the degree with the School of Medical Instrument and Food Engineering, University of Shanghai for Science and Technology, China.
Her research interests include bio-signal processing and pattern recognition.