An Evolutionary-Neural Mechanism for Arrhythmia Classification With Optimum Features Using Single-Lead Electrocardiogram

Potentially lethal heart abnormalities can now be detected through continuous, long-term cardiac health monitoring with wearable sensors. However, the large volume of accumulated data presents challenges in storage, knowledge extraction and computing time. Moreover, manual examination of long-term ECG recordings is time-consuming and labor-intensive, suffers from inter-observer variation, and makes it difficult to classify complex non-linear single-lead ECG signals. To address these problems, we propose an automatic heartbeat classification system that takes ECG time-series amplitudes directly as input, without feature extraction, uses an optimized minimum number of these features, and provides a primary classification and diagnosis for 1 normal and 14 arrhythmic heartbeat types. Multi-objective particle swarm optimization (MOPSO) is used to achieve the best feature fitness. A novel fitness function is designed as the sum of the macro F1 loss and the normalized dimension, and the optimization objective is to minimize this fitness function. Multi-layer perceptron (MLP), k-nearest neighbor, support vector machine, random forest and extra decision tree classifiers are trained using the selected features. For the targeted 15-class classification problem, MOPSO-optimized features with MLP consistently performed best with a significantly reduced number of features. The proposed method proves to be an efficient and effective arrhythmia identification system for continuous, long-term cardiac health monitoring using a single-lead ECG signal.

, [14] usually follow pre-processing, QRS detection, car-

64, 96, 128 and 256 features from a 3072 feature vector size. Wang et al. [23] proposed an effective ECG arrhythmia classification scheme consisting of a feature reduction method combining principal component analysis with linear discriminant analysis. Alonso-Atienza et al. [24] used a filter-type feature selection procedure to analyze the relevance of the computed parameters. Chen and Yu [25] applied nonlinear correlation-based filters, calculating feature-feature correlation to remove redundant features prior to the feature selection process based on feature-class correlation. Asl et al. [26] proposed a feature reduction scheme based on generalized discriminant analysis. Haseena et al. [5], [27] used a fuzzy C-mean (FCM) clustered probabilistic neural network (PNN) for the discrimination of eight types of ECG beats. The performance was compared with an FCM clustered multi-layered feed-forward network trained with the back-propagation algorithm. Important parameters are extracted from each ECG beat and feature reduction is carried out using FCM clustering.

Polato et al. [33] used a DNN model to classify 7 rhythm categories, reduced from an original 11 classes because of insufficient recordings for 4 cases. The architecture comprised two parts: a representation learning part with 1D convolutional and sub-sampling layers, and a sequence learning part using long short-term memory (LSTM). In [34] a combination of a radial basis function process neural network (RBFPNN) and a learning vector quantization network (LVQN) was proposed. The first is used to embed prior feature knowledge, whereas the latter is a competitive learning and structural self-organizing mechanism that expanded the model depth.
LVQN measures feature similarities between input signals, and the pattern category is determined by a set of winning neurons connected to the output. RBFPNN performs spatial-temporal feature aggregation, and learning was done by dynamic time warping and C-means clustering. Wang et al. [35] proposed an end-to-end deep multi-scale fusion convolutional neural network (DMSFNet) classification architecture using multiple convolution kernels for feature extraction. The architecture starts with multi-scale (low to higher scale) feature learning and fusion, then the model is trained by jointly optimizing the losses of multiple branches for effective learning and discriminative classification features. To restore balance to an imbalanced dataset, [36] used a generative adversarial network (GAN), and a 2-stage deep CNN performed feature extraction and reduction as well as classification. However, the GAN has the problem of focusing on dominant classes and generating problematic samples, which require extra processing. In [37]

time-consuming. An artificial intelligence based diagnosis system was proposed in [49] using texture features of 2D images of ECG. The images were constructed by projecting the signal vector as a row of the image. A 12-bit signal is transformed into an 8-bit resolution grayscale sub-image on the claim that texture features in images contain determinative indicators of various diseases. Ge et al. [50] proposed a feature fusion method guided by multi-label correlation and classification with CNN. The labels were calculated based on frequency and Bayesian conditional probability, and a multi-label feature vector was generated. Shi et al. [51] proposed a classification system based on a deep CNN and LSTM network with multiple input layers. Automatic and hand-crafted features were both extracted.
To better manage the retraining of models, [52] proposed a deep learning-without-forgetting CNN architecture comprising a feature extraction module, classification layers, a memory module to store prototypes, and a distance matching network task selector module. Taking an ECG converted to an image, a pretrained DenseNet169 extracted discriminative features.

Most of the cardiac beat classification algorithms proposed in the literature (see Section-IV) use a computationally intensive feature extraction step after beat segmentation (the beat segmentation criteria may differ from the one used by us, i.e., some authors use 5-, 6- or 10-second signals, classifying rhythm rather than exact beat labels as provided by MIT [36], [38], [39], [40], [43], [44], [51], and others. Feature extraction has to be implemented on every section of the incoming time-series ECG signal being continuously acquired by the wearable device (Holter in this case). Hence, for ECG signals acquired in long-term, continuous 24-hour monitoring scenarios, the least computationally intensive procedure providing a quick scanning method is to directly identify incoming beats as normal or pathological. None of the abovementioned works use direct beat samples and remove the redundant and noisy features to maximize the discrimination performance for 15 heartbeat classes while additionally considering the imbalanced nature of normal to pathological heart condition occurrence. So, to the best of our understanding, the proposed algorithm takes the route of least computation while performing the best heartbeat pathology detection, for a quick and early reference in the case of long-term and continuously acquired ECG for cardiac health monitoring of patients. In the foregoing propositions, a common denominator is the challenge of complexity, scale, computational demand, time cost, interpretability, etc. while maintaining a high overall accuracy of the classification system.
Hence, motivated by designing an automated arrhythmia recognition system competitive with parallel research, in this work an efficient decision support system was developed to perform a quick scan on the single-lead, minimally pre-processed ECG time-series signal acquired by a Holter device to detect and recognize a broad range (i.e. 15 classes) of heart abnormality conditions. The key objective was to improve the accuracy

The raw ECG signal is acquired through a Holter device and the effective ECG frequency lies in the 0.5 to 40 Hz band [62]. There is a baseline drift from patient breathing. Hence, in the preprocessing stage, power-line and low-frequency components are removed from the raw ECG signal by using a 6th-order bidirectional Butterworth band-pass filter with lower and upper cut-off frequencies of 0.5 and 40 Hz, respectively. Next, the baseline is computed as a cubic spline interpolation of fiducial points placed 90 milliseconds before R-peak positions as an approximation for the baseline PR-segment and subtracted from the bandpass-filtered signal as shown in Fig. 3.

An initial particles matrix P is generated as in (1) and (2), where p_{i,j} represents the bit value at the j-th feature position in the i-th swarm particle. Here j = 1 to d and i = 1 to n. (2) is a version of (1) for the case where j = 1 to d number of features and i = 1 to n. 1's and 0's in each swarm particle represent the selected and non-selected features respectively. The number of individuals n is chosen as 50 so that it is large enough to avoid stagnancy and small enough to avoid excessive computing time [61].

where j defines the dimension of the search space, and i represents the index of the particle.
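The preprocessing stage described above (band-pass filtering followed by spline baseline removal) can be sketched as follows. This is a minimal illustration using SciPy, not the authors' implementation: the function names, the 360 Hz sampling rate used as a default, and the choice of `order=3` (which gives a 6th-order band-pass design in `scipy.signal.butter`) are assumptions, since the text does not say how the filter order was counted.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy.interpolate import CubicSpline

FS = 360  # sampling rate in Hz (assumed default for this sketch)

def bandpass(ecg, fs=FS, low=0.5, high=40.0, order=3):
    """Zero-phase Butterworth band-pass between low and high (Hz).

    order=3 produces a 6th-order band-pass design (assumption about
    how the paper counts the filter order).
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering, i.e. the "bidirectional" pass
    return sosfiltfilt(sos, ecg)

def remove_baseline(ecg, r_peaks, fs=FS, offset_ms=90):
    """Subtract a cubic spline fitted through points offset_ms before each R peak."""
    offset = int(round(offset_ms * fs / 1000))
    # Fiducial sample indices, clipped to the signal and made strictly increasing
    fiducials = np.unique(np.clip(np.asarray(r_peaks) - offset, 0, len(ecg) - 1))
    spline = CubicSpline(fiducials, ecg[fiducials])
    baseline = spline(np.arange(len(ecg)))
    return ecg - baseline, fiducials
```

A call such as `remove_baseline(bandpass(raw), r_peaks)` expects `raw` as a NumPy array and `r_peaks` as sample indices from any QRS detector. By construction the corrected signal is zero at the fiducial points, which is a quick sanity check on the baseline estimate.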
Updates for velocity, position, weight, best performing particle and fitness value are done using (5), (6), (7), (8) and (9) given as follows:

where t is the iteration in progress, r_{1,j} and r_{2,j} are randomly

the overall swarm. The inertia is updated after each iteration using (7). wMax and wMin represent the upper and lower boundary limits respectively. The inertia weight influences the impact of the prior velocity on finding the optimal features. Hence, exploration is favored for large inertia weights, and exploitation is favored for smaller values. Algorithm 1 represents the MOPSO-based feature reduction.

    Initialize the particles randomly with swarm size n_c = 50;
    while t ≤ T and gBestScore has changed within the last 20 iterations do
        for i = 1 to n_c do
            evaluate swarm particle i using the fitness function to obtain fit as in (3);
            update pBestScore_i and gBestScore if fit improves on them;
            update the velocity of the particle using (5) and update the mask by applying the new velocity to (6);
        update inertia weight w using (7);
    return gBestScore, gBest
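To make the loop in Algorithm 1 concrete, the following is a minimal sketch of a binary PSO for feature masks. Because equations (5) to (9) are not reproduced here, the sketch substitutes a textbook binary PSO: a sigmoid transfer function, a linearly decreasing inertia weight between `w_max` and `w_min`, and velocity clamping. The coefficients `c1`, `c2`, `v_max` and the inertia limits are illustrative assumptions, and the fitness is treated as a quantity to be minimized.

```python
import math
import random

def binary_pso(fitness, d, n=50, max_iter=100, patience=20,
               w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, v_max=6.0, seed=0):
    """Minimize fitness(mask) over binary masks of length d.

    Returns (g_score, g_best). All default coefficients are assumptions.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n)]
    vel = [[0.0] * d for _ in range(n)]
    p_best = [p[:] for p in pos]
    p_score = [fitness(p) for p in pos]
    g = min(range(n), key=p_score.__getitem__)
    g_best, g_score = p_best[g][:], p_score[g]
    stagnant = 0
    for t in range(max_iter):
        # Linearly decreasing inertia: exploration first, exploitation later
        w = w_max - (w_max - w_min) * t / max_iter
        improved = False
        for i in range(n):
            for j in range(d):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (p_best[i][j] - pos[i][j])
                     + c2 * r2 * (g_best[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid transfer: velocity -> probability that bit j is 1
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            score = fitness(pos[i])
            if score < p_score[i]:
                p_best[i], p_score[i] = pos[i][:], score
            if score < g_score:
                g_best, g_score = pos[i][:], score
                improved = True
        # Stop early when the global best has not improved for `patience` iterations
        stagnant = 0 if improved else stagnant + 1
        if stagnant >= patience:
            break
    return g_score, g_best
```

In practice `fitness` would train and evaluate a classifier on the features selected by the mask; any callable that maps a list of 0/1 values to a number can be plugged in for experimentation.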

The fitness function fit for each particle in the swarm is calculated using (3). Applying the current-to-best strategy, if v_i shows a better fit value than the corresponding p_i, then p_i in P is replaced with v_i. Otherwise, p_i retains its position. This comparison and replacement process is repeated for every (p_i, v_i) pair, and an evolved version of P is obtained at the end of the iterations. This process evolves and accumulates better particles until the maximum number of iterations, i.e. 100, is reached. After looping through all iterations, every particle in P has been replaced with its best possible candidate, i.e. the one having the best fit value. gBest, the particle with the best fit in the final P, is selected as the optimum feature subset, with 1's representing the d' selected features out of d, where d' ≤ d.
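As an illustration of the fitness described in the abstract (macro F1 loss plus normalized dimension, to be minimized), a minimal pure-Python sketch follows. Equation (3) itself is not reproduced here, so the equal, unweighted sum and the helper names `macro_f1` and `fitness` are assumptions.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels that occur."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def fitness(mask, y_true, y_pred):
    """Macro F1 loss plus the fraction of selected features (lower is better)."""
    f1_loss = 1.0 - macro_f1(y_true, y_pred)
    normalized_dimension = sum(mask) / len(mask)
    return f1_loss + normalized_dimension

# Toy case: 8 normal beats, 2 ventricular beats, classifier predicts all normal
y_true = ["N"] * 8 + ["V"] * 2
y_pred = ["N"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                  # 0.8
print(macro_f1(y_true, y_pred))  # about 0.444, since the minority class scores 0
```

The toy case shows why the macro F1 loss suits imbalanced beat classes: accuracy stays at 0.8 although every minority beat is missed, whereas macro F1 drops to 4/9. The normalized dimension term then adds a penalty proportional to the share of features kept.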

The process terminates if the maximum number of iterations, 100, is reached or fit remains stagnant for 20 consecutive iterations. For every new iteration, the values of gBestScore and pBestScore are updated.

The classification is crucial for the proposed system archi-

SVM is a conventional machine learning method for classification. First, the input data are transformed into a high-dimensional feature space. In this space, the data points are linearly separated by a hyper-plane. Because the data points are not linearly separable in most cases, the data points are mapped into a high-dimensional space using an appropriate kernel, and then the optimization step is carried out. Various kernel transformations are used to map the data into high-dimensional space, including linear, sigmoid, polynomial, and radial basis functions. We experimented with linear, polynomial, and radial basis kernels; C was set to 100, gamma was set to 4, and the polynomial kernel was selected as the kernel-type parameter. This study used parameter optimization to find the optimum SVM parameters.
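For reference, the three kernels experimented with can be written out as below. This is a sketch of one common parameterization (the one used, for example, by scikit-learn's `SVC`); the polynomial `degree` and `coef0` values are not stated in the text and are assumptions here. C = 100 is the soft-margin penalty of the optimization step and does not enter the kernel itself.

```python
import math

def linear_kernel(x, z):
    """Plain inner product <x, z>."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, gamma=4.0, coef0=0.0, degree=3):
    """(gamma * <x, z> + coef0) ** degree; degree and coef0 are assumed values."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=4.0):
    """exp(-gamma * ||x - z||^2); equals 1 when x == z and decays with distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)
```

With gamma = 4 the radial basis kernel decays quickly, so two beats influence each other only when their selected amplitude samples are close, while the polynomial kernel grows with the inner product of the two beats.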

DET is a predictive model that can serve as both a classifier and a regression model. DET refers to a hierarchical model of decisions and their results and is used to classify a sample into a predefined set of classes based on its feature values. DET consists of nodes that form a rooted tree, meaning it is a directed tree with a node called the root that has no entering edges. All other nodes have exactly one entering edge. A node with outgoing edges is referred to as a test node. All other nodes are known as leaves or decision nodes. Each leaf is allocated to one class, representing the most appropriate target value. In addition, the leaf holds a probability vector specifying the probability of the target feature having a definite value.

VOLUME 10, 2022

Table 1. as mentioned in Section-II-D2. Fig. 6 shows the data split strategies used for the disease-specific classification case.

To test how well the optimum-feature search generalizes and how applicable the selected features are, we performed a test using all of the

SVDB. All beats are resampled at 360 Hz, and each record in both datasets is divided by its respective gain so that the signal is further processed in millivolts. The division of records and beats into training and testing sets for an interpatient classification analysis is detailed in Table 3.

Detailed comparisons were performed to check both the robustness of the reduced features and the efficiency and speed of the proposed algorithm in finding an optimum solution. The classification was performed for the All features set (as the exact solution) and the Optimized features subset obtained after MOPSO optimization. Hence, all measures are reported for both the All features and Optimized features cases to present a comparison between the classification improvement and the feature reduction achieved using the proposed method. To compare classification accuracy using optimized features on test data, 5 classifiers are used: MLP, KNN, DET, SVM and RF. An introduction to the working principles of all these classifiers has been presented in Section-II-E.

The optimum hyperparameter values of the MLP, KNN, RF, SVM and DET classifier architectures were selected as those that performed best for all features (exact solution), and the same model was then tested with the test data for both the reduced and all features. The optimized parameters for all classifiers are given in Table 1. We ran the MOPSO optimization for 10 simulation runs for each experiment in Python on a machine with 6 cores (AMD Ryzen 5

The indices of the selected 40-feature subset are given in Table 4.

Fig. 7 shows the timing analysis done for the classification of a single test sample. Mean and standard deviation are reported over 10 trials. MLP requires the most time to classify a single test sample but has the lowest error rate, keeping in view the natural imbalance of data samples for arrhythmia classification

subset. All classifiers show a significant decrease in computing time when comparing the optimized feature and all feature cases respectively. Fig. 8 (a and c) shows overall ROC curves

for a primary scan check as depicted in Fig. 9 in continuous and long-term cardiac health monitoring applications using single-lead ECG signal successfully proves to be a quick and early referral system to send the patient to a general physician/cardiac specialist or to emergency in case of stroke.

As summarized in Table 9, most of the previous studies

S, V, F, and Q or a subset of these. The works focused on achieving maximum accuracy. The problem with using accuracy as the prediction metric in this particular case is that the normal class has a much greater number of samples than the arrhythmic classes. Moreover, the different types of arrhythmia (ventricular, supraventricular and atrial pathologies and their subtypes) occur with different frequencies, some of them more rarely than others. Accuracy in this case does not give higher importance to the prediction quality of minority classes, which in our case, or in disease analysis in general, opposes the design objective. Hence, in this work, we aimed to maximize the macro F1 score, which puts equal weight on the prediction of the majority (i.e. normal) and all minority (i.e. arrhythmia) classes.

Also, to make the proposed system to reproduce the ECG signal to be used in a clinic/hospital setting, we intend to

This work focused on reducing the dimension of features to perform a quick scan on heartbeats segmented from a single-lead ECG signal for the purpose of abnormal cardiac pathology recognition, to be used as an early referral system. The results obtained in all experiments confirmed that the proposed MOPSO-MLP method efficiently delivers competitive recognition performance and precision with 84.189% fewer time-series amplitude points. Furthermore, the developed method provides early diagnosis for a wide range of heart abnormalities, making it an applicable arrhythmia decision support system for wearable ECG devices.

The authors would like to thank Prof. Laura Burattini (Department of Information Engineering, Polytechnic University of Marche, Italy) for guidance and support in conducting this research.