Classification Techniques for Arrhythmia Patterns Using Convolutional Neural Networks and Internet of Things (IoT) Devices

The rise of Telemedicine has revolutionized how patients are treated, bringing advantages such as enhanced health analysis tools, accessible remote healthcare, and basic diagnostics of health parameters. The advent of the Internet of Things (IoT) and Artificial Intelligence (AI), and their incorporation into Telemedicine, extends these health benefits even further. The synergy between AI, IoT, and Telemedicine therefore creates diverse innovative scenarios for integrating cyber-physical systems into healthcare to provide remote monitoring and interactive assistance to patients. Data from the World Health Organization report that 7.4 million people died because of Atrial Fibrillation (AF), recognized as the most common arrhythmia associated with human heart rate. Contributing causes include unhealthy diet, smoking, and poor access to medical care, and research studies estimate that about 12 million and 17.9 million people will be suffering from AF in the USA and Europe by 2050 and 2060, respectively. AF, as a cardiovascular disease, is becoming an important public health issue to tackle. Using a systematic approach, this paper reviews recent contributions related to the acquisition of heartbeats, arrhythmia detection, IoT, and visualization. In particular, by analysing the most closely related papers on Convolutional Neural Networks (CNN) and IoT devices in heart disease diagnostics, we present a summary of the main research gaps with suggested directions for future research.


I. INTRODUCTION
Recent advances in technology have enabled a synergy between wireless technology and healthcare, providing accessibility benefits beyond the limitations of traditional healthcare systems, such as enhanced quality support in medical services and agile response for medical intervention [1]-[3]. Specifically, the Internet of Things (IoT) is simplifying the way parameters and variables are captured in real time. One of those elements is a wearable health-monitoring system that allows remote diagnosis of patients during particular clinical events [4], [5]. By employing the IoT for healthcare, doctors and nurses could be more productive if the patients have access to data by phone or using online platforms [6], [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Shaohua Wan.
There are various developments in Telemedicine, such as detection of glucose level and oxygen concentration, and ECG interfaces to monitor heart rate. Leveraging the IoT in healthcare can have a significant impact on early diagnosis and intervention in some terminal illnesses, such as heart diseases, and decrease the mortality rate [8]. Eysenbach et al. [9] indicated that telemedicine helped reduce the number of hospital admissions of patients with diseases related to the lungs, heart, and stroke, and hence produced a considerable decrease in mortality [3]. According to Ebrahimi et al. [10], cardiovascular disease (CVD) is the principal cause of human death: about 17.9 million people died from heart disease in 2016, 31% of all global deaths. Reinforcing the idea, the authors in [11] showed that 7.4 million people died due to heart attack in 2015, making it the leading cause of death among heart disorders [12]. Arrhythmia (which is one type of CVD) can affect the internal process between different organs, and it also depends on human genetics, so early diagnosis is important. Therefore, in this work, we focus on technological advances towards the classification of arrhythmia patterns. Arrhythmias can be divided into three categories: premature heartbeat, tachycardia, and bradycardia. Cai et al. [13] and Zhu et al. [14] have highlighted that Atrial Fibrillation (AF) is the most common arrhythmia, affecting around 4.5 million patients in the European Union and 2.3 million patients in the US, and approximately a third of all strokes are associated with AF.
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Therefore, in order to reduce the number of deaths, it is vital for new developments to focus on early and accurate ways to detect AF and prevent stroke events. Consequently, Deep Learning (DL), a subset of Artificial Intelligence (AI), has emerged to extend the capabilities of the IoT in the medical field with regard to resolving the urgent issues around heart diseases (specifically, arrhythmia) and healthcare.
This paper presents a review of relevant research in the literature on advances in DL techniques and their application to the recognition of different types of cardiac arrhythmia, based on the methodology proposed in the Cochrane Handbook [15]. Existing reviews of arrhythmia pattern recognition have focused on Machine Learning (ML) models and their architecture using the MIT database [16]. This paper, however, presents an expanded overview of how heart diseases are being detected, considering hardware, software, and the different databases that are available in the literature.
The rest of this paper is organized as follows: Section II presents an overview of the essential concepts and background information. Section III presents details of the review methodology and Section IV discusses existing work on AI methods used in ECG and healthcare. Section V discusses the challenges and existing solutions with regards to the application of AI and arrhythmia pattern classification. In Section VI, we discuss the key literature and finally conclude the paper in Section VII.

II. CONCEPTS AND BACKGROUND
In order to understand the application of DL techniques to different types of cardiac arrhythmia, this section provides an overview of the essential concepts.

A. ELECTROCARDIOGRAM (ECG)
The ECG is a graphical record of the heartbeats (Figure 1). According to [17], the ECG is the basic and most accessible method to diagnose heart rhythm disorders; it is non-invasive and provides a useful way to understand heart health and pathology. The ECG waveform is characterized by the following features [18]: P wave, PQ interval, QRS complex, ST segment, and T wave.

B. ARRHYTHMIAS
As Raja et al. [19] emphasise, an arrhythmia is an abnormal heart rhythm, expressed by a slow, rapid, or irregular heartbeat, and classified as life-threatening or non-life-threatening. Murat et al. [20] mention the five principal non-life-threatening classes: non-ectopic (N), supraventricular ectopic (S), ventricular ectopic (V), fusion (F), and unknown (Q). However, Yildirim et al. [17] explained that the previous classes are a compendium of the following 17 classes: Ventricular flutter, Premature ventricular contraction, Idioventricular rhythm, Atrial premature beat, Normal sinus rhythm, Atrial flutter, AF, Supraventricular tachyarrhythmia, Right bundle branch block beat, Pre-excitation (WPW), Ventricular bigeminy, Ventricular trigeminy, Ventricular tachycardia, Fusion of ventricular and normal beat, Left bundle branch block beat, Pacemaker rhythm, Second-degree heart block. Nevertheless, the most common among the aforementioned are as follows [10]:
• Atrial Fibrillation (AF): this disease occurs when there is rapid movement in the atrium, about 400-600 beats/minute.
• Right Bundle Branch Block (RBBB) and Left Bundle Branch Block (LBBB) are classified as disruptions in the normal conduction system that produce an abnormal QRS shape. In RBBB, the right bundle, which adheres to the Right Ventricle (RV), does not produce any activation, and this electrical behaviour creates an abnormal QRS morphology. In LBBB, the left bundle does not activate.
• Tachycardia: this anomaly occurs when the heart rate exceeds the normal resting rate. Its types are supraventricular or atrial tachycardia, sinus tachycardia, and ventricular tachycardia.
• Atrial Flutter (AFL) is a continuous aberrant heartbeat that begins in the area of the atrial chambers of the heart. It is usually correlated with a quick heart variation.
• Ventricular Flutter (VF) is an unpredictable arrhythmia that affects the ventricles with a pace of 150-300 pulses per minute. VF could cause sudden cardiac death.
• Ventricular fibrillation (Vfib) is a behaviour in which the heart quivers instead of pumping due to a non-organized electrical activity in the ventricles. It could be expressed in an irregular wave in the QRS Complex.
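Several of the arrhythmias listed above are characterized by their rate. As an illustration only, the following minimal Python sketch estimates the heart rate from R-peak timestamps and labels it against a resting range. The function names and the 60-100 beats/minute range are assumptions of this sketch (a commonly quoted adult resting range) and are not taken from the reviewed papers.

```python
def heart_rate_bpm(r_peak_times_s):
    """Mean heart rate in beats/minute from R-peak timestamps given in seconds."""
    rr = [t2 - t1 for t1, t2 in zip(r_peak_times_s, r_peak_times_s[1:])]
    mean_rr = sum(rr) / len(rr)          # mean R-R interval in seconds
    return 60.0 / mean_rr


def rate_category(bpm, low=60.0, high=100.0):
    """Label a rate against an assumed resting range (low/high are hypothetical defaults)."""
    if bpm > high:
        return "tachycardia"
    if bpm < low:
        return "bradycardia"
    return "normal rate"


# Hypothetical R-peak times: one beat every 0.5 s
times = [0.0, 0.5, 1.0, 1.5]
bpm = heart_rate_bpm(times)
category = rate_category(bpm)
```

A rate-only rule cannot separate the rhythm classes discussed in this review; it only illustrates why R-peak detection is a common first step before classification.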

C. DEEP LEARNING
Deep Learning is a branch of AI and represents a method of processing raw data, images, graphs, etc., with the aim of learning automatically from the diverse patterns they contain. Xing et al. [21] mention that DL can be applied to image segmentation, object detection, and target classification, and DL techniques have been applied successfully in natural language processing, computer vision, medical imaging, etc. [22]. For instance, Srivastava et al. [23] establish that, in the medical field, DL has been applied to recognize diseases like diabetes, malaria, obesity, and tuberculosis, as well as to brain image recognition and mitosis detection, as Merone et al. [18] added. One of the most common DL architectures is the Convolutional Neural Network (CNN). According to [24], [25], the CNN is a powerful architecture for image recognition and analysis that has seen recent developments and attracted extensive attention, especially for pattern classification. As shown in Figure 2, a CNN is divided into convolutional, max-pooling, and fully connected layers. The CNN takes its input from the original image; if the image includes a large amount of data or noise, a pre-processing stage is necessary. Then, using convolutional layers, the architecture extracts the most important features. Using max-pooling layers, the architecture reduces the amount of information, filtering it and preparing the data before the results are obtained. In the fully connected stage, the architecture uses a Softmax layer to produce a classification over the target classes.
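To make the layer sequence described above concrete, the following is a minimal sketch of a single forward pass (convolution, ReLU, max-pooling, fully connected layer, Softmax) on a one-dimensional signal, written in plain Python without any DL library. The toy signal, kernel, weights, and the two output classes are hypothetical values chosen for illustration; a real model would learn them during training.

```python
import math


def conv1d(signal, kernel):
    """'Valid' 1D convolution (implemented as cross-correlation, as DL libraries do)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(xs):
    return [max(0.0, x) for x in xs]


def max_pool1d(xs, size=2):
    """Non-overlapping max-pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def dense(xs, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * x for w, x in zip(row, xs)) + b
            for row, b in zip(weights, biases)]


def softmax(zs):
    m = max(zs)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy input (8 samples) and a hand-picked peak-enhancing kernel
signal = [0.0, 0.1, 0.9, 0.2, -0.3, 0.0, 0.1, 0.0]
kernel = [-1.0, 2.0, -1.0]

features = max_pool1d(relu(conv1d(signal, kernel)), size=2)   # length 8 -> 6 -> 3

# Hypothetical dense layer mapping 3 features to 2 classes
weights = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
biases = [0.0, 0.0]
probs = softmax(dense(features, weights, biases))             # class probabilities
```

The pooling step halves the feature length here, which is the dimensionality reduction the text attributes to max-pooling, and the Softmax output sums to one so it can be read as a probability per class.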

D. BODY AREA NETWORK (BAN)
A BAN is a network formed by devices that capture a series of biomedical parameters such as blood pressure, oxygen, glucose levels, body temperature, etc., as defined by Wan et al. [26].

E. THE IoT
The IoT [27], [28] is considered a network of intelligent sensor devices with limited storage and low-cost processing capability [7], [29]. Working together with cloud computing, the IoT gains considerable storage and enough processing capability for smart healthcare applications. Indeed, the IoT has enabled remote monitoring of patients to provide better healthcare services [30]. Majumdar et al. [16] added that, with the inclusion of ML, new developments could help the IoT create strategies to tackle daily concerns in the medical field. Therefore, the responsibility for creating high-quality, low-cost, patient-focused solutions lies with engineers, doctors, scientists, and healthcare staff.

III. REVIEW METHODOLOGY
The literature review is a compendium of guidelines or strategies to provide an answer to the questions formulated by the researcher. For this systematic review, the methodology used is PRISMA [31], with article selection criteria composed of title, structured summary, objectives, acceptability criteria, database sources, search, data acquisition, synthesis of results, summary of evidence, limitations, and conclusions. This methodology (Figure 3) aims to provide clear steps in the field of DL, ECG, and IoT for the processing, recognition, and classification of heart rhythms.
By using PRISMA, the main question of this research is: is it possible to use CNN and IoT to develop new devices that allow patients, nurses, and doctors to obtain a diagnosis of the cardiac condition using the ECG and QRS diagram? From the preliminary search, the motivations are as follows. First, Rizwan et al. [32] commented that the number of patients in hospitals is increasing and, as a result, hospital workload is growing; one solution is to provide care while patients are at home. Secondly, ElSaadany et al. [33] identified that one of the obstacles in healthcare is the poor survival rate associated with out-of-hospital sudden cardiac arrest. Real-time detection through technology is one solution; however, not all hospitals can analyse those emergency circumstances. Thirdly, Lin et al. [34] report that a few specific procedures, such as those for obstructive sleep apnoea, require people to spend one or two nights in the hospital, with a sleep technician checking the patient throughout. Fourthly, the article [35] mentions that the lack of information available to users has become one of the crucial issues in the healthcare industry; the target is to implement a system that provides medical and healthcare information services to users using interactive models and technology. Finally, Alam et al. [36] recommend a smart healthcare assistant that tracks a person's activities and moods and suggests precautions or actions when required, using a personal-assistance chatbot to handle users' inquiries and provide appropriate answers.
A set of inclusion and exclusion criteria was established to find articles in the IEEE Xplore, EBSCO: Applied Computers Database, MEDLINE, Library Information Science and Technology, PubMed, and ScienceDirect databases, using keywords such as Smart Health Monitoring, Wearable Devices, Healthcare Systems, Wireless Systems, Cardiac Diseases, Telemetry, Healthcare issues, arrhythmia, IoT, Electronic Devices, Heart Rate, ECG, Systems, and Convolutional Neural Network (CNN). A total of 2060 articles were identified and, after applying the criteria, the number was reduced to 1140. The papers were initially screened according to title and abstract. Although there is a wide range of methodologies to classify arrhythmias, including frequency and statistical analysis, Markov models, and mixture-of-experts algorithms [37], a list of 50 articles working with CNN models was selected; the others were not highly correlated with the topic and were excluded. This choice is supported by Kiranyaz et al. [37], who concluded that CNNs achieve superior classification performance compared with other complex models; Andreotti et al. [38], who mentioned that the tuning stage of a CNN facilitates training and accuracy; and Zhang et al. [39], who concluded that extracting features automatically using a CNN makes the classification process more efficient than conventional arrhythmia detection models. The process used is displayed in Figure 4.
The following part inspects the methodologies used by different authors, including the selected classes, parameters, comments, and discussion, keeping in mind that the process is divided into preprocessing, feature extraction, and classification, as mentioned by Zhang et al. [39].

IV. USING THE AI METHODS IN ECG AND HEALTHCARE PROCESS

A. MULTI-CLASSES ANALYSIS
Yildirim et al. [17] presented an advanced DL methodology for cardiac arrhythmia detection focused on long-duration electrocardiography analysis (Figure 5). A convolutional network was implemented to distribute the signals according to classes. In the first layer, a one-dimensional convolution was performed using a vector of 50 × 128. The activation outputs were normalized with a normalization function. In the 1D max-pooling layer, a new feature map was created by extracting the maximum values of the previous layer. The role of max-pooling was to reduce the size of the feature maps, discarding unnecessary information. Then, another convolutional layer operated on the input feature maps with 32 × 7-size weights. After batch normalization, the feature maps were reduced using pooling on the next layer. Convolution and pooling operations were performed in the following layers. The features from the flattened layer were passed to a dense layer of 512 units. Finally, the last layer of the network was the SoftMax layer, which was responsible for classifying the input into the output classes.
FIGURE 5. A DL model for cardiac arrhythmia detection [17].
Yildirim et al. [40] presented a CNN model to recognize multiple classes on 12-lead ECG signals. The trials were performed on an ECG dataset collected by Chapman University and Shaoxing People's Hospital. The approach of this paper was focused on a scheme where the training and testing stages used different patients. The authors decided to work with CNN because DL models have an exceptional ability to learn features from input data using convolution. Throughout the process, it was important to tune the relevant parameters, such as the number of filters, kernel size, and strides. The proposed model was composed of six convolution layers and four max-pooling layers. Between intermediate steps, there were two batch normalization layers to normalize the data, two dropout layers to avoid over-fitting, and a Leaky-ReLU layer with a 0.1 alpha value to avoid the dying ReLU problem (Figure 6).
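The role of the Leaky-ReLU layer mentioned above can be illustrated with a short sketch. It compares the standard ReLU with a Leaky ReLU using the alpha value of 0.1 reported for [40]; the function names are illustrative and the snippet is not the authors' implementation.

```python
def relu(x):
    """Standard ReLU: negative inputs are mapped to zero."""
    return max(0.0, x)


def relu_grad(x):
    """ReLU gradient: zero for negative inputs, so the neuron stops learning there."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, alpha=0.1):
    """Leaky ReLU: negative inputs are scaled by alpha instead of being zeroed."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.1):
    """Leaky ReLU gradient: a small non-zero slope remains for negative inputs."""
    return 1.0 if x > 0 else alpha


negative_input = -2.0
relu_out = relu(negative_input)                # output is zero
leaky_out = leaky_relu(negative_input)         # output is alpha * x
relu_slope = relu_grad(negative_input)         # zero gradient
leaky_slope = leaky_relu_grad(negative_input)  # gradient equals alpha
```

Because the gradient for negative inputs is alpha rather than zero, weights feeding such a neuron can still be updated, which is the usual argument for Leaky ReLU against the dying ReLU problem.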
Singh et al. [41] proposed a model to classify six types of arrhythmias with high accuracy and in real time. The approach of the model is to use a less computationally demanding process to predict the output. The system uses an Arduino and an AD8232 sensor, and a CNN structure was developed for data pre-processing and ECG arrhythmia classification. The data were taken from the MIT-BIH arrhythmia database, and the chosen model is a CNN, which is trained and tested on the input file. The 2D-CNN requires an input image. For this reason, ECG (or EKG) signals are converted to EKG images, and feature extraction and noise filtering are not required in this phase. This is essential because feature extraction and noise filtering might delete important data from the ECG beats. The proposed model included multiple layers to classify the ECG arrhythmias. First, the input is given to the convolution layer, which retains only important features. Secondly, the output of the convolution layer is given to the ELU, and from this stage to the B-Norm layer. This combination is repeated twice, and the final output is given to the max-pooling layer. From the convolution layer to the max-pooling layer, there are six repetitions to reduce the features and obtain accurate data. After the max-pooling layer, the output goes through the Dense, ELU, B-Norm, and Dropout layers. Finally, the Softmax layer is used to obtain the probabilistic values, i.e. the degrees to which the input belongs to each class. At last, the classification is ready with numeric values (Figure 7).
The innovative paper presented by Ihsanto et al. [42] mentions that the stages for ECG classification are commonly divided into four: QRS detection, preprocessing, feature extraction, and classification (Figure 8). Nonetheless, in that paper, those were reduced to two steps only, i.e. QRS detection and classification, called beat segmentation and classification, respectively.
Additionally, hyperparameters such as filter size, padding type, activation type, pooling, and backpropagation were configured following techniques that reduce the number of trial-and-error attempts needed to achieve the best results. One of these is the Depthwise Separable Convolution (DSC), which reduces arithmetic operations and redundancy in the number of parameters. In the first phase, called detection, a gradient analysis is used to extract the R-peaks of the signals obtained from the 48 records of the MIT-BIH database, and the segmentation interval is based on the number of R-peaks found. Subsequently, a CNN model containing 21 layers, including DSC, is proposed; training was faster for the depthwise separable CNN. The input layer size is 256 and represents the raw ECG beat waveform, while the output layer size is 16, representing the number of classes as described in the MIT-BIH database. A group of layers, which includes the convolution process, is repeated several times. Also, layers 5 to 7 are configured according to the DSC algorithm, and layers 2, 7, 12, and 17 are used to replace the pooling layer.
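The parameter saving attributed to DSC can be checked with simple arithmetic. The sketch below counts the weights (biases ignored) of a standard 1D convolution and of its depthwise separable counterpart. The kernel size and channel counts are hypothetical values chosen for illustration and are not the configuration used in [42].

```python
def standard_conv1d_params(kernel_size, in_channels, out_channels):
    """Each output channel has its own kernel over every input channel."""
    return kernel_size * in_channels * out_channels


def depthwise_separable_conv1d_params(kernel_size, in_channels, out_channels):
    """Depthwise step (one kernel per input channel) plus pointwise 1x1 channel mixing."""
    depthwise = kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


# Hypothetical layer: kernel size 5, 32 input channels, 64 output channels
k, c_in, c_out = 5, 32, 64
standard = standard_conv1d_params(k, c_in, c_out)               # 5 * 32 * 64 = 10240
separable = depthwise_separable_conv1d_params(k, c_in, c_out)   # 160 + 2048 = 2208
ratio = separable / standard                                    # equals 1/c_out + 1/k
```

For this hypothetical layer the separable version needs roughly a fifth of the weights, and the ratio 1/c_out + 1/k shows that the saving grows with the kernel size and the number of output channels.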
FIGURE 9. CNN architecture to classify heart diseases using MIT-BIH and NSR databases [43].
In the model presented by Savalia et al. [43], two datasets (the normal sinus rhythm database (NSR-DB) and the MIT-BIH arrhythmia database) were used to distinguish between normal and abnormal ECG signals. The data were downloaded from kaggle.com, and the models were implemented with the TensorFlow library. The proposed CNN-based model includes pooling layers, fully connected layers, normalization layers, and softmax layers. ReLU activation was used in each of the convolution layers. The max-pooling size was set to 2 × 2, since it works better than 3 × 3. The process begins by reading the datasets and defining the features and labels. Next, the data are divided into training and test sets to build the model. The loss function is computed, and gradient descent corrects the weights with the goal of reducing the cost. Finally, the predictions are used to determine the type of disease. The method is shown in Figure 9.
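The training procedure outlined above (define features and labels, split into training and test sets, reduce a loss by gradient descent) can be sketched on a deliberately tiny problem. The example below is not the model of [43]: it fits a one-feature logistic classifier on synthetic data in plain Python, and the 80/20 split, learning rate, number of steps, and data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(w, b, xs, ys):
    """Mean binary cross-entropy of the logistic model p = sigmoid(w * x + b)."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


def gradient_step(w, b, xs, ys, lr):
    """One gradient-descent update of the weight and the bias."""
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(w * x + b) - y
        grad_w += err * x
        grad_b += err
    n = len(xs)
    return w - lr * grad_w / n, b - lr * grad_b / n


# Synthetic data: one feature per sample, label 1 ("abnormal") if the feature is positive
rng = random.Random(0)
data = [(x, 1 if x > 0 else 0) for x in (rng.uniform(-1, 1) for _ in range(200))]
rng.shuffle(data)

split = int(0.8 * len(data))                  # assumed 80/20 train/test split
train, test = data[:split], data[split:]
train_x, train_y = zip(*train)

w, b = 0.0, 0.0
initial_loss = cross_entropy(w, b, train_x, train_y)
for _ in range(500):
    w, b = gradient_step(w, b, train_x, train_y, lr=0.5)
final_loss = cross_entropy(w, b, train_x, train_y)

# Accuracy on the held-out test set
test_acc = sum((sigmoid(w * x + b) > 0.5) == (y == 1) for x, y in test) / len(test)
```

In a CNN the same loop applies, with the gradients of all layer weights obtained by backpropagation instead of the closed-form expression used here.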

B. FIVE-CLASSES ANALYSIS
Murat et al. [20] presented a method based on an analysis of five different heart disease classes. Using the MIT-BIH arrhythmia database, they used 100,022 beats and the N, S, V, F, and Q main classes. The CNN architecture was composed of convolution layers, pooling layers, flatten layers, and a dense layer covering the five types of heart disease. To evaluate the performance of the CNN, the layer sizes were varied; for instance, the sizes were set from 32 to 256, and kernel sizes from 5 to 3. Besides, to decrease the computational processing and improve the model, the authors added an LSTM stage. With these techniques, the training of the NN-1 network could be completed using 2000 epochs. The main objective of CNN-LSTM networks was to design models capable of combining representation and sequence learning on the input data.
The paper [44] showed a model combining CNN and LSTM networks. The data came from the MIT-BIH arrhythmia dataset, in which heartbeats are grouped into five classes: Non-ectopic (N), supraventricular ectopic (S), ventricular ectopic (V), fusion (F), and unclassified beats (Q). The architecture of the model was composed of a convolution layer, pooling layer, concatenation layer, LSTM layer, and fully connected layers. In layer 0, there were three inputs, and in layer 1 those inputs were convolved with 32 kernels of size 13 × 1, with a leaky rectified linear unit (leaky ReLU) as the activation function. Subsequently, in layer 2, max-pooling of size 2 was employed to form output shapes of 19 × 32, 48 × 32, and 28 × 32. Then, a concatenation layer was applied and, as a result, the output of layer 3 was 95 × 32. The LSTM network had 32 hidden units. In layer 5, the output of the LSTM network was flattened to 3040 × 1. Then came the fully connected layers, of which the last (layer 9) had 15 units. During the process, dropout with a rate of 0.5 was used in layers 4, 6, and 7.
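The layer shapes reported for [44] are mutually consistent, which can be verified with a few lines of arithmetic. The sketch below assumes that the three pooled branches are concatenated along the time axis and that the LSTM returns one 32-dimensional output per time step before flattening; both are assumptions of this illustration.

```python
# Output shapes (time steps, feature maps) of the three pooled branches, as reported
branch_shapes = [(19, 32), (48, 32), (28, 32)]

# Concatenation along the time axis keeps the 32 feature maps
concat_steps = sum(steps for steps, _ in branch_shapes)     # 19 + 48 + 28 = 95
feature_maps = branch_shapes[0][1]                          # 32

# An LSTM with 32 hidden units returning every time step yields a 95 x 32 output
lstm_units = 32
flattened = concat_steps * lstm_units                       # 95 * 32 = 3040
```

The totals match the 95 × 32 concatenation output and the 3040 × 1 flattened vector quoted in the text.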
Wang et al. [45] proposed a methodology based on the MITDB database that focuses on analysing normal beats (N), supraventricular ectopic beats (S), ventricular ectopic beats (V), fusion beats (F), and unknown beats (Q) (Figure 10). In the first step, a preprocessing stage implemented with a wavelet transform, notch filter, derivative filters, and median corrected filters is used to reduce baseline wandering, muscle noise, and 60 Hz powerline interference. Then, the structure includes a different kind of classification for each type of signal saved in the training sets, as the figure shows.
Zhang et al. [24] present a 1-D CNN as a method to classify ECG signals. The CNN model is composed of five layers, including input and output layers. Those layers comprise convolution layers, downsampling layers, and one fully connected layer. The method takes the initial data and classifies the features automatically into normal, left bundle branch block, right bundle branch block, atrial premature contraction, and ventricular premature contraction. First, the data should be filtered using the wavelet and wavelet decomposition algorithm to reduce noise. The CNN is mainly composed of two parts: feature extraction and classification. The convolutional structure is composed of convolution layers and sampling layers that pool the feature vectors from the preceding convolutional layer. As a result, its output is 324 feature vectors with 56 sampling points.
FIGURE 11. An automatic method to detect cardiac diseases using DL [46].
Takalo-Martila et al. [46] present an automatic method implementing deep convolutional neural networks (CNN). The model focused on inter-patient arrhythmia classification, where the data used differ between the training and test phases. The dataset used was the MIT-BIH Arrhythmia Database, and the classes to identify are Normal (N), Supraventricular Ectopic beat (SVEB), Ventricular Ectopic beat (VEB), Fusion beat (F), and Unknown beat (Q). This work (Figure 11) was implemented using a one-dimensional convolutional neural network to learn the meaningful features of the ECG signal. The model consists of three convolutional and two fully connected layers, preceded by a preprocessing stage. The pre-processing is performed by applying filters and then normalizing the signal in order to eliminate powerline and electromyogram noise. Then, the heartbeats are divided into two different datasets: patient dataset DS1 for validation and patient dataset DS2 for testing. The CNN uses 16 neurons in the first layer, 32 neurons in the second, and 64 in the third layer. To prevent overfitting, max-pooling and dropout layers are used. The activation function in the feature extraction process is a rectified linear unit (ReLU), and the last layer has five neurons.
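The key property of an inter-patient scheme is that no patient contributes beats to both datasets. A minimal sketch of such a split is shown below; the record structure, patient identifiers, and the function name are hypothetical and do not reproduce the actual DS1/DS2 record lists.

```python
def split_by_patient(beats, ds1_patients):
    """Assign every beat to DS1 or DS2 according to its patient, never beat by beat."""
    ds1_patients = set(ds1_patients)
    ds1 = [b for b in beats if b["patient"] in ds1_patients]
    ds2 = [b for b in beats if b["patient"] not in ds1_patients]
    return ds1, ds2


# Hypothetical annotated beats from three patients
beats = [
    {"patient": "p01", "label": "N"},
    {"patient": "p01", "label": "VEB"},
    {"patient": "p02", "label": "N"},
    {"patient": "p03", "label": "SVEB"},
    {"patient": "p03", "label": "F"},
]

ds1, ds2 = split_by_patient(beats, ds1_patients={"p01", "p02"})
patients_ds1 = {b["patient"] for b in ds1}
patients_ds2 = {b["patient"] for b in ds2}
```

Splitting at the beat level instead would let beats from the same patient appear on both sides, which tends to give optimistic results that do not transfer to unseen patients.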
The authors of [47] took data from the MIT-BIH arrhythmia database to develop a method to detect five types of heartbeats: Non-ectopic, Supraventricular Ectopic, Ventricular Ectopic, Fusion, and Unknown. The ECG heartbeat classifier consisted of two main steps: pre-processing and classification (Figure 12). In the pre-processing stage, the signal from MIT-BIH includes noise from myoelectric interference, power line interference, and baseline drift; for this reason, a wavelet filter is applied to denoise the signal. The classification stage is composed of nine layers: four convolutional layers, two subsampling layers, two fully connected layers, and one Softmax layer. In the convolutional layers, the principal features were captured. The subsampling layers were essential to reduce the size of the layers, compress the dimension of the ECG data, reduce the computation time, and extract vital features. In the output layer, the softmax activation function was used to obtain five categories of heartbeats. Overfitting was avoided by using Linear Unit as an activation function and dropout between layers.
Kiranyaz et al. [48] show an innovative method called degradation. The degradation models are simulated from the median normal beat of the patient and were used to train the CNN. The resulting degraded signals were called abnormal beat synthesis (ABS). The design of those filters consisted of modelling the principal causes of arrhythmias, such as high blood pressure, clotting, smoking, diabetes, drugs, etc. For this paper, the beats selected were: N (normal), S (supraventricular ectopic), V (ventricular ectopic), F (fusion), and Q (unclassifiable). The 1D CNN had only four convolutional layers and two fully connected layers. The output layer size was 5, which corresponds to the number of beat classes.

FIGURE 13. A new scheme of CNN using AIoT, Cloud system, and hardware/software [12].

C. ANALYSING LESS THAN FIVE CLASSES
Lin et al. [12] outline a new scheme for ECG analysis and cardiac disease detection using the AI of Things (AIoT), hardware, a user interface, a cloud system, and an AI platform. The architecture, presented in Figure 13, was implemented with analogue circuits and commercial Bluetooth modules. The DL architecture was used to classify the user's ECG signal into different cardiac arrhythmias. The pre-processing structure included three steps: noise removal, baseline removal, and image generation; the resulting images were then used to train the CNN model. The CNN model was built with four convolutional layers and three fully connected layers, and each convolutional layer was followed by a leaky rectified linear unit (leaky ReLU) as the activation function. The model used max-pooling to extract more features, and the three fully connected layers then reduced the number of output neurons from 100 to 10 and finally to 4 output categories (Figure 13).
Similarly, Isin et al. [49] developed a diagnostic system for cardiologists. The proposed system distinguishes Right Bundle Branch Block (RBBB) beats from Paced Beats and Normal (Healthy) Beats. The data are taken from the MIT-BIH Arrhythmia Database. The method starts with signal pre-processing, in which high-pass and low-pass filters remove noise and the DC component from the ECG recordings. The following step is QRS detection. The detection of R-peaks is based on the Pan-Tompkins algorithm, which applies derivation, squaring, and integration procedures to the ECG signal. After that, feature extraction was based on the AlexNet CNN, which contains a total of eight layers, five convolutional and three fully connected, trained on the generic images of ImageNet.
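The derivation, squaring, and integration steps mentioned above can be sketched as follows. This is a simplified illustration in the spirit of the Pan-Tompkins algorithm and not the implementation used in [49]: the band-pass filter and the adaptive thresholds of the full algorithm are omitted, a fixed threshold is used instead, and the window width, threshold ratio, and synthetic signal are assumptions.

```python
def derivative(x):
    """First difference, which emphasizes the steep slopes of the QRS complex."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def squared(x):
    """Point-wise squaring makes all values positive and amplifies large slopes."""
    return [v * v for v in x]


def moving_window_integral(x, width):
    """Moving average over the last `width` samples."""
    out, running = [], 0.0
    for i, v in enumerate(x):
        running += v
        if i >= width:
            running -= x[i - width]
        out.append(running / width)
    return out


def detect_r_peaks(ecg, width=5, threshold_ratio=0.5):
    """Return the index of the ECG maximum inside each high-energy region."""
    energy = moving_window_integral(squared(derivative(ecg)), width)
    threshold = threshold_ratio * max(energy)      # fixed threshold (simplification)
    peaks, i, n = [], 0, len(energy)
    while i < n:
        if energy[i] > threshold:
            start = i
            while i < n and energy[i] > threshold:
                i += 1
            lo = max(0, start - width)             # look back to cover the integration delay
            hi = min(len(ecg), i + 1)
            peaks.append(max(range(lo, hi), key=lambda k: ecg[k]))
        else:
            i += 1
    return peaks


# Synthetic signal: flat baseline with two triangular "R waves" at samples 20 and 70
ecg = [0.0] * 100
for centre in (20, 70):
    ecg[centre - 1], ecg[centre], ecg[centre + 1] = 0.5, 1.0, 0.5

r_peaks = detect_r_peaks(ecg)
```

On real recordings the threshold has to adapt to signal and noise levels, which is what the full algorithm adds on top of these three steps.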
Mahajan et al. [50] proposed a 1-D, 12-layer CNN for recognizing raw ECG rhythms. The method targets four features in the output layer and 128 features in the fully connected layer of the CNN. There is a wide range of parameters to tune, such as feature maps, number of hidden layers, kernel size, stride, and regularization coefficient. The proposed CNN architecture uses 12 convolution layers with a filter size of 1 × 5, each followed by a batch normalization layer, an activation layer, and a dropout layer. The activation function is ReLU and, to increase the generalizability of the model, dropout layers and L2 regularization are used. In the convolutional stage, max-pooling layers were applied to control overfitting. Finally, three fully connected layers detect four classes of cardiac rhythms: Normal, AF, Other, and Noise (Figure 14).
Borde et al. [51] developed a QRS detection system based on CNN methods (Figure 15). In this research, although three assumptions were mentioned, only the CNN hypothesis was analysed. The method consisted of classifying the input signals into four classes: P-wave, QRS-wave, T-wave, or neutral. The signal segmentation process started with a preprocessing stage that consisted of taking the differential of the signals. After this, the features were extracted by two convolutional branches; the signals from the two branches were then concatenated and passed to fully connected classification layers. The last layer has only two neurons because the goal of this method is to annotate the signal with 1 for a QRS segment and 0 for a non-QRS segment.
A 6-layer deep CNN is presented by Fujita et al. [52] for automatic ECG pattern classification into classes such as Normal, AF, Atrial Flutter, and Ventricular Fibrillation. The database used was the MIT-BIH AF database (afdb). Before classification by the CNN, a pre-processing stage based on the continuous wavelet transform divided the signal into wavelets. The network then consists of convolutional layers, two max-pooling layers, and two fully connected layers. The convolutional layers extract meaningful features from the input. The max-pooling layers then reduce the dimensionality while keeping the significant features for the following operations. Finally, the fully connected layers join the neurons from the last max-pooling layer and convert the preceding outputs into a four-class (Nr, Afib, Afl, or Vfib) probability distribution. The leaky rectified linear unit (LeakyReLU) was used as the activation function for the convolution layers, and a dropout layer was applied to the output of the fully connected layer.
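To make the data flow of such a network concrete, the following sketch runs one untrained forward pass through a single convolution, LeakyReLU, max-pooling, and fully connected stage that ends in a four-class probability distribution. It is only an illustration in plain NumPy: the layer sizes, the random weights, and all function names are assumptions, and the wavelet pre-processing, dropout, and additional layers of the model in [52] are not reproduced.

```python
import numpy as np

CLASSES = ["Nr", "Afib", "Afl", "Vfib"]

def conv1d(x, kernels, bias):
    """Valid 1-D convolution: x is (length,), kernels is (n_filters, k)."""
    k = kernels.shape[1]
    windows = np.stack([x[i:i + k] for i in range(len(x) - k + 1)])
    return windows @ kernels.T + bias            # (length - k + 1, n_filters)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def max_pool1d(feature_maps, size=2):
    """Keep the maximum of each group of `size` consecutive time steps."""
    n = (feature_maps.shape[0] // size) * size
    return feature_maps[:n].reshape(-1, size, feature_maps.shape[1]).max(axis=1)

def softmax(z):
    e = np.exp(z - z.max())                      # subtract max for stability
    return e / e.sum()

def init_params(input_len=64, n_filters=4, kernel_size=5, seed=0):
    """Random, untrained weights (illustrative sizes only)."""
    rng = np.random.default_rng(seed)
    pooled_len = (input_len - kernel_size + 1) // 2
    return {
        "kernels": rng.normal(scale=0.1, size=(n_filters, kernel_size)),
        "conv_bias": np.zeros(n_filters),
        "fc_weights": rng.normal(scale=0.1,
                                 size=(pooled_len * n_filters, len(CLASSES))),
        "fc_bias": np.zeros(len(CLASSES)),
    }

def forward(x, p):
    h = leaky_relu(conv1d(x, p["kernels"], p["conv_bias"]))
    h = max_pool1d(h)                            # halve the time dimension
    logits = h.reshape(-1) @ p["fc_weights"] + p["fc_bias"]
    return softmax(logits)                       # one probability per class
```

For a 64-sample input segment, `forward(segment, init_params())` returns four probabilities that sum to one. With trained weights, the largest entry would indicate the predicted rhythm class.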
Zhou et al. [53] used the MIT-BIH database with beats labelled N, SVEB, VEB, F, and Q following the ANSI/AAMI categorization. Because the ECG contains noise, a preprocessing stage with three parts is required: denoising the ECG signal, QRS detection, and segmentation of the signal. Broadly speaking, normalization is performed to eliminate the DC level and variations in amplitude, the QRS complex is located using the Pan-Tompkins algorithm, and each beat is segmented by taking 100 points to the right and 150 to the left. The model incorporates an Extreme Learning Machine (ELM), which generates random weights between the input layer and the hidden layer; this reduces the number of hidden-layer neurons and the number of iterations. Following the conventional CNN structure, convolutional layers with the ReLU activation function build the feature maps. Each convolutional layer is followed by a pooling layer that reduces the dimensions, the amount of data, and overfitting; there are four feature maps in the first pooling layer and eight in the second. Finally, two ELM layers randomly generate input weights and hidden-layer biases, and the last layer stores the probabilities of four classes (Figure 16).
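The idea behind an Extreme Learning Machine can be sketched briefly: the input weights and hidden-layer biases are drawn at random and never trained, and only the output weights are computed, in a single least-squares step. The sketch below assumes that the feature vectors have already been produced by the convolutional stage; the tanh activation, the hidden-layer size, and the function names are illustrative assumptions, not details taken from [53].

```python
import numpy as np

def elm_fit(X, Y, n_hidden, rng):
    """Fit an ELM: X is (n_samples, n_features), Y is one-hot (n_samples, n_classes)."""
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (not trained)
    b = rng.normal(size=n_hidden)                # random hidden-layer biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y                 # output weights by least squares
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Return one score per class; the argmax is the predicted class."""
    return np.tanh(X @ W + b) @ beta
```

Because no iterative back-propagation is needed for these layers, training reduces to one pseudo-inverse computation, which is consistent with the reduction in iterations mentioned above.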

V. EXISTING APPROACHES: CHALLENGES & POSSIBLE FUTURE SCOPE
A. CHALLENGES FOR THE WEARABLE PHYSIOLOGICAL SENSOR-DRIVEN IoT AND INTELLIGENT HEALTHCARE SYSTEMS
The development of technology tools faces challenges in design, implementation, and social and economic aspects. This section gives an overview of the most common problems in wearable physiological sensor-driven IoT and intelligent healthcare systems. Wan et al. [26] presented a framework that includes several wearable sensors to analyse the health conditions of patients. The healthcare signals are blood pressure, heartbeat, and body temperature. Wan et al. emphasize that, although the device is functional, attaching sensors to some parts of the body is not comfortable for people. In addition, battery life is one of the problems with smartphones, Bluetooth, and WiFi [54]-[57]. As Raja et al. [19] mention, most wearable devices can be affected by battery and noise limitations.
Baig et al. [58] explained that wearable devices use a considerable number of sensors, which require a specific place on the body or specific body postures to provide accurate measurements. One of the technical barriers is the interference with feature extraction caused by the motion of the sensors. The authors recommended using linear filtering and detecting the R-wave peak timings from the ECG. However, most of this noise is difficult to filter in hardware because of processing limitations; as a solution, it could be filtered in software. On the other hand, although the study achieved a good outcome in a small setting, an inconsistency related to the impedance value is apparent. The sensor measurements can change according to factors such as variation in skin conductivity, which creates a challenge in the analysis of the signals. Other common issues with wearable systems are the delays associated with data loss, buffering, network communication, monitoring, or processing, and the low battery life caused by the continuous connection to Bluetooth, WiFi, or 3G/4G networks.
Possible future solutions to the battery-life and connectivity issues include wearable devices with high-performance, low-power embedded processors and low-power, long-range connectivity such as Long Range Wide Area Networks (LoRaWAN) [59]-[63].

B. CHOOSING SIGNALS
The heart disease classes considered vary widely across studies: some include five classes and others seventeen. According to the MIT database of arrhythmias, there are fifteen classes of heart disease, but these can also be grouped into five groups, as recommended by the Association for the Advancement of Medical Instrumentation (AAMI) and described in ANSI/AAMI EC57:1998/(R)2008 (ANSI/AAMI, 2008). For instance, in [24], [46]-[48], [64], the authors used models following the five classes recommended by the AAMI. These five classes are Normal (N), Supraventricular ectopic beat (S, SVEB), Ventricular ectopic beat (V, VEB), Fusion beat (F), and Unknown beat (Q). This model is also used by Acharya et al. [65] and Li et al. [64]. The study [20] also implemented an arrhythmia detection system, treating the five classes mentioned above as non-life-threatening within a grouping of life-threatening versus non-life-threatening arrhythmias.
Nonetheless, the studies [14], [17], [42], [44] used all fifteen classes to implement their models. In addition, some studies prefer a different number of classes, and there is no standard procedure for choosing which classes to analyse. This is clearly presented in [12], [24], [40], [49], [50], where it is explained that complex recognition structures require high computational performance, whereas reduced DL models contribute to the efficiency of the model and hardware. The last of these studies used different numbers of classes, starting at 11 and ending with 4.
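The grouping of the fifteen MIT-BIH beat types into the five AAMI classes described in this subsection can be expressed as a simple lookup table. The sketch below uses the beat annotation symbols commonly associated with the MIT-BIH database; the symbol assignment shown is the mapping widely used in the literature and should be checked against the standard before use. The function name and the choice to return None for non-beat annotations are assumptions.

```python
# AAMI class -> MIT-BIH beat annotation symbols (commonly used grouping).
AAMI_GROUPS = {
    "N": ["N", "L", "R", "e", "j"],   # normal and bundle branch block beats
    "S": ["A", "a", "J", "S"],        # supraventricular ectopic beats (SVEB)
    "V": ["V", "E"],                  # ventricular ectopic beats (VEB)
    "F": ["F"],                       # fusion beats
    "Q": ["/", "f", "Q"],             # paced, fused-paced, unclassifiable beats
}

# Invert the table so that each symbol points to its AAMI class.
SYMBOL_TO_AAMI = {
    symbol: group for group, symbols in AAMI_GROUPS.items() for symbol in symbols
}

def to_aami(symbols):
    """Map beat symbols to AAMI classes; unmapped symbols become None (assumption)."""
    return [SYMBOL_TO_AAMI.get(symbol) for symbol in symbols]
```

For example, `to_aami(["N", "A", "V"])` yields the classes N, S, and V, so a model trained on fifteen beat types can be evaluated on the five-class scheme by relabelling its targets.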

C. LOCALIZATION
The acquisition and classification of different heart signals are essential for doctors and nurses in the disease recognition process. For instance, in [26], [33] the authors proposed architectures to sense parameters such as heartbeat, temperature, and blood pressure. They obtained the heartbeat signal using a cardiac sensor based on plethysmography. In contrast, the technique used by Yang et al. [66] implemented three dry electrodes to obtain the signals from the area near the heart, whereas in [18] the technique involved additional locations such as the chest, heart, fingers, and extremities such as legs and arms. Singh et al. [41] presented a technique using three electrodes placed around the heart area, forming a triangle between the extremities and the stomach, following Einthoven's Triangle, and Chuang et al. [12] implemented a two-electrode technique taking the signals from the heart.
However, a considerable number of studies take their signals from databases. The authors of the recent studies [24], [46], [47], [51], [67]-[69] used only databases as the source of ECG signals. Murat et al. [20] used about 100,200 ECG beats to evaluate the different target classes, whereas the articles [17], [40] adopted 10,000 and 100 ECG fragments, respectively, to perform the analysis and classification. Most of the previous articles used the MIT database, which has contributed to the analysis, training, and visualization of heart signals. For instance, Shi et al. [44], [49] proposed a CNN model based on the MIT database of 48 ECG recordings of 30 minutes each. These beats were obtained by expert cardiologists using Holter machines some years ago. The article [45] also uses the database and mentions that the machines used correspond to standard 12-channel ECG. In the article [65], a private database of 8,258 ECG recordings was adopted, acquired through a single-channel ECG device called AliveCor. Baloglu et al. [67] used an open-access ECG database that includes 52 records of normal patients and 148 records of patients with heart problems.

D. FILTERING, TRAINING TIME AND PERCENTAGE
The studies show that filtering is another essential step, because the signals are made unstable by electromyographic noise, baseline drift, and power line interference. According to the methodologies in the reviewed papers, the filtering process can be divided into two groups. The first uses hardware such as integrated circuits and operational amplifiers. The second uses software, through image processing and mathematical expressions. Either can be applied, depending on the environment.
In the software group, the tendency is to filter the signals with specific methods expressed mathematically. For instance, in [18], [58], the authors implemented a pre-processing stage using low-pass filters, Butterworth bandpass filters, adaptive filters, and bandpass finite impulse response filters to eliminate noise in ECG signals. Isin et al. [49] eliminated the DC noise by subtracting the mean from the ECG recording, which centres the signal on zero, and argued that the low-frequency components related to breathing and movement can be eliminated by applying high-pass filters.
Ghiasi et al. [68] explained that baseline noise and low frequencies were eliminated using high-pass and low-pass filters. Then, to recognize the P, QRS, and T waves of the ECG, the authors followed the Pan-Tompkins algorithm to detect the R-peaks. Along the same lines, the authors of [33] developed an IoT platform whose processing stage is based on digital filtering: they eliminated baseline wander and high-frequency noise by applying a high-pass filter followed by a low-pass filter, both represented by mathematical equations. Similarly, in [12], an IoT application was developed with the filtering stage running on a platform; this stage was composed of an 8-point moving average filter for noise and polynomial fitting for baseline removal. In the articles [17], [24], [45], the pre-processing stage for noise reduction is mainly composed of the discrete wavelet transform, wavelet transforms, a notch filter, derivative filters, and a median-corrected filter. Additionally, the author of [70] noted that the wavelet transform is an effective time-frequency analysis tool that achieves good results in baseline wander elimination and QRS complex analysis, and the authors of [50] used wavelet decomposition to reduce noise and improve the signal-to-noise ratio: the signal is decomposed into wavelets and later reconstructed based on the variance and mean of the signal.
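The two software steps reported for [12], an 8-point moving average for noise and polynomial fitting for baseline removal, can be sketched in a few lines of NumPy. This is an illustration only, not the code used in that study: the polynomial degree is an assumption (the source does not state it), as are the function names.

```python
import numpy as np

def moving_average(x, points=8):
    """Smooth the signal by averaging over `points` neighbouring samples."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(points) / points, mode="same")

def remove_baseline(x, degree=3):
    """Fit a low-order polynomial to the signal and subtract it as the baseline."""
    x = np.asarray(x, dtype=float)
    t = np.linspace(-1.0, 1.0, len(x))           # scaled time axis for a stable fit
    coefficients = np.polyfit(t, x, degree)      # degree is an assumed value
    return x - np.polyval(coefficients, t)
```

A typical use would be `clean = remove_baseline(moving_average(raw_ecg))`. A low polynomial degree follows slow drift without absorbing the much faster QRS complexes, which is why the fit is kept deliberately simple here.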
On the other hand, the studies [41], [66] developed hardware platforms that use the internal filters of microcontroller units to process and improve the signals coming from the electrodes.
As Wan et al. [26] explained, one of the prerequisites for wearable devices is to meet the demand for real-time health monitoring and accuracy. This demand is related to the algorithm's performance. To facilitate state-of-the-art research, Tables 1 and 2 summarise the results and accuracy of the different CNN algorithms from the research papers considered in this review. It can be observed from Table 1 that the MIT-BIH standard arrhythmia database [71] is the most used dataset among these papers.

VI. DISCUSSION: COMPARATIVE ANALYSIS AND GAPS
Previous works present important studies for the development of AI, since they validate models and report a range of acceptable accuracy percentages. However, it is important to review some aspects of management and improvement at a general level. The literature reviewed raises several points of discussion about the results of the chosen articles.
First of all, biomedical devices that offer 24/7 operation are still not comfortable for patients, especially when patients need to go to the bathroom or move around. Besides, some wearable devices offer limited computation and storage capacity, and battery life remains another concern. BAN development requires regular maintenance and robustness against cyber attacks. Wan et al. [26] mentioned that real-time services should be personalized for patients and families, doctors, and nurses, because each participant needs to read only specific information about the patient.
Regarding electrodes, Merone et al. [18] showed that dry electrodes are comparable with electrodes that use gel. Dry electrodes give quick access to the signals, require no prior preparation, and can follow the possible scenarios for a patient: in-the-person, on-the-person, or off-the-person. The last one offers better working conditions because no sensor needs to be placed on the body and no preparation is needed. For example, the fewer the contact points of the electrodes, the easier and more practical the acquisition. On the other hand, regarding image processing, the authors recommend normalizing the axis values with respect to the highest value.
Hong et al. [76] focused on the acquisition and processing of information, and their review shows that there is no standard for classifying the signals. Today's databases differ in leads, durations, frequencies, and classes. The classes found across the papers vary widely and are unevenly distributed. This could generate ambiguities for some classes because, although a few of them are uncommon, patients may still suffer from them. In some reviewed papers, four types of disease are classified as unknown. It is important to include them in a formal study and, as Kiranyaz et al. [37] mentioned, to follow the classes of the Association for the Advancement of Medical Instrumentation (AAMI) to improve the quality of the results and contribute to best practices in this field. Another concern is that obtaining this information is very difficult: as of 2020, researchers still rely on the same databases from 40 years ago. The models presented in the selected papers differ in many ways: some are more complicated than others, and some depend on tuning parameters such as dropout and learning rate. This also makes them difficult to understand and implement. Hong et al. [76] suggested that researchers build complex models from simple schemes, dividing the tasks among the layers. In this way, the writer's model becomes easier for the reader to interpret.
Turning to computational cost, although the CNN structure is preferred, these models have tuning parameters that demand computation. Hybrid models can give more accurate results, but their computational cost is high because the algorithm takes a long time to execute. Ebrahimi et al. [10] emphasize that the processing complexity of DL depends on the number of floating-point operations required. This number in turn depends on factors such as hardware, compiler optimization, and the tools used, such as TensorFlow, anaconda3, and PyTorch. In addition, the presented models are complex. Implementing them on a mobile device for healthcare is therefore a great obstacle, since understanding and classifying the signals requires considerable resources. As a recommendation, simpler models should be used; few articles describe how the CNN model should be modelled and deployed in a real setting. Murat et al. [20] suggest that researchers find more scalable methods that can be integrated with mobile and cloud systems and applied under clinical standards. This implies the importance of working with wearable devices with low power consumption. For instance, Kiranyaz et al. [37] proposed a simpler and cheaper model that implements only 1-D convolutions, with individual training for each patient.
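The dependence of processing complexity on the number of floating-point operations can be made concrete with a rough count of multiply-accumulate operations (MACs) per layer. The sketch below is a back-of-the-envelope estimate under simple assumptions (no padding, no bias terms, one MAC counted as roughly two floating-point operations); the function names and the example layer sizes are illustrative and are not taken from any of the reviewed models.

```python
def conv1d_macs(input_len, in_channels, out_channels, kernel_size, stride=1):
    """Multiply-accumulate count of a 1-D convolution without padding."""
    output_len = (input_len - kernel_size) // stride + 1
    return output_len * out_channels * in_channels * kernel_size

def dense_macs(n_inputs, n_outputs):
    """Multiply-accumulate count of a fully connected layer."""
    return n_inputs * n_outputs

# Example: one 1 x 5 convolution with 4 filters on a 100-sample, single-channel
# segment, followed by a 128-to-4 fully connected layer (illustrative sizes).
total_macs = conv1d_macs(100, 1, 4, 5) + dense_macs(128, 4)
approx_flops = 2 * total_macs                    # one MAC ~ one multiply + one add
```

Estimates of this kind allow a candidate architecture to be compared against the processing budget of a wearable device before any training is attempted.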
Regarding a few concerns about the structure of the papers, ElSaadany et al. [33] presented a basic device scheme for detecting heart rhythm and predicting cardiac arrest. Although there are graphs and data associated with the experiment, nothing demonstrates to readers that the system works correctly, because extreme situations are not included in the tests of the algorithm. The data presented do not make it possible to determine whether the measurements were taken using a computer, the mobile system, or another device. Another example is the work of Wan et al. [26], where the paper is well structured but the captured results cover only a healthy patient. In this scenario, no algorithm accuracy percentage can be derived. It would be important to report such a percentage and to include other scenarios that test how the algorithm behaves.
The authors also presented a few recommendations at the end of their research. For instance, Lin et al. [12] recommend that future developers simplify the presented model, test it on an electronic chip, and extend it to detect other types of arrhythmia. Yildirim et al. [17] emphasize that a short-time processing model should be used for potential real-time applications in telemedicine, mobile devices, or cloud computing. Here, the authors give guidelines for what a clinical scenario could look like: the patient is monitored during a specific time frame, and the patient's ECGs are then processed in the cloud. This can lead to a potential reduction in real-time processing. Zhang et al. [24] proposed a CNN and LSTM structure that extracts implicit and sequential features and can be used in a medical decision support system to help doctors diagnose arrhythmias automatically using wearable devices. Finally, Fujita et al. [52] proposed a model that does not require R-peak detection and could be implemented in the health-care industry as an additional decision support tool to assist physicians with diagnosis. If necessary, it can also be used at home for patients of a wide age range or relatives with heart problems.
Further research and innovations for arrhythmia detection using IoT, AI and Visualization are required to acquire, detect, and classify different ECG heart rhythms taken from patients. Furthermore, future research directions could use forecasting techniques to predict potential heart diseases.

VII. CONCLUSION
In this paper, we have discussed the main methodologies that relate the study and classification of heart diseases to CNN models. The state-of-the-art research and industrial applications have been provided along with the findings from the literature, through which the key challenges and recommendations have been outlined. It has been shown that very few electronic devices allow the detection and classification of the different classes of heart disease. Most of the studies cover the learning and testing stages without including any devices other than computers. This observation, together with the recommendations about where the electrodes should be located, the embedded system to be used and its challenges, and the CNN model implemented, motivates our future work, which is to implement an IoT-CNN enabled wearable device for early detection of arrhythmia patterns.
ANDRES FELIPE GUERRERO was born in Bogota, Colombia. He received the Bachelor of Science (B.Sc.) degree in electronic and electrical engineering from Los Andes University, in 2015. In 2015, he joined a Research Group with Los Andes University, where he worked on developing an electronic system to generate electricity using bacteria as an aid to a vulnerable community in Colombia. In 2016, he joined Electronic Corporation, Bogota, where he principally worked on designing PCB layouts using CAD tools, programming microcontrollers, and analyzing hardware to design circuits. His research interests include analog and digital signals, embedded systems, and sensors. He is pursuing his master's studies focused on developing techniques to classify heart diseases using artificial intelligence and embedded systems. He has collaborated actively with researchers in several other disciplines of embedded sensors applied for environmental and military purposes. After his studies, he started working in a software company as a Support Engineer, where he was involved with the IoT devices and software support. Simultaneously, he started working in a start-up as a Software Engineer, working with Garmin devices, sensors, and embedded systems. He is currently working as an IT Engineer, collaborating with the Technical and Development Team, providing data tools to analyze behaviors and status of the IoT devices, applying machine learning, and data analysis. This work was presented in the 1st International Conference on Bioresource Technology for Bioenergy and Environmental Sustainability, Sitges, Spain.
QUOC-TUAN VIEN (Senior Member, IEEE) received the Ph.D. degree in telecommunications from Glasgow Caledonian University, U.K., in 2012. He is currently a Senior Lecturer with the Faculty of Science and Technology, Middlesex University, U.K. He has authored a textbook, coauthored four books, five book chapters, and more than 90 research papers in ISI journals and major conference proceedings. His current research interests include physical-layer security, network coding, non-orthogonal multiple access, RF energy harvesting, device-to-device communications, heterogeneous networks, network-on-chip, and the Internet of Things. He was a recipient of the Best Paper Award from IEEE/IFIP 14th International Conference on Embedded and Ubiquitous Computing in 2016. He also serves as a Program Co-Chair for the INISCOM (2018-2022) and a Technical Symposium Co-Chair for the SigTelCom (2017-2021). He also serves as an Editor for the Wireless Communications and Mobile Computing and the International Journal of Digital Multimedia Broadcasting, and a Guest Editor of the EAI Endorsed Transactions on Industrial Networks and Intelligent Systems and the Mobile Networks and Applications. He is also a frequent reviewer of the IEEE journals and a TPC member of the IEEE conferences. He was honored as an Exemplary Reviewer of the IEEE COMMUNICATIONS LETTERS, in 2017. VOLUME 10, 2022