ANNet: A Lightweight Neural Network for ECG Anomaly Detection in IoT Edge Sensors

Abstract: In this paper, we propose a lightweight neural network for real-time electrocardiogram (ECG) anomaly detection and system-level power reduction of wearable Internet of Things (IoT) edge sensors. The proposed network uses a novel hybrid architecture consisting of Long Short Term Memory (LSTM) cells and Multi-Layer Perceptrons (MLP). The LSTM block takes a sequence of coefficients representing the morphology of ECG beats, while the MLP input layer is fed with features derived from the instantaneous heart rate. Simultaneous training of the blocks pushes the overall network to learn distinct, complementary features for making decisions. The network was evaluated in terms of accuracy, computational complexity, and power consumption using data from the MIT-BIH arrhythmia database. To address the class imbalance in the dataset, we augmented the training data using the SMOTE algorithm. The network achieved an average classification accuracy of 97% across several records in the database. Further, the network was mapped to a fixed-point model, retrained in a bit-accurate fixed-point environment to compensate for quantization error, and ported to an ARM Cortex M4 based embedded platform. In laboratory testing, the overall system was successfully demonstrated, and a significant power saving of ≃50% was achieved by gating the wireless transmission using the classifier: wireless transmission was enabled only to transmit the beats deemed anomalous by the classifier. The proposed technique compares favourably with current methods in terms of computational complexity and has the advantage of stand-alone operation in the edge node, without the need for always-on wireless connectivity, making it ideal for IoT wearable devices.


I. INTRODUCTION
Cardiovascular diseases (CVD), such as coronary heart disease (CHD), stroke, and other circulatory diseases, account for roughly 30% of all global deaths in any given year. In addition, CVD is a leading cause of premature death and the primary driver of morbidity among all noncommunicable diseases (NCD) [1]. It is estimated that CVDs cost the European Union approximately €169 billion annually, of which 62% is direct healthcare costs and the remainder is productivity loss and informal care [2].
Continuous monitoring of physiological signals such as the ECG using IoT-enabled wearable devices is widely considered a way to mitigate the costs and healthcare risks associated with CVDs [3]. Although the concept itself is not new, continuous monitoring of medical-grade ECG is not yet a practical reality because of the large power consumption associated with constant wireless transmission. Analysing the data for potential anomalies at the IoT sensor itself can reduce the need for constant wireless transmission and thus reduce sensor power consumption.
There are several methods [4] reported in the literature for automatic multi-class ECG classification and simple anomaly detection using signal processing [5] and machine learning techniques [6], [7], [8], [9], [10]. However, many existing works exhibit flaws that make them non-ideal for real-world implementations. Several works attempt to classify ECG beats into multiple classes (Normal: N, Ventricular: V, Supra-Ventricular: S, Fusion: F, and Unclassified: Q). Multi-class classification on edge sensors is largely redundant and may not be practical, due to the computational complexity involved. In addition, from the perspective of the user of such a device, multi-class classification brings limited added value compared to simple anomaly detection. Some works ([6], [11]) have implemented binary beat classification (Normal vs Abnormal); however, the computational complexity involved is still high and may result in large power consumption. The Association for the Advancement of Medical Instrumentation (AAMI) [12] has established protocols for testing medical instruments, yet several existing works ([13], [14]) have not taken these into account. The typical sampling rate of ECG in real-world sensors is around 250 Hz, whereas existing works evaluate performance on MIT-BIH records at their original, rather unusual, sampling rate of 360 Hz, which yields higher but unrealistic performance. MIT-BIH records are highly imbalanced in terms of class distribution, and this imbalance is not carefully handled by much of the existing work ([15], [16]); in real-world scenarios, this introduces sampling bias and overfitting in these reported works. Many works have also failed to produce a fixed-point model, even though fixed-point arithmetic is the common and cost-effective environment in most IoT devices. Converting floating-point algorithms to fixed point introduces quantization errors and performance degradation, which is not addressed well in existing works. All of the above research gaps, along with the deteriorating performance of existing approaches under unseen real-world conditions, are addressed in our research.
This work aims to address the aforementioned problems by developing a low-complexity machine learning algorithm for binary classification of the ECG signal that can be implemented locally on an IoT sensor. Wireless transmission is enabled only when an ECG beat is deemed anomalous by the classifier, thereby reducing sensor power consumption. In addition, the class imbalance in the MIT-BIH Arrhythmia database is addressed by augmenting the training data using the Synthetic Minority Oversampling TEchnique (SMOTE). This reduces the disparity between the real-world performance of the proposed technique and its performance on the test data.
The proposed novel architecture combines a Long Short Term Memory (LSTM) based recurrent block, which identifies the regularity of time-series data, with a simple Multi-Layer Perceptron (MLP) based block, which learns the underlying relationship between the extracted features, such as the activation maps of the Principal Component Analysis (PCA) coefficients of a sequence of beats and the ventricular rates. Our novel approach of training all blocks simultaneously pushes the overall architecture to learn different properties of the sequence that complement each other when making a decision, comparable to an ensemble learning approach [11].
For this work, we also implemented floating- and fixed-point versions of various machine learning building blocks and introduced fast approximate functions and their derivatives to facilitate model development in a floating-point environment and the subsequent mapping to a fixed-point implementation. Our approach differs from the widely used TensorFlow approach, in which the implementation loss (due to pruning, quantization, and look-up-table based approximation) usually remains in the network; instead, we retrain the network in a bit-accurate fixed-point environment to compensate for this loss. The network proposed in this paper has a significantly smaller footprint (number of parameters and complexity) while achieving state-of-the-art performance. The fixed-point model is deployed and tested on an ARM Cortex M4 based Nordic Semiconductor nRF52DK Bluetooth development kit. A significant power saving of ≃50% is achieved compared to sending every sample over a Bluetooth link. The low complexity of the proposed classifier, the system-level power reduction of the sensor, and a reliable real-world performance estimate obtained using data augmentation make the proposed approach a good choice for IoT edge applications.
The rest of the paper is organized as follows. Section II explains the currently available solutions and algorithms for ECG classification and anomaly detection. Section III details the ECG dataset used in this work and the pre-processing performed on the data. Section IV provides details of the proposed neural network architecture and its fixed-point implementation. Section V presents the performance analysis and a comparison of the proposed method with previous approaches. The conclusions are presented in Section VI.

II. RELATED WORKS
There are many approaches in the literature for automatic detection and classification of cardiac arrhythmias from ECG signals. Arrhythmia classification is a pattern recognition task that can be performed using syntactic or machine learning methods [17]. In traditional syntactic methods, ECG signal features are carefully extracted using signal processing techniques such as frequency domain analysis, the wavelet transform (WT), and morphological analysis, after which hand-engineered algorithms and rules are applied to the extracted features to detect arrhythmia. Machine learning-based methods, such as Decision Trees, Random Forests, K-Nearest Neighbours, Support Vector Machines (SVM), Artificial Neural Networks (ANN), Reservoir Computing with logistic regression (RC), Linear Discriminants (LD), Hidden Markov Models (HMM), hyperbox classifiers, optimum-path forests, conditional random fields, rule-based models, and Bayesian models, use a combination of signal features and morphologies as feature vectors to classify ECG signals [18]. However, the accuracy of these methods strongly depends on the selected learning technique and the nature of the training data, and the data are often limited, with large variation in morphology between patients.
In Veeravalli et al. [17], Fast Dynamic Time Warping (FDTW) with a constraint window is used to form a cost feature matrix between the first 30 beats of a patient's record, and K-means clustering is used to find the largest cluster, from which a beat is nominated as the global normal beat for that patient. Thereafter, the DTW distances between all incoming beats and the selected global normal beat are computed, and anomalies in the data are detected using a Hampel filter. However, the approach does not address the case in which multiple classes (i.e., Normal and Abnormal) are not present during the initial clustering phase, and the most frequently occurring beat may not always be the clinically normal beat. In addition, K-means clustering is an NP-hard problem, and performance was evaluated using only 15 records selected from the MIT-BIH arrhythmia database.
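To make the DTW step concrete, the following C sketch computes a classic DTW distance restricted to a Sakoe-Chiba constraint window of half-width w. It is an illustrative assumption rather than the FDTW algorithm used in [17]; the absolute-difference local cost and the function name dtw_band are our own choices, and the Hampel-filter stage is not shown.

```c
#include <math.h>
#include <stdlib.h>

/* Classic DTW distance between a[0..n-1] and b[0..m-1], restricted to cells
 * with |i - j| <= w (Sakoe-Chiba band). Local cost is |a[i] - b[j]|.
 * Returns INFINITY if memory allocation fails. */
double dtw_band(const double *a, int n, const double *b, int m, int w)
{
    if (w < abs(n - m)) w = abs(n - m);          /* band must reach the end cell */
    double *D = malloc((size_t)(n + 1) * (size_t)(m + 1) * sizeof *D);
    if (!D) return INFINITY;
    for (int i = 0; i <= n; i++)
        for (int j = 0; j <= m; j++)
            D[i * (m + 1) + j] = INFINITY;
    D[0] = 0.0;                                  /* empty prefixes align at zero cost */
    for (int i = 1; i <= n; i++) {
        int jlo = (i - w > 1) ? i - w : 1;
        int jhi = (i + w < m) ? i + w : m;
        for (int j = jlo; j <= jhi; j++) {
            double best = D[(i - 1) * (m + 1) + j];                  /* step in a only */
            if (D[i * (m + 1) + j - 1] < best) best = D[i * (m + 1) + j - 1];             /* step in b only */
            if (D[(i - 1) * (m + 1) + j - 1] < best) best = D[(i - 1) * (m + 1) + j - 1]; /* diagonal */
            D[i * (m + 1) + j] = fabs(a[i - 1] - b[j - 1]) + best;
        }
    }
    double d = D[n * (m + 1) + m];
    free(D);
    return d;
}
```

A time-stretched copy of a beat template, for example, aligns at zero cost once w is at least the stretch, whereas a plain sample-by-sample (w = 0) comparison would penalise the shift.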
Zadeh et al. [13] use a bandpass filter for preprocessing and an SVM classifier based on features from a Continuous Wavelet Transform. The approach achieved 97% test accuracy for Normal (N) vs Abnormal (S, V, F, and Q) classification over 17,784 beats from a limited set of 8 selected patient records (118, 124, 207, 208, 209, 214, 222, and 223). Similarly, in Jiang et al. [14], a block-based neural network with Hermite transform features is used over 49,600 selected beats to achieve 95.6% accuracy in identifying abnormal beats. However, the beat selection criteria used in that work were not specified.
Dan Li et al. [15] introduced a 1D Convolutional Neural Network (CNN) to classify ECG signals and achieved more than 98% test accuracy on 13,200 selected beats, using balanced downsampling so that the AAMI classes are equally represented. Wavelet decomposition is used to preprocess the ECG signals, and a SoftMax classifier is used in the neural network. This approach depends purely on local morphological information and ignores simple yet rich temporal features, making it unfit for generalized tests. Kiranyaz et al. [16] proposed an adaptive 1D CNN trained with both global and patient-specific data. The global part of the training set contains 245 representative beats, comprising 75 beats each of types N, S, and V, 13 of type F, and 7 of type Q, randomly sampled from the first 20 records (100-124). The first 5 min of data from each record are used for patient-specific training for that subject. For testing, all 44 non-paced records were considered, and a test accuracy of 97% (N vs A (S, V, F, and Q)) was achieved. However, the approach depends heavily on the patient-specific training data and its characteristics during the first 5-minute interval.
In [11], a novel Recurrent Neural Network (RNN) cell-based architecture is proposed to capture the temporal and spatial patterns of ECG signals. It uses ECG WT coefficients and RR-interval properties in one RNN pipeline (Model Alpha), and PCA components of a concatenated vector (WT, downsampled ECG beat, and RR-interval-based features) in another RNN-based pipeline (Model Beta). Each model is trained individually, and their results are blended by a new MLP network for better performance. A patient-specific training procedure is followed, along with global data collected from records 100-124 of the MIT-BIH arrhythmia database. A total of 49,632 beats were tested, achieving 98% test accuracy and an F1 score of ∼92%. However, this method requires two ECG leads, which are difficult to acquire with a low-profile wearable device [19]. It is also unusual in that different portions of the concatenated feature vector are presented to the RNN on each invocation, whereas normally the RNN input vector contains the same set of features each time.
Das and Ari [20] proposed a combined feature vector of 4 temporal features (pre-RR, post-RR, local-avg-RR, and global-avg-RR), 8 S-transform based features, and 20 wavelet-decomposition based statistical features acquired from a single ECG beat. The proposed architecture uses an MLP-based neural network to classify ECG beats. A six-fold cross-validation test over 24 records from the MIT-BIH arrhythmia database achieved 94.5% accuracy after patient-specific training with 5 min of data from the respective ECG records and 200 global random training beats from the first 20 records.
Finally, [6] proposes a Two-Stage Neural Network (TSNN) that achieves 97.8% and 98.6% test accuracy over 48,310 individual beats without and with biased training, respectively. In the first stage, an MLP network takes raw ECG beats as input and classifies each beat as Normal or Abnormal. The Abnormal beats from the first stage are fed to a second stage, where a CNN classifies them into the AAMI classes N, S, V, F, and Q.
For the performance evaluation of an ECG classifier, data from the same subject should not be used for both training and testing. This ensures that the classifier's performance is evaluated on previously unseen records. In addition, the maximum duration of data used for training should follow the limits imposed by the AAMI protocol. Although many works in the literature broadly follow these tenets, only a few authors have taken explicit precautions to follow the AAMI protocol precisely when reporting results, which makes fair comparisons between published works difficult.

III. DATA SET AND PRE-PROCESSING
In this study, the MIT-BIH arrhythmia database [21], which contains 48 ECG records, is used for performance evaluation; the records containing paced beats are excluded. Twenty-three of the records are intended to serve as a representative sample of routine clinical recordings, and the remaining 25 records contain complex ventricular, junctional, and supraventricular arrhythmias. The records are bandpass filtered at 0.1-100 Hz and sampled at 360 Hz. The database contains over 100,000 labeled beats of 15 different heartbeat types. Each record has two ECG leads: the first is modified limb lead II (ML II), and the second is modified lead V1 (or, in some cases, V2, V4, or V5). Each 30-minute record, selected from 24-hour recordings, was independently annotated by two or more cardiologists [14], [6]. For the purpose of this work, we group the beats labeled as Supraventricular Ectopic Beats (SVEB, S), Ventricular Ectopic Beats (VEB, V), Fusion Beats (F), and Unclassified Beats (Q) as 'Abnormal', and the remaining beats as 'Normal'. This categorization is consistent with the AAMI standards.
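The Normal/Abnormal grouping above can be expressed as a lookup on the MIT-BIH beat annotation symbols. The sketch below assumes the commonly used AAMI mapping (N: N, L, R, e, j; S: A, a, J, S; V: V, E; F: F; Q: /, f, Q); the function name aami_is_normal is hypothetical, and non-beat annotations are reported separately so the caller can skip them.

```c
#include <string.h>

/* Returns 1 if the MIT-BIH beat symbol falls in the AAMI 'N' class
 * (N, L, R, e, j), 0 if it belongs to a class grouped here as 'Abnormal'
 * (S: A a J S, V: V E, F: F, Q: / f Q), and -1 for any other annotation
 * (rhythm changes, noise markers, etc.), which is not a beat label. */
int aami_is_normal(char sym)
{
    if (sym == '\0') return -1;            /* strchr would match the terminator */
    if (strchr("NLRej", sym)) return 1;
    if (strchr("AaJSVEF/fQ", sym)) return 0;
    return -1;
}
```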
ECG is usually affected by various noises, such as baseline wander (low-frequency noise in the range of 0-0.3 Hz), electrode contact noise, motion artifacts, and power line interference (PLI), which affect the efficacy of signal analysis [22]. Many real-world ECG devices perform baseline wander and PLI removal during acquisition and present a clean ECG signal at a typical sample rate of 250 Hz. To emulate this, we perform the following data processing steps:
• Discrete Wavelet Transform (DWT) based denoising [18]
• PLI removal using a standard IIR notch filter at 60 Hz
• Re-sampling from 360 Hz to 250 Hz
An illustration of ECG noise removal using the above steps is shown in Figure 1. Additionally, we use the R-peak location annotations from the MIT-BIH database [21] directly instead of implementing our own R-peak detector. Since several existing works achieve good accuracy for R-peak detection [23], [24], [25], we narrowed the scope of this work to focus exclusively on developing the classifier.
For each beat, we extract a segment extending from 250 ms before to 450 ms after the R-peak location. According to [18], this segment is sufficient to capture the entire beat, including the P and T waves. At a 250 Hz sampling rate, this segment corresponds to a vector of 175 samples, which is the basic unit on which our algorithm operates each time an R-peak is detected.

IV. PROPOSED ARCHITECTURE
This section provides the details of the proposed lightweight neural network, ANNet, starting with the feature vector formulation in section IV-A, followed by the details of network architecture in section IV-B.A fixed-point implementation of the proposed network is then discussed in section IV-C.

A. Feature Vectors
Fig. 3: LSTM Cell Pipeline as part of LSTM X (Fig. 2)
Two feature vectors are derived from the original ECG data samples:
1) X: For each beat, we compute $X_i$, a vector of Principal Component Analysis (PCA) coefficients of length 6 (with respect to the principal components of the Normal (N), RBBB (R), LBBB (L), Ventricular (V), Supra-Ventricular (S), and Fusion (F) beats, respectively), where i is the current beat index. An illustration of PCA feature vector construction for 2 random beats is shown in Fig. 4. The same process is applied to every beat in the 5-beat window.
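Each PCA coefficient in $X_i$ is an inner product between a normalized beat window and one stored principal component. The C sketch below assumes the six components are stored row-wise in a flat array of 6 x 175 values and that normalization consists of an optional mean subtraction; both choices, and the name pca_coeffs, are assumptions rather than the paper's implementation.

```c
#include <stddef.h>

#define BEAT_LEN 175   /* samples per beat window at 250 Hz */
#define N_PC 6         /* one stored component per reference beat type */

/* coeff[k] = sum_j pc[k][j] * (beat[j] - mean[j]); pc is row-major
 * (N_PC rows of BEAT_LEN values). Pass mean = NULL to skip centring. */
void pca_coeffs(const double *beat, const double *mean,
                const double *pc, double *coeff)
{
    for (int k = 0; k < N_PC; k++) {
        double s = 0.0;
        for (int j = 0; j < BEAT_LEN; j++) {
            double v = beat[j] - (mean ? mean[j] : 0.0);
            s += pc[(size_t)k * BEAT_LEN + j] * v;
        }
        coeff[k] = s;
    }
}
```

Applying this to each of the five beats $X_{i-4}, \dots, X_i$ yields the sequence consumed by the LSTM X block.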
2) RR: The second feature vector is based on ECG RR-interval information and is defined as the vector $[RR_i, RR_{i+1}, \overline{RR}_i, RR_{wSDNN_i}, RR_{Index_i}]$ of length 5. The first two elements are the RR intervals immediately before and after the current ECG beat, respectively. The third element is the average of the 11 RR intervals from $RR_{i-9}$ to $RR_{i+1}$. $RR_{wSDNN}$ and $RR_{Index}$ are Heart Rate Variability (HRV) metrics based on [26].
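The first three elements of RR can be computed directly from R-peak sample indices. In the sketch below (hypothetical name rr_features), intervals are converted to seconds at 250 Hz, and the 11-interval average exploits the fact that consecutive intervals telescope, so their sum is simply $R_{i+1} - R_{i-10}$. The two HRV metrics follow [26] and are not sketched here.

```c
#define FS_HZ 250.0

/* r[0..11] are consecutive R-peak sample indices R_{i-10} ... R_{i+1},
 * so the current beat i sits at r[10] and RR_k = R_k - R_{k-1}.
 * out[0] = RR_i, out[1] = RR_{i+1}, out[2] = mean of RR_{i-9} ... RR_{i+1},
 * all in seconds. */
void rr_features(const long *r, double *out)
{
    out[0] = (double)(r[10] - r[9]) / FS_HZ;          /* interval before beat i */
    out[1] = (double)(r[11] - r[10]) / FS_HZ;         /* interval after beat i */
    out[2] = (double)(r[11] - r[0]) / FS_HZ / 11.0;   /* 11 intervals telescope */
}
```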

B. Neural Network Architecture
The proposed network is composed of three main blocks, LSTM X, MLP R, and a blending block with MLP layers, as shown in Figure 2. The LSTM-based recurrent block is selected to identify the regularity of the typical time-series signal, while the other, simpler extracted features are handled by MLP layers that learn the underlying relationships needed to predict abnormal beats. Compared with ensemble learning of separately trained models [11], simultaneous training of the blocks results in complementary learning, i.e., the blocks tend to learn different properties of the sequence and complement each other when making a decision.
The LSTM X block (Figure 2) generates an attention map at its output (Y), which keeps track of the properties of past beats, as in the work by Zhang et al. [27]. Figure 3 shows the LSTM X block in more detail. The LSTM cell is executed 5 times for each ECG beat, with inputs $X_{i-4}$ through $X_i$ in sequence. The output of the LSTM cell, h (of length 10), and the internal state vectors are updated at each execution. Additionally, each output vector h is presented to the MLP L layer, which generates a vector $Y_k$ of length two, as shown in Figure 3. These two-element outputs are concatenated across the 5 execution cycles to form the vector Y of length 10.
In parallel to the LSTM X block, an MLP R layer (Figure 2) takes the feature vector RR as input and produces an output, $RR_1$, of length 2. This output is concatenated with the vector Y from the LSTM X block and passed to an MLP network (MLP 1, MLP 2) with one hidden layer of 5 neurons and 2 outputs ($C_3$). These two outputs are then passed to a SoftMax layer, which classifies the beat as Normal (N) or Abnormal (A).
The learning rate ε is set to 0.001 and the momentum value β of the stochastic gradient descent algorithm is set to 0.9. A mini-batch size of 128 is used, and the error between the prediction and the ground truth is used as the cost for backpropagation learning. The parameters and complexity of the network are analyzed further in Section V.
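For reference, one common form of stochastic gradient descent with momentum (the classical heavy-ball update) is sketched below, with ε = 0.001 and β = 0.9 as in the text. Whether ANNet's training uses exactly this form is an assumption, sgdm_step is a hypothetical name, and the gradient is assumed to be already averaged over the 128-sample mini-batch.

```c
/* One momentum step over n parameters:
 *   v <- beta * v - lr * grad
 *   w <- w + v
 * v must be zero-initialised before the first step. */
void sgdm_step(double *w, double *v, const double *grad, int n,
               double lr, double beta)
{
    for (int i = 0; i < n; i++) {
        v[i] = beta * v[i] - lr * grad[i];
        w[i] += v[i];
    }
}
```

With a constant gradient, the velocity builds up geometrically towards lr / (1 - beta) times the gradient, which is what lets momentum traverse shallow regions of the cost surface faster than plain SGD.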

C. Fixed Point Implementation
ANNet was initially implemented and trained in MATLAB using floating-point arithmetic (denoted "Float" in Section V). A fixed-point representation of the network reduces complexity so that the model can be deployed on a cost-effective, low-profile embedded system. To port the algorithm to a fixed-point embedded platform, we first replaced the activation functions in the model with fast approximate versions and then mapped these to a fixed-point implementation. The model was implemented in C for the embedded platform. A bit-accurate fixed-point version of the model was also created in MATLAB. This allowed us to take the previously trained coefficients of ANNet's floating-point version as a starting point and to retrain the model (in MATLAB) using fixed-point arithmetic. Note that this retraining step also requires the derivatives of the fixed-point activation functions for back-propagation; these are not implemented in the embedded environment, as they are only used during retraining. The details of the approximation functions and their mapping to fixed-point arithmetic are given in Sections IV-C1 to IV-C3. We use the notation "Qm.n" for our fixed-point number format, where all quantities are 2's complement signed numbers with m integer bits (including the sign) and n fractional bits. Accordingly, the resolution is $2^{-n}$ and the range is from $-2^{m-1}$ to $2^{m-1} - 2^{-n}$.
As we are targeting an embedded implementation, all variables are either 16 or 32 bits wide.
The number of fractional bits used in the fixed-point model of ANNet was determined experimentally. We retrained ANNet for various numbers of fractional bits and observed the performance, as shown in Figure 5. Based on this analysis, n = 6 fractional bits were chosen.

1) Sigmoid Activation Function (σ):
We used the sigmoid activation function for the neurons in ANNet. The original sigmoid (Eq. 3) and its derivative (Eq. 4) are:
$$ \sigma(x) = \frac{1}{1 + e^{-x}} \tag{3} $$
$$ \sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr) \tag{4} $$
The sigmoid function cannot easily be implemented in a fixed-point environment because it contains an exponential. Therefore, it is replaced by a fast approximation, $\hat{\sigma}(x)$, the Elliott activation function [28] (Eqs. 5, 6), to facilitate implementation on an embedded device. These functions can be implemented in fixed-point arithmetic with n fractional input bits³, $\tilde{x} = \lfloor 2^n x \rfloor$, and n fractional output bits⁴.
Fig. 6: Illustration of the sigmoid activation function, $\sigma(x)$, its approximation $\hat{\sigma}(x)$, and their respective derivatives, as computed for n = 12.
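The following C sketch assumes the common sigmoid-range form of the Elliott function, $\hat{\sigma}(x) = 0.5\,x/(1+|x|) + 0.5$, with derivative $0.5/(1+|x|)^2$, and derives integer expressions for n fractional bits by substituting $x = X/2^n$. It illustrates how such a function maps to fixed point; the paper's exact variant and integer formulas may differ, and the function names are hypothetical. C integer division truncates towards zero, which differs from floor for negative quotients.

```c
#include <stdint.h>

/* Y = X * 2^(n-1) / (2^n + |X|) + 2^(n-1), i.e. 0.5*x/(1+|x|) + 0.5 in Q.n. */
int32_t elliott_sigmoid_q(int32_t X, int n)
{
    int64_t ax = X < 0 ? -(int64_t)X : (int64_t)X;
    int64_t half = (int64_t)1 << (n - 1);
    int64_t num = (int64_t)X * half;
    int64_t den = ((int64_t)1 << n) + ax;     /* (1 + |x|) scaled by 2^n */
    return (int32_t)(num / den + half);
}

/* Derivative 0.5 / (1 + |x|)^2 in Q.n: with d = 2^n + |X|,
 * the result is 2^(3n-1) / d^2. Used only during re-training. */
int32_t elliott_sigmoid_deriv_q(int32_t X, int n)
{
    int64_t d = ((int64_t)1 << n) + (X < 0 ? -(int64_t)X : (int64_t)X);
    return (int32_t)(((int64_t)1 << (3 * n - 1)) / (d * d));
}
```

With n = 6, an input of 64 (x = 1.0) gives 48, i.e. 0.75, matching $0.5 \cdot 0.5 + 0.5$, and the derivative there is 8, i.e. 0.125.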
2) Tanh Activation Function: The tanh function is mainly used inside the LSTM cell as the activation function of the candidate gate $C_t$ (Figure 3). The original function (Eq. 9) and its derivative (Eq. 10) are:
$$ \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{9} $$
$$ \tanh'(x) = 1 - \tanh^2(x) \tag{10} $$
A fast approximate fixed-point implementation inspired by the work of Anguita et al. [29] is used in this work, with its coefficients modified to allow efficient bit-level manipulation.
³ All rounding in this work is by truncation.
⁴ Multiplication by $2^k$ and $2^{-k}$ represents a k-bit shift to the left and right, respectively. Right shifts implicitly truncate the discarded bits (rounding towards negative infinity in 2's complement arithmetic), so this rounding is not shown explicitly. The word sizes are chosen so that no overflow occurs when left shifting.
An approximate version of the tanh function, $\hat{t}(x)$, is used, and it can be implemented in fixed-point arithmetic with n fractional input and output bits.

3) SoftMax function: This is primarily used in the classification layer. Rather than relating to a single neuron's output, it computes a normalised vector, s (Eq. 15), from the vector x of outputs of the last fully connected layer:
$$ s_k = \frac{e^{x_k}}{\sum_{j=1}^{|x|} e^{x_j}}, \qquad k = 1, \dots, |x| \tag{15} $$
where |x| is the cardinality of the vector x; for our binary classifier, |x| = 2. We used normalisation with an offset of 1 as an approximate version of the SoftMax function, $\hat{s}(x)$, which can likewise be implemented in fixed-point arithmetic with n fractional input and output bits.

The nRF52DK development kit from Nordic Semiconductor with Segger Embedded Studio is used to deploy and test ANNet in real time. The experimental setup is shown in Figure 8. A Nordic nRF PPK (Power Profiler Kit) shield, together with the development kit, is connected to a PC and used to measure power consumption with the Nordic Power Profiler software. An FTDI FT4222H SPI bridge is configured to act as an ECG sensor by transferring signals from MIT-BIH arrhythmia test records stored on the PC, emulating a real sensor. The ECG signal is pre-processed offline as described in Section III to create a 250 Hz signal, and is quantized to 16-bit signed fixed-point numbers with 4 integer bits followed by 12 fractional bits (Q4.12 format), along with a flag indicating R-peak locations. This is the input to our embedded fixed-point ANNet classifier. The overall flow diagram is shown in Figure 9.
1) Preprocessor and Feature Extraction: The inputs, outputs, and internal calculations within these blocks are implemented using 16-bit Q4.12 fixed-point arithmetic. A ring buffer (length 1000) stores incoming samples, and another ring buffer (length 11) records the R-peak locations within the sample buffer. For each detected beat, a window of 175 samples (62 samples before the R-peak, the R-peak itself, and 112 samples after it) is taken and normalized, and the PCA coefficients are then calculated using the stored principal components. Additionally, the RR features (Section IV-A2) are calculated from the R-peak locations. The combined feature vector is then fed to the ANNet module for classification.
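The sample ring buffer and 175-sample window extraction can be sketched as follows. Here 62 and 112 are 250 ms and 450 ms at 250 Hz rounded down, samples are held as int16_t Q4.12 values, and R-peaks are addressed by absolute sample index; the struct and function names are hypothetical, and the second ring buffer for R-peak locations is not shown.

```c
#include <stdint.h>

#define RB_LEN 1000             /* sample ring buffer length */
#define PRE 62                  /* samples before the R-peak */
#define POST 112                /* samples after the R-peak */
#define WIN (PRE + 1 + POST)    /* 175-sample beat window */

typedef struct {
    int16_t buf[RB_LEN];        /* Q4.12 samples */
    uint32_t count;             /* total samples written so far */
} ring_t;

void ring_push(ring_t *rb, int16_t s)
{
    rb->buf[rb->count % RB_LEN] = s;
    rb->count++;
}

/* Copies samples r_abs-PRE ... r_abs+POST into out. Returns 0 on success,
 * or -1 if the window is incomplete or already overwritten. */
int extract_beat(const ring_t *rb, uint32_t r_abs, int16_t *out)
{
    if (r_abs < PRE || r_abs + POST >= rb->count) return -1;   /* not yet received */
    if (rb->count - (r_abs - PRE) > RB_LEN) return -1;         /* oldest sample lost */
    for (int k = 0; k < WIN; k++)
        out[k] = rb->buf[(r_abs - PRE + (uint32_t)k) % RB_LEN];
    return 0;
}
```

A beat can only be classified once its last post-R sample has arrived, so in this sketch the classifier lags the R-peak by 112 samples (448 ms).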
2) ANNet: The ANNet module uses 32-bit arithmetic in Q26.6 fixed-point format, corresponding to a resolution of $2^{-6}$. The large word size is used to avoid overflows in intermediate results throughout the network. The classification results are then sent back to the PC to verify them and to obtain performance metrics. The details of this block are described in Section IV-B.
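A fully connected layer in Q26.6 can be sketched as below. This sketch accumulates the 64-bit products (which carry 12 fractional bits) and shifts once per output, which is one possible choice; shifting after each product would round differently. The function name dense_q26_6 is hypothetical, and activation functions are applied separately.

```c
#include <stdint.h>

#define QN 6   /* fractional bits of Q26.6 */

/* y[j] = b[j] + sum_k W[j][k] * x[k], with W row-major (n_out x n_in)
 * and all values in Q26.6. Products are accumulated in 64 bits
 * (12 fractional bits) and rescaled by one right shift of QN per output. */
void dense_q26_6(const int32_t *W, const int32_t *b, const int32_t *x,
                 int n_in, int n_out, int32_t *y)
{
    for (int j = 0; j < n_out; j++) {
        int64_t acc = 0;
        for (int k = 0; k < n_in; k++)
            acc += (int64_t)W[j * n_in + k] * x[k];
        y[j] = (int32_t)((acc >> QN) + b[j]);
    }
}
```

For instance, weights (1.0, 2.0) applied to inputs (1.0, 0.5) give 2.0, stored as 128.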

V. EXPERIMENTAL SETUP & RESULTS
For evaluating the proposed network, we used the MIT-BIH arrhythmia database [30]. In total, 100,661 ECG beats from the database were used in this study. In accordance with De et [31]

A. Augmentation of Imbalanced Dataset
The MIT-BIH dataset is severely imbalanced, with more than 90% of beats belonging to the 'Normal' class. This can affect real-world performance; therefore, it is beneficial to balance the training data so that the underlying proportion of beat types in the training set does not have a major impact when tuning the network parameters. Some previous approaches use Conditional Data Grouping, Biased Training, or downsampling to address the imbalance and identity issues of the training database [7]. However, there is significant variation among beats, and selecting only a few beats to balance the classes limits the learning of the network to a particular subspace of the available data representation. Hence, we used the Synthetic Minority Oversampling TEchnique (SMOTE [32], [33]) to augment the dataset by creating synthetic data for the minority class [34].
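The core SMOTE step interpolates between a minority sample and one of its k nearest minority-class neighbours. The C sketch below generates one synthetic sample using a brute-force neighbour search and the C library rand(); the neighbour cap MAX_K and the name smote_one are assumptions (the original SMOTE formulation typically uses k = 5), and production code would use a better random number generator.

```c
#include <stdlib.h>

#define MAX_K 16   /* upper bound on neighbours considered (assumption) */

/* Squared Euclidean distance between two d-dimensional samples. */
static double dist2(const double *a, const double *b, int d)
{
    double s = 0.0;
    for (int c = 0; c < d; c++) {
        double t = a[c] - b[c];
        s += t * t;
    }
    return s;
}

/* X: n minority-class samples of dimension d, row-major. Writes one synthetic
 * sample s = x_idx + u * (x_nn - x_idx), u in [0, 1), to out and returns the
 * index of the neighbour used, or -1 on invalid input. */
int smote_one(const double *X, int n, int d, int idx, int k, double *out)
{
    int nn[MAX_K], found = 0;
    if (n < 2 || k < 1 || idx < 0 || idx >= n) return -1;
    if (k > n - 1) k = n - 1;
    if (k > MAX_K) k = MAX_K;
    const double *xi = X + (size_t)idx * d;
    while (found < k) {               /* k nearest neighbours by repeated scans */
        int best = -1;
        double bestd = 0.0;
        for (int j = 0; j < n; j++) {
            int used = (j == idx);
            for (int t = 0; t < found && !used; t++) used = (nn[t] == j);
            if (used) continue;
            double dd = dist2(xi, X + (size_t)j * d, d);
            if (best < 0 || dd < bestd) { best = j; bestd = dd; }
        }
        nn[found++] = best;
    }
    int pick = nn[rand() % k];
    double u = (double)rand() / ((double)RAND_MAX + 1.0);
    for (int c = 0; c < d; c++)
        out[c] = xi[c] + u * (X[(size_t)pick * d + c] - xi[c]);
    return pick;
}
```

Because every synthetic sample lies on a segment between two real minority samples, the augmented class stays within the region already occupied by real abnormal beats rather than duplicating individual beats.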

B. Evaluation Method
This section describes the steps we followed to train and evaluate the performance of ANNet; the global training results are summarized in Table I.
The SMOTE algorithm is used to address the class imbalance in DS1, increasing the total number of training beats to 91,729 after augmentation. Augmentation allows the network to learn without being biased by the large number of normal (N) beats. After global training with this augmented DS1 dataset, the network achieved ≃81% accuracy with an F1 score of ≃52% on the DS2 test set. Once the network has been trained without the influence of the imbalanced data, the SMOTE synthetic data is removed from the training set, and the network is trained globally on the original training set. This step improves the performance of the network to ≃92% accuracy with an F1 score of ≃69%.
The model is then converted to a fixed-point version, which introduces approximation errors due to our fast-approximation functions (Section IV-C) as well as quantization errors. This results in a drop in accuracy from 92% to 84%, as seen in Table I. Using this model as a starting point, the fixed-point model was retrained in a bit-accurate fixed-point environment to optimise performance. This step improves the global performance of the fixed-point model to 94%.
Table abbreviations: Acc: Accuracy, Sen: Sensitivity, Spe: Specificity, Ppr: Positive Prediction, F1: F1-Score, G: G-Score.
2) Patient Specific Training: Patient-specific training and testing are performed using the DS2 dataset, and the results for individual records are presented in Table II. Following the AAMI recommendations, the training set is created from the first 5 min of data of each record in DS2. The network's performance is compared with other approaches in Section V-C. During this training phase, the learning rate ε is set to a low value so that the layer weights do not deviate much from what was learned on the augmented DS1 dataset. Compared with the floating-point global training, this step improves the accuracy to ≃97% with an F1 score of ≃85%. Similarly, fixed-point patient-specific training, starting from the global fixed-point model, improves the overall accuracy to 96%.
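The 5-minute patient-specific split can be expressed as a simple cut on R-peak times. The sketch assumes R-peaks are given as ascending sample indices at 250 Hz, so the boundary is 5 x 60 x 250 = 75,000 samples; the name split_first_5min is hypothetical.

```c
#define FS 250
#define TRAIN_SECONDS (5 * 60)

/* rpeaks: ascending R-peak sample indices of one record (250 Hz).
 * Returns the number of leading beats inside the first 5 minutes; beats
 * [0, result) form the patient-specific training set and the rest are
 * kept for unseen testing. */
int split_first_5min(const long *rpeaks, int n)
{
    const long limit = (long)FS * TRAIN_SECONDS;   /* 75,000 samples */
    int i = 0;
    while (i < n && rpeaks[i] < limit) i++;
    return i;
}
```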
A total of 41,480 beats from DS2, excluding the 5-minute patient-specific training sets, are used for unseen testing. The total numbers of True Positives (TP), False Negatives (FN), False Positives (FP), and True Negatives (TN) are given in Table II. For study purposes only, the proposed model was also trained and tested with a 70:30 split; these results are provided in Table I.

C. Performance
The proposed implementation achieved higher performance than many other works while maintaining lower complexity, even though the other works use a higher sampling rate (360 Hz) and floating-point implementations. The works [17], [13], and [15] report performance that is on par with or marginally (∼1%) higher than ours; however, their training and testing do not follow the AAMI recommendations [12], and they use less than half as many test beats for evaluation. Similarly, [11], which reports marginally higher accuracy, makes irregular use of LSTM cells and requires two-lead ECG data, which is difficult to acquire with a wearable device. [6] and [16] depend exclusively on the local morphology of a single-beat ECG segment and use CNN-based architectures. These approaches cannot extract and use RR-interval information, and relying on morphology alone is not recommended because morphologies vary considerably across patients, which may result in poor performance in new, unseen environments. In addition, [16], [35], [11], and [6] consume more than a million instruction cycles to classify a single beat to achieve that marginal performance gain, and [35] applies its 1D-CNN architecture irregularly across temporal and morphological feature extraction. The MIT-BIH dataset, used for performance evaluation by most of the above approaches, is unbalanced. Training on an unbalanced dataset is not properly addressed in these works, which makes their predictions subject to sampling bias. Conversely, test accuracy also depends on the selection of test records and their properties; therefore, a marginally higher accuracy does not by itself guarantee a better model. For example, the architecture of [20], which requires very few (∼2.5K) instruction cycles per beat classification, performs a 6-fold test and reports the average performance, which makes its results less reliable.
[Table III: number of operations (including division and other operations such as sampling and SoftMax calculations) and number of instructions per classification for the algorithms in the literature with sufficient available information, as described in Section V-D. * Estimated values based on assumptions about the operations involved and the disclosed architectures.]
In essence, most existing works achieve lower performance than ours and have several implementation drawbacks, as discussed above. The proposed network achieves a higher level of accuracy on routine clinical records, while its overall performance (≃97%) on the DS2 recordings has not surpassed the state of the art, mainly due to the presence of abnormal signal morphology in regular beats. We observe that signals belonging to the Normal category sometimes have a different morphology, which introduces significant changes in the PCA coefficients and thus degrades performance.

According to the MIT-BIH arrhythmia database [30], several of its records are affected by noise bursts, artifacts, tape slippage, long pauses, or axis shifts, which makes them difficult to classify.

D. Complexity Analysis
Following the ANNet description in Section IV-B, Table IV lists the input vector sizes and the number of trained coefficients in each layer of the proposed architecture. In total, the network has 791 parameters, which is very low compared with many of the previous approaches in Table III.
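The parameter counts in Table IV follow from the standard formulas for recurrent and fully connected layers. The sketch below shows those formulas; the layer sizes in the example are hypothetical and chosen only for illustration, since ANNet's actual sizes are given in Table IV and Section IV-B.

```python
def lstm_params(input_size: int, hidden_size: int) -> int:
    """Trainable parameters of one LSTM layer with a single bias per gate.

    Each of the 4 gates (input, forget, cell candidate, output) has an input
    weight matrix (hidden x input), a recurrent matrix (hidden x hidden) and
    a bias vector (hidden).
    """
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

def dense_params(in_features: int, out_features: int) -> int:
    """Fully connected layer: weight matrix plus one bias per output neuron."""
    return in_features * out_features + out_features

# Hypothetical sizes for illustration only (not ANNet's actual layers):
# an LSTM on 4 coefficients per beat with 8 units, an MLP layer on 2 RR
# features with 4 units, and a merged output layer with 2 classes.
layers = [lstm_params(4, 8), dense_params(2, 4), dense_params(8 + 4, 2)]
print(sum(layers))  # 416 + 12 + 26 = 454
```

Framework conventions differ: Keras' `LSTM` uses a single bias vector per gate, matching `lstm_params`, whereas PyTorch's `nn.LSTM` keeps separate input and recurrent biases and therefore reports an extra 4 × hidden_size parameters.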
Table III applies various assumptions in calculating the number of parameters and instructions per classification for the algorithms that lack detailed architectural information. The accuracy values are calculated from the confusion matrices provided in the related publications by converting them to N (N, L, R, j, e) versus A classes. The parameter and instruction counts were likewise calculated from the disclosed architectures for N versus A classification only. Input feature extraction and other operations such as filtering are not included in the complexity calculations, since they are treated as a separate block in most cases and most models do not depend entirely on specific features. To express complexity in instructions per classification, we assume that an addition or a multiplication costs one instruction cycle and a division costs 2 instruction cycles, based on the Cortex-M4 Technical Reference Manual [36]. Each activation function is counted as one division and one addition only, even though different activation functions behave differently in different regions of the input value. These assumptions are applied uniformly across all the other methods, even though they use floating point architectures and therefore consume more clock or instruction cycles, particularly in activation and SoftMax functions that evaluate exponentials.
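The cycle-cost assumptions above can be expressed as a small counting model. This is a sketch under those stated assumptions only (1 cycle per addition or multiplication, 2 per division, and one division plus one addition per activation); the class and function names are hypothetical, and the layer sizes in the usage line are illustrative rather than those of ANNet or any compared work.

```python
from dataclasses import dataclass

# Cycle costs as assumed in the text.
ADD_CYCLES = 1
MUL_CYCLES = 1
DIV_CYCLES = 2
ACT_CYCLES = DIV_CYCLES + ADD_CYCLES   # one division plus one addition

@dataclass
class OpCount:
    adds: int = 0
    muls: int = 0
    divs: int = 0
    activations: int = 0

    def __add__(self, other: "OpCount") -> "OpCount":
        return OpCount(self.adds + other.adds, self.muls + other.muls,
                       self.divs + other.divs, self.activations + other.activations)

    def cycles(self) -> int:
        return (self.adds * ADD_CYCLES + self.muls * MUL_CYCLES
                + self.divs * DIV_CYCLES + self.activations * ACT_CYCLES)

def dense_ops(n_in: int, n_out: int, activated: bool = True) -> OpCount:
    """Operation count of one fully connected layer.

    Per output neuron: n_in multiplications, n_in - 1 additions to accumulate
    the products plus one bias addition (n_in additions in total), and one
    activation if the layer is activated.
    """
    return OpCount(adds=n_out * n_in, muls=n_out * n_in,
                   activations=n_out if activated else 0)

# Illustrative two-layer MLP: 12 -> 8 (activated) -> 2 (no activation).
ops = dense_ops(12, 8) + dense_ops(8, 2, activated=False)
print(ops.cycles())  # 216 + 32 = 248
```

Applying one such model uniformly to every architecture keeps the comparison consistent, although, as noted above, it underestimates the cost of floating point implementations.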
For example, [6] proposed a TSNN whose first stage performs binary Normal/Abnormal classification similar to our design, but its parameter count is significantly higher and its MLP-based network has a complexity of approximately 180k instruction cycles per classification, whereas the network proposed in this work consumes less than 20k instruction cycles, roughly a 9-fold reduction. More comprehensive comparisons are provided in Table III.

E. Power Consumption Analysis
The current consumption of the proposed system on the nRF52 DK is measured in real time using an nRF PPK shield, and the results are shown in Figure 10. As a baseline reference, the average current consumption for sending all the samples through Bluetooth Low Energy (BLE) without executing ANNet ('NO ANN' in Figure 10) is 112.68 µA for a 30 minute record sampled at 250 Hz; this appears as a flat reference line in Figure 10. Running the ANN increases the current spent on microprocessor computations, which is expected to depend on the number of beats per second. However, this is offset by triggering Bluetooth transfers only when anomalous beats are detected (both true and false positives). Both effects are clearly visible in Figure 10: the average current correlates positively with the average number of beats per second, as expected, and with the percentage of abnormal beats.
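The gating mechanism can be sketched as a loop in which the classifier runs for every beat while the radio is used only for beats flagged as anomalous. The `is_abnormal` and `transmit` callbacks are hypothetical stand-ins for ANNet and the BLE stack on the nRF52; this is an illustration, not the authors' firmware.

```python
from typing import Callable, Iterable, Sequence

Beat = Sequence[float]

def gate_transmission(beats: Iterable[Beat],
                      is_abnormal: Callable[[Beat], bool],
                      transmit: Callable[[Beat], None]) -> int:
    """Classify every beat, but transmit only the beats flagged as abnormal.

    Returns the number of beats that were transmitted.
    """
    sent = 0
    for beat in beats:
        # The classifier runs for every beat, so compute cost scales with beat count.
        if is_abnormal(beat):
            # The radio is used only here, so its cost scales with the number of
            # flagged beats (true and false positives alike).
            transmit(beat)
            sent += 1
    return sent

# Usage with a toy threshold "classifier" and a list acting as the radio.
outbox = []
sent = gate_transmission([[0.1], [0.9], [0.2], [0.8]], lambda b: b[0] > 0.5, outbox.append)
print(sent, outbox)
```

The two branches mirror the two trends in Figure 10: the classifier cost grows with the number of beats, and the radio cost grows with the number of beats flagged as abnormal.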
For example, record 117 has the lowest number of beats (1,534) and, with a very low abnormal-beat rate, consumes only 26.4 µA on average. In contrast, record 232, with 77.6% of its beats classified as abnormal, consumes the highest average current. Another example is record 213, which has 3,549 beats in total with 17.2% abnormality and consumes approximately 66.5 µA on average.
The test results show that significant power savings, on the order of ≃50%, are typically achieved across many of the records compared with the alternative of continuously transmitting the ECG signal wirelessly. In real-life personal health monitoring, arrhythmias occur extremely sporadically, so power savings are expected to exceed 50%, extending battery life.
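The savings for individual records follow directly from the reported average currents. The sketch below computes the relative reduction with respect to the 112.68 µA 'NO ANN' baseline, assuming a constant supply voltage so that the current ratio equals the power ratio; `saving_percent` is a hypothetical helper name.

```python
BASELINE_UA = 112.68  # average current when all samples are streamed over BLE ('NO ANN')

def saving_percent(gated_ua: float, baseline_ua: float = BASELINE_UA) -> float:
    """Relative reduction in average current versus continuous streaming, in percent."""
    return 100.0 * (1.0 - gated_ua / baseline_ua)

# Average currents reported above for two DS2 records.
for record, current_ua in [("117", 26.4), ("213", 66.5)]:
    print(f"record {record}: {saving_percent(current_ua):.1f}% saving")
```

With the values reported above, this gives about 76.6% for record 117 and about 41.0% for record 213, bracketing the typical figure of ≃50%.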

VI. CONCLUSION
In this study, we have proposed a lightweight neural network to classify ECG beats into Normal and Abnormal classes. The network takes an input feature vector created from the PCA coefficients of 5 consecutive beats and a temporal feature vector created from the ventricular R-R interval rate. The proposed method achieves low complexity with high anomalous-beat detection accuracy in routine clinical recordings and reasonable accuracy in complex records. The algorithm was ported to an embedded platform by replacing various activation functions with approximations and mapping the network to fixed point with retraining, resulting in very little implementation loss and the lowest computational complexity among the state-of-the-art designs compared.
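The feature pipeline summarised above (per-class PCA bases as in Fig. 4, projection coefficients for each beat, windows of 5 consecutive beats for the LSTM block, and heart-rate features for the MLP block) can be sketched as follows. The z-score normalisation, the use of an SVD for PCA, projecting the normalised beat without subtracting a class mean, and instantaneous heart rate as the only temporal feature are all assumptions for illustration; the function names are hypothetical and the paper's exact feature definitions are not reproduced here.

```python
import numpy as np

def normalise(beats) -> np.ndarray:
    """Z-score each beat (row) independently; an assumed normalisation."""
    beats = np.asarray(beats, dtype=float)
    mean = beats.mean(axis=1, keepdims=True)
    std = beats.std(axis=1, keepdims=True)
    return (beats - mean) / np.where(std == 0, 1.0, std)

def class_bases(beats_by_class: dict, n_components: int) -> np.ndarray:
    """Stack the leading principal components of every class, one per row."""
    bases = []
    for label in sorted(beats_by_class):
        x = normalise(beats_by_class[label])
        x = x - x.mean(axis=0)                 # centre the class before PCA
        # Rows of vt are unit-norm principal directions, ordered by variance explained.
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        bases.append(vt[:n_components])
    return np.vstack(bases)

def beat_coefficients(beats, bases: np.ndarray) -> np.ndarray:
    """Project each normalised beat onto the basis vectors of every class."""
    return normalise(beats) @ bases.T          # shape (n_beats, n_classes * n_components)

def beat_sequences(coeffs: np.ndarray, seq_len: int = 5) -> np.ndarray:
    """Sliding windows of seq_len consecutive beats, as input to the LSTM block."""
    return np.stack([coeffs[i:i + seq_len] for i in range(len(coeffs) - seq_len + 1)])

def heart_rate_bpm(r_peaks, fs: float) -> np.ndarray:
    """Instantaneous heart rate (beats per minute) from successive R-peak sample indices."""
    rr_seconds = np.diff(np.asarray(r_peaks, dtype=float)) / fs
    return 60.0 / rr_seconds
```

Keeping the basis vectors of every class in one matrix means a single matrix-vector product per beat yields all coefficients, which is cheap on a microcontroller once the bases are stored as constants.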
Compared with continuous data transmission, we demonstrated that gating the wireless transmission with a binary classifier, so that only anomalous beats are transmitted, can significantly reduce the overall system power consumption.

Fig. 1: ECG Signal before and after denoising and notch filtering

Fig. 4: Principal Component Analysis for the ECG records. a: Randomly selected raw ECG signals representing Normal and Ventricular beats. b: Normalised ECG beats. c: Basis vectors (principal components) computed for each class; the most significant (first) basis vector of each class is shown in the plot. d: Each beat represented as coefficients with respect to each basis vector from every class.

Fig. 5: Global training performance metrics of the proposed network with different levels of quantization

Fig. 7: Floating and fixed point approximation of the Tanh function

3) SoftMax function: This function is primarily used in the classification layer. Rather than operating on a single neuron's output, it computes a normalised vector, s (Eq. 15), from the vector x of outputs of the last fully connected layer:

$$ s_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}, \qquad i = 1, \dots, K \qquad (15) $$

where K is the number of output classes.
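A minimal sketch of the SoftMax normalisation, assuming the standard definition s_i = exp(x_i) / Σ_j exp(x_j). The max subtraction is a common numerical safeguard, and the hypothetical `predicted_class` helper illustrates that the class decision alone does not require exponentials; whether ANNet's fixed point firmware exploits this is not stated here.

```python
import math
from typing import List, Sequence

def softmax(x: Sequence[float]) -> List[float]:
    # Subtracting the maximum leaves the result unchanged mathematically
    # but keeps exp() from overflowing for large logits.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def predicted_class(x: Sequence[float]) -> int:
    # s_i is proportional to exp(x_i) and exp is monotonic, so the ordering
    # of s matches the ordering of x: the argmax can be taken on x directly.
    return max(range(len(x)), key=lambda i: x[i])

print(softmax([1.0, 2.0]), predicted_class([1.0, 2.0]))
```

For a binary Normal/Abnormal decision, skipping the exponentials removes the most expensive operations from the per-beat cost, which matters when exponentials must themselves be approximated in fixed point.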

Fig. 8: Test setup used for experimental verification

D. Embedded system

Fig. 9: Flow diagram of the implementation of the algorithm in the embedded environment

Fig. 10: Average current consumption in µA over 30 minutes of DS2 records. Record 213 has the highest average heart rate (scaled to 100%). The records with the largest number of abnormal beats (e.g., 232) show the highest current.
, illustrates the performance at each step, from floating point global-level training to fixed point patient-specific training.

1) Global Training: We used DS1 from the MIT-BIH database, which consists of 50,982 beats, to train our network. As in many prior works, DS1 serves as the global training set, and hence no performance test is conducted on this subset of the data.
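The move from floating point to a bit-accurate fixed point model can be sketched with signed Q-format arithmetic. The 16-bit word length, the choice of fractional bits per call, round-to-nearest, and saturation are assumptions for illustration, and the function names are hypothetical; the word lengths actually evaluated correspond to the quantization levels in Fig. 5. One common way to retrain in such an environment is to run the forward pass through functions like these so that the learned weights absorb the quantization error.

```python
import numpy as np

def _limits(total_bits: int):
    # Signed range of a two's-complement word of total_bits bits.
    return -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1

def to_fixed(x, frac_bits: int, total_bits: int = 16) -> np.ndarray:
    """Quantise real values to signed Q-format integers, rounding and saturating."""
    lo, hi = _limits(total_bits)
    q = np.round(np.asarray(x, dtype=float) * (1 << frac_bits))
    return np.clip(q, lo, hi).astype(np.int32)

def from_fixed(q, frac_bits: int) -> np.ndarray:
    """Convert Q-format integers back to real values."""
    return np.asarray(q, dtype=float) / (1 << frac_bits)

def fixed_mul(a, b, frac_bits: int, total_bits: int = 16) -> np.ndarray:
    """Multiply two Q-format values (frac_bits >= 1), rescale, and saturate."""
    lo, hi = _limits(total_bits)
    prod = np.asarray(a, dtype=np.int64) * np.asarray(b, dtype=np.int64)
    # Add half an LSB before the arithmetic right shift (round half up).
    prod = (prod + (1 << (frac_bits - 1))) >> frac_bits
    return np.clip(prod, lo, hi).astype(np.int32)

w = to_fixed(0.5, 8)                         # 0.5 in Q8 -> 128
x = to_fixed(0.25, 8)                        # 0.25 in Q8 -> 64
print(from_fixed(fixed_mul(w, x, 8), 8))     # 0.125
```

Saturating rather than wrapping on overflow keeps an out-of-range accumulation at the largest representable value, which degrades a classifier far more gracefully than a sign flip would.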

TABLE I :
Performance of the proposed floating point and fixed point networks at each training phase. TN: True Negative, FN: False Negative, TP: True Positive, FP: False Positive

TABLE II :
Patient-specific training summary of the floating point network for DS2 of the MIT-BIH arrhythmia dataset

TABLE III :
Performance comparison of the proposed method with other approaches. The table applies various assumptions in calculating the values for Accuracy, No. of Parameters, No. of Operations (total number of additions and multiplications similar to [6], divisions, and other operations such as sampling and SoftMax calculations), and No. of Instructions per classification for the algorithms in the literature that provide minimal information, as described in Section V-D. * This value is an estimate based on assumptions about the operations involved and the architectures disclosed.

The MIT-BIH arrhythmia database [30] reports that records 200, 203, 214, and 222 are corrupted by occasional bursts of noise and artifacts. Records 214, 215, and 228 contain a few episodes of tape slippage, and records 219 and 232 contain long pauses. Axis shifts occur in records 203 and 223. Records 201, 212, 213, and 223 show either abnormally high or low rates for the relevant beat type, making them more complex to analyze. Records 203 and 207 are included in the global training set, even though their beats are onerous to annotate manually, even for experienced cardiologists. This makes the accuracy of the model appear lower than under typical conditions. Records 121, 202, 213, and 222 are the worst performing in the DS2 dataset. In record 222, the incorrectly classified beats are Premature Atrial Contractions (PAC, PhysioNet label A), which is not a dangerous arrhythmia class. In record 213, the beats classified as normal belong to the Fusion category, and these fusion PVC beats are almost identical to normal beats in morphology. In addition, records 103, 105, 121, 123, 212, 222, and 234 do not contain any abnormal beat in the first 5 min training episode; therefore, patient-specific training with the first 5 min episode, as suggested by AAMI, was almost impossible for these records.
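The AAMI-style patient-specific split and the N versus A grouping described in Section V-D can be sketched as below; the sketch also flags records, such as those listed above, whose first 5 minutes contain no abnormal beat. The function names are hypothetical, the input is assumed to contain beat annotations only (non-beat symbols removed beforehand), and the sampling rate is a parameter (360 Hz for the original MIT-BIH recordings).

```python
from typing import List, Sequence, Tuple

TRAIN_WINDOW_S = 5 * 60                       # first 5 minutes, as suggested by AAMI
NORMAL_SYMBOLS = {"N", "L", "R", "j", "e"}    # grouped into class N; all else is A

def to_binary(symbol: str) -> int:
    """0 for the N group, 1 (abnormal, A) for any other beat symbol."""
    return 0 if symbol in NORMAL_SYMBOLS else 1

def split_patient_specific(r_samples: Sequence[int], symbols: Sequence[str],
                           fs: int) -> Tuple[List[Tuple[int, int]],
                                             List[Tuple[int, int]], bool]:
    """Split beats by R-peak position into the first-5-minute training window
    and the remaining test beats; also report whether training has any A beat."""
    limit = TRAIN_WINDOW_S * fs
    train, test = [], []
    for sample, symbol in zip(r_samples, symbols):
        (train if sample < limit else test).append((sample, to_binary(symbol)))
    has_abnormal = any(label == 1 for _, label in train)
    return train, test, has_abnormal

train, test, has_abnormal = split_patient_specific(
    [100, 107999, 108000, 200000], ["N", "V", "N", "A"], fs=360)
print(train, test, has_abnormal)
```

A record for which `has_abnormal` is false offers no abnormal examples for patient-specific fine-tuning, which is exactly the situation described for records 103, 105, 121, 123, 212, 222, and 234.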