An adaptive cognitive sensor node for ECG monitoring in the Internet of Medical Things

The Internet of Medical Things (IoMT) paradigm is becoming mainstream in multiple clinical trials and healthcare procedures. Cardiovascular disease monitoring, usually involving the analysis of electrocardiogram (ECG) traces, is one of the most promising and high-impact applications. Nevertheless, to fully exploit the potential of IoMT in this domain, some steps forward are needed. First, the edge-computing paradigm must be added to the picture. A certain level of near-sensor processing has to be enabled, to improve the scalability, portability, reliability, and responsiveness of IoMT nodes. Second, novel and increasingly accurate data analysis algorithms, such as those based on artificial intelligence and deep learning, must be exploited. To reach these objectives, designers and programmers of IoMT nodes have to face challenging optimization tasks in order to execute fairly complex computing workloads on low-power wearable and portable processing systems with tight power and battery lifetime budgets. In this work, we explore the implementation of a cognitive data analysis algorithm, based on a convolutional neural network trained to classify ECG waveforms, on a resource-constrained microcontroller-based computing platform. To minimize power consumption, we add an adaptivity layer that dynamically manages the hardware and software configuration of the device, adapting it at runtime to the required operating mode. Our experimental results show that adapting the node setup to the workload at runtime can reduce power consumption by up to 50%. Our optimized and quantized neural network reaches an accuracy higher than 97% for arrhythmia detection on the MIT-BIH Arrhythmia dataset.


Introduction
The Internet-of-Things (IoT) paradigm, specialized into the so-called Internet of Medical Things (IoMT), enables seamless collection of a wide range of data streams that can be analyzed to extract relevant information about the patient's condition. However, in order to make IoMT truly ubiquitous and effective, a step forward is needed to improve scalability, responsiveness, security, and privacy. Most of the efforts in this direction focus on the adoption of an edge-computing approach. Data streams acquired by sensors can be processed, at least partially, at the edge, on adequate portable/wearable processing platforms, before being sent to the cloud. This provides several advantages. First, it reduces bandwidth requirements. Near-sensor processing can extract more compact information from raw data. In this way, less communication bandwidth toward the centralized server is required and, at the same time, the energy consumption related to wireless data transmission is drastically reduced. Second, near-sensor processing can improve reliability. Monitoring does not need to rely on connection availability and, if immediate feedback to the user and/or local actuation is needed, the delays through the network can be avoided. Moreover, pre-processed information can be delivered to the cloud, preserving user privacy by avoiding the propagation of sensitive raw data.
An extremely important field of application of IoMT is related to the treatment of cardiovascular diseases (CVD), a major public health problem that causes millions of deaths yearly and significantly impacts health-related public costs. As an example, in 2016, ≈ 17.6 million (95% CI, 17.3-18.1 million) deaths were attributed to CVD globally, representing an increase of 14.5% (95% CI, 12.1%-17.1%) since 2006 [1]. In Europe, the CVD impact on the economy is estimated at around €210 billion [2]. CVD treatment with remote monitoring involves in most cases the analysis of electrocardiogram (ECG) signals. Creating embedded platforms implementing such kind of analysis is promising but, at the same time, very challenging, for several reasons:
• Requires edge computing at a low energy/cost budget: sensor nodes must be wearable and affordable to implement ubiquitous patient monitoring. Given the high data rate produced by ECG sensors, wireless transmission of raw data requires a non-negligible energy budget when the task is implemented on a portable and inexpensive computing device.
• Requires cognitive computing: state-of-the-art anomaly detection tasks are based on the analysis of manually designed features, which are hard to craft and to extract online from the ECG waveforms. Thus, the community is shifting its focus to techniques based on neural networks and deep learning, which rely on automatically learned features. However, existing approaches that use deep learning for the recognition of anomalies in the ECG trace rarely pay attention to the energy consumption required for deployment on low-power processing systems, and quite often do not take into account workload reduction and post-deployment accuracy evaluation.
• Requires adaptivity: the intensity of the processing workload depends strongly on the needed level of detail and is also intrinsically data-dependent. The information to be analyzed is usually contained in the waveform shapes of ECG peaks; thus, the rate of sample frames to be analyzed depends directly on the patient's heartbeat rate. This paves the way to reducing energy consumption by means of adaptive system management, in which the system reconfigures itself on the basis of the detected data and of the chosen operating mode.
In this work, we explore the implementation of a system for at-the-edge cognitive processing of ECG data. We have conceived a hardware/software setup for the processing system inside the IoMT node. We have used SensorTile, a compact processing device developed by STMicroelectronics, as a reference microcontroller platform. The system makes use of a quantized convolutional neural network, specifically sized and trained to run on a low-power microcontroller, that has been validated post-deployment and recovers the accuracy drops that arise in real online utilization. Moreover, we take a step further in hardware/software optimization using adaptivity, allowing the system to reconfigure itself to suit different operating modes and data processing rates. To this aim, besides executing the tasks that implement sensor monitoring and on-board processing, the system includes a component called ADAM (ADAptive runtime Manager), able to dynamically manage the hardware/software configuration of the device, optimizing power consumption and performance. ADAM creates and manages a network of processes that communicate with each other via FIFOs. The morphology of the process network varies to match the needs of the operating mode in execution. ADAM can be triggered by reconfiguration messages sent by the external environment or by specific workload-related variables in the sampled streams (e.g., the patient's heartbeat pace). When triggered, ADAM changes the morphology of the process network, switching processes on or off, and reconfigures the inter-process FIFOs. Moreover, depending on the new configuration, it changes the hardware setup of the processing platform, adapting power-relevant settings such as clock frequency, supply voltage, and peripheral gating.
The remainder of this paper is organized as follows: Section 2 describes the landscape of related work in the literature, Section 3 gives an overview of the overall system-of-systems picture, and Section 4 presents the proposed template for the node, the reference target platform, and the reference application model, along with the details of the ADAM component. Section 5 describes how the chosen template has been specialized to implement ECG monitoring, the proposed operating modes, and the processing tasks coexisting in the application. Section 6 discusses our experimental results. Finally, Section 7 outlines our conclusions.

Related work
Multiple solutions involving the use of sensor networks in hospitals or at home and the IoMT have been proposed in the literature [3][4][5]. Most of these studies exploit cloud-based analysis: data is usually encapsulated in standard formats and sent to remote servers for data mining. Most research work considers wearability and portability as the main objectives when developing IoMT-based data sensing architectures; thus, devices available on the market can guarantee autonomy for days or weeks [6,7].
To really use cognitive computing at the edge, more complex and accurate algorithms, such as those exploiting artificial intelligence or deep learning, must be targeted. Their efficiency has been widely demonstrated on high-performance computing platforms. Some examples are [8], where an NVIDIA GeForce GTX 1080 Ti (11 GB) is used, [9], which uses a 3.5 GHz Intel Core i7-7800X CPU, 32 GB of RAM, and an NVIDIA Titan X GPU (Pascal, 12 GB), or [10], based on an i7-4790 CPU at 3.60 GHz. However, how to map state-of-the-art cognitive computing onto resource-constrained platforms is still an open question. There is an ever-increasing number of approaches focusing on machine learning and artificial intelligence to identify specific events in sensed data. In [11] and [12], the authors exploit artificial neural networks (ANNs) to detect specific conditions in the proposed data. In [12], an ANN is used to identify the emotional states (happiness or sadness) of the patient. However, network topologies are still very basic and highly tuned and customized to fit on the target device. In [13], energy/power efficiency is improved by using near-sensor processing to save data transfers and by dynamically adapting the application setup and system frequency to the operating mode requested by an external user and to the data-dependent workload. As a use case, a convolutional neural network (CNN) is used to identify anomalies in ECG traces.
Several works implement ECG monitoring on customized chips, showing that cardiac anomalies can be classified in real time at low energy consumption, even using AI methods [14][15][16][17][18][19][20]. Other work focuses on efficient implementations on off-the-shelf commercial devices, to facilitate easier community adoption of these techniques. Several target technologies have been used in the literature, such as FPGAs or microcontroller-based boards.
A substantial number of research works are dedicated to studying IoT devices in the medical field, in particular for ECG monitoring and anomaly detection [21][22][23][24][25][26][27][28]. The edge-computing paradigm is often only marginally exploited, and local processing is used only for implementing simple checks on raw data and/or marshaling tasks for wrapping the sensed data inside standard communication protocols [29][30][31][32].
The cognitive approach involving convolutional neural networks (CNNs) shows promise in terms of accuracy in detecting arrhythmias in the ECG signal, compared to other traditional strategies, whether based on artificial intelligence algorithms or not [33][34][35]. Moreover, in most cases, the use of CNNs makes it possible to classify an ECG signal even without pre-processing. The most common strategies present in many state-of-the-art works to improve the efficiency of these IoT nodes are: moving the inference operations to the edge, choosing a low-power device, and using quantization techniques to speed up the execution of the inference stage.
Another interesting work was presented in [34]: in addition to a comparison with other techniques used to analyze the ECG trace, Latent Semantic Analysis techniques were used to improve the accuracy of the network. There, both training and inference take place on the cloud side; our aim is instead to move the inference to the edge, onto a low-power device, in order to reduce latency and the energy consumption due to wireless communication.
In [36], excellent results were obtained for ventricular and supraventricular arrhythmia classification: 99.6% and 99.3% accuracy, 98.4% and 90.1% sensitivity, and 99.2% and 94.7% positive predictive value, respectively. In [36], a double CNN is used; one of the two networks takes as input the frequency-domain information of the ECG signal (a fast Fourier transform is performed). Despite the excellent accuracy results, this methodology was not taken into consideration in our case because it is particularly expensive to perform on a microcontroller.
In [35], again, neural networks are shown to obtain good results when compared with methods such as K-nearest neighbors (KNN) and random forest (RF) (95.98% on the MIT-BIH Supraventricular Arrhythmia Database), and inference occurs directly on the IoT node. However, power consumption remains relatively high, since non-low-power devices such as the Raspberry Pi 4 or Jetson Nano are used. The same work also thoroughly investigates which CNN morphology is most suitable for inference on ECG signals; the network we use has a structure very similar to the one chosen in [35].
In [37], good results are obtained in terms of accuracy (96% on the MIT-BIH Arrhythmia dataset), but here too the inference occurs on the cloud side. Despite the excellent latency times, the node still has to transmit a large amount of raw data, which in our case would cause excessive power consumption with respect to edge-side inference.
We follow an approach similar to [38], where an embedded device performs the inference directly on the node. In [38], the variation of accuracy as a function of different quantization levels was studied: the authors chose a 12-bit precision with an accuracy of 97%, but already from 6 bits upwards the accuracy exceeds 90%. Power consumption is around 200 mW during computation, and the node is based on FPGA technology.
Other works with which we compare are [39][40][41][42]. The quantization technique is also exploited in our work, choosing an 8-bit precision; this allows us to significantly speed up the inference operations. In particular, the CMSIS libraries are used to exploit the SIMD capabilities of the chosen microcontroller.
In this work, we extend [13], taking into account that in the current state-of-the-art landscape, network topologies, processing platforms, and software tools can be much more complex. In particular, the community has designed novel ultra-low-power processing platforms, providing previously unmatched computation capabilities on typical AI and data analysis workloads. In summary, as our main novel contributions, we propose:
• The definition of a hardware/software/firmware architectural template for the implementation of a remotely-controlled sensory node, allowing for near-sensor cognitive data processing, inserted in an IoT context.
• Its validation on a state-of-the-art data analysis task based on a Convolutional Neural Network as an example computational load.
• The evaluation of the effectiveness of in-place computing and dynamic operating mode optimization on an ARM microcontroller platform, as a method to reduce the power consumption of the node, on a case study involving the classification of ECG data.
Adaptive sensor node architecture

Figure 1 shows an overview of the system architecture as envisioned in this work. We see the network as composed of three levels. The lowest level is composed of the sensor nodes, which acquire information from the environment. They are connected to the upper level using Bluetooth technology. The nodes are capable of reacting, by reconfiguring their operating mode, to commands sent from higher levels or to workload changes that can be detected near-sensor, thanks to an internal component called the ADAptive runtime Manager, which will be described later. The intermediate level consists of several gateways, in charge of collecting the data from the sensor nodes and sending them to the upper level. To test the approach presented in this work, the gateway was implemented with a Raspberry Pi 3 running a Linux operating system.
For the same purpose, the cloud-based infrastructure, on top of the stack, has been implemented using Google App Engine. Data is stored securely on the cloud and can be used for analysis or simply visualized by a healthcare professional. Such a user, accessing a web-based interface, can also send downstream commands to the nodes, to communicate a required change of operating mode, e.g. changing the needed level of detail in the acquisition of the patient's parameters.
In this paper, attention will be focused only on the sensor node.

IoMT Node architecture
The sensor node architecture itself can be seen as a layered structure, schematized in Figure 2. In the following sections, a detailed description of each level is provided. The bottom layer is the hardware platform, which may be any kind of programmable microcontroller that integrates sensors for data acquisition, one or more processing elements for housekeeping and pre-processing, and an adequate set of communication peripherals implementing transmission to the gateway.
The hardware platform is managed at runtime by a firmware/middleware level, potentially including some operating system (OS) support, to enable the management and scheduling of software threads. Moreover, this level must expose a set of low-level primitives to control hardware architecture details (e.g. access to peripherals, frequency, power operating mode, performance counting), and a set of monitoring Application Programming Interfaces (APIs) to continuously control the status of the hardware platform (e.g. energy and power status, remaining battery lifetime) and to characterize the performance of the different application tasks on it. At the top of the node structure there is the software application level, which executes tasks designed according to an adequate application model based on process networks, so that they can be easily characterized and dynamically changed at runtime.
To implement adaptivity, we add to the application an additional software agent, which we call ADAM (ADAptive runtime Manager), in charge of monitoring all the events that may trigger operating mode changes (workload changes, battery status, commands from the cloud) and of reconfiguring the process network accordingly, to minimize power/energy consumption. Reconfiguration actions may involve changes in the process network topology (activation/deactivation of tasks and restructuring of the inter-task connectivity) and acting on the power-relevant knobs exposed by the architecture (e.g. clock frequency and supply voltage). As mentioned, to assess the feasibility of our approach based on dynamic reconfiguration, we have used a single-core microcontroller, namely an off-the-shelf platform designed by STMicroelectronics named SensorTile. In the following, we describe the main features of this platform exploited in this work.
We have chosen SensorTile as representative of a class of single-core low-power IoT platforms available on the market, usually integrating a wide range of sensors and peripherals to increase usability. These solutions often integrate mid- to low-end processing elements, capable of executing simpler near-sensor processing tasks on a low energy budget, using optimized libraries to recover performance and lightweight operating systems to enable the coexistence of multiple software processes.

Hardware platform layer
The SensorTile measures 13.5 × 13.5 mm and is equipped with an ARM Cortex-M4 32-bit low-power microcontroller. Its small size and low power consumption allow the device to be battery-powered with good autonomy, without giving up portability. Several architectural knobs can be used to adapt the platform to different conditions. SensorTile can work in two main modalities, run mode and sleep mode, in which different subsets of the hardware components are active. Moreover, in each mode, the chip can be set to a different system frequency (from 0.1 MHz to 80 MHz). Depending on the chosen system frequency and operating state, the device uses different voltage regulators to power the chip.
In Table 3, we list some configurations selectable using the mode-management APIs offered by the platform vendor. For our experiments, we have chosen two approaches to dynamically reduce power consumption:
• changing the system frequency (and consequently the voltage regulator settings) over time according to the workload;
• using the sleep mode of the microcontroller whenever possible. The operating system automatically enters a sleep state when there are no computational tasks queued, and a timer-based wake-up can be used to restart run mode when needed.
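The power benefit of the second knob follows a simple duty-cycle model; the sketch below is illustrative only, and any power figures passed to it are hypothetical placeholders rather than SensorTile measurements.

```python
# Illustrative duty-cycle model: average power when the MCU alternates
# between run and sleep modes. The power figures passed in are hypothetical
# placeholders, not measured SensorTile values.
def avg_power_mw(p_run_mw, p_sleep_mw, run_fraction):
    """Average power for a given fraction of time spent in run mode."""
    return p_run_mw * run_fraction + p_sleep_mw * (1.0 - run_fraction)
```

For instance, with hypothetical figures of 10 mW in run mode and 0.5 mW in sleep mode, spending only 20% of the time in run mode yields 2.4 mW on average.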

Middleware/OS layer
In addition to the APIs offered by the manufacturer, we used other middleware components to manage multiple computation tasks at runtime and to execute CNN-based near-sensor processing with an adequate performance level.

FreeRTOS
SensorTile runs FreeRTOS as its RTOS (Real-Time Operating System). This firmware component is aimed at developers who need a real-time operating system with limited impact on the memory footprint of the application: the size of the operating system is between 4 kB and 9 kB. Features offered by the operating system include real-time scheduling, inter-process communication, synchronization, and time measurement. One of the most important aspects that led us to choose FreeRTOS is the possibility of enabling a thread-level abstraction to represent the processing tasks to be executed on the platform and to timely manage their scheduling at runtime. FreeRTOS creates a system task called the idle task, which is set to the lowest possible execution priority. When this task is executed, the system tick counter is deactivated and the microcontroller is put in a sleep state. Due to the priority setting, the idle task is executed only if there are no other tasks waiting to be scheduled.
FreeRTOS does not natively support changing the system frequency at runtime: once the frequency is changed, timing functions would be completely de-synchronized. We had to modify part of the OS support to enable system frequency changes without impacting the rest of the OS functionality.

CMSIS
In order to be capable of executing in-place processing of the sensed data, we have exploited the Cortex Microcontroller Software Interface Standard (CMSIS), an optimized library specifically targeting Cortex-M processor cores [43]. It includes several modules providing mathematical functions optimized for the underlying architecture. Of particular interest is the CMSIS-NN module, which contains various optimized functions enabling cognitive computing implementations. While CMSIS provides quite extensive support for neural network execution, we had to make some changes, described in the following, to support our use case, namely to enable mono-dimensional convolutions on one-dimensional sensor data streams.

Application model
In this section, we describe the application model that we have used to create and analyze the application; the source code is available at our public repository 1. We selected an application structure based on process networks. Tasks are represented as independent processes, communicating with each other via FIFO structures and using blocking read and write communication primitives to avoid data loss in case of busy pipeline stages. Processes may potentially be executed in parallel, when processing resources are available, potentially improving performance through software pipelining.
In particular, for each sensed variable to be monitored, we build a chain of tasks that operate on the sensed data (Figure 3).
Figure 3: Simple task chain (Get data → Process → Threshold → Send).
A chain of processes is generated for each sensor, so that, if required by changes in the operating mode, it is possible to dynamically turn the relevant components on and off.
For each sensor, we envision four types of general tasks:
• Get data task: collects data from the sensing hardware integrated into the node.
• Process task: there may be multiple tasks of this type, representing multiple stages of the in-place data analysis algorithm. Having more than one task of this type allows a prospective user to select, for example, a certain depth of analysis, which impacts the required communication bandwidth, the detail of the extracted information, and power/energy consumption.
• Threshold task: filters data depending on the results of the in-place analysis. For example, a threshold task may be used to send data to the cloud only when specific events or alert conditions are detected. Its purpose is to limit data transfers from the node.
• Send task: in charge of outward communication to the gateway.

1 https://github.com/matteoscrugli/adam-iot-node-on-stm32l4
Considering the selected process network model, activation/deactivation of tasks, or of entire chains corresponding to sensors, can be implemented by:
• enabling/stopping the periodic execution of the involved tasks;
• reconfiguring the FIFOs to reshape the process chain accordingly.
In this way, it is possible to select multiple application configurations, corresponding to operating modes characterized by different levels of in-place computing effort, bandwidth requirements, and monitoring precision.
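The task-chain reconfiguration described above can be sketched in a much-simplified form; the hypothetical Python model below treats each task as a function and reconfiguration as rebuilding the list of active stages (the actual node implements tasks and FIFOs in C on FreeRTOS).

```python
# Hypothetical, much-simplified model of the task chain: each task is a
# function, and operating-mode reconfiguration amounts to rebuilding the
# list of active stages.

def make_chain(process_enabled, threshold_value):
    """Build the list of active stages for the current operating mode."""
    stages = []
    if process_enabled:
        # Placeholder for an in-place analysis stage.
        stages.append(lambda x: x * 2)
    # Threshold stage: drop values below the alert level to limit transfers.
    stages.append(lambda x: x if x >= threshold_value else None)
    return stages

def run_node(samples, stages):
    """Push samples through the chain; survivors reach the Send task."""
    sent = []
    for x in samples:
        for stage in stages:
            x = stage(x)
            if x is None:        # filtered out by the threshold task
                break
        else:
            sent.append(x)       # Send task: forward to the gateway
    return sent
```

Here, switching operating mode simply means calling `make_chain` with different arguments, mirroring the task activation and FIFO rerouting described above.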

Adaptivity support: the ADAptive runtime Manager
Within the process network, one task is exclusively dedicated to the management of the dynamic hardware and software reconfiguration of the platform. We have implemented such reconfiguration in a software agent called the ADAptive runtime Manager (ADAM). ADAM is activated periodically by means of an internal timer. It evaluates the status of the system, monitoring:
• reconfiguration commands from the gateway;
• changes in the workload, e.g. the rate of events to be processed. For example, a task may have to be executed periodically, with a rate that depends on the frequency of certain events in the sensor data. This poses real-time constraints that may vary over time in a data-dependent manner.
Depending on such input, ADAM can react by changing the platform settings, performing different operations:
• enable or disable individual tasks of a sensor task chain, or the entire chain;
• choose whether to put the microcontroller in sleep mode or not;
• set the operating frequency of the microcontroller to increase/reduce the performance level;
• reroute the data flow managed by the FIFOs according to the active tasks.
Figure 4 shows an example of a system reconfiguration that may be applied by ADAM, deactivating a process task to switch from an operating mode that sends pre-processed information to the cloud to one sending raw data.

Figure 4: Reconfiguration of the task chain (Get data → Process → Threshold → Send), before and after deactivating the Process task.
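ADAM's frequency-selection behavior can be illustrated with a minimal sketch: pick the lowest available system frequency able to sustain the current event rate. The per-event cycle cost and safety margin below are illustrative assumptions, not measured values; only the frequency range matches the SensorTile specification quoted earlier.

```python
# Illustrative sketch of a frequency-selection policy such as the one ADAM
# applies: choose the lowest available system frequency able to sustain the
# current event rate. The frequency list spans the SensorTile range
# (0.1-80 MHz); cycles_per_event and margin are assumptions.
FREQS_MHZ = [0.1, 1, 8, 16, 48, 80]

def pick_frequency(events_per_s, cycles_per_event, margin=1.2):
    """Lowest frequency (MHz) covering the workload, with a safety margin."""
    required_mhz = events_per_s * cycles_per_event * margin / 1e6
    for f in FREQS_MHZ:
        if f >= required_mhz:
            return f
    return FREQS_MHZ[-1]      # saturate at the maximum frequency
```

For instance, a heartbeat-driven task firing twice per second at an assumed cost of 4 M cycles per event would be served by the 16 MHz setting.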

Designing the application: operating modes and processing tasks
To implement ECG monitoring, we have specialized the previously described application model to deploy an adequate waveform analysis application on SensorTile. We built a prototype using an AD8232 sensor module from Analog Devices, connected to the ADC converter integrated in the reference platform. In this section, we describe the supported operating modes, which can be selected at runtime, and the processing tasks coexisting in the different operating modes.

Operating modes
We have enabled three different operating modes to be selectable by the user, by sending adequate commands from the cloud. Operating modes are shown in Figure 5.

Figure 5: ECG application model. For each operating mode, the figure shows the task chain Get data → Peak → CNN → Threshold → Send, with a different subset of tasks active: operating mode 1 (raw data), operating mode 2 (peak detection), and operating mode 3 (CNN processing).

Operating mode 1: Raw data
The first operating mode envisions sending the entire data stream acquired by the sensor node to the gateway. No near-sensor data analysis is enabled, and the mode therefore poses fairly high requirements in terms of bandwidth. In this operating mode:
• Multiple samples are grouped and inserted into a packet of 20 bytes (8 ECG samples of 16 bits each, 1 timestamp of 32 bits).
• The sample rate of the ADC is set to 330 Hz; since multiple samples are sent at a time, one Bluetooth packet is sent every 24 ms.
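The packet size and period above can be verified with a couple of lines of illustrative arithmetic:

```python
# Sanity check of the raw-data operating mode figures: 8 ECG samples per
# Bluetooth packet at a 330 Hz ADC sample rate.
SAMPLES_PER_PACKET = 8
SAMPLE_RATE_HZ = 330

# One packet every ~24 ms, as stated above.
packet_period_ms = SAMPLES_PER_PACKET / SAMPLE_RATE_HZ * 1000.0

# Payload: 8 samples x 16 bit + one 32-bit timestamp = 20 bytes.
payload_bytes = SAMPLES_PER_PACKET * 2 + 4
```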

Operating mode 2: Peak detection
This operating mode does not provide visual access to the whole ECG waveform. A healthcare practitioner selecting this mode can choose to monitor only the heartbeat rate, requiring a lower level of detail in the information sent to the cloud. The practitioner can also set thresholds and receive notifications only when the thresholds are exceeded. In this operating mode, four tasks are active:
• Get data task
• Process data task (peak detection)
• Threshold task (alert heartbeat rate evaluation)
• Send task

Figure 6: Filtering block diagram.

This operating mode processes samples to search for signal peaks and consequently computes the heartbeat rate. The first task (Figure 5) collects data from the sensor (as in the raw data operating mode), the second analyzes the signal and calculates the heart rate, and the fourth transmits the data. The threshold task determines whether data must be sent to the cloud; for example, no data is sent if the heartbeat rate stays between the low and high alert values. The peak detection algorithm, which is not very critical in terms of time and power consumption, is discussed in more detail in Section 5.2. The packet sent is 5 bytes (1 heartbeat rate value on 8 bits, 1 timestamp on 32 bits). The transmission rate is data-dependent: in the worst case, a packet is sent for each detected peak.
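The heartbeat-rate computation performed by the second task can be sketched as follows (a hypothetical Python helper; the node implements this in C). The rate is derived from the spacing of detected R peaks at the node's 330 Hz ADC sample rate.

```python
# Hypothetical helper mirroring the heartbeat-rate computation: the rate is
# derived from the average spacing of detected R peaks (sample indices) at
# the node's 330 Hz ADC sample rate.
def heart_rate_bpm(peak_indices, fs_hz=330):
    if len(peak_indices) < 2:
        return None                      # need at least one R-R interval
    intervals = [b - a for a, b in zip(peak_indices, peak_indices[1:])]
    mean_rr_s = sum(intervals) / len(intervals) / fs_hz
    return 60.0 / mean_rr_s              # beats per minute
```

For example, peaks spaced exactly 330 samples apart (one second at 330 Hz) correspond to 60 bpm.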

Operating mode 3: CNN processing
In the last operating mode, a further level of analysis is introduced. An additional task implements a convolutional neural network, classifying the ECG waveform to recognize physiologically relevant conditions. Using such a classification technique, the practitioner can monitor the morphology of the signal without the need to send the entire data stream to the cloud, saving transmission-related power/energy consumption. The implemented neural network recognizes anomalous occurrences in the ECG trace; in this case, communication with the gateway occurs only when an anomaly is detected. The enabled tasks are:
• Get data task
• Process data 1 task (peak detection)
• Process data 2 task (CNN)
• Threshold task (anomalous shapes in the ECG waveform)
• Send task
The required communication bandwidth is closer to that of the peak detection operating mode than to that of the raw data operating mode; however, with respect to the peak detection operating mode, the computing effort is higher. The node executes a 1D convolutional neural network similar to the one described in [44]. We have designed the system to be capable of classifying ECG peaks according to two alternative sets of categories, each composed of 5 classes, named NLRAV and NSVFQ (see Figure 9). The design process used to select, train, and deploy the specific neural network topology is explained in Section 5.3.
The size of the data transferred to the cloud is 6 bytes (1 heartbeat rate value on 8 bits, 1 class label on 8 bits, 1 timestamp on 32 bits).

The peak detection algorithm
The processing of the ECG signal is activated in the peak detection and CNN operating modes; in both, it is necessary to identify the R peaks in the signal. Therefore, a simplified version of the Pan-Tompkins algorithm was used to obtain the positions of the R peaks during data acquisition from the sensor. The reference study for the R-peak recognition algorithm is [45]; Figure 6 shows the block diagram representing the signal processing. Figure 7 shows the raw signal in blue and the filtered one in red for two different recordings. A peak is detected when the filtered signal exceeds a predefined threshold and then returns to a local minimum point; the delay introduced by the filter is taken into account, and the threshold value may be set differently for each recording.
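The threshold-crossing logic can be sketched as follows. This is illustrative only: the real detector operates on the Pan-Tompkins filtered signal and compensates for the filter delay, both of which are omitted here.

```python
# Illustrative threshold-crossing peak picker: a peak is reported when the
# signal exceeds the threshold and then drops back below it, keeping the
# index of the maximum within the crossing. The actual detector works on the
# filtered signal and compensates for the filter delay (omitted here).
def detect_peaks(signal, threshold):
    peaks = []
    above = False
    best_i = None
    for i, v in enumerate(signal):
        if v > threshold:
            if not above or v > signal[best_i]:
                best_i = i               # track the maximum of the crossing
            above = True
        elif above:
            peaks.append(best_i)         # crossing ended: emit the peak
            above = False
    return peaks
```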
A detected peak is considered a true positive when it can be associated with an annotated dataset peak within a neighborhood of 50 samples in the track under analysis. Equations 1 and 2 show the sensitivity (true positive rate) and precision (positive predictive value) of the peak detection algorithm on the MIT-BIH arrhythmia database. Figure 8 shows the true positives, false positives and false negatives of the peak detection algorithm with a tolerance of 50 samples.
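The tolerance-based evaluation can be expressed as in the sketch below. The greedy one-to-one matching policy is an assumption for illustration; the sensitivity and precision formulas are the standard ones referred to by Equations 1 and 2.

```python
def match_peaks(detected, reference, tolerance=50):
    """Match detected peaks against annotated dataset peaks: a detection
    within `tolerance` samples of an unmatched annotation is a true positive
    (greedy matching policy is an illustrative assumption)."""
    ref = sorted(reference)
    used = [False] * len(ref)
    tp = 0
    for d in sorted(detected):
        for k, r in enumerate(ref):
            if not used[k] and abs(d - r) <= tolerance:
                used[k] = True
                tp += 1
                break
    fp = len(detected) - tp
    fn = len(reference) - tp
    sensitivity = tp / (tp + fn)   # true positive rate (Equation 1)
    precision = tp / (tp + fp)     # positive predictive value (Equation 2)
    return sensitivity, precision

se, ppv = match_peaks([100, 310, 700], [95, 300, 505])
assert (se, ppv) == (2/3, 2/3)
```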

Designing the CNN: training and optimization
We have exploited a training procedure comprising a post-training static quantization step; the source code is available in our public repository. This process converts weights and activations from floating point to integers and allows implementing the CNN using the CMSIS-NN optimized function library, which expects inputs represented with 8-bit precision. In static quantization, which takes place right after training, float values are converted to the qint8 format. We set the procedure to force bias values to be null, while, to quantize weights, MinMax observers are inserted inside the network to detect the dynamics of the output values in each layer. On the basis of the observed distribution, scale and zero-point values are selected and used to perform the conversion effectively and prevent data saturation.
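A minimal sketch of the scale/zero-point selection is shown below, assuming the standard affine 8-bit quantization scheme (the exact observer settings used by the PyTorch flow in the paper may differ).

```python
def qparams_from_minmax(x_min, x_max, qmin=-128, qmax=127):
    """Derive scale and zero-point for affine 8-bit quantization from the
    value range reported by a MinMax observer. The range is extended to
    include zero so that zero is exactly representable."""
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Quantize one float to qint8; clamping prevents saturation overflow."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))

scale, zp = qparams_from_minmax(-1.0, 1.0)
assert quantize(0.0, scale, zp) == zp
```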
The functions implementing convolution and fully connected layers in the CMSIS-NN library provide output shifting operations to apply the scale factor to the outputs, with the results saturated to the [-128, 127] range. The quantization procedure in PyTorch, on the other hand, produces a scale value that is not necessarily a power of 2. For this reason, we slightly modified the CMSIS functions to support arbitrary scale values. This modification has led to a limited increase of the inference execution time. As an example of such performance degradation, the execution time increases for two example CNN topologies, named 20_20_100 and 4_4_100 (the network name indicates the main topology parameters, conv1OutputFeatures_conv2OutputFeatures_fc1Outputs), are 2.87% and 10.52%, respectively.
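The difference between the two output-scaling schemes can be illustrated as follows. This is a sketch of the idea only: the original CMSIS-NN path scales by a power of two (a right shift), while the modified path accepts an arbitrary scale; the actual firmware formulation of the arbitrary-scale path is not detailed in the paper.

```python
def requantize_shift(acc, shift):
    """Original CMSIS-NN style output scaling: a pure right shift,
    i.e. the effective scale factor must be a power of two (1 / 2**shift)."""
    return acc >> shift

def requantize_arbitrary(acc, scale):
    """Modified scaling supporting an arbitrary scale factor, as produced by
    the PyTorch quantization flow (illustrative sketch). The result is
    saturated to the int8 output range, as CMSIS-NN does."""
    q = int(round(acc * scale))
    return max(-128, min(127, q))

assert requantize_shift(1024, 4) == 64
assert requantize_arbitrary(1000, 0.05) == 50
```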

Model exploration
In order to select an optimized CNN topology implementing the classification task required for the system, we have carried out a design space exploration process, comparing tens of neural network topologies in terms of accuracy reached after training and in terms of computing workload associated with executing the inference task on SensorTile.
We have explored multiple topologies composed of two convolution layers, two down-sampling layers, and two fully connected layers, as represented in Figure 9; the size of the input sample frame is equal to 198. The explored topologies feature different numbers of output channels from each layer. The results are reported in Figure 10, showing the most interesting outcomes for both the NLRAV and NSVFQ classes. Models NLRAV_20_20_100 and NSVFQ_20_20_100 achieve the highest accuracy values, as shown in Equations 3 and 4. The training set is composed of 70% of the elements of the entire dataset, chosen randomly. Figure 11 shows the trend of the accuracy value during the training stage. Figure 12 shows a Pareto plot representing accuracy and energy consumption for the most accurate topologies identified by the exploration.
Figure 12: For the most accurate models, the energy consumption of a single CNN task call is shown. The dotted lines represent the maximum allowable drop in accuracy (0.5% with respect to the most accurate model) for the NLRAV (red line) and NSVFQ (blue line) classes. The models marked with an "×" do not respect the constraints imposed on the minimum necessary accuracy value.
For both the NLRAV and NSVFQ classes, a single neural network model must be selected that reduces power consumption as much as possible without leading to an excessive drop in accuracy. A maximum accuracy drop equal to 0.5% with respect to the most accurate model (represented in Figure 12 by the dotted lines) was chosen. The reported energy consumption is associated with a single CNN inference task execution on SensorTile. Models whose accuracy drop stays within the 0.5% threshold are considered valid and, for each set of labels, the valid model that consumes the least energy is chosen to be refined in the next steps and deployed on the board. Eventually, we have selected NLRAV_4_4_100 and NSVFQ_4_4_100. The accuracy values are reported in Equations 5 and 6.
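The selection rule above can be sketched as follows. The accuracy and energy numbers are made up for illustration; only the selection logic (lowest energy within a 0.5% accuracy drop from the best model) reflects the paper.

```python
def select_model(models, max_drop=0.005):
    """Select the lowest-energy model whose accuracy drop from the most
    accurate model stays within `max_drop` (0.5%), mirroring the Pareto-based
    selection of Figure 12. `models` maps name -> (accuracy, energy)."""
    best_acc = max(acc for acc, _ in models.values())
    valid = {n: (a, e) for n, (a, e) in models.items()
             if best_acc - a <= max_drop}
    return min(valid, key=lambda n: valid[n][1])

# Hypothetical exploration results (values are illustrative only).
models = {
    "NLRAV_20_20_100": (0.990, 1.00),  # most accurate, most energy-hungry
    "NLRAV_4_4_100":   (0.987, 0.20),  # within the 0.5% drop, much cheaper
    "NLRAV_2_2_50":    (0.975, 0.10),  # accuracy drop too large
}
assert select_model(models) == "NLRAV_4_4_100"
```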

Post-deployment degradation and Refinement with Augmentation
The ECG peaks in the reference dataset are perfectly centered in the frame of samples that the CNN receives as input during the training stage. As a consequence, the network is trained to recognize the chosen classes as long as the peak is centered in the signal frame. The peak detection algorithm on the SensorTile, on the other hand, operates online on the incoming signals and does not always detect the peak in the same position specified in the dataset.
To assess the accuracy degradation after deployment, we report post-deployment accuracy values by:
• considering the false positive and false negative peaks produced by the peak detection algorithm, which need to be accounted for in Equations 5 and 6;
• using a post-deployment validation dataset, composed of the same samples as the original one, but modified to be centered as dictated by the peak detection algorithm during online analysis.
In these conditions, accuracy degrades to 94.52% and 94.09% for NLRAV and NSVFQ respectively.
To overcome the resulting inaccuracy, the chosen networks have been retrained to refine their precision in case of imperfectly centered input frames. We have used a data augmentation technique, depicted in Figure 13, that adds to the training set 33 off-center copies of each original peak in the dataset, each shifted by 3 samples with respect to the previous one.
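The shifted-copy generation can be sketched as below. The symmetric layout of the shifts around the centered position and the zero-filling of vacated samples are assumptions for illustration; the paper specifies only the number of copies (33) and the step (3 samples).

```python
import numpy as np

def augment_shifted(frame, n_copies=33, step=3):
    """Generate off-center copies of a centered peak frame so the CNN learns
    to tolerate imperfect centering: `n_copies` versions, each offset by
    `step` samples from the previous one, laid out symmetrically around the
    centered position (layout is an illustrative assumption). Samples vacated
    by the shift are zero-filled."""
    copies = []
    start = -(n_copies // 2) * step
    for k in range(n_copies):
        offset = start + k * step
        shifted = np.roll(frame, offset)
        if offset > 0:
            shifted[:offset] = 0
        elif offset < 0:
            shifted[offset:] = 0
        copies.append(shifted)
    return copies

frame = np.zeros(198)
frame[99] = 1.0                      # synthetic centered "peak"
aug = augment_shifted(frame)
assert len(aug) == 33
assert aug[16][99] == 1.0            # middle copy has zero offset
```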
As expected, augmentation reduces the specialization of the CNN on the perfectly centered validation set, slightly reducing the pre-deployment accuracy to 98.37% and 97.76% for NLRAV and NSVFQ respectively. On the other hand, ECG recordings with anomalous peaks, which are difficult to center perfectly with the peak detection algorithm, are expected to be classified much more accurately, as will be shown in Section 6.

Experimental results
In this section, we present our main experimental results. We first report a detailed accuracy evaluation, demonstrating the effectiveness of the data augmentation procedure and the class-level classification capabilities of the designed CNNs. Moreover, we present measurements of the energy consumption of the entire system and highlight the energy contribution of each task. To estimate power consumption in each operating mode, we have performed a thorough set of experiments measuring energy consumption in different setup conditions. The results were used to create a model highlighting the contribution of each task to the energy consumption of the node.

Post-deployment CNN accuracy
As mentioned above, when considering the ideally centered samples in the dataset, the selected CNNs are very accurate. The precision of the classification, however, decreases significantly when peaks are detected online and are imperfectly centered. As a solution to this accuracy degradation, we have enriched the training set with samples derived from the original ones by applying artificial shifting. To prove the obtained improvements, we report a detailed classification analysis. In Table 4, we report the number of false positive and false negative cases resulting from the peak detection algorithm, and we classify the remaining cases, the true positives, with the neural networks selected for the NLRAV and NSVFQ classes. This classification is executed on the post-deployment validation set mentioned in Section 5.3. The improvement in post-deployment accuracy after data augmentation is shown in Equations 7 and 8.
Data augmentation allows recovering most (around 2.9%) of the drop due to imperfect centering of the input ECG peaks. It has, obviously, no effect on the drop due to misdetections, which still causes a 1.7% degradation with respect to the pre-deployment phase. Figure 14 gives a more detailed view of the effects of the quantization procedure and of the augmentation on accuracy, focusing on the classification of the peaks detected online. The two leftmost plots represent the accuracy levels when no augmentation is exploited. The accuracy, as can be noticed in the leftmost bar of each plot, is very high, with small variability over the different tracks, and is only slightly decreased when quantization is applied to obtain a fixed-point implementation. However, when considering the positioning of the peak as identified by the online detection, as shown in the two rightmost bars of each plot, precision degrades on some of the tracks, as can be noticed by the presence of multiple outlier tracks with very poor classification accuracy. This happens independently of the data representation format, since the behavior is similar for both the fixed- and floating-point implementations. The two graphs on the right show the impact of data augmentation. As may be noticed from the rightmost bars in these two plots, overall accuracy is significantly improved: classification works correctly for all the tracks and even the outliers show an accuracy higher than 90%.
Finally, Table 5 summarizes the results in terms of neural network accuracy on the MIT-BIH dataset. We compare with the works discussed in Section 2 that deal with inference at the edge. As may be noticed, our system provides accuracy that is higher than or very close to that of the alternatives, despite being, to the best of our knowledge, the only work actually evaluating post-deployment accuracy and considering the error contributions deriving from all the steps in the online processing system.
Table 4: False positive and false negative cases resulting from the peak detection algorithm, and classification of the remaining true positive cases for the NLRAV and NSVFQ classes using CNNs trained with augmentation techniques.
Figure 14: Considering the true positive peaks obtained with a tolerance of 50 samples, the statistical distribution of the accuracy values for each ECG recording, obtained from the classification on the validation set, is represented. Both the floating-point and fixed-point models are tested; inference with centered and non-centered peaks is also tested. The median value is shown in orange.
Figure 15: The graph summarizes the energy consumption for different heartbeat rates, when data sending is enabled (Tx) or not (No Tx). The raw data o.m. does not depend on the heartbeat nor on the threshold settings, and the threshold task is disabled, so only one value is shown.

Power consumption measures
In order to measure the device's power consumption, we monitored the current absorption through an oscilloscope and a shunt resistor. Figure 15 shows the power consumption data derived from the experimental results; the individual cases are discussed below.

Case: 50 bpm
With low heart rate values, considerable energy savings are obtained even without adapting the system frequency to the workload. In fact, the peak detection o.m. and the CNN processing o.m. are workload-dependent, and in this case the workload is low. For this reason, they consume less than the raw data o.m., which constantly sends data to the cloud. A further energy saving is given by the reduction of the system frequency according to the workload: in this case the peak detection o.m. is set to 2 MHz and the CNN processing o.m. to 4 MHz. The raw data o.m., the worst case, always works at 8 MHz. Figure 15 also shows the power consumption values when data transmission to the cloud is present or not (Tx, No Tx); as already said, the decision is up to the threshold task.

Case: 100 bpm
The peak detection o.m. and the CNN processing o.m. keep the same operating frequency as in the previous case. There is thus a slight increase in power consumption for these modes, due only to the more intense data-dependent workload. The power consumption of the raw data o.m. is obviously unchanged.

Case: 200 bpm
Compared to the previous cases, again, there are no changes in the raw data o.m. An increase of the working frequency to 8 MHz is required to sustain the CNN processing o.m. The role of the threshold task, which accounts for the difference between the Tx and No Tx bars for the CNN processing o.m., is more important. Even at this very high rate, the CNN-based monitoring is still convenient with respect to the raw data o.m., confirming the usefulness of near-sensor processing.

Power model and operating mode power consumption estimation
By interpolating the experimental results on power consumption in the different use cases and knowing the duration of each task, we were able to build a model capable of estimating the energy consumption of the device in every possible use case. Table 6 shows the energy values for each task in the process network. Table 7 shows the power consumption of the platform in the idle state and of the ECG sensor.
At this point it is possible to easily estimate the power consumption relative to each operating mode. The resulting equations are:
• P_raw data o.m. = (E_g + α·E_s)·f_s + P_idle + P_sensor
• P_peak detection o.m. = E_gp·f_s + (E_t + E_s)·f_p + P_idle + P_sensor
• P_cnn processing o.m. = E_gp·f_s + (E_c + E_t + E_s)·f_hr + P_idle + P_sensor
where:
• f_s is the sampling frequency,
• f_hr is the heart rate,
• f_p is the peak data sending frequency,
• α⁻¹ is the number of samples inserted in a BLE packet,
• P_idle is the power consumption of the platform in the idle state, which depends on the system frequency,
• P_sensor is the power consumption of the ECG sensor.
Figure 16: Estimation of the energy consumption of each task in each operating mode at 60 bpm.
Figure 16 shows the estimate of the power consumption of the device and the contributions of each task when the heart rate is around 60 bpm.
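The per-mode estimation can be sketched directly from the equations above. The numeric values in the usage example are made up for illustration; the actual per-task energies are those of Table 6.

```python
def power_raw(E_g, E_s, alpha, f_s, P_idle, P_sensor):
    """Raw data o.m.: acquisition (E_g) for every sample, plus one BLE packet
    send (E_s) every 1/alpha samples, at sampling frequency f_s."""
    return (E_g + alpha * E_s) * f_s + P_idle + P_sensor

def power_peak(E_gp, E_t, E_s, f_s, f_p, P_idle, P_sensor):
    """Peak detection o.m.: acquisition plus peak detection (E_gp) per sample;
    threshold check and send per transmitted peak datum (frequency f_p)."""
    return E_gp * f_s + (E_t + E_s) * f_p + P_idle + P_sensor

def power_cnn(E_gp, E_c, E_t, E_s, f_s, f_hr, P_idle, P_sensor):
    """CNN processing o.m.: CNN inference (E_c), threshold and send run once
    per heartbeat (frequency f_hr)."""
    return E_gp * f_s + (E_c + E_t + E_s) * f_hr + P_idle + P_sensor

# Illustrative (made-up) parameter values, not taken from Table 6/7.
p = power_raw(E_g=1e-6, E_s=10e-6, alpha=0.1, f_s=128,
              P_idle=1e-3, P_sensor=0.5e-3)
assert abs(p - 1.756e-3) < 1e-9
```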
The following list shows the estimated battery life (600 mAh, 3.7 V Li-Ion) for each operating mode: •

Conclusion
We have defined a hardware/software template for the development of a dynamically manageable IoMT node, designed to execute in-place analysis of the sensed physiological data. Its implementation has been tested on a low-power platform, able to exploit a CNN-based data analysis to recognize anomalies in ECG traces. The device is able to reconfigure itself according to the required operating modes and workload. The ADAM component, which manages the reconfiguration of the device, plays a substantial role in energy saving. A quantized neural network reaches an accuracy higher than 97% on the MIT-BIH Arrhythmia dataset for classification according to the NLRAV and NSVFQ class sets. We measured an energy saving of up to 50% by activating in-place analysis and managing the hardware and software components of the device. This work demonstrates the feasibility of increasing battery lifetime with near-sensor processing and highlights the importance of data-dependent runtime architecture management.