Hybrid Anomaly Detection Model on Trusted IoT Devices

Most machine learning proposals in the Internet of Things (IoT) are designed and evaluated on preprocessed data sets, where data acquisition and cleaning steps are often considered a black box. In addition, IoT environments have numerous challenges related to acquiring data from sensors, where sensitive data can be threatened by malicious users who seek to interfere with the communication channel or storage. Additionally, sensor data can also be affected by noise. Therefore, differentiating the type of threat/anomaly requires additional energy and computational resources. We propose to carry out data cleaning/anomaly techniques on the IoT device itself, not in the cloud servers but closer to the data source. Therefore, the IoT device sends trusted data to the Cloud. Among the benefits of this is the considerable reduction in the cost of implementation due to less movement of data between IoT devices and the Cloud. Consequently, we define three anomaly detection steps using smoothing filters, unsupervised learning, and deep learning techniques (i.e., hybrid model) to detect different variations of anomalies and threats while focusing on a small computational/memory footprint. The deployment of the hybrid model on AVR, Tensilica, and ARM microcontrollers showed that the last ones are an adequate target to implement the model because they best satisfy the necessary hardware requirements. The proposed model consumes 50 kB of Flash and 12 kB of RAM and processes data locally, achieving a bandwidth reduction of 60%. Finally, the hybrid model was tested in external data sets.


I. INTRODUCTION
N OWADAYS, many research fields use machine learning (ML) techniques due to their capability to identify and describe patterns [1]. Often, the data sets from the Internet of Things (IoT) applications are time-series data since the source   of the data set is sensors and similar devices "interacting" with the real world, sending their collected data to highperformance servers located in the cloud [2], [3]. These servers are the ones used to run ML algorithms because such algorithms are computationally complex-naturally, much effort in ML proposals goes into optimizing the computations in the cloud and making such servers run more efficiently. Nevertheless, many proposals that apply ML ideas on top of IoT/sensor data assume that the data set has been "cleaned" and is typically treated as a black box [4].
In the IoT space, the ability of electronic devices to work close to the end users or interact directly with the environment could generate incorrect sensor readings because they are used in indoor and outdoor environments, often with low/no maintenance [5], [6]. Additionally, extreme weather conditions can impair performance and cause component wear [7]. Therefore, IoT-based solutions can also help to detect hardware damage due to environmental conditions. For example, IoT systems can use the borescope inspection technique in monitoring and maintaining the health of aero-engine blades and, with the help of deep learning techniques, can produce an intelligent diagnosis. This being a topic of upmost importance, because it involves the safety of human beings [8], [9], [10]. As a result, IoT devices periodically send collected data to the cloud for further processing. By some estimates, 80% of the battery consumption of IoT devices is related to wireless communication with the edge/cloud [11]. Moreover, in some scenarios, IoT devices send data to the cloud without bandwidth restriction/limitation carrying on bottlenecks and unnecessary storage on the server when data is not processed for a specific target [12].
Since IoT devices gather data from different scenarios, data protection is commonly needed to acquire, manage, send, and process them [13]. Therefore, sensitive data could be threatened or compromise their integrity, and final users or companies can lose control when they are stored in cloud servers, causing unpredictable outputs from the ML model [14]. Consequently, the main security/privacy concern is managing the high quantity of generated data to protect storage and communication. Therefore, detecting anomalies/threats must be done before sending data to the cloud [15]. Thus, cloud computing could consider that the IoT device is trusted since the sensor data is secure to manage/process [16]. Consequently, anomaly detection techniques allow recognizing if data is statistically different from the rest, and such differences can be divided into intrinsic error (genuinely fault sensor), sensor event (an actual event that increases or This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Fig. 1. Solid black arrows, traditional data movement; dotted arrows, lightweight ML models exported to the IoT device from different locations; and square dotted arrows, hybrid model allocated in memory. In addition, the sensor side comprises the devices along with microcontrollers. Furthermore, the edge side consists of gateways with capabilities to deploy data preprocessing models, payload errors, empty cells, and manning the wireless communication architecture. Finally, the cloud side is a powerful server with the ability to train heavy ML models. decreases its value dramatically), and intermittent sensor error (data tampering) [17].
The most representative anomaly detection techniques fall into four categories: 1) statistical; 2) unsupervised; 3) supervised; and 4) deep learning. However, single models have yet to prove exceptional performance [4], [18], [19]. Moreover, several works focused on deploying a single anomaly detection model show that such setups do not prevent the possibility of sending erroneous data to the cloud [5]. Therefore, in recent years, hybrid models are getting more attention and demonstrate prominent advantages when a set of anomaly detection techniques are merged into one single solution [11]. Nevertheless, a hybrid system with several anomaly detection algorithms is typically deployed outside the IoT device due to computational constraints, limited storage capacity, and battery power limitations [20].
This article aims to reduce the cost of processing/detecting anomalies in the cloud and ensure data integrity by processing data locally. Following novelty decentralized computational architectures [19], this work achieves this goal by designing and deploying a hybrid ML model that detects anomalous data and threats by a threat model on IoT devices. However, there are open challenges, such as the IoT device might recognize either data is compromised by sensor fault or is a security threat that someone is tampered with the sensor to inject errors. Besides determining the optimum steps to deploy a hybrid anomaly detection model on the device. Related to the literature, the most representative anomaly detection techniques are as follows: noise reduction [3], [21], outlier detection [22], [23], [24], and tamper/threat detection [17], [25]. Based on [18] and [26], we conducted an extensive literature review focused on determining the underlying algorithms and how these could be ported to emerging microcontrollers with better computational capacity. Therefore, we explore a range of possible algorithms to determine how well they work on small IoT devices.
To prove our proposal on how to develop the hybrid model locally, Fig. 1 shows in solid black arrows how traditionally data is sent to the Edge where it is stored. Then, from the cloud side, computer scientists query data to develop ML models [27]. Next, the inference goes back to the Edge to update the firmware to send it back to the IoT device. In contrast, with the computational power of new microcontrollers, the dotted arrows show how some ML algorithms that typically work outside the device can be quantized to be processed locally in the IoT memory (shown in square-dotted arrows in Fig. 1). As a result, our proposal aims to reduce the data movement, latency, and risk of compromising data sending them through communication networks. The contributions of this work can be summarized as follows.
1) We propose a three-step hybrid anomaly detection model targeting to run on an IoT device considering the most relevant anomaly detection approaches (noise reduction, outlier detection, and tamper/threat detection) and evaluate them in terms of accuracy and metrics like memory consumption and bandwidth on the IoT device. 2) Our results demonstrate that smoothing filters, specifically the Gaussian algorithm, are the most effective for improving samples for the first noise reduction step. Then, unsupervised learning methods are suitable solutions for the second outlier detection step. Especially, the one-class support vector machine (OSVM) algorithm reduces scatter significantly when obtaining cleaned samples. Finally, for the third tamper detection step, we design a threat model to define assumptions where the IoT device must classify either a tamper/threat situation or regular sensor reading. As a result, a model based on deep neural networks can detect data tampered adequately whether the algorithm is trained with the tamper possibilities. 3) After determining the most suitable technique for each step, the hybrid anomaly detection model is tested in real environments with three microcontroller families, demonstrating that ARM microcontrollers are an adequate target to deploy this hybrid model satisfying the necessary hardware requirements and reducing the bandwidth need between IoT devices and the Edge/Cloud. The remainder of this article is structured as follows. Section II shows related works. Then, Section III presents the hybrid anomaly detection model proposed. Next, results are discussed in Section IV. Finally, Section V presents the conclusions of this article and future work.

II. RELATED WORKS
There are several works with different approaches to detecting anomalies. Consequently, works, such as Ghosh et al. [28], Gaddam et al. [29], Choi et al. [1], and Cook et al. [30], presented earlier state-of-the-art studies showing the most relevant techniques used for detecting anomalies. These works offer statistical-based approaches, which assume some probabilistic data distribution categorized into either parametric or nonparametric methods.

A. Traditional Models
Anomaly detection in IoT environments got attention since Zhang et al. [31] presented the first intuition about it and how data with errors can impair the actuation of control systems. This work opens opportunities to other researchers like Shen and Wang [32], which presents the importance of the data validation stage before processing them. Later, Chacko and Bharati [33] presented the risks of bringing incorrect information to the cloud and how lightweight ML algorithms can be compiled in IoT environments. Besides, Yu et al. [5] presents a principal component analysis-based data outlier detection and sensor data. This work defines an architecture for sensor aggregation in the cloud. Then, Nesa et al. [22] considers different techniques to reach outlier detection. As a result, this work presented metrics to compare those algorithms with classification and statistical approaches to determine the performance of each technique. Wang et al. [34] used several algorithms, such as isolation forest (ISO) and local outlier factor (LOCAL), and their mathematical approach to present their isolation nearest neighbor ensemble (iNNE) technique. Conversely, Sood et al. [17] presented the first initiative to define a high-level tamper detection combining some algorithms with different approaches: statistical, supervised, and unsupervised techniques. Therefore, the model complexity increases at the same time as its robustness.

B. Hybrid Models
Recently, anomaly detection has been studied based on new trends when the flow of information processed by the IoT data has grown exponentially. For these trends, Zhu et al. [11] proposed a novel framework named gridbased approximate average outlier detection (GAAOD) to support KNN-based outlier detection over IoT streaming data. Furthermore, Zhang et al. [4] proposed a three-level hybrid model based on: median filter (MF), empirical mode decomposition (EMD), classification and regression tree (CART), autoregression (AR), and exponential weighted moving average (EWMA) methods (called MF-EMD-CARTAR-EWMA) to detect outliers in sensor data. The hybrid model was evaluated using a real hydrometeorological observation network data set. Then, Wu et al. [3] defined a long short-term memory (LSTM)-Gauss-Bayes method, which is a synergy of the LSTM neural network (LSTM-NN) and the Gaussian Bayes model for outlier detection in the IoT devices. Next, regarding the new tendency to use decentralized computational systems, Garmaroodi et al. [19] presented an intelligent IoT system applied in pharmaceutical systems to detect anomalies in water purifying pumps. In addition, Liu et al. [37], and Wang et al. [38] demonstrated an application of federated learning in anomaly detection in industrial environments using hybrid models. Furthermore, Liu et al. [26] and Wang et al. [38] deployed a decentralized solution where edge devices collaborate to train anomaly detection models. Besides, Garmaroodi et al. [19] presented two anomaly detection approaches running on the device. Finally, Zhang et al. [36] proposed a data edge verification mechanism based on blockchain to ensure that data is not tampered with. As illustrated by these works, a new trend exists to deploy a decentralized computational architecture where IoT devices are part of the ML pipeline running the inference of the model.
Summarizing the most relevant works and their approach, Table I shows the applied algorithm, the used algorithm, and the location where the ML algorithm was developed. Also, Table I shows relevant information about the algorithm selection for the following sections presented in this research. However, there are open challenges even when previous works presented several solutions to face anomaly detection in sensor data and IoT devices. For example, bringing these solutions into the IoT device due to the new computational capabilities of the microcontrollers. Moreover, several works present anomaly detection hybrid models using different ML methods. Nevertheless, tamper detection techniques are not deployed as part of hybrid models, even with the increasing concerns about data confidentiality.

III. HYBRID ANOMALY DETECTION MODEL
ML applications in IoT environments start with the data collection by sensors, the next step is data anomaly detection. However, even though usually just the model inference step is deployed on the IoT device [39], only some techniques can be exported to IoT devices due to their expensive hardware requirements. In addition, these anomaly detection techniques are applied in different ways to reach individual purposes, such as noise reduction (improved samples), outlier detection (cleaned samples), and tamper detection (processed samples). Therefore, it is necessary to present a hybrid model merging the mentioned methods to detect erroneous, unpredictable, and unexpected data [40]. However, this hybrid model must be compiled with the minimum computational footprint [41]. As a result, the IoT device stores in memory and runs in sequence different ML models related to detecting all the anomaly possibilities. Fig. 2 shows the following steps of the proposed hybrid model and the corresponding data flow, starting with the original samples, which have noise (represented by variations), outliers (represented by red circles), and tampered data (represented by red dots).

A. Threat Model
The hybrid anomaly detection model defines tamper detection as the final stage after sending data to the Edge/Cloud. Therefore, the threat model proposed identifying the potential vulnerability as follows.  1) Sensors could be damaged or worn by harsh conditions and this can affect their calibration. Therefore, the noise reduction by the smoothing filter removes the noise and drift of the samples. 2) Since the IoT device is placed in this specific environment, the sensor could be relocated to confuse the data acquisition stage. Therefore, the outlier detection stage faces this security flaws in detecting data with different distributions. 3) Whether a malicious user tries to insert erroneous data into the IoT device, the tamper detection stage has previous information on all tamper possibilities to detect them of regular readings and send an alert.

B. Data Collection
We propose a controlled data acquisition environment where data manipulation is doable and the hybrid model can be tested. Therefore, we employ a well-known CO 2 , temperature, and humidity sensor SCD-30, which is applied in several indoor and outdoor system applications [42]. First, the electronic system has one microcontroller, the SCD30 sensor, and battery, which receives data in normal conditions in an office where people work for around 7 h in groups of two of three people (Label 1). Second, people try to tamper with the sensor by blowing on it (Label 2). Third, people smoke near the sensor to activate the fire alarm (Label 3). Finally, we simulate a fire by putting fireballs in a basket near the sensor (Label 4).
However, because the data needs a couple of hours to get tiny variations and the security of the building can be affected if malicious users try to tamper with the sensor, we create a test box (scaled size of a standard office) to accelerate the data acquisition process. Fig. 3 shows the design with a computer fan to cool the box, an incandescent bulb to warm up the box, and a small door that the user can tamper the sensor directly. To avoid noisy samples for past tests, the test box can be disassembled to restore the settings for new tests. The sample rate for label 1 in offices was 10 min. Then, the trend was calculated to define the same seasonality behavior in the text box, which is to take samples every 5 s. As a result, the data was stored into a matrix X ∈ R m×n , where m is the number Fig. 3. Test-box designed for data acquisition process: 1) computer fan, 2) light bulb, and 3) electronic circuit. of samples and n amounts the number of variables (that is to say, the number of attributes representing each sample). Meanwhile, L ∈ R m×1 is a vector holding the sample labeling. In this case, m = 1500 and n = 3.

C. External Data Sets
External sensors data sets were considered to prove this hybrid model proposed. Therefore, the first data set (EDS1) comprises eight MOX gas sensors and a temperature and humidity sensor to collect data from home activity with two stimuli: 1) wine and 2) banana. Therefore, when the system detects background activity, this is considered a threat situation since users attempt to tamper with sensors by using them without stimuli and get erroneous readings. Then, the second data set (EDS2) detects fire detection in Algeria. It uses ten different sensors to classify fire and nonfire. However, there are erroneous samples when people burn paper or there are strong winds that affect the sensor calibration. As a result, this issue is considered an anomaly behavior. Next, the third data set (EDS3) uses five different QCM gas sensors to recognize five different gas measurements (i.e., 1-octanol, 1-propanol, 2-butanol, 2-propanol, and 1-isobutanol). Thus, isobutanol is a flammable liquid combined gas, reflecting machinery malfunctions. Finally, to compare with a previous hybrid anomaly detection model, the fourth data set (EDS4) is a real data set gained from physical sensors (indoor and outdoor). Two scenarios for each type of sensor (indoor and outdoor) are applied: single-hop and multihop cases. The single-hop case consists of a sensor and head station, plus a router in the multihop case. Also, this data set is used in [18]. Finally, Table II describes the data sets.

D. Noise Reduction
A phenomenon can be described by static data, time series, and images. Therefore, sensors measure the magnitude of a quantity and convert it into an electric signal at a specific sample rate (original samples) [4]. Then, the signal is discretized/digitalized to be understandable to the microcontroller. However, errors like voltage fluctuations, nonlinearity response, and vibrations insert noise into the electric signal, confusing the ML algorithm in the feature extraction stage [43]. For this reason, signal smoothing is a filter that reduces these noise components when the phenomenon does not have high sampling frequencies getting a cleaner signal [44]. Consequently, Moving Average, Mean, Gaussian filter, and Savi-Golay are the most relevant smoothing algorithms [45]. In this hybrid model, the output of this step is called improved samples.

E. Outlier Detection
This stage is responsible for detecting data with a different distribution than the rest. This process is carried out through an unsupervised analysis [4]. As a result, a refined data subset called cleaned samples is obtained. Outlier detection methods used for this stage related to the literature review are as follows.
1) The standard deviation method (STD) quantifies the variation or spread of a numerical data set in which two standard deviations represent 95% of the data. Therefore, any data point outside of this boundary is removed [21].
2) The LOCAL measures the local deviation of the density of a given sample concerning its neighbors. It is local in that the anomaly score depends on how isolated the object is from the surrounding neighborhood [23]. 3) ISO is based on modeling the expected data in such a way as to isolate anomalies that are both few and different in the feature space [22].

4) Elliptic envelope (ELLI) detects outliers in a Gaussian
distributed data set. It provides the Contamination argument that defines the expected ratio of outliers observed in practice [23]. 5) OSVM captures the density of the majority class and classifies examples on the extremes of the density function as an outlier [24].

F. Tamper Detection
In some scenarios, the information acquired by the IoT device can be compromised by malicious users trying to steal or modify data [27]. Therefore, the threat model works to identify and detects threats regarding defined assumptions and scenarios to avoid security flaws [46]. Consequently, since the IoT device aims to send less data to the Edge/Cloud, and its memory stores one application (i.e., the IoT device does not allow multitenancy), we consider sensors could be tampered with. With this assumption, on the one hand, malicious users can modify the incoming data when the ML model is learning, and its rules could be inaccurate. On the other hand, the IoT device may have the proper model, whereas users try to insert erroneous inputs creating manipulated environments (e.g., activate the fire alarm by putting a lighter near the smoke sensor) [47].
Due to the above-mentioned statements, the tamperdetection stage requires knowing all the manipulation possibilities to mitigate them, which needs to deploy supervised learning [48]. Therefore, the IoT device was also trained when a malicious user could modify the standard conditions of sensors by blowing on them when collecting data (label 2). Besides, a malicious user could attempt to trigger the fire alarm by smoking at the sensor (label 3). As a result, when the IoT device detects a potential threat, it will decide to eliminate the sensor reading (label 2) or send an alert to the Edge/Cloud (label 3). This procedure is the last step of the data flow called processed samples. Finally, the algorithms used to reach this goal are presented as follows.
1) k-nearest neighborhood associates a new instance with the training base and assigns it to the closest group according to its distance [4]. 2) Naive Bayes describes the probability of an event happening based on previous knowledge of the event [25]. 3) Support vector machine (SVM) has kernel functions that put the data in hyperplanes to improve the decision boundaries [21]. 4) Decision tree is a heuristic model based on rules [18]. 5) Neural network is a recursive training model [2].

IV. RESULTS
This section evaluates the choice of algorithms listed for each step in Section III and determines a target microcontroller family to deploy the final version of the proposed hybrid model.

A. Experimental Setup
ML models are trained on an external computer [core i7 with 32 GB of random access memory (RAM)] with the previously described data sets. Therefore, regression analysis is used to observe how smoothing algorithms soft the signal in the noise reduction step, which improves samples (according to the data flow presented in Fig. 2). Then, the ANOVA statistical metric determines the improved variability of the samples and how unsupervised models reduce the scatter, obtaining cleaned data. Next, the classification performance determines which classification method is adequate for detecting tampering in the cleaned data to send to the cloud just processed data. Finally, selected algorithms are exported as a .h file (i.e., header file referenced by a document written in C) to determine the memory consumption. It is important to mention that the original samples were divided ten times randomly to avoid bias in selected algorithms.

B. Improved Samples
The data-gathering process is affected by noise injection from the abovementioned sources. Therefore, time series analysis provides many techniques to understand a data set better. Perhaps, the most useful of them are Level (i.e., the base value if it were in a straight line), Trend (i.e., the linear increasing or decreasing behavior), Seasonality (i.e., the repeating patterns), and Noise (i.e., the variability of the observations). In this way, Fig. 4 shows these components from the CO 2 variable as an example. Therefore, smoothing filters are required, where noise samples cancel each other out.  The above-mentioned smoothing filters are deployed to reduce the data set noise and get improved samples. Hence, regression analysis such as the root-mean-square error (RMSE) determines how close the observed data points are to the filter-smoothed values. In addition, the R2 metric presents the fitness of the filter, and the signal-to-noise ratio (SNR) shows improvement in the quality of the relevant signal. Therefore, the metrics mentioned above were used to select a suitable smoothing filter. As a result, even though the MF has a better RMSE metric for each signal (i.e., minor value), its output is very similar to the signal input and does not reduce noise adequately. Conversely, for the CO 2 , temperature, and humidity, the Moving average filter has both better SNR and R 2 metric. Therefore, this filter is selected for scoring higher in the evaluated metrics. Fig. 5 shows graphically how the moving average filter refines the CO 2 , and Table III summarizes the values obtained from each metric where k stands for windows size.
Once the average filter proves to be the suitable solution to smooth signals, it is applied to the external data sets. Fig. 6 shows the result of one smoothed variable of each data set.

C. Clean Samples
The data set maintains outliers with different distributions. Therefore, statistical methods can describe the variability of Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply. the entire data set, and ANOVA analysis can be used for this purpose. This analysis splits the data set into systematic factors (i.e., data with normal distribution) and random factors (i.e., outliers). Then, unsupervised ML methods can prune these random factors and increase the decision edges distance between labels. Indeed, these outliers detection methods can reduce the data complexity, and the ML model can reach high throughput.
The mentioned-above outlier detection approaches are applied to the data set to get different improved/pruned subsets (all methods prune data around 15%) and test them with ANOVA analysis. Consequently, the standard deviation and LOCAL algorithms keep the data set scattered, which can cause confusing decision edges between labels. On the other hand, the ISO algorithm concentrates the data due to the central point (mean). Furthermore, ELLI and OSVM have similar results in maintaining data characteristics while reducing scatter. For this reason, we selected the standard deviation, Isolation Forest, and OSVM criteria, representing improvements in reducing data scatter to determine in the next step which of them better eliminates outliers improving the classification algorithm performance. Finally, Fig. 7 shows the variability of each method obtained by ANOVA analysis. Given that OSVM is the suitable solution to remove outliers, this method was applied to the external data sets.

D. Processed Samples
This stage aims to detect whether the sensor has been tampered with. Therefore, it is necessary to detect labels 2 (people blowing on the sensor) and 3 (people smoke near the sensor to active fire alarm). Hence, classification algorithms were trained with clean subsets (obtained in the previous section) and the original samples to determine improvements in the classification decision boundaries. Table IV shows the classification performance of each applied algorithm. As a result, the original samples demonstrate outliers because ML models have issues classifying incoming data correctly. In addition, despite the subset In contrast, ML models have high classification performance when they are trained with data sets obtained by ISO and OSVM algorithms. Hence, OSVM is adequate to be implemented in the IoT device due to its capacity to reduce outliers significantly and lightweight memory footprint. The next step was to test the classification algorithms with the improved samples (i.e., Average filter) and cleaned samples (i.e., One-SVM) of the external data sets. Therefore, Table V shows the classification results of each data set. It demonstrates that the SVM algorithm and Naive Bayes struggled to classify between labels. In addition, Decision Trees and Neural networks maintained similar results. Furthermore, using this hybrid approach by smoothing samples and detecting outliers, the chance to find a tamper situation is very close to 100% on each data set. Moreover, for other classification purposes, this model increases by at least ten points of accuracy whether the classifiers were trained with the original samples.
Once the ML models are trained, the decision tree algorithm and neural network show higher classification results. However, in real test scenarios, the decision tree algorithm has strict rules which reduce the ability to recognize light changes in incoming data. Therefore, the neural network model is an adequate solution to detect whether the sensor has been tampered with. Nevertheless, neural networks can achieve close to 100% accuracy in all the evaluated scenarios. However, its complexity (i.e., the number of layers and neurons) is considerably reduced. For example, the optimal neural network should have three hidden layers of 32, 16, and 8 neurons with the original database, respectively. However, with the database obtained by OSVM, only two hidden layers of 24 and 8 neurons are necessary, significantly improving memory consumption with similar accuracy (Table VI). Conversely, when the neural network is trained with pruned samples from the external data set EDS1, which is considered by its complexity and size, it improves the accuracy (from 95% to 98%). Besides, the model complexity reduces by several neurons in hidden layers (Table VII). The model architecture uses a sequential model and dense layers, which means each layer is deeply connected with its preceding layer. In addition, rectifier linear unit (ReLU) is used for sparse activation of neurons, efficient computation, and better gradient propagation. Finally, a dense layer with softmax as an activation function is added. Applying softmax makes the output vector in the interval I = [0, 1] such that the components will add up to 1. Therefore, they can be interpreted as probabilities.

E. Memory Consumption
One essential part of a microcontroller is its memory, which stores information temporarily or permanently. Hence, memory blocks are described as arrays and divided into cells that can store data and be accessed using a unique identifier, representing its address or position relative to the memory array. Consequently, memory blocks can be either volatile or nonvolatile, and the control processing unit "CPU" accesses them concerning the computer architecture; this can be either Von Neumann or Harvard architecture [49]. In this scenario, Tables VIII and IX summarize the memory consumption of each microcontroller. As relevant results, AVR microcontrollers do not have enough memory to compile this application because the Flash needed is 35 kB, and SRAM is fully utilized (2 kB). Besides, the outlier detection model uses half of the RAM. ARM microcontrollers, specifically with microcontrollers M0, M4, and recently M7, are designed to allow ML models with higher computational resources than the AVR family. Arduino Nano IoT uses its Flash around 11 kB in the bootloader, 12 kB in the main program, 2 kB to allocate the smoothing filter, 20 kB for the outlier detection model, and 5 kB for the tamper detection. Furthermore, Flash and SRAM utilization were 56 and 12 kB, respectively. Therefore, Nano IoT has at least 200-kB free memory, and the 20-kB RAM is available. On the other hand, since Tensilica microcontrollers need external libraries to be programmed in the Arduino environment, the bootloader needs 260 kB, which is a high memory requirement. Hence, the Flash memory was used at 310 kB for the hybrid model; it is around 13% of the total Flash memory. Nevertheless, these boards struggle with some WiFi libraries that are inefficiently compiled.

F. Bandwidth Consumption
To determine the impact of performing the anomaly detection on the IoT device instead of the cloud on bandwidth consumption, we calculate the expected bandwidth consumption for each case. First, sending streaming data to the cloud, the payload uses 20 bytes, and the data packet (i.e., control header) is 30 bytes sending data to the cloud each 30 ms by the MQTT protocol, which means the bandwidth is around 1 kb/s. Second, we define sending data to the cloud when the system detects a tamper/threat situation since the outlier detection can be processed locally by its outputs (fan) takes 15 ms. Hence, the same data packet and protocol use just 300 bps (worst scenario) since the frequency of sending messages decreases significantly. Besides, we determine that outliers appear when the IoT device wakes up from sleep modes, even when the main program waits until the sensor responds that is calibrated. Therefore, this anomalous behavior is corrected locally. In addition, data is not compromised because packages sent to the cloud can be just alert messages reducing, even more, the bandwidth needed.

V. CONCLUSION AND FUTURE WORKS
In this article, the proposed three-step anomaly detection hybrid model, which integrates data preprocessing (reducing errors), and outlier detection tasks, achieves excellent performance in the data tamper detection scenarios for data collection from IoT devices. Besides, the hybrid model runs on IoT devices to process data locally with a minimal memory footprint. In the first step, to achieve noise reduction, smoothing algorithms demonstrate a suitable solution for canceling voltage fluctuations produced by sensors. In the second step, we demonstrate that it is necessary to implement an outlier detection step to increase the robustness of detecting abnormal distribution in data. Therefore, OSVM prunes outliers and reduces the scatter better. Finally, in the third step, the tamper detection step needs previous knowledge to train models and detect anomalous sensor behavior. Thus, neural networks can recognize those changes in sensor functionalities and detect whether if the work environment is compromised. The anomaly detection hybrid model also proves the new functionalities that emerging microcontrollers may have processing ML models close to the user. This reduces latency with fewer data movements between devices with adequate time response. In addition, processing data locally reduce the opportunity for malicious users to steal data. Furthermore, neural networks can recognize 98% of tamper situations.
This research provides a new perspective on anomaly detection. The proposed hybrid model was evaluated in time-series data from different sensor data sets. In future work, the proposed method will be further expanded and optimized for different sensor data with extensive power consumption analysis.