RDCM: An Efficient Real-Time Data Collection Model for IoT/WSN Edge With Multivariate Sensors

In Internet of Things (IoT) applications, a sensor board depends on a battery with a limited lifetime. An IoT sensor board with multivariate sensors shortens that lifetime further, because the additional data transmissions it must support drain the battery much faster than on a board with a single sensor. The main aim of this paper is to increase the battery life of the IoT sensor node. To do so, this paper proposes an efficient real-time data collection model for multivariate sensors in IoT/WSN applications, named RDCM. The general structure of RDCM is composed of two main levels: the IoT sensor board level and the fusion center level. The IoT sensor board level is implemented in real time by all the IoT sensor boards simultaneously in each cycle, and the fusion center level is executed by the fusion center. The IoT sensor board level includes the following stages: checking the physical conditions of the IoT edge device (board), the update data strategy, data validation, and sensed data reduction. When RDCM is applied to real-time data sets injected with various percentages of errors, the average of the total percentage of energy saved over all nodes is 98%. In summary, RDCM performs very well in terms of energy consumption compared with other algorithms. This paper concludes with the limitations of the current study and some further research opportunities.


I. INTRODUCTION

A. OVERVIEW
A wireless sensor network (WSN) consists of spatially distributed autonomous devices that use sensors to monitor physical or environmental conditions. It integrates a gateway that provides wireless connectivity to the Internet. The Internet of Things (IoT) is a communication paradigm that envisions total connectivity with objects of everyday life and is an integral part of the Internet infrastructure [1]. Hence, the IoT concept makes the Internet even more immersive and pervasive, enabling easy access to and interaction with a variety of devices [2]. Various practical communication models are used in IoT implementations, and each model has its own characteristics. The IoT architecture board describes three models: machine-to-machine, machine-to-cloud, and machine-to-gateway-to-cloud, as shown in Figure 1(a), Figure 1(b) and Figure 1(c), respectively. These models highlight the flexibility in the ways that IoT devices can be connected and provide added value to the user [4]. It must be noted that in all of these models the source machine (the IoT edge device) is the backbone of the system and is used to collect the data.
The associate editor coordinating the review of this manuscript and approving it for publication was Eyuphan Bulut.
The world evolves, and so do our lifestyles, in which we depend increasingly on numerous modern electronic devices. In recent years, WSNs have played a vital role in IoT applications. Numerous applications based on WSN and IoT technologies have been deployed in various fields, including healthcare, smart homes and buildings, air pollution monitoring, military, industry, precision farming, modern horticulture and many more. In wearable medical monitoring applications [3], [4], sensors can provide accurate and reliable information about people's activities and behaviors and provide assistance in the human living environment [5], [6]. Furthermore, healthcare is not only for humans; it also includes animal care, which observes biological parameters such as rumination, body temperature, heart rate, ambient temperature and humidity [7]. Smart home and building applications, such as home environment monitoring [8], real-time wireless monitoring of indoor air quality [9] and energy management, all contribute to the widespread usage of WSN/IoT integration. WSNs have several constraints, such as limited energy availability, small memory size, and low processing speed, which are the principal obstacles to designing efficient management algorithms for WSNs [10], even more so where WSN/IoT integration is concerned.

B. MOTIVATION
Various data collection algorithms have played a very important role in improving the efficiency of real-time IoT/WSN applications. Nevertheless, data collection algorithms for real-time applications still face obvious challenges and development issues. Numerous researchers have addressed the issue of reducing the number of packets transmitted by an IoT/WSN sensor board with a univariate sensor. However, reducing the number of packets transmitted by an IoT/WSN sensor board with multivariate sensors remains one of the most important research issues. Similarly, data reduction methods based on coding schemes in WSN/IoT still have many constraints, such as delay, the handling of floating-point variables, and the need for historical data. Despite all this, the problem of multivariate sensors is still a critical issue open for further research. In WSN/IoT, sensor data consist of one attribute (univariate) or multiple attributes (multivariate). When the sensor board is used to collect only one type of data (light, temperature or humidity), the data are called univariate. In other IoT/WSN applications, each sensor board is equipped with multivariate sensors to support different application needs. Furthermore, the current multivariate data models used in IoT/WSN applications to reduce or validate the sensed data during the data collection process face some challenges. For example, these models depend on training, which means that their accuracy declines over time as the approximation error grows. This growth in the approximation error of the multivariate data model during real-time data collection is one of the significant challenges. The standard solution is to apply an adaptive model that is able to update its reference parameters during data collection.
However, increasing the update frequency of the model reference parameters affects the energy efficiency of the sensor board because of the size of the data transmitted after each update. The model reference parameters are typically larger than or equal to the payload data without reduction. Determining the threshold value is also a difficult task in multivariate data models, which is why an objective solution is needed.
This paper studies the problem of how to design and improve an efficient real-time data collection model for an IoT/WSN sensor board with multivariate sensors, as a means to address the issues described above. The research problem can be stated as follows: (i) How will the model reduce the number of packets transmitted by an IoT/WSN sensor board with multivariate sensors? (ii) What is the most beneficial method to determine the threshold value for multivariate data reduction models? (iii) How will the proposed model avoid transmitting incorrect sensed data during data collection for an IoT/WSN sensor board with multivariate sensors? (iv) How will the model reduce the number of payload bits for an IoT/WSN sensor board with multivariate sensors?

II. RELATED WORKS
It should be noted that some of the recent works aim to improve the network performance by means of focusing on the sensed data processing, dissemination and scheduling.
For example, in [10], an adaptive data processing and dissemination scheme for drone swarms in urban sensing, named ADDSEN, was proposed. The authors focused on improving the in-network quality by observing low-quality or faulty sensed data and separating it from the set of sensed data and redundant data. Similarly, many researchers focus on mobility management and flow scheduling in IoT; the work in [11] achieved scalable mobility management and robust flow scheduling in IoT multi-networks. Compared with other software-defined networking (SDN) systems, the authors report that throughput increased by 67.21%, delay was reduced by 72.99%, and jitter improved by 69.59%. These solutions clearly benefit mobile nodes. Our work, like many previous works [13]-[48], addresses the fixed-node scenario. To be clear, this work focuses on payload data only. In addition, there is no connection between the nodes, under the assumption that each IoT sensor node is able to immediately update its sensed data to the fusion center.
We are probably the first to address these various issues in a single data collection model for multiple sensors in IoT applications. This section therefore divides the prior works into two main parts. The first part, (A), evaluates measures related to the reduction in the number of transmitted packets and includes two subsections: (i) reducing the number of transmitted normal data and (ii) reducing the number of transmitted incorrect data. In this study, normal data is correctly sensed data. The aim of the update data strategy during the sensing phase is to save the energy of the IoT sensor board with multiple sensors by reducing the number of transmitted packets when no significant change is reported by the payload sensing block. Incorrect data, in turn, are values measured by the sensor board that are wrong; not transmitting them avoids wasting energy on incorrect data. The second part, (B), presents evaluation measures related to payload data size reduction approaches.

A. EVALUATION MEASURE RELATED TO REDUCTION IN THE NUMBER OF TRANSMITTED PACKETS
In order to decrease energy consumption, various methods have been proposed to reduce the amount of transmitted data. In addition, avoiding the transmission of incorrect sensed data during data collection contributes significantly to saving the energy of the IoT edge device.

1) REDUCE THE NUMBER OF TRANSMITTED NORMAL DATA
Packet transmission can be drastically reduced if a data prediction technique such as time series prediction (TSP) is used. TSP is a significant applied technique in commerce, inventory, weather prediction, manufacturing control and signal processing. A time series is a sequence of data that is ordered by time and characterized by chronological importance. Thus, the indices of the variables and the correlation between them can be used to develop mathematical models. The main purpose of time series modeling is therefore to collect and study historical values in order to find appropriate models that represent the general structure of the given data [13]. Prediction based on a time series model of the sensed data is a conventional technique for reducing transmissions by sensor nodes, and there are several ways to use it, including moving average (MA), exponential smoothing (ES), autoregressive (AR), autoregressive with exogenous inputs (ARX), and autoregressive integrated moving average (ARIMA) models. Nonetheless, these methods only support a single type of sensor data. For instance, in [14], the authors presented a prediction-based data reduction method combined with an adaptive sampling rate. In addition, the recent work by Tan and Wu [15] introduces a method to reduce the number of packets transmitted by a sensor node by applying a hierarchical Least-Mean-Square (HLMS) adaptive filter. In [16], the authors presented a fast and efficient dual-forecasting method to reduce the number of messages sent from the sensor board. Careful evaluation of the findings presented in [15] and [16] shows that only univariate data with a fixed error threshold were investigated. Recently, the work in [13] proposed a new forecasting-based method to reduce the number of transmitted packets. An advantage of that work is that the model was evaluated using vibration sensor datasets.
However, the method only addresses univariate data. The study in [17] presents a lightweight prediction model with a tolerance error of 0.2, and in [19], Artificial Neural Networks (ANN) were employed to predict the sensed data, using a Multi-Layer Perceptron (MLP) to decide on the required data samples. A collective forecast exploiting temporal-spatial correlation, named CoPeST and based on the Least Mean Square (LMS) algorithm, is reported in [19]; it reduces the energy spent on expensive transmissions while keeping the data within the user's error threshold. The study in [20] is a preliminary work on optimizing sensor node energy by employing an efficient data collection and dissemination (EDCD) updating strategy. EDCD is a strategy for updating sensed data to the fusion center that is employed to reduce the number of transmitted packets. In turn, [21] proposed an adaptive method for data reduction (AM-DR). The AM-DR method is based on a convex combination of two decoupled Least-Mean-Square (LMS) windowed filters with different sizes and reduces the number of transmitted packets by predicting the current sensed data at the base station. The trend in the majority of forecasting methods is to send the original sensed data to the sink when the prediction error exceeds the threshold value. The authors in [22] proposed an adaptive data acquisition mechanism that allows each sensor node to adjust its sampling rate according to environmental changes while optimizing its energy consumption. In another attempt, the author of [23] used simple linear regression to save the power consumed by sensor nodes by reducing data transmission. The study assumed that only one attribute is related to the prediction, and only one attribute is used to predict the dependent variable.
Time characteristics are not the most relevant variables compared to other features such as light, temperature and humidity, which makes the predictions used by that solution inaccurate [24]. As a key assessment, most of the methods currently used to reduce the number of transmitted packets in WSN/IoT cover only univariate data, except for the work involving multivariate data in [20]. Table 1 illustrates the characteristics of the prior works that address the problem of reducing the number of transmitted packets. The authors applied these algorithms separately to each type of sensor listed in Table 1.

2) REDUCE THE NUMBER OF TRANSMITTED INCORRECT DATA
In WSN/IoT, sensed data error detection approaches can be divided into two types: centralized error discovery methods and distributed error discovery methods. Most existing error detection approaches use periodic batch testing at a central location, possibly a cluster head (CH) or a fusion center [25]. A useful background overview of the current outlier detection methods for WSN can be found in [25]-[28]. In addition, recent work [25] introduced a novel mathematical model for assessing the impact of different data verification systems on energy dissipation in the edge device. The One Class Quarter Sphere Support Vector Machine (OCSVM) was used in [29] to create an anomaly discovery algorithm. The authors in [30] proposed a method for observing outliers using kernel principal component analysis (KPCA) based on the Mahalanobis kernel. The idea behind it is to isolate the anomaly from the normal data distribution pattern. However, this work was executed at the CH level and only supports a single sensor (single variable). The work in [31] reported a qualified study of strategies for detecting abnormal sensed data in smart city applications based on WSN. In [32], the study presented an adaptive One Class Principal Component Classifier model to detect outliers in real time. That work did not solve the problem of how to detect outliers in the training samples. Therefore, in [33], the authors proposed a statistical approach for cleaning the training sensed data used by a PCA-based chiller sensor fault discovery, diagnosis, and data reconstruction technique. The study discussed the discovery and elimination of outliers from the original training samples. In [34], the authors proposed a data validation algorithm for detecting different types of faults. It was evaluated on data samples from a WSN prototype for environment monitoring, injected with different types of faults. The Modified Z-score method [35] was used to detect outliers.
Similarly, [36] proposed a new real-time algorithm for verifying sensor data at the node level, named Validity of the measuring sensor reading at node level (VSNL). VSNL is a sensor data verification algorithm based on an adaptive threshold. It detects various types of errors in the sensed data and proposes a simple mechanism to classify errors and events. A sensor anomaly detection system for distinguishing between real and false alarms was provided in [37] for healthcare applications. Table 2 illustrates the characteristics of the previous works that address the problem of abnormal data.

B. EVALUATION MEASURES RELATED TO PAYLOAD DATA SIZE REDUCTION APPROACHES
In the previous section, some of the work related to reducing the number of packets transmitted in WSN/IoT was explored. This section discusses the latest work on reducing the payload data size when sensed data is transmitted from the IoT edge device to the fusion center (FC). In [38], the authors propose a coding scheme to reduce the size of the payload data sent by the cluster head node. Similarly, the work of [39] aims to improve the accuracy of the data received by the fusion center. The proposed coding scheme is based on relative differences and precision factors rather than the absolute variation method used in [38]. These schemes are beneficial for cluster head nodes with univariate data. Principal Component Analysis (PCA) is one of the most widely used methods for multivariate data reduction. Various types of PCA-based data reduction models are reported in [40]-[44]. Because of the limited resources of the sensor board, the original version of PCA is not suitable for the WSN/IoT edge level. Therefore, a lightweight version of PCA called Candid Covariance-free Incremental PCA (CCIPCA) was proposed in [45]. The work in [46] used CCIPCA for multivariate data reduction in WSN with a fixed threshold and two principal components (PCs). In addition, the recent work [47] proposed two methods for multivariate data reduction with an adaptive threshold, known as PCA-B and MLR-B. PCA-B is a multivariate data reduction method that uses CCIPCA with an adaptive threshold and sets the number of PCs to one in order to achieve a high reduction level. MLR-B is a multivariate data reduction method that uses a Multiple Linear Regression (MLR) model with an adaptive threshold. According to [47], the size of the data transmitted after updating the model reference parameters is larger than or equal to the payload data size without reduction. This means that the sensor board requires more energy in the updating stage than in the reduction stage. The study recommended that the frequency of updating the model reference parameters during data collection be used as a new metric to evaluate the performance of multivariate data reduction models. More detail regarding data reduction methods is given in the recent work [48]. Additionally, that work proposed a new simple mechanism called the Adaptive Real-time Payload Data Reduction Scheme (APRS) for energy efficiency in IoT/WSN sensor boards with multivariate sensors. APRS aims to reduce the transmitted packet size for each sensed payload. Table 3 illustrates the characteristics of the previous works that address the problem of reducing the size of payload data. In addition, Table 4 and Table 5 summarize the comparison of the related works for each issue and their limitations, respectively.

III. MATHEMATICAL MODEL OF ENERGY CONSUMPTION FOR REAL-TIME DATA COLLECTION SCHEMES IN IoT/WSN EDGE DEVICE LEVEL
In this section, a mathematical model of energy consumption is introduced to evaluate real-time data collection schemes at the IoT/WSN edge device level. The model addresses several problems related to the energy consumption of IoT sensor nodes: reducing the number of transmitted packets on an IoT board with multiple sensors, avoiding the transmission of incorrect data, and reducing the number of payload bits before transmission to the FC. The proposed model can be used for numerical analysis of energy consumption for each of these issues.

A. CONSTRAINTS
Let us consider that the battery lifetime L of an IoT sensor board is defined as in Eq. (1).
Thus, in this work the data collection problem is formulated as in Eq. (2), where b_max is the maximum number of bits that can be transmitted and received during a time period, E_bit is the energy cost of transmitting or receiving one bit, R is the measured data, N is the number of samples and C is the number of constraints. The remaining terms are three functions: the first refers to reducing the number of updates during data collection, f(d) refers to reducing the number of transmitted bits when it is necessary to update the sensed data of the IoT sensor board to the FC/cloud, and f(v) refers to reducing the cost of data validation as well as avoiding sending incorrect data. It should be noted that data processing is another reason for the loss of sensor node energy. In this paper, however, the data transmitted during the data collection phase is treated as the fundamental component of energy consumption, because the energy consumed in sending one bit via the sensor board is higher than that of running many microcontroller instructions [49]. Indeed, in wireless devices the transceiver accounts for 80% of the overall energy consumption of the node [50]. This study highlights that incorrect data is one of the reasons for wasted battery energy, because transmitting erroneous data requires the same amount of energy as transmitting normal data. In addition, at the FC this data will be removed from the dataset after a data validation algorithm is applied, which means that not transferring the incorrect data avoids wasting energy. Therefore, applying a simple solution at the IoT sensor node level to avoid transmitting incorrect data will help save the energy of the IoT sensor node.

B. NUMERICAL EXAMPLE
Consider a battery-powered IoT sensor node in a specific application. The maximum number of samples that can be sent to the FC is N = 1000 when the sensor node is in active mode (RF on), with energy consumption E_byte = 52.92 µJ/byte; for simplicity, we assume that the consumption in sleep mode (RF off) is E_byte = 0 µJ. The number of sensors in the node is n = 3 and each sensor reading needs 4 bytes. The number of incorrect samples is Er = 100. The energy consumption for each scenario is as follows:
From the above numerical analysis, we can see that avoiding the transmission of incorrect data minimizes the total energy consumption E_total.
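The arithmetic behind this example can be sketched in a few lines. This is a minimal illustration, assuming two scenarios (all N samples are transmitted, or only the N - Er correct samples are transmitted) and assuming that each sample carries n × 4 payload bytes with no header overhead. The function and variable names are illustrative and are not taken from the paper.

```python
# Values quoted in the numerical example.
E_BYTE_UJ = 52.92        # active-mode energy per byte (µJ/byte)
N_SAMPLES = 1000         # N: samples that can be sent to the FC
N_SENSORS = 3            # n: sensors on the node
BYTES_PER_READING = 4    # bytes needed by each sensor reading
N_INCORRECT = 100        # Er: incorrect samples


def transmission_energy_uj(num_samples: int) -> float:
    """Energy (µJ) to send num_samples payloads; headers are ignored (assumption)."""
    payload_bytes = N_SENSORS * BYTES_PER_READING
    return num_samples * payload_bytes * E_BYTE_UJ


# Assumed scenario 1: every sample is transmitted, including incorrect ones.
e_all = transmission_energy_uj(N_SAMPLES)
# Assumed scenario 2: incorrect samples are dropped at the node.
e_valid_only = transmission_energy_uj(N_SAMPLES - N_INCORRECT)
saved_pct = 100.0 * (e_all - e_valid_only) / e_all

print(f"all samples:  {e_all / 1e6:.5f} J")
print(f"valid only:   {e_valid_only / 1e6:.5f} J")
print(f"energy saved: {saved_pct:.1f} %")
```

Under these assumptions the saving is proportional to the share of incorrect samples, since every sample has the same payload size.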
According to [25], the energy consumption of the IoT sensor node is affected by the mechanism used to detect and remove incorrect data during data collection in IoT real-time applications. The cost of error detection and transmission during the validation phase, E_DV-Phase, depends on the approach applied and is defined as in Eq. (3).
where ε_Tr is the energy consumption for transmission of normal data, E_NN is the energy dissipation to receive data from the nearest neighbor nodes, H_Tbits is the size of the transmitted data, P_time is the current time and E_SD is the energy dissipation to read/write data from SD-card memory during data collection. For example, if the algorithm used to observe incorrect data during data collection does not need to build historical data or receive data from the nearest neighbor nodes, E_SD and E_NN are equal to zero. Our proposed model is able to observe incorrect data without building historical data or receiving data from the nearest neighbor nodes (see Algorithm 4). According to Eq. (3), we can infer the following: (i) An increase in the value of E_SD negatively affects the energy dissipation for detecting incorrect data. This is because the error observation approach is unable to check the data directly as it is sensed in real time; instead, it needs to collect a sufficient number of samples N and save them in memory, thus creating historical data. (ii) Similarly, an increase in the value of E_NN negatively affects the energy dissipation for detecting incorrect data. This is because the error observation approach is unable to check the condition of the sensed data directly in real time; instead, it needs to receive the neighbors' sensed data to verify its validity. This mechanism is thus totally dependent on the spatial-temporal correlation among neighboring edge devices. Regardless of its accuracy, its disadvantage is the energy consumption of error observation. (iii) An online/real-time approach is the best way to observe incorrect data with the lowest energy consumption. The key point is that the error observation method can check the sensor data in real time without delay and without needing to construct historical data or fetch neighbor data for data validity verification.
The energy dissipation to receive data from the nearest neighbor nodes, E_NN, is defined in Eq. (4). From the equation, it is clear that increasing the number of nearest neighbor nodes increases the cost of detecting data errors at the edge device. This is because the cost of detecting the status of the sensed data (normal/abnormal) is then higher than the cost of transmitting the sensed data itself.
where H_Rbits is the size of the received data, E_r is the energy dissipation to receive one bit and N_nib is the number of nearest neighbor nodes. Eq. (5) defines the energy dissipation for reading/writing samples from memory during data collection, E_SD, which is used for checking the validity of the measured data. The number of samples and their sizes affect the cost of error observation.
where R_bits and W_bits are the numbers of bits read from and written to the memory, respectively, R_Ecost and W_Ecost represent the energy cost of reading and writing bits from and to the memory, respectively, and H is the number of samples that have been collected before checking the validity of the current data.
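To make the dependence on N_nib and H concrete, the following sketch evaluates both terms for three kinds of validation approach. The exact forms of Eq. (4) and Eq. (5) are not reproduced here, so the code assumes simple product forms built from the symbols defined above: E_NN = N_nib × H_Rbits × E_r and E_SD = H × (R_bits × R_Ecost + W_bits × W_Ecost). All numeric costs are hypothetical and chosen only for illustration.

```python
def neighbor_receive_energy(h_rbits: float, e_r: float, n_nib: int) -> float:
    """Assumed form of E_NN: each of n_nib neighbors sends h_rbits bits,
    and every received bit costs e_r."""
    return n_nib * h_rbits * e_r


def memory_energy(h: int, r_bits: float, r_ecost: float,
                  w_bits: float, w_ecost: float) -> float:
    """Assumed form of E_SD: each of the h buffered samples is written once
    (w_bits) and read once (r_bits) at the given per-bit costs."""
    return h * (r_bits * r_ecost + w_bits * w_ecost)


SAMPLE_BITS = 3 * 4 * 8   # three sensors, 4 bytes each
E_R = 0.5                 # hypothetical receive cost per bit (µJ)
R_ECOST = 0.01            # hypothetical memory read cost per bit (µJ)
W_ECOST = 0.02            # hypothetical memory write cost per bit (µJ)

approaches = {
    "neighbor-based (N_nib = 4)": (4, 0),
    "history-based (H = 50)": (0, 50),
    "online (no neighbors, no history)": (0, 0),
}
for name, (n_nib, h) in approaches.items():
    e_nn = neighbor_receive_energy(SAMPLE_BITS, E_R, n_nib)
    e_sd = memory_energy(h, SAMPLE_BITS, R_ECOST, SAMPLE_BITS, W_ECOST)
    print(f"{name}: E_NN = {e_nn:.2f} µJ, E_SD = {e_sd:.2f} µJ")
```

Whatever the true per-bit costs are, both terms vanish for an online approach, which is the case the proposed model targets.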
The energy consumption during the reduction phase, E_RD-Phase, depends on the method used. Eq. (6) is used to calculate E_RD-Phase as in [48]. E_RD-Phase is divided into three parts: (i) Reduction Mode (RM), (ii) Non-Reduction Mode (N-RM) and (iii) Retraining Mode (RTM). N-RM is the common mechanism of sending payload data from the sensor node to the FC without reducing its size. The energy consumption per sample in N-RM is defined as in Eq. (7).
RM is a mechanism for sending payload data from the sensor node to the FC after reducing its size by applying a suitable reduction algorithm. The energy consumption per sample in RM is defined as in Eq. (8).
The efficiency of data reduction models that depend on training declines over time because of the increase in the approximation error. The retraining process aims to update the reference parameters so that they represent the new dynamic changes in the sensed data [46]. Therefore, the sensor node needs to transmit a copy of the reference parameters to the FC. The energy consumption per sample in RTM is defined as in Eq. (9).
where OR_Length, RD_Length and RF_Length are the lengths of the original payload, the reduced data and the model reference parameters per sample, respectively.
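The three modes can be compared with a short sketch. It assumes that the per-sample energy in each mode is simply the transmitted length multiplied by a per-bit cost, which may omit terms present in Eq. (7) to Eq. (9). The per-bit cost is derived from the 52.92 µJ/byte figure of the numerical example; RD_LENGTH and RF_LENGTH are hypothetical values, with RF_LENGTH chosen to be at least OR_LENGTH as reported in [47].

```python
E_BIT_UJ = 52.92 / 8      # per-bit cost derived from 52.92 µJ/byte


def energy_per_sample(length_bits: int, e_bit: float = E_BIT_UJ) -> float:
    """Assumed per-sample transmission energy: length in bits times cost per bit."""
    return length_bits * e_bit


OR_LENGTH = 3 * 4 * 8     # original payload: three sensors, 4 bytes each
RD_LENGTH = 32            # hypothetical reduced payload
RF_LENGTH = 128           # hypothetical reference-parameter size (>= OR_LENGTH)

e_nrm = energy_per_sample(OR_LENGTH)   # Non-Reduction Mode
e_rm = energy_per_sample(RD_LENGTH)    # Reduction Mode
e_rtm = energy_per_sample(RF_LENGTH)   # Retraining Mode

for mode, e in (("N-RM", e_nrm), ("RM", e_rm), ("RTM", e_rtm)):
    print(f"{mode}: {e:.2f} µJ per sample")
```

With these assumed sizes, a retraining sample costs more than an unreduced sample, so frequent retraining can cancel the savings obtained in Reduction Mode.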
It should be noted that Eq. (10) highlights numerous issues that affect energy consumption. In summary, avoiding the transmission of incorrect data helps reduce energy consumption, so selecting an appropriate approach for that purpose is very important. This is because the energy consumed by an approach that checks the validity of the sensed data can be higher than the energy consumed if the data is simply forwarded to the FC (see Eq. (3)). Similarly, reducing the size of the payload data on a sensor board with multivariate sensors helps save energy. Nevertheless, as is clear from Eq. (6), designing an efficient model for that aim is a vital issue, as previously discussed. Accordingly, the proposed RDCM model addresses several issues that help save energy: reducing the number of packets transmitted by an IoT sensor board with multiple sensors, avoiding the transmission of erroneous measurements and reducing the number of payload bits. More details about the RDCM model are presented in the following section.

IV. PROPOSED RDCM
In this section, a detailed description of the proposed RDCM is provided. Figure 2 illustrates the block diagram of the general structure of RDCM, which is composed of two main levels: the IoT sensor board level and the fusion center level. The IoT sensor board level is implemented in real time by all IoT sensor boards simultaneously in each cycle, and the fusion center level is executed by the fusion center. The IoT sensor board level includes four stages: (i) analyzing the physical conditions of the IoT edge device, (ii) the update data strategy, (iii) data validation and (iv) sensed data reduction. The IoT edge device phase deals with the physical state of the IoT edge device. The quality of the sensed data is correlated with the state of the edge device, such as the temperature and the battery level of the board. Therefore, it is important to check the physical state of the IoT edge device before sensing starts. If the edge device is not in good condition, an alarm is sent to the fusion center to report that the device needs maintenance or replacement. In this paper, a simple algorithm to check the physical state of the IoT edge device is proposed (see Algorithm 3). If the edge device is in good condition, the board starts sensing. The sensed data is then passed to the update data phase, which decides whether or not to transmit the sensed data depending on the percentage difference between the current sensed data and the last transmitted sensed data. If the sensed data must be transmitted to the fusion center, the model passes it to the data validation phase. The data validation phase enables the algorithm to decide whether the sensed data is correct or an error event. If the sensed data is incorrect, the model defines it as unreliable data, discards it, increments the error counter by one, and then reads a new sample.
Otherwise, when the sensed data is correct, the model forwards it to the payload data reduction phase. The payload data reduction phase deals with reducing the payload size for an edge device with multiple sensors. The key point of this phase is that it uses only one variable D[t] to represent the data measured by the multiple sensors, S_{1×n}[t] (n variables), based on the relative difference between the current measured data S_{1×n}[t] and the last measured data transmitted to the fusion center, S_{1×n}[t − 1], for all sensors. At the fusion center, RDCM is able to reconstruct the original real-time sensed data Ŝ_{1×n}[t] (n variables) from D[t].
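The order of decisions at the sensor board level can be sketched as follows. This is only an outline of the control flow described above, not the paper's Algorithms 3 and 4: the 5% change threshold, the use of the largest per-sensor change as the update criterion, the `is_valid` callback and the returned action labels are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def relative_change_pct(current: float, last_sent: float) -> float:
    """Change of one reading versus the last transmitted value, in percent."""
    if last_sent == 0:
        return 0.0 if current == 0 else float("inf")
    return abs(current - last_sent) / abs(last_sent) * 100.0


def board_cycle(
    sample: Sequence[float],
    last_sent: Sequence[float],
    board_ok: bool,
    is_valid: Callable[[Sequence[float]], bool],
    change_threshold_pct: float = 5.0,   # hypothetical threshold
) -> str:
    """Return the action taken by the board for one sensing cycle."""
    if not board_ok:
        return "alarm"              # board needs maintenance; notify the FC
    # Assumption: the largest per-sensor change drives the update decision.
    change = max(relative_change_pct(c, l) for c, l in zip(sample, last_sent))
    if change < change_threshold_pct:
        return "skip"               # no significant change; nothing is sent
    if not is_valid(sample):
        return "discard"            # unreliable data; the error counter grows
    return "reduce_and_send"        # forward to payload data reduction


last = [20.0, 50.0, 100.0]
in_range = lambda s: all(0.0 <= v <= 150.0 for v in s)  # placeholder validity rule
for reading in ([20.1, 50.0, 100.0], [25.0, 50.0, 100.0], [400.0, 50.0, 100.0]):
    print(reading, "->", board_cycle(reading, last, True, in_range))
```

Placing the update decision before validation means that the validity check runs only on samples that would otherwise be transmitted.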

A. RDCM-IoT SENSOR BOARD LEVEL
The IoT sensor board level is the main part of the RDCM model; it is implemented in real time at the edge board with multiple sensors.

1) RDCM -INITIAL PHASE
This phase includes the following steps: (i) calculate the prediction model threshold, which is used only to check the validity of the measured data during data collection; (ii) the sensor board transmits only one sample without any reduction in the payload packet size. This paper proposes calculating the model threshold during the initial phase, because the minimum residual errors between the training data and the approximated data occur during the initial phase [47]. The purpose of training any model is to obtain reference parameters (weights) that are later used to predict one attribute from the other attributes based on simple linear regression. The proposed threshold is adaptive: its value changes during data collection, as in Algorithm 1.

Definition 1 (Reference Parameters (RFP) Function):
We compute the prediction model reference parameters by applying Eq. (11), where i ≠ p, i = 1, ..., n, and j = 1, 2, ..., w, with w the number of collected samples. To determine the value of the threshold, we first calculate the approximation error between the training data S_p[T] and the predicted data Ŝ_p[T], which is defined in Eq. (13). We then estimate the threshold value by selecting the maximum approximation error value for S_p.
According to [47], [48], increasing the updating frequency metric (UFM) of a data reduction model affects the energy consumption. This is because the size of the data transmitted after updating the model reference parameters is larger than or equal to the payload data size without reduction, which means that the sensor board requires more energy in the updating stage than in the reduction stage. The UFM value with a non-adaptive threshold is larger than with an adaptive one. The reason is that a model based on a non-adaptive threshold depends entirely on the threshold value calculated during the training phase, which is used in the reduction phase without any change. Furthermore, the threshold value is likely to be small the first time it is calculated. In this case, the model must be retrained repeatedly, because in most cases the dynamic data change and produce an error larger than the threshold. Conversely, the adaptive threshold changes its value every time the reference parameters need updating. In this study, the model reference parameters are updated at the node level without sending a copy of the reference parameters to the FC, because the prediction model is used only to check the validity of the sensed data (see Algorithm 4) at the sensor node level. More detail about determining the threshold and the steps of this phase is presented in the pseudocode of Algorithm 1 and Algorithm 2, respectively. It should be noted that the RDCM-Initial phase is run only once during data collection.
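The threshold calculation described above can be illustrated with a minimal Python sketch. It assumes a simple linear regression of the checked attribute S_p on a single correlated attribute and takes the threshold as the maximum absolute residual over the training window. The function names, the single-predictor form and the sample values are illustrative assumptions, not the paper's Algorithm 1.

```python
def fit_simple_linear(x, y):
    """Least-squares fit y ~ a + b * x; returns the reference parameters (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0  # flat predictor: fall back to the mean of y
    a = mean_y - b * mean_x
    return a, b


def initial_threshold(x, y):
    """Return ((a, b), threshold), where threshold is the largest training residual."""
    a, b = fit_simple_linear(x, y)
    residuals = [abs(yi - (a + b * xi)) for xi, yi in zip(x, y)]
    return (a, b), max(residuals)


# Hypothetical training window of w = 5 samples (humidity predicts temperature).
humidity = [40.0, 41.0, 42.0, 43.0, 44.0]
temperature = [25.0, 24.6, 24.1, 23.5, 23.0]
params, threshold = initial_threshold(humidity, temperature)
```

Under this assumption, an adaptive scheme would simply call `initial_threshold` again on a fresh window whenever the reference parameters need updating, so the threshold follows the retrained model.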

2) RDCM -SENSING PHASE
In this phase, the conditions of the IoT edge sensor board, such as the battery level, the board temperature and the confidence level of the sensor measurements, are very important, since bad conditions reduce the accuracy of the measured data. For example, for reading the temperature sensor, the study in [51] recommends using the sensor board only when its battery level is greater than or equal to a specified threshold.

Definition 3 (Confidence Level Measure for the Sensor (CMS) Function):
CMS is a strategy to measure the degree of acceptance of the readings obtained from the sensor board.
$$CMS = \frac{E}{E + C} \times 100 \quad (14)$$
where E is the number of measured errors and C is the number of correct measured data. Eq. (14) evaluates how reliable the IoT sensor board is by dividing the number of erroneous sensor readings by the total number of sensor readings. IoT/WSN sensor boards are more reliable when the error rate (CMS) is close to zero, and vice versa. Table 6 shows the classification of the physical state of IoT sensor boards. If the IoT edge device is in poor condition, an alert should be sent to the fusion center to report that the device needs to be maintained or replaced. In this study, a simple algorithm is proposed to check the physical state of IoT edge devices. The following pseudocode details the implementation steps of the RDCM-Sensing phase.
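As a hedged illustration of Eq. (14), the Python sketch below computes the CMS error rate and combines it with battery and temperature checks. The limits `battery_min`, `temp_max` and `cms_max` are hypothetical parameters chosen for the example; the actual classification is the one given in Table 6.

```python
def cms(errors, correct):
    """Error rate in percent: erroneous readings over all readings (Eq. 14)."""
    total = errors + correct
    if total == 0:
        return 0.0  # no readings yet, so no evidence of unreliability
    return errors * 100.0 / total


def board_is_healthy(battery_level, board_temp, errors, correct,
                     battery_min=20.0, temp_max=60.0, cms_max=10.0):
    """Return True if the board may start sensing.

    All three limits are illustrative assumptions, not values from the paper.
    """
    return (battery_level >= battery_min
            and board_temp <= temp_max
            and cms(errors, correct) <= cms_max)


# Example: 5 erroneous readings out of 100 gives a CMS of 5 percent.
rate = cms(5, 95)
ok = board_is_healthy(battery_level=80.0, board_temp=35.0, errors=5, correct=95)
```

When `board_is_healthy` returns False, the board would send the alert to the fusion center instead of sensing.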

Algorithm 3 Physical Conditions of IoT Edge
The aim of the update data phase is to reduce the energy consumption of an IoT sensor board with multiple sensors by reducing the number of transmitted packets when the payload sensing block reports no significant change.
Definition 4 (Relative Difference (RTV) Function): We calculate the relative difference vector RD between the current sensed data S_{1×n}[t] and the last transmitted data S_{1×n}[t − 1] by applying Eq. (15),
where i = 1, 2, ..., n and n is the number of sensors on the same board. Decision: if there is no significant change in the sensed data (for more detail, see Algorithm 5), set RF (Off); otherwise, check the validity of the current sensed data.
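The transmit-or-skip decision can be sketched in Python as follows. Since Eq. (15) is not reproduced here, the sketch assumes that the relative difference is the percentage change with respect to the last transmitted value, and that a single threshold β (in percent) is applied to all sensors on the board. Both the formula and the function names are assumptions for illustration.

```python
def relative_difference(current, last_sent):
    """Percentage change of each sensor value relative to the last sent value."""
    rd = []
    for cur, last in zip(current, last_sent):
        if last == 0:
            # Assumed convention: any change from zero counts as unbounded.
            rd.append(0.0 if cur == 0 else float("inf"))
        else:
            rd.append((cur - last) / abs(last) * 100.0)
    return rd


def should_transmit(current, last_sent, beta):
    """Keep the radio off unless at least one sensor changed by more than beta percent."""
    return any(abs(d) > beta for d in relative_difference(current, last_sent))


# Temperature and humidity readings on one board, with beta = 1 percent.
last = [20.0, 50.0]
print(should_transmit([20.1, 50.0], last, beta=1.0))  # small drift, radio stays off
print(should_transmit([21.0, 50.0], last, beta=1.0))  # 5 percent change, transmit
```

With β = 0, any change in any sensor triggers a transmission, which corresponds to the highest-accuracy setting.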

4) RDCM -VALIDATION PHASE
The aim of this phase is to avoid transmitting any incorrect data, which saves energy and increases the system accuracy. The following pseudocode details the implementation steps of the RDCM-Validation phase. In this study, the data states considered are range error (RE), constant error (CE), outlier error (OE) and event value (EV). This work considers the current measured value S_p(t) to be an outlier fault if the maximum difference value is higher than the threshold value; otherwise, the sensed value is normal. In addition, if the sum of the matrix {D} is zero, S_p(t) is a constant fault. Table 4 shows the transmission decisions based on the status of the current sensed data.
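A simplified Python sketch of such a validation step is given below. It is not the paper's Algorithm 4: it assumes that the outlier test compares the reading with a model prediction, that the constant-fault test sums the absolute differences to a short window of recent readings, and it omits event (EV) detection entirely. The names `classify_reading`, `valid_range` and `recent` are hypothetical.

```python
def classify_reading(value, predicted, threshold, valid_range, recent):
    """Classify one reading of the checked attribute S_p.

    value       -- current reading
    predicted   -- value predicted from the correlated attributes
    threshold   -- (adaptive) error threshold of the prediction model
    valid_range -- (low, high) physical range of the sensor
    recent      -- list of the most recent readings (assumed window)
    """
    low, high = valid_range
    if not (low <= value <= high):
        return "range_error"
    # A reading identical to every recent reading is treated as a stuck sensor.
    if recent and sum(abs(value - r) for r in recent) == 0:
        return "constant_error"
    if abs(value - predicted) > threshold:
        return "outlier_error"
    return "normal"


status = classify_reading(30.0, 25.0, 1.0, (-40.0, 125.0), [25.0, 25.1])
```

Only readings classified as normal (or as events, in the full model) would be forwarded to the reduction phase; the others increment the error counter.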

5) RDCM -REDUCTION PHASE
The main aim of this phase is to reduce the transmitted packet size for each sensed payload, which will help in saving the energy of the IoT sensor board as in APRS [48].
The following pseudo code details the implementation steps of the RDCM-Reduction phase.

Algorithm 5 Multivariate Data Reduction (MDR)
1) Input: RD, n
2) Output: D[t]
3) Begin:
4) m ← log2(Max{ABS(RD)})
First, calculate the number of bits required to represent |RD_i| as follows:
$$m = \log_2\left(\max\left(|RD_{1\times n}|\right)\right) + 1 \quad (17)$$
where m is the maximum number of bits. Next, calculate the total number of bits (L) required to represent the signed relative differences ±RD_i, as defined in Eq. (18).
Definition 5: To handle negative and non-negative RD values, Eq. (19) is applied.
Definition 6: Calculate the representation D[t] of the sensed data at the current time t, as defined in Eq. (20).
Definition 7 (Approximated Data (Approx) Function): We calculate the approximation Ŝ_{1×n}[t] of the sensed data at the current time t by applying Eq. (21).
The following pseudocode details the implementation steps of the RDCM-IoT edge device level. The IoT sensor board level includes several stages: analysis of the physical conditions of the IoT edge device, the update data strategy, sensed data validation and sensed data reduction.
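The idea of packing n relative differences into one integer D[t] can be sketched in Python as follows. The sketch assumes integer-valued relative differences and a sign-magnitude layout with one sign bit in front of m magnitude bits per sensor; this layout is an assumption made for illustration and may differ from the encoding defined by Eqs. (18)-(20).

```python
def encode_relative_differences(rd):
    """Pack integer relative differences into a single integer.

    Returns (d, width), where width = m + 1 bits per sensor:
    one assumed sign bit followed by m magnitude bits.
    """
    max_abs = max(abs(v) for v in rd)
    # int.bit_length() equals floor(log2(x)) + 1 for x > 0, as in Eq. (17).
    m = max(1, max_abs.bit_length())
    width = m + 1
    d = 0
    for v in rd:
        sign = 1 if v < 0 else 0
        field = (sign << m) | abs(v)
        d = (d << width) | field  # first sensor ends up in the highest bits
    return d, width


# Three sensors with relative differences +3, -2 and 0.
d, width = encode_relative_differences([3, -2, 0])
print(d, width)  # 240 3, i.e. the bit fields 011 110 000
```

In this example, three signed values are sent as one 9-bit number instead of three separate payload fields, which is the source of the payload saving.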

B. RDCM-FUSION CENTER LEVEL
It should be noted that the FC receives data from the IoT sensor nodes and is able to identify each IoT sensor node by its ID, where the sensor node ID is the name of the node.
If Err = 0 or EV = 1
10) Call MDR
11)

17) D[t] ← BinToDec(F)
18) Send D[t] to FC
19) End if
After the FC receives the reduced data D[t] from the IoT sensor node, we determine the total number of bits of the received data D[t] by applying Eq. (22).
If the size of the received data is 3 bits, the IoT sensor board is not in good condition, and the action is taken based on the frame information shown in Table 6. Otherwise, the received data passes through the approximation phase as follows (see Algorithm 7). First, we estimate the number of bits for each sensor as m = L1/n, rounded to the nearest integer. Next, we convert D[t] from decimal to binary based on BCD code, Db = Dec2bin(D[t], m × n), where (m × n) is the number of bits. Then, we obtain the relative difference for each sensor RD_i by taking m bits at a time from Db, from right to left. After that, we convert D_i from binary to decimal.
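The bit-slicing step at the fusion center can be sketched in Python as shown below. The sketch assumes a sign-magnitude field per sensor (one sign bit followed by the magnitude bits) and takes the field width as an explicit, hypothetical parameter `width`, whereas the paper derives the number of bits per sensor from the length of D[t]. It recovers only the relative differences; mapping them back to Ŝ[t] follows the paper's approximation function and is not shown.

```python
def decode_relative_differences(d, n, width):
    """Split the integer d into n signed relative differences.

    Fields are read from right to left, `width` bits at a time;
    the top bit of each field is the assumed sign bit.
    """
    m = width - 1
    field_mask = (1 << width) - 1
    magnitude_mask = (1 << m) - 1
    rd = []
    for _ in range(n):
        field = d & field_mask
        magnitude = field & magnitude_mask
        sign = -1 if (field >> m) else 1
        rd.append(sign * magnitude)
        d >>= width
    rd.reverse()  # restore sensor order: first sensor was in the highest bits
    return rd


# 240 is 011 110 000 in binary: +3, -2 and 0 for three sensors.
print(decode_relative_differences(240, n=3, width=3))  # [3, -2, 0]
```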

V. IMPLEMENTATION AND PERFORMANCE EVALUATION
The performance of the proposed RDCM model is evaluated using different real-time datasets: (i) ''Intel Berkeley Research Lab dataset (IBRL). IBRL wireless network recorded various types of sensed data as follows; air temperature, air humidity, light and voltage'' [52]; (ii) ''Grand St. Bernard dataset (GSB). GSB network used sensor nodes to measure the metrological characteristics of the environment which are ambient temperature, surface temperature and relative humidity'' [53]; (iii) ''Lausanne Urban Canopy Experiment dataset (LUCE). LUCE measure critical environment quantities which are ambient temperature, surface temperature and relative humidity'' [54]; (iv) UTHM_LAB, which measures air quality in terms of temperature and humidity [36]. MATLAB is used to simulate the effect of the algorithms on the performance of the IoT edge node. The proposed RDCM model is evaluated using different benchmark real-time datasets, as shown in Table 7 and Table 9. These datasets and the network structure (see Figure 3) are commonly used to evaluate the performance of existing approaches in WSN (see Tables 2-5). The assumptions of the simulation system model are summarized as follows [20], [36], [47], [48]:
(i) Each IoT sensor board has different sensors, as shown in the accompanying figure. For example, humidity, temperature, atmospheric pressure, carbon dioxide and some other sensors are supported on the IoT Lobelia Waspmote Gases board.
(ii) The IoT sensor board must update its data to the fusion center continuously at a specific time interval.
(iii) Each sensor board is able to send its measured data directly to the fusion center. In other words, the sensor node does not need two or more wireless hops to convey information from its location to the fusion center.
(iv) The energy consumption for the transmission of one byte is 52.92 µJ, as calculated for the MICA2Dot mote.
(v) The energy consumption in the case of no transmission (Off) is 0 µJ.
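Under assumptions (iv) and (v), the transmission energy of a node depends only on the number of bytes it sends. The Python sketch below shows how the energy and the saving ratio could be computed from a list of per-cycle payload sizes. The payload sizes in the example are hypothetical and are not results from the paper.

```python
ENERGY_PER_BYTE_UJ = 52.92  # assumption (iv): MICA2Dot, microjoules per byte


def transmission_energy_uj(payload_sizes_bytes):
    """Total transmission energy; a cycle with no transmission contributes 0 bytes."""
    return sum(size * ENERGY_PER_BYTE_UJ for size in payload_sizes_bytes)


def energy_saving_percent(baseline_uj, reduced_uj):
    """Energy saved relative to sending every sample unreduced."""
    if baseline_uj == 0:
        return 0.0
    return (baseline_uj - reduced_uj) / baseline_uj * 100.0


# Hypothetical node: 1000 cycles of 8-byte payloads sent directly, versus
# 100 transmitted cycles of 2 bytes each and 900 cycles with the radio off.
direct = transmission_energy_uj([8] * 1000)
reduced = transmission_energy_uj([2] * 100 + [0] * 900)
saving = energy_saving_percent(direct, reduced)
```

The sketch makes explicit that the saving comes from two independent factors: fewer transmitted packets and fewer bytes per packet.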

Real-Time Definition:
In general, real-time data (RTD) is data that is provided directly after aggregation: the sensor node transmits the measured data to the fusion center without any delay. Put simply, real-time sensed data is information that is not saved or stored but is provided to the end-user/gateway as it is collected. RTD does not mean that the data reaches the end-user immediately, since there may be bottlenecks related to the data collection structure, the bandwidth between numerous events, or the slowness of the end-user's computer. Nor does RTD guarantee delivery of the sensed information within a certain number of microseconds. It only means that the sensed data is not intended to be held back from its eventual use after it is collected [48]. In this paper, the term ''real-time'' refers to the real datasets and also indicates that the proposed model is applied to the sensed data immediately after sensing at the sensor node level. Figure 4 shows samples of a real-time dataset versus samples of real-time datasets injected with random errors. In this study, a real-time dataset is an original dataset collected by sensor nodes without any change in its values. An injected dataset is a real-time dataset into which artificial errors have been injected.
In the related works, reducing the number of transmitted packets is supported only for single-variable data. These methods require different thresholds for sensor boards with multiple sensors. Therefore, this work proposes a simple and effective update data strategy. The proposed method aims to reduce the number of messages transmitted by a sensor board with multiple sensors based on the relative difference between the current sensor measurements and the last transmitted ones. The advantage of this solution is that it prevents any transmission if the payload sensing block does not report a significant change.
The proposed method uses only one threshold for multiple sensors on the same board (see Algorithm 5, step 7). This section examines the effect of β on the performance of the proposed model. The value of β is set to 0%, 1%, 2%, 3%, 4% and 5%. RDCM is applied to different real-time datasets and various nodes. This study used real-time datasets with no change in their content (no injected errors). The results are shown in Figure 5 and Figure 6. They show that the energy consumption is lowest when RDCM is applied with β = 5%, compared with the other β values. In general, increasing β reduces the energy consumption of the IoT sensor. The reason is that the fusion center is updated only if the difference between the current sensor data value and the previously sent data exceeds β. Although increasing β reduces the number of transmitted packets and thus saves energy, the results show that some nodes sent only 17 of 1000 samples, which affects the system accuracy. An advantage of the proposed model is that its β value can easily be adjusted depending on the application. To obtain the highest possible accuracy in this study, the β value was set to 0%. Hence, RDCM and APRS can reduce the number of bits of payload sensing data before sending it to the fusion center.

B. THE ANALYSES OF RDCM-VALIDATION PHASE
In our work [47], we suggested a novel method for estimating the error threshold value during the training phase. The outcome showed that the adaptive threshold is better than the non-adaptive threshold with respect to decreasing the number of times the model needs to update its reference parameters, which helps prolong the IoT sensor node lifetime. Moreover, adapting the threshold produced more accurate results. Therefore, using the proposed threshold in models that detect outliers or recover sensed data in IoT/WSN real-time applications would help increase the accuracy of these systems. The RDCM verifies the validity of the current sensed data before sending it to the fusion center.

C. PERFORMANCE OF THE RDCM-VSNL METHOD
In the RDCM-validation phase, as shown in the section describing the RDCM model, sensed data validation checks only one attribute out of multiple attributes. The attribute (sensor) is denoted as S_p(t), and it has a high correlation with the other sensors on the same IoT sensor board. The RDCM-VSNL method is used for this purpose, and this subsection examines its performance during the RDCM-validation phase. The RACAD_UTHM and IBRL-Intel datasets, with 40627 and 1000 samples, respectively, were randomly injected with 10% errors.
From the simulation results shown in Table 10, it is clear that the RDCM-VSNL is able to detect sensed data errors in real time during data collection with high performance; the average accuracy for all examined sensors is around 97%.
A small training size W affects the accuracy of determining the model parameters. In contrast, a large training size W affects the efficiency of the IoT sensor node due to its resource constraints. Therefore, in this study, a small value of W was chosen, ranging from 5 to 55, to obtain acceptable performance under the IoT sensor node component constraints. The results indicate that the model performs better when W is more than 15 samples. In addition, whenever the RDCM prediction model needs to update its reference parameters, the value of the adaptive threshold changes dynamically. Changing the threshold according to the training of the model is very important because the model is used here to detect irregular data. A lack of accuracy in the model reduces its performance in detecting errors or events during real-time data collection.

E. PERFORMANCE COMPARISON RDCM WITH DIFFERENT ALGORITHMS
This section investigates the performance of various algorithms in terms of energy consumption. It should be noted that this study used the original datasets with no change in their content values. To analyze the performance of the RDCM, EDCD2 and APRS algorithms on real-time datasets that contain errors, the original real-time datasets are randomly injected with different percentages of errors. The percentage of errors is set to 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9% and 10%. The number of samples is 1000. Figure 12, Figure 13 and Figure 14 show the energy consumption (µJ) results for the APRS, EDCD2 and RDCM algorithms, respectively, applied to IBRL sensor nodes with different error percentages. Increasing the number of incorrect data transmissions affects the energy of the IoT sensor board, because the board wastes energy sending incorrect data that will be discarded at the fusion center. Moreover, if the fusion center cannot detect the erroneous data it receives, those errors affect the accuracy of the whole system.
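The error-injection setup described above can be reproduced in spirit with a short Python sketch. It assumes that errors are placed at uniformly random positions and that each injected error is a fixed out-of-range value; the actual error values and injection procedure used in the experiments are not specified here, so `error_value` and `seed` are hypothetical parameters.

```python
import random


def inject_errors(samples, percent, error_value, seed=0):
    """Return (corrupted_copy, sorted_error_indices).

    `percent` of the samples, rounded to the nearest integer count, are
    replaced by `error_value` at random positions. The seed makes runs repeatable.
    """
    rng = random.Random(seed)
    count = round(len(samples) * percent / 100)
    indices = sorted(rng.sample(range(len(samples)), count))
    corrupted = list(samples)
    for i in indices:
        corrupted[i] = error_value
    return corrupted, indices


# 1000 synthetic temperature samples with 5 percent injected errors.
clean = [20.0 + 0.01 * k for k in range(1000)]
noisy, error_positions = inject_errors(clean, percent=5, error_value=999.0)
```

Keeping the list of injected positions makes it possible to score a detection method afterwards, for example by counting how many injected samples were flagged.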
RDCM shows better performance than the other algorithms, as shown in Figure 15. This is because RDCM can detect errors and ignore them during real-time data collection in IoT/WSN applications. In addition, RDCM can reduce both the number of transmitted packets and the number of transmitted bits of payload data. The average energy saving ratio of the RDCM, APRS and EDCD2 algorithms applied to all IBRL sensor nodes with various error percentages (1%-10%) is 98%, 90% and 58%, respectively. Table 11 shows a qualitative comparison of the presented algorithms in terms of energy saving. Compared with the other solutions, RDCM has the advantage in saving energy because it addresses most of the causes of wasted IoT board energy during data collection. Figure 16 shows the total energy consumption of EDCD2, VSNL, APRS, PCA-B, MLR-B, RDCM and direct transmission applied to the real-time data of LUCE sensor node (N10), with its measured values injected with 2% errors over 1000 samples. The results show that sending the sensed data directly (without any algorithm) has the worst performance. VSNL, PCA-B, MLR-B and EDCD2 show different performance because each saves the energy of the IoT sensor board by addressing only one issue. For example, the PCA-B and MLR-B algorithms can only reduce the size of the transmission; they cannot reduce the number of transmissions or detect errors. VSNL can only reduce incorrect data transmission. EDCD2 can reduce the number of transmitted packets only if the current sensed data does not change significantly compared to the last transmitted data. APRS and RDCM show high performance because they solve several problems, as shown in Table 11.
TABLE 11. A qualitative comparison of the algorithms presented in this paper in terms of energy saving.

VI. CONCLUSION
This paper introduces a new model, denoted RDCM, designed to reduce the energy consumption of IoT sensor boards. The general structure of RDCM is composed of two main levels: the IoT sensor board level and the fusion center level. The IoT sensor board level is implemented in real time by all IoT sensor boards simultaneously in each cycle, and the fusion center level is executed by the fusion center. The IoT sensor board level includes the following stages: (i) the check of the physical conditions of the IoT edge device (board), (ii) the update data strategy stage, (iii) the data validation stage and (iv) the sensed data reduction stage. The average total percentage of energy saved by applying RDCM to real-time datasets injected with various percentages of errors, over all nodes, is 98%. In summary, RDCM has a very high performance in terms of energy consumption compared to the other algorithms.
The research reported in this paper reveals some possible further research opportunities:
(i) This work proposes a solution that reduces only the size of the payload data within the whole packet during the transmission phase. It is recommended to propose a new scheme that reduces the whole packet size.
(ii) This work assumes that the sensor node is able to send the data directly (one hop) to the FC/BS. It is recommended to design a new data collection model for multivariate sensors in IoT applications that considers multi-hop networks.
(iii) The algorithms EDCD, VSNL, APRS, MLR-B, PCA-B and RDCM discussed in this paper are analyzed using environmental datasets for smart and green building applications. It is recommended to analyze these algorithms with vibration datasets for industrial applications and to apply them to wearable health-care and logistics applications.
(iv) The MLR-B and PCA-B models cannot reduce the number of transmitted packets. It is recommended to design a hybrid model that combines these models with the EDCD algorithm.
(v) Similarly, PCA-B is based on a lightweight version of PCA. It is recommended to use the adaptive threshold proposed in this paper with PCA-B for anomaly detection at the cloud/FC level.