CDPM: A Combinational Data Prediction Model for Data Transmission Reduction in WSN

Background: Data prediction methods in wireless sensor networks (WSNs) have emerged as a significant way to reduce redundant data transfers and extend the overall network lifetime. Two types of data prediction algorithms are currently in use. The first focuses on reconstructing historical data and provides backward models, resulting in unmanageable delays. The second focuses on forecasting future data and provides forward models, which involve additional data transmissions. Method: We develop a Combinational Data Prediction Model (CDPM) that reconstructs past data while controlling delay and anticipates future data to reduce excessive data transmission. Two algorithms implement this model in WSN applications. The first builds optimal models step by step on the sensor nodes (SNs). The second predicts and regenerates the sensed readings at the base station (BS). Comparison: To evaluate the performance of CDPM, a real WSN application is simulated using a real dataset, and CDPM is compared with the HLMS, ELR, and P-PDA algorithms. Results: Compared with HLMS, ELR, and P-PDA respectively, CDPM achieved significant transmission suppression (16.49%, 19.51%, and 20.57%), reduced energy consumption (29.56%, 50.14%, and 61.12%), and improved accuracy (15.38%, 21.42%, and 31.25%). The delay caused by CDPM training is also controllable during data collection. Conclusion: The results demonstrate the efficacy of the proposed CDPM over a single forward or backward model in terms of reduced data transmission, improved energy efficiency, and regulated latency.


I. INTRODUCTION
In WSN applications, SNs usually sense environmental data at high frequencies, and continuous data transmission causes SNs to consume a lot of energy. Because SNs are battery-powered, energy conservation is a key concern [1] [2]. Since radio communication consumes more energy at SNs than any other activity [3] [4], data reduction is becoming increasingly popular as a means of conserving the limited energy resources of WSNs [5] [6]. By minimizing duplicate data transfers, data prediction methods [7] [8] [9] conserve constrained resources by cutting unnecessary communication overhead, energy consumption, etc. [10] [11] [12] [13]. In a prediction-based method, every SN trains a prediction model on its sensed data values and forwards the model to the BS. The BS then predicts and reconstructs the sensed readings using the same model as the SN. If the prediction threshold is too strict for the application, the data prediction model is not acceptable, because the total communication overhead of such models can exceed that of the original application without data prediction [13] [15] [16]. Apart from backward and forward prediction models, there are a few other methods whose prediction models are comparatively intricate to train and whose feasibility has yet to be determined; these are discussed in detail in [17]. In summary, data-prediction-based techniques face three challenges: unpredictable latency, increased transmission overhead, and difficulty in model training. To the best of our knowledge, solving all of these challenges is still a work in progress.
This research provides a combinational model that can be used both to reconstruct historical data and to predict future data. The number of data points used to train the model is adjusted to meet predetermined upper bounds on error and latency, ensuring that data quality and delay are tightly regulated. To eliminate unnecessary transmissions and enhance energy efficiency, the model is used both to regenerate previous data and to predict future data. Two techniques are proposed to implement the combinational model in real-world WSN applications. To generate optimal combinational models, a stepwise technique is developed for SNs; this approach reduces the computational load of the combinational model and increases its viability. For the BS, a data prediction and data regeneration technique is proposed. Extensive simulation experiments on a real-world WSN application are used to evaluate the performance of the combinational model, and its energy efficiency is compared with that of three existing techniques. The simulation results demonstrate that the proposed method can effectively suppress data transmissions, reduce overall WSN energy consumption, and tightly limit the delay induced by training. The objectives of this work are as follows:
• To eliminate unnecessary data transmissions using data-prediction models, which can reduce the number of redundant data transfers and extend the overall network lifetime.
• To enhance energy efficiency by developing a combinational model for both regeneration of previous data and prediction of future data.
• To achieve reduced data transmission, reduced energy consumption, and regulated latency by implementing and simulating the proposed combinational model.
The remainder of this work is organised as follows. In Section II, we review related research work.
Section III presents the overall framework of the proposed combinational data prediction model (CDPM). Section IV presents the CDPM for WSN applications, and Section V discusses its implementation. Section VI presents the energy model for the proposed work. Section VII describes the experimental setup, dataset, and performance metrics. Section VIII presents the experimental results and discussion that demonstrate the effectiveness of the framework. Finally, Section IX presents the conclusions and directions for further research.

II. RELATED WORK
In many cases, transmitting all sensed data is unnecessary. Data transmission reduction is crucial for resolving several WSN issues, such as reducing energy consumption and eliminating redundant measurements. Accordingly, this section presents related work on data prediction for reducing data transmission.
Zhao et al. [18] proposed the P-DPA algorithm, which uses the potential law contained in data periodicity as guidance to adjust the prediction values. P-DPA effectively improves accuracy, reduces communication frequency, and prolongs WSN lifetime, but it does not describe how to find attribute correlations, and it does not control delay. Tan et al. [19] proposed HLMS, a prediction approach that predicts the measured values both at the SN and at the BS. HLMS provides low energy consumption, reduced data transmission, and high data accuracy, but it predicts only temporal data, not spatial data. The synchronization of the filters at the SN and the BS is also left unexplored in this work.
Makhoul et al. [20] proposed a data reduction model (KW) that allows SNs to adapt their sensing rates to the data variance and to eliminate similar readings from the data vector using a similarity function. A local aggregation algorithm further reduces the size of the dataset before it is transmitted to the BS. This model minimizes the amount of data transmitted over the WSN to conserve energy, but it does not exploit the correlation between adjacent SNs. Al-Qurabat et al. [21] proposed an adaptive data-gathering method that combines dimensionality reduction using adaptive piecewise constant approximation (APCA), sampling-rate adaptation based on dynamic time warping (DTW) similarity, and frequency reduction using symbolic aggregate approximation (SAX). The method removes redundant data, adapts the sampling rate to environmental conditions, conserves energy, and prolongs network lifetime, but it has high complexity and requires more processing time.
Tayeh et al. [22] proposed the Adaptive Sampling + Transmission Reduction (AS+TR) data prediction technique, which reduces radio communication and data sensing by combining adaptive sampling with a dual prediction mechanism. AS+TR reduces energy consumption and extends the overall network lifetime. However, the AS method does not compute the risk of data loss and replicated data, and the work does not control delay. Cheng et al. [23] proposed the multi-node multi-featured (MNMF) prediction model, based on bidirectional long short-term memory (LSTM), a type of recurrent neural network (RNN). MNMF exploits spatial and temporal correlations in data collection, since redundant data impose unnecessary burdens on both the SNs and the WSN. The method reduces the energy consumption of SNs and extends the WSN lifetime with high prediction accuracy and reasonable prediction bias. However, it considers only homogeneous WSN applications and does not control delay.
Jain et al. [24] proposed DA-AFM in 2020 for reducing correlated spatial-temporal data. It comprises two algorithms: one at the SNs, which detects temporal redundancies in data readings using AFM and RD, and the other at the cluster heads (CHs), which detects spatial redundancies using AFM and RD. This method exploits both spatial and temporal correlations, reduces data transmissions and energy consumption, and improves accuracy, but it considers only homogeneous WSN applications and does not control delay. Jain and Kumar [25] proposed the ECR model in 2020, which uses a two-vector model to synchronize predicted data in intra-cluster transmissions and avoid cumulative error in continuous data prediction. In the initialization phase of each data collection cycle, it generates an approximation of future data and computes its prediction threshold error. ECR is a simple, structure-free, lightweight, and scalable data prediction model that reduces data transmission cycles and energy consumption while maintaining accuracy, but it has high complexity. Al-Qurabat and Idrees [26] proposed DGAST, which gathers sensor data periodically and divides network operation into rounds. Each round in DGAST has four stages: data collection, data aggregation, selective transmission, and adjustment of the sampling frequency of SNs. DGAST conserves energy and extends the lifetime of periodic sensor networks, but it requires complex computation and high memory usage. Jain et al. [27] proposed the ELR model in 2021, which exempts the SN from transmitting large volumes of data for a specific duration, during which the BS predicts the future data values, thus minimizing the energy consumption of the WSN. ELR is an energy-efficient model that reduces data transmission and extends network lifetime, but it does not consider cluster-based topology, scalability, or delay control.
Agarwal et al. [28] proposed DP-LRM, which reduces the transmission of redundant data by building a regression model on linear descriptors of continuously sensed data values; it can be built on top of any data aggregation model. It uses a buffer-based linear filter algorithm that compares all incoming values and establishes a correlation between them. DP-LRM is an energy-efficient model that reduces data transmission cost and maintains the accuracy and integrity of the reduced data, but it is computationally complex and does not consider scalability. Wang et al. [29] proposed a data reduction approach based on the Kalman filter that operates in two phases: a data reduction phase and a data prediction phase. This approach is reliable and energy efficient and extends network lifetime, but it has a large computation overhead and does not consider cluster-based topology or network scalability.
Nels et al. [30] proposed the HFQKLMS filter, developed by integrating HFBLMS and QKLMS, where HFBLMS itself integrates FC theory with the HLMS scheme. The HFQKLMS filter performs the prediction process for data aggregation. This work is energy efficient, maintains accuracy in the reduced data, and extends network lifetime, but it requires complex computation and does not consider scalability. Famila et al. [31] proposed RCHST-IETSMP, which integrates energy and trust parameters via a Hyper-Erlang process for successful CH selection, assisted by Semi-Markov prediction integrated with the Hyper-Erlang distribution process. This work is reliable and extends WSN lifetime, but it requires complex computation and considers neither scalability nor delay control. Jain and Kumar [32] proposed DTRM in 2021; it is implemented on the CHs and can be used in combination with most data aggregation algorithms. DTRM eliminates temporal redundancies and correlations from data readings and allows the SN to transmit only a few data values, which increases data transmission effectiveness and reduces energy consumption. DTRM provides data accuracy, reduced data transmission, low complexity cost, lightweight processing, a limited memory footprint, and robustness, but it is based on single-value comparison.
Table 1 compares all the data transmission reduction methods discussed above using well-known parameters. Although many methods have been proposed for data transmission reduction in WSNs, delay control has not yet been addressed. Compared with these methods, the strength of the proposed CDPM algorithm lies in its ability to control delay and to reduce energy consumption by achieving high data transmission suppression with a low RMSE (i.e., improved data quality).

A. NETWORK MODEL
The network model of CDPM is shown in Figure 1. The WSN consists of $n$ sensor nodes, $S = \{s_1, s_2, s_3, \dots, s_i, \dots, s_n\}$, and a base station (BS) positioned away from the sensing region. The SNs are deployed randomly, and each SN senses and transmits its measurement to the BS in each time slot of $T = \{t_1, t_2, t_3, \dots, t_j, \dots, t_N\}$. Under CDPM, correlated and duplicate sensed values are flushed at the SNs and not transmitted to the BS; only sensed data that deviate from the prediction by more than the allowed error are sent. The BS then predicts the non-transmitted data. Thus, the task of each SN is to sense environmental parameters and transmit a reading to the BS only when it falls outside the prediction budget, and the task of the BS is to receive the transmitted data and predict the non-transmitted values.
The data transmission protocol is not considered in this study. Instead, we assume that data transmission between each SN and the BS is device-to-device, so data transfers between the SN and the BS are accomplished in a timely manner. Consequently, whenever no data are received in a time slot, the BS assumes that the reading was suppressed by the SN, and CDPM predicts the non-transmitted data. CDPM is used both to regenerate previous data and to predict future data.

B. NETWORK ASSUMPTION
We consider the following assumptions for the CDPM model:
• The SNs are stationary and randomly deployed in the sensing region.
• The BS is positioned away from the sensing region.
• All SNs have a fixed data sampling rate.
• Data transmission between an SN and the BS is device-to-device, i.e., data transfers between the SN and the BS arrive without delay.
• Unlike the SNs, the BS has no power, memory, or processing constraints.

IV. COMBINATIONAL DATA-PREDICTION MODEL
Recent research has indicated that linear data-prediction models outperform other models when the data vary only slightly. In line with this, this work builds the Combinational Data-Prediction Model on a linear data-prediction technique. This section explains a generic linear prediction model and the proposed Combinational Data-Prediction Model (CDPM).

A. GENERIC VERSION OF LINEAR DATA-PREDICTION MODEL
In a specific area of the physical world, frequently changing environmental data are represented as a function of time, $x = f(t)$. The sensed readings of an SN can then be denoted by the time series in Equation (1):

$$X = \{(t_1, x_1), (t_2, x_2), (t_3, x_3), \dots, (t_N, x_N)\} \tag{1}$$

where $\{1, 2, 3, \dots, N\}$ are the epochs in which the SN senses the environmental parameters. Without prediction, the SN sends $(t_i, x_i)$ for all $N$ epochs. Linear models assume that environmental data follow a short-term linear pattern, so the sensor readings can be approximated by the linear function in Equation (2):

$$\hat{x}_i = a\,t_i + b \tag{2}$$

In each step, the parameters $a$ and $b$ are trained to build the prediction model. Some methods generate backward data prediction models at the end of each step to reconstruct past data, i.e., they generate $a_{k-1}$ and $b_{k-1}$ when step $k$ begins. After training, the model parameters, rather than the original sensed readings of the previous step, are uploaded to the BS for data regeneration, which causes delays. Other methods generate forward data prediction models at the start of each step to predict future data, i.e., they generate $a_k$ and $b_k$ when step $k$ begins. After training, the model parameters as well as the original sensed readings are uploaded to the BS, resulting in additional transmissions.

B. PROPOSED COMBINATIONAL DATA-PREDICTION MODEL (CDPM)
CDPM updates its model step by step, and each step has two phases: a training phase and a data prediction phase. In the first phase, the CDPM model is trained on the sensed data values and communicated to the BS. In the second phase, the BS predicts the non-transmitted data and reconstructs the sensor data of the first phase. If the prediction error exceeds the predefined threshold, the CDPM model is retrained, i.e., the next step begins. Let $m_k$ denote the number of training data values in the $k$th step. At least two data points are needed to build a linear data-prediction model, which implies $m_k \ge 2$. Thus, we express the CDPM model of the $k$th step as $(a_k, b_k, m_k)$. A CDPM model regenerates the (at least two) values of its training phase and also predicts the values that follow; in other words, the three parameters of one model can represent at least four sensed readings without requiring any further transmission.
In this model, the values obtained by regeneration are delayed by the time required for CDPM's training phase. In real-time WSN applications, SNs sense their surroundings at a predefined frequency. The maximum delay that can arise during the training phase is expressed in Equation (3). Since the first training reading must wait until the $m_k$th reading is sensed, the delay in the $k$th step is directly related to the number of training data values $m_k$:

$$D_k = (m_k - 1)\,\Delta t \tag{3}$$

where $D_k$ represents the maximum delay in the $k$th step and $\Delta t$ represents the sensing period. The maximum delay can therefore be controlled by restricting $m_k$ in the training phase. Let the highest tolerable delay be $T_{max}$ and let the maximum number of training values be $m_{max}$, predefined by Equation (4):

$$m_{max} = \left\lfloor \frac{T_{max}}{\Delta t} \right\rfloor + 1 \tag{4}$$
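The delay bound can be illustrated with a few lines of Python. This is a minimal sketch, assuming the maximum delay of step $k$ is $(m_k - 1)\Delta t$ because the first training reading waits until the last one is sensed; the function names are hypothetical and not part of any published CDPM code.

```python
import math


def max_training_delay(m_k: int, delta_t: float) -> float:
    """Worst-case delay of one step, assuming D_k = (m_k - 1) * delta_t."""
    if m_k < 2:
        raise ValueError("a linear model needs at least two training points")
    return (m_k - 1) * delta_t


def max_training_points(t_max: float, delta_t: float) -> int:
    """Largest m with (m - 1) * delta_t <= t_max, i.e. floor(t_max / delta_t) + 1."""
    m_max = math.floor(t_max / delta_t) + 1
    if m_max < 2:
        raise ValueError("t_max is shorter than one sensing period")
    return m_max


# IBRL-like settings: 31 s sensing period, 600 s tolerable delay.
m_max = max_training_points(600, 31)
print(m_max, max_training_delay(m_max, 31))  # 20 589
```

With the IBRL sensing period this yields a cap of 20 training readings, whose worst-case delay (589 s) stays under the 600 s bound.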

V. IMPLEMENTATION OF COMBINATIONAL DATA-PREDICTION MODEL
This section describes how the combinational data-prediction model is applied in real-world WSNs. We present a stepwise approach for SNs to train and update the CDPM model, and a technique for the BS to reconstruct and predict the sensed readings.

A. TRAINING OF COMBINATIONAL DATA-PREDICTION MODEL
We use the least squares method (LSM) to minimize error and build the most precise linear prediction model in the training phase. We compute the error $e(t_i)$, the difference between the sensed reading and the predicted data, as in Equation (5):

$$e(t_i) = x_i - \hat{x}_i = x_i - (a\,t_i + b) \tag{5}$$

The total squared error over the training phase, $E$, is then evaluated as in Equation (6):

$$E = \sum_{i=1}^{m_k} e(t_i)^2 = \sum_{i=1}^{m_k} (x_i - a\,t_i - b)^2 \tag{6}$$

According to the LSM, $E$ attains its minimum when $\partial E/\partial a = 0$ and $\partial E/\partial b = 0$. The values of $a$ and $b$ are therefore computed by Equations (7) and (8):

$$a = \frac{m_k \sum t_i x_i - \sum t_i \sum x_i}{m_k \sum t_i^2 - \left(\sum t_i\right)^2} \tag{7}$$

$$b = \frac{\sum x_i - a \sum t_i}{m_k} \tag{8}$$

The LSM can be expressed as a function of these basic operations, as in Equation (9):

$$(a, b) = \mathrm{LSM}\left(\{(t_1, x_1), (t_2, x_2), \dots, (t_{m_k}, x_{m_k})\}\right) \tag{9}$$

Then, to bound the error of the measurements of an SN, we express the RMSE of the $k$th training phase as in Equation (10):

$$\mathrm{RMSE}_k = \sqrt{\frac{1}{m_k}\sum_{i=1}^{m_k} e(t_i)^2} \tag{10}$$

In the data-prediction phase, if the error of the predicted data exceeds the predefined threshold $\varepsilon$, the combinational model is retrained and updated. Since $\varepsilon$ is taken as the upper bound of the error, the prediction error always stays within the threshold. The threshold $\varepsilon$ is related to the values of $\mathrm{RMSE}_k$, where $\mathrm{RMSE}_k$ is the RMSE of the $k$th training phase.
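A minimal Python sketch of the least-squares fit and the training-window RMSE follows. It assumes the linear model $\hat{x}_i = a t_i + b$ and the standard closed-form normal-equation solution; the function names are illustrative only.

```python
import math
from typing import Sequence, Tuple


def least_squares_fit(t: Sequence[float], x: Sequence[float]) -> Tuple[float, float]:
    """Slope a and intercept b minimising sum((x_i - a*t_i - b)**2)."""
    m = len(t)
    if m < 2 or m != len(x):
        raise ValueError("need at least two (t, x) pairs of equal length")
    st, sx = sum(t), sum(x)
    stt = sum(ti * ti for ti in t)
    stx = sum(ti * xi for ti, xi in zip(t, x))
    denom = m * stt - st * st  # zero when all t_i are identical
    if denom == 0:
        raise ValueError("t values must not all be equal")
    a = (m * stx - st * sx) / denom
    b = (sx - a * st) / m
    return a, b


def training_rmse(t: Sequence[float], x: Sequence[float], a: float, b: float) -> float:
    """RMSE of the residuals e(t_i) = x_i - (a*t_i + b) over the training window."""
    squared = [(xi - (a * ti + b)) ** 2 for ti, xi in zip(t, x)]
    return math.sqrt(sum(squared) / len(squared))


a, b = least_squares_fit([0, 1, 2], [0, 1, 0])
print(a, b, training_rmse([0, 1, 2], [0, 1, 0], a, b))
```

Recomputing the sums from scratch is the simplest form; an SN implementation could instead keep running sums so each new reading updates the fit in constant time.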
Lemma 1: If $|e(t_i)| \le \varepsilon$ for every $i$ in the $k$th training phase, then $e(t_i)^2 \le \varepsilon^2$, so $\mathrm{RMSE}_k = \sqrt{\tfrac{1}{m_k}\sum_{i=1}^{m_k} e(t_i)^2} \le \sqrt{\tfrac{1}{m_k} \cdot m_k\,\varepsilon^2} = \varepsilon$. Thus, according to Lemma 1, the threshold in the first (training) phase is directly related to $\mathrm{RMSE}_k$. To create optimal combinational models, we present a forward stepwise method for SNs (Algorithm-1), which divides each training phase into multiple steps. The LSM is applied only when a new data value is sensed, to avoid a large amount of concurrent computation. If $\mathrm{RMSE}_k$ exceeds $\varepsilon$, Algorithm-1 returns the earlier outcome; when $m_k$ reaches $m_{max}$, it returns the up-to-date outcome.
Algorithm-1 (partial listing):
…
E = Σ (x_i − a′ × t_i − b′)²
…
return (a, b, m);
else
a = a′; b = b′; …
endif
…

ALGORITHM END
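Because the Algorithm-1 listing survives only in part, the following Python sketch is a hypothetical reconstruction of the behaviour described in the text, not the authors' code. It assumes that the training window grows by one reading at a time, that the line is refitted by least squares after each new reading, that the previous model is returned as soon as the window RMSE exceeds $\varepsilon$, and that the current model is returned once the window reaches $m_{max}$.

```python
import math
from typing import List, Optional, Sequence, Tuple

Model = Tuple[float, float, int]  # (a, b, m): slope, intercept, number of training points


def _fit(t: List[float], x: List[float]) -> Tuple[float, float]:
    """Closed-form least squares for x ≈ a*t + b (needs at least two distinct t)."""
    m = len(t)
    st, sx = sum(t), sum(x)
    stt = sum(v * v for v in t)
    stx = sum(u * v for u, v in zip(t, x))
    a = (m * stx - st * sx) / (m * stt - st * st)
    return a, (sx - a * st) / m


def _rmse(t: List[float], x: List[float], a: float, b: float) -> float:
    """Root mean squared residual of the line over the current window."""
    return math.sqrt(sum((xi - a * ti - b) ** 2 for ti, xi in zip(t, x)) / len(t))


def train_step(readings: Sequence[Tuple[float, float]], eps: float, m_max: int) -> Optional[Model]:
    """Forward stepwise training of one step (hypothetical reconstruction).

    Returns the last model whose window RMSE stayed within eps, capped at
    m_max points, or None if fewer than two readings are available.
    """
    t: List[float] = []
    x: List[float] = []
    best: Optional[Model] = None
    for ti, xi in readings:
        t.append(ti)
        x.append(xi)
        if len(t) < 2:
            continue  # a line needs at least two points
        a, b = _fit(t, x)
        if _rmse(t, x, a, b) > eps:
            return best  # new reading breaks the bound: keep the earlier model
        best = (a, b, len(t))
        if len(t) == m_max:
            return best  # delay bound reached: return the up-to-date model
    return best


# A clean linear trend, then a jump that forces an early return.
trend = [(float(i), 2.0 * i + 1.0) for i in range(1, 30)]
print(train_step(trend, eps=0.1, m_max=20))  # (2.0, 1.0, 20)
jump = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 10.0)]
print(train_step(jump, eps=0.5, m_max=20)[2])  # 3
```

The returned $m$ also tells the caller where the next step starts: readings from index $m$ onward (including the one that broke the bound) feed the next call.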
The algorithmic complexity of CDPM's training phase is low: the worst-case complexity when a single reading is sensed is O( ), which SNs can easily handle. After the model is trained, it is forwarded to the BS and used for data prediction. Each predicted value is then compared with the newly sensed value to determine the prediction error. If the error exceeds the threshold $\varepsilon$, Algorithm-1 is called again to retrain the model, and the latest data prediction model is forwarded.

B. DATA PREDICTION AND REGENERATION PHASE
When the BS receives the trained parameters of the CDPM model from an SN, it regenerates the estimates of the data in the first (training) phase and then predicts the subsequent sensed data from the same model. Algorithm-2 for the data prediction and regeneration phase is presented below.
Algorithm-2 (partial listing):
…
for (i = … + 1; i ≤ …; i++)
…

ALGORITHM END
Here, $N$ represents the epoch period of data sensing. The outcome of data regeneration is expressed as $\{\hat{x}_1, \hat{x}_2, \hat{x}_3, \dots\}$.
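Since the Algorithm-2 listing is incomplete, the sketch below only illustrates the BS-side idea under stated assumptions: the model $(a, b, m)$ is assumed to have been fitted with epoch indices as $t$, starting at a known epoch, so the training readings are regenerated directly and each later epoch is predicted by adding the slope once, in line with the single-addition cost stated in Section VIII-E. The function names are hypothetical.

```python
from typing import List


def regenerate(a: float, b: float, first_epoch: int, m: int) -> List[float]:
    """Rebuild the m training readings of one step from the model (a, b, m)."""
    return [a * i + b for i in range(first_epoch, first_epoch + m)]


def predict(a: float, b: float, first_epoch: int, m: int, n_future: int) -> List[float]:
    """Predict the next n_future epochs after the training window.

    Starting from the last regenerated value, each prediction needs only one
    addition of the slope a, because consecutive epochs are one unit apart.
    """
    value = a * (first_epoch + m - 1) + b
    out: List[float] = []
    for _ in range(n_future):
        value += a
        out.append(value)
    return out


past = regenerate(0.5, 20.0, first_epoch=1, m=3)
future = predict(0.5, 20.0, first_epoch=1, m=3, n_future=2)
print(past, future)  # [20.5, 21.0, 21.5] [22.0, 22.5]
```

Repeated addition can accumulate rounding error over long prediction runs; evaluating $a \cdot i + b$ directly avoids this at the cost of one multiplication per value.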

VI. ENERGY MODEL
In this section, we propose an energy model for CDPM. To calculate the energy consumption of an SN, the energy consumed by each operation must be considered. As shown in Equation (11), the total energy consumption is related to four essential tasks: (i) data sensing ($E_S$), the energy needed to sense one data value; (ii) data transmission ($E_T$), the energy required by each SN per communication round; (iii) data aggregation ($E_A$), the energy needed to aggregate data; and (iv) data prediction ($E_P$), the energy needed to perform data prediction with CDPM. To estimate the total energy consumption of an SN ($E_{SN\text{-}total}$), we employ the model discussed in [33]:

$$E_{SN\text{-}total} = E_S + E_T + E_A + E_P \tag{11}$$

Equation (12) evaluates $E_S$, the energy required to convert the physical data into digital form, where $l$ is the number of bits in the sensed data, $V$ is the supply voltage, $I_{sense}$ is the total current required for data sensing, and $T_{sense}$ is the total duration of data sensing:

$$E_S = l \cdot V \cdot I_{sense} \cdot T_{sense} \tag{12}$$

To evaluate the energy dissipated by each SN per communication round, the classical first-order radio energy model [33] is used for transmission and reception energy. The energy consumption of an SN depends on the distance between the transmitter (SN) and the receiver (BS), under either the free-space ($fs$) or the multipath ($mp$) channel model. A distance threshold selects the channel model: $d^2$ energy loss for short distances and $d^4$ energy loss for long distances. If an $l$-bit message is transmitted over a distance $d$, the transmission energy $E_T(l, d)$ is given by Equation (13):

$$E_T(l, d) = \begin{cases} l\,E_{elec} + l\,\varepsilon_{fs}\,d^2, & d < d_0 \\ l\,E_{elec} + l\,\varepsilon_{mp}\,d^4, & d \ge d_0 \end{cases} \tag{13}$$

where $E_{elec}$ is the energy dissipated per bit by the transceiver electronics, and $\varepsilon_{fs}$ and $\varepsilon_{mp}$ are the free-space and multipath amplifier coefficients, respectively. The threshold distance $d_0$ is calculated as $d_0 = \sqrt{\varepsilon_{fs}/\varepsilon_{mp}}$. The energy dissipated to aggregate $l$ bits is represented in Equation (15):

$$E_A(l) = l \cdot E_{DA} \tag{15}$$

where $E_{DA}$ is the aggregation energy per bit.
Equation (16)
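The energy model can be sketched in Python as below. The radio constants are placeholder values commonly used with the first-order radio model (50 nJ/bit electronics, 10 pJ/bit/m² free-space, 0.0013 pJ/bit/m⁴ multipath, 5 nJ/bit aggregation); they are assumptions, not this paper's simulation parameters. The prediction energy $E_P$ is passed in as a number because its formula is not reproduced here, and all function names are hypothetical.

```python
import math

# Assumed placeholder constants (not taken from this paper).
E_ELEC = 50e-9        # J/bit, transceiver electronics
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier
E_DA = 5e-9           # J/bit, data aggregation

D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, about 87.7 m


def sensing_energy(l_bits: int, v: float, i_sense: float, t_sense: float) -> float:
    """Sensing energy as bits x supply voltage x sensing current x sensing time."""
    return l_bits * v * i_sense * t_sense


def tx_energy(l_bits: int, d: float) -> float:
    """First-order radio model: d^2 loss below D0, d^4 loss at or above it."""
    if d < D0:
        return l_bits * (E_ELEC + EPS_FS * d ** 2)
    return l_bits * (E_ELEC + EPS_MP * d ** 4)


def rx_energy(l_bits: int) -> float:
    """Reception costs only the electronics energy per bit."""
    return l_bits * E_ELEC


def aggregation_energy(l_bits: int) -> float:
    """Aggregation energy for l bits."""
    return l_bits * E_DA


def total_energy(e_s: float, e_t: float, e_a: float, e_p: float) -> float:
    """Sum of sensing, transmission, aggregation and prediction energy."""
    return e_s + e_t + e_a + e_p


# Example: one 10-byte (80-bit) message sent over 50 m.
print(tx_energy(80, 50.0))
```

Choosing $d_0 = \sqrt{\varepsilon_{fs}/\varepsilon_{mp}}$ makes the two branches meet at the crossover distance, so the transmission cost has no jump there.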

VII. PROFICIENCY ASSESSMENT
In this section, we present the simulation setup and the proficiency metrics used to evaluate the Combinational Data-Prediction Model (CDPM) in terms of transmission suppression, energy consumption, latency, and data quality. The performance of CDPM is compared with the P-PDA [18], HLMS [19], and ELR [27] algorithms.

A. SIMULATION SETUP
We implemented CDPM, along with the ELR, P-PDA, and HLMS algorithms, in the network simulator NS-2.34 [34]. NS-2.34 is an event-driven simulator that helps in understanding the dynamic behaviour of communication protocols. NS2 supports TCP, UDP, routing algorithms, network topologies, and multicast protocols on both wired and wireless networks [35]. NS2 is written in C++ and OTcl, an object-oriented version of Tcl. The simulation parameters are shown in Table 3.

B. DATASET
The Intel Berkeley Research Laboratory (IBRL) dataset [36] contains approximately 2.3 million sensor measurements. Each SN senses data every 31 seconds. The dataset includes several quantities: temperature in degrees Celsius, humidity ranging from 0 to 100%, brightness in lux, and voltage ranging from 2 to 3 volts. The dataset contains 1048574 temperature readings and 104845 humidity readings. The simulation runs at each SN for five days to evaluate the performance of the data prediction algorithms on humidity and temperature only. Linear interpolation is used to fill missing values at different epochs.
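The gap filling mentioned above can be illustrated with a short Python sketch, assuming missing epochs are marked as `None`. Interior gaps are filled by straight-line interpolation between the nearest known readings, and leading or trailing gaps take the nearest known value; that edge rule is an assumption, since the text does not say how edges are handled (`numpy.interp` applies the same edge rule by default). The function name is hypothetical.

```python
from typing import List, Optional


def fill_missing_linear(values: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation between known neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one reading must be present")
    out: List[float] = [0.0] * len(values)
    for i in known:
        out[i] = values[i]
    # Leading and trailing gaps: copy the nearest known reading (assumption).
    for i in range(known[0]):
        out[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        out[i] = values[known[-1]]
    # Interior gaps: straight line between the surrounding known readings.
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out


print(fill_missing_linear([None, 1.0, None, 3.0, None]))  # [1.0, 1.0, 2.0, 3.0, 3.0]
```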

C. PROFICIENCY METRIC
The proficiency of CDPM is evaluated through exhaustive experiments on the IBRL dataset using the following metrics. Moreover, according to [27] [32], data transmission is the major cause of energy depletion in such networks. Therefore, the energy consumption of CDPM is estimated from the number of data values transmitted from the SNs to the BS.

1) TRANSMISSION SUPPRESSION
Transmission suppression is estimated as the ratio of the data transmitted when a data prediction model is used to the actual sensed data without any data prediction method, as given in Equation (17).

$$TS\% = \left(\frac{\text{Transmitted data using the prediction algorithm}}{\text{Actual sensed data}}\right) \times 100 \tag{17}$$
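Equation (17) reduces to a single ratio. The sketch below computes it from transmission counts, assuming each transmitted message counts as one transmitted data item (a simplification); the function name is illustrative.

```python
def transmission_suppression_pct(transmitted: int, sensed: int) -> float:
    """TS%: transmitted data items as a percentage of the sensed readings."""
    if sensed <= 0:
        raise ValueError("sensed must be a positive count")
    if transmitted < 0:
        raise ValueError("transmitted must be non-negative")
    # Multiply before dividing so exact counts give exact percentages.
    return transmitted * 100 / sensed


print(transmission_suppression_pct(180, 1000))  # 18.0
```

No upper limit is imposed on `transmitted`: a scheme that uploads model parameters in addition to raw readings can exceed 100%, which is the situation the 100% guarantee discussed in Section VIII-A rules out for CDPM.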

2) ENERGY CONSUMPTION
The energy consumed in a WSN is directly proportional to the number of radio communications carried out by the SNs, so reducing the data delivered to the BS considerably extends WSN lifetime. The greater the transmission suppression, the less data are transferred and the less energy is spent. The energy model of this work is explained in detail in Section VI.

3) DATA QUALITY
Data quality is a critical element in defining excellence in a WSN. We measure it using the root mean squared error (RMSE) of the data sensed by the SNs, given in Equation (18):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2} \tag{18}$$

where $x_i$ is the sensed reading of the $i$th SN and $\hat{x}_i$ is the predicted value for that SN.
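Equation (18) translates directly into code. The sketch below assumes the sensed and predicted values are aligned sequences of equal length; the function name is illustrative.

```python
import math
from typing import Sequence


def rmse(sensed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between sensed readings and their predictions."""
    if not sensed or len(sensed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(x - xh) ** 2 for x, xh in zip(sensed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


print(rmse([0.0, 0.0], [3.0, 4.0]))
```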

4) LATENCY
Latency is a measure of delay. In a WSN, it is defined as the time taken for data transmitted by an SN to reach the BS. It has a key impact on the performance of any network.

5) ALGORITHMIC COMPLEXITY
An algorithm's complexity describes how its running time grows with the size of its input. It is expressed as a function $T(n)$ of the input size $n$ [37]. Here, we estimate the algorithm's efficiency asymptotically, measuring $T(n)$ as the number of required "steps", each of which takes constant time.

VIII. RESULTS AND ANALYSIS
In this section, we present the simulation results for the Combinational Data-Prediction Model (CDPM) in terms of transmission suppression, energy consumption, data quality, latency, and algorithmic complexity. The performance of CDPM is compared with the P-PDA [18], HLMS [19], and ELR [27] algorithms.

A. COMPARISON OF TRANSMISSION SUPPRESSION %
For the experiments, the CDPM, P-PDA, HLMS, and ELR algorithms are deployed to gather data for ten rounds of communication, with the threshold varying from 0.05 to 0.50 in steps of 0.05 across the rounds. We determine the transmission suppression (TS%) of the four algorithms for the average temperature and average humidity of the SNs using Equation (17) in Section VII. The larger the TS%, the less data are transmitted and the less energy is consumed. The TS% of the four algorithms on the IBRL data for the average temperature and average humidity of the SNs is visualised in Figures 1 and 2, respectively, and listed in Table 4. For CDPM, $T_{max}$ is set to 600 seconds; since $\Delta t$ of the IBRL dataset is 31 seconds, $m_{max}$ is set to 20. The TS% of CDPM is always higher than that of the P-PDA, HLMS, and ELR algorithms at every threshold value in every round of communication. Furthermore, CDPM guarantees that TS% is always below 100%, which means that additional transmissions are avoided. The network scale of the IBRL application is small enough that each SN can transfer data directly to the BS. However, the default TCP packet size in NS2 is 12 packets, which is a bottleneck for data transmission and for scaling such WSN applications. Thus, according to the NS2 message format, the message size of the P-PDA, HLMS, and ELR algorithms is set to 12 bytes each, and that of CDPM to 10 bytes.

B. COMPARISON OF ENERGY CONSUMPTION
Since most data prediction methods reduce data transmission, we also compare the energy consumption of CDPM with that of the P-PDA, HLMS, and ELR algorithms. The four algorithms are deployed to gather data for ten rounds of communication, with the threshold varying from 0.05 to 0.50 in steps of 0.05. We determine the energy consumption for both the average temperature and average humidity of the SNs using the energy model described in Section VI. The energy consumption of the four algorithms on the IBRL data is listed in Table 5. For CDPM, $T_{max}$ is set to 10 minutes; since $\Delta t$ of the IBRL dataset is 31 seconds, $m_{max}$ is set to 20. The energy consumption of CDPM is observed to be lower than that of the P-PDA, HLMS, and ELR algorithms at every threshold value in every round of communication. In the IBRL application, the network scale is small enough for every SN to transmit data directly to the BS. The energy consumption of each SN for sending one byte of data is set to 0.0144 and for receiving one byte of data to 0.0057 [38]. The cumulative energy consumption of the algorithms after every round of communication for temperature and humidity is presented in Figures 3 and 4, respectively, showing that CDPM consumes much less energy than the other algorithms. These experiments demonstrate that CDPM achieves higher data suppression rates and is more energy efficient.
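As a worked illustration, combining the per-byte costs above with the message sizes given in Section VIII-A (12 bytes for P-PDA, HLMS, and ELR; 10 bytes for CDPM) gives the per-message cost below, in the same units as [38]. This is simple arithmetic on the stated settings, not an additional result.

$$E_{send}^{12\,\mathrm{B}} = 12 \times 0.0144 = 0.1728, \qquad E_{send}^{10\,\mathrm{B}} = 10 \times 0.0144 = 0.1440$$

$$E_{recv}^{12\,\mathrm{B}} = 12 \times 0.0057 = 0.0684, \qquad E_{recv}^{10\,\mathrm{B}} = 10 \times 0.0057 = 0.0570$$

Each CDPM message therefore costs about 16.7% less to send than a 12-byte message ($0.1440 / 0.1728 \approx 0.833$), before accounting for the messages that prediction suppresses altogether.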

C. COMPARISON OF DATA QUALITY
The preceding experiments demonstrate that CDPM has greater energy efficiency and data suppression rates. We therefore also evaluate data quality by estimating the RMSE described in Equation (18) in Section VII. For the experiments, the CDPM, P-PDA, HLMS, and ELR algorithms are deployed to gather data for ten rounds of communication, with the threshold varying from 0.05 to 0.50 in steps of 0.05. The lower the RMSE, the more accurate the predicted data. The RMSE of the four algorithms on the IBRL data for the average temperature and average humidity of the SNs is listed in Table 6. For CDPM, $T_{max}$ is set to 600 seconds and $m_{max}$ to 20. Figures 5 and 6 show that the RMSE of all four algorithms for temperature and humidity remains low as the threshold varies from 0.05 to 0.50, indicating high data accuracy. Moreover, CDPM always achieves the highest data accuracy, as it has the lowest RMSE at every threshold. Thus, CDPM provides higher data suppression rates and energy efficiency while guaranteeing high data accuracy.

D. COMPARISON OF LATENCY
Two groups of experiments are performed on the IBRL data to determine the efficiency of CDPM in terms of latency. In the first group, the threshold $\varepsilon$ varies from 0.05 to 0.5 and $T_{max}$ is set to 600 s. In the second group, $T_{max}$ varies from 60 to 600 s and $\varepsilon$ is set to 0.5. Figures 7 and 8 show that the maximum delay caused by CDPM's training always stays within the upper bound, while the mean delay is much lower. These results indicate that when a WSN application collects data via CDPM, the delay caused by training is bounded and reasonable.

E. ALGORITHMIC COMPLEXITY OF CDPM
It is generally assumed that greater algorithmic complexity yields better performance. However, the algorithmic complexity of CDPM's training phase is low: as described in Section V-A, the worst-case complexity when a single reading is sensed is O( ), which SNs can easily handle, and Algorithm-1 is called again only when the prediction error exceeds the threshold. Algorithm-2 for the data prediction and regeneration phase has a linear time complexity of O( ). When no model adjustment is required, only one addition operation is needed to predict a data value; when an adjustment is required, a number of additions are needed. Hence, the proposed CDPM model has a complexity of O( ) for the model training phase and a constant complexity of the order O( ) for the data prediction and regeneration phase.

IX. CONCLUSION
This work presents a Combinational Data Prediction Model (CDPM) that reconstructs past data while controlling delay and predicts future data to reduce excessive data transmission. To eliminate unnecessary data transmission and control delay, the model is trained on an optimal number of current data values and then used both to reconstruct previous values and to anticipate future data. Two techniques implement this paradigm in real-world WSN applications. The first, proposed for SNs, generates optimal models step by step to avoid large concurrent computations and increase the model's feasibility. The second, proposed for the BS, predicts and regenerates the sensed readings. To evaluate the performance of CDPM, a real WSN application is simulated using a real dataset, and CDPM is compared with the ELR, P-PDA, and HLMS algorithms. The results demonstrate that the proposed model achieves higher transmission suppression, reduced data transmission, and improved energy efficiency compared with these state-of-the-art algorithms. The delay caused by CDPM training is also controllable during data collection.
Several improvements could be made to this work in the future. First, we plan to evaluate the effect of transmission reduction in the real world by conducting experiments in a variety of application scenarios. Second, reducing data transmission affects bandwidth, energy consumption, latency, and data quality in WSNs, and the impact of such methods on the key performance indicators of interest for IoT applications should be characterised. Third, CDPM may influence network protocols at different layers, so it is important to investigate the impact of such schemes on the various network layers.