Predictability of Internet of Things Traffic at the Medium Access Control Layer Against Information-Theoretic Bounds

Most of the existing Medium Access Control (MAC) layer protocols for the Internet of Things (IoT) model the traffic generated by each IoT device via random arrivals such as those in a Poisson process. Under this model, since it is implied that IoT device traffic cannot be predicted, only reactive MAC-layer protocols in which the network responds to the current traffic are viable. In contrast, recent work has demonstrated that the traffic generated by an individual IoT device can be predictable, thus enabling predictive network protocols at the MAC layer. In this paper, we investigate information-theoretic bounds on the predictability of IoT traffic of individual devices. To this end, first, we compare the performance achieved by the following state-of-the-art forecasters on individual IoT device traffic: Logistic Regression, Multi-Layer Perceptron (MLP), 1-Dimensional Convolutional Neural Network (1D CNN), and Long Short Term Memory (LSTM) as well as MLP under feature selection based on Analysis of Variance (ANOVA) and Auto-Correlation Function (ACF). Second, we quantify the gap between the performance of these forecasters against information-theoretic bounds as follows: For IoT devices that generate a fixed number of bits at each generation instance, we measure the gap between the forecasting accuracy and the information-theoretic bound established by Fano’s inequality on the probability of correct prediction. Our empirical results show that existing forecasting schemes perform close to the information-theoretic bound in this case. For IoT devices that generate a variable number of bits, we measure the gap between the Mean Square Error (MSE) and the estimation-theoretic counterpart to Fano’s inequality. Our empirical results show that the performance of existing forecasting schemes is far from the information-theoretic bound in this case. This work motivates the machine learning community to develop forecasting schemes that approach information-theoretic bounds. 
Furthermore, this work is expected to impact the development of predictive MAC-layer protocols that exploit these bounds.

performance can potentially be achieved if accurate forecasts of the future inputs to the network can be formed.
The main motivation of this article is to demonstrate the predictability of the traffic from individual IoT devices and thus establish a firm foundation for the development of predictive network protocols in the near future. The demonstration of such predictability has the following two key benefits: (1) Predictability of individual IoT device traffic will enable the development of predictive MAC-layer protocols for IoT in next-generation networks. (2) The quantification of the gap between the performance of existing forecasting schemes and the information-theoretic bounds has the potential to motivate the machine learning community to develop novel forecasting schemes that approach the information-theoretic bounds.
In order to clarify the concept of forecasting future IoT traffic that is generated by an individual IoT device, we visualize the traffic forecasting as in Fig. 1. In this figure, the horizontal axis is the time axis where the left is the past, the right is the future, and the middle line is the current time (present). In addition, each stalagmite represents a burst in the traffic pattern of the IoT device under consideration; these bursts are either generated in the past (blue stalagmites) or predicted to be generated in the future (orange stalagmites). A forecaster is used to predict the future traffic generation based on the traffic that falls in a certain time window in the past.
This work stands in stark contrast with the current model of ''random arrivals'' (e.g. as in a Poisson process) that is typically used to model the traffic from IoT devices at the Medium Access Control (MAC) layer in most of the existing network protocols [3]- [8]. While the random arrivals model may be suitable for human-generated traffic, it fails to capture the predictability inherent in machine-generated traffic in Machine-to-Machine (M2M) communication. In this article, we show that the traffic from a variety of IoT devices in distinct traffic classes possesses inherent predictability.
In order to demonstrate this predictability, first, we present the selection of the optimal length of the time window that is considered as the input of a forecaster for the univariate time-series traffic data. To this end, we analyze the entropy of IoT traffic and compare it with that of a memoryless source as a function of increasing window length.
Second, we establish an information-theoretic framework by which we compare the empirical predictability that is achieved in forecasting the traffic of an IoT device against the information-theoretic bounds. For IoT devices that generate a fixed number of bits at each generation instance, we compute the information-theoretic upper bound established by Fano's inequality on the probability of correct forecasting (equivalently, a lower bound on the probability of forecasting error). For IoT devices that generate a variable number of bits, we compute a lower bound on the Mean Squared Error (MSE) via the estimation-theoretic counterpart to Fano's inequality. Furthermore, we define two performance metrics: 1) Normalized Accuracy, which measures the gap between the accuracy that is realized by using a forecaster and the information-theoretic upper bound on accuracy; and 2) Normalized Reciprocal MSE, which measures the gap between the MSE realized using a forecaster and the information-theoretic lower bound on MSE.
Third, we compare the performance achieved by the following state-of-the-art forecasters on the traffic of individual IoT devices with respect to Normalized Accuracy and Normalized Reciprocal MSE metrics: Logistic Regression, Multi-Layer Perceptron (MLP), 1-Dimensional Convolutional Neural Network (1D CNN), Long-Short Term Memory (LSTM) as well as MLP combined with Analysis of Variance-based feature selection (ANOVA-MLP) and Auto-Correlation Function-based feature selection (ACF-MLP).
In this work, we not only demonstrate that the traffic of individual IoT devices is predictable but also indicate which forecasters achieve the best performance for distinct IoT traffic classes. Our results also indicate the differences that may exist in the empirical predictability that is achieved across distinct IoT devices. Taking account of such differences will be important in designing next-generation predictive protocols based on the degree of predictability that can be achieved for each IoT device on the network.
In summary, the main contributions of this work are as follows:
• We investigate the predictability of the traffic of individual IoT devices by selecting the optimal length of the time window for the forecaster input.
• We perform a comparative analysis of the performance achieved by state-of-the-art forecasters on individual IoT device traffic.
• We quantify the gap between the performance of these forecasters against information-theoretic bounds.
The rest of this paper is organized as follows: In Section II, we contrast our work with the articles in the existing literature. In Section III, we present high-level statistics of the considered IoT traffic data and the forecasting of IoT traffic including the hyper-parameter tuning of forecasting techniques. In Section IV, we analyze the predictability of IoT traffic that is generated by individual devices, and we present the comparison of the performance of forecasting techniques against the information-theoretic bounds. In Section V, we present our conclusions.

II. RELATIONSHIP TO THE STATE OF THE ART
We now present the relationship between our work and the state-of-the-art works in three categories: 1) the works that develop techniques for predictive networks based on the predictability of IoT traffic, 2) the works that present algorithms to forecast IoT traffic, and 3) the works that use Fano's inequality and its counterpart for predictability analysis targeting wireless communication systems. Tables 1, 2, and 3 present the summary of the recent related works for these categories respectively. 1

A. PREDICTIVE NETWORK TECHNIQUES
We present the works that develop proactive solutions to the massive access problem. Early works have developed proactive solutions targeting Human-to-Human (H2H) traffic in [21]-[23] and Machine-to-Human (M2H) traffic in [24]-[26]. For cognitive radio networks, a proactive MAC protocol was developed by Reference [27] in order to protect the network against primary user emulation and data falsification attacks.
For M2M communication, Reference [28] has developed a resource allocation algorithm that gives access grant proactively to the neighbors of activated devices. Reference [29] aims to minimize total service time for the transmission of IoT traffic packets by allocating resources proactively. Moreover, based on the prediction of the channel condition, the video quality and resource sharing have been determined in [30] while an energy-efficient stochastic predictive resource allocation technique has been developed in [31]. Reference [32] has developed a technique to drop a subset of packets based on their predicted impacts on the latency and the performance. Reference [10] has scheduled uplink access of Industrial IoT devices for delay-critical applications based on the prediction of activity of these devices. In addition, Reference [33] has recently developed a predictive network architecture based on the prediction of total generated traffic in the network.
1 We took Reference [9] as an example for the styles of these tables.
A recent proactive solution technique, called Fast Uplink Grant (FUG), has been proposed by 3GPP in Release 14 to provide an uplink grant to traffic packets by scheduling them based on predictions [34]-[36]. In addition, Reference [11] developed a method for FUG which uses a binary Markovian process to model the packet transmissions of IoT devices, and Reference [12] predicts IoT traffic via each of Support Vector Machine (SVM) and LSTM techniques, aiming to enable FUG.
Another recent proactive solution technique, called Joint Forecasting-Scheduling (JFS), is proposed to schedule the transmission of the traffic of individual devices based on forecasting in [13]. Reference [14] developed the Multi-Scale Algorithm (MSA) in which the forecasting of IoT traffic is performed on multiple time scales in order to enhance the scalability of JFS for a massive number of devices. Reference [15] extended JFS to the Multi-Channel (MC) case. In addition, Reference [16] developed the Randomization of Generation Times (RGT) preprocessing algorithm, which improves the performance of IoT traffic scheduling by using queueing theory techniques. Furthermore, Reference [17] developed a technique, called Emulation of Application Specific Error Function (E-ASEF), to perform an analysis of the relationship between the forecasting error and the network performance under predictive network protocols. Reference [18] designed a meta-MAC framework, called Dynamic Automatic Forecaster Selection (DAFS), to select the best performing forecasting scheme with respect to the estimated network performance. For scalability improvement in IoT networks, diffusion analysis on the proactive scheduling of packet transmission is presented in Reference [19] and Quasi-Deterministic Transmission Policy (QDTP) is developed in References [19], [20]. The results of the works in this category show that the predictive network protocols (i.e. predictive scheduling of packet transmissions of IoT devices) are highly promising for the networks with a massive number of IoT devices. However, they also show that the performance of predictive protocols is very sensitive to the predictability of IoT traffic. In this paper, we analyze the predictability of the individual IoT device traffic, and we present a comparison of the performance of ML techniques against information-theoretic bounds.

B. IoT TRAFFIC FORECASTING
We compare our work to the previous studies on forecasting the traffic generation patterns of IoT devices. Reference [37] presented a comparative study of MLP, LSTM, 1D CNN, and ARIMA models along with the state-of-the-art feature selection algorithms, i.e. Auto-Correlation Function (ACF), embedding dimensions, and Analysis of Variance (ANOVA) to yield the best performing forecasters for each IoT device. Reference [38] developed an original neural network architecture, called Feature Selection Forecasting (FSF), specifically to predict traffic generation patterns of individual IoT devices. Reference [39] presented a deep learning approach as Gradient Boosting (GB) model with residual networks and stacked models to predict network traffic obtained by a mobile network operator. Reference [40] analyzed the sent and received stream packets to identify the 4 connected IoT devices by using 6 different machine learning models, including Decision Tree (DT) methods such as Random Forest (RF) and K-Nearest Neighbors (KNN). In addition, for security applications Reference [41] predicted IoT traffic on the edge network considering randomness and uncertainty via sample entropy. Reference [42] has developed an algorithm to predict the activation (i.e. the probability of traffic transmission) of IoT devices for each time slot.
Furthermore, Reference [43] predicted industrial IoT traffic, which is modeled as a Markovian process, via Reinforcement Learning (RL) for anomaly detection and network planning. Reference [44] predicted traffic of event-triggered IoT devices. Reference [45] used Recurrent Neural Networks (RNN) to predict the network traffic for the parameter optimization of a random access control scheme. In order to forecast network traffic, linear predictors have been developed in [46], [47] to control sampling interval, while an LSTM based predictor has been used with deep learning in [48] and Nonlinear autoregressive exogenous (NARX) neural network has been used in [49].
In summary, the recent research uses various ML techniques such as neural networks, deep learning, and DT methods to forecast individual or aggregated IoT traffic. However, none of these works directly investigated the predictability of the traffic with respect to information-theoretic bounds. In contrast, in this work, we both analyze the predictability of IoT traffic and compute information-theoretic predictability bounds.

C. FANO'S INEQUALITY AND ITS COUNTERPART FOR PREDICTABILITY ANALYSIS
We use Fano's inequality [56] to set the information-theoretical limits for predicting IoT traffic generation. A set of previous studies used Fano's inequality primarily for mobility or location prediction. Reference [50] used three different measures of entropy, namely random entropy, uncorrelated entropy and actual entropy, with Fano's inequality to predict the mobility of individuals. Fano's inequality is also used to find the upper bound on the predictability of the future locations of a person in [51] and to compute an upper bound for the prediction of an individual's location in [52]. Furthermore, Reference [53] analyzed the predictability of the radio spectrum state by using Fano's inequality and measuring statistical entropy. The works in this category used Fano's inequality to analyze the predictability of either the position of individuals or the radio spectrum state. Besides the works that use Fano's inequality and its counterpart, other information theory-based techniques have also been developed and used in IoT networks. For example, Reference [54] developed information-aware traffic reduction to address network congestion. In addition, Reference [55] developed an anomaly detection method which captures the effects of abnormal/malicious traffic on information entropy. In contrast, in this paper, we use Fano's inequality and its counterpart to analyze the predictability of traffic generation patterns of IoT devices; to the best of the authors' knowledge, there is no work that compares the predictability of IoT traffic against information-theoretical bounds based on Fano's inequality and its counterpart.

III. IoT TRAFFIC DATA AND ITS FORECASTING

A. CLASSIFICATION OF IoT TRAFFIC AND THE DATASET
Reference [37] has classified the IoT traffic in the MAC-layer into four categories with respect to its periodicity and bit representation. For a given IoT device, if the generation intervals of the bursts of traffic generated by that device are constant, such traffic is called ''Periodic'' traffic. Otherwise (if the generation intervals vary with time), the traffic generated by that device is called ''Aperiodic''. For example, if an IoT device generates traffic when triggered by an event, then that traffic is categorized as Aperiodic traffic. Furthermore, if each burst has a fixed number of bits, then such traffic is called ''Fixed Bit''. In contrast, if the number of bits in a single burst varies with time, then that traffic is called ''Variable Bit''. As a result, the four categories are as follows: ''Fixed Bit Periodic (FBP)'', ''Fixed Bit Aperiodic (FBA)'', ''Variable Bit Periodic (VBP)'', and ''Variable Bit Aperiodic (VBA)''. The traffic category in which an IoT device falls depends entirely on the characteristics of the traffic generated by that IoT device; hence, it is pre-determined and fixed for each device.
Note that the IoT device traffic that falls in the FBP class requires no forecasting: if the generation time and the amount are known at any one instant, they are known for the rest of the time. Hence, we focus on forecasting only for the FBA, VBP, and VBA classes.
If there are multiple types of traffic generated, e.g. by distinct types of sensors on a device, we treat each distinct traffic stream separately. Hence, every traffic stream falls in only one of these four categories.
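The four-way classification above can be illustrated with a short sketch. It is not taken from Reference [37]; the function name `classify_traffic`, its arguments, and the tolerance `tol` are hypothetical, and the rule simply checks whether the generation intervals are constant and whether all burst sizes are equal.

```python
from typing import Sequence


def classify_traffic(gen_times: Sequence[float],
                     burst_sizes: Sequence[int],
                     tol: float = 1e-9) -> str:
    """Return 'FBP', 'FBA', 'VBP' or 'VBA' for a single traffic stream.

    gen_times   : generation instants of the bursts (hypothetical input format)
    burst_sizes : number of bits in each burst, aligned with gen_times
    tol         : assumed tolerance when testing whether intervals are constant
    """
    if len(gen_times) != len(burst_sizes) or len(gen_times) < 2:
        raise ValueError("need at least two bursts with matching sizes")

    # Periodic: all generation intervals are (numerically) identical.
    intervals = [b - a for a, b in zip(gen_times, gen_times[1:])]
    periodic = max(intervals) - min(intervals) <= tol

    # Fixed Bit: every burst carries the same number of bits.
    fixed_bit = len(set(burst_sizes)) == 1

    return ("F" if fixed_bit else "V") + "B" + ("P" if periodic else "A")
```

For instance, a stream with bursts at times 0, 3, 20, 21 that all carry 8 bits would be labeled FBA under this rule.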
During the analysis in this paper, we use the publicly available dataset [57] whose collection and processing methodology has been presented in Reference [38]. This dataset contains the traffic generation patterns of 8 individual IoT devices. These devices and the classes in which they fall are as follows: FBA Class: Smart Home Energy Generation (SHEG), Non-Methane Hydrocarbon (NMHC) Gas Sensor, and Wind Speed; VBP Class: Light Dependent Resistor (LDR) and Relative Humidity (RH) Sensor; VBA Class: Elevator Button, NO₂ Gas Sensor, and Temperature Sensor. In addition, for each sensor $i$, we present the number of samples $K_i$ and the number of unique burst sizes $M_i$ in Table 4.

B. FORECASTING OF IoT TRAFFIC
We now describe the processing of time series IoT traffic data and the forecasting models that are used during the analysis in this paper. First, we assume that the IoT traffic is generated in discrete time. Then, let $x_k^i$ denote the burst that is generated by device $i$ at discrete time $k$. (Table 5 presents the list of mathematical symbols in order of appearance.) We perform 1-step ahead forecasting based on the traffic generated in the past time window with duration $W$; that is, at each discrete time $k$, any forecaster predicts the size of the burst at $k+1$, denoted by $\hat{x}_{k+1}^i$, based on $\{x_{k-m}^i\}_{m=0}^{W-1}$. As we shall present in Section IV-A, we analyze the predictability of IoT traffic for different values of $W$.
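The 1-step ahead setup described above amounts to turning a univariate series into (window, next value) pairs. The following is a minimal sketch of that preprocessing, assuming the traffic of one device is available as a plain Python sequence; the helper name `make_supervised` is hypothetical and not part of the original work.

```python
from typing import List, Sequence, Tuple


def make_supervised(series: Sequence[float],
                    window: int) -> Tuple[List[List[float]], List[float]]:
    """Build (input window, target) pairs for 1-step ahead forecasting.

    For each discrete time k, the input is the last `window` samples
    x[k-window+1], ..., x[k] (in chronological order) and the target is x[k+1].
    """
    if window < 1 or len(series) <= window:
        raise ValueError("series must be longer than the window")

    inputs: List[List[float]] = []
    targets: List[float] = []
    for k in range(window - 1, len(series) - 1):
        inputs.append(list(series[k - window + 1:k + 1]))
        targets.append(series[k + 1])
    return inputs, targets
```

Any of the forecasters discussed in this section could then be trained on the returned pairs; the chronological ordering inside each window is a choice of this sketch.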
For the forecasting of IoT traffic at the MAC layer, in this paper, we use six different ML models, namely Logistic Regression, MLP, ANOVA-MLP, ACF-MLP, 1D CNN and LSTM, where the MLP-based models, 1D CNN and LSTM are the deep learning models. There are two main reasons that these models are selected as the forecasting schemes in this paper: 1) We aim to compare the performance of different types of (probabilistic, deterministic, static, recurrent) state-of-the-art ML models. To this end, we represent the probabilistic and linear ML models with Logistic Regression, feedforward multi-layer models with the widely used universal approximator MLP [58], the state-of-the-art deep learning with 1D CNN [59], and recurrent models with LSTM (which is able to capture temporal dependence in time-series data). 2) References [37], [38] have shown that these models are able to achieve successful results for the forecasting of IoT traffic at the MAC-layer. For these models, the parameter settings that are used in this paper are determined as follows:
Then, in order to make a 1-step ahead forecast, we set $n_4 = 1$; that is, there is only one neuron in the output layer. Furthermore, we select the activation function of each neuron in each of the other three layers as ReLU (Rectified Linear Unit), and the activation function of each neuron at $e = 4$ (the output layer) as linear. Finally, we find the local optimal value of $n_e$ for each layer $e \in \{1, 2, 3\}$ within the range [2, 256] by using only integral powers of 2 for each hyperparameter and by searching over 100 randomly generated points in the $(E-1)$-dimensional space, where $E-1$ is the number of hidden layers in the neural network. We shall refer to this particular search method as ''random search'' in the rest of this paper. We also use MLP with two different feature selection methods, namely ANOVA-based and ACF-based feature selection.
In the ANOVA-based feature selection, we first compute the F-ratio [61] between each feature and the desired output. Then, we sort all features with respect to their F-ratios in descending order and select the first twelve features from this sorted sequence of features.
The internal architecture of the 1D CNN model in our studies is comprised of a convolution layer, a max pooling layer and four fully connected layers. We set the kernel size of both the convolution layer and the max pooling layer to 3. We also set the stride of the convolution layer to 2. We set the activation function of the convolution layer and that of each neuron at each of the first three hidden layers to ReLU. We do not use any activation function for any neuron at the output layer. Furthermore, we set the number of neurons at the output layer to 1. Then, we perform random search in order to find the local optimal number $c_{\mathrm{CNN}}$ of convolution filters at the convolution layer and the local optimal number $c_e$ of neurons at each hidden layer $e \in \{1, 2, 3\}$. We select the search intervals and the points in each search interval as follows: The interval for $c_{\mathrm{CNN}}$ is set to [2, 256] sampled at integral powers of 2. The interval for $c_e$ for each $e \in \{1, 2, 3\}$ is set to [2, 256] and is sampled at multiples of 2.
• Long-Short Term Memory (LSTM): In the internal architecture of the LSTM, the following hyperparameters need to be determined: the number of LSTM layers, the number of LSTM units in each LSTM layer, the number $E_{\mathrm{LSTM}}$ of fully connected layers, the number of neurons in each hidden layer, and the activation function of each LSTM unit or neuron at each layer. We use one LSTM layer and three hidden layers, where the last hidden layer is the output layer. The activation function of each unit in the LSTM layer and of each neuron in the first two hidden layers is selected as ReLU. We use the linear activation function for all of the neurons at the output layer. Furthermore, there is only one neuron in the output layer to perform 1-step ahead forecasting. Then, we perform random search (as defined for MLP above) in order to find the local optimal number of LSTM units, which is denoted by $h_{\mathrm{LSTM}}$, as well as the local optimal number of neurons, denoted by $h_e$, in each of the layers $e \in \{1, 2\}$. The range of the search interval is [2, 256]. We use integral powers of 2 in this interval for the value of $h_{\mathrm{LSTM}}$, and multiples of 2 for $h_e$ for each $e \in \{1, 2\}$.
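The ''random search'' used for the MLP layer sizes can be sketched as follows. This is an illustrative outline only: the callback `evaluate` is a hypothetical stand-in for training a model with the given layer sizes and returning its validation error, and the sketch samples every hyperparameter from the integral powers of 2 in [2, 256] (the multiples-of-2 grids used for some CNN and LSTM layers would need a different candidate list).

```python
import random
from typing import Callable, Optional, Tuple

# Candidate layer sizes: integral powers of 2 within [2, 256].
POWERS_OF_TWO = [2 ** p for p in range(1, 9)]


def random_search(evaluate: Callable[[Tuple[int, ...]], float],
                  num_layers: int = 3,
                  num_trials: int = 100,
                  seed: int = 0) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Evaluate `num_trials` random configurations and keep the best one.

    `evaluate` is an assumed user-supplied function that maps a tuple of
    layer sizes to a validation error (lower is better).
    """
    rng = random.Random(seed)
    best_cfg: Optional[Tuple[int, ...]] = None
    best_score = float("inf")
    for _ in range(num_trials):
        cfg = tuple(rng.choice(POWERS_OF_TWO) for _ in range(num_layers))
        score = evaluate(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Because only 100 points are sampled, the result is a local optimum over the visited configurations rather than a guaranteed global one, which matches the ''local optimal'' wording above.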

IV. ANALYSIS OF PREDICTABILITY AGAINST INFORMATION-THEORETIC BOUNDS
In this section, we aim to analyze the predictability of IoT traffic against information-theoretic bounds. To this end, we first present the calculation of the entropy of IoT traffic that is chopped up into sequential time windows, and we show the selection of the best value of the window size W . Then, using this entropy calculation, we analyze the predictability of traffic in the FBA class based on Fano's inequality [56] and the predictability of traffic in the VBP and VBA classes based on the estimation-theoretic counterpart to Fano's inequality. Specifically, we use Fano's inequality to relate the randomness inherent in time-series binary data (as quantified by the entropy of the probabilistic model of those data) to the probability of predicting those data correctly. Intuitively, the more random the binary data, the lower the probability of correct prediction. Fano's inequality gives the precise relationship between the entropy and the probability of correct prediction. This relationship is independent of the particular forecaster used and must be satisfied by all forecasters. We apply Fano's inequality to FBA data, which is a binary-valued time series, in which the device either generates no bits or a constant number of bits at each discrete time.
The estimation-theoretic counterpart to Fano's inequality generalizes this relationship from binary data to data that take values on a continuum. In this case, the probability of incorrect prediction is replaced appropriately by the MSE. Then, the counterpart to Fano's inequality specifies the precise relationship between the differential entropy of the probabilistic model of the data and the lowest possible MSE that can be achieved by any forecaster on such data. We apply the counterpart to Fano's inequality to VBP and VBA data. The multiple values taken on by data in these classes are approximated by a continuum, and the deviations are measured via the MSE.

A. ENTROPY OF IoT TRAFFIC WITH SEQUENTIAL TIME WINDOWS
First, let the time series IoT traffic be the collection of bursts that are generated by device $i$ as $\{x_k^i\}_{k=0}^{K_i-1}$, where $K_i$ denotes the total number of samples for device $i$ in the considered dataset. Then, we split this time series traffic of device $i$ into sequences over time windows of duration $W$, and we denote the resulting set of sequences by $T_i(W)$. Accordingly, the entropy of $T_i(W)$, denoted by $S_i(W)$, is computed as

$$S_i(W) = -\sum_{\alpha \in T_i(W)} P(\alpha) \log P(\alpha), \tag{2}$$

where $P(\alpha)$ denotes the probability of sequence $\alpha$.

Now, we aim to determine the best value of $W$ by comparing the actual entropy $S_i(W)$ (calculated in (2)) with the entropy for the case where the generator of the traffic is a memoryless source that is distributed uniformly over the alphabet of the source at device $i$. (In our implementation, we consider only those values of data that are actually observed to be generated by the source to belong to the source alphabet.) We let $y_k^i$ denote the traffic data that is generated by such a memoryless source that has the same distribution as the instantaneous traffic generated by device $i$. Then, the set of sequences for $y_k^i$ is denoted by $\tilde{T}_i(W)$. Accordingly, the entropy of the sequences for the memoryless source, denoted by $\tilde{S}_i(W)$, is calculated as

$$\tilde{S}_i(W) = -\sum_{\alpha \in \tilde{T}_i(W)} P(\alpha) \log P(\alpha). \tag{4}$$

Subsequently, we compare $S_i(W)$ and $\tilde{S}_i(W)$ for device $i$ in each of the FBA, VBA and VBP classes in Figures 2-4, respectively, and we select the best value of the window size $W$ as

$$W^* = \arg\min_{W} S_i(W) \quad \text{subject to} \quad \tilde{S}_i(W) - S_i(W) \geq \epsilon, \tag{5}$$

where $W^*$ is selected to minimize the actual entropy $S_i(W)$ while keeping the difference between $S_i(W)$ and $\tilde{S}_i(W)$ greater than or equal to a threshold $\epsilon$, and we set $\epsilon = \frac{3}{100} \max_W S_i(W)$. In this way, we aim to minimize the actual entropy while there remains a gap between the actual entropy and the entropy of the memoryless source.

The results in Fig. 2 (a) show that although $S_i(W)$ decreases as $W$ increases, the gap between $S_i(W)$ and $\tilde{S}_i(W)$ also decreases with increasing $W$ and becomes almost zero for $W > 20$. The reason is that the number of samples is so low for $W > 20$ that the actual data is very similar to the data of the memoryless source. In addition, we see that the gap between $S_i(W)$ and $\tilde{S}_i(W)$ decreases below 3% of $S_i(W)$ (which is the value of $\epsilon$ in (5)) after $W = 13$; thus, we select $W^* = 13$. Similarly, based on the results in Fig. 2 (b) and Fig. 2 (c), we select $W^* = 9$ for NMHC and $W^* = 15$ for Wind Speed. Fig. 3 presents the comparison between $S_i(W)$ and $\tilde{S}_i(W)$ for $W \in \{1, \ldots, 100\}$ for each of (a) Elevator Button, (b) NO₂ and (c) Temperature. Based on the results in Fig. 3 (b), we select $W^* = 12$ for NO₂, and based on the results in Fig. 3 (c), we select $W^* = 10$ for Temperature. In addition, the results in Fig. 3 (a) show that the gap between $S_i(W)$ and $\tilde{S}_i(W)$ does not decrease below 3% of $S_i(W)$ until $W = 100$; thus, $W^* = 100$ for Elevator Button. Finally, in Fig. 4, we present the comparison between $S_i(W)$ and $\tilde{S}_i(W)$ for $W \in \{1, \ldots, 100\}$ for each of (a) LDR and (b) RH. For the VBP class, we select $W^* = 22$ for LDR and $W^* = 13$ for RH based on the results in Fig. 4 (a) and Fig. 4 (b), respectively.
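The window-selection procedure of this subsection can be sketched in a few lines. The sketch rests on several assumptions that the text does not pin down: windows are taken as non-overlapping, probabilities are empirical frequencies, entropies are measured in bits, and the memoryless reference source draws uniformly from the observed alphabet. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence


def window_entropy(series: Sequence[int], window: int) -> float:
    """Empirical entropy (bits) of the non-overlapping length-`window` sequences."""
    n_windows = len(series) // window
    if window < 1 or n_windows == 0:
        raise ValueError("window must be between 1 and len(series)")
    counts = Counter(tuple(series[j * window:(j + 1) * window])
                     for j in range(n_windows))
    return -sum((c / n_windows) * math.log2(c / n_windows)
                for c in counts.values())


def memoryless_surrogate(series: Sequence[int], seed: int = 0) -> List[int]:
    """Draw i.i.d. samples uniformly over the observed alphabet (assumption)."""
    rng = random.Random(seed)
    alphabet = sorted(set(series))
    return [rng.choice(alphabet) for _ in series]


def select_window(actual: Dict[int, float],
                  memoryless: Dict[int, float],
                  fraction: float = 0.03) -> Optional[int]:
    """Pick the W with minimum actual entropy whose gap to the memoryless
    entropy is at least `fraction` times the maximum actual entropy."""
    eps = fraction * max(actual.values())
    feasible = [w for w in actual if memoryless[w] - actual[w] >= eps]
    if not feasible:
        return None
    # Ties in entropy are broken in favor of the shorter window.
    return min(feasible, key=lambda w: (actual[w], w))
```

In use, one would fill `actual` and `memoryless` by calling `window_entropy` on the device series and on its surrogate for each candidate window length, then pass both dictionaries to `select_window`.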

B. ANALYSIS FOR FIXED BIT - EVENT-TRIGGERED TRAFFIC (FBA) BASED ON FANO'S INEQUALITY
We now present an analysis of the predictability of IoT traffic in the FBA class by calculating an upper bound for the accuracy achievable with any forecasting scheme.
First, we let $S_i^*$ denote the entropy of the traffic generation of device $i$ that is measured from real data under the optimal selection of window length, $W^*$. Also, let $\Pi_i^{\max}$ denote the maximum accuracy that any forecaster can achieve for the traffic of device $i$.
Then, the well-known Fano's inequality [56] is

$$S_i^* \leq H(\Pi_i^{\max}) + (1 - \Pi_i^{\max}) \log (M_i - 1), \tag{6}$$

where $H(\Pi_i^{\max})$ is the binary entropy function, which is defined as

$$H(\Pi_i^{\max}) = -\Pi_i^{\max} \log \Pi_i^{\max} - (1 - \Pi_i^{\max}) \log (1 - \Pi_i^{\max}). \tag{7}$$

Thus, in our case for the traffic generation patterns in the FBA class where $M_i = 2$, (6) simplifies to

$$S_i^* \leq H(\Pi_i^{\max}). \tag{8}$$

Based on (8), we calculate the upper bound $\Pi_i^{\max}$ for the accuracy, which is presented in Table 6 for each device $i$ in the dataset [57]. The results in this table show that the upper bound for accuracy is slightly above 0.8 for the SHEG and Wind Speed devices; that is, the traffic generation of these devices is more than 80% predictable. In contrast, $\Pi_i^{\max}$ for NMHC equals 0.66, so the traffic patterns of the SHEG and Wind Speed devices are significantly more predictable than that of NMHC.
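For the binary (FBA) case, the accuracy upper bound can be obtained numerically by inverting the binary entropy function. The sketch below assumes that the entropy is expressed in bits per symbol and lies in [0, 1]; the function names and the bisection tolerance are illustrative choices, not the procedure used to produce Table 6.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def max_accuracy_binary(entropy_bits: float, tol: float = 1e-12) -> float:
    """Solve H(p) = entropy_bits for p in [0.5, 1] by bisection.

    On [0.5, 1] the binary entropy decreases from 1 to 0, so the solution
    is the largest accuracy that is compatible with the given entropy.
    """
    if not 0.0 <= entropy_bits <= 1.0:
        raise ValueError("entropy must be in [0, 1] bits for a binary source")
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_entropy(mid) > entropy_bits:
            lo = mid   # entropy still too high: the bound lies further right
        else:
            hi = mid
    return (lo + hi) / 2.0
```

A source with zero entropy yields a bound of 1 (perfectly predictable), while one bit of entropy yields 0.5 (no better than guessing).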
Subsequently, we analyze the performance of distinct forecasting schemes which have been used for the forecasting of IoT traffic in recent works [37], [38]. To this end, we define ''Normalized Accuracy'', denoted by $\Pi^{i,f}_{\mathrm{normalized}}$, as a metric that measures the ratio of the accuracy that is achieved by forecaster $f$ to the upper bound of the accuracy for device $i$; that is,

$$\Pi^{i,f}_{\mathrm{normalized}} = \frac{\Pi_i^f}{\Pi_i^{\max}}, \tag{9}$$

where $\Pi_i^f$ is the accuracy achieved by forecaster $f$ and $\Pi_i^{\max}$ is the accuracy upper bound for device $i$. Fig. 5 displays the values of $\Pi^{i,f}_{\mathrm{normalized}}$ for various forecasting schemes for each of the SHEG, NMHC and Wind Speed devices. In Fig. 5 (a), for the SHEG device, we see that the Logistic Regression, MLP, LSTM and ANOVA-MLP forecasters achieve at least 85% of the accuracy upper bound, where Logistic Regression is the best performing model with $\Pi^{i,f}_{\mathrm{normalized}} = 0.89$. In Fig. 5 (b), the results for the NMHC device show that all of the forecasters achieve more than 80% of the upper bound, while the value of

We now extend our analysis to the case where the IoT devices generate varying numbers of bits in each burst. To this end, we use the estimation counterpart to Fano's inequality (given by Theorem 8.6.6 in [62]), which is adapted for the predictability analysis of Variable Bit Traffic as

$$\mathrm{MSE}_i^f \geq \frac{1}{2 \pi e} e^{2 h_i^*}, \tag{10}$$

where $\mathrm{MSE}_i^f$ is the Mean Squared Error (MSE) that is measured for the predictions of forecaster $f$ for device $i$, and $h_i^*$ denotes the differential entropy of the traffic generation pattern of device $i$ for $W^*$.
According to Equation (8.94) in [62], differential entropy can be used to bound the discrete entropy. The application of this bound to our case gives $h_i^* \geq S_i^*$. Hence, we set a lower bound to the forecasting error as

$$\mathrm{MSE}_i^f \geq \frac{1}{2 \pi e} e^{2 S_i^*}, \tag{11}$$

and we compute the lower bound for $\mathrm{MSE}_i^f$, denoted by $\mathrm{MSE}_i^{\mathrm{LB}}$, as

$$\mathrm{MSE}_i^{\mathrm{LB}} = \frac{1}{2 \pi e} e^{2 S_i^*}. \tag{12}$$

First, in Table 7   classes, the order of devices with respect to predictability is as follows: LDR, NO₂, RH, Temperature, and Elevator Button.
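The MSE lower bound given by the estimation counterpart to Fano's inequality is straightforward to evaluate once an entropy value is available. The sketch below assumes the exponential form of the bound with the entropy expressed in nats; the helper `bits_to_nats` is included because an entropy computed with base-2 logarithms would first need converting. Both function names are hypothetical.

```python
import math


def bits_to_nats(entropy_bits: float) -> float:
    """Convert an entropy value from bits to nats."""
    return entropy_bits * math.log(2.0)


def mse_lower_bound(entropy_nats: float) -> float:
    """Lower bound on the MSE: exp(2 h) / (2 * pi * e), with h in nats."""
    return math.exp(2.0 * entropy_nats) / (2.0 * math.pi * math.e)
```

As a sanity check on the formula, a Gaussian source with variance 4 has differential entropy 0.5 ln(2πe·4) nats, and the bound evaluates to 4, i.e. the variance itself.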
Next, we analyze the performance of the forecasting schemes presented in Section III-B. To this end, we define "Normalized Reciprocal MSE", denoted by $E^{i,f}_{\text{normalized}}$, as a metric that measures how precise the predictions of forecaster $f$ are compared to the lower bound on the error for device $i$; that is,

$$E^{i,f}_{\text{normalized}} = \frac{\text{MSE}^{\text{LB}}_{i}}{\text{MSE}^{f}_{i}}.$$

We present $E^{i,f}_{\text{normalized}}$ for each of the Elevator Button, NO$_2$, and Temperature devices in Fig. 6. For all of the devices whose traffic generation patterns fall in the VBA class, we see that LSTM significantly outperforms the other forecasters. In addition, LSTM achieves a nearly maximal Normalized Reciprocal MSE of 0.99 for Temperature (i.e., the MSE under LSTM is almost equal to the lower error bound), and it achieves a Normalized Reciprocal MSE of 0.6 for the Elevator Button and NO$_2$ devices.
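Both normalized metrics used in this section are ratios against a bound, which the following sketch makes explicit. It assumes, consistently with the reported values, that the Normalized Reciprocal MSE is the MSE lower bound divided by the measured MSE and that the Normalized Accuracy is the measured accuracy divided by the accuracy upper bound. The function names and all numeric inputs are hypothetical placeholders, not results from the paper.

```python
def normalized_accuracy(accuracy: float, accuracy_upper_bound: float) -> float:
    """Ratio of a forecaster's accuracy to the accuracy upper bound (at most 1)."""
    if accuracy_upper_bound <= 0.0:
        raise ValueError("accuracy_upper_bound must be positive")
    return accuracy / accuracy_upper_bound


def normalized_reciprocal_mse(mse: float, mse_lower_bound: float) -> float:
    """Ratio of the MSE lower bound to a forecaster's MSE (at most 1)."""
    if mse <= 0.0:
        raise ValueError("mse must be positive")
    return mse_lower_bound / mse


# Hypothetical inputs for illustration only.
print(normalized_accuracy(accuracy=0.80, accuracy_upper_bound=0.90))
print(normalized_reciprocal_mse(mse=4.0, mse_lower_bound=1.0))
```

Under this definition, a Normalized Reciprocal MSE of 0.25 means that the measured MSE is four times the lower bound, while a value close to 1 means that the forecaster is nearly optimal with respect to the bound.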
Finally, Fig. 7 displays $E^{i,f}_{\text{normalized}}$ for the LDR and RH devices, whose traffic belongs to the VBP class. For the LDR device, our results in Fig. 7 (a) show that the MLP-based forecasters (MLP, ANOVA-MLP and ACF-MLP) slightly outperform the other forecasters with $E^{i,f}_{\text{normalized}} \approx 0.25$. Even so, none of the forecasters achieves an MSE below $4 \times \text{MSE}^{\text{LB}}_{i}$ for this device. Moreover, for the RH device in Fig. 7 (b), none of the forecasters achieves an MSE below approximately $3 \times \text{MSE}^{\text{LB}}_{i}$, while LSTM and the MLP-based forecasters perform better than 1D CNN and Logistic Regression with $E^{i,f}_{\text{normalized}} \approx 0.34$.

V. CONCLUSION
In this paper, we have presented an analysis of the predictability of IoT traffic at the MAC layer and have defined information-theoretic performance bounds (i.e., an upper bound on accuracy and a lower bound on MSE) by using Fano's inequality and its estimation counterpart. We have also compared the performance of well-known ML techniques against these bounds. We have performed the analysis as well as the performance comparison on a publicly available dataset that contains the raw traffic generation patterns of various IoT devices.
The results of our analysis show that the traffic generation pattern of the majority of the considered IoT devices is predictable with more than 80% accuracy or less than 0.4 MSE. On the other hand, the comparison results in this paper show that the ML techniques used to forecast IoT traffic in the current literature perform very close to the upper bound for the FBA traffic class, while the performance of the majority of these techniques is far from the performance bound for the VBA and VBP traffic classes.
In our future work, we shall (1) develop advanced Neural Network architectures as well as ML algorithms specific to each traffic class (FBA, VBA, and VBP); (2) investigate the development of novel forecasting schemes that approach the information-theoretic bounds, especially in the case of Variable Bit traffic; and (3) develop predictive MAC-layer protocols that exploit the information-theoretic bounds on forecasting performance presented in this paper.

VOLUME 10, 2022