Prediction of Network Traffic in Wireless Mesh Networks using Hybrid Deep Learning Model

Wireless mesh networks are increasingly adopted in the domain of network communication. Their main benefits include adaptability, ease of configuration, and flexibility, with added efficiency in cost and transmission time. Traffic prediction refers to forecasting the traffic volumes in a network. The traffic volume includes incoming requests and outgoing data transmitted by the network nodes. Previous traffic logs of the network are used to extract patterns that support accurate predictions. In this paper, various existing traffic prediction methods are analyzed. Specifically, the analysis is carried out on a case study in which the performance of a High-Speed Diesel (HSD) pump is predicted by observing its output. A network of sensors forms a wireless mesh network in which the sensors act as nodes and read the parameters, namely three-phase Current, Voltage, Temperature, and Vibration. In this case study, the performance of the HSD pump is predicted by treating the vibration parameter as the output parameter. The other parameters that affect the performance of the HSD pump and cause changes in the vibration value are identified. Various algorithms, including the statistical Auto-Regressive Integrated Moving Average (ARIMA) model, Poisson regression, and several machine learning and deep learning algorithms such as Decision Tree Regressor, Multi-Layer Perceptron, Linear Regression, and Long Short-Term Memory (LSTM), are implemented and evaluated for this purpose. Along with the comparison, a novel architecture using a Convolutional Neural Network and Long Short-Term Memory is described in this paper. The results and their comparison show that the proposed Convo-LSTM model performs better than the other models and helps to predict the performance of the HSD pump. The proposed system makes a strong case for network traffic prediction based on historical data collected over the wireless mesh network.
A similar approach can be used to implement this model for network monitoring tasks.


I. INTRODUCTION
NETWORKS play an important role in this age of digital expansion. For a given network, the most critical issues are its security, load balancing ability, maintainability, and speed. Various network topologies exist, including bus, ring, star, mesh, and hybrid. Of these, mesh networks have been one of the most popular choices owing to their stronger connectivity and fewer disadvantages in terms of lag and rigidity [1]. Wireless mesh networks are wireless networks based on the mesh topology, as shown in Figure 1. The main advantages of wireless mesh networks are their easy adaptability and configurability. Future changes can be accommodated easily, which leads to lower costs and maintenance. The main concepts related to a wireless mesh network are traffic prediction, traffic routing, and traffic control. Of these, traffic prediction is crucial because it is the fundamental block on which the performance of routing and congestion control algorithms depends. Traffic prediction refers to accurately predicting the likely traffic in a network at a given instant based on previous network data. An accurate estimate of the network traffic can help the network administrator improve the availability and transmission speed of the network [2], [3]. Previous approaches to network traffic prediction have primarily focused on the host server logs along with the network parameter configuration [4]-[6]. This paper compares the performance of six algorithms, namely Decision Tree Regressor, Linear Regression, Multi-Layer Perceptron, Poisson Regression, Auto-Regressive Integrated Moving Average (ARIMA), and Long Short-Term Memory (LSTM).
Out of these, ARIMA is a well-known model [7] and Poisson regression is a probabilistic model [8]; both are traditionally implemented in various applications. Decision Tree Regressor [9] and Linear Regressor [10] are machine learning algorithms, whereas the Multi-Layer Perceptron [11] is a type of deep neural network and LSTM is an artificial recurrent neural network (RNN) architecture [12]. In this way, the paper compares the performance of different algorithms implemented on the same real-world data. This paper proposes a novel hybrid technique for network traffic prediction in wireless mesh networks by focusing on the historical data collected over the network. Specifically, the case study of High-Speed Diesel (HSD) pumps is considered, where different sensors are used to collect the data values and the sensor readings are used to evaluate the performance. The installation of sensors (for reading various parameters) forms a mesh network, which is a good indicator of a typical mesh network traffic scenario. The sensors' mesh network is used to collect the data, which includes readings of input and output parameters. In wireless mesh networks, the mesh nodes, like MANET nodes, can form spontaneous connections with other nodes owing to their intrinsic features, and can traverse the network, collecting data from sensors, RFID-enabled nodes, and other fixed wireless nodes [13]. Wireless Sensor Networks (WSNs) have been identified as a significant enabler of the IoT models since their inception. In IoT, all sensor nodes can connect to the Internet to share and receive data [14]. A set of statistical and machine learning algorithms is used for the multivariate time series analysis of the collected data and the subsequent output prediction.
The traditional, fundamental, technical analysis approach is used by both statistical and non-statistical methods, with a significant focus on lag, first-order difference, and second-order difference. The stationarity and nature of the time series are more important in the fundamental analysis method [15]. The Convolutional Neural Network (CNN) is commonly employed for feature extraction because it focuses on the most prominent features in its field of view. Long Short-Term Memory (LSTM) is extensively used for time series because it processes the data in temporal order and adapts to the sequence over time [3].
In this paper, a novel Convo-LSTM architecture is proposed for traffic prediction in wireless mesh networks, using the HSD pump as a case study. A vibration forecasting model based on a one-dimensional (1-D) CNN-LSTM is built by combining the properties of CNN and LSTM. The data tuples are real-world data collected over a period of one year with constant monitoring of the network. The fundamental structure of the model is a hybrid of a 1-D Convolutional Neural Network and Long Short-Term Memory (LSTM). Its architecture has an input layer, a one-dimensional convolution layer, a pooling layer, an LSTM hidden layer, and a fully connected layer. The proposed model is evaluated in terms of Mean Square Error, Mean Absolute Error, and Root Mean Square Error.
The main contributions of this paper can be listed as follows:
1) A hybrid model for network traffic prediction is proposed for the wireless mesh networks formed by various sensors, with the HSD pump as a case study.
2) A set of statistical, non-statistical, deep learning, and machine learning algorithms is implemented, and the results are compared for the collected multivariate time series data.
3) After these time-series based algorithms are applied, an unbiased analysis of their performance is carried out, which can be helpful for researchers in the domain of wireless mesh networks.
4) An unbiased analysis is given of the results that a researcher can expect when applying these multivariate time-series algorithms in the domain of wireless mesh networks.
The paper is organized as follows: Section 2 provides a review of the previous work done in this domain. Section 3 explains the data collection process and the data description. Section 4 describes the analysis methods that are applied for traffic prediction. The obtained results are presented and analyzed in Section 5, while the conclusions are given in Section 6.

II. BACKGROUND WORK
Network traffic prediction and routing have been topics of interest and research over the last few decades. Approaches in this domain have used various time series models, namely machine learning algorithms, deep learning (neural networks), and various traditional statistical methods. Time series models like ARIMA have been preferred for forecasting, even in the case of regular networks. Zhou et al. [5] used a combination of ARIMA and GARCH models for prediction and modeling of network traffic. Wavelet-based transformation models have also been used for this purpose. Unlike in image or text data problems, neural networks made inroads into network analysis and prediction almost a decade earlier. Khotanzad et al. [16] were among the first to use neural networks for high-speed network traffic prediction. Alarcon et al. [17] applied a multi-resolution neural network for the task. Chen et al. [18] deployed a flexible neural tree for the resolution of traffic in the case of small-scale networks. There have also been extensive studies comparing the performance of traditional methods with that of deep learning networks [19]. Vinayakumar et al. [20] applied various sequence models, including Long Short-Term Memory (LSTM) networks, recurrent neural networks (RNN), and identity recurrent neural networks (IRNN), for prediction of network traffic. Similar traditional and advanced approaches have also been applied for traffic prediction in the specific case of wireless networks. Among the earlier approaches, Gowrishankar and Satyanarayana [21] presented a neural network architecture for wireless network traffic prediction. Xiang et al. [22] proposed a hybrid ANN-based approach for this task. Nikravesh et al. [23] analyzed the use of multiple techniques, including SVM, MLP, and MLPWD. Ke Wang et al. [10] suggested a hybrid model in which the abilities of CNN and LSTM are combined for traffic flow prediction. Stefany Coxe et al.
[24] have worked on Poisson regression and its alternatives, where they state that count data represent the number of times an activity occurred over a specific period. For example, if one observes how many hyper-aggressive actions are expressed by children while playing in a playground area on particular occasions, the result is count data. According to them, Poisson regression models provide an easy method to analyze count data. Farhan Mohammad Khan and Rajiv Gupta used an Auto-Regressive Integrated Moving Average (ARIMA) model to compare the accuracy of the predicted model [?] with a nonlinear autoregressive (NAR) neural network. The model was developed and implemented for the daily prediction of COVID-19 cases over the next 50 days. Nie et al. [25], [26] have made multiple contributions in this domain in recent years. One of their initial approaches involved the use of deep belief networks for wireless mesh backbone networks. Recently, they further enhanced their work by applying reinforcement learning in an IoT setup [26]. Qiu et al. [27] deployed RNNs in a spatio-temporal sense to improve traffic prediction performance. Xu et al. [28] applied a multi-layer Gaussian framework for this task. Recently, researchers have also seen the rise of the attention mechanism [29] and of deep learning applied in interdisciplinary ways [30]. This trend is slowly being reflected in traffic prediction. He et al. [31] applied a meta-learning scheme for faster traffic prediction in smaller networks. Li et al. [32] combined wavelet analysis with backpropagation neural networks for traffic flow analysis in wireless networks. Zhang et al. [33] considered a spatio-temporal network with a modified sequence model. Kim proposed an INGARCH model, an enhancement over the previous GARCH models [34]. Researchers have also used some statistical and evolutionary algorithms. Recently, most notably Li et al.
[32] applied a Gaussian regressor with a Prophet model for user traffic prediction in networks. A quantum-PSO approach has also been tried in this domain [32]. Costa et al. give a thorough overview of predictive maintenance projects in Industry 4.0, identifying and classifying techniques, standards, and applications. The key contributions of their survey include a discussion of the existing issues and limitations in predictive maintenance, as well as a proposal for a new taxonomy to define this research topic in light of Industry 4.0 requirements [35]. According to Costa et al., the industry has entered a new era because of the need to adapt to and adopt new technologies. The Internet of Things (IoT) is a recent paradigm for communication in which all kinds of objects in our daily lives, such as smartphones, sensors, or devices, are linked to network-enabled objects (such as RFID) so that they can communicate with each other and become part of the Internet. Industry 4.0 is characterized by connectivity, data volume, tech gadgets, inventory reduction, customization, and controlled production [35]. Tan et al. have analysed recent advancements in smart monitoring and data analytics that have enabled infrastructure predictive maintenance (PdM). According to them, the industry is currently hesitant to adopt new smart monitoring sensors, information technologies, and data analytics to achieve PdM. PdM is data-driven, relying on smart monitoring and data analytics insights to prevent downtime through maintenance, protection, and repairs. PdM is a relatively new trend that has gained ground in the industrial world since the 1990s; yet their recent analysis of the industry revealed that its applicability in infrastructure maintenance is quite limited [36]. Chuang et al., while emphasizing the importance of predictive maintenance, have stated that business in all industries can be redefined with the emergence of AI and IoT [37].
The information gathered is used not only to draw inferences from the past but also to forecast the future. Artificial neural networks and evolutionary algorithms are two of the most common AI techniques for machine diagnosis. According to them, predictive maintenance cuts down accidental device downtime, lowers maintenance costs, and extends the equipment life cycle, among other benefits. The fundamental infrastructure of an IoT framework consists of sensors, actuators, computation servers, and the communication network [38]. The TCP/IP (Transmission Control Protocol/Internet Protocol) communication protocol transmits sensor data. The environmental sensing sensors are programmed into the programmable interface controller, and the data is saved in a historical manner [35], [37]. Pallavi et al. have mentioned that because IoT devices are typically located in geographically separated places, they communicate primarily over wireless mediums. They have also stated that wireless channels are known for significant distortion levels and instability. Communication techniques are essential for the analysis of IoT devices. In this scenario, reliably transferring data without too many retransmissions is a significant concern. MANET is a vital element of the IoT network, serving as its backbone. MANET nodes with mesh architecture can form spontaneous connections with other nodes owing to their intrinsic features, requiring minimal infrastructure. MANET nodes can traverse the IoT network, collecting data from sensors, RFID-enabled nodes, and other fixed wireless nodes. The MANET nodes take the most effective route to connect with the Internet gateways, whichever one is available.
MANET nodes, like sensor nodes, can be employed as an essential technology in a variety of IoT applications. MANET nodes and sensor nodes (including RFID-enabled devices), forming a mesh network, can be deployed in huge numbers owing to their self-configuring nature. For the past decade, researchers have been working on the Internet of Things (IoT) using a variety of mature technologies such as Radio Frequency Identifiers (RFID), Wireless Sensor Network (WSN), Mobile Ad hoc Network (MANET), and so on [39]. Nagarajan et al. have proposed remote health monitoring and data analysis by combining IoT and deep learning techniques. They have suggested a new IoT-based fog-assisted cloud network architecture that collects real-time health care data from patients via numerous medical IoT sensor networks. The data is analyzed by a deep learning algorithm installed at a fog-based healthcare platform. Furthermore, they have proposed a methodology to analyze the process in real time for smart cities. According to them, for timely, accurate, and secure data analysis, the new IoT-based fog-assisted cloud network architecture can be applied to various domains such as traffic analysis and management, agriculture and smart farming, and weather forecasting [14]. In the view of Manuel et al., WSNs have been identified as a significant enabler of the IoT models since their inception. In IoT, all sensor nodes can connect to the Internet to share and receive data; in WSNs, however, the nodes do not have a direct Internet connection. To connect to the Internet, all nodes in the WSN need a mediator [40]. In their work, Krishnasamy et al. mention that a wireless sensor network (WSN) comprises a large number of sensor nodes that can both sense and communicate. The sensor nodes work together to gather the data and send it to the sink node, also known as the coordinator node.
The primary goal of sensor nodes is to monitor the environment before processing and transferring data to an analysis centre. Sensors are installed in locations that are frequently uneven in design. Sensors are also installed randomly in specific sites that are irregular in shape, relying on the transmission range. As a result, an algorithm that can adapt to each geographic region with different deployment structures is required [41]. Deverajan Ganesh Gopal et al. have studied the DANET, or dynamic ad hoc network, which is a network of multiple dynamic nodes that does not require any infrastructure. As and when needed, the movable nodes build a temporary network. This form of network has no centralized entity, so nodes need to rely on other nodes to send packets. Multiple copies of data are created to enhance data availability. The access frequency and node level are taken into account while allocating copies [42]. In the research by Devarajan, Ganesh Gopal et al., the Internet of Things (IoT) is described as a computational concept that envisions widespread Internet connectivity, transforming everyday objects into connected devices. The fundamental methodology in an IoT-based model involves billions or perhaps trillions of sensing elements capable of detecting the surrounding situation, communicating and transferring precise information, and then providing feedback to the environment. Wireless connections are frequently used to meet the adaptability and versatility required by IoT communications. While cellular technologies such as 3G/4G/5G cover large distances between devices, they require infrastructure support and licensed bands. They have explained the concepts of IoT and IIoT, as well as the current trend of automation and data exchange in manufacturing known as Industry 4.0 [43].
According to Farrukh et al., with the creation of highly accurate algorithms, research aims to focus on building more rigorous and practical methodologies [44].
It should be noted here that work on traffic prediction has rarely focused on historical data, which can eventually be used for traffic prediction. As a result, having models that are both resilient and appropriate is critical. Further, a standard paper that covers the contemporary machine learning and deep learning methods together could benefit young researchers in this domain. Also, there is hardly any notable work in which real-life data is collected over a given period, analyzed to identify patterns, and eventually used to predict the performance of the network. All these points highlight the scope for improvement and the need for our research work.

III. DATA COLLECTION AND PREPROCESSING
Before the algorithms used for modeling the data are described, the data collection and preparation process is outlined below.

A. DATA COLLECTION
The data was collected for over a year using a set of HSD pumps. A total of eight sensors were placed for collecting data, forming a wireless mesh network. The collected data consists of seven input variables and one output variable. Each of these parameters is described in Table 1. The complete data set consists of 8960 tuples, each containing one identifier (Date/Time), seven input variables (sensor readings for three-phase Current, three-phase Voltage, and Temperature), and one output variable, which is the reading from the Vibration sensor. The data is divided into training and test sets in the ratio 4:1, that is, an 80%-20% split. The data is processed before being converted into a more algorithm-friendly format.
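The 80%-20% split described above can be sketched as follows. This is a minimal illustration that assumes the split is chronological (the earliest 80% of tuples for training), which the text does not state explicitly; the function name `chronological_split` and the placeholder records are hypothetical.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Placeholder integers stand in for the 8960 real sensor tuples (assumption).
records = list(range(8960))
train, test = chronological_split(records)
print(len(train), len(test))
```

Keeping the order intact means that the test set always lies after the training set in time, which avoids leaking future readings into training.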

B. DATA PREPROCESSING
The following checks were performed on the data as a part of preprocessing:
1) Check the time series stationarity, that is, whether the time series is stationary or not.
2) Check whether a consistent mean and standard deviation exist across the collected data range. This is verified by plotting the mean and standard deviation values on a rolling window across the entire data range.
3) The AD-Fuller test for stationarity: The Augmented Dickey-Fuller (AD-Fuller) test is used to check how strongly the time series is driven by a trend. This is achieved by stating a null hypothesis and an alternative hypothesis. Results of the test are shown in Figure 2. The null hypothesis is that the time series has a unit root and is nonstationary. The alternative hypothesis is that the series is stationary. The p-value for the null hypothesis is calculated, and the threshold for the p-value is set to 0.05. As the observed value is less than the threshold, the null hypothesis is rejected, and it can be concluded that the series is indeed stationary.
4) Application of seasonal decompose: The seasonal decompose method is applied to obtain the three components used for setting up the stationary time series that is used for forecasting. Residuals, Seasonality, and Trend are the three components obtained from this method (Figure 3).
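The rolling-window check from step 2 can be illustrated with a small sketch. It uses only the Python standard library and two synthetic signals (an assumption made for illustration, not the pump data); the helper names `rolling_stats` and `spread` are hypothetical. A series whose rolling mean stays within a narrow band is consistent with stationarity, while a drifting rolling mean points to a trend.

```python
import math
import statistics


def rolling_stats(series, window):
    """Return (means, stds) computed over every full sliding window."""
    means, stds = [], []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        means.append(statistics.fmean(chunk))
        stds.append(statistics.pstdev(chunk))
    return means, stds


def spread(values):
    """Range of a list of values (max minus min)."""
    return max(values) - min(values)


# Synthetic signals: a bounded oscillation, and the same with a linear trend.
stationary = [math.sin(0.5 * t) for t in range(200)]
trending = [0.05 * t + math.sin(0.5 * t) for t in range(200)]

flat_means, flat_stds = rolling_stats(stationary, window=50)
drift_means, drift_stds = rolling_stats(trending, window=50)

print("rolling-mean spread, stationary:", round(spread(flat_means), 4))
print("rolling-mean spread, trending:  ", round(spread(drift_means), 4))
```

In practice the rolling curves would be plotted, as the text describes, and the visual check would be complemented by a formal unit-root test.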

IV. ALGORITHMS USED FOR TRAFFIC PREDICTION
In general, there are two types of traffic modelling for short-term traffic prediction: parametric and non-parametric methods. To employ parametric methods, a well-structured but flexible family of models is first created, after which the model parameters are estimated using training data. Forecasts can then be made using the model. This is a well-known method in traffic modelling, as seen in Figure 4. The Auto-Regressive Integrated Moving Average (ARIMA) model is a prominent modelling tool for traffic forecasting [45]. Artificial intelligence-based regression models can provide the necessary capabilities. Regression, which is widely used in many disciplines, creates models capable of predicting the value of an output variable from a set of input variables. AI-based algorithms are frequently used for complex regression models. This type of regression approach automatically detects complex correlations among input variables and interactions between input variables and output variables [46]. Forecasting Internet traffic is critical for network planning, resource allocation, and the detection of network anomalies caused by attacks. This is because enhanced TCP/IP (Transmission Control Protocol/Internet Protocol) traffic forecasting can assist network providers in optimizing their resources. Better traffic predictions can help avoid congestion and resource waste in bandwidth allocation schemes. Short-term prediction and long-term prediction are the two categories of network traffic prediction. Short-term prediction estimates traffic conditions in the near future based on historical and present traffic data; its forecast horizon is only a few minutes long. Long-term prediction, on the other hand, provides traffic estimates for longer time periods, such as years.
A traditional forecasting model such as the Poisson regression model (PRM) is used to model a count variable, and its parameters are usually estimated by the maximum likelihood estimation (MLE) method [47]. Auto-Regressive (AR) and Auto-Regressive Integrated Moving Average (ARIMA) models can capture the linear and Short Range Dependencies (SRD) between terms, but not the Long Range Dependencies (LRD), which results in poor performance when they are used for Internet traffic forecasting. Nonetheless, they are widely used [48]. In this paper, both statistical and non-statistical algorithms are implemented for the task of output prediction. Each of these is described below.
VOLUME X, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

A. DECISION TREE REGRESSOR
The Decision Tree Regressor (DTR) takes all the input features together and iteratively generates multiple trees, trying out possible combinations of the root, internal, and leaf nodes amongst all the features [9]. The tree whose predicted output is closest to the actual traffic is considered the best tree for subsequent inference.
For each tree, for each node, two metrics can be calculated: the Gini index and the entropy. The Gini index is a probabilistic measure that indicates the probability that the particular feature being at the given node would lead to the prediction error crossing a set threshold. At a particular node index n, the Gini index for a feature $f_i$ is calculated as follows:
$$\mathrm{Gini}(n) = 1 - \sum_{i} p(f_i)^2$$
where $p(f_i)$ indicates the probability of $f_i$ being present at node n. Ideally, the feature that produces the lowest Gini index for the particular node is assigned to that node. Overall, the aim in creating the decision tree is to reduce the entropy, that is, the degree of randomness of the possible traffic at each position. Entropy is defined as follows:
$$\mathrm{Entropy}(n) = -\sum_{i} p(f_i) \log_2 p(f_i)$$
The lower the entropy, the more confident and accurate the predictions would be. The Decision Tree Regressor determines the best tree setup by iteratively trying out multiple combinations of positions of the root and internal nodes in the tree.
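As a small worked illustration of the two node metrics, the sketch below assumes the standard definitions, Gini = 1 - sum of p^2 and entropy = -sum of p log2 p, applied to a list of probabilities at a node. The function names and example probabilities are hypothetical and are not taken from the case study.

```python
import math


def gini_index(probabilities):
    """Gini index of a node: 1 minus the sum of squared probabilities."""
    return 1.0 - sum(p * p for p in probabilities)


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


balanced = [0.5, 0.5]      # maximally uncertain two-way node
skewed = [0.9, 0.1]        # mostly one outcome
pure = [1.0, 0.0]          # no uncertainty at all

for name, probs in [("balanced", balanced), ("skewed", skewed), ("pure", pure)]:
    print(name, round(gini_index(probs), 4), round(entropy(probs), 4))
```

Both measures fall to zero for a pure node and peak for the balanced one, which is why lower values are preferred when choosing a split.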

B. LINEAR REGRESSION
The Linear Regression model uses a linear mapping of features to obtain a continuous prediction output. It is one of the simplest algorithms and is represented mathematically as:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon$$
where $x_1, \dots, x_n$ are the input features, $\beta_0, \dots, \beta_n$ are the regression coefficients, and ϵ is the error factor put in to accommodate normalization [10]. To approximate the given data, regression and log-linear models can be employed. In (simple) linear regression, the data is modelled to fit a straight line. For example, a dependent variable y, the response variable, can be described as a linear function of another random variable x, the predictor variable [10].
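For the simple (single-predictor) case, the straight-line fit can be sketched with the closed-form ordinary least squares estimates. This is an illustrative sketch on made-up points; the function name `fit_simple_linear` is hypothetical and the paper does not state which fitting routine was used.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Noise-free points on the line y = 1 + 2x (illustrative data only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]
intercept, slope = fit_simple_linear(xs, ys)
print(intercept, slope)
```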

C. MULTI-LAYER PERCEPTRON
The multi-layer perceptron model is a type of deep learning architecture in which the input features are passed through a combination of hidden layers to derive the output value. Each node in a hidden layer assigns an importance weight to each input that it receives from its preceding layer, along with a bias value for error normalization. Generally, a fully connected setup is used, wherein each node in a layer is connected to all nodes in the next layer [11]. Mathematically, the value at a particular node i in a given layer l is derived as follows:
$$a_i^{(l)} = \sigma\left(\sum_{j} w_{ij}^{(l)} a_j^{(l-1)} + b_i^{(l)}\right)$$
where $a_j^{(l-1)}$ is the output of node j in the preceding layer, $w_{ij}^{(l)}$ is the weight assigned to that input, $b_i^{(l)}$ is the bias, and $\sigma$ is the activation function. A rectified linear unit activation function is used at the output layer to obtain a continuous prediction value. After the predicted value is compared with the actual value, the network back-propagates, that is, it updates the initially randomized weights so that the predicted output matches the actual value as closely as possible.
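The forward pass through a fully connected network can be sketched in a few lines. The example assumes that each node computes a weighted sum of the previous layer's outputs plus a bias, followed by a rectified linear unit; the weights, biases, and layer sizes are arbitrary illustrative values, and training by back-propagation is not shown.

```python
def relu(z):
    """Rectified linear unit activation."""
    return max(0.0, z)


def dense_layer(inputs, weights, biases, activation):
    """weights[i][j] connects input j to node i of this layer."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(features, layers):
    """Pass the features through each (weights, biases) pair in turn."""
    values = features
    for weights, biases in layers:
        values = dense_layer(values, weights, biases, relu)
    return values


# One hidden layer with two nodes, then a single output node (assumed sizes).
layers = [
    ([[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, 0.5]], [0.25]),
]
prediction = mlp_forward([1.0, 2.0], layers)
print(prediction)
```

With these values the first hidden node is switched off by the ReLU, so only the second hidden node contributes to the output.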

D. POISSON REGRESSION
The Poisson regression is a probabilistic model [8]. It uses a probability mass function (PMF) to give the probability of observing a particular count output y for a given input containing a set of explanatory variables $X = [X_1, X_2, \dots, X_n]$. This function is defined as follows:
$$P(y_i \mid x_i) = \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}$$
where $\lambda_i$ is the mean rate, which is also the predicted value. The predicted regression output for a given input $x_i$ is defined as follows:
$$\lambda_i = e^{x_i \beta}$$
Here, β is the regression coefficient, or feature importance, that is given to each explanatory variable. The training objective of a Poisson regression model is to find this β value. The best β value is the one that produces the maximum value of the PMF over the observed data. The maximum is attained where the slope of the curve is zero, which is more easily derived by taking derivatives of the logarithm of the PMF. This derivative equation is as follows:
$$\frac{\partial \ln L(\beta)}{\partial \beta} = \sum_{i=1}^{n} \left(y_i - e^{x_i \beta}\right) x_i$$
This is equated to zero to find the best β value. This value is then used to derive the prediction for a new test input $x_p$:
$$\hat{y}_p = e^{x_p \beta}$$
The training of Poisson regression is done with the objective of finding the values of the regression coefficient β that make the vector of observed counts y most likely. The following steps are taken:
1) Convert the data set into only numeric values.
2) The data set should contain only non-negative integer values that represent the frequency of an event during a set interval. For our problem statement, this would be the network traffic crossing a host in a particular interval.
3) Then find the regression variables that influence the observed counts to derive the maximum PMF value.
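The maximum likelihood fit can be sketched for one explanatory variable plus an intercept. The sketch assumes the log-linear rate, lambda = exp(b0 + b1 x), and solves the score equation (the sum of (y - lambda) times the inputs equal to zero) with Newton's method; the choice of Newton's method, the function names, and the count data are illustrative assumptions rather than the authors' procedure.

```python
import math


def fit_poisson(xs, ys, iterations=25):
    """Fit log(lambda) = b0 + b1 * x by Newton's method on the log-likelihood."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0  # start from the mean rate
    for _ in range(iterations):
        lam = [math.exp(b0 + b1 * x) for x in xs]
        # Score: gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(y - l for y, l in zip(ys, lam))
        g1 = sum((y - l) * x for x, y, l in zip(xs, ys, lam))
        # Negative Hessian: sum of lambda * [1, x][1, x]^T.
        h00 = sum(lam)
        h01 = sum(l * x for x, l in zip(xs, lam))
        h11 = sum(l * x * x for x, l in zip(xs, lam))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict_rate(x, b0, b1):
    """Predicted mean count for a new input x."""
    return math.exp(b0 + b1 * x)


# Small made-up count series (non-negative integers), for illustration only.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 2, 2, 4, 6, 9]
b0, b1 = fit_poisson(xs, ys)
rates = [predict_rate(x, b0, b1) for x in xs]
print(round(b0, 4), round(b1, 4), round(predict_rate(6, b0, b1), 2))
```

At the fitted coefficients the predicted rates sum to the observed total count, which is a direct consequence of setting the intercept component of the score to zero.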

E. AUTO REGRESSIVE INTEGRATED MOVING AVERAGE
ARIMA, which stands for Auto-Regressive Integrated Moving Average, uses a combination of auto-regression and moving average terms to obtain future predictions from past time series values [7]. Mathematically, it is represented as follows:
$$y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t$$
where $y'_t$ is the series after differencing d times, $\phi_i$ are the auto-regressive coefficients, $\theta_j$ are the moving average coefficients, and $\varepsilon_t$ is the error term. Some terms are thus based on auto-regression and some on the moving average. If the series is under-differenced, more AR terms are added, and if it is over-differenced, more MA terms are added. The ARIMA(p, d, q) method applies differencing of the first or second order if the data is nonstationary; otherwise, if the series is stationary without differencing, ARMA(p, q) is an alternative method. Here, p is the Autoregressive (AR) order, that is, the number of lagged observations, and q is the Moving Average (MA) order, that is, the number of lagged forecast errors in the ARIMA model. The most common method for making a sequence stationary is to subtract the previous value from the current value. Depending on whether the time series is univariate or multivariate, one or more lags may be needed. The value of d signifies the smallest number of differencing operations needed to make the series stationary; if the data series is already stationary without differencing, then d = 0. Model identification begins by measuring the autocorrelation function (ACF) and partial autocorrelation function (PACF) by plotting the correlogram [49]. Then, depending on the ACF and PACF of the series, the relevant models are estimated by setting the orders of the auto-regressive and moving average parts. The orders (p, q) are identified and the best model is selected based on the spikes and the decay in the ACF and PACF graphs. Once the best model is selected, forecasting is done using the parameters (p, d, q) given by the model. Diagnostic evaluation of the forecast involves assessing the efficacy of the built model using statistically relevant measures such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the mean square error [49].
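Two of the building blocks mentioned above, differencing (the "d" in ARIMA) and the sample autocorrelation used to read a correlogram, can be sketched with the standard library. The function names and the toy series are hypothetical; a full ARIMA fit would normally rely on a statistics package rather than this sketch.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing: y[t] - y[t-1]."""
    for _ in range(d):
        series = [current - previous
                  for previous, current in zip(series, series[1:])]
    return series


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((value - mean) ** 2 for value in series)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance = sum((series[t] - mean) * (series[t - lag] - mean)
                     for t in range(lag, n))
    return covariance / variance


quadratic = [t * t for t in range(10)]   # needs d = 2 to become constant
print(difference(quadratic, d=1))        # still trending
print(difference(quadratic, d=2))        # constant after two rounds

alternating = [1.0, -1.0] * 4
print(autocorrelation(alternating, lag=1))
```

The quadratic example shows why d is chosen as the smallest number of differencing rounds that removes the trend, and the alternating series shows a strongly negative lag-1 autocorrelation of the kind that would appear as a spike in the correlogram.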

F. LONG SHORT TERM MEMORY
Long Short-Term Memory (LSTM) is a recurrent-network architecture that keeps track of a cell state to remember trends in the series, as shown in Figure 5. At every time step, the model decides whether to discard some information, update some pattern information, or output new information. LSTMs are specifically designed to avoid the long-term dependency problem. All recurrent neural networks are made up of a chain of repeating neural network modules. In standard Recurrent Neural Networks (RNNs), this repeating module has a simple structure, such as a single tanh layer [12]. An LSTM has three gates, the forget, update, and output gates, which operate on the time-series input $X_i$ and the intermediate output $h_t$.
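To make the role of the three gates concrete, the following is a minimal sketch of one step of a single-unit LSTM cell in the standard formulation. It is an illustration under stated assumptions, not the implementation used in this paper: the function names, the parameter layout, and the weight values are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function; squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a scalar LSTM cell (standard formulation).

    `params` maps each gate name to (w_x, w_h, b): the weight on the
    current input, the weight on the previous output, and the bias.
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre_activation("forget"))        # how much old state to keep
    i_t = sigmoid(pre_activation("update"))        # how much new content to add
    c_hat = math.tanh(pre_activation("candidate")) # proposed new content
    c_t = f_t * c_prev + i_t * c_hat               # updated cell state
    o_t = sigmoid(pre_activation("output"))        # how much state to expose
    h_t = o_t * math.tanh(c_t)                     # intermediate output
    return h_t, c_t


# Hypothetical weights and inputs, chosen only for illustration.
params = {
    "forget": (0.5, 0.1, 0.0),
    "update": (0.4, 0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.6, 0.1, 0.0),
}
h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.1]:
    h, c = lstm_step(x, h, c, params)
```

The cell state `c` is carried from step to step, which is what lets the cell retain information over longer spans than a plain tanh recurrence.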

G. CONVO-LSTM
This paper proposes a novel combination of CNN and LSTM such that the feature-extraction ability of the CNN benefits the sequence-mapping ability of Recurrent Neural Networks (RNNs). Figure 4 depicts the model's structural diagram, whereas Figure 6 depicts the proposed model's architectural diagram. The main structure of Convo-LSTM consists of an input layer, a one-dimensional convolution layer, a pooling layer, an LSTM hidden layer, and a fully connected layer. LeCun et al. proposed the CNN network model in 1998. The convolution operation extracts features from the input-layer vectors [50]. In this case, only one-dimensional (1D) operations are performed, owing to the structure of the data. Pooling layers are deployed to reduce storage requirements and to avoid huge training costs in the system. The pooling layer subdivides the convolutional layer's output into small rectangular blocks and generates a single output from each block. Pooling can be done in various ways, such as by taking the average or the maximum: average pooling takes the average value of the block being pooled, whereas max pooling takes its maximum [51]. First, the CNN layer extracts features from the data, which are the readings of the Current, Voltage, Temperature, and Vibration sensors collected over the previous year. The LSTM is then used to forecast the output, Vibration, based on the extracted features. According to the experimental findings, Convo-LSTM (that is, CNN-LSTM) achieves the highest prediction accuracy and can provide credible forecasts of the output parameter (Vibration).

Time Complexity of Convo-LSTM: To determine the time complexity of the forward-propagation (FP) and back-propagation (BP) processes, the number of operations at each 1D CNN layer must first be determined and then aggregated over all layers [52]. During forward propagation, the number of connections from a CNN layer $l$ to the preceding layer is $N^{l-1} N^{l}$, and for each connection an individual linear convolution, which is a linear weighted sum, is evaluated. Let $S^{l-1}$ and $W^{l-1}$ represent the vector sizes of the preceding layer's output, $S_k^{l-1}$, and of the kernel (weight), respectively.
Ignoring the boundary conditions, a linear convolution consists of $(S^{l-1} W^{l-1})^2$ multiplications and $S^{l-1}$ additions for a single connection. If the bias is ignored, the aggregate numbers of multiplications and additions in layer $l$ are

$$N^{l}(\mathrm{mul}) = N^{l-1} N^{l} \left(S^{l-1} W^{l-1}\right)^{2}, \qquad N^{l}(\mathrm{add}) = N^{l-1} N^{l} S^{l-1}.$$

A low computational complexity is attained throughout the 1D CNN. Thus, in forward propagation, the total number of multiplications, $T_{FP}(\mathrm{mul})$, and the total number of additions, $T_{FP}(\mathrm{add})$, over all $L$ CNN layers are

$$T_{FP}(\mathrm{mul}) = \sum_{l=1}^{L} N^{l-1} N^{l} \left(S^{l-1} W^{l-1}\right)^{2}, \qquad T_{FP}(\mathrm{add}) = \sum_{l=1}^{L} N^{l-1} N^{l} S^{l-1}.$$

Similarly, at each back-propagation iteration, the multiplications and additions due to the first convolution are counted in the same way and are denoted $T_{BP}(\mathrm{mul})$ and $T_{BP}(\mathrm{add})$. So, at each BP iteration, the total number of multiplications and additions is

$$T_{FP}(\mathrm{mul}) + T_{FP}(\mathrm{add}) + T_{BP}(\mathrm{mul}) + T_{BP}(\mathrm{add}) \qquad (21)$$

Statistical Analysis: A convolutional neural network (CNN) is a type of feed-forward neural network that can be used to forecast time series with great success. Two inherent features of CNNs, local perception and weight sharing, significantly lower the number of parameters, which in turn enhances the efficiency of model learning. The convolution layer and the pooling layer are the two fundamental constituents of a CNN. Each convolution layer has a number of convolution kernels, whose output is calculated as

$$I_t = \tanh(x_t * k_t + b_t),$$

where $I_t$ is the output value of the convolution, $x_t$ is the input vector, $\tanh$ is the activation function, $b_t$ is the bias, and $k_t$ is the weight of the convolution kernel. The data features are obtained once the convolution layer completes the convolution operation; however, because the extracted feature dimensions are quite large, a pooling layer is added after the convolution layer to lower the feature dimension and the cost of training the network.
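The convolution and pooling operations described above can be sketched in a few lines of Python. This is a hedged illustration only: it assumes that the convolution output takes the form $I_t = \tanh(x_t * k_t + b_t)$ implied by the symbol definitions, that the convolution is of the "valid" sliding-window kind used by most deep learning libraries, and that pooling blocks do not overlap. The function names, kernel, and input values are hypothetical.

```python
import math


def conv1d_tanh(x, kernel, bias=0.0):
    """Valid 1D convolution (sliding dot product) followed by tanh."""
    width = len(kernel)
    return [
        math.tanh(sum(k * v for k, v in zip(kernel, x[t:t + width])) + bias)
        for t in range(len(x) - width + 1)
    ]


def pool1d(x, size, mode="max"):
    """Non-overlapping pooling: one output value per block of `size` inputs."""
    if mode == "max":
        reduce_block = max
    elif mode == "avg":
        reduce_block = lambda block: sum(block) / len(block)
    else:
        raise ValueError("mode must be 'max' or 'avg'")
    return [
        reduce_block(x[i:i + size])
        for i in range(0, len(x) - size + 1, size)
    ]


# Hypothetical single-channel signal and kernel, for illustration only.
signal = [1.0, 2.0, 3.0, 4.0]
features = conv1d_tanh(signal, kernel=[0.5, 0.5])   # three feature values
pooled_max = pool1d([1.0, 3.0, 2.0, 5.0, 4.0], size=2, mode="max")
pooled_avg = pool1d([1.0, 3.0, 2.0, 5.0, 4.0], size=2, mode="avg")
```

Pooling with a block size of 2 roughly halves the length of the feature vector, which is the dimension reduction the text refers to.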
The forget gate receives the output value of the previous moment and the input value of the current moment, from which the forget gate's output value $f_t$ is calculated [50]. The previous output value and the current input value are likewise fed into the input gate, from which the output value and the candidate cell state of the input gate are calculated. The final output $O_t$ is calculated last. Here $f_t$ takes values in the range (0,1), and $W_f$ and $b_f$ are the weight and bias of the forget gate. Similarly, $W_i$ and $b_i$ are the weight and bias of the input gate, whose output also lies in the range (0,1), and $C_t$ is the output of the current cell, with value range (0,1) [50].
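The gate computations described above correspond to the standard LSTM formulation, which can be written as follows for reference. This is a sketch in the usual notation and is assumed, not confirmed, to match the formulas of [50]: $\sigma$ denotes the logistic sigmoid, $\odot$ the element-wise product, $[h_{t-1}, x_t]$ the concatenation of the previous output and the current input, and $W_c$, $b_c$, $W_o$, $b_o$ are symbols introduced here for the candidate state and the output gate.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
O_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= O_t \odot \tanh(C_t)
\end{aligned}
```

Because $f_t$, $i_t$, and $O_t$ pass through the sigmoid, each lies strictly between 0 and 1 and acts as a soft switch on the quantity it multiplies.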
The following is a summary of the proposed architecture. The convolution layer and the pooling layer are the two fundamental components of the CNN, and each convolution layer has several convolution kernels. The data features are extracted by the convolution operation of the convolution layer. As the extracted feature dimensions are very large, a pooling layer is added after the convolution layer to reduce the feature dimension and the cost of training the network. The convolution operation extracts the features from the input-layer vectors; in this case, only 1D operations are performed, owing to the structure of the data. Pooling layers are also deployed to reduce storage requirements and to avoid introducing huge training costs into the system. The summary of the proposed architecture is shown in Table 1.

Sample Data Set Description: To demonstrate the model's usefulness, it is applied to the sample data set described here. The proposed system makes a strong case for network traffic prediction, in which historical (time-series) data are collected over the wireless mesh network. The sensor nodes form connections in a mesh architecture. The data were collected over one year, from 1st June 2019 to 8th June 2020, from sensors measuring the Temperature, three-phase Voltage, three-phase Current, and Vibration of the HSD pump. A snapshot of the data is shown in Figure 7. To implement the various algorithms, the data are divided into a training set, a validation set, and a test set: the first 7178 readings are taken as the training set, a further 1345 readings as the validation set, and the last 450 readings as the test set. From the influencing factors, namely the Temperature, the three-phase current (IR, IY, IB), and the three-phase voltage (VR, VY, VB), the HSD pump's Vibration is predicted on an hourly, daily, weekly, monthly, and yearly basis.
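The data division described above can be sketched as a chronological split followed by sliding-window construction. This is an illustration under assumptions rather than the authors' preprocessing code: it assumes that the validation readings immediately follow the training readings, that each sample pairs a window of past feature rows with the next Vibration value, and that the window length equals the 10 time steps mentioned in the model description. The function names and the toy data are hypothetical.

```python
def chronological_split(rows, n_train=7178, n_val=1345, n_test=450):
    """Split time-ordered rows into train/validation/test without shuffling."""
    if len(rows) < n_train + n_val + n_test:
        raise ValueError("not enough rows for the requested split")
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]   # assumed to follow the training rows
    test = rows[len(rows) - n_test:]      # the last n_test readings
    return train, val, test


def make_windows(features, target, time_steps=10):
    """Pair each window of `time_steps` feature rows with the next target."""
    windows, labels = [], []
    for end in range(time_steps, len(features)):
        windows.append(features[end - time_steps:end])
        labels.append(target[end])
    return windows, labels


# Stand-in for the time-ordered readings (indices only, not real data).
rows = list(range(7178 + 1345 + 450))
train, val, test = chronological_split(rows)

# Hypothetical rows with 7 attributes and a hypothetical Vibration target.
features = [[float(t)] * 7 for t in range(25)]
vibration = [0.1 * t for t in range(25)]
windows, labels = make_windows(features, vibration)
```

Keeping the split chronological avoids training on readings that come after the ones being predicted, which matters for any time-series evaluation.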
As per standard industry practice, the LM35 (Texas Instruments, Dallas, TX, USA), a precision IC temperature sensor with a proportional output, is used for temperature. A Hall-effect-based DC current sensor is the ideal method for monitoring the motor current; the ACS712 current transducer (Allegro MicroSystems LLC, Worcester, MA, USA) is used in this circuit [39]. High-frequency accelerometers with a flat frequency response up to 28 kHz, suited to monitoring multi-stage compressors and boiler feed pumps and to bearing-wear detection, are used for vibration.

Model Description: Table 2 shows the Convo-LSTM (CNN-LSTM) parameter settings used in this experiment. Based on these parameter settings, the model is built as follows. The input training data is a three-dimensional vector (None, 10, 7), where 10 is the time-step size and 7 is the number of attributes of the input data. The data first pass through a one-dimensional convolution layer, which extracts further features and produces a three-dimensional output vector (None, 15, 64), where 64 is the number of convolution filters. After passing through the pooling layer, the vector is transformed into a three-dimensional output vector (None, 13, 32). The output vector then enters the LSTM layer and two dense layers for training; the resulting output (None, 64), where 64 is the number of hidden units in the LSTM layer, goes through another fully connected layer to obtain the output value. This CNN-LSTM model structure is shown in Figure 6.

V. RESULTS
The models, namely Decision Tree Regressor (DTR), Linear Regression, and the other models compared in this paper, are trained using the data from the processed training set. The results are evaluated across the following three metrics:
• Mean Absolute Error (MAE): the average absolute difference between the actual and predicted output values.
• Mean Square Error (MSE): the average of the squared differences between the actual and predicted output values.
• Root Mean Square Error (RMSE): the square root of the MSE.
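The evaluation metrics listed above, together with the RMSE reported in the results, can be computed as in the following minimal sketch. The function names and the sample values are illustrative assumptions and are not results from this paper.

```python
import math


def _check(actual, predicted):
    """Guard against empty or mismatched inputs."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Square Error: average of (actual - predicted)^2."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical actual and predicted values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.0, 4.5]
scores = (mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))
```

Because the RMSE is the square root of the MSE, it is expressed in the same units as the predicted quantity, which makes it easier to compare with the MAE.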
The hourly, daily, weekly, monthly, and yearly predictions of the output are presented in Table 3 to Table 7. The results compare the MAE, MSE, and RMSE values of Convo-LSTM with those of the other models at each of these intervals. From the observations, the MAE for the Convo-LSTM model is 0.1, the MSE is 0.025, and the RMSE is 0.16, which indicates the model's better performance relative to the other models. For interpretability of the results, graphs comparing the predicted and actual output for the implemented algorithms are shown in Figure 8 (MLP), Figure 9 (ARIMA), Figure 10 (DTR), Figure 11 (Linear Regression), Figure 12 (Multi-Layer Perceptron), Figure 13 (LSTM), and Figure 14 (Convo-LSTM).

VI. CONCLUSION
This paper proposes a hybrid model for traffic prediction in wireless mesh networks by applying regression methods to system configuration parameters. Specifically, six different algorithms, namely decision tree regressor, linear regression, multi-layer perceptron, Poisson regression, ARIMA, and LSTM, are applied to the three main feature types, three-phase current, three-phase voltage, and temperature, to predict the output, Vibration, of the HSD pump. This paper also proposes a new Convo-LSTM setup for this task and achieves good results with it. The system was evaluated at five different intervals (hourly, daily, weekly, monthly, and yearly), and the Convo-LSTM algorithm was found to be the best-performing one. A multivariate time-series data set of this kind is similar to a wireless mesh network data set, which gives researchers a direction for implementing the proposed algorithm to predict the volume of network traffic. Future work in the domain includes applying contemporary methods such as the attention mechanism and transformers to traffic prediction. Multimodal networks that combine the physical configuration values and the network system log values can also be proposed. As network systems become larger and more distributed, smart algorithms that can automatically predict the incoming traffic and allocate resources accordingly will become the need of the hour.

KETAN KOTECHA holds a Ph.D. and an M.Tech. from IIT Bombay and is currently Head of the Symbiosis Centre for Applied AI (SCAAI). Dr. Kotecha has expertise and experience in cutting-edge research and projects in AI and deep learning spanning more than 25 years. He has published widely (100+ publications) in a number of excellent peer-reviewed journals on topics ranging from cutting-edge AI to education policies, teaching-learning practices, and AI for all.
He is a recipient of two SPARC projects worth INR 166 lacs from MHRD, Govt. of India, in AI, in collaboration with Arizona State University, USA, and the University of Queensland, Australia. He is also the recipient of numerous prestigious awards, such as the Erasmus+ faculty mobility grant to Poland, the DUO-India professors fellowship for research in Responsible AI in collaboration with Brunel University, UK, the LEAP grant at Cambridge University, UK, the UKIERI grant with Aston University, UK, and a grant from the Royal Academy of Engineering, UK, under the Newton Bhabha Fund. Dr. Kotecha has published 3 patents and delivered keynote speeches at various national and international forums, including the Machine Intelligence Lab, USA, IIT Bombay under a World Bank project, and the International Indian Science Festival organized by the Department of Science and Technology, Govt. of India, among many others. Currently, he is also an Associate Editor of the IEEE Access journal.