Prediction of Scenarios for Routing in MANETs Based on Expanding Ring Search and Random Early Detection Parameters Using Machine Learning Techniques

Routing protocols in Mobile Ad Hoc Networks (MANETs) play a pivotal role in ensuring quality of service (QoS) and improving network performance. Selecting the optimal routing protocol and suitable parameters for a given network scenario is a major task that ultimately affects the behavior of the network. This work exploits machine learning (ML) techniques to select adequate routing parameters and protocols by regression of parameters in a given network scenario to ensure optimal performance. The models are trained on the parametric setup of the expanding ring search (ERS) mechanism and the random early detection (RED) technique to estimate network throughput, end-to-end (E2E) delay and packets delivery ratio (PDR), and are tested via wide-ranging simulations in varying network topologies. Both RED and ERS aim to control link-level and node-level congestion in reactive routing protocols, and our aim is to select the best-suited parameters for given network topologies based on ERS and RED parametric setups and to improve performance for ensuring QoS. ML algorithms are trained and tested for their performance in varying network topologies, and the best-performing models are exploited for ERS and RED based routing in the given topological arrangements. The performance of the ML algorithms is evaluated on the basis of root mean squared error (RMSE) and mean absolute error (MAE) for regression settings. Prediction models with satisfactory RMSE and MAE are attained and exploited for the selection of suitable ERS and RED parameters and routing protocols in order to ensure QoS for a given network scenario. Variants of standard routing protocols are devised based on their performance, and the ML techniques are exploited for prediction of QoS parameters to decide on the optimal variant that attains significant improvement in performance. Results are shown to confirm that considerable improvement in QoS is attained.


I. INTRODUCTION
In recent research, ad hoc routing has taken over the networking domain due to its high-end applications in the internet of things (IoT), wireless ad hoc sensor networks (WASN), vehicular edge computing (VEC), vehicular ad hoc networks (VANETs) and unmanned mobile vehicles (UMV) [1]-[3]. Researchers are more interested in playing with the ad hoc environment to optimize the network based on routing parameters. Mobile ad hoc networks (MANETs) have also gained great attention due to their highly dynamic nature: the network comprises nodes that communicate without any fixed infrastructure, acting as source nodes, destination nodes or intermediate nodes depending upon the requirement of the network [4]. Connectivity is managed through hop-by-hop communication without any base station because of the restricted transmission range of the wireless edges. Hence managing the routing mechanisms in ad hoc networks has become a critical job and plays a pivotal role in affecting the Quality of Service (QoS) [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Eyuphan Bulut.
We address the problem of congestion due to flooding in reactive routing protocols of MANETs. Reactive routing protocols maintain routes on an on-demand basis, for which they exchange frequent route update messages. Since reactive routing protocols do not maintain big routing tables like proactive routing protocols, they constantly keep exchanging route updates. Hence the motivation for this work is to tackle the issue of congestion due to flooding in reactive routing by optimizing packet handling at the network level and retransmissions at the transport level, exploiting the expanding ring search mechanism and routing parameters. This is attained by changing the way in which the packets are handled for routing. Mechanisms are designed for this purpose in which the TTL value of each route request is associated with per hop behavior before processing the packet. ML algorithms are exploited for this purpose and the QoS metrics are improved as shown in the results section.
When nodes are in the transmission range of each other, the link between them is up and they can communicate directly. Conversely, if the nodes are not in the transmission range of each other, i.e. the link between them is down, the source node can use intermediate nodes to reach its destination [4]. The number of intermediate nodes specifies the number of hops required for the message of the source node to reach the destination node, which also stipulates the path distance between the source and destination in terms of hops. Networks in which nodes are exploited in this hop-by-hop fashion for routing are also known as mobile multi-hop ad hoc networks [5]. Fig. 1 depicts a typical MANET environment where the source node A is communicating with destination node B via various intermediate nodes in their transmission range. Nodes in MANETs do not perform periodic tasks, nor are they allotted fixed jobs to keep records for long periods of time to cover the whole network; hence routing in MANETs becomes a really complicated task due to this dynamicity. Due to this hop-to-hop communication and the non-fixed responsibility of nodes, numerous messages and requests are repeatedly sent from the source node, which may lead to congestion and ultimately degrade network performance due to flooding. Route requests are initialized with a predefined time to live (TTL), and once a node starts searching for the destination the TTL value is decremented at each hop until the TTL is exhausted [6]. There are also numerous other parameters associated with the routing mechanism of MANETs, such as number of route requests, expiry time, traversal time, route retries, waiting time, hop count, network diameter and TTL thresholds, that decide how the routing behaves and responds in different network situations [7]-[9].
Automating these parameters according to the network scenario is a crucial and very effective task that can bring drastic improvement in routing and ultimately improve network performance by tackling issues like congestion in a timely manner. Performing this job manually can be really tricky, and no effective work is available that deals with such a scenario.

A. EXPANDING RING SEARCH TECHNIQUE
Many routing protocols in MANETs make use of methods like expanding ring search (ERS), random early detection (RED) and their variants to exploit these routing parameters in various formats in order to ensure QoS and deal with issues like congestion due to flooding, energy drainage, overheads and link breakage [10]-[20]. In ERS the route search is managed in the form of rings of predefined steps instead of linear search methods. ERS searches for the destination node in terms of hop counts, expanding in predefined steps while monitoring the TTL values. ERS uses the TTL field in the IP header to conduct the route search in a non-conventional manner. The radius of search expands stepwise until the target is reached or the whole network is traversed. Other parameters such as TTL start, TTL threshold, TTL increment, route retries, time to expire, buffer time, network traversal time, network diameter and node traversal time are also monitored accordingly in order to attain efficient routing [11]. The details of these parameters are given in the Request for Comments (RFCs) published by the Internet Engineering Task Force (IETF) for each routing protocol that exhibits ERS [21], [22]. Fig. 2 provides a general illustration of the ERS mechanism where node A (source node) is trying to reach node B (destination node). The search steps are taken in the form of rings, as in ERS-Ring-1, ERS-Ring-2, ..., ERS-Ring-5. The nodes have to be in the transmission range of each other in order to communicate and progress from source node A to destination node B through intermediate nodes. The ERS mechanism resolves the issue of flooding, but the network may incur counter overheads and loads that cause energy drainage in highly dense networks [23]. To tackle such issues many ERS enhancements [10]-[13] have been proposed that manage the route search with modified parametric setups and techniques to improve routing and QoS.
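The ring growth described above can be sketched in a few lines. This is only an illustration under stated assumptions: the default values for TTL start, increment, threshold and network diameter are the AODV defaults from RFC 3561, the function names and the toy chain topology are hypothetical, and the flood itself is replaced by a hop-distance check rather than simulated packet by packet.

```python
from collections import deque

def hops_to(adjacency, source, target):
    """Breadth-first hop distance from source to target, or None if unreachable."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return None

def expanding_ring_search(adjacency, source, target,
                          ttl_start=1, ttl_increment=2,
                          ttl_threshold=7, net_diameter=35):
    """Return (ttl_that_reached_target, ttls_tried); the first item is None on failure."""
    distance = hops_to(adjacency, source, target)
    tried = []
    ttl = ttl_start
    while True:
        tried.append(ttl)
        # A request with this TTL reaches every node within `ttl` hops.
        if distance is not None and distance <= ttl:
            return ttl, tried
        if ttl >= net_diameter:
            return None, tried
        # Grow the ring; past the threshold, fall back to a network-wide search.
        if ttl + ttl_increment <= ttl_threshold:
            ttl += ttl_increment
        else:
            ttl = net_diameter

# Hypothetical chain A - n1 - n2 - n3 - B (B is four hops from A).
chain = {"A": ["n1"], "n1": ["A", "n2"], "n2": ["n1", "n3"],
         "n3": ["n2", "B"], "B": ["n3"]}
result = expanding_ring_search(chain, "A", "B")  # -> (5, [1, 3, 5])
```

With these assumed defaults the rings grow as 1, 3, 5, 7 and then jump to the network diameter, so a destination four hops away is found on the third ring.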

B. RANDOM EARLY DETECTION TECHNIQUE
Techniques akin to RED also reduce the load on the network in case of congestion caused by flooding or high route queries, making use of the parametric setups discussed earlier. RED and its variants such as Weighted Random Early Detection (WRED), Adaptive Random Early Detection (ARED), Robust Random Early Detection (RRED), Smart Random Early Detection (SRED) and many others [14]-[20] pre-emptively drop packets that are wasting more network energy and taking longer to be processed for routing. Instead of the conventional manner, where packets are dropped irrespective of any criteria or priority when the buffer is full, RED drops packets pre-emptively before the buffer is full.
A characteristic illustration of the RED mechanism is provided in Fig. 3, where two thresholds (i.e. THRESH_MAX and THRESH_MIN) are exploited for deciding how the packets are dropped. Three packets (i.e. TTL_1, TTL_2 and TTL_3) are arriving at the buffer of size N and are queued directly unless THRESH_MIN is reached. When the number of packets in the buffer exceeds THRESH_MIN, the packets are dropped randomly at a defined rate. All the packets are dropped once THRESH_MAX is reached. Packet 0 is already ready to be processed for routing. Several parameters such as IP precedence, DSCP and buffer size are exploited by RED, based on which the packets are dropped before congestion occurs [24], [25]. We take the concepts of these techniques and exploit ML techniques to automate the parametric setup of routing protocols of MANETs.
In this work we have exploited the same parametric setups and the way in which the packets are presented to the routing protocols, such that the ML techniques are trained on the data of these networks to automate the parametric setups of MANETs with respect to the network scenarios. The models are used for decision making in case of congestion and other critical network conditions to make the right choice of parametric setups and select the suitable routing mechanism. The major contributions of this work are training the ML algorithms on ERS and routing parameters of MANETs under random way point (RWP) mobility and using regression to automate the optimal parametric setup of these quantities in order to attain improved QoS. The parameters are not only automated; ERS is also used to design a packet handling mechanism in which the packets are handled with different priorities, drop rates, congestion flags and thresholds depending upon the requirement of the network. This idea of handling packets on ERS parameters is exploited with the RED technique, which has not been done before and was proposed by us in [34].
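The two-threshold behavior described for Fig. 3 can be sketched as a single decision function. This is a minimal illustration that follows the description above literally (a fixed drop rate between the thresholds); the function name and argument names are hypothetical, and classic RED additionally uses a moving average of the queue length and a drop probability that rises between the thresholds, which is not modeled here.

```python
import random

def red_decision(queue_len, thresh_min, thresh_max, drop_rate, rng=random):
    """Return 'enqueue' or 'drop' for a packet arriving at a buffer."""
    if queue_len < thresh_min:
        return "enqueue"          # below THRESH_MIN: queue directly
    if queue_len >= thresh_max:
        return "drop"             # THRESH_MAX reached: drop everything
    # Between the thresholds: drop randomly at the configured rate.
    return "drop" if rng.random() < drop_rate else "enqueue"
```

Passing a seeded `random.Random` instance as `rng` makes the random region reproducible in experiments.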
The rest of paper is organized into Section II that provides details of the related work, Section III provides an insight on the ML techniques exploited for experimentations, Section IV provides details on metrics used for ML algorithms implementation, Section V demonstrates the methodology and stepwise implementation of the proposed algorithm, Section VI illustrates the simulation environment, Section VII discusses the results and observations and Section VIII concludes the paper and provides an insight on future enhancements.

II. RELATED WORKS
Several techniques have been proposed for improving QoS by congestion control for routing in MANETs based on parametric setup and ML techniques. Guo et al. [5] proposed a delay prediction mechanism integrated with a proactive ad hoc network routing protocol, namely OLSR. They used queuing delay only and showed that queuing delay can be modeled as a non-stationary time series. They used multilayer perceptron (MLP) and radial basis function (RBF) networks to predict from the non-stationary time series model of queuing delay in MANETs. Ghadimi et al. [6] proposed an analytical model to predict accurate media access delay by obtaining its distribution function in a single wireless node. They claim to derive accurate analytical models for the media access delay for IEEE 802.11 ad hoc networks in finite load conditions with and without exposed terminals. Variations of ERS techniques, i.e. Blocking and Improved Blocking ERS, are proposed in [10] and [11]; they exploit parameters such as different waiting times to manage early flood cancellation and attain high performance. Another technique proposed in [12] exploits the route request (RREQ) parameter and the TTL_INCREMENT values to reduce the energy consumption of the nodes and improve the QoS in terms of PDR, E2E delay and throughput. A packet priority mechanism based on a network coding scheme is proposed in [13] that assigns a class based priority to the ERS rings and uses proportional fair scheduling to deal with congestion. The ML technique MLP was exploited for the WRED method to automatically adapt the end users to the network and improve the QoS [14]. The per hop behavior (PHB) parameter was exploited to ensure the QoS for new users in the network. The technique improved the performance of the network and enabled the network to respond efficiently in manual QoS parameter pre-settings.
VOLUME 9, 2021
Authors in [18] presented a probability based RED technique to tackle ill-behaved traffic flows by managing the buffer inflow and outflow. The authors claimed significant performance improvement by dropping the packets on a well defined probability model. An unequal packet priority parameter is exploited for Smart RED in [20] for its application in TCP and UDP traffic with different bandwidth needs, inspired by smart access point with limited advertised window (SAP-LAW) concepts. The research in [26] proposed a hybrid technique for clustering and queuing in which the packets are arranged in queues before being processed for routing. The differentiated services parameter was used for setting the priority of packets and selecting them for drop in case of congestion. The priority is allotted on the basis of buffer aided decodable network coding.
Guo and Malakooti [27] presented a scheme for predicting mean delays as a time series using a tapped-delay-line MLP network and a tapped-delay-line Radial Basis Function Network (RBFN). The inputs they used were the mean delay time series alone and the mean delay time series together with the corresponding traffic loads. They ignored the effect of any other parameter on delay, and their scheme predicts only the one-hop delay, not the complete end-to-end packet delay. The model proposed in [28] devised a Q-learning algorithm for improving MANET routing using reinforcement methods. The technique is exploited for automating the routing decisions based on reinforcement concepts. A continuous Hopfield neural network (CHNN) model was proposed for Dynamic Source Routing (DSR) in [29] to obtain the most stable route to improve the QoS in MANETs. Authors in [30] proposed different methods for mobility prediction. However, these methods assume that nodes move according to the Random Waypoint Mobility (RWM) model. As a result, mobility prediction for nodes moving according to other models can lose its accuracy and efficiency. Mobility prediction allows estimating the stability of paths in a mobile wireless ad hoc network. Hongyan et al. [31] used autoregressive models and neural networks to predict internet time delay, whereas Tabib and Jalali [32] used a feed-forward multilayer perceptron for the same purpose. Both considered only internet time delay and did not consider any other network types and their characteristics. An adaptive QoS routing is proposed in [33] based on prediction of link performance in MANETs. The predictions are made for lower layer parameters in order to attain QoS and improve mobility.
Routing in MANETs is an interesting research domain due to extremely dynamic nature of MANETs and its applications in copious domains. The motivation for this work is to explore ML mechanism for existing routing protocols and those proposed in [33] and [34] to enhance the performance of MANETs on QoS premises. This is attained by changing the way in which various parameters are defined for routing in case of congestion and normal scenarios.

A. LINEAR REGRESSION
Linear Regression (LR) is a very simple type of supervised ML algorithm that solves only regression problems. LR estimates the coefficients of the hyperplane that best fits the input data. The model parameters are estimated from the input data. In simple LR two variables are used for finding the predictive function, i.e. the predictor variable, which is the independent variable, and the criterion variable, which is the dependent variable [35]. We have exploited the LR algorithm with the routing parameters as the predictor variables to generate a predictive model based on the observed routing dataset. The model is exploited for ensuring the QoS based on the trained routing dataset, and additional values of the variables are used to test the trained model for predicting an unknown response. For our data we have removed the highly correlated input attributes in order to achieve higher efficiency. Similarly, attributes irrelevant to the output variable are removed with feature selection methods. In order to reduce the complexity of the model, the ridge regularization technique is used, which prevents any coefficient from reaching a high value by penalizing the sum of squares of the learned coefficients [35].
LRs have very simple implementation and representation as they combine the set of single inputs such as terrain size, nodes speed or network density to achieve particular outputs such as E2E delay, throughput or PDR both of which are related based on linear relationship or linear model. The LR model is also exploited for multiple inputs such as buffer size, time to expire, thresholds and upper bounds for predicting the outputs as mentioned for single inputs scenario. Once the model is developed, making predictions is as simple as solving an equation for a particular set of inputs. A simple LR model for single variable scenario can be represented as in (1).
Y(t) = B_0 + B_1 * X(t) (1)
where X(t) is the input dataset, Y(t) is the output dataset, B_0 is the bias coefficient and B_1 is the coefficient for X(t).
The aim is to find the optimal values of the coefficients that relate the input to the output. In the case of multiple input variables, refer to (2), where X_1(t) ... X_n(t) are the input variable datasets, B_0 is the bias coefficient and B_1 ... B_n are the coefficients relating the inputs to the output.
Y(t) = B_0 + B_1 * X_1(t) + ... + B_n * X_n(t) (2)
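As a rough illustration of how the coefficients of the single-variable model Y(t) = B_0 + B_1 * X(t) could be estimated, the following sketch uses the closed-form least-squares solution with an optional ridge penalty on B_1. The function names and the sample numbers are hypothetical and are not taken from the routing dataset; leaving the bias unpenalized is an assumption of this sketch.

```python
def fit_simple_lr(xs, ys, ridge=0.0):
    """Fit Y = B0 + B1 * X by least squares; `ridge` shrinks B1 (bias not penalized)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / (sxx + ridge)
    b0 = mean_y - b1 * mean_x
    return b0, b1

def predict(b0, b1, x):
    """Making a prediction is just evaluating the fitted equation."""
    return b0 + b1 * x

# Toy data lying exactly on Y = 1 + 2 * X.
b0, b1 = fit_simple_lr([1, 2, 3, 4], [3, 5, 7, 9])
```

Setting `ridge` above zero pulls B_1 toward zero, which is the effect attributed to ridge regularization in the text.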

B. K-NEAREST NEIGHBORS
k-Nearest Neighbors (KNN) is also a supervised ML algorithm that can solve both regression and classification problems. KNN uses the whole dataset for training in classification problems, whereas in regression problems it utilizes the feature similarity method for predicting the new data point [36]. KNN chooses the k nearest data points, where k can be any integer depending upon the data, i.e. for bigger data the value of k is larger and vice versa. In the case of a regression problem, KNN takes the mean of the k most similar instances in the training data. We have selected the value of k automatically using cross validation. Euclidean distance is used to measure the distance between data points because the routing data is on the same scale, whereas the linear NN search method is exploited for the way in which the data is searched and stored [37].
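A minimal sketch of KNN regression with Euclidean distance and a linear scan over the training data is given below. The function name and the toy points are hypothetical, and k is passed in directly rather than chosen by cross validation.

```python
import math

def knn_regress(train_x, train_y, query, k):
    """Predict the mean target of the k training points closest to `query`."""
    # Linear search: measure the distance to every stored instance.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy feature vectors and targets.
points = [(0, 0), (1, 0), (5, 5), (6, 5)]
targets = [1.0, 3.0, 10.0, 12.0]
estimate = knn_regress(points, targets, (0.5, 0), k=2)  # -> 2.0
```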

C. DECISION TREE
A decision tree exhibits a tree like model of decisions that comprises branches of consequences of event outcomes and the resource costs and utility associated with them. Decision trees also solve both regression and classification problems, which is why they are referred to as classification and regression trees (CART) [38]. The tree is constructed by greedy selection of suitable split points for predictions, repeated until the maximum depth is reached. We have devised an adaptive method for specifying the depth of the decision tree depending upon the problem. An internal node is a test on an attribute and a branch is the outcome of the given node or test. The leaf node represents the decision finalized after computation of the attributes, which is also known as the class label. The classification rules are defined through the paths from root to leaf of the tree [39]. We have pruned the model so that it generalizes to new routing data. A simple illustration of a decision tree is given in Fig. 4, where the decision for PDR estimation has to be taken based on the terrain size length variable (YY) with a tree of size nineteen. The output illustration in the form of a tree is given in Fig. 5. The figure shows how the decision tree develops a solution for a simple regression scenario by generating branches of possible outputs.
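The greedy split selection mentioned above can be illustrated for a single feature. This is only a sketch of one split step of a regression tree, assuming squared error as the split criterion; the function name and the toy terrain and PDR numbers are hypothetical and are not results from this work.

```python
def best_split(xs, ys):
    """Greedily pick one threshold on a single feature, minimizing squared error."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    # Candidate thresholds: every distinct feature value except the smallest.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best  # (threshold, total squared error), or None if no split exists

# Toy data: terrain length (feature) against PDR in percent (target).
terrain = [300, 500, 1000, 2000, 3000]
pdr = [90, 88, 85, 40, 35]
threshold, cost = best_split(terrain, pdr)  # threshold -> 2000
```

A full tree would apply the same step recursively to each side until the chosen depth is reached.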

D. SUPPORT VECTOR MACHINE
Support vector machines (SVMs), also known as support vector networks (SVNs), are another type of supervised learning algorithm that supports classification problems in particular and has an adaptation for regression problems known as support vector regression (SVR). The algorithm is based on a statistical framework that typically solves linear problems, and solves non-linear problems using the kernel trick. SVR solves the regression scenario by attaining a line of best fit that reduces the error of the cost function via an optimization process that considers only the support vectors, i.e. the data instances closest to the line [40]. The best fit line or hyperplane is not always a straight line; hence in some cases a line with curves and polygonal regions is achieved using different kernels. Fig. 6 depicts an SVM model for a typical regression problem. The model shows the input data, which in our case is the routing data, fed to the model that applies the support vectors (SVs) for extracting the required output values in the given scenario. The alphas (α_n) represent the weights given to the output of each SV, and a bias is applied to the final output to tune it accordingly.
We evaluated the standard reactive routing protocol AODV (V-ERS) with varying parametric setups, using the AODV versions AODV1g, AODV2g, AODV3g, AODV4g, AODV5g, AODV1s, AODV2s, AODV3s, AODV4s and AODV5s, in which 1, 2, 3, 4 and 5 are configuration versions and g and s are the gigantic and small modes based on the topological format. The gigantic terminology is also associated with huge networks with a higher requirement for QoS, which implies the extension of these algorithms to 5g, 4g and 3g networks. Among these formats the best ones, V3g, V4g and V5g, are selected as they attain tangible enhancement. ML algorithms are exploited for all routing protocols and their variants based on the Table 1 configurations.
It is important to mention that the selections are made through ML, and that the fixed optimal configurations and the application of random early detection on expanding ring search parameters are also proposed by our research and have not been done before. Also, as shown in Tables 6, 7, 8 and 9, the ML algorithms are exploited for the standard routing protocols as well as the modified versions of the reactive routing protocol with the configurations of Table 1. V-ML is the reactive routing version selected by the ML algorithms with the best performance. Fig. 14 shows the performance of these algorithms and the improvement with the proposed algorithms. It is visible that ML selects the optimal routing paradigm (V-ML) on the basis of QoS metrics.
We have tested the performance of linear, polynomial and RBF kernels in fitting the observed data and selected the best suited kernel for the given test data. The learning rate is set to 0.001 and the regularization parameter is set to 1/epochs; hence the higher the number of epochs (i.e. 10 12 ), the lower the regularization parameter. We have exploited the polynomial kernel method instead of the linear kernel method in order to attain better performance. The polynomial kernel prediction method is expressed mathematically in (3), where the input (x) and support vectors (y) are used to calculate the estimations, p is the degree of the polynomial and c is a free parameter that trades off the influence of higher order versus lower order terms in the polynomial.
K(x, y) = (x · y + c)^p (3)
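To make the roles of p and c concrete, the sketch below evaluates a polynomial kernel of the standard form (x · y + c)^p and combines it with support vector weights and a bias in the way described for Fig. 6. The function names, default values and toy vectors are hypothetical; training of the weights is not shown.

```python
def polynomial_kernel(x, y, p=2, c=1.0):
    """Polynomial kernel (x . y + c) ** p for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + c) ** p

def svr_predict(x, support_vectors, alphas, bias, p=2, c=1.0):
    """Weighted sum of kernel values over the support vectors, plus a bias."""
    return sum(
        alpha * polynomial_kernel(sv, x, p, c)
        for sv, alpha in zip(support_vectors, alphas)
    ) + bias

k_value = polynomial_kernel([1, 2], [3, 4], p=2, c=1.0)              # -> 144.0
y_hat = svr_predict([2, 0], [[1, 0], [0, 1]], [0.5, -0.5], bias=2.0)  # -> 6.0
```

With p = 1 and c = 0 the kernel reduces to the plain dot product, i.e. the linear kernel.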

E. ARTIFICIAL NEURAL NETWORK
Artificial neural networks (ANNs) are complex models with a large number of configuration parameters tuned for a given classification or regression problem. The parameters are configured through an exhaustive process of learning and numerous trial and error checks. The ANN model we have exploited for our data is a feed forward ANN (FFANN), also known as a multilayer perceptron (MLP), that is mainly composed of three layers, i.e. input layer, hidden layer and output layer. This model takes an analogy from natural neural networks, where the building blocks, i.e. neurons, are arranged in layers to perform approximations [41], [42]. The number of hidden layers, number of neurons and activation functions are selected for the given problem based on performance analysis and the trial and error method. This process can also be managed through evolution, as done in evolutionary ANNs, but we have explored a simple feed forward neural network. The model utilizes the supervised back propagation method for training, which learns by updating the connection weights after each run. The learning rate is set to 0.3, momentum to 0.2, validation threshold to 20 and the number of epochs to 500 initially. Fig. 7 depicts a typical FFANN model representation. The model consists of the input layer, which holds the instances of the input features (Att_1, Att_2, ..., Att_n), and the hidden layer, which transforms the values from the input layer with a weighted linear summation (α_1, α_2, ..., α_n) followed by an activation function; the output layer takes the output from the last hidden layer, transforms it into output values and tunes it with a bias. During the back propagation process for training the model, the parameters are updated using the gradient of the loss function with respect to the parameters that need adaptation. MLP uses the square error loss function expressed mathematically in (4).
where α||W||_2^2 is the L2 regularization term that is used as a penalty to control the error and α is the non-negative hyperparameter [43]. MLP is very sensitive to feature scaling and performs well when the data is scaled well.
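A small sketch of a square error loss with an L2 penalty is shown below. The exact constants of (4) are not recoverable from the text, so the factors of one half on both terms are an assumption, as are the function name and the toy numbers.

```python
def mlp_squared_error_loss(y_true, y_pred, weights, alpha):
    """Squared error plus an L2 penalty alpha * ||W||^2 (both halved by assumption)."""
    data_term = 0.5 * sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    penalty = 0.5 * alpha * sum(w * w for w in weights)
    return data_term + penalty

loss = mlp_squared_error_loss([1.0, 2.0], [1.5, 1.0], [1.0, -2.0], alpha=0.1)
```

Increasing `alpha` raises the cost of large weights, which is how the penalty discourages overly complex models.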

IV. METRICS FOR MACHINE LEARNING TECHNIQUES
A. INPUT FEATURES FOR MACHINE LEARNING TECHNIQUES
1) NETWORK DENSITY
The network size in terms of the total number of nodes in the network defines the network density. The network density affects E2E delay, throughput and PDR, as denser networks may cause congestion, signal interference and retransmissions, and vice versa. We have used network density as an input to our ML models and observed the response of the network in terms of E2E delay, throughput and PDR. The network density is varied from 5 to 100 nodes with a varying node to sink ratio.

2) MAX NODE SPEED
Max node speed specifies the upper velocity limit up to which the nodes can move. The node velocity also has a significant effect on the QoS. Very high or very low speeds tend to negatively affect E2E delay, throughput and PDR. We have taken speeds ranging from 1 to 100 m/s for our experiments, taking into account the normal speeds of humans, ground vehicles and air vehicles.

3) TERRAIN SIZE
The terrain size is the physical area of the network in which the nodes are scattered, taken in x and y coordinates and expressed in square meters. The terrain size also has an impact on the QoS, as larger terrains tend to scatter the nodes far away from each other whereas too small terrains tend to congest the nodes, and both impact E2E delay, throughput and PDR. We have observed terrains ranging from 300 × 500 m^2 to 3000 × 3200 m^2.

B. OUTPUTS FOR REGRESSION SCENARIO OF MACHINE LEARNING TECHNIQUES
1) END TO END DELAY
The time taken by a packet to travel from the source node to the destination node over the intermediate nodes of the network is the E2E delay. The E2E delay is the sum of all delays, such as request processing delays, buffering delays of route discovery, retransmission delays, queuing delays, and propagation and transfer delays. The average E2E delay for n packets can be measured using (5), which is the average of the differences between the time a packet is received and the time the packet is sent. E2E delay measures the capability of a specific protocol and architecture used to communicate between nodes, besides the noise profile and media access mode. In simulations the delay is measured in nanoseconds and then converted into seconds.

2) THROUGHPUT
The average data rate, or rate of packets being received at destination nodes from source nodes, is the throughput of the network. Throughput is also sometimes measured as the bandwidth of the channel. We have measured throughput in kbps using (6). The bytes are converted to bits to be tallied in kbps; if packets are used, throughput is measured in number of packets received.
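Both output metrics can be computed from simulation timestamps and byte counts as sketched below. The function names are hypothetical, the timestamps are assumed to be in nanoseconds as stated above, and 1 kbps is taken as 1000 bits per second; the exact form of (5) and (6) in the original may differ in detail.

```python
def average_e2e_delay_s(sent_ns, received_ns):
    """Mean of (receive time - send time) over all packets, converted from ns to s."""
    delays = [rx - tx for tx, rx in zip(sent_ns, received_ns)]
    return sum(delays) / len(delays) / 1e9

def throughput_kbps(received_bytes, duration_s):
    """Received bytes converted to bits, per second, expressed in kbps."""
    return received_bytes * 8 / duration_s / 1000

delay = average_e2e_delay_s([0, 1_000_000_000], [500_000_000, 2_000_000_000])  # -> 0.75
rate = throughput_kbps(125_000, 10)                                             # -> 100.0
```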

3) PACKETS DELIVERY RATIO
The measure of successful delivery of packets from source to destination is the Packets Delivery Ratio (PDR). This metric is also used to measure the efficiency of the network protocol or architecture exploited. A low PDR may lead to congestion due to retransmissions or incomplete data transmissions. The PDR is measured using (7).
PDR = Packets_Received/Packets_Transferred * 100 (7)
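Expression (7) translates directly into code. The function name is hypothetical, and returning 0.0 when no packets were transferred is an assumption made here to avoid a division by zero.

```python
def packet_delivery_ratio(packets_received, packets_transferred):
    """PDR in percent, following expression (7)."""
    if packets_transferred == 0:
        return 0.0  # assumption: nothing sent counts as 0 % delivered
    return packets_received / packets_transferred * 100

pdr_percent = packet_delivery_ratio(45, 50)  # -> 90.0
```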

C. PARAMETERS AUTOMATION FOR MANET ROUTING
The ultimate goal of this work is to automate the mentioned parameters and more parameters for improved performance of MANETs routing. We have done both the automation of parameters and improvement of QoS metrics with ML algorithms. Among these parameters, those for ERS such as thresholds, drop rates and upper bounds are already automated and selected. But since the ranges for these parameters were manually selected from the predictions of simulations results in NS-3 and then training the ML algorithms on same values imply partial automation, hence we have claimed and used automated values. The regression of these parameters is done with ML algorithms for improving QoS parameters, which is the aim of this paper and that is why we have focused on these results. The automation of all mentioned and more parameters is an exhaustive task in process and requires respective details, methods and explanations that we aim to contribute as future enhancement of this research as mentioned in the conclusion section. The details of parameters that we are automating are as follows:

1) SIZE OF BUFFER
Every node in the network has a queuing buffer that stores the packets before being processed for routing. When nodes in the network send packets at a higher rate and the processing of packets takes longer, buffer overflow occurs due to congestion, which may cause loss of packets before being queued. The maximum limit for a node buffer is already specified by the routing protocol, i.e. the max buffer limit for AODV is 64 packets and that for DSR is 50 packets. The size of buffer can be calculated using the mathematical expression given in (8).
Size_Buff_remaining = Length_Buff_Max - Packets_Buff_occupied (8)
where Size_Buff_remaining is the remaining buffer capacity, Length_Buff_Max is the max buffer limit as defined by the routing protocol and Packets_Buff_occupied is the number of packets already occupying the buffer.
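A minimal sketch of the remaining-capacity calculation, assuming it is the max buffer limit minus the packets already queued, as the definitions above suggest. The function name and the range check are additions for illustration.

```python
def remaining_buffer(length_buff_max, packets_buff_occupied):
    """Free slots left in a node buffer (max limit minus occupied slots)."""
    if not 0 <= packets_buff_occupied <= length_buff_max:
        raise ValueError("occupied packets must lie between 0 and the max limit")
    return length_buff_max - packets_buff_occupied

free_slots = remaining_buffer(64, 40)  # AODV limit of 64 from the text -> 24
```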

2) BUFFER OUTSTANDING TIME
When a packet reaches an intermediate or target node, it is first queued in the node buffer. The packet has to wait for the buffer outstanding time (T_Outstanding) before being handled by the routing protocol. The packet is allowed to wait up to the max buffer time, after which it is dropped for retransmission. The buffer outstanding time, also known as waiting time or queue buffering time, is responsible for queuing delays. When packets have a longer buffer outstanding time due to congestion or any other reason, the situation may worsen further in the form of increased congestion, packet drops or denial of service (DoS). Buffer outstanding time is expressed mathematically in (9),
where T_Outstanding is the residual time of a packet, T_Max is the total allowed waiting time (i.e. 30 s for DSR, DSDV and AODV), T_Arrival is the arrival time of the packet and T_Current is the time at which the residual time is calculated.
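The residual waiting time can be sketched as follows, assuming that (9) has the form T_Outstanding = T_Max - (T_Current - T_Arrival), which is what the definitions above imply. The function names are hypothetical, and the 30 s default is the value quoted in the text.

```python
T_MAX_SECONDS = 30.0  # total allowed waiting time quoted in the text


def outstanding_time(t_arrival: float, t_current: float,
                     t_max: float = T_MAX_SECONDS) -> float:
    """Residual time a queued packet may still wait (assumed form of (9))."""
    waited = t_current - t_arrival
    return t_max - waited


def should_drop(t_arrival: float, t_current: float,
                t_max: float = T_MAX_SECONDS) -> bool:
    """A packet is dropped once its residual waiting time is used up."""
    return outstanding_time(t_arrival, t_current, t_max) <= 0.0


if __name__ == "__main__":
    # A packet that arrived at t = 10 s and is inspected at t = 22.5 s.
    print(outstanding_time(10.0, 22.5))
    print(should_drop(10.0, 22.5))
```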

3) NODE ROUTE ATTEMPTS
The sender node attempts to send a packet on a route towards the target once or several times, depending on whether the target is reached in a single attempt or in multiple attempts. When a node does not receive a route reply within the specified time, it retransmits the packet until it receives a reply and the packet reaches its target. Allowing infinite retransmissions may lead to congestion if the target is unreachable and the network density is high. Hence routing protocols in MANETs specify a fixed number of retransmissions allowed for a packet to reach its destination. Each time the node retransmits a copy of the original packet, the route attempt count is incremented, as illustrated in (10).
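The bounded retransmission behavior described above can be sketched as a simple loop. This is an illustrative sketch only: `send_with_retries` and the `try_send` callback are hypothetical names, and the increment of the attempt counter is an assumed reading of (10).

```python
from typing import Callable, Tuple


def send_with_retries(try_send: Callable[[], bool],
                      max_attempts: int) -> Tuple[bool, int]:
    """Attempt a transmission until a reply arrives or the limit is reached.

    try_send returns True when a route reply is received in time.
    Returns (delivered, number of attempts made).
    """
    attempts = 0
    while attempts < max_attempts:
        attempts += 1  # assumed form of (10): one increment per attempt
        if try_send():
            return True, attempts
    return False, attempts


if __name__ == "__main__":
    # Simulated outcomes: two timeouts followed by a successful reply.
    outcomes = iter([False, False, True])
    print(send_with_retries(lambda: next(outcomes), max_attempts=5))
```

Capping `max_attempts` is what prevents an unreachable target from generating unbounded retransmission traffic.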

4) NUMBER OF HOPS
The number of hops is the number of node-to-node steps taken by a packet after it is dispatched by the sender towards the destination. A count is kept in the packet header and incremented each time the packet travels from one node to another. A higher hop count indicates that the packet has spent more time in the network, and more packets with higher hop counts result in high bandwidth consumption that ultimately leads to congestion at the transport layer. The number of hops is updated as illustrated mathematically in (11) when a packet moves from one node (N_i) to another (N_j).
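A minimal sketch of the hop counter kept in the packet header is given below. It assumes that (11) increments the count by one for each move from N_i to N_j; the `Packet` class and `forward` function are hypothetical and do not correspond to NS-3 types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    hop_count: int = 0                              # counter kept in the header
    path: List[str] = field(default_factory=list)   # nodes visited so far


def forward(packet: Packet, node_i: str, node_j: str) -> Packet:
    """Move the packet from node_i to node_j and increment its hop count."""
    if not packet.path:
        packet.path.append(node_i)  # record the source on the first hop
    packet.path.append(node_j)
    packet.hop_count += 1           # assumed form of (11)
    return packet


if __name__ == "__main__":
    p = Packet()
    forward(p, "N1", "N2")
    forward(p, "N2", "N3")
    print(p.hop_count, p.path)
```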

5) PACKET EXPIRY TIME
When a packet starts traversing the network towards its destination, it remains in the network for a specified time known as the packet expiry time, after which it leaves the network. The packet expiry time is specified by the routing protocol when the packet leaves the source node and keeps decrementing until the packet either reaches the target node or leaves the network. The packet expiry time can be expressed mathematically as shown in (12),
where T_Expiry is the packet expiry time, T_Start is the time when the packet leaves the source node, T_TTL is the total time of the packet in the network and T_Current is the current time of the packet in the network.
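The decrementing expiry time can be sketched as follows, assuming that (12) has the form T_Expiry = T_TTL - (T_Current - T_Start), consistent with the definitions above. The function names and the example values are hypothetical.

```python
def packet_expiry_time(t_start: float, t_current: float, t_ttl: float) -> float:
    """Remaining lifetime of a packet in the network (assumed form of (12))."""
    elapsed = t_current - t_start
    return t_ttl - elapsed


def has_expired(t_start: float, t_current: float, t_ttl: float) -> bool:
    """The packet leaves the network once its remaining lifetime reaches zero."""
    return packet_expiry_time(t_start, t_current, t_ttl) <= 0.0


if __name__ == "__main__":
    # Packet sent at t = 2 s with a 10 s lifetime, inspected at t = 5 s.
    print(packet_expiry_time(2.0, 5.0, 10.0))
    print(has_expired(2.0, 12.0, 10.0))
```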

V. PROPOSED METHODOLOGY
To implement our algorithms for attaining better routing performance in MANETs, we use the open source Network Simulator-3 (NS-3) [44]-[46] to implement the topologies that generate the routing data used as input for the training and testing phases of the ML algorithms. Initially, MANET scenarios are implemented to monitor the performance of existing routing protocols such as AODV, DSR, DSDV and OLSR in terms of E2E delay, throughput and PDR. Once enough data has been generated through exhaustive simulations, modified ERS and RED mechanisms are implemented with the parametric setup shown in Table 1 for various topologies and monitored again for E2E delay, PDR and throughput. Routing data of the three best scenarios, i.e. V5g, V4g and V3g, are used for training and testing our ML techniques. These variants are versions of the reactive routing protocols with the Table 1 configurations. Two modes of each version are produced because networks behave differently in different topological arrangements such as network densities, max node speeds and terrain sizes. Different parametric setups are prepared for adverse topologies (MOD1), where the routing performance of the network declines, and for normal topologies (MOD2), where the network already performs well [47]-[49]. Other parameters include time to expire (Time_Exp), route retries (Route_Ret), buffer waiting time (Wait_Time), size of buffer (Size_Buff), thresholds for congestion detection (Thrsh_1, Thrsh_2 and Thrsh_3), upper bounds for packet dropping (UP_B1, UP_B2 and UP_B3) and packet drop rates (Drop_Rate) [34]. The details and mathematical explanations for all these parameters are given in Section VI (A). Fig. 8 illustrates the thresholds, drop rates and upper bounds employed for the RED implementation. The values are taken from the ERS implementation of AODV. Fig. 9 illustrates the proposed methodology in the form of a block diagram.
Initially, the MANET topology is implemented in NS-3 and routing data is gathered from existing standard routing protocols and their modified versions based on ERS and RED parameters. Important features are extracted from the data for training the ML algorithms, and the ML techniques are tested for their performance.
We have carried out both the automation of parameters and the improvement of QoS metrics with ML algorithms. Initially, we selected the best suited fixed parameters manually under given network setups or modes. Among these twelve parameters, those for random early detection, such as thresholds, drop rates and upper bounds, are already automated and selected through a heuristic algorithm. Since the ranges for these parameters were selected manually based on simulation results in NS-3, training the ML algorithms on the same values implies partial automation. It is important to mention that the fixed configurations in Table 1 for routing were obtained manually through simulations and are proposed by our research [34]. These fixed configurations perform well in the network modes or setups for which they are intended, but to automate their selection we have employed ML algorithms. In this way we first automate the selection of these fixed configurations to improve the QoS metrics; automating all parameters irrespective of any fixed configuration is ongoing work that extends this paper. Hence we claim and use automated values and, in keeping with the title and major contribution of the manuscript, i.e. prediction of scenarios, the ML algorithms are trained on the network parameters and the fixed configurations. The regression of these parameters is done with ML algorithms to improve the QoS parameters, which is the aim of this paper and the reason we focus on these results.
The performance of each ML algorithm is evaluated for each protocol separately under varying network topologies, and the best performing techniques are then used for future estimations. These estimations are used in regression scenarios to predict the outputs for each protocol in terms of E2E delay, throughput and PDR in order to select the best suited routing protocol among the monitored variants. The estimations are then used in testing to predict the performance of the protocols in the situations under consideration, and the protocol with optimal performance is selected. The ML results are compared with the actual simulation results for evaluation.
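The final selection step can be sketched as picking the variant with the best predicted QoS. The text does not state how the three metrics are combined, so the equal-weight score below (normalized throughput plus PDR minus E2E delay) is purely an assumption for illustration, as are the function name and the numeric values; only the variant names V3g, V4g and V5g come from the text.

```python
from typing import Dict


def select_variant(predictions: Dict[str, Dict[str, float]]) -> str:
    """Return the variant whose predicted metrics give the highest score.

    Assumed score on normalized values: throughput + pdr - e2e_delay,
    so higher throughput and PDR help while higher delay hurts.
    """
    def score(metrics: Dict[str, float]) -> float:
        return metrics["throughput"] + metrics["pdr"] - metrics["e2e_delay"]

    return max(predictions, key=lambda name: score(predictions[name]))


if __name__ == "__main__":
    # Hypothetical normalized predictions from the trained regressors.
    predicted = {
        "V3g": {"throughput": 0.60, "pdr": 0.70, "e2e_delay": 0.40},
        "V4g": {"throughput": 0.80, "pdr": 0.75, "e2e_delay": 0.30},
        "V5g": {"throughput": 0.70, "pdr": 0.90, "e2e_delay": 0.20},
    }
    print(select_variant(predicted))
```

In practice the weighting of the three metrics would depend on the QoS requirements of the application under consideration.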

VI. EXPERIMENTAL SETUP AND SIMULATION ENVIRONMENT
A. SETUP FOR MACHINE LEARNING ALGORITHMS
For all ML methods, 10-fold cross validation is applied to the input data to create the model. Almost 675 instances are used for throughput and 294 instances are used for E2E delay and PDR. For the LR model, the M5 method is used for attribute selection in most cases, while in some cases the greedy method is used depending upon the performance, and a batch size of 100 is used. For our data, we removed highly correlated input attributes in order to achieve higher efficiency. Similarly, attributes irrelevant to the output variable are also removed with feature selection methods. For KNN, the value of k is selected automatically using cross validation. Euclidean distance is used to measure the distance between data points because the routing data is on the same scale.
The linear NN search method is used for storing and searching the data: a linear search finds the nearest neighbor, and no windowing is used for any problem. We tested the performance of SVM with linear, polynomial and RBF kernels to fit the observed data and selected the best suited kernel for the given test data. The learning rate is set to 0.001 and the regularization parameter is set to 1/epochs; hence the higher the number of epochs (i.e. 10 12 ), the lower the regularization parameter. We used the polynomial kernel instead of the linear kernel in order to attain better performance. The learning rate of the ANN is set to 0.3, the momentum to 0.2, the validation threshold to 20 and the number of epochs to 500 initially. Tables 2, 3 and 4 give the specifications of the E2E delay, throughput and PDR data, respectively, used for running the ML experiments. The data specifications are given in terms of minimum values (Dmin, Thmin, Pmin), maximum values (Dmax, Thmax, Pmax), averages (Davg, Thavg, Pavg) and standard deviations (Dsdv, Thsdv, Psdv). The data has been normalized for the experiments using (13), where X = {x_1, x_2, x_3, ..., x_n} is the set of input data and Norm(x_i) is the i-th normalized data item.
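Two elements of this setup, the 10-fold split and the linear-search nearest neighbor regression with Euclidean distance, can be sketched in plain Python. This is an illustrative sketch rather than the toolkit configuration used for the experiments; the function names, the contiguous fold layout and the toy data are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, folds: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into `folds` contiguous (train, test) index pairs."""
    sizes = [n // folds + (1 if i < n % folds else 0) for i in range(folds)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[float],
                query: Sequence[float], k: int) -> float:
    """k-NN regression: scan all points linearly and average the k nearest."""
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


if __name__ == "__main__":
    # 294 instances, as used for E2E delay and PDR, split into 10 folds.
    splits = k_fold_indices(294, folds=10)
    print([len(test) for _, test in splits])
    # Toy one-feature data set (hypothetical values).
    xs = [(0.0,), (1.0,), (2.0,), (10.0,)]
    ys = [0.0, 1.0, 2.0, 10.0]
    print(knn_predict(xs, ys, (1.1,), k=3))
```

Selecting k by cross validation would amount to repeating the prediction over each fold for several candidate values of k and keeping the one with the lowest validation error.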

B. SETUP FOR NETWORK SIMULATIONS
We have implemented our MANET topologies using the open source discrete event network simulation tool NS-3, which is extensively used in wireless ad hoc networking research. The details of the parametric setup and considerations for the MANET topologies are provided in Table 5. We used varying topologies to generate routing data for training our ML techniques; hence some of the parameters are taken as input features and are varied across scenarios. Parameters that are kept constant are tabulated in Table 5.

C. METRICS FOR PERFORMANCE EVALUATION
1) MEAN ABSOLUTE ERROR
The performance of the ML techniques is evaluated on the basis of Mean Absolute Error (MAE), which is a measure of the error between paired observations for a specified problem scenario [51]. For a given input set of k instances, where x_i' are the estimated values for the observations x_i, the MAE can be expressed mathematically as shown in (14).
MAE is the arithmetic mean of the absolute errors (e_i = |x_i' − x_i|); it measures the average magnitude of the errors, where x_i' is the prediction and x_i is the true value. MAE is a linear score, i.e. all individual differences are weighted equally in the average.
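A minimal sketch of this metric, using the standard definition of MAE as the mean of |x_i' - x_i| over k instances, is given below. The function name and the sample values are hypothetical.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float],
                        observed: Sequence[float]) -> float:
    """Average magnitude of the errors between predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - o) for p, o in zip(predicted, observed))
    return total / len(predicted)


if __name__ == "__main__":
    # Hypothetical predictions against hypothetical observed values.
    print(mean_absolute_error([3.0, 0.0], [0.0, 4.0]))
```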

2) ROOT MEAN SQUARED ERROR
Root Mean Squared Error (RMSE) is also used for measuring the performance of the ML techniques in terms of the standard deviation of the estimation errors [51]. RMSE is a measure of the distance of the data from the line of best fit. RMSE can be expressed mathematically as shown in (15).
Hence RMSE is a quadratic scoring rule that takes the square root of the mean of the squared differences between the estimated data x_i' and the observed data x_i. RMSE is a good estimator of the standard deviation of the distribution of the errors.
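A minimal sketch of this metric, using the standard definition of RMSE as the square root of the mean of (x_i' - x_i)^2 over k instances, is given below. The function name and the sample values are hypothetical.

```python
import math
from typing import Sequence


def root_mean_squared_error(predicted: Sequence[float],
                            observed: Sequence[float]) -> float:
    """Square root of the mean squared difference between the two series."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Errors of 3 and 4: the squaring weights the larger error more heavily.
    print(root_mean_squared_error([3.0, 0.0], [0.0, 4.0]))
```

Because the errors are squared before averaging, RMSE is never smaller than the mean absolute error computed on the same data, and the gap grows as the errors become more unequal.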

VII. SIMULATION RESULTS AND DISCUSSIONS
This section summarizes the results of the proposed algorithms and compares them with standard protocols for performance evaluation. Table 6 lists the best RMSE and MAE attained by the five regression ML algorithms for E2E delay, PDR and throughput under varying network topology in terms of a single predictor variable, i.e. network density, terrain size or max node speed. The results show that the FF-ANN/MLP method attained the best results for estimations in varying network topologies compared to the other ML algorithms used in this work. For throughput, MLP attained RMSE and MAE as low as 0.0289 and 0.0227, respectively, in varying network terrain. Our overall results suggest the use of MLP for estimations in most network topologies in order to attain the best results. The LR algorithm also performs well for estimations in varying network densities. The performance of the other techniques, particularly DT, is not as good in most scenarios. Fig. 10 shows the estimations by the ML techniques versus the actual values used for testing for (a) E2E delay in varying network densities, (b) PDR in varying terrain and (c) throughput in varying speed. It can be observed that the ML techniques have attained estimations close to the actual routing data. Table 7 provides the minimum RMSE and MAE for standard routing protocols when multiple network parameters are varied. Table 7 shows that the least RMSE and MAE for throughput, PDR and E2E delay prediction are attained by KNN in most scenarios. These results suggest the use of KNN in cases with multiple input features. The least RMSE and MAE attained for E2E delay prediction are 0.1718 and 0.116, respectively; these values are 0.1878 and 0.124 for PDR prediction and 0.0407 and 0.071 for throughput prediction. The overall performance of the ML algorithms is good for throughput prediction and worst for E2E delay prediction.
This implies that further refinement and preprocessing of the input data are needed to improve the performance of the ML techniques. Fig. 11 illustrates the estimations of the ML techniques for (a) E2E delay, (b) PDR and (c) throughput when multiple features are extracted for training the ML algorithms. The predictions for E2E delay are not as good as those for PDR and throughput. It can be observed that the predictions with multiple input features are not as close to the actual values as in the single input feature scenarios. The E2E delay predictions do not come close to the actual values, as reflected in the comparatively high RMSE. This implies that the input data for training and testing needs improvement. Fig. 12 and Fig. 13 illustrate the performance of all ML techniques used in this work in terms of the least RMSE attained for predicting E2E delay, PDR and throughput with single and multiple input features, respectively. It can be observed in Fig. 12 that KNN attains the least error for all estimations, whereas DT has comparatively low performance in the multi-feature scenario. The performance of FF-ANN is also comparatively better, while SVM and LR have similar performance. Fig. 13 illustrates that LR has significantly better estimation for E2E delay in the single input feature scenario, while FF-ANN outperforms all others in estimating the PDR. SVM and LR have better performance for throughput prediction in the single input feature scenario.

VIII. CONCLUSION AND FUTURE ENHANCEMENTS
Routing protocols in mobile ad hoc networks (MANETs) employ various parametric arrangements in order to attain optimal performance and improve QoS. Researchers are keen to improve and tune these parameters to enhance the performance of routing protocols in critical scenarios such as link level and node level congestion due to flooding and packet loss. We propose the use of machine learning (ML) techniques to enhance the network response in various topological arrangements by exploiting the parametric setup of the expanding ring search (ERS) mechanism together with the random early detection (RED) technique. The behavior of the ERS mechanism in handling packets is monitored in terms of hop count, buffer utilization and management at the transport level, and retransmission of packets at the network level. Further congestion management techniques are drawn from the RED technique, based on thresholds and packet dropping criteria that involve the levels and rate of packet drop. These mechanisms are monitored, and various versions are created by making changes with respect to network topologies for training the ML techniques. The ML techniques are then employed to select the suitable parametric setup among the versions of routing protocols in critical topological arrangements in order to avoid packet loss and congestion. It is shown that the ML techniques, particularly KNN and MLP, attained high accuracy and low RMSE in predicting the E2E delay, throughput and PDR in both single and multiple input feature scenarios. We have evaluated the six variants of standard routing protocols with defined configurations and tested their performance in order to attain improved QoS. The results verify that performance improves through the selection of optimal variants and, likewise, optimal parametric arrangements.
This work opens further research directions, as we aim to apply classification and heuristic methods to further automate the parametric arrangements of various routing mechanisms such as weighted RED, adaptive RED and other variants of RED and ERS, building on the extensive simulations carried out in this work. These algorithms can be applied to more routing protocols and congestion-prone situations, and more application groups such as IoT, WSN and IETF can be included to attain context aware routing.