An Intelligent Load Balancing Technique for Software Defined Networking Based 5G Using Machine Learning Models

The emergence of two new technologies, namely software defined networking (SDN) and 5G networks, has greatly changed the development of network functions and network topologies. These two technologies provide cost benefits for mobile operators, a more flexible and scalable network, and a shorter time to market for new services and applications. Scalability and effectiveness are increased when 5G and SDN are used together. SDN increases the reliability of the 5G network by separating the control plane from the data plane. Incorrect load balancing, a lack of knowledge of network traffic, and other issues make it difficult to provide Quality of Service (QoS) with SDN. This research proposes a unique load-balancing method to resolve these concerns using Hierarchical Agglomerative Clustering (HAC) and Back Propagation Neural Network (BPNN) algorithms. The proposed method segments the network services into several groups after normalizing data requirements. It consists of two phases: in the first phase, bandwidth is clustered for different services (e.g., social media, automated homes, and automated cars) inside the network according to their different data requirements. Agglomerative hierarchical clustering (the single-link technique) is implemented to form the bandwidth clusters inside the SDN based on minimum distance. After clustering, we allot bandwidth to the respective clusters. In the second phase, the BPNN technique trains the network to choose the optimal path and detect error faults. The proposed approach is evaluated on network delay, packet loss, throughput, latency rate, and bandwidth usage against the Multiple Regression-Based Searching (MRBS) and Software-Defined Sensor Network Load Balancing (SDSNLB) algorithms. The experimental results of the proposed model are promising, as performance increased by 15%, 23%, 27%, 21%, and 30%, respectively, compared with the existing approaches.
In addition, we compared the computational time complexity as the number of nodes and services increases; when the number of nodes and services varies, the proposed solution's efficiency remains constant.


I. INTRODUCTION
In traditional networks [1], [2], both the management and control planes are applied along with the data plane at each network device, such as switches/routers, which creates many drawbacks, like network oscillation problems, manual configuration of each network device, etc. (The associate editor coordinating the review of this manuscript and approving it for publication was Maurizio Casoni.)
Software-defined networking (SDN) [3], [4] is an emerging technology that physically splits the control and management planes from the data plane by applying both planes at a logically centralized entity called the SDN controller. SDN [5] intends to make the network increasingly versatile and scalable. The controller handles the traffic flow and allows smart networking between network devices and apps. In this framework, the operation of the network becomes programmable. Figure 1 shows the basic structure of SDN, which contains three layers; northbound and southbound APIs are used to connect these layers. The infrastructure layer, called the data plane, comprises physical network switches. It connects to the control layer via southbound interfaces. The control layer resides between the infrastructure and application layers and serves as the SDN brain. This controller is located on a server and handles network regulations as well as traffic flow. The application layer transmits the network needs to the control layer via northbound interfaces. Fifth-generation (5G) networking [6] is becoming one of the most active research fields. An advanced, virtualized, configurable 5G [7] network would allow operators to evolve their activities and new services. They can offer new facilities to customers and hence increase their overall performance. SDN [8] has presented a promising method for these networks and will play an important role in the architecture of 5G wireless networks. 5G [9] builds on a user-centric idea rather than an operator-centric concept. 5G and SDN [10], [11] allow innovative business models and improve sales growth. The SDN [12], [13] architecture with 5G would be flexible, strongly customizable, and cost-effective. It is suitable for high-bandwidth 5G applications. The 5G network [14] consists of a huge number of devices, applications, and technologies. 5G with SDN [15], [16] provides an intelligent
architecture for network programmability and can create multiple network hierarchies. Figure 2 shows the integration of SDN with 5G.
The broad use of 5G networks [17], [18], the massive growth of wireless networks, and the convergence of new network technologies demand load balancing (e.g., link load, rerouting, and updating). Different services (e.g., social media, automated homes, and automated cars) run within the network, as shown in figure 2; therefore, the bandwidth requirement for each service is also different. Chen-Xiao et al. [19] introduced a load-balance resolution system for SDN using an Artificial Neural Network (ANN) with the benefit of a global view. It selects the real-time packet route; the load balancer automatically calculates the stability of the various paths received from the SDN controller. After getting the route for transmission, the load balancer assigns the routing table for OpenFlow control to achieve the data-flow transmission. Yu et al. [20] present a new load-balancing architecture based upon SDN that uses Back Propagation, Q-learning, and Dijkstra algorithms for flow forecasting and efficient path selection. The SDN controllers monitor and collect real-time traffic information and upload it to the intelligent centre's database. The intelligent centre uses intelligent algorithms to make routing decisions based on the information gathered by the SDN controllers and returns the decisions to the SDN controllers. The SDN controllers rewrite the flow table of the SDN switch. Aly et al.
[21] presented a Controller Adaptive Load Balancing (CALB) algorithm for fault tolerance in SDN. The proposed algorithm works under two circumstances: either a primary controller failure or an imbalanced network load. It uses a switch migration technique and a primary controller election algorithm. The primary controller divides the switches according to the workload list stored at the primary controller. Despite the numerous existing research methods for load balancing, most solutions are not flexible enough to adapt to the characteristics of 5G networks. No existing method balances the load between different services with different data requirements within the network. Our primary goal of traffic classification in SDN is to provide load balancing and QoS for incoming traffic. We propose a load-balancing mechanism that uses machine learning techniques to manage the load of an SDN-based 5G network. The contribution of this paper is threefold. Firstly, bandwidth is clustered for different services (e.g., social media, automated homes, and automated cars) inside the network according to their different data requirements; the agglomerative clustering algorithm is applied to make the clusters inside the SDN according to the different service requirements. Secondly, the Back Propagation Neural Network (BPNN) technique is used for optimal path selection and fault localization. Thirdly, we compare the simulation results for different factors (network delay, packet loss, throughput, latency rate, computational time complexity with an increasing number of nodes and services, and bandwidth) with existing techniques.
The remainder of this paper is organized as follows. Section II presents an extensive analysis of the related work. In section III, the problem statement is discussed. In section IV, the proposed model is described in detail. In section V, the simulation and performance analysis are explained with a comparison to traditional methods, and section VI concludes the paper.

II. RELATED WORK
Begam et al. [22] presented a multiple regression-based searching (MRBS) algorithm in a Dynamic Circuit Network (DCN) for best server allocation and routing-path selection to increase performance even under severe load situations such as message spiking, varying message frequencies, and unexpected traffic flows. MRBS chooses a server based on a regression analysis that estimates traffic patterns and response times depending on server data factors such as load, processing speed, bandwidth, and server usage. MRBS combines a heuristic method with a regression model for appropriate server and path selection. According to stochastic-gradient-descent weight estimation, the suggested technique decreases delay and time by more than 45% and exhibits a superior utilization of 83% compared to existing algorithms. The proposed model of Begam et al. [22] is shown in figure 3. Khairi et al. [23] offer multiple machine learning techniques for identifying and categorizing conflicting flows in SDNs, including Decision Tree (DT), Support Vector Machine (SVM), Extremely Fast Decision Tree (EFDT), and a Hybrid (DT-SVM). For better performance, the EFDT and hybrid DT-SVM algorithms were created and applied based on the DT and SVM algorithms. The experimental findings of conflict-flow identification reveal that the DT and SVM algorithms reach 99.27% and 98.53% accuracy, respectively, whereas the EFDT and hybrid DT-SVM algorithms obtain 99.49% and 99.27% accuracy. Furthermore, the suggested EFDT approach obtained 95.73% accuracy in classifying conflict-flow categories. The suggested EFDT and hybrid DT-SVM algorithms demonstrate an excellent ability to rapidly detect and categorize conflict flows in SDN operation. Figure 4 shows the proposed model of Khairi et al. [23]. Lin et al.
[24] construct a new analytical model to estimate the random access performance of a 5G NB-IoT system. The authors suggest decoupling random access control in 5G NB-IoT systems into congestion control coordination and coverage class adaptation. It tackles the serious problems of congestion and system throughput. Precisely, the random access control is gradually controlled by a central control plane. Under the guidance of the SDN controller, the adaptation of the coverage classification can be carried out progressively through IoT units. Results show that bursty traffic is spread equally between base stations to balance the loads, and it is possible to greatly reduce the complexity of network control. It demonstrates consistency and scalability.
Chahlaoui et al. [25] examine the effect of the load-balancing methods and of the SDN controller configuration on the efficiency of software-defined networks. The authors present a thorough study of several load-balancing techniques and methodologies in centralized and distributed SDN architectures. The authors also implement load-balancing techniques for 5G networks. Hongvanthong et al. [26] proposed a Smart Traffic Analyzer on the application layer that classifies the data from the user plane. The Enriched Neuro-Fuzzy (ENF) clustering algorithm is also presented for traffic analysis. The main and secondary load balancers are deployed in the load-balancing plane; maintaining the load between switches is the responsibility of this plane. Switch migration is outlined for controller load balancing. The entropy function predicts an overloaded controller; then the Fitness-based Reinforcement Learning (F-RL) algorithm makes the migration decision. Finally, the four-layer SDN-5G networks are modelled in NS-3.26. The findings show the error ratio, packet transmission ratio, latency, and RR time. The presented work would improve SD-5G networks.
Cui et al. [27] proposed different load-balancing techniques based on SDN. Flow configuration and neural-network conditional motions are recommended for 5G data centre networks for resource utilization and task scheduling. In this proposed solution, data is divided in two ways: elephant flow and mice flow. They are subsequently redirected by using different approaches. If the reroute mechanism needs to be triggered, the meaning of the various motions of connection deployment is determined. Sun et al. [28] introduced the MARVEL scheme using artificial intelligence for controller load balancing in SDN. They used multi-agent reinforcement learning (MARL) approaches for dynamic controller load balancing, which balances the workload through switch migration actions. It operates in two phases (i.e., offline preparation and online decision-making). For each agent in the MARL technique, a deep reinforcement learning (DRL) model is designed, which accepts the workflow sequence of the application layer as input and generates a switch migration decision as output. In the training phase, the DRL agent learns, and in the online phase, it decides on switch migration. Performance evaluation shows that their scheme outperforms the previous schemes by reducing the processing time and request processing of the control plane in SDN. Figure 5 shows the proposed model of Sun et al. [28].
Shi et al. [29] proposed a load-balancing strategy based on the OpenFlow protocol that splits elephant and mouse zones. Their proposed scheme works for data centre networks with different flow types while enforcing distinct routing, detecting elephant flows, and obtaining optimal network utilization capabilities. The feasibility of this load-balancing strategy was verified by implementing it with the POX controller in the Mininet testbed. The verification of the proposed technique was performed while dealing with network degradation and flow conflict. They show that their load-balancing strategy can increase bandwidth-usage performance and generate flow rules compared to previous load-balancing solutions.
Qu et al. [30] propose a traffic engineering (TE) framework for efficient resource management between slices to prevent congestion and the resulting QoS degradation. The TE problem is formulated for multiple objectives with mixed-integer linear optimization, and a time-efficient heuristic algorithm has been introduced. The proposed TE system is assisted through the NFV built-in framework of two-level SDN controllers situated in the user and network dimensions. Their analysis determines the efficiency of the presented TE system with respect to Quality-of-Service assurance. Cui et al. [32] presented a load-balancing framework based upon a software-defined wireless sensor network (SDWSN) to satisfy availability and high-throughput specifications. This mechanism utilizes the benefit of unified SDN management and versatile congestion control. In smart cities and wireless communication infrastructures, how data transmission is regulated can directly impact the quality of the network. The conventional WSN load-balancing technique does not fulfil the resilience and high-throughput criteria. The authors discussed the load-balancing problems and examined the SDWSN for multipath optimization. It is a route-optimization infrastructure based on the Elman neural network. To maximize link utilization, the authors derive a mathematical model based upon multipath route technologies that satisfies QoS. The Elman neural network is used for the optimization calculation, and an enhanced load-balancing forwarding direction is achieved. It can also include software-defined sensor networks with programmable data-flow control methods. The SDSNLB routing algorithm performs better than the LEACH algorithm. This work has practical guidance and implementation relevance for optimizing the connection load balancing of smart-city wireless network nodes.

III. PROBLEM FORMULATION
A load-balancing approach is used in networks to accomplish the goal of optimum utilization across multiple services. The load-balancing mechanism distributes the network load across different network components according to the bandwidth demands (e.g., upgradation and rerouting) for various services. Network upgrades are essential for more dynamic and fine-grained networks (e.g., minimizing the network loads) and for the capability to reroute flows for consistency, accessibility, and overall performance. Network upgrading issues occur due to improvements in the network's policies (security), connection faults, or routine maintenance. In our scenario, we balance the load across links and routers. Figure 6 shows the congestion and updating problem in 5G networks with different services. Figure 6 contains 21 nodes (active and passive) with different links. Different services (automated cars, automated homes, and social media) exist, including some sources and destinations. Each link has its own capacity, and data packets move over it. The link capacities range from 6 Mbps to 20 Mbps. The problem is that the size of some data packets exceeds the link capacity. Due to a large number of data packets whose size exceeds the link capacity, congestion will result in packet drops. Some routers also need to be updated; thus, the traffic will also need to be rerouted to other routers without loss. The red nodes represent active nodes, and the green nodes show that they need to be updated. The red circle shows the congestion (e.g., nodes 19-20 experience congestion, and the link capacity is less than the packet size). Some problems cannot be solved easily, and it is hard to determine which can and which cannot. Problems that cannot be solved in polynomial time are called NP-hard problems. Many researchers have worked on NP-hard problems, e.g., disjoint paths of bounded length, forwarding indices of routings, edge-addition or edge-deletion problems, and wide diameter. We are also working on load balancing between links and routers, a critical challenge in networks that cannot be solved easily and leads to an NP-hard problem. We formulate the NP-hard problem dynamically regarding congestion-free updating, switching, and rerouting in 5G networks. In addition, we also analyze the bandwidth, link utilization, and efficiency of the framework. Due to the minimum bandwidth availability in networks, it becomes difficult to design effective load-balancing algorithms for wireless networks. When network consumption exceeds the available capacity, everything on the network slows down; it can even halt the network completely. The load-balancing method suggested by Babayigit et al. [33] distributes the network's workload among several sources and prevents resource overloading. After describing the load-balancing issue in SDN, two distinct load-balancing algorithms (Round Robin and Dijkstra) are tested to see which performs better. Chen et al. [34] suggest a software-defined networking (SDN)-based traffic-aware load-balancing method for machine-to-machine (M2M) networks. The suggested load-balancing strategy can meet various quality-of-service needs through traffic identification and rerouting by using SDN's ability to monitor and regulate the network. To enable network flows that need low-latency communications, Filali et al. [35] suggest balancing the load in the SDN control plane beforehand. First, they prepare ahead for data-plane migrations by forecasting the demand on SDN controllers to avoid load imbalances. Second, they optimize the migration activities in order to improve load balancing while adhering to delay limitations. A Blockchain-based Anonymous Anti-Counterfeit (BA2C) supply chain system that takes advantage of RFID and blockchain technology was presented by Anita et al.
[36]. The suggested methodology streamlines all transactions using Ethereum, a blockchain-based platform supporting Proof of Authority (PoA) consensus. If we compare our proposed solution with Babayigit et al. [33], Chen et al. [34], Filali et al. [35], and Anita et al. [36], all of these have used traditional methods for load balancing. None of them explains on what basis load balancing is performed between different services. In our scenario, we allocate the bandwidth with respect to the different services, so that the bandwidth is used on the right service at the right time for better bandwidth utilization and good quality of service for the services (e.g., mobile devices, automated cars, automated homes).
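The per-service bandwidth allocation can be sketched as follows. This is a minimal illustration in Python, assuming the service-to-range mapping given later in section IV (SM: 1-50 Mbps, AH: 1-100 Mbps, AC: 1-150 Mbps); the names `SERVICE_BANDWIDTH_MBPS` and `allocate` are illustrative, not from the paper.

```python
# Bandwidth ranges per service, as assigned in section IV of the paper.
SERVICE_BANDWIDTH_MBPS = {
    "social_media":   (1, 50),
    "automated_home": (1, 100),
    "automated_car":  (1, 150),
}

def allocate(service: str, requested_mbps: int) -> int:
    """Clamp a bandwidth request to the range reserved for its service."""
    low, high = SERVICE_BANDWIDTH_MBPS[service]
    return max(low, min(requested_mbps, high))

print(allocate("social_media", 80))   # clamped down to the 50 Mbps cap
print(allocate("automated_car", 80))  # within range, granted as requested
```

Because each service only ever draws from its own range, a burst on one service (e.g., social media) cannot starve the bandwidth reserved for another (e.g., automated cars).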

IV. PROPOSED MODEL
This research addresses the basic issues of load balancing and the changes of various network flows in a coordinated, congestion-free SDN manner. The frequency band of computer network connections has increased significantly due to the latest technical developments in computer networks. It is a very difficult task, leading us toward an NP-hard problem. With a unit delay, we demonstrate the NP-hard problem for unit-size flows and network connections. To handle the problems discussed in figure 6, we propose an efficient and congestion-free model by implementing clustering algorithms in SDN. Bandwidth is clustered for different services (e.g., social media, automated homes, and automated cars) inside the network according to their different data requirements. The agglomerative clustering algorithm is used to make the clusters inside the SDN. Secondly, the Back Propagation Neural Network (BPNN) trains the network and finds the optimal path. Figure 7 shows the scenario of the proposed model and explains how it works. There are different nodes in the network. The hierarchical agglomerative technique is used to make the clusters of these nodes. A centralized controller or coordinator connects to every cluster, keeps a record of every cluster, and assigns tasks according to the situation. BPNN is used to train the network and find the optimal path, which assists in the creation of statistical models from massive datasets. In addition, our model also provides security features. Attacks that originate from both inside and outside of the SDN environment are recognized and stopped in order to improve the security of the SDN ecosystem. Availability services are implemented to improve application performance against system and network failures, including through the integration of legacy security applications, and compromised network devices are automatically detected and quarantined before they can negatively affect the network. Our proposed model also uses an Intrusion
Detection System (IDS) in our layered architecture, which works in concert against threats coming from both within and outside of SDN. Numerous algorithms that automatically identify and respond to threats are implemented in the IDS's back-end. The SDN gateway's IDS 1 module handles attacks coming from outside of SDN, while IDS 2 handles attacks coming from inside SDN. Table 1 reports the key notations used in the paper.

A. CLUSTERING
Hierarchical clustering is a cluster-analysis technique that aims to create a hierarchy of groups. Agglomerative hierarchical clustering follows a bottom-up approach in which each object starts in its own cluster and the closest clusters are merged step by step until a single cluster remains. In the single-link technique, the distance between two clusters is the minimum distance among their objects. The complexity of the single-link technique is O(n²). We use hierarchical agglomerative clustering of bandwidth within the network for different services (e.g., social media, automated homes, and automated cars). Afterwards, we specify each cluster's speed limit. Basically, we have applied the clustering method to bandwidth regarding service requirements. For Social Media (SM), we assigned data of 1 to 50 Mbps, for Automated Homes (AH) 1 to 100 Mbps, and for Automated Cars (AC) 1 to 150 Mbps. In this manner, we have done load balancing. The main difference between existing work and our proposed solution is bandwidth allocation. Existing solutions do not consider load balancing of bandwidth. Some services require less data or bandwidth; on the other hand, some services require more. If we do not specify the bandwidth regarding the services, we cannot load-balance efficiently between these services. To our knowledge, there is no existing work doing load balancing as we have done. There is an input 9-by-9 distance matrix in figure 9. This distance matrix is calculated based on the object features. We have 9 objects (e.g., a, b, c, d, e, f, g, h, i), and our goal is to group these objects into one single cluster. In the first step, find the closest pair, A and C, with a minimum distance of 2.
Thus, make a cluster of A and C and afterwards update the distance matrix. Next, the dendrogram shows how the clusters are merged into groups. Finally, draw a plot of the cluster hierarchy. The ungrouped objects keep the same values, but the values for the new group AC are recalculated, as shown in figure 9. At this stage, the single-link technique is used to find the minimum distance between two clusters. We again compute the input matrix to find the distances. The new distance-matrix values are updated, and now the closest pair is AC and I, with a minimum distance of 3, as shown in figure 10. Thus, merge AC with I into one cluster. Next, repeat these steps to update the new distance matrix. The new closest clusters, ACIH and B, merge into one cluster as shown in figure 11. We repeat the same procedure for figure 12 and figure 13 and get only a single cluster that contains all 9 objects. Thus, the computation is finished: at the start there were 9 objects (A, B, C, D, E, F, G, H, I), and at the end a single cluster remains. By using this information, draw the final dendrogram and plot the clustering hierarchy, which is based on the distances at which the clusters merge. Figure 14 shows the clusters for different services for load balancing. Here we assign the clusters to the various services and assign the data according to their work to avoid congestion in the network. There are three main clusters (e.g., social media, automated homes, and automated cars). For Social Media (SM), data is assigned from 1 to 50 Mbps, for Automated Homes (AH) 1 to 100 Mbps, and for Automated Cars (AC) 1 to 150 Mbps. In this manner, we have done load balancing, with data assigned to every cluster according to its function. Due to clustering and assigned data, there is no wastage of data; second, no congestion occurs; and third, the latency rate is reduced. Figure 15 shows the proposed model, in which we cluster for the different services and define the speed limit of each cluster. There are 3
clusters: blue, red, and yellow. The red cluster is used for social media, the yellow cluster for automated homes, and the blue cluster for automated cars. Each cluster is connected to a controller, which manages all these clusters. The controller has the entire cluster record, and it decides which cluster to send the traffic to based on the requirements. The implementation of the single-link hierarchical technique is shown in algorithm 1.
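To make the procedure concrete, the following is a minimal, self-contained sketch of single-link agglomerative clustering over a precomputed distance matrix, in the spirit of algorithm 1. This is an illustrative reimplementation, not the paper's code; the function name `single_link_cluster` and the toy three-object matrix are assumptions.

```python
def single_link_cluster(labels, dist):
    """Merge the closest pair of clusters until one cluster remains.

    labels: list of object names, e.g. ["a", "c", "i"]
    dist:   symmetric matrix, dist[i][j] = distance between objects i and j
    Returns the merge history as (cluster_1, cluster_2, merge_distance) tuples.
    """
    idx = {label: i for i, label in enumerate(labels)}
    clusters = [frozenset([label]) for label in labels]

    def d(c1, c2):
        # Single-link distance: minimum pairwise distance between objects.
        return min(dist[idx[a]][idx[b]] for a in c1 for b in c2)

    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the single-link distance.
        best = min(
            ((c1, c2) for i, c1 in enumerate(clusters) for c2 in clusters[i + 1:]),
            key=lambda pair: d(*pair),
        )
        merges.append((set(best[0]), set(best[1]), d(*best)))
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return merges

# Toy example mirroring the walkthrough: a and c are closest (distance 2),
# then the AC cluster merges with i at single-link distance 3.
labels = ["a", "c", "i"]
D = [[0, 2, 3],
     [2, 0, 5],
     [3, 5, 0]]
for left, right, dist_at_merge in single_link_cluster(labels, D):
    print(sorted(left), sorted(right), dist_at_merge)
```

As in the figures, each merge records the minimum distance at which two clusters join; those distances are exactly the heights at which the dendrogram branches meet.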

FIGURE 9. Applying Agglomerative Clustering
Step 1: Using this method, we associate each point with a distinct cluster. Let's suppose there are 5 data points. Each of these points will be assigned to a cluster, resulting in a total of 5 clusters at the start. The nearest pair of clusters is then combined at each iteration, and this process is repeated until only one cluster is left.

B. BACK PROPAGATION NEURAL NETWORK
Back Propagation (BP) belongs to the vast family of Artificial Neural Networks (ANNs), as shown in figure 16, and is constructed of multi-layer feed-forward networks trained using the error-backpropagation technique. It is one of the most extensively used neural network models. A Back Propagation network can be used to learn and store many mapping relations of an input-output model; there is no need to reveal the mathematical model that relates these mapping connections in advance. Its training process employs the gradient-descent approach, in which backpropagation is used to adjust the network's weights and threshold values to attain the smallest error sum of squares. The Back Propagation Neural Network (BPNN) is used to train the network. A neural network is a set of interconnected I/O units, in which each connection has a weight associated with it.
It assists in the creation of statistical models from massive datasets. This model acts like the human nervous system. Back Propagation is essential in neural network training. The fine-tuning technique optimizes the error rate based on previous iterations; by fine-tuning the weights, we can minimize the error rates and make the system more accurate by enhancing its generalization.
Procedure: 1) initialization of weights, 2) feed-forward, 3) back propagation of errors, and 4) updating of weights and biases. In equations (1) and (2), Wjk and Vij indicate the network's weights, which connect to zj and xi, respectively.
1) Initialize the weights to small random values.
2) While the stopping condition is false, do steps 3 to 10.
3) For each training pair, do steps 4 to 9.
4) Each input unit receives the input signal xi and transmits this signal to all units in the hidden layer.
5) Each hidden unit Zj (j = 1 to p) sums its weighted input signals, as in equation (3):
Zinj = V0j + Σ(i=1..n) xi Vij (3)
Applying the activation function Zj = f(Zinj), it sends this signal to all units in the layer above, i.e., the output units.
6) Each output unit Yk (k = 1 to m) sums its weighted input signals, as in equation (4):
Yink = W0k + Σ(j=1..p) Zj Wjk (4)
and applies the activation function to calculate the output signals, as in equation (5), where Yk is the actual output:
Yk = f(Yink) (5)
Back propagation of error:
7) Each output unit Yk (k = 1 to m) receives a target pattern corresponding to the input training pattern. In equation (6), the target value tk minus the actual value gives the error information term:
δk = (tk − Yk) f′(Yink) (6)
8) Each hidden unit Zj (j = 1 to p) sums its delta inputs from units in the layer above, as in equation (7):
δinj = Σ(k=1..m) δk Wjk (7)
The error information term is calculated as in equation (8):
δj = δinj f′(Zinj) (8)
Updating of weight and bias:
9) Each output unit Yk (k = 1 to m) updates its bias and weights (j = 1 to p). The weight correction term is given in equation (9):
ΔWjk = α δk Zj (9)
and the bias correction term is given in equation (10):
ΔW0k = α δk (10)
Therefore, as in equations (11) and (12):
Wjk(new) = Wjk(old) + ΔWjk (11)
W0k(new) = W0k(old) + ΔW0k (12)
Each hidden unit Zj (j = 1 to p) updates its bias and weights (i = 1 to n). The weight correction term is given in equation (13):
ΔVij = α δj xi (13)
and the bias correction term in equation (14):
ΔV0j = α δj (14)
Therefore, as in equations (15) and (16):
Vij(new) = Vij(old) + ΔVij (15)
V0j(new) = V0j(old) + ΔV0j (16)
10) Test the stopping condition. The stopping condition may be a minimum error threshold, the number of epochs, etc.
The implementation of the backpropagation neural network technique is shown in algorithm 2. Figure 17 shows the flow of the proposed solution. The proposed solution starts with topology information in the shape of a graph topology and follows two steps. The first step is to apply a clustering algorithm in SDN in an efficient and congestion-free manner for different services and different data requirements; the clustering algorithm works based on minimum distance until only one cluster remains. When clustering is complete, the second step applies BPNN to train the network and optimize the error rate based on previous iterations. BPNN works in four steps: initializing the weights, the feed-forward pass, backpropagation of errors, and updating the weights and biases. The controller works as the brain of the network; it has all network topology records and decides, according to the network situation, where to send traffic.
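As a concrete illustration of steps 1-10, the following is a minimal numpy sketch of training a one-hidden-layer BPNN with a sigmoid activation. The layer sizes, learning rate α = 0.5, epoch count, and the toy XOR training data are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set (XOR): inputs x_i and targets t_k are illustrative.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

n_in, n_hidden, n_out, alpha = 2, 4, 1, 0.5

# Step 1: initialize weights V (input -> hidden) and W (hidden -> output)
# to small random values; b_v and b_w play the roles of V_0j and W_0k.
V = rng.normal(0, 1, (n_in, n_hidden))
b_v = np.zeros(n_hidden)
W = rng.normal(0, 1, (n_hidden, n_out))
b_w = np.zeros(n_out)

def f(x):
    """Sigmoid activation; note f'(x) = f(x) * (1 - f(x))."""
    return 1.0 / (1.0 + np.exp(-x))

def predict(X):
    return f(f(X @ V + b_v) @ W + b_w)

initial_error = float(np.mean((T - predict(X)) ** 2))

for epoch in range(5000):
    # Steps 4-6 (feed-forward): Z_inj = V_0j + sum_i x_i V_ij, etc.
    Z = f(X @ V + b_v)
    Y = f(Z @ W + b_w)
    # Steps 7-8 (back propagation of errors), using f' = f * (1 - f):
    delta_k = (T - Y) * Y * (1 - Y)          # eq. (6)
    delta_j = (delta_k @ W.T) * Z * (1 - Z)  # eqs. (7)-(8)
    # Step 9 (weight and bias corrections), eqs. (9)-(16):
    W += alpha * Z.T @ delta_k
    b_w += alpha * delta_k.sum(axis=0)
    V += alpha * X.T @ delta_j
    b_v += alpha * delta_j.sum(axis=0)

final_error = float(np.mean((T - predict(X)) ** 2))
print(initial_error, "->", final_error)
```

Here the fixed epoch count stands in for step 10's stopping condition; in practice the loop would break once the error falls below a threshold.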

V. SIMULATION AND PERFORMANCE ANALYSIS
For load balancing, this section explains the experimental scenario, presents the analysis, and compares the performance of the proposed solution with traditional methods. The main purpose is to test the advantages of our proposed solution in terms of network delay, packet loss, throughput, latency rate, and bandwidth usage. Traditional routing protocols often forward traffic along the shortest path, resulting in abnormal delays caused by overloaded links. The controller may also become overloaded (because the decoupling of network control and forwarding functions concentrates programmable network control in one place) when accommodating diverse network applications, affecting SDN's scalability and flexibility. We start with the introduction of the experimental scenario and the methodology used in this experiment.

A. SCENARIO
In the simulation environment, the Mininet simulator, running on the Ubuntu Linux operating system, is used to simulate the network topology (shown in figure 18). The load-balancing approach is implemented using an OpenDaylight controller. In our experiment, the network topology contains 120 hosts, 30 OpenFlow switches, and one controller, and is simulated on a Celeron(R) Dual-Core CPU with 4 GB of RAM. The OpenDaylight controller is an open-source controller, based on the OpenFlow protocol, that organizes traffic flow in the SDN; it manages communication with the SDN switches through the flow tables defined by the OpenFlow protocol. As illustrated in the topology, we tested our proposed load-balancing solution in two steps. In the first step, clustering for different services (e.g., social media, automated homes, and automated cars) is performed inside the network for different data requirements. The agglomerative clustering algorithm is used to make the clusters inside the SDN based on minimum distance, and the OpenDaylight controller is connected to every cluster and assigns tasks according to the situation. In the second step, we incorporate the BPNN model into the SDN-enabled networking architecture; the BPNN is used to train the network and for link utilization. The BPNN has x inputs, which arrive via a predetermined route and are described by real-valued weights w, selected randomly at initialization. The output of each neuron is computed as the signal moves from the input layer through the hidden layers to the output layer; at the end, the error is calculated from the actual and desired outputs, and the procedure is reversed (backpropagated) until the desired results are obtained. The SDN controllers monitor and collect real-time network information via the southbound interface and upload it to the database. We have used the SDN dataset from our past studies [23] and internet sources like Kaggle for the machine learning algorithms. The information comprises network architecture, connection delays, traffic updates, and so on.

B. DATA ACQUISITION
All the data required for the Big Data monitoring approach is gathered during the data acquisition activity. The gathered information falls into two categories: inventory data and streaming data. In our scenario, we have used inventory data. Streaming data are gathered and transferred right away for additional processing, whereas inventory data fill a repository. Depending on how it is implemented, inventory data give fundamental information about the network's resources and may be used for various activities. For instance, information about the available bandwidth is recorded in the inventory data, which may be used to determine the percentage of a port's own available bandwidth that the port is using. The Host, Interface, Switch Port, Switch, FlowTable, and Flow entities store inventory data. Because they are associated with the fundamental identification and configuration of the network hardware (hosts and switches), inventory data are less likely to change over time.
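As a minimal sketch of how such an inventory record might be used to compute port utilization (the field names and values here are illustrative, not taken from the paper's schema):

```python
def port_utilization(used_mbps, capacity_mbps):
    """Percentage of a port's available bandwidth currently in use."""
    if capacity_mbps <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used_mbps / capacity_mbps

# Hypothetical inventory record for one switch port.
port = {"switch": "s1", "port": 2, "capacity_mbps": 1000, "used_mbps": 250}
print(port_utilization(port["used_mbps"], port["capacity_mbps"]))  # → 25.0
```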

1) COLLECTION OF INVENTORY DATA
The objective of this task is to compile a network inventory. In an OpenFlow network, the inventory consists of hosts and switches. Additionally, this task gathers data about how hosts and switches are connected. This connection implies that the Switch Port entity is dependent on the Switch entity, and the same holds for the connection between the Switch and Flow Table entities; likewise, the link between the Flow Table and Flow entities implies that the Flow entity depends on the Flow Table entity. The identification of the Flow Table and Flow entities is one of the inventory's fundamental properties, and the SDN controller may add any additional properties.

VI. RESULTS AND DISCUSSION
This section shows the performance of the proposed model. To evaluate the proposed model's performance, we compare the following factors: network delay, packet loss, throughput, latency rate, and bandwidth usage.

A. NETWORK DELAY
The length of time required for a packet to travel from source (point A) to destination (point B) is called the network delay, also referred to as the end-to-end delay. Typically, network delays are minor; a cross-country network, for example, has an end-to-end latency of about 30 milliseconds. Packets, however, can become lost in a network, so software for a reliable connection must detect lost packets and retransmit them. If a resend is required, the overall delay is at least doubled, since a resend request and its answer add additional round-trip time; for high-speed, reliable data-transport technologies the impact can be considerably greater. The nodal delay consists of processing delay, queuing delay, transmission delay, and propagation delay, as shown in equation (17): dnodal = dproc + dqueue + dtransmi + dpropa (17)
For the network delay, the packet delay is divided into a series of nodal delays. Every nodal delay is the amount of time that passes between a packet's arrival at one node and its arrival at the next. The formula above breaks the nodal delay into simpler-to-understand components.
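The decomposition of the nodal delay can be sketched directly; the component values below are illustrative, not measurements from the paper:

```python
def nodal_delay(d_proc, d_queue, d_transmi, d_propa):
    """Equation (17): total nodal delay as the sum of its four components (seconds)."""
    return d_proc + d_queue + d_transmi + d_propa

# Illustrative component values in seconds.
total = nodal_delay(d_proc=0.001, d_queue=0.002, d_transmi=0.004, d_propa=0.010)
print(round(total, 3))  # → 0.017
```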

1) PROCESSING DELAY
The nodal processing delay dproc is the amount of time it takes a node to process a packet. This duration comprises error detection, scanning the incoming packet, and looking up the link to the next node depending on the destination. Although the processing may appear complex, the nodal processing delay is often insignificant compared with the other components in the delay formula. The SDN controller's processing ability is simulated in Mininet to evaluate the capability of the proposed clustering-based solution. Figure 19 shows the process-delay comparison with Begam et al. [22] and Cui et al. [32]. Processing delay is not an issue for SDN networks when network traffic is relatively low, and even as data arrivals and network stress increase, the proposed solution maintains a processing delay under 1 ms most of the time. Begam et al. [22] and Cui et al. [32] have the longest latency because their long paths cross several nodes with limited handling capability, and the data arrival rates at those nodes are rather high. Meanwhile, the proposed solution's routes have a modest delay: owing to the processing performance of the queue nodes and a clear route, the delays are less than one second. The proposed solution decreases process delay by 15% compared with traditional solutions and offers superior performance in satisfying the key latency requirement in 5G networks while retaining SDN elasticity and programmability.

2) QUEUING DELAY
The queuing delay dqueue is the amount of time a packet spends in a queue at a node while other packets are being sent. If the node is a high-speed router, there is one queue for each outbound link, so a packet only waits for other packets travelling over the same link. Equation (18) approximately relates the queuing delay to the transmission delay dtransmi: dqueue = dtransmi × lqueue (18) where lqueue is the average length of the queue. The load factor, the ratio of the attempted link transmission rate to the link's maximum transmission rate, determines the average queue length. When the load factor is less than 1/2, the average queue length is less than one; when the load factor exceeds one, the queue length grows without bound. Figure 20 shows the queuing delay of three different types of traffic.
The average queuing delay is plotted in figure 20; the real trace's average queuing delay differs from the delay estimated for a short-range-dependent arrival process, and the burstiness of the incoming traffic rapidly increases the delays at relatively high utilization. Figure 20 compares Begam et al. [22], Cui et al. [32], and the proposed method. Begam et al. [22] and Cui et al. [32] have the longest latency because their long paths cross several nodes with limited handling capability, and the data arrival rates at those nodes are rather high. Meanwhile, the proposed solution's routes have a modest delay: owing to the queuing performance of the queue nodes and a clear route, the delays are less than one second. This indicates that the proposed solution achieves 35% better queuing-delay results than traditional methods for both moderate and high traffic.
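A small sketch of equation (18); the queue-length formula used here is the standard ρ/(1 − ρ) textbook approximation, which matches the behaviour described above (length below one for a load factor below 1/2, unbounded growth as the load factor approaches one) but is an assumption, not a formula stated in the paper:

```python
def queuing_delay(d_transmi, l_queue):
    """Equation (18): approximate queuing delay from the transmission delay
    and the average queue length."""
    return d_transmi * l_queue

def avg_queue_length(load_factor):
    """Textbook approximation of average queue length from the load factor rho:
    rho / (1 - rho); below one when rho < 1/2, unbounded as rho -> 1."""
    if load_factor >= 1:
        raise ValueError("queue grows without bound when the load factor >= 1")
    return load_factor / (1 - load_factor)

rho = 0.4                        # load factor below 1/2 ...
lq = avg_queue_length(rho)       # ... gives an average queue length below one
print(lq < 1)                    # → True
print(round(queuing_delay(0.004, lq), 6))
```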

3) TRANSMISSION DELAY
The transmission delay is the time it takes to push a complete packet onto the communication medium. It can be calculated using equation (19): dtransmi = L / R (19)
In equation (19), L is the packet's length in bits, and R denotes the transmission rate in bits per time unit; dtransmi and R must use the same time unit. Figure 21 depicts the average transmission delay compared with Begam et al. [22] and Cui et al. [32]. According to figure 21, the proposed solution decides packet priority based on traffic flow, and the packet is successfully transported to the destination node determined using the incremental averaging technique. The transmission delay tends to remain flat: the proposed solution demonstrates that the information transmission delay does not increase exponentially with time, giving it better performance than traditional methods.
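Equation (19) as a one-line sketch; the frame size and link rate are illustrative values:

```python
def transmission_delay(packet_bits, rate_bps):
    """Equation (19): dtransmi = L / R, the time to push a whole packet
    onto the link (seconds)."""
    return packet_bits / rate_bps

# A 1500-byte Ethernet frame on a 100 Mbps link.
print(transmission_delay(1500 * 8, 100e6))  # → 0.00012
```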

4) PROPAGATION DELAY
The propagation delay dpropa is the amount of time it takes a signal change to travel from one node to the next across the communication channel. It can be calculated using equation (20): dpropa = D / s (20)
where D is the distance from one node to the next and s denotes the propagation speed of the network. Over radio links, a signalling change propagates near the velocity of light, roughly 186,000 miles per second; over copper and fibre connections it propagates at 60% to 80% of the speed of light, so 100,000 miles per second is a good approximation of the propagation speed for back-of-the-envelope computations. Figure 22 shows the percentage decrease of the average propagation delay to be 15.8 percent for four inputs and 61.25 percent for eight inputs, and a rough calculation of area suggests that the proposed method has an area advantage over the traditional stack. Compared with Begam et al. [22] and Cui et al. [32], figure 22 shows that the proposed solution has a shorter propagation delay for all numbers of inputs but a higher power consumption for practical numbers of inputs.
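Equation (20) with the back-of-the-envelope propagation speed quoted above; the 3,000-mile distance is an illustrative cross-country figure consistent with the ~30 ms latency mentioned earlier:

```python
def propagation_delay(distance_miles, speed_mps=100_000):
    """Equation (20): dpropa = D / s. Default speed is the back-of-the-envelope
    100,000 miles/second for copper or fibre links."""
    return distance_miles / speed_mps

# A 3,000-mile cross-country fibre link.
print(propagation_delay(3000))  # → 0.03
```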

B. PACKET LOSS
While using a network or the internet, data is delivered and received in small units known as packets. Packet loss occurs when one or more of these packets fail to reach their intended destination. Packet loss presents itself to users as network interruption, poor performance, and even complete network connection failure. Today's business network is the cornerstone of corporate performance, so when the network suffers from performance difficulties, the business suffers as a result.
A variety of operational issues can impact network performance, with packet loss being one of the most prevalent.
Figure 23 shows the packet-loss performance compared with Begam et al. [22] and Cui et al. [32]. Using the network simulator, a comparison with other traditional packet-loss techniques was performed in various scenarios by altering factors such as node density, node speed, and pause duration. Begam et al. [22] and Cui et al. [32] have the longest latency because their long paths cross several nodes with limited handling capability, and the data arrival rates at those nodes are rather high. Meanwhile, the proposed solution's routes have a modest delay: owing to the processing performance of the queue nodes and a clear route, the delays are less than one second. The simulation results demonstrate that the proposed approach is a significantly better alternative in extremely dense networks than the old solution, except when real-time transmissions are demanded.
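Packet loss is typically reported as the fraction of sent packets that never arrive; a minimal sketch with illustrative counts (not measurements from the paper's simulation):

```python
def packet_loss_rate(sent, received):
    """Fraction of packets that failed to reach the destination."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - received) / sent

print(packet_loss_rate(sent=10_000, received=9_770))  # → 0.023
```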

C. THROUGHPUT
Throughput is the number of units of data that a system can process in a given length of time. It is used extensively for systems ranging from networked computers to organizations. Related metrics of system productivity include the speed with which a given job may be performed and the response time, i.e., the length of time between a single interactive client request and the arrival of the reply. Figure 24 shows the throughput performance compared with Begam et al. [22] and Cui et al. [32], illustrating the variance in throughput for various request-creation rates in Mbps. The graph shows that the proposed solution is better than traditional methods in terms of throughput.
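Throughput in Mbps can be sketched as bits delivered per unit time; the values below are illustrative:

```python
def throughput_mbps(bits_delivered, seconds):
    """Throughput as megabits of data delivered per second."""
    return bits_delivered / seconds / 1e6

# 500 million bits delivered over 10 seconds.
print(throughput_mbps(500e6, 10))  # → 50.0
```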

D. LATENCY RATE
The amount of time required for a message to move from one location to another is the latency. The physical distance that the message must travel over cables, networks, and other media to reach the target determines it. One essential capability of the OpenFlow controller is its capacity to handle incoming packets as quickly and efficiently as possible, which we refer to here as latency. Figure 25 shows the latency performance compared with Begam et al. [22] and Cui et al. [32]. Begam et al. [22] and Cui et al. [32] have the longest latency because their long paths cross several nodes with limited handling capability, and the data arrival rates at those nodes are rather high. Meanwhile, the proposed solution's routes have a modest delay: owing to the processing performance of the queue nodes and a clear route, the delays are less than one second. The simulation results demonstrate that the proposed approach is significantly better than traditional methods.

E. BANDWIDTH USAGE
Bandwidth is the rate at which information is transferred in a specific period, generally expressed in ''Megabits per second'' (Mbps) or ''Gigabits per second'' (Gbps). As the word ''bandwidth'' indicates, the term arose because transmission speed used to be primarily determined by the width of a communication band.
Figure 26 shows the bandwidth-usage performance compared with Begam et al. [22] and Cui et al. [32]. Begam et al. [22] and Cui et al. [32] have the longest delay due to the long route across numerous nodes and inadequate handling capability, and the data arrival rates at the nodes are rather high. The proposed solution's routes, meanwhile, have a slight delay: the delays are less than one second due to the queue nodes' processing speed and a clear path. The simulation results demonstrate that the proposed solution achieves 30% better bandwidth-usage results than traditional methods.

F. COMPUTATIONAL TIME COMPLEXITY WITH INCREASING THE NUMBER OF SERVICES
Computational time complexity is a metric for how much computing time and space an algorithm uses as it operates.
In our scenario, we have compared the computational time complexity with an increase in the number of services. Figure 27 clearly shows that when the rate of services varies, the efficiency of the proposed solution remains constant.

G. COMPUTATIONAL TIME COMPLEXITY WITH INCREASING THE NUMBER OF NODES
In figure 28, we have compared the computational time complexity with an increase in the number of nodes; it clearly shows that when the rate of nodes varies, the efficiency of the proposed solution remains constant.

VII. CONCLUSION
This research addresses the basic issues of load balancing and the management of various network flows in a coordinated, congestion-free SDN manner. The proposed load-balancing method is used in networks to accomplish the goal of utilizing optimum energy. It distributes the load by assigning a series of demands to a collection of services and rapidly rerouting flows for consistency, accessibility, and overall performance. The proposed model has two phases. In the first phase, clusters of bandwidth for different services inside the network are generated for different data requirements.
Agglomerative clustering (single-link technique) was used to make the clusters inside the SDN based on minimum distance. The OpenDaylight controller coordinates with every cluster and assigns tasks according to the situation.
In the second phase, we applied the BPNN algorithm to the SDN-enabled networking architecture. The BPNN trained the network for link utilization and optimized the error rate based on previous iterations. We have also discussed the scalability of our proposed model: as shown through the graphs in section (F), computational time complexity with an increase in the number of services, and section (G), computational time complexity with an increase in the number of nodes, the efficiency of the proposed model remains constant when the number of nodes or services varies. The simulation results showed that the proposed load-balancing model improves all factors. The results are promising: compared with traditional methods, the performance of the proposed method increased by 15% to 35% in terms of network delay, 23% in terms of packet loss, 27% in terms of throughput, 21% in terms of latency rate, and 30% in terms of bandwidth usage. For future research, we plan to explore other machine learning algorithms for load balancing in more complex networks using the same datasets. We will also look at the influence of different 5G network conditions on flow tables, as well as the identification of colliding and regular flows.

FIGURE 1 .
FIGURE 1. SDN Architecture: a logical picture of the SDN architecture. Three levels make up the SDN network architecture: application, control, and infrastructure.

FIGURE 2 .
FIGURE 2. Integration of SDN with 5G: SDN coupled with NFV, edge computing, network slicing, and segment routing, which are the right match for 5G transport requirements.

FIGURE 3 .
FIGURE 3. Load balancing during congestion by using MRBS algorithm, implemented in the spine-leaf architecture of DCN.

FIGURE 4 .
FIGURE 4. Detection and Classification of conflict flow, implemented through DT and SVM classifier.

FIGURE 5 .
FIGURE 5. The MARVEL scheme, introduced for load balancing in SDN using artificial intelligence.

FIGURE 6 .
FIGURE 6. Congestion and upgradation problem: there are different colours of data packets moving inside the network. The red nodes represent active nodes, and the green nodes show that they need updating. The red circle shows where congestion occurs.

FIGURE 7 .
FIGURE 7. The scenario of the proposed model.
Figure 8 shows the network model, which we use as a sample model. There are n nodes with different distance values, giving an S × S distance matrix whose diagonal elements are always 0, i.e., d(x, x) = 0 and d(x, y) = d(y, x) for all x, y ∈ S. The distance between two clusters A and B is min{d(x, y) : x ∈ A, y ∈ B}. Agglomerative hierarchical clustering starts with every single item in its own cluster; each iteration then merges the nearest pair of clusters according to some similarity criterion until all the data is in one cluster. The metric we use in agglomerative clustering is the Euclidean distance ∥a − b∥2 = sqrt(Σi (ai − bi)²). Start with n clusters, each containing one object and numbered 1 to n. Compute the distance matrix D = (D(Pi, Pj)), where D(Pi, Pj) is the between-object distance of the two objects in Pi and Pj, for Pi, Pj = 1, 2, . . ., n. Then select the closest pair of clusters Pi and Pj, such that the distance D(Pi, Pj) is the shortest of all pairwise distances. Merge Pi and Pj into a new cluster t and compute the between-cluster distance D(t, k) for every current cluster k other than Pi and Pj. Once the distances are determined, delete the rows and columns of D that belong to the old clusters Pi and Pj, since they no longer exist, and add to D a new row and column for cluster t. Repeat these steps n − 1 times, until only one cluster remains.
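The merging procedure above can be sketched as a minimal single-linkage agglomerative clustering in plain Python over one-dimensional bandwidth demands; the demand values and the choice to stop at three clusters are illustrative, loosely mirroring the three service groups of Figure 15:

```python
def single_linkage(points, num_clusters):
    """Single-link agglomerative clustering: start with one object per cluster,
    repeatedly merge the pair of clusters at minimum distance."""
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-link distance: minimum pairwise distance between clusters
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)   # merge the closest pair into one cluster
    return clusters

# Illustrative bandwidth demands (Mbps) for three well-separated service groups.
demands = [10, 20, 30, 100, 110, 120, 200, 210]
print(sorted(sorted(c) for c in single_linkage(demands, 3)))
# → [[10, 20, 30], [100, 110, 120], [200, 210]]
```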

FIGURE 8 .
FIGURE 8. Sample network model: there are n nodes with different distance values; the diagonal elements always have 0 values.

FIGURE 13 .
FIGURE 13. Applying agglomerative clustering, step 5: we can observe that the single-linkage distance clustering pattern prefers to produce compact clusters of clusters.

FIGURE 14 .
FIGURE 14. Assign clusters to different services.

FIGURE 15 .
FIGURE 15. Clustering: there is clustering between different services (e.g., social media, automated homes, and automated cars), and data is assigned according to their workload: for social media 1 to 50 Mbps, for automated homes 1 to 100 Mbps, and for automated cars 1 to 150 Mbps.

FIGURE 16 .
FIGURE 16. Back propagation neural network model with three layers. The graphic shows a typical three-layer BPNN model topology, where x1, z1, and y1 denote neurons of the input, hidden, and output layers, respectively.

Algorithm 2
Back Propagation Neural Network Algorithm
1: Initialize all inputs and outputs of the network
2: Initialize all weights with small random numbers, usually between -1 and 1
3: repeat
4:   for every pattern in the training set
5:     Present the pattern to the network
6:     Propagate the input forward through the network ▷ Forward Pass
7:     for every layer in the network
8:       for each node in the layer
9:         Calculate the weighted sum of the node's inputs
10:        Add the threshold to the sum
11:        Calculate the activation of the node
12:      end
13:    end
14:    Propagate the errors backward through the network ▷ Backward Pass
15:    for each node in the output layer
16:      Determine the error signal
17:    end
18:    for all hidden layers
19:      for each node in the layer
20:        Calculate the node's error signal
21:        Update the weight of each node in the network
22:      end
23:    end
24:    Calculate the Overall Error ▷ Error Calculation
25:    Calculate the Error Function
26:  end
27: while ((maximum number of iterations < specified) AND (Error Function > specified))
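Algorithm 2 can be sketched as a small, runnable pure-Python BPNN. The 2-4-1 topology, the logical-OR training set, the learning rate, and the epoch count are illustrative choices for demonstration, not the paper's actual configuration or dataset:

```python
import math
import random

random.seed(42)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPNN:
    def __init__(self, n_in, n_hid, n_out, lr=0.5):
        self.lr = lr
        # Small random initial weights in [-1, 1]; the extra row holds the bias.
        self.v = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in + 1)]
        self.w = [[random.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_hid + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]                               # append bias input
        self.z = [sigmoid(sum(xb[i] * self.v[i][j] for i in range(len(xb))))
                  for j in range(len(self.v[0]))]          # hidden activations
        zb = self.z + [1.0]
        self.y = [sigmoid(sum(zb[j] * self.w[j][k] for j in range(len(zb))))
                  for k in range(len(self.w[0]))]          # output activations
        return self.y

    def train_step(self, x, t):
        y = self.forward(x)
        # Output error terms: delta_k = (t_k - y_k) * f'(y_in_k), as in eq. (6).
        dk = [(t[k] - y[k]) * y[k] * (1 - y[k]) for k in range(len(y))]
        # Hidden error terms: delta_j = (sum_k delta_k w_jk) * f'(z_in_j), eqs. (7)-(8).
        dj = [sum(dk[k] * self.w[j][k] for k in range(len(dk))) *
              self.z[j] * (1 - self.z[j]) for j in range(len(self.z))]
        zb = self.z + [1.0]
        xb = list(x) + [1.0]
        # Output-layer weight and bias updates, eqs. (9)-(12).
        for j in range(len(zb)):
            for k in range(len(dk)):
                self.w[j][k] += self.lr * dk[k] * zb[j]
        # Hidden-layer weight and bias updates, eqs. (13)-(16).
        for i in range(len(xb)):
            for j in range(len(dj)):
                self.v[i][j] += self.lr * dj[j] * xb[i]
        return sum((t[k] - y[k]) ** 2 for k in range(len(y)))

# Train on logical OR, a small linearly separable toy task.
data = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (1,))]
net = BPNN(2, 4, 1)
first = sum(net.train_step(x, t) for x, t in data)
for _ in range(4000):
    last = sum(net.train_step(x, t) for x, t in data)
print(f"squared error: first epoch {first:.4f}, final epoch {last:.4f}")
```

The stopping condition from line 27 of Algorithm 2 would replace the fixed epoch count in practice (stop when the error function drops below a threshold or the iteration limit is reached).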

FIGURE 17 .
FIGURE 17. Flowchart of the proposed solution: in the first step, there is clustering for different services (e.g., social media, automated homes, and automated cars) inside the network using the agglomerative clustering algorithm, and in the second step, the BPNN technique is implemented for optimal path selection.

FIGURE 19 .
FIGURE 19. Comparison of process delay: Begam et al. [22], Cui et al. [32], and the proposed solution. (a), (b): process delay when the rate of traffic varies; the efficiency of the proposed solution remains constant.

FIGURE 20 .
FIGURE 20. Comparison of queuing delay: Begam et al. [22], Cui et al. [32], and the proposed solution. (a), (b): queuing delay when the rate of traffic varies; the efficiency of the proposed solution remains constant.

FIGURE 21 .
FIGURE 21. Comparison of transmission delay: Begam et al. [22], Cui et al. [32], and the proposed solution. (a), (b): transmission delay when the rate of traffic varies; the efficiency of the proposed solution remains constant.

FIGURE 22 .
FIGURE 22. Comparison of propagation delay: Begam et al. [22], Cui et al. [32], and the proposed solution. (a), (b): propagation delay when the rate of traffic varies; the efficiency of the proposed solution remains constant.

FIGURE 23 .
FIGURE 23. Comparison of packet loss: Begam et al. [22], Cui et al. [32], and the proposed solution. (a), (b): packet loss when the rate of traffic varies; the efficiency of the proposed solution remains constant.

FIGURE 24 .
FIGURE 24. Comparison of throughput: Begam et al. [22], Cui et al. [32], and the proposed solution. (a), (b): throughput when the rate of traffic varies; the efficiency of the proposed solution remains constant.

FIGURE 25 .
FIGURE 25. Comparison of latency rate: Begam et al. [22], Cui et al. [32], and the proposed solution. (a), (b): latency rate when the rate of traffic varies; the efficiency of the proposed solution remains constant.

FIGURE 26 .
FIGURE 26. Comparison of bandwidth usage: Begam et al. [22], Cui et al. [32], and the proposed solution. (a), (b): bandwidth usage when the rate of traffic varies; the efficiency of the proposed solution remains constant.

FIGURE 27 .
FIGURE 27. Comparison of computational time complexity with an increase in the number of services: Begam et al. [22], Cui et al. [32], and the proposed solution. (a), (b): computational time complexity when the rate of services varies; the efficiency of the proposed solution remains constant.

FIGURE 28 .
FIGURE 28. Comparison of computational time complexity with an increase in the number of nodes: Begam et al. [22], Cui et al. [32], and the proposed solution. (a), (b): computational time complexity when the rate of nodes varies; the efficiency of the proposed solution remains constant.

TABLE 1 .
The key notations of the network model.
The link between Flow Table and Flow suggests that the Flow entity is dependent upon the Flow Table object. This task gathers switch inventory data as well as port inventory data, flow tables, and flow identifications based on these linkages. Table 2 displays the set of properties specified for the Switch Port object in the inventory category.

TABLE 2 .
Switch port entity attributes.