Dynamic Load Balancing Algorithm Based on Optimal Matching of Weighted Bipartite Graph

When a server cluster processes concurrent task requests without fully considering the performance differences among servers, task allocation becomes unreasonable, which increases the task makespan and decreases cluster resource utilization. As one of the core technologies of server clusters, load balancing evens out the load on each server by allocating tasks through an algorithm before the tasks are processed. This paper therefore proposes a dynamic load balancing algorithm based on optimal matching of a weighted bipartite graph. First, we construct a bipartite graph with servers and tasks as vertices. The management server collects the load indicator of each server in the cluster in real time, using the real-time processing rate of each server as the load indicator. Each edge of the bipartite graph is determined by comparing the expected completion time of a task with the load of each server. The degree of matching between each task volume and each server's load capacity defines the weight matrix of the edges, which turns the bipartite graph into a weighted bipartite graph. The Kuhn-Munkres algorithm is used to solve the optimal matching of the weighted bipartite graph, and tasks are assigned to servers according to the result. The proposed algorithm fully considers the differences among task volumes and among server load capacities. A server cluster example and comparison experiments demonstrate that the algorithm achieves load balancing of the server cluster and improves the resource utilization efficiency of the cluster, while offsetting the extra time overhead introduced by the algorithm.


I. INTRODUCTION
We live in an era of rapid development of high-speed communication networks. With the gradual enrichment of network service content and the growth in the variety and number of smart devices, the Internet has become an indispensable tool in daily life. According to a report on emerging technology trends released in the U.S., more than 100 billion devices are expected to be connected to networks by 2045. According to a white paper published by Seagate and the International Data Corporation (IDC), the volume of data worldwide will reach 163 ZB by 2025, and more than a quarter of this data will be real-time communication data.
The associate editor coordinating the review of this manuscript and approving it for publication was Nurul I. Sarkar.
With this comes an exponential increase in concurrent access requests. As large numbers of requests queue up in the task pool, the traditional single-server processing mode leads to a longer task makespan, congestion, overload, and other problems; in serious cases, the server may crash. This not only poses a huge challenge to the processing performance and data security of the platform server, but also has an irreversible negative impact on the user experience.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Therefore, the industry mostly uses multiple servers to form a server cluster system. The servers within the system work together to process parallel requests from users and provide external services [1]. When a single server crashes, the server cluster system can quickly switch to the remaining servers for response and data backup [2]. This improves the computing response speed and ensures the stability of the system and the security of data. The server cluster system can reduce the average waiting time of users, reduce the energy consumption of the system, and expand the throughput of the system. However, the servers in a cluster differ in performance. If task assignment ignores these differences, some servers will run under high load, resulting in increased task processing time or failure to complete tasks, while others will run under low load for a long time, wasting system resources. This undermines the advantages of the server cluster. One of the key issues in a server cluster system is how to distribute tasks reasonably to each service node while meeting the maximum service demand and ensuring service quality. Load balancing is a key technology for solving this problem. A reasonable load balancing algorithm uses server resources more efficiently and evenly, thus improving service performance, increasing throughput, and improving the flexibility and stability of the system. By considering the complexity of different task requests and the performance differences of each server in the cluster, this study proposes a dynamic load balancing algorithm based on the optimal matching of a weighted bipartite graph.
In the proposed algorithm, we first construct a bipartite graph with the task and server nodes as vertices. The ratio of task volume to actual task completion time is used as the load indicator of a server [3]. The management server collects the load indicator of each server in the cluster in real time. The expected completion time of each task is compared with the real-time load of each server to determine each edge of the bipartite graph. The edge weight matrix is generated by calculating the match between the volume of each task and the load capacity of each server, and weighting the bipartite graph with this matrix yields the weighted bipartite graph. The Kuhn-Munkres algorithm [4] with depth-first traversal is used to find the maximum weight matching of the weighted complete bipartite graph, which is the optimal matching of the original weighted bipartite graph. Finally, according to the optimal matching result, each task is assigned to its designated server for processing. In this manner, real-time scheduling of tasks and dynamic load balancing of the server cluster can be achieved.
The main contributions of this paper can be summarized as follows.
(i) Design of a scheduling model based on the proposed algorithm. The model realizes real-time scheduling of task requests and distributes tasks reasonably to the server cluster for execution, improving the efficiency and stability of the server cluster.
(ii) Proposal of a dynamic load balancing algorithm based on the optimal matching of a weighted bipartite graph. The assignment of tasks to servers is accomplished by calculating the optimal matching result such that the load of the server cluster is relatively balanced.
(iii) Construction of a server cluster test example according to the proposed algorithm. Two experiments were conducted to demonstrate that the proposed algorithm has a better load balancing effect.
The rest of this paper is organized as follows. Section II reviews related work. Section III gives the problem statement and establishes the scheduling model and evaluation index of the proposed algorithm. Section IV introduces the algorithm flow in detail and analyzes the time complexity of the proposed algorithm. Section V conducts extensive experiments on test examples to evaluate the practical effect of the proposed algorithm. Finally, Section VI concludes the paper and gives some possible directions for future work.

II. RELATED WORK
A load balancing algorithm needs to consider the complexity of the tasks as well as the processing capacity of each server in the cluster, and distribute the tasks to the servers in a reasonable manner. While ensuring effective task processing, it keeps the load of each server relatively balanced and avoids load skewing [5], which occurs when a few servers run under high load whereas the rest run under low load. Load balancing algorithms can be divided into static and dynamic load balancing algorithms according to the request distribution policy [6], [7]. A static load balancing algorithm presets the load balancing policy based on the overall situation of the cluster servers and distributes task requests according to this policy while the servers are running; the policy stays fixed during operation. This method suits situations with a single scenario and few tasks, but cannot cope with more complex request processing systems. Typical static load balancing algorithms include the Weighted Round-Robin (WRR) algorithm [8] and the Destination Hashing Scheduling algorithm. A dynamic load balancing algorithm adjusts the distribution policy based on the real-time operation of the servers. This method suits complex cluster systems with many tasks. Compared with static load balancing, it is more effective, more flexible, and adapts better to changes in the system. However, dynamically adjusting the distribution policy also introduces additional time overhead. Typical dynamic load balancing algorithms include the Consistent Hashing algorithm [9] and the Least Connections (LCS) algorithm. These algorithms do not consider performance differences between servers or the complexity of task requests; therefore, they cannot accurately judge the load situation of each server.
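To make the contrast concrete, the following is a minimal sketch of one static and one dynamic policy named above. It is a simplified illustration only: the function names, server names, and weights are hypothetical, and the plain (non-smooth) form of Weighted Round-Robin is assumed.

```python
from itertools import cycle, islice

def weighted_round_robin(static_weights):
    """Static policy: cycle through servers, each repeated by its preset weight."""
    order = [name for name, w in static_weights.items() for _ in range(w)]
    return cycle(order)

def least_connections(active_connections):
    """Dynamic policy: pick the server with the fewest active connections."""
    return min(active_connections, key=active_connections.get)

# Hypothetical cluster: s1 has twice the static weight of s2.
wrr = weighted_round_robin({"s1": 2, "s2": 1})
first_six = list(islice(wrr, 6))

# Hypothetical snapshot of active connections per server.
chosen = least_connections({"s1": 3, "s2": 1, "s3": 4})
```

Neither function looks at the complexity of the incoming task, which is the limitation pointed out above.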
For the load balancing of the server cluster system, the distribution strategy should be dynamically adjusted by considering several indicators such as the complexity of task requests, the load situation of each server, and the overall load of the system.
First, [10] analyzes and compares the main existing load balancing algorithms and points out that load balancing can effectively improve the resource utilization, response rate, and throughput of server clusters, showing that a reasonable load balancing algorithm is the key to improving system service performance. Many researchers have proposed load balancing algorithms to address overload, load skewing, low efficiency, and resource optimization in such systems. The work in [11] addresses excessive gateway traffic load in the IoT by using multiple gateways to share the traffic. It proposes a queueing-theory-based analysis method to evaluate the performance of multi-gateway IoT systems and applies a load balancing strategy based on multi-criteria decision making across the gateways to achieve relative consistency of the global load. However, it does not consider the complexity differences among task requests. The authors in [12] address the application performance degradation caused by using multiple GPUs in heterogeneous environments and propose a dynamic load balancing algorithm for multi-objective decision making. They construct two decision-making models, performance priority and performance-energy balance, and switch between them dynamically during execution, which makes more effective use of system resources. However, the model lacks a differentiated study of energy consumption indicators and of the impact of different service performance on the balancing effect. To solve the load imbalance among server nodes in a cloud center under highly concurrent task requests, the work in [13] proposes a load balancing strategy with dynamic weight adjustment that improves the Weighted Round-Robin algorithm. By considering server hardware performance and the real-time load situation, a static weight and a dynamic weight are designed, and the final weight is calculated by combining the two. However, the model does not consider the complexity differences among task requests. The work in [14] addresses the lack of dynamic scheduling algorithms in Kubernetes clusters and the incomplete optimization of cluster resources, and proposes a multi-resource load balancing algorithm based on cooperative game theory. Dynamic scheduling of tasks is achieved by monitoring the actual resource usage of servers in real time, and a cooperative game model for load balancing is established from the CPU, memory, network bandwidth, and disk I/O indicators of the cluster servers. However, it does not consider the extra time cost of the cooperative game model. The work in [15] addresses the inefficiency of the Round-Robin algorithm used by the Flink engine in heterogeneous clusters and proposes a smooth weighted round-robin task scheduling algorithm and a task scheduling algorithm based on the ant colony algorithm to solve cluster load imbalance during operation. However, it considers only the performance differences of service nodes in the definition of weights, not the differences among task requests.
Most of the above studies approach the problem from the perspective of server cluster performance optimization, without fully considering the complexity differences of task requests or how well tasks match servers of different performance. This is the research motivation of this paper.
In the optimal matching of a weighted bipartite graph, each edge has a weight, and the goal is to find a set of edges that maximizes the total weight while ensuring that each vertex is connected to exactly one selected edge. It is often used to solve problems such as optimal assignment of tasks [16], [17], [18], resource allocation [19], [20], chemical interactions [21], [22], video information summary extraction [23], data matching [24], and collaborative sense optimization of composite events [25]. To solve the alignment problem of protein-protein interaction in biological systems, the work in [22] creates an initial alignment using a weighted bipartite graph matching technique to calculate similarity scores, and extends it to obtain the final alignment, thus obtaining more similar regions. The work in [16] applies weighted bipartite graph matching to the high-complexity flight service assignment problem and combines it with a shortest-path algorithm to find an optimal solution. For a server cluster system, the maximum weight matching of a weighted bipartite graph can be used to assign tasks reasonably to each server for processing while ensuring the overall efficiency of the system. The work in [17] applies weighted bipartite graph matching to digital twin technology for multi-energy systems, improving the performance of the computing nodes by allocating computing task blocks evenly among them. To address scattered computing resources and the limited resources of a single computing node in the edge computing architecture, the authors in [18] allocate tasks to different edge data centers through weighted bipartite graph matching and combine it with particle swarm optimization to allocate tasks to specific Docker containers for execution.
In this paper, optimal matching of weighted bipartite graphs is applied to load balancing of a server cluster. Compared with the above studies, this paper uses the more intuitive real-time processing speed of the server as the load indicator. By using the matching degree between different tasks and different servers as the weight, load balancing among the servers is realized from the perspective of task allocation.

III. ALGORITHM SCHEDULING MODEL
A. PROBLEM STATEMENT
Define the graph $G = (J, S, E)$ as a bipartite graph whose vertices are the nonempty sets $J$ and $S$ and whose edge set is $E$. The set $J = \{J_1, J_2, J_3, \cdots, J_m\}$ represents the $m$ independent pending task requests, and the set $S = \{S_1, S_2, S_3, \cdots, S_n\}$ represents the $n$ servers in the server cluster. Distributing a task $J_k$ to a server $S_i$ is defined as an edge $e_{ki} = (J_k, S_i)$ of the bipartite graph, and the set of all edges is $E = \{e_{ki} \mid k = 1, 2, 3, \cdots, m;\ i = 1, 2, 3, \cdots, n\}$. The weight of each edge is denoted $w_{ki}$, and the matrix with all edge weights $w_{ki}$ as elements is the edge weight matrix $W_{mn}$. The edge set $H = \{h_{ki} \mid k = 1, 2, 3, \cdots, m;\ i = 1, 2, 3, \cdots, n\}$ is defined as the maximum weight matching result of the bipartite graph $G$, where each edge $h_{ki} = (J_k, S_i)$ represents the assignment of task $J_k$ to server $S_i$ for processing.
For computational convenience, the edge set $E$ is quantized into an edge matrix $E_{mn}$ whose elements $e_{ki}$ take only the values 0 and 1. The quantification rule is:
$$e_{ki} = \begin{cases} 0, & \text{server } S_i \text{ cannot process task } J_k \\ 1, & \text{server } S_i \text{ can process task } J_k \end{cases} \tag{1}$$
Thus, in the edge matrix $E_{mn}$, an element $e_{ki} = 0$ corresponds to an edge with weight $w_{ki} = 0$, indicating that server $S_i$ cannot process task $J_k$.
The distribution policy provided by the proposed algorithm can be summarized as an optimal assignment problem [26]. The system contains $n$ servers (set $S$) and one management server. The management server holds a task queue of length $m$ (set $J$), from which $n$ tasks are taken. Each server can process one or several of these tasks (edges $e_{ki}$), and each server has a different efficiency in processing different tasks (edge weights $w_{ki}$). The goal is to find an allocation scheme that distributes the $n$ tasks to the $n$ servers so that each server receives a different task (edge set $H$), the efficiency of the server cluster as a whole is maximized, and the load on the $n$ servers is relatively balanced.
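The assignment problem stated above can be illustrated on a tiny instance by enumerating every one-to-one allocation. This is a minimal brute-force sketch for intuition only, not the method used by the proposed algorithm; the function name and the weight values are hypothetical.

```python
from itertools import permutations

def best_assignment(weights):
    """Return (total_weight, assignment) maximising the summed edge weight.

    weights[k][i] is the weight of assigning task k to server i; a weight of 0
    marks a pair with no edge. assignment[k] is the server index for task k.
    """
    n = len(weights)
    best_total, best_perm = -1.0, None
    # Try every one-to-one allocation of n tasks to n servers (n! candidates).
    for perm in permutations(range(n)):
        total = sum(weights[k][perm[k]] for k in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)

# Hypothetical 3x3 edge weight matrix (rows: tasks, columns: servers).
weights = [
    [0.9, 0.4, 0.0],
    [0.5, 0.8, 0.3],
    [0.0, 0.6, 0.7],
]
total, assignment = best_assignment(weights)
```

Enumeration costs $O(n!)$, which is why a polynomial-time matching algorithm is needed for anything beyond toy sizes.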

B. SCHEDULING MODEL
The scheduling model of the proposed dynamic load balancing algorithm is shown in Figure 1. The core of the algorithm is deployed on the task assignment management server, which processes the tasks arriving in the task pool as a stream. After the assignment result for each task is determined by the proposed algorithm, the task is sent to a server in the cluster for processing. The whole system consists of three parts: the task requests, the task assignment management server, and the task execution server cluster. The task request part places the service requests of the task queue in the task pool, where they wait to be allocated for execution. The task assignment management server consists of a task volume estimation module, a server cluster load indicator monitoring module, a calculation module, and a task assignment module. The task volume estimation module estimates the volume of each task to be assigned and its expected completion time. The load indicator monitoring module collects the load of each server in the cluster in real time and updates and stores it in the management server. The calculation module is the core of the algorithm: it constructs a bipartite graph by comparing the expected completion time of each task with the load of each server, generates the edge weight matrix by calculating the match between the volume of each task and the load capacity of each server, and obtains the optimal matching of the resulting weighted bipartite graph with the Kuhn-Munkres algorithm. The task assignment module sends each task to its designated server based on the optimal matching result obtained by the calculation module.
The task execution server cluster part consists of each server, which is responsible for completing the execution of each task, calculating the load indicator of the server in real time and feeding back to the management server.
The load of a server is related to its CPU utilization, memory capacity and occupancy, disk read/write rate, and other parameters; external factors such as network bandwidth and latency also have an impact [27]. To reduce the computational complexity, this paper uses the more intuitive task processing rate to represent the load of a server [28]. The load indicator of server $S_i$ in the cluster is defined as $v_i$, the task processing rate of that server:
$$v_i = \frac{r}{t_{end} - t_{start}} \tag{2}$$
where $r$ is the volume of the task currently executed by the server, $t_{start}$ is the start time of the task execution, and $t_{end}$ is the end time of the task execution. A larger $v_i$ indicates a shorter task execution time and a higher service rate, and hence better server performance and a lower server load. For the server cluster, the average value of the load indicator is defined as:
$$\bar{v} = \frac{1}{n} \sum_{i=1}^{n} v_i \tag{3}$$
where $n$ is the number of servers in the server cluster. The variance of the load indicator over the servers in the cluster can then be calculated as:
$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left( v_i - \bar{v} \right)^2 \tag{4}$$
where the variance $\sigma^2$ indicates the dispersion of the load indicators of the servers in the cluster. A smaller $\sigma^2$ indicates that the load of the servers in the cluster is more balanced.
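The evaluation index can be sketched in a few lines. The sketch below assumes the load indicator is the ratio of task volume to elapsed execution time, as described above, and assumes the variance is the population variance (division by $n$); the function names and sample values are hypothetical.

```python
def load_indicator(volume, t_start, t_end):
    """Task processing rate of one server: volume / elapsed execution time."""
    elapsed = t_end - t_start
    if elapsed <= 0:
        raise ValueError("t_end must be later than t_start")
    return volume / elapsed

def mean_load(indicators):
    """Average load indicator over the cluster."""
    return sum(indicators) / len(indicators)

def load_variance(indicators):
    """Dispersion of the load indicators (population variance, assumed)."""
    mean = mean_load(indicators)
    return sum((v - mean) ** 2 for v in indicators) / len(indicators)

# Hypothetical four-server cluster, each finishing a task of volume 100
# in a different amount of time (seconds).
rates = [
    load_indicator(100, 0.0, 2.0),
    load_indicator(100, 0.0, 4.0),
    load_indicator(100, 0.0, 5.0),
    load_indicator(100, 0.0, 10.0),
]
cluster_mean = mean_load(rates)
cluster_variance = load_variance(rates)
```

A scheduler that balances the cluster well should drive `cluster_variance` down over time.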

IV. ALGORITHM PROCESS
In the proposed algorithm, we build a bipartite graph with tasks and servers as vertices, construct each edge of the bipartite graph according to the expected completion time of tasks and the load of each server, and calculate the matching degree of each task volume and each server to obtain the weight matrix of the edges. Finally, the Kuhn-Munkres algorithm is used to find the optimal matching of the weighted bipartite graph, and the tasks are assigned to the servers for processing according to the optimal matching result. The general flow of the proposed algorithm is shown in Figure 2.
The whole algorithm process is completed jointly by the management server and the server cluster. First, the task pool holds a series of task queues with a certain number of task requests waiting to be executed, represented by the set $J$. The server cluster contains $n$ servers, represented by the set $S = \{S_1, S_2, S_3, \cdots, S_n\}$. When a task is executed, the server records the task volume $r$, the execution start time $t_{start}$, and the execution end time $t_{end}$. The current load indicator $v_i$ of the server is calculated by (2) and fed back to the management server. The management server collects, updates, and stores the load indicators of all servers in real time; they are represented by the set $V = \{v_1, v_2, v_3, \cdots, v_n\}$. The initial load indicator of each server is calculated by sending a test task $J_0$ from the management server to each server. The test task $J_0$ has task complexity $C_0$, task volume $R_0$, and time consumption $T_0$, where the task complexity is determined by the actual number of cycles in the task execution.
After the initial load indicator of each server is obtained, the management server takes $n$ tasks from the task pool, as many as there are servers, represented by the set $J = \{J_1, J_2, J_3, \cdots, J_n\}$. The volume of each task is calculated by the task volume estimation module in the management server and represented by the set $R = \{R_1, R_2, R_3, \cdots, R_n\}$. The expected completion time of each task is represented by the set $T = \{T_1, T_2, T_3, \cdots, T_n\}$. For a task $J_k$ in the task set $J$, its task volume $R_k$ and expected completion time $T_k$ are calculated by (5) and (6), where $R_0$ is the volume of test task $J_0$, $C_0$ is the complexity of test task $J_0$, $C_k$ is the complexity of task $J_k$, $T_0$ is the actual running time of test task $J_0$, and $\beta$ is an empirical value for algorithm tuning. In practice, the processing time of a task does not grow in proportion to the task complexity. Therefore, we introduce the parameter $\beta$ and increase the expected completion time $T_k$ by adjusting its value, so that more matching results can be obtained in the subsequent steps. After obtaining the expected completion times of the tasks to be assigned and the load of each server in the cluster, the management server starts to construct the bipartite graph. First, the task set $J$ and the server set $S$ are used as the two vertex sets to construct an edgeless bipartite graph, denoted $G_1 = (J, S, E)$ with $E = \emptyset$, as shown in Figure 3. For task $J_k$ and server $S_i$, the management server estimates the actual execution time $t_{ki}$ of the task on the server from the stored set of load indicators $V$:
$$t_{ki} = \frac{R_k}{v_i} \tag{7}$$
where $R_k$ is the volume of task $J_k$ and $v_i$ is the load indicator of server $S_i$. If $t_{ki} \le T_k$, server $S_i$ is able to process task $J_k$.
In that case, vertices $J_k$ and $S_i$ are connected in the graph, giving an edge $e_{ki} = (J_k, S_i)$ of the bipartite graph. If $t_{ki} > T_k$, server $S_i$ is unable to complete the processing of task $J_k$ under its current load. The management server traverses all tasks and servers to obtain the edge set $E$. According to (1), each $e_{ki}$ in $E$ is quantized to 0 or 1, which gives the edge matrix $E_{nn}$. The result is a bipartite graph with edges that represents which servers each task can be executed on, denoted $G_2 = (J, S, E)$, as shown in Figure 4. While traversing the $n$ tasks and $n$ servers to build the edge set, the task node with the largest task volume is identified as $J_I$, with task volume $R_I$, and the server node with the largest load indicator is identified as $S_I$, with load indicator $v_I$. The matching degree between $J_I$ and $S_I$ is set to 1; that is, the edge $e_{II} = (J_I, S_I)$ has weight $w_{II} = 1$. The weight $w_{ki}$ of the remaining edges between each task node $J_k$ and each server node $S_i$ is defined by (8) in terms of $R_k$, the volume of task $J_k$, and $v_i$, the load indicator of server $S_i$. The edge weight matrix $W_{nn}$ of the bipartite graph $G_2$ is then:
$$W_{nn} = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nn} \end{pmatrix} \tag{9}$$
Combining the weight matrix $W_{nn}$ with the bipartite graph $G_2$ gives a weighted bipartite graph, still denoted $G_2$. Any weighted bipartite graph can be transformed into a weighted complete bipartite graph by adding vertices and zero-weight edges, and the maximum-weight perfect matching of the resulting graph corresponds to the maximum weight matching of the original weighted bipartite graph. This is the optimal matching sought by the proposed algorithm. A complete bipartite graph is one in which every vertex in the set $J$ is joined by an edge to every vertex in the set $S$.
Therefore, for the later calculation, the weighted bipartite graph needs to be converted to a weighted complete bipartite graph. When the number of task vertices equals the number of server vertices, edges with a weight of zero are added to make the graph complete. When there are fewer task vertices than server vertices, task vertices and zero-weight edges are added to make it complete. A weight of zero means that the server node is unable to process the task node. In practical calculations, the weighted complete bipartite graph can be expressed as the Hadamard product of the edge weight matrix $W_{nn}$ and the edge matrix $E_{nn}$. The weighted complete bipartite graph is denoted $G_3 = (J, S, E)$, as shown in Figure 5, where the dotted lines represent the added edges with a weight of zero.
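The construction of the edge matrix and its Hadamard product with the weight matrix can be sketched as follows. The task volumes, expected completion times, and load indicators are taken as given inputs. The exact weight formula (8) is not reproduced in the text above, so the sketch substitutes a normalised product that equals 1 for the largest task on the fastest server and stays within [0, 1]; this weight, like the function name and sample values, is an assumption made for illustration.

```python
def build_weighted_complete_graph(volumes, deadlines, rates):
    """Build the edge matrix E, weight matrix W, and their Hadamard product.

    volumes[k]   task volume R_k
    deadlines[k] expected completion time T_k
    rates[i]     load indicator v_i of server i (must be positive)
    """
    n = len(volumes)
    r_max, v_max = max(volumes), max(rates)
    edges = [[0] * n for _ in range(n)]
    weights = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for i in range(n):
            t_ki = volumes[k] / rates[i]  # estimated execution time, as in (7)
            edges[k][i] = 1 if t_ki <= deadlines[k] else 0
            # ASSUMPTION: stand-in for (8), not the paper's formula.
            weights[k][i] = (volumes[k] * rates[i]) / (r_max * v_max)
    # Hadamard (element-wise) product: infeasible pairs get weight zero.
    g3 = [[edges[k][i] * weights[k][i] for i in range(n)] for k in range(n)]
    return edges, weights, g3

# Hypothetical batch of three tasks and three servers.
edges, weights, g3 = build_weighted_complete_graph(
    volumes=[40, 20, 10], deadlines=[2.0, 2.0, 2.0], rates=[20, 10, 5]
)
```

In this example the largest task can only meet its expected completion time on the fastest server, so the first row of `g3` keeps a single non-zero weight.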
After the management server obtains the weighted complete bipartite graph $G_3$, it finds the optimal matching of $G_3$ with the Kuhn-Munkres algorithm. From (8), we know that $0 \le w_{ki} \le 1$. Since the vertex labels are positive integers, all elements of the edge weight matrix $W_{nn}$ are rounded to three decimal places and then scaled up by a factor of one thousand. Let $N_k$ denote the label of task vertex $J_k$ and $Z_i$ the label of server vertex $S_i$. If every edge $e_{ki}$ in the weighted complete bipartite graph $G_3$ satisfies
$$N_k + Z_i \ge w_{ki} \tag{10}$$
then this set of vertex labels is a feasible vertex labelling of $G_3$, denoted $L$, where $N_k \in l(J)$, $Z_i \in l(S)$, and $l(J), l(S) \in L$. The feasible vertex labelling is shown in Figure 6. Define the edge set $E_L$ as:
$$E_L = \{ e_{ki} \mid e_{ki} \in E(G_3),\ N_k + Z_i = w_{ki} \} \tag{11}$$
where $E(G_3)$ is the edge set of the weighted complete bipartite graph $G_3$. After the edges that do not satisfy this condition are removed, the spanning subgraph with $E_L$ as its edge set is the equal subgraph of $G_3$ with respect to $L$, denoted $K_L$, as shown in Figure 7.
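The scaling, the feasibility condition (10), and the equal subgraph of (11) can be sketched as follows. The initial labelling used here (each task label set to the largest weight in its row, each server label set to zero) is the usual Kuhn-Munkres starting point and is an assumption, since the text above does not state how the first labelling is chosen; all function names and sample weights are hypothetical.

```python
def scale_weights(weights, factor=1000):
    """Round to three decimals and scale by 1000 so all weights are integers."""
    return [[int(round(w * factor)) for w in row] for row in weights]

def initial_labelling(weights):
    """ASSUMPTION: common KM initialisation, row maxima for tasks, 0 for servers."""
    task_labels = [max(row) for row in weights]
    server_labels = [0] * len(weights[0])
    return task_labels, server_labels

def is_feasible(weights, task_labels, server_labels):
    """Check N_k + Z_i >= w_ki for every edge, as required by (10)."""
    return all(
        task_labels[k] + server_labels[i] >= weights[k][i]
        for k in range(len(weights))
        for i in range(len(weights[0]))
    )

def equality_subgraph(weights, task_labels, server_labels):
    """Edges with N_k + Z_i == w_ki, i.e. the edge set E_L of (11)."""
    return [
        (k, i)
        for k in range(len(weights))
        for i in range(len(weights[0]))
        if task_labels[k] + server_labels[i] == weights[k][i]
    ]

# Hypothetical weighted complete bipartite graph with weights in [0, 1].
scaled = scale_weights([[1.0, 0.0, 0.0], [0.5, 0.25, 0.0], [0.25, 0.125, 0.05]])
labels = initial_labelling(scaled)
feasible = is_feasible(scaled, *labels)
equal_edges = equality_subgraph(scaled, *labels)
```

Here every edge of the equal subgraph ends at the first server, so no perfect matching exists yet and the labelling has to be adjusted before a matching can be completed.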

B. STEP 2: FIND A PERFECT MATCHING $M$ OF THE EQUAL SUBGRAPH $K_L$
The Hungarian algorithm is executed on the equal subgraph $K_L$. Take any matching $Q = \{q_{ki} \mid k, i = 1, 2, 3, \cdots, n\}$ of $K_L$, where a matching is a set of pairwise non-adjacent edges in the bipartite graph. In the task vertex set $J$, let $A$ be the set of vertices left unsaturated by $Q$. If $A = \emptyset$, the matching $Q$ is a perfect matching $M$ of the equal subgraph $K_L$, and the algorithm turns to Step 4. A saturated vertex is a vertex incident to exactly one edge of the matching.
In Step 3, which is reached when $K_L$ has no perfect matching, each vertex label $l(u)$ in the weighted complete bipartite graph $G_3$ is first modified by the Kuhn-Munkres label update and then verified according to (10).
From this, a new feasible vertex labelling $L'$ is computed. According to (11), a new edge set $E_{L'}$ is found, which generates a new equal subgraph $K_{L'}$. Then $L$ in Step 1 is replaced with $L'$ and $K_L$ with $K_{L'}$, and the algorithm returns to Step 2 to find a perfect matching $M$ of the equal subgraph $K_{L'}$.
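The label update equation itself does not appear in the text above. For reference, the textbook Kuhn-Munkres update is sketched below in the notation of (10); it may differ in form from the equation intended here. The symbols $X$, $Y$, and $\alpha$ are introduced for this sketch only: $X$ is the set of task vertices reachable by alternating paths in $K_L$ from the unsaturated task vertices, and $Y$ is the set of server vertices reached by those paths.

```latex
\alpha = \min_{J_k \in X,\; S_i \notin Y} \left( N_k + Z_i - w_{ki} \right),
\qquad
N_k' =
\begin{cases}
N_k - \alpha, & J_k \in X \\
N_k,          & \text{otherwise}
\end{cases}
\qquad
Z_i' =
\begin{cases}
Z_i + \alpha, & S_i \in Y \\
Z_i,          & \text{otherwise}
\end{cases}
```

Under this update, label sums on edges between $X$ and $Y$ are unchanged, sums on edges from $X$ to servers outside $Y$ drop by exactly $\alpha$, and no sum falls below its edge weight, so (10) still holds. At least one edge that was outside the equal subgraph now satisfies (11), and because the weights were scaled to integers, $\alpha$ and the new labels remain integers.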

D. STEP 4: FIND THE OPTIMAL MATCHING $H$ OF THE WEIGHTED BIPARTITE GRAPH $G_2$
By repeating the above steps, the perfect matching $M$ of $K_L$ is found, which is the optimal matching of the weighted complete bipartite graph $G_3$. After the edges with a weight of zero, together with any vertices added along with them, are deleted, the optimal matching $H = \{h_{ki} \mid k, i = 1, 2, 3, \cdots, n\}$ of the original weighted bipartite graph $G_2$ is obtained, as shown in the following figure with bold lines.
Finally, the management server sends each task to its specified server for processing based on the optimal matching $H$, where each edge $h_{ki} = (J_k, S_i)$ represents the assignment of task $J_k$ to server $S_i$.
The management server repeats the above process until all tasks in the task pool have been assigned to servers based on the optimal matching results. While ensuring that tasks are processed effectively, the edge weight matrix defined by the proposed algorithm fully considers the matching degree between each task volume and each server load indicator, so that tasks with a larger volume are assigned to servers with a lower load. This guarantees priority matching and fast processing of key tasks. At the same time, it balances the load among servers while maximizing the overall efficiency of the server cluster, thus realizing task scheduling and load balancing of the server cluster under high task concurrency.
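The four steps can be condensed into a compact routine. The sketch below is not the depth-first formulation described above: it swaps in the widely used $O(n^3)$ potentials-based form of the Kuhn-Munkres (Hungarian) algorithm, which computes the same maximum-weight perfect matching. It expects the square, integer-scaled weight matrix of $G_3$; the function names are hypothetical.

```python
def kuhn_munkres(weights):
    """Maximum-weight perfect matching on a square weight matrix.

    Returns (total_weight, match), where match[k] is the server index
    assigned to task k.
    """
    n = len(weights)
    inf = float("inf")
    cost = [[-w for w in row] for row in weights]  # maximise by minimising -w
    u = [0] * (n + 1)    # row (task) potentials, 1-based
    v = [0] * (n + 1)    # column (server) potentials, 1-based
    p = [0] * (n + 1)    # p[j] = row matched to column j, 0 if none
    way = [0] * (n + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Adjust potentials so that one more tight edge becomes available.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path just found.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    total = sum(weights[k][match[k]] for k in range(n))
    return total, match

def optimal_matching(weights):
    """Drop zero-weight pairs from the perfect matching, as in Step 4."""
    _, match = kuhn_munkres(weights)
    return [(k, i) for k, i in enumerate(match) if weights[k][i] > 0]

total, match = kuhn_munkres([[1000, 0, 0], [500, 250, 0], [250, 125, 62]])
assignment = optimal_matching([[5, 0], [7, 0]])
```

In the second call only one server can process either task, so the returned matching contains a single pair and the other task stays in the pool for a later round.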
The specific process of dynamic load balancing algorithm based on optimal matching of weighted bipartite graph is shown in Algorithm 1, and the time complexity of Algorithm 1 is shown in Table 1.
Analysis of the whole algorithm flow shows that the time complexity of the proposed algorithm is $O(n^3)$. Constructing the weighted bipartite graph $G_2$ requires traversing $n$ task nodes and $n$ server nodes to compute the edge set $E$ and the weight matrix $W_{nn}$, so this part has time complexity $O(n^2)$. When computing the perfect matching $M$ of the weighted complete bipartite graph $G_3$, since the perfect matching $M$ has $n$ edges, ...

Algorithm 1 Dynamic Load Balancing Algorithm Based on Optimal Matching of Weighted Bipartite Graph
Input: task request queue set $J$, server cluster set $S$, and server initial load indicator set $V$
Output: the optimal matching $H$
while $J$ is not empty do    // there are tasks pending assignment in the task pool
    establish task set $J_n$ and server set $S_n$    // take out $n$ tasks, as many as there are servers
    compute $R$ according to Equation (5)
    compute $T_k$ according to Equation (6)
    establish $G_1$ with $J_n$ and $S_n$    // construct the edgeless bipartite graph $G_1$
    for $k, i = 1 : n$ do
        compute $t_{ki}$ according to Equation (7)
        establish $E$ by comparing $T_k$ and $t_{ki}$    // construct the edge set $E$
        compute $W_{nn}$ according to Equation (8)    // compute the edge weight matrix $W_{nn}$
    end for
    establish $G_2$ with $J_n$, $S_n$, $E$ and $W_{nn}$    // construct the weighted bipartite graph $G_2$
    establish $G_3$ by adding zero-weight edges to $G_2$    // construct the weighted complete bipartite graph $G_3$
    find the equal subgraph $K_L$
    repeat
        find a perfect matching $M$ of $K_L$ by using the Hungarian algorithm
        if $M$ does not exist then
            find a new $K_{L'}$ to replace $K_L$    // then search for the perfect matching again
        end if
    until $M$ exists
    get the optimal matching $H$ by deleting zero-weight edges from the perfect matching $M$
end while

V. SIMULATION
According to the model framework of the proposed algorithm, a test example of a server cluster system is established to test the performance of the proposed algorithm. The test system consists of a task pool, a management server and a server cluster composed of four servers. The management server and server cluster programs are written in Java. The task pool holds task queues of a certain length, and the complexity of each task obeys a uniform distribution in the interval (1, 10). The task pool sends the task stream to the management server in real time. The management server program contains the task volume estimation module, the load indicator receiving module, the task assignment module and the core of the proposed algorithm. The server cluster program contains the task processing module, the load indicator calculation module and the load indicator feedback module. The configuration parameters of each server are shown in Table 2. The proposed algorithm derives the expected task completion time from the test task complexity. In practice, however, the processing time of a task is not simply proportional to its complexity, so an empirical value β is introduced. The value of β affects the determination of the edge set $E$ in the bipartite graph, which represents whether the server $S_i$ can process the task $J_k$. By adjusting β, more matching results can be obtained, and the optimal matching can be found among them. β therefore directly affects the assignment of tasks and the load balancing effect of the proposed algorithm. Accordingly, based on the test example established in this paper, the proposed algorithm is tuned by varying β. During the experiment, the task pool sends task queues of 500, 1000 and 1500 tasks to the management server, which completes the assignment of tasks to servers.
After the system has run for 20 seconds, the load indicator of each server in the cluster is recorded every 5 seconds, the variance of the load indicators of the server cluster is calculated, and the time taken to complete all tasks is recorded.
In this way, we compare the influence of different values of β on the running time and load balancing effect of the proposed algorithm. For accuracy, each experiment was repeated three times, and the average of the three runs was taken as the final result. The results of Experiment 1 are shown in Figures 9-11 and Table 3.
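The measurement procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first sample is taken at the 20-second mark, the population variance is used (the text does not say whether population or sample variance is meant), and all function names are hypothetical.

```python
from statistics import mean, pvariance


def sampling_times(total_runtime_s, warmup_s=20, interval_s=5):
    """Sampling instants in seconds: every interval_s after the warm-up."""
    return list(range(warmup_s, total_runtime_s + 1, interval_s))


def load_variance(load_indicators):
    """Variance of the load indicators of all servers at one instant.

    Assumption: population variance over the servers of the cluster.
    """
    return pvariance(load_indicators)


def average_runs(runs):
    """Element-wise mean of the variance series from repeated runs."""
    return [mean(samples) for samples in zip(*runs)]


# Hypothetical load indicators of four servers at one sampling instant.
print(load_variance([2, 4, 4, 6]))
# Hypothetical variance series from three repetitions of one experiment.
print(average_runs([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]))
```

A lower variance at a sampling instant means the load indicators of the four servers are closer to each other, which is the sense in which the comparison between values of β is made.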
According to the results of Experiment 1, when the system completes the assignment and processing of 500 tasks, the variance of the load indicator for β = 1.5 is lower than that for the other values most of the time. When the system completes the assignment and processing of 1000 tasks, all four cases fluctuate substantially in the early stage, and in the middle and late stages the variance of the load indicator for β = 1.5 is mostly lower than that for the other values. When the system completes the assignment and processing of 1500 tasks, the variance of the load indicator for β = 1.5 is relatively lower than that for the other values in the early stage, when the fluctuations of all four cases are large. In the middle and late stages, the variance for β = 1.5 is lower than that for the other values most of the time and remains relatively stable at a low level. According to Table 3, the time taken to assign and process each number of tasks is lower for β = 1.5 than for the other cases. In summary, when β = 1.5, the proposed algorithm keeps the server cluster load relatively stable and completes the tasks in less time. Therefore, β is set to 1.5 in the subsequent experiments in this paper.

B. EXPERIMENT 2: COMPARISON OF LOAD BALANCING EFFECTS OF DIFFERENT ALGORITHMS
In the test example of the server cluster system, the load balancing effect of the proposed algorithm is compared with that of other load balancing algorithms. During the experiment, the task pool sends task queues of 500, 1000 and 1500 tasks to the management server, which completes the assignment of tasks to servers. After the system has run for 20 seconds, the load indicator of each server in the cluster is recorded every 5 seconds, the variance of the load indicators of the server cluster is calculated, and the time taken to complete all tasks is recorded. We select the Maximum Matching of Bipartite Graph algorithm (BGM), the Least Connections algorithm (LCS), and the Weighted Round-Robin algorithm (WRR) for comparison with the proposed algorithm (WBGM). To evaluate the additional time overhead imposed on the system by the proposed algorithm, we also set up a control check group that uses no algorithm (CK). For accuracy, each experiment was repeated three times, and the average of the three runs was taken as the final result. The results of Experiment 2 are shown in Figures 12-14 and Table 4.
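For readers unfamiliar with the two classical baselines, the following Python sketch shows their usual textbook form. The exact variants used in the experiments are not specified here, so this is an assumption, and the function names are hypothetical.

```python
from itertools import cycle


def least_connections(active_connections):
    """Index of the server with the fewest active connections.

    Ties are broken in favour of the lowest server index.
    """
    return min(range(len(active_connections)),
               key=active_connections.__getitem__)


def weighted_round_robin(weights):
    """Endless iterator of server indices.

    Server i appears weights[i] times per cycle; the weights are fixed
    integers and do not react to the real-time load of the servers.
    """
    schedule = [i for i, w in enumerate(weights) for _ in range(w)]
    return cycle(schedule)


# Hypothetical connection counts of four servers.
print(least_connections([3, 1, 2, 1]))
# Hypothetical fixed weights of two servers.
wrr = weighted_round_robin([2, 1])
print([next(wrr) for _ in range(6)])
```

Neither baseline looks at the amount of work in an individual task: least connections reacts only to connection counts, and weighted round-robin follows a static schedule.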
According to the results of Experiment 2, when the system completes the assignment and processing of 500 tasks, the variance of the load indicator of the proposed algorithm is lower than that of the other algorithms most of the time. When completing the assignment and processing of 1000 and 1500 tasks, all four algorithms fluctuate strongly in the early stage. In the middle and late stages, the proposed algorithm quickly stabilizes the variance of the load indicator at a low level, and its variance is lower than that of the other algorithms most of the time. As the number of tasks increases, the variance of the load indicator of the proposed algorithm gradually decreases. The variance of the load indicator of the CK group fluctuates greatly because no algorithm is used. According to Table 4, the proposed algorithm consumes less time in most cases. Compared with the BGM algorithm, the proposed algorithm additionally considers the matching of the task volume with the server load, so its complexity is slightly higher. Nevertheless, its time consumption is lower than that of BGM, which indicates that the matching degree between tasks and servers considered in the proposed algorithm can improve server efficiency. The LCS algorithm consumes less time in some cases because of its lower algorithmic complexity and is comparable to the proposed algorithm in overall time consumption, but its balancing effect is worse. Because the WRR algorithm assigns tasks to servers based on fixed weights, it does not consider the real-time load on the servers and therefore consumes more time than the other algorithms. The experiments with the control check group show that introducing the proposed algorithm does not add much time overhead to the system. When the number of tasks increases, the algorithm can even offset the additional overhead it introduces and reduce the task processing time.
In summary, compared with the other algorithms, the proposed algorithm achieves a better load balancing effect while maintaining server processing efficiency, especially as the number of concurrent tasks increases.

VI. CONCLUSION
In this paper, we propose a dynamic load balancing algorithm based on the optimal matching of a weighted bipartite graph to solve problems in server cluster systems. These problems arise mainly because tasks are assigned unreasonably when the performance differences among servers are not fully considered, which leads to prolonged task completion time, decreased cluster load efficiency and load skew. By calculating the matching degree between each task volume and each server load capacity, a weighted bipartite graph with task nodes and server nodes as vertices is constructed that fully considers the differences in task volume and server load capacity. The Kuhn-Munkres algorithm is used to find the optimal matching of the weighted bipartite graph, and the task assignment module assigns tasks to servers according to the optimal matching. The simulation results show that, compared with the BGM, LCS and WRR algorithms, the proposed algorithm can improve the overall performance of the server cluster and better balance the load among the servers while maintaining task processing efficiency. As the number of concurrent tasks increases, the proposed algorithm achieves a better load balancing effect. However, as the number of servers in the cluster increases, the efficiency of the proposed algorithm decreases; thus, improving the Kuhn-Munkres algorithm to reduce its time complexity is an important optimization direction. To handle the overload of a server during task processing, the proposed algorithm dynamically adjusts the weights using the load indicators fed back in real time, and either reduces the complexity of the tasks assigned to that server or stops sending tasks to it. This problem can also be solved using load migration technology, which is our next research direction.