Server Placement and Task Allocation for Load Balancing in Edge-Computing Networks

Offloading tasks to cloud servers has increasingly been used to provide terminal users with powerful computation capabilities for a variety of services. Recently, edge computing, which offloads tasks from user devices to nearby edge servers, has been exploited to avoid the long latency associated with cloud computing. However, edge server placement and task allocation strongly affect the offloading process and the quality of a user’s experience. Therefore, appropriately deploying the edge servers within a network and evenly allocating the workload to the servers are vital. This paper thus considers both the workload of edge servers and the distances involved in offloading tasks to these servers. To improve the user experience, edge server locations are carefully selected and the workload is allocated to the servers in a balanced manner. This scenario is formulated as a mixed-integer linear programming problem, and a novel solution is proposed that searches for the best server placement using simulated annealing while integrating task allocation based on the Lagrangian duality theory and the sub-gradient method. Numerical simulations verify that the proposed algorithm achieves better results than conventional heuristics.

servers may still result in longer computational latency when offloading tasks.
Some previous studies have considered load balancing by monitoring certain utilization metrics for load generated by server-side computation [4], [5], but rarely considered communication distance as the main factor impacting workload balancing. The transmission time is usually determined by the data size and data transmission rate [6], [7], with longer transmission distances also creating greater loads within the network. Traditional models for load balancing usually consider only the computational load on the servers and do not consider the transmission load due to distance.
In this paper, a task-offloading paradigm is considered. A number of edge servers are deployed with edge nodes within a network. Offloading requests from other edge nodes can be allocated to the edge servers in order to fulfill the demand for computational power. Edge server placement and task allocation are tailored by considering both the workload balance among the edge servers and the transmission distance for the offloaded tasks. Balancing the workload among the edge servers and reducing the transmission distance for offloaded tasks is equivalent to reducing the computational and communication latency, respectively. This is formulated as a mixed-integer linear programming problem. The Edge Server Placement (ESP) Algorithm is thus proposed as an efficient strategy to determine a better solution for edge server placement and task allocation. The ESP algorithm is based on simulated annealing and considers both edge server placement and task allocation. Simulated annealing is used to search the enormous solution space for edge server locations, while task allocation is resolved for the selected edge server locations using the Lagrangian duality theory [8] and the sub-gradient method. Simulations are conducted to evaluate the performance of the proposed scheme, with the results indicating that the ESP algorithm exhibits great flexibility in balancing the load among the edge servers and the transmission distance.
The main contributions of this paper are as follows:
• This paper proposes a load balancing model that considers both the computational load on the edge servers and the transmission distance for task offloading in edge-computing networks.
• An efficient and effective scheme is developed to provide a better solution for both task allocation and server placement.
• The Lagrangian duality theory is integrated with the simulated annealing process to improve the search efficiency.
The rest of this paper is organized as follows. Section II briefly reviews related work. Section III presents the problem formulation and solution in detail. Section IV describes the simulation experiments and evaluates the performance. Finally, Section V concludes the paper.

II. RELATED WORK
Task allocation and server placement problems have been intensively studied in recent decades. Both problems are often associated with optimization objectives and constraints that make them non-trivial and very challenging.
Some research has assumed that servers are already deployed and focused on allocating the clients to servers with fixed locations. For example, Ye et al. [9] studied user association policies in heterogeneous networks (HetNets) for load balancing and presented an efficient distribution algorithm that obtained a near-optimal solution. In [10], Athanasiou et al. modeled a client association problem that assigned clients to access points in a 60-GHz wireless access network. However, these studies did not consider the transmission distance between the clients and servers. Heuristic algorithms are often used to discover an effective association between clients and servers.
The studies reported in [11], [12], and [13] employed genetic algorithms for server placement and task allocation, while Lim and Lee [11] aimed to find an effective strategy based on graph coloring to offload tasks to edge servers to balance the load. Tang and Pan [12] focused on the energy consumption in the communication network of a data center. They proposed a hybrid genetic algorithm that improved performance and efficiency by optimizing the placement of virtual machines. Xu et al. [13] presented a computational model based on vehicle-to-all communication (V2X) in edge computing. A genetic algorithm-based method was proposed as a balanced offloading strategy.
Game theory has also been widely adopted in task allocation for mobile-edge cloud computing [14], [15], [16], [17]. With multiple distributed users, every user is modeled as a game player who can independently determine their own offloading strategy. In [14], Chen presented a computation offloading model in which the tasks are assumed to be either processed locally or offloaded entirely to a single cloud server. The problem was formulated as a decentralized computation offloading game that promised to achieve a Nash equilibrium. Based on [14], Chen [15] further considered the problem of deciding whether to forward a user's tasks to a single remote cloud server with a single access point in each round of the game. In contrast, instead of considering a single access point, Ma et al. [16] took multiple access points into account. In addition, in [17], a multi-user offloading problem was formulated as a stochastic game, and a stochastic learning algorithm was proposed in order to reach a Nash equilibrium.
Other researchers have proposed task-offloading schemes for edge computing based on machine-learning techniques. Li et al. [18] and Shuja et al. [19] surveyed solutions based on machine learning for caching in an edge network. These approaches trained a model to offload computationally intense applications to a specified edge server. However, the offloading model could be complex, and a huge amount of data was required to train the model.
In terms of server placement, previous studies have mostly concentrated on searching for candidate server positions as a cluster head to reduce the response time. Recently, studies have investigated the K-edge server placement (k-ESP) problem [20], [21]. k-ESP primarily focuses on minimizing the number of edge servers to cover the entire Internet while satisfying budget constraints. Zeng et al. [20] presented a greedy algorithm to determine the smallest number of servers required in wireless metropolitan area networks. The proposed method iteratively selected as many nodes as possible to maximize the edge servers' coverage. In [21], Yin et al. provided a dynamic resource-provisioning framework to obtain feasible edge server locations according to the workload and users' proximity. However, they did not consider load balancing. Although the latency constraints were met, their proposal may cause overloading.
Li and Wang [22] proposed an energy-aware edge server placement model to reduce the energy consumption and computing resource utilization. A discrete particle swarm optimization algorithm was proposed for both server placement and task allocation. In [23], Xu et al. proposed a model to minimize a multi-objective problem for social media services within the Cognitive Internet of Vehicles. An integrated genetic algorithm was adopted to improve the quality of the services. Guo et al. [24] described a multi-objective optimization problem to minimize the communication delay with load balancing between devices. Although the above studies aim to optimize multiple objectives, the weights for each term in the objective function are usually selected intuitively and, thus, can dramatically change the final results. In contrast, the model proposed in this paper seamlessly integrates the cost of computation and communication and provides a more meaningful control between the loads of computation and communication.

III. TASK ALLOCATION AND SERVER PLACEMENT
In this section, the system model for the server placement and task allocation problem is illustrated. An integer programming problem is then formulated based on the system model.

A. SYSTEM MODEL
Consider a network composed of nodes and links, as shown in Fig. 1. The network topology can be considered an undirected graph $G = (V, E)$, where $V$ denotes the set of nodes that could be base stations or access points, and $E$ is the set of links between the nodes. The nodes provide access to the network for mobile users. To facilitate task-offloading operations, a number of edge servers will be deployed on some of the nodes. Users' devices can attach to the nodes in their vicinity. A variety of computationally intensive and delay-sensitive tasks from user devices can be offloaded to the edge servers through the nodes to which the user is currently attached. For simplicity, each node allocates its tasks, which are treated as indivisible, entirely to a single node equipped with an edge server. The problem in which nodes can split tasks between different edge servers can be modeled in a similar way. In addition, every node can allocate its tasks to any node with an edge server in the topology.

B. PROBLEM FORMULATION
Assume that there are $K$ edge servers deployed in the network to process the received tasks. In this system, each node $i$ can be allocated to a node $j$ on which an edge server is deployed. The access delay between nodes $i$ and $j$ is indicated by the number of hops as follows:

$$d_{ij} = 1 + \alpha \cdot hop_{ij}, \tag{1}$$

where $hop_{ij}$ is the minimum number of hops between node $i$ and node $j$, and $\alpha$ is the weight factor for distance. The distance is at least 1 because tasks from a mobile device are initially offloaded to the nearest base station $i$. Let $\lambda_i$ be the offloading task request rate originating from node $i$. The total workload is weighted by the distance. The largest workload among all edge servers can be expressed as

$$\eta = \max_{j \in V} \sum_{i \in V} d_{ij} \lambda_i y_{ij}, \tag{2}$$

where $y_{ij}$ is a binary variable that represents the allocation decision. In particular, for all $i, j \in V$,

$$y_{ij} = \begin{cases} 1, & \text{if the task originating from node } i \text{ is allocated to the edge server at node } j, \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$

Note that the workload is weighted by the distance from the node to the edge server. Let $x_i$ be another binary decision variable that indicates whether node $i$ is deployed with an edge server, defined as follows:

$$x_i = \begin{cases} 1, & \text{if an edge server is deployed at node } i, \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

The goal of the problem is to minimize the largest workload among the set of all edge servers. Specifically, the server placement and task allocation problem can be formally expressed as follows:

\begin{align}
\min_{x,\,y,\,\eta}\quad & \eta \tag{5} \\
\text{s.t.}\quad & \sum_{i \in V} d_{ij} \lambda_i y_{ij} \le \eta, && \forall j \in V, \tag{6} \\
& \sum_{j \in V} y_{ij} = 1, && \forall i \in V, \tag{7} \\
& \sum_{i \in V} x_i = K, \tag{8} \\
& y_{ij} \le x_j, && \forall i, j \in V, \tag{9} \\
& x_i \in \{0, 1\}, && \forall i \in V, \tag{10} \\
& y_{ij} \in \{0, 1\}, && \forall i, j \in V. \tag{11}
\end{align}

Constraint (6) guarantees that the workload of every edge server is less than or equal to $\eta$, which is the largest workload among the edge servers. Constraint (7) guarantees that tasks originating from a particular node are allocated to a single edge server. The total number of edge servers is limited to $K$ by constraint (8). Constraint (9) ensures that a node can only be allocated to a node with an edge server. Instead of relying on exponentially complex global methods, this paper proposes an efficient approach that is feasible and better than simple heuristics.
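To make the objective concrete, the following minimal Python sketch evaluates the largest weighted workload for a given placement and allocation. It assumes the access delay has the form d_ij = 1 + α · hop_ij and that the topology is connected; the function names, the five-node path topology, and the request rates are hypothetical and serve only as illustration.

```python
from collections import deque

def hop_counts(n, edges):
    """All-pairs minimum hop counts via BFS on an undirected, connected graph."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    hops = []
    for src in range(n):
        dist = [None] * n
        dist[src] = 0
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if dist[w] is None:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        hops.append(dist)
    return hops

def largest_workload(hops, rates, assign, alpha):
    """Return (eta, per-server load) where eta = max_j sum_i d_ij * lambda_i * y_ij.

    assign[i] is the server node chosen for node i (i.e. y_ij = 1 for j = assign[i]).
    The distance d_ij = 1 + alpha * hop_ij is an assumed form of the access delay.
    """
    load = {}
    for i, j in enumerate(assign):
        d_ij = 1 + alpha * hops[i][j]
        load[j] = load.get(j, 0.0) + d_ij * rates[i]
    return max(load.values()), load

# Hypothetical 5-node path 0-1-2-3-4 with edge servers at nodes 1 and 3.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
rates = [4.0, 2.0, 6.0, 3.0, 5.0]
assign = [1, 1, 1, 3, 3]
eta, load = largest_workload(hop_counts(5, edges), rates, assign, alpha=0.5)
print(eta, load)
```

In this toy instance, the server at node 1 carries the largest weighted load, so it determines η; moving node 2 to the server at node 3 would change both loads and is exactly the kind of trade-off the formulation optimizes.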
The proposed solution can be separated into two phases: (1) the location of the edge servers is selected, and (2) the tasks from each node are allocated to the edge servers. To simplify the problem, the task allocation is resolved under the assumption that the edge server locations have already been selected. The edge server locations are then selected by integrating the task-allocation scheme.

C. TASK ALLOCATION PROBLEM
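Assume that $K$ edge servers have already been deployed in the network. Let $X$ be the set of nodes on which edge servers are deployed. Because the edge server locations have already been selected, the problem can be simplified as follows:

\begin{align}
\min_{y,\,\eta}\quad & \eta \tag{12} \\
\text{s.t.}\quad & \sum_{i \in V} d_{ij} \lambda_i y_{ij} \le \eta, && \forall j \in X, \tag{13} \\
& \sum_{j \in X} y_{ij} = 1, && \forall i \in V, \tag{14} \\
& y_{ij} \in \{0, 1\}, && \forall i \in V,\ j \in X. \tag{15}
\end{align}

This is a combinatorial problem. In the worst case, the complexity of conventional algorithms for solving this problem grows exponentially with the size of the topology.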
The task allocation problem differs from the general assignment problem in that the objective is to reduce the largest load among the edge servers rather than the sum of the assignment costs. To solve it, the Lagrangian duality approach applied to the association problem in [10] is followed. For completeness, the derivation is briefly described as follows. The Lagrangian duality theory aims to solve the original problem by finding the solution of a dual function that is derived from the original function. The transformed dual function is usually easier to solve, and the obtained solution places a bound on the primal optimum. First, denote $u = (u_j)_{j \in X}$ as the vector of Lagrange multipliers used to dualize constraint (13) in problem (12). The partial Lagrangian can be formed as

$$L(\eta, y, u) = \eta + \sum_{j \in X} u_j \Big( \sum_{i \in V} d_{ij} \lambda_i y_{ij} - \eta \Big) = \eta \Big( 1 - \sum_{j \in X} u_j \Big) + \sum_{i \in V} \sum_{j \in X} d_{ij} \lambda_i u_j y_{ij}. \tag{16}$$

To simplify the notation, let $Y$ be the set of all possible allocation vectors $y$ that satisfy the constraints in Eqs. (14) and (15). $Y$ can further be written as the Cartesian product

$$Y = \prod_{i \in V} Y_i, \tag{17}$$

where $Y_i$ is given by

$$Y_i = \Big\{ y_i = (y_{ij})_{j \in X} : \sum_{j \in X} y_{ij} = 1,\ y_{ij} \in \{0, 1\} \Big\}. \tag{18}$$

Furthermore, the Lagrange dual function $g(u)$ can be obtained by minimizing the partial Lagrangian in (16) over $\eta$ and $y$ for a given $u$ as follows:

\begin{align}
g(u) &= \inf_{\eta,\, y \in Y} L(\eta, y, u) \tag{19} \\
&= \inf_{y \in Y} \sum_{i \in V} \sum_{j \in X} d_{ij} \lambda_i u_j y_{ij} \tag{20} \\
&= \sum_{i \in V} \inf_{y_i \in Y_i} \sum_{j \in X} d_{ij} \lambda_i u_j y_{ij} \tag{21} \\
&= \sum_{i \in V} g_i(u), \tag{22}
\end{align}

where

$$g_i(u) = \inf_{y_i \in Y_i} \sum_{j \in X} d_{ij} \lambda_i u_j y_{ij}. \tag{23}$$

In Eq. (16), the term $\eta (1 - \sum_{j \in X} u_j)$ must be zero; otherwise, the infimum over $\eta$ would be unbounded. Therefore, $\sum_{j \in X} u_j = 1$ is required for the dual function to remain finite. In particular, the constraints for task allocation in Eqs. (14) and (15) are implied in Eqs. (17) and (18). The optimal value of $g_i(u)$ in problem (23) can be obtained from

$$g_i(u) = \min_{j \in X} d_{ij} \lambda_i u_j. \tag{24}$$

Finally, the Lagrange dual problem can be formulated as

\begin{align}
\max_{u}\quad & g(u) \tag{25} \\
\text{s.t.}\quad & \sum_{j \in X} u_j = 1, \tag{26} \\
& u_j \ge 0, && \forall j \in X. \tag{27}
\end{align}

The load-balancing problem in (12) is thus converted to the Lagrange dual problem, which is a convex problem.
According to the properties of the Lagrangian duality theory, if the primal problem is convex and satisfies a constraint qualification such as Slater's condition, the optimal values of the primal problem and the corresponding dual problem are the same. In contrast, if the primal problem is non-convex, there may be a duality gap between the optimal values of the two problems. Even when a gap exists, a feasible solution approximating the optimal solution of the primal problem can still be obtained by solving the dual problem.
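The bound provided by the dual can be checked numerically on a toy instance. The sketch below assumes the dual function takes the form g(u) = Σ_i min_j d_ij λ_i u_j for multipliers on the unit simplex, and compares it with the primal optimum found by brute force. The instance (four nodes, two servers) and all function names are hypothetical.

```python
from itertools import product

def dual_value(d, lam, u):
    """g(u) = sum_i min_j d[i][j] * lam[i] * u[j]; meaningful when sum(u) == 1."""
    return sum(min(d[i][j] * lam[i] * u[j] for j in range(len(u)))
               for i in range(len(lam)))

def primal_optimum(d, lam, n_servers):
    """Brute force: smallest achievable largest-server workload over all allocations."""
    best = float("inf")
    for alloc in product(range(n_servers), repeat=len(lam)):
        load = [0.0] * n_servers
        for i, j in enumerate(alloc):
            load[j] += d[i][j] * lam[i]
        best = min(best, max(load))
    return best

# Hypothetical instance: d[i][j] is the weighted distance from node i to server j.
d = [[1.0, 2.0], [1.5, 1.5], [2.0, 1.0], [1.0, 2.5]]
lam = [4.0, 2.0, 3.0, 5.0]

eta_star = primal_optimum(d, lam, 2)        # primal optimum of the min-max problem
g_half = dual_value(d, lam, [0.5, 0.5])     # dual value at the uniform multiplier
print(g_half, eta_star)
```

For this instance the dual value at the uniform multiplier stays below the primal optimum, which illustrates both the lower-bound property and the presence of a gap for the integer problem.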
The objective g(u) in problem (25) is a non-differentiable function. Therefore, instead of using gradient-based algorithms, a sub-gradient method [25] is used to solve this problem.
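Although problem (23) is combinatorial, its solution can be obtained trivially as follows:

$$y^*_{ij} = \begin{cases} 1, & \text{if } j = \arg\min_{j' \in X} d_{ij'} \lambda_i u_{j'}, \\ 0, & \text{otherwise.} \end{cases} \tag{28}$$

The solution $y^*_{ij}$ determines whether the task requests of node $i$ are allocated to server $j$.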
The sub-gradient method is used for the problem in (25). Let $s = (s_j)_{j \in X}$ denote the sub-gradient of $-g$ at a feasible $u$, where $s_j$ can be obtained as follows:

$$s_j = -\sum_{i \in V} d_{ij} \lambda_i y^*_{ij}, \tag{29}$$

where $y^*_{ij}$ is the solution obtained from Eq. (28) for problem (23). The iterative projected sub-gradient method can be formulated as

$$u^{(k+1)} = P\big( u^{(k)} - \beta_k s^{(k)} \big), \tag{30}$$

where $k$ is the index of the iteration in the projected sub-gradient method, and $P$ is the Euclidean projection onto the unit simplex $\{ u \mid \sum_{j \in X} u_j = 1,\ u_j \ge 0 \}$ [26]. $\beta_k$ is the step size at the $k$th iteration. In this work, $\beta_k$ is set as $\beta_k = \beta / k$, where $\beta$ is a constant greater than 0. Initially, the values $u_j$ are chosen randomly; $u$ is then refined by the iterations of the projected sub-gradient method, so that the dual problem (25) is solved gradually.
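As a small worked illustration of one projected sub-gradient step, consider two servers with hypothetical values $u^{(1)} = (0.5, 0.5)$, weighted loads of 9 and 6 under the current allocation, and $\beta = 0.1$. The numbers are illustrative only, and the example assumes that the sub-gradient component $s_j$ of $-g$ is the negative of the weighted load of server $j$.

```latex
\begin{align*}
s^{(1)} &= (-9,\ -6), \qquad \beta_1 = \beta / 1 = 0.1, \\
u^{(1)} - \beta_1 s^{(1)} &= (0.5 + 0.9,\ 0.5 + 0.6) = (1.4,\ 1.1), \\
u^{(2)} = P(1.4,\ 1.1) &= (1.4 - 0.75,\ 1.1 - 0.75) = (0.65,\ 0.35).
\end{align*}
```

Here the projection subtracts the common offset $(1.4 + 1.1 - 1)/2 = 0.75$ so that the components sum to 1; both components remain nonnegative, so no clipping is needed. The multiplier of the more heavily loaded server grows, which makes that server less attractive in the next allocation step.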

Algorithm 1: Task Allocation Algorithm
Input: Positions $X$ of the $K$ edge servers
Output: Task allocation result $y^*$
Choose $u_j$ randomly such that $\sum_{j \in X} u_j = 1$
Set sub-gradient iteration number $k = 1$
Set temporary allocation results $\tilde{y} = \emptyset$
for $k \leftarrow 1$ to $\phi$ do
    Step 1: Determine the task allocation $y_{ij}$ by Eq. (28)
    Step 2: Store the allocation result in $\tilde{y}$
    Step 3: Compute $s_j$ for each edge server $j$ by Eq. (29)
    Step 4: Update each $u_j$ by the projected sub-gradient method in Eq. (30) to obtain $u^{(k+1)}$
Compute the cost of primal problem (12) for each allocation result stored in $\tilde{y}$
$y^*$ = the allocation in $\tilde{y}$ with the lowest cost for problem (12)

However, as mentioned previously, the primal problem (12) is non-convex, so a solution for the primal problem cannot be obtained from the dual problem directly. Let $\phi$ be the number of iterations of the projected sub-gradient method. The primal solution for task allocation is taken as the allocation with the lowest primal cost among those generated over the $\phi$ iterations. The complete process for the allocation algorithm is presented in Algorithm 1.
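The allocation procedure can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes the sub-gradient component is the negative weighted load of each server, uses a standard sort-based routine for the simplex projection, and tracks the best allocation on the fly instead of storing all of them. The names `allocate_tasks` and `project_to_simplex`, the default step constant, and the toy instance are hypothetical.

```python
import random

def project_to_simplex(v):
    """Euclidean projection onto {u : sum(u) = 1, u >= 0} (sort-based method)."""
    srt = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, value in enumerate(srt, start=1):
        cumulative += value
        t = (cumulative - 1.0) / k
        if value - t > 0:
            theta = t  # the last k satisfying the test fixes the offset
    return [max(x - theta, 0.0) for x in v]

def allocate_tasks(d, lam, iterations=100, beta=0.1, seed=0):
    """d[i][j]: weighted distance from node i to the j-th server; lam[i]: request rate.

    Returns (best allocation found, its largest server workload).
    """
    rng = random.Random(seed)
    n, m = len(lam), len(d[0])
    u = project_to_simplex([rng.random() for _ in range(m)])  # random start on the simplex
    best_alloc, best_eta = None, float("inf")
    for k in range(1, iterations + 1):
        # Step 1: every node picks the server minimising d_ij * lam_i * u_j.
        alloc = [min(range(m), key=lambda j: d[i][j] * lam[i] * u[j]) for i in range(n)]
        # Weighted load per server; the assumed sub-gradient of -g is s_j = -load_j.
        load = [0.0] * m
        for i, j in enumerate(alloc):
            load[j] += d[i][j] * lam[i]
        # Keep the allocation with the lowest primal cost seen so far.
        if max(load) < best_eta:
            best_alloc, best_eta = alloc, max(load)
        # Step 4: u <- P(u - beta_k * s) with beta_k = beta / k.
        u = project_to_simplex([u[j] + (beta / k) * load[j] for j in range(m)])
    return best_alloc, best_eta

# Hypothetical instance: 4 nodes, 2 servers.
d = [[1.0, 2.0], [1.5, 1.5], [2.0, 1.0], [1.0, 2.5]]
lam = [4.0, 2.0, 3.0, 5.0]
alloc, eta = allocate_tasks(d, lam)
print(alloc, eta)
```

Because the multipliers of heavily loaded servers increase at each step, nodes near the boundary between two servers tend to migrate toward the lighter one, which is how the iteration explores more balanced allocations.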

D. THE PROPOSED SCHEME
In the next phase, the edge server locations are selected. The task allocation scheme presented above is integrated with the server placement. Selecting locations for the edge servers is important because different edge server locations may lead to different workloads, which are weighted by the distance between the servers and the allocated nodes. As the topology grows, it becomes very challenging to optimally deploy edge servers due to the exponential increase in the number of combinations in task allocation and server location selection. To deal with this large combinatorial optimization problem, simulated annealing is adopted in the search for the global optimal solution for edge server placement and task allocation. The proposed algorithm is listed in Algorithm 2.
Initially, the $K$ edge servers are deployed at the nodes with the highest request rates. There are several parts in the simulated annealing process. First, the configuration state of the system has to be defined. Server placement $S^*$ and task allocation $y^*$ are the configurations of the architecture. $S^*$ is a set of nodes that are chosen to place edge servers, and $y^*$ is the allocation result obtained from Algorithm 1 according to the specified edge server locations. Second, the acceptance probability for the generated configurations needs to be calculated. While the generation mechanism provides candidate configurations, the probability decides whether worse configurations are selected as the next state or not.
In the proposed algorithm, if the new configuration exhibits a better objective value, it is accepted directly. If not, it may still be accepted with a probability given by Boltzmann's function, formulated as

$$P = \exp\left( -\frac{\Delta f}{T} \right),$$

where $\Delta f$ is the difference between the current and previous values of the objective function at the iteration, and $T$ is the current temperature. The purpose of accepting worse configurations is to provide a chance to escape local minima, which may lead to better results. Third, a generation mechanism for new configurations is required. To search efficiently in the large combinatorial problem, simulated annealing explores different configurations to obtain better results. At every iteration, the configurations are updated and compared, allowing the better configuration to be determined. In the proposed algorithm, at each iteration, the configuration is updated as follows. One of the edge servers is selected together with the group of nodes that are allocated to it. Then, within this cluster, the node that would have the least workload if it acted as the edge server is chosen as the new edge server. Algorithm 3 lists the server updating process. Finally, the nodes in the network are reallocated according to the new set of servers in the next iteration. The algorithm stops when the temperature is lower than the set threshold.

Algorithm 2: Edge Server Placement Algorithm
Input: A network topology $G = (V, E)$
Output: Edge server location set $S^*$; task allocation result $y^*$
$T = T_0$
$\eta^* = \infty$, $\eta = 0$
$S \leftarrow$ the $K$ nodes with the largest request rates
while $T > T_f$ do
    Obtain $y$ by Algorithm 1 for the edge servers in $S$
    Compute the largest edge server workload $\eta$
    ...

Algorithm 3: Edge Server Updating Algorithm
Input: Edge server location set $S$; task allocation result $y$
Output: New edge server location set $S$
Randomly select an $s$ from $S$
...
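The overall search can be sketched in Python as below. This is a hedged illustration, not the authors' implementation: a nearest-server rule stands in for Algorithm 1 to keep the example short, geometric cooling T ← γT is assumed, and the acceptance test follows the Boltzmann form exp(−Δf / T). All function names and the line topology are hypothetical.

```python
import math
import random

def nearest_allocation(d, lam, servers):
    """Stand-in for Algorithm 1: each node uses the server with the smallest d[i][j]."""
    order = sorted(servers)
    alloc = [min(order, key=lambda j: d[i][j]) for i in range(len(lam))]
    load = {j: 0.0 for j in order}
    for i, j in enumerate(alloc):
        load[j] += d[i][j] * lam[i]
    return alloc, max(load.values())

def update_servers(d, lam, servers, alloc, rng):
    """Move one randomly chosen server to the node of its cluster that would carry
    the least weighted workload if it hosted the server."""
    s = rng.choice(sorted(servers))
    cluster = [i for i, j in enumerate(alloc) if j == s]
    # Only s itself or nodes that do not already host a server are candidates.
    candidates = [c for c in cluster if c == s or c not in servers]
    if not candidates:
        return set(servers)
    best = min(candidates, key=lambda c: sum(d[i][c] * lam[i] for i in cluster))
    return (set(servers) - {s}) | {best}

def anneal(d, lam, k, t0=300.0, tf=10.0, gamma=0.9, seed=0):
    rng = random.Random(seed)
    n = len(lam)
    # Initial placement: the K nodes with the largest request rates.
    servers = set(sorted(range(n), key=lambda i: -lam[i])[:k])
    alloc, eta = nearest_allocation(d, lam, servers)
    best = (set(servers), alloc, eta)
    t = t0
    while t > tf:
        cand = update_servers(d, lam, servers, alloc, rng)
        cand_alloc, cand_eta = nearest_allocation(d, lam, cand)
        delta = cand_eta - eta
        # Better configurations are always accepted; worse ones with prob. exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            servers, alloc, eta = cand, cand_alloc, cand_eta
            if eta < best[2]:
                best = (set(servers), alloc, eta)
        t *= gamma  # geometric cooling (assumed schedule)
    return best

# Hypothetical example: 8 nodes on a line, hop_ij = |i - j|, alpha = 1.
lam = [5.0, 1.0, 1.0, 6.0, 1.0, 1.0, 1.0, 4.0]
d = [[1 + abs(i - j) for j in range(8)] for i in range(8)]
servers, alloc, eta = anneal(d, lam, k=2)
print(sorted(servers), eta)
```

Keeping the best configuration separately from the current one means that accepting a worse state for exploration never degrades the returned result.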

E. TIME COMPLEXITY
In the ESP algorithm, time is primarily spent on the allocation algorithm. At each iteration, server placement and task allocation are updated, and the complexity of updating the server in Algorithm 3 is about $O(N^2)$. In Algorithm 1, the calculation time is primarily consumed in Step 1, with the time complexity for determining the allocation being $O(NK)$. The total time complexity for Algorithm 1 is $O(\phi N K)$, where $\phi$ is the number of iterations for the sub-gradient method. The computation and comparison of the costs can be performed in constant time. Let $r$ be the number of moves executed in simulated annealing. The total time complexity is $O((N^2 + \phi N K) \cdot r)$, which is polynomial.
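For a sense of scale, the number of moves r can be estimated from the temperature schedule. The sketch below assumes geometric cooling (the temperature is multiplied by the cooling rate after every move), which is a common choice but is an assumption here, and plugs in the parameter values reported in the simulation setup (initial temperature 300, final temperature 10, cooling rate 0.9, ϕ = 100) together with N = 200 and K = 20. The function name is hypothetical, and the operation count ignores constant factors.

```python
def annealing_moves(t0, tf, gamma):
    """Number of passes through `while T > Tf` when T is multiplied by gamma each pass."""
    moves, t = 0, t0
    while t > tf:
        moves += 1
        t *= gamma
    return moves

r = annealing_moves(300.0, 10.0, 0.9)

# Dominant term of the complexity bound, (N^2 + phi * N * K) * r, up to constants.
n, k, phi = 200, 20, 100
ops = (n * n + phi * n * k) * r
print(r, ops)
```

Under these assumptions the sub-gradient term ϕNK dominates N² by an order of magnitude, which is consistent with the statement that most of the time is spent in the allocation algorithm.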

IV. SIMULATIONS
In this section, the proposed algorithm is verified and evaluated using topologies of various sizes. The experimental results are discussed to determine the effectiveness of the proposed algorithm in terms of the placement and allocation problem.

A. ENVIRONMENTAL SETUP
In the experiments, 100 different topologies are generated with different numbers of links ranging from 1.1 × N to 2.0 × N, where N is the number of nodes. The offloading task request rates of the nodes are randomly selected, but the sum of the injected task request rates is controlled by the size of the topology, i.e., the number of nodes N. In the simulations, the total injected task rate is set to 20 × N. For the parameters used in the algorithms, the number of sub-gradient iterations ϕ is set to 100. The initial and final temperatures are 300 and 10, respectively. The cooling rate γ is set to 0.9. In addition, the number of hops between nodes is obtained using the Dijkstra algorithm. The value of every data point in the figures is the average over the 100 topologies.
The performance of the proposed scheme is compared to classic K-medoids clustering [27] and a greedy algorithm. The greedy algorithm deploys the edge servers one by one. In each iteration, it places an edge server at the node with the highest request rate among the remaining nodes that have not yet been allocated to an edge server. The nodes in its vicinity are allocated to that edge server, starting with the closest node, until the sum of the allocated task request rates reaches the average, i.e., $\sum_{i \in V} \lambda_i / K$. This process repeats until K edge servers are deployed and all of the nodes are allocated.

B. SIMULATION RESULTS
Fig. 2 presents the results for the largest server workload against the number of nodes in the topology. Twenty edge servers are deployed in the topology. Intuitively, the average objective value (η) increases with the number of nodes. The greedy approach is better than K-medoids because its task allocation strategy focuses on balancing the weighted load, which considers the task request rates and the distance from the nodes to the edge servers. In contrast, K-medoids only considers the distance between the nodes and the edge servers. The larger the environment, the stronger the impact of this load balancing mechanism. The proposed ESP algorithm always exhibits a lower value than the greedy and K-medoids algorithms and has the lowest rate of increase, indicating that its advantage grows for larger topologies.
In Fig. 3, N is fixed at 200 and K is varied from 10 to 30 to observe the impact of the number of servers on the average objective value. With an increase in the number of edge servers, each node has a higher chance of finding a closer edge server, thus lowering the total workload. The results reveal that the optimization mechanism used by the proposed method is much better than the greedy and K-medoids algorithms.
Fig. 4 investigates the impact of the number of links in the topology. N is fixed at 200 and K is fixed at 20. The proposed ESP algorithm performs better than the other approaches. With a rise in the number of links, the average objective value steadily decreases. This is because the nodes can find shorter paths to servers when there are more links. However, once a certain number of links has been reached, the decline in η gradually slows down because a load balance status can be maintained.
In Fig. 5, the convergence behavior of the proposed ESP algorithm is presented with N fixed at 200 and K fixed at 20. During the simulated annealing process, the largest workload among edge servers decreases as the execution proceeds. The average objective value becomes stable after about 30 iterations.
In addition, the algorithms are also compared with the approximate solution obtained using CPLEX, an IBM tool for optimization problems. CPLEX is used as a comparison because its solution is considered to be close to the optimal. The community edition of the optimizer is employed in the experiments, so the solver can only handle small topologies. The number of edge servers is fixed at K = 5, and the number of nodes N ranges from 11 to 20. Fig. 6 presents the average objective value for each scheme. The results of the proposed ESP algorithm are the closest to those from CPLEX, which implies that the results of ESP are much closer to the optimal than those of the other schemes.
The computational efficiency of each algorithm is also compared. Fig. 7 shows the average execution time according to the number of nodes. K is fixed at 5 and N ranges from 11 to 20. Although the ESP algorithm has a higher computation time than the greedy and K-medoids algorithms, it is still within an acceptable range given that the average objective value it obtains is much better. For CPLEX, the average execution time grows exponentially with an increase in the number of nodes. The execution time of the proposed algorithm also grows as the number of nodes increases, but this increase is much slower than that of CPLEX. Predictably, CPLEX is much more computationally expensive for large-scale topologies.
Experiments are also conducted to evaluate the impact of the distance factor α on the ESP algorithm. In Fig. 8, by adjusting the distance factor α, the average number of hops is evaluated for different numbers of nodes. K is fixed at 20. By increasing α, the average number of hops between the nodes and edge servers decreases gradually. This is because, when the weight of distance is higher, nodes tend to offload to closer edge servers to reduce the transmission load, which results in a lower average number of hops. In addition, when N increases, the topology becomes larger. The number of hops also increases since the nodes are farther away from the edge servers.
In Fig. 9, the difference in the weighted load between servers is evaluated for different values of the distance factor α. N is fixed at 200 and K is fixed at 20. In the figure, the upper bound of the vertical bars represents the largest server weighted load, while the lower bound represents the smallest, and the data point is the average. With an increase in α, the difference in the weighted load becomes larger. When α is large, the weight of the distance becomes higher, and offloading tasks to more distant edge servers is avoided to reduce the load. However, this prevents nodes from being allocated to servers that would produce a better load balance and leads to a larger difference between the loads of edge servers. α can be adjusted based on specific scenarios; for example, if transmission costs are high, α could be set higher.
In Fig. 10, an example of the effect of the distance factor α is illustrated using the USNET topology [28], which contains 24 nodes and in which four edge servers are deployed. The ESP algorithm is used to solve the server placement and task allocation problem. For lower values of α, task allocation in each cluster is more widely dispersed. With an increase in α, each cluster becomes more centralized because the distance becomes more important, so nodes tend to offload their tasks to a closer edge server.

V. CONCLUSION
Edge server placement strongly affects the efficiency of task allocation. In this paper, a server placement and task allocation method for an edge-computing network is studied. We formulate the scenario as a mixed-integer linear programming problem and propose a simulated annealing-based edge server placement and task allocation algorithm. To evaluate the performance of the proposed algorithm, different topology sizes are considered, including real-world networks. The impact of the distance factor is also examined in detail. The results show that, by adopting the proposed ESP algorithm, the largest server workload, i.e., the objective value in this paper, can be effectively reduced while the runtime remains manageable.
In reality, the ability of each server to handle the workload may differ. Furthermore, mobile users may offload different types of task, and those tasks could be processed by different edge servers. Thus, a more advanced model with different types of task and different computing capacities for the edge servers should be considered in the future.