Load-Balancing Method for LEO Satellite Edge-Computing Networks Based on the Maximum Flow of Virtual Links

With the increasing number of satellites in orbit, traditional scheduling methods can no longer satisfy the increasing data demands of users. The timeliness of remote sensing images with large data volumes is poor in the backhaul process through low-earth-orbit (LEO) satellite networks. To address the above problems, we propose an edge-computing load-balancing method for LEO satellite networks based on the maximum flow of virtual links. First, the minimum rectangle composed of computing nodes is determined by the source and destination nodes of the transmission task under the configuration of the 2D-Torus topology of LEO satellite networks. Second, edge computing virtual links are established between computing nodes and users. Third, the Ford-Fulkerson algorithm is used to obtain the maximum flow of the topology with virtual links. Finally, a strategy is generated for computing and transmission resource allocation. The simulation results show that the proposed method can optimize the total capacity of the multi-node information backhaul in the remote sensing scenario of LEO satellite networks. The effectiveness of the proposed algorithm is verified in several special scenarios.

posed a game-theory-based approach to optimize computational offloading in satellite edge computing networks. Wang et al. [21] proposed a joint offloading and resource

The third is research on performance evaluation. Kim and Choi [24] studied the propagation and queuing delay performance of satellite edge-computing networks under the uplink/downlink packet error rate. Existing methods are mainly based on mixed-integer programming, which has high time complexity. Satellite networks with high mobility differ from terrestrial networks: the satellites are in periodic high-speed motion, so the allocation problem must be solved quickly.

In this paper, we study an edge-computing load-balancing method for LEO-satellite-network backhaul tasks that has low time complexity and is practical to engineer. The main contributions of this study are summarized as follows:
• We designed an edge-computing architecture for LEO satellite networks that combines the optimization of transmission and computing. The architecture models the relationship between transmission and computing resources.
• We proposed a minimum-rectangle computing node selection method for 2D-Torus networks. The method selects the nodes that perform computation offloading while sensing information is sent back to the ground station.
• We proposed a computational load-balancing algorithm based on the maximum flow of virtual links. The algorithm determines the size of the data processed by each routing node.

The remainder of this paper is organized as follows. In Section II, the application scenario, network model, transmission model, and calculation model are presented. In Section III, the optimization problem is formulated, and an edge-computing load-balancing method based on the maximum flow of virtual links is proposed. Simulation results and discussions are provided in Section IV. Finally, Section V concludes the paper.

For the aforementioned scenario description, a real-time information acquisition and transmission LEO constellation with Earth observation, onboard processing, and routing is modeled as follows.

A. CONSTELLATION SCENARIO
We consider an application scenario in which the Earth observation satellite obtains image information and transmits the data back to the ground station through LEO satellite networks. This scenario is illustrated in FIGURE 1.

The space segment consists of a single-layer Walker constellation. The constellation configuration is Walker-Delta, where the number of orbital planes is $M_p$ and the number of satellites per orbit is $M_s$. It has a relatively stable topology. The main function of the system is global disaster monitoring. After the detection information is generated by the Earth observation satellite, transmission and computing resources are scheduled within a predetermined time window so that the detection information is processed and transmitted to a limited area in real time.
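The regular structure of such a constellation can be illustrated with a short sketch. It assumes the 2D-Torus abstraction mentioned in the abstract, in which each satellite keeps two intra-plane and two inter-plane crosslinks with wrap-around in both directions. The function name `torus_neighbors` and the `(plane, slot)` indexing are hypothetical and chosen only for illustration.

```python
def torus_neighbors(plane, slot, m_p, m_s):
    """Return the four crosslink neighbors of satellite (plane, slot).

    Assumption: the constellation is abstracted as an m_p x m_s 2D torus,
    so plane and slot indices wrap around modulo m_p and m_s.
    """
    return [
        ((plane - 1) % m_p, slot),  # same slot, previous orbital plane
        ((plane + 1) % m_p, slot),  # same slot, next orbital plane
        (plane, (slot - 1) % m_s),  # previous satellite in the same plane
        (plane, (slot + 1) % m_s),  # next satellite in the same plane
    ]


# Example with 20 planes of 11 satellites each.
print(torus_neighbors(0, 0, 20, 11))
```

A real Walker-Delta constellation may have phase offsets across the seam between the first and last planes, which this sketch ignores.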

The Walker-Delta constellation configuration is represented by the adjacency matrix $A^{Sat}$, where the element $a^{Sat}_{i,j} \in A^{Sat}$ is the capacity of the crosslink from node $i$ to node $j$ at the instant $t$, which can be expressed as

where $v_i, v_j$ represents the crosslink where the first item

called a task. The ratio between different tasks is called the weight. It is assumed that all tasks originate from the set of sending nodes $V_T$, and the task eventually flows to the set of receiving nodes in a limited area. This forms the task weight

where $\beta_{i,j} \in B$ represents the proportion of traffic sent by the observation node $v_i \in V_{US}$ to the ground node $v_j \in V_{UD}$. The $i$-th row represents all tasks sent by $v_i$. The $j$-th column represents all tasks received by $v_j$, satisfying:

This indicates the number $N_{US}$ of transmission satellites that access the observation task at the same time. Binary $k = 1$ indicates that the node has been accessed by the observation task.

We consider that the processing of observation information mainly involves preprocessing a large number of Earth observation images. Data processing can reduce the size of backhaul data by extracting feature information from the data. For the information received by a single satellite node $S_i$, $D_i$ represents the size of the original data and $F_i$ represents the size of the processed data. If the satellite $S_i$ performs edge computing, we define the calculation transfer ratio $\rho_i = (D_i - F_i)/F_i$. At the same time, the decision variable $l_i$ is defined as the selected calculation mode: $l_i = 1$ indicates that edge computing processing is performed on the data, whereas $l_i = 0$ represents no calculation processing. The data size of the information generated after the original information passes through the satellite $S_i$ is

The processing time at the satellite $S_i$ is

In this study, it is assumed that the set of all low-orbit satellites is $S$ and the set of ground stations is $G$. The connectivity of all nodes in the network is described by the matrix $A \in N \times N$,

where $N = N_S + N_G$. The matrix $A_S \in N_S \times N_S$ represents the crosslink connectivity matrix of LEO satellite networks. $A_S(i, j) = 1$ indicates that there is a connected crosslink between satellite $i$ and satellite $j$. Similarly, $A_R \in N_S \times N_G$ and $A_T \in N_G \times N_S$ represent the connected links between the satellites and the ground stations, and $A_G \in N_G \times N_G$ represents the connection relationship between the ground stations.
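The dimensions of the four sub-matrices suggest a block layout for the overall connectivity matrix. The following sketch assembles it as [[A_S, A_R], [A_T, A_G]]. This arrangement is an assumption inferred from the stated dimensions, not a formula taken from the text, and the function name `assemble_adjacency` is hypothetical.

```python
def assemble_adjacency(a_s, a_r, a_t, a_g):
    """Assemble the (N_S + N_G) x (N_S + N_G) connectivity matrix.

    Assumed block layout: [[A_S, A_R],
                           [A_T, A_G]]
    Each argument is a list of rows (plain nested lists).
    """
    top = [row_s + row_r for row_s, row_r in zip(a_s, a_r)]
    bottom = [row_t + row_g for row_t, row_g in zip(a_t, a_g)]
    return top + bottom


# Two satellites joined by a crosslink; satellite 1 sees one ground station.
a_s = [[0, 1], [1, 0]]   # N_S x N_S crosslinks
a_r = [[0], [1]]         # N_S x N_G satellite-to-ground links
a_t = [[0, 1]]           # N_G x N_S ground-to-satellite links
a_g = [[0]]              # N_G x N_G ground-to-ground links
for row in assemble_adjacency(a_s, a_r, a_t, a_g):
    print(row)
```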

The channel capacity is the maximum data rate for reliable transmission. The power- and bandwidth-limited Gaussian channel capacity is given by

$C_l$ limits the maximum data rate $R_i$ of information transmitted over the channel. Then, the communication delay can be defined as

where $T^{comm}_i$ is the propagation time between the $i$-th node and the $(i + 1)$-th node.

2) The size of the data transmitted during the task is less than or equal to the size of the data available on the satellite at the instant of the task.
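As a worked illustration of the channel model, the sketch below uses the standard Shannon capacity of a band-limited Gaussian channel, C = B log2(1 + P / (N0 B)), and assumes that the per-hop communication delay is the transmission time (data size divided by rate) plus the propagation time. The delay form, the parameter names, and the numerical values are assumptions made for illustration; they are not taken from the paper's equations.

```python
import math


def shannon_capacity(bandwidth_hz, signal_power_w, noise_density_w_per_hz):
    """Capacity in bit/s of a power- and bandwidth-limited Gaussian channel."""
    snr = signal_power_w / (noise_density_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def hop_delay(data_bits, rate_bps, capacity_bps, propagation_s):
    """Assumed per-hop delay: transmission time plus propagation time."""
    if rate_bps > capacity_bps:
        raise ValueError("data rate exceeds the channel capacity")
    return data_bits / rate_bps + propagation_s


# Hypothetical link: 1 MHz bandwidth with a signal-to-noise ratio of 1.
capacity = shannon_capacity(1e6, 1.0, 1e-6)
print(capacity)
print(hop_delay(8e6, capacity, capacity, 0.005))
```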

III. PROBLEM FORMULATION AND PROPOSED
The problem is NP-hard. Objective (9) indicates that the optimization goal is to maximize the capacity of the information backhaul per unit time. Constraint (10) indicates that the data entering a node equal the data processed by the node plus the data flowing out of it. Constraint (11) indicates that the data transmitted or received in a single task are less than or equal to the data generated by the task. Constraint (12) indicates that when the task data are backhauled, the transmission decision variable $I_a$ is set to one. Constraint (13) limits the maximum data capacity that can be transmitted per unit time in a single link. Constraint (14) limits the data capacity transmitted on a single link for a single observation task. Constraint (15) indicates that the difference between the computing resource occupancy of any two nodes participating in the calculation cannot exceed the constraint $C_0$.

We propose a method for selecting routing nodes. First, the LEO satellite network topology is generated according to the constellation position and adjacency relationship per unit time. Second, the source and destination nodes of the task are determined, and a minimum routing rectangle is generated. If the minimum rectangle does not exist, the routing neighborhood is adopted to generate the extended minimum rectangle. Finally, all nodes in the minimum rectangle are selected as the path nodes for the information backhaul. The specific algorithm is shown in Algorithm 1.

Algorithm 1 Multiple Shortest-Path Nodes Selection Algorithm
Input: source node position $P_{sn}$, destination node $P_{dn}$, network topology $T_L$
Output: set of selected nodes $N_r$
Begin
1  Calculate the network topology $T_L$
2  Bring in the source node position $P_{sn}$ and the destination node $P_{dn}$
3  Find the shortest path $R_{sp}$ of $P_{sn}$ and $P_{dn}$ on $T_L$
4  if $R_{sp}$ is a line segment do
5      Find the extended minimum rectangle $R_{sd}$ of $P_{sn}$ and $P_{dn}$ according to Definition 2
6  else do
7      Find the minimum rectangle $R_{sd}$ of $P_{sn}$ and $P_{dn}$ according to Definition 1
8  end if
9  Output the set of selected nodes $N_r$ in $R_{sd}$
10 End
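The node selection in Algorithm 1 can be sketched in a few lines. Definitions 1 and 2 are not reproduced in this section, so the sketch rests on two assumptions: the minimum rectangle spans the shorter wrap-around arc between source and destination in each torus dimension, and the extended rectangle widens a degenerate (line-segment) case by one adjacent row or column. All function names are hypothetical.

```python
def wrap_range(a, b, size):
    """Indices on the shorter wrap-around arc from a to b on a ring of `size`."""
    forward = (b - a) % size
    backward = (a - b) % size
    if forward <= backward:
        return [(a + k) % size for k in range(forward + 1)]
    return [(a - k) % size for k in range(backward + 1)]


def select_nodes(src, dst, n_planes, n_slots):
    """Return the set of (plane, slot) nodes in the (extended) minimum rectangle."""
    planes = wrap_range(src[0], dst[0], n_planes)
    slots = wrap_range(src[1], dst[1], n_slots)
    # Assumed reading of Definition 2: if the shortest path is a line segment,
    # widen the degenerate dimension by one neighboring row or column.
    if len(planes) == 1 and len(slots) > 1:
        planes = [planes[0], (planes[0] + 1) % n_planes]
    elif len(slots) == 1 and len(planes) > 1:
        slots = [slots[0], (slots[0] + 1) % n_slots]
    return {(p, s) for p in planes for s in slots}


# 20 planes of 11 satellites: a 3 x 2 rectangle and a wrap-around case.
print(sorted(select_nodes((0, 0), (2, 1), 20, 11)))
print(sorted(select_nodes((19, 0), (1, 10), 20, 11)))
```

Computing the arcs per dimension avoids an explicit shortest-path search, which is sufficient on a regular torus but would not hold for a topology with failed links.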

After selecting the routing nodes for the information backhaul, it is necessary to allocate the computing and transmission resources of each node according to the observation tasks and resource occupancy. We propose a resource allocation method based on the maximum flow of virtual links. First, according to the computing nodes and node adjacencies selected by Algorithm 1, a routing topology of the source and destination nodes is generated. Second, according to the computing resource occupancy of each node, a virtual link between each node and the user is established, and the routing topology is updated. Third, all routing nodes are swept to the full-load state in equal proportions, using the available computing resources of each node as the independent variable. A maximum flow search is performed to obtain the maximum capacity of the network topology that satisfies the constraints. Finally, the flow result is output as the allocation strategy for transmission and computing resources. The specific algorithm is shown in Algorithm 2.

The solution of the maximum flow from the source node to the destination node is based on the Ford-Fulkerson algorithm. The Ford-Fulkerson algorithm repeatedly finds an augmenting path to increase the flow, that is, a path with positive residual capacity between the source node and the destination node. In this
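The role of the virtual links can be illustrated with a small maximum-flow sketch. It uses the Edmonds-Karp variant of Ford-Fulkerson (augmenting paths found by breadth-first search), which is a substitution made here for simplicity. The five-node topology, the capacities, and the helper names are hypothetical.

```python
from collections import defaultdict, deque


def add_virtual_links(capacity, processing_rate, user):
    """Copy the topology and add a node-to-user link of capacity R_p per node."""
    updated = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for node, r_p in processing_rate.items():
        links = updated.setdefault(node, {})
        links[user] = links.get(user, 0) + r_p
    return updated


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse edge exists
    total = 0
    while True:
        # Breadth-first search for a path with positive residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        total += bottleneck


# Hypothetical topology: source S, relays A and B, downlink node D, user U.
links = {"S": {"A": 10, "B": 10}, "A": {"D": 10}, "B": {"D": 10}, "D": {"U": 5}}
with_virtual = add_virtual_links(links, {"A": 3, "B": 3}, "U")
print(max_flow(links, "S", "U"), max_flow(with_virtual, "S", "U"))
```

In this toy case the downlink limits the plain topology to 5 units, while the virtual links of relays A and B raise the maximum flow to 11, and the flow on each virtual link can be read as the amount of data that node processes.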

Algorithm 2 Resources Allocation Algorithm Based on the Maximum Flow of Virtual Links
Input: set of selected nodes $N_r$, task weight matrix $B$, minimum rectangle $R_{sd}$, the computing difference constraint $C_0$
Output: information backhaul throughput $C_d$, resources allocation strategy
Begin
1  for each node $C_p = 1 : \lfloor f_{CPU}/z \rfloor$ do
2      Find occupied computing resource $R_u$ in $N_r$, calculate processing rate
8      Establish the virtual link between the node and the user; the link capacity is $R_p$
9      Add the virtual link to the minimum rectangle $R_{sd}$
10     Update the topology $R_{sd}$
11     Calculate the maximum flow of the topology

It should be emphasized that the above process of selecting and allocating computing resources is only for a snapshot.

5      Determine the computing difference constraint $C_0$
6      Find occupied computing resource $R_u$
7      Calculate processing rate $R_p$
8      Calculate $C_d$ and the resources allocation strategy using Algorithm 2
9  end for
10 End

The parameters of the simulation are set as follows. A Walker-Delta LEO satellite network composed of 220 satellites is used in the simulation, with a total of 20 orbital planes. Each orbital plane has 11 satellites. The orbital height is $H = 1000$ km. The orbital inclination angle is 60°. The sampling interval of the simulation snapshot is 5 s.

In this section, the performance of the algorithm is characterized using three metrics: the backhaul throughput, the delay of information backhaul, and the average CPU occupancy rate. The strategy given by the algorithm in this paper is compared with the always-transmission strategy and the always-computing strategy. The always-transmission strategy involves transmitting all the data back to the user through LEO satellite networks. The always-computing strategy involves sending the processed feature information of all data to the user. There is no difference in the time complexity of the three methods. In the simulation, it is assumed that all nodes are in an idle state.
After selecting a fixed source node, we select different destination nodes to verify the performance of the algorithm under different numbers of computing nodes. We select a 2D-Torus network topology ranging from 2×2 to 5×6. The network is simulated as shown

node is far from the destination node in the topology, a large number of optional routing nodes can meet the computational requirements of the task. There is an intersection between the always-transmission curve and the always-computing curve. In this case, the processing capability of the multi-node computing network and the downlink of the last hop for information backhaul have reached a dynamic balance.

FIGURE 8. Delay of information backhaul with different numbers of routing nodes.
We use the delay of information backhaul to characterize the time consumption from information generation to the user acquiring the information. FIGURE 8 shows the information backhaul delay for different numbers of routing nodes. The x-axis represents the number of routing nodes occupied by information backhaul. The y-axis represents the delay of information backhaul. It can be seen that the delay of information backhaul obtained by our strategy is lower than that of the other two strategies in the same scenario. For a fixed amount of remote sensing image data, the delay of information backhaul is inversely proportional to the information backhaul throughput.

We use the average CPU occupancy rate to represent the computing resource occupancy of routing nodes in a single task. FIGURE 9 shows the average CPU occupancy rate of the routing nodes with different numbers of routing nodes. The x-axis represents the number of routing nodes occupied by the backhaul information. The y-axis represents the average CPU occupancy rate. It can be seen that the always-transmission strategy only needs to perform packet routing table lookup and forwarding, so it requires almost no computing resources. With an increase in the number of computing nodes, the curve of our strategy and the curve of the always-computing strategy both have an inflection point at which the occupancy drops from the full-load state. Because our strategy balances the occupancy of the computing resources well, the drop point appears earlier.

algorithm when the computing resources of some nodes are occupied. FIGURE 11 shows the resource allocation strategy for a multi-node information backhaul when the computing resources of some nodes are occupied. The green nodes are the source nodes where the tasks are initiated. The orange node is the destination node for the information backhaul. The red nodes represent nodes with 30% of their computing resources occupied. The purple nodes are those with 50% of their computing resources occupied. The blue links represent crosslinks. The yellow link represents the downlink. The arrow represents the transmission direction of the information flow. The value of $R_p$ on a node represents the data processing rate of the node per unit time. The value of $R_t / C_l$ on a link represents the current transmission rate $R_t$ of the link and the maximum available transmission capacity $C_l$ of the link. It can be observed that the algorithm in this study can quickly provide an optimal strategy under complex constraints.

We study the load-balancing problem of transmission and