High-Fidelity Decision-Making and Simulation for Cooperative Autonomous Air Combat Considering the Effect of Flight Controller

In the past decade, improving the fidelity of maneuver gaming and the synergy of target allocation has become a key issue in research on cooperative autonomous air combat. To address this problem, this study proposes a maneuver decision-making algorithm based on an optimized dynamic Bayesian network and a target allocation decision-making algorithm based on an optimized hybrid particle swarm optimization. In maneuver decision-making, the reliability of state transitions and the autonomy of the air combat are enhanced by accounting for the effect of sliding mode control. The Bayesian network is improved by introducing a strategy for dynamically updating prior probabilities. Computation is reduced and efficiency increased by pruning the minimax search tree according to visual prediction. In target allocation decision-making, the algorithm's convergence is greatly accelerated and the global optimality of the solution improved by introducing immigrant particles. The algorithm's application scope is expanded by proposing a solution principle for combat situations with unequal numbers of aircraft. Furthermore, the end criterion is specially designed to fit real-world combat by introducing a fire control system. The simulation results show that the designed decision-making algorithms are more effective at solving the problem of cooperative autonomous air combat, which indicates that the various improvements introduced in this study are reasonable and effective.


I. INTRODUCTION
At the beginning, the advantages of UAVs over manned aircraft gave birth to studies on autonomous air combat. Afterwards, the disadvantages and limitations of a single UAV in combat further promoted research on cooperative autonomous air combat (CAAC). Autonomous decision-making is the most vital procedure in CAAC. It is responsible for providing UAVs with offensive/defensive strategy selections based on real-time battlefield situations, so that these UAVs can overcome enemies or save themselves [1], [2], [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Ming Xu.
Though great efforts have been devoted to research on CAAC in recent decades, some flaws still exist in currently adopted autonomous decision-making methods that hinder CAAC from being widely used in military applications [4]. Autonomous decision-making in air combat can be divided into two parts: maneuver decision-making and target allocation decision-making.
Maneuver decision-making concerns UAVs' behaviors: planning specific flight instructions and tracking specific flight trajectories for certain combat purposes. Various algorithms have been applied. For example, the differential game method can ensure an optimal decision result, but needs extremely accurate models that are hard to build [5]. The approximate dynamic programming method can alleviate the computational pressure, whereas the results tend to have insufficient generalization ability [6]. A Bayesian network (BN) enables machines to take correct decision-making actions by emulating an expert's actions [2], [3], and is appropriate for air combat at various distances from a theoretical point of view [4], whereas the accuracy of its decisions depends heavily on the acquisition of empirical knowledge.
Target allocation decision-making refers to the matching between our side and the enemy, and aims to achieve group optimization at a certain scale. Three kinds of methods are in common use at present: mathematical programming methods, negotiation methods, and swarm intelligence methods. Mathematical programming methods, such as the Hungarian algorithm [7], have a simple structure and are easy to implement. Their essence is to traverse the entire feasible space, which leads to large-scale calculation and low operation speed. Negotiation methods, such as distributed algorithms [8], have strong robustness but require higher computing and communication abilities from the UAVs themselves. Swarm intelligence methods, such as the genetic algorithm [9], particle swarm optimization [10], [11], and simulated annealing [12], have the advantages of high convergence speed and simple operation, but may fall into local optima and yield less accurate solutions owing to the randomness of the initial settings [13].
To remedy some of the aforementioned flaws in existing methods, we propose an optimized dynamic Bayesian network (DBN) method for maneuver decision-making and an optimized hybrid particle swarm optimization (HPSO) method for target allocation decision-making. Table 1 shows the main advantages of our work compared with related works. Firstly, in the optimized maneuver decision-making method: (1) To enhance the reliability of state transitions and the autonomy of the air combat, we propose a method of executing combat maneuvers according to a sliding mode flight controller (SFC). (2) To improve the dynamic performance and decision-making ability of the BN, we propose a method of updating the posterior probabilities as the combat goes on. (3) To raise computing efficiency and decision timeliness, we propose a method of pruning the minimax search tree based on visual prediction. Secondly, in the optimized target allocation decision-making method: (1) To accelerate convergence and promote a globally optimal solution, we propose a method of introducing immigrant particles when the iteration falls into local optimization. (2) To reinforce the cooperation between UAVs and expand the algorithm's application scope, we propose an allocation strategy for when the UAV numbers of the two sides are unequal. Last but not least, in the combat end criterion: to narrow the gap between simulations and real air combat, we propose a method of introducing a fire control system to determine the end of combat. The simulation results prove that the designed CAAC system has higher fidelity and better decision-making capability. The system could provide a simple and convenient solution to maneuver decision-making.

A. FLIGHT MODELING
Flight modeling depends on force analysis [14]. As shown in Fig. 1, O_c-x_b-y_b-z_b is the Body Axis system, O_c-x_g-y_g-z_g is the Earth Axis system, and O_c-x_a-y_a-z_a is the Wind Axis system. Regarding the aircraft as a rigid body, its force model can be established on the basis of these three coordinate systems. The main variables required for modeling are summarized in Table 2.
According to flight dynamics theory, the motions of the barycenter (O_c in Fig. 1) and the rotations around the barycenter in the Body Axis system can be written in scalar form as (1) [15], [16]. For convenience of writing, sin θ and cos θ are abbreviated as Sθ and Cθ, and so on.
Then the motions of the aircraft in the Earth Axis system can be derived through coordinate axis transformation [15], [16].

The autonomous flight controller is often underappreciated in previous reports [17]. To highlight the unmanned concept in CAAC, we apply a second-order sliding mode flight controller (SFC) to the given six-DOF flight model [18]. The logic is to ensure that the system states converge to the sliding mode manifold s = ṡ = 0 in finite time under appropriate control laws [19], [20], [21]. Rewrite (1) in the following form, where x ∈ R^n and u ∈ R^n are the state vector and the control vector, respectively. Assume the state to be tracked is x_c; then the tracking error can be defined as e = x_c − x. The control problem has now been transformed into finding the input u that makes e stable. If the sliding mode surface function and the approach law are selected suitably, the control vector u can be designed as in (6).

The basic structure of the SFC applied in this paper is shown in Fig. 2. Assume that ṗ_c, q̇_c and ṙ_c are the desired angular accelerations of the aircraft. Combining (1) and (6), the control quantities of the pitch, roll and yaw channels can be worked out as δ_x, δ_y and δ_z, respectively. Usually, δ_x is equivalent to δ_r (rudder deflection angle), δ_y is equivalent to δ_a (aileron deflection angle), and δ_z is equivalent to δ_e (elevator deflection angle).
where λ and ρ with numeric subscripts are proportional gain coefficients, and L_p, M_β, N_α, etc. are aerodynamic derivatives. The form of the aerodynamic derivatives indicates that the aerodynamic data and the flight model are directly linked with the rudder deflection angles [22]. In this case, the controller can imitate the operations of pilots in manned air combat, which is consistent with the concept of the DBN (described later).
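To illustrate the sliding-mode idea (driving the states to the manifold s = ṡ = 0 via a switching law), the following minimal single-axis sketch tracks a pitch-rate command with a double-integrator stand-in for the channel dynamics. The gains `lam` and `k` and the boundary-layer width `eps` are illustrative assumptions, not the paper's tuned values.

```python
def smc_control(e, e_dot, lam=2.0, k=5.0, eps=0.05):
    """Single-axis sliding mode law (sketch): surface s = e_dot + lam*e,
    approach law s_dot = -k*sat(s/eps); sat() limits chattering."""
    s = e_dot + lam * e
    sat = max(-1.0, min(1.0, s / eps))
    return -k * sat - lam * e_dot

# Track a pitch-rate command q_c = 1.0 with toy double-integrator dynamics.
q, q_dot, q_c, dt = 0.0, 0.0, 1.0, 0.01
for _ in range(2000):
    u = smc_control(q - q_c, q_dot)   # u plays the role of angular acceleration
    q_dot += u * dt
    q += q_dot * dt
```

On the manifold, the error obeys ė = −λe and decays exponentially, so q settles at the commanded value; the saturated switching term is the usual chattering-reduction device.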

B. MANEUVER DECISION-MAKING
Compared with differential games and other methods, a maneuver decision-making model based on a Bayesian network can be applied to the decision-making of near-, medium- and long-range air combat at the same time, which is a great advantage [23], [24]. However, a BN is strongly dependent on empirical data, which means it cannot provide valid decisions for UAVs throughout the whole battle process in the absence of authentic empirical data [25]. To address this problem, a more effective DBN system is designed here. The optimized DBN enables UAVs to modify and update the prior probabilities in real time according to their current situations during the air combat.
First and foremost, the blue side is regarded as our side and the red side as the enemy side under normal circumstances. A one-to-one maneuver decision-making model based on the DBN is shown in Fig. 3. With the help of an influence diagram [26], the DBN model can be presented as an acyclic directed graph composed of nodes and directed arcs. These nodes and arcs demonstrate the logic of maneuver decision-making in a very intuitive way. Taking the blue side as an example, the steps of making maneuver decisions within a time interval k ~ (k + 1) are as follows:

Step 1: At time k, get the observation state Z_k^R by observing the red side's flight state X_k^R;

Step 2: Calculate the blue side's current combat state C_k^B by comparing the blue side's flight state X_k^B with the observation state Z_k^R;

Step 3: Assess the current combat state and obtain the posterior probability P(θ_k^B | C_k^B) according to C_k^B and the prior probabilities P(θ_k^B);

Step 4: Establish a utility function J_k^B over the posterior probability P(θ_k^B | C_k^B) and the combat state C_k^B, then solve for the maneuver u_k^B that maximizes the situation evaluation J_k^B through a minimax search tree;

Step 5: Apply the automatic flight controller (SFC) to make the blue side UAV transfer from X_k^B to X_{k+1}^B based on u_k^B;

Step 6: Take the posterior probability P(θ_k^B | C_k^B) at time k as the prior probabilities P(θ_{k+1}^B);

Step 7: Repeat steps 1 to 6 until the air combat ends.

The core of decision-making based on a dynamic Bayesian network is to determine the decision criterion and the algorithm for solving for the maneuver; in other words, to establish a reasonable utility function J_k and adopt an efficient method of searching for the maneuver u_k [27], [28]. Therefore, step 4 is the most critical of the seven steps above.
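The per-interval cycle above can be sketched as a loop. Everything numeric below (the noise level, the likelihood, the state transition) is a toy stand-in for the paper's actual observation, assessment and SFC models; only the flow of information between the seven steps is meant to match the text.

```python
import random

def decision_cycle(x_blue, x_red, prior, steps=3):
    """Skeleton of the per-interval DBN cycle (steps 1-6); the maths inside
    each step is a placeholder, only the information flow matches the text."""
    for _ in range(steps):
        z_red = [v + random.gauss(0.0, 0.1) for v in x_red]    # step 1: noisy observation
        c = sum(b - r for b, r in zip(x_blue, z_red))          # step 2: combat state (scalar toy)
        like = [1.0 / (1.0 + abs(c) + j) for j in range(4)]    # step 3: likelihood stand-in
        post = [l * p for l, p in zip(like, prior)]
        post = [p / sum(post) for p in post]                   # step 3: posterior
        u = max(range(4), key=lambda j: post[j])               # step 4: best maneuver (toy search)
        x_blue = [v + 0.1 * (u + 1) for v in x_blue]           # step 5: SFC-driven transition (toy)
        prior = post                                           # step 6: posterior becomes next prior
    return x_blue, prior

random.seed(1)
state, prob = decision_cycle([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.25] * 4)
```

The essential point carried over from the paper is step 6: the posterior of one interval seeds the prior of the next, which is what makes the network dynamic.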

1) FLIGHT STATE
At time point k in an air combat, a plane's flight state can be described in the form of a vector, where the state variables x_k^i, y_k^i, z_k^i are the positions of the aircraft in the Earth Axis system; ψ_k^i, θ_k^i, φ_k^i are the Euler angles (yaw, pitch and roll angle, respectively); and V_k^i is the ground velocity.

2) COMBAT STATE
Combat state is the relative position of two UAVs in three-dimensional space at a certain moment. It includes position and direction information (Fig. 4).
At time point k in the air combat, the combat state is given as follows: a_k and χ_k are the leading angle and lagging angle of the blue side relative to the red side, respectively; d_k is the physical distance between the two UAVs; and η_k = E^B / E^R is the ratio of the blue side UAV's energy to the red side UAV's. a_k, χ_k and d_k can be obtained from geometric relationships, as shown in (9).
η_k can then be calculated by definition, as shown in (10),
where H is the altitude and g is the gravitational acceleration.
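The geometric and energy quantities above can be computed directly. The sketch below follows the spirit of Eqs. (9)-(10), with specific energy taken as H + V²/2g; the vector layout and the sample numbers are illustrative assumptions.

```python
import math

def combat_state(p_b, v_b, h_b, p_r, v_r, h_r, g=9.81):
    """Distance d_k, leading/lagging angles (degrees) and energy ratio eta_k
    in the spirit of Eqs. (9)-(10); p and v are 3-tuples, h is altitude H."""
    los = [r - b for b, r in zip(p_b, p_r)]                  # line of sight, blue -> red
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    d = norm(los)
    angle = lambda a, b: math.degrees(math.acos(
        max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))))))
    a_k = angle(v_b, los)                                    # blue's leading angle
    chi_k = angle(v_r, los)                                  # red's lagging angle
    energy = lambda h, v: h + norm(v) ** 2 / (2 * g)         # specific energy H + V^2 / 2g
    eta_k = energy(h_b, v_b) / energy(h_r, v_r)
    return d, a_k, chi_k, eta_k

# Blue 3 km behind red, both level at 1000 m flying the same direction.
d, a, chi, eta = combat_state((0, 0, 0), (200, 0, 0), 1000.0,
                              (3000, 0, 0), (200, 0, 0), 1000.0)
```

In this tail-chase configuration both angles are zero and the energy ratio is one, matching the intuition of a perfectly neutral pursuit geometry.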

3) MANEUVER DECISION-MAKING
Maneuvers in actual air combat determine how planes act during the battle. Taking the Pougatcheff Cobra maneuver as an example, it allows a plane to change speed quickly, so that the battlefield situation can be reversed in an instant [29]. Previous studies commonly use a typical maneuver library as the object of maneuver decision-making [4], [5], [6]. By comparison, we hope that the obtained result can cover as many positions as possible. From the perspective of overload, the maneuvering flight of a UAV can be described with the tangential overload n_x and the normal overloads n_y, n_z. The tangential overload n_x = X/mg is parallel to the direction of flight speed, corresponding to maneuvers that change the flight speed. The normal overload n_y = Y/mg is perpendicular to the direction of flight speed, corresponding to ascending or descending in the vertical plane. The normal overload n_z = Z/mg is also perpendicular to the direction of flight speed, corresponding to turning left or right in the horizontal plane. Therefore, rather than using the typical maneuver library, we regard the aircraft's current position X_k^i as the reference point and then retrieve the position at the next moment with seven kinds of overload commands within the decision-making time interval k ~ (k + 1): 1. n_xc; 2. n_yc; 3. n_zc; 4. n_xc and n_yc; 5. n_xc and n_zc; 6. n_yc and n_zc; 7. n_xc, n_yc and n_zc. Compared with the typical maneuver library, these seven commands provide a more complete description of the UAV's motion in three-dimensional space. The motion types considered in the typical maneuver library are limited, and this limitation means the UAV may not be able to reach the most favorable position when making maneuver decisions with that method.
With the proposed method based on the seven overload commands, however, the UAV can carry out more complex motions and reach more potential positions, among which the most favorable position may lie. This design of overload-based optional maneuvers is named the Overload-limited Maneuver. To a certain extent, it solves the above-mentioned problem caused by the limited set of motion types.
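The seven commands are simply the non-empty subsets of the three overload channels, which can be enumerated in a couple of lines (the channel names are only labels):

```python
from itertools import combinations

# The seven Overload-limited Maneuver commands are the non-empty subsets
# of the three overload channels {n_xc, n_yc, n_zc}.
channels = ("n_xc", "n_yc", "n_zc")
commands = [frozenset(c) for r in (1, 2, 3) for c in combinations(channels, r)]
```

Each command then parameterizes one candidate position for the next interval, giving the seven branches of the decision tree.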
After working out what maneuver to perform in the next time interval, the estimated position of the UAV at the next moment can be obtained; it is denoted as X̃_{k+1}^i. The following step is to consider the effect of the automatic flight controller: only if the UAV can indeed fly to the position X̃_{k+1}^i stably within the time interval is the decided maneuver effective. As mentioned in Section A, the SFC adopted in this paper is designed to track desired accelerations and desired angular accelerations rather than desired overloads. Therefore, a relationship between the accelerations and the overloads must be established.
According to the definitions of n x , n y , n z , forces X c , Y c and Z c can be obtained.
Combining (1) and (11), a set of desired accelerations (u̇_c, v̇_c, ẇ_c) can be obtained. (u̇_c, v̇_c, ẇ_c) brings about a change of velocities, resulting in a set of desired velocities u_c, v_c, w_c. In turn, u_c, v_c, w_c cause a change of moments, resulting in a set of desired moments L_c, M_c, N_c.
Combining (1) and (12), a set of desired angular accelerations ṗ_c, q̇_c, ṙ_c can be obtained. Then, combining (7) with (12), a control quantity set u_k^i can be solved, as shown in (13). The control quantity set u_k^i enables the UAV to move from the current position X_k^i to the future position X̃_{k+1}^i.
Then, combining u_k^i with (1) and (2), the actual position that the UAV reaches at time k + 1 can be derived and recorded as X_{k+1}^i. In this way, the UAV completes a state transition based on a maneuver decision that accounts for the effect of the automatic flight controller.

4) OBSERVATION STATE
In fact, the enemy side's flight states cannot be obtained directly, which is the main source of uncertainty in air combat [26]. To bring simulations closer to real combat, each side is designed to obtain the other's state only with noise.
At time point k in the air combat, the observation state of the red side as seen by the blue side is set as (14), where C_k^i(s_k^B, s_k^R) is the combat state mentioned above and ζ_k is the observation noise, simulated by Gaussian white noise in this work.

5) STATE ASSESSMENT
State assessment is carried out by the aircraft at each time step in combat; it refers to the probability distribution over one UAV's combat situations after a certain maneuver. Situations are divided into four categories. For the blue side: Superiority indicates a possibility of defeating the red side; Neutrality represents a tie; Inferiority indicates the blue side is in great danger; Mutual Disadvantage means both sides have an opportunity to win.
The prior probabilities can be defined as P(θ_k^i = j), j = 1, 2, 3, 4, and clearly Σ_{j=1}^{4} P(θ_k^i = j) = 1. The posterior probability of the state assessment is given as (15). Equation (15) means that the posterior probability is based on the prior probability distribution and becomes more realistic by taking the combat state into consideration. P(C_k^i | θ_k^i = j) is the conditional likelihood probability, which is often defined artificially according to C_k^i, where terms such as p(η_k^i | j) denote conditional likelihood probabilities in combat situation θ_k^i.
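The Bayes update itself is a three-line computation over the four situation classes; the likelihood numbers below are toy values standing in for the artificially defined conditional likelihoods.

```python
def posterior(prior, likelihood):
    """Eq. (15)-style update: P(theta = j | C) is proportional to
    P(C | theta = j) * P(theta = j), over the four situations
    (Superiority, Neutrality, Inferiority, Mutual Disadvantage)."""
    joint = [l * p for l, p in zip(likelihood, prior)]
    z = sum(joint)                       # normalizing constant
    return [j / z for j in joint]

# Uniform prior, likelihood favouring Superiority (toy numbers).
post = posterior([0.25] * 4, [0.6, 0.2, 0.1, 0.1])
```

With a uniform prior the posterior simply mirrors the normalized likelihood; as the combat proceeds, the prior carried over from the previous interval skews this result toward recent experience, which is exactly the dynamic element of the DBN.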

6) SITUATION EVALUATION
A utility function J_k^i is applied to reflect the preferences of the aircraft in combat, as shown in (17), where N is the number of forward steps and U^i(j, C_k^i) is the utility according to C_k^i. The optimal maneuver action and the corresponding largest utility value at each time point are determined by the minimax method (Fig. 5). The minimax method assumes that the opponent is smart, meaning the maneuver is obtained under the circumstance that the opponent has taken a sufficiently advantageous maneuver [30].
Obviously, the computation of the search increases exponentially as the search depth rises [31]. To mitigate this, the idea is that one can foresee the future movement of the target aircraft to some degree via a visual check, inspired by human pilots' experience [32]. The basic principle is that the position of the enemy aircraft relative to our aircraft at the next moment, [x_tob, y_tob, z_tob]_{k+1}^T, can be predicted from information at the current moment, where [x_tob, y_tob, z_tob]_k^T is the target aircraft's position relative to our aircraft's body axes at time point k; [i_at, j_at, k_at]^T and [i_ao, j_ao, k_ao]^T are the aerodynamic coordinate axes of the target aircraft and of our aircraft, respectively; v_o and v_t are the velocities of our aircraft and the target aircraft, respectively; and α_o and β_o are the angle of attack and the angle of sideslip of our aircraft, respectively.
Based on this idea, we can eliminate some maneuvers that the target enemy cannot take, thereby pruning the search tree and reducing the amount of calculation (Fig. 6).
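A depth-limited minimax over toy maneuver utilities shows the mechanics. The `feasible` filter below stands in for the visual-prediction test that removes maneuvers the enemy cannot take; all the numbers are illustrative, not outputs of the paper's utility function.

```python
def minimax(depth, my_opts, foe_opts, score=0, maximizing=True):
    """Depth-limited minimax: our side maximizes accumulated utility,
    the opponent is assumed to reply with its most damaging option."""
    if depth == 0:
        return score
    if maximizing:
        return max(minimax(depth - 1, my_opts, foe_opts, score + u, False)
                   for u in my_opts)
    return min(minimax(depth - 1, my_opts, foe_opts, score + u, True)
               for u in foe_opts)

utilities = [3, 1, -2, 5, 0, -1, 2]          # one toy utility per overload command
feasible = [u for u in utilities if u >= 0]  # prediction-based pruning (toy filter)
full = minimax(2, utilities, utilities)
pruned = minimax(2, utilities, feasible)
```

Restricting the opponent's branches both shrinks the tree (fewer nodes per min level) and, as here, can raise the computed value, since replies the prediction rules out no longer drag the minimum down.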

C. TARGET ALLOCATION DECISION-MAKING
The target allocation decision-making of many-to-many CAAC refers to assigning all enemy UAVs to our UAVs so as to reach the optimal overall attack utility. The matching function (objective optimization function) is given in (20), where ta_ij represents whether enemy UAV j is allocated to our UAV i (ta_ij = 1 means yes and ta_ij = 0 means no); U_ij is the utility function value when our UAV i fights against enemy UAV j; l is the number of UAVs fighting against enemy UAV j at the same time after allocation; m is the total number of our UAVs; and n is the total number of enemy UAVs.
Equation (20) describes a typical non-deterministic polynomial complete (NP-complete) problem [33]. To address it, an HPSO algorithm is adopted here, and an optimized HPSO is proposed that accelerates the change of particles in the later iteration stages by introducing a judgement parameter.

1) STANDARD HPSO
The HPSO algorithm is developed on the basis of the PSO algorithm. PSO is an intelligent algorithm that simulates the foraging behavior of birds [34]. It achieves extremum optimization by following the individual extremum and the group extremum [35]. The particle update rule is given as:

V_{k+1} = w·V_k + l_1·r_1·(P_best − X_k) + l_2·r_2·(G_best − X_k),
X_{k+1} = X_k + V_{k+1},  (21)

where w is the inertia weight coefficient; k is the current iteration number; X is the position of a particle; V is the velocity of a particle; l_1 and l_2 are the learning factors; r_1 and r_2 are random numbers uniformly distributed in the interval [0, 1]; and P_best and G_best are the individual extremum and group extremum, respectively. Based on (21), the standard HPSO algorithm borrows the crossover and mutation operations of the genetic algorithm when updating particles [36].
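The update rule (21) translates directly into code; the parameter values below are common textbook defaults, not the paper's settings.

```python
import random

def pso_step(x, v, p_best, g_best, w=0.7, l1=2.0, l2=2.0):
    """One PSO update in the form of Eq. (21): the new velocity blends
    inertia with pulls toward the individual and group extrema,
    with r1, r2 drawn uniformly from [0, 1]."""
    r1, r2 = random.random(), random.random()
    v = [w * vi + l1 * r1 * (pb - xi) + l2 * r2 * (gb - xi)
         for xi, vi, pb, gb in zip(x, v, p_best, g_best)]
    x = [xi + vi for xi, vi in zip(x, v)]
    return x, v

random.seed(0)
x, v = pso_step([0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0])
```

Starting from rest at the origin with both extrema at (1, 1), the particle necessarily moves toward them, which is the attraction mechanism the HPSO variant later replaces with crossover and mutation.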

2) OPTIMIZED HPSO
Compared with the PSO algorithm, the standard HPSO algorithm does not update the particle positions by tracking the extreme values, thereby avoiding the influence of contrived inertia weight, particle speed and learning rate [37]. However, the convergence of the standard HPSO algorithm is fast in the early iterations and slow in the later ones. As the number of iterations increases, the particles become more and more similar as the population converges, which may lead to local optimal solutions [38].
The way to avoid this condition is to introduce a judgement parameter, denoted ε here. During iteration, if the difference between the group extrema of several adjacent iterations is less than ε, the algorithm is considered to have fallen into local optimization, and immigrant particles are introduced at that moment. Their fitness is calculated and compared with that of the existing particles, and the particles with higher fitness are selected and retained. It is worth mentioning that immigrant particles are introduced only when local optimization occurs, which means the global search ability is improved while the amount of calculation does not increase much.
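The stall test and injection can be sketched as follows. The window length, threshold and number of injected particles are hypothetical analogues of the paper's settings (15 adjacent iterations, a judgement parameter of 0.1); fitness-based retention of the immigrants is omitted for brevity.

```python
import random

def inject_if_stalled(gbest_history, swarm, new_particle, eps=0.1, window=15):
    """If the group extrema of the last `window` iterations differ by less
    than eps, the search is deemed locally trapped and immigrant
    particles are injected into the swarm."""
    recent = gbest_history[-window:]
    if len(recent) == window and max(recent) - min(recent) < eps:
        swarm.extend(new_particle() for _ in range(3))   # 3 immigrants (assumed count)
        return True
    return False

random.seed(2)
swarm = [[1, 2, 3, 4]]                                   # one allocation-permutation particle
stalled = inject_if_stalled([5.0] * 20, swarm,
                            lambda: random.sample(range(1, 5), 4))
```

Because the test only fires when the group extremum has flat-lined, fresh particles are paid for exactly when diversity is needed, matching the text's claim that the extra computation stays small.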
The steps of the optimized HPSO algorithm are as follows (see Fig. 7):

Step 1: Initialize: set the maximum number of iterations Num and the particle population size K; generate K distribution schemes as the initial particle swarm; calculate the fitness of each particle; record the initial individual extremum P_best and global extremum G_best;

Step 2: Cross each particle in the swarm over with P_best, calculate the fitness of the two particles generated by the crossover operation, select and retain the one with better fitness as a new particle, and update P_best if the fitness of the new particle is better than the original P_best;

Step 3: Cross the new particle obtained in step 2 over with G_best, calculate their fitness, select and retain the one with better fitness as a new particle, and update P_best if the fitness of the new particle is better than the original P_best;

Step 4: Mutate the new particle obtained in step 3 and calculate its fitness; update P_best if the fitness of the new particle is better than the original P_best;

Step 5: Determine G_best according to the updated P_best; end the iteration if the maximum number of iterations is reached, otherwise go to step 6;

Step 6: Calculate the difference between the G_best values of several iterations; go to step 2 directly if the difference is greater than the judgement parameter, otherwise go to step 2 after introducing immigrant particles and updating the swarm.

However, the numbers of UAVs on the two sides are usually different in real air combat [39], which makes it difficult to initialize the particle swarm under the constraints. The solution is as follows. First, suppose our side has m UAVs and the enemy side has n UAVs. Then, a utility matrix U is established to describe the relationship between the UAVs.
When m > n, supplement m − n columns of constants C 1 on the right side of the matrix, as shown in (23).
C 1 is a constant less than all other elements in the matrix. In this case, there will be another m − n number of our UAVs after target allocation. Thus, the next step is to allocate the redundant UAVs regarding real-time mission requirements. For example, if adopting defense principle, they will be assigned to the enemy UAV which poses the greatest threat to our targets; if adopting marking principle, they will be assigned to the enemy UAV which has the strongest flight performance; if adopting annihilation principle, they will be assigned to the enemy UAV which has the poorest combat advantage.
When m < n, supplement n − m rows of constants C 2 on the bottom of the matrix, as shown in (24).
C_2 is a constant greater than all other elements in the matrix. In this case, no additional target allocation operations are needed. The aforementioned operations ensure that the generated initial particle swarm meets the needs of target allocation as long as each particle is a random permutation of the positive integers from 1 to the order of the utility matrix.
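The padding of the utility matrix for unequal sides follows directly from (23)-(24); the exact constants only need to sit strictly below or above every existing entry, so "min − 1" and "max + 1" are used here as convenient choices.

```python
def pad_utility(U):
    """Make the utility matrix square: for m > n append m-n columns of a
    constant C1 smaller than every entry (Eq. (23)); for m < n append
    n-m rows of a constant C2 larger than every entry (Eq. (24))."""
    m, n = len(U), len(U[0])
    flat = [x for row in U for x in row]
    if m > n:
        c1 = min(flat) - 1.0
        return [row + [c1] * (m - n) for row in U]
    if m < n:
        c2 = max(flat) + 1.0
        return U + [[c2] * n for _ in range(n - m)]
    return U

padded = pad_utility([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # m = 3 > n = 2
```

After padding, any square-assignment solver (or the HPSO permutation encoding) applies unchanged; UAVs matched to a C1 column are the redundant ones to be reassigned by the defense, marking or annihilation principle.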

D. END CRITERIA
In previous studies on air combat, the end criteria are generally simple. There are mainly two kinds: one sets a cone-shaped attack zone to represent the attack performance of a fighter [1], [2], [3]; the other constructs a value function that describes the payoff of an attack [23], [29].
To bring the fight closer to the real world, a fire control system is specifically involved here. It is designed according to the principle of the proportional guidance method. As shown in Fig. 8(a), the basic law relates the turn rate of the missile to the rotation rate of the line of sight, where K is the proportional constant, σ_m is the angle between the missile velocity vector and the reference line, and q_tm is the angle between the sight line and the reference line [40], [41]. The graphical solution of the proportional guidance method is given in Fig. 8(b); the trigonometric relationships then yield an estimate of dq. Compared with the real dq, the dq obtained at this point may have a large error, especially when the missile is close to the target [42], [43]. Equation (27) provides a method to correct dq to dq*, which is much closer to the real dq.
The number of correction iterations depends on the accuracy requirement; usually, two iterations are enough.
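A planar toy version of the proportional guidance law dσ_m = K·dq illustrates the mechanism; the speeds, the navigation constant, the straight-line target and the initial geometry are all assumptions for the sketch, not the paper's fire-control parameters.

```python
import math

def pn_pursuit(K=3.0, v_m=300.0, v_t=100.0, dt=0.01, steps=5000):
    """2-D proportional navigation toy: the missile's heading change per
    step is K times the measured change of the line-of-sight angle q;
    the target flies straight east. Returns the closest approach."""
    mx = my = psi = 0.0              # missile position and heading
    tx, ty = 500.0, 300.0            # target starts up-range and offset
    closest = float("inf")
    for _ in range(steps):
        q_old = math.atan2(ty - my, tx - mx)
        tx += v_t * dt                           # target: constant velocity
        mx += v_m * math.cos(psi) * dt           # missile: constant speed
        my += v_m * math.sin(psi) * dt
        q_new = math.atan2(ty - my, tx - mx)
        psi += K * (q_new - q_old)               # d(sigma_m) = K * dq
        closest = min(closest, math.hypot(tx - mx, ty - my))
    return closest

miss = pn_pursuit()
```

Because the heading command is driven by the sampled LOS change, the quality of the dq estimate directly sets the miss distance, which is why the dq* correction matters most in the endgame, exactly as the text notes.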
Then, a condition (FS) is defined to represent the final stage of an autonomous air combat.

When one side enters the final stage, it begins to conduct an additional decision while performing maneuver decision-making [44], [45], [46]: it shoots a virtual missile that follows the above-mentioned fire control law. When the calculation shows that the missile can hit the enemy aircraft (the distance between missile and target, d_mt, is less than 10 m) within the next five simulation time intervals, that side is determined to have won and the air battle ends.
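The end test itself reduces to a one-line check over the predicted missile-target separations (the sample distances are invented for illustration):

```python
def combat_ends(predicted_d_mt, hit_radius=10.0, horizon=5):
    """One side wins when the virtual missile is predicted to pass within
    hit_radius of the target during the next `horizon` simulation
    intervals (10 m and 5 intervals per the end criterion)."""
    return any(d < hit_radius for d in predicted_d_mt[:horizon])

hit = combat_ends([40.0, 22.0, 11.0, 6.0, 15.0])   # 6.0 m on the 4th interval
```

Tying the termination condition to a fire-control prediction, rather than to a static attack cone, is what lets the simulated combats end the way real engagements do.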

III. SIMULATION RESULTS
The conditional likelihood probabilities, utility functions and weighting factors in this paper are designed according to the different situations (see Tables 3-5).
To reiterate, blue side is regarded as our side while red side is regarded as enemy side in all the following simulation cases.

A. 1-TO-1 MANEUVER DECISION-MAKING SIMULATIONS
In all the simulations below, our plane fights on the basis of the optimized DBN and the Overload-limited Maneuver, while the enemy plane relies on the standard BN and the typical maneuver library. To verify our optimization, we selected two typical initial situations (Mutual Disadvantage and Inferiority) for simulation.

1) MUTUAL DISADVANTAGE CASE
Mutual Disadvantage means both sides have an opportunity to win. This initial situation (see Table 6) shows that both UAVs are in great danger. The simulation result is presented in Fig. 9(a). Both sides tried to escape the risk through scissor maneuvers. After a series of scissor maneuvers and S-break maneuvers, the situation changed: the blue UAV entered a slightly advantageous situation through smaller and faster turns, and the red UAV began to defend. The red UAV then tried to get rid of the blue UAV by climbing and decelerating, or diving and accelerating. However, the red UAV was still closely followed by the blue UAV. Finally, after several rounds of decision-making, with the gradual updating of the blue UAV's prior probabilities, the blue UAV successfully predicted the red UAV's actions and achieved an absolute advantage by taking smaller but faster maneuvers. The fire control system then judged that the red UAV would be shot down by the blue side in no time, and the combat ended. The whole air combat process took about 100 steps, which equals about 25 s in the real world.
Considering the uncertainty in each side's observation of the other's position and state (see the section on the observation state), this Mutual Disadvantage case was simulated 20 times. The blue side won the combat every time, which shows that the maneuver decision-making algorithm adopted by the blue side is more advanced.

2) INFERIORITY CASE
Inferiority indicates that the blue side is in great danger. This initial situation (see Table 7) shows that the enemy plane occupies a favorable position and is chasing us.
The simulated trajectories are shown in Fig. 9(b). The red UAV was initially in a superior position, in pursuit of the blue UAV. Meanwhile, the blue side tried to shake off the red side by climbing and diving, and finally got rid of the red UAV through a sudden change of direction. That is to say, although the blue UAV was at an absolute disadvantage at first, it could still reverse the situation in less than one minute through better maneuvers, and finally turned defeat into victory. The whole process took about 220 steps (55 s).
Likewise, 20 simulation tests were conducted, and the overturn occurred in 14 of the 20 runs. This result shows that even if the blue UAV is at a disadvantage at the beginning, it still has a high probability of winning, and it again demonstrates that the blue UAV has the better maneuver decision-making algorithm.

B. TARGET ALLOCATION DECISION-MAKING SIMULATIONS
Two different simulation cases are carried out in this part. Both cases are simulated many times with three different algorithms. The parameter settings of the algorithms are as follows: (1) Genetic algorithm: the initial number of individuals is set to 50, the crossover probability to 90%, and the mutation probability to 5%; (2) Standard HPSO: the initial particle number is set to 5; (3) Optimized HPSO: the initial particle number is set to 5, the number of consecutive adjacent iterations to 15, and the judgement parameter to 0.1.
The number of iterations is all set as 50 for the three algorithms.

1) 8-TO-8 TARGET ALLOCATION CASE
The initial states of the eight blue UAVs (our side) and eight red UAVs (enemy side) are given in Table 8. All three algorithms were simulated 20 times, as shown in Table 9. As will be readily seen, the result of the genetic algorithm is the worst and that of the optimized HPSO algorithm is the best. The standard deviation of the optimized HPSO algorithm is the smallest, and the maximum, minimum and mean values of its solutions are all better than those of the other two. Besides, the average computing time of the optimized HPSO algorithm (0.105 s) is much less than that of the genetic algorithm (0.216 s), and only about 6% higher than that of the standard HPSO (0.099 s). That is, the optimized HPSO algorithm brings better target allocation results while ensuring a relatively short computing time. Fig. 10(a) and Fig. 11(a)-(c) show the results of one particular simulation. As shown in Fig. 10(a), the final fitness value of the optimized HPSO algorithm is the highest; a higher fitness value means our side poses a larger threat to the enemy. Fig. 11(a)-(c) show the three different allocation results. Obviously, the allocation result in Fig. 11(a) is not as reasonable as those in the other two figures, which is consistent with its minimum fitness value. Comparing Fig. 11(b) and Fig. 11(c), the matching combination between B_1, B_5, B_6 and R_4, R_5, R_8 delivers an obvious difference. The standard HPSO algorithm's allocation result is B_1 → R_5, B_5 → R_4, B_6 → R_8, and the corresponding fitness value of this matching combination is 1.953. The optimized algorithm's allocation result is B_1 → R_4, B_5 → R_8, B_6 → R_5, and the fitness value is 2.167, which is better for the blue side. This is because the UAVs' initial states, especially ψ_0^i, make them prefer to deal with enemies in specific areas.
Therefore, the optimized HPSO algorithm outperforms the other two, which shows that our optimization is effective.
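The immigrant-particle mechanism behind the optimized HPSO can be illustrated with a minimal sketch. The objective function, swarm parameters and immigrant fraction below are illustrative stand-ins, not the paper's actual allocation fitness or settings; the sketch only shows how the worst particles might be replaced by fresh random "immigrants" each generation to preserve diversity and avoid premature convergence:

```python
import random

def fitness(x):
    # Toy objective standing in for the allocation utility (assumption):
    # maximize -(sum of squares), optimum at x = 0.
    return -sum(v * v for v in x)

def pso_with_immigrants(dim=4, swarm=20, iters=50, immigrant_frac=0.2, seed=0):
    rng = random.Random(seed)
    new = lambda: [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    pos = [new() for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    gbest = max(pos, key=fitness)[:]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
        # Immigrant step: replace the worst particles of this generation
        # with fresh random ones to keep the swarm diverse.
        order = sorted(range(swarm), key=lambda i: fitness(pos[i]))
        for i in order[:int(swarm * immigrant_frac)]:
            pos[i] = new()
            vel[i] = [0.0] * dim
        gbest = max(pbest, key=fitness)[:]
    return gbest

best = pso_with_immigrants()
```

In the actual algorithm the particles would encode discrete blue-to-red matchings and the fitness would be the swarm's total utility value.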

2) 8-TO-6 TARGET ALLOCATION CASE
The initial states of eight blue UAVs (our side) and six red UAVs (enemy side) are given in Table 10.
All three algorithms are likewise simulated 20 times, and the relevant data are shown in Table 11. Similar to the 8-to-8 case, the average computing time of the optimized HPSO algorithm (0.125 s) is much less than that of the genetic algorithm, and only slightly more than that of the standard HPSO (0.112 s). The maximum, minimum and mean values of the optimized HPSO algorithm's solutions are better as well, and the smallest standard deviation proves once again that our optimization of the HPSO algorithm is valuable. That is, the performance of the optimized HPSO algorithm is also superior in unequal-quantity air combat cases. Fig. 10(b) shows the fitness value changes of the three algorithms, and Fig. 11(d)-(f) show the target allocation results. Again, the optimized HPSO algorithm obtains the best result. Comparing Fig. 11(e) and Fig. 11(f), there are two main differences. First, in one-to-one matching, the optimized HPSO algorithm's results are B8 → R3 and B1 → R1, while the standard HPSO's results are B8 → R1 and B1 → R3. The speed of B1 is much faster than that of R1, which yields a greater advantage and a higher fitness value when B1 fights against R1 rather than R3. Second, in many-to-one matching, the optimized HPSO algorithm's results are B3, B5 → R6 and B7 → R2, while the standard HPSO's results are B3, B7 → R6 and B5 → R2. The initial direction of B7 is much closer to R2 than to R6, so B7 would gain a great advantage in direction when fighting against R2, but only a small one when fighting against R6. It is therefore more suitable for B5, rather than B7, to fight against R6.
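The unequal-quantity principle (eight blue versus six red) implies that some red UAVs must be covered by more than one blue UAV. The sketch below shows one way a candidate allocation might be repaired so that every red UAV is targeted at least once; the utility matrix `U`, the repair rule, and all names are hypothetical illustrations, not the paper's implementation:

```python
import random

def repair(assign, n_red, rng):
    """Ensure every red UAV is targeted at least once (coverage constraint).
    assign[b] = index of the red UAV that blue UAV b attacks."""
    missing = set(range(n_red)) - set(assign)
    for r in missing:
        # Move one blue UAV away from the most over-subscribed red target.
        counts = {t: [b for b, a in enumerate(assign) if a == t]
                  for t in set(assign)}
        donor = max(counts, key=lambda t: len(counts[t]))
        assign[rng.choice(counts[donor])] = r
    return assign

def total_utility(assign, U):
    # U[b][r]: utility of blue UAV b engaging red UAV r (illustrative values).
    return sum(U[b][r] for b, r in enumerate(assign))

rng = random.Random(1)
n_blue, n_red = 8, 6
U = [[rng.random() for _ in range(n_red)] for _ in range(n_blue)]
cand = repair([rng.randrange(n_red) for _ in range(n_blue)], n_red, rng)
```

With n_blue ≥ n_red, the pigeonhole principle guarantees an over-subscribed donor target exists whenever some red UAV is uncovered, so the repair always terminates with full coverage.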

3) 8-TO-8 COOPERATIVE COMBAT CASE
The 8-to-8 case is selected as the simulation example of many-to-many CAAC. The initial state information of the blue and red UAVs has been given in Table 8. The target allocation result derived from the optimized HPSO algorithm is that of Fig. 11(c). The simulated trajectories are shown in Fig. 12.
Soon after the air combat began (11 simulation steps), R2 was quickly defeated by B7 due to the great disadvantage caused by their initial states. Then, according to the annihilation principle, the target allocation decision-making algorithm determined a new target, R3, for B7. After 47 steps, R1 was defeated by B2, and B2 then chose R8 as its next target.
After 91 steps, B4 defeated R7 and began to attack R4. After 94 steps, R3 fell into a disadvantage under the siege of B7 and B8 and was finally defeated by B7. Based on the proximity principle, R4 was then assigned to B7 and B8 as their next target, and B7 and B8 took this opportunity to approach the other friendly UAVs. After 104 steps, R6 was shot down by B3, and B3 then began to chase R4 as well. At this time, R4 was the target of B1, B3, B4, B7 and B8 simultaneously and was falling into a huge disadvantage. However, since B7 and B8 were still far away, the main threat to R4 actually came from B1, B3 and B4. Not surprisingly, after 191 steps, R4 was defeated by B3. At that moment, only R5 and R8 were left on the red side. The target allocation decision-making algorithm then computed a new match, which included B1 → R5. After 201 steps, B5 and B6 had achieved great advantages in their fights, indicating that they could quickly defeat their targets on their own without help from other UAVs. Therefore, the 8-to-8 air combat is considered over. The total of 201 steps is equivalent to 50 s in the real world.
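The annihilation and proximity principles used in this run can be sketched as a simple re-targeting rule: once a UAV's target is destroyed, it is reassigned to the nearest surviving enemy, and when no enemies survive the engagement ends. The function below is an illustrative 2-D simplification with hypothetical names; the paper's algorithm re-runs the full allocation optimization rather than a pure nearest-enemy rule:

```python
import math

def retarget(own_xy, enemies):
    """Proximity principle (simplified): after a UAV's target is destroyed,
    reassign it to the nearest surviving enemy.
    enemies: {enemy_id: (x, y, alive)} -- illustrative data layout."""
    alive = {eid: (x, y) for eid, (x, y, a) in enemies.items() if a}
    if not alive:
        return None  # annihilation: no enemies left, the engagement is over
    return min(alive, key=lambda eid: math.dist(own_xy, alive[eid]))

# Hypothetical snapshot: R7 is already destroyed, so R4 is the nearest survivor.
enemies = {"R4": (10.0, 2.0, True), "R5": (40.0, 5.0, True), "R7": (1.0, 1.0, False)}
new_target = retarget((0.0, 0.0), enemies)
```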
The utility value changes of the blue- and red-side UAVs are shown in Fig. 13. The utility value of the blue side was greater than that of the red side throughout the 8-to-8 air combat, indicating an advantageous situation. Consistent with the analysis in the previous paragraph, the UAVs' utility values change suddenly at time steps 11, 47, 91, 94, 104, 191 and 201, corresponding to the following three situations: (1) a UAV was defeated and its utility value was reset to 0, such as R2 at time step 11 and R1 at time step 47; (2) a UAV was pursued by several enemy UAVs and its utility suddenly decreased markedly, such as R3 at time step 91 and R6 at time step 97; (3) a UAV received a new target after defeating its original target, such as B3 at time step 104 and B7 at time step 191. This strongly demonstrates that both the maneuver and the target allocation decision-making methods designed above can meet the requirements of many-to-many cooperative autonomous air combat.
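The three situations above can be expressed as a simple event classifier over consecutive utility samples. The threshold and labels below are assumptions for illustration only, not part of the paper's model:

```python
def classify_event(prev_u, curr_u, jump_threshold=0.5):
    """Heuristic labels for sudden utility changes (thresholds are assumptions):
    a reset to 0 means the UAV was defeated; a large drop means it is being
    pursued by several enemies; a large rise means it switched to a new,
    easier target after defeating its original one."""
    if curr_u == 0.0 and prev_u > 0.0:
        return "defeated"
    if prev_u - curr_u > jump_threshold:
        return "pursued"
    if curr_u - prev_u > jump_threshold:
        return "new target"
    return "steady"
```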

IV. CONCLUSION
In this paper, we have designed a modular, high-fidelity autonomous air combat system and made several improvements to both the maneuver decision-making and the target allocation decision-making of cooperative autonomous air combat. A sliding mode controller is involved when conducting maneuvers, so the reliability and authenticity of the decision-making and trajectory results are greatly improved. A method for updating the prior probability in the dynamic Bayesian network is applied, which mitigates the problem of insufficient expert knowledge. A unique visual way of pruning the minimax search tree is applied, appropriately reducing the complexity of the decision-making model. A method of introducing immigrant particles into the hybrid particle swarm optimization is proposed, ensuring the global optimality of the matching result. A principle for extending the target allocation algorithm's application scenarios is proposed, enabling the algorithm to handle air combat cases with unequal numbers of participants. We also put forward a new criterion for judging the end of air combat: a fire control model is introduced in the final stage of autonomous air combat, which ensures that the intelligence of the enemy aircraft is not underestimated, since one's attack can still be evaded. This combat end criterion enhances the fidelity of the simulation model and the confidence in the simulation results. Therefore, our work offers promising opportunities for the development of autonomous air combat systems. However, there are still some limitations, and some contents are worthy of in-depth study in the future: (1) Combat situations are usually more comprehensive in real air combats, and there is a high probability that several situations overlap.
The combat situations in this paper are divided into four categories: Superiority, Inferiority, Neutrality and Mutual Disadvantage. These four situations are typical but not precise or specific enough, and when making decisions they are regarded as independent of each other. Therefore, future work needs a more detailed division of combat situations that takes the coupling between situations into consideration.
(2) There are various fighting concepts for many-to-many combat. Each concept has its own advantages and disadvantages, and the corresponding target allocation criteria also differ. In this paper, we considered a utility-based fighting concept, in which the criterion is to maximize the swarm's total utility value. However, combat efficiency, combat loss ratio, etc. can also be applied as criteria for target allocation. For example, when efficiency is the priority, it is reasonable for one side to pair several UAVs in Superiority with one enemy UAV in Inferiority at the same time; by forming multiple many-to-one pairs, the enemy will be downsized rapidly. Therefore, the impact of different fighting concepts on target allocation results should be studied in follow-up work.
JIANLIANG AI received the bachelor's, master's, and Ph.D. degrees in aircraft design from Northwestern Polytechnical University, Xi'an, China, in 1986, 1989, and 1997, respectively. He visited the Moscow State Aviation Institute as a Visiting Scholar, from 1996 to 1997. He is currently a Professor with the Department of Aeronautics and Astronautics, Fudan University. His current research interests include aircraft design, nonlinear control, and intelligent control with application to flight dynamics.
YIQUN DONG received the Ph.D. degree in flight dynamics and flight control from Fudan University, Shanghai, China, in 2016. He conducted postdoctoral research at Nanyang Technological University, Singapore, from 2016 to 2017, and at the University of Michigan, Ann Arbor, USA, from 2017 to 2019. He is currently an Associate Professor with the Department of Aeronautics and Astronautics, Fudan University. His current research interests include the implementation and application of intelligent autonomous systems based on man-machine hybrid intelligence.