A Novel Elitism Co-Evolutionary Algorithm for Antagonistic Weapon-Target Assignment

The Antagonistic Weapon-Target Assignment (AGWTA) problem is a crucial decision issue in Command & Control (C2). Since it is a minimax problem, co-evolutionary algorithms can solve it effectively. However, co-evolutionary algorithms were originally designed for continuous minimax problems and lose efficiency in discrete contexts. In this paper, a novel elitism co-evolutionary algorithm is proposed to solve the AGWTA problem. Firstly, an improved AGWTA model for air combat based on attack and evasion strategies is proposed. Secondly, an elite cooperative genetic algorithm based on the framework of the co-evolutionary algorithm is put forward. In the proposed algorithm, a problem-specific coding method and evolution operators are designed, and an elite individual update mechanism is presented. Finally, based on an analysis of the relationship between feasible solutions in the air combat environment, an evaluation index is proposed. Experiments show that the proposed algorithm has higher accuracy than traditional co-evolutionary algorithms for solving AGWTA problems.


I. INTRODUCTION
Weapon-target assignment (WTA) is a research hotspot in the fields of command and control [1][2][3] and operational research [4][5][6]. It mainly studies how to allocate one's own weapons reasonably so that each allocated weapon attacks the most suitable enemy target. Research on the WTA problem can provide accurate and reliable WTA schemes for commanders in different complex battlefield environments, improving the efficiency and effectiveness of the observe-orient-decide-act (OODA) loop [7,8].
WTA problems are mainly divided into two types: static weapon target assignment (SWTA) [9,10] and dynamic weapon target assignment (DWTA) [11][12][13]. In SWTA, all weapons are launched at the same time, while in DWTA all weapons are launched in phases. Therefore, for the DWTA problem, the distribution of weapons in the previous stage will have an impact on the later stage. In addition, for different combat tasks, each type of WTA also includes resource-oriented problems [14][15][16][17] and target-oriented problems [18][19][20].
At present, WTA has been applied in a variety of military scenarios, such as base defense resource assignment [21,22], multi-stage weapon-target assignment [23][24][25], sensor-weapon-target assignment [26][27][28][29], and antagonistic weapon-target assignment [30]. Among these applications, the antagonistic environment is the current research hotspot. Due to the existence of antagonism, warfare presents a high degree of uncertainty. Under such circumstances, the engagement is greatly affected by the decisions and behaviors of both sides. In the process of the engagement, we choose the strategy that benefits us based on the current situation, while the opponent chooses the strategy that benefits itself. Both sides adjust their strategies according to the other's strategy to keep themselves in a favorable position.
The traditional WTA model only considers how to maximize the reward of the assignment scheme. It does not consider the impact of the opponent's strategies on us in the antagonistic environment. However, in the air combat of the modern war, there are many attack-defense strategies on both sides, such as launching a missile attack, using the sensor to track, using electronic jamming to suppress, and using evasive maneuvers for defense. How to add these strategies into the WTA problem to describe the antagonistic process more accurately is one of the important problems currently faced.
In terms of an antagonistic environment, Golany et al. [21,22] proposed a zero-sum defense-attack game model in which the protagonists are the defender and the attacker: the defender allocates defensive resources to the areas to be protected, and the attacker attacks these areas. Shan et al. [30] proposed a sequential defense-attack game model based on mixed defense resources, in which the attacker takes strategic actions with a certain probability. Zha et al. [31] put forward a dynamic multi-formation antagonistic model under incomplete information, in which one formation consists of unmanned combat aircraft (UCAV) and unmanned reconnaissance aircraft, while the other consists of air defense missiles (AAM) and ground command vehicles (GCV). Although the above models reflect the characteristics of antagonistic weapon-target assignment (AGWTA) to a certain extent, they all describe a combat situation in which one side is the attacker and the other is the defender. A typical scenario of this kind is attack-defense antagonism in air combat. For this scenario, Pan et al. [32] proposed an AGWTA model in which the combat aircraft of both sides can not only suppress the enemy by electronic jamming but also attack targets with missiles. However, that model mainly reflects the antagonism from the perspective of information fusion and does not consider the evasive strategy. To reflect the tactical characteristics of using attack and evasive strategies in air combat, this paper focuses on the design of an attack- and evasion-oriented AGWTA model for the air combat environment, which reflects the characteristics of air combat antagonism from the perspective of using attack and evasive strategies.
WTA is an NP-hard problem [33]. Its model is non-differentiable, nonlinear, and non-convex. When the scale of the WTA problem is large and complex, traditional methods such as mathematical programming [34,35], the branch-and-bound method [36], or approximate methods [37] have difficulty solving the problem efficiently without simplification. Therefore, evolutionary computation [38,39] can be used to solve AGWTA problems. Although evolutionary computation can effectively solve NP-hard problems, it is difficult for it to obtain the Nash equilibrium (NE) in antagonistic problems. However, according to the actual situation of air combat, the opponent cannot always take the optimal action against our strategy. If reasonable and feasible strategies can be provided to the decision-makers, an advantage can still be achieved in the engagement. Hence, although evolutionary computation cannot provide NE strategies, it can provide a feasible strategy to assist commanders in making reasonable decisions.
To solve such antagonistic problems, the model is usually established as a minimax problem [40]. In the current literature, coevolution [41][42][43] is an effective method for solving minimax problems. Literature [32] proposes a cooperative genetic algorithm to solve the AGWTA problem. However, that algorithm does not make full use of the cooperative relationship between populations, so its accuracy and efficiency for solving AGWTA problems can be further improved. In this paper, an Elitism Co-evolutionary Algorithm for Antagonistic Weapon-Target Assignment (ECO-AGWTA) is proposed. The main work and contributions of this paper are as follows:
(1) A new model of the AGWTA problem in air combat is established. According to the characteristics of attack and defense in air combat, this paper puts forward a model based on the attack and evasion strategies.
(2) An elite cooperative genetic algorithm framework is proposed. In this framework, an elite collaboration mechanism is proposed. The coding method and the evolution operators are redesigned.
(3) An elite individual updating mechanism is proposed. In this mechanism, the updating rules of elite individuals are established. The initialization and updating algorithms of elite individuals are put forward.
The remainder of this paper is organized as follows. The model of the AGWTA problem for air combat is established in Section 2. The ECO-AGWTA algorithm is described in Section 3. Section 4 gives the simulation results and analysis. Section 5 concludes the paper.

A. ANALYSIS ON THE BASIC ELEMENTS OF ANTAGONISTIC WEAPON-TARGET ASSIGNMENT
In air combat, both sides of the confrontation need to make a series of favorable decisions according to the air combat situation, such as the decisions of the control of sensors, the strategies of the usage of weapons, and the tactics of the maneuver. Therefore, the process of air combat is so complex that some reasonable assumptions need to be made in advance to facilitate the analysis of the problem.
In this version of the AGWTA problem, it is assumed that both sides have a certain number of identical aircraft. Since the forces of both sides are the same, many complex factors in air combat can be ignored, such as the maneuverability of the aircraft, the performance of the weapons (type, range, guidance mode, etc.), and the capability of the sensors. In addition, equal forces avoid one side being so strong that the antagonism is lost. The equal force makes this problem a two-person zero-sum game with a NE solution [44], which is the solution required in this problem. As a result, we only need to pay attention to the impact of the strategies of both sides, which is convenient for analyzing and modeling the problem. In addition, it is also assumed that all the decisions of both sides are made at the same time.

As shown in Figure 1, both red and blue have a certain number of aircraft. The mission of these aircraft is to destroy the targets of the opponent as far as possible while protecting themselves from destruction. All these aircraft can only execute an attack or an evasive strategy at any given time, and each aircraft can only attack one target or evade one target at a time. During the engagement, both attack and evasion have a certain probability of success.
According to the assumptions of the problem, the AGWTA problem contains many basic elements. The definition of each element is firstly given.
(1) Participant Sets: Suppose that there are m and n aircraft on the red and blue sides, respectively. Our side is red, denoted as R = {r_1, r_2, ..., r_m}. The opponent is blue, denoted as B = {b_1, b_2, ..., b_n}, where r_m and b_n represent the mth red and the nth blue aircraft.
(2) Strategy Sets: The strategy pair (S_R, S_B) denotes that the red takes the strategy S_R and the blue takes the strategy S_B.
(3) Reward Function: The reward function describes the benefit that each side obtains under a given pair of strategies. When the red adopts the strategy S_R and the blue adopts the strategy S_B, the reward functions of the red and blue can be expressed as f_R(S_R, S_B) and f_B(S_B, S_R), respectively, each comprising an attack reward and an evasion reward.

(4) Decision Matrix: The decision matrix describes the strategy taken by each aircraft at a certain time. The decision matrix of the attack strategy adopted by red against blue is expressed as follows:

$$x_{ij}^{R}=\begin{cases}1, & \text{if the } j\text{th blue aircraft is attacked by the } i\text{th red aircraft}\\ 0, & \text{otherwise}\end{cases}\tag{1}$$

The decision matrix of the evasion strategy adopted by red against blue is expressed as follows:

$$y_{ij}^{R}=\begin{cases}1, & \text{if the } j\text{th blue aircraft is evaded by the } i\text{th red aircraft}\\ 0, & \text{otherwise}\end{cases}\tag{2}$$

Similarly, the decision matrix of the attack strategy adopted by blue against red is expressed as follows:

$$x_{ji}^{B}=\begin{cases}1, & \text{if the } i\text{th red aircraft is attacked by the } j\text{th blue aircraft}\\ 0, & \text{otherwise}\end{cases}\tag{3}$$

The decision matrix of the evasion strategy adopted by blue against red is expressed as follows:

$$y_{ji}^{B}=\begin{cases}1, & \text{if the } i\text{th red aircraft is evaded by the } j\text{th blue aircraft}\\ 0, & \text{otherwise}\end{cases}\tag{4}$$

Although we assume that the aircraft of both sides are identical, the damage probability, the evasion probability, and the target value differ for each aircraft, because these elements represent the current air combat situation rather than the capability of the aircraft. Figure 2 shows a typical air combat situation.

FIGURE 2. A typical air combat situation with four aircraft
As shown in Figure 2, two red aircraft and two blue aircraft are fighting in the air. The aircraft B1 and B2 execute the attack strategy, while the aircraft R1 and R2 carry out the evasive strategy. In this situation, R1 carries out an evasive maneuver in the same direction as B1, while R2 performs an evasive maneuver in the direction opposite to B2. R1 can therefore quickly get out of the weapon range of B1, improving the success probability of its evasive strategy. Meanwhile, R2 must first turn back and then accelerate to escape, which takes additional time, so R2 takes longer to get out of the weapon range of B2. As a result, for this air combat situation, the evasion probability of R1 against B1 is high and the damage probability of B1 against R1 is low; conversely, the evasion probability of R2 against B2 is low and the damage probability of B2 against R2 is high. In addition, from the perspective of blue, R2 is more dangerous than R1, so the target value of R2 is higher than that of R1: attacking R2 yields B2 a higher reward. Based on the above analysis, all the elements used in this version of AGWTA reflect the characteristics of air combat to a certain extent and can help us build a reasonable AGWTA model.

B. AIR COMBAT EVASIVE STRATEGY MODEL
In the traditional AGWTA model, only the benefit of attack is considered. However, in actual air combat, if the opponent's attack is evaded by our aircraft, the effectiveness of the air combat is affected. Therefore, it is necessary to consider the benefit of the evasive strategy in the AGWTA problem.
When the evasive strategy is adopted, taking the red as an example, the reward of the evasive strategy can be expressed as follows:

$$f_{E}^{R}(S_R,S_B)=\sum_{i=1}^{m}\sum_{j=1}^{n}E_{ij}\,q_{ij}^{R}\,y_{ij}^{R}\,x_{ji}^{B}\tag{5}$$

where E_ij (E_ij ≥ 0) represents the reward obtained when the attack of the jth blue aircraft is evaded by the ith red aircraft, and q_ij^R is the corresponding evasion probability. The product y_ij^R x_ji^B is the constraint used to calculate the benefit of evasion: the benefit of evasion increases only when both y_ij^R and x_ji^B are equal to 1. If the jth blue aircraft is evaded by the ith red aircraft while the jth blue aircraft does not attack the ith red aircraft, the evasive strategy taken by red is considered to have no effect on the combat. In addition, when both y_ij^R and x_ji^B are equal to 1, it is also necessary that the red successfully evades the attack of the blue, which is captured by the evasion probability q_ij^R. To further explain the calculation of the evasion reward, formula (5) can be rewritten as a sum over the red aircraft, in which Ē_i^R indicates the evasion reward of the ith red aircraft and C_i indicates whether Ē_i^R should be calculated: when the ith red aircraft does not evade any attack from a blue aircraft, C_i is equal to 0 and Ē_i^R is no longer calculated. Adding up the evasion reward of each red aircraft gives the total evasion reward.

In formula (5), the calculation of E_ij is the key problem to be studied. E_ij can be expressed as E(s_j^B(A, r_i)), which represents the reward of the jth blue aircraft being evaded by the ith red aircraft. The model involves two auxiliary attack decision variables of the blue: x_{j'i'}^{Bj*}, the decision variable that the i'th red aircraft is attacked by the j'th blue aircraft when the jth blue aircraft does not attack, and x_{j'i'}^{Bjk~}, the constrained attack decision variable of the blue, in which only the kth red aircraft is attacked by the jth blue aircraft.
The model of E(s_j^B(A,r_i)) can be divided into two parts: E(s_j^B(A,r_i)) = E_part1(s_j^B(A,r_i)) − E_part2(s_j^B(A,r_i)). E_part1(s_j^B(A,r_i)) represents the expected attack reward of the jth blue aircraft without attacking the ith red aircraft. E_part2(s_j^B(A,r_i)) indicates the attack reward of all the blue aircraft when the jth blue aircraft does not attack any red aircraft. When the jth blue aircraft is evaded by the ith red aircraft, it obtains no attack reward; however, if it is assigned to attack another red aircraft, it can still achieve a reward. As a result, the difference between the two parts is the increased reward of the blue. To further elaborate the model, the derivation of E_part1(s_j^B(A,r_i)) and E_part2(s_j^B(A,r_i)) is as follows. First, the attack reward of the blue when the jth blue aircraft does not attack any red aircraft is computed; this is exactly E_part2(s_j^B(A,r_i)), and its derivation can refer to the modeling process of the basic WTA problem. Second, the attack reward of the blue when the jth blue aircraft is assigned to attack the kth red aircraft is computed. Finally, the attack rewards of the jth blue aircraft attacking each red target except the ith are summed, and the mean of this sum gives E_part1(s_j^B(A,r_i)), which must satisfy the constraint k ≠ i.
Similarly, the model of the expected attack reward of the ith red aircraft evaded by the jth blue aircraft can be designed in the same way, with the constrained attack decision variables of the red defined analogously. In summary, combining the terms of both sides as shown above gives the reward model of the evasion strategy for air combat.

C. AIR COMBAT ANTAGONISTIC WEAPON-TARGET ASSIGNMENT MODEL
To establish the AGWTA model, the action reward of both sides must be established first. The action reward includes two parts: the attack reward and the evasion reward. Therefore, the action reward models of the red and blue can be expressed as follows:

$$f_{R}(S_R,S_B)=f_{A}^{R}(S_R,S_B)+f_{E}^{R}(S_R,S_B)$$
$$f_{B}(S_B,S_R)=f_{A}^{B}(S_B,S_R)+f_{E}^{B}(S_B,S_R)$$

where f_A^R(S_R,S_B) and f_A^B(S_B,S_R) are the attack rewards and f_E^R(S_R,S_B) and f_E^B(S_B,S_R) are the evasion rewards of the red and blue, respectively. When the strategy of one side is fixed, the AGWTA problem reduces to the traditional WTA problem: find the optimal strategy S_R^* of the red or the optimal strategy S_B^* of the blue that maximizes the reward function f_R(S_R,S_B) or f_B(S_B,S_R). It can be expressed as follows:

$$S_R^{*}=\arg\max_{S_R} f_R(S_R,S_B),\qquad S_B^{*}=\arg\max_{S_B} f_B(S_B,S_R)\tag{15}$$

Equation (15) means that both sides try their best to maximize their own reward. However, in actual air combat, we need to consider not only our own reward but also the possible actions of the opponent to assist the decision-makers in choosing a reasonable strategy. To further describe the interaction between red and blue in the antagonistic environment, equation (15) is converted into the following form:

$$F_R(S_R,S_B)=f_R(S_R,S_B)-f_B(S_B,S_R)\tag{16}$$

where F_R(S_R,S_B) represents the difference in reward between the two parties. For the red, a higher value of the difference not only means that the red gets more reward but also indicates that the blue gets less reward. This difference involves the attack reward and the evasion reward at the same time, and thus fully reflects the characteristics of air combat.
To solve the AGWTA problem, the robust design problem [45] can be used for reference. This kind of problem can be transformed into a minimax optimization problem, because the goal is to obtain the most suitable strategy we can take when the opponent uses its best strategy. The NE solution obtained by solving this problem is the set of all pairs of opposing "good strategies" [44]. As a result, in the AGWTA problem, a conservative strategy can be adopted without knowing the opponent's strategy. In this way, although we cannot guarantee that the adopted strategy has the highest reward, we can provide a reasonable strategy for the decision-makers when the opponent's action is unknown.
Based on the above analysis, taking the red as an example, the model of the AGWTA problem in air combat can be expressed as follows:

$$S_R^{*}=\arg\max_{S_R}\min_{S_B} F_R(S_R,S_B)\tag{17}$$
$$\text{s.t.}\quad \sum_{j=1}^{n}\left(x_{ij}^{R}+y_{ij}^{R}\right)=1,\; i=1,\ldots,m;\qquad \sum_{i=1}^{m}\left(x_{ji}^{B}+y_{ji}^{B}\right)=1,\; j=1,\ldots,n$$

where the two constraints mean that only one strategy can be executed by each red and blue aircraft at the same time.
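To make the minimax model concrete, the following sketch enumerates the full bi-layer strategy space of a tiny instance and solves a model of the form of equation (17) by exhaustive search. All instance data (the probability matrices, the unit target values, and the approximation of the evasion reward E_ij by the evader's value) are illustrative assumptions, not values from the paper.

```python
import itertools
import numpy as np

# Hypothetical 2-vs-2 instance; all probability values are assumptions.
m, n = 2, 2
P_R = np.array([[0.6, 0.5], [0.4, 0.7]])   # P_R[i][j]: red i damages blue j
P_B = np.array([[0.5, 0.6], [0.7, 0.4]])   # P_B[j][i]: blue j damages red i
Q_R = np.array([[0.8, 0.6], [0.5, 0.9]])   # Q_R[i][j]: red i evades blue j
Q_B = np.array([[0.7, 0.5], [0.6, 0.8]])   # Q_B[j][i]: blue j evades red i
v_R, v_B = np.ones(m), np.ones(n)          # unit target values

def strategies(k, targets):
    # All bi-layer codes for k aircraft: a strategy bit (1 = attack,
    # 0 = evade) plus a 0-based opponent index per aircraft.
    for s in itertools.product((0, 1), repeat=k):
        for t in itertools.product(range(targets), repeat=k):
            yield s, t

def reward_diff(red, blue):
    # F_R = f_R - f_B: attack rewards are value * damage probability;
    # evasion rewards count only when the evaded opponent actually
    # attacks the evader (E_ij approximated here by the evader's value).
    (sr, tr), (sb, tb) = red, blue
    f_R = sum(v_B[tr[i]] * P_R[i][tr[i]] for i in range(m) if sr[i] == 1)
    f_B = sum(v_R[tb[j]] * P_B[j][tb[j]] for j in range(n) if sb[j] == 1)
    f_R += sum(v_R[i] * Q_R[i][tr[i]] for i in range(m)
               if sr[i] == 0 and sb[tr[i]] == 1 and tb[tr[i]] == i)
    f_B += sum(v_B[j] * Q_B[j][tb[j]] for j in range(n)
               if sb[j] == 0 and sr[tb[j]] == 1 and tr[tb[j]] == j)
    return f_R - f_B

# Equation (17) by brute force: max over red of min over blue.
best = max(strategies(m, n),
           key=lambda r: min(reward_diff(r, b) for b in strategies(n, m)))
worst_case = min(reward_diff(best, b) for b in strategies(n, m))
print("red strategy:", best, "guaranteed F_R:", worst_case)
```

Brute force is only feasible for toy sizes (the space grows as (2n)^m); the co-evolutionary algorithm of the next section is the practical alternative.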

A. THE FRAMEWORK OF ECO-AGWTA
Based on the framework of the co-evolutionary algorithm, the ECO-AGWTA algorithm is proposed to solve the model shown in equation (17). In this algorithm, the red and blue sides are represented by two independent populations, which are evolved simultaneously. In the process of evolution, the two populations are evaluated against each other but perform evolutionary operations independently. This process is similar to the air combat between the red and blue, in which their own strategies are chosen according to the opponent's strategies. The framework of the proposed ECO-AGWTA algorithm is shown in Figure 3.
As shown in Figure 3, the ECO-AGWTA algorithm includes two populations, i.e., the red population P_R and the blue population P_B. At the beginning of the algorithm, P_R and P_B are initialized. Then the fitness of each individual is calculated according to equation (17); for the initial populations P_R and P_B, the fitness depends only on the two initial populations. Based on the results of the fitness evaluation, a specified number of the best individuals are selected from P_R and P_B to form the elite populations EP_R and EP_B. After that, the algorithm enters an iterative process in which the red and blue populations perform selection, crossover, and mutation operations respectively to generate new individuals. During evolution, each population only performs evolutionary operations on individuals in its own population and does not perform them with individuals in the other population. After these operations, the fitness of the new populations is recalculated. However, the fitness of an individual is calculated not only against the opposing population P_R or P_B but also against the elite population EP_R or EP_B. Meanwhile, the elite populations are updated. In this process, the elite populations not only preserve the current excellent individuals but also participate in the crossover operation and the fitness evaluation. As a result, the searching ability of the algorithm in the discrete space is improved. Table 1 describes the detailed steps of the ECO-AGWTA algorithm.
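The iterative structure described above (Figure 3 / Table 1) can be sketched as follows. The fitness and score functions here are toy stand-ins for equation (17), and the simplified crossover, mutation, and elite update are placeholders for the problem-specific mechanisms described in the following sections; all parameter values and names are assumptions.

```python
import random

random.seed(0)
m = n = 3                        # aircraft per side
POP, EP_SIZE, MAX_ITER = 12, 3, 30
PC, PM = 0.8, 0.2                # crossover / mutation probabilities

def rand_ind(size, targets):
    # Bi-layer code: strategy bits (0 = evade, 1 = attack) plus a
    # 0-based target index per aircraft.
    return [[random.randint(0, 1) for _ in range(size)],
            [random.randrange(targets) for _ in range(size)]]

def score(ind, opp):
    # Toy stand-in for the reward difference F_R of equation (17).
    return sum(ind[0]) + 0.1 * sum(ind[1]) - 0.05 * sum(opp[0])

def fitness(ind, opposing, elites):
    # Evaluated against the opposing population AND its elites.
    pool = opposing + elites
    return sum(score(ind, o) for o in pool) / len(pool)

def evolve(pop, elites, opposing, opp_elites, size, targets):
    offspring = []
    for ind in pop:
        child = [ind[0][:], ind[1][:]]
        if random.random() < PC:          # crossover with an elite
            mate = random.choice(elites)
            a, b = sorted(random.sample(range(size), 2))
            for layer in (0, 1):
                child[layer][a:b + 1] = mate[layer][a:b + 1]
        if random.random() < PM:          # mutate one layer
            k = random.randrange(size)
            if random.random() < 0.5:
                child[0][k] ^= 1
            else:
                child[1][k] = random.randrange(targets)
        offspring.append(child)
    # Survivor selection: keep the POP best of parents + offspring.
    return sorted(pop + offspring,
                  key=lambda x: fitness(x, opposing, opp_elites),
                  reverse=True)[:len(pop)]

P_R = [rand_ind(m, n) for _ in range(POP)]
P_B = [rand_ind(n, m) for _ in range(POP)]
EP_R = sorted(P_R, key=lambda x: fitness(x, P_B, []), reverse=True)[:EP_SIZE]
EP_B = sorted(P_B, key=lambda x: fitness(x, P_R, []), reverse=True)[:EP_SIZE]
for _ in range(MAX_ITER):
    P_R = evolve(P_R, EP_R, P_B, EP_B, m, n)
    P_B = evolve(P_B, EP_B, P_R, EP_R, n, m)
    # Simplified elite update: merge in the current best individual.
    EP_R = sorted(EP_R + P_R[:1],
                  key=lambda x: fitness(x, P_B, EP_B), reverse=True)[:EP_SIZE]
    EP_B = sorted(EP_B + P_B[:1],
                  key=lambda x: fitness(x, P_R, EP_R), reverse=True)[:EP_SIZE]
best = EP_R[0]
```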
To solve the AGWTA problem effectively, the coding method and the evolutionary operators are also redesigned. Furthermore, a new elite individual updating criterion and elite population updating method are proposed. These mechanisms are explained in detail in the following sections.

Input: MaxIter (the maximum number of iterations), popR (the population size of the red), popB (the population size of the blue), EPsize (the population size of the elite), pc (the crossover probability), pm (the mutation probability).

Output: Ibest (the best individual).
Step 1 Initialization: Randomly generate the initial red population P_R = {r_1, r_2, ⋯, r_popR} and the initial blue population P_B = {b_1, b_2, ⋯, b_popB}.

B. BI-LAYER CODING
In a traditional WTA problem, only the attack strategy is executed, whereas in an AGWTA problem both attack and evasion strategies are executed. Therefore, the traditional coding method is no longer applicable to the AGWTA problem. In this paper, a bi-layer coding is proposed, which includes two parts: a strategy layer and a target layer. The target layer is coded in the same way as the traditional method, while the strategy layer describes the strategy executed by each aircraft. Each coding bit in the strategy layer corresponds to the coding bit at the same position in the target layer. The diagram of the bi-layer coding is shown in Figure 4.

As shown in Figure 4, the coding dimensions of red and blue are equal to the number of aircraft in their own formations. The red strategy layer s_r1, ⋯, s_ri, ⋯, s_rm represents the strategies executed by red, and the blue strategy layer s_b1, ⋯, s_bj, ⋯, s_bn represents the strategies executed by blue. The value of a strategy-layer bit can only be 0 or 1: the value 0 means that the evasion strategy is executed, and the value 1 means that the attack strategy is executed. The red target layer t_r1, ⋯, t_ri, ⋯, t_rm and the blue target layer t_b1, ⋯, t_bj, ⋯, t_bn represent the opponent targets to be attacked or evaded.
Take the code of red as an example. Assume that there are 3 aircraft in the red team and 3 aircraft in the blue team. The red strategy layer is 101, and the target layer is 321. The meanings represented by the codes are as follows: the first red aircraft takes an attack strategy to attack the third blue aircraft, the second red aircraft takes an evasive strategy to evade the second blue aircraft, and the third red aircraft takes an attack strategy to attack the first blue aircraft.
According to the above analysis, the dimension of coding restricts the number of individuals. The value in the strategy layer restricts the type of strategy. In addition, the one-to-one correspondence between the strategy layer and the target layer ensures that one of our aircraft can only confront one of the opponent aircraft at the same time.
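The coding rules above can be sketched as follows. The function names are illustrative assumptions; targets are 1-based, matching the 101/321 example discussed above.

```python
import random

# A sketch of the bi-layer code from Figure 4: a strategy layer of 0/1
# bits (0 = evade, 1 = attack) paired bit-for-bit with a target layer
# of 1-based opponent indices.
def random_code(own_size, opp_size):
    strategy = [random.randint(0, 1) for _ in range(own_size)]
    target = [random.randint(1, opp_size) for _ in range(own_size)]
    return strategy, target

def decode(strategy, target):
    # Turn a bi-layer code into readable engagements.
    actions = []
    for i, (s, t) in enumerate(zip(strategy, target), start=1):
        verb = "attacks" if s == 1 else "evades"
        actions.append(f"aircraft {i} {verb} opponent {t}")
    return actions

# The worked example from the text: strategy layer 101, target layer 321.
print(decode([1, 0, 1], [3, 2, 1]))
# -> ['aircraft 1 attacks opponent 3',
#     'aircraft 2 evades opponent 2',
#     'aircraft 3 attacks opponent 1']
```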

C. EVOLUTIONARY OPERATOR FOR AGWTA
This work is licensed under a Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/). This article has been accepted for publication in a future issue of this journal, but has not been fully edited; content may change prior to final publication.

1) CROSSOVER OPERATOR
For the WTA problem, the commonly used crossover operator is the Ex operator proposed by Lee et al. [46]. However, the Ex operator is designed for the traditional WTA problem and is not suitable for the AGWTA problem. To improve the performance of the algorithm, an Elitism Crossover (EC) operator is proposed. The principle of the operator is shown in Figure 5. In Figure 5, A is the individual to be evolved and B is an individual randomly selected from the red elite population. In the EC operation, two coding bits of the individual are first selected at random; the corresponding coding bits in the two individuals are then exchanged as shown in the figure. As a result, the proposed crossover operator exchanges parts of the parent individuals, with the elite population serving as the parent pool for crossover.
Compared with the Ex operator, the EC operator not only retains the basic rules of the Ex operator but also extends the traditional single-layer mode to the bi-layer mode. There are two main differences between the EC operator and the Ex operator. First, the EC operator does not adopt the "good gene" retention strategy of the Ex operator, because for the AGWTA problem the opponent's strategies change constantly during evolution, making it difficult to determine whether genes are good or bad. Second, the EC operator is executed on an individual selected from the population and an elite individual selected from the elite population, while the Ex operator is executed on individuals selected only from the population.
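A minimal sketch of the EC operator described in Figure 5, under the assumption that the two selected positions are exchanged on both layers at once so each strategy bit stays paired with its target; the function name and signature are illustrative.

```python
import random

def elitism_crossover(ind, elite_pop, rng=random):
    # Copy both layers so the original individual is left unchanged.
    strategy, target = ind[0][:], ind[1][:]
    # B in Figure 5: an individual drawn at random from the elite pool.
    e_strategy, e_target = rng.choice(elite_pop)
    # Pick two distinct coding bits at random.
    i, j = rng.sample(range(len(strategy)), 2)
    for k in (i, j):
        # Exchange both layers together at the chosen positions.
        strategy[k], target[k] = e_strategy[k], e_target[k]
    return strategy, target
```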

2) MUTATION OPERATOR
Aiming at the bi-layer coding method proposed in this paper, the mutation operation rules are adaptively adjusted so that they can be applied to the ECO-AGWTA algorithm. The principle of the mutation operator is shown in Figure 6.

As shown in Figure 6, two kinds of mutation operators are designed in this paper, and both have the same mutation probability in the algorithm. Mutation operator (a) is implemented in the strategy layer, while mutation operator (b) is performed in the target layer. When operator (a) is used, it is performed on a coding bit randomly selected in the strategy layer; when operator (b) is used, it is implemented on a coding bit randomly selected in the target layer.
Since the value of a strategy-layer code is only 0 or 1, the mutation operation simply toggles the value of the strategy-layer code between 0 and 1. Mutating the strategy layer changes the strategy of the aircraft represented by the mutated coding bit. In addition, mutating the target number in the target layer changes the target attacked or evaded by the aircraft corresponding to the current code. Based on the above analysis, both the strategy layer and the target layer can be mutated while keeping individuals within the constraints of the problem.
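The two mutation operators of Figure 6 can be sketched as follows; the names and the 1-based target convention are assumptions matching the coding example earlier.

```python
import random

def mutate_strategy(ind, rng=random):
    # Operator (a): flip one randomly chosen strategy bit (0 <-> 1).
    strategy, target = ind[0][:], ind[1][:]
    k = rng.randrange(len(strategy))
    strategy[k] ^= 1
    return strategy, target

def mutate_target(ind, opp_size, rng=random):
    # Operator (b): redraw one randomly chosen target index.
    strategy, target = ind[0][:], ind[1][:]
    k = rng.randrange(len(target))
    target[k] = rng.randint(1, opp_size)
    return strategy, target
```

In the algorithm, one of the two operators is applied with equal probability whenever an individual is selected for mutation.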

3) ELITE INDIVIDUAL UPDATING MECHANISM
In the framework of the traditional co-evolutionary algorithm, the elite individuals are mostly used to preserve the best individuals found so far, and the principle of "the bigger the better" is usually adopted for updating and selecting them. However, for the AGWTA problem, the quality of an individual is related not only to its fitness but also to the opponent's strategy. This problem is illustrated in Figure 7.

In Figure 7, EP_R is the elite population and (ES_R, ES_B) is the strategy set of EP_R; P_R is the population and (S_R, S_B) is the strategy set of P_R; F_1, ⋯, F_{N+M} are the sorted individuals. As shown in Figure 7, the traditional elite individual update mechanism does not consider the impact of changes in strategies on fitness. The proposed mechanism, by contrast, combines all the current strategy sets and evaluates individual quality against the changed strategies; as a result, it can obtain the elite individuals under the changed strategy set. To further explain this mechanism, taking the red as an example, the following theorem is first given.

Theorem 1: Given two feasible solutions (S_R1, S_B1) and (S_R2, S_B2), if F_R(S_R1,S_B1) ≥ F_R(S_R2,S_B2) and F_R(S_R2,S_B1) ≤ F_R(S_R2,S_B2), then the strategy S_R1 is no worse than the strategy S_R2.

Proof: Situation 1: F_R(S_R1,S_B1) > F_R(S_R2,S_B2) and F_R(S_R2,S_B1) < F_R(S_R2,S_B2). According to the model of F_R(S_R,S_B), F_R(S_R2,S_B1) < F_R(S_R2,S_B2) means that when the strategy S_R2 is adopted, the reward of the strategy S_B1 is greater than that of the strategy S_B2. Meanwhile, F_R(S_R1,S_B1) > F_R(S_R2,S_B2) indicates that the reward of the strategy S_R1 against the strategy S_B1 is greater than the reward of the strategy S_R2 against the strategy S_B2. Accordingly, if the strategy S_B1 is better than the strategy S_B2 and the red strategy S_R1 against the blue strategy S_B1 has the higher reward, the strategy S_R1 is better than the strategy S_R2.
Situation 2: F_R(S_R1,S_B1) > F_R(S_R2,S_B2) and F_R(S_R2,S_B1) = F_R(S_R2,S_B2). If the strategies S_B1 and S_B2 have the same reward and the strategy S_R1 against the strategy S_B1 has a higher reward than the strategy S_R2 against the strategy S_B2, the strategy S_R1 is better than the strategy S_R2.

Situation 3: F_R(S_R1,S_B1) = F_R(S_R2,S_B2) and F_R(S_R2,S_B1) < F_R(S_R2,S_B2). If the strategy S_B1 is better than the strategy S_B2 and the strategy S_R1 against the strategy S_B1 has the same reward as the strategy S_R2 against the blue strategy S_B2, the strategy S_R1 is better than the strategy S_R2.

Situation 4: F_R(S_R1,S_B1) = F_R(S_R2,S_B2) and F_R(S_R2,S_B1) = F_R(S_R2,S_B2). If the reward is the same no matter which of the strategies S_B1 and S_B2 is adopted, the strategies S_R1 and S_R2 have the same reward.
In conclusion, the strategy S_R1 is no worse than the strategy S_R2. □ Theorem 1 provides a criterion for judging the relative quality of two feasible solutions of the AGWTA problem. When the conditions listed in Theorem 1 are satisfied, the feasible solution (S_R1, S_B1) can be considered a better choice than (S_R2, S_B2). Based on this principle, the flow chart of the elite individual updating mechanism is shown in Figure 8.

Figure 8 shows the flow of the mechanism: given the current population P and the elite population EP, each individual (S_Ri, S_Bi) selected from P in order is compared against every elite individual (ES_Rj, ES_Bj) in EP; the elite individuals for which Theorem 1 is satisfied are copied into a temporary population TP, and if TP is not empty an update is performed. As can be seen from Figure 8, every individual in the population is compared with all the elite individuals, and the individual with the worst strategy is then selected for update. Table 2 shows the pseudocode of the updating mechanism.
In Table 2, Steps 3 to 7 realize the selection of the elite individuals that need to be updated. By comparing all the elite individuals with the individuals in the current population, the elite individuals that need to be updated are selected and stored in the temporary population TP for further selection. Steps 8 to 13 implement the update of the elite population: the individual with the worst fitness is selected from TP, and the corresponding individual in EP is updated. This process is repeated until all the individuals of the current population have been traversed. The pseudocode shown in Table 2 takes the red population as an example.
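A compact sketch of the updating mechanism (Figure 8 / Table 2) built on the Theorem 1 criterion. F is a stand-in for the reward model F_R, and all names and the exact replacement rule are assumptions based on the description above.

```python
def no_worse(F, cand, elite):
    # Theorem 1: (S_R1, S_B1) is no worse than (S_R2, S_B2) when
    # F(S_R1, S_B1) >= F(S_R2, S_B2) and F(S_R2, S_B1) <= F(S_R2, S_B2).
    (sr1, sb1), (sr2, sb2) = cand, elite
    return F(sr1, sb1) >= F(sr2, sb2) and F(sr2, sb1) <= F(sr2, sb2)

def update_elites(F, population, elites):
    elites = list(elites)
    for cand in population:
        # Steps 3-7: collect elites the candidate is no worse than (TP).
        tp = [k for k, e in enumerate(elites) if no_worse(F, cand, e)]
        if tp:
            # Steps 8-13: replace the worst such elite with the candidate.
            worst = min(tp, key=lambda k: F(*elites[k]))
            elites[worst] = cand
    return elites
```

For example, with a toy reward F(s_R, s_B) = s_R − s_B over scalar "strategies", a candidate (2, 0) dominates both elites (1, 0) and (0, 0) and replaces the worse of the two.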

A. TEST CASE GENERATION
Since there is no public test set for the AGWTA problem, nine groups of test cases, ranging from small-scale to large-scale AGWTA problems, are generated. The generated cases are shown in Table 3, in which m and n are the numbers of aircraft of red and blue, respectively. Each generated case mainly includes the following parameters: the number of aircraft on each side, the value of each aircraft, the damage probability matrix, and the evasion probability matrix. The generating methods of these parameters are given below.
(1) Number of red and blue aircraft m and n: The numbers of aircraft on both sides are equal. Nine different antagonism situations, from small scale to large scale, are selected.
(2) Value of red and blue aircraft: Since the target value represents the advantage of the strategy adopted by the aircraft in the current situation, it can be regarded as a weight in the model. Therefore, to simplify the calculation, all aircraft on both sides are assigned a value of 1.
(3) Damage probability matrices of red and blue P_R = (p^R_ij)_{m×n} and P_B = (p^B_ji)_{n×m}: Each value in the damage probability matrices of both sides is generated randomly within a certain range. In this paper, the range is set as [0.4, 0.8].
(4) Evasion probability matrices of red and blue Q_R = (q^R_ij)_{m×n} and Q_B = (q^B_ji)_{n×m}: Each value in the evasion probability matrices of both sides is generated randomly within a certain range. In this paper, the range is set as [0.5, 0.9]. It should be noted that in actual air combat, as the air combat situation, including distance, angle of entry, velocity, etc., changes dynamically, the damage probability and the evasion probability vary as well. The selected ranges of these two probabilities are not taken from real air combat scenarios; they are reference values designed for the convenience of the experiment. In practice, the actual damage probability and evasion probability need to be determined with techniques such as threat assessment, tactical reasoning, and intention prediction. Nevertheless, these two probabilities are not arbitrarily set in this paper. To reflect the influence of the evasion strategy on the air combat process as much as possible, the evasion probability is set higher than the damage probability, because an aircraft can only evade one opponent at a time, whereas it can be attacked by multiple opponents simultaneously.
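The generation procedure above can be sketched directly. The function name, dictionary layout, and fixed seed are illustrative choices, not the paper's code:

```python
import random

def generate_case(m, n, seed=0, p_range=(0.4, 0.8), q_range=(0.5, 0.9)):
    """Generate one AGWTA test case with the parameters described above:
    m red and n blue aircraft, all valued 1, damage matrices drawn from
    p_range and evasion matrices drawn from q_range."""
    rng = random.Random(seed)  # fixed seed for reproducible cases

    def draw(rows, cols, lo, hi):
        return [[rng.uniform(lo, hi) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "value_red":  [1.0] * m,
        "value_blue": [1.0] * n,
        "P_R": draw(m, n, *p_range),   # red damage probabilities
        "P_B": draw(n, m, *p_range),   # blue damage probabilities
        "Q_R": draw(m, n, *q_range),   # red evasion probabilities
        "Q_B": draw(n, m, *q_range),   # blue evasion probabilities
    }
```

A case such as `generate_case(4, 4)` then corresponds to one row of Table 3, with the probability ranges matching those stated in items (3) and (4).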

B. EVALUATION INDEX
For the traditional WTA problem, the performance of an algorithm can be judged by the value of the objective function without comparing it with the optimal solution. However, for the AGWTA problem, the reward function depends not only on the strategy adopted by our side but also on the strategy adopted by the opponent. According to Theorem 1, when the reward of strategy S_A is greater than that of strategy S_B, it does not follow that S_A is better than S_B. Therefore, if a different opponent strategy is selected, the fitness of an individual may change. Moreover, the solutions obtained by different algorithms come with different opponent strategy sets, and all these sets should be considered when evaluating the comparison algorithms. Therefore, in this paper, a Unity Strategy Evaluation (USE) index is proposed to compare the performance of the algorithms for the AGWTA problem. The USE index is shown in Equation (18), in which the opponent strategy that yields the least reward for our strategy S_Ri is selected. The combined opponent strategy set can be calculated by Equation (19).
where ES_B^i and S_B^i are the blue strategy sets of the elite population EP and the population P of the ith comparison algorithm after the algorithm terminates.
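Since Equations (18) and (19) are not reproduced in this excerpt, the following is only a hedged sketch of how the USE index could be computed from the quantities just defined: pool every algorithm's elite and final-population blue strategies (in the spirit of Eq. 19), then score a red strategy by its least reward over that pool (in the spirit of Eq. 18). All names here are illustrative:

```python
def combined_blue_set(per_algorithm_sets):
    """Pool each algorithm's elite blue strategies ES_B^i and final
    population blue strategies S_B^i into one combined opponent set."""
    pool = []
    for ES_B, S_B in per_algorithm_sets:
        pool.extend(ES_B)
        pool.extend(S_B)
    return pool

def use_index(S_Ri, combined_blue, F_R):
    """Score a red strategy S_Ri by the least reward it obtains over
    the combined opponent strategy set."""
    return min(F_R(S_Ri, S_B) for S_B in combined_blue)
```

Evaluating every algorithm's red strategies against the same combined pool is what makes the index a fair cross-algorithm comparison.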
C. COMPARISON ALGORITHMS AND PARAMETER SETTINGS
ACGA and PCGA are two commonly used co-evolutionary algorithms for solving minimax problems. The main purpose of selecting ACGA and PCGA as comparison algorithms is to show that the elitism co-evolutionary framework proposed in this paper is better suited to the AGWTA problem. Although these two algorithms were proposed early, they are still used in related work in recent years owing to their excellent performance [45,49]. At the same time, as the AGWTA problem has only received attention in recent years, many traditional co-evolutionary algorithm frameworks for solving minimax problems cannot be directly applied to such discrete problems. The frameworks of ACGA and PCGA, however, can readily be combined with the mechanisms proposed in this article, which makes the comparison convenient.
DCEA-AGWTA is, to our knowledge, the only co-evolutionary algorithm designed specifically for the AGWTA problem. The purpose of selecting it as a comparison algorithm is to verify that the proposed ECO-AGWTA algorithm solves the AGWTA problem better.
In the experiments of this paper, for ECO-AGWTA, the population sizes popR and popB are both set to 100, the crossover probability pc = 0.8, the mutation probability pm = 0.1, and the elite population size EPsize = 50. For PCGA and ACGA, the evolutionary operators proposed in this paper are adopted, while the other parameters remain unchanged. For DCEA-AGWTA, the size of each subpopulation is set to 20, while the remaining parameters are the same. Furthermore, an algorithm terminates when its number of evaluations reaches 500*popR*popB or when the best individual found remains unchanged for 50 generations.
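The termination rule can be stated as a small check; the function name and the counters it takes are illustrative, not part of the paper's code:

```python
def should_terminate(evaluations, stagnant_generations, popR, popB):
    """Stop when the evaluation budget 500*popR*popB is spent, or when
    the best individual has not changed for 50 generations."""
    return (evaluations >= 500 * popR * popB
            or stagnant_generations >= 50)
```

With popR = popB = 100 as above, the budget works out to 5,000,000 evaluations unless stagnation triggers an earlier stop.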

D. EXPERIMENTS ON COMPARISON ALGORITHMS
For all the test cases, each algorithm is run 30 times independently. The resulting statistics of the USE index are shown in Table 4, and the best results are marked in bold.
As shown in Table 4, ECO-AGWTA performs better than the other algorithms in most test cases. For the small-scale cases Case1 and Case2, there is little difference in the performance of the algorithms. For Case3-Case9, the performance difference between the algorithms increases significantly as the case scale grows. For most test cases, DCEA-AGWTA performs better than ACGA, and ACGA performs better than PCGA. According to these experimental results, when the problem scale is small, all the algorithms obtain good results because the required search space is relatively small. When the problem scale is large, the ECO-AGWTA algorithm performs better because it adopts the elite cooperation mechanism, which can effectively store the opponent's current set of excellent strategies while using these elite individuals to guide the evolution of the population.
In addition, with the increase in problem scale, the gap between the best and the worst values obtained by the algorithms also increases gradually. This is because the number of strategies available to the opponent grows with the size of the problem. As a result, an individual that is better in one population may be worse in another. In this case, it is necessary to expand the population size so that individuals can be evaluated against more opponent strategies.
It should be noted that a negative value of the USE index in Table 4 does not mean that red is defeated by blue. The designed reward function F_R(S_R, S_B) is only a reference value for decision-makers: the larger the reward value, the better the strategy used by red. However, the AGWTA is formulated as a minimax problem, which means that the strategy adopted by red always assumes that blue adopts its optimal strategy. Therefore, in this formulation of AGWTA, blue can be considered to always hold the advantage, and as a result the reward value of red is negative in most cases.
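This minimax reading can be illustrated on a toy payoff matrix; the helper below is purely illustrative and not part of the paper's algorithm:

```python
def minimax_value(payoff):
    """Red's minimax value for payoff[r][b] = F_R(S_Rr, S_Bb): red
    maximizes its reward assuming blue always replies with the
    reward-minimizing column."""
    return max(min(row) for row in payoff)

# Even red's best row has a negative guaranteed reward here,
# mirroring the negative USE values reported in Table 4.
print(minimax_value([[-1.0, 2.0], [-3.0, 0.0]]))  # -1.0
```

Red's optimal play yields -1.0: not a defeat, just the best reward obtainable against an optimally responding blue.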

E. EXPERIMENTS ON ELITE INDIVIDUAL UPDATING MECHANISM
This section tests and verifies the effectiveness of the elite individual updating mechanism. An algorithm called ECO-AGWTA-r1, in which individuals are replaced according to their fitness alone, is used as the comparison algorithm. Each algorithm is run independently 30 times. Figure 9 shows the statistical box plot of the average USE index. As can be seen from Figure 9, the average USE index of the ECO-AGWTA algorithm is better than that of ECO-AGWTA-r1 in all test cases. This indicates that the elite individual updating mechanism proposed in this paper is feasible and effective, and that it can significantly improve the convergence of the algorithm. In this experiment, the reason for the poor performance of ECO-AGWTA-r1 is that the algorithm uses only the current individual's own fitness to decide updates. However, that fitness only represents the individual's quality under one particular opponent strategy; it does not imply that the individual also performs well under the opponent strategy paired with the individual it replaces, which in some cases leads to a superior individual being replaced by an inferior one. From the above analysis, the proposed elite individual updating mechanism is more consistent with the AGWTA problem: it comprehensively considers the strategies of both antagonistic parties to evaluate individuals and accurately judge their quality.

F. SENSITIVITY ANALYSIS
In this section, the influence of the relevant parameters on the performance of the ECO-AGWTA algorithm is analyzed. There are five main parameters in ECO-AGWTA: the co-evolutionary population sizes popR and popB, the elite population size EPsize, the crossover probability pc, and the mutation probability pm. Four test scenarios are designed to perform the sensitivity analysis for these parameters. The detailed settings are described below.
Scenario 1: Sensitivity analysis on co-evolutionary population size.
In this experiment, different co-evolutionary population sizes are selected for testing: popR and popB are set to 50, 60, 70, 80, 90, 100, 150, and 200, respectively. In addition, all the algorithms are run 30 times independently for each condition in all test scenarios.
(1) Experiments on Scenario 1.
In this test scenario, the averages of the USE index under different co-evolutionary population sizes are compared. When the co-evolutionary population size is 50, 60, 70, 80, 90, 100, 150, and 200, the averages of the USE index are shown in Table 5. As can be seen from Table 5, increasing the size of the co-evolutionary population can significantly improve the performance of the algorithm. However, beyond a certain point, the improvement gradually diminishes. In addition, for the small-scale scenarios Case1 and Case2, the effect of increasing the population is not obvious, whereas for the large-scale scenarios Case8 and Case9, increasing the population can significantly improve performance. Considering that a larger population also increases the time consumption of the algorithm, different population sizes can be selected as required to balance performance against time consumption. It is recommended to set the co-evolutionary population size above 100.
(2) Experiments on Scenario 2.
In this test scenario, the averages of the USE index under different elite population sizes are compared. When the elite population size is 30, 40, 50, 60, 70, 80, 90, and 100, the averages of the USE index are shown in Table 6.
As can be seen from Table 6, for all test cases, the averages of the USE index fluctuate slightly across different elite population sizes. However, the general trend shows that increasing the size of the elite population reduces the performance of the algorithm to some extent. For traditional optimization problems, increasing the population generally lets the algorithm search more feasible solutions during evolution and thus improves its efficiency. In this experiment, however, increasing the elite population reduces performance. This is because the elite cooperation mechanism is designed for the AGWTA problem. In an antagonistic environment, our strategy may hold a large advantage under a particular opponent strategy, yet become worse when the opponent changes its strategy. Therefore, when the elite population is enlarged, more excellent antagonistic strategies are preserved, so more strategies influence each individual during evolution. As a result, the elite population is updated too quickly, which disrupts the evolutionary process and makes it unstable. According to the above analysis, the elite population should not be set too large; it is recommended to set it between 30 and 50.

(3) Experiments on Scenario 3.
In this test scenario, the averages of the USE index under different crossover probabilities are compared. When the crossover probability is 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0, the averages of the USE index are shown in Table 7. As can be seen from Table 7, for the small- and medium-scale problems Case1-Case7, increasing the crossover probability can improve the performance of the algorithm. However, beyond a certain point, the improvement becomes insignificant. For the large-scale problems Case8 and Case9, increasing the crossover probability makes the performance fluctuate to some extent but does not significantly improve it. This is because when the scale of the problem is large, the strategies available to both antagonists also increase; owing to the limits of population size and other factors, it is difficult for the algorithm to fully search the feasible solution space in a single run. In general, it is recommended to set the crossover probability above 0.8 to ensure the performance of the algorithm.
(4) Experiments on Scenario 4. In this test scenario, the averages of the USE index under different mutation probabilities are compared. When the mutation probability is 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0, the averages of the USE index are shown in Table 8. As can be seen from Table 8, for all test cases, although the averages of the USE index fluctuate under different mutation probabilities, the mutation probability has no obvious influence on the performance of the algorithm. Due to the limited number of experiments, these results cannot fully reflect the effect of different mutation probabilities. However, according to the basic framework of the genetic algorithm, the mutation operator improves the diversity of the population and keeps it from falling into a local optimum, so the mutation operator is still used in ECO-AGWTA. Considering that increasing the mutation probability increases the time consumption of the algorithm and reduces its convergence efficiency to some extent, it is recommended to set it to a small value, such as 0.1.

V. CONCLUSIONS
The WTA problem has attracted wide attention and study, but there is still little research on the AGWTA problem. In this paper, the AGWTA problem is studied. Firstly, a new model of AGWTA based on attack and evasion strategies is proposed. Secondly, an elite cooperative genetic algorithm framework is proposed, in which the coding method and evolution operators are redesigned. Furthermore, an elite individual updating mechanism for the AGWTA problem is also proposed. Finally, the performance of the proposed ECO-AGWTA algorithm is compared with three state-of-the-art co-evolutionary algorithms.
Extensive experiments demonstrate that the proposed ECO-AGWTA algorithm outperforms the other algorithms in terms of convergence and efficiency. Furthermore, experiments on the elite individual updating mechanism show that the proposed mechanism clearly improves the performance of the ECO-AGWTA algorithm. As there are few co-evolutionary algorithms for solving AGWTA problems, this paper only selects three representative co-evolutionary frameworks for comparison. This is because the focus of this paper is to analyze and study the AGWTA problem itself, including establishing a reasonable AGWTA model and designing mechanisms that can effectively solve it. In future work, we will further improve the algorithm and compare it with more state-of-the-art algorithms, which will help us design better frameworks and mechanisms for solving AGWTA problems.
Since this paper adopts a co-evolutionary algorithm to solve the AGWTA problem, it is difficult to obtain the optimal solution, especially when the problem scale is large; the result may therefore be a suboptimal solution. For the AGWTA problem, the goal of solving this kind of game problem is to obtain one or more Nash equilibrium solutions, which are the optimal solutions to the problem. Even so, the suboptimal solutions are not necessarily useless, because the opponent's strategy is always changing, so there is only a small probability that our strategies perform poorly under certain opponent strategies. Therefore, in future work, the authors will focus on the characteristics of the suboptimal solutions of the AGWTA problem, to ensure that a set of usable suboptimal solutions can be obtained in specific situations and that the proposed algorithm can be applied to actual air combat.

XIAOYANG LI received the Ph.D. degree in systems engineering from Northwestern Polytechnical University, Xi'an, China, in 2020, where he has been an Assistant Professor since 2020. He has authored three refereed international journal papers and six peer-reviewed international conference papers to date. His current research interests include multi-objective optimization, information fusion, and intelligent information processing.
ZHEN YANG received the bachelor's, master's, and Ph.D. degrees in system and control engineering from Northwestern Polytechnical University (NPU), Xi'an, China, in 2014, 2017, and 2020, respectively. He is currently a postdoctoral researcher with the School of Automation Science and Electrical Engineering, Beihang University, Beijing, China. His current research interests include intelligent air combat system modelling and simulation, autonomous maneuvering decision-making, and integrated avionics fire control system simulation and testing.
WEIREN KONG received the M.Sc. degree in electronics science and technology from Northwestern Polytechnical University (NPU), Xi'an, China, in 2016, where he is currently pursuing the Ph.D. degree. His current research interests include multiagent reinforcement learning for multi-UAV air combat confrontation decision making, modeling and simulation of complex systems, and UAV trajectory tracking control.
YIYANG ZHAO received the M.Sc. degree in system and control engineering from Northwestern Polytechnical University (NPU), Xi'an, China, in 2017, where he is currently pursuing the Ph.D. degree. His current research interests include cooperative task assignment of multi-UAV systems, decision making, and intelligent air combat systems.

DEYUN ZHOU received the bachelor's, master's, and Ph.D. degrees from Northwestern Polytechnical University (NPU), Xi'an, China, in 1985, 1988, and 1991, respectively. He has been a Professor at NPU since 1997, where he is currently the Dean of the School of Electronics and Information. His current research interests include integrated control theory and application, information fusion, and intelligent information processing.