Optimization of Unmanned Air Vehicle Tactical Formation in War Games

War game simulations are decision-making tools that may provide quantitative data about the scenario analyzed by stakeholders. They are widely used to develop tactics and doctrines in the military context. Recently, unmanned air vehicles (UAVs) have become a relevant element in these simulations because of their prominent role in contemporary conflicts, surveillance missions, and search and rescue missions. For instance, it is possible to admit aircraft losses from a tactical formation in favor of the victory of a squadron in a given combat scenario. The optimization of the position of UAVs in beyond visual range (BVR) combat has attracted attention in the literature, considering that the distribution of UAVs can be a determining factor in this scenario. This work aims to optimize UAV tactical formations considering enemy uncertainties such as firing distance and position using six metaheuristics and a high-fidelity simulator. A tactical formation often employed by air forces called line abreast was chosen for the RED swarm for a case study. The objective of the optimization is to obtain a tactical formation of the BLUE swarm that wins the BVR combat against the RED swarm. A procedure to confirm the robustness of the optimization is employed, varying the position of each UAV of the RED swarm up to 8 km from its initial configuration and using the war game approach. A tactical analysis is performed to confirm whether the formations found in the optimization are applicable.


I. INTRODUCTION
War games are analytical games that simulate warfare at the tactical, operational, or strategic level and are used to analyze combat concepts and train and prepare commanders and subordinates, explore scenarios, and assess how planning affects outcomes. These simulations are very useful for developing tactical, strategic, and doctrinal solutions, providing participants with insight into the decision-making process and stress management [1].
Recently, unmanned aerial vehicles (UAVs) have emerged as a new high-tech force. Using them to achieve air supremacy could result in a deep military transformation [2]. As a result, their effectiveness has frequently been tested and evaluated in war games.
The associate editor coordinating the review of this manuscript and approving it for publication was Yang Tang.
With several performance advantages such as increased agility, increased overload durability, and increased stealth capability, UAVs have been gradually evolving and are replacing manned systems in many air missions [3]. However, replacing a manned platform with an unmanned system in air combat beyond the visual range is challenging because of the dynamic nature of combat. A UAV can be remotely controlled in aerial combat, but it will be at a disadvantage against a manned platform because of the limited situational awareness of the UAV pilot. This limitation can be overcome through automated combat maneuvers [4] and the optimization of tactical formations. In addition, the use of UAVs can allow some tactical formations and strategies that would not be considered with human-crewed aircraft, such as allowing an aircraft of the squadron to be shot down if it helps the team win the combat.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
One of the first articles in the literature aimed at optimizing the tactical formation of aircraft in beyond visual range (BVR) combat [5] shows that air combat tactics are candidates for optimization with a genetic algorithm (GA). The implementation uses a hierarchical concept that builds large formation tactics from small conventional combat units and starts with formations of two aircraft, then four aircraft, and then multiples of those. Missile launches were not modeled in the simulations. The simplified engagement simulator declares a casualty when an aircraft places its opponent in the high probability-of-kill (Pkill) region of the weapons engagement zone (WEZ) for a specified period. The application of the proposed methodology proved to be effective, eliminating all aircraft in the team that did not optimize the formation and providing for the survival of the entire aircraft team that optimized its formation.
Keshi et al. [6] used the same hierarchical concept of building large tactical formations from elements composed of two aircraft that was used in [5]. The simulated annealing genetic algorithm (SAGA) was used to optimize the formation, allowing it to avoid convergence to locally optimal solutions. An optimization of the formation of 16 aircraft was implemented, and the optimal solutions presented showed that SAGA was more efficient than the basic GA. Finally, to explore a robust SAGA, comparisons of different Markov chains were made, and the self-adjusting Markov chain proved to be more appropriate for the problem presented.
Junior et al. [7] proposed the use of computer simulation as a solution to determine the best tactics for BVR air combats that maximize the probability of shooting down an enemy aircraft. Generic parameters were used to model both aircraft and missiles at low resolution with an adaptation of the simulation optimization algorithm called COMPASS and simulating a BVR combat of two aircraft against one. The low-resolution model assumes a uniform rectilinear movement in two dimensions in a horizontal plane. Using optimized tactics demonstrated an increase in the average success rate in the shooting down of enemy aircraft from 16.69% to 76.85%.
Yang et al. [8] proposed a methodology to optimize the best attack position and best path of an aircraft against a set of targets. The work considers that the aircraft is capable of firing a missile for each target at the same time and uses the aircraft's offensive and vulnerability factors in relation to the targets as metrics for evaluating the attack position. A high-fidelity simulation was used to model the dynamic characteristics of the aircraft, radar, missiles, and each missile's WEZ. This work does not address the problem of optimizing the formation of a set of aircraft against another group of aircraft within a BVR combat scenario.
Li et al. [9] proposed a method for formation optimization based on the commanders' subjective understanding of the problem of selecting an aircraft formation with uncertain equipment information about the target in air combat. Initially, the combat power of the fighter is calculated, which is the basis of the assessment of the target's combat power through the subjective recognition of the commanders. The fighter's combat power is expressed in the form of capabilities, including attack, detection, survivability, communication, electronic warfare, and the warning system. Thus, air combat training is optimized by employing prospect theory and comprehensive fuzzy assessment. Finally, an application example demonstrates the feasibility of the method in small-scale air combat. The authors claim that the ability to assess the combat situation using combat power provides a new approach to optimize training in air combat.
Özpala et al. [10] proposed a decision-making method for aerial combat with multiple unmanned combat air vehicles (UCAVs) in two opposing teams. First, the superiority of each agent on both teams was determined. Superiority status includes the weighted sum of the angle, distance, and speed superiorities. After each agent in a team is compared against each agent in the opposing group, each air vehicle is assigned a target for their team's advantage rather than their own advantage. A zero-sum game was implemented for a pair of opposing teams. A reduction method is proposed for mixed Nash equilibrium strategies when many agents are involved. The solution is based on game theory approaches; therefore, this approach is tested on a numerical case, and its effectiveness is demonstrated.
Huang et al. [11] developed new methods to deal with the cooperative target assignment and path planning (CTAPPP) problems of UCAV formation against multiple targets. The formation of the UCAV is based on cooperative decision making and control. After completing target reconnaissance, a training command center transmits task assignment commands to each UCAV quickly according to the battlefield environment and combat mission. The UCAV maneuvers into the best position calculated by its fire-control system to launch the weaponry. The cooperative target assignment (CTAP) problem is solved by improved particle swarm optimization (IPSO), the ant colony algorithm (ACA), and the genetic algorithm (GA), and a comparative analysis is performed on the aspects of attribution, accuracy, and search speed. The cooperative path planning (CPPP) problem for UCAV formation for multiple targets is developed based on an evolution algorithm, in which a unique chromosome encoding method, crossover operator, and mutation operator are provided and redefined, and cooperative paths are planned considering the cost of fuel, cost of threat, cost of risk, and cost of remaining time.
The work developed by Ma et al. [12] addressed the problem of optimizing the predominance between two opposing groups of UAVs (R and B) in a BVR combat scenario. The predominance of a UAV $r_i \in R$ over a UAV $b_j \in B$ is estimated through the distance between $r_i$ and $b_j$, the lower and upper limits of the missile firing distance of $r_i$, the difference between the altitude of $r_i$ and the altitude of $b_j$, and the best firing altitude of $r_i$. The decision variables are the spatial distribution of the UAVs in the two groups and the allocation of targets for each aircraft in these groups. The possible positions of a UAV in the three-dimensional BVR combat space are simplified (discretized) and represented through the central positions of cubes. There is a set of cubes for each UAV group. The optimization problem is modeled as a zero-sum game and is solved to obtain a Nash equilibrium.
The work presented in Ma et al. [12] does not use high-fidelity simulation to analyze the effects of the choices of spatial distributions of UAVs and the targets assigned to them on BVR combat. High-fidelity simulations model the dynamic characteristics of aircraft, radars, missiles, and the WEZ of their missiles. These dynamic characteristics also influence the trigger of actions of each aircraft during BVR combat and, therefore, the final result. For example, if a high-fidelity BVR combat simulation is considered during a time window after the first clash between the two sets of UAVs, new clashes may occur until the simulation ends. Thus, each UAV surviving an engagement will be able to select a new target, depending on the predominance values of the available targets. Uncertainties related to the behavior of UAVs were not considered in [12]. Information regarding the exact position of the enemy UAV in tactical formation and its missile firing distance are examples of behavioral uncertainties. These two pieces of information and the other information described above are relevant in the context of a BVR combat: they directly influence the result of the engagement between the aircraft.
In this study, we seek to solve some of the limitations identified in the literature, such as low-resolution simulations, treatment of uncertainties associated with the enemy, and lack of confirmation of the robustness of the optimized solutions, aiming to increase the quality of war game results. The goal is to verify which BLUE swarm tactical formations would allow BVR combat victory against the RED swarm. As a case study, the RED swarm uses a tactical formation often employed by air forces called line abreast [13]. To evaluate the robustness of the solutions obtained for the BLUE swarm, new problems are solved, altering the position of each aircraft of the RED swarm aiming to estimate the impact of the new RED swarm formation on the efficiency of the optimized tactical formation of the BLUE swarm.
We use autonomous agents and high-fidelity computer simulations to optimize tactical formations of UAVs in BVR combats, considering uncertainties associated with the enemy, such as the position error in tactical formation and missile launch distance. The unified behavior framework (UBF) was adopted as the base to create autonomous agents. The aircraft and missiles are modeled with six degrees of freedom (DoFs) in a three-dimensional environment.
The procedure is further discussed in the next sections.

II. PROBLEM FORMULATION
BVR combat consists of the use of radar sensors and radar warning receivers (RWR) for the detection and tracking of targets and the use of specific air-to-air missiles for this type of combat. Engagement is the main phase of BVR combat. In summary, this phase is a sequence of the following activities: the pilot performs the target tracking procedure, checks the possibility of shooting, decides whether to shoot and if so, performs the triggering procedure, performs the trigger support procedure, and makes an evasive maneuver regarding the end of the engagement [14]. An important step during the engagement is the tactical formation of the aircraft, as their positioning can be decisive for the combat result. This work addresses high-fidelity simulations of BVR combat involving two opposing swarms, containing up to four UAVs each, with the same types of aircraft, sensors, and weapons, to optimize the position of each aircraft in the swarm used in war games.
Two relevant aspects of uncertainty are addressed in this study. The first is the position uncertainty of enemy aircraft: in addition to the radar position error, the aircraft do not maintain a fixed position in the formation, and their position can vary by up to hundreds of meters around a pre-established position in the swarm. The second is the enemy swarm's firing distance, which determines the moment when each UAV will launch its missile. The firing distance is more conservative when the aircraft fires the missile at a distance considered safe, and more aggressive when it launches at a distance closer to the opponent.
The aspects of uncertainty will be evaluated from a war game perspective to give the player or decision maker a probability of success, providing additional information to proceed or not with the engagement, depending on the game strategy.

A. COMPUTER SIMULATION MODEL
Constructive and continuous simulations based on autonomous agents were used in this study. The autonomous agent is the computational implementation of a set of algorithms composed of:
- Mathematical models that simulate UAV systems and sensors; and
- Artificial intelligence techniques that simulate the possible behavior of a UAV in a BVR combat.
The structure of an autonomous agent is shown in Figure 1.
Autonomous agents have been developed as multi-agent systems. Thus, an agent can share information collected from the environment with other allied agents. Each agent elaborates a superstate, containing information of itself (attitude, position, altitude, speed, etc.), environmental information from its sensor system (data on targets detected by radar, missile alerts, etc.), plus relevant information sent by other agents on themselves and elements of the environment captured by them (position of each allied agent, of targets detected by each ally, etc.). The superstate, containing information from the group of UAVs and the sensing of the environment carried out by each one, makes the collective decision-making process possible, improving the effects of planned/adopted behaviors in response to a state. Among a set of autonomous agents, this process of collective and decentralized decision-making is a type of swarm intelligence [15], [16].
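The superstate described above can be pictured as a small data structure that merges an agent's own state, its own sensor data, and the reports received from allies. The following is only an illustrative sketch: the class names, field names, and the rule that own radar data takes precedence over an ally's report are assumptions, not taken from the ASA implementation.

```python
from dataclasses import dataclass, field


@dataclass
class OwnState:
    agent_id: str
    position: tuple      # (latitude_deg, longitude_deg, altitude_m)
    speed_mps: float


@dataclass
class Superstate:
    own: OwnState
    radar_tracks: dict = field(default_factory=dict)   # target_id -> position from own radar
    allies: dict = field(default_factory=dict)         # ally_id -> OwnState
    shared_tracks: dict = field(default_factory=dict)  # target_id -> position reported by an ally


def build_superstate(own, radar_tracks, ally_messages):
    """Merge own data with (ally_state, ally_tracks) messages (hypothetical format)."""
    ss = Superstate(own=own, radar_tracks=dict(radar_tracks))
    for ally_state, ally_tracks in ally_messages:
        ss.allies[ally_state.agent_id] = ally_state
        for target_id, pos in ally_tracks.items():
            # Assumption: own radar data wins; ally reports only fill the gaps.
            if target_id not in ss.radar_tracks:
                ss.shared_tracks[target_id] = pos
    return ss


def known_targets(ss):
    """All targets the agent can reason about, from any source."""
    return sorted(set(ss.radar_tracks) | set(ss.shared_tracks))
```

With such a structure, a target seen only by an ally still becomes available to the agent's decision-making, which is the point of sharing information within the swarm.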
The input and output data of the autonomous agents were associated with the scenario (external environment).
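A behavior can be understood as a function that maps a state into an action. Thus, the behavior of index i at an instant t can be defined by
$$a_i(t) = b_i(s(t)) \qquad (1)$$
where $s(t)$ is the state at instant $t$, $b_i$ is the behavior of index $i$, and $a_i(t)$ is the resulting action. An action is created and sent to be executed on an actuator, to carry out communication, or to change some information of the agent's own state.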
This work makes use of the computational platform named Aerospace Simulation Environment (ASA) [17] to build fictitious operational scenarios in which combat simulations are performed between two opposing swarms of UAVs, named BLUE and RED. The ASA platform uses a mixed reality simulation platform (MIXR) [18] as its simulation engine.
The MIXR platform was designed for the rapid development of robust and scalable applications for constructive or virtual simulations. The platform allows such applications to be independent (stand-alone) or distributed [18]. MIXR has been widely used to build deterministic applications that demand real-time performance. Here, deterministic means that a given set of input data always produces the same result.
Simulated UAVs, that is, autonomous agents, were developed based on a high-fidelity JSBSim [19] dynamic model. JSBSim is an open-source, configurable flight dynamics model that is compatible with several operating systems. The aircraft model represents a high-performance fighter aircraft based on open data available as an example in the JSBSim framework. The model, named ''General Dynamics F-22A'' and available in [20], does not simulate the actual F-22. However, the implementation ensures a high-fidelity generic 6 DoF model, including flight control system, aerodynamics, and propulsion, all of which operate at 100 Hz during all the simulations.
Using information from the superstate, the decision-making process, that is, the artificial intelligence of an autonomous agent, is performed through a finite-state machine (FSM) and a targeting module (choice of targets). The FSM defines the agent's behavior: navigate in formation; navigate towards a target; shoot (fire) a missile at the target; after the shot, perform a maneuver to illuminate the target with the radar; perform a defensive maneuver (evasive maneuver); receive or send communication data. Actions corresponding to maneuver behaviors are sent and executed by the UAV's navigation control system (autopilot). Fire actions and communication actions are sent and executed by the weapon control system and UAV datalink, respectively. The targeting module selects the target with the highest offensiveness value to the agent, that is, the target with the highest probability of being shot down by a missile.
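A minimal sketch of how such an FSM and targeting module could be organized is given below. The state names follow the behaviors listed above, but the transition conditions, the function names, and the way offensiveness values are supplied are assumptions made for illustration; the actual ASA logic is not specified in this text.

```python
from enum import Enum, auto


class State(Enum):
    FORMATION = auto()   # navigate in formation
    INTERCEPT = auto()   # navigate towards a target
    FIRE = auto()        # launch a missile at the target
    SUPPORT = auto()     # keep illuminating the target with the radar
    EVADE = auto()       # defensive (evasive) maneuver


def select_target(offensiveness):
    """Pick the target with the highest offensiveness value (dict: target_id -> value)."""
    return max(offensiveness, key=offensiveness.get)


def next_state(state, target_range_km, firing_distance_km, missile_warning, missile_active):
    """Hypothetical transition rule; target_range_km is None when no target is tracked."""
    if missile_warning:
        return State.EVADE
    if state is State.FORMATION:
        return State.INTERCEPT if target_range_km is not None else State.FORMATION
    if state is State.INTERCEPT:
        if target_range_km is None:
            return State.FORMATION
        return State.FIRE if target_range_km <= firing_distance_km else State.INTERCEPT
    if state is State.FIRE:
        return State.SUPPORT
    if state is State.SUPPORT:
        return State.SUPPORT if missile_active else State.FORMATION
    return State.FORMATION  # leaving EVADE once the warning is gone
```

In this sketch the missile warning overrides every other state, which is one simple way to express that survival maneuvers take priority; other priority schemes are equally possible.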
In the MIXR platform, the decision-making process of each autonomous agent was modeled using a computational architecture called the unified behavior framework (UBF) [21].
The UBF architecture, proposed in [21], was developed to model the behavior of autonomous agents encompassing the concepts of reactive controllers. The main advantage of UBF is that it allows the development of behaviors in a modular way, aiming to simplify the development and testing of new functions, code reuse, support projects that easily adapt to large hierarchies, restrict code complexity, and allow the developer of behavior-based systems the freedom to use different behavior options and select the most suitable ones [21].
The UBF architecture allows independent behaviors to be encapsulated in other behaviors with the purpose of creating compound behaviors and increasing their abstraction level.
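The idea of encapsulating behaviors inside other behaviors can be sketched with a composite that exposes the same interface as its children. The code below is a simplified illustration, not the UBF or MIXR API: the class names, the vote field, and the highest-vote arbitration rule are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Action:
    command: str
    vote: float  # how strongly the behavior recommends this action


class Behavior:
    def act(self, state):
        raise NotImplementedError


class FlyFormation(Behavior):
    def act(self, state):
        return Action("hold_formation", 0.1)


class Evade(Behavior):
    def act(self, state):
        vote = 1.0 if state.get("missile_warning") else 0.0
        return Action("evade", vote)


class HighestVote(Behavior):
    """Composite behavior: delegates to children and keeps the strongest recommendation."""

    def __init__(self, children):
        self.children = list(children)

    def act(self, state):
        return max((child.act(state) for child in self.children), key=lambda a: a.vote)
```

Because `HighestVote` is itself a `Behavior`, it can be nested inside another composite, which is how the abstraction level can be raised step by step.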
High-fidelity simulation is an excellent tool for evaluating UAV tactics in BVR combats, as it models the aerodynamics of aircraft and missiles, the electromagnetic envelopes of emitters and sensors, and the behavior of autonomous agents inserted in combat with adequate realism for the study in question. Additionally, the autonomous agents model the target assignment tactic, the missile firing time, the missile support time, and the type of evasive maneuver.
The ASA allows us to perform simulations with different situations. Given the initial configuration of the swarms in the chosen scenario, it is possible to carry out a deterministic simulation, in which the simulator uses the implemented models to define the next steps of the confrontation. It is also possible to use stochastic parameters to include natural uncertainties of the confrontation process. The ones used in this work, so far, are:
• RED swarm firing distance: the firing distance used in the simulations, to simplify the missile model used, is based on a static range in which the shortest firing distance is 35 km (19 NM), and the maximum is 53.7 km (29 NM). The ASA can select a shooting philosophy that is associated with the aggressiveness of the aircraft. In an aggressive approach, the aircraft fires at the shortest possible firing distance (e.g., 19 NM). In a conservative approach, the shot is fired at the longest shooting distance. A distance of 44.4 km (24 NM) was adopted for the two swarms in the simulations. However, to create an uncertainty in the firing distance of the RED swarm, a random number generator (in ASA) with normal distribution was used with a mean of 24 NM and variance of 0.025 in some simulations, following the procedure adopted in [22].
• Choice of targets (Target Commit/binary): the assignment of targets performed by agents in the ASA is based on the target offering the highest level of offensiveness in relation to the aircraft that allocates the target (attacking aircraft), but if there is equality of offensiveness, the attacker tends to allocate the target from the left, owing to the radar sweep. To try to minimize this tendency, in some experiments, a stochastic target assignment process was used with a 50% chance of choice for each side when there was a tie in the target assignment.
In the case of using stochastic controls, a set of simulations with the same initial conditions (''batch'') can be performed by the ASA, and the result considered is an average of the results obtained.
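The two stochastic controls can be sketched as follows. This is an illustration under stated assumptions: the variance of 0.025 is taken to be in squared nautical miles, no clamping to the 19 to 29 NM range is applied because the text does not mention one, and the function names are hypothetical rather than part of ASA.

```python
import math
import random

NM_TO_KM = 1.852  # one nautical mile in kilometers


def sample_firing_distance_nm(rng, mean_nm=24.0, variance=0.025):
    """Draw a RED firing distance from a normal distribution (variance assumed in NM^2)."""
    return rng.gauss(mean_nm, math.sqrt(variance))


def pick_target(offensiveness, rng):
    """offensiveness: list of (target_id, value) pairs ordered from left to right.

    Without a tie, the best target is returned. With a tie, the choice is
    uniform among the tied targets (50% each when two targets are tied).
    """
    best = max(value for _, value in offensiveness)
    tied = [target for target, value in offensiveness if value == best]
    return rng.choice(tied) if len(tied) > 1 else tied[0]
```

The conversion constant also reproduces the distances quoted above: 19 NM is about 35.2 km, 24 NM about 44.4 km, and 29 NM about 53.7 km.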

B. GAME FORMULATION AND OPTIMIZATION
The war game proposed in this work is a confrontation between two swarms of opposing UAVs, a BLUE team and a RED team, in BVR type combat, containing up to four UAVs each, with the same types of aircraft, sensors, and weapons. The purpose of this study was to assess which tactical formation maximizes BLUE survival and lethality.
The game arena starts with a given tactical formation of the RED swarm. The BLUE swarm aircraft are initialized randomly in the arena but constrained in latitude and longitude by a rectangle (P1, P2, P4, P3). Figure 2 illustrates a distribution of four BLUE aircraft in the arena. The positions of BLUE01, BLUE02, BLUE03, and BLUE04 are $w_{b1}$, $w_{b2}$, $w_{b3}$, and $w_{b4}$, respectively.
In BVR combat, altitude is a factor that directly influences the result, as the higher the aircraft, the greater the range of the missile. Therefore, to give equal combat conditions for the two swarms, it was considered that each aircraft of the two swarms had equal altitude at the start of combat. The simulator uses information of the radar contact from the enemy swarm (around 52 km away) to dynamically vary the position (latitude, longitude, altitude) of each UAV of each swarm during the combat simulation. Thus, the problem is a 3D optimization of the tactical formation in which the swarms have the same altitudes at the beginning of the simulation.
The game corresponds to finding the best initial tactical position of the BLUE aircraft relative to the opposing swarm. The swarm simulation optimization problem is defined by the objective function:
$$f(X_B, X_R) = K_B - K_R \qquad (2)$$
in which:
• $X_B$ is the configuration of the BLUE swarm;
• $X_R$ is the configuration of the RED swarm;
• $K_B$ is the number of aircraft shot down by the BLUE swarm at the end of the simulation;
• $K_R$ is the number of aircraft shot down by the RED swarm at the end of the simulation.
A positive value of $f(X_B, X_R)$ means BLUE swarm victory; any other value means that the RED swarm wins. The optimal initial distributions of the BLUE swarm are obtained by maximizing (2). Conversely, minimizing (2) optimizes the RED swarm formation.
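As a small worked example of how the sign of the objective is read, the sketch below scores a single simulation outcome from the two kill counts. The function names are hypothetical, and the kill counts would in practice come from the simulator output.

```python
def objective(k_blue, k_red):
    """Kills scored by BLUE minus kills scored by RED for one simulation."""
    return k_blue - k_red


def winner(f_value):
    """Only a strictly positive value counts as a BLUE victory; a draw goes to RED."""
    return "BLUE" if f_value > 0 else "RED"
```

For instance, in a 4 vs. 4 combat where BLUE shoots down three aircraft and loses one, the objective is 2 and BLUE wins, whereas two kills on each side gives 0 and is counted for RED.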
Notice that the optimization problem formulation does not impose additional restrictions on the initial distribution of the RED and BLUE swarms.
Considering a given initial distribution of the RED swarm, the BLUE swarm configuration corresponds to the set of input parameters (or free parameters): the latitude, longitude, and altitude of each UAV. Latitude and longitude can assume continuous values between the coordinates defined by the rectangle with vertices P1, P2, P4, and P3, and the altitude can take any value but is restricted by the operational parameters of the aircraft.
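One way to hand these free parameters to an optimizer is to flatten them into a single vector with box bounds. The sketch below assumes a fixed initial altitude of 6096 m and reuses the latitude and longitude ranges reported later for the 2 vs. 2 experiments; the function names and the vector layout (latitude, longitude per UAV) are illustrative choices, not the LOF encoding.

```python
LAT_BOUNDS = (-0.7675, -0.7500)  # degrees, range reported for the 2 vs. 2 experiments
LON_BOUNDS = (-0.018, 0.018)     # degrees


def bounds_for(n_uavs):
    """Box bounds for a flat vector [lat1, lon1, lat2, lon2, ...]."""
    return [LAT_BOUNDS, LON_BOUNDS] * n_uavs


def clip(x, bounds):
    """Project a candidate vector back into the rectangle."""
    return [min(hi, max(lo, v)) for v, (lo, hi) in zip(x, bounds)]


def decode(x, altitude_m=6096.0):
    """Turn the flat vector into (lat, lon, alt) tuples, one per UAV."""
    if len(x) % 2 != 0:
        raise ValueError("expected an even number of values (lat, lon per UAV)")
    return [(x[i], x[i + 1], altitude_m) for i in range(0, len(x), 2)]
```

With this layout a two-aircraft swarm has four decision variables and a four-aircraft swarm has eight.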
A simulation optimization [23], [24] is used that consists of searching for configurations of a set of input parameters of a simulation to maximize the value of the objective function based on the output of this simulation.
The basic operation of simulation-based optimization is presented in the simplified scheme in Figure 3. Considering an initial parameter setting $(x_0, y_0)$, each iteration $i$ generates a set of input parameters that are evaluated by the simulation to obtain the value of an objective function $f(x_i, y_i)$. The optimization method aims to find a good approximation for the extremum $E$ of $f(x_i, y_i)$, according to a set of constraints imposed by the problem. In simulation-based optimization, the simulation is usually not represented by an algebraic model [23]. The simulation is interpreted as a black box that receives the input parameters and returns the outputs. In this context, the optimization algorithm generates the input parameters, and the simulator uses the given input to evaluate the clash result.
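The black-box loop can be illustrated with the simplest possible optimizer, a pure random search. This is only a sketch of the interaction pattern: `simulate` stands in for a call to the simulator, and random search is used for brevity rather than any of the metaheuristics applied in this work.

```python
import random


def random_search(simulate, bounds, iterations, rng):
    """Maximize a black-box function by sampling inputs uniformly inside the bounds."""
    best_x, best_f = None, float("-inf")
    for _ in range(iterations):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]  # optimizer proposes inputs
        f = simulate(x)                                 # black box returns the objective
        if f > best_f:
            best_x, best_f = x, f
    return best_x, best_f


def toy_simulate(x):
    """Hypothetical stand-in for the simulator: a smooth function with its peak at (0.3, 0.7)."""
    return -((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2)
```

The optimizer never inspects how `simulate` works; it only sees inputs and outputs, which is what allows a high-fidelity simulator to be plugged in without an algebraic model.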
In this study, the optimization process was carried out using populational and single-solution-based metaheuristics (MH). Metaheuristics have been successfully applied to solve simulation-based optimization problems [24], [25]. A metaheuristic is an algorithm created to find satisfactory solutions for different classes of problems that may include uncertainties, stochastic parameters, or dynamic information in its mathematical formulation [25].

III. METHODOLOGY
The procedure for optimizing aircraft tactical formations in BVR combats uses high-fidelity computer simulations and autonomous agents created through the unified behavior framework (UBF), considering uncertainty in enemy behavior. The simulation with uncertainties uses stochastic factors, and the value of $f(X_B, X_R)$ is the average of ten independent simulations performed in ''batch'' by ASA with the same input data.
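The batch evaluation can be sketched as a wrapper that repeats a stochastic run with the same inputs and averages the outcomes. The `run_once` callable is a hypothetical placeholder for one ASA simulation returning the kill difference; the seeding scheme is an assumption added to make the sketch reproducible.

```python
import random
from statistics import mean


def batch_objective(run_once, x_blue, x_red, n_runs=10, seed=0):
    """Average the objective over n_runs stochastic simulations with identical inputs."""
    rng = random.Random(seed)
    return mean(run_once(x_blue, x_red, rng) for _ in range(n_runs))


def toy_run_once(x_blue, x_red, rng):
    """Hypothetical stand-in for one simulation: a random kill difference in a 4 vs. 4 combat."""
    k_blue = rng.randint(0, 4)
    k_red = rng.randint(0, 4)
    return k_blue - k_red
```

Averaging makes the objective seen by the optimizer less sensitive to a single lucky or unlucky engagement, at the cost of ten simulator runs per candidate formation.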
War games theory was used to analyze the results obtained.
The main steps adopted are:
1) Studies to optimize the formation of the BLUE swarm for 2 vs. 2 and 4 vs. 4 fights using deterministic simulation; the main objective is to verify the behavior of each MH and the parameters to be used in the next step.
2) Optimization of the BLUE swarm for 4 vs. 4 combats using stochastic simulations; a fine adjustment of the control parameters of each MH is done.
3) Study of the robustness of the solutions found for the BLUE swarm in 4 vs. 4 fights.
In the first step, experiments are performed considering two identical swarms of two (2) or four (4) aircraft. Several MHs are used to identify configurations that result in the greatest number of successes for the BLUE swarm. The optimization processes were performed using a computational optimization tool developed in-house [26], named LEV optimization framework (LOF), through which the user can select several populational and single-solution-based MHs and their control parameters, the number of times the problem has to be solved by each MH, and the sequence of application of the MHs, running independently or in a hybridization scheme. The LOF computational tool allows exploration of parallel/distributed execution, among other possibilities.
Because each MH explores different approaches to find the extrema of the objective function, using different ways to do local and global searches, it is interesting to explore different MHs to study the problem instead of only adjusting the control parameters of a given MH. The optimizations were performed with widely used MHs, taking advantage of their different search methodologies: particle swarm optimization (PSO) [27], black hole (BH) [28], vortex search (VS) [29], modified vortex search (MVS) [30], sine cosine algorithm (SCA) [31] and simulated annealing (SA) [32], with different population sizes, neighborhood sizes, numbers of iterations, and numbers of tests. Deterministic simulations are used to evaluate these MH parameters, and then uncertainties are used for the RED swarm firing distance and target designation for the BLUE swarm. Design-of-experiment techniques, such as fractional factorial [33], were used to plan the experiments and define the parameters listed above.
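To make the role of population size and iteration count concrete, the sketch below shows a basic global-best PSO that maximizes a black-box function inside box bounds. It is a textbook-style variant written for illustration, not the LOF implementation; the inertia and acceleration coefficients are assumed values, and the toy objective stands in for the simulator.

```python
import random


def pso_maximize(f, bounds, n_particles=20, n_iters=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; returns the best position found and its objective value."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [f(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # pull to own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # pull to swarm best
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # stay inside bounds
            fi = f(pos[i])
            if fi > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi > gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    return gbest, gbest_f


def toy_objective(x):
    """Hypothetical smooth objective with its maximum (value 0) at (0.5, -0.2)."""
    return -((x[0] - 0.5) ** 2 + (x[1] + 0.2) ** 2)
```

Each objective evaluation corresponds to one simulator call (or one batch of calls), so the product of population size and iterations gives a rough measure of the simulation budget.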
LOF generates different initial configurations for the swarm and uses ASA to evaluate the result of the confrontation with this configuration, according to the problem defined in the previous section. The MH search algorithm uses ASA results to guide the search process for the optimal configuration.
The two software tools, LOF and ASA, are independent and interact using a particular protocol. Such an approach presents a time overhead. Each optimization iteration implies the necessity of loading many instances of ASA. Configuration file readings are needed for each instance. In this context, the focus of our analysis is not related to the time spent to obtain a solution. The total optimization time can be considerably reduced by integrating both tools, avoiding the loading of executables and mass storage device readings.
The second step follows the same procedure but considers two swarms of four aircraft in a stochastic simulation.
The first two steps assume a RED swarm initial tactical formation called line abreast, often employed by air forces, as the case study. In these cases, the game arena starts with the RED swarm aligned under the following condition for step 2:
$$d(w_{r1}, w_{r2}) = d(w_{r2}, w_{r3}) = \dots = d(w_{rN-1}, w_{rN}) \qquad (3)$$
where $w_{r1}$, $w_{r2}$, $w_{r3}$, $w_{rN-1}$, and $w_{rN}$ are the positions of RED01, RED02, RED03, RED(N-1), and REDN, respectively, and $d(a, b)$ represents the distance between points $a$ and $b$.
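A line abreast formation can be sketched in planar coordinates by placing the aircraft side by side with equal spacing between neighbors. This is a simplified illustration: the planar (x, y) frame in kilometers, the spacing value, and the function names are assumptions, and the reading of line abreast as equal neighbor distances is the one used here.

```python
import math


def line_abreast(n, spacing_km, center=(0.0, 0.0)):
    """Place n aircraft on a line through `center`, equally spaced along the x axis."""
    cx, cy = center
    offset = (n - 1) / 2
    return [(cx + (i - offset) * spacing_km, cy) for i in range(n)]


def is_evenly_spaced(points, tol=1e-9):
    """Check that the distance d(a, b) between consecutive aircraft is constant."""
    gaps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    return all(abs(g - gaps[0]) <= tol for g in gaps)
```

For a swarm of four, `line_abreast(4, 2.0)` yields positions at x = -3, -1, 1, and 3 km on the same line.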
The third step is dedicated to analyzing the strength and effectiveness of some of the optimized formations of the BLUE swarm obtained in the previous step. Five different experiments were carried out to accomplish this task. In each experiment, each RED swarm aircraft was positioned ''randomly'' in a region of radius R centered on the initial position assumed in step 2, while the optimized formation of the BLUE swarm was maintained. Across the five experiments, the radius R takes the values 0.5, 1, 2, 4, and 8 km, as shown in Figure 4. Again, the BLUE swarm's victories are verified for all simulations performed in the optimization of the RED swarm position. The BLUE swarm wins if the objective function (2) is positive.
In a military operation, the commander is advised by the general staff and decides the acceptable risk level [34] and the metrics of operational success and robustness [35]. As this work uses simulations of swarms of UAVs, a higher risk level is considered acceptable; that is, any result other than a loss or a draw (i.e., $K_B - K_R > 0$) is accepted as a BLUE swarm victory. The robustness of each optimized BLUE swarm formation obtained in step 2 is evaluated in step 3 in terms of the efficiency metric. It is defined as the sum of all objective functions $f(X_B, X_R)$ whose value is greater than zero (for which the BLUE swarm obtained some numerical advantage in combat) divided by the total number of $f(X_B, X_R)$ values calculated in the test. The robustness analysis considers the mean, standard deviation, and median. In this work, an efficiency of 80% was chosen to evaluate the mission's success.
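The robustness test can be sketched as two small pieces: a perturbation that moves a RED aircraft to a random point within radius R of its nominal position, and an efficiency score over the resulting objective values. Several assumptions are made here: positions are in a planar frame in kilometers, the random point is drawn uniformly over the disk, and efficiency is read as the share of evaluations with a positive objective, which is one interpretation of the definition above (it could also be read as summing the positive values).

```python
import math
import random

RADII_KM = (0.5, 1, 2, 4, 8)  # radii used in the five experiments


def perturb(position_km, radius_km, rng):
    """Return a point drawn uniformly from the disk of given radius around the position."""
    r = radius_km * math.sqrt(rng.random())  # sqrt keeps the density uniform over the area
    theta = 2 * math.pi * rng.random()
    x, y = position_km
    return (x + r * math.cos(theta), y + r * math.sin(theta))


def efficiency(objective_values):
    """Fraction of evaluations in which BLUE obtained a numerical advantage (f > 0)."""
    return sum(1 for v in objective_values if v > 0) / len(objective_values)


def meets_threshold(objective_values, threshold=0.8):
    """Compare the efficiency with the 80% level adopted for mission success."""
    return efficiency(objective_values) >= threshold
```

Under this reading, a formation that wins four out of five perturbed scenarios reaches exactly the 80% level.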

IV. RESULTS OBTAINED

A. BLUE SWARM FORMATION OPTIMIZATION
This section shows the results of the experiments used to identify optimized formations of the BLUE swarm, assuming line abreast formation for the RED swarm. As presented in the Methodology section, various MHs and MH parameters were explored through several tests to obtain the solutions.
The MHs are particle swarm optimization (PSO), black hole (BH), vortex search (VS), modified vortex search (MVS), sine cosine algorithm (SCA) and simulated annealing (SA) [32], with different population or neighborhood size, numbers of iterations, and numbers of tests.
An experiment is considered complete when an MH is executed with a given population or neighborhood size, maximum number of iterations, and number of tests.
Combats of 2 vs. 2 and 4 vs. 4 were simulated with a deterministic approach and with stochastic factors. The maximum value of the objective function (2) in the 2 vs. 2 combats is 2, and for 4 vs. 4 combats is 4.
The following scenarios were used in the 2 vs. 2 combat experiments: RED aircraft positioned according to Table 1; BLUE aircraft ranging in latitude from −0.7675° to −0.7500° and in longitude from −0.018° to 0.018°; and all aircraft initially at 6096 m (20,000 ft) (Figure 5). The initial positions of the aircraft in the RED swarm for the 4 vs. 4 combat experiments are presented in Table 2 and Figure 6.
The parameters used in the main experiments with MH are presented in Table 3.

1) ANALYSIS OF EXPERIMENTS WITH DETERMINISTIC SIMULATIONS
In these experiments, we sought to set the MH population or neighborhood size as a multiple (or close to a multiple) of the number of problem input variables. There are four variables in 2 vs. 2 combats: the latitude and longitude of BLUE01 and BLUE02. There are eight variables in 4 vs. 4 combats: the latitude and longitude of BLUE01, BLUE02, BLUE03, and BLUE04. In both cases, the altitude of each aircraft of the BLUE swarm at the beginning of the combat assumes the value of its RED counterpart.
With PSO (2 vs. 2 combat), experiments were carried out with a population of 20 individuals, and 20 iterations were sufficient to reach convergence. PSO did not find cases in which the BLUE swarm had an advantage.
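The variable layout described above can be illustrated with a short Python sketch that turns a flat decision vector into initial BLUE positions. The ordering of the vector (latitude and longitude of BLUE01, then BLUE02, and so on) and the function name `decode_formation` are assumptions made for illustration; the paper does not specify how the optimizer stores its variables.

```python
def decode_formation(x, red_altitudes_m):
    """Map a flat decision vector to (lat, lon, alt) per BLUE aircraft.

    x               -- [lat1, lon1, lat2, lon2, ...] (assumed ordering)
    red_altitudes_m -- altitude of each RED counterpart, copied to BLUE
    """
    n_aircraft = len(red_altitudes_m)
    if len(x) != 2 * n_aircraft:
        raise ValueError("expected two variables (lat, lon) per aircraft")
    return [
        (x[2 * i], x[2 * i + 1], altitude)
        for i, altitude in enumerate(red_altitudes_m)
    ]


# 2 vs. 2 case: four variables, altitude taken from the RED swarm (6096 m).
blue = decode_formation([-0.76, 0.01, -0.755, -0.01], [6096.0, 6096.0])
print(blue)
```

With this layout a 4 vs. 4 combat uses a vector of length eight, which matches the variable counts stated above.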
The experiment carried out with BH uses a population of 30 individuals, 20 iterations and two tests. In the initial tests in the 2 vs. 2 combat experiments, it converged to a value of 1 in the 1st iteration. In the 4 vs. 4 combat experiment performed with a population of 20 individuals, 30 iterations, and four tests, BH converged quickly to the maximum value on this type of problem. BH proved to be an MH that worked well for this problem.
The experiments performed with VS converged to objective function values equal to 1 and 4 for 2 vs. 2 and 4 vs. 4 combats, respectively.
An experiment carried out with SCA (2 vs. 2), with a population of 20 individuals, 300 iterations, and five tests, converged to unfavorable values for the BLUE swarm in the second iteration. This MH was not suitable for the problem and was discarded in the following experiments.
An experiment was also carried out with SA, with a neighborhood of 20, 30 iterations, and four tests. SA obtained the value 4 in the first iteration by chance in all tests.
In this step, the MHs that presented the best performance to solve the deterministic problem were BH, VS, and MVS.

2) ANALYSIS OF EXPERIMENTS WITH STOCHASTIC SIMULATIONS
The stochastic simulations were carried out with random factors in the firing distance of the RED swarm and in the choice of targets for the BLUE swarm, as described in Section III. The experiments use BH, VS, MVS, and SA. The configuration comprises 20 individuals (five individuals and four vortices in MVS) in population-based MH or a neighborhood size of 20 in single-solution-based MH, 10 iterations, a batch of 10 stochastic simulations per individual, and four tests. This configuration was sufficient for every experiment to reach the value 4 of the objective function at least once before the 10th iteration. Notice that this configuration results in 8,000 simulations per MH. Figure 7 illustrates the convergence of the four tests using MVS. All tests achieve the maximum value of the objective function in up to six iterations. Table 4 shows the number of iterations and the time to attain convergence for each metaheuristic, obtained using a workstation with 28 processors and 64 GB of RAM. Again, we emphasize that the time registered is only for reference because of the need to launch executables several times and the additional time needed to read the configuration files, as explained in the Methodology section.
MVS reached the optimal value fastest in terms of both the median and the mean, whereas VS did not reach the maximum value on average by the 10th iteration. SA obtained its best value in the first iteration and maintained this value to the end in all tests. Although the mean and median in these few tests reached the value of 3.7, additional studies should be done using SA to better explore the search parameters used in the MH.
Based on these results, a robustness study was carried out using MVS with the same parameters, that is, a population of five individuals and four vortices, 10 iterations, a batch of 10 simulations per individual, and four tests.

B. STUDY OF THE ROBUSTNESS OF THE BLUE SWARM FORMATION
In a real BVR combat scenario, it is common for aircraft to perform variations in the positions of their tactical formations. Therefore, this subsection is dedicated to analyzing the robustness and effectiveness of some optimized BLUE swarm positions obtained in the previous subsection, as mentioned in step 3 of the methodology.
The experiments optimize the positions of the RED swarm within radii of 0.5, 1, 2, 4, and 8 km around the initial position of each aircraft, keeping the optimized positions of the BLUE swarm aircraft fixed. According to [13], the side-by-side formation allows a position variation of up to 914 m (3000 ft); however, in this study, this limit was extrapolated up to 8 km to verify the robustness of the optimized position against other possible formations.
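The search region for each RED aircraft can be pictured as a disc of radius R around its initial position. The Python sketch below draws candidate positions uniformly inside such a disc. It is illustrative only: it works in local east/north offsets in kilometers rather than latitude and longitude, it uses plain uniform sampling where the paper uses an optimizer (MVS) to choose the positions, and all names are hypothetical.

```python
import math
import random

# Radii explored in the robustness study, in kilometers.
RADII_KM = (0.5, 1.0, 2.0, 4.0, 8.0)


def sample_offset_km(radius_km, rng):
    """Draw an (east, north) offset uniformly inside a disc of given radius."""
    # The square root keeps the density uniform over the disc area.
    r = radius_km * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return r * math.cos(theta), r * math.sin(theta)


def perturb_swarm(initial_xy_km, radius_km, rng):
    """Move every aircraft independently within radius_km of its start."""
    moved = []
    for x, y in initial_xy_km:
        dx, dy = sample_offset_km(radius_km, rng)
        moved.append((x + dx, y + dy))
    return moved


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical line-abreast RED swarm, 10 km spacing (not from Table 2).
red_start = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
for radius in RADII_KM:
    print(radius, perturb_swarm(red_start, radius, rng))
```

A candidate generated this way never leaves the allowed region, so an optimizer built on top of it would not need a separate feasibility check for the radius constraint.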
For each optimization of the RED swarm within a given radius, 8,000 simulations were performed, making a total of 40,000 simulations per MH. For this analysis, the optimization procedure searched for solutions that minimize the objective function (2).
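The simulation counts quoted above follow from the MVS configuration (five individuals and four vortices, 10 iterations, a batch of 10 simulations per individual, and four tests). The short Python check below reproduces the arithmetic, assuming that every candidate solution is simulated with a full batch in every iteration; the function name is illustrative.

```python
def simulations_per_run(candidates, iterations, batch_size, tests):
    """Total simulator calls, assuming every candidate is evaluated with a
    full batch of stochastic simulations in every iteration of every test."""
    return candidates * iterations * batch_size * tests


mvs_candidates = 5 * 4  # five individuals in each of four vortices
per_radius = simulations_per_run(
    mvs_candidates, iterations=10, batch_size=10, tests=4
)

radii_km = (0.5, 1, 2, 4, 8)
total = per_radius * len(radii_km)

print(per_radius)  # 8000 simulations for each radius
print(total)       # 40000 simulations over the five radii
```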
To perform the robustness analysis, one optimized initial formation of the BLUE swarm obtained in step 2 by BH, two by VS, and one by MVS were selected. The optimizations of the RED swarm were performed with MVS, with a population of five individuals and four vortices, 10 iterations, 10 batches per individual, and four tests. SA and PSO were not used in this step. Figures 9, 11, 13, and 15 show the histograms of the distribution of objective function values obtained in each optimization process, considering the results of all experiments. The horizontal axis is the value of the objective function in intervals of 0.5, with negative values when there is a numerical advantage for the RED swarm and positive values when there is a numerical advantage for the BLUE swarm. The numbers of BLUE and RED swarm wins were obtained considering all the simulations performed in the optimization of the position of the RED swarm.
The same procedure was used to calculate the mean, standard deviation, and median presented in Tables 5 to 8.
The results are presented as follows.

1) FIRST EXPERIMENT (BH)
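The first experiment uses the BLUE swarm formation optimized by BH, given by the latitude, longitude, and altitude of each aircraft, against the RED swarm of Table 2. The position of the RED swarm was optimized, as mentioned in the third step of Section III. The results are presented in Table 5 and Figure 9.

2) SECOND EXPERIMENT (1st VS)
In the second experiment, the initial BLUE swarm formation was obtained by the VS optimization process, against the RED swarm of Table 2. The results are presented in Table 6 and Figure 11.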

3) THIRD EXPERIMENT (2nd VS)
In the third experiment, the initial BLUE swarm formation was also obtained by the VS optimization process (Figure 12). The results are presented in Table 7 and Figure 13.

4) FOURTH EXPERIMENT (MVS)
As the initial BLUE swarm formation, the fourth experiment uses the optimized solution obtained by MVS (Figure 14). The results are presented in Table 8 and Figure 15.

5) ANALYSIS OF RESULTS OF THE FOUR EXPERIMENTS
The authors performed a tactical analysis of the four initial BLUE swarm formations with the support of experts, considering firepower, the number of aircraft in missile support, missile support time, and the formation's ability to deceive the enemy radar (binary variable: yes/no). Firepower consists of the aircraft's ability to launch a missile on the first engagement; as each aircraft in the game can only launch one missile at a time, the swarm score ranges from 1 to 4. Missile support time is related to the time interval during which a trailing aircraft follows the missile by radar, providing data to adjust the missile trajectory; the swarm score ranges from 1 to 4. The following classification is adopted for the missile support time: low-supporting, if the support aircraft is 2 to 6 km back; medium-supporting, if the aircraft is 6 to 10 km back; and high-supporting, if the aircraft is 10 to 14 km back. The summary is presented in Table 9.
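The missile support time classification can be written as a small lookup, sketched below in Python. The distance bands come from the text; how the shared boundaries at 6 km and 10 km are assigned (here to the higher class) and the handling of distances outside 2 to 14 km are assumptions, since the text does not say. The function name is hypothetical.

```python
def missile_support_class(trail_distance_km):
    """Classify missile support time from how far back the support aircraft is.

    Bands from the text: 2-6 km low, 6-10 km medium, 10-14 km high.
    Assumption: a boundary value belongs to the higher class, and
    distances outside 2-14 km are left unclassified (None).
    """
    if 2.0 <= trail_distance_km < 6.0:
        return "low"
    if 6.0 <= trail_distance_km < 10.0:
        return "medium"
    if 10.0 <= trail_distance_km <= 14.0:
        return "high"
    return None


# Hypothetical trailing distances in kilometers.
for distance in (1.0, 4.0, 6.0, 12.0):
    print(distance, missile_support_class(distance))
```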
The first tactical formation is very similar to the Champagne formation [36], which is often used by many air forces. However, the last aircraft is usually aligned with the center position in the Champagne formation. The aircraft in the front fire their missiles and evade, while the aircraft behind provide missile support.
The second tactical formation can be considered a variation of the side-by-side formation with two aircraft slightly retreated. This swarm has medium firepower and low support time owing to the two slightly trailing aircraft.
The third tactical formation is a widely used formation called box (or offset square) [36], in which the swarm has a balance of firepower and support.
The fourth tactical formation is a variation of the Champagne formation, but two aircraft fly very close and separated in altitude to confuse the enemy radar.
Considering the results of the optimization of the formation of the RED swarm against the different optimized positions of the BLUE swarm (Table 10), we evaluated the robustness of these tactical formations of the BLUE swarm against the side-by-side formation of the RED swarm. The formations obtained with BH and MVS exhibited good robustness up to 1 km. However, from 1 km on, the BH formation becomes fragile, and the robustness of the MVS formation is compromised above 2 km. In the 1st VS, the results at 0.5 km seem favorable to the BLUE swarm. However, the mean, median, and standard deviation (Table 6) indicate that this formation is already fragile at this distance, with a noticeable reduction in effectiveness. The 2nd VS solution loses effectiveness for all variations in the RED swarm side-by-side formation, except for the 4 km variation.
In the context of war games, different tactics should be adopted by the BLUE swarm, depending on the scenario. For example, if the enemy swarm (RED) assumes a side-by-side formation with no more than 1 km of position variation for each aircraft, the results presented thus far suggest that the BLUE swarm should prioritize the tactics BH, MVS, and 1st VS, in that order. This sequence of options is based on the values of the means, standard deviations, and medians indicated in Tables 5-8. For instance, consider Table 8. If the positions of the aircraft of the RED swarm vary up to 4 km in the line abreast formation, the mean and the median are positive, meaning that the BLUE swarm wins. However, the standard deviation is very high, and there is a distinct probability that the result of the clash in actual combat is not so favorable.
In contrast, the MVS configuration seems to be the best tactic if we consider the possibility of a variation of up to 2 km for each aircraft of the enemy swarm relative to its central position in the side-by-side formation.
The BLUE swarm can use the 2nd VS option when the vehicles of the RED swarm are more than 2 km away from their positions in the original side-by-side formation.

V. CONCLUSION
Computational optimization using metaheuristics associated with a high-fidelity simulator was carried out to obtain formations of the BLUE swarm suitable for combating the RED swarm organized in side-by-side formation. Furthermore, the effectiveness and robustness of the optimized positions of the BLUE swarm were verified using an optimization procedure of the enemy swarm in a war game approach. This study also overcomes other limitations presented in the literature by addressing, for instance, the uncertainties related to the enemy.
The operational applicability of four randomly selected optimized solutions was verified, and all were feasible. Furthermore, three are well established by operational manuals [36].
The procedure adopted is not limited to the line abreast formation. This formation was selected for the case study and allows the analysis of the effectiveness of the optimized initial BLUE swarm formation even if the original formation of the RED swarm is very different from the side-by-side one.
The two software tools used in this approach, LOF and ASA, are independent and interact using a particular protocol. However, such an approach presents a time overhead because of the need to launch executables several times and the additional time needed to read the configuration files. Integrating these tools will significantly reduce the time spent in the optimization process.

HERMAN MONSUUR received the Ph.D. degree from Tilburg University, in 1994, with a Ph.D. thesis on axiomatic methods, game theory, and network theory. He studied mathematics (dynamical systems, with a focus on chaotic behavior of deterministic systems) at the University of Groningen. Since 2016, he has been a Professor of military operations research and analysis at the Netherlands Defence Academy. His current research interests include network theory, search and detection, game theory, optimal deployment of UAVs, and critical infrastructure security and resilience.
ANGELO PASSARO received the B.Sc. degree in physics and the M.Sc. degree in nuclear physics from the Instituto de Física da Universidade de São Paulo (IFUSP), Brazil, in 1981 and 1988, respectively, and the Ph.D. degree in electrical engineering from the Escola Politécnica da Universidade de São Paulo (EPUSP), Brazil, in 1998. In 1984, he joined the Instituto de Estudos Avançados (Institute for Advanced Studies), Departamento de Ciência e Tecnologia Aeroespacial (Department of Aerospace Science and Technology-former Aerospace Technical Center) (IEAv/DCTA), Brazil. Since 1999, he has been the Head of the Virtual Engineering Laboratory, IEAv/DCTA, and since 2013, he has been with the Space Science and Technology Graduation Program, Instituto Tecnológico de Aeronáutica, São José dos Campos, Brazil. His research interests include high-performance parallel programming, nanostructured semiconductor devices (quantum well, wires and dots), hypersonic, numerical methods, and computational optimization techniques. VOLUME 10, 2022