Simulation-Based Evolutionary Optimization of Air Traffic Management

In the context of aerospace engineering, the optimization of processes may often require to solve multi-objective optimization problems, including mixed variables, multi-modal and non-differentiable quantities, possibly involving highly-expensive objective function evaluations. In Air Traffic Management (ATM), the optimization of procedures and protocols becomes even more complicated, due to the involvement of human controllers, which act as final decision points in the control chain. In this article, we propose the use of computational intelligence techniques, such as Agent-Based Modelling and Simulation (ABMS) and Evolutionary Computing (EC), to design a simulation-based distributed architecture to optimize control plans and procedures in the context of ATM. We rely on Agent-Based fast-time simulations to carry out offline what-if analysis of multiple scenarios, also taking into account human-related decisions, during the strategic or pre-tactical phases. The scenarios are constructed using real-world traffic data traces, while multiple optimization variables governed by an EC algorithm allow to explore the search space to identify the best solutions. Our optimization approach relies on ad-hoc multi-objective performance metrics which allow to assess the goodness of the control of aircraft and air traffic regulations. We present experimental results which prove the viability of our approach, comparing them with real-world data traces, and proving their meaningfulness from an Air Traffic Control perspective.


I. INTRODUCTION
Air Traffic Management (ATM) is a complex socio-technical system composed of multiple entities and physical/human actors. Its ultimate goal is to safely and efficiently manage the flow of aircraft in all flight phases (departure, en-route, landing). The complexity behind ATM lies in the fact that the number of systems, devices, and the amount of people involved in the process is large, as well as the number of aircraft which have to be simultaneously managed. Moreover, this number is subject to bursts in rush hours, thus making the overall process even more complicated. Finally, an additional The associate editor coordinating the review of this manuscript and approving it for publication was Haluk Eren . level of complexity is related to the interoperability and the harmonization of different ATM systems, since various countries have developed their own ATM system individually, each one concentrating on its specific requirements.
Today, enhancing ATM is a relevant international objective, and it is fundamental to be ready to meet future traffic demand and environmental requirements. To overcome the complexity related to interoperability, there is an ongoing effort aimed at designing new services [1]. However, the high complexity of ATM makes enhancements extremely difficult, because performing in-vitro testing of new procedures or equipment is almost unfeasible. At the same time, the criticality of ATM makes it highly improbable that possible enhancements can be safely tested in real-world scenarios, also considering the safety-wise and performance-wise critical requirements of ATM services.
In this article, we propose an approach based on computational intelligence techniques which, combining Agent-Based Modelling and Simulation (ABMS) [2] and Evolutionary Computing (EC) [3], allows to carry out offline what-if analyses of changes to ATM policies or procedures, and supports the design of new solutions aimed at ATM system optimization. The purpose of these analyses is to explore new configurations for the ATM system in advance, namely during the strategic (i.e., several days before the actual operations) or pre-tactical phases (i.e., up to several hours before the actual operations). Such optimization is intended with respect to the estimated performance of the simulated ATM system and implies the minimization or maximization of some reference ATM performance metrics in the identified simulation scenario.
We designed a distributed simulation-based evolutionary optimization architecture which enables what-if analyses via an agent-based simulation model, and allows to evaluate objective measures (such as the timeliness of the flights in a set of sectors), as well as subjective measures related to human behavior-in particular the behavior of Air Traffic Controllers (ATCOs). This simulation model is placed at the center of an optimization loop, which allows the modification of input parameters to the simulation model so as to identify properly-tuned configurations which increase the global performance of the ATM system. Overall, our approach involves ABMS to discover the evolution of the ATM system under certain constraints, and EC to explore a complex search space to better understand how architectural and design choices influence the ATM system at micro-scale, i.e. at the level of individual agent behaviors. The versatility of our optimization architecture lies in the full decoupling between the exploration of the search space, the assessment of the goodness of a specific solution by means of ABMS, and the evaluation of performance metrics. The utilization of different simulation paradigms or the inclusion of additional performance metrics can be easily plugged in the proposed architecture.
The optimization process we designed is based on Evolutionary Algorithms (EAs). EAs are a category of metaheuristics-based algorithms. Inspired by concepts from nature, such as evolution and natural selection, they are particularly useful to provide solutions to computationally-intensive problems. They maintain a population of individuals which must compete for survival, and new offspring are created by recombining and mutating individuals selected from the population.
Placed in the context of ATM optimization, our proposal has a twofold goal. On the one hand, our architecture allows to perform an impact assessment of the changes applied to an ATM system. On the other hand, we support the design of new solutions, by identifying the optimal tuning of specific design parameters to achieve some required performance level. Generally speaking, according to our methodology, both approaches involve the following steps: i) modeling the part of ATM system of interest representing some reference scenario, i.e. the baseline which represents the basis for the change design; ii) modeling the scenario under the envisaged changes in the system; iii) identify the metrics of interest to study the impact of the changes; iv) simulating both the reference and the changed scenarios; v) assess the impact of the changes according to the selected metrics. Our architecture allows to effectively carry out all these steps, with minor human intervention, to study the behavior of an ATM system under differentiated characteristics.
We provide a detailed description of the overall simulation-based optimization architecture, along with an assessment comparing the performances achieved in two different scenarios, based on real-world traces. The two scenarios relate to ''direct routing'' (i.e., when an aircraft flies through the great circle arc 1 between 2 assigned waypoints) and ''free routing'' (i.e., when the pilot has the ability to plan/re-plan a route according to some defined preferences) operations of the ATM system. The selected case studies are highly representative of the ATM-domain complexity, and have been identified to stress-test our solution due to the inclusion of all elements of socio-technical systems. In detail, the transition from direct routing to free routing has already been tackled in Italy from 2013 to 2016, therefore it represents an ideal ''oracle'' with its known change history. Indeed, these scenarios have been used as a test oracle for our validation activities.
The remainder of this article is organized as follows. Section III presents the agent-based simulation model we use in our architecture. In Section IV, we discuss the evolutionary optimization algorithm which allows to explore the search space, seeking for a well-tuned configuration of the input parameters. Section V presents the distributed optimization architecture which we envisaged to cope with the computational demand of our optimization strategy. An experimental evaluation of our proposal, both from a performance and a correctness point of view, is presented in Section VI. Related work is discussed in Section II.

II. RELATED WORK
Agent-based Evolutionary Search (AES) is an emerging research paradigm lying at the intersection between EC and ABM. It is gaining attention [4] as a viable and innovative way to investigate complex adaptive systems thanks to a conceptual framework that allows to capture both the ''simplexity'' of ABM and the robustness and adaptability of EC methods.
In the context of ATM, some recent proposals in the literature have explored the possibility to rely on Evolutionary Computing to study or optimize some characteristics of ATM (see, e.g., [5]- [9]). This is an indication of the fact that this 1 The shortest path between two points on the surface of a sphere is given by the arc of the great circle passing through the two points. A great circle is defined to be the intersection with a sphere of a plane containing the centre of the sphere. kind of optimization strategy is hot. Nevertheless, no proposal has tried to consider all the aspects addressed by our work, nor ABM simulation-based optimization has ever been used for the purpose.
Modelling socio-technical systems has been largely studied in the last decade. Different and interconnected trends are notably to mention [10]: Actor Network Theory (ANT), Agent-Based Modelling technique (ABM), Bayesian Belief Network (BBN), Configuration Modelling (CM). Fuzzy Logic (FL), Morphological Analysis (MA), Social Network Analysis (SNA) and finally System Dynamics (SD). Such approaches have been applied to very different fields from project management to software engineering, from energy to aviation.
The choice of the most suitable formalism may depend on different aspects, the level of abstraction that it is expected to target, the capability to model and distinguish between human and non human elements, the capability to manage the interdependencies, the feedback and the uncertainties. In the last ten years, ABM has been used for modelling complex systems in different domains [11], including ATM.
There are several commercial tools to assess ATM concepts. Among them, we mention Simmod, 2 AirTOp, 3 CAST, 4 RAMSrams plus, 5 Total Airspace and Airport Modeler 6 (TAAM) or the AgentFly ATM simulation suite [12]. It is notable the effort put by NASA and FAA to develop FACET [13]. Most of these tools provide detailed models of airports and airspace for fast-time gate-to-gate simulation; very few use multi-agent architectures for different actors of the scene, e.g. for airport controllers. None embeds the human behavior modelling. In recent years, some attempts have been made within the SESAR Program. Many exploratory research projects (e.g., ELSA [14], ACCESS [15], TREE [16], CAS-SIOPEIA [17], MAREA [18], SPAD [19], EMERGIA [20]) have adopted the agent-based modeling paradigm to address a wide number of different ATM problems. In these projects, the agent-based simulation is used (usually in combination with other techniques) either as an analysis tool to understand emergent behaviors of the ATM system or to study resilience and disturbance propagation, or even as a tool to determine airport slot auctioning and allocation. Most of these studies address technical aspects in modeling, without considering the social aspects that influence the overall performance of the ATM system, and focus on specific cases instead of defining a methodological approach for modeling and simulation to assess the change impact.
A notable work [21] relies on ABMS as a method for capturing emergent behavior of the socio-technical air transportation system, so as to evaluate novel system designs. The author focuses on the identification of emergent safety risk of an active runway crossing operation, on the evaluation of the role of coordination in Airline Operations Control (AOC) resilience. Differently from our proposal, no utilization of EAs is proposed to rely on simulations so as to optimize ATM scenarios.
As concerns some technical choices which have driven the design of our evolutionary optimization architecture, Scott and De Jong [22] performed both theoretical and practical analysis of a simple asynchronous master-slave EA. They found that the amount of increased throughput that can be expected from an asynchronous EA depends on the number of slave processors, the size of the population and the variance of the evaluation time distribution. Furthermore, they investigated whether there is a bias towards fast-evaluating solutions in asynchronous EAs. In fact, when evaluation time is a heritable trait, many fast-evaluating individuals may be born, evaluated and compete for a place in the population in the time it takes for a single slower individual to be evaluated.
Similarly, Yagoubi et al. [23] observed evidence that asynchronous EAs had a harder time at finding good solutions when they were located in a region of the search space that was artificially configured to have slower evaluation times, and they similarly found that there is no independent selection pressure that favors fast-evaluating individuals. Indeed, they found that an asynchronous EA is able to find a global optimum on the Holder table function much more frequently than the generational EA does and that the asynchronous EA does not exhibit any particularly adverse performance. To the contrary, it does a good job of delivering the promised speedup in time-to-convergence and it has a much smaller risk of premature convergence. This does not, however, rule out an evaluation-time bias that can emerge from a combination of reproduction, variation and selection, so the results of [22] do not contradict the bias observed in practice by [23]. Overall, the studies in [22], [23], confirm several of the design choices which have driven the construction of our optimization architecture.

III. AGENT-BASED ATM SIMULATION
In this section, we present our agent-based simulation approach. We discuss the main challenges to face when designing ATM simulation systems and the related modeling problems, and we discuss the design choices that have been put in place, along with the related motivations. The overall organization of the Agent-based Model (ABM) based simulation system is depicted in Figure 1.
We implemented our ABM simulation system as a stochastic time-stepped Discrete Event Simulation (DES) model. This is a traditional paradigm typically used in the context of ABMS, but given the complex ATM scenarios which we target, great care has been put in the management, dispatching, and scheduling of the events, as well as into the ontologies formalizing the agent interactions.
The ABMS system is composed of different components. Agents implement the most important part of the logic in the simulation, and in a given simulation scenario there can be any required number of agents. At the same time, the system ships with some fundamental ATM-related algorithms, which allow to configure the overall simulation run, or which allow the different agents to carry out tasks which are related to the current dynamics of the simulation scenario. The agents interact with these algorithms by means of explicit function calls in the agents' implementation. The set of algorithms which have been implemented are: • Air space configuration: this algorithm creates a representation of the structured air space (Functional Airspace Blocks, sectors, waypoints, etc) as to be used in the simulated scenario.
• Extract Transform Load: it allows to transform real traces associated with flown flights into the internal representation which is used to carry out the airspace simulation.
• Conflict Detection: this set of algorithms implement medium-and short-term conflict detection, which can be used by ATCO agents to trigger corrective actions with respect to the current state of the observed flights.
• Conflict Resolution: this set of algorithms allow ATCO agents to make decisions on the current state of the flights.
• Flight Management System: this is a set of algorithms used to provide the relevant flight information, including the flight intent (expressed as a 4D profile until destination), and allow to modify trajectories according to simulated ATCOs' directives. The model is fed with an air space configuration, in the form of several sectors to which multiple agents are then mapped. These agents are in charge of monitoring and taking decisions with respect to the simulated air traffic-e.g., an ATCO planner and an ATCO executive. Input parameters to the model allow to fine tune the behavior of the agents, in the face of the different simulated conditions that can be encountered.
Traffic information is extracted from so6 files [24], a standard file format for ATM which stores a description of 4D-trajectories of flights in the European airspace. Our dataset is provided by Eurocontrol via their Demand Data Repository 2 (DDR2). 7 Similarly, a dedicated 7 https://www-test.eurocontrol.int/ddr. Extract-Transform-Load layer transforms the data contained in so6 files into an internal representation which maps the traffic information immediately to airspace sectors. Moreover, the agents involved in the simulation can, at any time, alter the trajectory of the flights, depending on the trajectory of the stochastic simulation.
There are agents of different nature available in the ABM system. As mentioned, any number of agents can be spawn in a simulation run, depending on the input configuration (in terms of flights and sectors). In particular, at the time of this writing, we have implemented the following agents: • Pilot: this agent is responsible for flying an aircraft, and interacting with the ATCOs in order to obtain clearance to carry out several tasks.
• Planner Controller: this agent is the ATCO who is mainly responsible for the coordination of the traffic entering or exiting within the sector.
• Executive Controller: this agent is the ATCO who is responsible for the safe and expeditious flow of all flights operating within its sector. This agent monitors and separates flights that operate within its area of responsibility and, if necessary, it issues instructions to pilots for conflict resolution.
• Aircraft: this agent implements an aircraft. The reason why we have decided to implement the aircraft as agents is that different companies typically implement different flying strategies. Therefore, this level of abstraction allows us to better capture the intrinsic variability in the behavior of different flights, and enhances the capability of the system to observe emergent behaviors.
• Controller Working Position: this agent implements the controller workstation where traffic can be monitored allowing for situational awareness. We have decided to implement a technical part of the system as an agent to reach the goal of allowing to perform what-if analyses of changes in the overall organization of the ATM system. In this sense, mapping a technical system to an agent allows to experiment more easily with changes at the level of ATCOs operations. As mentioned, one goal of our simulation-based architecture is to allow us to study the behavior of the ATM system including the behavior of the humans which take part of it [25]. Therefore, we have identified a set of human-behavior variables which are related to the ATCOs'tasks, also taking into account the organizational context and traffic situation. The ATCOs' behavior has been modeled [26] by characterizing the tasks assigned to each human actor according to both the human and context variables. This characterization of tasks includes a variety of factors, such as time sensitiveness, task complexity (number of alternative and optional actions in a task), behavioural level (skill/rule/knowledge based), resources allocation (human/computer/human with computer assistance).
The variability of pilots' behavior is represented with a higher-level of abstraction thanks to a characterization of airlines' standard interactions with ATCOs. One of the main modeling targets is to try to emulate the time variability of the ATM tasks by properly representing the ATCOs' mental resource availability and the tasks load considering the contextual conditions. Thus, when an ATCO is performing monitoring functions and a conflict is detected-namely, the possibility that two aircrafts infringe separation minimathe available mental resources will be different if at the same time the ATCO is evaluating a conflict resolution of another conflict in its sector.
Overall, the agents (with particular focus on ATCOs and pilots) are described in terms of multiple sets of attributes: • static attributes: these are attributes which are determined at simulation startup time, and typically persist throughout the whole simulation run; • dynamic attributes: these are human factor variables which can change at runtime, and typically allow to capture the current state of an agent, depending on the dynamic context evolution during the simulation; • cultural attributes: these are attributes that are drawn at simulation startup depending on some probability distributions, which are configured according to the input configuration of the simulation. We report in Tables 1-3 a characterization of all the variables which are represented in an agent, and which influence the evolution of the simulated ATM system. The metrics built on top of these aspects (as discussed in Section IV-E) will allow to establish a characterization of the ATM socio-technical system [27].
Some agents in the system can interact directly with each other by means of event exchange. Nevertheless, this solution does not scale, given the complexity of the scenarios which we tackle by means of ABM simulation.
Indeed, the performance of an ATM system is strongly affected by the decisions made by ATCOs involved in the process of managing the flights. Different ATCOs, depending on their expertise, might rely on different strategies to minimize the delay of flights and improve the overall safety. Moreover, different companies may have different business models that imply different objectives. In general, the human behavior which shall be captured by the ABM simulation system is strongly related to the type of activities performed by the agents. As an example, some companies might have as their primary goal the minimization of fuel consumption, while other companies might primarily target the comfort of their customers. Accordingly, the first ones typically ask the ATCOs in a stubborn way the permission to follow routes which minimize fuel consumption, independently of the current traffic condition, while the second ones ask for routes which, e.g., avoid severe weather conditions-again, independently of the current traffic. ATCOs have to manage all these aspects at once, still guaranteeing the safety of the flights. Ultimately, these different goals of different companies, especially in heavy-loaded traffic scenarios, increase the stress of ATCOs, who are in the end legally responsible for the safety of air traffic.
Generally speaking, the relations and information flow that characterize the ATM system could be too complicated to be handled in a closed form with respect to the large amount of interactions and interconnections which characterize the high number of agents which are involved in the simulation. Therefore, the ABM simulation system which we have devised relies on a projection module, which allows to support complex relationship dynamics described by multiple loops and chains, nested loops, mutual cross-feed relationships connecting the agents, inhibitory connections, and preferential reactions in the face of different events. The projection module acts as a broker of events and actions, which allows to efficiently interconnect the large amount of agents in a single simulation, in a data-and event-driven way. The projection module is one of fundamental components which allows to formalize the agent interactions and to analyze the aggregated behaviour from ABM simulations. This module allows for a fast extension of the simulation model, if new agents or different algorithms/behaviors shall be included to extend the modeling capabilities of the architecture.
The ABM model represents the agent behavior in terms of tasks which, in turn, are decomposed into elementary or atomic actions, whose execution consumes time and resources (both human and/or technical). A hierarchical task analysis has been performed to identify the relevant human procedures and protocols, as well as the systems and supporting tools relevant to the simulation scenarios. This hierarchical structure allows both to model tasks performed by a single agent (e.g. the continuous monitoring of the sector activity) as well as to allocate each of the actions of a distributed procedure involving several agents (e.g. the flight handover among two sectors). In the second case, an ontology formalizes the required interaction and communication among the involved agents.
On top of time events in traditional DES systems (e.g. sector monitoring is continuously executed), the simulation is driven by the so-called traffic events, emulating human and technical-component reactions in the face of the operational context evolution-e.g. an aircraft is approaching, entering or leaving a sector, a potential conflict has been detected by the ATCO or by the supporting tools, a communication request has been received from another agent, etc. Time and Traffic events trigger the task execution which consists in performing the corresponding actions. As the result of a performed action, a new action in the same task or a new task can be triggered. In this sense, the projection module plays a fundamental role, because it becomes a co-orchestrator of task-scheduling activities, effectively bridging the gap between two different traditional programming models, namely DES and task-based ones, such as OpenMP [32].
Once an action is triggered, the agent will execute it, provided that the agent is ready and all the required resources (human and/or technical) are also available. This means that at a given time step, an agent can have a certain amount of concurrent actions to perform, which are associated with the different tasks that the agents carry out. Parallelism at action level has been taken into account considering the following criteria, which are reasonable for the ATM scenario from the human factor perspective: i) no more than two actions can be executed in parallel, except when they involve shared physical resources (voice, hearing, sight, hand touch, foot touch); ii) both potential concurrent actions belong to a skill or rule-based task; iii) none of the potential concurrent actions is a safety-related priority action-prioritization criteria depend on the contextual traffic scenario. If an action cannot be executed, then it remains in the pending state, until the system determines that it can be executed or it is cancelled because of a specific deadline (in some cases, performing an action makes sense just during a given period after being triggered). Figure 2 shows the Finite State Machine which describes the lifecycle of an action in the system. Once an action is triggered, an object that represents it is created and handed over the projection module, along with the associated set of preconditions P, the set of resources R required for execution, and the temporal execution deadline T . A given action, thanks to the projection module, is constantly monitored. Once the conditions in P, and R are met-this could happen after that a previous action is completed, thus meeting some precondition in P, or because some resource in R becomes availablethe action is started. During the execution, the action will involve all the required agents in the model. Once the action is completed successfully, the projection module and the scheduler will determine whether some additional actions are ready to be started. On the other hand, if the time required to complete the action T expires, the action is considered as failed and the task associated with the action will be either restarted or cancelled. The human performance model consists of specific rules that combine traffic complexity, past traffic events and airline characteristics with the human variables identified as relevant for the specific task. This approach enables the simulation model to assign a nominal or degraded behavior to the human agents. The degraded behavior has the effect to extend the duration of the assigned action of a percentage of the nominal duration. Such a human-performance degradation model represents the effect of human behavior variability on the overall ATM system and its single parts.
The output of the simulation model is composed of two different pieces of information. On the one hand, a new so6 file is generated, which describes the simulated trajectories which show differences with respect to the planned trajectories supplied as model input. At the same time, a log file, which reports all the actions taken by the agents, is generated. These outputs are used to compute the objective measures, as we shall describe in Section IV-E.

IV. EVOLUTIONARY OPTIMIZATION FOR ATM SYSTEMS
In this section, we focus on the optimization process aimed at identifying properly-tuned configurations which increase the overall performance of the ATM system. The optimization problem that we tackle here relates to identifying the best non-dominated solution, i.e. the solution which minimizes or maximizes metrics related to the ATM scenario among all feasible solutions, i.e. solutions which do not violate critical thresholds (such as safety separation). We introduce our optimization approach based on evolutionary computing, and then we discuss the formulation of our multi-objective optimization problem and the definition of the objective functions.
In an ATM system, there are generally multiple metrics of interest to be optimized. We designed a simulation-based optimization architecture which relies on an Evolutionary Algorithm (EA), which manages a population of configurations for the ABM simulation system discussed before. The exploration of the overall search space is driven by multiple objective functions, which also target variables associated with the human behavior, such as the stress experienced by ATCOs. These functions are included in a multi-objective minimization problem, whose solutions identify the best-suited configurations of the input parameters. The multiple objectives are jointly optimized at the same time, so that the fitness function used to evaluate the goodness of the individuals is directly derived from these objectives.
The idea behind EAs is simple: if a population wants to thrive, it must improve constantly; the fittest individuals should inspire the new offspring, but the other individuals should not be forgotten in order to maintain diversity within the population, needed to produce outstanding individuals. By borrowing terms from biology, the individual or structure is called chromosome, and represents an encoded solution to some problem, in particular decoding into a set of parameters that are the input to the optimization function of the problem under consideration.
Overall, our optimization approach is based on a parallel/ distributed variant of the NSGA-II algorithm [35]. In the following, we provide the basics to discuss our design choices of the optimization engine. This discussion is motivated by the ''no free lunch theorem'' [36], which states that no algorithm can be better than all other algorithms on all problems. We therefore describe what principles behind EAs have driven the selection of the final organization of our optimization engine, with respect to the ATM system.
The first issue that must be faced is the representation of individuals, namely the choice of the data structure defining an individual. This representation affects the operators that are applied to the individuals, and is a fundamental building block-although not the only requirement-towards the creation of a solver which could be easily adapted to multiple optimization scenarios. In EAs, the data structure storing an individual has a fixed size, and therefore it shoud be general enough to have the potential to contain an optimal solution. Conversely, in Genetic Programming (GP) approaches the representativeness problem is solved by having individuals stored in variable-size data structures, mostly in parse trees [37]. In the context of ATM optimization, we can observe that configuration variables are mostly numerical. Therefore, it is sufficient to represent an individual as a (fixed-size) set of uniform parameters, without the need to resort to more complicated GP approaches. Generally speaking, we assume that there are multiple input variables to optimize: it is possible to suppose that this number can even change in different ATM-related optimization scenarios. Anyhow, it is legit to assume that a single population is homogeneous, i.e. that all the individuals in a single population (namely, an ATM optimization problem) will share the same set of explanatory variables. Therefore, the optimization framework and algorithm will have to deal with a number of input variables to optimize which is not necessarily known a-priori, but which is stable in a single optimization process. A more complicated individual representation, like the ones used for GP, is not necessary in our scenario.
Any EA must first create an initial population of individuals. Larger populations have higher initial diversity, but there is a trade-off between the need for sufficient diversity to ensure the identification of a proper optimal solution, and the need to avoid a so-large population that slows down the time-to-solution. It is also important that the initial population encompasses a wide range of different solutions. In our ATM context, the domains upon which the single components of the individuals are known beforehand. This knowledge can be exploited in the initialization of the population. In particular, we setup an initial population with a number of individuals proportional to the space covered by the domains of the variables, and each individual is randomly initialized.

A. MULTI-OBJECTIVE OPTIMIZATION
As discussed, ATM optimization requires to solve a problem characterized by multiple objectives. These objectives are formally captured by the metrics that we will discuss in Section IV-E. These metrics can not be handled in isolation, since some of them might be in contrast to each other. As an example, the timeliness of a flight might be in contrast with safety separation. The straightest route could increase timeliness, but it might produce safety-separation violations.
Thus, the problem which we address must consider all the objectives at once. It therefore comes handy the family of multi-objective optimization problems (MOPs). A MOP minimizes f(x) = (f 1 (x), . . . , f k (x)) subject to the constraints g i (x) ≤ 0 i = {1, . . . , m} and h j (x) = 0 j = {1, . . . , p}, where k is the number of objectives and x = (x 1 , . . . , x n ) is an N -dimensional vector.
Having several objective functions f i , the notion of optimum becomes the Pareto optimality (or Pareto efficiency), which is a state of the variables under optimization from which it is impossible to move so as to make any other configuration better off, without making at least one variable worse off. Considering the example of timeliness vs safety separation, this concept becomes clearer: timeliness might not be the best configuration on its own, but increasing timeliness might make safety separation worse. Therefore, with this approach, the search space is explored with the goal of finding the best-suited trade-offs among all the variables to be optimized. More formally, given f : is called the global minimum, where the set of all x * i is called the global minimum solution set, f is the multi-objective function, is the feasible region, and is used to indicate the Pareto dominance.
The concept of Pareto Front is illustrated in Fig. 3, in a sample search space composed of only the two objective functions introduced before. The boxed points represent feasible solutions to the problem, and smaller values are preferred to larger ones-a minimization problem. Point C is not on the Pareto front because it is dominated by both point A and point B. Points A and B are not strictly dominated by any other, and hence do lie on the front. Deciding to pick either A or B as the final solution highly depends on the problem being tackled, and/or the search algorithm being adopted. Multi-objective EAs (MOEAs) algorithms which allow to solve a MOP are generally based on three different techniques: i) a-priori techniques, ii) progressive techniques, and iii) a-posteriori techniques. A-priori techniques require to define the relative importance of the MOP objectives prior to the search. The optimum solution is then obtained by optimizing each objective in sequence, from the most to the least important. This strategy is not suitable for ATM systems, because it is not practical to assign a priority to different objectives. In this scenario, Foruman has proposed [38] to define an order by selecting at each evaluation a random objective. This could lead to wrong results in our contextas an example, safety-separation could be deemed to be not important, which is unacceptable. Similarly, progressive techniques have to define preferences to bias the search. The main issue here lies in the definition of the methods to incorporate the preferences into the MOEA.
We have therefore decided to exploit an a-posteriori technique, which tries to generate the elements of the Pareto optimal set (performing a broad search) followed by a decision-making process. There are several subclasses of these algorithms. We have explicitly discarded approaches which compute the fitness of the individuals according to some linear combination of the objective functions, or which consider only a subset of the objectives, as these approaches have been proven to suffer from an increased time-to-solution or they could miss an optimal solution at all [39]. This has been the only choice, given the large search space characterizing ATM systems, and the criticality of the identification of the optimal solution for the redesign of such systems.
Conversely, we have adopted a Pareto-based approach, which assigns a fitness value to each individual according to Pareto dominance, and therefore directly considers for reproduction all non-dominated individuals in the current population. Since those non-dominated solutions once found are not guaranteed to survive in the next generation, an additional population containing all non-dominated solutions found so far is included [40]. This secondary population is updated to include the newly-found non-dominated solutions and is periodically culled, since Pareto dominance is dependent upon the set on which it is evaluated. The drawback of this approach is that the update rate drastically increases the cost of the algorithm, and as the population size grows, the comparison time may become dominant. This is a relevant aspect. Considering that the evaluation of the population is based on an ABM stochastic model, a large number of simulations is required to effectively explore the search space. In our implementation, we paid close attention to reducing the time required to carry out a single simulation. Based on this, we have shown experimentally (see section VI-A) that, in our context, this approach is computationally viable.
We borrow the Pareto-based approach from NSGA-II [35]. The fitness of an individual is an integer number (the rank), representing a non-dominated front. An individual having rank equal to zero is a non-dominated solution in the current set, and individuals with rank i, with i ≤ M and M the population size, are dominated only by individuals with smaller ranks. In order to sort the individuals into non-dominated ranks, we rely on the fast non-dominated sorting approach requiring O(kM 2 ) time and O(M 2 ) space.

B. ELITISM
Another aspect to deal with in EC is elitism. This concept, borrowed from sociology, describes the belief that individuals who form an elite are more likely to be constructive to society as a whole, and therefore deserve influence or authority greater than the others. In the EA jargon, elitism refers to the high likelihood that a few individuals in the population are the best ones, and therefore should be preserved in the next iterations as much as possible.
Given the large search space of ATM optimization problems, elitism can be exploited as a way to reduce the time-tosolution. We preserve elitism by the strategy used to make the population evolve. In particular, after the computation of the rank of the individuals in the current generation of the population, we select the best-suited ones for reproduction, in order to generate an offspring. Different strategies exist to properly deal with the offspring. Our strategy is based on a variation of the Double Tournament Selection, called Binary Tournament Selection [41], where a group of two creatures is taken from the population and the most fit individual is picked as a parent, repeating the process to get a second parent, choosing whether the first parent is excluded in the second selection. In particular, we randomly choose two individuals and if they differ in the non-domination rank, we select the solution with the lower (better) rank. Otherwise, if both solutions belong to the same front, then we select the solution that is located in a ''lesser crowded'' region, to preserve diversity. The selection is based on a combined population, containing individuals of the offspring and individuals of the parent population. Since both the previous and current individuals are included, elitism is ensured.

C. CROSSOVER AND MUTATIONS
In EAs, children can be generated by applying either a single-point or a double-point crossover operator to the pair of parents, depending on the number of variables which are used to represent an individual. The crossover operator is used to combine disparate creatures facilitating novelty, and therefore leading to a broad search. In particular, it is not viable for our ATM scenario to consider an individual as a mere bit string, as in generic EAs. Indeed, as we have already discussed, every input variable is associated with a feasibility domain. This domain already limits the search space, because it excludes configurations which are a-priori known to be unacceptable with respect to the optimization measures. As an example, setting the minimum horizontal separation between two aircraft to a too low value would increase the likelihood of safety-separation violations. Since, as mentioned, our input parameters are all numerical (in most cases, they are real numbers), applying a crossover operator in the middle of the binary representation of a parameter will likely move far away the parameter from the feasible domain-this is related mainly to the binary representation of floating-point values. Our crossover operator, therefore, determines the crossover point at the granularity of a VOLUME 8, 2020 single (atomic) parameter. If the number of parameters is high (over a predetermined threshold), we identify two crossover points, and the exchange of portions of the parameters is carried out in between these two points. In case of a large number of parameters, a two-point crossover operator is likely to explore the search space more conservatively, preventing individuals in a new generation to drift too much from points associated with the parents in the search space.
With some random -probability, the newly-generated individual is mutated. The mutation operator facilitates the local search (especially when a tiny portion of the individual representation is mutated). Again, mutations must not violate the domains of the parameters. To this end, a mutation is applied in terms of an increment/decrement of the current value of the parameter, which anyhow is tested to keep the mutated value in the domain. The magnitude of this variation is such that the variable does not oscillate from the previous value of an amount greater than 10% of the span covered by its associated domain. Empirically, we have observed that this criterion fastens the convergence time to an optimum in ATM scenarios.

D. DIVERSITY PRESERVATION
In our ATM scenario, we do not want the overall population to grow to a large extent, particularly to keep bounded the amount of time which is required to run the simulations needed to evaluate the fitness of the individuals. Therefore, once the children have been generated, we borrow the replacement technique of NSGA-II, to ensure that the average fitness increases over time, to guarantee a sufficient diversity in the population, and to enhance elitism. This is an important point, because (as discussed) losing genetic variance can lead to sub-optimal solutions.
In more details, we adopt a non-dominated sorting approach, where for each solution we calculate the number of solutions dominating it (the domination count), and the set of solutions it dominates-this requires O(kM 2 ) comparisons. Each non-dominated solution has a domination count equal to zero. Then, for each solution with domination count zero, we reduce the rank of the members of the set of solutions it dominates, and if the domination count of any member becomes zero, we put it in a separate list representing the second non-dominated front. The process is repeated to identify all fronts. Since each solution can have dominance rank at most M − 1, it will be visited at most M − 1 times before identifying its front, thus requiring O(M 2 ) time. The overall complexity is therefore O(kM 2 ).
Diversity preservation is faced up by using the crowding distance estimator. This estimator assigns to each solution a real number representing the density of solutions surrounding it. The crowding-distance computation requires sorting the population according to each objective function value in ascending order of magnitude. For each objective, the solutions with the highest and the smallest values are assigned an infinite crowding distance value. Each other solution is assigned a value equal to the difference between the values of its adjacent solutions in the current sorting, divided by the difference between the highest and smallest values. The computation is continued with other objective functions and the overall crowding-distance value is calculated as the sum of individual distance values for each objective. In other words, the crowding distance is an estimate of the density of solutions surrounding a particular solution. Fig. 4 shows a graphical representation of this metric in a two-dimensional world. When a non-dominated set does not fit entirely in the new population, because the number of its members is higher than the remaining space, we sort the solutions in it using the crowded-comparison operator in descending order and choose only the best solutions to fill the remaining new population slots.

E. ATM OBJECTIVE FUNCTIONS
The objective functions which we use to drive the exploration of the search space are performance metrics which are related to both human and physical aspects of the ATM systems. We introduce the following parameters that we use to define the objective functions: • M is the total number of involved flights, • N m is the total number of sectors traversed by one flight, • t n−1 is the time at the entrance in a sector, • t n is the time at the departure from the sector, • T n is the expected time to cover that sector, • f n is the specific fuel consumption per unit thrust and unit time, • S n is the throttle setting of the thrust (in Newton), • G is the total number of groups of ATCOs, and • N m is the total number of actions of group m. The performance metrics we consider are the following: • Timeliness. It is evaluated as the time in seconds in a sector, namely: It is defined as the number of flights in a given sector, in a specified timeframe (in our case, 30 minutes).
161560 VOLUME 8, 2020 • Safety. It is evaluated through minimum vertical or horizontal separation, whose minimal thresholds depend on the type of route (e.g., national vs transoceanic) • ATCO workload. It is a human-related metric which describes the cognitive resources required to accomplish the various tasks. Namely, it is numerically expressed as: In the last metric, the parameter ω g measures the comparable workload. This parameter depends on a characterization of the ATCO. In particular, we consider different aspects which affect the value of ω g , namely the agents' attributes previously presented in Table 1.
Both physical and human-related metrics are computed starting from the output of a simulation run. As mentioned, the simulator generates an so6 file and a log of timestamped events. From the former, we evaluate the trajectory-related metrics associated with occupancy, timeliness, safety, and fuel consumption. Timeliness, occupancy, and fuel consumption are computed relying on the traffic library [42] and Ope-nAP [43]. To this end, we compare the trajectories recorded in the so6 file generated by the simulator, and the ones in the original input so6 file-which is preprocessed to determine the reference values for the performance metrics. For each mutated trajectory, we are able to determine the impact on the original metrics, and generate the new measures.
From the generated log file, we compute the final human-related metrics. In particular, this can be done by parsing the log-an excerpt of the log of one simulator is reported in Fig. 5-so as to identify what ATCOs are involved in which actions. The number of actions per timeframe which are found in the log file determine an increase in the workload, accounting for all the characteristic variables which are discussed in Table 1.
It is interesting to note that our approach decouples the metrics of interest form the simulation-based optimization loop. In this sense, it is possible to apply the same simulation-based optimization scheme in order to study different ATM scenarios, only by specifying additional metrics to be maximized or minimized. Conversely, as stated in Section III, if an extension or a change to the simulation model is provided, as well as the inclusion of additional decision variables, the same metrics (and therefore the same ATM scenario) can be studied under different perspectives, possibly in more detail.

V. THE DISTRIBUTED OPTIMIZATION ARCHITECTURE
MOEAs have been proven to be one of the most effective ways to solve MOPs, since they are able to produce several non-dominated solutions in a single run. However, to discover good solution sets, they require a large number of solutions to be evaluated during each iteration. This can be problematic for our ATM target. As explained, our Evolutionary Optimization approach is based on a stochastic ABM model, which therefore requires multiple runs, configured with different random seeds, to determine how well (from a performance-metrics point of view) a certain configuration of the model's explanatory variables (an individual) behaves.
With respect to the computational demand of the discussed EA approaches, in [44] a comparison between generational and steady-state asynchronous EAs has been carried out on two popular algorithms (NSGA-II [35] and SPEA2 [45]). They concluded that the level of variance in time to evaluate a solution and the parallelization ratio are the key factors that influence the relative performance between steady-state asynchronous and generational EAs. The authors showed that a substantial speedup can be achieved with steady-state asynchronous algorithms. VOLUME 8, 2020 The differences between steady-state and generational algorithms are more apparent when dealing with simulation-based optimization, where the simulation times can differ drastically from one solution to another. The Master-Slave parallelization paradigm [46] is well-suited for this case.
The overall organization of the optimization architecture is therefore depicted in Fig. 6. The logical building blocks of the Master-Slave architecture are the following: 1) The Orchestrator: it is responsible for splitting the current population into different subsets of individuals, distributing the sets to the different available computing nodes, and controlling them along the computation. 2) Compute nodes: they evaluate (through stochastic simulations) the goodness of an individual of the population. These compute nodes represent performance-critical elements. Indeed, for each individual of the population, the orchestrator schedules for execution multiple runs (the number is parameterizable) of the simulation model (with different random seeds). We refer to the multiple runs of the same configuration of the simulation model as a batch of simulation. 3) Compute metrics: they are activated after that a batch of simulation is completed, and compute the average values for the metrics of interest. In this way, given that each execution in a batch is configured using a different random seed, the different behavior of the stochastic runs can be safely captured when evaluating an individual. We emphasize that this simulation-based approach allows us to avoid scenarios in which, due to stochasticity, the EA-based optimization process might oscillate across two different points in the search space. The final metrics are here computed from the logs of the ABM simulation and are associated with the individual. 4) Selection and evolution: this logical block applies the evolutionary strategy to evolve the current population towards the optimal solution.
The master is therefore responsible for managing the current population, running the Orchestrator and performing the selection and evolution task. The slaves run a loop which executes the following steps: 1) receive in input a subset of the current population, 2) spawn instances of the ABM model on each available processing unit of the computing infrastructure by passing the explanatory variables taken from an individual in input to the simulation model, 3) wait for the completion of a batch of simulation, 4) parse the log files, 5) compute all the average metrics of interest and associate their values to the individual. Once all the subsets of the population have been evaluated by a slave, the metrics are sent back to the orchestrator, which is able to apply the aforementioned evaluation and selection strategy.

VI. EXPERIMENTAL EVALUATION
In this section, we present an experimental assessment of our simulation-based evolutionary optimization architecture, targeting two main goals. As a first goal, we assess the computational performance of the implementation of our architecture, also taking into account the fact that population evaluation is simulation-based. We also evaluate the performance sensitivity to the delay introduced by the execution of the ABM model, which, as discussed, can be non-minimal given the complexity of the problem being tackled. As a second and overall goal, we assess the capabilities of our optimization architecture on real-world scenarios.
We have implemented the software components of our EA-based optimization architecture in the C11 language, relying on the Message Passing Interface (MPI) [47], to support scatter/gather primitives to distribute the population to the slaves and recollect the computed metrics. In the experimental assessment we present, we have used a dedicated compute cluster composed of the following four heterogeneous compute nodes: • Two HP ProLiant servers equipped with 4 AMD Opteron 6128 multicore processors, each one encompassing 8 cores, for a total of 32 cores per node, and 32 GB of RAM per each node; • One HP ProLiant server, equipped with 2 AMD Opteron 6174 multicore processors, each one having 12 cores, for a total of 24 CPU cores, and 32 GB of RAM; • One server equipped with 4 AMD Opteron 6168 multicore processors, each one with 12 cores, resulting in a total of 48 cores, and 128 GB of RAM. Overall, our computing infrastructure accounts for 136 cores and 524 GB of RAM, making it a medium-size infrastructure. The system software on each node is Linux 4.19.13 (taken from the Debian 9 distribution), the compiler used is gcc 8.2.0. As for the MPI runtime, we have used MPICH 3.3 (Hydra runtime).

A. COMPUTATIONAL PERFORMANCE AND SENSITIVITY TO SIMULATION TIME
To study the computational performance of our architecture, we used a synthetic benchmark. We used a population of 50 individuals described by 10 explanatory variables, which have been randomly generated with no constrained domain. The number 10 has been selected as a worst-case scenario for the goals of our kind of ATM optimization. In particular, at the time of this writing, we have not been able to identify a single real-world optimization scenario which would hit this number of parameters.
For the sake of this performance study, we have replaced all metrics with uniform pseudo-random number generators. This means that the population which is kept in a different iteration of the evolutionary optimization is randomly selected. This has been done because we are not actually interested in the population being selected, rather on internal execution dynamics.
Taking inspiration from the Hold benchmark traditionally used to stress test software data structures [48] and the PHold model for Parallel Discrete Event Simulation [49], we have replaced the inner part of the evolutionary optimization architecture with a busy loop, which mimics the execution delay of arbitrary operations-this is, in our case, the execution latency of a simulation model.
The duration of this loop (in seconds) has been determined according to a Normal distribution, with standard deviation σ = 0.1, 0.2, 0.5 and with three different means µ = 0.5, 1, 2. Moreover, we have considered the special case in which no duration of the busy loop was considered. The mean value of the busy loop has been set to the order of seconds, because we observed that this is the order of magnitude of an execution of a simulation. Overall, this preliminary experimental assessment (which has been carried out to highlight the functional capabilities of the optimization architecture) is organized to be representative of the operating settings for the optimization problems.
We have carried out experiments running on all available 136 cores. The total population used in this evaluation has been set to 1360 individuals-we consider this to be significant for ATM optimization contexts with this number of explanatory variables in the individuals. The number of total iterations has been set to 10. We report results averaged over 5 different optimization runs.
In Table 4, we report the total execution time (in hours) required to run the synthetic PHold benchmark which we have described above. On the other hand, Table 5 shows the slowdown produced by the busy loop in the overall optimization architecture when executed with no delay at all-in this case, the overall execution time amounts to 1,36 seconds.  By these results, we can draw two conclusions. On the one hand, the optimization architecture which we have designed is quite efficient and its convergence time does not affect the overall performance of the ATM-optimization framework.
On the other hand, as expected, in a simulation-based evolutionary computation system, the part of the loop which affects more the duration of the algorithm to find an optimal solution is the execution of the simulation model. Although the slowdown figures might look large, we note that (Table 4) the total time required to find an optimal solution are perfectly acceptable, given the complexity of the problem which we tackle. Additionally, we can observe (Table 4, last column) that the overall optimization architecture is quite resilient to variations in the duration of the simulation. This result suggests that there is no actual requirement to devise more complicated strategies to schedule the execution of simulations on the distributed environment, which could anyhow increase the deploy complexity.
The fact that for higher standard deviations smaller total execution times are observed is related to the Normal distribution defined on [−∞, +∞], while we use this distribution to setup a busy loop. If a negative value is drawn from a distribution, this value is artificially forced to zero. This artifice happens more often with larger values of the standard deviation, and is observed in the reduced amount of total execution time.

B. RESULTS IN REAL-WORLD SCENARIOS
To evaluate the capabilities of our simulation-based optimization architecture, we have relied on a real-world ATM scenario addressed in the execution phase of the SESAR Solution #32 ''Free Route through the use of direct routing'', and the and SESAR Solution #33 ''Free Route through Free Routing'' [50]. The direct routing environment highlights the different aspects of the ATM socio-technical complexity, specifically: a) cognitive aspects, dealing with the cognitive behavior of human actors, b) social aspects, dealing with the place and the role of human actors within the system, c) technical aspects, dealing with the engineering of the technical components and of the system architecture, and d) interrelations of the above-mentioned three aspects.
We considered a real-world high-complexity traffic scenario in this part of the experimentation. It is related to a number of elementary sectors within the Brindisi (LIBB) and Rome (LIRR) Area Control Centres (ACCs) of the Italian airspace. Specifically the operational configuration includes four collapsed sectors belonging to Brindisi (LIBBCN37, LIBBCS37, LIBBES37, LIBBND37) and one belonging to Rome (LIRRUS37). The considered flights are those over FL315, which is the threshold for direct routing. Each considered sector has been collapsed. The number of flights in the day are 378.
To perform a preliminary validation to generate trust in the simulation model, we have carried out a comparison between the simulated and the actual timeliness and occupancy in a time window. Fuel consumption has not been used for validation purposes due to the adoption of DDR2 data, which do not allow an accurate determination of the fuel flow in a given trajectory point of a given aircraft. In Fig. 7 we report, for some simulated flights, the comparison between the simulated, planned and flown trajectory. The planned trajectory is the one which is established before the flight is actually departed. The flown trajectory is associated with the actual trace (taken from the so6 file), which is associated with direct routing. Conversely, the simulated trajectory is the output of our simulation model. Similarly, in Fig. 8 we report, for some sectors, the comparison between the simulated and the flown occupancy. The fact that in both Figures the results between the simulated and the flown data are very close is an indication of the capability of our simulation system to well capture the dynamics of individual flights ( Figure 7) and also to well capture emergent behavior (Figure 8).    or less the same for the other sectors and days. The deviation of the achieved timeliness in simulated traffic with respect to that evaluated on the actual traffic in percentage does not overcome 10% on all sectors. The deviation of the achieved occupancy in simulated traffic with respect to that evaluated on the actual traffic in percentage does not overcome the 7% on all sectors.
As a second experimentation, we have selected two optimization variables and we have run the evolutionary optimization module to fine tune the values of these variables. The variables are the horizontal separation minima for the ATCO Planner (HP) and the ATCO Executive (HE), expressed in nautical miles. We have run a total of 50 iterations, and we have used the trajectory-related metrics discussed in Section IV-E to drive the exploration of the search space, namely occupancy per 30 minutes (i.e., average number of flights every 30 minutes) and timeliness (in seconds). We have run the evolutionary optimization basing the simulation on the traffic in the aforementioned scenario focusing on the time slot from 12 pm to 2 pm on July 6, 2016 related to the ACC of Brindisi. By the nature of the simulated traffic, we categorize this experiment as a ''high-complexity'' one.
In Table 6, we report the results related to the first 10 iterations of the evolutionary optimization. After 10 iterations, the evolutionary optimization is able to identify the Pareto-optimal configuration-this individual has been generated by means of mutation on the previous iteration, and in the later iterations no better individual has been found, also by mutation. The optimization has been particularly challenging in the proposed case, due to the high number of objective metrics and the low number of decision variables. The obtained results correctly minimize the objective metrics according to Pareto dominance. In addition, it provides more ''conservative'' values for the separation minima, which are anyway able to minimize occupancy and timeliness metrics. This is related to the specific nature of the simulated traffic of the high-complexity scenario, which is mostly a sliding traffic over the ACC of Brindisi.
We have also run a similar optimization, with respect to optimization variables and metrics, but considering a more  complicated simulation scenario. In particular, we have optimized the aforementioned horizontal separation minima for the traffic on the 6th of July 2016 in the time slot from 12 pm to 2 pm, considering flights within the ACC of Milan and Padua-collapsed sectors LIMME37, LIMMWC37, LIPPCS37, LIPPN37, LIPPSD37, LIMMWC37, LIPPCS37, LIPPN37, LIPPSD37. By the nature of the simulated traffic, we classify this experiment as a ''very-high complexity'' one.
In Table 7, we report the results related to an excerpt of the first 50 iterations. Also in this case, the optimization is particularly challenging-there is a high number of objective metrics and a low number of decision variables. From the results, we can draw multiple conclusions. On the one hand, the results obtained from the optimization procedure exhibit more ''aggressive'' values for the separation minima with respect to their nominal values. This is an expected result, also due to the nature of the very-high complexity of the simulated traffic, which is composed of sliding, crossing, climbing and descending parts over the ACC of Milan and Padua. On the other hand, it could be argued that ''multiple'' optimal solutions are identified. We refer, in particular, to the best-fit individuals identified at iterations 16 and 23-the best-fit individual at iteration 23 was generated by means of mutation. The former on average implies better (i.e., lower) values for the occupancy metrics, whereas the latter on average implies better (i.e., lower) values for the timeliness metrics. While a human might decide that the individual at iteration 16 could be deemed ''better'' than the one at iteration 23, Pareto-optimality selects the latter as the best-suited one. This is coherent with the contrasting nature of occupancy and timeliness metrics, whose optimization requires somehow a compromise. Moreover, these results highlight the importance of the mutation operator in this kind of ATM optimization.
In a third experiment, we have carried out a different kind of optimization. In particular, we have optimized the operational sectorization configuration of the environment in terms of allocation of elementary sectors in the collapsed sectors of both executive controllers and planning controllers to optimize trajectory-related performances. The decision variables are represented by the combination of elementary sectors in the collapsed sectors, which represent the actual simulated sectors for each iteration of the optimization module. A number of 5 iterations is considered for the convergence threshold of the optimization module. A number of 6 individuals is applied for the population size at each iteration. The human-related metrics encompass the number of communications between the executive controllers (ExeC) and the flight crew (FC). We have used the real-world traffic on July 3, 2019 in the time slot from 12 pm to 2 pm, considering the ACCs of Milan and Padua, which corresponds to our ''very-high complexity'' scenario. In this scenario, we have also used human-related metrics, with the ultimate goal of minimizing also the workload suffered by ATCOs.
The obtained solution (highlighted in Table 8) on average minimizes the objective metrics. Indeed, even if other solutions exhibit better values for the total metrics (e.g., solution 1, which aggregates as much as possible the elementary sectors) or for the standard deviation metrics (e.g., solution 2, which separates as much as possible the  elementary sectors), the highlighted solution represents a good compromise between the ''extreme'' solutions. It qualitatively aggregates the sectors of the ACCs according to their peak of traffic, producing nearly uniform workload distributions for controllers and with a limited total workload.
The results shown so far have highlighted that evolutionary optimization can be effectively used to fine tune optimization parameters, also taking into account human-related metrics. In particular, we have shown that, in real-world scenarios, it is possible to minimize the timeliness and/or the number of ExeC/FC communications. By these preliminary results, we have shown that these aspects are mostly affected by the ExeC horizontal separation, the executive controller outbound lookahead, and the planning controller lookahead. From a numerical point of view, the variation of such parameters may be further explored to provide a detailed analysis for the optimization of the average timeliness and the total number of communication from ExeC to FC. We refer to this exploration as sensitivity analysis, i.e. a numerical optimization with the goal of assessing what is the actual decision variable which has the largest impact of the metrics.
Our evolutionary optimization architecture can be also used to carry out this kind of sensitivity analysis, in an automated way. In particular, we have run a set of optimizations in which we have used only one variable for the exploration of the search space. This assessment allows to determine what is the decision variable which has the largest impact on the results, therefore showing what should be the best-suited variables to be used when optimizing an ATM scenario. For this experiment, we have used the ''very-high complexity'' traffic on July 6, 2016 in the time slot from 12 pm to 2 pm, considering the ACCs of Milan and Padua.
Tables 9-12 report the results obtained when optimizing one variable at a time-among ExeC horizontal separation (nautical miles), ExeC Outbound Lookahead (seconds), HP Horizontal Separation (nautical miles), and PC Lookahead (seconds)-while using as performance metrics the overall timeliness (seconds) and the number of ExeC/FC communications. In all optimizations the evolutionary architecture is able to find Pareto-optima, but the most interesting   result lies in that some of these optimization variables (e.g., PC horizontal separation) do not affect the metrics. Therefore, this kind of analysis effectively allows to identify what are the best-suited parameters to be dealt with for the optimization of an ATM scenario.
The achieved validation results indicate good agreement between simulated and real flights, with limited deviations for the assessment of performance metrics under evaluation. Indeed, assessment errors are always under low thresholds, such as 4 minutes for flight timeliness and 0.6 flights/10 minutes for sector occupancy. Additionally, all the results have also been validated by ATCOs from Air Navigation Service Providers (ANSPs), such as ENAIRE (Spain) and ENAV (Italy), in a series of dedicated meetings. The optimizations which have been identified by our architecture have been considered satisfactory. In particular, the controllers have found acceptable the behavior of emulated human agents, highlighting that often humans behave in different ways though they are facing the same situation and such variability was, in fact, fairly modelled in our approach.
The achieved results are strictly generalizable since the simulation errors are limited and show the same magni-tudes for a given performance metric, without a clear dependence on the sectors and on the type of traffic. Furthermore, the simulation errors appear to be mainly related to real and non-deterministic ATM events (e.g., delays, severe weather conditions, etc.). These events produce deviations of the real traffic flow from the planned one and cannot be predicted by the simulation engine without injecting additional input data which are not usually available during the strategic and pre-tactical phase. Nevertheless, our simulation model effectively represents the socio-technical dynamics of a generic ATM scenario. This is directly applicable to an accurate performance evaluation of an ATM solution using both human-related and trajectory-related metrics, which in turn feed our optimization architecture for automated support to the design of the ATM solution itself.
As a consequence, the proposed simulation system and optimization architecture may be a first attempt at a ''standardized'' toolset to support ATM change management. Indeed, such a toolset would enable a systematic process for the design and the verification of new concepts in ATM by means of fast-time simulation and optimization, including human behavior assessment.

VII. CONCLUSION AND FUTURE WORK
We presented an ABMS-based evolutionary architecture designed for the evaluation and optimization of ATM systems. Our proposal allows to carry out several differentiated optimizations, involving multiple different optimization variables, and relying on several trajectory-and human-based metrics to assess the goodness of a solution. The optimization is based on an ABM model which allows to simulate both aircraft and humans (ATCOs) involved in the ATM procedures. Experimentally, we have shown that our architecture allows to carry out optimizations of real-world scenarios in a reduced amount of time, thus placing itself as a versatile tool for what-if analysis in the contexts of ATM.
Currently ongoing work entails the identification of proper trade-offs between accuracy and timeliness for real-time applications (during the tactical phase), in order to allow the identification of better-performing operations in a bounded execution time, as an assistive technology to ATCOs. This kind of any-time optimization is currently being tackled by tweaking the time-window and the sectors involved in the simulations in the main loop of the evolutionary algorithm, and also in terms of the selection of a proper subset of the simulated flights.
MIQUEL ÀNGEL PIERA (Member, IEEE) received the degree (Hons.) in computer engineering from the Autonomous University of Barcelona (UAB), Barcelona, Spain, in 1988, the M.Sc. degree in control engineering from the Institute of Science and Technology, The University of Manchester, Manchester, U.K., in 1991, and the Ph.D. degree from UAB, in 1993. He is currently a full-time Professor with the Telecommunications and System Engineering Department, UAB. He is also the former Director of the Technical Innovation Cluster on Aeronautical Management and the former Director of LogiSim (research group on modeling and simulation of complex systems). His main area of expertise is modeling simulation and optimization of manufacturing, logistics, and transport systems, and he has been coordinating several H2020 projects in these areas (AGENT, PARTAKE, and E-PILOTS). He is contributing as the Scientific Advisor to the company ASLOGIC (www.aslogic.es).
GABRIELLA GIGANTE received the Ph.D. degree from the Department of Industrial and Information Engineering, School of Polytechnic and Basic Sciences, Second University of Naples, Italy. She is currently the Head of the Laboratory Intelligent Systems, Verification and Validation, Department of Safety and Security with CIRA. Her current research interests include intelligent systems and software certification aspects, intelligent systems verification and validation, safety aspects in innovative aerial unmanned platforms, and multi-agents systems.