Evolutionary Coverage Optimization for a Self-Organizing UAV-Based Wireless Communication System

Self-organized wireless networks based on unmanned aerial vehicles (UAVs) are receiving increasing attention due to their flexible deployment opportunities. Typically, UAVs are used as supplementary access points to existing ground infrastructure to extend network coverage and capacity. Therefore, finding the optimal position for each UAV is a crucial task that can be resolved by multiobjective optimization. In this paper, a self-organizing UAV swarm model is introduced as an evolutionary type of heuristic that can be adjusted in real time based on the network’s key performance indicators (KPIs). The exogenous dynamics of the wireless communication system are reflected by an end-user mobility model based on Ornstein-Uhlenbeck processes. To achieve versatility in the positioning of UAVs, a model of virtual forces is adopted to enable control of the movement of each UAV via a collection of persistent and mutable parameters of the system. Evolution is controlled by eleven strategic versions of parametric mutations, including a neutral benchmark process. The simulation results reveal that the suggested solution allows for a substantial increase in the KPI efficiency metrics relative to those of the nonmutated system. The results also addressed the presence of the Pareto front, where it is not possible to improve the selected KPIs without negatively affecting others. While existing research that focuses on evolutionary methods usually considers offline approaches, this work uses an online evolutionary approach that is able to solve the presented problem by taking the dynamic nature of the environment into account.

almost simultaneously [2]- [4]. Because it is impossible to avoid complex or unpredictable external inputs, choosing an effective model or set of controls to simulate or optimize UAV and swarm behavior is extremely critical. In any case, reparameterization and rule modifications have to be sufficiently robust to handle minor and rare swarm deployment scenarios.
Currently, UAV mission system rules are no longer created exclusively by humans and are thus on the borderline of complexity corresponding to machine learning results, which seldom match human knowledge representation [5]. Conversely, rules for departing from a given path in a known if-then format can be extracted from alternative mission data and neuro-fuzzy inferences in a manner compatible to human apprehension. This process is dominated by the usage of a framework of computer science called ''explainable artificial intelligence'' (xAI).
Such rule-based systems define a set of triggers and reactions that are tailored to a particular response to environmental stimuli. Triggers in the rules are implemented according to instructions that a human expert feeds into the system, often after offline tuning. However, in the case of unexpected changes in the environment, the predefined rules may not select the appropriate response actions. A typical example is the optimization of UAV deployment using conventional deep reinforcement learning (DRL), which, in general, requires a large number of training iterations in complex environments and has difficulties adapting to external stimuli. Thus, environmental variability creates a new challenge regarding changes to the original rules, for example, slow processes of mutational changes through online parametric evolution.
In this paper, we develop an approach that leaves the ''rigid'' laws of UAV motion defined by virtual forces (corresponding to specific types of effective potentials) that can be used to determine position in UAV swarm dynamics [6]. Specifically, we propose to efficiently create and modify virtual force models for suitable collective dynamic solutions to achieve technological stability in UAV swarm systems. For example, the stability of communication maintenance or collision prevention among UAVs is considered. This means applying virtual forces to a UAV swarm solution, which includes rules that further allow for stabilization effects with high UAV spatial density under self-organizing functionality.
As a target scenario, we consider a mobile communication system where UAVs are deployed as aerial base stations to provide coverage for end-users [7]. A swarm of UAVs must track the mobility pattern of users and adapt their flying trajectories to optimize coverage quality [6], [8]. Thus, we integrate an approach based on virtual forces with the specific target key performance indicators (KPIs) and define a course of action or direction in which parametric development should occur. This approach entails, among other aspects, the introduction of an additional set of connections across parts of the system and a more in-depth layer in the study of the scheme. Given the need to cover multiple KPI-related characteristics, our investigation includes heuristic learning and online multiobjective decision making to perform real-time parametric improvements. Within its strategic, combination-imposed constraints, the parametrized swarm acts as a single object. In other words, the swarm experiences gradual changes as a kind of heuristic learning entity exposed to a stochastic environment.
The main contributions of this article are the following: • We carefully analyze and examine certain characteristics of the KPIs of a wireless communication system that are permanently improved by a transformed model of the virtual forces that continually reshape the UAV swarm.
• We develop an end-user mobility model based on Ornstein-Uhlenbeck (OU) processes to represent the exogenous dynamics of the system. • We propose a form of swarm evolution that uses strategic mutation of the swarm parameters of the UAV fleet. The remainder of this paper is organized as follows. Section II covers state-of-the-art related research on UAV trajectory design. In Section III, we define a system model with the corresponding parameters and performance metrics. Section IV explains the representation of virtual forces in the target context. Section V presents computational learning paradigm inspired by evolutionary principles applied to a swarm of UAVs. In Section VI, we provide extensive simulations and performance analysis of the proposed solution. Finally, Section VII concludes the paper. Some auxiliary information and the computational details are provided in Appendices A, B and C.

II. RELATED WORK
The practice of evolutionary simulations has led to the belief that design and versatility enable the development of basic approaches to multirobot systems. Scientists have become interested in adapting offline approaches to various situations within an evolutionary framework. If sufficient information about a particular environment or its type is available, there is usually a portfolio of relatively offline optimization methods that can be used for optimization. On the other hand, robotic environments are usually in a continuous state of change, and the rewards also change unpredictably. Therefore, scientific analysis should reflect such environments.
In the field of evolutionary game theory, when we analyze the expected profits for a selected agent, we are often interested in the impact of the changing environment, including the volatile strategies of the opponents of the agent. Evolutionary techniques implemented in offline mode are easy to evaluate and implement because they can be effectively mapped to standard optimization tasks. However, it can be difficult to identify the appropriate procedure for tasks when the system is set up through online modes with a high degree of environmental uncertainty.
The trajectory design and control of UAVs (in particular UAV swarms) has recently received considerable interest. In general, few strategies exist to solve this specific issue. Traditional approaches that address this problem, such as mixed-integer linear programming, in which collision and avoidance constraints are encoded as binary variable constraints, have been introduced in, for example, [9]. Further works elaborated on this concept and extended the operating environment with additional constraints on the UAV swarm trajectory. Specifically, the authors in [10] considered limited communication links among the UAV entities forming the UAV swarm and used advanced convex optimization tools to design decentralized control laws for UAV swarm formation. The nonlinear dynamics for UAV swarm orchestration based on Lyapunov stability theory combined with the theory of algebraic graphs were presented by Han et al. in [11]. These authors focused on the evolution of the velocities of UAVs in the swarm and swarm stability in general.
Another promising research direction is the application of potential fields to control swarm stability and efficiency. Most notably, this approach allows for simple time and memory computational efficiency while scaling well with different sizes of UAV swarms. The authors in [12] proposed a model based on the potential fields generated by the sigmoid and normal functions. They proved the efficiency of the model using extensive computer simulations. Further, Galvez et al.,in [13], applied artificial potential fields to implement obstacle avoidance in a UAV swarm. Complex path planning with remote obstacle detection using collaborative learning among the UAVs participating in a UAV swarm based on artificial potential fields was presented in [14]. Recently, an intelligent cooperative mission planning scheme for UAVs in an uncertain dynamic environment using a hybrid artificial potential field was proposed by Zhen et al. in [15]. Potential fields have mathematical features related to the concept of virtual forces, which is the key to our approach.
Another feature of our current work, which is common to the unmanned aircraft literature, is the use of learning and evolutionary principles dedicated to the control and dynamic collision-free stability of UAV swarms. Notably, the application of evolution-based path planning for the UAV swarm trajectory, specially addressing the uncertainties in the dynamic environment, was proposed in [16]. A differential evolution solution based on the circle trajectory assembly algorithm was further proposed in [17] and is characterized by a significant decrease in the response time of individual UAVs compared to that achieved by traditional evolutionary algorithms, including particle swarm optimization (PSO). A similar solution using the B-Spline curve to describe a UAV path was proposed in [18].
Moreover, others have focused on the traditional evolutionary approach for the design of UAV swarms in terms of trajectory, height, and obstacle avoidance, such as [19]- [21].
Nevertheless, few approaches to UAV control design have focused on the specific metrics used to evaluate wireless communication networks. Position control and trajectory design of UAVs, which play the role of flying base stations, were proposed by Plachy et al. in [22]. These authors adopted genetic algorithms and PSO and optimized both the user association and the trajectory planning of UAVs. Next, Shi et al. [23] focused on the maximization of user coverage when UAVs are leveraged to relay data between base stations and users. These authors suggested a modified variant of PSO to achieve low computational cost and rapid convergence.
Some previous research addresses the dynamic optimization of UAV trajectories in wireless communication systems via offline training. Namely, deep reinforcement learning is of particular interest due to its capability to generalize over the large state space and its ability to leverage deep neural nets (NN). Hanzo et al. proposed several important contributions to progress the application of machine learning in this area. First, [24] proposed the application of deep Q learning (DQN) to train a multiagent UAV system by interacting with an environment consisting of end-users tracked by Twitter time-stamps. The optimization of flight-time constraints was further expanded by the authors in [25]. Simultaneous maximization of the network capacity and network coverage in 3D space was achieved by the conventional DRL in [26]. Nevertheless, DRL suffers from a well-known weakness, that is, the exploding gradient. The slow convergence of offline learning and the selection of an appropriate reward function could limit the practical application of DRL in this area [27].
In contrast to previous work, our research aims to provide a methodology based on virtual forces to improve the selected performance metrics of UAV swarms. We should emphasize that we do not consider the system in the equilibrium state; we believe that due to the dynamic activity of user equipment (UE) acting as an exogenous component, the system tends to operate in a nonequilibrium stochastic environment characterized by frequent disturbances and phase transitions. In such an environment, the traditional strategies fail to operate efficiently; thus, alternative solutions must be developed. The solution based on the existence of virtual forces proposed in this paper appears to be feasible, opening further research directions in this area.

III. SYSTEM MODEL AND FORMULATIONS
In the following subsections, we present the basics of the system model considered throughout this paper, the characteristics used to evaluate the system and the corresponding algorithms, and the model for end-user mobility, which acts as an external stimulus for UAV adaptation.

A. SYSTEM MODEL
In this subsection, we introduce and explain the uplink mode of the UAV-assisted wireless communication network. We assume that we are deploying (N D + 1) UAVs in a given region to support end-user uplink communication. The reason for deployment is missing/damaged ground infrastructure or overused ground infrastructure. UAV-assisted wireless communication systems are mostly deployed in a single tier scenario, which is well suitable for the typical environmental monitoring and disaster recovery applications. However, for ultradense urban deployment, such communication systems are significantly constrained by the buildings and street layout of the city. To be reached by UEs, UAVs should maintain a reasonable flying altitude, which can be much lower than the height of typical modern skyscrapers. In such conditions, the existence of reliable backhaul connectivity is challenging due to signal shadowing and unpredictable reflections. Thus, we assume that multitier drone deployment can facilitate the adoption of UAV-assisted communication systems in large cities. Therefore, we introduce a hierarchical architecture consisting of two layers, D and C. When a single central UAV operates at low speed and at a higher altitude (C), it can maintain a stable line-of-sight backhaul to the terrestrial network infrastructure. Thus, C collects an information stream from UAVs that are located at low altitudes (D). These UAVs are characterized by higher velocities in the D plane. Note that in the near future, we expect costfriendly satellite backhaul connectivity (e.g., Starlink), which will extend the flexibility of the deployment [28]. Future variants of presented model could place the centralized drone at even larger heights typical for high altitude platforms (HAPs), to fully exploit connection to very low earth orbit (VLEO) satellites such as those used in Starlink. HAPs are able to provide the link between low altitude UAVs and VLEO satellites with high (higher than lower altitude solutions) number of line-of-sight VLEO satellites at any given time [29]. Thus, hereinafter, we will adhere to the model presented in Fig. 1.
The presentation and analysis of the model begins with a relatively general geometric perspective. At the abstract level, planes or points dominate as geometric elements. Specifically, the planes are referred to as E, D, and C. The pair relationships (E → D), (D → C), and (C → D) are understood as information flows in our model. These labels distinguish three functional levels: E, end-user level; D, level of small UAVs that form the swarm; and C, level with a single central UAV. We suppose that the geometry of E, D, and C is always planar, and we consider only the transmission in the uplink mode E → D → C, as depicted in Fig. 1. In this mode, the end-user transmits a signal toward the small UAVs that is then transmitted to the central UAV. In such a layered architecture, a control scheme based on virtual forces is applied to maintain a specific QoS-level between the UEs and dedicated signal-to-interference-plus-noise ratio (SINR)/signal-to-noise ratio (SNR) on the wireless links among UAVs. We have chosen the SINR as one of the key performance indicators used in this work, as the other indicators in other works [30], such as throughput or received VOLUME 9, 2021 signal strength indicator (RSSI), are either dependent on SINR or are themselves factors influencing SINR. Now, we introduce the mathematical ingredients of the proposed model. The system E → D links are characterized by the matrix SINR ED,jz of the signal-to-interference ratio between the end-user z ∈ {1, 2, . . . , N E } and the small UAV j ∈ {1, 2, . . . N D }. On the other hand, D → C links are characterized by the SNR DC,j . Note that the interference is not present for D → C transmissions because we assume that small UAVs utilize separate spectrum bands for transmission toward the central UAV.
The fundamental geometric properties of our model can be briefly summarized as follows: • The central UAV operates at the heights h C > h D . The 3d-coordinates of the single central UAV are • Small UAVs (D) operate at the height h D ; the 3d-coordinates of the UAVs are • The end-users (layer E) are localized at the instantaneous positions: The heights h C and h D are regarded as specific parameters that change very slowly.
In this case, communication between user z and UAV j is characterized by the signal-to-interference-plus-noise ratio where G ED,j,z represents the channel gain for the effective signal of an end-user with power P E . The sum in the denominator over G ED,... can be interpreted as signal interference. For simplicity, we assume that all users are transmitting with the same power P E , and the signal quality is affected by the thermal noise power P th . Note that using a unique P E for each user will lead to ambiguous results because it becomes unclear whether the KPI improvement is due to the better placement of UAVs or the optimized transmission power of UEs. Moreover, in case of variable P E , the computational complexity of the model will increase exponentially with the number of UEs. Thus, by fixing P E , we can purely study the influence of the dynamic UAV selforganization on the network KPIs, while keeping the feasible number of degrees of freedom. In such a case, the freepath loss between an end-user and a UAV is expressed as is used to express G ED,j,s by means of η G , which is in the range between 2 and 4. The noise power P th can be expressed as the product of the bandwidth B E and the Boltzmann signal noise power spectral density N 0 . In the rest of this paper, we restrict considerations to η G = 2.7, which corresponds to the mean path loss exponent used in log-distance path loss (LGPL) signal propagation model. The value of 2.7 falls in the typical range of path loss exponents values used to model the effects of urban environment [31], [32]. With further specification of the problem, we add a relation for the instantaneous mean value at the position of j-th drone where α σ,s is the activity measure of the s-th user in terms of UAV-mediated network usage. The σ index of α σ,s indicates that these end-user activity descriptors are constrained to the interval [0, 1]. The way activity is expressed here is similar to mean-field techniques as well as membership functions for fuzzy sets of active users. This type of noninteger description implies that changes are faster than typical end-user observation times. Being more active means recording more events with unit activity per unit time. The normalization is performed by means of Z E,α = N E s =1 α σ,s . It is used to achieve the numerator compensation, where α σ,s serves as the weight of the mean. Normalization is motivated primarily by the requirement that a reduction in the mean activity should not directly imply a reduction in the mean SNR. Note that the I E,s term in the relation above that represents interference effects is specified in more detail below [see (7)].
Hereinafter, instead of P E , we use the rescaled form In this formula, λ ED is the transmission wavelength. The merit of using theP E combination is clear: there is no need to evolve P E , λ ED , B E and N 0 as separate parameters. In particular, the technological issues are regarded implicitly. Intracellular multiplexing is assumed to be the means of operation, thereby eliminating the interference of E → D links. The j link,z (t) link represents the momentary mapping from z ∈ {1, 2, . . . , N E } onto j link,z (t) ∈ {1, 2, . . . , N D }. The reconnection dynamics involved in j link,z (t) (for all z alternatives) are specified in the scheme of Algorithm 1. The main feature of this algorithm is that communication links are distributed according to the combinations of activities and mutual distances between objects at levels E and D. The procedure used in this algorithm involves repeated random selection of pairs of users. Each time, after the initial selection, the more active user in the pair is preferred. Immediately afterwards, the ability to connect to the drone is tested. If the randomly selected drone does not have available capacity, it is assumed that the connection will take place. However, if there are other drones with available capacity or prior links, then the drone with the smaller separation distance is always given preference over the more active user. Finally, the capacity is recalculated and updated after the changes.
It is now necessary to discuss the problem of interference in greater detail. Interference is reflected by the SINR ED,j -characteristic, which is given by A technological problem included in the following relation can be seen in the way of updating the uplink communication sessions. The relation involves, on the one hand, the factor 0 as a suppressed intracell interference and, on the other hand, the factor 1 as intercell interference, accounting for the corresponding In what follows, we consider the computation of the signal-to-interferenceplus-noise ratio for E→ D links SINR ED,j link,z . Since the separate processing of each pair z, j does not yield any information about the system, but only local state details, a suitable aggregation of data for all users is used As the ratings are crucial for the choice of service, the providers must be fair at the end-user level. The quality differences can be modeled by SINR ED,j link,z . When uniform bandwidth approximation is considered, the global throughput fairness can be defined using Jain's fairness index as follows [33] The value calculated by (10) reflects the fairness of the throughput distribution among all users and acquires values from 1/N E , representing the least fair throughput distribution, to 1, representing the most fair throughput distribution. Throughput of individual users is calculated using the Shannon-Hartley theorem. Calculation of an expected value of SINR that replaces information about SINR of individual users is naturally an informationally lossy process that is also sensitive to outliers. The fairness index fulfills an important role of preserving the SINR inequality information.
In the idealized case, when there is no interference, the SNR can be used for D-C connections in the following form where 2 is the respective distance and  if ( d ED,j orig ,z rnd > d ED,j alter ,z rnd ) then j link,z rnd ← j alter n link,D,j alter ← n link,D,j alter + 1 n link,D,j orig ← n link,D,j orig − 1 (actual numbers of links) end if end while is the effective transmission power scaled by the bandwidth B D ;λ DC is the effective transmission wavelength. Moreover, P D is subjected to mutation and possible improvements. The mean is selected to evaluate the characteristics at the central drone C. Throughout this paper, the metrics SINR ED,j , SINR D and SNR C serve as the main KPIs to evaluate the performance improvement of the proposed method.

B. ORNSTEIN-UHLENBECK FORM OF THE MOBILITY OF END-USERS, INFORMATION FLOWS AND SELF-ORGANIZATION
To simulate the dynamics of the EU, we used Ornstein-Uhlenbeck (OU) processes that are able to represent several VOLUME 9, 2021 of its relevant aspects at some stylized level [34]. Importantly, OU can be characterized by three main features that can be used as a basis for mobility modeling. The first is its intrinsic simplicity and similarity to random walks over a short period of time. The second is the spatial quasicontinuity of the trajectories, similar to random walks and time series processes. The coincidental returns to some favored areas with moves occurring at longer time intervals comprise the third feature. Other mobility modeling approaches, such as Lévy flights, lack the third feature, which enhances the realism of mobility. As a result, in both spatial and activity contexts, the primary goal of using OU is to represent end-user returns to the vicinity of previous states.
In our particular case, we are dealing with a situation where end-users have moving positions that are tied to local static equilibria (x E,eq,z , y E,eq,z ), for z = 1, 2, . . . , N E . In the case of using OU processes, we consider a stochastic system of the independent equations The impact of equilibria is described by the γ E > 0 parameter. Equivalently, the typical time scale is determined by 1/γ E , which characterizes drift to the respective position (x E,eq,z , y E,eq,z ). The uncertainty reaches the level of the users through the pair of white-noise independent Gaussian processes ζ x E,Gauss,z (t) and ζ y E,Gauss,z (t). We suppose that the OU is general enough to allow us to consider OU simulations for the activity variables as well. The mobility model is thus extended by an activity process in the form where the proportion of active end-users is given by The logistic function σ (x) = 1/(1 + exp(−x)) transforms the auxiliary real-valued activity measure α E,z into [0, 1]. A shared constant equilibrium parameter α σ,eq ∈ [0, 1] is also included in the model to ensure activity recovery. The reason for having two activity variables, α E,z and α σ,z , is that while α E,z (t) corresponds to OU-type dynamics, α σ,z (t) ensures that activity is within the [0, 1] range. The randomness again comes from a specific independent Gaussian process ζ α E,Gauss,z .

IV. REPRESENTATION OF VIRTUAL FORCES
In this section, we discuss the rules of motion within the concept of virtual forces. Swarm systems described by virtual forces are basically large systems based on many interconnected rules. The behavior of UAVs, more precisely that of their swarms, is conditioned by dozens of parameters. We can assume that appropriate parameter settings will be important for the improvement of KPIs. This approach should certainly be implemented taking into account uncertainty in environmental fluctuations. Virtual forces used in our model of UAV mobility provide various features, such as smooth trajectories, gradual acceleration and deceleration, mutual collision avoidance by repulsive forces, and stable swarm cohesion by mutual attractive UAV forces, that also help us to avoid problems with simulation space boundary conditions present in some of the simpler mobility models. UAV mobility also directly and purposely contributes to improvement of qualitative metrics of the network. Simpler but widely used UAV mobility models often lack some of these realistic features and properties. Our UAV mobility model most closely matches Topology-controlbased mobility models, the most advanced class of mobility models presented in [35], but it also includes some aspects of time-dependent and group mobility models described therein.
We should also note that our model abstracts away the problem of limited UAV range. One way to separately solve the problem of limited range of individual UAVs due to limited capacity of their batteries could be the introduction of redundant UAVs. The number of UAVs that are part of the swarm could thus be fixed and the UAVs that would be close to battery depletion could be replaced by other redundant drones, before leaving the swarm for automated recharging or battery replacement. The UAV replacing the one that would leave the swarm could almost perfectly continue the trajectory of replaced UAV, with swarm behaviour closely approximating the idealized swarm without battery capacity restrictions.
As different scenarios and system models have different requirements for the complexity of UAV mobility models [35], we should also describe such mobility modeling requirements in our simulated scenario. In our work, UAVs do not have any other mission objectives apart from coverage quality improvement, and compared to other works that implement the flying ad hoc network (FANET) with relay nodes and use, for example, Lévy flight mobility [36] or a task-dependent mobility model [37], our system model has much lower requirements on mobility management than relay-node-based FANET and scenarios in which UAVs may have additional mission tasks apart from coverage optimization. In our model, the relay-based network is replaced by direct backhaul connection of lowaltitude UAVs to high-altitude drones. Without additional tasks, only relatively slow changes in swarm topology are necessary. To conclude, our UAV mobility model should be sufficient for this application, as it significantly exceeds the simulated scenario's mobility modeling requirements.
In the representation of virtual forces, the motion of the central UAV is affected by that of the N D small UAVs. Their (x,y)-coordinates (x D,i , y D,i ) may be perceived as attractive for C under the constraint z = h C . The in-plane force components (F x C , F y C ) of the respective virtual force can be 145072 VOLUME 9, 2021 written as In this expression, k CD > 0 is an elasticity parameter that causes a basic form of attraction.
The determination of small UAV trajectories is more complicated. The UAVs are confined to the x-y plane with a fixed out-of-plane component z = h D . We assume that the in-plane virtual forces F D,j = (F x D,j , F y D,j ) acting on the j−th UAV can be decomposed into partial contributions, as follows There are four basic interactions determining the types of forces present in (18) that act on the j-th UAV and result in total force F D,j . User locations, in the lowest (ground) height level, are independent from any force interactions. There is, however, a force interaction F tar,DE,j between UEs and UAVs that influences only UAV locations. UAV locations are also influenced by mutual force interactions F DD,j with other UAVs, F DC,j with the centralized drone and F DM,j force interaction with stabilizing coordinates. Objects in the highest and the lowest height level interact with UAVs located in middle level due to their proximity. This mobility model resembles interaction of particles with movement constrained to a single plane and with multiple such parallel planes for three sets of particles. The analogy is further strengthened by the calculation of forces based on the adapted Morse potential as described by (21), which was originally developed as a model for interatomic interaction. The effect of pairwise interactions is described by where r D,j = (x D,j , y D,j ), k DD > 0 is the respective parameter and a D , b D are positive real-valued parameters. The interaction depends on the UAV-UAV distance To incorporate the inter-UAV functionality of mutual collision avoidance, it is helpful to define a function of the general parameters a, b, and d. The function is adapted from the pairwise Morse interaction formula (with a predisposition to prefer certain distances d ∼ b) [38]. Thus, (21) influences the direction and magnitude of the inter-UAV interaction forces in 19, resulting in their growing attraction (up to a certain point) when their mutual distance increases beyond the preferred distance and their growing repulsion, as their distance decreases below the preferred distance.
To coordinate the actions of the small UAVs with those of the central UAV, an additional assumption must be included in the model. Small UAVs should follow the central UAV, more precisely, its effective projections into plane D. The effect is described by with a new elasticity parameter k DC > 0. The interaction must inevitably reflect the influence of the end-users characterized by (x E,z (t), y E,z (t)) instantaneous positions. The coupling requires the introduction of an additional parameter k DE > 0 common to virtual forces [39] The planar coordinates of the local targets are suggested to exploit the local estimates of user density. The auxiliary weights W j,z depend on the squared Euclidean distances d 2 ED,2d,j,z = (x D,j − x E,z ) 2 + (y D,j − y E,z ) 2 of the projections of the links between the UAVs and end-users. The form of density estimator-based local targeting taken here is from the Nadaraya-Watson structure of nonparametric estimators [40]. There is also a radial basis idea that can be utilized as a template. In (23), the k DE parameter determines how fast the drone will move toward the local target, which changes continuously. The specificity of (24) is that the weight value is proportional to the level of activity, which captures the fact that more passive users will join drones infrequently.
The sharpness of responses to end-user positions depends on the parameter k T ≥ 0. The local targets of UAVs are suggested to capture the local attraction enhanced by the activity α E,z . The parametrization implies that k T −1/2 represents the characteristic length scale estimate of enduser density variations. In the limit k T → 0, the local character wanes, and only the single systemic target (center of mass) remains. A single target located at the center of mass illustrates the trivial case of the weighted average with constant weights. In this case, k T → 0, which means the influence of d ED,2d,j,z no longer takes place but is directed to VOLUME 9, 2021 the single target point, although the accumulation of UAVs at a single destination is not allowed because of the imposed virtual repulsive forces.
Virtual forces should take into account various operational unwanted events. For some unexpected reason, a swarm of UAVs may move outside its area of operation, which can impair the communication performance and the organization of the system. In such situations, global stabilization constraints can be useful. Constraints can be implemented through the control coordinates (x DM , y DM ), which serve as a reference point for the action of the corresponding virtual force contribution where x DM , y DM and k DM are model parameters. This type of structure functions similar to an anchored spring, directing all of the drones to a single location. However, because the other forces do not stop acting, there are in fact almost no collisions. Forces in and of themselves are insufficient to cause movement. Some rules of motion are required, analogous to the motion rules in Newtonian mechanics. Forces alone are insufficient to move an object. Some motion rules are required, as in Newtonian mechanics.
In our case, we prefer a simple version of the classic mechanics rules without inertial effects. The absence of virtual mass in the rules corresponds to an overdamping regime that is free of so-called overshooting effects. Furthermore, we assume that the dynamics are constrained to planes D and C. To avoid creating any misunderstanding, we emphasize that virtual forces are only a calculation tool for active movement and are not identical to mechanical forces; this also applies to drone parameters compared to parameters in motion rules. In such cases, the movement of UAVs can be represented by the system of 2 N D + 2 ordinary differential equations In regard to virtual forces, the causes of movement are placed on the right-hand sides in these rules. Since the virtual forces themselves contain many parameters that can be adjusted or optimized to improve the characteristics of UAVs, we set all of the force-to-velocity transformation factors on the right-hand side of the previous equations to be equal to one. It should be pointed out that the problem of the physical (technical) dimension is easily solved in this case by formally assigning a typical velocity dimension to the virtual force.
Given the force parameters k DD , k DC , k DE , there remains sufficient flexibility free from parametric redundancy. Note that the movement of UAVs occurs using a system of equations [41] that are indirectly related to the stochastic environment.

V. DECISION MAKING-PARAMETRIC CHANGES
In fact, the paradigm of the virtual forces is equivalent to rule-based approaches. Depending on the environment (here, the end-users), the parameters in the forces or rules may become more or less appropriate. Their suitability and changes must be inferred from higher-level rules for slowly evolving parameters. The entire virtual force (rule-based) architecture generates UAV trajectories in the context of enduser motion. In fact, a swarm of UAVs forms trajectories with the statistical properties of telecommunication characteristics that can be estimated from them. Therefore, we need to consider the statistics of the memorized trajectories to decide the actual validity of the movements in parameter space. In this section, we give an example of how the entire estimation and movements can be performed and interconnected.
To avoid hasty decisions under the influence of excessive volatility present in the evaluation, we propose the use of time-domain signal filtering. We believe that the identification of temporal trends in the estimated behavior of both the first and second statistical moments in the decisionmaking process under fluctuating conditions with respect to historical simulated data will provide better reliability in terms of successive decisions.
The instantaneous KPIs were used to construct statistical moments of the first and second order, for example, by means of SINR D or SINR 2 D , the averages in the sliding window, according to where N me05 ≡ floor(N mem /2) is half of the total memory/filter depth N mem . We consider two alternatives: The heuristic strategy called win-stay lose-shift is used for parameter evolution throughout this work. This strategy, as its name implies, either does not change the agent's behavior if it previously resulted in a win, or it changes this behavior if it previously resulted in a loss. In the dynamic evolutionary context, the selection of behavioral change is achieved by means of parameter mutation. According to the possibility of applying small gradual changes, from the evolutionary perspective of parametric mutations in accordance with the mentioned basic possibility of ''loseswitch'' in the following specification, we distinguish two types of parameters. For mutable parameters that allow changes, the moment of parametric change is marked as ''switch''. This aspect is consistent with perspectives from biology and economics that posit parametric changes follow an undesirable degradation (i.e., ''loss'') of quality in terms of mean characteristics based on time-filtered data. The constant (persistent) parameters encode the type of information we have about our simulated system that will not change with certainty; therefore, the parameters are common to all the environments we consider. On the other hand, the environment can change slowly, and the changes can have parametric consequences. To avoid destabilization of the overall dynamics mediated by virtual forces, we assume that mutational (i.e., ''lose-switch'') changes are much slower than variations in UAV positions. Furthermore, in the case of losses, there is a massive number of candidate responses. Therefore, it is useful to further define the responses by introducing parametric boundaries to ensure that the simulation operates within an acceptable range of behavior.
In our model framework, we have the following particular parameter classification: • Persistent parameters: As in similar studies (e.g. [42]), it is very difficult to maintain a sufficiently general interpretation of simulations, as we perform them only for some selected parameters. The list in Table 2 was preceded by some tests to provide sufficiently strong effects in the KPI characteristics. This is probably the most common way to ensure that we are in the relevant parametric domain. • Mutable parameters: For further considerations, it is useful to introduce the label MPar common to all alternatives In this list,P D , h D , h C are directly related to the KPI indicators. Slowly varying geometry is considered in h D , h C . The virtual forces depend on x DM , y DM , a D , b D , together with the analogues of the ''elasticity'' parameters k DD , k DC , k DE , k CD , k T (as in the case of deformable objects). These parameters change according to the principles of online parametric evolution; however, they must vary more slowly than the fast variables, such as the UAV coordinates. The formalism we used does not emphasize their functional time dependence. Now, consider the simulated decision for the time horizon N mem . We want to test a swarm of drones based on user requirements, mimicking the environment as a competitor in a simulated tournament. We settled on the term ''tournament'' as an appropriate way to convey the ''win-stay, loseshift'' scenario, borrowed from evolutionary game theory. The connection we found offered us an interpretation of the KPI as a swarm score relative to its environment. By adopting this interpretation, we arrived at considerations about tournaments and strategies. As a result, this insight facilitates the intersection of different research directions at the conceptual level.
The quantification then determines how necessary it is to modify the originally simple concept of '' win-stay, loseswitch'' for a relatively complex swarm of UAVs. We are interested in whether evolutionary parameters can also create TABLE 3. Initial values of MPar init and MPar parameters, which are allowed to mutate slowly. As the natural dimension of k DD , k DC , k DE , k CD , k DM is s −1 , the corresponding virtual forces will have the dimension km s −1 . The integration will then convert the virtual forces to the spatial displacements. Such a formulation simply avoids the use of classic weight units as the inertia terms are not considered.
improved combinations. Undoubtedly, the main premise of the evolutionary heuristic view of learning is that strategic mutational movements are able to shift swarm parameters toward the improvements identified by the three key KPI characteristics of the wireless communication domain. More precisely, if we include fluctuations, the improvements should occur mainly within the six statistical moments that are used for multiobjective KPI monitoring.
At this juncture, we enter the decision-making phase, which is based on the observation of the system's macrostates. The swarm macrostate describes the system configuration observed, which can compute the macroscopic perspective, i.e., systemic characteristics. The approach processes the KPI aspects of the environment mentioned. Future decisions then have quantitative consequences for the parameters capable of mutation, which may change after testing the inequalities assessing the improvements according to the defined indicators Ind i type ,mut,i order = 1(AM i type , I , i order < AM i type , II , i order ), where 1(. . .) is the indicator function with binary outputs that treat the logical inputs 1(True) = 1 , 1(False) = 0. Formally, all the indicators can be covered by a symbol Ind i type , mut, i order , where i type ∈ {D, C, F}, and i order ∈ { 0, 1}. Thus, the information can also be organized into binary 6-tuples Ind D,mut,0 , Ind D,mut,1 , Ind C,mut,0 , Ind C,mut,1 , Ind F,mut,0 , Ind F,mut,1 .
Here, the enumeration and meaning of the components are consistent with (31). Two constraints (k = 1, 2) are imposed on the weights w D,mut,i w ,k + w C,mut,i w ,k + w F,mut,i w ,k = 1 , which preserve the sums, except for the i w = 0 strategy corresponding to the nonmutating reference. The peculiarity of strategy i w = 10 stems from the fact that it describes the weight system with components selected from rational numbers. Each i w ∈ {1, 10} is suggested to quantify the indicators of KPI trends. The mutations are determined by the pair of predecessors R mut,0 and R mut,1 . These results are transformed below into the respective mutation amplitudes r MPar ∈{ rP D , r h C , . . .} [see (36) (34) are obtained. These parameters include the pseudorandom part Rand _Unif [−1,1] MPar generated uniformly within [−1, 1]. The subscript of Rand_Unif MPar emphasizes the parametric correspondence. Due to concerns that evolution may destabilize the integration of differential equations, the parametric boundaries Bnd MPar,down and Bnd MPar,top are introduced. Thus, the parametric mutation is given by for the specification of r MPar by means of R mut,0 and R mut,1 calculated using (30). Here, the differences in R variants might offer more contrasting results for MPar new,interm .

A. PRELIMINARIES AND SIMULATION SETUP
The complete protocol for simulation and the algorithmic blocks represent different perspectives of the description (see Tab.4). The full simulation is summarized in Algorithm 2: its principles and components are discussed in this section. As already mentioned, OU processes with a fixed equilibrium system of coordinates are used to describe the activity of end-users. The equilibrium positions are shown in Fig.3 equations. The stochastic OU process was simulated using the standard Euler-Maruyama method, while for the path integration, as a starting approach, we used the elementary Euler approximation scheme. 1 Numerically, however, a hybrid extension exists in which the virtual forces are applied only in combination with the constraints (x j,D (t + t) − x j,D (t)) 2 + (y j,D (t + t) − y j,D (t)) 2 ≤ d 2 crit,sh defined by the critical spatial shift parameter d crit,sh . The constraints allow for improvement in the stabilization of the UAV motion when exposed to parametric changes. The sequence of the repeated triplets of blocks (IG_E, IG_DC, IG_Lα) continues with the  calculation of KPIs. The mechanism of slow parametric changes adjusts the model behavior to the emerging new conditions. It depends on the decision denoted as DM_. . . . The decisions are given by a sequence of algorithmic blocks that process the current information about the average values (AM . . .) in a sliding window of the extent N me05 . The abbreviation DM_AI and function are derived from the combined use of AM ... and Ind ... . The alternative DM_WR uses weights to determine R mut,0 and R mut,1 (see (30)). The block DM_RX uses r MPar , taken from (36), to stochastically update the parameters represented by MPar new,interm → MPar new → MPar according to the mutation scheme given by (35). To achieve the desirable precision of the mean characteristics (KPIs), the simulation was restarted N rep times for the parameters listed in Tab. 3. The results were analyzed following common methods for Monte Carlotype techniques. The corresponding nonequilibrium averages were calculated for the selected number of mutation steps st mut associated with mutations epochs MEp(.).
Repeating the simulation of UAV swarm initialization (see Algorithm 2 and Table 2) affords an opportunity to obtain the dependence on MEp(.) and the strategies. The outputs can be organized in accordance with the selected KPI indicators, supplemented by the MEp(.) and i w indices. The evolutionary-neutral benchmark strategy i w = 0 is used to obtain the relative averages (relative improvement compared to the nonevolving system) rel SC (i w ) = SNR C (i w )   The selection of the top ten unicriterial outcomes stems from the sorting of rel SC . Without undermining the level of fairness, the situation is almost six times better. We have indicated, on the basis of a certain order, the importance of i w = 4 (the largest item in the column).
We represent a baseline scenario in the simulation scope as a UAV swarm where the parameters do not mutate but remain constant. Table 6 indicates a substantial balance between the effect of the KPI properties. The inequalities in their impacts are considered determinants of our methodological focus. Consider the case of MEp(.) = 14, i w = 1, which is contrasted to a nonevolutionary condition. The rel SD result shows an increase of 80%, with a 22% gain in rel SC and only a 2.8% loss in rel FE . In general, these differences can be explained by a multiobjective concept that assumes the existence of a Pareto front, that is, a compromise boundary identifying the geometric hypersurface in which we can TABLE 8. List of the ten best values in the sense of high rel FE . The exceptionality of strategy i w = 9 with a 2.7% increase in fairness compared the swarm behavior in the absence of evolution (i w = 0).

FIGURE 4.
The nonequilibrium represented by SINR D , SNR C , FAIR E . Calculated for i w = 1. The saturation becomes apparent at the critical MEp MEp crit 8. The characteristics are quantitatively and qualitatively very similar to those obtained for i w ∈ { 4, 9}. We do not need to know the specifics of the dependencies; we only need to see how they look in general. improve a selected KPI by degrading the remaining KPIs. Several other investigations, such as [43], have discovered that trade-offs play a similar role in cost-benefit evaluation. Regarding the simulated effects, the proposed solution has robust improvement capability compared to the evolutionaryneutral ( nonevolutionary) strategy i w = 0. The collection of the relative quantifiers provides important results in the corresponding additional Tables 7 and 8. Three outstanding strategies i w ∈ {1, 4, 9} are based on the ranking of relative characteristics. The dynamics of i w = 1 are depicted in Fig. 4. After an analogous examination of i w ∈ {4, 9}, we found that the relative differences between the effects of the strategies is visually difficult to discern in these plots, so we omitted them. In terms of applicability, strategies in the list have achieved SINR D = 7 dB and SNR C = 30 dB while However, there are also epochs of mutation that are extraordinary in terms of the characteristics of the swarm configurations. The narrowed range MEp ∈ {8, 9} can be regarded as anomalous or critical in the reaching of the Pareto front. Behind this is an inevitable deterioration of some KPIs being accompanied by improvements in others. The findings become clearer with a further view through Fig. 5. The figure shows the critical diversification of dependencies on MEp in the space of strategies. In particular, the figure highlights the fact that once a critical number of evolutionary changes, measured in MEp, is reached, we observe a rapid diversification in the system's response to different strategies. It is the separability of the upper and lower branches, which occurred only in the case of rel SD , that becomes the impetus for building clustering procedures for strategies.
The characteristics rel SC and rel FE do not exhibit late MEp ramification in the same way that rel SD does. Based on three complementary characteristics of the obtained subfigures, the critical point MEp crit = 8 corresponding to 794 mutations (see Table 5) can be estimated. By means of numerical simulations we have shown that the proposed method based on the application of virtual forces and parametric FIGURE 6. Comparison of KPI characteristics for different strategies along the MEp coordinate. The polylines are guidelines for the eye to distinguish the differences in the impacts of the strategies. The figure points to the fact that for values above MEp crit 8, the diversity of responses to mutation strategies increases. Thus, along the MEp coordinate, early (subcritical) and late (supercritical) epochs are distinguishable. In all three cases (a), (b), and (c) of the panels, we can recognize the ramification of characteristics. Panel (b) shows the transition from less to more diverse in the strategic space. According to polylines in panel (c), we can distinguish roughly two categories in terms of lower (i w ∈ { 1, 2, 3 }) or higher strategic branches (i w ∈ {4, . . . , 10 }). changes in mutation is capable of palpable improvement compared to the evolutionary-neutral system (with fixed parameters). Moreover, we have identified the parametric impact associated with the Pareto front.
Appropriate statistical characteristics and measures were identified to represent the simulation results with an emphasis on gradual parametric changes. Clearly, the evaluation had to be carefully planned in advance to assess the impact of the respective strategies. A statistical comparison of the strategies was conducted for the simulation of reference swarms, where the parameters do not mutate.
The evaluation of strategic choices indicated that depending on the conditions, the evolution of the swarm not only enhances the effectiveness of the selected KPIs but also causes their eventual worsening. In the long run, the simulated system tends toward the Pareto front by the actions of three simultaneously monitored KPIs and evaluated characteristics. The model offers a trade-off, which explains why many events with reduced KPI characteristics occur. The general bias induced by mutations, which tends to take on values greater than one, is a beneficial aspect, particularly in the case of rel SC and rel SD . Relative fairness values, on the other hand, are more equally dispersed around one.

VII. CONCLUSION
A computational learning paradigm inspired by evolutionary principles has been proposed to achieve the self-optimization of a UAV-based wireless communication system. The flexibility of instantaneous UAV movements is achieved using a combination of virtual forces and a set of persistent and mutable parameters. To reflect the exogenous dynamics of the system, a model of user mobility is implemented based on Ornstein-Uhlenbeck processes. Simulations are modeled for eleven different parametric mutation strategies. According to the results, a significant improvement in KPIs is achieved by the proposed approach compared to the evolutionaryneutral system. Furthermore, the results discuss aspects of the Pareto front, which is viewed as the set on which selected performance indicators can be improved without simultaneously worsening the other indicators.
In future research, we intend to consider theoretical potential bridges of the evolutionary learning heuristic proposals to existing machine learning architectures, particularly in the context of xAI framework development.