Enhanced Relative Localization Based on Persistent Excitation for Multi-UAVs in GPS-Denied Environments

Intelligent unmanned aerial vehicles (UAVs) have been applied for civil and military uses. Relative localization (RL) is crucial for multi-UAVs to accomplish complex tasks successfully and safely. In global positioning system (GPS) denied environments, where accurate or meaningful location information is hard to obtain, persistent excitation based RL is a promising approach for multi-UAVs to achieve RL without any need for external infrastructure. However, in many cases, the existing persistent excitation based RL method suffers from precision loss, error accumulation and divergence. This article tackles these issues and proposes an enhanced approach to make persistent excitation based RL usable in practice. Synchronized sensor sample prediction is introduced to confine and reduce RL error, and RL estimation is redesigned to avoid RL error accumulation. We evaluated our solution by simulating various scenarios. The results show that the proposed approach effectively decreases RL error and prevents RL error accumulation.


I. INTRODUCTION
An unmanned aerial vehicle (UAV) equipped with different sensors and actuators can perform various tasks. Owing to their autonomous control, low cost, flexibility and reusability, UAVs have been widely used in research, military and civil applications [1]. Recently, swarms consisting of multiple UAVs have gained more attention as a means to fulfill complex missions in dynamic environments [2].
Localization is one of the decisive factors of autonomous UAV navigation, and relative localization (RL) among UAVs is crucial for multi-UAV cooperation. Currently, the global positioning system (GPS) is still the main localization solution, but it is generally unreliable in indoor environments, urban alleys, mountain/forest areas and battlefields. For such GPS-denied environments, if the GPS signal is completely lost, the environment is considered GPS-Refused. Otherwise, if large localization error or intermittent GPS signal occurs, it is considered GPS-Restricted [3]. For both scenarios, extra effort is required to ensure accurate localization and/or relative localization of UAVs.
The associate editor coordinating the review of this manuscript and approving it for publication was Francesco Mercaldo.
To obtain location information in GPS-denied environments, some studies proposed solutions relying on external infrastructures, such as the global system for mobile (GSM) [4], radar [5], [6] and radio [7], [8], which provide references for UAVs to estimate their own locations. Although reasonable localization precision can be achieved with these infrastructures, it is sometimes unrealistic to build them in advance in the wild or on a battlefield, and their coverage is normally limited. Other work performed localization by utilizing pre-prepared knowledge, including digital elevation maps (DEM) [9]-[11], magnetic anomaly maps [12], and databases built through simultaneous localization and mapping (SLAM) [13], [14]. The main limitation of these approaches is that a significant amount of computation and memory resources is required for feature labeling, storing and matching. Moreover, the preserved knowledge is very sensitive to dynamic environment changes.
Relative localization in GPS-denied environments can be achieved by calculating the differences between the UAVs' global locations, which are broadcast and shared in the swarm. More directly, it can be done through radio ranging and visual positioning [15], [16]. Persistent excitation based RL estimation by ultra-wideband (UWB) ranging and velocity measurement [17]-[19] is among the latest work in this field, and its effectiveness has been validated through outdoor experiments. However, the proposed method did not thoroughly consider the conflicts among bounded sensor output frequencies, UWB bandwidth and UAV navigation speed, which result in RL accuracy loss and error divergence in some scenarios. Furthermore, when UAVs fly with the same velocity, e.g. flying in formation, unacceptable RL error accumulation can be observed.
In this article, we focus on the RL problem in GPS-denied environments, and introduce enhanced persistent excitation based RL that solves the existing issues mentioned above. For each UAV, RL related onboard sensors include the UWB module and the inertial measurement unit (IMU), used for measuring relative distance and velocity respectively. To improve RL accuracy, we apply sensor sample synchronization and prediction based on recent sensor measurements. This works despite the limited sensor output frequency and UWB bandwidth, and even in configurations where the original method diverges. Meanwhile, a novel equation for calculating RL estimates is proposed to eliminate RL error accumulation when UAVs maintain the same velocity. According to the simulation results, our approach leads to notable RL error reduction compared with the state-of-the-art, and effectively avoids RL error accumulation.
The main contributions made by this article are summarized as follows:
• We discovered two limitations of the existing persistent excitation based RL method and validated them by theoretical analysis and simulation evaluation;
• An enhanced approach of RL that is precise and practical for multi-UAVs is proposed;
• Sensor sample synchronization and prediction is integrated in RL to ensure its accuracy;
• Calculation of RL estimation is redesigned to prevent RL error accumulation.
The remainder of the paper is organized as follows. Section II summarizes recent research on RL for multi-UAVs. Section III illustrates issues of the existing RL methods that motivate this article. Section IV proposes the enhanced persistent excitation based RL approach, featuring sensor sample synchronization and prediction and the new equation for calculating RL estimates. Section V shows the evaluation results. Finally, we conclude in Section VI. For simplicity, only 2D space (i.e. multi-UAVs flying at the same height) is considered in this article.

II. RELATED WORK
We first summarize some recent work on RL in GPS-denied environments, and then discuss in more detail one of the latest approaches in this field, i.e. persistent excitation based RL.

A. RL IN GPS-DENIED ENVIRONMENT
As shown in Figure 1, RL approaches for multi-UAVs in GPS-denied environments can be classified into two categories.
For the first category, RL is achieved by following three steps: (1) each UAV obtains its own global location, (2) UAVs broadcast and share their global locations in the swarm, (3) each UAV calculates relative location accordingly. Among them, the first step is the most challenging. Many proposed solutions rely on external infrastructures, especially for GPS-refused environments. Hamer and Andrea obtained global position with the assistance of a ground anchor network consisting of UWB modules [8], as did other similar research [20]-[24], and Xu et al. expanded the anchor network by considering onboard UWB modules on UAVs [7]. For GPS-restricted environments, multi-UAV coordination has been considered to improve the robustness and accuracy of GPS [25]-[27]. Benini et al. fused IMU and GPS outputs, and improved positioning accuracy with an Extended Kalman Filter (EKF) [28]. SLAM has also been utilized, which generates navigation maps in real time for estimating a UAV's location [13], [14].
For the second category, direct measurements with various sensors are taken to estimate relative locations among UAVs. Maamar et al. combined radio ranging with visual direction finding to achieve RL [15], while Liu et al. combined radio ranging with velocity measurement [29]. Saska et al. adopted onboard cameras to estimate relative positions [16]. Nguyen et al. installed multiple UWB modules on UAVs, to estimate RL with the unmanned ground vehicles (UGVs) [30], [31]. Persistent excitation based RL to be discussed in the next section is another typical example.

B. PERSISTENT EXCITATION BASED RL
Persistent excitation based RL [18], [19] has been proven to be a promising way of estimating relative locations of multi-UAVs in GPS-denied environments. As illustrated in Fig. 2, we consider three UAVs. For UAV$_i$, its moving inertial frame $F^{M}_{i}$ is consistent with the global inertial frame $F^{I}$. By utilizing the onboard IMU and UWB modules, each UAV measures its velocity (e.g. the velocity $v_{i,t}$ of UAV$_i$ at time step $t$) and its relative distances to the others (e.g. the relative distance $d^{ij}_{t}$ between UAV$_i$ and any other UAV$_j$ at time step $t$), with a synchronous clock in the swarm system. Using the velocity information sent along with the UWB request message during UWB ranging, the relative velocity of any two UAVs can be calculated. Based on the relative distances and relative velocities obtained, the real RL $X^{ij}_{i,k}$ is estimated as $\hat{X}^{ij}_{i,k}$, which denotes the estimated RL between UAV$_i$ and UAV$_j$ at time step $k$. The communication of multi-UAVs forms a distributed network, such as in [32], [33], including air-air and air-ground communication, which deeply influences the performance of a multi-UAV system. In this article, we focus on the RL between different UAVs. The only data that needs to be communicated is the UAV velocity, which can be transmitted in the UWB request message. To keep the research focus clear, we do not further discuss distributed multi-UAV networks.
VOLUME 8, 2020
Let us consider two different UAVs in the swarm, e.g. UAV$_i$ and UAV$_j$. Taking the relative distance $d^{ij}_{k}$ and relative velocity $v^{ij}_{i,k}$ at time step $k$ as inputs, the RL estimate of UAV$_i$ to UAV$_j$ at time step $k+1$, i.e. $\hat{X}^{ij}_{i,k+1}$, can be calculated with (1), according to [18].
Here, $\dot{d}^{ij}_{k}$ is the change rate of the relative distance, $T$ is the sampling period, and $\gamma \in \mathbb{R}^{+}$ is the tunable convergence factor.
The process by which UAV$_i$ estimates its relative location to any other UAV$_j$ at time step $k+1$ in [18], i.e. $\hat{X}^{ij}_{i,k+1}$, takes inputs including:
• $v_{i,k}$: velocity of UAV$_i$ measured by the onboard IMU at time step $k$;
• $v_{j,k}$: velocity of UAV$_j$ that UAV$_i$ receives through UWB ranging at time step $k$.
It has been proven that if $T$ satisfies the condition shown in the following equation (2), the RL estimation error is upper-bounded by a constant $C$ [18].
$$T < \frac{1}{\gamma\,(2\bar{v} + \bar{\delta})^{2}} \qquad (2)$$
Assume that UAV$_i$ obtains its RL estimates to UAV$_r$ and UAV$_j$, i.e. $\hat{X}^{ir}_{i,k}$ and $\hat{X}^{ij}_{i,k}$ respectively, according to (1). Based on them, the RL estimate of UAV$_r$ to UAV$_j$ can be inferred, which is considered as indirect RL estimation, i.e. $\hat{X}^{rj}_{i,k}$. If both direct and indirect RL estimates are taken into account, an RL fusion estimate can be calculated with improved RL accuracy [18]. Enhancing direct RL estimation will benefit RL fusion estimation as well.
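As a minimal sketch of the indirect estimation step, the snippet below infers the location of UAV r relative to UAV j from two direct estimates held by UAV i. It assumes that an RL estimate is a 2D vector pointing from the first UAV to the second in a common frame, so the indirect estimate is a plain vector difference. The function name `indirect_estimate` is hypothetical, and the fusion weighting of [18] is not reproduced here.

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def indirect_estimate(x_ir: Vec2, x_ij: Vec2) -> Vec2:
    """Infer the location of UAV j relative to UAV r.

    Assumption: x_ir and x_ij are the locations of UAV r and UAV j
    relative to UAV i, expressed in the same 2D frame.
    """
    return (x_ij[0] - x_ir[0], x_ij[1] - x_ir[1])


# Example: UAV i sees UAV r at (3, 4) and UAV j at (10, -2).
x_rj = indirect_estimate((3.0, 4.0), (10.0, -2.0))
```

Any error in either direct estimate carries over to the indirect one, which is one way to read the remark that improving direct estimation also benefits the fusion estimate.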

III. MOTIVATION
After carefully studying and examining the latest persistent excitation based RL method, we discovered several limitations that affect its effectiveness in practice. In this section, we address them with theoretical analysis and evaluation results. These issues motivate the enhanced approach introduced in Section IV.

A. REDUCING RL ERROR
In order to satisfy (2) and thus keep the RL error below the constant $C$, the sampling period $T$ needs to be upper-bounded [18]. Its maximum value depends on $\bar{v}$, $\bar{\delta}$ and $\gamma$.
On the other hand, to estimate RL with (1), during each sampling period $T$, the relative distance and relative velocity between any two UAVs need to be obtained. Therefore, the lower bound of $T$ is mainly determined by three factors: (1) the number of UAVs in the swarm, (2) the velocity measurement frequency of the IMU, and (3) the UWB dialogue time required for UWB ranging, which measures relative distance and passes velocity information between any two different UAVs. Among them, the last two factors are sensor-dependent. Due to restrictions on cost, size and weight, onboard sensors normally have limited sampling rates. For example, an IMU with three single-axis accelerometers and three single-axis gyroscopes normally has a maximum output frequency of 100Hz. One of the most popular UWB modules, the PulsON 440 (ranging error less than 3cm), uses the two-way time-of-flight (TW-ToF) ranging method and has a maximum ranging frequency of around 125Hz, which is determined by its ranging dialogue time (no less than 8ms).

TABLE 1. The averages of RL estimation error $\|\hat{X}^{01} - X^{01}\|$, $\|\hat{X}^{02} - X^{02}\|$, $\|\hat{X}^{12} - X^{12}\|$ and $\|\hat{X} - X\|$ shown in Fig. 3 when UAV$_0$, UAV$_1$ and UAV$_2$ fly randomly and independently, with different settings of $\bar{v}$ and $\gamma$ ($T$ = 0.025s).
If the lower bound of $T$ required for measuring all relative distances and relative velocities of any two UAVs in the swarm exceeds the upper bound of $T$ given by (2), unacceptable RL errors will occur. Let us consider a small swarm of 3 UAVs, whose IDs are 0, 1 and 2 respectively. Assuming an 8ms ranging dialogue time and a minimum of 3 UWB rangings per sampling period, $T$ should be no less than 24ms. With $\gamma = 0.1$, $\bar{v} = 15$m/s and $\bar{\delta} = 0.5$m/s, equation (2) is no longer satisfied.
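The conflict between the two bounds can be checked numerically. The sketch below assumes that condition (2) has the form T < 1/(γ(2v̄ + δ̄)²), which is the bound quoted for T/M in Section IV-A, and that the lower bound is simply the number of rangings times the dialogue time. Both function names are hypothetical.

```python
def sampling_period_upper_bound(gamma: float, v_max: float, delta_max: float) -> float:
    """Upper bound on T, assuming (2) reads T < 1 / (gamma * (2*v_max + delta_max)**2)."""
    return 1.0 / (gamma * (2.0 * v_max + delta_max) ** 2)


def sampling_period_lower_bound(num_rangings: int, dialogue_time_s: float) -> float:
    """Lower bound on T when the rangings in one period run back to back."""
    return num_rangings * dialogue_time_s


# Three UAVs: 3 rangings of 8 ms each per sampling period.
t_low = sampling_period_lower_bound(3, 0.008)        # 0.024 s
t_up = sampling_period_upper_bound(0.1, 15.0, 0.5)   # about 0.01075 s
feasible = t_low < t_up                              # False: no valid T exists

# Lowering gamma to 0.03 raises the upper bound to about 0.0358 s.
t_up_small_gamma = sampling_period_upper_bound(0.03, 15.0, 0.5)
```

Under these assumptions the 24ms lower bound is more than twice the roughly 10.8ms upper bound, so no valid sampling period exists for γ = 0.1 and v̄ = 15m/s.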
Considering $T$ = 25ms, we evaluated the RL estimation error by simulation. Note that this value of $T$ is an optimistic assumption that makes it ideally small. In real applications, the sampling period is expected to be much longer, making (2) even harder to satisfy when $\bar{v}$ increases. Table 1 summarizes the averages of RL estimation error in one simulation with different configurations of $\bar{v}$ and $\gamma$, and Fig. 3 shows how the RL estimation error changes in a random flight simulation of three UAVs. According to Fig. 3 (a)-(c), with $\gamma$ unchanged, the RL error and its amplitude grow significantly as $\bar{v}$ increases. When $\bar{v} = 15$m/s, equation (2) no longer holds and the RL estimation error fails to converge, which is unacceptable. To satisfy (2) even when $\bar{v} = 15$m/s and thus confine the RL error, we reduce $\gamma$ to 0.03, as shown in Fig. 3 (d). Although this helps reduce the RL error, a smaller $\gamma$ also leads to slower RL error convergence compared with Fig. 3 (a) and Fig. 3 (b).
Our enhanced approach overcomes the limitation discussed above: it reduces the RL estimation error while maintaining fast RL error convergence. It is introduced in detail in Section IV-A.

B. AVOIDING RL ERROR ACCUMULATION
Relative movement among UAVs is essential to persistent excitation based RL, since it generates excitation and assists RL estimate correction. Based on the Lyapunov stability theory, it has been proven that the excitation exists only when the relative velocities $v^{ij}_{i,l}$ in $m$ continuous time steps (i.e. $l = k - m + 1, \ldots, k$) are not linearly dependent [18]. However, if multi-UAVs fly in fixed formation with the same velocity, $v^{ij}_{i,l}$ at $m$ continuous time steps are considered as 0 and become linearly dependent. Consequently, the excitation is lost. Meanwhile, as the velocity measurement error of the IMU (precision: 0.5m/s) is much larger than that of UWB ranging (precision: 0.05m), significant RL error accumulation may occur.
We also observed the RL error accumulation analyzed above by simulation. After three UAVs fly randomly in the first 60 seconds and then form a formation with the same velocity $\langle \sqrt{\bar{v}}, \sqrt{\bar{v}} \rangle$, $\bar{v} = 5$m/s, the already converged RL error starts to accumulate, as shown in Figure 4.
To avoid RL error accumulation, we redesigned the equation for calculating RL estimates, which will be further explained in Section IV-B.

IV. METHODOLOGY
In this section, we propose enhanced persistent excitation-based RL, targeting the issues discussed above. Considering the limited payload of a UAV, our method only needs a UWB module to measure relative distance and an embedded IMU to measure velocity. Specifically, we apply sensor sample synchronization and prediction to relieve the conflict between high flight speed and limited sensor sample rate, and thus ensure that the RL estimation error is always upper-bounded. RL estimate calculation is redesigned to prevent the RL error accumulation that happens when multi-UAVs fly with the same velocity and the excitation for RL error correction is therefore missing.
Algorithm 2 gives our enhanced RL approach, which estimates the relative location from UAV$_i$ to any other UAV$_j$ at time step $k+1$, i.e. $\hat{X}^{ij}_{i,k+1}$. Compared to [18], we introduce interpolation, interpolating and extrapolation to expand the sampling data set, and redesign RL estimation to prevent RL error accumulation. The input of the algorithm includes:
• $N$: the number of UAVs in the swarm;
• $P$: the number of sensor measurements required for performing sensor sample prediction;
• $M$: the number of sensor sample predicts generated at each time step;
• $d^{ij}_{k}$: relative distance between UAV$_i$ and UAV$_j$ at time step $k$ measured by the onboard UWB module;
• $v_{i,k}$: velocity of UAV$_i$ measured by the onboard IMU at time step $k$;
• $v_{j,k}$: velocity of UAV$_j$ that UAV$_i$ receives through UWB ranging at time step $k$.
The rest of the algorithm will be further explained in the following sections.

FIGURE 3. The curves of RL estimation error $\|\hat{X}^{01} - X^{01}\|$, $\|\hat{X}^{02} - X^{02}\|$, $\|\hat{X}^{12} - X^{12}\|$ and their average $\|\hat{X} - X\|$ during the simulation (from left to right respectively) when UAV$_0$, UAV$_1$ and UAV$_2$ fly randomly and independently, with $T$ = 0.025s and different settings of $\bar{v}$ and $\gamma$ listed in Table 1.

A. SENSOR SAMPLE PREDICTION AND SYNCHRONIZATION
As discussed in Section III-A, the sampling period $T$ is lower-bounded by the sample rate of the onboard IMU and UWB modules. Meanwhile, for multi-UAVs, increasing the flight speed reduces the upper bound of $T$ required to confine the RL error (shown in (2)). Given a predefined onboard sensor sample frequency, to ensure a valid $T$ exists for high-speed navigation (i.e. the lower bound of $T$ is no greater than its upper bound), we "increase" the sensor sample rate by generating new sensor samples through prediction based on the most recent sensor output history. Reducing $\gamma$, on the other hand, is not considered, in order to achieve fast RL error convergence.
UAVs' velocities are continuous and differentiable, and so are the relative velocities and relative distances among UAVs. Based on recent samples of velocity and relative distance measured by sensors, we adopt interpolation, interpolating and extrapolation techniques to expand and synchronize the corresponding sensor sample sets. An example is illustrated in Fig. 5. Classic interpolation can generate a piecewise linear curve, a Hermite curve, a cubic spline curve, and many others. Based on the generated curve, interpolating or extrapolation is applied to calculate sensor sample predicts. In our implementation, to reduce the overall complexity of the enhanced RL approach, we apply interpolation, interpolating and extrapolation on a piecewise linear curve. Generally, the more sensor outputs are collected and used for interpolation, the higher the precision of the sensor sample prediction, as Figure 5(b) shows.
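The piecewise linear case can be sketched as follows. This is an illustrative implementation rather than the authors' code: it evaluates the curve through the recorded samples inside the sampled range (interpolating) and extends the first or last segment outside it (extrapolation). The function names and the sample values are hypothetical, and timestamps are assumed to be strictly increasing.

```python
from bisect import bisect_right
from typing import List, Sequence


def predict_piecewise_linear(times: Sequence[float], values: Sequence[float],
                             query: float) -> float:
    """Evaluate the piecewise linear curve through (times, values) at `query`."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples with matching timestamps")
    # Index of the right endpoint of the segment; clamping reuses the first
    # or last segment for queries outside the sampled range (extrapolation).
    hi = min(max(bisect_right(times, query), 1), len(times) - 1)
    lo = hi - 1
    slope = (values[hi] - values[lo]) / (times[hi] - times[lo])
    return values[lo] + slope * (query - times[lo])


def predict_substeps(times: Sequence[float], values: Sequence[float],
                     t_start: float, period: float, m: int) -> List[float]:
    """Return the M - 1 sample predicts at t_start + r * period / m, r = 1..m-1."""
    return [predict_piecewise_linear(times, values, t_start + r * period / m)
            for r in range(1, m)]


# Hypothetical relative-distance samples taken every T = 0.025 s.
times = [0.0, 0.025, 0.05]
dists = [10.0, 10.5, 11.5]
inside = predict_piecewise_linear(times, dists, 0.0125)   # interpolating
beyond = predict_substeps(times, dists, 0.05, 0.025, 5)   # extrapolation
```

Setting `t_start` to the previous measurement time gives predicts inside the last sampling period, while setting it to the latest measurement time gives predicts for the upcoming one.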
In Algorithm 2, the RL estimate is calculated with the generated sensor sample predicts. After sufficient ($P$) samples of relative distance and relative velocity have been collected based on sensor measurements, interpolation is performed with the latest $P$ samples (see Line 9). We apply interpolating to generate $M - 1$ sample predicts of relative distance $d^{ij}_{k-1,r}$ and relative velocity $v^{ij}_{i,k-1,r}$ for the time step $k$ (see Line 10). The change rate of relative distance $\dot{d}^{ij}_{k-1,r}$ is calculated based on the gradient of the curve of relative distance (see Line 11). Then, the RL estimate $\hat{X}^{ij}_{i,k}$ is recalculated (see Lines 12-16). Next, we apply extrapolation to generate $M - 1$ sample predicts of relative distance $d^{ij}_{k,r}$ and relative velocity $v^{ij}_{i,k,r}$ for the next time step $k + 1$ (see Line 17), and calculate $\dot{d}^{ij}_{k,r}$ (see Line 18). Lastly, $\hat{X}^{ij}_{i,k+1}$ is estimated. Here, $P$ is determined according to the adopted interpolation algorithm, and $M$ is chosen to ensure that $T/M$ is upper-bounded by $\frac{1}{\gamma (2\bar{v} + \bar{\delta})^{2}}$.
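The choice of M follows directly from the bound on T/M. A minimal sketch, assuming the bound is strict so that M must exceed γ(2v̄ + δ̄)²T, is given below; the helper name `min_prediction_factor` is hypothetical.

```python
import math


def min_prediction_factor(gamma: float, v_max: float, delta_max: float,
                          period_s: float) -> int:
    """Smallest integer M with T / M < 1 / (gamma * (2*v_max + delta_max)**2)."""
    ratio = gamma * (2.0 * v_max + delta_max) ** 2 * period_s
    # floor(ratio) + 1 is the smallest integer strictly greater than ratio.
    return max(1, math.floor(ratio) + 1)


# gamma = 0.1, v_max = 15 m/s, delta_max = 0.5 m/s, T = 0.025 s
m_fast = min_prediction_factor(0.1, 15.0, 0.5, 0.025)
# With gamma = 0.03 the original sampling period already satisfies the bound.
m_slow = min_prediction_factor(0.03, 15.0, 0.5, 0.025)
```

For the first configuration the ratio is about 2.33, so any M of at least 3 meets the bound; the values M = 5 and M = 10 used in the evaluation leave additional margin.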

B. RL ESTIMATION REDESIGN
When multi-UAVs fly in fixed formation with the same velocity, calculating the RL estimate based on (1) will result in significant RL error accumulation, due to the loss of persistent excitation, as discussed in Section III-B. To solve this issue, the RL estimation process first needs to know when same-velocity navigation happens and the RL error starts to accumulate, and then corrects the accumulated RL error with real-time sensor measurements of relative distance $d^{ij}_{k}$. We design the operator $S_1$, as shown in (3), to detect whether UAVs are flying with the same velocity. Here, $p = 3/T$ is used so that a period of three seconds is considered. If so, $\sum_{r=k-p+1}^{k} (\dot{d}^{ij}_{r})^{2}$ will be close to 0, and thus $S_1$ approaches 1.
When the RL error starts to accumulate, we have $\big|\,\|\hat{X}^{ij}_{i,k}\| - d^{ij}_{k}\,\big| > \mu \geq 0$, in which $\mu$ is the threshold of the accumulated RL error. Across many simulations, we found that a smaller $\mu$ may allow RL errors to keep accumulating, while a bigger $\mu$ produces a sawtooth pattern in the RL error curve; $\mu = 1$m was the most suitable value. We design another operator $S_2$, as shown in (4). When both $S_1$ and $S_2$ approach 1, we consider that RL error accumulation occurs.
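The two conditions behind $S_1$ and $S_2$ can be illustrated with hard thresholds. The sketch below is not a reproduction of (3) and (4): it only tests the quantities named in the text, namely the windowed sum of squared distance change rates and the gap between the estimated range and the measured distance. The tolerance `eps`, the function names, and the use of the Euclidean norm of the estimate are assumptions made for illustration.

```python
from math import hypot
from typing import Sequence, Tuple


def same_velocity_suspected(d_dot_window: Sequence[float], eps: float) -> bool:
    """True when the sum of squared distance change rates over the window is near 0."""
    return sum(rate * rate for rate in d_dot_window) <= eps


def error_accumulated(x_hat: Tuple[float, float], d_measured: float,
                      mu: float = 1.0) -> bool:
    """True when the estimated range differs from the UWB distance by more than mu."""
    return abs(hypot(x_hat[0], x_hat[1]) - d_measured) > mu


def needs_correction(d_dot_window: Sequence[float], x_hat: Tuple[float, float],
                     d_measured: float, eps: float = 1e-3, mu: float = 1.0) -> bool:
    """Correction is flagged only when both conditions hold at the same time."""
    return (same_velocity_suspected(d_dot_window, eps)
            and error_accumulated(x_hat, d_measured, mu))


# Formation flight (distance not changing) with a 1.5 m range mismatch.
flag = needs_correction([0.0] * 5, (3.0, 4.0), 6.5)
```

Requiring both conditions mirrors the statement that accumulation is declared only when both operators approach 1, so a range mismatch during random flight does not trigger a correction by itself.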
To prevent RL error accumulation from affecting RL accuracy, we utilize the relative distance $d^{ij}_{k}$ measured through UWB ranging for RL error correction. Compared to the relative velocity $v^{ij}_{i,k}$, $d^{ij}_{k}$ provides higher precision. The redesigned equation for calculating RL estimates is shown in (5), which integrates operators $S_1$ and $S_2$ with (1).
According to (5), RL estimation is calculated differently for different scenarios:
• if UAV$_i$ and UAV$_j$ fly with different velocities ($S_1 = 0$), then equation (5) falls back to (1);
• if UAV$_i$ and UAV$_j$ fly with similar or same velocity, but RL error has not accumulated yet ($S_1 > 0$, $S_2 = 0$), then equation (5) falls back to (1);
• if UAV$_i$ and UAV$_j$ fly with similar or same velocity and RL error has accumulated ($S_1 > 0$, $S_2 > 0$), the measured relative distance $d^{ij}_{k}$ is used to correct RL estimation.
In Algorithm 2, equation (5) is adopted to calculate the RL estimate, as shown in Lines 6, 14 and 20.

C. COMPUTATIONAL COMPLEXITY ANALYSIS
Since the persistent excitation based RL algorithm is enhanced by sample prediction, and this method runs on a UAV, it is necessary to analyze its computational complexity per step and per second. For Algorithm 2, the time cost is mainly spent on sample prediction and RL recalculation. In this article, we use linear interpolation to fit the sample data curves, and its computational complexity is $O(P)$, where $P$ is a constant describing the number of interpolation data. The complexity of interpolating is $O(M)$, where $M$ is the multiple of data prediction, and so is the complexity of extrapolation. For RL recalculation, the complexity of each of (3), (4) and (5) is $O(1)$; correspondingly, the complexity of RL recalculation is $O(2M)$. If there are $n$ UAVs in the neighborhood of UAV$_i$, then the overall complexity is $O(Pn + 4Mn)$ in one step and $O((P + 4M)n/T)$ in one second.
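To give a feel for the magnitude, the expressions above can be evaluated for the settings used later in the evaluation. The sketch below treats the terms inside the big-O expressions as rough operation counts, which is an assumption: constant factors hidden by the notation are ignored. The function names are hypothetical.

```python
def ops_per_step(p: int, m: int, n: int) -> int:
    """Rough operation count per time step: P*n + 4*M*n."""
    return p * n + 4 * m * n


def ops_per_second(p: int, m: int, n: int, period_s: float) -> float:
    """Rough operation count per second: (P + 4*M) * n / T."""
    return (p + 4 * m) * n / period_s


# P = 10, M = 10, two neighbors (swarm of three UAVs), T = 0.025 s
step_cost = ops_per_step(10, 10, 2)                # 100
second_cost = ops_per_second(10, 10, 2, 0.025)     # about 4000
```

A few thousand elementary operations per second is small for typical embedded processors, which is in line with the comparison against SLAM and image processing that follows.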
Compared with environmental perception and image processing in SLAM or other visual algorithms, the computational complexity of this enhanced RL estimation algorithm is negligible, so it can run on almost any CPU, even the small computing unit carried by a UAV. Compared with the delay of data transmission, the additional calculation cost of the enhanced algorithm can also be ignored, and it has little influence on the real-time nature of the RL estimation output. Thus, we consider this enhanced algorithm suitable for RL estimation of multi-UAVs.

V. EXPERIMENT
We demonstrate the effectiveness of our enhanced persistent excitation based RL approach (tagged as Enhanced) by simulation, compared with the original work [18] (tagged as Baseline). Three UAVs with IDs 0, 1 and 2 are considered. Each UAV carries onboard IMU and UWB modules, and can perform UWB ranging measurements and communicate with the others.

A. EVALUATION ON RL ESTIMATION ERROR

1) SIMULATION SETUP
All three UAVs fly independently, with their initial positions and accelerations set randomly. For each simulation run, the control variable, i.e. the variation of a UAV's acceleration at each time step, follows the Gaussian distribution $\delta_a \sim N(0, 0.5)$. We set the minimum of $v_{i,k}$ as $\bar{v} - 2$, and thus we have $v_{i,k} \in [\bar{v} - 2, \bar{v}]$. The sensor noises on relative distance, its change rate and velocity are upper-bounded by 0.05m, 0.05m/s and 0.5m/s respectively. The parameter $P$ is set as 10, and when $P < 3/T$, $p = P$; otherwise $p = 3/T$.
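One possible way to generate velocities matching this setup is sketched below. Several details are assumptions rather than the authors' simulator: the value 0.5 is treated as a standard deviation applied per axis, the speed is kept in range by rescaling the velocity vector, and the initial heading is arbitrary. All names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def clamp_speed(vx: float, vy: float, v_min: float, v_max: float) -> Tuple[float, float]:
    """Rescale the velocity vector so that its norm lies in [v_min, v_max]."""
    speed = math.hypot(vx, vy)
    if speed == 0.0:
        return v_min, 0.0  # arbitrary heading when the vector vanishes
    target = min(max(speed, v_min), v_max)
    return vx * target / speed, vy * target / speed


def simulate_velocities(v_bar: float, period_s: float, steps: int,
                        sigma: float = 0.5, seed: int = 0) -> List[Tuple[float, float]]:
    """Random flight: Gaussian acceleration increments, speed kept in [v_bar - 2, v_bar]."""
    rng = random.Random(seed)
    ax, ay = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)  # random initial acceleration
    vx, vy = v_bar, 0.0
    out = []
    for _ in range(steps):
        ax += rng.gauss(0.0, sigma)  # control variable: variation of acceleration
        ay += rng.gauss(0.0, sigma)
        vx, vy = clamp_speed(vx + ax * period_s, vy + ay * period_s, v_bar - 2.0, v_bar)
        out.append((vx, vy))
    return out


def window_length(p_samples: int, period_s: float) -> int:
    """p = P when P < 3/T, otherwise p = 3/T (in whole steps)."""
    steps_3s = round(3.0 / period_s)
    return p_samples if p_samples < steps_3s else steps_3s


velocities = simulate_velocities(15.0, 0.025, 1000)
p = window_length(10, 0.025)
```

With P = 10 and T = 0.025s, three seconds correspond to 120 steps, so the rule above selects p = 10.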

2) RESULT ANALYSIS
We first consider the same set of configurations as discussed in Section III-A. Here, we still have $T = 0.025$s, and the values of $\bar{v}$ and $\gamma$ are listed in Table 2, which summarizes the averages of RL estimation error and the improvement over Baseline with different configurations of $\bar{v}$, $\gamma$ and $M$. To ensure (2) can be satisfied for different settings of $\bar{v}$ and $\gamma$, we set $M$ properly for our enhanced RL approach as explained in Section IV-A. In particular, when $\bar{v} = 15$m/s, $T = 0.025$s and $\gamma = 0.1$, we need $M > \gamma (2\bar{v} + \bar{\delta})^{2} T \approx 2.33$. Therefore, we consider $M = 5$ or $M = 10$ during the simulation. The simulation results are illustrated in Fig. 6-10 and summarized in Table 2. The frequent dynamic changes of the curves observed in these figures reflect the randomness of the UAVs' trajectories, relative distances and relative velocities, due to the evaluation setup described in Section V-A1.

[Figure: RL estimation error ($\bar{v} = 5$m/s, $\gamma = 0.1$). From left to right, $\|\hat{X}^{01} - X^{01}\|$, $\|\hat{X}^{02} - X^{02}\|$, $\|\hat{X}^{12} - X^{12}\|$ and their average $\|\hat{X} - X\|$ are given.]
[Figure: RL estimation error ($\bar{v} = 15$m/s, $\gamma_{Baseline} = 0.03$, $\gamma_{Enhanced} = 0.1$). From left to right, $\|\hat{X}^{01} - X^{01}\|$, $\|\hat{X}^{02} - X^{02}\|$, $\|\hat{X}^{12} - X^{12}\|$ and their average $\|\hat{X} - X\|$ are given.]

Based on the experimental results, we see that Enhanced outperforms Baseline mainly in the following three aspects:
• Enhanced introduces less RL estimation error than Baseline when the same configuration of $\bar{v}$ and $\gamma$ is adopted. According to the curves shown in Fig. 6, Fig. 7 and Fig. 10 and the improvements on RL error reduction listed in Table 2, with $M = 5$, Enhanced decreases the RL estimation error of Baseline by more than 20% on average; with $M = 10$, the improvement on RL estimation precision rises to 28.3%. The most significant reduction of RL estimation error made by Enhanced over Baseline occurs when $M = 10$, $\bar{v} = 10$m/s and $\gamma = 0.1$, which is 32.0%.
• Enhanced overcomes one of Baseline's fatal weaknesses, and continues to provide valid RL estimation even when equation (2) cannot be satisfied by the sampling period itself for some configurations, such as $T = 0.025$s, $\bar{v} = 15$m/s and $\gamma = 0.1$. Recall that for this configuration, Baseline causes divergent and unacceptable RL error, as discussed in Section III-A and shown in Fig. 3. However, according to Fig. 8, Enhanced works effectively and provides RL estimates with an average error of less than 7.9m.
• Enhanced ensures faster RL error convergence than Baseline, with only a small sacrifice in accuracy. By changing $\gamma$ from 0.1 to 0.03, Baseline can confine the RL estimation error when $\bar{v} = 15$m/s, but this also results in a longer time for the RL estimation error to converge. According to Fig. 9, without reducing $\gamma$, Enhanced still succeeds in decreasing the RL estimation error efficiently. It brings $\|\hat{X}^{01} - X^{01}\|$, $\|\hat{X}^{02} - X^{02}\|$ and $\|\hat{X}^{12} - X^{12}\|$ below 10m in 20s, 60s and 9s respectively, which is 1.65x, 1.32x and 3.89x faster than Baseline. Meanwhile, around 4% RL estimation precision loss on average is observed compared with Baseline. Therefore, $\gamma$ needs to be chosen carefully as a tradeoff between RL error convergence speed and RL estimation precision. If faster convergence is the highest priority, Enhanced ensures it with acceptable precision loss; otherwise, by reducing $\gamma$, Enhanced provides better RL estimation precision, as shown in Fig. 10.
To further verify the effectiveness of Enhanced over Baseline, we consider more configurations with larger $\bar{v}$, and set $\gamma = 0.03$, $M = 10$. Fig. 11 gives the average RL estimation error obtained by Baseline and Enhanced and the improvements on RL error reduction made by Enhanced over Baseline. We see that for all the values of $\bar{v}$ evaluated, Enhanced yields less RL estimation error than Baseline. With $\gamma = 0.03$, when Baseline fails to make the RL error converge for $\bar{v} = 30$m/s, Enhanced confines the RL error successfully.
To further discuss the influence of $\bar{v}$ and $\gamma$ on RL errors, we consider more configurations with different $\bar{v}$ and $\gamma$, and set $M = 10$. Fig. 12 gives the average RL estimation error obtained by Baseline and Enhanced. It illustrates that higher $\bar{v}$ and bigger $\gamma$ result in more RL estimation error, and that the Enhanced method performs better in higher-speed flight scenarios for multi-UAV systems.

B. EVALUATION ON RL ERROR ACCUMULATION

1) SIMULATION SETUP
We first reconsider the simulation illustrated in Fig. 4(a) and discussed in Section III-B. Three UAVs navigate independently in the first 60 seconds and then fly with the same velocity $\langle \sqrt{\bar{v}}, \sqrt{\bar{v}} \rangle$, $\bar{v} = 5$m/s (tagged as TestCase1). Besides, we consider the following four scenarios that consist of several random navigation phases and same-velocity navigation phases, to demonstrate the robustness and stability of Enhanced in terms of eliminating RL error accumulation:
• TestCase2: straight line trajectory with two phases of same-velocity navigation;
• TestCase3: triangle trajectory with three phases of same-velocity navigation;
• TestCase4: N-shape trajectory with three phases of same-velocity navigation;
• TestCase5: T-shape trajectory with three phases of same-velocity navigation.
For these four test cases, every random navigation phase lasts 60 seconds. The speed maintained during each same-velocity navigation phase is 10m/s. To isolate and highlight the effectiveness of the redesigned equation (5) in avoiding RL error accumulation, we set $M = 1$ for Enhanced and thus disable its sensor sample prediction and synchronization, which aims to optimize RL error reduction (already discussed in Section V-A). Moreover, we adopt the configuration $T = 0.025$s, $\gamma = 0.03$ and $\mu = 1$m, to ensure equation (2) is always satisfied for both Baseline and Enhanced. Other configurations are the same as described in Section V-A1.

2) RESULT ANALYSIS
For TestCase1, Fig. 14 gives the curves of the RL estimation error and of $S_1$, $S_2$ and $S_1 \cdot S_2$ for estimating $\hat{X}^{01}_{0,k+1}$ with Enhanced. Compared with the performance of Baseline shown in Fig. 4(b), we see that by adopting equation (5) instead of (1), Enhanced effectively prevents RL error accumulation when UAVs navigate with the same velocity starting from the 60th second (see Fig. 14(a)). For $S_1$ (see Fig. 14(b)), during the first 60 seconds of simulation, its value is mostly 0, and the few non-zero values suggest that UAV$_0$ happens to have little velocity difference with UAV$_1$ during the random navigation phase. It approaches 1 when the same-velocity navigation phase starts. $S_2$ (see Fig. 14(c)) grows when $\big|\,\|\hat{X}^{ij}_{i,k}\| - d^{ij}_{k}\,\big|$ exceeds the threshold $\mu$. It changes more significantly during the random navigation phase and becomes more stable during the same-velocity navigation phase, although it does not drop to zero. Combining $S_1$ and $S_2$, $S_1 \cdot S_2 > 0$ indicates when the RL estimation needs correction in order to avoid RL error accumulation. Based on Fig. 14(d), we see that RL estimation correction happens whenever UAVs have similar velocities and the difference between $\|\hat{X}^{ij}_{i,k}\|$ and the measured relative distance $d^{ij}_{k}$ is larger than $\mu$. Overall, by constantly correcting the RL estimation in time, Enhanced eliminates RL error accumulation successfully. Fig. 13 shows the results of TestCase2, TestCase3, TestCase4 and TestCase5 respectively. For each case, we show the UAVs' trajectories, the performance of Baseline, the performance of Enhanced and the curve of $S_1 \cdot S_2$. For all the scenarios considered, Enhanced prevents RL error accumulation while Baseline fails.

VI. CONCLUSION
In this article, we discovered two fatal limitations of the existing persistent excitation based RL technique, and proposed an enhanced RL approach that effectively overcomes these weaknesses. To confine the RL estimation error, we apply sensor sample prediction and synchronization based on interpolation, interpolating and extrapolation. To avoid RL error accumulation, we redesign the calculation process of RL estimation, whose advantage has been shown through simulation. In the future, we plan to continue studying the RL problem for multi-UAVs in GPS-denied environments and consider other challenging problems. Conducting real-world outdoor experiments is also planned.

He is currently a Professor with the Artificial Intelligence Research Center, National Innovation Institute of Defense Technology. He has written one monograph, more than 80 high-level articles, over 30 national invention patents, and one national industry standard. His research interests include distributed object middleware technology, adaptive software technology, artificial intelligence, and robot operation systems. He was a recipient of the Second Prize in National Science and Technology Progress Awards twice and the First Prize in Provincial-Level Scientific and Technological Progress Awards three times. He has presided over and participated in the National 863 Project, the National Key Research and Development Plan, the National Natural Science Foundation, Major Projects of Core Electronic Devices, and the High-End Generic Chips and Basic Software more than 20 times.