Model-free online motion adaptation for energy-efficient flight of multicopters

Limited flight distance and time is a common problem for multicopters.We propose a method for finding the optimal speed and sideslip angle of a multicopter flying a given path to achieve either the longest flight distance or time. Since flight speed and sideslip are often free variables in multicopter path planning, they can be changed without changing the mission. The proposed method is based on a novel multivariable extremum seeking controller with adaptive step size, which is inspired by recent work from the machine learning community on stochastic optimization. Our method (a) does not require a power consumption model of the vehicle, (b) is computationally efficient and runs on low-cost embedded computers in real-time, and (c) converges faster than the standard extremum seeking controller with constant step size.We prove the stability of this approach and validate it through outdoor experiments. The method is shown to converge with different payloads and in the presence of wind. Compared to flying at the maximum achievable speed in the experiments with a uniformly selected random sideslip angle, flying at the optimal range speed and sideslip on average increases the flight range by 14.3% without payload and 19.4% with a box payload. In addition, compared to hovering, flying at the optimal endurance speed and sideslip increases the flight time by 7.5% without payload and 14.4% with a box payload. A video can be found at https://youtu.be/aLds8LVfogk.


I. INTRODUCTION
Multicopters are used in a wide range of applications such as aerial photography [1], transportation [2], search and rescue [3], inspection [4], and agriculture [5], thanks to their low cost, ease of control, and high maneuverability.However, a primary limitation for current vehicles is their limited flight endurance and range [6].
One way to improve the limited flight range or endurance problem is through energy-efficient mechanical design.For example, in [7] a triangular quadcopter with one large rotor for lifting and three small rotors for control was proposed, which has the advantage of combining the energy efficiency of the large rotor and the fast control response of the small rotors.In [8], the authors designed a quadcopter with slightly tilted motors which has a better control authority over the yaw.This results in a lower variance in motor forces for yaw control.Because a motor's power is a convex function of its thrust, this design helps to reduce the total power consumption of the motors.Hybrid quadcopters which are able to do both aerial and ground locomotion, were introduced in [9] and [10]: 1 Authors are with the High Performance Robotics Laboratory (HiPeR-Lab) at the Department of Mechanical Engineering, UC Berkeley.{wuxiangyu,mwm}@berkeley.edu 2 Author is with the Hybrid Robotics Group at the Department of Mechanical Engineering, UC Berkeley.zengjunsjtu@berkeley.edu 3 Author is with the Aerospace Controls Laboratory at the Department of Aeronautics and Astronautics, MIT.atagliab@mit.eduwhen the vehicles operate in the ground locomotion mode on a flat ground, they only need to overcome the rolling resistance and use much less power compared to flying.A hybrid power system for multicopters consisting of a lithium battery, a fuel cell, and a hydrogen tank was introduced in [11], which enables longer flight time compared to traditional batteryonly power systems, thanks to the higher specific energy of hydrogen compared with the lithium battery.In [12], an inflight battery switching system was proposed, which enables a small quadcopter to dock an additional battery to a large quadcopter and increases its flight time.
Another category of methods focus on developing algorithms to reduce the power consumption of existing multicopters.By planning energy-efficient trajectories or by implementing energy-aware control algorithms, these approaches do not require design changes to existing hardware and are thus economical to deploy.For example, in [13] the authors proposed a method for finding the minimum-energy trajectory between a predefined initial and final state of a quadcopter, by solving an optimal control problem of the angular accelerations of the four propellers.This approach was extended in [14], where the fixed end-time trajectory optimization was extended to both free and fixed end-time solved with an indirect projected gradient algorithm to improve the numerical accuracy.Simulation results were shown to validate the effectiveness of the methods in both papers.In [15], the task of reaching a goal in a set of candidate goals while using the least amount of energy was investigated.The energy-efficient path planning algorithm was based on the model predictive control and disturbance from wind was considered.The authors showed that their method was able to reach the goal which required the least amount of energy in simulations and indoor experiments.In [16] and [17], the authors proposed energy-aware coverage path planning methods for photogrammetric sensing of large areas using multicopters.The methods find the optimal speed along the coverage path to minimize the energy usage during the mission.Outdoor experiments were conducted to validate their methods.
A necessary condition for model-based methods to perform well is accurate power consumption modeling.Power consumption models of multicopters can be derived by analyzing their electric and aerodynamic properties.For example, [18] [19] introduced power consumption models of the battery, electric speed controller and motor, and [20,Chapter 5] introduced the aerodynamic power consumption of the propeller based on the momentum theory.Besides, some researchers proposed data-driven models by selecting variables that affect the power consumption (e.g. the vehicle's speed and acceleration, wind speed, and payload weight) as inputs and finding their relationship to power consumption through experimental data [16] [21].
However, there are often hard-to-model effects on the vehicle's power consumption, such as changes in vehicle components' performance (e.g.batteries and motors) due to aging and temperature changes.In addition, the change in payload size, shape, or weight in applications such as package delivery and spraying (e.g., pesticides or fertilizer at farms) often requires reidentification of parameters in the power consumption model, which is time-consuming.The imperfections in the energy model could potentially be compensated using online data-driven methods.For example, in [22] the authors used an Extended Kalman Filter and in [23] the authors used Gaussian processes to estimate the correction terms in the vehicle's dynamics equations, which improved the control accuracy of the quadcopters.However, to the best of our knowledge, no such methods have been developed for the energy efficient flight of quadcopters yet, and their effectiveness and computational efficiency are thus still an open question.
The aforementioned difficulties in quadcopter energy consumption modeling motivates us to propose, to the best of our knowledge, the first model-free method for finding the flight speed and sideslip angle (i.e., angle between the forward direction of the vehicle and the relative wind) which achieve the longest flight time (endurance) or flight range given a predefined path.The method is based on a novel multivariable extremum seeking controller and does not require power consumption models of the multicopter.
Extremum seeking control is a model-free adaptive control technique for finding the local minimizer of a given, potentially time-varying, cost function by applying a persistently exciting periodic perturbation to a set of chosen inputs, and monitoring the corresponding output changes.A survey of the development of this control method can be found at [24].It has applications in areas such as maximizing the energy generation of wind turbines [25] and photovoltaic power plants [26], and maximizing the pressure rise in axial flow compressors [27].Its applications in robotics can be found in a literature survey [28].A common problem of extremum seeking controllers is their slow convergence speed, and we propose a novel multivariable extremum seeking controller with adaptive step size to improve it.In addition to the flight speed, it could also simultaneously find the optimal flight sideslip angle to achieve the longest flight range or endurance (time).
The major contributions of this paper are as follows: 1) We present a model-free adaptive method to find the flight speed and sideslip angle of multicopters that achieve the longest flight range or endurance.
2) The method is based on a novel multivariable extremum seeking controller with adaptive step size, which is computationally efficient and converges faster than the standard extremum seeking.3) We give a stability proof for the proposed controller via averaging and singular perturbation analysis.4) We validate the effectiveness of the proposed method in extensive outdoor experiments.The experiments demonstrate the proposed method's faster convergence compared with the standard method, robustness to payloads and wind disturbances.This is an evolved paper based on our prior work [29], [30].In contrast to the prior work, this paper presents: 1) A stability proof of the proposed multivariable extremum seeking controller with adaptive step size taking into account the vehicle's dynamics.2) Extensive outdoor experiments with practical real-world sensing instead of the previous work's indoor experiments with a motion capture system for state estimation.3) Applications of extremum seeking to time optimal flight in addition to range optimal flight, by searching for the optimal endurance flight speed and sideslip angle.4) Experiments and discussion about the energy cost from the extremum seeking controller because of perturbation.5) Experiments and analysis about the proposed method's performance under wind disturbances.

II. PROBLEM STATEMENT
In this work, we propose a method to find the most energy-efficient flight speed and sideslip angle to mitigate the common problem of the limited flight range and endurance of multicopters.
We choose to optimize these two variables because they affect the vehicle's power consumption and are typically additional (redundant) degrees of freedom in a multicopter's flight, where the flight missions require the vehicle to track specified geometric paths.Because the multicopter is usually not axisymmetric (especially when carrying payloads), flying with different sideslip angles affects the drag force faced by the vehicle and leads to different power consumption.The sideslip angle can be changed by changing the yaw angle.The flight speed also affects the power consumption of the vehicle: The goal of the controller is to find the optimal sideslip r β and rv to minimize the cost function y = (h • l)(r).The frequencies of the high pass and low pass filters are set, respectively, to ω hv and ω lv for speed, and ω hβ and ω lβ for sideslip.The scalar kv and k β are related to the step size of the extremum seeking controller and both of them should be positive numbers to minimize the cost function.The standard extremum seeking controller with sinusoidal perturbations does not have the step size adapter and the outputs of the low pass filters directly go to the integrator, while the remaining structure of the algorithm is exactly the same.The step size adapter is detailed in Section III-A2.
when the flight speed increases, the power consumption first decreases and then increases, which can be explained by momentum theory [20,Chapter 2.14].This predicts that the maximum flight endurance is achieved by flying at a suitable flight speed, rather than hovering.When our goal is to achieve the longest flight endurance (time), we want to minimize the consumed energy for a given time.As a result, the cost function for the optimal endurance flight is defined as the instantaneous electric power p e .When the goal is to achieve the longest flight range (distance), we want to minimize the energy consumed for a given distance.Thus, the cost function for the optimal range flight is the instantaneous electric power over speed p e /v (i.e.energy over distance), where v denotes the speed of the vehicle.
A model-free optimization method is preferable, which can handle hard-to-model effects (e.g., components aging and temperature change) and payload changes.This motivates us to use an extremum seeking controller to find the optimal flight speed and sideslip angle.The required inputs to the extremum seeking controller are the instantaneous energy cost and a userdefined geometric path.Its outputs are the vehicle's reference speed and sideslip angle commands, which are then used to convert the geometric path into a reference trajectory to be tracked by the low-level controllers.

III. MODEL-FREE SPEED AND SIDESLIP ADAPTATION
In this section, we introduce the novel multivariable extremum seeking controller with adaptive step size.It is able to achieve faster convergence than the standard extremum seeking controller with a fixed step size, by taking a smaller step size when the estimated gradient has a large magnitude or variance and vice versa.Vector variables and functions that map to vectors are written in boldface.Notations in this section are summarized in Table I.

A. Extremum seeking controller with adaptive step size
A block diagram of the proposed adaptive-step-size, multivariable, extremum seeking controller is shown in Figure 2. We define the state variables of the multicopter (relevant to our problem) as x = [v, β] T , where v and β are the speed and sideslip of the vehicle, respectively.The outputs of the extremum seeking controller are defined as r = [r v , r β ] T , where r v is the reference flight speed and r β is the reference flight sideslip.We assume a smooth control law α(x, r), so that the closed-loop dynamics of the speed and sideslip are represented by The cost function is represented by Like in [31], we make the following assumptions about the closed-loop vehicle dynamics and the cost function: Assumption 1.There exists a smooth function l : R 2 → R 2 such that f (x, α(x, r)) = 0 if and only if x = l(r).
Assumption 2. For each reference input r, the controller ensures that the equilibrium x = l(r) is locally exponentially stable uniformly in r.
Thus, we assume that we have a control law α(x, r), that can locally stabilize any of the equilibria that r may produce.
Assumption 3. The cost function (described in Section II) has a local minimum at 1) Gradient estimation: The extremum seeking controller approximates the gradient of the cost function and integrates the negative of the estimated gradient to minimize the cost [32].To approximate the gradient of the cost function, sinusoidal perturbations are added to the speed setpoint rv and sideslip setpoint rβ , where a v and a β are the speed and sideslip perturbation magnitudes and the ω v and ω β are the speed and sideslip perturbation frequencies.
The cost function's value y consists of low-frequency components (η v and η β ) and high-frequency components (y − η v and y − η β ).The cost is first high pass filtered to remove the low-frequency components and retain only the cost changes estimates of the second moments of qv, q β gv, g β output of the step size adapters a small positive constant preventing dividing by zero.γv, γ β cut-off frequencies for low-pass filters of q 2 v , q 2 β kv, k β positive constants related to the step-size of gradient descent δ, ω small positive constants used in the stability proof, see ( 10) constants used in the proof, related to constants without prime superscript, see (10) τ = ωt a time scale used in proof p(τ ) = p(t/ω) representation of p function under time scale τ Π the least common period of functions with frequencies of ω v and ω β because of the perturbations.These values are then multiplied elementwise with the demodulation signals where the demodulation signals' frequencies w v and w β are the same as their corresponding perturbation frequencies.We denote the results of the multiplications as ξ v and ξ β .If the cost function's value change is in phase with the perturbations, which means that the cost value increases as the inputs' values increase, ξ v and ξ β will be positive.If they are out of phase, the outputs will be negative.After this, ξ v and ξ β are sent to low pass filters, whose outputs are approximations of the cost function's gradient, denoted by q v and q β .

2)
Step size adapter: The difference between the proposed extremum seeking controller and the standard multivariable extremum seeking controller [31] is the step size adapters, which are defined as follows: where m v , m β are estimates of the second moments of the output of the low pass filters q v and q β , and is a small positive constant preventing dividing by zero.Equations in (6) are essentially first-order low-pass filters for q 2 v and q 2 β , and γ v and γ β denote their cut-off frequencies respectively.The idea is motivated by the adaptive moment estimation algorithm (Adam) [33], which is commonly used in the stochastic optimization of objective functions in machine learning, such as training neural networks [34], [35].
The adapters take in the output of the low pass filters q v and q β (the gradient estimates), and outputs g v and g β .They are then passed to the integrators to perform gradient descent.The effective step size for gradient descent is k v g v /q v for the speed optimization and k β g β /q β for the sideslip optimization, and the step size adapters change them by changing g v and g β .The second moments of the initial outputs from the low pass filters are used to initialize m v and m β in (6).
In (7), by dividing q v and q β with the square root of their corresponding second moments, the outputs g v and g β of the adapters will be approximately bounded by ±1, since l ] ≤ 1 (E denotes expected value, and q l being either q v or q β ).As a result, the descent rates for speed and slideslip are bounded by k v and k β .This can be understood as establishing a trust region around the current parameter value, beyond which the current gradient estimation can be inaccurate.In addition, the adapters output small values when the gradient estimates have large uncertainty (m v and m β are large) and vice versa, which makes the controller more robust to noise.

B. Stability Analysis
In this section, we present the stability proof of the novel multivariable extremum seeking controller with adaptive step size through averaging and singular perturbation analysis.A similar methodology was used in [31] to prove the stability of a single variable standard extremum seeking controller and was used in [36] to prove the stability of a multivariable Newtonbased extremum seeking controller.
1) System dynamics: By substituting the setpoint r with r + p(t), the closed-loop dynamics of the vehicle in (1) can be rewritten as The proposed extremum seeking controller's dynamics in Figure 2 can be summarized as The parameters for the extremum seeking controller are selected as where δ and ω are small positive constants, and ω v , ω β , ω hv , ω hβ , ω lv , ω lβ , k v , k β , γ v and γ β are positive constants.In addition, for this multivariable extremum seeking controller to work for both the speed and the sideslip angle simultaneously, their perturbation frequencies ω v and ω β should be distinct.
For the following averaging and singular perturbation analysis, we use the time scale τ = ωt.In addition, we define Then, the system dynamics in ( 8) and ( 9) with small perturbations can be rewritten as: where r = [r v , rβ ] T , p(τ ) = p(t/ω).
2) Averaging analysis: We first freeze the dynamics of the vehicle (8) at its equilibrium point x = l(r * + r + p(τ )), substitute it into (13) and get the reduced system where v(r From Assumption 3 we have that: To provide compact notations, we denote 2 v(0) = H for later discussion.The least common period of sinusoidal functions with frequencies of ω v and ω β is defined as Π.We first prove the stability of the reduced system using averaging analysis: Proposition 1.For the reduced system (14), under Assumption 3, there exists ā and δ such that for all a ∈ (0, ā), δ ∈ (0, δ), the reduced system dynamics (14) have a unique exponentially stable periodic solution of period Π, which for all τ > 0 Proof.The proof of Proposition 1 is shown in the appendix at the end of this paper.
This implies that the error terms rΠ v (τ ) and rΠ β (τ ) converge to an O(δ + a 2 ) neighbourhood of zero.The flight speed and sideslip found by the extremum seeking controller are periodic and converge to an O(δ + a 2 ) neighbourhood of their optimal values r * v and r * β (i.e.values that minimize the cost functions defined in Section II).
3) Singular perturbation analysis: We then analyze the full system (12) and (13).To provide compact notations, we define the state vector of the extremum seeking controller as z = [r v , rβ , q v , q β , ηv , ηβ , m v , m β ] T , and write (13) as By Proposition 1, there exists an exponentially stable periodic solution z Π (τ ) such that where L(τ, z Π (τ )) = l(r * + r + p(τ )).To convert the system ( 12) and ( 17) into the standard singular perturbation form, we shift the state z to get z = z − z Π (τ ) such that where The quasi-steady state is By substituting the quasi-steady state into (20) and we get the reduced model which has an equilibrium at the origin z = 0.The equilibrium has been shown to be exponentially stable in the proof of Proposition 1.In addition, we study the stability of the boundary layer model (in the time scale Since f (l(r), α(l(r), r)) = 0 according to Assumption 1, x b = 0 is the equilibrium of the boundary layer model (24).By Assumption 2, this equilibrium is locally exponentially stable uniformly in r.
Combining the exponential stability of the reduced model with the exponential stability of the boundary layer model, and using Tikhonov's theorem on the infinite interval [37,Chapter 11.3], we can conclude that the solution of ( 17) is O(ω)-close to the solution of the reduced model (22).Using the results of Proposition 1, we can then conclude that the error terms rΠ v (τ ) and rΠ β (τ ) converge to an O(ω + δ + a 2 ) neighbourhood of zero.In summary, the proposed extremum seeking controller is locally stable -starting from an initial condition near the cost function's local minimum, it will converge to a neighbourhood around that local minimum if the perturbation is sufficiently small and slow relative to the closed-loop dynamics of the vehicle, and if the Assumptions 1-3 hold.

IV. EXPERIMENTAL RESULTS
Outdoor experiments were conducted to demonstrate the effectiveness of the extremum seeking controller with adaptive step size to find the optimal flight speed and sideslip.The proposed method was shown to have better convergence speed than the standard extremum seeking control.It was also able to converge in the present of strong wind disturbances.An experiment video can be found at https://youtu.be/aLds8LVfogk.
We want to note the significance of the outdoor experiments in this paper compared to indoor experiments in our previous work of [29], [30]: 1) In our previous work, a motion caption system was used to measure the vehicle's position and attitude at very high accuracy (about 1 mm error for position and 1 degree error for attitude) and at 200 Hz frequency.In contrast, in the outdoor experiments, a GPS was used for position estimation, whose accuracy was at meter level with a much lower frequency (10 Hz).As motion capture systems are not available in most of the real-world applications, this new sensor setup with GPS shows that our proposed method is able to perform well under much larger state estimation variances compared with indoor experiments.2) Because of limited space, the multicopter was only able to fly a circular path of 2 m radius in previous indoor experiments.The centripetal force increased dramatically as the flight speed increased for such a small radius, contributing largely to the power consumption.Such an experimental setup is rare in real-world applications such as package delivery or surveillance, and made the vehicle's power consumption increase almost monotonically as the speed increased.In outdoor experiments, the centripetal force became much smaller due to much larger flight radius -a more realistic experiment setup.We were thus also able to find the speed and sideslip for optimal endurance flight, as the power as a function of speed and sideslip has a much deeper minimum.3) Experiments were conducted both on light wind and windy days, to see the effect of wind disturbances on the proposed method.Such real-world effects were not possible indoors.4) We used the standard, off-the-shelf PX4 firmware for the low-level control and state estimation of the vehicle, instead of using a custom firmware and control stack     in our previous work.This demonstrates the ability of the proposed method to be easily deployed on existing multicopters.

A. Experiment setup
The experiments were performed with a custom-built quadcopter with and without a box payload (as shown in Figure 1).The weight of the vehicle without the box payload was 0.9 kg, and the box weighs 0.1 kg and has a size of 180×115×80 mm.The distance between the hubs of the two diagonal motors is 330 mm and the propeller is 203 mm in diameter.The extremum seeking controller was run on an onboard computer (Jetson Nano), and an mRo Pixracer R15 flight controller ran the standard PX4 firmware [38] including the state estimator and low-level controllers.The low-level cascaded PID controller (corresponds to α(x, r) in Assumption 2) stabilizes the vehicle and thus satisfies Assumption 2 in Section III-A.Other low-level controllers satisfying Assumption 2 could also be used with our proposed method.The Jetson Nano and the Pixracer communicate through a UART link using mavros.The main reasons for running the extremum seeking controller on the onboard computer are for easier data logging and implementation.The computational power of micro controllers such as the Pixracer should also be able to run this algorithm, as it only requires several simple operations as shown in Figure 2. Removing the onboard computer could further save the energy, at the cost of not being to log data as easily.The experiments were conducted at a flat grass field at the Richmond Field Station, Richmond, CA (37.916588N, -122.336667E).
The control architecture for the vehicle is shown in Figure 3.The extremum seeking controller (with or without adaptive step size) takes in the desired geometric path and instantaneous range cost or endurance cost.The power measurement p e comes from a power module (Holybro PM06 v2) connected to the battery, and the speed measurement v comes from a state estimator based on a GPS (Zubax GNSS 2), a range finder for measuring the flight height (Beneware TFmini-S) and an IMU (Invensense MPU-9250).The extremum seeking controller outputs the reference tangential speed r v and sideslip r β along the desired path, which are used to parameterize the geometric path into a reference trajectory.The reference trajectory is then tracked by the low-level position and attitude controller, which is a cascaded PID controller.
The range of flight speed was 0-12 m/s when carrying the box payload and was 0-15 m/s without payload.The sideslip angle is a periodic variable, whose period is 180 • , due to the vehicle and payload's rotational symmetry.

B. Extremum seeking parameter selection
The values of parameters of the standard extremum seeking controller and our proposed adaptive step size extremum seeking controller used throughout the experiments are shown in Table II.The perturbation frequencies (w v and w β ), perturbation magnitudes (a v and a β ), gains for the integrator (k v and k β ), cutoff frequencies of high-pass (w hv and w hβ ) and low-pass filters (w lv and w lβ ) need to be selected properly to achieve good performance of the extremum seeking controllers.The guidelines for choosing them are detailed below: 1) Perturbation frequencies: The perturbation frequencies must be slow compared with the closed-loop dynamics of the quadcopter (ω should be small as mentioned in the stability analysis), such that they can be well tracked by the vehicle.Mathematically, the perturbation frequency could be selected smaller than the dominant frequency of the vehicle's closedloop dynamics.The perturbation frequencies can be increased to achieve a faster convergence rate [39], given they can be tracked well by the vehicle.In addition, the multivariable extremum seeking control requires distinct perturbation frequencies for the speed and sideslip angle.
2) Perturbation magnitudes and integrator gains: Large values for the perturbation magnitudes will be helpful for faster convergence, but will increase the oscillation magnitudes.Large values for the integrator gains will also be helpful for faster convergence, but will make the controller more sensitive to disturbances.As a result, we can increase the perturbation magnitudes and integrator gains to obtain the fastest convergence speed for a permissible amount of oscillation and sensitivity.
3) Cutoff frequencies of the high-pass and low-pass filters: The cutoff frequencies of the high-pass and low-pass filters should be designed based on their corresponding perturbation frequencies: the cutoff frequency of the high-pass filter should be set higher than the perturbation frequency (w hv ≥w v and w hβ ≥w v ), and the cutoff frequency of the low-pass filter should be set lower than the perturbation frequency (w lv ≤w β and w lβ ≤w β ), to prevent attenuation of measurements at the perturbation frequency.We set the cutoff frequencies of the high-pass and low-pass filters to be the same as their corresponding perturbation frequencies, which simplified the parameter tuning process and was found to work well in the experiments.

4)
Step-size adapter cutoff frequency: The two parameters in the step size adapters γ v and γ β are cutoff frequencies for the low-pass filters of the square for estimated gradient q 2 v and q 2 β .One could increase their values as long as the noises are sufficiently attenuated.
In general, the selection of the extremum seeking parameters is a tuning process, but the guidelines above are valuable for making parameter tuning effectively.
To make a fair comparison between the standard and the proposed extremum seeking controller, we kept all parameters for the two different methods to be the same except k v and k β , since they have different meanings for the two methods: the k v and k β values are the step sizes for the standard method but are only part of the step sizes for the adaptive method, as shown in Section III-A2.They were empirically tuned in experiments for the two different methods to each achieve the fastest convergence rate in optimal range speed and sideslip searching when carrying a box payload 5(a).

C. Performance comparison under light wind
In the comparison experiments, the quadcopter was commanded to fly along a circular path with 30 meters in radius and a constant height of 5 meters.The circular path was chosen for a simple and intuitive comparison, as well as easy experimental implementation, while our proposed method is also applicable to sufficiently smooth geometric paths with more complicated shapes.The experiments were conducted during good weather to minimize the effect of wind disturbances.
1) Cost value ground truth: To verify that the proposed extremum seeking controller is able to converge close to the optimal speed and sideslip, we experimentally evaluated the optimal range and optimal endurance cost functions.When the vehicle is carrying the box payload, the values of the cost functions at various speed and sideslip are shown at Figure 4(a The data shows the importance of flying at the energy efficient speed and sideslip: compared to flying at the maximum achievable speed in the experiments with a uniformly selected random sideslip, flying at the optimal range speed and sideslip on average increases the flight range by 14.3% without payload and 19.4% with a box payload.Besides, compared to hovering, flying at the optimal endurance speed and sideslip increases the flight time by 7.5% without payload and 14.4% with a box payload. 2) Convergence speed comparison and discussion: The convergence speed of the standard and the proposed methods are compared in experiments with/without a box payload and with different initial conditions.We consider the extremum seeking controller converges when both the speed and sideslip settle close to their optimal value.When the goal is to find the speed and sideslip which achieve the optimal flight range, the results are compared in Figure 5(a) and Figure 5(b).When the goal is to find the speed and sideslip which achieve the optimal flight endurance, the results are compared in Figure 6(a) and Figure 6(b).The convergence times are summarized in Table III for optimal range and in Table IV for optimal endurance (N/A represents that the method failed to converge by the end of the experiment).We can see that the proposed method converged about twice as fast as the standard method in these tests.In summary, we can see that the proposed extremum seeking controller with step-size adapter converged about twice as fast as the standard extremum seeking controller.In addition, the parameters of the extremum seeking controller were tuned for optimal range speed and sideslip searching when carrying a box payload, as mentioned in Section IV-B.The same set of parameters still worked well for the other experiment setups (optimal endurance goal, with and without box payload) for the proposed method, showing that the method has good robustness to parameters.However, the standard extremum seeking method failed to converge in some cases, suggesting it is less robust.Figure 6.Optimal endurance speed and sideslip searching performance comparison between the proposed method (red lines) and the standard method (blue lines).The ground truth values for optimal speed and sideslip are marked as grey dashed lines and grey shaded regions (values from Figure 4).The results when carrying an additional box payload are shown in (a) and the results with no additional box payload are shown in (b).Each column in the subfigures represent a test with a different initial speed and sideslip.In the second test of (a), the optimal sideslip is marked at both 100 degrees and -80 degrees.This is because the vehicle and payload are rotational symmetric, such that a sideslip offset of 180 degrees has the same effect on the vehicle's power consumption.In (b), the optimal speed is marked as a range between 5 -7 m/s, because the cost function values are very close in this range with less than 1% difference.
Like other perturbation-based extremum seeking methods, the convergence speed of the proposed method is still limited by the time-scale separation, which requires the changing of the speed and sideslip setpoints to be slow compared to the perturbation frequencies.In our experimental tests, the proposed extremum seeking controller converged within 2 minutes in the majority of cases.We think this would be a practically useful convergence time considering the flight time of most multicopters are between 10 to 20 minutes [6].

D. Cost of extremum seeking
Since the perturbations are applied by the extremum seeking controller, the power consumption of the vehicle will be higher than the flight at a constant reference without perturbations.In this subsection, we compare the optimal values of the cost function without perturbation (i.e., optimal cost values in Figure 4) with the average cost values when flying at the same mean speed and sideslip but with perturbations applied.The increases in cost are summarized in Table V.
In summary, the increase in cost was 3.1 -4.2 % because of the perturbations applied by the extremum seeking controller.This is less than the power consumption reduction when flying at the optimal endurance speed compared to hovering, which is 12.6% with the box payload and 7% without it, so the advantage of the proposed method outweighs its cost.
To reduce the impact of this increase, the extremum seeking controller can be enabled only when there is a model change (e.g., picking up a new payload), and disabled after convergence.In addition, decreasing the perturbation magnitude will be helpful for reducing the additional cost of perturbation, but this will also reduce the convergence speed.One should take these two factors into account when selecting the proper perturbation magnitude.

E. Performance under strong wind disturbances
We further evaluated the performance of the proposed extremum seeking controller with adaptive step size under strong wind disturbances.Like the aforementioned experiments, the vehicle was commanded to follow a circular path with a radius of 30 meters at 5 meters in height.The wind was measured by a Young 81000 anemometer at 20 Hz with 0.01 m/s resolution, at a height of 2 meters.The extremum seeking controller's parameters are the same as the experiments under light wind in Section IV-C.The experiments demonstrated that the proposed method was still able to find the optimal range and endurance speed and sideslip, as shown in Figure 7 and Figure 8.The maximum wind magnitude was 7.43 m/s in the optimal range experiment,  The optimal values of the sideslip and cost function are marked as grey dashed lines.The optimal value of the speed is marked as a range between 5 -7 m/s, because the cost function values are very close in this range, with less than 1% difference.The maximum magnitude of wind disturbances is 4.83 m/s and was 4.83 m/s in the optimal endurance experiment.The proposed method is not very sensitive to wind disturbances: because of the time-scale separation in the extremum seeking controller, the change in the speed and sideslip setpoints by the extremum seeking controller is very slow compared with the closed-loop dynamics of the vehicle.
Compared with the tests with the same initial conditions but under light wind in Section IV-C2, the wind disturbances caused larger oscillations in the reference sideslip (Figure 7 compared with the first column of Figure 5(a)) and longer convergence time (Figure 8 compared with the second column of Figure 6(b)).

V. CONCLUSION
An online, adaptive, model-free method for finding the speed and sideslip that maximize the flight range or endurance of multicopters is proposed in this work.Not dependent on any power consumption model of the vehicle, it is able to adapt to different payloads and is easy to deploy.The proposed method can mitigate the common problem of limited flight range and endurance of multicopters.Based on a novel multivariable extremum seeking controller with adaptive step size, it is able to achieve faster convergence compared to the standard extremum seeking controller with fixed step size.
Through realistic outdoor experiments, we show that this method is able to find the optimal speed and sideslip correctly under different payloads and under strong wind disturbances.In addition to multicopters, this method can also be applied to fixed wing aerial robots to find the optimal flight speed (to achieve the longest flight time or distance) whose sideslip is usually not a free degree of freedom in path planning.
APPENDIX: PROOF OF PROPOSITION 1 The reduced system ( 14) is in the form where the averaging method is applicable [37,Chapter 10.4] (δ is a small positive parameter).Its corresponding averaged system dynamics can be described as follows, where the superscript a denotes the variables of the averaged system, and Π is the least common period of sinusoidal functions with frequencies of ω v and ω β .The equilibrium point of the averaged system ( 25) is denoted as [r a,e v , ra,e β , q a,e v , q a,e β , ηa,e v , ηa,e β , m a,e v , m a,e β ] T which satisfies: q a,e v = q a,e β = 0, (26) m a,e v = m a,e β = 0, (v(r a,e + p(σ)) sin ω v σdσ = 0, where the superscript e denotes the variables for the equilibrium point.We consider ra,e where b i,v and b i,β (i = 1, .., 5) are constant numbers.By substituting (31), ( 32) into ( 28), (29), integrating and equating the like powers of a v and a β , we can find that the first-order coefficients and second-order coefficients for the mixing terms are zero, and ra,e In addition, by substituting (33), (34) At the equilibrium point of the averaged system in (25), the Hessian J a,e r is a block-diagonal matrix as follows, where A, B ∈ R 4×4 , Hence, the block-lower-triangular matrix J a,e r in ( 36) is Hurwitz if and only if that all diagonal submatrices are Hurwitz.Since δ, γ v , γ β , ω hv and ω hβ are positive constants, it remains to prove A as Hurwitz for stability.
With a first-order Taylor expansion we can get that The characteristic polynomial of A with roots λ can be written by computing the determinant of λI − A, det(λI − A) which can be expanded to a 4th order polynomial of λ.Under the assumptions that a is small and that the Hessian H in ( 15) is positive, the roots of this 4th order polynomial can be shown have negative real parts using the Routh-Hurwitz criterion [40, Chap.6.2], implying that A is Hurwitz.Therefore, J a,e r is proven as Hurwitz.The Hurwitz Jacobian

Figure 1 .
Figure 1.A quadcopter with and without a box payload with unknown aerodynamics effects, as used in our experiments.

Figure 2 .
Figure 2. Block diagram of the adaptive step size multivariable extremum seeking controller (in the dashed rectangle).The goal of the controller is to find the optimal sideslip r β and rv to minimize the cost function y = (h • l)(r).The frequencies of the high pass and low pass filters are set, respectively, to ω hv and ω lv for speed, and ω hβ and ω lβ for sideslip.The scalar kv and k β are related to the step size of the extremum seeking controller and both of them should be positive numbers to minimize the cost function.The standard extremum seeking controller with sinusoidal perturbations does not have the step size adapter and the outputs of the low pass filters directly go to the integrator, while the remaining structure of the algorithm is exactly the same.The step size adapter is detailed in Section III-A2.

Figure 3 .
Figure 3.Control architecture for the model-free adaptive flight range or endurance optimization of a multicopter.The details of the extremum seeking controller block is shown in the dashed rectangle in Figure 2. The extremum seeking controller runs on the onboard computer (Jetson Nano) while the low-level controllers and the state estimator run on the Pixracer flight controller.
Range cost with box payload.
Endurance cost with box payload.
Range cost with no box payload.
Endurance cost with no box payload.

Figure 4 .
Figure 4. Ground truth data of the cost functions' values, with and without an additional box payload.Each square in the heat maps corresponds to 20 seconds' data collected at 50Hz.The optimal value in each case is encircled with a grey rectangle.(a) The range cost with the box payload reaches its minimum at about 10 m/s in speed and 100 degrees in sideslip.(b) The endurance cost with the box payload reaches its minimum at about 6 m/s and 100 degrees in sideslip.(c) The range cost with no box payload reaches its minimum at about 11 m/s and 120 degrees in sideslip.(d) The endurance cost with no box payload reaches its minimum at about 5 m/s and 100 degrees in sideslip.
) and Figure 4(b), while Figure 4(c) and Figure 4(d) show the values without box.
Carrying an additional box payload.
Without carrying an additional box payload.

Figure 5 .
Figure 5. Optimal range speed and sideslip searching performance comparison between the proposed method (red lines) and the standard method (blue lines).The ground truth values for optimal speed and sideslip are marked as grey dashed lines (values from Figure4).The results when carrying an additional box payload are shown in (a) and the results with no additional box payload are shown in (b).Each column in the subfigures represent a test with a different initial speed and sideslip.
Carrying an additional box payload.
Without carrying an additional box payload.

Figure 7 .
Figure 7. Optimal range speed and sideslip seeking under strong wind disturbances, with the box payload.The optimal values of the speed, sideslip and cost function are marked as grey dashed lines.The maximum magnitude of wind disturbances is 7.43 m/s.

Figure 8 .
Figure 8. Optimal endurance speed and sideslip seeking under strong wind disturbances, without the box payload.The optimal values of the sideslip and cost function are marked as grey dashed lines.The optimal value of the speed is marked as a range between 5 -7 m/s, because the cost function values are very close in this range, with less than 1% difference.The maximum magnitude of wind disturbances is 4.83 m/s (r a + p(σ)) sin ω v σdσ − ω lv q a v (r a + p(σ)) sin ω β σdσ − ω lβ q a β r a + p(σ))dσ γ v (−m a v + q a v 2 ) r a,e + p(σ)) sin ω β σdσ = 0, a,e + p(σ))dσ,

Table I NOTATIONS
USED IN SECTION III.
T state variables related to energy-efficient flight, v is flight speed, β is sideslip angle r = [rv, r β ] T extremum seeking controller's outputs, rv is ref.speed, r β is ref.sideslip hv , ω hβ high-pass filters' cutoff frequencies ω lv , ω lβ low-pass filters' cutoff frequencies ηv, η β low-frequency components of the cost function, filtered out by the high pass filters ηv, ηβ difference between ηv and η β to optimal cost (h • l)(r * ), see (11) ξv, ξ β product of high-frequency components of the cost function with the demodulation signals qv, q β approximations of the cost function's gradient to speed and sideslip mv, m β