Fast UAV Trajectory Optimization using Bilevel Optimization with Analytical Gradients

This paper presents an efficient optimization framework that solves trajectory optimization problems efficiently by decoupling state variables from timing variables, thereby decomposing a challenging nonlinear programming (NLP) problem into two easier subproblems. With timing fixed, the state variables can be optimized efficiently using convex optimization, and so the time variables can be optimized using a separate outer optimization. This is a bilevel optimization in which the outer objective function itself requires an optimization to compute. The challenge is that gradient optimization methods require the gradient of the objective function with respect to the time variables, which is not available. Whereas the finite difference method must solve many optimization problems to compute a gradient, this paper proposes a more efficient method: the dual solution (Lagrange multipliers) of the convex optimization problem is exploited to calculate the analytical gradient. Since the dual solution is a by-product of the convex optimization problem, the gradient can be obtained `for free' with high accuracy. The framework is demonstrated on solving minimum-jerk trajectory optimization problems in safety corridors for unmanned aerial vehicles (UAVs). Experiments demonstrate that bilevel optimization improves performance over a standard NLP solver, and analytical gradients outperforms finite differences. With a 40\,ms cutoff time, our approach achieves over 8 times better suboptimality than the current state-of-the-art.


Fast UAV Trajectory Optimization using Bilevel
Optimization with Analytical Gradients Weidong Sun * , Student Member, IEEE, Gao Tang * , and Kris Hauser, Member, IEEE Abstract-This paper presents an efficient optimization framework that solves trajectory optimization problems efficiently by decoupling state variables from timing variables, thereby decomposing a challenging nonlinear programming (NLP) problem into two easier subproblems. With timing fixed, the state variables can be optimized efficiently using convex optimization, and so the time variables can be optimized using a separate outer optimization. This is a bilevel optimization in which the outer objective function itself requires an optimization to compute. The challenge is that gradient optimization methods require the gradient of the objective function with respect to the time variables, which is not available. Whereas the finite difference method must solve many optimization problems to compute a gradient, this paper proposes a more efficient method: the dual solution (Lagrange multipliers) of the convex optimization problem is exploited to calculate the analytical gradient. Since the dual solution is a by-product of the convex optimization problem, the gradient can be obtained "for free" with high accuracy. The framework is demonstrated on solving minimum-jerk trajectory optimization problems in safety corridors for unmanned aerial vehicles (UAVs). Experiments demonstrate that bilevel optimization improves performance over a standard NLP solver, and analytical gradients outperforms finite differences. With a 40 ms cutoff time, our approach achieves over 8 times better suboptimality than the current state-of-theart.

I. INTRODUCTION
R EAL-TIME optimal trajectory generation has long been a challenging but essential component in robotics. However, due to nonlinear dynamics, non-convex constraints and high dimensionality, it is difficult to solve these problems in real-time. A popular strategy is to represent trajectories with polynomial splines and to decompose state constraints into convex sets. This allows quadratic programming (QP) methods to be applied, which can solve to global optima efficiently and accurately. This has been applied to path planning for ground robots, autonomous cars [1], [2], humanoid robots [3] and UAVs [4]- [6]. The disadvantage of this approach is that the time allocated for each segment of the spline has to be fixed in order for the optimization to remain convex, because time enters the optimization objective and constraints nonlinearly. In prior approaches, the allocated time is often chosen heuristically, because optimizing the time allocation scheme along with the path makes the problem non-convex and too slow for real-time use. W One effective strategy proposed in Mellinger et al. [4] is a space-time decomposition approach that optimizes the path with the time allocation scheme fixed, and then iteratively refines the time allocation scheme through gradient descent. Specifically, in each iteration, two levels of optimization problems are solved: the lower level optimizes the path with the time fixed, and the upper level finds a better time allocation scheme with gradient descent. One unresolved challenge in this framework is finding a good gradient direction for the upper-level optimization, due to the fact that the objective is the solution of a lower-level constrained optimization problem. Finite differences was used in this prior work, but this can be computationally expensive because if n segments are used in the spline, then finite differences requires n + 1 lower-level optimizations to be solved for each gradient evaluation.
The primary contribution in this paper is a bilevel optimization framework for optimal time allocation that estimates the gradient from the dual solution (Lagrange multipliers) of the QP problem. A graphical illustration of this technique is shown in Fig. 1. The dual solution to the lower-level optimization problem can be used to calculate a gradient to the upperlevel one under the assumption of a fixed active set. Since the dual solution is a by-product of the QP problem, we are able to estimate the gradient "for free". Compared with finite differences which suffer from the choice of step size and low accuracy, our approach obtains the exact gradient. Experiments are conducted on UAV trajectory optimization problems in the presence of obstacles, and show that our gradient estimation approach is orders of magnitude faster than the current stateof-the-art [4] while also being more accurate. This leads to an 8 times improvement in suboptimality when the optimizer is given a 40 ms cutoff.

II. RELATED WORK
Despite intensive research, it is still challenging to find an optimal time allocation scheme of a spline trajectory in realtime. One strategy is to generate a time allocation scheme with heuristics [5], [6] and stick with the scheme during the optimization stage. For example, Gao et al. [5] use the heuristic of selecting velocity according to the distance to the nearest obstacle, closer the trajectory is to the obstacle, lower the velocity. Though the heuristic is reasonably chosen, the velocity of the trajectory after the optimization stage does not necessarily follow the heuristic. Heuristics, though being cheap to compute, are often not optimal and will sometimes lead to spikes in jerk or snap near connection points, see Fig. 5c for an example.  (1) , t (2) and t (3) are three feasible time allocation schemes, they are optimized in the upper-level optimization problem. Each of these t corresponds to a quadratic programming problem, which is being solved in the lower-level optimization problem. The red paraboloids are quadratic objective functions, cyan planes are equality constraints, no inequality constraint drawn for illustration purpose. Feasible sets are purple curves, and purple dots are the optimal solutions to each lower-level optimization problem.
One way to improve a non-optimal time allocation scheme is to refine it iteratively using gradient descent [4], [7]. The refinement process computes the gradient of the objective w.r.t. the time allocation scheme, takes a step after choosing a suitable descent direction and step length. To the best of our knowledge, finite difference is the only way being used for gradient calculation in this scenario. However, estimating gradient with finite difference is time-consuming since the objective function is actually the objective of an optimization problem. For each gradient evaluation, this approach has to solve the same number of optimization problems as the number of segments in the spline, thus making it intractable for real-time performance. Moreover, choosing step sizes for finite difference is hard and the result may not necessarily be a good approximation to the actual gradient.
Another strategy to determine a time allocation scheme is to use sampling [3]. This approach randomly samples the duration of each segment until the corresponding QP can be solved, and is applicable to problems with a rather small feasible set, for example the problem of humanoid locomotion where a small change in time can make the optimization problem infeasible. However it may not scale well to generating trajectories with a large number of segments. Also, the work described in [3] only seeks for a feasible time allocation and does not improve it.
Richter et al. [7] proposes a framework that is able to generate collision-free spline trajectories and optimize time allocation. They managed to formulate a QP with only equality constraints. However, such a formulation has the disadvantage: (a) Being collision-free is achieved by iteratively adding new points and solve another QP if there is a collision. This strategy has no bound on the number of iterations needed though their experiments show a few iterations are usually enough. (b) Dynamic feasibility is achieved by checking the max acceleration in every iteration and stop the iteration if the acceleration reaches the limit. Since computing the extrema of a high order polynomial is hard according to the Abel-Ruffini theorem, this strategy can be slow when we are using high order polynomials. Moreover, their framework solves the optimization problem by re-formulating the equality constrained QP into an unconstrained one, they will not be able to handle general inequality constraints which often exist in trajectory optimization. In our work, we use bilevel optimization as a way to solve a GBD, and exploit theoretical developments from these fields to derive analytical gradients for trajectory optimization.
More broadly, the idea of decomposition and divide-andconquer has been explored in the Benders Decomposition (BD) [8] and Generalized Benders Decomposition (GBD) [9], which are well-known techniques in operations research. In the GBD formulation, complicating variables are variables which would make the optimization problem considerably more tractable when temporarily fixed. The idea of GBD is to split the original optimization problem into two: a master problem and a subproblem. The master problem is generated by temporarily fixing the complicating variables of the original problem, and in BD these are usually integer variables. In the case of trajectory optimization, the complicating variables are times allocated to each segment. This approach is also closely related to bilevel optimization [10], which refers to a mathematical program where one optimization problem has another optimization problem as one of its constraints, that is, one optimization task is embedded within another. The outer optimization task is often referred to as the upper-level optimization problem and the inner optimization task is known to be the lower-level optimization problem. Bilevel optimization is well-known in production and marketing decision making, and we refer readers to [10], [11] for more comprehensive treatments of the topic.
There are analogues to our approach in the field of machine learning. OptNet [12] incorporates a QP solver as a layer into the neural network and is able to provide analytical gradients of the solution to the QP w.r.t. input parameters for back propagation. Gould et al. [13] presents results on differentiating argmin optimization problems w.r.t. optimization variables in the context of bilevel optimization. Their results give exact gradients for the lower-level optimization problem. However, works described in [12], [13] do not consider finding the gradient of the objective function. Moreover, methods proposed by Gould et al. [13] involves inverting a Hessian matrix which could be computationally expensive and often not even invertible, and they only describe the case of equality constraints and use a log-barrier function to tackle inequality constraints. The gradient of the objective function is even cheaper to compute, making our approach very suitable for speeding up trajectory optimization.

III. METHOD
In this section, we describe how our framework can be applied to trajectory optimization. We start by introducing how trajectory optimization works, then give a mathematical formulation of the problem we are trying to solve and end with a description of our algorithm.

A. Spline-Based Trajectory Optimization
Trajectory optimization asks to find a trajectory that minimizes some measure of performance while satisfying all the necessary constraints, e.g. being collision-free and dynamically feasible, and is formulated as where x encodes a trajectory, with the initial state x 0 prescribed. Often the final state x(y) = x f is also given, and sometimes the final time T is fixed. J(x, T ) is the measure of performance. In order to generate safe and dynamically feasible trajectories, the following constraints must be satisfied: 1) Safety: Position constraints that make sure the position lies in a safe region. 2) Dynamic Feasibility: Bounds on minimum and maximum velocity and acceleration. In abuse of notation we write state constraints as g(x) ≤ 0 and h(x) = 0. More precisely, we have ≡ (x, x , x , . . . , t), g ≡ g(x, x , x , . . .), and h ≡ h(x, x , x , . . .) depending on higher order derivatives.
If we represent the trajectory as a polynomial spline, we can reframe the problem in terms of spatial (spline coefficients) and temporal variables (spline knot points). Specifically, define a spline with n segments and segment durations ∆t 1 , . . . , ∆t n . The timing of each knot point is given by for each i = 1, . . . , n. We formulate our splines using 6th order polynomials (d = 6).
In order for the trajectory to be smooth, equality constraints should be enforced so: 1) States at the start and end of the trajectory should match the initial state and (optional) final state. 2) Continuities at connection points that ensure a smooth transition between each segment of the trajectory. If the trajectory needs to be C d continuous, equality constraints up to the d'th order should be applied at all knot points. We found that in the UAV case, applying continuity constraints up to acceleration yields good results.
Often, the objective function is chosen to be the integral of the squared norm of some high-order derivative of the trajectory, such as minimum jerk. As shown in [4], [5], these types of objectives can be written as a quadratic function of the coefficients of the polynomials, as long as timing is fixed. Each constraint g(x(y)) ≤ 0 on an individual state at a fixed time t ∈ [t i−1 , t i ] in segment i can also be converted to a constraint on the coefficients c i0 , . . . , c id . Overall, safety and dynamic feasibility constraints can be enforced at a set of collocation points, leading to a nonlinear program. However, there is no guarantee of the whole trajectory satisfying all the constraints.
Another approach is to encode the trajectory as a Bézier spline. Bézier curves have the nice properties that: 1) The curve is totally contained in the convex hull of its control points.
2) The derivative of a Bézier curve is again a Bézier curve. Thus if the trajectory is represented with Bézier curves and the bounds described above are convex or convex in each segment, then enforcing constraints on all the control points will guarantee that the whole trajectory will satisfy those bounds. (Note that this is only a sufficient condition and tends to be conservative, as the whole trajectory may satisfy all the constraints with some control points lying out of bounds.)

B. Formulation of the Bilevel Optimization Problem
In order to efficiently solve the trajectory optimization problem, we explicitly split the spatial part and the temporal part of the trajectory. We gather all the polynomial coefficients in the flattened vector x ∈ R n(d+1) (spatial part), and define the time allocation scheme y = (∆t 1 , . . . , ∆t n ) (temporal part).
We require the objective function to be quadratically separable in the form f (x, y) = 1 2 x T P (y)x + q(y) T x + c(y), in which P (y) is a quadratic objective matrix which is positive semidefinite. P (y), q(y) and c(y) are allowed to be nonlinear in y. Separability holds when the objective is expressed as the integral of the squared norm of some high-order derivative of the trajectory, such as minimum jerk. As shown in [4], [5], these types of objectives can be written as quadratic functions of x as long as timing is fixed. c(y) is the part of the objective that is only dependent on y, such as penalizing total time with c(y) = c i ∆t i = 1 T y.
If the objective is quadratically separable and the spatial constraints are convex, then a bilevel trajectory optimization problem can be formulated as follows: Ay ≤ b, The timing constraints Ay ≤ b and Cy = d enforce constraints such as positive durations y ≥ 0, and possibly fixed total time 1 T y = T . We refer interested readers to [2], [3], [5], [6] for different realizations of this general formulation. Note that although the objective functions remain the same in both the lower-level and upper-level optimization problem, y is fixed in the lower-level optimization problem but becomes the variable we want to optimize in the upper-level optimization problem. The lower-level problem is also specified to be a quadratic program (QP), which can be solved efficiently and globally optimally in x.
Our strategy is to use a constrained gradient descent on the function f (y) = f (x (y), y) with x minimizing the QP for a fixed timing y. The descent method indeed has been used to solve bilevel optimization problems, and our framework is a variant of this method. Given an feasible y, we find a direction −∇f (y) ∈ R n and a step length α that can make a sufficient decrease in f (y) while maintaining the feasibility of the new point y new = y + α∇f (y). The general issue is the availability of gradients.

C. Analytical gradient estimation
We now start to examine how the dual solution or Lagrange multipliers from the solution of the lower-level optimization problem can be utilized to estimate the gradient of f . We assume that the lower-level and upper-level objective functions are the same. Let us consider a general nonlinear optimization problem formulated as: The Lagrangian is defined as: Where λ and µ are Lagrange multipliers associated with inequality and equality constraints, respectively.
Suppose that x minimizes f (x, y) for a fixed value of y. Let us consider the first-order optimality conditions [14], also known as the Karush-Kuhn-Tucker (KKT) conditions. The KKT conditions must be satisfied at the optimal point. Namely, the following set of equations have to be satisfied by the optimal solution tuple (x , y,λ,μ): Where ∇ x denotes the gradient with respect to x, denotes an element-wise multiplication, inequalities and equalities should be interpreted element-wise.
The complementary slackness condition [15] tells us that if the i'th element ofλ is nonzero, then the i'th inequality constraint is active. That is, g i (x, y) = 0, where g i (x, y) denotes the i'th row in the inequality g(x, y) ≤ 0. If g j (x, y) < 0, we say that the j'th constraint is inactive.
The active set is the set of all the indices corresponding to active constraints, which we denote as A. The inactive set, which contains all the indices corresponding to inactive constraints, is denoted asĀ. It is clear that A andĀ are complements of each other. Note that we do not require the active set to be unique. Now let us split the inequality constraints into two parts: one that contains all the active constraints and the one that contains all the inactive constraints. For all the active inequality constraints, we can think of them as equality constraints since g i (x, y) = 0, ∀i ∈ A. We now construct a new equality constraint by stacking the active inequality constraints and equality constraints: denotes the extraction of rows whose indices are in the active set A. The vector of Lagrangian multipliers corresponding to this new set of equality constraints denoted as z is: At the optimal point, the tuple (x , y,z) must satisfy the following due to KKT conditions: We would like to examine how the optimal objective changes if a small perturbation is applied to the time allocation scheme while keeping the active set constant. Let us consider an infinitesimal shift dy applied to the time allocation scheme t. We denote the shifted time allocation scheme as y = y+dy, the shift in optimal solution due to the perturbation as dx = x − x , the shift in optimal objective due to the perturbation If r(x , y) is differentiable and we apply first-order Taylor expansion on equation Eq. (5b), we get: r(x , y) + ∇ y r(x , y) dy + ∇ x r(x , y) dx = 0 Since r(x , y) = 0 due to Eq. (5b), Eq. (6) can be rewritten as: ∇ y r(x , y) dy = −∇ x r(x , y) dx (7) Let us assume f (x , y) is differentiable, approximating df with a first-order Taylor expansion around f (x , y) results in: Note that we can rewrite Eq. (5a) as: Now we plug Eq. (9) into Eq. (8): Finally, we plug Eq. (7) into Eq. (10): df =z T ∇ y r(x , y) dy + ∇ y f (x , y) dy = z T ∇ y r(x , y) + ∇ y f (x , y) dy Since df is the shift of the optimal objective, we can conclude that ! ", $ % = ! ", 0 = 1 2 " * ! ", $ * = ! ", −1 = 1 2 " * − " ℎ " : " ≤ −1 as desired. Even more succinctly, sinceλ i = 0 at non-active constraints, we can write this as

D. Two illustrative examples
We illustrate our analytical gradient estimation technique with two toy examples. The first example inspects the case with the objective function being: The constraint h(x, y) ≤ 0 being: x ≤ −1 In this example, we assume that the constraint does not depend on y. We pick y 1 = 0 and y 2 = −1 and inspect the shift of the optimal objective ∆f = f (x , y 2 ) − f (x , y 1 ). Fig. 2 gives a graphical illustration. Since the constraint does not depend on y, we have: ∇ y r(x , y) = 0 Then according to Eq. (12): Our second example has an objective that does not depend on y: With a constraint h(x, y) ≤ 0 parameterized by y: x + y ≤ 0 Let us consider y 1 = −1.0, y 2 = −0.5 and inspect the shift of the optimal objective ∆f = f (x , y 2 ) − f (x , y 1 ). Fig.  3 gives a graphical illustration of this example. Since the objective f (x) does not depend on y, we have: . Black and red line segments with shades are the constraints corresponding to y 1 and y 2 , respectively. Optimal solutions are black and red dots. The blue arrow lies tangent to the parabola represents the Lagrange multiplier λ to the black constraint. The predicted decrease of the optimal objective ∆f predicted is denoted by the orange arrow. The actual decrease of the optimal objective ∆f actual is denoted by the green arrow. Note that the Lagrange multiplier only gives firstorder information so the actual decrease is less than predicted.

E. Outer optimization
Our gradient descent algorithm is given in Algorithm 1, which takes an initial guess y 0 of the time allocation scheme and a maximum number of iterations as input. It then iteratively descends f until some optimality conditions are satisfied or the maximum number of iterations is reached. if optimality-conditions-satisfied then J, λ, y ← J α , λ α , y α 12: return t In Algorithm 1, Line 2 solves a QP problem with a time allocation scheme y = y 0 and then returns the objective value J and the dual solution (Lagrange multipliers) λ. J is now the baseline objective and the rest of the algorithm will improve upon it. Line 4 estimates the gradient of the objective w.r.t. time using the Lagrange multipliers λ. Line 5 finds a normalized descent direction from the gradient by projecting the gradient onto the null space of constraints on t [14]. Line 6, which is further illustrated in Algorithm 2, finds a suitable step length α that gives sufficient decrease in the objective function. If such an α is found, the line search algorithm also returns the objective, Lagrange multipliers and time allocation associated with the optimal α denoted as J α , λ α and y α , respectively. Then the objective, Lagrange multipliers and time allocation scheme are updated in Line 11. If the step length α cannot be found during the iteration, the iteration stops and returns the last t as in Line 8. The optimality conditions used in our implementation are: 1) Norm of the projected gradient is less than a threshold, or 2) The change of the objective function is less than a threshold. α ← τ s α 14: return α not found Algorithm 2 shows our adaptive backtracking line search routine starting from y 0 in the direction of p, step length α is returned if found as in Line 12 along with the associated objective J α , Lagrange multipliers λ α and time allocation scheme y α , otherwise "α not found" will be returned as in Line 14. In our implementation, the Armijo condition [14] is used for checking sufficient decrease in the objective. The initial step length α 0 for each line search is defined as a static variable in Line 1, it will be updated adaptively. The update strategy is similar to the update of trust region radius in a trust region algorithm: If the line search achieved sufficient decrease after the first iteration, the initial step length α 0 for the next line search will grow as in Line 9; If the line search found an α that leads to sufficient decrease after more than one iterations, then that α will set be as the α 0 for the next line search as in Line 11. Parameters that control the growing and shrinking of α are τ g and τ s , respectively, they are defined as constant variables in Line 2. Note that τ g > 1 and 0 < τ s < 1. Line 5 takes a new step and line 6 solves the QP with the updated time.

IV. EXPERIMENTS
We demonstrate our method on trajectory planning for UAVs in the presence of obstacles. Our implementation of the algorithm is written in Python and C++: Python is used in constructing the QP problem, performing line search and   [16] is used as the interface between Python and C++. All the experiments are carried out on a laptop with a 2.9 GHz Intel Core i5 processor. We make use of the environment generator and planning pipeline from Gao et al.
to generate convex collision-free corridors for a UAV [5]. This open source software is able to generate random obstacles in a 3D environment, visualize the obstacles and trajectories in Rviz, and generate safe corridors. We generated 100 tests by randomly sample feasible start points and goal points from the environment generator and record the flight corridor. Statistics of the number of segments from these tests are shown in Fig. 4. Because these involve 6th order splines in a 3D state space, the number of variables in each optimization problem is (7 × 3 + 1)n where n is the number of segments. We establish a minimum-jerk objective function, and fix the total amount of time on the trajectory. The initial timing is allocated from the heuristic described in [5]. The fixedduration constraint is used for fair comparison with the unrefined trajectory, but we note that our framework is able to handle general objective functions and constraints on timing. Fig. 5 shows an instance of one of these corridors, and how the trajectory can be improved by refining the time allocation scheme.
We compare the results from different combinations of QP solvers and gradient estimation methods listed in Table I. Here we refer to the relative suboptimality, which is calculated as J − J J , where J denotes the objective value achieved by an algorithm, and J denotes the true optimum. In our experiment, we take J as the minimum of all the objectives returned by different algorithms.
We test the following solvers: Sqopt [17], an active-set QP solver intended for large and sparse systems; Mosek [18], an interior-point QP solver, OSQP [19], an alternating direction method of multipliers (ADMM) approach, and qpOASES [20], an active-set QP solver designed for small and dense systems. Experiments show that Sqopt and Mosek yield adequate results, but neither of OSQP or qpOASES works well in our experiment, because qpOASES does not scale well to midscale or large problems with sparse structures, and OSQP cannot solve QP to a high accuracy and gives inaccurate Lagrange multipliers.
We compare our Lagrange multiplier method (LM) against finite differences (FD) using Sqopt and Mosek. LM improves overall running time by 2-3 times beyond FD, and moreover terminates with a much lower suboptimality. We suspect this is because of the improved accuracy of the LM gradient estimation. The performance bottleneck of the LM method is moved to the line search step, which takes about 90% of the total computation time while the time spent on gradient estimation is trivial. Sqopt is generally faster than Mosek, and this could be a consequence of active-set methods' ability to perform warm start. We have tried Quasi-Newton methods but they performed poorly against steepest descent. We also observed that the computation time of our method scales roughly linearly with the number of segments.
We also compare against SNOPT [21], which is a general nonlinear solver for sparse, large-scale problems. Here we do not decompose the problem, and simply formulate joint spatial and temporal optimization problem as an NLP. We provide analytic gradients to SNOPT for solver robustness. We initialize SNOPT with the unrefined time allocation scheme and the spline coefficients resulting with the first QP solve. However, even starting from a feasible solution, SNOPT does poorly compared to our bilevel optimization framework. SNOPT tends to terminate prematurely without converging, and often moving to an infeasible point. We believe this is because the joint spatial and temporal NLP is ill-conditioned: 1) The QP objective function exhibits high-order dependence on timing. 2) Some spatial constraints are very sensitive to the highorder spline coefficients.
The results shown in Fig. 6 explore suboptimality as a function of time. For each cutoff time, we abort the optimization process at that time and record the suboptimality achieved at this time. If no iteration has been finished at the cutoff time, we use the unrefined objective instead. Note that due to the properties of the steepest descent method, the first few iterations will give significant decrease in the objective but the improvement decreases as more time is spent. In applications, we suggest that our optimizer can run in a separate thread and return the best result at a given timeout. If this algorithm were to run in 25Hz (40 ms cutoff time), which is a reasonable frequency for real-time applications, our method achieves a mean and median suboptimality of 14.73 and 0.87 respectively, compared to 117.33 and 6.91 achieved by [4].

V. CONCLUSION
We presented a novel way to estimate the gradient of the objective function w.r.t. time allocation scheme in a trajectory optimization problem and proposed an efficient bilevel optimization framework based on Generalized Benders Decomposition. Our results indicate that we can achieve a significant decrease in the objective of the trajectory while having realtime performance. This framework may have applications in path planning for ground vehicles, humanoid robots and UAVs. Underpowered robots will particularly benefit from this framework since a refined trajectory is smoother, less aggressive and easier to track than its unrefined counterpart.
Future work may include accelerating the gradient descent by exploiting the structure of the problem. For example, acceleration may be achieved through using Newton or Quasi-Newton methods. Also, we are considering the possibility of deploying our framework on a real UAV to further investigate its performance in real-time trajectory optimization.