Improving Local Trajectory Optimization by Enhanced Initialization and Global Guidance

Trajectory optimization is a promising method for planning trajectories of robotic manipulators. With the increasing success of collaborative robots in dynamic environments, the demand for online planning methods grows and offers new opportunities as well as challenges for trajectory optimization. Special requirements in terms of real-time capabilities are one of the greatest difficulties. Optimizing a short planning horizon instead of an entire trajectory is one approach to reduce computation time, which nonetheless separates the optimality of local and global solutions. This contribution introduces, on the one hand, Extended Initialization as a new approach that reduces the risk of local minima and aims at improving the quality of the global trajectory. On the other hand, the particularly critical cases in which local solutions lead to standstills are mitigated by globally guiding local solutions. The evaluation performs four experiments with comparisons to Stochastic Trajectory Optimization for Motion Planning (STOMP) or Probabilistic Roadmap Method (PRM*) and demonstrates the effectiveness of both approaches.


I. INTRODUCTION
Trajectory optimization enables an intuitive way to apply motion specifications to trajectory planning for robotic manipulators. It allows both, the consideration of constraints such as joint limits or collision avoidance, and the optimization of criteria that range from high-level tasks [1] over secondary objectives [2] down to the motion itself [3]. Consequently, it has long been established as a general planning method. On the other side, online planning offers a flexible and instantaneous way to deal with changes in the environment or task. The success of robots for collaborative purposes, leads to an increasing presence of people in the workspace, thus further rendering the importance of the topic. Online planning is still a challenging problem where criteria such as real-time capability and quality of the trajectory conflict. Because of its success, trajectory optimization is also a reasonable choice for online planning. However, the available planning time is limited and, depending on the dynamics of the environment, might be insufficient to simply perform trajectory optimization in a loop. For this reason, there are The associate editor coordinating the review of this manuscript and approving it for publication was Zheng Chen . measures to simplify the planning problem at the price of reduced quality of the solutions.
This contribution analyzes the impact of planning short local trajectories that simplify the planning problem and proposes methods to mitigate suboptimality. The next subsection gives an overview over related methods that cope with the computational burden and negative side-effects of simplified problems in online trajectory optimization.

A. RELATED WORKS
The representation of the trajectory in an optimization problem has a relevant effect on the computation time. For example, representations such as fully discretized collocation or multiple shooting outperform single shooting by providing sparsity and improve numerical stability [4]. The sparse optimization problem is then encoded in a suitable structure like, e.g. a hypergraph [5] and solved by a sparse solver. A method to further reduce the effort exploits the separation of shooting intervals even further by distributing them over several local optimization problems and solving them via distributed optimization methods [6]. It is faster than solving with ordinary multiple shooting, however, it is most effective for full trajectories only which require knowledge about the total duration and full term predictions of the environment. In dynamic environments, where spontaneous detours or evasions are necessary, the duration of a trajectory is not trivial to determine. Automatic Differentiation [7] becomes increasingly popular and reduces the amount of expensive function evaluations compared to finite differences which have a significant impact on the total computation time because of costly distance calculations. However, it cannot easily be adapted to structural changes of the optimization problem caused by variable numbers of obstacles which is why finite differences are still commonly used.
A higher level approach to reduce the problem complexity is to devide the trajectory into several segments and perform planning of the next segment while tracking the current one [8]. It allows computation times up to the length of a segment, but also requires predictions of the environment to extend over the length of two segments, which is difficult to achieve, especially for moving people. In addition, the motion of the currently tracked segment is fixed and cannot be adjusted for changes, which is why smaller segments are desirable. However, choosing smaller segments mitigates the effectiveness of this method.
Recently, methods that utilize a fixed planning horizon which always starts from the current state become more and more popular [3], [9]- [11]. Because of their principle, they can also be called Moving Horizon Planning (MHP). A portion of the solution is tracked and the next optimization is performed similarly to Model Predictive Control. The duration of the final trajectory results implicitly even though the planning horizon is fixed. The shorter the horizon, the simpler and faster the optimization problem. This makes it the fastest method compared to the other approaches. However, the introduction of a limited planning horizon leads to a local planning problem that departs from the original problem and promotes vulnerability to local minima [3], [12], [13]. Local minima appear in the form of suboptimal global trajectories or even standstills, since a local solution (locally optimal or not) is not necessarily optimal in the sense of the global problem. A detailed problem definition follows in Section II.
The most common methods for optimization are gradientbased. Interior point [14] or sequential quadratic programming [15] approaches have become standard and efficiently exploit the sparsity of collocation or multiple shooting [5]. Their main weakness is their dependency on a proper initialization, which is a serious factor especially for nonlinear problems such as trajectory planning. Gradient-free approaches such as STOMP overcome this by improving a trajectory iteratively until convergence via random sampling and selection similar to evolutionary algorithms [16]. However, this method is often not fast enough since a complete trajectory is planned, for which the duration must be known in advance.
Suboptimal global trajectories and standstills are closely related to the structure of the environment. In [17] the authors, therefore, utilize homotopy classes for initialization of parallel planning instances so that all distinct paths are considered during planning. The best instance provides the momentary solution. This method scales poorly with the complexity of the environment and plans complete trajectories in each instance, which is only fast for a two-dimensional workspace of a mobile robot. Furthermore, the idea of homotopy planning is not effectively applicable to the one-way open kinematic chain of stationary manipulators. Another approach, therefore, reactively samples via-points to solve standstills [13]. It can resolve standstills once detected but does not proactively avoid them. Closely related is a method that traverses a list of predefined via points one by one by a local potential field approach [18]. The via points originate from a visibility graph and thus capture the whole scene in a global way [19]. However, by strictly approaching the via points one by one, the method leaves little space for the local planner to perform motion optimizations. Also closely related is a method that combines global exploration with local optimization by passing an initial global solution back and forth between a global planner and local optimization [20]. Both planners are modified from literature approaches to work in conjunction. However, due to its sequential structure, this method is not real-time capable and is limited to paths.

B. CONTRIBUTION AND OUTLINE
Based on the advantages and promising results of fully discretized and gradient-based MHP, this article presents novel methods to tackle suboptimality and local minima that arise from the simplified local planning problem. The contribution is arranged as follows: • A detailed problem definition in Section II to form a theoretical basis and to emphasize its core aspects on which the presented methods built upon • Two methods in Section III that take over the idea of parallel planning instances and provide an enhanced initialization to mitigate suboptimality • One method in Section IV that extends to the idea of global via points as well as interleaving global and local planning with more flexibility of the local subproblem and faster computation of trajectories • Comparisons in Section V with plain MHP as well as state of the art planners such as STOMP and PRM*. 1 Section VI eventually concludes this article and gives an outlook for future work. Note, PRM* merely plans a geometric path that is still comparable from a baseline point of view, since the core problem rather lives on a geometric level. However, pure path planning methods usually do not fulfill the requirements of the online trajectory planning problem without further ado.

II. PROBLEM DEFINITION
Consider the fully discretized trajectory τ ∈ T that consists of states x k ∈ X and controls u k ∈ U for k = 0, 1, . . . , K : τ := x 0:K := x 0 , x 1 , . . . , x k , . . . , x K u 0:K −1 := u 0 , u 1 , . . . , u k , . . . , with T as the set of all trajectories. The sets X and U define the admissible state space of a D-DoF manipulator with respect to the constraints: The distinction between states and controls arises later from the repetitive nature of online planning, which commands the required controls instead of states. For the sake of simplicity, states x k refer to the robot's joint angles and controls u k refer to desired joint velocities rendering τ to be a common joint space trajectory. Without loss of generality, we assume the robot's built-in joint velocity controllers to be sufficiently precise to attain the commanded u k . Note, U can easily be extended to also limit accelerations approximated by finite control differences. The time law of τ uses the uniform grid: which assigns timestamps t k to all state and control pairs. To distinguish from trajectory timestamps t k , t n serves as a general timestamp for n = 0, 1, . . . , N . Consequently q n := q(t n ) is the robots actual joint configuration, with q ∈ X C (t) and x f,n := x f (t n ) is the planning target with x f ∈ X C (t) at t n . Set X C (t) ⊆ X contains all collision free states regarding robot self-collisions (X R (q) ⊆ X ), static obstacles (X S (t) ⊆ X ), and dynamic obstacles (X D (t) ⊆ X ): Note, X S (t) and X D (t) depend on time as the number of obstacles might change.

A. GLOBAL PLANNING PROBLEM (Offline)
A formal definition of a global optimization problem to find an optimal solution trajectory τ that starts at q n and reaches x d ∈ X C (t n ) while minimizing costs regarding x k and u k at t n is given as: subject to Besides optimizing the number of states K it is also possible to choose a sufficiently high but fixed K or optimize t [22]. Solving (5) to achieve a global solution by optimization will be called Global Trajectory Optimization (GTO).
Expression (5a) utilizes the cost function: with components Components (6b) and (6c) state the common quadratic cost function and penalize deviations from x d as well as control effort with positive definite matrices Q ∈ R D×D and R ∈ R D×D for weighting. Although (5d) already enables collision avoidance, component (6d) keeps the robot away from obstacles in general and induces prophylactic avoidance behavior. As a robot link approaches an obstacle, the proximity function ρ : O → R + 0 rises: with weight w ∈ R + 0 , activation threshold d min ∈ R + , and pairs of link and obstacle collisions o ∈ O. The function d : O → R reflects the signed distance between a collision pair o.
Expression (5b) relates x k and u k to x k+1 as a simple forward Euler integration to match the integral relation between joint angles and velocities. Equality constraint (5e) forces τ to start at the robot's state q n . Solving (5) usually takes more time than available for online planning in peopled environments. Therefore, planning is performed once at t n with n = 0 and x d = x f,0 . The optimality of a solution obtained by gradient-based algorithms depends on its initialization τ I . A suboptimal solution is denoted τ • and the optimal solution is denoted τ .

B. LOCAL PLANNING PROBLEM (Online)
To achieve an online-capable computation time, K is limited to a sufficiently small value. In this case, K is fixed and thus the constraint (5f) is omitted. The local problem now differs from the global one in such a way that a trajectory is sought that merely points towards x f,n , but does not necessarily reach it directly. Since online planning is repetitive and entangled with motion execution, x f,n is approached step by step which gives this method the name of Moving Horizon Planning similar to Model Predictive Control.
Local planning is performed repeatedly at t n for n = 0, 1, . . . , N using x d = x f,n and the same t as τ . To distinguish between trajectories of different timestamps, we use subscript notation τ n whenever necessary. By setting t k for k = 0 equal to t n , and considering (3), τ effectively starts at t n and consistently extends into the future. In each instance t n , VOLUME 10, 2022 FIGURE 1. Illustration of three cases where local solutions affect the optimality of CLTs. The differences appear at t 2 . Local solutions as well as CLTs can be optimal (stars) or suboptimal (bullets). Initializations are marked as circles.
the first element u 0 of the optimized controls is sent to the robot resulting in the continuous-time signal u(t): The resulting trajectoryτ caused by the sequence of controls from local solutions is called closed-loop trajectory (CLT). It ideally matches the global solution of (5). However, this is not guaranteed as the local problem has a limited planning horizon. Suboptimality manifests in deviations from the global solution and results in, e.g. longer paths and duration or even in a standstill. Fig. 1 demonstrates three cases of how suboptimal CLTs occur. It shows the set of all trajectories T with points indicating locally suboptimal solutions τ • n and stars indicating locally optimal solutions τ n . Red color marks are local solutions that produce a standstill. The area is vertically divided into timestamps t n for n = 0, 1, 2, 3. The gray-framed areas define subsets of T of all feasible trajectories of local optimization that start at q n . Circles represent initializations τ I n of the local problem with corresponding arrows pointing to the converged solution. Some local solutions (green) are part of the globally optimal solution and follow Bellman's principle of optimality [23] and some (blue) arise from local solutions that produce a suboptimal CLT. The sequence of green solutions will be called Bellman Trace. The last solution at t 3 is a standstill on x f,3 .

a: CASE I
Assuming a proper initialization τ I 0 , then the first two solutions follow the Bellman Trace in Fig. 1a. At t 2 a change in the environment (moved obstacle) results in a new Bellman Trace, with the old one in dashed. The dotted gray shape between both feasible sets indicates the previously joined area of feasible solutions, which is now cut in half by the obstacle. The first case of suboptimal CLTs now arises by the dependency of the local problem on its initialization τ I 2 . It may results in the bottom (blue) solution even if the top (green) solution is locally optimal.

b: CASE II
Local suboptimality reaches its worst case when it comes to a standstill because of an obstacle blocking the path and the planner's inability to find the locally optimal solution to pass around. In Fig. 1b the sequence is equal to Fig. 1a until timestamp t 2 , where now the bottom local solution produces a standstill. The second case is even worse than the first because the suboptimal CLT does not reach the target x f,3 . Again, even if the top (green) solution is locally optimal, the problem may converge to the bottom (blue) solution when initialized poorly.

c: CASE III
In Fig. 1c, the Bellman Trace at t 2 requires a local solution that lies outside the feasible set. Consequently, the local solution converges to the boundary of the set and, in this case, becomes a standstill. This case demonstrates that even locally optimal solutions τ may not lead to a globally optimal CLT. A standstill is locally optimal e.g. if the planning horizon is too short to cover the reeving part of an evasive motion and thus leading to higher costs than stopping. One option is to increase K to extend the feasible set (dotted gray) in such a way that it includes the upper (green) solution. Then Case II applies if there are additional local minima. Since a longer horizon also means higher computation time, this is only partly applicable. Another way is to change the cost function so that a new locally optimal solution (blue) arises, which does not cause a standstill. Although the resulting sequence of local solutions no longer resembles the Bellman Trace, the goal is reached at the costs of a suboptimal CLT.
Besides improving the quality of local solutions by proper initializations τ I n , it is also important to offer local solutions, which produce a (near) optimal CLT. For the former, this article utilizes an Enhanced Initialization τ I n,(·) produced by a parallel exploring instance, which itself uses two different kinds of initializations. For the latter, we tackle the problem by offering new locally optimal solutions by adjusting the cost function of the local problem.

III. PARALLEL PLANNING AND ENHANCED INITIALIZATION
Avoiding poor initializations is not trivial as it requires knowledge of the area around a locally optimal solution. The highdimensional search space T and the nonlinear constraints of (5) make explorations complicated. In addition, online planning usually exploits warm starts which eliminates explorative behavior and tends to find solutions similar to the previous one but on the other hand is important to decrease planning time as it essentially implies a replanning behavior that distributes the optimization across multiple timestamps.
To keep the benefits of warm starts and at the same time overcome its drawbacks, the presented approach explores the current feasible set by a parallel planning instance. Fig. 2 shows a sequence diagram that demonstrates how both instances intertwine: t 1 : Both instances solve the same planning problem with the main instance using its previous solution τ 0 and the parallel instance using an external initialization τ I n,(·) . t 2 : The main instance warm starts with its previous solution τ 1 since the parallel instance is still planning. It finds a suboptimal solution τ • 2 . t 3 : The parallel instance now offers a locally optimal solution τ 1,(·) that is preferred over the previous suboptimal solution τ • 2 for initializing the main instance. t 4 : The parallel solution τ • 3,(·) is suboptimal and thus the previous main solution τ 3 is preferred. Note that the choice for initializing the main instance is only determined by the solution with least costs. It does not necessarily have to be the locally optimal solution. The parallel instance is allowed to take more iterations than the main instance and uses dedicated techniques to generate τ I n, (·) which are presented in the following two sections. Since the parallel solution is based on a potentially older environmental state, it does not command the robot directly but is re-optimized in the main instance. This has the additional advantage that the control can always be sent regularly.

A. INITIALIZATION BY LINEAR INTERPOLATION (L-IN)
The most general initialization for the parallel exploration instance is linear interpolation τ I (L) between q n and x d for which the states and controls are: It is assumed to have a solver that does not require the initialization to be feasible, as linear interpolations across the entire space usually violate constraints (5b), (5c), and (5d).

B. MIRRORED INITIALIZATION (M-IN)
A recent local solution of the main instance describes the impacts of a pushing obstacle on the trajectory. A reversed motion opens a new explorative perspective. Therefore, this more enhanced variant mirrors statesx k of the last solution τ t n−1 of the main instance to initialize the parallel instance. Considerx i,k and x i,d as the i-th components of the states x k and x d , respectively: for each joint i = 0, 1, . . . , D − 1. Letz i,k andz i,d be homogeneous points in the spatio-temporal domain onx 0:K and x f,n , respectively: Let v i be a unit vector pointing fromz i,0 toz i,d : Matrix T i ∈ R 3×3 is a homogeneous transformation that transforms pointsz i,k onto the t-axis: with v ⊥ i being perpendicular to v i . Finally consider matrix M ∈ R 3×3 that mirrors along its second component: The complete mirroring operation is now performed as: It essentially is a transformation ofz i,k onto the t-axis followed by a sign flip and a backward transformation. The mirrored points z i,k are constrained to joint limits and to a total time of K t. If the sequence is not of full temporal length, it is continued by replicating z i,K , i.e. Zero Order Hold (ZOH). Fig. 3 shows an example of the mirroring procedure.
To match the uniform grid (3) in terms of t and a total number of K points, the sequence is resampled into x 0:K . After performing the above steps for all joints, the controls u k are calculated based on finite differences and both, x 0:K and u 0:K −1 are stored in τ I (M) .

IV. SUB-GOAL TRACKING
To adjust the cost function and promote locally optimal solutions that avoid standstills this method swaps x d = x f,n with a local sub-goal x d = x s,n ∈ P that is drawn from a set P ⊂ X R ∩ X S (t n ) of candidates at t n and reachable by the current feasible set of trajectories. This way, non-constant locally optimal solutions are preferred over standstills. Since this method acts on the reference x d only, it does not interfere with Enhanced Initialization introduced in the previous section.

A. SET OF CANDIDATES
By changing the target x d of the local planning problem, it further deviates from the global problem (5) and the CLT more likely becomes suboptimal. To limit the extent of suboptimality, P should be close to the globally optimal solution. Since P does not directly depend on time, it is purely geometric and can be expressed as a path, e.g. as a list of linearly interpolated waypoints. The approach, therefore, utilizes an optimal path planner to find the shortest path P in parallel to MHP. The points, p s ∈ P and p g ∈ P, indicate the start and goal of P, respectively. If x f,n is inside of a hypersphere with radius γ ∈ R + around q n , sub-goal tracking is skipped (x d = x f,n ) to avoid planning trivial paths. In all other cases, the motion towards x f,n is assumed to be potentially sensitive to locally optimal standstills and sub-goal tracking becomes necessary (x d = x s,n ).
If there is no previous path, or if x f,n is outside of a hypersphere with radius δ ∈ R + around the path's current endpoint p g , an optimal path planner generates/updates P based on p s = q n and p g = x f,n at t n . The hypersphere models a hysteresis, which prevents permanent replanning, especially in the case of moving references x f,n . To further reduce the planning effort for moving references, multi-query planners such as PRM* should be preferred to create P.
Besides new reference values x f,n , replanning P is also triggered by updates of X S (t n ). Due to the required planning time, only static (and temporary static) obstacles are considered. They often represent the largest and most bulky part of the workspace. Temporary static obstacles are dynamic obstacles that temporarily do not (or only slightly) move. The distinction can be made using background detection approaches.

B. SUB-GOAL SAMPLING
At t n , the sub-goal x s,n is drawn from P based on τ n−1 . This is done by virtually moving x s,n along P starting at p s . To ensure that the resulting sub-goal is within range of the local planning problem, it is moved by path length s ∈ R + 0 : in whichK ∈ N denotes the number of successive states x k , which are sufficiently close |x d − x k | ≤ ∈ R + to the current target x d for k = K , K − 1, . . . , 0. Parameter λ ∈ R + can be interpreted as a pseudo-velocity at which the robot is expected to move, e.g. the robot's joint velocity limit. Together with t, s represents the available distance of the planning horizon before it is maxed out. Parameter K 0 ∈ N ensures that there is always some clearance. Fig. 4 demonstrates howK is chosen. At the first planning cycle (n = 0), there is no previous solution and thus x s,0 = p s = q 0 . The resulting solution leads toK = K + 1 since all states lie on q 0 and no motion is necessary. In the subsequent cycle (n = 1), x s,1 virtually moves along P according to (19) and the robot starts moving according to (8). This process repeats for all subsequent cycles. The combination of P starting at q n and choosing x s,n so that it is robustly embedded in the previous solution τ n−1 ensures that x s,n lies within the planning horizon of the next planning instance.
To ensure that x s,n does not collide with a dynamic obstacle, s is reduced gradually until x s,n ∈ X D (t n ). In the worst case, a big-enough dynamic obstacle constantly pushes the robot back. However, usually the dynamic obstacle is either classified as background eventually and P passes it, it moves out of the way by itself, or the horizon reaches around it. In particular, the latter is encouraged by sub-goal tracking as it preserves the flexibility of the local planner and imposes fewer requirements on the path quality compared to classical path tracking approaches.

V. EVALUATION
The evaluation comprises four experiments based on three scenarios shown in Fig. 5. The first two show a dynamic obstacle disturbing the robot during a horizontal and vertical motion, respectively and are utilized in Experiments I -III. The third scenario is static and includes two shelves from which the robot picks and places a fictional object. It is utilized in Experiment IV only. The obstacles and robot links are modeled by swept-spheres, which allow for efficient distance calculation in (4) and (7) [24]. More details on the exact structure of the scenarios and the robot's start and target configurations can be found in Appendix A.
To exploit sparsity and efficiently maintain changing constraint dimensions that arise from dynamic environments, the optimization problem of MHP is transformed into a hypergraph representation [5]. MHP is embedded in a custom C++ ROS framework that handles the interface to the environment and the computation of distances, constraints, and costs. Its performance under multiple dynamic obstacles in the form of a person has been demonstrated by Krämer et al. [3]. The framework utilizes the interior point method IPOpt [14] to solve the planning problem as well as any parallel instances using the linear solver MA27 from [25]. The parameters for this are summarized separately for all experiments in Appendix B.
The robot is a Universal Robot 10 (D = 6), which is simulated in Gazebo to simplify setting up, running, and analyzing the experiments. However, the motion commands generated by MHP can also be sent directly to the real robot, which is demonstrated for Experiment IV in the accompanying video. Both, the simulation and planning framework run on a computer equipped with an Intel i7-8700 CPU at 3.2 GHz using 32GB RAM RAM at 2666MHz under Ubuntu 18.04.

A. EXPERIMENT I (Scenarios A + B)
The first experiment investigates the influence of linear and mirrored initializations on the resulting quality of the local solution of MHP. For this, the costs of these local solutions are compared with the costs of a baseline solution. The Baseline is MHP initialized with τ I (L) at the beginning and warm starting afterwards. The variants L-IN and M-IN do not use warm start and always re-initialize with τ I (L) or τ I (M) , respectively. All variants solve the same local problem and use the parameter given in Appendix B. For better comparability, the robot ignores incoming motion commands and remains in its initial configuration. Fig. 6 shows cost values of all three variants along with the pitch angle of the obstacle in Scenario A. In the beginning, the path in front of the robot is free and the costs correspond to a local trajectory pointing directly to the target. All variants share almost identical costs up to t = 1 s and thus also share the same solution. The costs start to increase as the intruding obstacle starts pushing away the local solutions. The costs of the Baseline keep increasing until the obstacle stops at t = 2 s since the local solution warm starts and thus lacks explorative capabilities. The other two variants feature exploratory abilities due to their type of initializations and are able to find a better local solution, which eventually lies on the level of a direct trajectory. Instead of being pushed downwards entirely, the local solutions of these variants pass the obstacle on the upside. Besides a clear improvement to the Baseline, there are also differences within the type of initializations. For example, M-IN finds the locally better solution earlier than L-IN. Fig. 7 shows the costs for Scenario B in the same way. Again, in the beginning, the path is clear for a trajectory that points directly to the target. Due to the intruding obstacle, the costs start to increase for all three variants. For the Baseline, they rise to their maximum value at t = 1.75 s and then fall when the obstacle has passed, and finally stay constant after the obstacle stops. However, the costs still do not reach the locally optimal costs as for variant L-IN and M-IN. Both perform equally in this scenario. Their initially stronger increase VOLUME 10, 2022 compared to the main instance until t = 1.2 s indicates that these initializations can also lead to a locally worse solution than Baseline. However, this is not critical in normal operation when Enhanced Initialization as shown in Fig. 2 is enabled, since the parallel solution is only used to initialize the main instance if it constitutes lower costs.

B. EXPERIMENT II (SCENARIOS A + B)
Since the first experiment demonstrated the effectiveness of solving a local problem initialized by L-IN or M-IN instead of warm starting, this experiment investigates the effects on the CLT when the main instance is initialized by parallel solutions according to Fig. 2. Consequently, this experiment deals with the following variants: the Baseline variant without Enhanced Initialization, which initializes itself at the beginning via τ I (L) and subsequently via warm start, and two variants with a parallel instance initialized by τ I (L) or τ I (M) used for enhanced initialization of the main instance. Their parameters are given in Appendix B. To generate a CLT the robot no longer ignores the motion commands. This experiment utilizes the length of the CLTs as a measure to compare their quality. Fig. 8 compares the spatial lengths of the CLTs for Scenario A and B. In Scenario A, the Baseline clearly produces the longest distance (1.73 rad) compared to L-IN (1.23 rad) and M-IN (1.2 rad), since it avoids the obstacle in below and moves in front of it. There is hardly any difference between both variants with Enhanced Initialization in this  scenario when it comes to path length. Both almost match the shortest distance. However, inspecting the second joint's CLT in Fig. 9 shows differences in terms of when a locally better solution is found. Up to t = 1.3 s, all three variants tend to avoid the obstacle downwards before variant M-IN selects a locally better solution upwards. Variant L-IN tends to avoid the obstacle downwards for a little longer before it also selects a new solution. It can be seen that the distance already traveled in joint 2 is revised which reduces the fluidity of the robot's motion. Variant M-IN, successfully recognizes a locally better solution earlier and passes the obstacle below with less directional changes.
The results of Scenario B are similar to Scenario A except that now the second and third joints perform the actual motion and the first joint is used for evasion. Fig. 10 shows again a tendency in the wrong direction before variant M-IN and L-IN find a new locally better solution. The Baseline stays with the initial tendency and rides out the obstacle.

C. EXPERIMENT III (SCENARIOS A + B)
The previous results demonstrated that Enhanced Initialization leads to better local solutions. However, this does not necessarily imply that they are also part of the global solution. The third experiment therefore extends the previous one by comparing the similarity between local solutions and global solutions generated by three global methods. Besides GTO, these methods are STOMP and the path planner PRM*, which both are sampling-based. GTO, STOMP, and PRM* have been integrated into the same C++ framework as MHP for maximum comparability. They share the same objective  functions, constraints, and distance calculation framework if needed. GTO is formally closest to solving the global problem, but itself depends on initialization, which in this case is τ I (L) . STOMP is taken from [26], with custom joint limit clipping and the sampling matrix from the original publication [16]. It is not fully equivalent to GTO due to a fixed trajectory duration. PRM* is taken from the Open Motion Planning Library [27]. With sufficient planning time, it is asymptotically optimal in finding the shortest path, but it is purely geometric. To cover a wide range of global solutions, all three global methods are part of the evaluation. The parameters for the MHP variants are the same as in the previous experiment and those for STOMP and PRM* can be found in Appendix B.
A local solution is optimal in the sense of the global problem if it is a substrategy of the globally optimal solution [23]. Since in all three scenarios the obstacle is dynamic and the robot is moving, the globally optimal solution also varies. For this reason, global solutions are determined at different timestamps (t = 0 s, 0.5 s, . . . , 3 s) based on the current state of the robot and obstacle. The similarity between the local solution and the global solutions of the same timestamp is considered as a measure of quality. The closer a local solution is to the globally optimal solution of its timestamp, the more it conforms to the local optimal solution. The similarity measure is inspired by the Frechet and Hausdorff distance and is defined in (20) in Appendix C. Lower values mean more similar. Table 1 shows the distances between local solutions of Baseline, L-IN, and M-IN from Experiment II and global   further, the behavior is similar for variant L-IN. The Baseline almost always shows the worst similarity. This reinforces the results from Experiment I that mirroring tends to find a better local solution earlier on Scenario A than linear interpolation, with evidence that it additionally contributes to the global optimal solution due to its similarity to the optimal 2 strategy.
For the first three timestamps in Scenario B, the local solutions of all three variants are close to the global solution. At t = 1.5 s, they all deviate to a similar but significant extent. The reason is that at this point the global solutions have just changed direction and both, L-IN and M-IN are still following the old direction. This can also be seen in Fig. 12 as a comparison between timestamps t = 1.5 s and t = 2 s. It shows the superposition of the local solution and CLT of M-IN together with the global solution of STOMP for the first joint. At t = 1.5 s the local solution passes the obstacle in front when STOMP already suggests to pass behind. At t = 2 s, the local solution follows that of STOMP. Although M-IN usually tends to find a better solution earlier than L-IN, the delay time between mirrored initialization of the parallel instance followed by optimization and re-initialization of the main instance (see Fig. 2) can take up to two planning cycles, which is crucial in this case. For the remaining timestamps in Table 1, all variants get closer to the global solution as the obstacle passed by, whereby M-IN is slightly better than L-IN. The Baseline also improves but still has the least similarity.   sub-goal tracking are compared. M-IN+ST uses PRM* without smoothing and any post-processing to generate P. Both variants are compared with three global methods GTO, STOMP, and PRM* from the previous experiment. The parameters are given in Appendix B. The experiment uses Scenario C with two shelves. Table 2 summarizes the results of this experiment. Variant M-IN is not able to drive out of the left shelf to reach the target and runs into a standstill. The same applies to STOMP, which was not able to determine a valid trajectory even though it was initialized using PRM*. Variant M-IN+ST, on the other hand, is able to move out of the left cabinet and successfully reaches the bottom shelf of the right cabinet due to sub-goal tracking. However, the resulting length of 10.85 rad is higher than that of the global methods PRM* (7.25 rad) and GTO (6.07 rad). Fig. 13 overlays the CLT of M-IN+ST and P with local solutions at five different timestamps for the first three joints. The joint angle of P corresponds to the determined sub references x s,n . The local solutions usually reach these values because of a sufficient margin of K 0 = 8. The figure further indicates how the resulting global solution corresponds to a smoothed and optimized version of P, which shows how the flexibility of optimization is successfully exploited in this method compared to simply tracking P. However, it is not able to provide a shorter path than initially given by P.
GTO finds the shortest solution, after it was initialized with PRM*. A linear interpolation is too far away from the solution to be found by the optimizer in the highly nonlinear search space. PRM* provides a slightly longer solution than GTO since it terminates after 25 s and cannot optimize a trajectory in a gradient-based and thus objective-based manner. Note that GTO and PRM* are mainly used for references and therefore not tuned for speed. Both methods may generate the solution in a shorter time than given in Table 2. In this experiment, however, the focus lies on the rough order of magnitude that demonstrates the exclusion of GTO and PRM* as online planners.

VI. CONCLUSION AND OUTLOOK
This article presents approaches to improve local solutions in MHP via both, advanced initialization and sub-goal tracking. Optimization with linear or mirrored initialization without warm start already provides local solutions with less final costs. Enhanced Initialization is a method to combine the advantages of warm starts with the exploratory property of a parallel instance. It successfully finds locally better solutions and improves CLTs in terms of final path length (Case I). The results show that M-IN tends to find locally better solutions earlier than L-IN. Additionally, Enhanced Initialization using M-IN is most often closer to the shortest global solution than L-IN or normal MHP. However, there is still room for improving M-IN as it does not always switch to a better and locally close-to-optimal solution in time and also tends to slightly move in the wrong direction first. This is partially limited by design as for mirroring in the right direction it requires a prior tendency in the wrong direction. Another reason is the delay measured from the initialization and optimization of the parallel instance via re-optimization in the main instance.
The changes in the cost function by following a dynamic sub-goal successfully establish local solutions that prevent standstills (Case II/III). A primitive path planned in parallel with MHP is sufficient as the set of sub-goal candidates. The resulting motion is smooth and except for the goal state completely subject to optimization of MHP.
Future work analyzes the influence and opportunities of environmental predictions as well as initializations transformed from operational space. Especially for the former, some promising approaches have emerged in the past to predict the motion of humans that can be incorporated into methods and references in a prescient way. Initializations that originate from operational space might have the ability tackle standstills directly on a more intuitive level and at the same time focus the analysis more on Case II.  Table 3.

APPENDIX B PARAMETERS
The sets in (4) are based on the minimum distanced min = 0.01 m. The repelling potentials (7) become active when the distance falls below d min = 0.1 m and use a weight of w = 50. Furthermore, Table 4 shows the parameters of all variants of trajectory optimization. Table 5 contains the VOLUME 10, 2022 parameters for sub-goal tracking and Tables 6 and 7 contain parameters of the global methods STOMP and PRM*, respectively.
with d(x k ,x 0:M ) denoting a distance function between points x k and linearly interpolated segments ofx 0:M . It is adapted from the Frechet and Hausdorff distance to account for usually shorter local solutions. Fig. 14 shows an example of the metric for a two dimensional state space with K = 3 and M = 5.