Online Hybrid Motion Planning for Dyadic Collaborative Manipulation via Bilevel Optimization

Effective collaboration is based on online adaptation of one's own actions to the actions of their partner. This article provides a principled formalism to address online adaptation in joint planning problems such as Dyadic collaborative Manipulation (DcM) scenarios. We propose an efficient bilevel formulation that combines graph search methods with trajectory optimization, enabling robotic agents to adapt their policy on-the-fly in accordance to changes of the dyadic task. This method is the first to empower agents with the ability to plan online in hybrid spaces; optimizing over discrete contact locations, contact sequence patterns, continuous trajectories, and force profiles for co-manipulation tasks. This is particularly important in large object co-manipulation that requires changes of grasp-holds and plan adaptation. We demonstrate in simulation and with robot experiments the efficacy of the bilevel optimization by investigating the effect of robot policy changes in response to real-time alterations of the dyadic goals, eminent grasp switches, as well as optimal dyadic interactions to realize the joint task.


I. INTRODUCTION
D YADIC collaborative Manipulation (DcM) is a term we use to refer to a set of two individuals jointly manipulating an object, as shown in Fig. 1. The two individuals partner together to form a distributed system, augmenting their manipulation abilities. Such individuals can be either humans or robots. In scenarios, where both individuals are humans, the collaboration is natural as we humans are adept at co-manipulation.
One key element is our ability to understand our partner's intentions and adapt our actions accordingly. A second central skill is our ability to generate sequential manipulation plans. Nevertheless, our understanding of the mechanisms of joint action [1] and sequential decision making [2] are still subject of research.
Early work by Sheridan [3] identified eight core challenges of human-robot communication, with two of them being: (i) the need to acquaint both humans and robots with models of their partners, and (ii) the need to regulate the interaction of distributed decision-making systems, typically referred as mixed initiative systems. Ajoudani et al. [4] summarized the strategies used to equip robots with interaction capabilities and pinpointed that research on human-robot interaction models is still at its infancy. In this work, we focus on how a robot policy can be partner-aware and flexible toward complying with the requirements of DcM scenarios.
DcM scenarios demand a broad range of manipulation skills from both participants. The secret behind humans' remarkable manipulation skills, is our competence in control and prediction of contact events [5]. In this article, we address co-manipulation scenarios that involve multiple changes of contact, which is the crux of sequential manipulation [6].
As a typical DcM example, consider a robot transporting a large object with a human as shown in Fig. 1 . To avoid such deadlocks, both the human and the robot should anticipate the object's future state and change their contact locations accordingly, as shown in Fig. 2(B) and 2(D). As illustrated in Fig. 2(E) and 2(G), contact adjustments are crucial for the failure or success of the task. Further, actions like contact changes must comply with the partner's actions to jointly balance the object.
Joint planning in DcM scenarios is extremely challenging and requires solution of the following four complex problems: 1) Partner's Intention Estimation: An agent can only contribute to the performance of the dyad, if an estimation of the partner's intention can be obtained. 2) Joint Action Space Planning: As the two individuals act upon the same object their actions need to be coordinated with respect to the critical aspects of the task, e.g. balancing the object in collaboration with the partner. 3) Online Motion Plan Generation: Since the human partner's behavior is changing-i.e. non-stationary-a collaborative agent needs to update its own action plans on-the-fly according to the current goal and state of the interaction. 4) Agent's Hybrid Policy: The agent's repertoire of actions needs to be sufficiently rich to participate in DcM tasks. Such actions belong to a hybrid space of both continuous and discrete quantities, like forces and contact changes.
In this article, we address the last three points with a novel bilevel optimization formulation-where a continuous optimization problem is embedded into a discrete one. We couple informed graph-search (GS) methods and trajectory optimization (TO) to efficiently compute online hybrid motions with high fidelity. The resulting motion plans can be updated on-the-fly and incorporate both geometrical and physical couplings between the individuals of the dyad. Partner's intentions are represented as task space goals and here, we assume that the intentions can be predicted. The partner's policy is abstracted as task space wrenches, which enables us to model joint manipulation. The proposed method enables robots to generate on-the-fly joint action policies that are partner-aware and can benefit from the breadth of the hybrid action space.
The contributions of this article can be summarized as follows: 1) Partner-Aware Dyadic Planning Formalism: We extend the joint planning formalism-introduced in our previous work [7]-to non-stationary partner behaviors. Using this, the problem of finding the appropriate actions to co-manipulate the object can be addressed given an estimate of the partner's variable intentions. This formalism serves as a principled basis for the development of the partner-aware joint-action method.
2) Bilevel Computational Formulation: Our bilevel optimization formulation enables the combination of graph-search methods with trajectory optimization methods in a single framework. The former provides a coarse solution which is refined by the latter. This combination allows us to efficiently explore the discrete modes of the problem, e.g. the contact state of an end-effector, and holistically reason about geometric and dynamic properties, e.g. contact locations, forces and timings.

3) Hybrid Optimal Control for Multi-Contact Planning:
We present a holistic model-based optimization method that allows robotic agents to treat (i) forces, (ii) contact locations, (iii) actions timings, (iv) object's trajectory, and (v) contact sequence pattern, concurrently while ensuring optimality. To do so, we introduce a set of hybrid motion primitives that enable our method to generate hybrid plans without a prespecified contact pattern as in [7]. 4) Online Dyadic Planning: By combining the above three with the partner model introduced in our work [7] and an informed search planner, we realize a computationally efficient optimization method for DcM setups. The method generates hybrid motion plans online that are capable of adapting on-the-fly to changes during the task, like dyadic goal changes.
This article is organized as follows. Section II reviews related work on DcM setups and hybrid motion generation. Section III presents the problem formulation. Brief background on methods is provided in Section IV, while the methods' details are given in Sections V, VI, and VII. Section VIII presents the evaluation of the method and the experimental results. Finally, Sections IX and X conclude this article by discussing promising future research directions.

II. RELATED WORK
A. Human-Robot Collaboration (HRC) Components 1) Partner's Policy Prediction: In [8] and [9], a confidence measure of the human's goal prediction was used to alternate between reactive and proactive robot behaviors. Multiclass classifiers were utilized in [10] and [11] to recognize human partner's commands through force interaction in co-manipulation tasks. Further, conditional random fields were adopted in [12] to infer the human's intended goal during box co-pushing tasks and in [13] to anticipate object related human activities. In [14], learning-based human's occupancy workspace was predicted to generate robot collision-free motion. A comprehensive literature on modeling other agents in multi-agent environments can be found in [15]. These methods predict the partner's intent, however, the adaptation of the agent's policy is only realized as a selection from a set of discrete behaviors or as a modification of few continuous variables.
2) Dyadic Interaction Models: Interaction models are separated in three prevailing schools of thought, described below.
Control focused: Agravante et al. [16] used impedance control to accommodate partner's actions and collaboratively carry a table with a human. A load sharing framework with predefined sharing modes was presented in [17]. In both cases, the partner is treated as an external disturbance to the system.
Coupled-policies focused: Maeda et al. [18] proposed a method to transfer adaptive hand-overs to robots from kinesthetic demonstrations. Similarly, in [19], force and vision information was employed to commence the appropriate learned impedance behaviors depending on the task's phase. Data-driven extraction of interaction constraints during hand-over tasks was proposed in [20]. The extracted constraints were then used to form online robot responses. These methods couple together the policies of the agent, the partner, and the task evolution, to learn a direct mapping toward generating online adaptive robot responses. Thus, their generalization capabilities are limited to the demonstration set.
Partner-model focused: Maeda et al. [21] used a polynomial model to predict human motion and update the robot's goal. In [22] and [23], a task model was learned offline, to guide the interaction at the reproduction phase. Evidence of humans utilizing a model-based prediction of the partner's goal to update their own task's goal and consequently their own policy has been presented in [24]. Such methods are elegant, as each entity (partner, agent, task) of the interaction is modeled separately and actions of the two individuals can be appropriately reasoned upon. Our work follows the same principle to obtain generalizable robot behaviors.
3) Agent's Policy Generation: Another central aspect to HRC is motion attributes that the robot can regulate to fulfill the task in collaboration with the human. In [25], task space attributes, e.g. the object's trajectory, were optimized to facilitate human ergonomics. In [26], adaptation during co-manipulation was realized through turn-taking collaboration. Further, a number of methods focus on the dynamic properties of the interaction. Inverse dynamics approaches concentrated on the torque and force regularization [27], [28], while others adapted the impedance characteristics of the robot online to accommodate for partner's actions [22], [29]. However, a central aspect of manipulation is the selection of the appropriate contact locations on the object [30]. Accordingly, the exploitation of the contact space of the object by the two individuals is vital in DcM scenarios. To the best of our knowledge, contact adaptation within collaborative manipulation scenarios has not been addressed yet, although it is crucial toward enabling a robot perform complex DcM tasks jointly with a partner.

B. Hybrid Motion With Contact Changes
Next, we focus on the state-of-the-art multi-contact methods used for generating hybrid motions for manipulation and locomotion.
1) Multi-Contact Planar Manipulation: Mason introduced the problem of planar nonprehensile manipulation, the motion cone concept, and the voting theorem [31]. The limit surface concept was introduced in [32] and used in [33] to model the dynamics of planar pushing. These concepts map contact point's motion to object's motion, and have been used [34]- [36] to address planning and control for planar pushing. Also, recently they were generalized to a broader set of planar tasks [37]. Yet the different contact modes are typically explored with offline sampling, and the quasi-static environment assumption limits their applicability to 2D tabletop pushing.
2) Hybrid Planning and Control: According to an important duality between manipulation and locomotion, the latter is an instantiation of nonprehensile manipulation [38]. Our work is inspired by model-based optimal control methods [39]- [42], that are not restricted by a quasi-static stability assumption. Next, we describe three hybrid motion generation approaches.
Hierarchical approaches: These approaches address hybrid problems by decomposing them into action planning [43], contact planning [44], and motion control [45]. Such hierarchies allow to exploit domain knowledge at the task planning level and have been used for online motion generation. Yet, as these elements are designed separately it is usual that the final solution is not optimal or sometimes not feasible. In contrast, our method treats all the variables of the problem holistically.
Mixed-integer programming: This formulation explicitly models the hybrid nature of the problem and has been used by [46]- [48] for both locomotion and planar manipulation. Yet mixed-integer methods need to explore both the continuous and discrete parts of problems, while reasoning for the discrete part is done using general combinatorial optimization methods like Branch and Bound. This typically leads to large computation times that can be prohibitive for DcM needs.
Continuous programming: Using continuous optimization, in [41], a mathematical problem with complementarity constraints was formulated in the presence of complex contact phenomena. In [40] and [39], smooth nonlinear optimization problems were formulated based on a key observation: motions through contacts have phases, while the contact set remains invariant within each phase. One of their drawbacks is that the motion can only be conditioned on physical properties of the problem neglecting higher-level task objectives that are common in DcM scenarios [49], [50]. Our formulation treats all the variables as continuous and utilizes a GS method to consider higher-level task objectives too.
3) Sequential Manipulation Planning: On a different line of work, Simeon et al. [51] employed probabilistic roadmaps to produce motion plans with multiple grasp-hold changes. King et al. [52] used Monte Carlo tree search to plan sequences of discrete pushes and reason about object interactions. The A* algorithm was used by Gienger et al. [53], to demonstrate DcM scenarios with a human and a robot. These methods discretize the state space to employ search algorithms. Further, combinations of these methods with motion planners have been realized [54]. Nevertheless, to the extent of our knowledge, these methods are limited to kino-dynamic planning and have not been applied in hybrid problems, where the generation of dynamically feasible plans is of core importance.
Logic geometric programming (LGP): LGP with physics synthesizes logical planning with optimal control to demonstrate a broad range of robot sequential manipulation planning capabilities.
LGP has also been used for multi-agent cooperative manipulation tasks [55], such as handovers. The cooperative aspects are limited to the kinematic domain, where both agents act synchronously, but their actions are not physically coupled. In parallel to LGP with physics, the proposed method combines the benefits of informed GS algorithms and optimal control formulations. Informed search methods such as A* are a more efficient special case of the Branch and Bound algorithm [56], a fact that makes them well suited for online motion adaptation during DcM tasks. Toussaint's work [57] exhibits creative solutions for manipulation puzzles in simulation, while our method considers dynamic aspects of the dyadic interaction and enables online re-planning. Fig. 3 provides a graphical representation of DcM as a system. It is separated into three components; (i) the partner's policy, (ii) the dyadic interaction, and (iii) the agent's policy, similarly to Section II-A. Key in this formulation is the dyadic interaction, which is used to capture the binding between the two individuals, both in physical and in mental terms. The physical pairing arises due to the object that acts as the physical medium for exchanging information, while the intentions of the individuals are naturally correlated due to the common task of the dyad. The exact pairing configuration-e.g. role distribution within the dyad-can be regulated through the physical constraints and the dyadic objectives.

A. Core Nomenclature
Here, we introduce the core notation used in this article, while section specific symbols are defined in each section. We use superscripts, k for the indexing of the agent's k th end-effectors' quantities, as well as a and p, to refer to the same quantity for the agent and the partner, respectively. We use subscripts to denote both time and indices along a sequence. n ∈ N Dimensionality of c-space (partner, agent) ν ∈ N Dimensionality of manipulation task N ∈ N Total number of knots 1 i ∈ N Knot indices of the transcribed problem j ∈ N Indices of the graph-search problem K ∈ N Total number of agent's end-effectors T ∈ R >0 Total motion duration (final time) Forces applied by agent's k th end-effector λ ∈ R ν Partner's applied wrench c k ∈ R ν Agent's k th end-effector position K, D ∈ R ν Stiffness and damping (partner, agent) q ∈ R n Configuration (partner, agent) y t:T ∈ R ν×N Pose trajectory of the object ΔT ∈ R N Agent's action timings

B. Formulation
A trajectory ξ is a time-indexed sequence of actions, that guide the object to the goal state x T given its current state x t , with x t = [y tẏt ] T . In the top right of Fig. 3, we illustrate few trajectories (grey) from all feasible ones, as well as the optimal one (black) with ξ * The full control policy π a of an agent participating in DcM tasks is defined as function π a (·) − → ξ, where ξ in the most generic form, can be used to represent a trajectory with several components as However, as this work aims to generate hybrid motion plans that belong in the force-contact space, the output of the agent's policy can be simplified to The actual task, i.e. the object's motion, is a function of the two individuals' policies described by Thus, the policies of the two individuals are coupled, forming the dyadic interaction. We represent this relationship by explicitly parameterizing the policy π a as which indicates the dependency of the agent's policy to the estimated policy of the partnerπ p , the parameters of both the dyadic setup θ D and the manipulation task θ M . As shown in Fig. 3, the estimation of the partner's policy can be obtained based on a set of sensory measurements, an intention estimation process and a parametric model. In the proposed DcM formulation, the parametric model of the partner's policy depends on the state of the object, it outputs task-space wrenches and can be described by a set of parameters θ p , formally written as To comply with the sequential nature of DcM tasks, the policy of the partner should be non-stationary. This can be represented with a multi-modal probability distribution P r(θ p |x t , q p t , H p ) over parameters θ p , given the sensed data x t , q p t and a history of partner's actions H p . In every instance of the dyadic interaction, the partner's policy is described by one of the modes of the distribution as shown in top left of Fig. 3. This can be obtained by an intention estimation process.
The aim in DcM scenarios is to obtain the agent's optimal policy that depends on the parameters of both the dyadic setup and the manipulation task, and the current estimate of partner's policy. We express partner-aware dyadic planning with by introducing the idea of considering the partner's actions into the motion plans of the agent through the constraint functions g(·). As the partner's behavior is non-stationary, the parameters θ p of the partner's policy π p need to be estimated repeatedly during the interaction to provideπ p , which will trigger an update of π a . Further, the objectives of the dyad, such as θ D , are met through the cost function c (·), where θ D can represent e.g. the role assignment within the dyad. Additionally, the task specifications can be satisfied either through the cost function c (·) or the constraints g(·), where θ M may define e.g. the final pose of the object or a constant linear/angular velocity for the object.

A. Hybrid Motion Preliminaries
As described in Section III, the policy of the agent π a generates hybrid action trajectories that guide the object from the current state to the goal. We illustrate one such trajectory in Fig. 4, where the object's pose y t , the end-effectors' positions c k and the contact force f l of the left end-effector are visualized. Such trajectories have hybrid nature due to the contact change. The elements we would like to highlight in Fig. 4 are: (i) critical transition instances exist within the trajectory, where discontinuities occur, e.g. the force at T 1 and T 3 , (ii) according to these Fig. 4. Hybrid motion plan with one grasp-hold change, separated into phases. The grey dotted area on top illustrates the physical space (x,z,φ). Orientation φ is illustrated with the green arrow in the object. The force f l applied by the left (blue) end-effector is shown with the middle plot. The knots of the trajectory with resolution 3 are shown in the bottom graph along with the contact distance d of the left end-effector to the object's surface. The contact knots are grey, the swing knots are pink, and the pre-contact knot is cyan. It is worth mentioning that all the quantities shown here are optimized.
time-instances, the motion can be separated in phases-called contact-invariant phases, e.g. contact and swing phase, and (iii) the sequential arrangement of these phases defines the outline of the trajectory-which we refer to as structure of the motion and we denote with H ∈ {0, 1} K×N . In manipulation setups, the structure of the motion specifies the arms' contact sequence pattern, i.e. the order with which the arms change contacts.

B. Graph Search Algorithms Preliminaries
Search algorithms are used to find paths within graphs [58]. A graph G = (V, E) is described with a set of nodes V and set of directed edges E. Each edge e pq ∈ E denotes a directed link between node v p ∈ V and v q ∈ V , where v q is a successor of v p . In the context of search algorithms; (i) each edge e pq has an associated cost c pq > 0, (ii) the graph can be obtained given a set of initial nodes {v ι } ⊂ V and a successor operator Γ. Γ is defined on the set V and when applied on a node v ι provides all its directed edges to successor nodes with the respective costs. Finally, given a node v s and the successor operator Γ, a subgragh G s = (V s , E s ) can be constructed such that all the nodes of G s define the accessible set of nodes from node v s . GS algorithms address the problem of finding the optimal where C (v) ∈ R is the total cost along the path, (7c) indicates the set of all directed edges starting from a given node, f γ (·) is the transition function responsible for computing the next node in the sequence, and (7d), (7e) specify the initial node and the set of final nodes, respectively [58]. The specifics for our problem regarding C , Γ, f γ and the representation of v are discussed in Section V-A. These methods compute efficiently a sequence of transitions, i.e. the structure of the motion. Yet they neglect details of the actual continuous motion through the transitions.

C. Trajectory Optimization Preliminaries
TO addresses the problem of finding locally optimal trajectories for dynamical systems [59], [60]. We consider the following optimal control problem min where η is the dimensionality of a general system, x ∈ R η is the model's state vector, u ∈ R η is the model's control vector, c (·), c f (·) ∈ R in (8a) are the running and final cost functions, f (·) ∈ R η in (8b) describes the system's dynamics, and (8c) to (8f) describe bounds on the initial state, final state, path constraints and motion duration, respectively. Description (8) belongs to a rather general class of optimization problems-termed Infinite Programming problems-since we seek to find a set of continuous functions that fulfill a set of continuous constraints. To make such problems computationally tractable, the usual approach is to parameterize the problem using a finite number of decision variables, i.e. express the problem as a constrained parameter optimization problem. In this work, we express the hybrid problem utilizing direct transcription (using a trapezoidal integration rule); we further specify our problem's structure in Sections V-B and VI, while more details on TO methods and transcription methods are provided in Appendix A. These methods are used to compute efficiently continuous motion plans through discontinuities, and typically require a proper initial seed.

V. BILEVEL OPTIMIZATION
In this section, we provide the core computational formalism, that enables on-the-fly generation of hybrid motion plans, both for single agent manipulation planning and for joint manipulation planning in dyads. First, we describe how GS algorithms can be formally combined with TO methods. The former is the outer level and the latter is the inner level of the bilevel optimization. Next, we present the details of the outer and inner levels. The schematic shown in Fig. 5 illustrates the interplay between the outer and inner level, and reveals the nested structure of the inner level.
Bilevel Formulation: The aim of this formalism is to provide the means to generate hybrid trajectories, like the one shown in Fig. 4. Motivated by the key observations mentioned in Section IV and inspired by the bilevel method presented in [61] as well as the "Mixed-Logic Program" [57], we combine the two formulations presented in (7) and (8) into a single bilevel optimization formulation, as follows The outer level of the optimization is described with equations (9a)-(9d) and it is responsible to construct the structure of the motion. This is achieved by performing a discrete search using the GS method, shown in Fig. 5. The inner level of the optimization is described with the TO problem (9e) and its role is to compute the continuous trajectories, such that the discrete transitions can be realized. (9a) and (9b) are identical to (7a) and (7d) in formulation (7), however, the discrete transition function f γ described in (7b) is now replaced by (9e), which denotes a nonlinear continuous optimization problem of the form (8). Additionally, to reflect in the computational formalism the dependency of the solution to the current mode of the partner's policy-as denoted by (4) and (5)-we modified (9c)-(9e) such that they depend on the parameters θ p . The dyadic planning details are given in Section VII.
Outer-Inner Level Interplay: First, the outer level computes a discrete sequence of states that define the initial structure of the motion toward the goal. Second, each segment of the motion is passed on and is optimized by the inner level. One or more of these segment may be altered by the inner level, resulting in a modification of the initial structure of the motion. Consequently, the discrete sequence of states-subsequent to the modified segment-might become obsolete and might need to be recomputed by the outer level. This third step is closing the bilevel loop. Running these three steps iteratively, the bilevel optimization converges to the goal in a sliding window fashion. In Fig. 6, we illustrate the bilevel nature of the method with four examples.

A. Outer Optimization Level
For the outer optimization level, we aim for a fast GS method, to compute the structure of the motion, e.g. a sequence of contact changes and object's motions. Given a discrete state representation, the state-space can be encoded into a graph, with each discrete state being a node v of the graph as described in Section IV-B. We use a coarse state representation that includes a discrete description of the object's state y and contact locations of agent's end-effectors c k . A node v in the graph corresponds to the tuple (y, c l , c r ), where y, c l , c r ∈ N and v is defined as an index to the tuple with v ∈ N. Fig. 7 depicts a viable 2D state discretization. Fig. 6. Representative illustration of four different solution paths (i)-(iv) obtained by the proposed bilevel optimization method. The dashed lines depict the discrete transition found from the outer (discrete) level of the optimization, while the full lines are the continuous segments obtained from the inner (hybrid) level of the optimization. All four paths start from the same initial node with index 1. Solution path (i) ends at node 7. Solution paths (ii) and (iii) end at node 8 although they are different paths. In particular path (ii) will be generated when having a change of goal from final node 7 to node 8. Similarly, paths (iii) and (iv) end at different nodes which are identical with respect to the task, if we observe the state of the object only. An interesting point is the alternation of the transition from e 3,5 to e 3,6 by the inner (hybrid) level optimization, which results in a new path from node 6 to the goal.
A key element of the GS algorithms is the successor operator Γ, defined in Section IV-B. Γ allows us to attain a low branching factor and perform graph expansion more efficiently than brute-force node insertion [62]. We realize Γ for multi-contact manipulation and DcM scenarios specifically. We construct a simplified and intuitive physics model of the object-hand interactions based on the following rules [53]. Feasible States: 1) Left end-effector must always be on the left of the right.
2) A minimum distance between end-effectors is defined.
3) Applied forces have to be permissible given the contact location (see Section VI-3). 4) When both end-effectors are in contact, they must quasistatically counteract gravity effects on the object's CoM. 5) The pivoting torque spawned in scenarios with single contact must not violate a given threshold (DcM-specific). Feasible Transitions (task-depend): a) Both end-effectors must be in contact to rotate the object. b) A single or both end-effectors can change contact within one transition. Rules 1), 2), and a) are realized based on a mapping from the discrete state to the continuous Cartesian space of the endeffectors. Rules 3) and 4) are computed based on quasi-static principles which are configuration dependent, typically used in grasping literature [30]. Further, rule 5) reserves as an implicit threshold on the required torque the partner has to apply, counteracting the pivoting torque applied by the agent as states with high torques are not allowed.
Regarding the particular choice of GS method, we use a heuristic A* algorithm for the following two reasons. First, the A* algorithm is considered a special case of Dynamic Programming [63], [64]. Thus, the solution of our overall problem is obtained by a bilevel optimization process. Second, A* is known for its computational efficiency, as it exploits heuristics to achieve a very low effective branching factor.
A* constructs an optimal sequence of states in terms of the evaluation function C (·) = g(·) + h(·). The cost term g(·) ∈ R is obtained from edges' costs (see Section IV-B), while the heuristic term h(·) ∈ R needs to be admissible (always underestimate the actual cost) and monotonic to ensure that the solution path found is optimal [58]. To facilitate optimal composition of the solution paths shown in Fig. 6, the heuristic term h(·) needs to be designed in accordance with the cost function c (·) of the inner optimization level in (9e). The details on the heuristic and the cost terms are given in Section VIII.
The outer level provides the optimal structure of the motion efficiently-plus-an initial guess for the inner optimization level, by converting the optimal sequence of states to continuous trajectories using fifth order polynomials [53].

B. Inner Optimization Level
The inner level (9e) is responsible for optimizing the hybrid path (see Fig. 4), given the structure of the motion. Here, all motion-relevant quantities (see Section IV-A) are optimized within their continuous manifold, while the discretized description of the quantities (see Section V-A) serves as a bases for the initial seed of the continuous problem. For example, contacts c k i are optimally selected from the entire object surface, not only from the discrete contact locations shown in Fig. 7.
As it has been reported in our previous work [7] and by others [39]- [42], [57], the computational times of optimizing the full path at once are extensively large for any type of online motion adaptation. Further, the nonconvex nature of the problem gives no global optimality guarantees. Thus, to address the computational efficiency challenge, we propose to optimize each segment of the motion separately. To realize this, we introduce a decomposition of general hybrid motion into a set of hybrid motion primitives, referred as the Hybrid Optimization Lexicon for Manipulation (HOLM). Fig. 8(a) shows the primitives for a single end-effector and Fig. 8(b) illustrates a few combinations of HOLM primitives for bimanual agents.
In contrast to [65], where the hybrid motion is chopped into spacetime windows with fixed contact configuration and time duration, we choose to build each primitive as a sequence of two contact-invariant phases (see in Section IV-A) of variable time duration. The primitive Cnt2Cnt has to two consecutive contact phases, the Cnt2Sw has a contact phase followed by a swing phase-where the grasp-hold change starts-and the Sw2Cnt has a swing phase followed by a contact phase, where the grasp-hold change is completed. A single swing primitive does not contribute to the task, thus every swing phase is accompanied by a contact phase. Hence, each segment of the motion is optimized including the critical transitions of making and braking contact (discontinuities) to anticipate the next phase's contact configuration. The transition from one HOLM primitive to the next does not require special treatment as the contact configuration is not altered.
Regarding the collection of primitives used, Sw2Cnt and Cnt2Sw form the minimal set of making-braking contact, while Cnt2Cnt is used to maintain contact, e.g. this is particularly useful when the robot rotates the object toward the goal. The use of the Cnt2Cnt primitive is encouraged with rule a) described in Section V-A. In general, this set of primitives allows to fine-tune the hybrid robot motions to be legible [66].
The inner level accomplishes very fast optimal hybrid motion plan generation, given the structure of the motion. To the extend of our knowledge, this in turn empowers the bilevel optimization to be the first on-the-fly re-planning capable hybrid optimization method. This allow us to demonstrate online hybrid policy adaptation with respect to non-stationary dyadic interactions. Next, we provide the inner level details.

VI. HYBRID PLANNING VIA TRAJECTORY OPTIMIZATION
To solve the continuous optimization problem in (9e), we perform direct transcription as explained in Appendix A. This involves discretizing the trajectories of the following decision variables. For each i th knot, the quantities of interest [see (2) and (3)] are (i) the pose of the object y i , (ii) the velocity of the objecṫ y i ∈ R ν , (iii) action timings ΔT i , (iv) the contact locations c k i , and (v) the contact forces f k i . We group these quantities in two vectors ∀i ∈ N the trajectory of x i and u i describes a hybrid motion (described in Section IV-A). TO problems with intermittent contacts can be expressed using complementarity constraints, yet in practice convergence of these problems is difficult (see Appendix B). In our work, the structure of the motion is optimized by the outer level (see Section V-A), which allows us to customize our transcription, and separate the motion in phases with different constraints-the contact-invariant phases mentioned in Section IV-A. Next, we present the phase-independent and the phase-specific constraints, respectively.

1) Phase-Independent Constraints:
We introduce here constraints that are applied to all the knots of the trajectory regardless of the particular phase. We note that the object's dynamics f o (·) ∈ R 2ν and the end-effectors' motion f e (·) ∈ R ν are integrated using trapezoidal quadrature. Also, ψ c ∈ R 2ν defines the reachable area of the agent's end-effectors, referred as arms' workspace.
r Dynamics of the object (discussed further in Section VII) r Initial state of the object r Desired final state of the object r Kinematic limits of the agent's end-effectors We use simple box bounds to approximate them.
r Upper bound on the total time of the motion 2) Phase-Specific Constraints: The transcription of our hybrid optimization problem follows the phase-based parameterization (see Fig. 4) introduced in [40], also used in [39]. We extend this by considering the three possible collision states between two rigid bodies as described in [67], and we split the knots in three sets according to the contact-invariant phases; the contact, swing, and pre-contact sets, shown in Fig. 4. At each discretization point (knot), a constant subset of constraints needs to be satisfied. Most of the phase-specific constraints are time independent, which allows us to optimize each phase's duration and satisfy the constraints of each phase simultaneously. Each phase is characterized by a distinct set of decision variables that allows us to enforce a number of constraints implicitly and reduce the number of decision variables. A list of the constraints categorized according to the phase of the motion follows.
(i) Contact phase: r Permissible contact forces (discussed in detail next) ) r No contact point slipping (implicit constraint) r End-effectors in contact with the object (implicit constraint) where d(·) ∈ R is the signed distance between endeffector and object. S obj : (y, c k ) − → R ν computes the closest point on the object's surface to the endeffector's location and stresses out the importance of object's shape representation described below. (ii) Swing phase: r End-effector's motion r End-effector's swing motion away from object ) r No force (implicit constraint) (iii) Pre-contact phase: r End-effectors touching the object ) r No force (implicit constraint)

3) Permissible Contact Forces:
With (16), we denote the allowable contact forces exerted by the end-effectors to the object. These forces should satisfy the constraints 3 where f is the force vector, n c ∈ R ν is the normal and t c ∈ R ν is the tangent vector at the contact point on the object's surface. μ ∈ R is the friction coefficient. Here, (24a) is the unilateral contact constraint and (24b) is the friction cone constraint; we use the linearized friction cone form. In (24) the constraints are denoted using the halfspace representation. Alternatively, the force constraints can be enforced using the vertex representation, also used in [68] where ν c = n c + μt c are the extreme rays of the friction cone, α ≥ 0 are weighting coefficients and κ ∈ R is the number of rays used. Normals, tangents, and extreme rays are functions of the contact location and are obtained from S obj (·) according to the object shape representation. A 2D graphical illustration and intuitive comparison between the halfspace and the vertex forms is given in Fig. 9(a). We choose to enforce constraint (16) with (25) as we have empirically noticed faster convergence.  (14), (18), (20), (22), and (25) are realized given a specific representation of the end-effectors position. We choose to represent the end-effectors position relative to the CoM of the object in polar coordinates, graphically shown for the 2D case in Fig. 9(b).

5) Object's Shape Representation:
The object's surface is represented with a closed cubic spline curve. The spline representation is a smooth description of the object's surface from which all relevant properties along with their gradients can be extracted, like normal and tangent vectors. The use of continuous representations of the object's surface is more generic than approaches like [69] that rely on the convexification of the object's shape, as scaling with respect to the number of edges/phases becomes cumbersome.

6) Input Variables and Hyper-Parameters:
The only required input variable is the description of the manipulation task, θ M . Here, we use the start and goal state of the object, denoted as [y * 0ẏ * 0 ] and [y * Nẏ * N ], respectively. Nonetheless, the specification of θ M can be as flexible as needed, and can be specified either via the cost function c(·), e.g. minimize object's acceleration, or via the constraints g(·), e.g. set upper object's velocity limits or set forbidden regions of the workspace. To solve the TO problem two hyper-parameters need to be specified. First, the resolution of the grid N (number of knots) shown in Fig. 4. Second, motion's upper time bound T .

VII. DYADIC CONSTRAINT AND PARTNER MODEL
In this section, we present the specifics on how to incorporate the partner's policy in the framework and generate dyadic hybrid plans. In DcM scenarios, the object is jointly manipulated by both individuals-as specified by (3)-by applying forces on it. We propose to incorporate the partner's policy in the TO framework through the transcription constraints defined in (11). Only now, the object's dynamics are subject to the partner's wrenches too, described by (26) where m ∈ R and J ∈ R ν×ν ≥0 are the mass and inertia of the object, I is the identity matrix, g is the gravitational acceleration,ẏ ω i is the object's angular velocity, and with(·) we refer to the cross product matrix formed by the input vector. By realizing (11) according to the augmented dynamics-where λ represents the partner's contribution-the TO generates plans in accordance to the partner's policy, referred as partner-aware. This is illustrated in Fig. 3 with the physical constraints block. In contrast to [55], where the method assumes full control authority over the partner's actions, the only requirement of our method is an estimate of the partner's policy.
Partner's policy parametric model (see Fig. 3): This work aims to provide a principled way toward incorporating partner's actions into the policy of the agent. An essential step toward this goal is to identify the appropriate function space in which the partner's policy lies. We use here a simple but ample model for the partner's policy The parameters K p and D p denote a spring-damper behavior of the partner toward the goal [y * Nẏ * N ] of the co-manipulation task. K p can be interpreted as the parameter that can shape whether the partner acts as a leader K p 0 or as a follower K p = 0, along with all the intermediate behaviors. The goal [y * Nẏ * N ] captures the partner's intentions relative to the task. This model has been used in human motor control research [70], as it captures the essence of the partner's policy.
Partner's policy oracle function: As this work is not focused on estimating the partner's policyπ p , we assume an oracle function exists. The oracle function can predict the parameters θ p = (y * N ,ẏ * N , K p , D p ) that describe the current mode of the partner's policy (see Section III). This in turn enables the use of (5) without the need to compute P r(θ p |x t , q p t , H p ). The implementation of the oracle function could be realized with the methods reviewed in Section II-A1.

VIII. EXPERIMENTS
In this section, we first provide computational evaluations of the proposed method. We proceed with simulations on both a single agent and a dyadic setup. Last, we evaluate the proposed method with real-world DcM experiments. See the attached material for video footage of the simulations and the human-robot experiments during DcM tasks.
The purpose of the computational study is to highlight the computational gains of HOLM, in comparison to our previous work [7], and emphasize the importance of specific algorithmic choices. The objective of the single agent simulations is to demonstrate the capabilities of the method to plan highly dynamic motions. Likewise, the dyadic simulations present a multitude of situations, where the resulting hybrid policy of the agent is conformed to the partner's policy. The experiments with a human-robot dyad demonstrate our method's viability to plan on-the-fly under real-world conditions.
Parameters of Experimental Setup: The state of the object is y = [x z φ] with task dimension ν = 2, which is sufficient for the demonstrations; however, both levels (Section V) can  FIG. 8 be realized in 3D space with ν = 3, e.g. the inner level can be modified based on our previous work [68]. To obtain the discrete state (y, c l , c r ) mentioned in the outer level (see Section V-A), we only need to consider φ and the contact locations, as the translational components of y, do not affect the structure of the motion. φ is discretized with 30 • resolution and for each of c l and c r , we specify 16 contact locations. With these choices and with the rules defined in Section V-A, the branching factor for a brute-force search method is b ≈ 23. Yet, the A* algorithm uses heuristics to guide the search, thus the average effective branching factor for our setup is b * ≈ 4, which is the key to very efficient outer level computation times. Regarding the evaluation function C of A*, the heuristic term h models the angular difference between the current and goal rotation angles of the object, while the transition cost function g corresponds to the required movement length; shorter transitions in the continuous Cartesian space are cheaper. The cost function of the HOLM primitives similarly minimizes distance to goal and overall path length. With this setup-as discussed in Section V-A-the resulting A* discrete solution sequence is optimal and in accordance with the inner optimization level. The knot resolution used for the HOLM primitives is six knots per phase, and the friction cone is μ = 0.5. For each HOLM primitive, we use an upper time bound of T = 3.5 s for contact phases and T = 6.5 s for swing phases. Once the hybrid motions are optimized in the task space they are being mapped to the configuration space of the robot using inverse kinematics (IK) [71].
Regarding the implementation details, we use CasADi [72] to realize the HOLM primitives, 4 where each primitive is a separate parameterizable hybrid problem. Each hybrid problem is a large and sparse nonlinear optimization problem which is solved using IPOPT [73], while the automatic differentiation capabilities of CasADi allow us to provide exact gradient and hessian information. The A* planner and the lower-level control aspects of the robot, e.g. inverse kinematics (IK), are implemented in the Robot Control Software (Rcs) framework. 5 All experiments are conducted on a 64-bit Intel Quad-Core i7 3.40GHz workstation with 16GB RAM.

A. Computational Evaluations
We now show improved computational results over our previous single optimization based hybrid planning approach [7].

1) HOLM Computation Times:
In Table I, we present the average computation times for 15 runs of each HOLM primitive. Each primitive is evaluated on a variety of tasks, using three objects with different shape, a sphere, a rectangular box, and a parallelogram box. The tasks involve translation from 0 m − 1m and rotation from 0 • -180 • , similar to the ones shown in Figs. 11 and 12. The computational times reported are obtained with zero initial seed and they scale linearly with respect to the number of knots and the time horizon. These results reveal the computational benefits of HOLM.
2) Bilevel Optimization Computation Times: In Table II, we present the average computation times for the bilevel optimization. We group tasks in terms of angular distance from the initial state of the object to the goal, as this grouping nicely relates to the number of contact changes required to complete the task. As the number of contact changes depends on the initial contact configuration, a range of contact changes is given rather than an exact number (second column of Table II). We also provide the approximate horizon of the resulting motion. These tasks are as follows: We show the computation time required for the first segment of the motion, the average computation time for each one of the consecutive segments (fifth and sixth column of Table II). The former indicates the planning time until the receding horizon plan can be updated, while the latter specifies how fast the successive segments are computed. These computation times comprise revising structure of the motion too, and are proportional to the graph size displayed with the number of explored nodes (fourth column of Table II). These evaluations exhibit the online planning capabilities of the bilevel method.
3) Discussion: The main steps that allow us to improve the computation times from tens of seconds in our previous work [7], to milliseconds for HOLM and few seconds for the bilevel optimization are as follows: (i) decomposing the problem into HOLM primitives, which allows to keep the size of the hybrid  problems small (second column of Table I), (ii) exploring the hybrid structure of the problem with an efficient GS algorithm, (iii) formulating a sparse problem that can be efficiently solved, 6 (iv) providing the exact Hessian using automatic differentiation, (third and fourth column of Table I), and (v) selecting the end-effectors' and permissible force representation discussed in Section VI-3 and VI-4.
The seventh column of Table II shows the average computation times required to optimize the full continuous path using the HOLM primitives only for the inner optimization level. First, as the HOLM primitives utilize the initial seed provided by the outer level (see Section V-A), the computation times are much smaller than the ones in Table I. Second, in the eighth column of Table II, we provide the computation times (only inner level) needed to compute the full path using a hierarchical approach, as in [74]. The comparison between the seventh and eights column of Table II reveals the computational gain of using HOLM primitives with respect to the baseline approach.
Finally, the success rate of the bilevel optimization depends on the selected discretization of the outer level. If a fine discretization is selected, an optimal solution is always found. However, this is achieved at the expense of computational efficiency. Therefore, we used a discretization of 30 • that provides fast solutions and satisfying success rate. The inner optimization level has been empirically observed to provide robust solutions in terms of convergence, due to the appropriate initial seed given. This allows us to mitigate sensitivity issues with respect to the initial seed, which is a common drawback of continuous optimization methods. Further, even in case the inner level fails to converge, we can always use the interpolated trajectory obtained by the outer level.
The computation times presented demonstrate the online planning capabilities of our method. We gained approximately a ×10 to ×50 speedup in comparison to our previous work [7], while simultaneously the arm's contact sequence pattern (structure of the motion) is automatically computed.

B. Simulations Experiments
We present here a number of different motion plans generated by the proposed method that demonstrate the capability to find dynamic and partner-aware solutions. A dynamic motion is illustrated in Fig. 10(a)-(f), where a robot performs in simulation the challenging task of throwing and catching a ball. Next, the variability of the solutions generated with respect to the dyadic setup is analyzed.

1) Partner-Aware Solutions:
First, we alter the partner's policy parameters K p and D p in (27). Each partner's policy is expressed as a force field along one axis: in π p 1 along X, in π p 2 along Z, in π p 3 along the main diagonal, and in π p 4 along φ axis. In Fig. 11(a) and (b), the agent has one end-effector and jointly completes with the partner, a 2.12 m translation task in a zero gravity (table-top) scenario. Fig. 12(a) and (b) illustrate solutions for a 0.98 m translation and a −90 • rotation task, generated as responses to two different partner policies in a scenario with gravity along the z-axis. The former task highlights the effect of the partner's policy on the selected contact location. The variation of the computed solutions is evident in the latter task in Fig. 12 2) Outer Versus Inner Level Solutions: With Fig. 16, we show the benefits of our method over solely search-based planning approaches [53]. During this 90 • object rotation DcM task, the human partner does not properly support the object, as shown in Fig. 15, where the avatar's left hand is not in contact with the object. In our partner model, this is represented through parameters K p,φ = 0 and D p,φ = 0 in (27). The search-based outer level provides a coarse solution (blue line in Fig. 16) that does not take into account the policy of the partner, while the inner level significantly alters the plan (green line in Fig. 16) to conform to the dynamic constraints of the task, i.e. jointly balance the object. This shows that the inner level significantly alters trajectories, durations, and action timings of the outer level solution, to respect dynamic aspects of the interaction.
3) Online Adaptation to Alternations of the Joint Goal: During this DcM scenario, the initial object's target orientation of 150 • changes to −55 • , while the agent is not aware of this change in advance. The object's target serves as a proxy to the partner's intention. This is realized by altering the goal [y * Nẏ * N ] during the interaction shown in Fig. 18. In Fig. 17(a)-(c), we show the angular state evolution of the object and the two end-effectors.   Once the change of joint goal occurs, re-planning is completed in 0.95 s for the first segment of the receding horizon plan. The consecutive segments are adapted in 1.13 s. This illustrates that our method can adapt on-the-fly trajectories, action timings, durations, the structure of the motion and contact locations to respond to real-time changes of the joint task.

C. DcM Experiments
We validate our approach in a real setting, where a human partner jointly manipulates two different objects with a bi-manual, i.e. k ∈ {1, 2}, and n = 32 DoF robot. The robot moves on the horizontal plane in an omni-directional fashion-due to its mobile base-and utilizes its two Kuka LBR iiwa 820 arms along with two Schunk dexterous 3-finger hands for manipulation and DcM tasks. A linear joint allows the arm base to be translated along the vertical axis. We use a box and a cylindrical object. Both are bulky, so that a human cannot perform the task alone. The hybrid motion plans are optimized in the task space and are realized on the robot in an open-loop fashion, after being mapped to the configuration space using IK. A detailed description of the physical system can be found in [53]. The robot utilizes surface contacts at the planned contact locations as a form of mechanical feedback. Further, it is worth noting that the joint-range of the robot only permits rotations of the object of about 90 • before it reaches kinematic limits, thus grasp-hold changes are required.
1) Human-Aware Solutions: In correspondence to the simulation experiments shown in Figs. 12 and 13, we demonstrate a 90 • box rotation task with one contact change per arm. During this real-world evaluation, the human partner is actively rotating the box and no change of goal occurs. The key-frames of the DcM scenario are depicted in Fig. 19.
2) Online Adaptation to Human's Goals: We perform two experiments to demonstrate the on-the-fly adaptation to the In the first experiment, shown in Fig. 20, the initial goal of the human partner is to rotate the cylindrical object to 90 • . The full plan is computed by the bilevel optimization in 1.72 s and the duration of the resulting hybrid motion is 27.31 s, with one contact change. During the experiment, the human partner decides that the preferred orientation of the object should be 180 • . Given the change of the human's goal, the robot agent computes the first segment of the adapted plan in 0.54 s and the remaining segments of the hybrid plan are computed within 1.28 s, while the total duration of the updated plan is 51.63 s with two contact changes.
In the second experiment, shown in Fig. 21, the initial goal of the human partner is to rotate the object to −45 • . The hybrid motion plan includes a grasp-hold change of the right arm and has a total duration of 28.93 s, which is computed within 1.69 s. During execution, the human alters the intended dyadic goal and aims for a 90 • desired object orientation. The first segment of the adapted motion (receding horizon) is computed in 2.60 s, while the remaining hybrid plan is computed in parallel with the execution of the first segment in 4.98 s. The total updated plan has a duration of 57.35 s and includes two contact changes. Note that this experiment requires a complete reversal of the object's orientation. The computed motion stops near −45 • (see attached video) and then an opposite rotation is initiated. The stop is due to the rotation reversal and not due to stretched computation time.

IX. DISCUSSION
In this section, we discuss our dyadic modeling choice, practical considerations and possible extensions of the proposed approach.
In our partner-aware dyadic formulation presented in Section III, we treat the two individuals and their dyadic interaction separately (see Fig. 3); such treatment was used to analyze human-human interactions [24]. We believe that this design choice is of core importance; as it has been shown in a variety of scenarios in Section VIII. This allows our method to generalize over different tasks and partner behaviors, assuming that the partner behavior can be approximated with the simple but rich spring-damper model. For the outer level described in Section V-A, we used a specific discrete state representation and rules that do not model the partner explicitly. Nevertheless, our framework can be easily extended to enable multi-layered dyadic interaction modeling. The inner level (see Section V-B) takes into account geometric [55] and dynamic aspects of the interaction (see Section VIII-B2), while the outer level could incorporate logical interaction rules, e.g. if one of the partner's arm is swinging, the agent's end-effectors should remain in contact with the object. Also, due to the A* choice, the outer level finds only the optimal discrete solution. One could enhance the planning robustness at the expense of optimality or computation time, by realizing the outer level with Anytime Repairing A* [75], that computes multiple incrementally optimal solutions.
Last, during the robot experiments, we identified the potential usefulness of microscale adaptation to task's current state. Essentially, coping with arbitrary dyadic situations requires both our online planning adaptation method (long horizon) and closed-loop control (short horizon). To this front, a Hybrid Model Predictive Control (MPC) implementation based on the HOLM primitives would allow the robot to correct for small errors during the evolution of the task, e.g. close the loop with respect to the object's state. The HOLM computation times presented in Section VIII-A1 serve as a first promising step.

X. CONCLUSION
This article presented a novel concept toward online adaptive robot motion generation for physical HRC tasks, such as Dyadic collaborative Manipulation (DcM) scenarios. We proposed a formalization toward addressing the joint action problem based on the assumption that an estimate of the partner's non-stationary intentions can be attained.
Further, we proposed a novel computational formalism to exploit the efficiency of informed graph-search (GS) methods in combination with the dynamic and geometric reasoning of optimal control methods. Our approach computes the optimal hybrid policy for the robot to complete manipulation tasks as a member of a dyad or alone. The method only assumes a roughly estimated model of the partner's policy and a model of the object. With these information, our bilevel optimization computes dynamically consistent and optimal hybrid paths for the (i) trajectory of the object, (ii) agent's forces, (iii) agent's contact locations, (iv) respective timings of these actions, and (v) arms' contact sequence pattern. Due to the computational efficiency of the method, the optimal paths can be computed online, such that on-the-fly adaptation to real-time changes of the dyadic interaction can be realized. This capability of the proposed method is particularly important for HRC scenarios, where typically the human partner alters intentions and behaviors multiple times throughout the interaction.
In summary, the proposed method is able to optimize over a variety of different modes, which span both: 1) The hybrid action space that arises, due to the multicontact nature of the task. 2) The multi-modal nature of joint-action planning, due to the non-stationary policy of the partner. The pivotal aspects that enable the method to holistically optimize over such a complex and multi-modal space efficiently is the use of an informed GS algorithm in combination with the decomposition of the hybrid motion into the HOLM primitives. The outer level's rules explore only the useful part of the solution space and with the HOLM primitives hybrid motion plans are optimized very efficiently.
We evaluated the method both in simulation and with an actual human-robot dyad. Both results demonstrate that the proposed method enables the robot agent to adapt its motion plans online, in response to real-time changes of the dyadic setup. These indicate the large potential of the method to be employed in general co-manipulation scenarios. Our future work will focus on methods to estimate the human intentions online and fully realize our vision presented in Fig. 3.

APPENDIX A TRAJECTORY OPTIMIZATION METHODS
TO methods can be categorized as either indirect or direct. Indirect methods are based on the calculus of variations or the Maximum Principle [76]. They first use necessary conditions, usually as a boundary value problem in ordinary differential equations, and then discretize the resulting equation to obtain the optimal solution. For our problem, multiple path constraints (8e) have to be included, which is not straightforward using this transcription method. Direct methods for TO first discretize (8) and then use standard nonlinear optimization techniques to solve the resulting parametric problem [60]. Since standard optimization techniques are used, general constraints are easily incorporated. This comes with costs regarding the accuracy of the obtained solution, for which the required level is always application dependent, while the resulting problems are easier to pose and solve. Our method falls into the direct TO category.
Next, there are three main direct TO methods: shooting, transcription, and collocation [77]. In direct shooting methods, an integration scheme (e.g. an ODE solver) is used to eliminate state trajectory variables from the problem. As a result, problems in this class require only discretization of the control and possibly of the path constraints. To compute the state trajectory calls to an embedded integrator are needed, which first requires the integrator to provide sensitivities and second it can be especially problematic for unstable systems. To mitigate this, direct multiple shooting methods perform both a state and control discretization, while calls to an integrator are still used, albeit for a shorter horizon [78]. This leads to larger but structured nonlinear problems.
Direct transcription methods do not require calls to an embedded integrator; the discrete system's dynamics are enforced as constraints. This is achieved by discretizing both the controls and the states in a grid as well as the objective integral, where the grid points are called knots. Using direct transcription, problem (8) can be expressed as where the notation is the same as in Section IV-C. The optimization problem is typically large and sparse, and nonlinear solvers which exploit sparsity (SNOPT [79] or IPOPT [73]) can be used. Direct transcription methods have similar convergence characteristics with direct multiple shooting, are well parallelizable, and are preferred for problems with challenging path constraints. Their drawback is that the time discretization and the integration scheme should be carefully selected, since there is a trade-off between accuracy and computation effort. Finally, in direct collocation methods both the input and the state are parameterized by piecewise polynomial functions (splines). Using these polynomial functions, the values of the state and control are computed outside of the knot points (typically at the midpoint of each segment), referred as collocation points. At these collocation points, the derivative of the spline is enforced to match the dynamics. Most commonly, first order polynomials are used for the input and third order for the state. Defining the collocation points at the midpoints of the spline allows their practitioners to compute the values of the state and control at the collocation points without computing the spline coefficients [80].

APPENDIX B TRAJECTORY OPTIMIZATION THROUGH CONTACT
In TO through contact, the hybrid nature of the intermittent contacts is usually expressed via a complementarity formulation defined as 0 ≤ d⊥f ≥ 0, where d is a signed distance between the contacting objects and f is the constraint normal force between them. This states that only unilateral force can be exerted between the bodies, penetration is not allowable, and that situations involving no contact but contact force are excluded. Mathematical programs with complementarity constraints are in practice difficult to solve as they do not satisfy constraint qualifications and relaxations are usually needed [41].