Robust and Safe Autonomous Navigation for Systems with Learned SE(3) Hamiltonian Dynamics

Stability and safety are critical properties for successful deployment of automatic control systems. As a motivating example, consider autonomous mobile robot navigation in a complex environment. A control design that generalizes to different operational conditions requires a model of the system dynamics, robustness to modeling errors, and satisfaction of safety constraints, such as collision avoidance. This paper develops a neural ordinary differential equation network to learn the dynamics of a Hamiltonian system from trajectory data. The learned Hamiltonian model is used to synthesize an energy-shaping passivity-based controller and analyze its \emph{robustness} to uncertainty in the learned model and its \emph{safety} with respect to constraints imposed by the environment. Given a desired reference path for the system, we extend our design using a virtual reference governor to achieve tracking control. The governor state serves as a regulation point that moves along the reference path adaptively, balancing the system energy level, model uncertainty bounds, and distance to safety violation to guarantee robustness and safety. Our Hamiltonian dynamics learning and tracking control techniques are demonstrated on simulated hexarotor and quadrotor robots navigating in cluttered 3D environments.

INDEX TERMS constrained control, physics-constrained learning, safe learning for control

I. Introduction
Designing controllers that guarantee system stability and handle safety constraints is an important problem in safety-critical applications of automatic control systems, including autonomous transportation [1], [2], robot locomotion [3], and medical robotics [4]. Safety depends on the system states, governed by the system dynamics, and the environment constraints. This leads to two requirements for designing provably safe controllers: an accurate model of the system dynamics and the satisfaction of safety constraints.
The first requirement has motivated data-driven dynamics learning approaches, utilizing machine learning techniques, such as Gaussian process (GP) regression [5], [6], [7] and neural networks [8], [9]. For physical systems, recent works [10], [11], [12] design the model architecture to impose a Lagrangian or Hamiltonian formulation of the dynamics [13], [14], which a black-box model might struggle to infer. For Lagrangian dynamics, Lutter et al. [10] use neural networks to represent the mass and potential energy in the Euler-Lagrange equations of motion. Meanwhile, Zhong et al. [11] use a differentiable neural ODE solver [15] to predict state trajectories of a Hamiltonian dynamics model, encoding Hamilton's equations of motion. A trajectory loss function is back-propagated through the ODE solver to update the Hamiltonian model parameters. Our prior work [12] extends the neural ODE Hamiltonian formulation by imposing SE(3) constraints to capture the kinematic evolution of rigid-body systems, such as ground or aerial robots. A Hamiltonian-based model architecture also allows the design of stable regulation or tracking controllers by energy shaping [11], [16], [12]. Interconnection and damping assignment passivity-based control (IDA-PBC) [17], one of the main approaches for energy shaping, injects additional energy into the system via the control input to achieve a desired total energy, which is minimized at a desired regulation point. Instead of learning robot dynamics in continuous time, Saemundsson et al. [18] design a variational integrator network to learn discrete-time Lagrangian dynamics. Havens and Chowdhary [19] extend it by including control input in the model and use model predictive control for stabilization.
The second requirement, ensuring satisfaction of safety constraints, has gained significant attention in planning and control. Model predictive control (MPC) methods [20], [21], [22], [23] include safety constraints in an optimization problem, which is typically solved by discretizing time and linearizing the system dynamics. Reachability-based techniques [24], [25], [26], [27] are directly applicable to nonlinear systems and offer strong safety guarantees, but many require solving a Hamilton-Jacobi partial differential equation (PDE) or a sum-of-squares optimization program. This is computationally challenging, especially for high-dimensional systems, and may require system decomposition techniques [28]. Control barrier functions (CBFs) [1], [29], [30] offer an elegant approach to encode safety constraints. For a control-affine nonlinear system, a CBF constraint is affine in the control input, allowing safe control synthesis with quadratic programming (QP) [30]. However, constructing valid CBFs that guarantee the feasibility of the QP problem is challenging [31], [32]. Given a stabilizing controller, reference governor techniques [33], [34], [35] use a virtual governor system to introduce safety constraints based on the Lyapunov function of the closed-loop system. Recent work [36], [37] achieves safe trajectory tracking in unknown environments but is limited to linear or feedback-linearizable systems.
Safe control synthesis with a learned model of the system dynamics needs to account for the model estimation error between the learned and the ground-truth dynamics [38]. Model uncertainty may be viewed as a disturbance, applied to the learned system, and handled using robust or adaptive control techniques [39], [40], [41], [42], [43], [44]. Safety constraint satisfaction in the presence of model uncertainty can be achieved using robust MPC [45], [46], [7], [47], L1 adaptive control [40], or model reference adaptive control [48], which tracks the trajectory of a reference model and compensates for model uncertainty. For example, Hewing et al. [7] propose an MPC technique that trains a Gaussian process (GP) model of the system dynamics and uses the GP uncertainty to introduce probabilistic safety constraints in the MPC optimization. Robust controllers for Hamiltonian systems may also be developed using the IDA-PBC approach [43], [49], [50]. Most techniques have considered systems with states defined in Euclidean space and cannot handle manifold constraints, e.g., due to the orientation kinematics of a mobile robot. For quadrotors, Lee et al. [41] estimate disturbances from the tracking error and design a robust geometric controller but do not consider safety constraints.
In this paper, we consider both dynamics model learning and safe control synthesis for rigid-body systems, whose states include position, orientation, and generalized velocity. We assume that the system has an unknown dynamics model but, as a physical system, satisfies Hamilton's equations of motion over the SE(3) manifold of positions and orientations. Given state-control trajectories, from past experiments or collected by a human operator, we seek to learn the system dynamics and design a tracking control law that handles safety constraints, e.g., obtained from distance measurements to obstacles in the environment. In our preliminary work [51], we learn a translation-equivariant Hamiltonian model of the system dynamics using a physics-guided neural ODE network [12]. We use the Hamiltonian model to synthesize an energy-shaping geometric tracking controller. The total energy of the system serves as a Lyapunov function and enables us to enforce safety constraints during trajectory tracking using a reference governor to regulate the difference between the system energy and the distance to safety violation. However, our preliminary work [51] treats the learned Hamiltonian model as the ground-truth dynamics and ignores the model estimation error in the control design. In this paper, we capture the estimation error as a bounded disturbance applied to the learned system and develop a robust safe tracking controller that takes the disturbance into account in the design of the reference governor. Our Hamiltonian dynamics learning and tracking control techniques are compared to a GP MPC technique [7] and are demonstrated in a 3D environment using a simulated hexarotor robot to achieve collision-free autonomous navigation.
In summary, the contribution of this work is a tracking control design for Hamiltonian systems with learned dynamics, which achieves robustness to model estimation errors and safety with respect to state constraints.

II. Problem Statement
Consider a rigid body with position p ∈ R^3, orientation R ∈ SO(3), body-frame linear velocity v ∈ R^3, and body-frame angular velocity ω ∈ R^3. Let q = [p; r_1; r_2; r_3] ∈ SE(3) denote the body's generalized coordinates, where r_1, r_2, r_3 ∈ R^3 are the rows of the rotation matrix R. Let ζ = [v; ω] ∈ R^6 denote the body's generalized velocity. The generalized momentum p of the body is defined as:

p = M(q) ζ,   (1)

where M(q) ≻ 0 is the positive-definite generalized mass matrix. Let x = (q, p) ∈ T*SE(3) denote the state of the rigid-body system on the cotangent bundle T*SE(3) of the SE(3) manifold. The Hamiltonian H(q, p) captures the total energy of the system as the sum of the kinetic energy T(q, p) = (1/2) p^T M^{-1}(q) p and the potential energy U(q):

H(q, p) = (1/2) p^T M^{-1}(q) p + U(q).   (2)

The evolution of the state x is governed by Hamilton's equations of motion [52]:

q̇ = q^× ∂H/∂p,   ṗ = p^× ∂H/∂p − (q^×)^T ∂H/∂q + B(q) u,   (3)

where u ∈ R^6 is the control input, e.g., force and torque or motor speeds for a UAV system, B(q) ∈ R^{6×6} is an input gain matrix, and the operators q^×, p^× are defined as:

q^× = [R 0; 0 r̂_1; 0 r̂_2; 0 r̂_3],   p^× = [0 p̂_v; p̂_v p̂_ω],   (4)

with p = [p_v; p_ω], where the hat map ŵ for w ∈ R^3 is the skew-symmetric matrix:

ŵ = [0 −w_3 w_2; w_3 0 −w_1; −w_2 w_1 0],

so that ŵ v = w × v for any v ∈ R^3. The Hamiltonian dynamics model in (3) can be extended to include energy dissipation, such as friction or drag forces [53], in a port-Hamiltonian formulation [17]. However, for clarity of the control design, we leave this for future work.
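As a concrete illustration, the hat map and the operators in (3)-(4) can be evaluated numerically. The NumPy sketch below follows the standard rigid-body form; the function names and the exact block layout of q^× and p^× are our own reconstruction.

```python
import numpy as np

def hat(w):
    """Skew-symmetric hat map: hat(w) @ v equals np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def q_times(R):
    """q^x operator mapping zeta = [v; omega] to qdot = [pdot; rdot_1..3]."""
    r1, r2, r3 = R                # rows of the rotation matrix
    Q = np.zeros((12, 6))
    Q[0:3, 0:3] = R               # pdot = R v
    Q[3:6, 3:6] = hat(r1)         # rdot_i = r_i x omega
    Q[6:9, 3:6] = hat(r2)
    Q[9:12, 3:6] = hat(r3)
    return Q

def p_times(pv, pw):
    """p^x operator acting on zeta = [v; omega], with momentum p = [pv; pw]."""
    P = np.zeros((6, 6))
    P[0:3, 3:6] = hat(pv)         # pv x omega
    P[3:6, 0:3] = hat(pv)         # pv x v
    P[3:6, 3:6] = hat(pw)         # pw x omega
    return P
```

With these operators, the right-hand side of (3) is a matrix-vector product against the Hamiltonian gradients, which is what the learned model in Sec. III rolls out inside an ODE solver.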
We consider the case where the parameters of the Hamiltonian dynamics model in (3), including the mass M(q), the potential energy U(q), and the input matrix B(q), are unknown. Instead, we are given a trajectory dataset D = {t_{0:N}^{(i)}, q_{0:N}^{(i)}, ζ_{0:N}^{(i)}, u^{(i)}}_{i=1}^{D} consisting of D sequences of generalized coordinates and velocities (q_n^{(i)}, ζ_n^{(i)}), collected by applying a constant control input u_n^{(i)} to the system with initial condition (q_n^{(i)}, ζ_n^{(i)}) for t ∈ [t_n, t_{n+1}) and n = 0, . . . , N − 1. Our objective is to learn the system dynamics from the dataset D and design a control policy u = π(x) such that the system follows a desired reference path without violating safety constraints. Let O ⊂ R^3 and F := R^3 \ O denote the unsafe (obstacle) set and the safe (obstacle-free) set, respectively. Denote the interior of F as int(F). We assume that O is not known a priori but that the distance d̄(p, O) from the system's position p to O can be sensed within a limited sensing range d_max > 0:

d̄(p, O) := min{d(p, O), d_max},

where d(p, O) := inf_{a ∈ O} ‖p − a‖ denotes the Euclidean distance from p to the set O. The safe tracking control problem considered in this paper is summarized below.
Problem 1: Let D = {t_{0:N}^{(i)}, q_{0:N}^{(i)}, ζ_{0:N}^{(i)}, u^{(i)}}_{i=1}^{D} be a training dataset of state-control trajectories obtained from a rigid-body system with unknown Hamiltonian dynamics in (3). Let r : [0, 1] → int(F) be a continuous function specifying a desired position reference path for the system. Assume that the reference path starts at the initial position at time t_0, i.e., r(0) = p(t_0) ∈ int(F). Using local distance observations d̄(p(t), O) of the unsafe set O, design a control policy π : T*SE(3) → R^6 so that the position p(t) of the closed-loop system with control law u = π(x) converges asymptotically to r(1), while remaining safe, i.e., p(t) ∈ F, ∀t ≥ t_0.

III. Learning SE(3) Hamiltonian Dynamics from Data
In this section, we design a dynamics model that can be learned from a previously collected trajectory dataset, e.g., obtained from manual control, and is sufficiently general to represent different mobile robots, such as cars and drones. We describe how to learn Hamiltonian dynamics from the dataset D = {t_{0:N}^{(i)}, q_{0:N}^{(i)}, ζ_{0:N}^{(i)}, u^{(i)}}_{i=1}^{D} introduced in Sec. II, using translation-equivariant Hamiltonian-based neural ODE networks [12]. The mass M(q), the potential energy U(q), and the input gain B(q) are approximated by neural networks. We show that the model estimation errors caused by the trained neural networks can be treated as a disturbance applied to the learned system.

A. Translation-equivariant SE(3) Hamiltonian dynamics learning
Since the system dynamics do not change if we shift the position p to any point in the world frame, we offset the trajectories in the dataset D so that they start from the position 0, and learn the system dynamics well around the origin. This is sufficient for stabilization tasks, e.g., using the controller design in Sec. IV, because driving the system from a state x with position p to a desired state x* with position p* is equivalent to driving the system from the state x with position 0 to a desired state x* with offset position p* − p.
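The offsetting step can be sketched in a few lines (function names are ours): shifting trajectories to start at the origin for training, and offsetting the goal at control time.

```python
import numpy as np

def offset_trajectory(positions):
    """Shift a position trajectory so it starts at the origin; by translation
    equivariance the dynamics are unchanged, so all training data can be
    concentrated around 0."""
    positions = np.asarray(positions)
    return positions - positions[0]

def offset_goal(p, p_star):
    """Regulating from position p to p_star is equivalent to regulating
    from position 0 to the offset goal p_star - p."""
    return np.asarray(p_star) - np.asarray(p)
```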
Since the momentum p is not directly available from the dataset D, we work with the time derivative of the generalized velocity, derived from (1):

ζ̇ = d/dt (M^{-1}(q)) p + M^{-1}(q) ṗ.   (5)

Eq. (3) and (5) describe the Hamiltonian dynamics of the generalized coordinates and velocities with unknown inverse generalized mass matrix M^{-1}(q), input matrix B(q), and potential energy U(q), which we aim to approximate by three neural networks M_θ^{-1}(q), B_θ(q), and U_θ(q), respectively, with parameters θ.
To optimize the parameters θ, we use the Hamiltonian-based neural ODE framework that encodes the Hamiltonian dynamics (3) and (5) with M_θ(q), B_θ(q), and U_θ(q) in the network structure (Fig. 1). The forward pass rolls out the dynamics f̄_θ described by (3) and (5) with the neural networks M_θ(q), B_θ(q), and U_θ(q) using a neural ODE solver [15] with initial state (q_n^{(i)}, ζ_n^{(i)}), obtaining a predicted state (q̃_{n+1}^{(i)}, ζ̃_{n+1}^{(i)}) for each n = 0, . . . , N − 1 and i = 1, . . . , D. The loss sums the distances between the predicted and ground-truth states:

L(θ) = Σ_{i=1}^{D} Σ_{n=0}^{N−1} c(q_{n+1}^{(i)}, ζ_{n+1}^{(i)}, q̃_{n+1}^{(i)}, ζ̃_{n+1}^{(i)}),

where the distance metric c is defined as the sum of position, orientation, and velocity errors on the tangent bundle T SE(3):

c(q, ζ, q̃, ζ̃) = c_p(p, p̃) + c_R(R, R̃) + c_ζ(ζ, ζ̃),   (6)

with the position error c_p(p, p̃) = ‖p − p̃‖²_2, the velocity error c_ζ(ζ, ζ̃) = ‖ζ − ζ̃‖²_2, and the rotation error c_R(R, R̃) = ‖(log(R̃ R^T))^∨‖²_2, where (·)^∨ denotes the inverse of the hat map. The network parameters θ are optimized using gradient descent by back-propagating the gradient ∇_θ L of the loss through the neural ODE solver efficiently using the adjoint method [15]. Specifically, let a = ∇_{q,ζ} L be the adjoint state and s = ((q, ζ), a, ∇_θ L) be the augmented state. The augmented state dynamics are [15]:

ṡ = (f̄_θ, −a^T ∂f̄_θ/∂(q, ζ), −a^T ∂f̄_θ/∂θ).

We obtain the gradient ∇_θ L by a single call to a reverse-time ODE solver starting from s_{n+1} = s(t_{n+1}) for n = 0, . . . , N − 1, and update the parameters θ using gradient descent. Please refer to [15] for more details.
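As a sketch of the distance metric in (6), the rotation error uses the SO(3) logarithm map. The helper names below are ours, and the implementation assumes the standard axis-angle form of the matrix logarithm.

```python
import numpy as np

def so3_log(R):
    """Vector form of the SO(3) matrix logarithm (axis-angle vector)."""
    cos_th = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    th = np.arccos(cos_th)
    if th < 1e-8:
        return np.zeros(3)
    w_hat = (R - R.T) * th / (2.0 * np.sin(th))   # skew part scaled by angle
    return np.array([w_hat[2, 1], w_hat[0, 2], w_hat[1, 0]])

def tse3_distance(p, R, zeta, p_hat, R_hat, zeta_hat):
    """Squared-error distance on the tangent bundle T SE(3), as in (6):
    position error + rotation log-map error + velocity error."""
    c_p = np.sum((p - p_hat) ** 2)
    c_R = np.sum(so3_log(R_hat @ R.T) ** 2)
    c_zeta = np.sum((zeta - zeta_hat) ** 2)
    return c_p + c_R + c_zeta
```

A rotation error of angle θ about a fixed axis contributes θ² to the loss, so rotation and translation errors are penalized on comparable squared scales.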

B. Model estimation error as a disturbance
Via the training process described in Sec. III.A, we approximate the ground-truth mass M̃(q), potential energy Ũ(q), and input gain matrix B̃(q) with the learned mass M(q) = M̃(q) + ΔM(q), potential energy U(q) = Ũ(q) + ΔU(q), and input gain B(q) = B̃(q) + ΔB(q), where ΔM(q), ΔU(q), and ΔB(q) are the estimation errors (we drop the subscript θ to simplify the notation). The generalized coordinates q and the ground-truth momentum p̃ := M̃(q)ζ satisfy the Hamiltonian dynamics (3):

q̇ = q^× ∂H̃/∂p̃,   ṗ̃ = p̃^× ∂H̃/∂p̃ − (q^×)^T ∂H̃/∂q + B̃(q) u,   (9)

with the ground-truth Hamiltonian H̃(q, p̃) = (1/2) ζ^T M̃(q) ζ + Ũ(q). Meanwhile, for the generalized coordinates q and the momentum p := M(q)ζ, the Hamiltonian dynamics learned from data are of the form:

q̇ = q^× ∂H/∂p,   ṗ = p^× ∂H/∂p − (q^×)^T ∂H/∂q + B(q) u,   (11)

with the learned Hamiltonian H(q, p) = (1/2) ζ^T M(q) ζ + U(q) = H̃(q, p̃) + ΔH(q, p) and its estimation error ΔH(q, p) = (1/2) ζ^T ΔM(q) ζ + ΔU(q). However, the learned dynamics (11) are only an approximation of the actual dynamics for (q, p). While the dynamics of q do not change, the actual dynamics of the learned momentum, p = M(q)ζ = p̃ + Δp, where Δp = ΔM(q)ζ, are derived from (9) and differ from (11) by a matched residual force d_1:

ṗ = p^× ∂H/∂p − (q^×)^T ∂H/∂q + B(q)(u + d_1),   (13)

where the force d_1 represents the effect of the model errors ΔM(q), ΔU(q), and ΔB(q) and is considered as a disturbance applied to the learned system (11). To reduce the error d_1 with respect to the position p, we enforce translation equivariance in the neural ODE model, as described in Sec. III.A, and learn the model well around the origin. This allows us to offset any position p to the well-learned region around the origin. To reduce the model error with respect to orientation, we collect a training dataset that covers different regions of roll, pitch, and yaw angles, e.g., by manually driving a UAV to different desired positions and yaw angles. A promising approach to estimate the disturbance magnitude is to employ a Bayesian formulation of the neural ODE network used to learn the dynamics model. A Bayesian model provides a posterior distribution, rather than point estimates, for the model parameters (i.e.,
M^{-1}(q), B(q), and U(q)), whose variance can be used to obtain parameter error bounds and, in turn, a disturbance bound. Bayesian neural network models that can be used for dynamics learning include Bayesian neural ODE networks [54], [55], neural stochastic differential equation (SDE) networks [56], and Gaussian-process ODEs [57]. This motivates analyzing the robustness of our control design in Sec. IV to the disturbance d_1 caused by the model errors.
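For reference, the Hamiltonian estimation error ΔH(q, p) = (1/2) ζ^T ΔM(q) ζ + ΔU(q) from Sec. III.B can be evaluated directly given the parameter errors; the function name below is ours.

```python
import numpy as np

def learned_hamiltonian_error(zeta, dM, dU):
    """Hamiltonian estimation error Delta H = 0.5 * zeta^T dM zeta + dU,
    for a generalized velocity zeta, mass error dM, and potential error dU."""
    zeta = np.asarray(zeta)
    return 0.5 * zeta @ dM @ zeta + dU
```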

IV. Stabilization of Hamiltonian Dynamics with Matched Disturbances
As discussed in Sec. III.B, due to estimation errors in the dynamics learning process, the learned system model satisfies Hamilton's equations of motion in (3) subject to a matched disturbance signal d_1 : R → R^6:

q̇ = q^× ∂H/∂p,   ṗ = p^× ∂H/∂p − (q^×)^T ∂H/∂q + B(q)(u + d_1).   (14)

We consider a passivity-based stabilizing controller for (14) and analyze its robustness with respect to the disturbance signal d_1 and its safety with respect to the obstacle set O.

A. Passivity-based control
Consider a desired regulation point x* = (q*, p*) for the system in (14) with generalized coordinates q* = (p*, R*) and momentum p* = 0. Since the Hamiltonian H(x) may not have a minimum at x*, the control signal u in (14) should be designed to inject additional energy H_a(x, x*) into the system, achieving a desired Hamiltonian H_d(x, x*) = H(x) + H_a(x, x*) that is minimized at x*. This is the approach followed by interconnection and damping assignment passivity-based control (IDA-PBC) [58]. Let x_e denote the error in the generalized coordinates and momentum, with position error p_e := p − p*, rotation error R_e := R*^T R, and momentum error equal to p since the desired momentum is 0:

x_e := ((p_e, R_e), p).   (15)

The desired Hamiltonian is chosen as:

H_d(x, x*) = (1/2) p^T M^{-1}(q) p + (k_p/2) ‖p_e‖² + (k_R/2) tr(I − R_e),   (16)

where k_p and k_R are positive scalars.
The IDA-PBC method [12], [58] designs a controller u = π(x, x*) such that the closed-loop dynamics of the system in (14) are governed by the desired Hamiltonian in (16), taking the transformed form in (17), where the terms J(x, x*), K_d, and d depend on the control design. To obtain the controller, one uses the relationship between x and x_e in (15) to equate the dynamics in (14) and (17), leading to the control law in (18), with

J(x, x*) := [R 0; 0 r̂_{e1}; 0 r̂_{e2}; 0 r̂_{e3}],

where r_{e1}, r_{e2}, r_{e3} are the rows of the rotation error R_e. If the IDA-PBC matching equations (20) [59] are satisfied, where B^⊥(q) is a maximal-rank left annihilator of B(q), i.e., B^⊥(q) B(q) = 0, then the controller in (18) achieves the desired closed-loop dynamics in (17) with d = d_1, i.e., without introducing any extra disturbance.
If the matching equations (20) cannot be satisfied globally, i.e., the IDA-PBC controller in (18) does not solve them exactly, a least-squares solution is used instead. In this case, the residual b(x, x*) of the least-squares solution is introduced as an additional matched disturbance d_2, defined in (21), in the closed-loop dynamics in (17), so that the total disturbance becomes

d = d_1 + d_2.   (22)

Since the magnitude of d_2 is proportional to that of b(x, x*), it depends on the desired regulation point x*. An underactuated quadrotor system example is provided in Sec. VII.D.
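The least-squares idea can be sketched with a pseudoinverse: the component of the desired generalized force outside the range of B(q) is left over as a matched residual, playing the role of d_2. This is an illustrative simplification; the exact residual b(x, x*) in (21) is defined by the matching equations, and the helper name is ours.

```python
import numpy as np

def least_squares_control(B, b):
    """Least-squares control when B is not invertible: u = pinv(B) @ b,
    with residual r = b - B @ u (the part of b outside range(B)) left as
    an additional matched disturbance."""
    u = np.linalg.pinv(B) @ b
    residual = b - B @ u
    return u, residual
```

For a fully-actuated system (square, invertible B) the residual is zero, recovering the exact matching case; for an underactuated input matrix the residual lies in the orthogonal complement of range(B).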
In general, the matching equations (20) are nonlinear PDEs and can be solved explicitly only in certain cases [59]. If B(q) is invertible, i.e., the system in (14) is fully actuated, then the solution in (18) exists and is unique. For systems with underactuation degree 1, the matching equations may be reduced to ODEs with closed-form solutions [60] or solved for certain choices of desired kinetic energy [61]. Yuksel et al. [62] solve the matching equations specifically for stabilizing a quadrotor system, using Euler angles instead of a rotation matrix. A survey on this topic is available in [59].

B. Robustness analysis
In this section, we analyze the stability of the IDA-PBC controller in (18) and its robustness with respect to the disturbance signal d in (22). Although the techniques we developed for dynamics learning in Sec. III and control synthesis in Sec. IV.A did not make any assumptions about the Hamiltonian system in (14), the robustness and safety analysis that follows is developed under two assumptions.
Assumption 1: The disturbance d in (22) is bounded, i.e., there exists δ_d > 0 such that ‖d(t)‖ ≤ δ_d for all t.

Without Assumption 1, it is not possible to provide performance guarantees for the control design because the disturbance d could have an arbitrary effect on the evolution of the closed-loop system dynamics. The disturbance magnitude bound δ_d exists if we assume bounded estimation errors, bounded velocity and acceleration, bounded ∇_q(ΔH(q, p)), and a bounded control input u from the controller (18).
Assumption 2: The generalized mass matrix is constant, M(q) ≡ M.

Our robustness analysis in Thm. 1 below constructs an ISS-Lyapunov function [63] to handle the disturbance d. Assumption 2 simplifies the proof that we have a valid ISS-Lyapunov function. Extending the analysis to handle a state-dependent mass M(q) is left for future work.
We simplify the error dynamics (17) by noting that, under Assumption 2, the mass matrix is constant, so ∇_p H_d(x, x*) = M^{-1} p, which leads to the simplified form used below.

Theorem 1: Consider the Hamiltonian system in (14) with desired regulation point x* = (q*, 0) and control law specified in (18) with parameters k_p, k_R, K_d. Assume that the initial state x(t_0) lies in the domain

A = {x | tr(I − R*^T R) ≤ α < 4, ‖p‖ ≤ β}

for some positive constants α and β. Then, the function V(x, x*) in (24) is an ISS-Lyapunov function [63] with respect to d in (22), with positive constants k_1, k_2, k_3, k_γ and associated matrices Q_1, Q_2, Q_3 determined by the gains k_p, k_R, K_d. Denote the sub-level set of V(x, x*) with respect to a positive scalar c as:

S_c := {x ∈ A | V(x, x*) ≤ c}.

Given the constant c_1 := (k_2 k_γ / k_3) δ_d² and a constant c_2 determined by the domain constants α and β, the set S_{c_2} is an estimate of the region of attraction of the control law in (18). Any state x starting within S_{c_2} converges exponentially to S_{c_1} and remains within it. The position error trajectory p_e(t) is uniformly ultimately bounded as in (29). To ensure that c_1 < c_2, the disturbance bound δ_d should satisfy δ_d² < c_2 k_3 / (k_2 k_γ).

The estimates of the region of attraction and the uniform ultimate bound on the position error provided by Thm. 1 for the IDA-PBC controller are conservative because our analysis considers the mass and inertia jointly as a generalized mass M and does not differentiate between the force and torque disturbances. Besides considering separate disturbance bounds for the force and torque inputs, less conservative bounds can be achieved by introducing disturbance compensation [41].

C. Safety analysis
Sec. IV.B analyzed the stability and robustness properties of the IDA-PBC controller for a given regulation point x*. Next, we use the Lyapunov function V(x, x*) in (24) to derive conditions under which the trajectory of the closed-loop system remains outside the unsafe set O. We introduce a barrier function ∆E(x, x*), defined in (30) in terms of the constants k_1, k_p, c_1, c_2 specified in Thm. 1, which quantifies the margin to safety violation by taking into account the region of attraction S_{c_2} of the controller, the invariant set S_{c_1} associated with the ultimate bound in Thm. 1, and the distance d̄(p*, O) to O. If, for a given regulation point x*, the safety margin ∆E(x, x*) is positive initially, then any trajectory of the closed-loop system remains safe as it converges to the invariant set S_{c_1}.
Proposition 1: Consider the system in (14) with regulation point x* = (q*, 0) and control law in (18). Suppose that the desired position p* has sufficient clearance from the unsafe set O and that the disturbance d is bounded, as specified in (31). If the initial state x(t_0) = x_0 satisfies the condition in (32), then the position error trajectory is uniformly ultimately bounded as in (29) and the system remains safe, i.e., p(t) ∈ F for all t ≥ t_0.

V. Safe and Stable Tracking using a Reference Governor
In this section, we develop a safe tracking controller by introducing a reference governor [33] to guide the reference point x * for the stabilizing control law π(x, x * ) in (18) along the desired reference path r introduced in Problem 1.
A reference governor is a virtual dynamical system whose state g(t) moves along r(σ) for σ ∈ [0, 1]. In this paper, the governor state g(t) ∈ R³ specifies a desired position p*(t) for the Hamiltonian system, and a lifting function ℓ provides a desired orientation R*(t) so that x*(t) = ℓ(g(t)) specifies a full reference state for the Hamiltonian system. Given x*(t), we compute the safety margin ∆E(x(t), x*(t)) in (30) and use the leeway by which the margin exceeds 0 to move the governor state g(t) along r(σ). Intuitively, the reference point x*(t) = ℓ(g(t)) speeds up when ∆E(x(t), x*(t)) increases, e.g., when the distance to obstacles increases or the system energy decreases, and slows down otherwise.

Given a point g = r(σ) on the reference path for some σ ∈ [0, 1], we generate a reference state x* = (q*, p*), where q* = (p*, R*) = (g, I) and p* = 0. The governor state g represents the desired position p* on the path. For simplicity, we set the desired rotation matrix to R* = I. If, in addition to r, a desired yaw angle reference is provided, one can generate R* using the method described in [64] to achieve better orientation tracking. We define a lifting function ℓ : R³ → T*SE(3) that generates a reference state x* = ℓ(g) from the governor state g:

ℓ(g) := ((g, e_1, e_2, e_3), 0),   (35)

where e_1, e_2, e_3 are the rows of the identity matrix. Given the reference state x* = ℓ(g), we compute the safety margin ∆E(x, x*) in (30) and describe how to update the governor state so that safety is preserved. We update the governor state g(t) = r(σ(t)) along the path by regulating the parameter σ:

σ̇(t) = k_g (σ*(t) − σ(t)),   (36)

where k_g > 0 is a control gain and σ*(t) ∈ [0, 1] is a desired time-varying parameter, constructed using the safety margin ∆E(x, x*). We require σ*(t) to satisfy two conditions: 1) always stay ahead of the current σ(t), i.e., σ*(t) ≥ σ(t) for all t ≥ t_0, and 2) keep the distance σ*(t) − σ(t) proportional to ∆E(x(t), x*(t)).
The first condition guarantees that the governor state g(t) moves forward along the path towards the goal r(1). The second condition allows the safety margin ∆E to adaptively regulate the governor state g(t) in order to ensure safety for the Hamiltonian system. To construct the desired path parameter σ*, we define a local safe zone as a ball around the governor state g with radius ∆E(x, x*), based on the state x and the reference state x* = ℓ(g).

Definition 1:
A local safe zone is a subset of R³ that depends on the system state x and the governor state g:

LS(x, g) := {p ∈ R³ | ‖p − g‖ ≤ ∆E(x, ℓ(g))},

where ℓ is the lifting function in (35) and ∆E is the safety margin in (30).
We determine σ * as the farthest intersection between the local safe zone LS(x, g) and the path r by solving the scalar optimization problem in Def. 2.

Definition 2:
A local projected goal for the system-governor state (x, g) is the point ḡ ∈ LS(x, g) that is farthest along the path r:

σ* := max {σ ∈ [σ(t), 1] | r(σ) ∈ LS(x, g)},   ḡ := r(σ*).

The construction of the local projected goal ḡ is illustrated in Fig. 2 (right), which shows a reference path r, the local safe zone LS(x, g), and the local projected goal ḡ. This construction of σ* and ḡ completes the governor update law (36).
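The scalar optimization in Def. 2 and the governor update can be sketched as follows. We approximate the maximization by dense sampling, and we assume a simple proportional form for the update law (36); both are our own simplifications.

```python
import numpy as np

def local_projected_goal(r, sigma, g, delta_E, n_samples=2001):
    """Approximate Def. 2 by dense sampling: return the largest path
    parameter in [sigma, 1] whose point lies in the local safe zone
    LS = {p : ||p - g|| <= delta_E} (Def. 1)."""
    best = sigma
    for s in np.linspace(sigma, 1.0, n_samples):
        if np.linalg.norm(r(s) - g) <= delta_E:
            best = s
    return best

def governor_step(sigma, sigma_star, k_g, dt):
    """One Euler step of an assumed proportional governor update
    sigma_dot = k_g * (sigma_star - sigma), a simple realization of (36)."""
    return sigma + dt * k_g * (sigma_star - sigma)
```

Note that when the safety margin shrinks to zero, the safe zone collapses to {g}, the projected goal coincides with the current governor state, and the update stalls, which is exactly the behavior used in the proof of Thm. 2.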
Our safe tracking controller consists of the reference governor system in (36), adaptively updating the reference point x* = ℓ(g) via the lifting function in (35), and the passivity-based controller in (18) that drives the Hamiltonian system towards x*. The stability, safety, and robustness of the proposed tracking controller are analyzed in Thm. 2.

Theorem 2:
Suppose that the desired path r(σ) has sufficient clearance from the unsafe set O and that the disturbance d is bounded, as specified in (37). Consider the closed-loop system composed of the Hamiltonian dynamics in (14), the governor system in (36) with σ* constructed in Def. 2, and the control law u = π(x, ℓ(g)) in (18). Suppose that the initial state (x_0, g_0) satisfies the condition in (38). Then, the system remains safe, i.e., p(t) ∈ F for all t ≥ t_0, the governor state g(t) converges to r(1), and the position p(t) satisfies the uniform ultimate bound in (29) around r(1).
Proof: The local projected goal ḡ and the associated σ* are well defined by Def. 2. By the governor update law (36), the path parameter σ increases and the governor state g(t) moves along r towards the goal r(1). The desired state x* = ℓ(g) is updated via the lifting function (35). As g tracks ḡ on the path r via the path parameter update in (36), the system state x tracks x* = ℓ(g). During this process, the safety margin ∆E(t) fluctuates and regulates the rate of change of σ.
Since σ*(t) is bounded as in (38) and σ(t) is updated continuously [65] via (36), the governor state g(t) is continuous. By construction, the lifting function ℓ(g) is continuous in g. Therefore, the reference point x*(t) = ℓ(g(t)) is continuous in time, leading to a continuous Lyapunov function V(x, x*) and a continuous safety margin ∆E(t). As a result, the safety margin ∆E(t) cannot become negative without crossing 0 from above at some time T_0. As ∆E(t) ↓ 0, the local safe zone shrinks to a point, i.e., LS(x, g) ↓ {g}. This immediately stops the governor because ḡ = g(T_0) = r(σ(T_0)) and σ̇(T_0) = 0.
As a result, Proposition 1 states that x(t) stays within the invariant set S_{c_2}(x*(T_0)) for t ≥ T_0 and converges to S_{c_1}(x*(T_0)) without leaving F. Eq. (30) shows that ∆E(t) = 0 implies c_1 ≤ V(t) ≤ c_2. By Thm. 1, as x(t) approaches x*(T_0), we have V̇(T_0) < 0, i.e., the Lyapunov function V is decreasing. Hence, there exists h > 0 such that ∆E(T_0 + h) becomes strictly positive, and the governor is able to move again towards a new ḡ generated by the positive ∆E(T_0 + h). This process continues until the governor state g(t) converges to r(1), the closed-loop system converges to the region S_{c_1}(ℓ(r(1))), and the position p(t) satisfies the uniform ultimate bound in (29) around r(1).
Note that, while our control design does not account for state estimation errors, e.g., from an odometry algorithm using stereo-camera, LiDAR, or visual-inertial sensing, such errors can be handled conservatively by reducing the obstacle distance d̄ in the safety margin specification in (31).

VI. Application to Hamiltonian Dynamics in R n
In this section, we show that our control design can be easily modified and applied to a Hamiltonian system with configuration q ∈ R^n and dynamics:

q̇ = ∂H/∂p,   ṗ = −∂H/∂q + B(q)(u + d_1),   (40)

where the Hamiltonian H(q, p) is defined as:

H(q, p) = (1/2) p^T M^{-1}(q) p + U(q).   (41)

Given a desired regulation point x* = (q*, p*) with momentum p* = 0, define the error state x_e = (q_e, p_e) as:

q_e := q − q*,   p_e := p − p* = p.   (42)

A desired Hamiltonian, minimized at x = x*, is:

H_d(x, x*) = (1/2) p_e^T M^{-1}(q) p_e + (k_p/2) ‖q_e‖².   (43)

The IDA-PBC controller in (44) achieves the closed-loop dynamics:

ẋ_e = [0 I; −I −K_d] ∇_{x_e} H_d(x, x*) + [0; d],   (45)

where d = d_1 + d_2 as in (22) and d_2 is as in (21).
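A minimal simulation of the fully-actuated R^n case illustrates the closed-loop behavior. We assume a constant mass (Assumption 2), a linear potential U(q) = f^T q, and B = I, and spell the controller out as u = ∇_q U(q) − k_p q_e − K_d M^{-1} p; this is our own sketch of the controller in (44) for the fully-actuated case, not the paper's exact expression, and the gains and goal are our choices.

```python
import numpy as np

# Sketch: energy-shaping control of a point mass in R^3 with constant mass M,
# gravity-like potential U(q) = f^T q, and B = I (hypothetical parameters).
n = 3
M = 2.0 * np.eye(n)
f = np.array([0.0, 0.0, 9.8 * 2.0])          # gradient of U(q)
k_p = 4.0
K_d = 6.0 * np.eye(n)
q_star = np.array([1.0, -1.0, 2.0])           # desired regulation point

def control(q, p):
    """u = dU/dq - k_p (q - q*) - K_d M^{-1} p: cancels the potential and
    shapes H_d = p^T M^{-1} p / 2 + k_p ||q - q*||^2 / 2, minimized at (q*, 0)."""
    return f - k_p * (q - q_star) - K_d @ np.linalg.solve(M, p)

# Forward-Euler rollout of Hamilton's equations (40) with d_1 = 0.
q = np.zeros(n)
p = np.zeros(n)
dt = 1e-3
for _ in range(20000):
    u = control(q, p)
    qdot = np.linalg.solve(M, p)              # dq/dt = M^{-1} p
    pdot = -f + u                             # dp/dt = -dU/dq + u
    q, p = q + dt * qdot, p + dt * pdot
```

After the rollout, the state has converged close to (q*, 0), matching the exponential convergence predicted by Thm. 3 in the disturbance-free case.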
Theorem 3: Consider the Hamiltonian system in (40) with desired regulation point x* = (q*, 0) and control law in (44) with parameters k_p, K_d. Under Assumptions 1 and 2, the function V(x, x*), constructed analogously to (24) with U_d(q_e) = (k_p/2) ‖q_e‖², is an ISS-Lyapunov function [63] with respect to d, with positive constants k_1, k_2, k_3, k_γ and associated matrices Q_1, Q_2, Q_3 determined by the gains k_p and K_d. Any initial state x converges exponentially to S_{c_1} = {x | V(x, x*) ≤ c_1} with c_1 := (k_2 k_γ / k_3) δ_d² and remains within it. The error trajectory q_e(t) is uniformly ultimately bounded analogously to (29). The proof of Thm. 3 follows the same steps as that of Thm. 1 and is omitted due to space limitations. In contrast to Thm. 1, the result in Thm. 3 for R^n holds globally, i.e., the region of attraction is A = R^n × R^n. Thus, the disturbance magnitude bound δ_d can be arbitrarily large.
The safety analysis in Sec. IV.C carries over with a new safety margin, defined as in (30) but without the domain restriction, since Thm. 3 holds globally. The reference governor lifting function can be chosen as ℓ(g) = [g; 0]. The governor state update remains the same as in (36). The robustness analysis extends the safe tracking results in [66] and [36].

VII. Evaluation
We evaluate our robust and safe tracking controller using simulated hexarotor and quadrotor robots in 2D and 3D environments with ground-truth mass m = 6.77 kg and inertia matrix J = diag([1.05, 1.05, 2.05]) kg·m², inspired by the solar-powered UAV in [67]. The robot's ground-truth dynamics satisfy Hamilton's equations (3) with generalized mass M(q) = diag(mI, J) and potential energy U(q) = mg [0 0 1] p, where p is the position and g ≈ 9.8 m/s² is the gravitational acceleration. The input matrices for the hexarotor and the quadrotor are B(q) = I and B(q) = [0_{4×2} I_{4×4}]^T, respectively. The control input u of the hexarotor includes a 3D force and a 3D torque, while that of the quadrotor includes a scalar force and a 3D torque.
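The ground-truth simulation parameters above can be assembled directly; the variable names are ours, and the quadrotor input matrix is our reading of the flattened expression B(q) = [0_{4×2} I_{4×4}]^T.

```python
import numpy as np

# Ground-truth parameters from the text, inspired by the solar-powered UAV [67].
m = 6.77                                     # mass [kg]
J = np.diag([1.05, 1.05, 2.05])              # inertia matrix [kg m^2]
g = 9.8                                      # gravitational acceleration [m/s^2]

# Generalized mass M(q) = diag(m I, J).
M = np.block([[m * np.eye(3), np.zeros((3, 3))],
              [np.zeros((3, 3)), J]])

def potential_energy(p):
    """U(q) = m g [0 0 1] p for a UAV at position p."""
    return m * g * p[2]

B_hexarotor = np.eye(6)                      # fully actuated: 3D force + 3D torque
B_quadrotor = np.vstack([np.zeros((2, 4)),   # scalar thrust along body z
                         np.eye(4)])         # + 3D torque
```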
While our evaluation focuses on rotorcraft aerial robots, the methodology for system identification and control synthesis proposed in this paper is general. The same approach can be applied to hexarotors, quadrotors, and other ground and marine vehicles. This is in contrast with other system identification and control synthesis methods, which require knowledge of the dynamics structure, careful experiment design, and domain expertise for the particular system.

A. Evaluation of SE(3) Hamiltonian dynamics learning
We consider a simulated hexarotor unmanned aerial vehicle (UAV) (Fig. 3) with fixed-tilt rotors pointing in different directions [68] and a simulated quadrotor UAV. Since the mass $m$ of the UAVs can be easily measured, we assume the mass $m$ is known, leading to a known potential energy $U(q) = mg\,[0\ 0\ 1]\,p$. We approximate the inverse generalized mass matrix by $M_\theta^{-1}(q) = \mathrm{diag}(m^{-1}I, J_\theta^{-1}(q))$ and learn $J_\theta(q)^{-1}$ and $B_\theta(q)$ from data.
We mimic manual flights in an area free of obstacles using a PID controller and drive the UAVs from a random initial pose to 18 desired poses, generating 18 one-second trajectories. We shift the trajectories to start from the origin and create a dataset $\mathcal{D}$ of time-stamped states and control inputs. Fig. 3(c) shows the loss function during training. Note that if we scale $M_\theta(q)$ and the input matrix $B(q)$ by a constant $\gamma$, the dynamics of $(q, \zeta)$ in (3) and (5) do not change. Fig. 3(d) and 3(e) plot the scaled versions of the learned inverse mass $J_\theta(q)^{-1}$ and the input matrix $B_\theta(q)$, converging to the constant ground-truth values. We achieve similar results for the quadrotor using the same training process.

B. Evaluation of robust safe tracking control of a learned 2D hexarotor Hamiltonian model
Next, we compare our approach with a GP-MPC technique [7] using a simulated 2D fully-actuated hexarotor UAV, moving on the xz-plane with position $p = [x, 0, z]^\top$ and orientation $R = R_\psi$ determined by the pitch angle $\psi$. The control input is a 3D wrench, including a 2D force and a 1D torque.
As we only consider the pitch angle $\psi$, we are interested in the inertia value $J^{yy}$ and ignore $J^{xx}$ and $J^{zz}$. We assume that the generalized mass $m$ and inertia $J^{yy}$ are unknown for the 2D hexarotor and approximated by $m_\theta$ and $J^{yy}_\theta$, respectively. The input gain $B(q)$ is assumed known.
Let $m_0 = 1.5m$ and $J^{yy}_0 = 1.5J^{yy}$ be initial guesses of the mass $m$ and the inertia $J^{yy}$. We model the approximated mass inverse $m_\theta^{-1}$ and inertia inverse $(J^{yy}_\theta)^{-1}$ as $m_\theta^{-1} = m_0^{-1} + L_1(q; \theta)$ and $(J^{yy}_\theta)^{-1} = (J^{yy}_0)^{-1} + L_2(q; \theta)$, where $L_1(q; \theta)$ and $L_2(q; \theta)$ are two neural networks, representing the residual mass inverse and inertia inverse to be learned. In GP-MPC [7], the dynamics (3) are split into a prior nominal model with the prior mass $m_0$ and inertia $J^{yy}_0$, and residual dynamics, modeled by a GP regression model.
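A minimal sketch of this residual parameterization is shown below, with $L_1$, $L_2$ stood in by tiny one-hidden-layer networks (hypothetical shapes and names; not the paper's architecture). With a zero-initialized output layer, the model starts exactly at the prior guesses $m_0$, $J^{yy}_0$:

```python
import numpy as np

rng = np.random.default_rng(0)

class Residual:
    """Tiny one-hidden-layer network L(q; theta). The output layer is
    zero-initialized, so the residual is exactly 0 before training."""
    def __init__(self, dim_in, hidden=16):
        self.W1 = rng.standard_normal((hidden, dim_in))
        self.b1 = np.zeros(hidden)
        self.w2 = np.zeros(hidden)   # zero init => L(q; theta) = 0

    def __call__(self, q):
        return self.w2 @ np.tanh(self.W1 @ q + self.b1)

m0, Jyy0 = 1.5 * 6.77, 1.5 * 1.05    # prior guesses (1.5x ground truth)
L1, L2 = Residual(3), Residual(3)

def m_inv_theta(q):
    return 1.0 / m0 + L1(q)          # approximated mass inverse

def Jyy_inv_theta(q):
    return 1.0 / Jyy0 + L2(q)        # approximated inertia inverse
```

Warm-starting at the prior this way means the learned model can only improve on the nominal guess as the residual networks are trained.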
To collect training data, we place the simulated hexarotor at an initial location $(x, z) = (-1, 0)$ and apply random control inputs to obtain a dataset $\mathcal{D}$ of time-stamped states and control inputs. Our Hamiltonian neural ODE network is trained with the dataset $\mathcal{D}$, as described in Sec. III. For GP-MPC, the same dataset $\mathcal{D}$ is used to train a GP regression model of the residual dynamics as described in [7] and implemented in [38].
We assume there are two walls in the environment, generating two safety constraints on the robot position: $-x + z < 1.1$ and $0.8x + z < 0.4$. The task is to track a predefined piecewise-linear path $r$, shown in Fig. 5, while safely avoiding collision with the walls. We adapt the GP-MPC implementation of [38] for the 2D hexarotor and enforce the safety constraints probabilistically with a 95% confidence interval using the GP model uncertainty. To propagate the model uncertainty through a horizon of 10 time steps, we linearize the dynamics model around the hovering state and propagate the state mean and covariance using the mean-equivalence technique [7], [38] with a time step of 1/120 s. Meanwhile, our learned Hamiltonian neural ODE model is used with the safe tracking controller described in Sec. V to perform the task and enforce the safety constraints. Fig. 4 compares the prediction errors of our learned neural ODE network and the GP model. We collect the robot states and control inputs generated by our controller while tracking the path, and predict the next state. Fig. 4 (left) plots the prediction error over time, showing that we achieve better prediction than the trained GP model. This reflects the difference between our model, which encodes the Hamiltonian structure and translation equivariance in the network architecture, and the GP model, which incurs higher model uncertainty in locations far from the data points. Figs. 4 and 5 show the tracking performance of our approach and GP-MPC. We compare the tracking error of both methods, calculated as the distance from the robot position to the reference point, specified by the governor in our approach and by a time parameterization of the path in GP-MPC: $p^*(t) = r(\min(t, 10)/10)$, i.e., the GP-MPC method finishes the task in about 10 seconds, similar to the tracking time of our approach. Our controller is able to track the path more accurately than GP-MPC, illustrated qualitatively in Fig. 5 and quantitatively in Fig. 4 (middle). This can be explained by the GP model's higher prediction errors shown in Fig. 4 (left), which grow quickly after multiple time steps due to uncertainty propagation. Both our safe tracking controller with learned Hamiltonian dynamics and the GP-MPC safe controller keep the hexarotor in the safe region, i.e., the distance to the obstacles is always positive in Fig. 4 (right).
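The mean-equivalence covariance propagation used by the GP-MPC baseline can be sketched generically as follows. The linearized model $A$ and process noise $\Sigma_w$ below are hypothetical placeholders, not the values from [38]; the point is only the recursion $\Sigma_{k+1} = A \Sigma_k A^\top + \Sigma_w$ and how uncertainty accumulates over the horizon.

```python
import numpy as np

def propagate_covariance(A, Sigma0, Sigma_w, horizon=10):
    """Propagate the state covariance through a linearized model:
    Sigma_{k+1} = A Sigma_k A^T + Sigma_w (mean-equivalence)."""
    Sigmas = [Sigma0]
    for _ in range(horizon):
        Sigmas.append(A @ Sigmas[-1] @ A.T + Sigma_w)
    return Sigmas

# Hypothetical 2-state (position, velocity) linearization around hover
dt = 1.0 / 120.0
A = np.array([[1.0, dt],
              [0.0, 1.0]])
Sigma_w = 1e-4 * np.eye(2)             # assumed per-step model uncertainty
Sigmas = propagate_covariance(A, np.zeros((2, 2)), Sigma_w)
```

Because the covariance grows with the horizon, the probabilistic safety constraints tighten at later steps, which is consistent with the more conservative GP-MPC tracking observed in Fig. 5.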

C. Evaluation of robust safe tracking control of a learned 3D fully-actuated hexarotor Hamiltonian model
This section evaluates our Hamiltonian dynamics learning and safe tracking control techniques using a simulated hexarotor UAV in a 3D environment. The task is to navigate from a start position to a goal in a cluttered warehouse environment without colliding with the obstacles $\mathcal{O}$. The same control gains are used for this 3D navigation task as in the previous section. A simulated LiDAR scanner provides point-cloud measurements $\mathcal{P}(t)$ of the surface of the unsafe set $\mathcal{O}$, depending on the system pose at time $t$, with a maximum sensing range of $d_{\max} = 30$ m. The distance from the governor $g(t)$ to the unsafe set $\mathcal{O}$ is approximated via $\bar{d}(g(t); \mathcal{O}) \approx \min_{y \in \mathcal{P}(t)} \|g(t) - y\|$. The reference path $r$ is pre-computed using an A* planner and tracked in $\sim 80$ s. Fig. 6 shows the behavior of the closed-loop hexarotor system in the warehouse environment. The safety margin $\Delta E(x, x^*)$ fluctuates during the tracking process but, as can be seen in Fig. 6, it never becomes negative. The augmented system $(x, g)$ is controlled adaptively, slowing down when the dynamic safety margin decreases (e.g., when the hexarotor is close to an obstacle or has a large Lyapunov value $\mathcal{V}$) and speeding up otherwise (e.g., when the robot is far away from the obstacles or has small total energy $\mathcal{V}$). The simulations show that our control policy successfully drives the system from the start to the end of the reference path while avoiding sensed obstacles online, i.e., $d(p, \mathcal{O})$ remains positive throughout the tracking task. Fig. 7 plots the tracking errors between the robot state $x$ and the reference state $x^*$ generated by the governor, showing that our controller tracks the path well. The tracking errors for the Euler angles and angular velocity are close to 0. The position and linear velocity errors in the $x$ and $z$ directions are close to zero as well, while the errors in the $y$ direction fluctuate around $-0.5$ m and $0.8$ m/s, respectively, and converge to 0 at the end. This is expected as the robot stays behind the reference point, mostly in the $y$ direction, and converges to the end of the path.
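The point-cloud distance approximation $\bar{d}(g(t); \mathcal{O}) \approx \min_{y \in \mathcal{P}(t)} \|g(t) - y\|$ reduces to a nearest-neighbor query over the LiDAR returns; a minimal sketch (function name is ours):

```python
import numpy as np

def governor_obstacle_distance(g, point_cloud):
    """Approximate the distance from the governor position g to the
    unsafe set using the closest LiDAR return: min_y ||g - y||."""
    return np.min(np.linalg.norm(point_cloud - g, axis=1))

# Toy scan: two returns at distances 2 and 5 from the origin
scan = np.array([[2.0, 0.0, 0.0],
                 [0.0, 3.0, 4.0]])
d = governor_obstacle_distance(np.zeros(3), scan)   # -> 2.0
```

Since the scan only covers the sensed surface, this is an over-approximation of the true distance only up to sensing coverage; the governor uses it online as the scan $\mathcal{P}(t)$ updates.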
To evaluate the robustness of our controller, we repeat the warehouse experiment using the ground-truth dynamics, subject to artificially generated disturbances $d \in \mathbb{R}^6$ with different upper bounds $\delta_d$. Each component of the disturbance $d$ is uniformly generated in $[-0.5\delta_d, 0.5\delta_d]$. If $\|d\| > \delta_d$, we rescale the disturbance to $\delta_d d/\|d\|$. Our robust tracking controller successfully finishes the tracking task across a wide range of $\delta_d$, with larger bounds imposing a stricter requirement on $\Delta E$. Fig. 8 shows the average position errors and the minimum distance to obstacles during the tracking task versus the disturbance upper bound $\delta_d$. The average position tracking errors remain similar across values of $\delta_d$. The minimum distance to obstacles $d(p, \mathcal{O})$ is always positive, illustrating the safety guarantees of our controller. This distance starts decreasing when $\delta_d > 1$, as larger disturbances can suddenly move the robot towards the obstacles.
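The disturbance model described above can be sketched as follows (the function name is ours): each component is uniform in $[-0.5\delta_d, 0.5\delta_d]$, and the vector is rescaled onto the ball of radius $\delta_d$ whenever its norm exceeds the bound.

```python
import numpy as np

def sample_disturbance(delta_d, rng):
    """6D disturbance with components uniform in [-0.5*delta_d, 0.5*delta_d],
    rescaled so that ||d|| never exceeds delta_d."""
    d = rng.uniform(-0.5 * delta_d, 0.5 * delta_d, size=6)
    norm = np.linalg.norm(d)
    if norm > delta_d:
        d = delta_d * d / norm
    return d

rng = np.random.default_rng(0)
samples = [sample_disturbance(2.0, rng) for _ in range(1000)]
```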

D. Evaluation of robust safe tracking control of a learned 3D underactuated quadrotor Hamiltonian model
In this section, we repeat the task of safely navigating from a start position to a goal in the same cluttered warehouse environment as in Sec. VII.C with a quadrotor, whose model is learned from data as described in Sec. VII.A. As mentioned in Sec. IV, the control input in (18) requires the first two components of the force in (19) to equal 0, i.e., the force component $b_v$ coincides with the z-axis of the body frame. As guaranteeing this condition is hard, we instead use the force component in the world frame $Rb_v$ and a desired yaw angle $\psi^*$ to determine the desired rotation matrix, similar to [64]. The vector $Rb_v$ is set as the z-axis of the desired frame, i.e., the third column $b_3^*$ of the rotation matrix $R^*$, to minimize the disturbance $d_2$ in (21) from the matching condition. We calculate the second column $b_2^*$ by projecting the second column of the yaw rotation matrix, $b_2^\psi = [-\sin\psi, \cos\psi, 0]^\top$, onto the plane perpendicular to $b_3^*$. We use the controller (18) with $\hat{\omega}^* = R^{*\top}\dot{R}^*$ for our tracking task. We successfully finish the task with the quadrotor while remaining safe for the entire experiment, as shown in Fig. 9, with similar behavior of the closed-loop quadrotor system in terms of the safety margin, Lyapunov function, and distance to obstacles compared to Sec. VII.C. However, the orientation tracking error of the quadrotor (Fig. 10) is larger than that of the hexarotor, as expected since the quadrotor is underactuated.
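The desired-rotation construction described above is a Gram-Schmidt-style procedure; a minimal sketch, with variable names ours and similar in spirit to [64]:

```python
import numpy as np

def desired_rotation(f_world, yaw):
    """Build R* whose z-axis aligns with the world-frame force direction
    and whose y-axis matches the desired yaw as closely as possible."""
    b3 = f_world / np.linalg.norm(f_world)               # desired z-axis
    b2_yaw = np.array([-np.sin(yaw), np.cos(yaw), 0.0])  # 2nd column of R_psi
    b2 = b2_yaw - (b2_yaw @ b3) * b3                     # project onto plane _|_ b3
    b2 = b2 / np.linalg.norm(b2)
    b1 = np.cross(b2, b3)                                # complete right-handed frame
    return np.column_stack([b1, b2, b3])

# Hovering force along world z with zero yaw gives the identity rotation
R_star = desired_rotation(np.array([0.0, 0.0, 6.77 * 9.8]), yaw=0.0)
```

Projecting the yaw column rather than imposing it exactly keeps $R^*$ a valid rotation even when the thrust direction is tilted away from vertical.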

E. Evaluation of our approach against unmodeled noise
In this section, we verify the robustness of our controller against unmodeled noise on a simulated hexarotor by injecting high-frequency noise (e.g., propeller vibration) into the control inputs and simulating state-estimation errors. In particular, a 4.8 kHz 6D sinusoidal signal with amplitude 5 is generated as the high-frequency noise. Meanwhile, state-estimation errors in position, Euler angles, linear and angular velocity are randomly generated with zero mean and standard deviations chosen from [69] (position: 0.01 m, Euler angle: 0.01 degree, linear velocity: 0.02 m/s, and angular velocity: 0.14 degree/s). We consider the task of stabilizing to a static governor, i.e., the governor is not moving, with the learned dynamics model in three settings: without any unmodeled noise (base), with high-frequency noise, and with state-estimation errors. Fig. 11 plots the Lyapunov function $\mathcal{V}$ and the safety margin $\Delta E$ over time. Our controller is not affected significantly by the high-frequency noise, potentially because the noise's effect is canceled out due to its zero mean. Our controller is safe against the state-estimation errors from [69], i.e., $\Delta E > 0$ over time, but fails to remain safe, i.e., $\Delta E < 0$ at some times, if we triple the noise deviation.
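The two noise models in this experiment can be sketched as follows (helper names and the 12D state ordering are ours; the amplitudes and standard deviations are the ones stated above):

```python
import numpy as np

rng = np.random.default_rng(0)

def actuator_noise(t, amplitude=5.0, freq_hz=4800.0):
    """High-frequency 6D sinusoid added to the control input."""
    return amplitude * np.sin(2.0 * np.pi * freq_hz * t) * np.ones(6)

# State-estimation noise std devs from [69]: position 0.01 m,
# Euler angle 0.01 deg, linear vel. 0.02 m/s, angular vel. 0.14 deg/s
STD = np.concatenate([
    0.01 * np.ones(3),
    np.deg2rad(0.01) * np.ones(3),
    0.02 * np.ones(3),
    np.deg2rad(0.14) * np.ones(3),
])

def noisy_state_estimate(x):
    """Zero-mean Gaussian noise on the 12D state estimate."""
    return x + rng.normal(0.0, STD)
```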

VIII. Conclusion
This paper developed a tracking controller for Hamiltonian systems with learned dynamics. We employed a neural ODE network to learn translation-invariant Hamiltonian dynamics on the SE(3) manifold from trajectory data. The Hamiltonian of the learned system was used to synthesize an energy-shaping controller and quantify its robustness to modeling errors. A reference governor was employed to guide the system along a desired reference path using the trade-off between system energy, disturbance bounds, and distance to obstacles to guarantee safe tracking. Our results demonstrate that encoding SE(3) kinematics and Hamiltonian dynamics in the model learning process achieves more accurate prediction than Gaussian Process regression. Utilizing the system energy in the control design offers a general approach for guaranteeing robustness and safety for physical systems and generalizes well to desired trajectories which are significantly different from the training data. Future work will focus on disturbance compensation and real experiments.
Proof of Theorem 1

Consider the Lyapunov function candidate in (24) with $U_d = \frac{k_p}{2}\|p_e\|^2 + \frac{k_R}{2}\mathrm{tr}(I - R_e)$. In the domain $\mathcal{A}$, the bounds of [70, Prop. 1] hold. By the chain rule and (23), and using (54) and (55) together with the Cauchy-Schwarz inequality and the sub-multiplicative property of the Euclidean norm, the Lyapunov function candidate is bounded above and below. These bounds can be stated compactly in quadratic form using $z = [\|e\|\ \|p_e\|]^\top$ and the matrices $Q_1$, $Q_2$ in (26). The time derivative of the Lyapunov candidate is computed next: the term $\dot{p}_e$ follows from (23) and the term $\dot{e}$ is obtained from (52). The cross terms involving the disturbance can be bounded using Young's inequality [43] with $\epsilon = \lambda_{\min}(K_d)$, $\eta = 1$, $a = \|M^{-1}p_e\|$, and $b = \|d\|$, and similarly for the remaining terms. Collecting terms yields the matrix $Q_3$, whose elements are provided in (27), and $k_\gamma = \frac{1}{2\lambda_{\min}(K_d)} + \frac{\rho\lambda_2^2}{2\lambda_1}$. Since the parameters $\rho$, $k_p$, $k_R$, $K_d$ can be chosen arbitrarily, there exists a choice that ensures the matrices $Q_1$, $Q_2$, $Q_3$ are positive definite, as shown below. The inequalities in (25) are obtained from (56) and (63) using the Rayleigh-Ritz inequality.

Region of Attraction
We use the invariant sets S c = {x | V(x, x * ) ≤ c} induced by the Lyapunov function to restrict the error dynamics inside the domain A and estimate the region of attraction.