Safe Autonomous Navigation for Systems with Learned SE(3) Hamiltonian Dynamics

Safe autonomous navigation in unknown environments is an important problem for mobile robots. This paper proposes techniques to learn the dynamics model of a mobile robot from trajectory data and synthesize a tracking controller with safety and stability guarantees. The state of a rigid-body robot usually contains its position, orientation, and generalized velocity and satisfies Hamilton's equations of motion. Instead of a hand-derived dynamics model, we use a dataset of state-control trajectories to train a translation-equivariant nonlinear Hamiltonian model represented as a neural ordinary differential equation (ODE) network. The learned Hamiltonian model is used to synthesize an energy-shaping passivity-based controller and derive conditions which guarantee safe regulation to a desired reference pose. We enable adaptive tracking of a desired path, subject to safety constraints obtained from obstacle distance measurements. The trade-off between the robot's energy and the distance to safety constraint violation is used to adaptively govern a reference pose along the desired path. Our safe adaptive controller is demonstrated on a simulated hexarotor robot navigating in unknown environments.


Introduction
Designing controllers that handle safety constraints and guarantee stability is an important problem in safety-critical applications of robotics, such as autonomous driving (Ames et al., 2014b; Shalev-Shwartz et al., 2016), locomotion (Ames et al., 2014a), or medical robotics (Yip and Camarillo, 2014). Safety depends on the system states, governed by the system dynamics, and the environment constraints. This leads to two requirements for designing provably safe controllers: 1) the availability of an accurate dynamics model and 2) the satisfaction of time-varying safety constraints that are only known at runtime.
The first requirement has motivated machine learning techniques for system dynamics learning, e.g. based on Gaussian processes (Deisenroth et al., 2015; Kabzan et al., 2019) or neural networks (Raissi et al., 2018; Chua et al., 2018). For physical systems, recent works (Lutter et al., 2019; Zhong et al., 2019; Duong and Atanasov, 2021b) design the model architecture to encode a Lagrangian or Hamiltonian formulation of robot dynamics (Lurie, 2013; Holm, 2008), which a black-box model might struggle to infer. Zhong et al. (2019) use a differentiable neural ODE solver (Chen et al., 2018) to generate predicted state trajectories in a Hamiltonian formulation. A loss function is back-propagated through the ODE solver to update the model parameters. Duong and Atanasov (2021b) extend this approach by imposing both Hamiltonian dynamics and SE(3) pose constraints on the ODE structure. A Hamiltonian-based model architecture also simplifies the design of stable regulation or tracking control by energy shaping (Zhong et al., 2019; Duong and Atanasov, 2021a,b). The key idea of energy-based shaping, known as interconnection and damping assignment passivity-based control (IDA-PBC) (Van Der Schaft and Jeltsema, 2014), is to inject additional energy via the control input into the system to achieve a desired total energy, minimized at a desired set point.
The second requirement related to safety guarantees has gained significant attention in planning and control. Model predictive control (MPC) methods (Borrelli et al., 2017; Grüne and Pannek, 2017; Bravo et al., 2006; Mayne et al., 2000) include safety constraints in an optimization problem, which is typically solved by discretizing time with linearized dynamics. Reachability-based techniques (Herbert et al., 2017; Majumdar and Tedrake, 2017; Kousik et al., 2020) work directly for nonlinear systems and offer strong safety guarantees but have high computation cost and scalability issues for high-dimensional systems. Control barrier functions (CBFs) with quadratic programming (QP) (Ames et al., 2014b, 2017, 2019) offer an elegant and efficient framework for real-time safe control synthesis. However, constructing a valid CBF (Ames et al., 2019) that guarantees the feasibility of the QP problem (Xu, 2018) at all times is challenging. Given a stabilizing regulation controller, reference governor techniques (Bemporad, 1998; Kolmanovsky et al., 2014; Garone and Nicotra, 2016) maintain a virtual governor system that adaptively generates a regulation point so that the system follows reference commands safely. Recent work (Arslan and Koditschek, 2017; Li et al., 2020) achieves safe navigation in unknown environments but is limited to feedback-linearizable systems.
In this paper, we consider both requirements for rigid-body robot systems, whose states are described by their SE(3) pose and generalized velocity. We assume that the robot dynamics are unknown but, as a physical system, satisfy Hamilton's equations of motion over the SE(3) manifold. We consider a training set of state-control trajectories, from past experiments or collected by a human operator, and seek to safely track a desired position path with safety constraints obtained online from distance-to-obstacle measurements. We learn an SE(3) Hamiltonian model of the system dynamics using a neural ODE network (Duong and Atanasov, 2021b). As the robot dynamics are equivariant to translation, we offset the trajectories to start from the origin and train a translation-equivariant Hamiltonian neural ODE model. The Hamiltonian structure of the learned model offers an energy-based regulation controller with the total energy of the system viewed as a Lyapunov function. This, in turn, enables us to enforce safety constraints using reference governor techniques without the need to linearize the system dynamics. Inspired by constraint embedding techniques (Garone and Nicotra, 2016), we impose safety constraints, based on the sensor measurements, on the Lyapunov function. We use the trade-off between the distance from constraint violation and the system energy level to regulate a reference governor and achieve safe and stable position tracking in an unknown environment.
Contributions. In summary, the contributions of this paper are 1) a translation-equivariant SE(3) Hamiltonian dynamics learning approach and 2) a tracking control design for SE(3) Hamiltonian systems with stability and safety guarantees. Our dynamics learning and tracking control techniques are demonstrated on a simulated hexarotor robot using a depth sensor to navigate in unknown environments.

Problem Statement
Consider a robot modeled as a rigid body with position p ∈ R^3, orientation R ∈ SO(3), body-frame linear velocity v ∈ R^3, and body-frame angular velocity ω ∈ R^3. Let q = [p^T r_1^T r_2^T r_3^T]^T ∈ R^12 denote the robot's generalized coordinates, where r_1, r_2, r_3 ∈ R^3 are the rows of the rotation matrix R. Let ζ = [v^T ω^T]^T ∈ R^6 denote the robot's generalized velocity. The generalized momentum 𝔭 of the system is defined as:

𝔭 = M(q) ζ, (1)

where M(q) ≻ 0 denotes a positive-definite 6 × 6 generalized mass matrix. Let x = (q, 𝔭) ∈ R^18 be the robot state. The Hamiltonian H(q, 𝔭) captures the total energy of the system as the sum of the kinetic energy T(q, 𝔭) = (1/2) 𝔭^T M(q)^{-1} 𝔭 and the potential energy U(q):

H(q, 𝔭) = (1/2) 𝔭^T M(q)^{-1} 𝔭 + U(q). (2)

As a mechanical system, the time evolution of the state x is governed by Hamilton's equations of motion (Lee et al., 2017; Duong and Atanasov, 2021b):

q̇ = q^× ∂H/∂𝔭,   𝔭̇ = −(q^×)^T ∂H/∂q + 𝔭^× ∂H/∂𝔭 + B(q) u, (3)

where u ∈ R^p is the control input and B(q) ∈ R^{6×p} is an input gain matrix. The operators q^×, 𝔭^×, and the hat map ŵ for w = [w_1 w_2 w_3]^T ∈ R^3 are defined as:

q^× = [[R, 0], [0, r̂_1], [0, r̂_2], [0, r̂_3]] ∈ R^{12×6},   𝔭^× = [[0, 𝔭̂_v], [𝔭̂_v, 𝔭̂_ω]] ∈ R^{6×6},   ŵ = [[0, −w_3, w_2], [w_3, 0, −w_1], [−w_2, w_1, 0]],

where 𝔭 = [𝔭_v^T 𝔭_ω^T]^T. We consider the case that the parameters of the Hamiltonian dynamics model in (3), including the mass M(q), potential energy U(q), and input matrix B(q), are unknown. Instead, we are given a trajectory dataset D = {t_{0:N}^{(i)}, q_{0:N}^{(i)}, ζ_{0:N}^{(i)}, u^{(i)}}_{i=1}^{D} consisting of D sequences of generalized coordinates and velocities (q_n^{(i)}, ζ_n^{(i)}), sampled at times t_0^{(i)} < ... < t_N^{(i)}, with the corresponding control inputs u^{(i)}. We aim to learn the dynamics from the dataset D and design a control policy u = π(x) such that the robot follows a desired reference path without violating safety constraints in an unknown environment. Let O ⊂ R^3 and F := R^3 \ O denote the unsafe (e.g., obstacle) set and the safe (obstacle-free) set, respectively. Denote the interior of F as int(F). We assume that O is not known a priori but the robot can sense the distance d̄(p, O) from its position p to O locally with a limited sensing range β > 0:

d̄(p, O) := min{d(p, O), β}, (4)

where d(p, O) := inf_{a ∈ O} ‖p − a‖ denotes the Euclidean distance from p to the set O. The safe autonomous navigation problem considered in this paper is summarized below.
Problem 1 Let D = {t_{0:N}^{(i)}, q_{0:N}^{(i)}, ζ_{0:N}^{(i)}, u^{(i)}}_{i=1}^{D} be a training dataset of state-control trajectories obtained from a robot with unknown Hamiltonian dynamics in (3). Let r : [0, 1] → int(F) be a continuous function specifying a desired position reference path for the robot. Assume that the reference path starts at the initial robot position at time t_0, i.e., r(0) = p(t_0) ∈ int(F). Using local distance observations d̄(p(t), O) of the unsafe set O in an unknown environment, design a control policy π(x) so that the position p(t) of the closed-loop system converges asymptotically to r(1), while remaining safe, i.e., p(t) ∈ F for all t ≥ t_0.
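As a concrete illustration of the dynamics model above, the following NumPy sketch implements the hat map and a forward-Euler rollout of Hamilton's equations (3) for a free rigid body (constant mass matrix M = diag(mI, J), zero potential, zero input). The inertial parameters are illustrative assumptions, not values from the paper; the rotation is re-orthonormalized after each step to stay on SO(3).

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix w_hat such that hat(w) @ x = w x x (cross product)."""
    return np.array([[0., -w[2], w[1]],
                     [w[2], 0., -w[0]],
                     [-w[1], w[0], 0.]])

# Assumed (illustrative) inertial parameters.
m, J = 1.0, np.diag([0.1, 0.2, 0.3])
Minv = np.linalg.inv(np.block([[m * np.eye(3), np.zeros((3, 3))],
                               [np.zeros((3, 3)), J]]))

def step(p, R, pv, pw, dt):
    """One Euler step of Hamilton's equations for a free rigid body (U = 0, u = 0)."""
    zeta = Minv @ np.concatenate([pv, pw])   # generalized velocity [v; w]
    v, w = zeta[:3], zeta[3:]
    p_new = p + dt * R @ v                   # world-frame position: p_dot = R v
    R_new = R @ (np.eye(3) + dt * hat(w))    # R_dot = R hat(w); re-project to SO(3)
    Usvd, _, Vt = np.linalg.svd(R_new)
    R_new = Usvd @ Vt
    pv_new = pv + dt * np.cross(pv, w)                       # pv_dot = pv x w
    pw_new = pw + dt * (np.cross(pw, w) + np.cross(pv, v))   # pw_dot = pw x w + pv x v
    return p_new, R_new, pv_new, pw_new

p, R = np.zeros(3), np.eye(3)
pv, pw = np.array([1., 0., 0.]), np.array([0., 0.1, 0.2])
for _ in range(1000):
    p, R, pv, pw = step(p, R, pv, pw, 1e-3)
```

The momentum equations are the body-frame Newton-Euler equations written in the 𝔭^× form given above.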

Training a translation-equivariant SE(3) Hamiltonian dynamics model
A trajectory dataset D can be obtained by measuring the generalized coordinates and velocities of the system at times t_0, ..., t_N using an odometry algorithm (Delmerico and Scaramuzza, 2018; Mohamed et al., 2019) or a motion capture system. The control inputs u^{(i)} can be generated by manually driving the robot or using an existing controller.
Since the system dynamics do not change if we shift the position p to any point in the world frame, we offset the trajectories in the dataset D so that they start from the position 0 and learn the system dynamics around the origin. This is sufficient for control purposes: e.g., using the control design in Sec. 4.1, the input driving the system from a state x with position p to a desired state x* with position p* is the same as the one driving the system from the state x with position 0 to the desired state x* with the offset position p* − p.
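The position offset can be applied directly to the raw trajectories. A minimal sketch, assuming a hypothetical array layout in which each trajectory is an (N+1) × 12 array of generalized coordinates whose first three columns are the position:

```python
import numpy as np

def offset_to_origin(trajs):
    """Shift each trajectory so its initial position is 0. The orientation
    entries are translation-invariant and stay unchanged."""
    shifted = []
    for q in trajs:                # q: (N+1, 12) array [p, r1, r2, r3]
        q = q.copy()
        q[:, :3] -= q[0, :3]       # subtract the initial position from all samples
        shifted.append(q)
    return shifted

trajs = [np.random.randn(6, 12) for _ in range(3)]
shifted = offset_to_origin(trajs)
```

Relative positions within each trajectory, and all orientation entries, are preserved exactly.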
Since the momentum 𝔭 is not directly available from the dataset D, we use the time derivative of the generalized velocity, derived from (1):

ζ̇ = (d/dt M(q)^{-1}) 𝔭 + M(q)^{-1} 𝔭̇. (5)

Eq. (3) and (5) describe the Hamiltonian dynamics with unknown inverse generalized mass matrix M(q)^{-1}, input matrix B(q), and potential energy U(q), which we aim to approximate by three neural networks M_θ(q)^{-1}, B_θ(q), and U_θ(q), respectively, with parameters θ.
To optimize the parameters θ, we use a neural ODE framework that encodes the Hamiltonian dynamics (3) and (5) with M_θ(q), B_θ(q), and U_θ(q) in the network structure (Fig. 1(a)). The forward pass rolls out the Hamiltonian dynamics (3) and (5) with the neural networks M_θ(q), B_θ(q), and U_θ(q) using a neural ODE solver (Chen et al., 2018) to obtain a predicted state sequence (q̃_n^{(i)}, ζ̃_n^{(i)}) for n = 1, ..., N. The loss is the total discrepancy between the predicted and ground-truth sequences:

L(θ) = Σ_{i=1}^{D} Σ_{n=1}^{N} c((q_n^{(i)}, ζ_n^{(i)}), (q̃_n^{(i)}, ζ̃_n^{(i)})), (6)

where the distance metric c is defined as the sum of position, orientation, and velocity errors:

c((q, ζ), (q̃, ζ̃)) = ‖p − p̃‖² + ‖(log(R̃^T R))^∨‖² + ‖ζ − ζ̃‖²,

where log(·) : SO(3) → so(3) is the inverse of the exponential map, returning a skew-symmetric matrix in so(3) from a rotation matrix in SO(3), and the ∨-map (·)^∨ : so(3) → R^3 is the inverse of the hat map (·)^ in Sec. 2. The parameters θ are optimized via gradient descent by back-propagating the loss through the neural ODE solver. This is done using the adjoint method, where the gradient ∂L/∂θ is calculated by a single call to a reverse-time ODE solve starting from the predicted terminal state at time t = t_N.
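The orientation term of the distance metric can be sketched in NumPy as follows. The closed-form SO(3) logarithm below (via the rotation angle from the trace) and the assumed signature of the distance function are illustrative; the exact weighting of the error terms is a modeling choice.

```python
import numpy as np

def log_so3(R):
    """Inverse of the exponential map: rotation matrix -> rotation vector in R^3
    (the vee map applied to the matrix logarithm). Assumes angle < pi."""
    cos_th = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    th = np.arccos(cos_th)
    if th < 1e-8:
        return np.zeros(3)
    W = (R - R.T) * th / (2.0 * np.sin(th))   # skew-symmetric log(R)
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def distance(p, R, zeta, p_t, R_t, zeta_t):
    """Sum of squared position, orientation, and velocity errors (assumed form of c)."""
    e_R = log_so3(R_t.T @ R)
    return (np.sum((p - p_t) ** 2) + np.sum(e_R ** 2)
            + np.sum((zeta - zeta_t) ** 2))
```

For identical states the metric is zero; for a rotation offset it returns the squared geodesic angle on SO(3).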

Evaluation of the Hamiltonian dynamics model of a simulated hexarotor
We consider a simulated hexarotor unmanned aerial vehicle (UAV) (Fig. 1(b)) with fixed-tilt rotors pointing in different directions (Rajappa et al., 2015), modeled as a fully-actuated rigid body with mass m = 0.027 and inertia matrix J = 10^{-5} diag([2.4, 2.4, 3.2]). The robot's ground-truth dynamics satisfy Hamilton's equations in (3) with generalized mass M(q) = diag(mI, J), potential energy U(q) = mgz, and input matrix B(q) = I. The control input u is a 6-dimensional wrench, including a 3-dimensional force and a 3-dimensional torque. Since the mass m of the hexarotor can be easily measured, we assume it is known, leading to a known potential energy U(q) = mg [0 0 1] p, where p is the UAV position and g ≈ 9.8 m/s² is the gravitational acceleration. We approximate the inverse generalized mass matrix by M_θ(q)^{-1} = diag(m^{-1} I, J_θ(q)^{-1}) and learn the inverse inertia matrix J_θ(q)^{-1} and the input matrix B_θ(q) from data. As the dynamics model is translation-invariant, we mimic manual flights in an area free of obstacles using a PID controller that drives the hexarotor from a random initial pose near the origin to a desired pose, generating 18 one-second trajectories. We shift the trajectories to start from the origin and create a dataset D with N = 5 and D = 432 as described in Sec. 2. The Hamiltonian-based neural ODE network is trained with the dataset D, as described in Sec. 3, for 5000 iterations with learning rate 10^{-3}. Fig. 1(c) shows the loss function during training. Note that if we scale M_θ(q) and the input matrix B_θ(q) by a constant γ, the dynamics of (q, ζ) in (3) and (5) do not change. Fig. 1(d) and 1(e) plot scaled versions of the learned inverse inertia matrix J_θ(q)^{-1} and the input matrix B_θ(q), converging to the constant ground-truth values.
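The scale ambiguity noted above can be checked numerically. A minimal sketch for the rotational subsystem (with a potential independent of orientation, so the angular velocity dynamics are ω̇ = J^{-1}((Jω) × ω + Bτ)): scaling both J and B by the same constant γ leaves ω̇ unchanged.

```python
import numpy as np

def omega_dot(J, B, w, tau):
    """Angular velocity dynamics w_dot = J^{-1}((J w) x w + B tau)."""
    return np.linalg.solve(J, np.cross(J @ w, w) + B @ tau)

J = np.diag([2.4e-5, 2.4e-5, 3.2e-5])
B = np.eye(3)
w = np.array([0.3, -0.1, 0.2])
tau = np.array([1e-4, 0., -2e-4])

gamma = 7.5
a1 = omega_dot(J, B, w, tau)                  # nominal parameters
a2 = omega_dot(gamma * J, gamma * B, w, tau)  # both scaled by gamma
```

This is why only a scaled version of (J_θ^{-1}, B_θ) is identifiable from (q, ζ) data, as reflected in Fig. 1(d) and 1(e).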

Safe Tracking using a Reference Governor
In this section, we first describe a passivity-based regulation controller for arbitrary pose stabilization in Sec. 4.1. We derive sufficient conditions for safety based on an invariant level set of the closed-loop system's Hamiltonian. In Sec. 4.2, we propose a reference governor control policy to adaptively generate a regulation pose along the desired path and achieve safe navigation.

Passivity-based control for learned Hamiltonian dynamics
Given the learned model of the system dynamics:

q̇ = q^× ∂H_θ/∂𝔭,   𝔭̇ = −(q^×)^T ∂H_θ/∂q + 𝔭^× ∂H_θ/∂𝔭 + B_θ(q) u, (7)

with learned Hamiltonian H_θ(q, 𝔭) = (1/2) 𝔭^T M_θ(q)^{-1} 𝔭 + U_θ(q), we want to find a control policy that stabilizes the system to a desired equilibrium x* := (q*, 0) with desired generalized coordinates q* = (p*, R*) and zero momentum. We design a control policy u = π(x, x*) to shape the total energy (Hamiltonian) of the closed-loop system so that it achieves a minimum at the desired state x* = (q*, 0). By injecting energy into the system through the controller u = π(x, x*), we aim to achieve the following desired Hamiltonian:

H_d(x, x*) = (1/2) 𝔭^T M_θ(q)^{-1} 𝔭 + (1/2) k_p ‖p − p*‖² + (1/2) k_R tr(I − R*^T R), (8)

where k_p and k_R are positive gains. We use interconnection and damping assignment passivity-based control (IDA-PBC) (Van Der Schaft and Jeltsema, 2014) to obtain the control policy u = π(x, x*) in (9) as the sum of an energy-shaping term, which reshapes the learned Hamiltonian H_θ into the desired Hamiltonian H_d, and a damping-injection term −K_d ∂H_d/∂𝔭, where K_d = diag(k_v I, k_ω I) is a damping gain with positive terms k_v, k_ω, and e(q, q*) in (10) is the error between q and q*, consisting of a position error p − p* and a rotation error on SO(3).

Lemma 1 If the input gain matrix B_θ(q) of the system in (7) is invertible, the control policy u = π(x, x*) in (9) always exists and asymptotically stabilizes the system to an arbitrary reference x* = (q*, 0) with Lyapunov function given by the desired Hamiltonian H_d(x, x*) in (8).
Proof See Appendix A.
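To make the energy-shaping idea concrete, here is a simplified sketch for the translational subsystem only: a point mass with known m and B = I, not the full SE(3) controller of Sec. 4.1. The control cancels gravity, injects the potential (1/2) k_p ‖p − p*‖², and adds damping −k_v v, so the shaped energy H_d decreases along closed-loop trajectories.

```python
import numpy as np

m, g = 1.0, 9.8          # assumed point-mass parameters
kp, kv = 4.0, 3.0        # energy-shaping and damping gains
p_star = np.array([1.0, -0.5, 2.0])

def control(p, v):
    # gravity compensation + spring toward p* (energy shaping) + damping injection
    return m * g * np.array([0., 0., 1.]) - kp * (p - p_star) - kv * v

def H_d(p, v):
    """Shaped total energy: kinetic + injected quadratic potential."""
    return 0.5 * m * v @ v + 0.5 * kp * (p - p_star) @ (p - p_star)

# Euler simulation of the closed loop: m p_ddot = u - m g e3
p, v, dt = np.zeros(3), np.zeros(3), 1e-3
for _ in range(20000):
    u = control(p, v)
    a = u / m - np.array([0., 0., g])
    p, v = p + dt * v, v + dt * a
```

The closed loop is a damped spring around p*, so the state converges to (p*, 0) where H_d attains its minimum 0.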
Next, consider the closed-loop system (11) obtained by applying the control policy u = π(x, x*) in (9) to the learned dynamics in (7). We derive conditions on the initial state x_0 under which the position p converges to p* safely, remaining in the safe set F. We first define a dynamic safety margin (DSM) ∆E(x, x*) for the Hamiltonian dynamics (11):

∆E(x, x*) := d̄²(p*, O) − (2/k_p) H_d(x, x*), (12)

where d̄²(p*, O) is the squared truncated distance to the unsafe set O in (4). Given a fixed desired state x*, the DSM measures the trade-off between safety, measured by d̄²(p*, O), and system energy, measured by H_d(x, x*), and allows us to find a positively forward-invariant set S(x, x*) such that, for any x_0 ∈ S(x, x*), the position p converges to p* while remaining in the safe set F.

Proposition 2 Consider the closed-loop system (11) with a fixed desired state x* = (q*, 0) and p* ∈ int(F). If the initial state x_0 satisfies ∆E(x_0, x*) ≥ 0, then the position p(t) converges asymptotically to p* while remaining in the safe set F for all t ≥ t_0.
Proof See Appendix B.
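One plausible instantiation of the DSM computation is sketched below. The specific form ∆E = d̄² − (2/k_p) H_d is an assumption here, obtained from the bound ‖p − p*‖² ≤ (2/k_p) H_d implied by the quadratic position term in (8); the obstacle set is represented by a hypothetical list of points.

```python
import numpy as np

beta = 30.0   # sensing range used to truncate the distance
kp = 0.25     # position gain from the desired Hamiltonian

def trunc_dist(p_star, obstacles):
    """Truncated distance d_bar(p*, O) from (4), with O given as a point set."""
    d = min(np.linalg.norm(p_star - o) for o in obstacles)
    return min(d, beta)

def dsm(H_d_val, p_star, obstacles):
    """Dynamic safety margin: squared distance-to-violation minus an
    energy-based bound on the tracking error (assumed form)."""
    return trunc_dist(p_star, obstacles) ** 2 - (2.0 / kp) * H_d_val

obstacles = [np.array([3.0, 0.0, 0.0]), np.array([0.0, 5.0, 0.0])]
p_star = np.zeros(3)
```

With zero energy the margin equals the squared truncated distance; as the system energy grows, the margin shrinks and eventually becomes negative, which is the trade-off the governor exploits.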
Using the results for a fixed x* in this section, we develop a reference governor system in Sec. 4.2 to adaptively change x* over time so that the robot can safely track a desired path.

Reference governor design
We introduce a virtual system, called a reference governor (Bemporad, 1998), to adaptively track the path r defined in Sec. 2 and provide a time-varying reference x*(t) for the actual system in (11). The motion of the governor needs to be regulated to balance the energy of the Hamiltonian system with the distance to the unsafe set O, keeping the safety margin in (12) non-negative. We define the governor as a first-order linear time-invariant system with state g(t) ∈ R^3 and dynamics:

ġ(t) = −k_g (g(t) − u_g(t)), (13)

where k_g > 0 is a control gain. The governor input u_g will be chosen to move the governor system along the reference path without violating the safety condition ∆E(x, x*) ≥ 0 obtained in Proposition 2. Define a local safe set LS(x, g) as a ball around the governor state g whose radius is determined by the dynamic safety margin, shrunk by an arbitrarily small ε > 0 to ensure that LS(x, g) ⊆ int(F). The size of the local safe set determines how fast the governor can move along the reference path without endangering safety.
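A minimal discrete-time sketch of the governor, assuming first-order dynamics of the form ġ = −k_g (g − ḡ) with a hypothetical gain k_g: the governor state decays toward the current projected goal and freezes whenever the safety margin is exhausted.

```python
import numpy as np

k_g, dt = 2.0, 0.01   # assumed governor gain and step size

def governor_step(g, g_bar, delta_E):
    """One Euler step of g_dot = -k_g (g - g_bar). If the dynamic safety
    margin is non-positive, the target collapses to g and the governor
    stops moving."""
    target = g_bar if delta_E > 0.0 else g
    return g + dt * (-k_g) * (g - target)

g = np.zeros(3)
goal = np.array([1.0, 0.0, 0.0])
for _ in range(2000):
    g = governor_step(g, goal, delta_E=1.0)   # positive margin: governor advances
```

With a persistently positive margin the governor converges to the goal; with a non-positive margin it stays put, which is what allows the robot's energy to dissipate before the reference moves on.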
Definition 3 A local projected goal at system-governor state (x, g) is a point ḡ ∈ LS(x, g) that is furthest along the reference path r:

ḡ := r(σ̄),   σ̄ := max{σ ∈ [0, 1] : r(σ) ∈ LS(x, g)}. (14)

Choosing the governor input as u_g = ḡ forces the governor to track the reference path adaptively, taking the safety condition ∆E(x, x*) ≥ 0 into account. Given the local projected goal ḡ(t) ∈ R^3 and the governor state g(t) ∈ R^3, we also generate a desired reference state x*(t) for the system in (11) by lifting g(t) to R^18. We may choose g(t) as the desired position with zero desired velocity but, to provide guidance on the SE(3) manifold, we need to also generate a desired orientation R*(t). We construct a lifting function from F × int(F) to R^18 to obtain x* = (g, ḡ), where p* = g and r*_1, r*_2, r*_3 are the rows of the matrix R* = [c_1 c_2 c_3],

Figure 2: (a) Structure of the reference-governor tracking controller: a reference governor with state g adaptively tracks a point ḡ along the desired path r and generates a time-varying equilibrium x* = (g, ḡ) for the closed-loop Hamiltonian system; (b) A local projected goal ḡ (purple dot) is generated as the furthest intersection between the local safe set LS(x, g) (yellow sphere) and the path r (blue curve). Given g and ḡ, a desired equilibrium x* = (g, ḡ) is generated for the Hamiltonian system, with its orientation axes indicated by the red, green, and blue arrows, respectively.
with e_3 = [0, 0, 1]^T, c_1 = (ḡ − g)/‖ḡ − g‖, c_2 = (e_3 × c_1)/‖e_3 × c_1‖, and c_3 = (c_1 × c_2)/‖c_1 × c_2‖. If g = ḡ, the most recent value of R*, or the identity I, may be used. Our safe tracking control design is visualized in Fig. 2. It consists of two parts: 1) a first-order reference governor system with state g adaptively following the local projected goal ḡ along the path r and 2) a closed-loop Hamiltonian system tracking the reference signal x* = (g, ḡ). Our main result is summarized in the following theorem.

Theorem 4 Consider the closed-loop Hamiltonian system (11) with the time-varying reference x*(t) = (g(t), ḡ(t)) generated by the reference governor in (13) and (14). If the system starts at the beginning of the path, i.e., g(t_0) = p(t_0) = r(0), with ∆E(x(t_0), x*(t_0)) > 0, then the position p(t) remains safe, i.e., p(t) ∈ F for all t ≥ t_0, and converges asymptotically to r(1).
Proof See Appendix C.
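The orientation component of the lifting function above can be sketched as follows. Treating c_1, c_2, c_3 as the columns of R* is an assumption about the layout; the construction is degenerate when ḡ − g is parallel to e_3, which a full implementation would need to handle.

```python
import numpy as np

def desired_rotation(g, g_bar):
    """Rotation aligning the first axis with the direction from g to g_bar;
    falls back to the identity when g_bar == g. Assumes g_bar - g is not
    parallel to e3 (otherwise e3 x c1 vanishes)."""
    e3 = np.array([0., 0., 1.])
    d = g_bar - g
    if np.linalg.norm(d) < 1e-9:
        return np.eye(3)                  # g == g_bar: reuse identity
    c1 = d / np.linalg.norm(d)
    c2 = np.cross(e3, c1)
    c2 = c2 / np.linalg.norm(c2)
    c3 = np.cross(c1, c2)
    c3 = c3 / np.linalg.norm(c3)
    return np.column_stack([c1, c2, c3])  # assumed column layout
```

By construction the result is orthonormal with determinant +1, i.e., a valid element of SO(3).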

Evaluation
This section evaluates our safe tracking controller on a simulated hexarotor UAV using the learned Hamiltonian dynamics in Sec. 3.2. The task is to navigate from a start position to a goal in an environment without colliding with the obstacles. To guarantee stability and safety, all control gains must be positive. The following control gains were used with the regulation controller in Sec. 4.1: k_p = 0.25, k_R = 125J, k_v = 0.125, k_ω = 10J in (8) and (9). A simulated depth sensor provides point clouds P(t) of the unsafe set O, depending on the system pose, with a maximum sensing range of β = 30. The distance from the governor g(t) to the unsafe set O is approximated via d̄(g(t), O) ≈ min_{y ∈ P(t)} ‖g(t) − y‖. The point clouds P(t) are used to construct an occupancy grid map online, and a reference path r is replanned periodically using the A* algorithm to ensure that r(σ) ∈ int(F). In this paper, we assume the learned system dynamics are accurate and simulate a noise-free environment. We leave model uncertainty, measurement noise, and external disturbances for future work. Fig. 3 and Fig. 4 show the behavior of the closed-loop hexarotor system in two different environments. The reference governor follows the projected goal ḡ and generates a time-varying equilibrium x* = (g, ḡ) for the hexarotor. The dynamic safety margin ∆E(x, x*) fluctuates during this process but never becomes negative, as seen in the figures. The augmented system (x, g) is controlled adaptively, slowing down when the dynamic safety margin decreases (e.g., when the robot is close to an obstacle and has large total energy H_d) and speeding up otherwise (e.g., when the robot is far away from the obstacles or has small total energy H_d). The simulations show that our control policy successfully drives the system from the start to the end of the reference path while avoiding sensed obstacles online, i.e., d(p, O) remains positive throughout the motion.
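The distance approximation above amounts to a nearest-neighbor query against the current point cloud. A minimal sketch, with hypothetical array shapes (one sensed point per row):

```python
import numpy as np

beta = 30.0  # maximum sensing range

def approx_dist(g, points):
    """Approximate d_bar(g, O) as the distance from g to the closest sensed
    point, truncated at the sensing range beta."""
    if len(points) == 0:
        return beta                                   # nothing sensed in range
    d = float(np.min(np.linalg.norm(points - g, axis=1)))
    return min(d, beta)

P = np.array([[3.0, 4.0, 0.0], [10.0, 0.0, 0.0]])     # example point cloud
```

An empty cloud returns the sensing range itself, since no obstacle is closer than β by assumption.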

Conclusion
This paper developed a tracking controller for fully-actuated Hamiltonian systems which enables safe autonomous navigation in unknown environments. Given only a training set of system state-control trajectories, our approach estimates the system dynamics accurately using a neural ODE network and synthesizes a controller that avoids obstacles based on run-time distance measurements. Our method was demonstrated on a simulated hexarotor aerial robot navigating in complex 3D environments. Future work will focus on capturing model uncertainty and external disturbances in the design and deploying it on a hardware platform.
Using the group properties of SO(3), the rotation error matrix R_e = R*^T R ∈ SO(3) is an orthogonal matrix. All columns of R_e are orthonormal and all entries of R_e have magnitude at most 1, hence tr(I − R*^T R) ≥ 0. Since M_θ(q) is a positive-definite matrix, H_d is positive definite, and its minimum value 0 is achieved only at the error state x_e = (q_e, 0) with q_e = [0^T, e_1^T, e_2^T, e_3^T]^T. The time derivative of H_d along the closed-loop dynamics is:

Ḣ_d(x, x*) = −(∂H_d/∂𝔭)^T K_d (∂H_d/∂𝔭) ≤ 0.

Hence, Ḣ_d(x, x*) ≤ 0 for all x_e. It is not hard to show that the only solution that can stay in the set {Ḣ_d = 0} is the equilibrium itself. By LaSalle's invariance principle (Khalil, 2002), the system (18) asymptotically converges to the desired equilibrium x* = (q*, 0).
we know that x(t) will stay within S(x, x*(T_0)) for t ≥ T_0 and the position constraints will not be violated. As x → x*(T_0), because Ḣ_d(x, x*(T_0)) < 0 when g is static and x ≠ x*(T_0), there exists h > 0 such that ∆E(T_0 + h) becomes strictly positive. Hence, the governor is able to move again toward a new ḡ further along the path, as discussed previously. This process continues until the augmented system stabilizes at ((r(1), r(1)), r(1)), where ḡ stops changing.