Safe Control of a Reaction Wheel Pendulum Using Control Barrier Function

This paper presents a safe control scheme for a reaction wheel pendulum that ensures the system satisfies both stability objectives and safety constraints. Safety constraints are specified in terms of set invariance and verified through control barrier functions (CBFs): the existence of a CBF satisfying specific conditions implies set invariance. The control framework unifies stability objectives, expressed as a nominal control law, and safety constraints, expressed as a CBF, through quadratic programming (QP). Since the work focuses on safety, the nominal control law is a simple linear quadratic regulator (LQR). The safety constraint is designed to guarantee that the angular position of the pendulum never exceeds a predetermined value. The control framework is applied and analyzed in both continuous time and discrete time. Results from numerical simulations and experimental tests indicate that the pendulum is well stabilized and that the safety constraint is satisfied even when the reference input tries to drive the pendulum out of the safe set.


I. INTRODUCTION
The reaction wheel pendulum is an inverted pendulum balanced by an actuated rotating reaction wheel (flywheel). The system exhibits several typical control problems, such as nonlinearity, robustness, stabilization, and underactuation, which make it attractive for research and advanced education. Several engineering systems can be approximately modeled as an inverted pendulum, such as a rocket during launch, a two-wheeled human transporter (Segway), and a bipedal robot [1].
Reaction wheels are actuators commonly used in aerospace applications, such as spacecraft [2] and satellites [3], to control attitude without thrusters. In robotics, reaction wheels have also been applied to bipedal walking robots [4] and to several variations of the reaction wheel pendulum [5]-[8].
The reaction wheel pendulum has an unstable equilibrium point in the upright position. Several control strategies in the literature have been applied to stabilize this system, such as PD control [1], pole placement [9], feedback linearization [10], and sliding mode control [11], [12]. These works address a stability objective, i.e., stabilizing the system at the equilibrium point, but do not consider safety constraints. Motivated by several recent works on the safety of dynamical systems using control barrier functions (CBFs) [13], [14], in this work we apply to the reaction wheel pendulum a control framework that simultaneously satisfies stability objectives and safety constraints.

The associate editor coordinating the review of this manuscript and approving it for publication was Ton Duc Do.
The safety of dynamical systems can be specified in terms of set invariance. The first necessary and sufficient conditions for set invariance were given by Nagumo [15] in the 1940s. In the 2000s, barrier certificates were introduced to prove the safety of nonlinear and hybrid systems [16]-[18]. The term ''barrier'' comes from barrier functions, which, in optimization problems, are added to cost functions to avoid undesirable regions [19].
Nagumo's theorem gives necessary and sufficient conditions for set invariance in terms of the set boundary. To ensure safety over the entire set, a ''Lyapunov-like'' approach was proposed by Tee et al. [20], in which a positive definite barrier Lyapunov function yields invariant level sets. If these level sets are contained in the safe set, safety can be guaranteed. This methodology has the limitation of imposing strong, conservative conditions, because it enforces the invariance of every level set [13]. Wieland and Allgower [21] extended the notion of a barrier certificate to a ''control'' version, yielding the first definition of a CBF. The control Lyapunov barrier function presented by Romdlony and Jayawardhana [22] combines a CBF with a control Lyapunov function (CLF) and simultaneously guarantees safety and stability. CLFs use Lyapunov functions together with inequality constraints on their derivatives to define entire classes of controllers that stabilize a given system [23]; several works apply CLFs as feedback controllers, e.g., [24]. As with Tee et al. [20] and Wieland and Allgower [21], the work by Romdlony and Jayawardhana [22] imposes conditions stronger than necessary [13].

VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The most recent formulation related to the safety of dynamical systems is presented by Ames et al. [13], [14]. This methodology, also called CBF, ensures safety over the entire set and imposes new, less restrictive conditions on the CBF, making the problem minimally restrictive, unlike [20], [21], and [22]. It combines performance/stability objectives, expressed as a CLF or a nominal control law, with safety constraints, expressed as a CBF. These objectives are integrated through quadratic programming (QP), in which the safety constraints are prioritized. Several applications of this methodology have been reported, such as adaptive cruise control [25], [26], bipedal walking robots [27], robotic manipulators [28], Segways [29], quadrotors [30], and multi-robot systems [31].
This formulation, initially developed for continuous-time systems, was extended to discrete-time systems by Takano et al. [32] and Agrawal and Sreenath [33]. In Agrawal and Sreenath [33], the performance/stability objectives are expressed as a CLF, and in Takano et al. [32], as a nominal control law. In discrete time, the performance/stability objectives and safety constraints are integrated through nonlinear programming (NLP), and under certain conditions the NLP can be formulated as a QP [32], as shown in Section IV.
In this work, we apply the formulation of Ames et al. [13], [14] to a reaction wheel pendulum. Since the focus is on safety, a simple linear quadratic regulator (LQR) is used to stabilize the pendulum. The safety constraint, expressed as a CBF, is designed to guarantee that the angular position of the pendulum never exceeds a predetermined value. The control framework is applied and analyzed in both continuous time and discrete time. We stress that few studies in the literature deal with discrete-time CBFs [32], [33].
The rest of this paper is organized as follows. Section II describes the modeling of the reaction wheel pendulum. The nominal LQR, the concept of CBF, and the control framework are presented in Section III for the continuous-time system and in Section IV for the discrete-time system. Results and conclusions are presented in Sections V and VI, respectively.

II. SYSTEM MODELING
The schematic diagram of the reaction wheel pendulum is presented in Fig. 1. The system consists of an inverted pendulum balanced by an actuated reaction wheel, where α is the pendulum angle, θ is the wheel angle, and τ is the torque acting on the reaction wheel. The angles are measured with two optical encoders, and the reaction wheel is actuated by a permanent-magnet DC motor.
The equations of motion can be derived using the Lagrangian method. Lagrange's equations are

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_l}\right) - \frac{\partial L}{\partial q_l} = \tau_l, \qquad l = 1, \dots, d, \tag{1}$$

where $L = T - V$ is the Lagrangian of the system, T is the total kinetic energy, V is the total potential energy, d is the number of generalized coordinates (degrees of freedom), $q_l$ are the generalized coordinates, and $\tau_l$ are the generalized forces (torques). For the reaction wheel pendulum, the generalized coordinates are α and θ (d = 2), and the generalized torques are +τ, imposed by the DC motor on the reaction wheel, and −τ, the reaction torque acting on the pendulum.
The kinetic energy T is the sum of the kinetic energies of the pendulum and the reaction wheel:

$$T = \frac{1}{2}\left(m_p l_{cp}^2 + J_p\right)\dot{\alpha}^2 + \frac{1}{2} m_r l_p^2 \dot{\alpha}^2 + \frac{1}{2} J_r \dot{\theta}^2, \tag{2}$$

where $m_p$ and $m_r$ are the pendulum and reaction wheel masses, $J_p$ and $J_r$ are the pendulum and reaction wheel moments of inertia, $l_p$ is the pendulum length, and $l_{cp}$ is the distance to the pendulum center of mass. We assume that the potential energy V is due to gravity only:

$$V = m_p g l_{cp}\cos\alpha + m_r g l_p \cos\alpha, \tag{3}$$

where g is the gravitational acceleration.
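Substituting (2) and (3) into (1) yields the equations of motion

$$\left(m_p l_{cp}^2 + m_r l_p^2 + J_p\right)\ddot{\alpha} - \left(m_p l_{cp} + m_r l_p\right) g \sin\alpha = -\tau, \tag{4}$$

$$J_r \ddot{\theta} = \tau. \tag{5}$$

The torque τ generated by the DC motor is

$$\tau = K_m i_m, \tag{6}$$

where $K_m$ is the motor torque constant and $i_m$ is the motor current. Neglecting the motor inductance and applying Ohm's law, we obtain

$$i_m = \frac{V_m - K_e\left(\dot{\theta} - \dot{\alpha}\right)}{R_m}, \tag{7}$$

where $V_m$ is the voltage applied to the motor armature using PWM (pulse width modulation), $K_e$ is the back EMF (electromotive force) constant, and $R_m$ is the motor internal resistance.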
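The system model can then be written in the control-affine form

$$\dot{x} = f(x) + g(x)u, \tag{8}$$

with states $x \in D \subset \mathbb{R}^n$, inputs $u \in U \subset \mathbb{R}^m$, and $f(x)$ and $g(x)$ locally Lipschitz.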
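To design the linear controller that stabilizes the system, the nonlinear model is linearized around the upright equilibrium point ($x = 0$, $u = 0$), resulting in

$$\dot{x} = Ax + Bu, \tag{9}$$

where

$$A = \left.\frac{\partial f}{\partial x}\right|_{x=0}, \qquad B = g(0).$$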
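As a minimal sketch of the linearization step, the snippet below writes the model in the form (8) and computes A and B by central finite differences about the upright equilibrium. It assumes the state ordering x = [α, θ, α̇, θ̇], the input u = V_m, θ measured as the absolute wheel angle, and a back EMF proportional to the wheel speed relative to the pendulum. Every value in `PARAMS` is a hypothetical placeholder, not an identified parameter of the prototype.

```python
import numpy as np

# Hypothetical placeholder parameters (SI units); NOT the prototype's identified values.
PARAMS = {
    "m_p": 0.2, "m_r": 0.15,      # pendulum and wheel masses [kg]
    "l_p": 0.15, "l_cp": 0.08,    # pendulum length, distance to pendulum CoM [m]
    "J_p": 1e-3, "J_r": 4e-4,     # moments of inertia [kg m^2]
    "K_m": 0.03, "K_e": 0.03,     # torque and back-EMF constants
    "R_m": 4.0, "g": 9.81,        # armature resistance [ohm], gravity [m/s^2]
}

def dynamics(x, u, p=PARAMS):
    """xdot = f(x) + g(x) u with x = [alpha, theta, alpha_dot, theta_dot] and u = V_m.

    Assumes theta is the absolute wheel angle and that the back EMF depends on the
    wheel speed relative to the pendulum body (theta_dot - alpha_dot)."""
    alpha, _, alpha_dot, theta_dot = x
    J_a = p["m_p"] * p["l_cp"] ** 2 + p["m_r"] * p["l_p"] ** 2 + p["J_p"]
    M_g = (p["m_p"] * p["l_cp"] + p["m_r"] * p["l_p"]) * p["g"]
    tau = p["K_m"] / p["R_m"] * (u - p["K_e"] * (theta_dot - alpha_dot))
    alpha_ddot = (M_g * np.sin(alpha) - tau) / J_a   # pendulum receives the reaction torque -tau
    theta_ddot = tau / p["J_r"]                       # wheel receives +tau
    return np.array([alpha_dot, theta_dot, alpha_ddot, theta_ddot])

def linearize(fun, x0, u0, eps=1e-6):
    """Central-difference Jacobians A = d(xdot)/dx and B = d(xdot)/du at (x0, u0)."""
    n = x0.size
    A = np.zeros((n, n))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        A[:, i] = (fun(x0 + dx, u0) - fun(x0 - dx, u0)) / (2 * eps)
    B = ((fun(x0, u0 + eps) - fun(x0, u0 - eps)) / (2 * eps)).reshape(n, 1)
    return A, B

# Upright equilibrium: alpha = 0, wheel at rest, zero voltage.
A, B = linearize(dynamics, np.zeros(4), 0.0)
```

With the model written this way, the open-loop A has one eigenvalue with positive real part (the falling pendulum), which is what the LQR below must stabilize.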

III. CONTROL FRAMEWORK - CONTINUOUS-TIME
This section presents the nominal LQR, the concept of CBF and the control framework that unifies the nominal LQR and CBF through QP, considering the continuous-time system described in (8).

A. NOMINAL CONTROL - CONTINUOUS-TIME LQR
The LQR is an optimal regulator that, given the system (9), determines the gain matrix K of the optimal control law

$$u = -Kx \tag{10}$$

so as to minimize the performance index

$$J = \int_0^{\infty}\left(x^T Q x + u^T R u\right)dt, \tag{11}$$

where Q is a positive-semidefinite matrix and R is a positive-definite matrix. These matrices weight the relative importance of the state x and the input u in the minimization of the performance index [34].
If there exists a positive-definite matrix P satisfying the algebraic Riccati equation

$$A^T P + PA - PBR^{-1}B^T P + Q = 0, \tag{12}$$

then the closed-loop system is stable, and the optimal gain matrix K is obtained as

$$K = R^{-1}B^T P. \tag{13}$$
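The gain in (13) can be computed numerically by solving (12) with SciPy's `solve_continuous_are`. The sketch below uses a hypothetical, simplified linear model (α̈ = 60α − 1.5u, θ̈ = 20u, state [α, θ, α̇, θ̇]) and hypothetical weights; it only illustrates the computation, and the numbers do not correspond to the prototype.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr(A, B, Q, R):
    """Continuous-time LQR: solve the Riccati equation (12) for P, then K = R^{-1} B^T P (13)."""
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)
    return K, P

# Hypothetical linear model, state [alpha, theta, alpha_dot, theta_dot] (illustrative numbers).
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [60.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
B = np.array([[0.0], [0.0], [-1.5], [20.0]])
Q = np.diag([100.0, 1.0, 1.0, 0.1])   # hypothetical state weights
R = np.array([[1.0]])                 # hypothetical input weight

K, P = lqr(A, B, Q, R)
closed_loop_eigs = np.linalg.eigvals(A - B @ K)
```

Raising the weight on α relative to R typically yields a more aggressive gain; the closed-loop eigenvalues of A − BK provide a quick check that the computed K is stabilizing.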

B. CONTROL BARRIER FUNCTION - CONTINUOUS-TIME
Two dual concepts in control systems are liveness and safety. As noted by Ames et al. [13], liveness requires that ''good'' things eventually happen, such as asymptotic stability or tracking, while safety requires that ''bad'' things never happen, which can be expressed as set invariance. Liveness can be associated with a CLF or an arbitrary nominal control law. Safety, on the other hand, can be associated with a CBF: any trajectory starting inside an invariant set never reaches the complement of the set [13]. A barrier function h(x) vanishes on the boundary of a set C, i.e., h(x) → 0 as x → ∂C. If h(x) satisfies Lyapunov-like conditions, the forward invariance of C is guaranteed [13]. The natural extension of a barrier function to a system with control inputs is a CBF [21]. With CBFs, we impose inequality constraints on the derivative of h to obtain entire classes of controllers that render a given set forward invariant.
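We consider a set C defined as the zero-superlevel set of a continuously differentiable function $h : D \subset \mathbb{R}^n \to \mathbb{R}$, yielding [13]

$$C = \{x \in D : h(x) \ge 0\}, \quad \partial C = \{x \in D : h(x) = 0\}, \quad \mathrm{Int}(C) = \{x \in D : h(x) > 0\}. \tag{14}$$

The definition of safety is given by [13]:

Definition 1: Let u be a feedback controller such that the closed-loop system (8) is locally Lipschitz. For any initial condition $x_0 \in D$, there exists a maximum interval of existence $I(x_0)$ such that $x(t)$ is the unique solution to (8) on $I(x_0)$. The set C is forward invariant if, for every $x_0 \in C$, $x(t) \in C$ for $x(0) = x_0$ and all $t \in I(x_0)$. The system (8) is safe with respect to the set C if C is forward invariant.

Considering an extended class-$\mathcal{K}$ function $\alpha_{cbf}$, the formal definition of a CBF is given by [14]:

Definition 2: Consider the control system (8) and the set $C \subset \mathbb{R}^n$ defined by (14) for a continuously differentiable function $h : D \to \mathbb{R}$. Then h is a CBF if there exists an extended class-$\mathcal{K}$ function $\alpha_{cbf}$ such that

$$\sup_{u \in U}\left[L_f h(x) + L_g h(x)\,u + \alpha_{cbf}(h(x))\right] \ge 0, \quad \forall x \in D. \tag{15}$$

With this definition, the set of control inputs satisfying (15) at a given state is

$$K_{cbf}(x) = \{u \in U : L_f h(x) + L_g h(x)\,u + \alpha_{cbf}(h(x)) \ge 0\}, \tag{16}$$

and we have the following corollary [14]:

Corollary 1: Given a set $C \subset \mathbb{R}^n$ defined by (14), let h(x) be an associated CBF for the system (8). Then any locally Lipschitz continuous controller $u : D \to U$ such that $u(x) \in K_{cbf}(x)$ renders the set C forward invariant.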
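For a scalar input restricted to an interval U = [u_min, u_max] (an assumption; the definitions above allow a general U), the expression in (15) is affine in u, so its supremum is attained at an endpoint and K_cbf(x) is an interval. The sketch below checks (15) at one state and returns that interval, assuming the linear choice α_cbf(h) = γh; the function names are hypothetical.

```python
def cbf_condition_holds(Lfh, Lgh, h, gamma, u_min, u_max):
    """Check (15) at one state for a scalar input u in [u_min, u_max], with alpha_cbf(h) = gamma*h."""
    # An affine function of u attains its supremum over an interval at an endpoint.
    u_best = u_max if Lgh >= 0 else u_min
    return Lfh + Lgh * u_best + gamma * h >= 0

def k_cbf_interval(Lfh, Lgh, h, gamma, u_min, u_max):
    """Return K_cbf(x) from (16) as a closed interval (lo, hi), or None if it is empty."""
    c = Lfh + gamma * h                  # constraint reads: c + Lgh * u >= 0
    lo, hi = u_min, u_max
    if Lgh > 0:
        lo = max(lo, -c / Lgh)           # u >= -c / Lgh
    elif Lgh < 0:
        hi = min(hi, -c / Lgh)           # u <= -c / Lgh
    elif c < 0:
        return None                      # L_g h = 0 and the drift alone violates (15)
    return (lo, hi) if lo <= hi else None
```

An empty interval at some state means h is not a valid CBF under those input bounds, which is the situation that high relative-degree barriers produce whenever $L_g h = 0$.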
As previously mentioned, in this work the safety constraint is chosen to guarantee that the pendulum angle α never exceeds a predetermined value.

C. UNIFYING NOMINAL CONTROL LAW AND CBF THROUGH QP - CONTINUOUS-TIME
The final control framework unifies the nominal LQR and CBF through QP in continuous-time.
The nominal controller $u_{no}$ for the control system (8) is given by (10). The idea is that the safety constraint modifies the nominal controller minimally, only when the states approach the boundary of the safe set, so that the final control u satisfies Corollary 1. Therefore, the final controller is formulated as an optimization problem that minimizes the error [28]

$$e_u = u_{no} - u. \tag{17}$$
The squared norm of the error is used as the objective function:

$$\|e_u\|^2 = (u_{no} - u)^T(u_{no} - u) = u^T u - 2u_{no}^T u + u_{no}^T u_{no}. \tag{18}$$

The last term of (18) is neglected, since it is constant with respect to u. Thus, we obtain the following QP-based controller [13], [28]:

$$u^* = \arg\min_{u \in U}\; u^T u - 2u_{no}^T u \quad \text{s.t.} \quad A_{cbf}\,u \le b_{cbf}, \tag{19}$$

where $A_{cbf} = -L_g h(x)$ and $b_{cbf} = L_f h(x) + \alpha_{cbf}(h(x))$. Note that the QP constraint enforces the inequality in the CBF condition (15) at each state, i.e., $u^* \in K_{cbf}(x)$. Fig. 2 shows the schematic diagram of the control framework for the continuous-time case.
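With a single CBF constraint, the QP (19) has a closed-form solution: u* is the Euclidean projection of u_no onto the half-space A_cbf u ≤ b_cbf. The sketch below assumes α_cbf(h) = γh and takes L_f h, L_g h, and h as already evaluated at the current state. It illustrates the structure of (19) and is not the embedded implementation, which, according to Section V, uses Hildreth's procedure.

```python
import numpy as np

def cbf_qp_single(u_no, Lfh, Lgh, h, gamma):
    """Solve (19) for one constraint A_cbf u <= b_cbf, with A_cbf = -Lgh and
    b_cbf = Lfh + gamma*h (assumes alpha_cbf(h) = gamma*h).
    The minimizer of ||u - u_no||^2 over a half-space is the projection of u_no onto it."""
    u_no = np.atleast_1d(np.asarray(u_no, dtype=float))
    a = -np.atleast_1d(np.asarray(Lgh, dtype=float))   # A_cbf as a vector
    b = Lfh + gamma * h
    violation = a @ u_no - b
    if violation <= 0:
        return u_no                      # nominal input already satisfies the CBF condition
    norm2 = a @ a
    if norm2 == 0:
        raise ValueError("L_g h = 0: the input cannot enforce (15) at this state")
    return u_no - (violation / norm2) * a
```

The correction is zero while the nominal LQR input is safe and grows only as the constraint becomes active, which is the "minimal modification" behavior described above.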

IV. CONTROL FRAMEWORK - DISCRETE-TIME
This section presents the nominal LQR, the concept of a discrete-time CBF, and the control framework that unifies the nominal LQR and the CBF through NLP for the discrete-time system. The system described in (8) is represented in discrete time as

$$x_{k+1} = f_d(x_k, u_k), \tag{20}$$

with states $x(k) = x_k \in D_d \subset \mathbb{R}^n$ and inputs $u(k) = u_k \in U_d \subset \mathbb{R}^m$.
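The linearized system is represented as

$$x_{k+1} = G x_k + H u_k, \tag{21}$$

where, for a zero-order hold with sampling time $T_s$,

$$G = e^{AT_s}, \qquad H = \left(\int_0^{T_s} e^{A\sigma}\,d\sigma\right)B.$$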
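The zero-order-hold matrices can be computed with a single matrix exponential of an augmented matrix, since exp([[A, B], [0, 0]] T_s) = [[G, H], [0, I]]. A minimal sketch follows; `scipy.signal.cont2discrete` with `method='zoh'` offers the same conversion. The double-integrator call is only a sanity check, not the pendulum model.

```python
import numpy as np
from scipy.linalg import expm

def c2d_zoh(A, B, Ts):
    """Zero-order-hold discretization of xdot = A x + B u.
    expm([[A, B], [0, 0]] * Ts) = [[G, H], [0, I]], so G and H are its top blocks."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B
    E = expm(M * Ts)
    return E[:n, :n], E[:n, n:]

# Sanity check on a double integrator (not the pendulum model):
# expected G = [[1, Ts], [0, 1]] and H = [[Ts**2 / 2], [Ts]].
G_di, H_di = c2d_zoh(np.array([[0.0, 1.0], [0.0, 0.0]]), np.array([[0.0], [1.0]]), 0.02)
```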

A. NOMINAL CONTROL - DISCRETE-TIME LQR
The discrete LQR controller has the form

$$u_k = -K_d x_k, \tag{22}$$

where the gain matrix $K_d$ minimizes the performance index

$$J_d = \sum_{k=0}^{\infty}\left(x_k^T Q_d x_k + u_k^T R_d u_k\right), \tag{23}$$

where $Q_d$ is a positive-semidefinite matrix and $R_d$ is a positive-definite matrix. These matrices weight the relative importance of $x_k$ and $u_k$ in (23), respectively [35].
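If there exists a symmetric matrix $P_d$ satisfying the discrete algebraic Riccati equation

$$P_d = G^T P_d G - G^T P_d H\left(R_d + H^T P_d H\right)^{-1} H^T P_d G + Q_d, \tag{24}$$

then the optimal gain matrix $K_d$ is obtained as

$$K_d = \left(R_d + H^T P_d H\right)^{-1} H^T P_d G. \tag{25}$$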
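Equations (24) and (25) can be evaluated with SciPy's `solve_discrete_are`, which solves the same discrete Riccati equation. The example model is a hypothetical Euler-discretized inverted pendulum (sampling time 0.02 s, state [α, α̇]) with hypothetical weights; only the procedure carries over to the reaction wheel pendulum.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def dlqr(G, H, Qd, Rd):
    """Discrete-time LQR: P_d from the Riccati equation (24), K_d from (25)."""
    Pd = solve_discrete_are(G, H, Qd, Rd)
    Kd = np.linalg.solve(Rd + H.T @ Pd @ H, H.T @ Pd @ G)
    return Kd, Pd

# Hypothetical example: Euler-discretized inverted pendulum, state [alpha, alpha_dot].
Ts = 0.02
G = np.array([[1.0, Ts], [60.0 * Ts, 1.0]])
H = np.array([[0.0], [-1.5 * Ts]])
Qd = np.diag([10.0, 1.0])
Rd = np.array([[1.0]])

Kd, Pd = dlqr(G, H, Qd, Rd)
spectral_radius = max(abs(np.linalg.eigvals(G - H @ Kd)))   # < 1 means stable closed loop
```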

B. CONTROL BARRIER FUNCTION - DISCRETE-TIME
As in the continuous-time case, we define a set $C_d$ [33]:

$$C_d = \{x_k \in D_d : B_{dk} \ge 0\}, \tag{26}$$

where $B_{dk} = B_d(x_k) : D_d \to \mathbb{R}$ is called a discrete-time exponential barrier function.

Proposition 1: The set $C_d$ is invariant along the trajectories of the discrete-time system (20) if there exists a map $B_{dk} : C_d \to \mathbb{R}$ such that [33]

$$B_d(x_0) \ge 0, \qquad \Delta B_{dk} + \gamma_d B_{dk} \ge 0, \quad \forall k \in \mathbb{Z}, \quad 0 < \gamma_d \le 1,$$

where $\Delta B_{dk} = B_d(x_{k+1}) - B_d(x_k)$.

Definition 3 (Discrete-time exponential control barrier function): A map $B_{dk} : D_d \to \mathbb{R}$ is a discrete-time exponential control barrier function if [33]: 1) $B_d(x_0) \ge 0$, and 2) there exists a control input $u_k \in \mathbb{R}^m$ such that $\Delta B_{dk} + \gamma_d B_{dk} \ge 0$, $\forall k \in \mathbb{Z}$, $0 < \gamma_d \le 1$.
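To see why these conditions keep the state in $C_d$, write the decrease condition with $\Delta B_{dk} = B_d(x_{k+1}) - B_d(x_k)$ and iterate it. The barrier value can shrink at most geometrically and never changes sign:

$$
\Delta B_{dk} + \gamma_d B_{dk} \ge 0
\;\Longleftrightarrow\;
B_d(x_{k+1}) \ge (1-\gamma_d)\,B_d(x_k)
\;\Longrightarrow\;
B_d(x_k) \ge (1-\gamma_d)^k\,B_d(x_0) \ge 0,
$$

by induction, using $0 \le 1-\gamma_d < 1$ and $B_d(x_0) \ge 0$. With $\gamma_d = 1$ the condition only requires $B_d(x_{k+1}) \ge 0$, so the state may reach the boundary in a single step; smaller $\gamma_d$ forces a slower approach to the boundary and is therefore more conservative.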

C. UNIFYING NOMINAL CONTROL LAW AND CBF THROUGH NLP - DISCRETE-TIME
Proceeding as in the continuous-time case, suppose we want to guarantee safety for the control system (20), given the discrete nominal controller $u_{nod}$ in (22). Based on Takano et al. [32], the constrained state control problem becomes the following NLP:

$$u_k^* = \arg\min_{u_k \in U_d}\; \|u_k - u_{nod}\|^2 \quad \text{s.t.} \quad B_d(x_{k+1}) - B_d(x_k) + \gamma_d B_d(x_k) \ge 0, \tag{27}$$

with $x_{k+1}$ given by the system model. In this work, the discrete CBF $B_{dk}$ and the discrete-time model used in the constraint (the linearized system (21)) are both linear; hence, the NLP (27) can be written as a QP, as in the continuous-time case.
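When the prediction model is linear, x_{k+1} = Gx_k + Hu_k, and the barrier is affine, B_d(x) = cᵀx + d, the constraint in (27) is affine in u_k, so for a scalar input the NLP reduces to projecting u_nod onto a half-line. In the sketch below, c, d, the double-integrator G and H, and the one-sided barrier α_max − α are hypothetical choices for illustration. Enforcing |α| ≤ α_max would need a second constraint (α_max + α), i.e., a two-constraint QP.

```python
import numpy as np

def discrete_cbf_qp(u_nod, x, G, H, c, d, gamma_d):
    """Scalar-input solution of (27) for an affine barrier B_d(x) = c @ x + d and the
    linear prediction x_next = G x + H u. The constraint
    B_d(x_next) - B_d(x) + gamma_d * B_d(x) >= 0 reduces to a_u * u >= rhs."""
    B_k = c @ x + d
    a_u = float(c @ H[:, 0])                 # sensitivity of B_d(x_next) to u
    rhs = (1.0 - gamma_d) * B_k - (c @ G @ x + d)
    if a_u * u_nod >= rhs:
        return u_nod                         # nominal input already satisfies the constraint
    if a_u == 0.0:
        raise ValueError("B_d(x_next) does not depend on u; the constraint cannot be enforced")
    return rhs / a_u                         # closest input on the constraint boundary

# Hypothetical example: double integrator on alpha with Ts = 0.02 s and the
# one-sided barrier B_d(x) = alpha_max - alpha, alpha_max = 0.087 rad.
G = np.array([[1.0, 0.02], [0.0, 1.0]])
H = np.array([[0.0002], [0.02]])
c, d = np.array([-1.0, 0.0]), 0.087
x_k = np.array([0.08, 0.5])                  # close to the limit and moving toward it
u_safe = discrete_cbf_qp(5.0, x_k, G, H, c, d, gamma_d=0.25)
```

Here the nominal input would push α past the limit, so the filter replaces it with the input that makes the barrier shrink by exactly the factor (1 − γ_d) allowed in one step.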

V. NUMERICAL/EXPERIMENTAL RESULTS
The behavior of the reaction wheel pendulum with the proposed control framework was verified through numerical simulations in MATLAB/Simulink and experimentally using a prototype developed at Escola Politécnica da Universidade de São Paulo (EPUSP), shown in Fig. 4. The prototype is controlled by a Teensy 3.2 development board, based on a 32-bit ARM processor. The pendulum angle α and the wheel angle θ were measured with optical encoders, while the pendulum velocity $\dot{\alpha}$ and the wheel velocity $\dot{\theta}$ were obtained by backward Euler approximations. The reaction wheel is actuated by a permanent-magnet DC motor driven by a VNH5019 motor driver.
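The backward Euler velocity estimate mentioned above is ω_k = (q_k − q_{k−1})/T_s. A minimal sketch follows; the sampling period and test signal are illustrative. Note that one encoder count Δ produces a velocity step of Δ/T_s, which is one reason such estimators amplify quantization noise.

```python
import numpy as np

def backward_euler_velocity(angles, Ts):
    """Backward Euler estimate w_k = (q_k - q_{k-1}) / Ts for sampled angles q_k.
    The first sample has no predecessor, so its estimate is set to zero."""
    angles = np.asarray(angles, dtype=float)
    velocity = np.zeros_like(angles)
    velocity[1:] = np.diff(angles) / Ts
    return velocity

Ts = 0.02                                      # illustrative sampling period [s]
t = np.arange(6) * Ts
omega = backward_euler_velocity(0.5 * t, Ts)   # ramp with slope 0.5 rad/s
```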

A. CONTINUOUS-TIME RESULTS
For the continuous-time system, the numerical values of matrices A and B in (9) …

We proposed an experiment in which the pendulum angle α should track a reference input $\alpha_{ref}$ composed of short pulses. This was done to verify the effect of the barrier function: with the final control framework, the pendulum is expected not to leave the safe set. Initially, only the LQR was applied. When a reference input is considered, the control input (10) becomes

$$u = -Kx + k_1\alpha_{ref},$$

where $k_1 = -3.906$.
Subsequently, the control framework that unifies the LQR and the continuous-time CBF through the QP in (19) was applied to guarantee that |α| never exceeds a predetermined value $\alpha_{max}$. To do so, the CBF must be chosen so that the safe set C in (14) corresponds to the admissible range of α. This can be achieved with the following CBF:

$$h(x) = c_1\left(\alpha_{max}^2 - \alpha^2\right) - c_2\dot{\alpha}^2, \tag{32}$$

where $c_1$ and $c_2$ are constants determined empirically. A similar CBF, applied to a Segway, can be found in Taylor et al. [36]. The term $c_2\dot{\alpha}^2$ scales the importance of the velocity $\dot{\alpha}$. If a small value is chosen for $c_2$, so that the velocity has little influence, it can be observed that $h(x) \ge 0$ holds only when $|\alpha| < \alpha_{max}$, so the safe set C in (14) is satisfied.
Note that the term $c_2\dot{\alpha}^2$ must be included. The constraint in the QP (19) shows that the input u affects the constraint only if $L_g h(x) \neq 0$; hence, h(x) must be designed such that $\dot{h}(x)$ depends directly on u. If the term $c_2\dot{\alpha}^2$ is neglected, the CBF has relative degree greater than one, i.e., $L_g h(x) = 0$, and the problem cannot be solved. Nguyen and Sreenath [27] and Wu and Sreenath [30] describe this kind of solution to deal with CBFs of high relative degree. Other solutions are presented by Hsu et al. [37], who propose a backstepping-based method, and by Nguyen and Sreenath [38], who introduce the exponential CBF as a way to enforce safety constraints of high relative degree.
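The relative-degree argument can be checked numerically. The sketch below evaluates h, L_f h, and L_g h, assuming the CBF (32) has the form h(x) = c_1(α_max² − α²) − c_2α̇², on a hypothetical linear model ẋ = Ax + Bu with state [α, θ, α̇, θ̇] and the values c_1 = 0.5, c_2 = 0.001, α_max = 0.087 rad reported in this section. Because B acts only on the acceleration rows, L_g h = −2c_2α̇ times the α̈ entry of B, which vanishes identically when c_2 = 0.

```python
import numpy as np

# Hypothetical linear model, state x = [alpha, theta, alpha_dot, theta_dot]:
# alpha_ddot = 60*alpha - 1.5*u, theta_ddot = 20*u (illustrative numbers only).
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [60.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
B = np.array([[0.0], [0.0], [-1.5], [20.0]])

def cbf_terms(x, A, B, c1, c2, alpha_max):
    """Return h, L_f h, L_g h for h(x) = c1*(alpha_max^2 - alpha^2) - c2*alpha_dot^2
    along xdot = A x + B u (drift f(x) = A x, input vector field g = B)."""
    alpha, alpha_dot = x[0], x[2]
    h = c1 * (alpha_max ** 2 - alpha ** 2) - c2 * alpha_dot ** 2
    grad_h = np.array([-2.0 * c1 * alpha, 0.0, -2.0 * c2 * alpha_dot, 0.0])
    Lfh = grad_h @ (A @ x)
    Lgh = grad_h @ B[:, 0]
    return h, Lfh, Lgh

x = np.array([0.05, 0.0, 0.3, 0.0])
h, Lfh, Lgh = cbf_terms(x, A, B, c1=0.5, c2=0.001, alpha_max=0.087)
_, _, Lgh_without_velocity_term = cbf_terms(x, A, B, c1=0.5, c2=0.0, alpha_max=0.087)
```

With c_2 > 0, L_g h still vanishes at α̇ = 0; there, however, ḣ = 0 as well, so the constraint reduces to γh ≥ 0, which holds everywhere inside C.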
The QP (19) was implemented using Hildreth's QP procedure [39], which is solved in polynomial time, and the algorithm was embedded in the Teensy 3.2 platform. The extended class-$\mathcal{K}$ function was chosen as $\alpha_{cbf}(h(x)) = \gamma h(x)$, where γ is a positive constant, as suggested in [14]. Initially, the numerical values for the CBF (32) and the QP (19) were γ = 55, $c_1$ = 0.5, $c_2$ = 0.001, and $\alpha_{max}$ = 0.087 rad (5°). It was observed that with higher values of γ the safety constraint determined by the CBF is not respected, whereas with lower values the CBF is more conservative, acting far from the barrier limit $\alpha_{max}$. The constant $c_1$ has little influence and was set based on Taylor et al. [36]. It was also observed that higher values of $c_2$ make the CBF more conservative, while lower values have little influence.
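Hildreth's procedure solves the dual of a QP of the form min ½xᵀEx + Fᵀx s.t. Mx ≤ b by cyclic coordinate ascent on the nonnegative multipliers, so the loop itself needs no matrix factorization, which suits a small microcontroller. The sketch below is a generic NumPy version written for illustration (the iteration limit and tolerance are arbitrary choices), not the code embedded in the Teensy 3.2. For (19), E = 2I, F = −2u_no, M = A_cbf, and b = b_cbf.

```python
import numpy as np

def hildreth_qp(E, F, M, b, max_iter=200, tol=1e-10):
    """Minimize 0.5 x^T E x + F^T x subject to M x <= b (E symmetric positive definite)
    with Hildreth's procedure: Gauss-Seidel coordinate ascent on the dual multipliers."""
    E_inv = np.linalg.inv(E)
    x_unc = -E_inv @ F                  # unconstrained minimizer
    if np.all(M @ x_unc <= b):
        return x_unc                    # no constraint is active
    P = M @ E_inv @ M.T                 # dual Hessian
    d = b + M @ E_inv @ F               # dual linear term
    lam = np.zeros(len(b))
    for _ in range(max_iter):
        lam_prev = lam.copy()
        for i in range(len(lam)):
            # exact minimization of the (negated) dual over lam[i], clipped at zero
            w = -(d[i] + P[i] @ lam - P[i, i] * lam[i]) / P[i, i]
            lam[i] = max(0.0, w)
        if np.linalg.norm(lam - lam_prev) < tol:
            break
    return x_unc - E_inv @ M.T @ lam    # x = -E^{-1} (F + M^T lam)

# Usage for (19) with a scalar input: u_no = 2 and the constraint u <= 1.
u_no = 2.0
u_star = hildreth_qp(2 * np.eye(1), -2 * np.array([u_no]), np.array([[1.0]]), np.array([1.0]))
```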
Numerical simulations were performed in MATLAB/Simulink. To make the simulation more realistic, an actuator dead zone was included and measurement noise was added to the encoder outputs. The dead zone was experimentally identified as 0.13 (in PWM duty cycle). The measurement noise was modeled as a random variable uniformly distributed in (−Δ/2, Δ/2), where Δ is the resolution of each encoder. With this approach, the measurement noise represents the quantization noise of the encoders.
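A simulation of the two non-idealities described above might look as follows. The shifted dead-zone model (output u − 0.13·sign(u) outside the dead band) and the encoder resolution of 2π/2048 rad are assumptions for illustration; the text only specifies the 0.13 duty-cycle width and uniform noise in (−Δ/2, Δ/2).

```python
import numpy as np

def dead_zone(duty, width=0.13):
    """Symmetric dead zone on the PWM duty cycle (assumed shifted model):
    |duty| <= width -> 0, otherwise duty - width * sign(duty)."""
    duty = np.asarray(duty, dtype=float)
    return np.where(np.abs(duty) <= width, 0.0, duty - np.sign(duty) * width)

def quantization_noise(size, resolution, rng):
    """Uniform noise on [-resolution/2, resolution/2), modeling encoder quantization."""
    return rng.uniform(-resolution / 2, resolution / 2, size=size)

rng = np.random.default_rng(0)
DELTA = 2 * np.pi / 2048       # hypothetical encoder resolution [rad/count]
measured_alpha = 0.05 + quantization_noise(1000, DELTA, rng)
```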
Simulation results are presented in Fig. 5 for the LQR and in Fig. 6 for the LQR with CBF. The pendulum starts at an initial angular position $\alpha_{ini}$ = 0.069 rad (4°). A reference $\alpha_{ref}$ with short pulses (0.2 s) of amplitude ±0.140 rad (8°) is applied. The results show that the LQR is able to stabilize the system. When the LQR is combined with the CBF in the final control framework, the safety constraint is respected, i.e., |α| never exceeds $\alpha_{max}$, and h(x) remains nonnegative, as required by (14).
The experimental results with the prototype are presented in Figs. 7 and 8. In Fig. 7, only the LQR was used; the controller performs well, keeping the system balanced and attempting to track the short pulses. In Fig. 8, the final control framework was applied. With γ = 55, as in the simulations, the CBF acts very close to the barrier limit $\alpha_{max}$, and in preliminary experiments |α| occasionally exceeded $\alpha_{max}$ slightly, mainly due to sensor imprecision, unmodeled dynamics, and the angular speed estimators. Hence, a more conservative barrier with γ = 1 was used in the experiments, so that the safety constraint was respected. In Fig. 8, |α| initially exceeds $\alpha_{max}$ because the CBF was programmed to act only after the transient due to the initial condition. After that, the pendulum angle never exceeds the barrier limits, i.e., the states do not leave the safe set.

B. DISCRETE-TIME RESULTS
The system model was discretized with a sampling time $T_s$ = 0.02 s. The numerical values of matrices G and … In this case, the weighting matrices of the LQR were set as in (34), such that … The same setup as in the continuous-time experiments is used here. Initially, only the LQR was applied. When a reference input is considered, the control input (22) becomes

$$u_k = -K_d x_k + k_{d1}\alpha_{ref},$$

where $k_{d1} = -3.3908$. Thereafter, the control framework that unifies the LQR and the discrete-time CBF by means of the NLP (27) was applied to guarantee that |α| never exceeds a predetermined value $\alpha_{max}$. The CBF must be chosen so that the safe set $C_d$ in (26) corresponds to the admissible range of α. This can be achieved with the following discrete CBF (37): …

For a discrete-time system, it is not necessary to add a term related to the velocity $\dot{\alpha}$, because the Lie derivative $L_g h(x)$ does not appear in the NLP (27): the constraint is written in terms of $B_d(x_{k+1})$, which depends on $u_k$ directly through the one-step prediction. It can be observed that $B_{dk} \ge 0$ holds only when $|\alpha| < \alpha_{max}$, so the safe set $C_d$ in (26) is satisfied.
Since the discrete CBF (37) and the discrete-time model are both linear, the NLP (27) can be written as a QP analogous to (19):

$$u_k^* = \arg\min_{u_k \in U_d}\; u_k^T u_k - 2u_{nod}^T u_k \quad \text{s.t.} \quad B_d(Gx_k + Hu_k) - (1-\gamma_d)B_d(x_k) \ge 0, \tag{39}$$

whose constraint is linear in $u_k$. The QP (39) was again solved using Hildreth's QP procedure, also embedded in the Teensy 3.2 platform. Initially, the numerical values for the CBF (37) and the QP (39) were $\gamma_d$ = 0.25 and $\alpha_{max}$ = 0.087 rad (5°), and the reference input $\alpha_{ref}$ was the same as in the continuous-time case. It was observed that with higher values of $\gamma_d$ the safety constraint determined by the CBF is not respected, whereas with lower values the CBF becomes more conservative.
Numerical simulation results from MATLAB/Simulink are presented in Figs. 9 and 10. In Fig. 9, only the LQR was used, while in Fig. 10 the final control framework with CBF was applied. Again, the LQR performs well, and when it is combined with the CBF, the safety constraint is satisfied and $B_{dk}$ respects the conditions in (26).
Experimental results are presented in Fig. 11 for the LQR and in Fig. 12 for the LQR with CBF. Again, the system is well stabilized and the CBF guarantees the invariance of the safe set. For the same reasons as in the continuous-time case, a more conservative CBF was used in the discrete-time experiments, with $\gamma_d$ = 0.1. The discrete-time experimental results are somewhat better than the continuous-time ones. One possible reason is that, in the discrete-time case, the effect of the zero-order hold in the practical implementation is included in the model (21).

VI. CONCLUSION
This paper presented the control of a reaction wheel pendulum considering stability objectives, expressed as a nominal control law, and safety constraints, expressed as a CBF, unified by means of QP in both continuous time and discrete time. The LQR was used as the nominal control law, and the safety constraint was designed to guarantee that the angular position of the pendulum never exceeds a predetermined value. Results from numerical simulations and experimental tests indicate that the control framework satisfies both the stability objectives and the safety constraints. Due to practical issues such as measurement imprecision, unmodeled dynamics, and angular speed estimation, more conservative CBFs had to be applied in both the continuous-time and discrete-time experiments to guarantee the safety constraints. As future work, robust CBFs can be considered, in which model uncertainties and disturbances are taken into account, as well as the combination of CBFs with other control methods, such as sliding mode and Takagi-Sugeno fuzzy control.