Control Refinement Using Particle Methods

The paper presents a control scheme for the real-time tracking problem of nonlinear systems subjected to hard nonlinearities. The proposed tracking controller adds a refining component to the control input designed for the nominal plant model. The refining component compensates for the degradation in tracking performance caused by modelling uncertainties and external disturbances. It is modelled as a random signal whose probability density function is expressed as a finite combination of weights, as is typical of particle methods. The weights are updated based on sequential tracking error data. The proposed algorithm is simulated for an inverted pendulum affected by Coulomb friction. Comparison with existing techniques shows markedly better tracking performance, with up to an order of magnitude reduction in tracking error.


I. INTRODUCTION
Tracking of a reference trajectory by a dynamic plant is one of the fundamental challenges in control design. The complexity of the problem lies in modelling inaccuracies and external disturbances. The so-called regulation theory provides the classical solution for tracking: the reference signal and the disturbances are assumed to be generated by an exosystem with known dynamics, and the control law is then designed from the plant and exosystem dynamics using feedback. A vast literature on the regulator for linear dynamical systems can be found in [1]-[3].
Nonlinear tracking has been a central problem in classical control systems research. Extension of regulator theory for a particular class of nonlinear systems can be found in [4]. Another significant contribution in tracking control of Lipschitz nonlinear systems is based on extended order high gain observer [5]. In this case, the control law includes estimating an additional extended state of the system, representing uncertainties of modelling and disturbances. Similarly, higher-order sliding mode control with disturbance observer has been proposed in [6]. The tracking control approaches for nonlinear systems discussed thus far require a specific structure of the nonlinear plant model. In addition, practical nonlinear phenomena like friction, dead-zone and backlash remain a challenge.
The associate editor coordinating the review of this manuscript and approving it for publication was Shaoyong Zheng.
A parallel, optimization-based approach for handling systems affected by hard nonlinearities and input constraints is approximate/neurodynamic programming [7]. In this case, the control input is selected from the set of admissible values as the one that minimizes the cost function over a prediction horizon. The cost function is designed according to the desired performance. If the prediction is based on a system model, the methodology becomes the well-known Model Predictive Control (MPC) [8]. Approximate/neurodynamic programming is computationally demanding in both model-based and model-free scenarios, which limits its implementation for fast, real-time dynamic systems.
The primary motivation behind particle-based optimization methods is their ability to handle nonlinear and especially non-Gaussian problems [9]-[11]. Particle methods have found application in control systems in the past. The technique has been directly adapted for state estimation [12], [13] and for control effort [14] by converting the deterministic system into its stochastic counterpart through the addition of random measurement and observation noise. This is a necessary step in adopting the methodology; thus, the problem itself no longer remains deterministic [15], [16]. Particle-based stochastic feedback control is proposed in [17], but the use of particles is restricted to state estimation only. The control effort for nonlinear feedback systems has also been designed recently using the particle philosophy [18]-[21], but the scope of these schemes is restricted to ideal plants. A run-time trajectory update method using particles is proposed in [22]. The method suppresses the system noise significantly through a modified weight update mechanism but is not recommended on a large scale due to its extensive memory requirements. The graph-based method [23] suggests a reduced memory requirement with fewer particles in a specific environment, but the approach has limited application and lacks generality for a large class of nonlinear problems. The control scheme proposed in [24] implements the conventional weighted particles approach for a deterministic problem, but the underlying probability density function (pdf) is required for its implementation, and the algorithm's performance degrades significantly when accurate information is unavailable. The particle-based approach to find the optimal control input in the case of a null controller has been described in [25]. Improved performance of particles is suggested in [26] by combining them with a conventional gradient descent optimizer, but the scheme is difficult to deploy due to the complex learning process.
Particle-based methods show superior performance over other sub-optimal schemes provided the number of particles is sufficiently high [27]. This requirement is why particles are employed in few optimal control schemes, despite their high-performance capability in a nonlinear environment. Particle-based methods have been primarily employed in the tuning of classical controllers. The optimal tuning of the linear quadratic regulator through particle swarm optimization (PSO) is proposed in [28]. The approach is utilized for position control only, and the performance under disturbances is not considered. The optimal tuning of a PID controller using a hybridized PSO is proposed in [29], and a control scheme via a quadratic Lyapunov function with unknown parameters has been implemented using PSO in [30]. The drawbacks of the PSO method are premature convergence and the dependence on user experience to set the parameter values. PSO methods are therefore usually recommended for offline optimization, to obtain initial parameters for online tuning. A Fuzzy-PID control approach employing this concept is presented in [31].
The next challenge is to compensate for the effect of disturbances. One option is to model the physical constraints and incorporate them in the controller design using MPC to handle model uncertainties [8]. This approach again suffers from computational complexity. Particle MPC, introduced in [32], proposes model-based reinforcement learning to handle the uncertainty. The method is computationally efficient and based on sequential learning, but the impact of hard nonlinearities remains unaddressed. Active disturbance rejection control (ADRC) techniques have recently been used to improve the tracking performance of systems under disturbances. The extended state observer (ESO) is an integral part of ADRC, for which conventional and adaptive techniques have been suggested to improve the overall performance of the controller and the disturbance estimation [33]. A recursive sliding mode controller with an adaptive disturbance observer is proposed in [34], where high-frequency chattering is a problem in the resultant output. An adaptive ESO in [35] and a finite-time disturbance observer in [36] are proposed along with high-order sliding mode control to compensate for time-varying disturbances, but the plant dynamics are transformed under certain assumptions before the methodology is applied. A standard ESO with PI-based ADRC is proposed in [37], with application limited to linear plant models. All the mentioned approaches employ an ESO to estimate the disturbances using the estimation error.
To summarize, the classical methods suggest observer-based disturbance compensation, where the estimation error is minimized to estimate the disturbance. However, our control objective is to minimize the tracking error, and an improvement in tracking performance cannot be achieved by the observer on its own. Based on the separation principle, the controller and observer are designed separately in conventional methods, whereas in our proposed scheme the controller and observer behaviours are handled jointly and are no longer distinct entities. This idea of disturbance compensation based on tracking error distinguishes the scheme from existing techniques and has not been a focus of the literature. Furthermore, conventional controllers focus on asymptotic behaviour, resulting in undesirable transient responses such as jerks, spikes and peaking; the transient behaviour is mostly ignored in the analysis as well. Our approach has the advantage of addressing both transient and asymptotic behaviours.

A. MOTIVATION AND PROPOSED METHODOLOGY
Sufficient information about the plant enables the design of a nominal control law using classical methods. The desired trajectory, corresponding to ideal asymptotic/exponential convergence, is obtained when the nominal plant model is subjected to this nominal controller. However, the desired performance is not achieved when such a control law is applied to an actual plant, due to external disturbances, model uncertainty and parameter mismatch. The expected result is a deviation from the desired trajectory. The presence of hard nonlinearities like Coulomb friction or a constrained input further worsens the situation.
The goal of this paper is performance recovery through a computationally efficient, optimization-based solution. Based on the deviation from the desired response, the adjustment in the control input that would have resulted in perfect tracking is estimated. The philosophy of particles is exploited to achieve this task. The subsequent control input includes the estimated adjustment, resulting in improved tracking performance. This process is termed control refinement, and the estimation of the control adjustment can be termed retrospective learning.
The intuitive notion of achieving ideal tracking is to recover the performance of the nominal system (perfectly modelled/unperturbed) under the nominal tracking control input. We term such a response the "nominal closed-loop system response". In this case, the control refinement is based on the difference between the measured system output and the nominal closed-loop system response. However, it is observed that ideal tracking cannot be achieved by following the nominal controller only. The major reason is that the process inherits the weakness of the controller. Furthermore, the controller is perturbed at each sampling instant because the refining component introduces a new transient, so the controller never settles to a steady state. To overcome these flaws, instead of replicating the behaviour of the nominal controller, we propose control refinement based on the difference between the measured system output and an ideal (viable) trajectory. The ideal trajectory is the desired convergent curve that describes the ideal system behaviour, specifying the ideal transient and steady-state response. It is the ideal trajectory that provides the basis for the nominal controller design. This philosophy has led to a marked improvement in tracking performance.
The main contributions are summarized as:
1) A combination of classical and particle methods, together with a joint mechanism of controller and observer, has led to a performance that may exhibit an order of magnitude reduction in tracking error.
2) The proposed method addresses both transient and steady-state responses, thereby avoiding issues related to both.
The organization of the remaining paper is as follows. The problem is described in Section II. The details of particle-based optimization with all its associated processes are discussed in Section III. The optimization algorithm is given in Section IV, and the stability of the proposed scheme is discussed in Section V. To validate the proposed technique, the scheme is implemented on a nonlinear problem, and the simulation results of the illustrative example are shown in Section VI. Finally, the conclusions are drawn in Section VII.

II. PROBLEM STATEMENT
Consider a nonlinear system in which $x_k \in \mathbb{R}^n$ is the state vector, $f(\cdot)$ is a deterministic transition function, $u_k$ is a known and deterministic input vector that drives the system dynamics, $b_{n \times 1}$ is a vector connecting the input to the system, and $d_k$ is the process noise that models the unknown external disturbance. $y_k$ is the observation vector and $h(\cdot)$ is a deterministic observation function that provides a known analytical link between the state vector and the observation. $v_k$ is the observation noise, which is assumed to be zero mean and additive as well. A SISO system is considered for simplicity; the methodology can be extended to MIMO systems without much difficulty.
It is assumed that exact knowledge of the plant $\{f, h\}$ is not available. Instead, we have a nominal model (2) in which $f(x_{k-1})$ and $f_o(x_{k-1})$ are reasonably close. Notice the absence of process and observation noise in the nominal model. It is worth mentioning that this framework is general enough to handle a variety of control problems, including stabilization and tracking. For the latter case, a reference $r_k$ and possibly its derivatives $\{\dot{r}_k, \ddot{r}_k, \ldots\}$ would be required. Our control objective is to track a time-varying reference $r_k$, with the tracking error defined as the difference between the reference $r_k$ and the output $y_k$.
For a wide variety of practical problems, inaccuracies like model uncertainty (including the presence of hard nonlinearities like Coulomb friction) and discretization errors (in the case of sampled-data systems) can be absorbed as a component of the external disturbance. From this point onwards, we assume such cases only. On the other hand, sufficiently accurate information about $h(\cdot)$ is a requirement; otherwise, it will not be easy to measure the deviation of the closed-loop system from the desired response. In this framework, the assumption of controllability (and observability in the case of output feedback) is implicit.
It is assumed that a nominal controller $u_{k-1} = \psi(x_{k-1})$ is available, or $u_{k-1} = \psi(\hat{x}_{k-1})$ in the case of output feedback, where $\hat{x}_{k-1}$ is the state estimate obtained from some observer. The nominal controller is designed for the ideal control behaviour (i.e. the desired behaviour is exponentially or asymptotically convergent) that enables $y_k$ to follow an ideal trajectory $y_{i,k}$ for $k = \{1, 2, 3, \ldots\}$.
The term ''ideal trajectory'' is indicative of the ideal control behaviour and the term ''nominal controller'' is the controller designed for the nominal model and applied to the actual model. Moreover, both the ideal trajectory and nominal controller include the transient response and the asymptotic behaviour.
The controller $\psi(\cdot)$ designed for the nominal system has to be applied to the actual plant. Deteriorated performance is expected due to the factors mentioned above, and worst-case scenarios may even lead to instability or oscillations. Particle methods are suggested here to handle situations that are difficult for the classical approach.
The dynamic nature of the problem requires corrective measures for each sample. The nominal controller generates a control input at each sample that is refined to minimize the deviation of the output from the desired response. Hence the method is called ''control refinement''.

A. NOMINAL CONTROLLER
This section aims to recover the desired controller performance when the controller is applied to the actual system. The system's block diagram with the proposed refinement of the control input is shown in Fig. 1, where $\hat{x}_k$ is the state estimated by the observer. It is clear from the block diagram that the effect of disturbances is compensated using the tracking error, whereas only the states are estimated by the observer. The refinement process is performed at each sampling instant, where $u_k$ is given by the nominal control law. The nominal plant model given in (2) is available for control refinement to evaluate the deviation from the nominal closed-loop system response, defined as $y_{o,k}$, where $\tilde{u}_k$ is the refined input calculated at instant $k$ to be applied to the plant for the subsequent measurement.
With the application of $u_{k-1}$, it was expected that the system output would reach $y_{o,k}$ at the next sample. However, the presence of unwanted effects deviates the system, whose output comes out to be $y_k$. As discussed above, this deviation can be collectively attributed to the disturbance $d_{k-1}$. The deviation $\varepsilon_{o,k}$ is the difference between the measured output $y_k$ and the nominal closed-loop response $y_{o,k}$.
We first need to estimate the disturbance $d_{k-1}$ that caused $\varepsilon_{o,k}$. Once the estimate $\hat{d}_{k-1}$ is obtained, a non-causal solution is to cancel the actual disturbance with $\hat{d}_{k-1}$, i.e. to apply a refined control $\tilde{u}_{k-1}$ instead of $u_{k-1}$. In the special case when we are able to find $\hat{d}_{k-1}$ that forces $\varepsilon_{o,k} = 0$, perfect results are achieved as $y_k = y_{o,k}$. Despite the performance of this controller, its non-causal form makes it of little use except in evaluation and assessment. A transition from $\hat{d}_{k-1|k}$ to $\hat{d}_{k|k}$ is required to make the controller causal, where the notation is self-explanatory. One may be tempted to introduce dynamics in the disturbance model. However, this approach complicates the solution considerably and is left for future consideration. With these limitations, the only choice left for the transition function is to carry the estimate forward unchanged, $\hat{d}_{k|k} = \hat{d}_{k-1|k}$.
Due to the nature of the control, it is termed "retrospective learning". The term indicates that a lesson from the past is applied in the present. Thus the refined causal control is $\tilde{u}_k = u_k + \hat{d}_{k|k}$, in line with (27). It is expected that this control is able to decrease (if not minimize) the next deviation, i.e. $\varepsilon_{o,k+1}$.
The refinement continues iteratively with the control. Above, we have freely used the notation $k|k$, $k+1|k$, etc. With the scheme of following the controller behaviour, we will be able to track the performance of the nominal controller. It is worth mentioning, however, that replicating the nominal controller's performance at each sampling instant may not yield remarkable tracking performance. The main reason is the inherited weakness of the controller itself when working in a closed-loop system, even when accurate information about the plant is available. Moreover, the controller is perturbed through the refining component $\hat{d}_k$ at each sampling instant, so that it never comes out of the transient. Thus the defective nominal response $y_{o,k}$ may not lead us to the desired control objective. The alternative approach is to track the behaviour for which the controller is designed, in order to improve the tracking performance.
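The idea of retrospective learning can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: a hypothetical scalar plant whose output equals its state, a constant unknown disturbance `bias`, and a closed-form correction used in place of the particle-based estimate.

```python
def retrospective_run(steps=40, a_nom=0.9, bias=0.3, refine=True):
    """Toy retrospective learning on a hypothetical scalar plant (illustrative only)."""
    x, d_hat, r = 0.0, 0.0, 1.0
    deviations = []
    for _ in range(steps):
        u = r - a_nom * x                # nominal controller: nominal model reaches r in one step
        u_tilde = u + d_hat              # refined causal control
        y_nom = a_nom * x + u            # nominal closed-loop response expected at the next sample
        x = a_nom * x + u_tilde + bias   # actual plant with a disturbance unknown to the controller
        eps = x - y_nom                  # deviation observed once the next sample arrives
        if refine:
            # Correction that would have zeroed the last deviation, carried forward unchanged.
            d_hat = d_hat - eps
        deviations.append(abs(eps))
    return deviations

with_refinement = retrospective_run(refine=True)
without_refinement = retrospective_run(refine=False)
```

In this toy case the disturbance is constant, so the lesson learned from one sample cancels the deviation at the next; with a time-varying disturbance the correction would always lag by one sample.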

B. IDEAL TRAJECTORY
The philosophy of tracking the nominal control behaviour does not appear suitable. This leads us to the idea of tracking the ideal trajectory $y_{i,k}$ directly.
The system's block diagram with this approach of refinement is shown in Fig. 2. The block diagram shows that, in addition to the nominal plant, the ideal trajectory is available for the control refinement to improve the tracking performance. $u_k$ is given by the same controller. The nominal plant model given in (2) is available for control refinement to evaluate the deviation from the ideal trajectory $y_{i,k}$, where $\tilde{u}_k$ is the refined input calculated at instant $k$ to be applied to the plant for the subsequent measurement.
With the application of $u_{k-1}$ to the system, the measurement $y_k$ is obtained. Due to the presence of unwanted effects, this output deviates from the desired behaviour, where the deviation $\varepsilon_{i,k}$ is the difference between $y_k$ and the ideal trajectory $y_{i,k}$. The deviation can be collectively attributed to the disturbance. Contrary to the approach of "retrospective learning" in the previous section, the methodology of tracking the ideal control behaviour is predictive, i.e. the control is prepared in the present to minimize the error in the future. The input $u_k$ is calculated and applied to obtain the measurement at $k+1$.
The output for the next sample can be predicted through the system model. Since the desired behaviour at $k+1$ is available as $y_{i,k+1}$, the refinement of $u_k$ is carried out by estimating the refining component $\hat{d}_k$ that gives the refined input $\tilde{u}_k = u_k + \hat{d}_k$, as in (27), and minimizes the predicted deviation from $y_{i,k+1}$.

VOLUME 10, 2022
It is to be noted that $\hat{d}_k$ is the estimate of all the effects responsible for the deviation from the ideal trajectory, including the inherited weakness of the nominal controller. The refinement continues iteratively with the control. The intuition of following the ideal control behaviour remains intact: the response of the nominal controller is ignored and the ideal trajectory is tracked instead, with the expectation that this control is able to decrease (if not minimize) the actual next deviation, i.e. $\varepsilon_{i,k+1}$.
C. OPTIMIZATION
It is clear from the above sections that the tracking problem can be cast as an optimization problem. It is essential to mention that $\varepsilon$ is not the estimation error of the classical observer; in that framework, the estimation error is the difference between the actual and the estimated output. This distinction is essential in understanding the difference between our suggested approach and classical methods of disturbance cancellation (or attenuation). To improve the tracking performance, we need to minimize the deviation $\varepsilon$. This, in turn, is done by minimizing the risk function $J(\varepsilon)$ with respect to $d_k$. Particle methods are appropriate for such problems; details follow in the next section. The common types of risk function are the mean squared error (mse), $J(\varepsilon) = \varepsilon^2$; the absolute error, $J(\varepsilon) = |\varepsilon|$; and the hit-or-miss cost, which is zero when $|\varepsilon|$ lies within a small band set by $\delta$ and one otherwise, where $\delta$ is a small positive number [38].
In the Bayesian approach, these cost functions correspond to obtaining the estimate from the mean, median and mode, respectively, of the pdf under consideration. The last of the three estimators is best suited to our problem. Hence, we will rely on the pdf's mode when point estimators are required.
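The correspondence between the three risk functions and the mean, median and mode can be checked numerically on a weighted sample set. The sketch below is illustrative only: the skewed test distribution, the helper names and the histogram-based mode are assumptions, not part of the paper.

```python
import numpy as np

def weighted_mean(samples, weights):
    # Minimizer of the expected squared error.
    return float(np.sum(weights * samples))

def weighted_median(samples, weights):
    # Minimizer of the expected absolute error: first sample where the weighted CDF reaches 0.5.
    order = np.argsort(samples)
    cdf = np.cumsum(weights[order])
    return float(samples[order][np.searchsorted(cdf, 0.5)])

def weighted_mode(samples, weights, bins=20):
    # Hit-or-miss estimate: centre of the histogram bin holding the most weight.
    hist, edges = np.histogram(samples, bins=bins, weights=weights)
    j = int(np.argmax(hist))
    return float(0.5 * (edges[j] + edges[j + 1]))

rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=1.0, size=5000)   # skewed pdf, so the three estimates differ
weights = np.full(samples.size, 1.0 / samples.size)    # normalized particle weights

estimates = {
    "mean": weighted_mean(samples, weights),
    "median": weighted_median(samples, weights),
    "mode": weighted_mode(samples, weights),
}
```

For a symmetric unimodal pdf the three estimates coincide; the skewed example is chosen only to make the difference visible.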

III. BAYESIAN APPROACH OF OPTIMIZATION
At time $k$, the measurement $y_k$ is received. After comparing it with $y_{i,k}$ we get the deviation $\varepsilon_{i,k}$. The disturbance $d_k$ is stochastically modelled to have the pdf $p(d_k | y_k, y_{i,k})$. Secondly, $y_{i,k}$ appears as a deterministic parameter, making the mean of the deviation $\varepsilon_{i,k}$ equal to $y_{i,k}$. Note that the zero-mean assumption for the observation noise is important here. Also, $y_{i,k+1}$ is the desired behaviour available at $k$ and not a measurement. Its use in the subsequent equations shows the predictive nature of the control. Making use of Bayes' law,

$$p(d_k | y_{1:k}, y_{i,1:k+1}) = \frac{p(y_k | d_k, y_{i,k+1})\, p(d_k | y_{1:k-1}, y_{i,1:k})}{p(y_k | y_{1:k-1})}$$

As the observation at time $k$ does not depend on the observations at $1:k-1$, $p(y_k | y_{1:k-1}, d_k) \to p(y_k | d_k)$. Notice the absence of $y_{i,0}$ from the above. The reason is that $u_0$ depends on $y_{i,0}$, but neither $y_0$ nor $d_0$. $p(y_k | d_k; y_{i,k+1})$ is the likelihood function. The states $x_k$ do not appear in the expressions as they only serve as intermediate variables. State estimation using particles is outside the scope of this paper, although it remains an option for cases where conventional state estimators are difficult to use. From the Chapman-Kolmogorov equation,

$$p(d_k | y_{1:k-1}, y_{i,1:k}) = \int p(d_k | d_{k-1})\, p(d_{k-1} | y_{1:k-1}, y_{i,1:k})\, \mathrm{d}d_{k-1} \quad (14)$$

where we have used the fact that $p(d_k | d_{k-1}, y_{1:k-1}) \to p(d_k | d_{k-1})$ since, from the transition equation (13), the predictive density $p(d_k | d_{k-1})$ does not depend on $y_{1:k-1}$. The pair (13) and (14) ...

As $p(d_k | y_{1:k}, y_{i,1:k+1})$ is not known, we use another density, known as the importance density $q(d_k | y_{1:k}, y_{i,1:k+1})$, which can be easily sampled and has the same support as $p(d_k | y_{1:k}, y_{i,1:k+1})$. The general weight function can thus be defined as

$$w(d_k) = \frac{p(d_k | y_{1:k}, y_{i,1:k+1})}{q(d_k | y_{1:k}, y_{i,1:k+1})}$$

Consequently, the weights in (15) are defined in terms of $w_k^m = w(d_k^m)$. It is emphasized that $w_k^m$ is the importance of the sample $d_k^m$ and not its probability.

Using Bayes' rule, the un-normalized weight function is

$$w(d_k) \propto \frac{p(y_k | d_k; y_{i,k+1})\, p(d_k | y_{1:k-1}, y_{i,1:k})}{q(d_k | y_{1:k}, y_{i,1:k+1})}$$

Expanding using the Chapman-Kolmogorov theorem,

$$w(d_k) \propto p(y_k | d_k; y_{i,k+1}) \times \frac{\int p(d_k | d_{k-1})\, p(d_{k-1} | y_{1:k-1}, y_{i,1:k})\, \mathrm{d}d_{k-1}}{\int q(d_k | d_{k-1}, y_k)\, q(d_{k-1} | y_{1:k-1}, y_{i,1:k})\, \mathrm{d}d_{k-1}} \quad (19)$$

where the Markov property has been adopted for the importance density as well, giving the factor $q(d_k | d_{k-1}, y_k)$. Once the samples $d_{k-1}^m$ have been drawn from $q(d_{k-1} | y_{1:k-1}, y_{i,1:k})$, (19) reduces to a sample-based expression; from the definition of $w(d_k)$, this leads to the posterior pdf in particle form. After generating the samples $d_k^m$ and referring to (15), combining (17) and (21) yields the recursive weight update equation

$$w_k^m \propto w_{k-1}^m\, \frac{p(y_k | d_k^m; y_{i,k+1})\, p(d_k^m | d_{k-1}^m)}{q(d_k^m | d_{k-1}^m, y_k, y_{i,k+1})}$$

Knowledge of the likelihood function $p(y_k | d_k; y_{i,k+1})$ and of the updated importance function $q(d_k | d_{k-1}, y_k, y_{i,k+1})$ is crucial for updating the weights, and obtaining them is a major problem in practical implementation. The Bootstrap Particle Filter (BPF) chooses the transition density as the importance density, i.e.

$$q(d_k | d_{k-1}, y_k, y_{i,k+1}) = p(d_k | d_{k-1})$$

Thus the weight update equation becomes

$$w_k^m \propto w_{k-1}^m\, p(y_k | d_k^m; y_{i,k+1})$$

and $w_k^m$ can be normalized as

$$\bar{w}_k^m = \frac{w_k^m}{\sum_{j=1}^{M} w_k^j}$$

The second problem is the degeneracy of particles. Degeneracy can be avoided through resampling, and the resulting sample impoverishment through regularization. The point estimate $\hat{d}_k$ is selected as the mode of the posterior pdf. This mode is expected to be sub-optimal in the vicinity of the global minimum due to the sparsity of particles, and the estimate may be further improved through polynomial fitting. $\hat{d}_k$ is combined with the input $u_k$ calculated by the controller at instant $k$ and applied to the system at the input channel as

$$\tilde{u}_k = u_k + \hat{d}_k \quad (27)$$

where $\tilde{u}_k$ is the refined input for the next measurement. The flow chart of the proposed control refinement process is shown in Fig. 3.
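A bootstrap-style weight update of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the random-walk transition density, the Gaussian likelihood on the deviation, the effective-sample-size resampling threshold and all names (`bootstrap_step`, `deviation_fn`, `TRUE_REFINEMENT`) are hypothetical, and the highest-weight particle is used as a crude stand-in for the posterior mode.

```python
import numpy as np

def bootstrap_step(particles, weights, deviation_fn, rng,
                   jitter_std=0.05, noise_std=0.1):
    """One bootstrap update: propagate, reweight by likelihood, normalize, resample."""
    M = particles.size
    # Importance density = transition density (assumed here to be a random walk).
    particles = particles + rng.normal(0.0, jitter_std, size=M)
    # Assumed Gaussian likelihood of the deviation each candidate would leave.
    eps = deviation_fn(particles)
    log_w = np.log(np.maximum(weights, 1e-300)) - 0.5 * (eps / noise_std) ** 2
    log_w -= log_w.max()                 # guard against underflow before exponentiating
    w = np.exp(log_w)
    w /= w.sum()                         # normalized weights
    # Point estimate: highest-weight particle, a crude proxy for the posterior mode.
    d_hat = float(particles[np.argmax(w)])
    # Resample only when the effective sample size signals degeneracy.
    if 1.0 / np.sum(w ** 2) < 0.5 * M:
        particles = particles[rng.choice(M, size=M, p=w)]
        w = np.full(M, 1.0 / M)
    return particles, w, d_hat

rng = np.random.default_rng(1)
particles = rng.uniform(-2.0, 2.0, size=500)
weights = np.full(500, 1.0 / 500)
TRUE_REFINEMENT = 0.7                    # hypothetical value that would zero the deviation
for _ in range(30):
    particles, weights, d_hat = bootstrap_step(
        particles, weights, lambda d: d - TRUE_REFINEMENT, rng)
```

The jitter added in the propagation step plays the role of regularization here, keeping the particle set from collapsing onto a few repeated values after resampling.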

IV. PROPOSED ALGORITHM
Start: $u_{k-1}$ is applied to the actual system, $y_k$ is measured, and $y_{i,k+1}$ is available through the ideal trajectory.
• Calculate $u_k$ through $\psi$
For $m = 1 : M$
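One way the refinement loop could be arranged is sketched below on a toy problem. All specifics are assumptions introduced for illustration: a hypothetical scalar plant with parameter mismatch and a constant bias, a one-step nominal controller, a Gaussian likelihood, and the use of the last-step model residual to predict the next output. The paper's own loop body may differ.

```python
import numpy as np

A_NOM, A_TRUE, BIAS = 0.9, 0.8, 0.3     # hypothetical nominal/actual plant parameters
R, RATE = 1.0, 0.5                      # reference and ideal convergence rate

def run(refine, steps=60, M=300, sigma=0.05, jitter=0.03, seed=0):
    rng = np.random.default_rng(seed)
    particles = rng.uniform(-1.0, 1.0, size=M)    # candidate refining components
    x, y_ideal = 0.0, 0.0
    x_prev, u_tilde_prev = None, None
    errors = []
    for _ in range(steps):
        y = x                                     # measurement (noise-free toy case)
        y_ideal_next = R + RATE * (y_ideal - R)   # ideal trajectory sample at k+1
        u = y_ideal_next - A_NOM * y              # nominal controller psi
        d_hat = 0.0
        if refine and x_prev is not None:
            # Residual of the nominal model over the last step, from measured data.
            resid = y - (A_NOM * x_prev + u_tilde_prev)
            # For m = 1..M (vectorized): propagate, predict the deviation, weight.
            particles = particles + rng.normal(0.0, jitter, size=M)
            eps = (A_NOM * y + u + particles + resid) - y_ideal_next
            w = np.exp(-0.5 * (eps / sigma) ** 2)
            w /= w.sum()
            d_hat = float(particles[np.argmax(w)])             # mode-like point estimate
            particles = particles[rng.choice(M, size=M, p=w)]  # resample
        u_tilde = u + d_hat                       # refined input, as in (27)
        x_prev, u_tilde_prev = x, u_tilde
        x = A_TRUE * x + u_tilde + BIAS           # actual plant, unknown to the controller
        y_ideal = y_ideal_next
        errors.append(abs(x - y_ideal))
    return np.array(errors)

refined = run(refine=True)
nominal = run(refine=False)
```

Comparing `refined` with `nominal` gives the deviation from the ideal trajectory with and without the refining component for this toy plant only; it says nothing about the pendulum results reported in the paper.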

V. STABILITY OF PROPOSED METHODOLOGY
The stability of the closed-loop system under refined control input (10) is established by proving the boundedness of system states. We consider control refinement based on the nominal controller. Analysis for ideal trajectory-based control refinement can be extended trivially and is thus omitted.
To this effect, we require that the following conditions are satisfied.

A. ASSUMPTIONS
1) The optimization problem requiring minimization of cost J (ε) has been solved by the predefined algorithm.
2) The output function $h(x_k)$ is globally bounded in its arguments. Furthermore, $y_{o,k}$ is also uniformly bounded.
3) The observation noise $v_k$ is bounded such that $\|v_k\| \le \gamma$.
We now state the following Lemma, which guarantees boundedness of the states of the closed-loop system.
Proof: Solving the optimization problem, which requires minimization of $J(\varepsilon)$, implies (considering $J(\varepsilon) = |\varepsilon|$) a bound on the deviation, which can be extended and, with the boundedness of $v_k$, gives the required bound. Consequently, boundedness of the system states $x_k$ follows from the implicit assumption of observability.

VI. SAMPLED DATA CONTROL EXAMPLE
The technique has been implemented on a sampled-data nonlinear problem of a generalized continuous-time form in which only the sampled output is available for measurement.
The sampled-data form of the system involves an integral that needs to be solved numerically at each sampling instant to obtain the discretized state value. This makes the solution computationally intensive; any alternative approach is left for future consideration. The second aspect is the decision of the control input for the sampling interval between discrete points in time, i.e. for $[kT, (k+1)T]$.
The system considered is an inverted pendulum (with sampled output) [39], with due consideration to compensating the impact of high-bandwidth disturbances through control refinement. The proposed technique has been compared with earlier work in this area based on disturbance estimation and rejection using an extended order observer for sampled-data nonlinear systems [39]. The nominal model is subjected to nonlinearities to form the actual plant. Parameter uncertainty is introduced through $b_1$ and $c_1$: the nominal parameters are modified to $b_1 = 0.9b$, $c_1 = 0.9c$ in the plant. Coulomb friction has been introduced in the plant in place of the friction term with parameter $b_1$.
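The numerical integration over one sampling interval can be sketched as follows. This is an illustration under assumptions, not the paper's model: the pendulum equations, the parameter values, the optional Coulomb term and the choice of holding the input constant over $[kT, (k+1)T]$ with a fixed-step fourth-order Runge-Kutta scheme are all hypothetical.

```python
import numpy as np

def pendulum(x, u, a=10.0, b=1.0, c=10.0, coulomb=0.0):
    """Hypothetical pendulum dynamics: viscous friction b, input gain c, optional Coulomb term."""
    theta, omega = x
    return np.array([
        omega,
        -a * np.sin(theta) - b * omega - coulomb * np.sign(omega) + c * u,
    ])

def step_zoh(f, x, u, T, substeps=20, **params):
    """Advance the state over one sampling interval of length T with the input held constant."""
    h = T / substeps
    for _ in range(substeps):
        k1 = f(x, u, **params)
        k2 = f(x + 0.5 * h * k1, u, **params)
        k3 = f(x + 0.5 * h * k2, u, **params)
        k4 = f(x + h * k3, u, **params)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

# Example: one sample of the nominal model and of a perturbed plant with 10% parameter mismatch.
x0 = np.array([0.1, 0.0])
x_nominal = step_zoh(pendulum, x0, 0.0, 0.05)
x_plant = step_zoh(pendulum, x0, 0.0, 0.05, b=0.9, c=9.0, coulomb=0.2)
```

Because the integration is repeated at every sampling instant (and, in a particle scheme, possibly for every particle), the number of substeps directly trades accuracy against the computational load mentioned above.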
The tracking error for the disturbed system, the conventional disturbance estimation (CDE) method and the proposed scheme of control refinement (CR) for the nominal controller (NC) and the ideal control behaviour (ICB) is shown in Fig. 4. The expanded view of the tracking error is shown in Fig. 5. The tracking performance has been improved by using control refinement for the nominal controller, but the controller's limitations have restricted any remarkable improvement. As is evident from the simulation results, an order of magnitude improvement in tracking performance has been achieved through control refinement based on the ideal control behaviour. The deviation from ideal behaviour is shown in Fig. 6. The transients in the trajectory tracking are due to the observer. The highest peak is seen in the case of high-gain/extended order based CDE, due to the high gains of the observer. The transients in the control refinement approach are also due to the impact of the observer: the higher the gain of the observer, the higher the peak of the transient. The observer's performance is beyond the scope of this paper. However, the transient behaviour can be slightly improved by using existing approaches like SSRLS or by adjusting the initial weights. The superior trajectory tracking performance of the proposed control refinement is evident from the simulation results.

VII. CONCLUSION
The goal of improving the tracking performance of a dynamic plant by tracking the ideal control behaviour has been achieved. The concept of control refinement to improve the tracking performance of nonlinear systems under multiple disturbances has been implemented. A particle-based refinement of the nominal control input has been introduced and simulated on a sampled-data nonlinear problem. A deterministic problem has thus been solved by applying statistical tools. The proposed technique tracks the transient and asymptotic behaviour of the ideal trajectory designed for a time-varying reference in the presence of disturbances, with an order of magnitude improvement in tracking performance in simulation. The effect of model uncertainty and discretization, together with hard nonlinearities like Coulomb friction, on the system's performance has been compensated through refinement of the control input by jointly handling the controller and observer. The compensation of harder nonlinearities like backlash will be considered in a future extension of the same concept. Higher-order refinement control is challenging, and readers are encouraged to study this particular case as a further area of research in this field.

SABA ZIA received the B.Sc. and M.S. degrees in electrical engineering from the University of Engineering and Technology, Lahore, Pakistan. She is currently pursuing the Ph.D. degree with the College of Electrical and Mechanical Engineering, National University of Sciences and Technology, Islamabad, Pakistan. Her research interests include particle filters and control design for constrained and nonlinear systems.