Solving Chance Constrained Optimization under Non-Parametric Uncertainty Through Hilbert Space Embedding

In this paper, we present an efficient algorithm for solving a class of chance constrained optimization under non-parametric uncertainty. Our algorithm is built on the possibility of representing arbitrary distributions as functions in Reproducing Kernel Hilbert Space (RKHS). We use this foundation to formulate chance constrained optimization as one of minimizing the distance between a desired distribution and the distribution of the constraint functions in the RKHS. We provide a systematic way of constructing the desired distribution based on a notion of scenario approximation. Furthermore, we use the kernel trick to show that the computational complexity of our reformulated optimization problem is comparable to solving a deterministic variant of the chance-constrained optimization. We validate our formulation on two important robotic/control applications: (i) reactive collision avoidance of mobile robots in uncertain dynamic environments and (ii) inverse dynamics based path tracking of manipulators under perception uncertainty. In both these applications, the underlying chance constraints are defined over highly non-linear and non-convex functions of the uncertain parameters and possibly also decision variables. We also benchmark our formulation with the existing approaches in terms of sample complexity and the achieved optimal cost highlighting significant improvements in both these metrics.


I. INTRODUCTION
Consider the following optimization problem in terms of a scalar variable u. min J(u) (1a) where, J(u) is a user defined cost function, P (.) represents probability and f (.) is the constraint function which depends on the decision variable u and uncertain parameters, w 1 , w 2 .The dependence of f (.) on both w 1 , w 2 and u could possibly be highly non-linear and non-convex.The inequality (1b) can be generalized to include any number of uncertain parameters and multiple chance constraints.Further, multiple optimization variables can also be accommodated.However, for easier exposition, we first restrict our analysis to the simple case described above.Extensions to a more general case are straightforward and we discuss those later.The set C represents the feasible space of u and is assumed to be convex for simplicity.Optimizations such as (1a)-(1c) are called chanceconstrained optimizations and are used extensively for decision making under uncertainty.In robotics and control applications, Fig. 1.An illustration of the observations made in Remark 1.The shape of the distribution can be manipulated by u.An appropriate shape is one where most of the mass lies to the left of f (.) = 0 they form the backbone of the robust Model Predictive Control (MPC) frameworks.For example, see [1], [2], [3], [4].
Remark 1.At an intuitive level, chance-constrained optimizations can be interpreted as a problem of ensuring that a specific portion of the mass of the distribution f (w 1 , w 2 , u) lie to the left of f (.) = 0 (refer to Fig. 1).For given uncertain parameters w 1 , w 2 , the distribution is parametrized by the decision variable u and can therefore be used to manipulate the location of a specified portion of its mass.However, each choice of u incurs a cost J(u).
Remark 2. The chance constraint probability η has a direct correlation with the amount of mass of the distribution f (w 1 , w 2 , u) lying to the left of f (.) = 0.A Larger mass amounts to a higher η.
Chance-constrained optimizations are known to be very difficult to solve.The complexity increases further even when the uncertainty is non-parametric, that is, the analytical functional form of the probability distribution of w 1 , w 2 are not known.Chance constraints are easy to solve when w 1 , w 2 are assumed to have a Gaussian distribution and the constraint function f (.) is affine with respect to u for given w 1 , w 2 [5], [6].However, in general, optimization problems where chance constraints are defined over non-linear and non-convex functions and the underlying uncertainty cannot be represented in any parametric form are known to be computationally intractable.Thus, various approximations and reformulations are proposed in existing literatures to tackle chance-constrained optimization problems.Expectation of a function f (.) with respect to its random arguments V ar[f (.)] Variance of a function f (.) with respect to its random arguments A popular approximation called the scenario approach [7], [8] starts with drawing n samples (or scenarios) of w 1 , w 2 from their distribution and then replaces (1b) with n 2 constraints of the form f i (w i 1 , w j 2 , u) ≤ 0, ∀i, j.The scenario approach has a very interesting set of pros and cons.On the one hand, it is conceptually simple and is applicable even when the parametric form of the distribution of uncertain parameters is not known and just their samples are given.On the other hand, the naive implementation of the scenario approach is known to be overly conservative.To be precise, the cost J(.)increases with n, although the solution becomes more robust at the same time.Works like [9] provide algorithms for rejection sampling to reduce the conservativeness of the scenario approach.
An alternate class of approach relies on replacing chance constraints with a deterministic surrogate [5], [10], [11].For example, (2) represents the robust variant of the so called sample average approximation (SAA) [11], where, w i 1 , w j 2 represents the i th , j th samples of w 1 , w 2 and I f represents an indicator function.The variable γ is similar but not necessarily the same as the chance constraint probability η.A strong advantage of SAA (2) is that it provides a very tight approximation resulting in a low cost solution.However, (2) represents an extremely difficult non-smooth and non-convex constraint.Thus, the reformulated chance-constrained optimization itself becomes very difficult.Our experimentation has shown that it is possible to solve SAA based reformulations of single variable chance constrained optimization with an exhaustive search.However, such an approach is unlikely to scale to problems with multiple decision variables.
where, E[.], V ar[.] represent the mean and variance of f (.), taken with respect to random variables w 1 , w 2 .Using, Cantelli's inequality, it can be shown that the satisfaction of (3) ensures that chance constraints are satisfied with η ≥ 1+ 2 .However, it should be noted that this bound can be rather loose.The attractive feature of ( 3) is that it is applicable for a wide class of chance constraints.However, its efficiency is predicated on how easy it is to compute analytical expressions for E[.] and V ar [.] .For example, if f (.) is highly nonlinear or/and the parametric form of w 1 , w 2 is not known, then computing an accurate analytical expression for E[.] and V ar[.] becomes a very challenging problem.A workaround has been proposed in works like [13], [15] [16] where the analytical expressions for E[.] and V ar[.] are approximated through Monte Carlo sampling.However we should note two key bottlenecks of such approaches .First, the sample complexity is poor and our experimentation shows that it usually requires around 10 6 samples to get to a reasonable approximation.Second, for a given sample size, it is difficult to estimate how well the surrogate constraints are approximated.This in turn, makes it difficult to accurately infer the feasibility of the original chance constraints.

A. Contributions
In this paper, we present a novel approach built on the fact that any arbitrary distribution (f (w 1 , w 2 , u) in our case) can be embedded as a function (or a point, refer to Fig. 2) in Reproducing Kernel Hilbert Space (RKHS).The embedded function is generally referred to as the Kernel Mean and thus RKHS embedding is also known as Kernel Mean Embedding (KME) in the existing literature [17].A few key advantages of RKHS embedding are worth pointing out.First, the embedding can be achieved even when the parametric form of the underlying uncertainty (w 1 , w 2 in our case) is not known.Second, the embedding only requires point-wise evaluations of f (.) and is thus not influenced by its algebraic complexity.Finally, it opens avenues for the use of established reduced set methods to achieve a good sample complexity.Intuitively, reduced set methods provides a systematic way of choosing a subset of samples while still retaining as much information as possible from the original sample size by re-weighting the importance of those samples.
In the current work, we build on the concept of RKHS embedding and put forward the following contributions: • We further reformulate moment matching as a problem of minimizing the distance between the RKHS embeddings of the constraint function and the desired distribution.We use the so called Kernel trick to show that the complexity of the RKHS embedding based reformulation is comparable to solving a deterministic variant of the chanceconstrained optimization (1a-1c), obtained by replacing (1b) with a single deterministic constraint of the form f (w 1 , w 2 , u) ≤ 0. To be precise, if f (.) is polynomial in u of order l, then the reformulated problem is also a polynomial optimization problem of order 2l.• We benchmark our formulation with the existing approaches based on two metrics : sample complexity and obtained optimal cost.In particular, we highlight the following results: First, we show that our formulation significantly outperforms scenario approximation in both the metrics.Second, our formulation and the SAA approach based on surrogate constraints (2) results in similar optimal cost.However, our formulation leads to a simpler optimization problem and enjoy better sample complexity.Finally, our formulation also outperforms approaches based on surrogate constraints (3).• We apply our formulation on two challenging motion planning/control problems.The first problem involves navigating a mobile robot in dynamic and uncertain environments.Herein, we consider noise arising from both perception and ego-motion during formulating the chance constraints for ensuring probabilistic collision avoidance.
Our second problem implements a stochastic variant of inverse dynamics based path tracking for manipulators.We assume that the concerned manipulator has noiseless motions but noisy state estimation.Consequently, the manipulator should compute the necessary torque commands while considering the state estimation uncertainty to ensure that the probability of exerting a torque that violates the specified bounds is under some threshold.This requirement can be naturally put in the form of chance constraints.

A. RKHS
RKHS is a Hilbert space with a positive definite function k(.) : n × N → called the Kernel.Let, x denote an observation in physical space (say Euclidean).It is possible to embed this observation in the RKHS by defining the following kernel based function whose first argument is always fixed at x.
An attractive feature of RKHS is that it is endowed with an inner product which, in turn, can be used to model the distance between two functions in RKHS.Furthermore, the distance can be formulated in terms of kernel function in the following manner Equation ( 5) is called the "kernel trick" and its strength lies in the fact that the inner product can be computed by only point wise evaluation of the kernel function.

B. Distribution Embedding
The projection to RKHS can also be generalized to distributions.Let, w 1 , w 2 , w 3 .....w n be samples drawn from an unknown probability distribution P .This distribution P can be represented in the RKHS through a function called the Kernel Mean, which is described in the following manner where, α j is the weight associated with w j .For example, if the samples are i.i.d then, α j = 1 n , ∀j.The estimator (6) is consistent, i,e, the estimation improves as the number of samples increases.
Following [17], equation (6) can also be used to embed functions of random variables like f (w 1 , w 2 , u).
An important thing to note from (7) is that for given samples of w 1 , w 2 , the Kernel Mean given by ( 7) is dependent on the variable u.

C. Maximum Mean Discrepancy (MMD)
Given two distributions P , Q, MMD refers to the distance between their RKHS embeddings µ P , µ Q .That is: An important thing to note from ( 8) is how the kernel trick allows us to express MMD only in terms of the point-wise evaluation of the kernel function.

D. Reduced Set Methods
Consider a vector described in terms of weighted combination of basis functions.Reduced set methods are a class of algorithms which allows us to compute an optimal approximation of the vector using a highly reduced number of basis functions [18].Interestingly, the same class of algorithms can be applied to improve the sample complexity of RKHS embedding as well.The process can be described as follows.Let ŵ1

III. MAIN RESULTS
In this section, we derive our main result which is a reformulation of chance-constrained optimization (1a)-(1c) into a much simpler minimization problem.The following are our key assumptions: • We assume that the uncertainty is non-parametric, which in our case means that the probability distribution functions associated with w 1 , w 2 are not known.Rather, we have access to their n discrete samples.These samples could come from a simulator which mimics a very generalized distribution with arbitrary order of moments.• We assume that the analytical form for the constraint functions are known.

A. Algebraic Form of the Constraint Function
In this paper, we consider the chance constraints defined over the following class of constraint functions.
where, h i (w 1 , w 2 ), n → R is a generic possibly non-linear function of w 1 , w 2 , while u i represents a monomial of order i.
The definition (11) is very general and has the famous affine class of chance constraints as a special case with l = 1 and h 0 (w 1 , w 2 ) = 0, h 1 (w 1 , w 2 ) = w 1 .It can be seen that even if the uncertain parameters, w 1 , w 2 are Gaussian, the chance constraints defined over f (w 1 , w 2 , u) may still be too complex to get an analytical characterization for the distribution of f (w 1 , w 2 , u).Let, P f (u) represent the distribution of f (w 1 , w 2 , u)parametrized in terms of u.Its RKHS embedding can be computed using (7) in the following manner:

B. Desired Distribution
The notion of desired distribution is derived from the observations made in Remark 1.To recap, we want to ensure that the distribution f (w 1 , w 2 , u) achieves an appropriate shape.
To this end, desired distribution acts as a benchmark for f (w 1 , w 2 , u); in other words, a distribution that f (w 1 , w 2 , u) should resemble as closely as possible for an appropriately chosen u.We formalize the notion of desired distribution with the help of the following definitions: Definition 1. u nom refers to any solution of the optimization (1a)-(1c) that is associated with a low optimal cost J(u nom ).
Definition 2. Let w 1 , w 2 be random variables which represent the same entity as w 1 , w 2 but belong to some known distributions P des w1 , P des w2 .Further, when w 1 ≈ P des w1 and w 2 , ≈ P des w2 , then, f ( w 1 , w 2 , u nom ) ≈ P des f .In such a case, P des f is called the desired distribution if the following holds: Equation ( 14) suggests that if the uncertain parameters belong to the distribution P des w1 , P des w2 , then the entire mass of the distribution, f ( w 1 , w 2 , u) can be manipulated to lie almost completely to the left of f (.) = 0 by choosing u = u nom .This setting represents an ideal case because we have constructed uncertainties appropriately, so that we can manipulate the distribution of the chance constraints while incurring a nominal cost.

Constructing the Desired Distribution:
We now describe how distributions P des w1 , P des w2 and P des f can be constructed.While exact computations may be intractable, in this section, we provide a simple way of constructing an approximate estimate of these distributions.The basic procedure is as follows.
Given n samples of w 1 , w 2 we construct two sets C w1 , C w2 respectively containing n w1 samples of w 1 and n w2 samples of w 2 .For clarity of exposition, we choose w 1 , w 2 to identify samples from set C w1 , C w2 .Now, assume that the following holds.
By comparing ( 14) and ( 15), it can be inferred that the sets C w1 , C w2 are in fact sample approximations of the distributions P des w1 and P des w2 respectively.Furthermore, a set C f containing n w1 * n w2 samples of f ( w i 1 , w j 2 , u nom ) can be taken as the sample approximation of the desired distribution P des f .One last piece of puzzle remains.We still do not know, however which n w1 samples of w 1 and n w2 samples of w 2 should be chosen to construct sets C w1 , C w2 .In particular, we need to ensure that the assumption (15) holds for the chosen samples.To this end, we follow the following process.We arbitrarily choose n w1 samples of w 1 and n w2 samples of w 2 and correspondingly obtain a suitable u nom as a solution to the following optimization problem: Note that satisfaction of (16b) ensures that the assumption (15) holds.Few points are worth noting about the above optimization.First, it is a deterministic problem whose complexity primarily depends on the algebraic nature of f (.).Second, the desired distribution can always be constructed if we have access to sets C w1 , C w2 .The construction of these two sets is guaranteed as long as we can obtain a feasible solution to (16a)-(16c).Third, the computational burden of solving the optimization problem can be significantly reduced by some clever sampling.For example, in our implementation, we compute the left hand side of (16b) for different combination of samples and then choose the set which leads to the least violation of the constraints (16b).Finally, (16a)-(16c) is precisely the so-called scenario approximation for chance constrained optimization (1a)-(1c).Conventionally, scenario approximation is solved with a large n w1 , n w2 (typically 10 4 ) in order to obtain a solution that satisfy chance constraints (1b) with a high η (≈ 0.90).In contrast, we use (16a)-(16c) to estimate the desired distribution and thus for our purpose, a small sample size in the range of n w1 = n w2 ≈ 20 proves to be sufficient in practice.
The RKHS embedding of these distributions can be obtained in the following manner: Where, λ i , ξ j are constants derived from the reduced set methods described in Section II-D.

C. Chance-Constrained Optimization as a Moment Matching Problem
In this section, we reformulate the chance-constrained optimization (1a)-(1c) as a moment matching problem.Our key idea builds upon the following theorem from [19].
where, d refers to the order upto which the moments of P f (u) and P des f are similar.The above theorem suggests that the difference between two distributions can be bounded by a nonnegative function B(d) which decreases with an increasing order of moment d.Authors in [19] also show that this bound is particularly tight near the tail end of the distribution.Now, recalling that almost the entire mass of P des f lies to the left of f (.) = 0, it is clear that as we make the tail of P des f and P f (u) similar by matching higher order moments, we ensure that more and more of the mass of P f (u) gets shifted to the left of f (.) = 0. This, leads to the satisfaction of chance constraints (1b) with a higher η.Theorem 1 lays the foundation for the following optimization problem which can act as a substitute for the original chance-constrained optimization (1a)-(1c).
where, L mom (.) is a cost function that measures the similarity between the first d moments of P f (u) and P des f .A low value of L mom would imply that the first d moments of P f (u) and P des f are very similar.Accommodating Chance Constraint Probability η: Optimization (20a)-(20b), accommodates the chance constraint probability η in an implicit manner.Thus, the process of obtaining solutions with different level of robustness based on η is more indirect and involved than the original optimization (1a)-(1c).In (20a)-(20b), the similarity between the tail of P f (u) and P des f not only depends on the residual of L mom (.) but also on the moment order d used to construct L mom (.).Fixing weights ρ 1 and ρ 2 and increasing d increases the similarity near the tail end and thus leads to the satisfaction of chance constraints with higher η.A similar goal can be achieved by fixing d and ρ 2 and increasing ρ 1 .

D. Reformulating Distribution/Moment Matching through RKHS Embedding
The optimization (20a)-(20b) is still challenging to solve as it is not clear how to derive a suitable analytical form for L mom (.).To the best of our knowledge, there is no mapping that directly quantify the similarity between the first d moments of two given distributions.Here, we present a workaround based on the concept of RKHS embedding and MMD distance.Our key idea is based on the following results from [17], [20] Let µ P f , µ p des f represent the RKHS embedding of the distributions P f , P des f respectively.If the embedding is constructed through polynomial kernels, then the following theorem holds [20], ( [21], pp-15).Theorem 2. If µ P f (u)−µ P des f → 0, then moments of P f (u) and P des f upto order d become similar.
That is, decreasing the residual of MMD distance becomes a way of matching the first d moments of the distribution P f (u) and P des f .Theorem 2 suggest that the MMD distance can be used either as a measure of similarity between the first d moments of the two distributions.In other words, MMD with polynomial kernel can act as a surrogate for L mom (.).Using this insight, we present the following optimization problem which can act as a surrogate for (20a)-(20b).

E. Simplification Based on Kernel Trick
We now use the so called "kernel trick" to obtain a simplified form for the optimization (21a)-(21b) and highlight that the computational complexity of solving (21a)-( 21b) is comparable to solving a deterministic variant of the original chanceconstrained optimization.For ease of exposition, we consider a specific instance from the definition of constraint function (11) with l = 2 i.e. f (.) = h 0 (.) + h 1 (.)u + h 2 u 2 .We have Expanding µ h0 + µ h1 u + µ h2 u 2 , µ h0 + µ h1 u + µ h2 u 2 , we get Using the kernel trick, (5) reduces to the following expression where, n×n Following a similar process, the second term, Where, Finally, the last term, µ P des f , µ P des f in (22) can be handled in a similar manner and thus, optimization (21a)-(21b) can be expressed as the following non-linear optimization problem Where, Computational Complexity The computational complexity of our proposed algorithm has two specific parts.The first part stems from the complexity of constructing the kernel matrix like (26) used to formulate the cost function (30a).This in turn depends on the number of samples of the uncertain parameters w 1 , w 2 .In the worst case, we require n 2 samples.However, as explained in the Section II-D, the value of n can be optimized using the reduced set methods.
The second part of the complexity stems from how difficult it is to solve the optimization (30a)-(30b).To understand this further, consider a deterministic variant of (1a-(1c)), where we replace the chance constraint (1b) with a deterministic constraint of the form f (w 1 , w 2 , u) ≤ 0. If f (.) is quadratic in u, then the result would be a non-linear optimization problem with a quadratic constraint.In comparison, for the same form of the constraint function, our RKHS based reformulation takes the form of a non-linear optimization with a quartic polynomial of u in the cost function (see (30a)).An optimization with quartic polynomials can now be converted to that with a quadratic polynomial with a simple change of variables.Thus, it can be seen that the computational complexity of (30a)-(30b) is comparable to solving a deterministic variant of the chance-constrained optimization problem.In general if f (.) is a polynomial of order l, then our reformulation would involve a polynomial of order 2l in the cost function.

F. Convergence
Our proposed algorithm inherits the strong convergence guarantees associated with RKHS embedding of functions of random variables.For the individual functions h i (w 1 , w 2 ) (see 11), used to construct the chance constraints, the convergence of the RKHS embedding solely depends on the constants α i , β i .To be precise, let μhi represent the true RKHS embedding, then the following holds [17].
Where, O p represents convergence in probability.The assumptions (32a) are trivially satisfied if we use i.i.d samples of w 1 , w 2 to construct µ hi .Furthermore, the assumptions are also satisfied when working with the reduced set samples computed by the approach described in Section II-D.
The convergence of µ P f in RKHS also follow similar arguments.However, the convergence rate in this case also depends on the value of the decision variable u besides α i and β i (see (7)).Intuitively, the reason for this can be understood in the following manner: The shape of the distribution P f (u) depends on u, and thus for some u it can attain a very peculiar shape, which would require a larger number of samples for accurate enough estimation.

IV. APPLICATIONS
In this section, we consider two robotic/control applications and model them in the form of the chance-constrained optimization (1a)-(1c) and also present their RKHS reformulations.

A. Dynamic Obstacle Avoidance along a Given Path
Here, we consider dynamic collision avoidance between a disk shaped robot and non-reactive moving obstacles with similar shapes (Fig. 3(a)).Both the robot and the obstacles are assumed to have a single integrator motion model, i.e they can instantaneously change their velocities.Further, we consider a variant of the problem where the path of the robot is fixed and the robot achieves collision avoidance simply by varying the magnitude of its forward velocity.As shown in our earlier works [22], [23], the more general collision avoidance like [24], [25] can be conveniently built from this special case.
Let, (x, y) and ( ẋ, ẏ) be the position and velocity vector of the robot at some specific time instant when the robot detects imminent collision with the obstacles.Similarly, let (x o , y o ) and ( ẋo , ẏo ) represent similar vectors for the moving obstacle.It is clear that if the velocity vector of the robot is modified as (u ẋ, u ẏ), then it continues to move along its current path although the magnitude of its forward velocity gets scaled by a factor u.For u > 1, the robot would increase its forward velocity while for u < 1, it would slow down, to avoid collisions.Therefore the dynamic collision avoidance constraint can be written in the following form (refer to [23] for details).
Where, R, R o represent the radius of the footprint of the robot and the obstacle.Inequality (33a) can be put in the following more compact form, which resembles (11) with l = 2.
34) where, w 1 = (x, y, ẋ, ẏ) and w 2 = (x o , y o , ẋo , ẏo ).Uncertainty: Assume that the robot has both perception and ego-motion uncertainty (Fig. 3(b)).That is, P denotes a general PDF that is defined until its fourth order of moment.λ (.) 1 and λ (.) 2 denote the third(skewness) and fourth(kurtosis) moments respectively.If λ x 1 , λ x 1 = 0, and λ x 2 , λ x 2 = 3, then PDFs take the Gaussian form.Both w 1 , w 2 then become uncertain parameters and thus collision avoidance needs to be rephrased as chance constraints.The final chance constrained optimization for dynamic collision avoidance under uncertainty takes the following form.
The cost (36a) minimizes the deviation from the current forward velocities.Optimization (36a)-(36c) fits in the form described by (1a)-(1c).After solving the above optimization problem or rather the RKHS embedding based reformulation of it, the robot draws a sample from its current velocity distribution ẋ, ẏ and executes it after scaling by a factor u to avoid collisions.Multiple moving obstacles: If there are multiple moving obstacles in the environment, then the parameter w 2 needs to be computed specifically for each moving obstacle.That is, we have: Consequently, we will also have multiple collision avoidance constraints: The chance-constrained optimization would now have multiple chance constraints and take the following form.
min J(u) = (u − 1) 2 (38a) Our RKHS embedding based reformulation would now have the following form: where, µ P f i (u) represents the KME of the i th chance constraints and µ P des f i represents the KME of the desired distribution corresponding to the i th chance constraints.Note that the first term in (39a) can be obtained using the derivations presented in Section III-E.

B. Inverse Dynamics based Path Tracking
In this application, we consider the task of tracking a reference trajectory x d (t) by a manipulator (Fig. 4(a)), which can be framed as the following quadratic programming (QP) problem.

arg min
J(q(t))q(t) + J(q(t), q(t)) q(t) − ẍ(t) 2 2 (40a) M(q(t))q(t) + C(q(t), q(t)) q(t) ≤ τ max (40b) M(q(t))q(t) + C(q(t), q(t)) q(t) ≥ −τ max (40c) Where, ẍ(t) = k p (x(t) − x d (t)) + 2 * k p (ẍ(t) − ẍd (t)) + ẍd (t) and k p is a constant feedback gain.q(t) and q(t) represents the joint angle and velocities at time t.Let the degree of freedom of the manipulator be m, i.e, q(t) = (q 1 (t), q 2 (t)...q m (t)).J is the manipulator Jacobian matrix.The inequalities (40b)-(40c) ensures that the resulting q(t) is achievable without violating the torque bounds.The QP (40a)-(40d) is solved in a one-step receding horizon setting for trajectory tracking.To be precise,the QP is solved for the joint accelerations at each instant considering the current joint position and velocities.The state is evolved with the current acceleration and the process is repeated for a specific time duration.
Constraints (40b)-(40c) represent 2m affine inequalities each of which can be represented in the following familiar form: where, w 1 = (q 1 (t), q 2 (t)...q m (t)), w 2 = ( q1 (t), q2 (t)... qm (t)) (u 1 , u 2 ..u m ) = (q 1 (t), q2 (t)..q m (t)) Trajectory Tracking under Perception Uncertainty Assume that the manipulator has perfect motion capability but imperfect sensing for the joint angles q(t) and velocity q(t) (Fig. 4(b)).In such a case, q(t), q(t) and functions h j i (.) and h i (.) can be modeled as random variables.With this insight, we now formulate a stochastic variant of the inverse dynamics based path tracking problem as the following chance-constrained optimization: J(q(t))q(t) + J(q(t), q(t)) q(t) − ẍ(t) 2 2 (42a) where, J(q(t)) and J(q(t), q(t)) represents the Jacobian matrix formed with the mean variables q(t) and q(t).The inequality (42b) ensures that the resulting q(t) can be achieved without violating the torque bounds with atleast probability η.It can be seen that, (42a)-( 42c) is an extended variant of the original chance constrained optimization (1a)-(1c).Specifically, we now have multiple decision variables along with multiple chance constraints.
Remark 3.There is a subtle difference between the multiple chance constraints in optimization (38a)-( 38c) and (42a)-(42c).In the former, multiple chance constraints arise because the parameters i w 2 were different for each obstacle while the function f (.) remained the same for each constraint.In contrast, in the latter, the functions f i (.) were different for each constraint but the parameters w 1 , w 2 remained same across different constraints.
where, µ P f i (.) represents the KME of the i th chance constraints and µ P des f i represents the KME of the desired distribution corresponding to the i th chance constraint.

V. RESULTS
In this section, we present simulations obtained by applying our formulation to the examples derived in the previous section.During each application, we also separately benchmark our formulation with the some of the existing approaches for chance constrained optimization.The simulation videos for the results can be found in http://robotics.iiit.ac.in/uploads/Main/Publications/Bharath journal/.The uncertainty in velocity of the robot and the obstacle.We used the Kernel Density Estimation technique to obtain a graphical representation of the uncertainty in the velocity.The Gaussian approximation for these non-Gaussian distributions are also shown in lighter shades.It is evident that the Gaussian approximation results in a poor inference of the actual probability.

A. Collision Avoidance Results
1) One Obstacle Benchmark with Non-Gaussian Uncertainty: In this benchmark, a robot tries to avoid a single moving obstacle while considering the uncertainty in its own motion and its perception of the obstacle, both of which are assumed to be Non-Gaussian with unknown distribution.The configuration of the robot and the obstacle is shown in Fig. 3(b).The brown and the green colored samples in this figure,   As shown in Section IV-A, the uncertainty in position and velocity can be mapped to uncertain parameters w 1 , w 2 and consequently to functions h 0 (w 1 , w 2 ), h 1 (w 1 , w 2 ) and h 2 (w 1 , w 2 ).We subsequently use this information to compute the collision avoidance velocity for the robot.
The solution process and results are summarized in Figs.6(a)-6(f).As described previously, the solution process starts with the construction of the desired distribution P des f 1 .Subsequently, we ensure that the distribution of P f (u) is similar to P des f (atleast near the tail end) by choosing an appropriate u and the degree of the polynomial kernel d.The following key points should be particularly noted from the plots.Figs.6(a), 6(b), 6(c) clearly show that as d increases, the distributions P f (u) and P des f become more alike (atleast near the tail end) and at the same time a higher portion of the mass of P f (u) gets pushed to the left of f (.) = 0.
The increase in similarity between the two distributions is correlated with actual collision avoidance in Figs.6(d)-6(f), wherein the position samples shown in black correspond to the samples of distribution P f (u) which are to the right of f (.) = 0.The position samples shown in red correspond to samples which are to the left of f (.) = 0. Physically, this means that if at the current instant, the robot and obstacle occupy any of the position shown in black then the robot is going to collide with the obstacle if it chooses any velocity from the distribution ( ẋ, ẏ) and executes it after scaling it by a factor u. Similarly, the reverse holds for the position samples shown in red.Another important thing to note from Figs.6(d)-6(f) is that as the mass of the distribution P f (u) gets pushed to the left of f (.) = 0 due to an increase in d, the number of colliding position samples also reduces.Also, note that an increase in d is observed with a simultaneous increase in u.This means that the robot needs to modify its forward velocity by a larger amount to maintain a high probability of collision avoidance.
2) Three Obstacle Benchmark with Non-Gaussian Uncertainty: Here we consider a benchmark where the robot needs to avoid collisions with three obstacles under Non-Gaussian perception and motion uncertainty.3) Comparative Results on Collision Avoidance: Table II shows a comparison of the number of samples required by different approaches to compute an optimal solution such that the chance constraints are satisfied with a specified η.The following points can be noted from the table • As expected, a naive implementation of the scenario approach shows the worst sample complexity.For η ≈ 0.7, we required 200 samples each of w 1 , w 2 leading to a grid of size 4 * 10 4 .For η ≈ 0.9, we required 500 samples each of w 1 , w 2 .
• The SAA approximation proposed in [11] required a sample size almost half of that required by the scenario approach.For η ≈ 0.7, we needed 100 samples each of w 1 , w 2 .This requirement increased to 200 for η ≈ 0.9.• The approach of [10], [13] which is based on surrogate constraints 3 shows an interesting trend.The sample complexity is worse than scenario approach for η ≈ 0.7.However, the sample size does not vary with η.This is because the samples of the uncertain parameters are used to obtain an estimate of E[f (w 1 , w 2 , u)] and V ar[f (w 1 , w 2 , u)] and importantly, this estimation is independent of η.
• As can be seen from Table II, our proposed formulation based on RKHS embedding has significantly better sample complexity than all the above discussed approaches.It required 20 samples each of w 1 , w 2 to construct a reasonable estimate of the desired distribution.An additional 20, 40 samples were required to construct the RKHS embedding based reformulations at η ≈ 0.7 and η ≈ 0.9 respectively.
Figs.10(a), 10(b) compare the optimal cost obtained through different formulations.The following important observations can be drawn from it • Our proposed formulation results in lower cost solutions than approaches based on scenario approximation and surrogate constraints (3) [10], [13].The difference is more pronounced for non-Gaussian uncertainty and at higher η.In fact, at a higher η, approach based on (3) often runs into infeasibility.• Interestingly, the SAA approach of [11] result in very similar costs to those of our proposed formulation for both Gaussian and non-Gaussian uncertainty.This is not surprising as SAA proposed in [11] is indeed a very tight approximation of the chance constraints.

B. Path Tracking Results for a 2 link Manipulator
Recall that in this application, we repeatedly solve the chance constrained optimization (42a)-(42c) or rather the reformulation of it (43a)-(43b) and evolve the joint angles and velocities according to the computed acceleration control input at each iteration.Moreover, we have multiple chance constraints Comparing P fi (.) and P des fi at both the iterations, it can be seen at iteration 60, the tails of the two distributions are more closely matched and as a result, a larger portion of P fi (.) lies to the left of f (.) = 0.A direct consequence of this can be observed in the torque plots.At iteration 60, we observe fewer

Approach
P (f (w 1 , w 2 , u) < 0) ≈ 0.7 P (f (w 1 , w 2 , u) < 0) ≈ 0.9 Scenario w 1 , w 2 = 200 w 1 , w 2 = 500 SAA [11] w 1 , w 2 = 100 samples of torque that violate the torque bounds compared to what we observe at iteration 69. 1) Comparative Results for Path Tracking: We now compare our proposed RKHS based formulation with the scenario approach and the approach based on surrogate constraint (3) in the context of the path tracking application. 2.Table III and IV 2 We do not compare with the SAA approach of [11] here because its computational complexity on this application becomes too prohibitive.The collision avoidance application involved only one decision variable and thus, we could do a brute force search to solve the SAA formulated problem.However, such an approach would not be suitable for the path tracking application.Authors in [11] suggest a mixed integer reformulation, wherein the number of integer variables would be equal to the number of samples of the uncertain parameters.But, we remark that such a reformulation would be prohibitive for high dimensional robotic systems like manipulators.
summarizes the sample complexity for η ≈ 0.7 and η ≈ 0.9 respectively for different values of τ max .As can be seen, our RKHS based formulation enjoys better sample complexity than both the compared approaches in this application too.The order of improvement increases with η.Moreover, at η ≈ 0.7 an additional trend can be observed: the order of improvement also improves as the chance-constrained optimization becomes tighter due to a decrease in τ max .At η ≈ 0.9, the order of improvement remains almost the same for various τ max .It can also be noted that the sample complexity in this application is significantly lower than that observed in the previous collision avoidance application.We attribute this to the fact that f i (.) for path tracking application is affine in terms of decision variable (see (41)) while in collision avoidance application, is a non- convex quadratic.Fig. 12 shows the comparison of average optimal costs observed across 20 different problem instances.
As can be seen, our RKHS based formulation produces significantly lower cost solutions and the order of improvement increases with a decrease in τ max .To provide more insight, we present one sample comparison in Fig. 13.Note that the torque profile obtained through our formulation remains closer to the saturation for more iterations as compared to the torque profiles obtained from the scenario approach.The difference is more pronounced for τ max = 3.The higher torque naturally provides more control authority to track the reference path and velocity profiles as verified in the path deviation and cost plots of Fig. 12.

VI. CONCLUSION AND FUTURE WORK
Mathematical operations in RKHS, has been the back bone for many of the modern machine learning algorithms.Examples of these span from kernel SVM to Gaussian Process.Recent trends in data science and programming languages widely advocate the use of probabilistic programming.Among the many existing approaches used in probabilistic programming, Hilbert space embedding of distributions has recently gained a lot of popularity.In fact literature along the lines of [17] even call it as Kernel Probabilistic Programming.We have adopted a series of recent papers [17], [20], [18] in this field that describes what a hilbert space embedding of a function of random variables would actually mean.One of the key aspect of our work is connecting the theory of RKHS embedding of distributions to a widely studied problem of chance-constrained optimization, which has applications in both robotics and control.
We formulated chance-constrained optimization as a problem of matching higher order moments of two distributions.The eventual structure that our formulation takes is that of a non-linear optimization problem, which can be easily solved with the help of off-the-shelf solvers.We validated our formulation on application like dynamic collision avoidance of mobile robots and path tracking of manipulators under torque bounds.Our benchmarking clearly establishes the improvement that our formulation provides over existing approaches in terms of sample complexity and optimal cost.At the moment, our formulation has some limitations, which we would be looking to rectify in our future works.Firstly, the cost function in our formulation is assumed to be deterministic, i.e they do not contain the uncertain parameters.One simple way of rectifying this would be to formulate stochastic cost as constraints using some slack variables.We are currently evaluating the scalability of this idea.Secondly, we are working on benchmarking our formulation with approaches, which first fits some distribution to the non-parametric uncertainty and then performs the subsequent analysis.Examples of such fitting techniques include include Gaussian Mixture Model, Kernel Density Estimator, Gaussian Process etc. Finally, we are also looking at more complex applications like multi agent navigation, reinforcement learning, etc.

Fig. 3 .
Fig. 3. (a): Deterministic collision avoidance between a robot and a dynamic obstacle.(b): Stochastic variant of the dynamic collision avoidance.The robot has both perception and ego-motion uncertainty.

Fig. 4 .Fig. 5 .
Fig. 4. (a): Problem set-up for inverse dynamics based path tracking for a two-link planar manipulator.(b):Inverse dynamics based path tracking under perception uncertainty leading to noisy estimates for joint position q(t) and joint velocities q(t)

Fig. 6 .
Fig. 6.The figures present the simulation results for collision avoidance with a single obstacle (Fig. 3(b)) under non-Gaussian uncertainty.Figures (a), (b), (c) show the constructed desired distribution P des f and the distribution of P f (u).The increase in the degree of the polynomial kernel d can be easily correlated with the decrease in the colliding samples shown in figures (d), (e), (f).

Fig. 7 .
Fig. 7. (a): Collision avoidance scenario where the robot needs to avoid collision with three moving obstacles.(b): The position samples of robot and obstacles at some specific time instant when the robot detects imminent collision with the obstacles.(c): The uncertainty in velocities of robot and obstacles.It is clear from the plots that they are non-Gaussian in nature, the Gaussian approximations of these distributions are also shown.The main intent of displaying the Gaussian distributions is to show how poorly they approximate the original non-Gaussian distributions.

Fig. 7 (
a) represents the configuration of the robot and the moving obstacles.At some specific time instant, Figs.7(b), 7(c) represent the uncertainty in the robot's and the obstacle's positions and velocities.Figs.8(a), 8(b), 8(c) show the desired distribution P des fi constructed corresponding to chance constraints formulated with respect to each obstacle.The figures also show the distribution of P fi (u) for various values of d.The improvement in collision avoidance probability with an increasing value of d is further validated in Figs.8(d), 8(e), 8(f) via a comparison of the position samples from where the robot can either collide with (black) or avoid (red) the obstacles.Snapshots from collision avoidance simulations are shown in Figures 9(a)-9(h).It is easy to relate these snapshots to the position samples from figures 8(d), 8(e), 8(f).As the value of d increases, the robot chooses a velocity that results in more and more clearance with the obstacles.This is what results in reduction of colliding samples in Figs.8(d), 8(e), 8(f).
and thus, a desired distribution P des fi needs to be constructed corresponding to each of them.Fig.11(a), 11(b) show the distributions P des fi and P fi (.) (for one of the chance constraints) at iteration 60 and 69 for d = 2. Fig.11(c) shows the torque values obtained at each iteration.The lines in black represent the mean torque values while the cyan shows the uncertainty around it in the form of samples.Fig.11(d) shows the tracking performance in terms of path deviation and optimal cost values at each iteration.

Fig. 8 .
Fig. 8. Figures show the simulation results for collision avoidance with three moving obstacles shown in Fig.7(b) under non-Gaussian uncertainty.In this example, we have multiple chance constraints P (f i (w 1 , i w 2 , u) ≤ 0) ≥ η because the uncertain parameter w 2 was different for each obstacle.Thus, as shown in Figures (a), (b), (c), we need to construct three different desired distributions P des f i corresponding to chance constraints formulated with respect to each obstacle.As seen in previous examples, an increase in the degree of the polynomial kernel d leads to the increase in the portion of the mass of P f i to the left of f (.) = 0. Figures (d), (e), (f) validate the reduction of the colliding samples with an increase in d.

Fig. 9 .
Fig. 9. Snapshots of collision avoidance simulation for d = 3, 5.Note how increase in d results in increase in clearance between the robot and the obstacles.The increased clearance translates to improvement in probability of collision avoidance.

Fig. 10 .
Fig.10.Average Optimal cost obtained with different methods for collision avoidance application observed across 20 different problem instances.Our RKHS formulation consistently results in lower cost solutions.Furthermore, the approach based on surrogate constraints (3) often runs into infeasibility at higher η.

TABLE II TABLE SUMMARIZING SAMPLE
COMPLEXITY FOR COLLISION AVOIDANCE APPLICATION.