A Data Compression Strategy for the Efficient Uncertainty Quantification of Time-Domain Circuit Responses

This paper presents an innovative modeling strategy for the construction of efficient and compact surrogate models for the uncertainty quantification of time-domain responses of digital links. The proposed approach relies on a two-step methodology. First, the initial dataset of available training responses is compressed via principal component analysis (PCA). Then, the compressed dataset is used to train compact surrogate models of the reduced PCA variables using advanced techniques for uncertainty quantification and parametric macromodeling. Specifically, in this work sparse polynomial chaos expansion and least-squares support-vector machine regression are used, although the proposed methodology is general and applicable to any surrogate modeling strategy. The preliminary compression limits the number and complexity of the surrogate models, thus leading to a substantial improvement in efficiency. The feasibility and performance of the proposed approach are investigated by means of two digital link designs with 54 and 115 uncertain parameters, respectively.


I. INTRODUCTION
The ever-growing demand for higher data rates in high-speed links, along with their increasing complexity and miniaturization, means that the effect of uncontrolled variations of design parameters (e.g., geometry, material parameters, and component tolerances) on system performance is no longer negligible.
Among the several available approaches for uncertainty quantification, Monte Carlo (MC) simulation is undoubtedly the most straightforward method for assessing link performance with respect to parameter uncertainty. Indeed, MC predicts statistical quantities of the outputs of interest (e.g., voltage/current overshoots, eye diagram opening, maximum dissipated power, etc.) using a set of deterministic simulations computed for suitable random samples of the uncertain input parameters, drawn according to their probability distribution. While straightforward to implement and available in virtually any design environment, an accurate statistical assessment via MC analysis typically requires an exorbitant number of simulations, thus becoming impractical for real-life scenarios.
To overcome this computational limitation, stochastic macromodeling techniques based on the frameworks of polynomial chaos expansion (PCE) [1] and machine learning (ML) [2], [3] have been extensively investigated in recent years for electrical and electronic engineering applications, see, e.g., [4]–[16]. The common underlying idea is to use a small set of simulation results, exploring the parameter space as much as possible, to ''train'' a closed-form surrogate model of the expensive ''full-computational'' model. This surrogate model is then used to inexpensively predict the system performance for any possible configuration of the uncertain input parameters, thus allowing for the rapid calculation of statistical information.
In the most versatile implementations, the surrogate model parameters are calculated, and possibly tuned, via suitable regression techniques [4], [5], [11]. However, while in standard PCE implementations a common set of basis functions is used for every sweep point and every output of interest, non-parametric regression-based ML tools and advanced PCE techniques require tuning the hyper-parameters and/or the basis functions, and therefore solving a different optimization problem for each output of interest and sweep point (e.g., [8], [9], [14], [15]). This makes the abovementioned approaches prohibitive for problems with multiple and possibly time- or frequency-dependent outputs. Examples include sparse PCE approaches, such as those based on tensor recovery [17] or other adaptive schemes [18], [19], and non-parametric ML regression techniques like support-vector machine (SVM) [20], [21], least-squares SVM (LS-SVM) [22], and Gaussian process regression (GPR) [23]. All these techniques help mitigate the ''curse of dimensionality'', i.e., the efficiency reduction that occurs as the number of uncertain input parameters increases. Advanced stochastic surrogate models are therefore most effective when applied to a limited set of outputs of interest. Their direct application to the modeling of multiple transient responses, however, turns out to be rather cumbersome, because a large number of time points must be considered to capture highly nonlinear dynamics.
Usually, there are two different approaches to tackle the above issue. One solution is to include the effect of both the system parameters and the transient dynamics in a single recursive model [24], for example via a neural network [3], [25], [26]. Such models include the realizations at previous time points as additional input parameters. Despite its compactness, the resulting model is highly complex, and a very large number of training samples (on the order of thousands) is needed, since a huge number of model parameters must be tuned simultaneously. This approach might be a viable solution for low-order linear systems, but the model complexity grows exponentially for higher-order nonlinear systems.
A reasonable alternative is to build a different surrogate model for each output variable and time instant of interest (see, e.g., [9]). While this reduces the complexity of each single model, the overall number of models to be created is potentially huge, especially if a large number of time points is required, as for example when simulating eye diagrams. This paper proposes an alternative solution to overcome the shortcomings of this second approach. Specifically, principal component analysis (PCA) [27] is used to remove redundant information from the input data samples, which are thereby reduced to a minimum subset [28]. Briefly, the underlying idea is to exploit and remove the inherent correlation existing among the several realizations of the various responses of the system evaluated at different time points. A compression rate of several orders of magnitude is usually achieved, thus making the application of advanced stochastic surrogate models feasible on this reduced dataset. The implementation through singular value decomposition (SVD) allows controlling the accuracy of the compression.
The feasibility and strength of the proposed technique are assessed by means of two high-speed links: a 16-bit Flash memory bus operating at 66 MHz (≈1 Gbps) and affected by 54 uncertain design parameters, and a single-channel electronic link affected by 115 stochastic variables and driven by a DDR buffer transmitting at 133 Mbps. For both test cases, PCA is applied to obtain a compressed representation of the training data. Then, two types of surrogate models are considered, namely a sparse PCE in combination with least-angle regression (LAR) [19] and an LS-SVM regression [22].
The paper is organized as follows. The problem and the goals addressed in this work are stated in Section II. Section III outlines the proposed compression scheme based on PCA. The performance of the proposed methodology is investigated in Section IV by means of two digital link designs. Finally, Section V concludes the paper. Throughout the paper, calligraphic letters (e.g., D) denote sets, lowercase italic letters denote scalars, lowercase bold letters denote vectors, uppercase bold letters denote matrices, and bold calligraphic letters denote tensors.

II. PROBLEM STATEMENT
This section briefly introduces the problem under consideration. We consider a generic dynamic nonlinear system described by the map

y(t) = M(t; x),   (1)

where x = [x_1, ..., x_d]^T ∈ R^d is a set of (uncertain) input design parameters, y(t) = [y_1(t), ..., y_M(t)]^T ∈ R^M collects the system outputs, and t is an independent sweep variable on which the outputs depend.¹ Without loss of generality, we assume that all d uncertain parameters in x are independent and equally significant. Techniques are available to reduce the set of input parameters when they are correlated [29].
Let us introduce the following set of training pairs:

D = {(x_l, y_l(t_k))}_{l,k=1}^{L,K},   (2)

where the y_l(t_k) are vectors collecting the M system outputs, computed from (1) for a specific configuration x_l of the input parameters at K distinct and increasing time points {t_k}_{k=1}^{K}. For the sake of notation compactness, the set of training pairs is rewritten as a union of subsets,

D = ∪_{k=1}^{K} ∪_{m=1}^{M} D_{k,m},   (3)

where each subset is defined as

D_{k,m} = {(x_l, y_{l,k,m})}_{l=1}^{L},   (4)

and y_{l,k,m} denotes the mth output evaluated at the kth time instant for the lth configuration of the input parameters, i.e., y_{l,k,m} = M_m(t_k; x_l), with subscript m denoting the specific system output considered. Starting from a set of training pairs D_{k,m}, we seek the best configurations of parameters {w_{k,m}}_{k,m=1}^{K,M}, each defining a surrogate model M̃_m(t_k; x, w_{k,m}) that minimizes the empirical risk functional

R[w_{k,m}] = (1/L) Σ_{l=1}^{L} ℓ(M̃_m(t_k; x_l, w_{k,m}), y_{l,k,m}),   (5)

¹In the context of this paper, t denotes time; however, it could also be any other sweep variable, such as frequency, input power, or temperature.
where ℓ(·, ·) is the so-called loss function. For example, the squared difference

ℓ(M̃(t; x, w), y) = (M̃(t; x, w) − y)²   (6)

is the loss function used by ordinary least-squares regression. Specifically, we look for the best set of model parameters w* that minimizes the model error (5) on the available set of training samples D, i.e.,

w*_{k,m} = arg min_{w_{k,m}} R[w_{k,m}].   (7)

In the framework of uncertainty quantification, the training set D is obtained by drawing samples x_l of the input variables randomly or pseudo-randomly, and computing the corresponding system responses y_l. In this paper, we use a Latin hypercube strategy to randomly sample the design space with good exploration properties.
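As a minimal illustration of this training setup, the following sketch draws a Latin hypercube design and fits an ordinary least-squares surrogate minimizing the squared loss (6). The affine model, nominal values, and scalar response are hypothetical placeholders standing in for a circuit simulation.

```python
import numpy as np
from scipy.stats import qmc, norm

d, L = 4, 50                                  # uncertain parameters, samples
sampler = qmc.LatinHypercube(d=d, seed=0)
u = sampler.random(n=L)                       # stratified samples on (0, 1)^d

# Map to independent Gaussians with a 10% relative standard deviation
nominal = np.array([1.0, 2.0, 0.5, 3.0])      # hypothetical nominal values
x = nominal * (1.0 + 0.1 * norm.ppf(u))

# Hypothetical scalar response standing in for a circuit simulation
rng = np.random.default_rng(0)
y = x @ np.array([0.3, -0.1, 0.7, 0.2]) + 0.01 * rng.standard_normal(L)

# Ordinary least-squares surrogate minimizing the squared loss (6)
A = np.column_stack([np.ones(L), x])          # affine model w0 + w^T x
w, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The Latin hypercube guarantees one sample per stratum in each dimension, which is what gives the design its good space-exploration properties compared to plain random sampling.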

III. PCA-COMPRESSED SURROGATE MODELING
Since the system outputs are time-dependent, it is virtually impossible for a single surrogate model to accurately predict the entire system dynamics at each time point, for each output, and for any configuration of the input parameters x. Therefore, the conventional, practical approach is to build a different surrogate model for each time point t_k and output variable y_m for which data are available, with k = 1, ..., K and m = 1, ..., M. In turn, the set of model parameters changes with the time point and output variable, and we therefore denote it as w_{k,m}. These parameters are obtained by suitable regression techniques using L training responses {y_{l,k,m}}_{l=1}^{L} for each of these variables.
The complete dataset of training responses in D, denoted by Y = {y_{l,k,m}}_{l,k,m=1}^{L,K,M}, can be interpreted as an L × K × M three-way tensor. The data in tensor Y exhibit some inherent correlation, since they come from the same system. Specifically, data for different responses at different time points are not completely unrelated, but rather exhibit a large amount of interdependency. If tensor Y is reshaped by stacking the data for all outputs and time points rowwise, a matrix Y ∈ R^{KM×L} is obtained. This new dataset can be interpreted as a columnwise collection of L realizations {ζ_l}_{l=1}^{L} of a KM-variate stochastic variable ζ. We calculate the experimental covariance matrix of ζ, i.e.,

Σ_ζ = (1/(L − 1)) Ỹ Ỹ^T,   (8)

where Σ_ζ ∈ R^{KM×KM} and

Ỹ = Y − ȳ 1^T   (9)

is the dataset with the mean ȳ subtracted columnwise.
Each of the ζ-realizations can be expanded onto the basis of the left singular vectors {u_n}_{n=1}^{KM} as [28]

ζ_l = ȳ + Σ_{n=1}^{KM} z_{l,n} u_n,   (11)

for l = 1, ..., L, where

z_{l,n} = u_n^T (ζ_l − ȳ),   (12)

and the u_n are obtained by calculating the SVD

Ỹ = U S V^T   (13)

and taking the columns of U.
We can now truncate (11) to retain only the first ñ ''principal components'':

ζ_l ≈ ȳ + Σ_{n=1}^{ñ} z_{l,n} u_n,   (14)

where ñ is identified by setting a threshold ε on the relative magnitude of the singular values of Ỹ, collected in descending order in the diagonal matrix S. The singular values of Ỹ are proportional to the square roots of the eigenvalues of the covariance matrix Σ_ζ. If we define Σ̃_ζ as the covariance matrix of the approximated data (14), using standard matrix properties we can conclude that

‖Σ_ζ − Σ̃_ζ‖₂ / ‖Σ_ζ‖₂ = (σ_{ñ+1}/σ₁)²,   (15)

where σ₁ is the first singular value of Ỹ and σ_{ñ+1} is the first singular value discarded by the PCA truncation. Hence, setting a threshold on the singular values allows a rigorous control of the approximation in terms of variance.
In this paper, we truncate when the magnitude of the singular values falls below 1% of the first singular value (i.e., ε = 10⁻²).
The key achievement is that the set {z_{l,n}}_{l,n=1}^{L,ñ} of PCA coefficients (12) can be interpreted as a collection of L samples of ñ new output system variables Z_n(x), with z_{l,n} = Z_n(x_l), describing the information pertaining to the entire set of original time-dependent outputs y(t). Since typically ñ ≪ KM, the PCA truncation leads to a substantial compression of the number of variables to be modeled. Each of these reduced variables can be approximated using any surrogate model. Once a model is available for the compressed variables Z_n, new samples of the original time-dependent output variables can be recovered via (14). It is important to remark that the PCA algorithm requires no specific assumption on the nature of the correlation among the available data.
In the appendices, we introduce the two surrogate models that are used in the application examples in conjunction with the PCA compression, namely the sparse PCE [18] and the LS-SVM regression [22]. However, the compression strategy outlined in this section is general and applies to any surrogate model.

IV. APPLICATION EXAMPLES
In this section, the feasibility and performance of the proposed methodology are assessed on two application examples: a 16-bit channel with 54 uncertain parameters driven by Flash memory chips, and a single-channel electronic link with 115 uncertain parameters [8] driven by a DDR buffer. For the drivers, we use behavioral macromodels constructed with the method of [32], [33], but any other model, either behavioral or transistor-level, may be used.
We use the UQLab toolbox [30] to calculate the sparse PCE surrogate models and the LS-SVMlab toolbox [31] to carry out the LS-SVM regressions. Specifically, we use third-order polynomials and RBF kernels for the two methods, respectively. This provides a satisfactory trade-off in terms of accuracy and training cost for the considered application examples.
All circuit simulations are performed using HSPICE on a Dell Precision 5820 workstation with an Intel(R) Core(TM) i9-7900X CPU running at 3.30 GHz and 32 GB of RAM. The time step of the transient simulations is set to one half of the risetime of the digital signals to ensure satisfactory accuracy and resolution of the waveforms.

A. EXAMPLE 1: 16-BIT FLASH-MEMORY BUS
As a first application test case, the proposed methodology is applied to the link depicted in Fig. 1, which represents a 16-bit transmission channel of a memory chip. For the I/O buffers, we use behavioral macromodels compatible with the drivers of a Flash technology operating at 66 MHz. The structure includes a resistive rail for the power supply of the buffers, an RLC network with the package parasitics, and a transmission channel consisting of 16 coupled microstrip traces. The far-end terminations are left floating to mimic the connection to high-impedance receivers. The even-bit drivers transmit slightly asynchronous pulses with a duration of 15 ns and rise/fall times of 0.1 ns, whereas the odd-bit drivers remain quiet in the ''low'' state. There are d = 54 uncertain design parameters, each with an independent Gaussian distribution and a 10% standard deviation around the nominal value, namely: the resistance of the power rail (nominal value: r = 1.14 Ω), the package parasitics (nominal values: resistance R = 50 mΩ, inductance L = 2 nH, capacitance C = 5.5 pF), the width and thickness of each microstrip trace (nominal values: w = 150 µm and t = 30 µm, respectively), the gap between the traces (nominal value: g = 150 µm), and the substrate parameters (nominal values: thickness h = 460 µm, relative dielectric permittivity ε_r = 4.1, loss tangent tan δ = 0.02). The values of the package parasitics are here assumed to be independent, even though some correlation usually exists between them. The power supply voltage is V_DD = 1.8 V and the microstrip trace resistivity is ρ = 1.72 · 10⁻⁸ Ω·m. The bus is 18 cm long.
The outputs of interest are the terminal voltages at the receiver side of each line and the supply voltages of each driver. These M = 32 outputs are evaluated at K = 901 equally-spaced time points between 0 and 45 ns. A naive application of advanced surrogate modeling techniques would require the construction of KM = 28832 models. On the other hand, the use of a standard, non-sparse PCE implementation with order p = 3 would require the determination of the coefficients for |K| = 29260 basis functions, and hence the use of an exorbitant number of training samples to solve the corresponding regression problem.
We consider instead L = 300 training configurations of the uncertain parameters, generated according to a Latin hypercube scheme. The corresponding responses are evaluated by means of HSPICE simulations. Figure 2 shows the normalized singular values of the training dataset. The singular values drop below the 1% threshold at ñ = 51. Therefore, we retain the first 51 terms in the PCA (11), i.e., less than 0.2% of the original variables. It should be noted that lowering the simulation time step only adds redundant information to the data, which is then cut off by the PCA compression. For example, if the time step is reduced to one tenth of the risetime, the data contain K = 4501 time points, but the PCA compression still retains the same number of ñ = 51 terms. Moreover, if the PCA is applied separately to each of the 32 outputs (thus exploiting only the correlation of the data over time), the overall number of models to be computed becomes 624. This demonstrates that modeling all outputs concurrently effectively takes advantage of their interdependency and leads to further compression.
We construct surrogate models for the PCA coefficients using both a sparse third-order PCE and an LS-SVM regression with RBF kernel. In the former case, the cardinality of the subsets K̃ for the sparse models of the various PCA coefficients varies between 29 and 132, meaning that the number of non-negligible PCE coefficients is between 0.1% and 0.5% of the total. The hyper-parameters of the LS-SVM regression, namely γ_{k,m} in (24) and θ_{k,m} in (26), are instead tuned on the available set of training samples using the leave-one-out cross-validation score [31]. Figure 3 shows the variability and the standard deviation of a selection of four outputs, namely the terminal voltages on channels #0, #1, and #11, and the supply voltage of the buffer of channel #15. The green lines are a set of 500 responses from a reference MC analysis, which help visualize the voltage variation due to the parameter uncertainty. It should be noted that the second and third plots refer to crosstalk voltages, as the drivers of channels #1 and #11 are not transmitting. Moreover, the fourth plot shows a large fluctuation of the supply voltage, resulting from the commutation of the buffers. The solid blue, dashed red, and dotted yellow curves are the standard deviations estimated from 10000 MC samples, with the sparse PCE, and with the LS-SVM regression, respectively. The results provided by the two surrogate models compare well, and they are also in fairly good agreement with the reference MC results. An exception is the crosstalk voltage on channel #1, for which a rather large discrepancy is observed. The reason can be ascribed to the large number of ''outlying'' responses visible in Fig. 3, which particularly affects the estimation of the standard deviation, especially around 20 ns. To further assess the accuracy, we calculate the probability density function (PDF) of the quantities in Fig. 3 at time points exhibiting large variability.
The results are shown in Fig. 4. A remarkable accuracy is established for both the PCE and the LS-SVM models, even for the crosstalk on channel #1. Similar or better results are found for the remaining outputs. Table 1 reports figures on the accuracy and efficiency of the two surrogate models. The model training time is minor compared to the simulation of the training samples, and the two techniques exhibit similar performance in terms of accuracy and computational efficiency for this test case.

B. EXAMPLE 2: EYE DIAGRAM OF A DDR LINK
As a second application example, we consider the electronic link investigated in [8]. However, unlike in [8], we consider a time-domain simulation in which a behavioral macromodel of a DDR driver operating at 133 Mbps transmits a pseudo-random bit sequence, and the uncertainty is provided by d = 115 parameters, again following a Gaussian distribution with a 10% relative standard deviation. These parameters include the rail resistance of the driver power supply (nominal value: r ≈ 0.866 Ω), the package parasitics (same nominal values as in the previous test case), as well as the values of all lumped elements and of all geometrical and material parameters of the microstrip lines along the link (we refer to [8] for the numerical values). Both training and validation samples of the full-computational model are again generated by simulating the link with HSPICE.
In this case, we analyze a single output variable, i.e., the voltage at the receiver side (hence, M = 1). However, because we consider a sequence of about 1000 bits, each response consists of K = 150001 time points, leading to an even larger dataset than in the previous test case. Different datasets are used to train the surrogate models, with increasing sample sizes of L = {50, 100, 200, 300}. It should be noted that the number of basis functions for a full PCE would in this case be |K| = 266916. Figure 5 shows the normalized singular values of the various training datasets. For the smallest dataset with L = 50 responses, only the last singular value drops below the 1% threshold. It is important to mention that the last singular value is always zero, because one degree of freedom is lost in removing the mean from the dataset in (9). Therefore, we can conclude that this dataset is not large enough to sufficiently exploit the correlation between the various responses. For the datasets with L = 100, L = 200, and L = 300 training responses, instead, the singular values drop below the threshold after index 90, 83, and 76, respectively, leading to a PCA compression between 0.05% and 0.06% of the original variables. It is interesting to note that larger datasets lead to a more effective PCA compression, as a result of the higher amount of information they contain. For the sparse PCE trained with the largest dataset (L = 300 samples), the size of the reduced sets of basis functions ranges from 47 to 147, with a sparsity well below 0.06%. Figure 6 shows the eye diagram resulting from the superposition of the received bit sequences for a small number of stochastic link realizations. The colored lines represent the eye masks obtained by considering the 95% quantiles of the distributions of the high and low voltage levels based on 1000 link realizations. The solid blue line is the result obtained with the reference full-computational HSPICE simulations.
The dashed red and dotted yellow lines are the eye masks based on the sparse PCE and LS-SVM surrogate models, respectively, both trained with L = 300 responses. Excellent agreement between these techniques is again established.
Furthermore, the four panels of Fig. 7 compare the PDFs of the eye height (maximum opening) obtained with the different training set sizes against the reference distribution estimated from the MC analysis (blue histogram). The standard deviation of the eye height estimated from the MC samples is 49.0 mV. The values obtained with the sparse PCE and LS-SVM models are reported in Table 2. It is important to remark that the proposed technique allows for the uncertainty quantification of the entire received bit sequence, and not just of a single synthetic output quantity like the eye height, and hence for the determination of more complex information such as the probabilistic eye mask of Fig. 6.
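The post-processing that turns a received bit sequence into an eye diagram and a quantile-based opening can be illustrated as follows. The waveform here is synthetic (a crude one-pole channel, not the link of Fig. 6), and the per-bit sample count is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
ui_samples = 150                       # samples per unit interval (assumed)
n_bits = 200

# Hypothetical received waveform: random bits through a one-pole smoother
bits = rng.integers(0, 2, n_bits)
ideal = np.repeat(bits.astype(float), ui_samples)
alpha = 0.05                           # crude low-pass channel coefficient
v = np.empty_like(ideal)
acc = 0.0
for i, s in enumerate(ideal):
    acc += alpha * (s - acc)
    v[i] = acc

# Fold the waveform into one unit interval to form the eye diagram
eye = v.reshape(n_bits, ui_samples)    # one row per transmitted bit

# Quantile-based opening at the eye center, in the spirit of the 95% masks
center = eye[:, ui_samples // 2]
high = center[bits == 1]
low = center[bits == 0]
eye_height = np.quantile(high, 0.05) - np.quantile(low, 0.95)
```

The same folding and quantile extraction could be applied to each surrogate-predicted bit sequence to build the probabilistic eye masks and the eye-height PDFs discussed above.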
Finally, Table 2 also provides other relevant figures about the accuracy and efficiency of the two surrogate models for this second test case. It is observed that the calculation of the PCA compression, i.e., of the SVD (13) and the subsequent projection (12), has a negligible impact on the overall computational cost, even for this large-size example. Moreover, the sparse PCE achieves slightly lower average and maximum RMS error compared to LS-SVM regression. However, for this application, the latter is much more efficient in the model building phase. This is readily explained by the fact that, as opposed to PCE, the LS-SVM model complexity depends on the training set size rather than on the number of uncertain parameters, as discussed in Appendix B. The time required by the HSPICE simulation of the 1000 reference MC samples is 68934 s (19 h 9 min). This analysis was limited to a smaller number of samples due to the difficulty in handling larger datasets.

V. CONCLUSIONS
This paper presented an uncertainty quantification framework for large time-domain datasets. The proposed approach consists of combining a PCA compression with advanced surrogate modeling strategies such as those based on PCE or ML. The PCA allows reducing the amount of output data to be modeled to a minimum set of variables by exploiting the inherent correlation between the various responses at different time points.
The advocated technique allows the straightforward application of advanced surrogate modeling techniques to the uncertainty quantification of systems with multiple and time-dependent outputs. Two application examples concerning the signal integrity assessment of digital links affected by 54 and 115 uncertain parameters illustrate the strength and feasibility of the proposed approach.
APPENDIX A SPARSE PCE
In the PCE framework, the surrogate model is expressed as an expansion of orthonormal polynomials,

M̃_m(t_k; x, w_{k,m}) = Σ_{κ∈K} β_κ^{k,m} φ_κ(x),   (16)

where the functions φ_κ form a basis of orthonormal multivariate polynomials in the input design parameters x. They are typically built as the product combination of univariate polynomials, i.e.,

φ_κ(x) = ∏_{i=1}^{d} φ_{κ_i}(x_i),   (17)

and K is a set of multi-indices κ = [κ_1, ..., κ_d] indicating the degrees of the polynomials in each variable. In this case, the model parameters are the PCE coefficients, i.e., w_{k,m} = {β_κ^{k,m}}_{κ∈K}. It is important to point out that in standard and naive PCE implementations [4], [5], the set K is formed by the multi-indices up to a given total degree p, i.e.,

K = {κ : Σ_{i=1}^{d} κ_i ≤ p},   (18)

leading to a cardinality of |K| = (p + d)!/(p! d!). The same set K is used for each time point and output variable, which implies that the basis functions φ_κ do not change either. This makes the calculation of the model parameters relatively simple. Indeed, given a set of training samples D_{k,m} = {(x_l, y_{l,k,m})}_{l=1}^{L}, with y_{l,k,m} = M_m(t_k; x_l), the risk functional (5) reads

R[w_{k,m}] = (1/L) ‖q_{k,m} − Φ w_{k,m}‖²,   (19)

where q_{k,m} = [y_{1,k,m}, ..., y_{L,k,m}]^T ∈ R^L and Φ ∈ R^{L×|K|} is a Vandermonde-like matrix with entries φ_κ(x_l), ∀κ ∈ K, ∀x_l ∈ D_{k,m}. The optimal parameter set, minimizing the risk functional, is readily found as

w*_{k,m} = Φ⁺ q_{k,m},   (20)

where Φ⁺ = (Φ^T Φ)⁻¹ Φ^T is the Moore–Penrose pseudo-inverse of Φ.
Since the regression matrix Φ is the same for all k, m, its pseudo-inverse Φ⁺ is computed only once, and the above calculation is readily vectorized by stacking the data of the different datasets D_{k,m} columnwise, thus obtaining the parameter sets for all time points and port variables simultaneously. On the other hand, such a simple model structure may hinder the applicability of this method to high-dimensional problems. Since the regression problem needs to be overdetermined w.r.t. the number of unknowns |K|, typically at least L = 2|K| training samples are used, which rapidly leads to the well-known ''curse of dimensionality'' as the expansion order and/or the number of uncertain design parameters increases.
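A compact sketch of this vectorized solve follows. Probabilists' Hermite polynomials are assumed for standardized Gaussian inputs, and the outputs are random placeholders; all sizes are toy values.

```python
import numpy as np
from itertools import product
from math import factorial
from numpy.polynomial.hermite_e import hermeval

d, p, L, KM = 3, 2, 60, 5              # parameters, order, samples, outputs
rng = np.random.default_rng(3)
x = rng.standard_normal((L, d))        # standardized Gaussian inputs

# Multi-index set K with total degree <= p, as in (18)
K_set = [k for k in product(range(p + 1), repeat=d) if sum(k) <= p]

def basis_matrix(xs):
    # Vandermonde-like matrix of normalized Hermite products, per (17)
    cols = []
    for kappa in K_set:
        col = np.ones(len(xs))
        for i, ki in enumerate(kappa):
            c = np.zeros(ki + 1)
            c[ki] = 1.0
            col *= hermeval(xs[:, i], c) / np.sqrt(factorial(ki))
        cols.append(col)
    return np.column_stack(cols)

Phi = basis_matrix(x)
Q = rng.standard_normal((L, KM))       # outputs q_{k,m} stacked columnwise
W = np.linalg.pinv(Phi) @ Q            # all coefficient sets at once, per (20)
```

One pseudo-inverse of Phi serves every column of Q, which is exactly why the non-sparse PCE scales gracefully with the number of outputs and time points.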
In order to mitigate the above limitation, sparse PCEs have been proposed [17], [19]. These techniques exploit the ''sparsity-of-effects'' principle, meaning that most of the coefficients β_κ^{k,m} in (16) are actually negligible [18]. This suggests the adaptive identification of a small subset K̃ ⊂ K of basis functions and corresponding coefficients, with |K̃| ≪ |K|. However, the subset K̃ typically differs for each output variable and time point. Hence, a separate regression problem must be set up and solved for each of these variables, with a detrimental impact on the efficiency when dealing with multiple and/or time-dependent outputs.
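The adaptive identification of a sparse subset can be illustrated with a simple greedy scheme. This is an orthogonal-matching-pursuit-style stand-in, not the actual hybrid LAR algorithm of [19], and the regression matrix and true coefficients are synthetic.

```python
import numpy as np

def sparse_select(Phi, y, max_terms, tol=1e-8):
    """Greedy basis selection (OMP-style), a crude stand-in for LAR [19]."""
    active, r = [], y.copy()
    for _ in range(max_terms):
        corr = np.abs(Phi.T @ r)
        corr[active] = -1.0                     # skip already-selected columns
        active.append(int(np.argmax(corr)))
        # Re-fit on the active set and update the residual
        beta, *_ = np.linalg.lstsq(Phi[:, active], y, rcond=None)
        r = y - Phi[:, active] @ beta
        if np.linalg.norm(r) < tol * np.linalg.norm(y):
            break
    return active, beta

# Hypothetical regression matrix with only 3 truly active basis functions
rng = np.random.default_rng(4)
Phi = rng.standard_normal((80, 40))
beta_true = np.zeros(40)
beta_true[[2, 11, 30]] = [1.5, -2.0, 0.8]
y = Phi @ beta_true
active, beta = sparse_select(Phi, y, max_terms=10)
```

With noiseless data the residual collapses once the few truly active columns are found, which is the ''sparsity-of-effects'' behavior the adaptive schemes rely on.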

APPENDIX B LS-SVM REGRESSION
Given again a set of training samples D_{k,m}, we now look for the best set of parameters w_{k,m} of the LS-SVM regression in the primal space [22], which reads

M̃_m(t_k; x, w_{k,m}) = w_{k,m}^T φ(x) + b_{k,m},   (21)

where w_{k,m} = [w_{k,m}^1, ..., w_{k,m}^D]^T is a vector collecting the regression coefficients, b_{k,m} is a bias term, and φ(x) = [φ_1(x), ..., φ_D(x)]^T is a vector collecting the set of basis functions, such that φ(x): R^d → R^D. The risk minimization problem (5) becomes the constrained optimization

min (1/2) w_{k,m}^T w_{k,m} + γ_{k,m} (1/2) Σ_{l=1}^{L} e_l²
s.t. y_{l,k,m} = w_{k,m}^T φ(x_l) + b_{k,m} + e_l, l = 1, ..., L,   (22)

where the loss function combines the least-squares errors e_l² with a Tikhonov regularization. The parameter γ_{k,m} provides a trade-off between model flatness and accuracy on the available training samples, thus reducing the overfitting. The above formulation of the LS-SVM in the primal space is equivalent to ridge regression. By using the Lagrangian, the solution of the optimization problem (22) can be recast in terms of the following linear system of equations:

Σ_{l=1}^{L} α_{l,k,m} = 0,
Σ_{j=1}^{L} α_{j,k,m} K(x_j, x_l) + b_{k,m} + α_{l,k,m}/γ_{k,m} = y_{l,k,m}, l = 1, ..., L,   (23)

where the coefficients α_{l,k,m}, for l = 1, ..., L, are the Lagrange multipliers associated with the LS-SVM model for the output y_m at the time instant t_k, whereas K(x, x') = ⟨φ(x), φ(x')⟩ is the so-called ''kernel function'', such that K(·, ·): R^d × R^d → R. With the kernel function K(x, x'), the linear system (23) is recast in matrix form:

[ 0   1^T                  ] [ b_{k,m} ]   [ 0       ]
[ 1   Ω_{k,m} + I/γ_{k,m}  ] [ α_{k,m} ] = [ y_{k,m} ],   (24)

where α_{k,m} = [α_{1,k,m}, ..., α_{L,k,m}]^T ∈ R^L, y_{k,m} = [y_{1,k,m}, ..., y_{L,k,m}]^T ∈ R^L, 1^T = [1, ..., 1] ∈ R^L, I ∈ R^{L×L} is the identity matrix, and Ω_{k,m} ∈ R^{L×L} is the kernel matrix with elements Ω_{ij}^{k,m} = K(x_i, x_j; θ_{k,m}), ∀x_i, x_j ∈ D_{k,m} with i, j = 1, ..., L, and in which θ_{k,m} is a set of hyper-parameters characterizing the kernel function.
By solving (24), we obtain the LS-SVM formulation in the dual space:

M̃_m(t_k; x, w_{k,m}) = Σ_{l=1}^{L} α_{l,k,m} K(x_l, x) + b_{k,m},   (25)

where the regression coefficients α_{l,k,m}, the bias term b_{k,m}, and the kernel hyper-parameters θ_{k,m} need to be computed for each set of training pairs D_{k,m}, i.e., for each output y_m and time point t_k.
As opposed to the PCE, the LS-SVM regression in the dual form is a non-parametric technique in which the number of coefficients α_{l,k,m} to be estimated equals the number L of training samples and is completely independent of the number d of uncertain design parameters. Thanks to the kernel function K(·, ·), the dual-space formulation does not require an explicit definition of the basis functions φ(x). This is the so-called ''kernel trick''.
In this paper, we use the Gaussian radial basis function (RBF) kernel

K(x, x'; θ_{k,m}) = exp(−‖x − x'‖² / σ_{k,m}²),   (26)

where θ_{k,m} = σ_{k,m} is a single hyper-parameter tuned according to the training samples.
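Putting the dual system (24), the dual model (25), and the RBF kernel (26) together, a minimal numerical sketch of the LS-SVM regression could read as follows. The response, hyper-parameters, and sizes are hand-picked placeholders (no cross-validation is performed).

```python
import numpy as np

rng = np.random.default_rng(5)
L, d = 40, 2
x = rng.uniform(-1, 1, (L, d))
y = np.sin(3 * x[:, 0]) * np.cos(2 * x[:, 1])   # hypothetical smooth response

def rbf(A, B, sigma):
    # Gaussian RBF kernel (26): K(x, x') = exp(-||x - x'||^2 / sigma^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

gamma, sigma = 100.0, 0.8               # hyper-parameters, here fixed by hand
Omega = rbf(x, x, sigma)

# Dual system (24): [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y]
A = np.zeros((L + 1, L + 1))
A[0, 1:] = 1.0
A[1:, 0] = 1.0
A[1:, 1:] = Omega + np.eye(L) / gamma
sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
b, alpha = sol[0], sol[1:]

def predict(x_new):
    # Dual-space model (25): sum_l alpha_l K(x_l, x_new) + b
    return rbf(x_new, x, sigma) @ alpha + b
```

Note that the number of unknowns (L + 1) is set by the training set size, not by d, which is the non-parametric property exploited in the second application example.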