Conservative Gaussian Process Models for Uncertainty Quantification and Bayesian Optimization in Signal Integrity Applications

Surrogate modeling is being increasingly adopted in signal and power integrity analysis to assist design exploration, optimization, and uncertainty quantification (UQ) tasks. In this scenario, machine learning methods are attracting an ever-growing interest over alternative and well-consolidated techniques due to their data-driven nature. However, an open issue is to properly assess the trustworthiness of predictions when generalizing beyond training data. Among various machine learning tools, Gaussian process regression (GPR) has the notable feature of providing an estimate of the prediction uncertainty (or confidence) due to the lack of data. Nevertheless, the uncertainty introduced by the estimation of the Gaussian process parameters, which is part of the training process, is typically not accounted for. In this article, we introduce improved GPR formulations that take into account the additional uncertainty related to the estimation of (some of) the Gaussian process parameters, thereby providing a more accurate estimate of the actual prediction confidence. Furthermore, the advocated framework is extended to UQ and Bayesian optimization (BO) settings. The technique is applied to two test cases concerning the analysis of crosstalk in a transmission line network and of the frequency-domain response of a microstrip line with a ground plane discontinuity.

Manuscript received 1 February 2024; revised 10 April 2024; accepted 14 April 2024. Date of publication 17 April 2024; date of current version 16 August 2024. Recommended for publication by Associate Editor X. Chen upon evaluation of reviewers' comments.
The author is with the EMC Group, Department of Electronics and Telecommunications, Politecnico di Torino, 10129 Turin, Italy (e-mail: paolo.manfredi@polito.it).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCPMT.2024.3390402.
Digital Object Identifier 10.1109/TCPMT.2024.3390402

Among the abovementioned methods, the polynomial chaos expansion is mainly limited by the model complexity, which does not scale favorably with the number of independent design parameters. Moreover, the number of required training samples is also somewhat related to the input dimensionality [17], and may therefore become intractable. Neural networks are also very data hungry, and the related approaches require a massive amount of simulations of the actual model to achieve a reasonable training accuracy. This issue can be mitigated by leveraging suitable knowledge-based networks [22] or multiple neural networks trained at different levels of fidelity [23], when applicable. Not surprisingly, neural networks turn out to be effective mainly for those tasks for which huge datasets are available, e.g., image processing. Furthermore, the selection of an optimal neural network architecture, i.e., the number of layers and hidden neurons, and the activation functions, is rather arbitrary and case-dependent, and it mostly relies on experience or trial and error. As such, this approach is more suitable for models that can be reused many times, rather than for one-off simulations. Indeed, a lot of manual tuning is typically involved to obtain an acceptable model, which makes neural-network-based methods rather cumbersome and inflexible.
On the other hand, kernel-based machine learning methods, such as support vector machines [24] and Gaussian process regression (GPR) [25], are very effective in this regard. They are characterized by a much simpler structure, in which the key ingredient is the underlying kernel. This is usually chosen upfront, as there exist some standard kernels with excellent generalization performance. Alternatively, an optimal kernel can be selected from a pool of candidates during the training phase. Kernels are typically characterized by some free "hyperparameters," which are optimized during the training phase (similar to the weights in neural networks). The simpler structure of kernel-based methods does not imply lower modeling capabilities. Quite the opposite: it has been shown that GPR models are equivalent to infinitely wide fully connected neural networks [26]. Support vector machines and GPR were successfully used for high-dimensional UQ, e.g., in [7], [9], [10], and [11].
One of the main open questions related to surrogate models is their prediction accuracy. This is usually assessed based on the available training data or, more appropriately, on a (small) additional set of validation data that has not been considered for training. Nevertheless, very little information is available as to the generalization beyond the observed data. In this regard, one of the attractive features of GPR is that the model inherently embeds an estimate of the prediction uncertainty outside the training dataset. In the context of Bayesian optimization (BO) [27], this information is used to drive the acquisition of additional observations in an iterative scheme [28], [29]. GPR was also applied to UQ in [10] and [11], in which the single-point prediction uncertainty was propagated to aggregate UQ estimates (e.g., statistical moments and distribution functions).
However, the estimated prediction confidence is rigorous only as long as prior information on the Gaussian process is fully available, which is virtually never the case in practice. In particular, the additional uncertainty introduced by the estimation of the GPR model parameters is not accounted for [30]. As a result, classical and naive implementations of GPR only provide a rough (typically, overconfident) estimation of the actual prediction uncertainty.
In this article, the GPR framework is extended by improving the accuracy of the prediction confidence, with targeted application to signal integrity analysis. With some relatively simple corrections of the posterior covariance [31], the uncertainty in the estimation of (some of) the prior parameters is accounted for, which naturally leads to more robust surrogates that embed a more accurate (i.e., conservative) estimation of the prediction confidence. The enhanced GPR models are then used in the context of UQ and BO.
The present article builds upon the preliminary work in [32], which presented the general idea for increasing the conservativity of GPR models. In [32], a single test case with a single output was considered. In the present article, the theory and analysis are complemented and extended with: 1) detailed information on the necessary corrections to increase the conservativity of the model prediction uncertainty; 2) extension of the framework to BO; 3) investigation of additional test cases with a larger number of independent parameters. Moreover, the combination with principal component analysis (PCA) is adopted to address problems with multiple and/or time- or frequency-dependent outputs.
The remainder of the article is organized as follows. Section II reviews the classical GPR framework. Section III introduces the necessary corrections to make the GPR models more conservative and account for the estimation of prior parameters. The improved GPR models are applied in the context of BO and UQ in Section IV. Sections V and VI present two application examples, whereas conclusions are drawn in Section VII.

II. STANDARD GPR MODELING
We start by considering a generic system in the form of

y = M(x)   (1)

with M : X ⊆ R^d → R. This is a shorthand notation to describe an arbitrary computational model that, given a configuration of design parameters x ∈ X, provides a given output quantity of interest (QoI) y. For example, x could be a set of optimizable or uncertain design variables (geometrical or material parameters and circuit elements), and y the result of a circuit or electromagnetic simulation (e.g., a crosstalk voltage, an absorbed power, or a scattering parameter).
In surrogate modeling, the goal is to approximate (1) with an accurate though inexpensive mathematical representation that mimics the actual computational model. To begin with, we assume the QoI to be scalar. Clearly, especially in the context of UQ, the QoI may be an entire transient or frequency-domain response. Hence, we later relax this restriction.
The fundamental assumption of GPR (also called Kriging, especially in geostatistics, where the method originated [33]) is that the functional relationship (1) can be assimilated to a specific realization of a Gaussian process called prior, i.e.,

M(x) ≈ Y(x) ~ GP(µ(x), σ^2 r(x, x′))   (2)

where µ(x) is the prior mean function, or trend, σ^2 r(x, x′) is the covariance function, or kernel, and σ^2 is its variance [25].

A. Definition of the Prior
Usually, the prior trend is expressed as a linear combination of basis functions, i.e.,

µ(x) = Σ_{k=1}^{K} β_k h_k(x).   (3)

The basis functions h_k are often polynomials of increasing order ("universal Kriging"), with a constant trend and a polynomial chaos expansion being notable special cases in the so-called "ordinary Kriging" and "polynomial-chaos-based Kriging" [34], respectively. For a given amount of training data, the latter was shown to perform better than, or at least as well as, plain polynomial chaos expansion and traditional GPR. Furthermore, the kernel is often described by a stationary correlation function r(x, x′) = r(x − x′ | θ), which depends only on the difference between x and x′ and on a set of lengthscales θ = (θ_1, ..., θ_d). The lengthscales describe the degree of correlation along each input dimension, which in turn defines the smoothness of the function in that dimension. These are the so-called hyperparameters of the GPR model. The squared-exponential and Matérn 5/2 covariances are popular kernel functions (see [25], [35]). The kernel is said to be isotropic if the same lengthscale is considered for each dimension, or anisotropic otherwise.
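For concreteness, the two correlation functions mentioned above can be sketched as follows. This is a minimal Python/NumPy illustration (the paper's own implementation is in MATLAB, and the function names here are illustrative):

```python
import numpy as np

def sq_exp(x1, x2, theta):
    # anisotropic squared-exponential correlation r(x - x' | theta)
    d = (np.asarray(x1) - np.asarray(x2)) / np.asarray(theta)
    return np.exp(-0.5 * np.dot(d, d))

def matern52(x1, x2, theta):
    # anisotropic Matern 5/2 correlation
    d = (np.asarray(x1) - np.asarray(x2)) / np.asarray(theta)
    h = np.sqrt(np.dot(d, d))
    return (1.0 + np.sqrt(5.0) * h + 5.0 * h**2 / 3.0) * np.exp(-np.sqrt(5.0) * h)
```

Both functions equal 1 at x = x′ and decay with the (lengthscale-weighted) distance; an isotropic kernel simply uses the same entry of theta for every dimension.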

B. Classical Prediction
If the prior trend and kernel are fully known a priori, the realization that best represents the function (1) is inferred by collecting a certain number of observations {(x_l, y_l)}_{l=1}^{L}, with y_l = M(x_l) (the "training samples"), and applying Bayesian inference. This leads to the best linear unbiased prediction of y at unseen points [31], which is provided by the posterior trend [35], [36]

ŷ(x) = h(x)^T β + r(x)^T R^{-1} (y − H β)   (4)

where:
• y = (y_1, ..., y_L)^T ∈ R^L is the vector of training observations;
• R ∈ R^{L×L} is the correlation matrix of the training samples, with entries R_{lm} = r(x_l, x_m);
• r(x) = (r(x_1, x), ..., r(x_L, x))^T ∈ R^L is the vector of cross-correlations between the training samples and the prediction point;
• H ∈ R^{L×K} is the matrix of the trend basis functions evaluated at the training samples, with entries H_{lk} = h_k(x_l);
• h(x) = (h_1(x), ..., h_K(x))^T ∈ R^K is the vector of the trend basis functions evaluated at the prediction point.

Without loss of generality, we assume the training observations to be noiseless. The framework is readily extended, with minimal modifications, to the case of noisy data [25], [35].
One important feature of this Bayesian approach is that it allows assigning quantitative information on the prediction uncertainty due to the limited amount of information that is used for the inference. This is expressed by the function

ς_0(x, x′) = r(x, x′) − r(x)^T R^{-1} r(x′)   (5)

which describes the posterior correlation between predictions at two different points. Hence, the GPR model (posterior) is a Gaussian process with mean function (4) and covariance function c_0(x, x′) = σ^2 ς_0(x, x′), i.e.,

Ŷ_0(x) ~ GP(ŷ(x), c_0(x, x′)).   (6)

In the above equations, the subscript "0" is used to differentiate this "primitive" covariance model from the modifications that will be introduced later. We shall label this model as "GPR-0," indicating a model with the lowest confidence level. This corresponds to the most standard and popular model considered in the literature and available in commercial tools (e.g., MATLAB). The quantity c_0(x, x) provides the variance of the prediction. The variance collapses to zero at the training samples, as (4) provides an exact interpolation of the observations [31]. Since the posterior distribution remains Gaussian, knowing the prediction variance allows obtaining rigorous confidence bounds. For instance, we can expect the true value of y to lie with 95% probability within the interval [ŷ(x) − 1.96 σ √ς_0(x, x), ŷ(x) + 1.96 σ √ς_0(x, x)].
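The interpolation property and the collapse of the variance at the training samples can be verified with a small sketch. The snippet below (illustrative Python/NumPy, with a known zero prior trend so that only the data-driven correction term of (4) remains, and a 1-D squared-exponential kernel) is not the paper's MATLAB implementation:

```python
import numpy as np

def sq_exp(x1, x2, theta=1.0):
    return np.exp(-0.5 * ((x1 - x2) / theta) ** 2)

# training data: 1-D inputs and noiseless observations y_l = M(x_l)
X = np.array([4.0, 5.0, 6.0])
y = np.sin(X)

R = sq_exp(X[:, None], X[None, :])   # correlation matrix of the training samples
Rinv = np.linalg.inv(R)

def predict_gpr0(x, sigma2=1.0):
    r = sq_exp(X, x)                      # cross-correlations r(x)
    mean = r @ Rinv @ y                   # posterior trend with a zero prior trend
    var = sigma2 * (1.0 - r @ Rinv @ r)   # c0(x, x), cf. (5) with r(x, x) = 1
    return mean, max(var, 0.0)

m, v = predict_gpr0(5.0)     # at a training point: exact interpolation, zero variance
m2, v2 = predict_gpr0(4.5)   # away from the data: nonzero prediction variance
```

At x = 5.0 the prediction reproduces the observation and the variance vanishes, while between training points the variance is strictly positive, exactly as stated above.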

C. Estimation of the Prior Parameters
Unfortunately, the trend coefficients β = (β_1, ..., β_K)^T and the kernel hyperparameters (σ^2, θ) are in practice never known a priori, and they are typically left as free parameters to be estimated based on the training data. This allows better adapting the GPR model to the problem at hand.
The trend coefficients are estimated using a generalized least-squares estimate, leading to [35], [36]

β̂ = (H^T R^{-1} H)^{-1} H^T R^{-1} y.   (7)

The kernel parameters are instead estimated by minimizing a suitable objective function F(σ^2, θ), i.e.,

(σ̂^2, θ̂) = argmin_{σ^2, θ} F(σ^2, θ).   (8)

Typical objective functions are the negative log-likelihood or a suitable (e.g., leave-one-out) cross-validation error [25], [35].
Once an estimate of the prior parameters (β̂, σ̂^2, θ̂) is available, they are plugged into (4) and (5) to provide the prediction and the corresponding uncertainty. However, this approach neglects the uncertainty that is introduced by the estimation of the prior parameters themselves, thereby leading to a likely overestimation of the prediction confidence [30].
In this article, we show that, by means of relatively simple modifications to the posterior covariance, we can account for the uncertainty in the estimate of, at least, β and σ^2. Accounting also for the uncertainty in the estimation of θ would instead make the framework much more complex [31].
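The estimation step of Section II-C can be sketched as follows: the generalized least-squares estimate of β, a profiled maximum-likelihood estimate of σ^2, and a crude grid search over an isotropic lengthscale minimizing the negative log-likelihood. This is an illustrative Python/NumPy sketch on synthetic 1-D data (the toolbox used in the paper performs a proper gradient-based optimization instead):

```python
import numpy as np

def sq_exp(x1, x2, theta):
    return np.exp(-0.5 * ((x1 - x2) / theta) ** 2)

X = np.linspace(0.0, 5.0, 8)
y = np.sin(X) + 0.3 * X
H = np.column_stack([np.ones_like(X), X])   # constant + linear trend basis
L_, K_ = len(X), H.shape[1]

def neg_log_likelihood(theta):
    R = sq_exp(X[:, None], X[None, :], theta) + 1e-8 * np.eye(L_)
    Rinv = np.linalg.inv(R)
    # generalized least-squares trend coefficients, cf. eq. (7)
    beta = np.linalg.solve(H.T @ Rinv @ H, H.T @ Rinv @ y)
    res = y - H @ beta
    sigma2 = (res @ Rinv @ res) / L_        # profiled ML estimate of the kernel variance
    _, logdet = np.linalg.slogdet(R)
    return 0.5 * (L_ * np.log(sigma2) + logdet), beta, sigma2

# crude grid search over an isotropic lengthscale, cf. eq. (8)
grid = np.linspace(0.2, 3.0, 30)
best_theta = min(grid, key=lambda t: neg_log_likelihood(t)[0])
nll, beta_hat, sigma2_hat = neg_log_likelihood(best_theta)
```

The same profiled likelihood is what MATLAB's toolbox minimizes internally; only the optimizer differs.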

D. Multioutput Systems
The basic framework works with scalar QoIs, as GPR inherently provides a single-output model. On the other hand, the QoI is oftentimes a function of some sweep variable, e.g., time or frequency. Moreover, several concurrent outputs may be of interest, e.g., in multiport systems. A naive solution is to train a separate GPR model for each time point, frequency, and/or port variable of interest. However, this would quickly become inefficient because of the massive amount of models to be trained.
To circumvent this problem, multioutput formulations of GPR were proposed in the literature [37], [38], [39], [40]. Another viable solution, which we will adopt in this article, is to compress the multioutput data using PCA [9] or, similarly, proper orthogonal decomposition [41], [42]. This limits the separate models to be trained to the principal components only, thereby effectively reducing their number by orders of magnitude. The interested reader is referred to [9] and [10] for further details on the use of PCA in conjunction with surrogate models.
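The compression step can be sketched with PCA via the SVD of the centered data matrix. In the illustrative Python/NumPy snippet below, the synthetic "frequency responses" are driven by two latent parameters, so two principal components suffice; in the actual workflow, one GPR model would then be trained per retained component:

```python
import numpy as np

# synthetic multioutput data: N samples of a response over F frequency points
rng = np.random.default_rng(0)
freq = np.linspace(0.0, 1.0, 200)
N = 50
a = rng.uniform(0.5, 1.5, N)
b = rng.uniform(0.0, 2.0, N)
# responses lie in a two-dimensional subspace -> effectively low-rank data
Y = (a[:, None] * np.sin(2 * np.pi * freq)[None, :]
     + b[:, None] * np.cos(2 * np.pi * freq)[None, :])

# PCA via the SVD of the centered data matrix
mu = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)
n_comp = 2                                    # retain the dominant components
scores = U[:, :n_comp] * s[:n_comp]           # one scalar GPR model per column
Y_rec = mu + scores @ Vt[:n_comp, :]          # reconstruction from compressed scores
err = np.max(np.abs(Y_rec - Y))
```

Instead of 200 frequency-wise models, only 2 scalar models (one per column of `scores`) need to be trained, and the full response is recovered through the retained basis `Vt`.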

III. CONSERVATIVE GP MODELS
In this section, we outline the modifications of the posterior covariance that are required to account for the uncertainty in the estimation of β first and, in a second step, also of σ^2. An important fact is that the posterior prediction remains the one provided by (4).
To account for the uncertainty in the estimation of β, the posterior correlation (5) needs to be modified with an additional term, leading to [31], [36]

ς_1(x, x′) = ς_0(x, x′) + u(x)^T (H^T R^{-1} H)^{-1} u(x′)   (9)

where

u(x) = H^T R^{-1} r(x) − h(x).   (10)

The posterior distribution remains Gaussian, with covariance function c_1(x, x′) = σ̂^2 ς_1(x, x′). We shall label this model "GPR-1," with

Ŷ_1(x) ~ GP(ŷ(x), c_1(x, x′)).   (11)

To account also for the uncertainty in the estimation of σ^2, we must use a Bayesian setting in which the kernel variance is inferred from the available data. In this case, the posterior distribution is no longer Gaussian, but it rather becomes a Student's t-distribution with ν = L − K degrees of freedom [43] and scale function [31]

c̃_2(x, x′) = σ̂^2 ς_1(x, x′)   (12)

where

σ̂^2 = (y − H β̂)^T R^{-1} (y − H β̂) / (L − K).   (13)

Recalling that the t-distribution is a sort of "wider" normal distribution, which approaches the latter for ν → ∞, we can observe that the prediction uncertainty is higher if the number of training samples L is low and/or the number of trend basis functions K is large, which is reasonable. The posterior covariance is in this case given by c_2(x, x′) = (ν/(ν − 2)) σ̂^2 ς_1(x, x′). We shall label this model "GPR-2," with

Ŷ_2(x) ~ T_ν(ŷ(x), c̃_2(x, x′))   (14)

where the above notation indicates a Student's t-process with ν degrees of freedom.

In summary, while GPR-0 only accounts for the prediction uncertainty resulting from the limited amount of data that is used to train the model, GPR-1 also accounts for the uncertainty introduced by the estimation of the trend coefficients β, and GPR-2 further accounts for the uncertainty due to the estimate of the kernel variance σ^2. As we will show, this leads to an increasing level of conservativity and to more accurate confidence information. It is important to remark that the proposed corrections to the posterior covariance are readily applied to any GPR formulation for which the prior trend can be cast as in (3), i.e., it is linear in its unknown coefficients β, and the kernel variance σ^2 can be factored out from the prior correlation function. This covers the vast majority of the formulations available in the literature.
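The increasing conservativity of the three covariance models can be illustrated numerically. The Python/NumPy sketch below (illustrative, not the paper's MATLAB code) builds an ordinary-Kriging model with a constant trend and evaluates the GPR-0, GPR-1, and GPR-2 prediction variances at a test point; by construction, the GPR-1 correction is nonnegative and the GPR-2 inflation factor ν/(ν − 2) further widens it:

```python
import numpy as np

def sq_exp(x1, x2, theta=1.0):
    return np.exp(-0.5 * ((x1 - x2) / theta) ** 2)

X = np.array([3.0, 4.0, 5.0, 6.0, 7.0])
y = X * (2.0 * np.cos(X) + np.sin(X))
H = np.ones((len(X), 1))              # constant trend (ordinary Kriging), K = 1
L_, K_ = len(X), 1
nu = L_ - K_                          # degrees of freedom of the t posterior

R = sq_exp(X[:, None], X[None, :]) + 1e-10 * np.eye(L_)
Rinv = np.linalg.inv(R)
A = H.T @ Rinv @ H                    # K x K matrix H^T R^-1 H
beta = np.linalg.solve(A, H.T @ Rinv @ y)
res = y - H @ beta
sigma2_hat = (res @ Rinv @ res) / nu  # variance estimate, cf. eq. (13)

def variances(x):
    r = sq_exp(X, x)
    h = np.ones(1)
    zeta0 = 1.0 - r @ Rinv @ r                  # GPR-0 correlation, cf. eq. (5)
    u = H.T @ Rinv @ r - h                      # cf. eq. (10)
    zeta1 = zeta0 + u @ np.linalg.solve(A, u)   # GPR-1 correction, cf. eq. (9)
    c0 = sigma2_hat * zeta0
    c1 = sigma2_hat * zeta1
    c2 = (nu / (nu - 2.0)) * sigma2_hat * zeta1  # GPR-2 covariance (needs nu > 2)
    return c0, c1, c2

c0, c1, c2 = variances(4.5)
```

The ordering c0 ≤ c1 ≤ c2 holds pointwise, which is exactly the increasing conservativity discussed above.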

IV. SURROGATE MODELING FOR BO AND UQ
Using the GPR model to surrogate (1) implies using (4) to generate model predictions, possibly combined with (5), (9), or (12) to assess the confidence of such predictions with an increasing level of conservativity. For a significance level of α (i.e., a confidence level of 1 − α), the upper and lower confidence bounds (UCB and LCB) are given by

UCB(x) = ŷ(x) + F^{-1}(1 − α/2) s(x)   (15)

and

LCB(x) = ŷ(x) − F^{-1}(1 − α/2) s(x)   (16)

respectively, where s(·) is a scale parameter that is

s(x) = σ̂ √ς_i(x, x)   (17)

for GPR-0 and GPR-1, respectively, with i = 0 and i = 1, or

s(x) = σ̂ √ς_1(x, x)   (18)

for GPR-2. Moreover, F^{-1}(·) is the inverse cumulative distribution function of the (normalized) posterior distribution, i.e., the standard normal inverse cumulative distribution function for GPR-0 and GPR-1, or the Student's t inverse cumulative distribution function with ν degrees of freedom for GPR-2.
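The bound computation reduces to a quantile lookup on the normalized posterior. The following illustrative Python/SciPy sketch computes two-sided bounds for both the Gaussian (GPR-0/GPR-1) and the Student's t (GPR-2) cases; for a small number of degrees of freedom, the t quantile is visibly wider than the Gaussian one:

```python
from scipy import stats

def bounds(y_hat, scale, alpha=0.05, nu=None):
    # two-sided (1 - alpha) confidence bounds around the GPR prediction:
    # Gaussian quantiles for GPR-0/GPR-1, Student's t quantiles for GPR-2
    if nu is None:
        q = stats.norm.ppf(1.0 - alpha / 2.0)
    else:
        q = stats.t.ppf(1.0 - alpha / 2.0, df=nu)
    return y_hat - q * scale, y_hat + q * scale

lcb_g, ucb_g = bounds(0.0, 1.0)        # GPR-0/GPR-1-style bounds (q ~ 1.96)
lcb_t, ucb_t = bounds(0.0, 1.0, nu=5)  # GPR-2-style bounds, e.g., nu = L - K = 5
```

With ν = 5 the 95% quantile is about 2.57 instead of 1.96, i.e., the GPR-2 interval is roughly 30% wider for the same scale, consistent with the "wider normal" interpretation of the t-distribution given in Section III.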

A. Application to BO
BO is a global optimization method that is suitable for objective functions that are expensive to evaluate [27]. Starting from a minimal set of observations, a GPR model is fit to the objective function and is iteratively refined by querying additional data samples. The acquisition is driven by the information on the prediction uncertainty [29]. Hence, having a more accurate estimate of the prediction uncertainty improves the optimization performance.
All selection strategies involve, at each iteration, the maximization (or minimization) of some "acquisition function" a(x). Hence, the next sampling point is selected as

x_{L+1} = argmin_x a(x)   (19)

which, for the sake of notation consistency, we cast as a minimization problem. Some popular acquisition functions are discussed in the following, all providing a trade-off between exploration and exploitation.

1) Lower Confidence Bound: This strategy selects the next sampling point as the location where the LCB (16) is minimum. For points with a similar prediction uncertainty, the LCB is more likely to be minimum in the neighborhood of the actual function minimum.
2) Probability of Improvement: The next sampling point is selected as the one yielding the largest probability of improving the current estimate. The corresponding acquisition function is mathematically defined as

PoI(x) = F((y_min − ϵ − ŷ(x)) / s(x))   (20)

where

y_min = min_x ŷ(x)   (21)

denotes the current estimate of the function minimum, i.e., the minimum of the posterior prediction, ϵ is a margin parameter (usually relative, e.g., ϵ = 0.001 · |y_min|), and F(·) is the cumulative distribution function of the (normalized) posterior distribution. In essence, (20) defines the probability of finding a value of y that is smaller than y_min − ϵ.
3) Expected Improvement: This acquisition function is designed to take into account not only the probability of improvement (PoI), but also its magnitude. The expected improvement (EI) is defined as [44]

EI(x) = (y_min − ŷ(x)) F(z) + s(x) φ(z)   (22)

for a Gaussian posterior distribution (GPR-0 and GPR-1), and

EI(x) = (y_min − ŷ(x)) F(z) + s(x) ((ν + z^2)/(ν − 1)) φ(z)   (23)

for a Student's t posterior distribution (GPR-2) [45], where φ(·) is the probability density function (pdf) of the (normalized) posterior distribution and

z = (y_min − ŷ(x)) / s(x)   (24)

is the standardized output minimum.
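The three acquisition functions can be sketched compactly; the snippet below is an illustrative Python/SciPy version (the Student's t EI variant follows the closed form reported in the BO literature for t posteriors, as cited above):

```python
from scipy import stats

def lcb(y_hat, s, alpha=0.05, nu=None):
    # lower-confidence-bound acquisition: minimize this directly
    dist = stats.norm if nu is None else stats.t(df=nu)
    return y_hat - dist.ppf(1.0 - alpha / 2.0) * s

def poi(y_hat, s, y_min, eps=0.0, nu=None):
    # probability of improvement over y_min - eps
    dist = stats.norm if nu is None else stats.t(df=nu)
    return dist.cdf((y_min - eps - y_hat) / s)

def ei(y_hat, s, y_min, nu=None):
    # expected improvement: Gaussian posterior (GPR-0/GPR-1) when nu is None,
    # Student's t posterior (GPR-2) otherwise
    z = (y_min - y_hat) / s
    if nu is None:
        return (y_min - y_hat) * stats.norm.cdf(z) + s * stats.norm.pdf(z)
    return ((y_min - y_hat) * stats.t.cdf(z, df=nu)
            + s * (nu + z**2) / (nu - 1.0) * stats.t.pdf(z, df=nu))
```

In a BO loop, each function would be evaluated on the GPR posterior mean and scale over a candidate grid (or via an inner optimizer, as discussed next), and the next sample placed at its optimum.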
It should be noted that the aforementioned acquisition functions are all analytical and rather well-behaved. Therefore, the minimization problem (19) is readily solved using some other standard optimization method, such as a genetic algorithm.
In Fig. 1, the concept of BO is illustrated with the aim of finding the minimum of the analytical function

f(x) = x(2 cos(x) + sin(x))   (25)

in the interval x ∈ [3, 7]. The actual function is shown in the top panel of Fig. 1 (blue solid line) together with the corresponding prediction (dashed black line) of a GPR model with a constant prior trend and squared-exponential kernel, trained with L = 3 samples at points x = {4, 5, 6} (blue circles). The red, yellow, and green lines are the 95% confidence bounds of GPR-0, GPR-1, and GPR-2, respectively. The magenta circle indicates the true function minimum (i.e., y_min = −0.1719), whereas the red asterisk is the current estimated minimum (i.e., the minimum of the GPR prediction). Fig. 1 (bottom) shows the LCB, PoI, and EI acquisition functions computed based on the GPR-2 posterior, normalized with respect to their maximum value for the ease of comparison. The next sampling point, selected as the location where the acquisition function is maximum, is denoted with a circle. It is noted that the LCB and EI prescribe to sample in the corner of the interval, where the prediction uncertainty is the highest, whereas the PoI suggests sampling a point that is much closer to the actual minimum. Nevertheless, any acquisition function leads to the correct result within a few iterations, as indicated by the results summarized in Table I, with PoI indeed providing a slightly faster convergence.

B. Application to UQ
GPR was also applied in the context of UQ, in which the prediction confidence was propagated from point-wise predictions to UQ measures, such as moments and distribution functions [10], [11]. Although some semianalytical results are available that require the integration of the posterior [10], for high-dimensional problems, the most viable and convenient approach is to use the GPR model to surrogate the actual system (1) in a Monte Carlo (MC) analysis.
In the above scenario, the GPR prediction is calculated for a (typically large) ensemble of samples {x*_i}_{i=1}^{N}. This leads to a set of N correlated random variables defining the model predictions at the above ensemble of points, which are described by the mean vector

ŷ* = h*^T β̂ + r*^T R^{-1} (y − H β̂)   (26)

and by the pertinent covariance matrix computed according to the posterior distribution. In (26), h* ∈ R^{K×N} and r* ∈ R^{L×N} are matrices with entries [h*]_{ki} = h_k(x*_i) and [r*]_{li} = r(x_l, x*_i), respectively. To obtain the covariance matrices, we first define the correlation matrices stemming from (5), (9), and (12), i.e.,

Σ_0 = R** − r*^T R^{-1} r*   (27)

for GPR-0,

Σ_1 = Σ_0 + U*^T (H^T R^{-1} H)^{-1} U*   (28)

for GPR-1, and

Σ_2 = Σ_1   (29)

for GPR-2, where R** ∈ R^{N×N} is a matrix with entries [R**]_{ij} = r(x*_i, x*_j) and

U* = H^T R^{-1} r* − h*.   (30)

Then, the covariance matrix is C_i = σ̂^2 Σ_i for GPR-0 and GPR-1, respectively, with i = 0 and i = 1, and C_2 = (ν/(ν − 2)) σ̂^2 Σ_2 for GPR-2. The covariance matrix for GPR-2 is well defined only for ν > 2, which requires a number of training samples

L > K + 2.   (31)

Based on the above definitions, the mean of the MC predictions (an estimate of the mean of the QoI) is a random variable whose expected value and variance are analytically computed from the mean vector (26) and the pertinent covariance matrix, as reported in [11]. Indeed, the results for the output mean in [11] readily apply regardless of the posterior distribution.
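The propagation of the point-wise posterior to an aggregate UQ estimate can be sketched as follows. This illustrative Python/NumPy snippet (zero prior trend and a known kernel variance, for brevity) evaluates the mean vector and the GPR-0 covariance matrix at an MC ensemble, and then obtains the variance of the MC mean estimate as (1/N^2) 1^T C 1:

```python
import numpy as np

def sq_exp(x1, x2):
    return np.exp(-0.5 * (x1 - x2) ** 2)

rng = np.random.default_rng(1)
X = np.array([0.0, 1.0, 2.0, 3.0])      # training inputs
y = np.cos(X)                           # noiseless observations
R = sq_exp(X[:, None], X[None, :]) + 1e-10 * np.eye(len(X))
Rinv = np.linalg.inv(R)
sigma2 = 0.5                            # assumed known kernel variance

Xs = rng.normal(1.5, 0.5, 200)          # MC ensemble {x*_i}
r_star = sq_exp(X[:, None], Xs[None, :])   # L x N cross-correlation matrix r*
R_ss = sq_exp(Xs[:, None], Xs[None, :])    # N x N matrix R**

mean_vec = r_star.T @ Rinv @ y          # prediction mean vector at the ensemble
C0 = sigma2 * (R_ss - r_star.T @ Rinv @ r_star)   # GPR-0 covariance matrix

# the MC estimate of the output mean is a random variable: (1/N) sum of
# correlated Gaussians, with variance (1/N^2) 1^T C0 1
N = len(Xs)
mu_est = mean_vec.mean()
var_of_mean = np.ones(N) @ C0 @ np.ones(N) / N**2
```

Replacing C0 with the GPR-1 or GPR-2 covariance matrix widens the resulting confidence interval on the estimated output mean, which is precisely the mechanism exploited in Tables III and IV below.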
Furthermore, the variance of the MC predictions (an estimate of the variance of the QoI) is also a random variable. Closed-form results for its expected value and variance were also derived in [11], yet the latter is valid only for a Gaussian posterior distribution, i.e., for GPR-0 and GPR-1 models. Nevertheless, empirical results suggest that assuming a Gaussian approximation for the distribution of both the posterior and the variance incurs a small error in the calculation of confidence bounds, while providing a useful assessment of the dispersion due to the limited amount of data used to train the GPR model.
In summary, the outlined procedure allows assigning confidence bounds to the MC prediction of the output mean and output variance, respectively, obtained when the GPR model is used to surrogate the actual model (1). When the posterior covariance of GPR-0 is considered, the confidence bounds only account for the uncertainty due to the limited availability of training data. Using the covariance of GPR-1 and GPR-2 allows accounting also for the uncertainty introduced by the estimation of the trend coefficients β and kernel variance σ^2, respectively. Confidence bounds for higher order moments and probability distributions are instead computed numerically by generating a large number of posterior realizations [32].

C. Implementation
The outlined framework is implemented in MATLAB by leveraging the Statistics and Machine Learning Toolbox [46] for training the GPR models. The toolbox estimates the trend coefficients via (7) and the kernel hyperparameters using maximum likelihood. The provided predictive covariance is that of GPR-0, i.e., (5). For GPR-1 and GPR-2 models, we correct the posterior covariance as outlined in Section III. For the latter, only the estimated lengthscales are retained, whereas the estimate of the kernel variance is already embedded in the posterior distribution.

V. APPLICATION EXAMPLE #1: MULTICONDUCTOR TRANSMISSION LINE NETWORK
The first application test case considers the interconnect of Fig. 2, consisting of three sections of coupled embedded transmission lines, some of which are terminated by nonlinear inverters at the far end [47]. The cross section of the multiconductor transmission line sections is depicted at the bottom of the figure. The nominal values of the geometrical parameters and element values are provided in the second column of Table II. The central line of the first section is driven by a trapezoidal voltage pulse with an amplitude of 5 V, a duration of 1 ns, and rise/fall times of 100 ps.
The QoI is taken as the maximum absolute value of the far-end crosstalk occurring over time at the 0.5-pF termination (i.e., C_c3). We perform a UQ task first, and then we optimize the parameters to minimize crosstalk. The simulations are carried out using HSPICE [48], which, for this example, implements the map (1) between the circuit parameters and the QoI.

A. UQ of Maximum Crosstalk
For this analysis, the uncertainty is provided by d = 9 parameters, namely the trace widths, gaps, and distances from the ground plane in each of the three sections, whose nominal values are indicated in the first nine rows of Table II. These parameters are assumed to have a Gaussian distribution with a 10% standard deviation from the nominal value, and the same variation is considered for all traces within the same section (i.e., for all parameters in the same row of Table II).
For the GPR models, we consider a polynomial-chaos-Kriging model [34] with a second-order Hermite expansion for the trend and an isotropic squared-exponential kernel. As a rule of thumb, it is often recommended to use a number of training samples from 5 to 10 times the number of uncertain parameters (see [11], [31]). Since the trend features K = 55 expansion terms, we consider L = {60, 70, 80} training samples so that condition (31) is also fulfilled. The training points are generated randomly using a uniform Latin hypercube sampling scheme in the hyperinterval spanning three times the

TABLE II
NOMINAL VALUES OF THE INTERCONNECT PARAMETERS, TOGETHER WITH TOLERANCE AND RANGE OF VARIATION FOR UQ AND BO TASKS

TABLE III
MEAN AND VARIANCE OF THE MAXIMUM CROSSTALK VOLTAGE
standard deviation (i.e., ±30%) around the nominal value of the parameters.
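The training-point generation described above can be sketched as follows, assuming SciPy's quasi-MC module as a stand-in for the sampler actually used (the nominal values here are placeholders, not the values of Table II):

```python
import numpy as np
from scipy.stats import qmc

d = 9                                  # number of uncertain parameters
Ltrain = 60                            # smallest training set size considered
nominal = np.ones(d)                   # placeholder nominal parameter values

sampler = qmc.LatinHypercube(d=d, seed=0)
u = sampler.random(n=Ltrain)           # stratified points in the unit hypercube
# map to +/-30% (three standard deviations) around the nominal values
X_train = qmc.scale(u, 0.7 * nominal, 1.3 * nominal)
```

The Latin hypercube stratification guarantees exactly one sample per stratum along each dimension, which gives a more uniform coverage of the hyperinterval than plain random sampling for the same budget.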
Table III provides the results for the first two statistical moments, i.e., mean and variance, of the maximum crosstalk voltage. The first row reports the result obtained with an MC simulation performed in HSPICE for N = 1000 random configurations of the geometrical parameters. The remaining rows provide the 2-sigma intervals of the moments predicted with the GPR model based on the same ensemble of MC samples, computed for different conservativity levels of the prediction uncertainty and for increasing training set size. It is observed that the confidence intervals of GPR-0 are very narrow and do not include the reference values, even for the largest training dataset with L = 80 samples. On the contrary, the estimates of the GPR-1 and GPR-2 models are more conservative, yet only the latter always include the correct result. As the number of training samples is increased, the confidence interval of GPR-2 shrinks around the correct result.
This example shows that the classical model of the posterior covariance is prone to an underestimation of the prediction uncertainty, while the introduced corrections lead to a more accurate estimation. As mentioned before, confidence bounds can be obtained numerically also for the pdf of the QoI. Results in this regard were provided in [32] and are therefore not repeated here.

Fig. 3. Distribution of the upper and lower bounds of the 2-sigma confidence interval of the maximum crosstalk mean (top) and variance (bottom) computed over 100 independent runs for the GPR-0, GPR-1, and GPR-2 models (red, yellow, and green histograms, respectively) trained with L = 60 samples, and comparison with the reference MC values (blue bars).

TABLE IV
A POSTERIORI VERIFICATION OF THE ACTUAL PREDICTION CONFIDENCE
So far, the results were obtained with a single configuration of the training dataset for a given size. To evaluate the dispersion of the performance across different training datasets, the analysis is repeated for 100 independent runs for each training set size (i.e., L = {60, 70, 80}). The aim of this analysis is to assess whether the better conservativity holds for many random configurations of the training samples. Fig. 3 shows the distribution of the upper and lower 2-sigma bounds of the predicted mean (top) and predicted variance (bottom) obtained with the smallest training set size (i.e., L = 60) and the three different conservativity levels (left, center, and right). In particular, the lighter and darker histograms show the distribution of the LCB and UCB, respectively. The vertical blue line indicates the reference value from the MC analysis (see Table III).
It is interesting to note that for GPR-0 (left plot, red histograms), the two distributions overlap completely, indicating that it is very likely that the reference MC value falls outside the confidence interval. The two distributions become increasingly separated for the more conservative GPR-1 and GPR-2 models (central and right plots, respectively). In particular, for GPR-2 (green histograms), the two distributions are well apart, with the reference value lying in the middle. This signifies that the confidence bounds almost always enclose the correct result, as their distributions rarely cross its value. The above conclusion is further corroborated by counting the rate at which the correct value falls within the 2-sigma confidence interval over the 100 runs. As indicated in Table IV, the rate is remarkably close to 95% for GPR-2, whereas it is much lower for GPR-0 and GPR-1. This indicates a severe overestimation of the prediction confidence, which turns out to be actually much lower than 95% for those models.
Finally, Fig. 4 compares the performance of the GPR-2 model against the popular polynomial chaos method based on least-angle regression [49], implemented via the UQLab toolbox [50] and using the same second-order Hermite expansion as the GPR trend. In particular, the boxplot assesses the dispersion across the 100 training datasets of the relative errors on the mean and variance with respect to the MC result, denoted with ϵ_µ and ϵ_σ, respectively. The red lines indicate the median error, whereas the bottom and top edges of the boxes are the 25th and 75th percentiles, respectively. The whiskers indicate the minimum and maximum values, while the red crosses are outliers. For the polynomial chaos method, the mean and variance are analytically derived from the model coefficients. For the GPR model, they are obtained via the analytical relations in [11].
It is noted that the two methods achieve a similar error for the mean, which is usually estimated with very good accuracy by most methods. For the variance, instead, the GPR model overall provides better accuracy, with a median error of about 3% for the datasets with 70 and 80 training samples, compared to about 10% obtained with polynomial chaos. The higher dispersion of the GPR-2 model trained with 60 samples reflects its higher and more conservative prediction uncertainty, while the median error is still slightly lower compared to polynomial chaos. These results confirm that GPR models provide comparable or better accuracy than polynomial chaos, an outcome that was also highlighted in [11] and [34].
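For reference, the relative errors ϵ_µ and ϵ_σ can be computed as follows. This is a generic sketch based on a surrogate's moment estimates and reference MC samples; in the article, the GPR and polynomial chaos moments are instead obtained analytically, as noted above.

```python
import numpy as np

def relative_moment_errors(mu_hat, var_hat, y_mc):
    """Relative errors of a surrogate's mean (mu_hat) and variance
    (var_hat) estimates with respect to Monte Carlo samples y_mc."""
    mu_ref = np.mean(y_mc)
    var_ref = np.var(y_mc, ddof=1)  # unbiased sample variance
    eps_mu = abs(mu_hat - mu_ref) / abs(mu_ref)
    eps_sigma = abs(var_hat - var_ref) / abs(var_ref)
    return eps_mu, eps_sigma
```

Repeating this over the 100 independent training datasets yields the error distributions summarized by the boxplots in Fig. 4.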

B. BO of Maximum Crosstalk
In this second scenario, we aim to minimize the far-end crosstalk with respect to all the 35 design variables indicated in Table II, which are now allowed to vary independently in the range indicated in the last column. To this end, we implement the BO scheme outlined in Section IV-A in conjunction with both GPR-0 and GPR-2 models.
It should be noted that the goal of this example is to assess the efficacy of the optimization scheme with a more conservative posterior covariance on a large input space, but some of the parameters considered may not be optimizable in practice (or, at least, not individually) due to other design or fabrication constraints. Moreover, even if the optimal design may lie in a corner of the design space, a full corner analysis would require testing 2^35 ≈ 3×10^10 configurations.
For the GPR models, we use an isotropic squared-exponential kernel and a constant trend, since a polynomial trend would feature an excessive number of terms in such a high-dimensional space. We start from a common initial dataset of L = 20 observations to initialize the GPR model, and we run the optimization scheme for 40 iterations using either the PoI or EI acquisition function.
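A GPR prior of this kind can be set up, for instance, with scikit-learn. This is only a sketch: the placeholder objective stands in for the SPICE crosstalk simulation, and normalize_y plays the role of the constant trend; the article's implementation is not based on this library.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

rng = np.random.default_rng(1)
dim, L = 35, 20                            # 35 design variables, 20 initial observations
X = rng.uniform(0.0, 1.0, size=(L, dim))   # normalized design samples (placeholder)
y = np.sin(X.sum(axis=1))                  # placeholder objective (stands in for SPICE runs)

# Isotropic squared-exponential (RBF) kernel with a single scalar length scale.
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Posterior mean and standard deviation at new candidate designs.
X_test = rng.uniform(0.0, 1.0, size=(5, dim))
mu, sigma = gpr.predict(X_test, return_std=True)
```

The posterior mean and standard deviation returned here are what the acquisition functions below operate on.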
Fig. 5 compares the evolution of the minimum predicted objective (maximum crosstalk, in volts) throughout the BO iterations obtained with the four combinations of the two GPR models and the two acquisition functions. It is observed that the EI acquisition function (red and purple curves) provides a faster and smoother convergence compared to PoI (yellow and blue curves) for a given model type. Moreover, for both acquisition functions, a better convergence is achieved by using the GPR-2 model instead of the GPR-0 one, especially in the first iterations. This is reasonable, since increasing the number of observations in turn increases the number of degrees of freedom in the GPR-2 posterior distribution, thereby reducing the difference in the covariance. In general, since GPR-2 provides a more accurate estimation of the prediction confidence, on which BO is highly reliant, we expect GPR-2 to always provide better, or at least similar, performance compared to the standard GPR-0 model. Hence, GPR-2 should be preferred over GPR-0, also in view of its limited additional computational complexity. Furthermore, Fig. 6 (top) compares the transient crosstalk response of the optimal design achieved with GPR-2 and EI (red line) against the spread of 1000 MC configurations drawn randomly within the design space (gray area). The comparison highlights that the maximum crosstalk over time of the optimal design is lower than that of any of the random MC configurations.
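For a minimization problem and a Gaussian posterior with mean mu and standard deviation sigma, the two acquisition functions admit well-known closed forms, sketched below (the exploration margin xi is an optional extra of this illustration, not a parameter from the article):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best, xi=0.0):
    """EI for minimization: expected amount by which the prediction
    improves on the current best observation y_best."""
    sigma = np.maximum(sigma, 1e-12)        # guard against division by zero
    z = (y_best - mu - xi) / sigma
    return (y_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def probability_of_improvement(mu, sigma, y_best, xi=0.0):
    """PoI for minimization: probability that the prediction beats y_best."""
    sigma = np.maximum(sigma, 1e-12)
    return norm.cdf((y_best - mu - xi) / sigma)
```

At each BO iteration, the next design to simulate is the candidate maximizing the chosen acquisition function; since both depend explicitly on sigma, a more accurate posterior covariance (as in GPR-2) directly affects the exploration behavior.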
Fig. 6 (bottom) further enhances the comparison, with the cyan circles indicating the maximum crosstalk level of all the random configurations, whereas the colored lines indicate the maximum crosstalk of the optimal designs achieved with the aforementioned four configurations of BO (i.e., GPR-0 and GPR-2 models with PoI and EI acquisition functions). It should be noted that, as opposed to Fig. 5, the results in Fig. 6 now refer to the actual maximum crosstalk achieved in the SPICE simulation, and not to the one predicted by the GPR model. It is observed that all the optimal designs exhibit a lower crosstalk level than any of the random configurations, and the overall best result is achieved by using GPR-2 in conjunction with the EI acquisition function (red line).

VI. APPLICATION EXAMPLE #2: MICROSTRIP WITH GROUND PLANE DISCONTINUITY
The second application example concerns the analysis of the signal integrity in a microstrip line with a discontinuity in the ground plane, shown in Fig. 7 [20], [51]. The free design parameters are the location l and the width w of the slot in the ground plane, both with a nominal value of 15 mm. The scattering (S-)parameters are simulated with the CST Studio Suite software from Dassault Systèmes [52] at 1859 equally spaced frequency points from dc to 10 GHz.

A. UQ of the S-Parameters in the Frequency Domain
In this first scenario, the two design parameters are ascribed an independent Gaussian distribution with a 10% relative standard deviation. We aim to assess the variability of the S-parameters in the entire frequency range. To this end, we compute L = 50 training configurations of the two-port S-parameters and we compress the resulting dataset using a PCA with a 1% truncation threshold on the singular values. This allows compressing the dataset from 1859×4 = 7436 outputs to a mere 18 principal components. Since we build a separate model for the real and imaginary part of the scattering data [11], the total number of individual GPR models to be trained is 36. A constant trend and an anisotropic Matérn 5/2 kernel are considered for the prior.
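The PCA compression with a relative singular-value threshold can be sketched with a plain SVD. The function name and the synthetic rank-3 test matrix below are ours, purely for illustration:

```python
import numpy as np

def pca_compress(Y, tol=0.01):
    """Compress an (L x P) snapshot matrix by discarding singular values
    below tol times the largest one (a 1% threshold by default)."""
    Y_mean = Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))    # number of retained principal components
    scores = U[:, :k] * s[:k]          # compressed representation, shape (L, k)
    return scores, Vt[:k], Y_mean      # scores, basis, and mean for reconstruction

# Synthetic check: an exactly rank-3 dataset compresses to 3 components.
rng = np.random.default_rng(2)
Y = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 200))
scores, basis, mean = pca_compress(Y)
```

Each retained score column then becomes the target of one GPR model (one per component, for real and imaginary parts separately).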
Fig. 8 (top) shows the magnitude of S11 (left) and S21 (right) for some random configurations of the design parameters (gray lines), as well as their mean computed based on 1000 MC samples (blue solid line) and the GPR models (dashed green line). The 2-sigma confidence intervals of the GPR models (shaded areas) are also provided. Fig. 8 (bottom) shows a similar comparison between the variance of the MC samples and the corresponding GPR predictions. In this case, the prediction confidence of the GPR-0, GPR-1, and GPR-2 models is similar due to the relatively large number of training samples that is required to properly capture the high-frequency variability of the scattering responses.
The GPR results are computed by recombining the 18 principal components to obtain the model of the real and imaginary part of the S-parameters at each port and frequency, as indicated in Section II-D. The posterior mean and covariance are then combined with the analytical estimates of the output mean and variance. In this process, the posterior distribution is assumed to be Gaussian, even though this is not the case for GPR-2. Nevertheless, a good accuracy of the GPR model is established, despite the very large variability of the S-parameters that can be observed in Fig. 8. Indeed, as was shown in [20], the accurate modeling of the high-frequency response is particularly challenging for this structure.
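The recombination step maps the component-wise posterior moments back to the physical outputs through the linear PCA transform. The sketch below makes the simplifying assumption of independent component posteriors, so that variances combine through squared basis coefficients; the article instead combines the full posterior mean and covariance as per Section II-D.

```python
import numpy as np

def recombine_moments(mu_pc, var_pc, basis, y_mean):
    """Map posterior mean/variance of the principal-component scores back
    to the physical outputs, assuming independent component posteriors.

    mu_pc, var_pc: shape (k,); basis: shape (k, P); y_mean: shape (P,).
    """
    mu_y = mu_pc @ basis + y_mean   # linear map of the means, plus the PCA offset
    var_y = var_pc @ basis**2       # variances scale by the squared coefficients
    return mu_y, var_y
```

With a full posterior covariance across components, the second line would instead involve a quadratic form with the basis matrix.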
Furthermore, Fig. 9 provides a comparison of the pdfs at the frequencies of 6.2 and 7.3 GHz (top and bottom, respectively). The blue lines are the distributions of the MC samples of the S-parameter magnitude at those frequencies. The shaded areas are the 95% confidence bounds of the GPR-0, GPR-1, and GPR-2 predictions. These are obtained by generating a large number of posterior predictions and, as noted before, turn out to be similar in this case. Once again, they compare well with the reference MC distribution.
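Such empirical bounds follow directly from quantiles of the posterior predictions; a minimal sketch (the function name is ours):

```python
import numpy as np

def empirical_confidence_band(posterior_samples, level=0.95):
    """Empirical lower/upper confidence bounds from a large set of
    posterior predictions (samples along axis 0)."""
    alpha = (1.0 - level) / 2.0
    lo = np.quantile(posterior_samples, alpha, axis=0)
    hi = np.quantile(posterior_samples, 1.0 - alpha, axis=0)
    return lo, hi
```

Working with sample quantiles rather than mean-plus-2-sigma intervals avoids any Gaussianity assumption on the posterior, which is relevant for GPR-2.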
To better illustrate the difference between the prediction confidence of the various GPR models, Fig. 10 compares the GPR-0, GPR-1, and GPR-2 confidence intervals in predicting the same distributions based on ten training samples only. The confidence intervals are now wider to reflect the larger prediction uncertainty that results from considering a much lower number of training samples. In turn, the discrepancy between the various confidence intervals is also more visible, which reflects the higher uncertainty in the estimation of the trend coefficient and kernel variance.

B. BO of the Insertion Loss at Specific Frequencies
We now want to obtain the configurations that maximize the transmission (i.e., minimize the insertion loss) at some specific frequencies, namely f0 = (5, 6, 7, 8, 9) GHz. To this end, we run a BO scheme by considering the insertion loss at the operating frequency, i.e., IL = −20 log10 |S21(f0)|, as the objective function. We consider a range of the design parameters within ±35% around their nominal value. We start from a set of L = 10 configurations to initialize the GPR-2 model of the insertion loss and we run 20 iterations of the optimizer. Since the optimal configuration will differ for each operating frequency, we run an individual optimization for each value of f0.
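The objective function is a one-liner on the simulated S21 sample at the operating frequency (the helper name below is ours, for illustration only):

```python
import numpy as np

def insertion_loss_db(s21_at_f0):
    """Insertion loss IL = -20 log10 |S21(f0)| in dB, used as the BO
    objective; accepts real or complex S21 values."""
    return -20.0 * np.log10(np.abs(s21_at_f0))
```

A lossless transmission (|S21| = 1) yields 0 dB, and smaller |S21| magnitudes yield a larger loss, so minimizing this objective maximizes the transmission.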
The optimal design values are indicated in Table V, whereas Fig. 11 illustrates the corresponding responses. The insertion loss of the optimal designs (colored lines) is compared against the spread of 1000 random responses from the MC analysis (gray area). It is observed that the optimal design provides indeed the smallest insertion loss at the corresponding operating frequency.

VII. CONCLUSION
This article introduced conservative GPR models with application to UQ and BO in signal integrity analyses. Compared to classical GPR implementations, whose prediction uncertainty accounts only for the lack of training data, the proposed models also take into account the uncertainty in the estimation of the prior trend coefficients and kernel variance. As a result, the estimated prediction confidence turns out to be more accurate and much closer to the actual one.
The advocated framework was applied to two test cases involving the analysis of crosstalk in a multiconductor transmission line network and of the insertion loss in a microstrip with a ground plane discontinuity, for which both UQ and optimization tasks were performed. The results highlighted that the proposed conservative GPR models yield more accurate results compared to state-of-the-art implementations of GPR.

Fig. 4. Relative error on the mean (top) and variance (bottom) obtained with the plain polynomial chaos method (left) and the GPR-2 model with the same polynomial chaos expansion as trend (right).

Fig. 5. Evolution of the minimum objective (maximum crosstalk) throughout the iterations of the BO scheme implemented with GPR-0 and GPR-2 models in conjunction with PoI and EI acquisition functions.

Fig. 6. Performance of the optimal design(s) in comparison with 1000 random configurations within the design space. (Top) Comparison between the spread of the entire transient crosstalk of the random configurations (gray area) and the optimal design achieved with BO in conjunction with the GPR-2 model and the EI acquisition function (red curve). (Bottom) Comparison between the maximum crosstalk over time of each random configuration (cyan circles) and the maximum crosstalk of the optimal designs achieved with BO and the four different configurations of model type and acquisition function (colored lines).

Fig. 7. Layout of the microstrip line with ground plane discontinuity.

Fig. 8. UQ of S11 and S21. (Top) Magnitude of a subset of samples from the MC simulation (gray lines) together with the average value computed from the MC samples (blue solid line) and predicted by the GPR models (dashed green line). The shaded areas are the 2-sigma confidence intervals of the GPR-0 (red), GPR-1 (yellow), and GPR-2 (green) predictions. (Bottom) Comparison between the variance obtained with MC and the one predicted by the GPR models (same color labeling as above).

Fig. 11. Insertion loss of the optimal designs at various frequencies (colored lines) and comparison against the spread of responses of 1000 random configurations of the design parameters (gray area).

TABLE I
CONVERGENCE OF BO APPLIED TO FUNCTION (25) AND BASED ON GPR-2 WITH DIFFERENT ACQUISITION FUNCTIONS

TABLE V
OPTIMAL VALUE OF THE DESIGN PARAMETERS MINIMIZING THE INSERTION LOSS AT VARIOUS OPERATING FREQUENCIES