Bayesian Sensor Calibration

The calibration of multisensor systems can incur significant costs in terms of time and resources, in particular when cross-sensitivities to parasitic influences are to be compensated. Successful calibration ensures the trustworthy subsequent operation of a sensor system, guaranteeing that one or several measurands of interest can be inferred from its output signals with specified uncertainty. As shown in the present study, this goal can be reached by reduced calibration procedures with fewer calibration conditions than the number of parameters needed to model the device response. This is achieved using Bayesian inference, by combining the calibration data of a sensor system with statistical prior information about the ensemble to which it belongs. Optimal reduced sets of calibration conditions are identified by the method of Bayesian experimental design. The method is demonstrated on a Hall-temperature sensor system whose nonlinear response model requires seven parameters in the temperature range between −30 and 150 °C and for magnetic field values B between −25 and 25 mT. For the prior, a multivariate normal distribution of the model parameters is acquired using 14 specimens of the sensor ensemble. I-optimal calibration at one, two, and three temperatures reduces the root-mean-square (rms) standard deviation of B inferred from sensor output signals from 203 μT before calibration down to 78, 41, and 34 μT, respectively.
Similar conclusions apply to G-optimal calibration. This article describes how to implement the Bayesian prior acquisition, inference, and experimental design. The proposed approach can help save resources and cut costs in sensor calibration.


I. INTRODUCTION
The goal of this article is to demonstrate the power of Bayesian inference and experimental design in the context of sensor calibration. Besides fabrication and packaging, calibration contributes a significant fraction to the final sensor cost [1], [2], [3], [4]. Among the reasons are parasitic sensitivities requiring compensation and the variability of sensor materials and fabrication processes [5], the need for individual sensor calibration in view of achieving demanding specifications [5], [6], the time consumed by some, in particular thermal, calibration steps [5], and expensive infrastructure [1], [2], [7]. Therefore, it is an endeavor of scientific interest and economic value to identify methods that reduce the number of calibration conditions needed to achieve a prescribed accuracy goal of a sensor system over its operating range. The question is particularly pressing in the case of large production volumes, where sensors may require individual calibration. To the best of our knowledge, a single conference paper has so far addressed the subject of Bayesian experimental design at the service of sensor calibration [8]. It demonstrated the benefit of prior information in decreasing the standard deviation of temperature-cross-sensitive pressure sensors and showed that sensors can be validly calibrated using fewer conditions than there are parameters in their response model. The apparently modest echo of that paper in the sensor community, as measured in citations, was possibly due to its highly concise formulation, constrained by the imposed four-page format. In the present article, we take up its lead and strive to expand its mathematical formulation, extend its conclusions, and demonstrate the approach in detail using an industrial Hall sensor. In doing so, we aim to make Bayesian sensor calibration more accessible to the broader sensor community.
For the sake of readability and unless stated otherwise, the term sensor will be used synonymously for multisensor, sensor system, measuring system, and transducer in general. An individual sensor is considered to be a member of a sensor ensemble with statistically distributed properties, as realized in approximation, for example, by a sensor production volume. In metrology, the term population is used as an alternative for ensemble [9]. Individual members of a sensor ensemble will be termed specimens. The properties of each specimen constitute a sample of the statistically distributed properties of the ensemble.
The question of the measurement uncertainty achievable with calibrated sensors is extensively addressed in the Guide to the Expression of Uncertainty in Measurement (GUM) elaborated by the Joint Committee for Guides in Metrology (JCGM). However, although the GUM mentions Bayes' theorem in several of its documents [46, Sec. 6], [47, Sec. 6.2] and thus reflects the discussions in the metrology community [48], it does not explicitly put forward the form of Bayesian inference in sensor calibration that is developed in the present article, where the calibration of a specimen is allowed to rely on prior information about the ensemble to which it belongs.
Central to the present analysis is the response function of a sensor, which involves, on the one hand, its output signals x_1, …, x_D, arranged as the column vector x = (x_1, …, x_D)^⊤, where (·)^⊤ denotes the transpose of a vector or a matrix. Such signals may be output voltages, currents, frequencies, phase shifts, and so on [49], [50]. On the other hand, the response involves the physical or chemical measurand y the sensor has been designed to measure. To measure y means to allow y to be inferred from x. The connection is made by a model y = φ(x, w), i.e., a functional description of the sensor response parameterized by the response parameter vector w = (w_1, …, w_M)^⊤ of dimension M. Provided that a sensor specimen's w is known, y can, with limited accuracy, be inferred from the specimen's output signals x. It is, therefore, crucial to estimate the value and the accuracy of its w. This is the goal of calibration [51, Definition 2.39].

Fig. 1. (a) Bayes' theorem combines the prior probability density p_0(w) of a sensor specimen's response parameter vector w with calibration data (X, y) acquired with the specimen, i.e., experimental observations of its response, to infer the posterior response parameter distribution p_1(w|X, y) of w, given (X, y). The connection is ensured by the likelihood p(y|X, w) modeling the measurement process. (b) Both prior and posterior w distributions serve to derive predictive probability densities indicating the level of consistency of measurand values y with the specimen's output signal vector x. Such predictions before and after calibration are denoted by p_0(y|x) and p_1(y|x, X, y), respectively. The two needed marginalizations involve the likelihood p(y|x, w), reflecting again the measurement process.
Calibration of a specimen proceeds by exposing it to a well-designed series of N experimental conditions involving controlled measurand values y_1, …, y_N, summarized as y = (y_1, …, y_N)^⊤, and parasitic influences. Together, these lead to measured output signals x_1, …, x_N, summarized as X = (x_1, …, x_N). The goal of cost-effective calibration is to keep N small, possibly smaller than M, while guaranteeing a specified accuracy. For N < M, this is possible only if additional information can be relied on. Such prior information may be provided by parameters that are so well-defined within a sensor ensemble that they can be considered as known and do not need separate individual determination. Alternatively, ranges or distributions of parameter values of a sensor ensemble, and correlations among them, may have been obtained previously. The Bayesian framework teaches how such imprecise prior information can be merged with the new evidence that a specimen's calibration reveals about its response.
The core of Bayesian methods is Bayes' theorem [10], [11, Sec. 1.2], [13, Sec. 1.4.3], [15, Sec. 1.2], recalled in Fig. 1(a). In the present context, prior inaccurate and imprecise knowledge of a specimen's w is expressed as a probability density p_0(w). Thanks to Bayes' theorem, new evidence (X, y) enables one to update p_0(w) into the more sharply defined posterior probability density p_1(w|X, y), i.e., the probability distribution of w, given (X, y). As shown in Fig. 1(a), (X, y) enters Bayes' theorem via the conditional probability density p(y|X, w), reflecting in the present case the measurement process by the specimen; it describes the likelihood that the entries of y are the measurand values applied to a specimen during its calibration, given that the observed output signals are X and the response parameter vector of the specimen is w. Finally, the denominator in Bayes' theorem is equal to the numerator integrated over all w values or, in the language of statistics, marginalized over w. As a result, the right-hand side and, by the same token, p_1(w|X, y) denote a properly normalized probability density.
As shown in Fig. 1(b), during the operation of a specimen, its prior and posterior response parameter distributions allow measurand values y to be inferred from its output signals x. In view of the inherent imprecision of the prior and posterior, this inference is statistically formulated as the so-called prior and posterior predictive probability densities [11, Sec. 3.3.2], [14, Sec. 2.1] p_0(y|x) and p_1(y|x, X, y), respectively. They quantify the level of consistency of y with an output signal vector x of the specimen, before and after acquisition of the new evidence. As shown in Fig. 1(b), the predictive distributions are obtained by weighted integrals of p(y|x, w) over w, with the prior and posterior probability densities of w serving as the respective weight functions. Again, these integrals are marginalizations. Like p(y|X, w), p(y|x, w) represents the measurement process. It states the likelihood that y is the measurand being determined, given that the specimen's output is x and its response parameter vector is w.
This, in a nutshell, is the foundation on which the present paper rests. Mathematical details are clarified in Section II and Appendixes I and II. Although Bayes' theorem is of broad validity [11], [12], [14], [18], all likelihoods and probabilities in this article are assumed to be univariate or multivariate normal distributions [11, Ch. 1], [12, Appendix A]. This is justified by the fact that noise and uncertainty in technical measurements are often well modeled by Gaussian random processes. Under these conditions, many procedures and consequences of Bayes' theorem can be formulated in the language of linear algebra, relying on basic matrix operations.
The methodology proposed in this article is demonstrated using the temperature-sensitive semiconductor-based Hall sensor system presented in Section III. Due to a cointegrated temperature sensing element, the system offers the benefit of potential temperature compensation. Without prior knowledge, the compensation of the parasitic thermal effects would require at least five thermal calibration conditions per sensor specimen. In Section IV, we show that a suitably generated prior can bring the number of these conditions down to one while guaranteeing satisfactory performance over the specified operating range. Other examples of sensors to which the methods reported in this article may apply are considered in Section V. In addition, we identify the salient features of the present Bayesian approach in the light of alternative machine learning (ML) methods.

II. CALIBRATION RELYING ON BAYESIAN INFERENCE
In Section II-A, we define the required probability densities and likelihoods. Section II-B is then dedicated to the Bayesian inference of p_1(w|X, y) of a specimen from its calibration data (X, y) and an available p_0(w). Conclusions on the achievable accuracy valid for the entire sensor ensemble are formulated. These are then used in Section II-C to optimize the achievable measurement accuracy by the Bayesian experimental design of the calibration procedure. Finally, Section II-D addresses the question of how to acquire the needed prior.

A. Definitions
The following definitions rely on univariate and multivariate normal distributions as defined in Appendix I. The likelihood p(y|x, w) is assumed to be described by the Gaussian distribution [11, Ch. 1], [12, Appendix A]

p(y|x, w) = N(y | φ(x)^⊤ w, σ²)    (1)

where the measurand y inferred from x and w is distributed around the mean φ(x)^⊤ w with variance σ². The sensor model is thus written as y = φ(x)^⊤ w, where φ(x) = (φ_1(x), …, φ_M(x))^⊤ is the column vector of basis functions. The model is linear in the parameters w, while the basis functions can be nonlinear. Let us concretize these definitions in view of the demonstration case in Sections III and IV, where the role of the measurand y is played by the magnetic induction B, while x = (V_H, V_T)^⊤ comprises the output signals V_H and V_T of the Hall sensor and the cointegrated resistive temperature sensor, respectively. Similar to an earlier non-Bayesian calibration study of a related system [24], we model B as a polynomial function of V_H and V_T.
On the assumption that the random contributions to the calibration measurements are independent, the joint likelihood p(y|X, w) can be written as [14, Sec. 2]

p(y|X, w) = N(y | Φ(X) w, σ² I_N)

with the N × M-dimensional design matrix Φ(X) [11, Ch. 3], whose nth row is φ(x_n)^⊤, where I_N denotes the N-dimensional identity matrix; the distribution of y spreads around the mean Φ(X)w with isotropic covariance σ² I_N.
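In code, assembling the design matrix Φ(X) amounts to evaluating the basis functions at every calibration condition and stacking the results row by row. A minimal numpy sketch; the function and variable names, and the toy basis, are ours for illustration only:

```python
import numpy as np

def design_matrix(X, basis_functions):
    """Stack phi(x_n)^T row by row into an N x M design matrix.

    X               : array of shape (N, D), one output-signal vector per row
    basis_functions : list of M callables, each mapping x (shape (D,)) to a scalar
    """
    return np.array([[phi(x) for phi in basis_functions] for x in X])

# Illustrative 2-D example with M = 3 basis functions
# (not the sensor model of this study)
basis = [lambda x: 1.0, lambda x: x[0], lambda x: x[0] * x[1]]
X = np.array([[1.0, 2.0], [3.0, 4.0]])
Phi = design_matrix(X, basis)  # shape (2, 3)
```

Each row of `Phi` then multiplies the parameter vector w to give the modeled measurand for one calibration condition.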
The prior probability density of w is described by the multivariate normal distribution

p_0(w) = N(w | w_0, Σ_0)    (6)

with mean w_0, covariance matrix Σ_0, and precision matrix Λ_0 = Σ_0^{-1}. Since p_0(w) is a conjugate prior of p(y|X, w) [11, Sec. 2.4], we can write the posterior as a multivariate normal distribution as well, namely

p_1(w|X, y) = N(w | w_1, Σ_1)    (7)

with mean w_1, covariance matrix Σ_1, and precision matrix Λ_1 = Σ_1^{-1} to be determined in Section II-B. Note that the argument of the maximum of a univariate or multivariate normal distribution, also known as its mode, coincides with its mean [11, Sec. 1.2.4].

B. From Prior to Posterior
Before calibrating a specimen, any prediction about its response has to rely on the unsharply defined prior knowledge p_0(w). Such prior knowledge may have been established by theoretical considerations about the sensor operation, complemented with statistical data about variations in its fabrication process and material properties. Alternatively, specimens of the same ensemble may already have been calibrated previously and may thus have provided an experimental database for constructing p_0(w). Depending on the experimental history, the prior can contain data from a few specimens, from a large production volume, or anything in between.
For illustration, the example of a hypothetical bivariate Gaussian prior p_0(w) is shown in Fig. 2(a). It spreads around its maximum at w_0. The dashed ellipse geometrically represents Σ_0; on it, p_0(w) is reduced from its maximum by the factor e^{−1/2}. The interior of the ellipse is the bivariate analog of the ±1σ interval of the univariate normal distribution. A representative specimen with putative response parameter vector w_r is shown in green.
According to Fig. 1(b), the prior allows the prior predictive sensor response distribution p_0(y|x) for the ensemble to be obtained by the marginalization of p(y|x, w) p_0(w) over w, where the two terms are given by (1) and (6). This procedure yields the x-dependent normal distribution

p_0(y|x) = N(y | y_0(x), σ_0²(x))    (8)

with the predicted maximum at

y_0(x) = φ(x)^⊤ w_0    (9)

inferred from any x and the corresponding variance

σ_0²(x) = σ² + φ(x)^⊤ Σ_0 φ(x).    (10)

Details of the marginalization are reported in Appendix I. Via σ² in (10), on the one hand, the standard deviation σ_0(x) of y inferred from x reflects the uncertainty of the measurement process, as defined in (1). On the other hand, the term φ(x)^⊤ Σ_0 φ(x) captures the uncertainty in the inferred y due to the uncertainty of the response parameters of the ensemble described by p_0(w). Fig. 2(b) schematically shows p_0(y|x) in gray shades, with its x-dependent mean measurand y_0(x) and its ±1σ_0(x) range delimited by the dashed lines. It also shows the probability distribution p(y|x, w_r) of the representative specimen, as given by (1), with w_r substituted for w. This distribution is centered in the y-direction on the mean measurand y_r(x) = φ(x)^⊤ w_r indicated by the green line in the center of the green shading: y_r(x) is the response curve that would be used to infer y from output signals x of the representative specimen if its response parameter vector were known to be w_r. The green shade shows the uncertainty of this inference, which has a standard deviation σ in the y-direction.

Fig. 2. (a) Schematic 2-D multivariate Gaussian prior distribution p_0(w) of the parameter vector w = (u, v)^⊤, defined by its mean w_0 and its covariance matrix Σ_0 symbolized by the dashed ellipse. Darker shades correspond to higher probabilities. A representative specimen is assumed to have the response parameters w_r. (b) The prior allows the prior predictive distribution p_0(y|x) to be inferred from measured sensor signals x, with mean measurand y_0(x) and standard deviation σ_0(x); y_0(x) is shifted by the measurement error Δy_r0 from the mean measurand y_r(x) of the representative specimen indicated by the green line. The green shading represents the uncertainty of the response p(y|x, w_r) of the representative specimen. (c) Based on calibration data (X, y) obtained with that specimen, p_0(w) is updated into the narrower posterior distribution p_1(w|X, y) of the specimen's response parameter vector w, defined by the updated posterior mean parameter vector w_1(X, y) and the updated covariance matrix Σ_1(X). (d) The x-dependent posterior predictive distribution p_1(y|x, X, y) is obtained from p_1(w|X, y). It has the updated mean measurand y_1(x, X, y) and the updated standard deviation σ_1(x, X) ≤ σ_0(x). The posterior measurand value y_1(x, X, y) inferred from x differs from y_r(x) by the smaller error Δy_r1(x, X, y).
If measurand values y are inferred from x for specimens randomly sampled from the ensemble, then, for large samples, the statistical distribution of these y values is asymptotically given by p_0(y|x). This is the textual translation of the marginalization yielding p_0(y|x). In other words, p_0(y|x) describes the distribution of the y values inferred from x for the entire ensemble modeled by the prior. As a consequence, when using y_0(x) to infer y from x for random specimens of the ensemble, one makes random errors normally distributed with the standard deviation σ_0(x). In Fig. 2(b), p_0(y|x) can therefore be interpreted as the graphical representation of this x-dependent error distribution. In conclusion, based on the output signal x produced by any sensor specimen of the ensemble, the measurand y is best determined to be y_0(x). This then constitutes the measured value of y corresponding to x, and the limited accuracy of this measurement is defined by the standard deviation σ_0(x). This conclusion holds when the prior is the only source of information about the specimen. It also holds for the representative specimen, whose response parameter vector is in reality known only within the constraints of p_0(w). Its response curve in Fig. 2(b) illustrates this by lying safely within the ±1σ_0(x) range of y_0(x).
Assume now that a specimen is subjected to calibration. As a result, it yields evidence about its response in the form of calibration data (X, y). Bayes' theorem then allows the posterior probability distribution p_1(w|X, y) of its response parameters w compatible with (X, y) to be inferred. Note that this is the w distribution of all specimens that have produced or might produce (X, y) during calibration. Their posterior probability distribution of w [cf. (7)] is centered on the updated mean parameter vector [11, Sec. 3]

w_1(X, y) = Σ_1(X) [Λ_0 w_0 + σ^{−2} Φ(X)^⊤ y]    (11)

with the updated covariance matrix

Σ_1(X) = Λ_1^{−1}(X) = [Λ_0 + σ^{−2} Φ(X)^⊤ Φ(X)]^{−1}.    (12)

The updated probability distribution of w is shown in Fig. 2(c) in blue. It is more compact than the prior distribution. Only specimens within the restricted range defined by p_1(w|X, y) are likely to have yielded the data (X, y), and the maximum of p_1(w|X, y) is attained by specimens with parameter vector w_1(X, y).
Mathematically, the shrinkage of the w distribution due to the condition (X, y) is caused by the second term in the bracket added in (12) to the prior precision matrix Λ_0. Because Φ(X)^⊤ Φ(X) is positive semidefinite, one has Λ_0 ≤ Λ_1(X) in the sense of the Loewner order [52]. The shrinkage of Σ_0 into Σ_1(X) symbolized by the two dashed ellipses in Fig. 2(c) is a consequence thereof. In plain language, the precision of our knowledge about the calibrated specimen's response parameters has increased.
Furthermore, the center of the w distribution has been updated according to (11) from w_0 into w_1(X, y). The representative specimen, whose putative w_r actually served to generate the calibration data underlying Fig. 2(c) and (d), is also recalled in Fig. 2(c). Not surprisingly, w_r lies within the range of p_1(w|X, y).
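Numerically, the update (11) and (12) is a few lines of linear algebra. A sketch in numpy with illustrative names; for ill-conditioned problems, one would solve linear systems rather than form explicit inverses:

```python
import numpy as np

def posterior_update(w0, Sigma0, Phi, y, sigma):
    """Gaussian update of the prior N(w0, Sigma0) with calibration data (Phi, y).

    Phi   : (N, M) design matrix of the calibration conditions
    y     : (N,) measurand values applied during calibration
    sigma : scalar standard deviation of the measurement noise
    Returns the posterior mean w1 and covariance Sigma1.
    """
    Lambda0 = np.linalg.inv(Sigma0)                      # prior precision
    Lambda1 = Lambda0 + (Phi.T @ Phi) / sigma**2         # cf. (12): precision grows
    Sigma1 = np.linalg.inv(Lambda1)
    w1 = Sigma1 @ (Lambda0 @ w0 + Phi.T @ y / sigma**2)  # cf. (11): mean update
    return w1, Sigma1

# One noisy observation pulls the posterior mean from the prior toward the data
w1, Sigma1 = posterior_update(
    w0=np.zeros(1), Sigma0=np.eye(1), Phi=np.array([[1.0]]),
    y=np.array([1.0]), sigma=1.0)
```

With no calibration data (N = 0), the update leaves the prior unchanged, as expected.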
According to Fig. 1(b), like the prior, the posterior allows the posterior predictive distribution p_1(y|x, X, y) to be obtained by the marginalization of p(y|x, w) p_1(w|X, y) over w. This results in

p_1(y|x, X, y) = N(y | y_1(x, X, y), σ_1²(x, X))    (13)

with the x-dependent maximum at

y_1(x, X, y) = φ(x)^⊤ w_1(X, y)    (14)

and the predicted variance

σ_1²(x, X) = σ² + φ(x)^⊤ Σ_1(X) φ(x)    (15)

analogous to (9) and (10), respectively. Details of the derivation are again reported in Appendix I. The posterior predictive sensor response distribution is schematically shown in Fig. 2(d) in blue, to be compared with the prior predictive response in Fig. 2(b). In the posterior case, the inferred range of measurand values y compatible with x is significantly narrower over the entire x range. By the same reasoning as in the prior case, the measurand value y inferred from an output x produced by specimens randomly sampled from p_1(w|X, y) is distributed according to p_1(y|x, X, y). As mentioned, these specimens are randomly sampled according to their likelihood of having yielded (X, y) as calibration data. As a consequence, one makes a prediction error by using y_1(x, X, y) as the measured value of y inferred from an output x of specimens with calibration data (X, y). This error is normally distributed and has the standard deviation σ_1(x, X).
After a specimen has yielded the calibration data (X, y), the value y implied by an output signal x of the specimen is therefore best taken as y_1(x, X, y); the measurand value determined thereby is uncertain with the standard deviation σ_1(x, X). Both y_1(x, X, y) (white curve) and the ±1σ_1 confidence interval (black dashed curves) are shown in Fig. 2(d). Note that the response curve y_r(x) of the representative specimen again lies within the range of this predictive distribution that it has helped generate.
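Both predictive distributions share one functional form: the mean is φ(x)^⊤ w and the variance is σ² + φ(x)^⊤ Σ φ(x), evaluated with either the prior moments (w_0, Σ_0) or the posterior ones (w_1(X, y), Σ_1(X)). A numpy sketch with illustrative names:

```python
import numpy as np

def predictive_moments(phi_x, w, Sigma, sigma):
    """Predictive mean and standard deviation of the measurand y at signals x.

    phi_x : (M,) basis-function vector phi(x)
    w     : (M,) mean parameter vector (prior w0 or posterior w1)
    Sigma : (M, M) covariance matrix (prior Sigma0 or posterior Sigma1)
    sigma : measurement noise standard deviation, cf. (1)
    """
    mean = phi_x @ w                        # cf. (9) and (14)
    var = sigma**2 + phi_x @ Sigma @ phi_x  # cf. (10) and (15)
    return mean, np.sqrt(var)

# Toy numbers: parameter uncertainty (Sigma) inflates the noise floor (sigma)
mean, std = predictive_moments(
    phi_x=np.array([1.0, 0.0]), w=np.array([2.0, 5.0]),
    Sigma=4.0 * np.eye(2), sigma=3.0)
```

The same function therefore serves before and after calibration, by swapping in the corresponding moments of w.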

C. Experimental Design of the Calibration Procedure
It is noteworthy that the measurement uncertainty σ_1(x, X) of a calibrated specimen depends on X. By a suitable selection of X, one is, therefore, able to minimize σ_1(x, X) according to some optimality criterion of one's choice, such as G-optimality or I-optimality [8], [53], thus implementing the concept of Bayesian experimental design [16].
G-optimality minimizes the maximum variance (and, equivalently, standard deviation) over the range Ω of x values covered by the operating conditions of the sensor. Considering (15), we therefore minimize the objective function

f_G(X) = max_{x∈Ω} σ_1²(x, X)    (16)

with respect to X.
In contrast, I-optimality [53] minimizes the rms measurement uncertainty over Ω. The corresponding objective function reads

f_I(X) = (1/V_Ω) ∫_Ω σ_1²(x, X) dx    (17)

where V_Ω denotes the volume of Ω in x-space. Both G-optimality and I-optimality are applied in Section IV to illustrate the choice of optimal calibration conditions.
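In practice, Ω can be discretized into a grid of representative x values, the integral in (17) replaced by a grid average, and the maximum in (16) by a grid maximum. A sketch of such a comparison of candidate designs; the gridding and all names are our illustrative choices:

```python
import numpy as np

def objectives(candidate_designs, grid_phi, Lambda0, sigma):
    """Evaluate G- and I-optimality objectives for candidate calibration designs.

    candidate_designs : list of (N, M) design matrices, one per candidate
    grid_phi          : (K, M) basis vectors phi(x) on a grid discretizing Omega
    Lambda0           : (M, M) prior precision matrix
    sigma             : measurement noise standard deviation
    Returns arrays (f_G, f_I), one entry per candidate design.
    """
    f_G, f_I = [], []
    for Phi in candidate_designs:
        Lambda1 = Lambda0 + (Phi.T @ Phi) / sigma**2          # cf. (12)
        Sigma1 = np.linalg.inv(Lambda1)
        # Posterior predictive variance on every grid point, cf. (15)
        var = sigma**2 + np.einsum("km,mn,kn->k", grid_phi, Sigma1, grid_phi)
        f_G.append(var.max())   # worst case over the grid, cf. (16)
        f_I.append(var.mean())  # grid average approximating the integral (17)
    return np.array(f_G), np.array(f_I)

# Toy example: one candidate design, M = 1, two grid points
fG, fI = objectives([np.array([[1.0]])], np.array([[1.0], [2.0]]),
                    np.eye(1), 1.0)
```

The candidate with the smallest f_G (or f_I) is then the G-optimal (or I-optimal) design among those evaluated.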

D. Obtaining a Prior
A prerequisite for carrying out the above procedures is the availability of a prior. In the absence of any other knowledge about the response variability among the specimens of the ensemble, a viable approach is to first thoroughly characterize a small, yet sufficiently large, sample of specimens drawn from the ensemble. We term them prior-generation specimens. For each of them, the characterization is designed to provide a sufficiently accurate individual response parameter vector w_i, with i = 1, …, Q, where Q designates the number of specimens. The vectors w_1, …, w_Q are scattered in w space with a probability distribution p(w|w_1, …, w_Q) that can be identified using the method of multivariate Bayesian linear regression [12, Sec. 3.6], as outlined in Appendix II.
From the results of such an analysis, p(w|w_1, …, w_Q) is found to be a multivariate t-distribution [12, Sec. 3.6] whose parameters are determined by the matrix whose rows list the mean-centered response parameter vectors w_i. Finally, the multivariate t-distribution is approximated by a multivariate normal distribution with the same mean and covariance, which is

p_0(w) = N(w | w_0, Σ_0)

with the mean w_0 given by (20) and the covariance matrix Σ_0 given by (21). For details, cf. Appendix II.
The main advantage of the multivariate normal approximation is that it allows analytic results to be derived straightforwardly, as described in Section II-B. It is noteworthy that the t-distribution has fatter tails than its normal approximation, the more pronouncedly so the smaller Q − M. Related questions are addressed in Section V.
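A simple sketch of the prior-generation step: the mean (20) is the sample mean of the Q parameter vectors, while the covariance (21) is derived from their mean-centered scatter. The exact normalization follows from the t-distribution fit of Appendix II; the plain sample covariance used below (numpy's 1/(Q − 1) convention) is only an illustrative stand-in:

```python
import numpy as np

def gaussian_prior_from_specimens(W):
    """Approximate the ensemble prior N(w0, Sigma0) from Q characterized specimens.

    W : (Q, M) array; row i holds the response parameter vector w_i of specimen i.
    NOTE: the 1/(Q-1) normalization of np.cov is an illustrative stand-in for
    the normalization of the fitted t-distribution (cf. Appendix II).
    """
    w0 = W.mean(axis=0)               # sample mean, cf. (20)
    Sigma0 = np.cov(W, rowvar=False)  # mean-centered scatter, cf. (21)
    return w0, Sigma0

# Two toy specimens with M = 2 parameters each
w0, Sigma0 = gaussian_prior_from_specimens(np.array([[0.0, 0.0], [2.0, 2.0]]))
```

The off-diagonal elements of Sigma0 capture the correlations among parameters within the ensemble, which is precisely the information that lets a reduced calibration constrain parameters it never measures directly.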

III. EXPERIMENT
The Bayesian sensor calibration methodology formulated in Section II is now demonstrated on a sensor system for measuring the magnetic field. We present first the design of the system with exemplary raw data (Section III-A), then the measurement setup (Section III-B), and finally the acquisition of the database for demonstrating the concept of Bayesian calibration (Section III-C).

A. Sensor Chip
The sensor system is a packaged Hall sensor microsystem fabricated in complementary metal-oxide-semiconductor (CMOS) technology. It is shown in Fig. 3. Besides analog and digital circuitry and mechanical stress sensors not used in this study, the system comprises horizontal Hall plates for measuring the out-of-plane magnetic field component B and a temperature-sensitive resistive element, whose output signal V_T is intended for temperature compensation. For further details about the Hall sensor system, we refer to [56]. In the Hall sensor chip, two interconnected Hall plates are operated using the current spinning method [57], [58], [59], [60], [61], [62], [63]. As a result, contributions to the Hall sensor output voltage V_H caused by mechanical stress and geometrical imperfections are largely, yet incompletely, compensated. Consequently, the resulting Hall voltage output by the sensor is described by

V_H = S_A(T) B + V_off(T)    (22)

where S_A(T) and V_off(T) denote the temperature-dependent absolute Hall sensitivity of the device and its residual offset voltage at B = 0, respectively. Conversely, by rearranging (22), defining the offset field B_off = V_off/S_A, and considering that the temperature sensor transduces T into V_T, we seek to infer B from V_H and V_T using a relationship like

B = V_H / S_A(V_T) − B_off(V_T)    (23)

where 1/S_A(V_T) and B_off(V_T) will be modeled by calibration.
The sensitivity S_A depends on T because of the T-dependent Hall mobility [64] and the piezo-Hall and piezoresistance effects due, e.g., to thermomechanical stress [65], [66], [67], [68], [69], [70], [71]. Fig. 4 shows the measured values of 1/S_A and B_off of a representative specimen as a function of V_T. For the purpose of the present analysis, the digital output signal of the temperature sensor is shifted and rescaled, ranging from V_T = −0.97, corresponding to T ≈ −30 °C, to V_T = 2.31, corresponding to T ≈ 150 °C, with V_T = 0 corresponding to T ≈ 30 °C. From its value at V_T = 0, S_A increases by about 40% toward the lower limit of the T range and decreases by about 43% toward its upper limit. Like 1/S_A, B_off is a function of V_T.
Similar to V_T, the digital output signal of the Hall sensor is shifted and rescaled for the subsequent analysis. Consequently, V_H covers the range from −2 to +2. Based on (23) and Fig. 4, it seems justified to infer B from V_H and V_T in terms of a polynomial model. Leaving the topic of Bayesian model selection [11, Sec. 3.4], [12, Ch. 7], [14, Ch. 5] to a separate study, we propose to work here with a set of M = 7 polynomial basis functions in V_H and V_T.

Seven pairs of such sensor chips were assembled in dual-die TSSOP-16 packages [24]. Each packaged system was then soldered onto a printed circuit board (PCB). The number of sensor specimens is therefore Q = 14.
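The exact seven basis functions are not reproduced here. As a purely illustrative stand-in motivated by the structure of (23), i.e., V_H multiplied by a cubic polynomial in V_T modeling 1/S_A, plus a quadratic polynomial in V_T modeling the offset, a seven-term basis could look as follows (our hypothetical choice, not necessarily that of the study):

```python
import numpy as np

# Hypothetical seven-term polynomial basis (M = 7), motivated by (23):
# B ~ V_H * poly3(V_T)  (models 1/S_A)  -  poly2(V_T)  (models B_off).
# This particular choice is our illustrative assumption, not necessarily
# the basis set used in the study.
def phi(x):
    v_h, v_t = x
    return np.array([
        v_h, v_h * v_t, v_h * v_t**2, v_h * v_t**3,  # sensitivity-related terms
        1.0, v_t, v_t**2,                            # offset-related terms
    ])

p = phi((2.0, 1.0))  # basis vector of one load case (V_H, V_T)
```

Any basis of this kind keeps the model linear in the parameters w, so the Gaussian machinery of Section II applies unchanged.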

B. Experimental Setup
For the characterization, the PCBs are inserted into a thermal chamber of an automated measurement setup where they are exposed to well-controlled B values and to less precisely controlled temperatures T . A Helmholtz coil calibrated using a Tesla meter (Gauss/Tesla Meter Series 8000, F.W. BELL, Milwaukie, Oregon) serves to apply B. An airstreamer (Dragon Air Streamer, Froilabo, France) connected to the chamber is used to vary T . A schematic of the measurement setup is shown in Fig. 5 with a close-up photograph of a packaged sensor system on its PCB and a nearby temperature reference sensor.

C. Characterization
To acquire the data needed for the numerical studies of Section IV, we exposed the sensors to the following calibration conditions: 1) T was varied between −30 °C and 150 °C in steps of nominally 10 °C; 2) at each T value, B was set to −25, 0, and 25 mT. Overall, the sensors therefore experienced 57 conditions. Fig. 6 shows the measurement history of a representative specimen. The two top graphs show T, as measured by the temperature reference sensor, and the applied B values. The last two graphs show the resulting output signals V_H and V_T.
In preparation for the Bayesian data evaluation in Section IV, the data of each specimen are arranged as the list of independent variables X_i = (x_i1, …, x_i57) with i = 1, …, Q and x_in = (V_Hin, V_Tin)^⊤ with n = 1, …, 57, from which the corresponding design matrix Φ(X_i) was computed. Similarly, the dependent variable vector of the Bayesian analysis is defined as y = (B_1, …, B_57)^⊤.

IV. RESULTS
After determining the response parameter vectors w_i of all specimens in Section IV-A, we use them in Section IV-B to identify the multivariate normal prior of the sensor ensemble to which they belong. In Section IV-C, a posterior is inferred for each specimen from a calibration measurement performed under a single near-optimal thermal condition. Finally, in Section IV-D, the same is done for two and three thermal conditions.

A. Response Parameter Vectors of the Specimens
The response parameter vectors w_i of the 14 specimens were determined from their individual characterization data by the method of least squares [11, Ch. 3], i.e., w_i = {Φ(X_i)^⊤ Φ(X_i)}^{−1} Φ(X_i)^⊤ y. This is equivalent to applying (11) and (12) with a noninformative prior whose precision matrix Λ_0 is the zero matrix.
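In numpy, the least-squares estimate is best computed with a dedicated solver rather than by explicitly inverting Φ(X_i)^⊤Φ(X_i); a sketch with illustrative names:

```python
import numpy as np

def fit_specimen(Phi_i, y):
    """Least-squares response parameter vector w_i of one specimen.

    Phi_i : (N, M) design matrix of the specimen's characterization data
    y     : (N,) applied measurand values
    np.linalg.lstsq is numerically preferable to forming the normal equations.
    """
    w_i, *_ = np.linalg.lstsq(Phi_i, y, rcond=None)
    return w_i

# Overdetermined toy system whose least-squares solution is w = (1, 2)
w = fit_specimen(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
                 np.array([1.0, 2.0, 3.0]))
```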

B. Prior Generation
The response parameter vectors w_i were then used to compute the prior for the subsequent Bayesian calibration analysis, applying (20) and (21) for its mean w_0 and covariance matrix Σ_0, respectively. The covariance matrix is shown by the heat plot in Fig. 7(a1), highlighting the order of magnitude of its elements.
From the prior, we infer the prior predictive distribution of the ensemble according to (8)-(10). With B playing the role of y and (V_H, V_T) that of x, we therefore obtain the prior predictive B distribution with, on the one hand, its mode B_0((V_H, V_T)) = φ((V_H, V_T))^⊤ w_0 (where the distribution is maximal) and, on the other hand, the uncertainty of the inference quantified by σ_0((V_H, V_T)). The results are shown in Fig. 7(a2) and (a3) and summarized in Table I. The σ value underlying these results is 27.5 μT and was obtained independently. Fig. 7(a4) shows the distribution of the relative error (B_0 − B)/σ_0 for all 14 × 57 = 798 measurements of all specimens, where B_0 − B is the deviation of the prior prediction of B from the known, applied B. An absolute ratio ≤ 1 means that the applied B lies within one predicted standard deviation σ_0 of the B inferred from x. The fraction of the data in Fig. 7(a4) satisfying this condition is listed in Table I for m = 0, where m denotes the number of thermal calibration conditions.
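The ±1σ consistency check underlying Fig. 7(a4) reduces to counting normalized residuals; a sketch, assuming arrays of applied fields, predicted fields, and predicted standard deviations (for a well-calibrated Gaussian model, a fraction near 68% is expected):

```python
import numpy as np

def coverage_fraction(B_applied, B_pred, sigma_pred):
    """Fraction of points whose prediction error lies within one predicted sigma."""
    rel_err = (B_pred - B_applied) / sigma_pred
    return float(np.mean(np.abs(rel_err) <= 1.0))

# Toy data: two of four residuals fall within +/- 1 sigma
frac = coverage_fraction(np.zeros(4),
                         np.array([0.5, 1.5, -0.2, 2.0]),
                         np.ones(4))
```

A fraction far from the Gaussian expectation would indicate either a misspecified model or over- or underestimated predicted uncertainties.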

C. Calibration at a Single Temperature
We now take a step back and assume that nothing is known about the 14 specimens except the prior response parameter distribution of the ensemble which they represent as samples. For each specimen, we refine the prior by incorporating calibration data obtained at a single temperature, characterized by the corresponding thermal voltage V_Tcal [cf. Fig. 7(a2)]. From X(V_Tcal), we infer the V_Tcal-dependent posterior predictive distribution p_1((V_H, V_T), X(V_Tcal), B_cal) according to (13)-(15). In particular, we obtain σ_1((V_H, V_T), X(V_Tcal)), depending on the calibration condition V_Tcal. From this result, we derive the V_Tcal-dependent objective functions f_G and f_I using (16) and (17). Both dependences are plotted in Fig. 8. Each objective function is minimal at a distinct value V_Tmin, as also listed in Table I for m = 1. The calibration of any specimen can only approximate the ideal calibration conditions X(V_Tmin). For each specimen i, instead of X(V_Tmin), we therefore take the calibration conditions X_cal composed of the three available load cases x_in whose V_Tin values are closest to V_Tmin. From X_cal and B_cal, we compute w_1(X_cal, B_cal) and Σ_1(X_cal) of each specimen using (11) and (12). The covariance matrix Σ_1 of an exemplary case is shown in Fig. 7(b1). Here, X_cal consists of load case nos. 49-51. On the basis of w_1 and Σ_1 of each specimen, one obtains its posterior predictive distribution of B, with predicted maximum B_1 and standard deviation σ_1 given by (14) and (15), respectively. They are plotted in Fig. 7(b2) and (b3) for the same specimen as in Fig. 7(b1). As shown by Fig. 7(b3) in comparison with Fig. 7(a3), calibration near the single optimal V_Tmin significantly improves the accuracy of the inferred B. The maximum σ_1 values resulting from G-optimality and the rms σ_1 values obtained under I-optimality are listed in Table I for m = 1. These are significantly lower than the corresponding values before calibration, documented in the rows with m = 0.
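Assuming (11) and (12) take the standard conjugate Gaussian form, the update from the prior (w_0, Σ_0) to the posterior (w_1, Σ_1) can be sketched as:

```python
import numpy as np

def posterior_update(w0, Sigma0, X_cal, B_cal, sigma):
    """Conjugate Gaussian update from calibration data (sketch of (11), (12)).

    Lambda_1 = Sigma_0^{-1} + X^T X / sigma^2
    w_1      = Lambda_1^{-1} (Sigma_0^{-1} w_0 + X^T B / sigma^2)
    """
    Lambda0 = np.linalg.inv(Sigma0)
    Sigma1 = np.linalg.inv(Lambda0 + X_cal.T @ X_cal / sigma**2)
    w1 = Sigma1 @ (Lambda0 @ w0 + X_cal.T @ B_cal / sigma**2)
    return w1, Sigma1

# one calibration measurement along the first basis direction
w1, Sigma1 = posterior_update(np.zeros(2), np.eye(2),
                              np.array([[1.0, 0.0]]), np.array([1.0]),
                              sigma=1.0)
```

In this toy case, a single measurement halves the variance along the measured direction while leaving the orthogonal direction untouched.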
In analogy with Fig. 7(a4), Fig. 7(b4) shows the distribution of the relative error (B_1 − B)/σ_1 for all 798 available data points. This is the distribution of the discrepancy between the posterior-predicted B values and the applied B values, normalized by the corresponding standard deviation. Again, the dominant fraction of the inferred B values lies within the predicted ±1σ_1 confidence range, as also documented numerically in Table I for m = 1.

D. Calibration at More Than One Temperature
The optimization was carried out analogously for two and three independently selectable calibration temperatures, corresponding to the customary two- and three-point calibrations. The results are again listed in Table I for m = 2 and m = 3. Note that in all cases, at least 68% of the applied B values lie within the respective predicted ±1σ_1 confidence ranges of the inferred B.
V. DISCUSSION

A. Comments on the Present Gaussian Approach

The first four columns of Table I show that the 14 sensor specimens were successfully calibrated near only one, two, and three optimal temperatures, despite the seven parameters of the sensor response model of polynomial degree 4 in V_T. In fact, in view of the three magnetic field values applied per temperature, the calibrations were carried out with three, six, and nine measurements. However, as we verified numerically, one of the magnetic measurement conditions per temperature is essentially redundant, since the magnetic response of the Hall sensor is highly linear [72] in the range of the applied B values. Leaving out the 0-mT condition and thereby simplifying the calibration to two, four, and six measurements at ±25 mT only marginally degrades the achieved measurement accuracy.
The calibration with fewer measurements than model parameters owes its success to the Bayesian update of the available, statistically broad prior information about the sensor parameter distribution of the ensemble. As quantified by the rms σ_1 over the measurement range, the accuracy achieved after an I-optimal two-temperature calibration, for example, is 82 μT (the width of the ±1σ_1 confidence interval), which is about 0.2% of the full B range of 50 mT. Based on the prior alone, the corresponding value is 406 μT, i.e., about 0.8% of the full range. A similar improvement was reported in the context of temperature-dependent pressure sensors [8] described by five model parameters. The Bayesian update of a prior using three calibration measurements per specimen yielded a smaller integrated variance than a non-Bayesian calibration using five measurements. For five or more calibration measurements, the integrated variances, equal to V·f_I, were about a factor of 2 smaller in the case of the Bayesian update of a prior than for the prior-free analysis [8].
For the investigated calibration scenarios, the fourth column of Table I shows that in all cases except one, more than 68.3% of the applied magnetic induction B values lie within the ±1σ_1 range around the inferred values B_1. However, the use of the 14 available samples for both prior generation and validation may be questioned. For this reason, we also performed 14-fold cross validation according to the leave-one-out strategy [11, Sec. 1.3], where one specimen at a time served as the validation case, while the remaining 13 specimens were used for generating the prior. As a quality measure of the Bayesian inference, we again determined the fraction of applied B values lying within ±1σ_1 of the inferred B values. Again, data for the six calibration scenarios (m = 1, . . . , 3, for each of f_G and f_I) are reported in Table I. In all six cases, among the 798 measurements, the fraction of cases where the applied B lies within ±1σ_1 of the inferred B_1 is at least 70.8%.
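The coverage check used in this validation reduces to counting normalized residuals; a sketch (`coverage_fraction` is an illustrative helper, not the authors' code):

```python
import numpy as np

def coverage_fraction(B_pred, sigma_pred, B_applied, k=1.0):
    """Fraction of points whose applied B lies within k predicted
    standard deviations of the inferred value."""
    z = np.abs(np.asarray(B_pred) - np.asarray(B_applied))
    z = z / np.asarray(sigma_pred)
    return float(np.mean(z <= k))

# toy data: three of four residuals lie within one standard deviation
frac = coverage_fraction([0.0, 0.0, 0.0, 0.0],
                         [1.0, 1.0, 1.0, 1.0],
                         [0.5, 2.0, -0.5, 0.9])
```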
Clearly, prior data generation constitutes a more substantial effort per specimen than the subsequent Bayesian calibration where only a few measurements need to be performed per specimen. It may therefore be tempting to reduce the number of prior-generation specimens Q. However, reducing Q comes at the cost of a lower precision of the prior, via (21), and thus reduces the Bayesian calibration accuracy as well. How to optimally balance this tradeoff is an open question, whose answer likely depends on the specifics of the calibration process, the production volume, and the specified accuracy goal. This question warrants detailed, separate investigations.
A related aspect of interest of the Gaussian prior is its deviation from the correct normal-inverse-Wishart distribution from which it was derived. The discrepancy is particularly pronounced at smaller Q, where the long tails of the inferred multivariate t-distribution of the ensemble may not be negligible [12,Sec. 3.7,p.77]. To our knowledge, analytical results for handling the ensuing marginalizations are not available. Numerical integration techniques based on covariance matrix sampling from inverse Wishart distributions [12,Part III] are likely able to help clarify related questions.
As described in Section II-C, Bayesian calibration at a single temperature is ideally performed at the optimal X(V_Tmin) identified for the selected optimality criterion. In principle, this can be achieved using a control loop adjusting the calibration conditions. However, the corresponding expense in terms of time and experimental resources conflicts with the goal of overall efficiency. Since f_G and f_I, as evaluated in Section IV-C, are continuous functions of V_Tcal (cf. Fig. 8), a calibration carried out under less strictly controlled conditions with V_Tcal near V_Tmin can be expected to produce a near-optimal result. From the approximate X_cal, a specimen's individual maximum and rms σ_1 can then be computed. Both can be expected to be close to the values putatively achieved under optimal calibration conditions. Analogous conclusions apply to calibration scenarios relying on more than one calibration temperature.
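In practice, near-optimal conditions can be found by simply scanning candidate calibration temperatures. The sketch below assumes f_I is the posterior predictive variance averaged over a grid covering the operating range, in the spirit of (17); `design_at` is a hypothetical stand-in for the map from V_Tcal to the calibration design matrix:

```python
import numpy as np

def i_optimal_temperature(candidates, design_at, phi_grid, Sigma0, sigma):
    """Return the candidate V_Tcal minimizing the I-objective f_I.

    candidates : iterable of candidate calibration temperatures
    design_at  : function V_Tcal -> calibration design matrix X (n, M)
    phi_grid   : (G, M) basis vectors covering the operating range
    """
    Lambda0 = np.linalg.inv(Sigma0)
    best_V, best_fI = None, np.inf
    for V in candidates:
        X = design_at(V)
        Sigma1 = np.linalg.inv(Lambda0 + X.T @ X / sigma**2)
        # mean posterior predictive variance over the operating grid
        fI = sigma**2 + np.mean(
            np.einsum('gm,mn,gn->g', phi_grid, Sigma1, phi_grid))
        if fI < best_fI:
            best_V, best_fI = V, fI
    return best_V, best_fI

# toy model phi = (1, t): calibrating inside the operating range [0, 1]
# beats calibrating far outside it
best_V, _ = i_optimal_temperature(
    [0.5, 10.0],
    lambda V: np.array([[1.0, V]]),
    np.array([[1.0, 0.0], [1.0, 1.0]]),
    np.eye(2), sigma=0.1)
```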
Let us note that the present analysis is, strictly speaking, a case of inverse calibration [20, Ch. 3]. Indeed, the output signals X produced by a specimen's calibration are subject to random errors, while y is applied in a well-controlled fashion minimizing its uncertainty. From the viewpoint of signal generation, X is a result of y and the applied thermal conditions. Yet, in the spirit of inverse calibration, X and y are taken in the present regression analysis as the independent and dependent variables, respectively, since the goal is to predict y from x. In an analogous context with linear base functions, inverse calibration has been found to be superior to standard calibration followed by inversion [21]. This conclusion was proven to hold within clearly specified bounds of x [21]. In the present calibration case, the errors in the variables x = (V_H, V_T) were not explicitly considered; the error σ was instead implemented phenomenologically in B. Nevertheless, the measurement accuracy of the calibrated specimens documented in Fig. 7(a4) and (b4) and in columns 4 and 6 of Table I corroborates the practical reliability of the present Bayesian calibration. Further studies need to scrutinize this issue, possibly supported by findings from the theory of total least squares and errors in variables [28].
Sensors where we perceive a potential benefit from the presented Bayesian approach include the following.
1) Sensors responding linearly to a single measurand, such as diode-based temperature sensors with output proportional to absolute temperature [4], [73], [74].
2) Sensors with linear response experiencing temperature cross-sensitivity, such as diffused piezoresistive stress sensors [75], force/moment sensors based on piezoresistive field-effect transistors [76], [77], and pressure sensors [78], [79]. A pressure sensor case was already successfully analyzed in this spirit [8]. The uncertainty estimates obtained in [23] by a different method can likely be put on a sound statistical basis.
3) Sensors whose response is modulated by several, possibly nonlinear, parasitic influences such as temperature and stress [24], [71], [80]. In these cases as well, statistically well-founded accuracy predictions can plausibly be obtained.
4) Inertial sensors cross-sensitive to parasitic inertial effects, and inertial measurement units [2], [7], [81].
These are a few examples. Sensor systems whose response depends nonlinearly on the model parameters lie outside the scope of the present approach and therefore cannot be handled by it. Arrays of chemical sensors [5], [82], [83], such as electronic noses and tongues, whose responses derive from the laws of thermodynamics and critically depend on exponential activation-energy-modulated mechanisms, are strongly nonlinear. They may still be tractable by Bayesian inference, but hardly so using the analytical mathematical formalism employed here. Further instances where the use of alternative calibration methodologies is beneficial are addressed in Section V-B.
Among these methods, ANNs have become particularly well known and broadly used due to their stunning power in solving classification and decision problems [11], [14], [84]. MLPs consist of layered networks of nodes, the so-called neurons, in which an output is computed as a weighted sum of values originating from the neurons of the preceding layer. Before contributing to such a sum, those values are passed through a selected nonlinear activation function, such as the sigmoid, tanh, or logistic function [84]. The parameters of the ANNs, and analogously those of the other learning algorithms, are tuned in the so-called training phase, which usually relies on abundant training data. In the second phase, one obtains a measure of the trustworthiness of the various learning algorithms by feeding the tuned algorithm with test inputs and comparing its outputs with the known outputs corresponding to those test inputs.
In view of their inherent nonlinearity, ML methods, and especially those based on ANNs, have also proven their worth in modeling nonlinear regression problems posed by sensor calibration and compensation [33], [34], [37], [44]. The studied algorithms were designed to compensate for temperature and relative humidity in the measurement of methane concentrations [37], to model the influence of two heat sources on a single-photon interferometer [44], and to identify methanol-water [33], [34] as well as acetone-water mixtures [34], both cleared of temperature cross-sensitivity and, in one case, with simultaneous extraction of the solution temperature [34]. The ANN in [33], the range of methods in [34], and the FNN in [35] addressed the objective of discriminating materials under test (MUTs) within a discrete set of MUTs, a typical classification task.
As an alternative, RFs have successfully been applied to the calibration of multipollutant sensors [38] and of NO2 and particulate matter (PM10) sensors [39]. Third, GPR has been applied to the calibration of thermal and differential-pressure anemometers [42], NO2 and PM10 sensors [39], and capacitive artificial skin sensors [43].
Several studies have compared ML with more traditional regression methods. In the case of multipollutant sensors responding to CO, NO2, CO2, and O3, with strong mutual interactions [38], the RF approach was found to outperform univariate and multiple linear regression. Similarly, in the case of the NO2 and PM10 sensors [39], ML methods, including RF and GPR, prevailed over multiple linear regression. A thorough comparison has rated 11 ML methodologies applied to the case of microwave-based liquid sensors [34]. The four top-performing methods in this case were MLP, SVP, K-nearest neighbors, and linear discriminant analysis, whereas RF followed in sixth place, after the decision-tree method. In a study of radiowave attenuation by vegetation, ANNs were also found to deliver superior performance over multivariate polynomial regression needing between 1140 and 7770 coefficients [85].
Notably, two major differences can be identified between the chemical sensors and sensor systems in the abovementioned studies and the thermal-magnetic sensors of the present work. The first difference, found in some cases, is the stronger nonlinearity of the chemical sensor response [39], [45] and the strong interactions in cases of concurrent multiple chemical influences [38]. The second difference, encountered in other cases, is the large dimension of the input space. This is especially true of the split-ring resonators, where the resonance spectra consisted of 500 [33] or 5001 [34] signal values. This is the dimension of the input vector, i.e., of the space of independent variables in the language of regression.
By comparison, the space of independent variables in the present Bayesian approach, spanned by V_H and V_T, is of modest dimension, and the polynomial regression has to tackle only weak nonlinearities up to polynomial degree five (V_H V_T^4). This may explain why the Bayesian regression works so well, as summarized in Table I and in Section V-A. A second, substantial distinction lies in the fact that the training phase (the determination of the prior, as described in Section II-D) yields statistically qualified uncertainty estimates for the predictions based on future measurements. These measures of trust are based solely on the training data. Independent data can then serve to validate the known uncertainties (cf. the 14-fold cross-validation results in Table I) rather than establish them. Furthermore, the Bayesian approach allows one to make predictions, again with a corresponding qualified uncertainty, beyond the range of the calibration data. In contrast, RFs do not lend themselves to such extrapolation [38]; it remains to be seen to what extent ML approaches oriented toward classification do. Let us also point out that training in the Bayesian approach, i.e., the prior calculation, is straightforward and unambiguous. Questions concerning the order of the training data or their ideal subdivision into batches do not arise. In addition, the computational cost incurred by polynomial model functions of low degree for prediction and uncertainty evaluation is modest. On-chip integration of the present Bayesian algorithm is therefore a realistic option in the case of CMOS-based microsensor systems.
Last but not least, the Bayesian sensor calibration framework described here shows how to most economically calibrate sensor specimens about which nothing is known except that they are representatives of an ensemble of sensors with known, statistically scattered properties. Translated to the abovementioned chemical sensors, this would amount to taking a pristine chemical sensor specimen and turning it into a sensor of known, narrow uncertainty over the entire measurement range by first performing a single or just a handful of calibration measurements and then combining this knowledge with the ANNs or RFs trained on other specimens of the same ensemble.
In summary, the analytical Bayesian approach of this article is tailored to cases with comparably simple models linear in the model parameters. It can therefore be seen as complementary to numerical ML approaches handling more complex input-output relations. Nevertheless, its formalism lends itself to obtaining analytical results on questions including the dependence of the achievable accuracy on model complexity, due either to larger input dimensions or to stronger nonlinearity. Separate studies concerned with this challenge should follow. How the dimension of the parameter vector varies with the number of sensor signals and the polynomial degree is described in [24].
Finally, let us refer to the topic of Bayesian neural networks, which aim at enhancing nonlinear learning algorithms with statistical methods [11, Sec. 5.7]. In the area of sensing, for example, the calibration of low-cost gas sensors using Bayesian neural networks has been shown to benefit from a noninformative prior compared with standard linear regression and neural networks [40]. Bayesian neural networks are computationally expensive. Even with a noninformative prior, the involved integrals are intractable [40]. Variational inference [11, Ch. 10], [84, Sec. 19.4] and Markov chain Monte Carlo methods based on efficient sampling techniques [12] have been shown to bring the numerical effort down to an affordable level [40]. No doubt, such approaches offer rich opportunities for further exploration.

VI. CONCLUSION
In this article, a Bayesian approach to sensor calibration was formulated in detail and applied to a temperature-cross-sensitive Hall sensor system. Instead of the at least seven measurements otherwise needed to calibrate each specimen, the method requires significantly fewer calibration conditions to achieve satisfactory accuracy. The prerequisite of the method is the availability of a prior, whose acquisition was also described. The method is broadly applicable to multisensor systems whose responses are well described by parameterized models linear in the model parameters, subject to the assumption that all relevant variabilities are satisfactorily described by univariate or multivariate normal statistics. The findings rely on Bayesian inference and Bayesian experimental design. In view of the reduced number of required calibration measurements, Bayesian calibration has the potential for significant cost savings in the context of industrial sensor calibration, recalibration, and autocalibration.
Future work may address the tradeoff between the effort invested in generating the prior and the accuracy achievable after optimal Bayesian calibration. Bayesian model selection, which allows models with fewer parameters to be identified, is a natural extension of the present work. Finally, the method may prove its worth in cases of multisensor systems involving measurands cross-sensitive to more than a single disturbance.

APPENDIX I
Following the nomenclature of [11], the multivariate normal probability distribution of an M-dimensional random variable z with mean μ and covariance matrix Σ is denoted as N(z|μ, Σ). It is the short notation for

N(z|μ, Σ) = (2π)^{−M/2} |Σ|^{−1/2} exp{−(1/2)(z − μ)^T Σ^{−1} (z − μ)}   (A1)

where Σ has dimension M × M and is positive semidefinite. Its determinant is denoted |Σ| and its inverse, Σ^{−1}, is termed the precision matrix and is alternatively denoted Λ. Both z and μ are column vectors, i.e., matrices of dimension M × 1. The distribution has its maximum at z = μ, from which it decreases in all directions like a Gaussian. In the univariate case, (A1) reduces to the 1-D Gaussian distribution, where the role of the covariance matrix is taken over by the scalar variance σ², i.e., the square of the standard deviation σ. The derivations of (8)-(10) and (13)-(15) rely on the Gaussian marginalization rules (A6)-(A8). [Table II: the two distributions p(z_a|z_b) and p(z_b), including their defining parameters, which serve as inputs for the two major marginalizations performed in Section II, followed by the resulting p(z_a).] Table II shows how these marginalization rules are applied in the two cases of Section II-B. The upper part of the table identifies the constituents of p(z_a|z_b) and p(z_b). From these, the resulting p(z_a) and its parameters are deduced by straightforward substitution into the right-hand sides of (A6)-(A8). The results are listed in the last three rows.
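In the notation of this appendix, the marginalization rule underlying (A6)-(A8) is presumably the standard linear-Gaussian result [11, Sec. 2.3.3], sketched here for reference:

```latex
% Linear-Gaussian marginalization (cf. Bishop, Sec. 2.3.3):
% given a Gaussian marginal and a Gaussian conditional with linear mean,
\begin{align}
p(\mathbf{z}_b) &= \mathcal{N}\!\left(\mathbf{z}_b \mid \boldsymbol{\mu},\, \boldsymbol{\Lambda}^{-1}\right),\\
p(\mathbf{z}_a \mid \mathbf{z}_b) &= \mathcal{N}\!\left(\mathbf{z}_a \mid \mathbf{A}\mathbf{z}_b + \mathbf{b},\, \mathbf{L}^{-1}\right),
\end{align}
% the marginal of z_a is again Gaussian:
\begin{equation}
p(\mathbf{z}_a) = \mathcal{N}\!\left(\mathbf{z}_a \mid \mathbf{A}\boldsymbol{\mu} + \mathbf{b},\; \mathbf{L}^{-1} + \mathbf{A}\boldsymbol{\Lambda}^{-1}\mathbf{A}^{\mathsf{T}}\right).
\end{equation}
```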
APPENDIX II
An approach to model the distribution of the observed parameter vectors w_1, . . . , w_Q of the Q prior-generation specimens is to look for the joint probability distribution p(μ, Σ|w_1, . . . , w_Q) of the unknown mean μ and covariance Σ of a multivariate normal distribution conditional on the observations. The method of Bayesian multivariate linear regression allows this question to be addressed [12, Sec. 3.6]. It is concluded that p(μ, Σ|w_1, . . . , w_Q) takes the form of a normal-inverse-Wishart distribution. The first term on the right-hand side of the equal sign is a multivariate normal distribution of μ with mean μ_Q and covariance κ_Q^{−1}Σ, while the second term is an inverse Wishart distribution of Σ with ν_Q degrees of freedom and scale matrix S_Q. The Bayesian multivariate linear regression analysis enables one to obtain these parameters [12, Sec. 3.6] based on suitably chosen prior parameter values κ_0, μ_0, ν_0, and S_0. The posterior predictive distribution of w values implied by p(μ, Σ|w_1, . . . , w_Q) is then obtained by marginalizing p(w|μ, Σ) p(μ, Σ|w_1, . . . , w_Q) over μ and Σ, with p(w|μ, Σ) = N(w|μ, Σ). As a result, one concludes that w subject to the observed parameter vectors w_1, . . . , w_Q is distributed as a multivariate t-distribution [12, Sec. 3.6]. If no information is available before the acquisition of w_1, . . . , w_Q, an appropriate and widely used noninformative prior is the Jeffreys prior |Σ|^{−(M+1)/2} [12, Sec. 3.6], corresponding to the limit κ_0 → 0, ν_0 → −1, and |S_0| → 0, and implying κ_Q = Q, μ_Q = w̄, ν_Q = Q − 1, and S_Q = W̃^T W̃, where W̃ denotes the matrix of centered parameter vectors w_i − w̄. This yields (18).
Since a general multivariate t-distribution t_ν(m, S) has mean m and covariance νS/(ν − 2) [12, Appendix A], after inspecting (18) and substituting Q − M for ν, w̄ for m, and {(Q + 1)/[Q(Q − M)]}W̃^T W̃ for S, one straightforwardly finds the multivariate normal approximation of (18) with identical mean and covariance, as stated in (19)-(21). Note that (19)-(21) constitute a posterior probability density of w derived from the Jeffreys prior on the basis of the experimental evidence w_1, . . . , w_Q. At the same time, in the best tradition of Bayesian statistics [86], it serves as the prior for the subsequent Bayesian calibration of sensor specimens described in Sections II-A-II-C.
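Carrying out these substitutions explicitly gives the moment-matched normal approximation (a sketch consistent with the quantities above; the article's (19)-(21) remain the authoritative statement):

```latex
% mean and covariance of t_{Q-M}(\bar{w}, S) with
% S = \frac{Q+1}{Q(Q-M)}\,\tilde{W}^{\mathsf{T}}\tilde{W}:
\begin{align}
\mathbf{w}_0 &= \bar{\mathbf{w}},\\
\boldsymbol{\Sigma}_0 &= \frac{\nu}{\nu-2}\,\mathbf{S}
  = \frac{Q-M}{Q-M-2}\cdot\frac{Q+1}{Q(Q-M)}\,\tilde{\mathbf{W}}^{\mathsf{T}}\tilde{\mathbf{W}}
  = \frac{Q+1}{Q(Q-M-2)}\,\tilde{\mathbf{W}}^{\mathsf{T}}\tilde{\mathbf{W}}.
\end{align}
```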