Microwave Radiometer Instability Due to Infrequent Calibration

We directly quantify the effect of infrequent calibration on the stability of microwave radiometer temperature measurements (where a power measurement for the unknown source is acquired at a fixed time, but calibration data are acquired at variable earlier times) with robust and nonrobust implementations of a new metric. Based on our new metric, we also determine a component of uncertainty in a single measurement due to infrequent calibration effects. We apply our metric to ground-based calibration data acquired from a NASA millimeter-wave imaging radiometer and to experimental data acquired from a NIST radiometer (Noise Figure Radiometer, NFRad). Based on a stochastic model for the NFRad, we determine the random uncertainty of an empirical prediction model of our stability metric by a Monte Carlo method. For comparison purposes, we also present a secondary metric that quantifies stability for the case where calibration data are acquired at a fixed time, but power measurements for the unknown source are acquired at variable later times.


I. Introduction
Microwave radiometers [1], [2] are typically calibrated based on power measurements of reference sources with well-characterized temperatures. Given calibration data acquired from the references, and the measured power of an unknown source, one estimates the temperature of the unknown source. The stability of this estimate depends, in part, on how often one acquires calibration data as well as the particular calibration method employed [3]-[6]. For instance, one might estimate the unknown source temperature at time t based on calibration data for two references at some past time t − τ or at multiple times before and after t. In this article, we focus on the first approach, which is typically denoted as the "two-point calibration" method.
Existing methods to quantify the stability of radiometer temperature estimates include the variogram [7]-[10] and Allan deviation spectra [11]-[14]. Although valuable, neither approach directly quantifies stability as a function of the variable time delay between the data acquisition time of the calibration data and the time at which the power of the unknown is measured. In this article, we directly quantify how stability depends on this variable time delay and quantify a component of uncertainty due to infrequent calibration. In contrast, neither the variogram nor the Allan deviation provides estimates of the component of uncertainty due to infrequent calibration. To suppress the influence of outliers (e.g., warm-up effects in calibration studies), we construct a robust version of our stability metric. (For a related discussion of robust implementations of the variogram metric and robust statistical analysis in general, see [8] and [15], respectively.) We expect that discrepancies between the robust and nonrobust implementations of any stability metric may facilitate identification of outliers in a radiometer time series. For informational purposes, we also study a second new metric that quantifies instability when calibration data are acquired at a fixed time, but power measurements for the unknown source are acquired at variable subsequent times. We illustrate our metrics with experimental data acquired with the NIST radiometer (Noise Figure Radiometer, NFRad) [16], [17] and ground-based calibration data acquired from NASA's millimeter-wave imaging radiometer (MIR) [18]. For the NFRad, we attribute discrepancies between robust and nonrobust implementations of our metric to warm-up effects. We exclude data acquired during the "warm-up" period from our analysis. Based on a nonstationary stochastic model for the NFRad instrument, we determine the random uncertainty of our stability metrics by a Monte Carlo method.
We expect our methods to apply to other instruments, including the new generation of small satellites [19]-[25].

A. Linear Calibration Model
Suppose that the theoretical temperatures and powers of two reference sources are $T_1$, $T_2$, $P_1$, and $P_2$, and the theoretical temperature and power of an unknown source are $T_u$ and $P_u$, respectively. At time $t_c$, denote the measured powers of the first and second references as $\hat{P}_1(t_c)$ and $\hat{P}_2(t_c)$, respectively. At time $t_u$, denote the measured power of the unknown source as $\hat{P}_u(t_u)$. According to the two-point calibration method (assuming perfect knowledge of $T_1$ and $T_2$), we estimate the temperature of the unknown source at time $t_u$ as $\hat{T}_u(t_u)$, where
$$\hat{T}_u(t_u) = T_1 + \frac{\hat{P}_u(t_u) - \hat{P}_1(t_c)}{\hat{P}_2(t_c) - \hat{P}_1(t_c)} \, (T_2 - T_1). \tag{1}$$
For compactness, we write $\hat{T}_u(t_u)$ as
$$\hat{T}_u(t_u) = T\big(\hat{\theta}(t_c), \hat{P}_u(t_u)\big) \tag{2}$$
where a calibration dataset acquired at time $t_c$ is $\hat{\theta}(t_c) = \big(\hat{P}_1(t_c), \hat{P}_2(t_c)\big)$. For the special case where power measurements are free of random error but systematic error varies linearly with time, the bias (systematic error) of the temperature estimate for the unknown source is
$$\beta \, (t_u - t_c) \, \frac{T_2 - T_1}{P_2 - P_1} \tag{3}$$
where $\beta$ is the temporal derivative of the systematic error for any power measurement.
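The two-point estimate and the linear-drift special case can be checked numerically. The following is a minimal sketch, assuming hypothetical powers that are exactly linear in temperature; the gain, offset, drift rate, and times are illustrative values, not values from this article. Substituting a common systematic error β·t into the two-point formula gives a bias of β(t_u − t_c)(T_2 − T_1)/(P_2 − P_1), which the sketch compares with the directly computed bias.

```python
def two_point_temperature(p_u, p_1, p_2, t_1, t_2):
    """Two-point calibration estimate of the unknown-source temperature."""
    if p_2 == p_1:
        raise ValueError("reference powers must differ")
    return t_1 + (p_u - p_1) / (p_2 - p_1) * (t_2 - t_1)


# Hypothetical radiometer: power is exactly linear in temperature.
gain, offset = 2.0, 5.0                 # illustrative values (assumption)
t_1, t_2, t_u = 296.9, 84.25, 302.9     # temperatures in K
p_1, p_2, p_u = (gain * t + offset for t in (t_1, t_2, t_u))

# Noise-free, drift-free powers: the estimate recovers t_u.
t_hat = two_point_temperature(p_u, p_1, p_2, t_1, t_2)

# Linear drift: every power reading taken at time t has systematic error beta * t.
beta, t_cal, t_meas = 0.01, 0.0, 100.0  # illustrative drift rate and times
t_drift = two_point_temperature(
    p_u + beta * t_meas, p_1 + beta * t_cal, p_2 + beta * t_cal, t_1, t_2
)
bias = t_drift - t_u
bias_formula = beta * (t_meas - t_cal) * (t_2 - t_1) / (p_2 - p_1)
print(f"estimate without drift: {t_hat:.3f} K")
print(f"bias: {bias:.6f} K, formula: {bias_formula:.6f} K")
```

In this sketch the bias grows in proportion to the delay between calibration and measurement, which is the deterministic analogue of the stochastic instability studied in the rest of the article.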
Here, we focus on quantification of instability due to stochastic effects. For the experimental cases studied here, the primary source of instability in radiometer temperature estimates is random electronic gain variation with a complicated correlation structure. In general, if there are deterministic trends in an observed radiometer time series, one should ideally detrend the time series and determine stability metrics from the residual time series. As a caveat, it may be difficult in some applications to distinguish deterministic trends from stochastic variations.

B. Stability Metrics
In our primary analysis method, the power of the unknown is measured at some fixed time t, but the calibration data are acquired at each of many variable times t − τ, where τ is nonnegative. We define a "variable calibration (VC)" deviation as $\epsilon_{VC}(t, \tau)$, where
$$\epsilon_{VC}(t, \tau) = T\big(\hat{\theta}(t), \hat{P}_u(t)\big) - T\big(\hat{\theta}(t - \tau), \hat{P}_u(t)\big). \tag{4}$$
In a secondary analysis, the calibration data are acquired at some fixed time t, but the power of the unknown is acquired at each of many variable times t + τ, where τ is nonnegative. We define a "fixed calibration (FC)" deviation, $\epsilon_{FC}(t, \tau)$, as
$$\epsilon_{FC}(t, \tau) = T\big(\hat{\theta}(t), \hat{P}_u(t + \tau)\big) - T\big(\hat{\theta}(t), \hat{P}_u(t)\big). \tag{5}$$
In our analysis, we assume that the observed deviation $\epsilon_{VC}(t, \tau)$ is a realization of a random variable that has an expected value of 0 and a finite second moment (expected squared value) $\langle \epsilon_{VC}^2(t, \tau) \rangle$ that does not vary with t. The theoretical value of our primary stability metric for quantifying the effect of infrequent calibration is
$$\mathrm{SVC}(\tau) = \sqrt{\tfrac{1}{2} \langle \epsilon_{VC}^2(t, \tau) \rangle} \tag{6}$$
where $\langle X \rangle$ denotes the expected value of a random variable X observed in statistically similar experiments. We estimate the theoretical value of SVC(τ) as
$$\widehat{\mathrm{SVC}}(\tau) = \sqrt{\tfrac{1}{2} \, \overline{\epsilon_{VC}^2(t, \tau)}} \tag{7}$$
where $\overline{\epsilon_{VC}^2(t, \tau)}$ is the sample mean (determined from experimental data) of squared VC deviations corresponding to all distinct values of (t − τ, t) where data are acquired. The robust version of our estimate (7) is
$$\mathrm{RSVC}(\tau) = \frac{1.4826}{\sqrt{2}} \, \mathrm{MAD}\big(\epsilon_{VC}(t, \tau)\big) \tag{8}$$
where $\mathrm{MAD}(\epsilon_{VC}(t, \tau))$ is the median absolute deviation (MAD) of the set of $\epsilon_{VC}(t, \tau)$ deviations corresponding to all distinct values of (t − τ, t) where data are acquired. To get the MAD of n values $(x_1, x_2, \ldots, x_n)$, one first computes their median $x_{med}$. The MAD is the median of the absolute deviations $(|x_1 - x_{med}|, |x_2 - x_{med}|, \ldots, |x_n - x_{med}|)$. As discussed in many references, including [26], the factor 1.4826 in (8) ensures that the scaled MAD of many realizations of a Gaussian random variable converges to the standard deviation of the Gaussian distribution (to within five-digit precision). This follows from the observation that, when applied to random variables with symmetric distributions, the MAD converges to half the interquartile range of the distribution [15]. For a Gaussian distribution with standard deviation σ, half the interquartile range is approximately 0.67448σ, and 1/0.67448 ≈ 1.4826.
For cases where the asymptotic limit of RSVC(τ) determined from data pooled from N statistically similar and independent experimental datasets as N →∞ is defined, the theoretical value of RSVC is this limit. We define theoretical values for the nonrobust and robust version of our second metric in a similar way for SFC(τ) and RSFC(τ) in terms of the deviation ϵ FC (t, τ).
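The metrics defined in this section can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' implementation: it takes the nonrobust metric as the square root of half the mean squared deviation and the robust metric as 1.4826·MAD/√2, and it applies them to simulated white-noise powers in which the first 100 cycles of one reference are made artificially noisy (a hypothetical stand-in for warm-up effects). The gain, noise levels, lag, and function names are all illustrative.

```python
import math
import random
import statistics

T_1, T_2 = 296.9, 84.25      # reference temperatures in K
GAIN, T_U = 2.0, 302.9       # hypothetical gain; unknown-source temperature in K


def temperature(p_1, p_2, p_u):
    """Two-point temperature estimate for one set of powers."""
    return T_1 + (p_u - p_1) / (p_2 - p_1) * (T_2 - T_1)


def vc_deviations(p_1, p_2, p_u, lag):
    """Same unknown power; calibration data from cycle i versus cycle i - lag."""
    return [temperature(p_1[i], p_2[i], p_u[i])
            - temperature(p_1[i - lag], p_2[i - lag], p_u[i])
            for i in range(lag, len(p_u))]


def fc_deviations(p_1, p_2, p_u, lag):
    """Same calibration data; unknown power from cycle i + lag versus cycle i."""
    return [temperature(p_1[i], p_2[i], p_u[i + lag])
            - temperature(p_1[i], p_2[i], p_u[i])
            for i in range(len(p_u) - lag)]


def nonrobust_metric(devs):
    """Square root of half the sample mean of the squared deviations."""
    return math.sqrt(0.5 * statistics.fmean(d * d for d in devs))


def robust_metric(devs):
    """Scaled median absolute deviation, divided by sqrt(2)."""
    med = statistics.median(devs)
    mad = statistics.median(abs(d - med) for d in devs)
    return 1.4826 * mad / math.sqrt(2.0)


def simulate(n, warmup, rng):
    """White-noise powers; reference 1 is noisier during the first `warmup` cycles."""
    p_1 = [GAIN * T_1 + rng.gauss(0.0, 10.0 if i < warmup else 0.5) for i in range(n)]
    p_2 = [GAIN * T_2 + rng.gauss(0.0, 0.5) for _ in range(n)]
    p_u = [GAIN * T_U + rng.gauss(0.0, 0.5) for _ in range(n)]
    return p_1, p_2, p_u


rng = random.Random(0)
p_1, p_2, p_u = simulate(n=2000, warmup=100, rng=rng)
lag = 5
full = vc_deviations(p_1, p_2, p_u, lag)
reduced = vc_deviations(p_1[100:], p_2[100:], p_u[100:], lag)
svc_full, rsvc_full = nonrobust_metric(full), robust_metric(full)
svc_reduced, rsvc_reduced = nonrobust_metric(reduced), robust_metric(reduced)
print(f"full data:    SVC={svc_full:.3f} K, RSVC={rsvc_full:.3f} K")
print(f"reduced data: SVC={svc_reduced:.3f} K, RSVC={rsvc_reduced:.3f} K")
```

With the contaminated cycles included, the nonrobust value is inflated relative to the robust value. After the contaminated cycles are dropped, the two values are close, which is the diagnostic behavior the article attributes to comparing the two implementations. The same two metric functions applied to `fc_deviations` give the FC counterparts.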

III. Results
A. NIST Radiometer
1) NFRad Measurement System-At the NIST, noise powers from sources are measured with the NFRad, a total-power noise radiometer [16], [17]. The NFRad does not detect power with a typical square-law detector (such as diodes and thermocouples). Instead, the NFRad detects the power of a source with a thermistor that responds to injected RF power by adjusting the amount of dc power dissipated in the thermistor. The RF power of the source is inferred from the reduction of dc power necessary to maintain the thermistor in its original state (where no RF power is injected) based on the dc substitution principle [27].
The NFRad consists of an ultralinear amplifier chain terminated with a NIST Type-IV power meter. The ultralinear amplifier chain refers to the entire radiometer detection, including the amplifier, mixer, and detector. The two calibration noise standards are an ambient 50-Ω load and a cryogenic load immersed in a liquid nitrogen bath. Multiple noise sources may be connected to the measurement system at any one time to facilitate intercomparisons between devices.
We estimate the power of any source (reference or unknown) as $\hat{P}$, where
$$\hat{P} = \frac{V_{off}^2 - V_{source}^2}{R}. \tag{9}$$
Here, $V_{off}$ corresponds to measured voltage when no RF power is injected, $V_{source}$ corresponds to measured voltage when RF power from a source is injected, and R = 200 Ω is the nominal resistance of the thermistor in the power meter. The uncertainty of this resistance is negligible.
The NFRad operates in a band between 1 and 12 GHz. During each cycle of the experiment, we measure four voltages: $V_{off}$ and $V_{source}$ (for the unknown source and the two calibration sources). For each case, ten repeat voltage measurements (acquired every 50 ms) are averaged. The wait time between the different cases within any cycle is 500 ms. For analysis purposes, we assume that the data acquisition times for power measurements within any cycle are the same. The interval between cycles is approximately 26 s.
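As a small illustration of the dc substitution power estimate, the sketch below averages ten repeat voltage readings for each case and then applies (V_off² − V_source²)/R with R = 200 Ω. The voltage values and the function name are hypothetical, and averaging the voltages before squaring is an assumption based on the description above.

```python
from statistics import fmean

R_OHMS = 200.0  # nominal thermistor resistance stated in the text


def source_power(v_off_reads, v_source_reads, r_ohms=R_OHMS):
    """Power from dc substitution, using the averaged repeat voltage readings."""
    v_off = fmean(v_off_reads)
    v_source = fmean(v_source_reads)
    return (v_off ** 2 - v_source ** 2) / r_ohms


# Hypothetical readings in volts: ten repeats per case within one cycle.
v_off_reads = [2.0] * 10
v_source_reads = [1.0] * 10
power_watts = source_power(v_off_reads, v_source_reads)
print(f"estimated power: {power_watts:.4f} W")
```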
2) Experimental Data-With the NFRad, we observe voltage time series corresponding to three sources labeled as "Warm," "Ambient," and "Cryogenic" (see Fig. 1). The corresponding temperatures for the three sources are 302.9, 296.9, and 84.25 K. In our study, the "Warm" source serves as the unknown source, and the other two sources serve as calibration reference sources.
In NFRad experiments, early measurements are typically unreliable due to warm-up effects. When we determine our stability metrics from the full data (including the first 200 cycles), SVC and RSVC are not in good agreement [see Fig. 2(a)]. We expect this discrepancy since SVC does not down-weight outliers due to warm-up effects, whereas RSVC does. In contrast, when we determine these metrics from the reduced data (which excludes the first 200 cycles), the two metrics are in good agreement [see Fig. 2(b)]. Thus, the dramatic discrepancy between SVC and RSVC (when determined from the full data) indicates the presence of outliers due to warm-up effects. In general, we expect that comparison studies of SVC and RSVC may serve as a diagnostic for detecting other sorts of outliers besides those produced by warm-up effects. Similar comments apply to SFC and RSFC [see Fig. 2(c) and (d)].
3) Observed Stability Metrics-We determine our metrics from the reduced data, which correspond to a time series with 1000 samples. The spacing between samples is $\tau_0 \approx 26$ s. Hence, the time series that we analyze was acquired during an observation time of 7.2 h. For $\tau/\tau_0 \geq 1$, we characterize observed values of SVC and SFC (see Fig. 3) with the following empirical prediction models:
$$\mathrm{SVC}(\tau/\tau_0) = \big[\alpha_1 + \beta_1 (\tau/\tau_0)^{\gamma_1}\big] \, \mathrm{K} \tag{10}$$
and
$$\mathrm{SFC}(\tau/\tau_0) = \big[\alpha_2 + \beta_2 (\tau/\tau_0)^{\gamma_2}\big] \, \mathrm{K}. \tag{11}$$
We determine the model parameters by the method of least squares, where all parameter estimates are constrained to be nonnegative (see Table I).
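One way to carry out a nonnegative least-squares fit of a model of the form α + β(τ/τ_0)^γ is sketched below. This is not necessarily the procedure used for Table I: the grid search over γ, the closed-form treatment of α and β, and the synthetic data are all assumptions made for illustration. For each candidate γ the model is linear in α and β, so the constrained optimum is either the unconstrained solution (when it is feasible) or lies on a boundary where one coefficient is zero.

```python
def fit_power_law(x, y, gammas):
    """Fit y ~ alpha + beta * x**gamma with alpha, beta >= 0 over a grid of gamma."""
    n = len(x)
    best = None
    for gamma in gammas:
        z = [xi ** gamma for xi in x]
        sz, sy = sum(z), sum(y)
        szz = sum(zi * zi for zi in z)
        szy = sum(zi * yi for zi, yi in zip(z, y))
        det = n * szz - sz * sz
        # Boundary candidates: beta = 0 (constant model) and alpha = 0.
        candidates = [(max(sy / n, 0.0), 0.0), (0.0, max(szy / szz, 0.0))]
        if det > 1e-12 * n * szz:
            alpha = (sy * szz - sz * szy) / det
            beta = (n * szy - sz * sy) / det
            if alpha >= 0.0 and beta >= 0.0:
                candidates.append((alpha, beta))  # unconstrained optimum is feasible
        for alpha, beta in candidates:
            sse = sum((alpha + beta * zi - yi) ** 2 for zi, yi in zip(z, y))
            if best is None or sse < best[0]:
                best = (sse, alpha, beta, gamma)
    return best[1], best[2], best[3]


# Synthetic, noise-free metric values from hypothetical parameters.
lags = list(range(1, 51))                      # normalized lags tau / tau_0
truth = (1.5, 0.2, 0.5)                        # illustrative alpha, beta, gamma
values = [truth[0] + truth[1] * lag ** truth[2] for lag in lags]
gamma_grid = [k / 20 for k in range(1, 41)]    # 0.05, 0.10, ..., 2.00
alpha_hat, beta_hat, gamma_hat = fit_power_law(lags, values, gamma_grid)
print(alpha_hat, beta_hat, gamma_hat)
```

A finer grid, or a general-purpose bounded optimizer, would be a natural refinement when γ is not expected to fall on a coarse grid.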
We model each observed NFRad voltage time series as a realization of a nonstationary stochastic process (see the Appendix). Based on the values of the simulation model parameters determined from the observed NFRad data, we simulate many realizations of the voltage time series. From each realization of a set of four simulated voltage time series ($V_{off}$, unknown source voltage, and voltages for the two calibration sources), we determine power time series for the unknown and the two calibration reference sources in exactly the same way that we determined power time series from the observed NFRad data. (See Fig. 4 for a comparison of observed and example realizations of simulated power.) From each realization of a set of simulated power time series, we compute SVC and SFC metrics at all lags of interest and determine the prediction model [see (10) and (11)] parameters. From these results, we determine the standard errors of the prediction model [see (10) and (11)] parameters (see Table I) and the standard error of the prediction at each lag (see Fig. 3). Even though the relative uncertainties of the model parameters shown in Table I are generally greater than 50%, the lag-dependent relative uncertainties of the empirical prediction models [see (10) and (11)] range from 2.5% to 5.5% (see Fig. 3).
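The Monte Carlo step described above amounts to recomputing a statistic on many simulated datasets and taking the standard deviation of the results. The sketch below shows that pattern in generic form. The white-noise simulator and the sample mean are hypothetical stand-ins for the NFRad simulation model and the fitted metric parameters, and the function names are illustrative.

```python
import random
import statistics


def monte_carlo_standard_error(simulate, statistic, n_realizations, rng):
    """Standard deviation of a statistic over independently simulated datasets."""
    values = [statistic(simulate(rng)) for _ in range(n_realizations)]
    return statistics.stdev(values)


def simulate_white_noise(rng, n=100):
    """Stand-in for the instrument model: n unit-variance Gaussian samples."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(1)
se_mean = monte_carlo_standard_error(simulate_white_noise, statistics.fmean, 500, rng)
print(f"Monte Carlo standard error of the sample mean: {se_mean:.3f}")
```

For this stand-in the result can be compared with the textbook value 1/√100 = 0.1. For a fitted model parameter or a predicted metric value no such closed form is generally available, which is why simulation is used.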

1) Experimental Data-The NASA MIR is a five-receiver airborne radiometer built for remote sensing of water vapor, precipitation, and clouds [18]. The MIR is a total power radiometer with periodic through-the-antenna calibrations with two internally mounted blackbody references. We analyze data acquired from a 6-h laboratory experiment configuration of MIR (see [28] for more details). Three calibration targets were viewed periodically. The integration time for each measurement was 200 ms. Accounting for latency between target views, the cycling time was 1.16 s. Based on measured powers for the references and unknown, temperature is estimated with (1). The temperatures of the hot and cold reference sources are approximately 325.6 and 293.7 K, respectively. A third source with a temperature of approximately 79.0 K serves as the unknown source in our study.

C. Stability Metrics
As a safeguard against possible warm-up effects, we fit the above models to a subset of the calibration data that excludes the first 28% of the calibration data. The number of samples in the time series that we determine our metrics from is 12 890. We stress that we analyze only a subset of all the data presented in [28]. The spacing between samples is $\tau_0 \approx 1.16$ s. Hence, the time series that we analyze (see Fig. 5) corresponds to a data acquisition time of 4.15 h. We characterize the observed values of SVC and SFC with empirical prediction models determined with model fitting software [29] (see Fig. 6 and Table II) as
$$\mathrm{SVC}(\tau/\tau_0) = \frac{a_1 + c_1 (\tau/\tau_0) + e_1 (\tau/\tau_0)^2}{1 + b_1 (\tau/\tau_0) + d_1 (\tau/\tau_0)^2} \, \mathrm{K} \tag{12}$$
and
$$\mathrm{SFC}(\tau/\tau_0) = \frac{a_2 + c_2 (\tau/\tau_0) + e_2 (\tau/\tau_0)^2}{1 + b_2 (\tau/\tau_0) + d_2 (\tau/\tau_0)^2} \, \mathrm{K}. \tag{13}$$
In contrast to the NFRad analysis, we did not develop a stochastic model for the MIR data or determine standard errors for metric model parameters by Monte Carlo methods. Identification of such a stochastic model for the MIR data is a worthy topic but beyond the scope of this article.
For the NFRad, the maximum (minimum) values of observed SVC and SFC [see Fig. 3

A. Uncertainty Component Due to Infrequent Calibration
For any value of τ > 0, no matter how small, we expect SVC(τ) to be nonzero, since random errors in power measurements for any two distinct times are different. Hence, SVC has a discontinuity at τ = 0. In the variogram literature, a similar effect called the nugget effect can produce a discontinuity in the variogram at lag 0. (See [7] for more discussion of this point.) For the two radiometers, the empirical prediction model parameters $\alpha_1$ in (10) and $a_1$ in (12) correspond to the discontinuity in SVC/K at τ = 0. Assuming that the expected value of the deviation defined in (4) is 0, $2\,\mathrm{SVC}^2(\tau)$ is a variance. We decompose this variance into the sum of an irreducible variance $u_{ir}^2$ and a variance associated with infrequent calibration effects $u_{ic}^2(\tau)$ (e.g., temporal variations in gain). Assuming that measurement errors produced by these two effects are independent, we conclude that
$$2\,\mathrm{SVC}^2(\tau) = u_{ir}^2 + u_{ic}^2(\tau) \tag{14}$$
where the irreducible variances for the NFRad and the MIR are $(u_{ir}/\mathrm{K})^2 = 2\alpha_1^2$ and $(u_{ir}/\mathrm{K})^2 = 2a_1^2$, respectively. Given an estimate of $u_{ir}$, $\hat{u}_{ir}$, we quantify the component of uncertainty due to infrequent calibration in any one temperature measurement (given that $2\,\widehat{\mathrm{SVC}}^2(\tau) \geq \hat{u}_{ir}^2$) as
$$\hat{u}_{ic}(\tau) = \sqrt{2\,\widehat{\mathrm{SVC}}^2(\tau) - \hat{u}_{ir}^2}. \tag{15}$$
For other cases, $\hat{u}_{ic}(\tau) = 0$.
For the NFRad and the MIR, the estimates of α 1 and a 1 (listed in Tables I and II) are 1.58 and 1.51, respectively. In Fig. 7, we show how u ic (τ) varies with τ for both radiometers. We stress that neither the variogram metric nor the Allan variance metric yields an estimate u ic (τ).
For the case where the power of the unknown at a particular cycle is determined with calibration data acquired at the same cycle, i.e., for the case where τ = 0, we denote the theoretical variance of the temperature estimate at each time as $u_c^2$ (assuming that the observed temperature time series is stationary). For the general case where calibration data and the unknown are not acquired during the same cycle, we express the theoretical variance of the temperature estimate as $u_{combined}^2(\tau)$, where
$$u_{combined}^2(\tau) = u_c^2 + u_{ic}^2(\tau). \tag{16}$$
We determine $u_c^2$ based on the analysis of the observed time series of temperature estimates, where powers of the unknown and calibration reference sources are acquired at the same cycle. In particular, with the auto.arima function [30] in R [31] (a public domain software system), we fit various candidate autoregressive-integrated moving average (ARIMA) models (see the Appendix for a definition of ARIMA models) to the temperature time series, and select the one that minimizes the corrected Akaike information criterion (AICc) [32], [33]. For the NFRad and the MIR, the selected models are AR(1) and AR(5), respectively. Since AR models are stationary, our model selection results are consistent with the hypothesis that the observed time series for the τ = 0 case is stationary. The associated estimates of $u_c$, $\hat{u}_c$, for the NFRad and the MIR are 1.79 and 1.49 K, respectively. For the case where τ > 0, we express the combined uncertainty of the temperature estimate as
$$\hat{u}_{combined}(\tau) = \sqrt{\hat{u}_c^2 + \hat{u}_{ic}^2(\tau)}. \tag{17}$$
In summary, the major steps in our analysis for general applications are the following.

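1) Estimate SVC(τ) from the observed data at the lags of interest [see (7)].
2) Estimate the discontinuity in SVC at τ = 0 with an appropriate empirical model for SVC(τ).
3) Estimate the irreducible variance $u_{ir}^2$ as twice the square of the discontinuity determined in step 2.
4) Determine the component of uncertainty due to infrequent calibration at lag τ as $\hat{u}_{ic}(\tau) = \sqrt{2\,\widehat{\mathrm{SVC}}^2(\tau) - \hat{u}_{ir}^2}$ when the argument of the square root is nonnegative (and as 0 otherwise), where $\widehat{\mathrm{SVC}}(\tau)$ and $\hat{u}_{ir}^2$ are estimates of SVC(τ) and the irreducible variance $u_{ir}^2$, respectively.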
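The steps above can be sketched as a short calculation. In the example below, the discontinuity 1.58 and the same-cycle uncertainty 1.79 K are the NFRad values quoted in this section, whereas the power-law coefficients β and γ of the illustrative SVC model are hypothetical placeholders (the fitted values are in Table I, which is not reproduced here). The function names are also illustrative.

```python
import math


def irreducible_variance(svc_discontinuity):
    """Twice the square of the SVC discontinuity at zero lag."""
    return 2.0 * svc_discontinuity ** 2


def infrequent_calibration_uncertainty(svc_tau, u_ir_squared):
    """sqrt(2 * SVC^2 - u_ir^2) when that quantity is positive, else 0."""
    excess = 2.0 * svc_tau ** 2 - u_ir_squared
    return math.sqrt(excess) if excess > 0.0 else 0.0


def combined_uncertainty(u_c, u_ic_tau):
    """Root sum of squares of the same-cycle and infrequent-calibration components."""
    return math.hypot(u_c, u_ic_tau)


def svc_model(normalized_lag, alpha=1.58, beta=0.05, gamma=0.5):
    """Illustrative empirical model; beta and gamma are hypothetical."""
    return alpha + beta * normalized_lag ** gamma


u_ir_sq = irreducible_variance(1.58)
for lag in (1, 10, 100):
    u_ic = infrequent_calibration_uncertainty(svc_model(lag), u_ir_sq)
    u_comb = combined_uncertainty(1.79, u_ic)
    print(f"lag={lag:4d}  u_ic={u_ic:.3f}  u_combined={u_comb:.3f}")
```

Clamping at zero mirrors the rule that the infrequent-calibration component is set to 0 whenever the estimated value of 2 SVC² does not exceed the estimated irreducible variance.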

B. Comparison of Stability Metrics
For the MIR data, we compare variograms to our SVC metric (shown previously in Fig. 6). We define the variogram at lag τ as
$$\gamma(\tau) = \langle \epsilon_{var}^2(t, \tau) \rangle \tag{18}$$
where
$$\epsilon_{var}(t, \tau) = T\big(\hat{\theta}(t), \hat{P}_u(t)\big) - T\big(\hat{\theta}(t - \tau), \hat{P}_u(t - \tau)\big). \tag{19}$$
In the variogram study, temperature estimates are determined at each cycle. However, distinct calibration data for the calibration sources are not acquired at each cycle. Instead, distinct calibration data are acquired every Δ cycles. We determine a temperature estimate at each cycle with calibration data acquired before or at the cycle of interest. For the case where the power of the unknown is measured at time t, but calibration sources are not, we set $\hat{\theta}(t) = \hat{\theta}(t^*)$, where $t^* < t$ is the calibration data acquisition time nearest to t that also precedes t. In the variogram plots (see Fig. 8), τ corresponds to the lag between the two cycles where temperature estimates are determined. The assumed calibration data for the two cycles may be the same or vary and may not be acquired at either cycle. Hence, interpretation of the variogram is more complicated than interpretation of the SVC metric.
For lags less than approximately 300 cycles, the variogram corresponding to Δ = 500 cycles is consistently lower than the variograms corresponding to Δ = 1 cycle and Δ = 50 cycles (see Fig. 8). That is, according to the variogram, a less-frequent calibration scheme provides more stable temperature estimates relative to a more frequent calibration scheme for lags less than 300 cycles. This strange behavior is consistent with our claim that interpretation of variogram results is more complicated than interpretation of SVC results (for the examples studied here). Since SVC and the variogram are conceptually different, we do not expect the two metrics to agree.
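The variogram calculation with calibration data held between acquisitions can be sketched as follows. The sketch assumes that calibrations occur at cycles 0, Δ, 2Δ, and so on, and it estimates the variogram as the sample mean of the squared differences between temperature estimates separated by the lag. The function names and the toy data are hypothetical.

```python
from statistics import fmean


def two_point(p_1, p_2, p_u, t_1, t_2):
    """Two-point calibration temperature estimate."""
    return t_1 + (p_u - p_1) / (p_2 - p_1) * (t_2 - t_1)


def held_calibration_index(cycle, delta):
    """Most recent calibration cycle at or before `cycle` (calibrations at 0, delta, ...)."""
    return (cycle // delta) * delta


def variogram(p_1, p_2, p_u, lag, delta, t_1, t_2):
    """Mean squared difference of temperature estimates separated by `lag` cycles."""
    temps = []
    for i, power in enumerate(p_u):
        c = held_calibration_index(i, delta)   # calibration data held since cycle c
        temps.append(two_point(p_1[c], p_2[c], power, t_1, t_2))
    return fmean((temps[i] - temps[i - lag]) ** 2 for i in range(lag, len(temps)))


# Toy data: constant references, unknown power alternating between 0 and 1.
n = 8
p_1, p_2 = [0.0] * n, [100.0] * n
p_u = [float(i % 2) for i in range(n)]
gamma_lag1 = variogram(p_1, p_2, p_u, lag=1, delta=4, t_1=0.0, t_2=100.0)
gamma_lag2 = variogram(p_1, p_2, p_u, lag=2, delta=4, t_1=0.0, t_2=100.0)
print(gamma_lag1, gamma_lag2)
```

The sketch makes the interpretive difficulty concrete: two cycles that share the same held calibration data have perfectly correlated calibration errors, so those errors cancel in the difference and do not contribute to the variogram at that lag.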

C. Other Calibration Schemes
In this article, we quantify stability for a calibration scheme where calibration data are acquired before the power of the unknown source is measured. Our stability metrics depend on both gain variations and possible temporal variations in the brightness temperatures of the reference sources. For the experimental data analyzed here, such brightness temperature variations affect our stability metrics in a very small or negligible way. In actual satellite systems, unknown source temperatures are typically determined with more complicated calibration schemes. For instance, the temperature estimate of the unknown source may depend on calibration data acquired before and after the time at which the power of the unknown source is measured. Next, we discuss a possible modification of our primary stability metric for application to more complicated calibration schemes.
Suppose that we have fine-scale calibration data at every time $t_u$ where the power of the unknown source is measured. Furthermore, suppose that we subsample these data to produce data on a coarse temporal grid, where the interval between successive samples is Δ. Suppose that the closest time to $t_u$ where there is coarse-scale calibration data is $t^* = t_u + \tau$. Denote the temperature estimate of the unknown source at $t_u$ determined with the closest coarse-scale calibration data based on the more complicated, but unspecified, calibration scheme as
$$\hat{T}(t_u, \tau) \tag{20}$$
and define a residual
$$\epsilon(t_u, \tau) = \hat{T}(t_u, 0) - \hat{T}(t_u, \tau) \tag{21}$$
where $\hat{T}(t_u, 0)$ is determined, in part, from fine-scale calibration data. Assuming that the mean square value of $\epsilon(t_u, \tau)$ exists and does not depend on $t_u$, a natural candidate for a modified version of SVC is
$$\mathrm{SVC}_{extend}(\tau) = \sqrt{\tfrac{1}{2} \langle \epsilon^2(t_u, \tau) \rangle} \tag{22}$$
where the expectation would be over all possible values of $t_u$ where −Δ/2 ≤ τ ≤ Δ/2. One would estimate $\mathrm{SVC}_{extend}(\tau)$ from experimental data in a way similar to how SVC is determined [see (7)]. The other steps to determine the component of uncertainty due to infrequent calibration would be similar to those listed at the end of Section IV-A. Application and analysis of this extended metric to more complicated calibration schemes is worthwhile but beyond the scope of this article.

V. Summary
We directly quantified the effect of infrequent calibration on the stability of microwave radiometer temperature measurements acquired with the NFRad and the MIR with a new metric. For the calibration scheme studied, we also identified a component of uncertainty in a temperature estimate at a particular time based on the lag between the times at which the unknown power and the powers of the calibration sources are acquired. In contrast, existing stability metrics such as the variogram and Allan variance do not provide an estimate of this uncertainty component.
We developed a nonstationary stochastic model for the NFRad and determined random uncertainties of an empirical prediction model of our metric by Monte Carlo simulation. For the NFRad, we demonstrated that warm-up effects produced discrepancies between robust and nonrobust implementations of our metric. Hence, we expect that discrepancies between the two implementations may facilitate identification of outliers. For comparison purposes, we also studied a secondary metric that quantified stability for the case where calibration data are acquired at a fixed time, but power measurements for the unknown source are acquired at variable later times.
For the NFRad and the MIR, we fit many candidate empirical prediction models to our observed metrics and selected a parsimonious model from those that agreed best with observed values according to a root-mean-square deviation criterion. Since the hardware of the two radiometers is different and the temperatures of the reference sources and unknown sources are different, differences in the selected mathematical forms are not unexpected. In general, the mathematical form of the "best" empirical prediction model may vary for other radiometers.
In our analysis, we determine our metric from relatively short time series. For sufficiently long time series, one might determine metrics from contiguous time intervals and search for temporal variations in the expected value of the metrics. Given a sufficiently long time series, one might determine a metric for each of many contiguous time intervals and determine metric uncertainty due to random effects based on these repeat measurements.
Due to size and mass limitations, small satellites typically lack blackbody targets and utilize external targets for calibration. Because these external targets cannot be observed as frequently as internal references, our methods are relevant to small satellites. Our methods should be useful for ground-based calibration studies as well.

Acknowledgment
The authors would like to thank J. Randa and D. Gu of NIST for helpful comments on the NFRad measurement system. Contributions to this work by staff of NIST represent an execution of the NIST Greenhouse Gas and Climate Science Measurements Program (Special Programs Office, Associate Director for Laboratory Programs, NIST, U.S. Department of Commerce). Work described herein was conducted as part of an interagency memorandum of understanding between NASA and NIST to advance the state of the art in noise source measurements and their application to remote sensing. Certain commercial products are identified in this article to foster understanding. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the products identified are necessarily the best available for the purpose.
This work was supported by NIST and NASA.

Stochastic Model for NFRad data
We model the unobserved gain time series, G, as a stochastic process G = 1 + z, where z is an ARIMA time series. Our discussion of ARIMA processes closely follows [34]. An ARIMA process is a stochastic process that can be "differenced" to yield a stationary and causal autoregressive moving average (ARMA) process. More specifically, if $\{X_t\}$ is an ARIMA(p, d, q) process, $Y_t := (1 - B)^d X_t$ is a causal ARMA(p, q) process, where $B X_j := X_{j-1}$ and d is a nonnegative integer. If $\{X_t\}$ is an ARMA(p, q) process, it satisfies
$$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q} \tag{23}$$
where $\{Z_t\}$ is white noise and $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$ are model parameters to be determined. (In a causal ARMA model, $X_t$ can be expressed as a weighted sum of realizations of the white noise terms at time t and all times earlier than t.) ARIMA processes where d = 0 are stationary ARMA(p, q) processes. The terms AR(p) and MA(q) are shorthand for ARMA(p, 0) and ARMA(0, q) processes, respectively.
A simple example of an ARIMA process is a Gaussian random walk (a Brownian motion sampled at uniformly spaced times) $X_t = Z_1 + Z_2 + \cdots + Z_t$, where $Z_1, Z_2, \ldots, Z_t$ are independent and identically distributed Gaussian random variables with mean 0 and variance $\sigma^2$, and $X_0 = 0$. Since the variance of $X_t$ is $\sigma^2 t$, $\{X_t\}$ is not a stationary process. However, $Y_t := (1 - B) X_t = Z_t$ is stationary with variance $\sigma^2$ and mean 0. Thus, the Gaussian random walk is an ARIMA(0,1,0) process. Although the process is nonstationary, one can define the generalized power spectrum of the Brownian motion process as $S(f) \propto f^{-2}$ [35].
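The random-walk example can be checked by simulation. The sketch below (with an arbitrary seed, path length, and number of realizations chosen for illustration) estimates the variance of X_t across many independent walks at t = 25 and t = 100, and then differences one long walk to recover increments whose variance is close to σ².

```python
import random
import statistics


def random_walk(n_steps, sigma, rng):
    """Return X_1..X_n with X_t = Z_1 + ... + Z_t and X_0 = 0."""
    x, path = 0.0, []
    for _ in range(n_steps):
        x += rng.gauss(0.0, sigma)
        path.append(x)
    return path


def difference(series):
    """First difference (1 - B) applied to a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(2)
walks = [random_walk(100, 1.0, rng) for _ in range(2000)]
# Across-realization variance about the known mean of 0 grows linearly with t.
var_t25 = statistics.pvariance([w[24] for w in walks], mu=0.0)
var_t100 = statistics.pvariance([w[99] for w in walks], mu=0.0)
# Differencing one long walk yields stationary increments with variance near sigma^2.
increments = difference(random_walk(5000, 1.0, rng))
var_increments = statistics.variance(increments)
print(f"var(X_25)={var_t25:.1f}, var(X_100)={var_t100:.1f}, var(Y)={var_increments:.2f}")
```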
We stress that we simulate one realization of G that applies to all the sources as well as the observed $V_{off}$ time series. In our analysis, we approximate G as a scaled version of $V_{off}$. Based on this approximation, we determine an ARIMA model for G directly from analysis of the observed $V_{off}$ time series. In particular, we fit ARIMA models with varying orders to the $V_{off}$ time series and select the model that minimizes the AICc [32], [33], with the auto.arima function [30] in R [31] (a public domain software system). This approach yields an ARIMA(2,1,3) model. Hence, our initial model for z is the following nonstationary time-series model:
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \theta_3 e_{t-3} + e_t \tag{24}$$
where $y_t = z_t - z_{t-1}$ and $y_t$ is an ARMA(2,3) stochastic process. The $e_t$ term is Gaussian white noise with variance $\sigma_e^2$. We do not expect a scaled version of the gain to exactly predict $V_{off}$. Hence, we model the observed $V_{off}$ time series as the product of the unobserved stochastic gain function G and a scale factor $\beta_{off}$ plus additive Gaussian white noise, $e_{off}(t)$, as follows:
$$V_{off}(t) = \beta_{off} G(t) + e_{off}(t). \tag{25}$$
We set $\beta_{off}$ equal to the median value of the observed $V_{off}$ time series. Estimation of the variance of the additive Gaussian white noise term, $e_{off}(t)$, in our model for $V_{off}$ is nontrivial because G(t) contains additive white noise. We first form an initial estimate, $\tilde{\sigma}_{e_{off}}^2$, of the variance of the additive noise in $V_{off}$ at any time as the sample variance of the $V_{off}$ time series. As a caveat, we expect that the sample variance of $V_{off}$ is an inflated estimate of the additive noise variance for various reasons. Our goal was to get a conservative estimate of the additive noise variance and then estimate a scaled version (where the scaling factor is less than 1) of it in a later analysis. (The initial estimates of the variances of additive noise terms for each reference source were similarly obtained.)
We estimate the standard deviation of the additive noise term in our model [see (25)] for $V_{off}$, $\sigma_{e_{off}}$, as
$$\hat{\sigma}_{e_{off}} = \alpha \, \tilde{\sigma}_{e_{off}} \tag{26}$$
where $\tilde{\sigma}_{e_{off}}$ is the initial estimate (the sample standard deviation of the $V_{off}$ time series) and α is an adjustable scaling factor. Later, we scale $e_t$ in (24) by replacing $e_t$ with $\tilde{e}_t = \kappa e_t$, where κ is adjustable. However, in the initial model described by (24) and (25), we do not scale $e_t$. Table III lists estimated model parameters for the model of $V_{off}$ described by (24) and (25).
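A simulation of the gain and V_off models described by (24) and (25) might look like the following sketch. The parameter values are hypothetical placeholders (the estimated values are in Table III, which is not reproduced here), the recursion starts from zero initial conditions without a burn-in period, and the function names are illustrative. The scale factors α and κ discussed above would enter simply as multipliers on the two noise standard deviations.

```python
import random


def simulate_gain(n, phi, theta, sigma_e, rng):
    """G = 1 + z, where the first difference of z follows an ARMA(2,3) recursion."""
    y_prev = [0.0, 0.0]        # y_{t-2}, y_{t-1}
    e_prev = [0.0, 0.0, 0.0]   # e_{t-3}, e_{t-2}, e_{t-1}
    z, gain = 0.0, []
    for _ in range(n):
        e_t = rng.gauss(0.0, sigma_e)
        y_t = (phi[0] * y_prev[1] + phi[1] * y_prev[0]
               + theta[0] * e_prev[2] + theta[1] * e_prev[1] + theta[2] * e_prev[0]
               + e_t)
        z += y_t                                  # undo the differencing: z_t = z_{t-1} + y_t
        gain.append(1.0 + z)
        y_prev = [y_prev[1], y_t]
        e_prev = [e_prev[1], e_prev[2], e_t]
    return gain


def simulate_v_off(gain, beta_off, sigma_off, rng):
    """V_off(t) = beta_off * G(t) + additive Gaussian white noise."""
    return [beta_off * g + rng.gauss(0.0, sigma_off) for g in gain]


# Hypothetical parameter values, chosen only so that the AR part is stationary.
phi = (0.5, -0.2)
theta = (0.3, 0.1, 0.05)
rng = random.Random(3)
gain = simulate_gain(1000, phi, theta, sigma_e=1e-5, rng=rng)
v_off = simulate_v_off(gain, beta_off=2.0, sigma_off=1e-5, rng=rng)
print(f"gain range: {min(gain):.5f} to {max(gain):.5f}")
```

Because the same simulated gain series multiplies every source, a single call to `simulate_gain` would be shared across the simulated voltage series for V_off and for all sources within one realization.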
We model the voltage time series for any particular reference source as (27), where Θ is an AR(1) time series. Thus, we have
$$\Theta(t) = \gamma_1 \Theta(t - 1) + \delta(t) \tag{28}$$
where δ(t) is Gaussian white noise with adjustable variance $\sigma_\delta^2$ and $\gamma_1$ falls in the interval (−1, 1). We stress that each simulated realization of the Θ time series applies to the simulated time series for the two reference sources and the unknown source. As before, $\beta_{ref}$ is a scale factor set to the median value of the reference time series of interest. The term $\epsilon_{ref}(t)$ is Gaussian white noise, and α is the adjustable scaling factor discussed earlier. We list estimated model parameters determined from $V_{off}$ and the reference sources directly in Tables III and IV. In Table V, we list parameters determined from a grid search, where we minimize the difference between the observed value of SVC and the mean value of 15 simulated realizations of SVC for all candidate values of the relevant model parameters.
In our study, we assume that the expected values of observed and simulated metrics exist. As a technical aside, for our Gaussian noise model, the expected value and the expected squared value (and hence the theoretical variance) of our simulated metric may not exist because the possible values of simulated normal (Gaussian) random variables are unbounded. To clarify, we note that if observed powers are realizations of independent Gaussian random variables, the temperature estimate [see (1)] involves a ratio of correlated Gaussian random variables, and its expected value and variance do not exist [36]. However, as discussed in [36], if the denominator is simulated from a truncated Gaussian distribution that sets a lower bound on the realizations, the ratio has a well-defined expected value and theoretical variance.
One can set this lower bound so that the probability of simulating a realization below the threshold from the original (untruncated) distribution is negligible. Hence, for theoretical reasons, it is reasonable to consider a truncated normal distribution model for noise. For instance, one can simulate realizations in the range (−kσ, kσ), where σ is the standard deviation of the noise and k is, say, 10. For our problem, we expect that such a truncation yields metrics with well-defined theoretical expected values and theoretical variances. Therefore, from a conceptual perspective, we should regard our simulation model as sampling from such truncated distributions. We stress that the number of realizations that one would have to simulate to observe a realization that differs from 0 by more than ten standard deviations is so large that the probability of observing one is negligible.

Fig. 3. Stability metrics determined from reduced NFRad data. (a) Estimated SVC. (b) Estimated SFC. We predict the metrics determined from observed data with empirical models [see (10) and (11)]. We show the average of metrics determined from 500 realizations of simulated data (see the Appendix). We denote this average as the Monte Carlo prediction. The relative uncertainties of the predicted metric values determined by our empirical models [see (10) and (11)] generally increase with lag and range from approximately 2.5% to 5.5% for the lags shown. The interval between adjacent cycles (where data are acquired) is $\tau_0 \approx 26$ s. The normalized lag, $\tau/\tau_0$, takes positive integer values (1, 2, 3, …).

Fig. 6. Stability metrics determined from MIR data. In (a) and (b), we predict the observed metrics defined in (12) and (13)

Fig. 7. (a) Empirical prediction models for SVC for the NFRad [see (10)] and the MIR [see (12)]. (b) Estimate of component of uncertainty due to infrequent calibration, $u_{ic}(\tau)$ [see (15)], for a measurement of the temperature of the unknown source.

Fig. 8. For MIR data, we compare SVC (same as in Fig. 6) to variograms determined for cases where the interval between acquisition of calibration data, Δ, varies. As described in Section IV-B, variograms for Δ > 1 cycle correspond to the case where earlier (relative to times when the power of the unknown source is determined) calibration data are assumed for power estimation. For Δ = 1 cycle, at each cycle, power measurements of the unknown source are determined with calibration data acquired at that cycle.