A High-Resolution Single-Photon Arrival-Time Measurement With Self-Antithetic Variance Reduction in Quantum Applications: Theoretical Analysis and Performance Estimation

An almost all-digital time-to-digital converter (TDC) possessing subpicosecond resolutions, scalable dynamic ranges, high linearity, high noise immunity, and moderate conversion rates can be achieved by a random sampling-and-averaging (RSA) approach with the self-antithetic variance reduction (SAVR) technique for time-correlated single-photon counting (TCSPC) quantum measurements. This article presents detailed theoretical analysis and behavior-model verifications of the SAVR technique to effectively enhance the conversion rate of an asynchronous RSA-based TDC by more than 62 times with 7% power overhead. In addition, the proposed performance estimation methodology for SAVR can greatly improve the computation efficiency during the system-level design and reduce the read-out circuit complexity in the silicon-photonics RSA-based TCSPC realization.


I. INTRODUCTION
Time-correlated single-photon counting (TCSPC) [1], [2], [3], [4], [5] has become the key functionality in a variety of emerging quantum technology, including quantum imaging/sensing [6], [7], [8], quantum-state preparations [9], [10], [11], [12], quantum cryptography [13], [14], [15], positron emission tomography [16], [17], time-resolved spectroscopy [18], fluorescence-lifetime imaging [19], [20], molecular imaging, live-cell/tissue microscopy [2], free-space time-offlight measurements [21], and light detection-and-ranging (LiDAR) [6], [22]. One example of quantum-state detections shown in Fig. 1 exploits TCSPC to measure the quantum state of a light beam, which is a "vector" but very different from the vector defined in classical physics. Under the Dirac notation, a normalized quantum state |ψ can be expressed by the superposition of the horizontal polarization basis vector |H and vertical polarization basis vector |V in the Hilbert space [10] as follows: where c H and c V are complex coefficients, and the sum of their squared magnitudes is "1" in this orthonormal basis state-expression. More importantly, c H and c V are referred to as complex probability amplitudes of the polarization basis vectors, |H and |V , respectively. The physical meaning of these coefficients is that the squared magnitude of the coefficient, multiplying a particular polarization basis vector, is equal to the probability of photons occurring in that polarization after many single-photon measurements, which can be performed by the apparatus with TCSPC shown in Fig. 1(a); whenever TCSPC H receives a single photon from the |H output of the light polarizer at a certain time of t H , the circuit in TCSPC H adds a one on the accumulated number N H ( t H ) and records the quantized t H , i.e., a time-to-digital conversion (TDC) process, for the received photon. Similarly, TCSPC V accumulates N V ( t V ) and records the quantized t V for the received photons from the |V output of the light polarizer. Thus, the probability of the photon in each polarization can be obtained as follows: . (2) Note that the notations of the probabilities are conditioned on the measured quantum state |ψ . Meanwhile, since the quantum state is not affected by the reference phase of its complex coefficients, usually the phase of c H can be safely set to zero, and the phase of c V , φ, can be obtained by performing two more quantum-state measurements, as shown in Fig. 1(b) and (c): where (|-45 , |+45 ) and (|L , |R ) are the other two pairs of the orthonormal polarization basis vectors. With these three conditional probability measurements, the complex probability amplitudes, c H and c V , of the quantum state can be fully determined from (2) and (3) [11]. In other words, the probability measurements performed by the TCSPC system are essential in order to determine a quantum state. Consequently, the TDC specifications in the TCSPC systems are required to achieve picoseconds fine resolutions, submicroseconds dynamic ranges, high linearity and fast conversion rates under aggressive silicon-area, voltage, and power constraints of modern complementary metal-oxide semiconductor (CMOS) integrated-circuit process technology. The state-of-the-art TDC designs can satisfy some of these specifications, however, with tradeoffs among the other performance metrics [1], [2], [3], [4], [6], [7], [8], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44]. To simultaneously accommodate all performance requirements for the majority of quantum applications, a TCSPC system proposed in [45] incorporates the digital random sampling-and-averaging (RSA) technique [46], [47], [48], [49] into a two-step TDC procedure to enable high-resolution time-interval measurements without compromising the performance metrics among accuracy, dynamic range, linearity, and power/area efficiency. However, the slow conversion rate of the RSA technique has greatly limited the broadness of its applications especially for high frame-rate/fill-factor quantum imaging and ranging systems [6], [7], [8], [16], [17], [18], [19], [20], [21], [22]. Fortunately, there is a concept of variance reduction (VR), which has been broadly used in the fields of applied mathematics and financial engineering [51], [52] to reduce time consumption of the Monte Carlo methods in derivatives pricing and risk management [52]. Based on the similar concept, this article proposes a simple circuit implementation for the self-antithetic variance reduction (SAVR) technique [50] to effectively suppress the quantization-noise power or equivalently enhance the conversion rate of an asynchronous RSA-based TCSPC system by automatically introducing autocorrelations into the sampled random variable during the TDC process.
To comprehensively describe a practical RSA measurement with the SAVR technique, this article derives the mathematical expressions of the theoretical variance and its approximation form for highly efficient performancepredictions, which are all experimentally verified by the simulations. Meanwhile, the methodology of converting the mathematical models of the RSA with SAVR technique into an almost all-digital and low-power implementation is elaborated by a circuit-level example with a guideline of setting the circuit parameters. Compared to an ordinary RSA measurement [45], enabling the SAVR technique can enhance the conversion rate by more than 62 times with about 7% power overhead based on the simulation results.
The rest of this article is organized as follows. The circuitand-system level overview of an RSA-based TCSPC system with the SAVR technique is introduced in Section II. The required fundamentals of RSA and parameter definitions are summarized in Section III. The probability principles, theoretical variances, performance estimation methodology, and behavior-model simulations of the RSA with SAVR technique are derived and presented in Section IV. Finally, Section V concludes this article.

II. CIRCUIT AND SYSTEM OVERVIEW
The block diagram of the RSA-based TCSPC system with the SAVR techniques is shown in Fig. 2(a). The silicon-photonics interface is similar to the system described in [45], where each detection pixel includes a single-photon avalanche diode (SPAD) with the quenching/clamping circuits [3], [6], [7], [8] for the optical-to-electrical power domain transition followed by a silicon-photonics analog front-end (AFE) [53] and high-bandwidth CMOS pulse generator to convert the received single photons to eventtriggered electrical voltage pulses. For the high-accuracy time-interval measurement, the timing of T START is set by the START pulse from a specific single-photon detection pixel [2] while the multiplexer for the STOP pulse can select the timing of T STOP from either the other specific single-photon detection pixel [2] or the system input clock CK IN [3], [4], [6], [7], [8]. In any configuration, the time interval t between T START and T STOP is the primary quantity under the measurement, as shown in Fig. 2(a). The time-to-amplitude conversion (TAC) circuit, containing a current source and capacitor banks with low-resolution static controls for coarse dynamic-range/conversion-gain tunability, converts the time-interval information into a constant dc voltage [45] buffered by the variable-gain amplifier (VGA) offering noise rejection and driving capabilities with additional tunability.
The TDC mechanism, which is the main focus of this article, is illustrated in the lower half of Fig. 2(a). The two identical voltage-controlled delay lines (VCDL) are both driven by the input clock CK IN , so the clock periods of CK 1 and CK 2 are equal to the period of CK IN , T, but the time-domain delays of CK 1 and CK 2 are functions of the dc voltages V DD and V VGA , respectively; and the periodic delta τ between their delays represents the scaled version of t, as described in [45]. After merging CK 1 and CK 2 by the rising clock-edge combiner, the resulting clock CKτ maintains the T periodicity while its positive duty-cycle τ /T is the primary quantity under the RSA measurement, as shown in Fig. 3. To perform the random sampling process, a free-running ring-based digitally controlled oscillator (DCO) generates an asynchronous clock, CK DCO , to sample the waveform of CKτ through a 1-bit D flip-flop (DFF). At the meantime, the randomness of CK DCO sampling-instants is mainly accomplished by a digital pseudorandom-binary-sequence generator (PRBS Gen.) in Fig. 2(a) to dynamically modulate the DCO period and to ensure that the sampling probability density function (PDF) can satisfy the RSA criteria for the sampling outcome Y to be the random variable described in [45] or Section III. Finally, the averaging and noise-filtering VOLUME 3, 2022 processes are executed by the data and cycle accumulators (ACC.) to first count the numbers of ones N Y and N DCO at the outputs of the DFF and DCO, respectively; then, the final result per measurementȲ (mean value of all sampled Y), is basically the ratio of the counter outputs N Y /N DCO [45]. At this point, the process of an asynchronous RSA-based TDC has been completed.
To enhance the conversion rate of RSA, the SAVR technique can be realized on the top of asynchronous RSA by simply adding extra static coarse controls for the reconfigurabilities of the DCO frequency and PRBS energy-level. The detailed circuit and reconfigurable schemes of the DCO are shown in Fig. 2(b), including a 7-bit PRBS generator at the bottom and a four-stage pseudodifferential inverterbased DCO at the middle with static push-pull resistancecapacitance controls for SAVR. This PRBS generator produces an independent and identically distributed (I.I.D.) 7-bit binary control code, PRBS<6:0>, to dynamically modulate the DCO period through the digital-transistor-capacitor banks in the rate of CK DCO as follows: where "n" is the sample index from 1 to N DCO ; T DCO,n is the nth period of the DCO; T PRBS,n is the nth DCO period extension controlled by PRBS<6:0>; T DCO,MIN is the minimum DCO period when T PRBS,n = 0; T PRBS,MAX is the maximum DCO period extension, which sets the span of the time-domain sampling PDF of a "single" CK DCO sampling edge, f DCO,1 (t), as shown in Fig. 3. Intuitively, the nonlinearity of the binary-control scheme and only 127 (= 2 7 − 1) possible digital-to-period conversion steps would cause poor uniform-distributed sampling PDFs, which are not only discrete but also unequally spaced. Fortunately, because of the phase-noise accumulation property of ring oscillators [54], inherent circuit/device noise [54], central limit theorem [55], and modulo-T circular convolution theorem [55], [56], this low-cost implementation can equivalently offer extremely high resolutions and continuous sampling PDFs, as discussed in [45]. The low-power control schemes shown in Fig. 2(b) for enabling SAVR are the static reconfigurabilities of T PRBS,MAX and T DCO,MIN highlighted in pink and green, respectively. To statically adjust T PRBS,MAX (pink), the thermometer code S<6:0> can force the consecutive mostsignificant-bits (MSBs) of PRBS<6:0> to zeros, which accordingly shut down the MSB capacitor banks and only leave the residual activated least significant bits (LSBs) of PRBS<6:0> to modulate T PRBS,n . In other words, the range of the T PRBS,n , i.e., 0 to T PRBS,MAX , can be scaled down in powers of two based on the number of zeros in S<6:0> with a constant LSB step size; this coarse scaling approach correspondingly reduces the total number of the digital-to-period conversion steps. Therefore, the other control knob is employed by universally tuning the common drain-source dc voltage of the capacitor banks through a power/area efficient R-2R resistive ladder digital-to-analog converter [57], i.e., R-2R DAC in Fig. 2(b), without an analog voltage buffer due to the pseudodifferential DCO architecture. Using the digital transistors to perform this varactorbased capacitance tunability [54] can moderately scale down the LSB step size with negligible power overhead to partially maintain the total number of digital-to-period conversion steps when S<6:0> heavily scales down T PRBS,MAX . It is important to note that the reconfigurability of T PRBS,MAX is mainly for demonstrating the effect of SAVR; the theoretical and simulation results in Section IV both confirm that a smaller T PRBS,MAX offers more SAVR, but this is limited by the achievable minimum LSB capacitance in CMOS process technology. In any case, the gaps among the LSB steps in the DCO sampling PDFs are filled by the accumulation of circuit/device noise as mentioned. On the other hand, to statically adjust T DCO,MIN [see green in Fig. 2(b)], the primary inverter stages of the DCO are implemented in a cascode structure, so the gate terminals of the partial tail P/N transistors can be biased through the replica of the inverter stage [54] and controlled by a low-power current DAC, as shown in Fig. 2(b). At first glance, the implementations of these static reconfigurabilities are simple and ordinary, which indeed indicates the low power/area overhead of this SAVR technique. However, the theory and concept in behind are quite complicated and unintuitive; the detailed analysis is elaborated in Section IV.
All inherent phases of the DCO can be exploited to simultaneously sample the information signal CKτ and linearly improve the conversion rate of each RSA measurement. To minimize the complexity, each DCO clock phase can actually manage its own asynchronous RSA process (DFF and ACC.) with its own clock (or phase) domain, as shown at the top of Fig. 2(b); once the common cycle accumulator reaches the designated sampling number N DCO , an indicator is asynchronously sent to all data accumulators to stop individual counting processes all together, and then a final summer only needs a one-time operation to add all counter results in total. Note that the theoretical analysis in this article only focuses on one of these parallel RSA processes since they are basically identical and just occurring simultaneously. More importantly, the correlation induced by SAVR only exists among all sequential samples of each individual random variable, Y (i.e., at each DFF output), generated by its own DCO sampling phase, not among multiple DFF outputs. Therefore, the theoretical analysis only considers one set of Y, CK DCO , N Y , and N DCO , as shown at the top-right corner of Fig. 2

III. RANDOM SAMPLING-AND-AVERAGING OVERVIEW
This section briefly summarizes the parameter definitions and important results from [45], which are essential information and background for the RSA with SAVR technique.
In the asynchronous RSA-based TCSPC system of this article, Y is the primary random variable; E[Y] is the expectation of the random variable; Y n is the nth experimental  . One example of the asynchronous RSA sampling processes is shown in Fig. 3, where the waveforms of CKτ and CK DCO are assumed to have coincident rising edges at t = 0 for the sake of simplicity. With the parameter definitions in (4) and the phase-noise accumulation property of ring oscillators [54], the nth absolute sampling time t SAMP,n , as illustrated in Fig. 3, can be generalized as follows: Each t SAMP,n contains the deterministic term n·T DCO,MIN and stochastic term due to the DCO phase-noise accumulation, which describes the uncertainty of each sampling instant and can only be represented by a PDF f DCO,n (t). Therefore, the nth CK DCO rising edge occurs randomly but is confined within the distribution span and density magnitude of its own PDF, i.e., light-red areas in Fig. 3. More importantly, the stochastic term of each t SAMP,n is the accumulation of "n" samples of an I.I.D. random variable (i.e., T PRBS,k , k = 1 to n) created by the PRBS generator for "n" times, as shown in the second term of (5); equivalently, the PDF of the nth DCO sampling instant, f DCO,n (t), is the convolution result of total "n" fundamental PDFs, f DCO,1 (t), from the PRBS generator based on the convolution theorem [55]. Note that the fundamental PDF, f DCO,1 (t), has a T PRBS,MAX distribution span and constant 1/ T PRBS,MAX density magnitude, as shown in Fig. 3. When n >> 1, the central limit theorem guarantees that f DCO,n (t) converges to a Gaussian distribution with a wide distribution span n· T PRBS,MAX , as shown at the bottom right of Fig. 3 and top of Fig. 4(a) regardless of the sampling PDF, f DCO,1 (t), from the PRBS generator.
Because of the periodicity of CKτ , the entire distribution span of f DCO,n (t) is automatically segmented and compressed into a [0, T) duration and equivalently converted into a modulo-T random sampling PDF, f n (t), which still follows the convolution theorem but shall be mathematically expressed by a circular convolution CConv[·], due to the modulo-T operation [56] as follows: where δ(t) is the unit impulse; the "t" of f 1 (t), f n−1 (t), and f n (t) is the modulo-T time-domain variable within [0, T), but the "t" of f DCO,1 (t) and δ(t) is the absolute time-domain variable referenced to t = 0. Based on (6), f 1 (t) plays as not only the PDF of the first sampling instant but also the fundamental PDF element to obtain any f n (t) from f n−1 (t). According to both mathematical equation and statistical simulation in [45], as f DCO,n (t) converges to a Gaussian PDF with increasing "n," f n (t) converges to a uniformly distributed PDF with a constant density magnitude 1/T across the [0, T) distribution span. In other words, for all "n" >> 1, f n (t) becomes an "identically distributed" PDF and independent from the parameters of T DCO,MIN , T PRBS,MAX , and even "n," as illustrated in Fig. 4(a) and the top-row of Fig. 4(b). Therefore, the expectation of the asynchronous RSA measurement result also converges and can be expressed by a continuous onedimensional geometric probability [55] format as follows: where y n (t) is the modulo-T waveform of CKτ sampled by the modulo-T sampling PDF, f n (t), of CK DCO ; P 1 (= 1 − P 0 ) is the probability of obtaining a Y n as Logic-1, and P 0 as Logic-0. Finally, the time interval under the RSA measurement can be obtained based on the following: where K TAC , K VGA , and K DL are the conversion-gains of the TAC, VGA, and VCDL, respectively. Although all f n (t) are identical when n >> 1, the correlation among all f n (t) is a complicated function of f 1 (t). As proven in [45], to implement I.I.D. sampling PDFs, the distribution span of f 1 (t), T PRBS,MAX , has to satisfy one of the two criteria: is the modulo-T operator, so the samples of the random variable Y n can be pairwise independent. And, the theoretical variance of the asynchronous RSA measurement under the I.I.D. random sampling condition can be expressed by In agreement with the weak law of large numbers [55], the theoretical variance Var[Ȳ ], i.e., the power of the estimation error or quantization noise, in (9) reciprocally degrades with the total number of samples N DCO , and additionally it is a function of P 1 and P 0 , which matches the variance of a Bernoulli [55] random variable as each Y n only has two possible outcomes Logic-1 or Logic-0. In sum, an asynchronous RSA measurement can trade in a larger number of samples, N DCO , which is equivalent to a longer measurement time or slower conversion rate, for achieving a higher accuracy.
In this article, the experimental variances obtained from simulation data are required to verify the theoretical derivations and can be expressed as follows [45], [52], [58]: The accuracy of this experimental variance depends on the number ofȲ , N EXP , obtained from experiments or simulations. Note that, in realistic RSA measurements under a certain accuracy requirement with the settings of N DCO , T DCO,MIN , T PRBS,MAX , and T, only a singleȲ is required to represent one measurement result [45].

IV. SELF-ANTITHETIC VARIANCE REDUCTION
To comprehensively elaborate SAVR and its realization methodology in an asynchronous RSA system, the following analysis process can be started from formulating the variance of a Monte Carlo estimate in general as follows: where Cov[Y n , Y k ] is the pairwise covariance of any two samples Y n and Y k ; both "n" and "k" are the sample indexes.
, which is a symmetric covariance matrix. If Y 1 , Y 2 , . . . , and Y N DCO are all pairwise independent, then Cov[Y n , Y k ] = 0 for all n ࣔ k in (11), and the pairwise covariance sum [the second term in the fifth line of (11)] is zero, so (11) becomes the I.I.D. situation, as shown in (9). The key idea of SAVR is to create nonzero correlations among the samples, i.e., Y 1 , Y 2 , . . . , and Y N DCO , of the random variable Y, so that the pairwise covariance sum can be negative, and then overall variance Var[Ȳ ] per RSA measurement can be effectively less than the variance of the I.I.D. situation in (9).

A. THEORETICAL CONCEPT AND PERFORMANCE ESTIMATION
According to the detailed analysis about the covariances of the adjacent samples Cov[Y n , Y n+1 ] under three T PRBS,MAX scenarios (i.e., T PRBS,MAX <, =, or > T) in [45], the conclusion shows Cov[Y n , Y n+1 ] can have pronounced nonzero values when Mod[ T PRBS,MAX , T] < T due to the nonuniform conditional joint PDFs within their own integral timeintervals [0, τ ) and [τ , T). In this article, the case of nonzero covariances between adjacent samples is further extended to the deduction of any pairwise correlation between Y n and Y k due to the fact of the DCO phase-noise accumulation property reflected by (5) and (6). Thus, if the technique can take advantage of these nonzero covariances, Cov[Y n , Y k ], and assure the pairwise covariances sum is negative, then VR can be successfully performed.
To find Cov[Y n , Y k ], all conditional joint PDFs of any [Y n , Y k ] pair have to be first formulated, where n = 1 to (N DCO − 1) and k = (n + 1) to N DCO are sufficient to cover all covariance elements in the symmetric covariance matrix. Though there are only four possible binary combinational outcomes of a certain [Y n , Y k ] pair, its conditional joint PDFs shall also cover all possible binary combinational outcomes from Y n+1 to Y k−1 because the accumulation property and convolution theorem described in (6)   where q = 0, 1, 2, …, (2 k−n -1); all possible binary combinational conditions of [Y n , . . . , Y k−1 ] are represented by the corresponding decimal numbers "q" and decimal-to-binary operators D2B[·] for the sake of simplicity. In Fig. 4(b), the PDF curves and annotations highlighted in black represent the examples of (12), including k = (n + 1), (n + 2), and (n + 3), which follows the convolution theorem under the conditions of [Y n , …, Y k−1 ]. Since the location of "τ " within the modulo-T time interval [0, T) defines the PDF boundaries for the probability of Y k to be Logic-1 or Logic-0, the conditional joint PDF of a [Y n , Y k ] pair under all possible conditions of [Y n , …, Y k ] can be obtained by forcing one part, i.e., [0, τ ) or [τ , T), of the PDF in (12) to zero to satisfy one of the possible Y k conditions as follows: In other words, each conditional joint PDF under the conditions of [Y n , …, Y k−1 ] in (12) can diversify into two conditional joint PDFs under the conditions of [Y n ,. . . , Y k ], as shown in (13) and (14), which are highlighted in blue and orange, respectively, in Fig. 4(b). Note that the distribution profiles of all conditional joint PDFs are convoluted functions of τ and f 1 (t), and f 1 (t) itself is a function of Mod[T DCO,MIN , T] and T PRBS,MAX . Therefore, the plots shown in Fig. 4 (13) and (14): As shown in (15)

to N DCO when T PRBS,MAX ≈ T/16, and (c) single-dimensional covariance sums and partial covariance sums of RSA w/o SAVR ( T PRBS,MAX ≈ T) and w/ SAVR ( T PRBS,MAX ≈ T/8 and T/16).
even with the VR techniques. In short, the computation effort of the theoretical variance with nonzero pairwise covariance formulated by (15) turns out to be impractical for the performance estimation of the RSA with SAVR technique. This issue reflects the necessity of developing a computationefficient approach for the overall variance calculation.
To simplify the analysis process, the mechanism of SAVR is elaborated by specific examples in an asynchronous RSA system, and then general cases can be further summarized. ; third, the correlation is enhanced (longer correlation tails) with decreasing T PRBS,MAX ; fourth, the correlation degrades with increasing |k − n|; fifth, more importantly, the Cov[Y n , Y k ] of each "n" has a consistent sign-alternation pattern within each correlation-envelope period which guarantees the cancellation in the total covariance sum, i.e., effective VR, though the amplitude of the correlation-envelope attenuates with |k − n|. The zoom-in versions of the correlation functions at the top of Fig. 5(a) and (b), respectively, highlight the least-common-multiple periods (LCMPs) of the signalternation patterns and correlation-envelope periods of the two T PRBS,MAX scenarios. The theoretical variance calculation can be greatly simplified by grouping the effect of covariance cancellation into single or multiple LCMPs instead of considering all the covariance elements.
The effect of this SAVR technique can be more comprehensively demonstrated by the single-dimensional covariance sums and the overall (i.e., two-dimensional) covariance sum Var [Ȳ ]. The single-dimensional covariance sum is defined as the summation of all Cov[Y n , Y k ] from k = 1 to N DCO with respect to each "n," e.g., the summation of all red data points in the Cov[Y 128 , Y k ] plot is the single-dimensional covariance sum at n = 128. The single-dimensional covariance sums of the uncorrelated (I.I.D.) and two correlated examples are shown in Fig. 5(c) whose horizontal-axis variable

Engineering uantum
Transactions on IEEE is "n," and n = 1 to N DCO . The single-dimensional covariance sums of the I.I.D. example (black dots) are always 0.25 for all "n" since each "n" only has its nonzero covariance Var[Y n ] = 0.25 at n = k. On the other hand, the singledimensional covariance sums of the two correlated examples (red dots) are all smaller than 0.25 across all "n" in Fig. 5(c) because the periodical sign-alternation patterns of their correlation functions are canceling the power of Var[Y n ] = 0.25 at n = k, as shown in Fig. 5(a) and (b). This is the reason for naming this technique as self-antithetic VR. Finally, the overall variance of RSA can be obtained by the summation of all single-dimensional covariance sums divided by N 2 DCO , as shown in the third line of (11). According to the fact shown in Fig. 5(c), the overall variances (sums of red dots) are reduced from the value of the I.I.D. variance (sums of black dots). Equivalently, the first term in the fifth line of (11), i.e., P 1 ·P 0 /N DCO , equals the I.I.D. variance without any VR, and the second term in the fifth line of (11) is negative to successfully perform VR.
The analysis results demonstrated in Fig. 5 offer further insights: first, a smaller T PRBS,MAX , i.e., a narrower fundamental sampling PDF f 1 (t) creates stronger correlations across all sampling points (i.e., longer correlation tails), longer LCMPs (e.g., 16 points for T/8, 32 points for T/16), and eventually more variance reduction; second, though the sign of each single-dimensional covariance sum can be either positive or negative with VR, the average of all singledimensional covariance sums is always above zero, as shown in Fig. 5(c), which verifies the principles of nonnegative variances and finite measurement resolutions; third, more importantly, the computation efficiency of each theoretical single-dimensional covariance sum and overall variance can be greatly improved by only summing the covariances surrounding the "k = n" within a few LCMPs as follows: where "r" is the number of LCMP included in the approximation of (16). In Fig. 5(c), the red dots represent the accurate single-dimensional covariance sums; the blue dots represent the partial single-dimensional covariance sums formulated by the right-hand side of (16) with r = 1 and k = (n − LCMP/2) to (n + LCMP/2 − 1). Although the red and blue dots at each "n" contain certain amounts of delta, the distribution of the blue dots is relatively constant and basically the average of the red dots except the values of "n" approach 1 or N DCO since the correlation functions of these "n" have significantly unbalanced correlation tail lengths, e.g., Fig. 5(a) and (b). Therefore, any partial single-dimensional covariance sum (blue dot) at any "n" far away from 1 or N DCO is sufficient for approximating the average of the accurate single-dimensional covariance sums formulated by the left-hand-side of (16). For example, by plugging (16) into the third line of (11) with "n = N DCO /2," the overall variance can be approximated as follows: Note that the Cov[Y n , Y k ] in (16) and (17) is referring to its theoretical definition in (15) and annotated as blue circles in Fig. 5(a) and (b), while the all red dots in Fig. 5 are obtained from the statistical simulation results. And, the approximation errors in (16) and (17) can be improved by extending the number of LCMP, i.e., "r," included in the summation operators.
As discussed earlier, if the theoretical computation effort is dominated by the modulo-T circular convolutions, the computation efficiency improvement from (11) to (17) can be evaluated by the ratios between their operation numbers of modulo-T circular convolutions. The covariance matrix in the case of T PRBS,MAX ≈ T/8 (LCMP = 16) with N DCO = 256 is shown in Fig. 6(a). Since the correlation tails of this example are roughly vanished when |k − n| > N DCO /2, as shown in Fig. 5(a), only the covariance elements circled in black need to be considered to estimate the computation efforts for both (11) and (17). For (11), because the Cov[Y n , Y k ] of each "n" is self-symmetrical with respect to "k = n," the covariance elements circled in dashed-red are identical to those circled in solid-red, and covariance elements circled in green are the images of those circled in red. With the property of this symmetric covariance matrix, i.e., Cov[Y n , Y k ] = Cov[Y k , Y n ], all covariance elements circled in black can be fully obtained by only calculating the covariance elements circled in solid-red, as shown in the upper half of Fig. 6(a), and the required number of modulo-T circular convolutions in (11) is expressed in the numerator of On the other hand, to cover two LCMPs for (17), only the covariance elements within one LCMP circled in blue shown in the lower half of Fig. 6(a) need to be calculated, and the other LCMP circled in orange is again the image of those circled in blue, so the required number of modulo-T circular convolutions in (17) is expressed in the denominator of (18). The ratio of modulo-T circular convolution numbers between (11) and (17) under this specific example is about 2.18 × 10 34 . This incredible "computation effort reduction" (not variance reduction), even just for N DCO = 2 8 and two VOLUME 3, 2022 , the theoretical calculation results, i.e., blue circles from (15), and the statistical asynchronous RSA simulation results, i.e., red dots, match well, but the computation efficiency of the theoretical variance with SAVR enabled is actually quite low if relying on the equations from (11) to (15). On the other hand, (17) brings the computation efficiency to a reasonable level but losing the accuracy. Only the Monte Carlo approach (red dots from the statistical software/lab experiments) can simultaneously offer the efficiency and accuracy from the analytical point of view.

B. PRACTICAL CONSIDERATION AND SIMULATION RESULTS
In realistic applications, the SAVR technique has to be effectively applicable to a wide dynamic range of τ /T. to extend the covariance cancellation effect (or widening the Cov[Y n , Y k ] envelopes and correlation tails) without any extra hardware/circuit cost required to enhance the resolution of the PRBS generator since the infinite phase resolution has been taking care of by the inherent circuit/system noise accumulations through the DCO [45], [54]. Second, the efficiency of SAVR degrades with the increase of |τ /T − 0.5| as discussed, but the quantization-noise power of the RSA measurement Var[Ȳ ] stays relatively constant across the entire dynamic range. Third, the blue dots are the theoretical variance approximations based on (17) with covering at least two LCMPs (r ≥ 2) for different T PRBS,MAX scenarios. Even on a linear scale, the theoretical approximations (blue dots) from (17)   results (red dots) from the RSA experimental simulations; this verifies the contribution of (17), which can greatly improve the computation efficiency without compromising the estimation accuracy.
In Fig. 7(c), the variances of asynchronous RSA are plotted as functions of N DCO on the dBW scale with different SAVR settings at τ /T = 0.5 with N EXP = 2 8 . Similar to the cases of the oversampling synchronous RSA in [45], the weak law of large numbers still holds for the asynchronous RSA regardless of the VR technique; also, both theoretical approximation (blue lines) and simulation (red dots) results are well aligned and perform consistent variance degradations at −3 dBW per octave of N DCO or equivalently −6 dBW per octave of effective number of bits (ENOB) [45]. More importantly, the variances at τ /T = 0.5 exhibit −3 dBW per octave of the T PRBS,MAX divisor for all N DCO in both Fig. 7(a) and (c).
The key contribution of SAVR is not only to improve the ENOB, which can be obtained by increasing N DCO anyway, but more significantly improves the conversion rate of the asynchronous RSA measurement. For example, to achieve the same variance of −60 dBW in Fig. 7(c) , is improved by 62 times when the SAVR technique is enabled. Note that this example is at τ /T = 0.5, which has the worst quantization-noise power when SAVR is disabled (I.I.D.) but maximum variance reduction across the entire τ /T dynamic range, as shown in Fig. 7(a). Besides, the required setting of Mod[T DCO,MIN , T] ≈ 0.5·T gives the flexibility of design options for T DCO,MIN , including 0.5·T, 1.5·T, 2.5·T, and so on. If the DCO frequency has reached the limit due to circuit bandwidth or power limitations, then proportionally extending τ and T together can be an alternative way to achieve the fastest sampling option (T DCO,MIN ≈ 0.5·T), which can be simply accomplished by statically scaling the conversion gains of the analog circuits, i.e., K TAC , K VGA , and K DL , through the inherent calibration capability of the RSA measurement system [50]. Meanwhile, the nonlinearities and offsets due to analogcircuit nonidealities and mismatches can be precalibrated by RSA itself without any assistance from extra hardware [45], [49], [50].
Regarding the power overhead, the DCO power consumption with versus without SAVR includes four major factors in the comparison: first, the DCO average frequency (f DCO,AVG ≈ 1/T DCO,AVG ) is almost doubled, i.e., 1/(0.516·T) versus 1/T; second, the per-stage capacitance load (C L ) controlled by the PRBS generator is smaller since it is dominantly scaled with T PRBS,MAX , i.e., T/32 versus T; third, the power of the PRBS digital circuit is doubled with the DCO average frequency, i.e., 0.5 mW versus 0.25 mW; fourth, additional dc power, i.e., 0.2 mW, of the R-2R and current DACs is required for the DCO static reconfigurations plotted by the green and pink lines in Fig. 2(b). Therefore, according to the simulation results and power estimations above (C L ·V 2 DD ·f DCO,AVG + PRBS power + dc power), the overall DCO power roughly stays the same with and without SAVR, i.e., (2.4 + 0.5 + 0.2) mW versus (2.75 + 0.25 + 0) mW = 3.1 mW versus 3 mW. Meanwhile, the power consumptions of the TAC, VGA, VCDL, and edge-combiner are all independent from the SAVR technique, but the dynamic power of the DFF and clock buffers are scaled up with the DCO average frequency. Thus, the TDC power numbers (TAC + VGA + VCDL + edgecombiner + DFF + clock buffer) with and without SAVR are 1.5 mW and 1.3 mW, respectively. Overall, the total RSA-based TDC power, i.e., DCO + TDC = 4.6 mW versus 4.3 mW, is increased only by 7% after enabling SAVR for a 62 times conversion rate enhancement since the DCO power is roughly independent from this technique as discussed. Note that the power estimations abovementioned include one clock generator (i.e., DCO) and one TDC, but in real TCSPC applications one clock generator with proper clock distribution can support multiple TDCs in the pixel array.
Regarding the area overhead of the SAVR technique, the additional hardware includes the R-2R and current DACs for the DCO static reconfigurations plotted by the green and pink lines in Fig. 2(b), which add an extra 25 μm × 25 μm overhead on the top of the DCO active area in a 22-nm CMOS process technology.
Based on the theoretical analysis and experimental simulations of the SAVR technique so far, two practical concerns may be raised. One is about the limitation of SAVR or its conversion rate enhancement. Because it seems that a smaller T PRBS,MAX (or noise energy level) offers more SAVR, and if the minimum achievable T PRBS,MAX is dominated by the single LSB capacitance, which is process-technology dependent as mentioned in Section II, modulated by a 1-bit PRBS, then the system could simply rely on the noise accumulation of inherent thermal noise from the DCO circuit. This idea is theoretically valid, but accumulating such low-energy noise meanwhile induces two issues to offset the benefit of SAVR: first, the sampling PDF f n (t) now requires "n" to be extremely large to form a uniform PDF, as shown in Fig. 4(a) [45], and satisfy the expectation in (7); second, now the values of LCMP also become extremely large for effective SAVR due to its reciprocal relation with the noise energy level shown in (16). Both cause each RSA measurement to require a very large number of N DCO in total, so SAVR turns out to even slow down the conversion rate in this case. The other concern is about whether variance addition (VA) can occur due to a combinational setup of T DCO,MIN , T PRBS,MAX , and T deviating from the one for VR. The answer can be elaborated by an example shown in Fig. 6(c) having the identical parameter settings as those in  Fig. 6(c), the correlation function exhibits a different sign-alternation pattern and envelope from these shown in Fig. 5(a), but the single-dimensional covariance sum is still less than that of the I.I.D. case (i.e., 0.25); equivalently this is VR. However, when T PRBS,MAX ≈ T/2 shown in the upper half of Fig. 6(c), the correlation function contains all positive values, so definitely the single-dimensional covariances and overall variance are larger than those of the I.I.D. case; equivalently this is VA. The asynchronous RSA variance versus dynamic range plots shown in Fig. 7 And, this is the primary downside of the SAVR technique associated with the upside of high conversion rate improvement and negligible extra power cost for an asynchronous RSA measurement system. Fortunately, the variance approximation, i.e., (17) Multiple case studies in this section are summarized in Fig. 7(c), including I.I.D., VR, and VA scenarios as discussed. Note that each red dot for a certain N DCO approximately on the I.I.D. theoretical variance line is obtained from (10) by averaging 2 8 (= N EXP ) gray dots in Fig. 7(c). All theoretical approximation (i.e., blue lines) and simulation (i.e., red dots) results are well aligned and all follow the weak law of large numbers to perform a consistent variance degradation at −3 dBW per octave of N DCO .

V. SUMMARY AND FUTURE WORK
The key signal and circuit parameters are summarized below for enabling the RSA with SAVR technique: τ (converted from t) is the quantity under each measurement; T (converted from t MAX ) is the dynamic range under each measurement; T DCO,MIN is the deterministic portion of each DCO period; T PRBS,MAX is the PDF span of the stochastic portion of each DCO period; the relations among T DCO,MIN , T PRBS,MAX , and T are the key parameters to perform effective SAVR; N DCO is the total number of the samples; N EXP is not involved in the circuit implementation but required for the software/lab experiments to statistically verify the theoretical formulas and the performance estimations of the RSA-based technique.
The comparison of the RSA-based TCSPC systems in [45] and this article is summarized in Table 1 based on the circuit simulations along with the silicon measurement results of multiple state-of-the-art TDC implementations. Note that the primary focus of this article is to comprehensively elaborate the probability theories and implementable mathematical models of the SAVR technique, so the intent of this comparison is to cover a bigger picture of the modern TDC evolution in standard CMOS process nodes, not to show unfair competition by taking advantage of simulation results.
Overall, the SAVR technique enhances the conversion rate by 62 times (up to 120 kHz) with 7% power overhead compared to those of the ordinary RSA-based TDC in [45]. This improvement is also reflected by the energy efficiency of 3.1 pJ/step. To further broaden the potential of RSA in the quantum applications, a couple of VR-related techniques are under investigation. One is to stretch the limit of SAVR, i.e., scaling down the T PRBS,MAX and/or LSB capacitance for the DCO frequency modulation toward the limit of advanced CMOS process technology, which can pretty much offer additional 4 times conversion rate improvement without any extra power consumption. The other is to utilize a well-controlled VR technique with the downside of higher power/area overhead [50]. The forthcoming research and silicon-photonics realization will take the complementary benefits of these VR techniques to boost the conversion rate of RSA up to the range of MHz and accordingly to achieve sub-pJ/step energy efficiency.