A 4.5 Ps Precision TCSPC System: Design Principles and Characterization

With the recent advancements in single-photon detectors, very low-jitter timing systems are required to fully exploit their performance in real applications. In this article, we present the design principles and experimental characterization of a single-channel time-correlated single-photon counting (TCSPC) system that achieves a jitter down to 4.5 ps FWHM, a peak-to-peak differential nonlinearity of 1.5% LSB, and a count rate of 12 Mcps over a full-scale range of 12.5 ns. These results have been attained by minimizing the jitter contributions introduced at the various levels of the timing chain, without trading them off against the other performance parameters. To the best of our knowledge, this work represents the state-of-the-art performance for a full-scale range as large as 12.5 ns.

Serena Farina, Graduate Student Member, IEEE, Ivan Labanca, Giulia Acconcia, Member, IEEE, and Ivan Rech, Senior Member, IEEE

I. INTRODUCTION
In the last twenty years, time-correlated single-photon counting (TCSPC) has become increasingly attractive for a large variety of scientific applications, such as fluorescence lifetime imaging (FLIM) in biology [1], [2], laser ranging (LiDAR) in free space or underwater [3], and quantum cryptography and quantum optics [4], [5], just to mention a few. The TCSPC technique generally consists in measuring, with picosecond precision, the time interval between a stimulating laser pulse and the detection of a single photon impinging on a light sensor [6].
In this framework, one of the leading trends in the TCSPC field is the development of experimental measurements with an extremely high timing precision: while a sub-10 ps precision has already been achieved and represents the current state of the art, a jitter below 1 ps still constitutes an open challenge for single-photon detectors. Indeed, the literature reports various examples of devices pursuing this direction, both in the field of single-photon avalanche diodes (SPADs) [11], [12], [13] and in that of superconducting nanowire single-photon detectors (SNSPDs) [14], [15]. To our knowledge, the best result to date is reported by Korzh et al. [16], where a timing precision of 2.6 ps FWHM is demonstrated at visible wavelengths by using a niobium nitride SNSPD.
Despite these remarkable results, the exploitation of such high-precision detectors remains an unachieved objective in real applications, since fully satisfactory readout systems with sufficiently low timing jitter are not available. Indeed, their typical characterization procedure, as described in [16], is based on high-bandwidth oscilloscopes, which cannot be deployed in practical applications due to their bulkiness and cost. As a consequence, to perform in-field measurements, scientists can only resort to TCSPC systems, which can easily become the bottleneck of the setup in terms of jitter.
An overview of the best currently available TCSPC modules is provided in Table I, along with their main performance parameters. These systems are based on two different converter architectures, i.e., time-to-amplitude converters (TACs) or time-to-digital converters (TDCs) [17], [18]. In particular, the former are typically characterized by a very high linearity, while the latter can generally be replicated into a multichannel structure. As reported in the table, the Becker&Hickl SPC-150NXX [8] represents the best solution to obtain the lowest possible jitter; yet it can leverage this excellent characteristic only over a full-scale range (FSR) of slightly more than 1 ns. This may pose a strong limitation for applications where a larger FSR is required. An extremely good jitter performance is also achieved by the Swabian Time Tagger X [10], although with the potential drawback of a worse differential nonlinearity (DNL) due to the TDC-based architecture. It is therefore evident that none of the presented systems is capable of achieving an extremely low jitter without either affecting the DNL or restricting the usable FSR. As a consequence, new high-performance TCSPC systems are required, with the necessary characteristics to support the current advancements in the field of single-photon detectors.
In this context, our research is focused on the design of a novel timing module that overcomes the aforementioned issues, thus providing a high timing precision along with an excellent DNL over a wide full-scale range. For this very reason, we decided to build our system around the TAC presented in [19], [20], which complies with all the desired requirements. Nevertheless, it is worth noting that the mere usage of a first-class converter does not inherently ensure the expected performance: indeed, most of the other components in the acquisition and conversion chain can easily play a major role in determining the timing and linearity characteristics. In this article, we present the design of such a system, with a specific focus on the optimization of those elements that can impair the full exploitation of the converter performance. The described guidelines refer to the selected converter, but they are generally valid also for other types of converters or for future evolutions of the same one. By following the presented analysis and principles, we designed a single-channel TCSPC system with a state-of-the-art jitter of 4.5 ps FWHM and a linearity of 1.5% LSB peak-to-peak over a wide FSR of 12.5 ns (Table I). Moreover, these results are not traded off against other performance parameters such as the maximum acquisition speed (12 Mcps) and the timing resolution (0.78 ps). Having obtained promising results, we replicated the same structure into an 8-channel module. For the sake of simplicity, in this article the discussion is mostly focused on the architecture of a single channel.
The article is organized as follows: in Section II an overview of the system is provided; in Section III its main requirements are described; in Sections IV, V, and VI the selection of the elements of the timing chain is illustrated; in Section VII the practical board design is discussed; finally the experimental characterization is reported in Section VIII, and conclusions are drawn in Section IX.

II. TCSPC SYSTEM OVERVIEW
As mentioned before, the core of our system is constituted by a time-to-amplitude converter, surrounded by all the elements necessary to ensure its proper operating conditions. As a first step, it was therefore necessary to identify the most suitable architecture for the system. While a picture of the final board is reported in Fig. 1, Fig. 2 shows the board architecture in terms of its main functional blocks. The structure is subdivided into an analog front-end and a digital section, both built from off-the-shelf electronic components, each selected to ensure the best performance of the corresponding portion of the timing chain. First of all, the single-ended START and STOP signals are routed through two input connectors, followed by high-speed comparators that regenerate the signals and adapt their voltage dynamic range to the input of the TAC. In particular, the START signal features a direct connection to the TAC, while the STOP signal can be connected either directly to the TAC or through a monostable circuit. The desired connection for the STOP can be selected by wiring the proper bonding to the TAC input. The monostable circuitry was not present in our previous TCSPC systems [7], [21], [22] and has been specifically introduced in this board in order to reduce the oscillation disturbances that have been observed in the DNL due to the second edge of the STOP signal. The analog output signals of the TAC are then connected to an analog-to-digital converter (ADC), which transmits the digitized values to a field-programmable gate array (FPGA). The FPGA performs two main tasks: (i) it controls the input and output digital signals of the TAC, and (ii) it reconstructs the timing histograms, which are then transferred to a personal computer (PC) via a USB 3.0 connection. The resulting histograms can be visualized through a graphical user interface (GUI) designed in LabVIEW, which allows easy integration with most control systems for optical setups.
Finally, it is worth noting that the whole timing chain is enclosed in a mechanical case measuring 120 × 59 × 31 mm³, thus being suitable for applications with tight space constraints.

III. PRELIMINARY REQUIREMENT ANALYSIS
Following the definition of the system architecture, it is of utmost importance to analyze the main requirements in terms of TCSPC performance parameters, to correctly drive the selection of the hardware components. Indeed, a typical TCSPC system features multiple figures of merit that can conflict with each other, so that a proper trade-off between them has to be found. This concept is highlighted in Fig. 3, where we illustrate the influence of each component on the main TCSPC parameters. For the sake of completeness, we report hereafter some further considerations about the significance of each parameter in relation to our specific design and to the state-of-the-art comparison of Table I.
• Timing precision or jitter: This parameter is generally defined as the uncertainty in the time-of-arrival measurement of an optical signal. In the particular case of our system, the timing jitter can be expressed by the following formula, i.e., by the quadratic sum of all the jitter contributions that are encountered along the timing chain:

$$\sigma_{tot}^2 = \sigma_{laser}^2 + \sigma_{det}^2 + \sigma_{comp}^2 + \sigma_{mono}^2 + \sigma_{TACint}^2 + \sigma_{TACext}^2 + \sigma_{ADC}^2$$

While the laser ($\sigma_{laser}^2$) and the detector ($\sigma_{det}^2$) contributions are determined by the final application, the system designer can specifically select the comparator ($\sigma_{comp}^2$), the monostable ($\sigma_{mono}^2$) and the ADC ($\sigma_{ADC}^2$) to minimize their impact on the overall jitter. Regarding the TAC, instead, we decided to conceptually split its contribution into an intrinsic jitter ($\sigma_{TACint}^2$), which is determined by the TAC architecture itself, and an extrinsic jitter ($\sigma_{TACext}^2$), which is determined by the external operating conditions of the circuit. This consideration will be clarified in Sections IV-A and VIII-A, where we show the impact of the input signal slew rate on the TAC performance. Finally, it is worth noting that the measurement setup itself can affect the jitter performance, as will be explained in detail in Section VIII-A.
• Full-scale range (FSR): This parameter is generally defined as the maximum time interval that can be measured. In this system, the FSR is determined only by the selected TAC. Although a lower jitter can be obtained with a very small TAC full-scale range, it is important to adopt an FSR large enough for practical applications. In this case, the available ranges are 12.5 ns, 25 ns, 50 ns and 100 ns.
• Timing resolution: This parameter is generally defined as the minimum time interval that can be discriminated. In this system, the timing resolution is determined by the ratio between the TAC full-scale range and the number of available levels of the ADC. Since the FSR is already set by the TAC, the resolution can be improved by increasing the number of bits of the ADC.
• Differential nonlinearity (DNL): This parameter is generally defined as the nonuniformity of the time channels with respect to the ideal time bin. From our standpoint, we can identify two major factors affecting the DNL: one directly arising from the ADC and TAC components, and a second one resulting from electrical coupling and disturbances to internal signals either in the chip or on the hosting board. More precisely, the former is due to a non-homogeneous resolution along the bins of the time axis of the TAC and ADC conversion chain; it is therefore temporally uncorrelated with the START and STOP signals, and it results in a widespread floor over the whole FSR. It is worth mentioning that the ADC represents the main contribution in this case. The latter, instead, is temporally correlated with the START and STOP signals, thus introducing localized oscillations at specific time intervals. A clearer example of these two effects will be provided in Section VIII-C.
• Acquisition speed: In this article, we define this parameter as the maximum conversion rate achieved by a single timing channel. In our system, this is limited by the intrinsic rate of the TAC, governed by the conversion and reset times, by the sampling time of the ADC, and by the time overhead introduced by the FPGA in the circuit management. The concept is expanded in Section V.
• Power consumption: In this article, we define this parameter as the maximum power consumption of a single channel. In this case, the main contributions are introduced by the ADC and the FPGA, while the TAC impact can be considered negligible.
With reference to the current state of the art, in this design we target a minimum FSR of 12.5 ns so as to comply with most applications, a timing jitter below 5 ps FWHM to make it negligible with respect to a 10 ps detector, a timing resolution below 1 ps, a peak-to-peak DNL below 2% LSB, and a count rate in the order of 10 Mcps. This would make it possible to combine, for the first time, a state-of-the-art jitter over a wide FSR with high linearity. In the next sections we explore the elements of the acquisition chain with a twofold objective: on the one side we provide more detailed guidelines on their selection process, while on the other side we provide an individual characterization of each component, before proceeding to the characterization of the overall system.
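The quadrature model of the jitter and the definition of the timing resolution given in the list above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the 14-bit ADC depth is an assumption taken from the part number of the converter discussed in Section VI, with all codes assumed to span the FSR.

```python
import math


def total_jitter_fwhm(*contributions_ps):
    """Quadrature sum of independent jitter contributions (FWHM, in ps)."""
    return math.sqrt(sum(c * c for c in contributions_ps))


def ideal_lsb_ps(fsr_ns, adc_bits):
    """Ideal time-bin width: FSR divided by the number of ADC levels."""
    return fsr_ns * 1e3 / (2 ** adc_bits)


# A 10 ps detector read out by a 5 ps timing system (the design target).
combined = total_jitter_fwhm(10.0, 5.0)
print(f"detector + timing system: {combined:.2f} ps FWHM")

# Smallest FSR with an assumed 14-bit ADC whose codes span the whole range.
print(f"ideal bin width: {ideal_lsb_ps(12.5, 14):.3f} ps")
```

Under these assumptions, a 5 ps timing system broadens the response of a 10 ps detector by a little more than 1 ps, and the ideal bin width for the 12.5 ns range stays below the 1 ps resolution target.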

IV. SIGNAL ACQUISITION AND CONDITIONING
Fig. 4 shows a detailed schematic of the input front-end, constituted by an input connector, a comparator and a monostable circuit, which acquires and conditions the nuclear instrumentation module (NIM) input signal toward the TAC. In the design of this circuitry, we followed three main guidelines: first, we selected the components with the lowest possible jitter, so as to minimize their individual jitter contributions along the START and STOP signal paths; second, we employed the best possible slew rate for signaling, so as to decrease the jitter originating from threshold crossing [23]; and finally, we tried to minimize the electrical coupling between the START and STOP signals, so as to eliminate the temporally correlated oscillations on the DNL graph.
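The second guideline can be illustrated with the usual first-order estimate of threshold-crossing jitter, in which the voltage noise at the threshold is divided by the signal slew rate. The sketch below is not taken from the design itself: the function name and the 1 mV rms noise level are hypothetical, while the slew rates are the ones quoted in this article.

```python
def threshold_crossing_jitter_ps(noise_mv_rms, slew_rate_v_per_ns):
    """First-order estimate sigma_t = sigma_v / (dV/dt), returned in ps rms.

    Units: mV / (V/ns) = 1e-3 ns = 1 ps, so no extra scaling is needed.
    """
    return noise_mv_rms / slew_rate_v_per_ns


ASSUMED_NOISE_MV_RMS = 1.0  # hypothetical noise level, for illustration only

for slew_rate in (0.3, 1.0, 7.0):  # V/ns, values quoted in the text
    jitter = threshold_crossing_jitter_ps(ASSUMED_NOISE_MV_RMS, slew_rate)
    print(f"{slew_rate:>4} V/ns -> {jitter:.2f} ps rms")
```

For a fixed noise level the estimate scales inversely with the slew rate, which is why slow input edges are the critical case for the front-end.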

A. Connector and Comparator
As illustrated in Fig. 4, the NIM signal is routed through a subminiature push-on (SMP) connector, which has been chosen for its high transmission bandwidth (up to 40 GHz) and its compactness. This last characteristic is of great importance in applications where a multichannel system is needed with a limited space occupancy. The NIM standard is instead employed to comply with most timing instrumentation, even though it is more prone to crosstalk and disturbances than a differential signal. For this very reason, a high-performance comparator (HMC675LP3E [24]) regenerates the NIM signal into a low-voltage differential signaling (LVDS) one, which is compatible with the specifications of our TAC. To perform a fair comparison, in Table II we report the main characteristics of the best high-speed comparators available on the market.
While all the comparators possess a negligible timing jitter (0.47 ps FWHM), the HMC675 features a very low overdrive dispersion (1 ps at 100 mV), which is important to avoid a degradation of the timing precision during the switching phase of the comparator input stage. In order to assess the impact of this dispersion, we experimentally characterized, directly on the final printed circuit board (PCB), the incremental jitter contribution added by the comparator as a function of the slew-rate of the input signal. The obtained graph (Fig. 5) indicates that for slew-rates above 0.3 V/ns the added jitter remains negligible with respect to the overall system jitter, as for instance in the case of a typical NIM signal (1 V/ns). As a consequence, particular attention should be paid only to signals with a limited voltage amplitude.
Fig. 5. Incremental jitter contribution added by the comparator versus input slew-rate. The graph is obtained by varying the input slew-rate and by calculating the increase in jitter with respect to the overall system jitter, acquired in the ideal condition at high slew-rate (7 V/ns). For slew-rates above 0.3 V/ns the TAC and ADC jitters are dominant; for lower values the comparator introduces a jitter of 2-4 ps FWHM.
Regarding the output standard, instead, we opted for current mode logic (CML) in order to ensure a constant current absorption from the power supply. Indeed, a variable current, as in the case of positive emitter coupled logic (PECL), can easily induce power supply ripples, with potentially detrimental effects on the comparator jitter. The CML output stage is powered with a non-standard 1.6 V supply to readapt the CML dynamic (1.1 V to 1.9 V) to the required LVDS dynamic (1 V to 1.4 V).
To further investigate the signal propagation between the comparator output and the TAC input, we carried out electrical simulations, also taking into account the non-idealities introduced by the PCB transmission lines and the bonding wires (Fig. 6(a)). The simulations are based on the Advanced Design System (ADS) software from Keysight [25], and make use of the input/output buffer information specification (IBIS) models for the comparator output and of the bondwire models directly integrated in the tool. In particular, the comparator output signal is propagated through a differential transmission line with a matched impedance of 100 Ω and encounters the bondwire discontinuity just before reaching the TAC input. We consider here three different conditions for the choice of the line termination: (i) a 100 Ω termination placed on the PCB before the bondwires; (ii) a termination of the same value, but directly integrated at the TAC input; and (iii) no termination at all. This last situation is possible without affecting the impedance matching, since the CML output is also terminated at the source. Finally, we model the TAC input with a differential resistor of 8.8 kΩ and a capacitance of 1 pF.
From the resulting graphs (Fig. 6(b)), it can be seen that the usage of an internal termination dampens the oscillations generated by the inductance of the bonding wires at the TAC input, thus causing fewer disturbances to the conversion process. However, the termination resistor also reduces the output dynamic to 400 mV, consequently halving the signal slew-rate with respect to the non-terminated TAC input. In the design of our system we opted for the internal termination solution, to obtain a compact design also in the case of a multichannel system. In Section VIII-A, we further investigate the impact of the termination choice on the extrinsic jitter performance of various versions of the selected TAC.

B. Monostable Circuit
As observed in a previous work [21], the rising edge of the STOP NIM signal can easily induce oscillations on the DNL. Since the duration of the STOP signal is not known a priori, we inserted an optional monostable circuit along its path so as to intentionally vary the length of the pulse. In a practical application, it is therefore possible to move such oscillations out of the region of interest. The circuit, depicted in Fig. 4, is intended to generate a pair of monostabilized signals and a pair of non-monostabilized signals, to be fed to both the TAC and the FPGA. The desired connection for the STOP can then be selected by wiring the proper bonding to the TAC input. The core of the monostable is constituted by a flip-flop (NB7V52M from On Semiconductor), which is set by the input pulse and reset after a certain amount of time, defined by an R-C network. To avoid signal integrity issues, the reset signal cannot be tapped directly from the flip-flop output: for this reason, the reset is generated from the original input signal by means of a 1-to-4 buffer (SY58020 from Microchip). Finally, the reset threshold is set by a digital-to-analog converter (DAC), whose output value can be changed via the system control unit. As for the other components, particular attention has been paid to the possible jitter contribution, by selecting components with a timing jitter smaller than or comparable to that of the input comparator (< 0.5 ps). By alternately connecting the monostabilized STOP and the non-monostabilized one to the TAC, we verified that the timing precision of the system is not impaired by the monostable circuit.

V. TAC MANAGEMENT
The core of the conceived timing system consists of a high-precision TAC, whose architecture is discussed in [19], [20]. For the correct operation of the converter, an external control unit is necessary: we therefore chose a Kintex-7 FPGA (XC7K160T) to precisely synchronize the TAC management with the read-out of the ADC data. A complete description of the FPGA firmware is already provided in [21]; we limit our discussion here to the interaction between the TAC and the FPGA, and to the optimization of the acquisition speed.
This interaction is depicted in Fig. 7. The working mode of the TAC can be summarized in four main steps: an idle phase, a conversion phase, a hold phase and a reset phase. Once a new conversion is initiated and the conversion ramp is over, the TAC enters the hold state, where the output analog value is kept constant. After almost 7 ns, the converter issues a STROBE signal and waits for an external RESET pulse before discharging the output capacitor and returning to the idle state. Referring to [19], [20], the maximum theoretical dead time of the converter alone is expressed by the sum of the maximum conversion time $T_{conv}$ (12.5 ns), the minimum settling time of the TAC output $T_{settl}$ (38.7 ns) and the minimum reset time $T_{res}$ (21 ns). This results in an ideal dead time of about 73 ns.
Clearly, with the introduction of the TAC into a complete system, the impact of the ADC and of the FPGA also has to be quantified. The additional contribution of the ADC simply corresponds to the sampling period. As will be discussed in Section VI, the best available free-running ADC, i.e., the selected one, features an acquisition frequency of 125 MHz, thus giving rise to an increment of 8 ns in the worst case. Concerning the FPGA (Fig. 7), instead, its major role consists in the generation of the RESET signal after the asynchronous STROBE event. The RESET generation should be fast enough to avoid introducing any unnecessary dead time overhead, yet without prompting the TAC to reset before the end of its settling time. In our system, both conditions are met, since the minimum delay between the STROBE signal and the consequent RESET is equal to 4 clock cycles. Indeed, considering the 7 ns delay before the STROBE is issued and the same clock frequency as the ADC, the STROBE-RESET time interval results in almost 39 ns. We can therefore conclude that the FPGA operation does not introduce a significant contribution to the system dead time, which remains equal to almost 81 ns, i.e., a maximum conversion rate of 12.3 MHz. This result is perfectly aligned with the current state of the art for TCSPC systems.
Fig. 8. Traces of the TAC output ramp at different FSRs, digitized through the ADC. The curves are acquired with various START-STOP delays and aligned with respect to the STROBE, i.e., to the end of the conversion. The insets show a magnification of the curves around the settling time of the TAC output. At lower FSRs (12.5 ns), the settling oscillations are higher and the effective sampling occurs 8 to 9 samples after the STROBE signal.
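The dead-time budget described above can be reproduced with a short calculation. The sketch below only uses figures quoted in this section; the variable and function names are hypothetical, and reading the 39 ns interval as the 7 ns hold delay plus four ADC clock cycles is our interpretation of the text.

```python
T_CONV_NS = 12.5       # maximum conversion time
T_SETTL_NS = 38.7      # minimum settling time of the TAC output
T_RES_NS = 21.0        # minimum reset time
ADC_CLOCK_MHZ = 125.0  # free-running ADC, also assumed as the FPGA clock


def max_rate_mcps(dead_time_ns):
    """Maximum conversion rate (Mcps) for a given dead time in ns."""
    return 1e3 / dead_time_ns


adc_period_ns = 1e3 / ADC_CLOCK_MHZ                # worst-case ADC overhead
tac_sum_ns = T_CONV_NS + T_SETTL_NS + T_RES_NS     # sum of the quoted terms
hold_to_reset_ns = 7.0 + 4 * adc_period_ns         # hold delay + 4 cycles

print(f"ADC sampling period:     {adc_period_ns:.1f} ns")
print(f"sum of quoted TAC terms: {tac_sum_ns:.1f} ns")
print(f"hold-to-RESET interval:  {hold_to_reset_ns:.1f} ns")
print(f"rate at 81 ns dead time: {max_rate_mcps(81.0):.2f} Mcps")
```

The three quoted TAC terms add up to a value close to the stated ideal dead time, the RESET arrives just after the minimum settling time, and the stated system dead time of almost 81 ns corresponds to a rate slightly above 12 Mcps.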
At this stage, it is important to mention that the maximum achievable speed can easily be traded off against the timing precision. On the one hand, an early sampling of the TAC output ramp would improve the achieved speed; on the other hand, a higher dispersion of the analog voltage would appear at the TAC output due to the presence of a settling time. Fig. 8 provides an example of this phenomenon at different FSRs. The reported traces are acquired by directly reading the output of the ADC from the FPGA through the Xilinx integrated logic analyzer (ILA), and are aligned with respect to the STROBE signal. With the smallest FSR (12.5 ns) the curves are spread apart and suffer from more pronounced oscillations; as a consequence, the effective sampling point should be moved further away from the STROBE signal. Moreover, it appears evident that, during the development phase, a higher sampling frequency of the ADC allows for a clearer visualization and analysis of the TAC ramp, and thus for a better optimization of the sampling timeline.

VI. ANALOG TO DIGITAL CONVERSION
Finally, the last element of the timing chain is constituted by the analog-to-digital converter. In this context, the selection of a proper ADC is particularly critical, since many of the TCSPC performance parameters depend on it (see Fig. 3).
In Table III we summarize the characteristics of the most advanced components available to the system designer. A first reference is represented by the AD9252, which has been used in a previous timing system from our research group [7]. However, its major drawback concerns the maximum acquisition frequency, which is limited to 50 Msps, i.e., to a sampling period of 20 ns. As discussed earlier, due to the free-running operation of the ADC, this aspect can easily affect the maximum speed of the system. The selection was therefore restricted to components with a frequency of 125 Msps, thus reducing the sampling period to 8 ns. Among those ADCs, the ones featuring 16 bits would represent the best choice in terms of timing resolution and DNL, but they offer a relatively low number of channels. Since the second version of the presented system features an 8-channel structure, we adopted the LTM9011-14 [26] from Analog Devices; thanks to its variable input dynamic, this also allows a higher compatibility with potential TACs designed in a more scaled technology node. It is worth noting that this choice is traded off against a slight increase of the jitter contribution (2.14 ps), which can nevertheless be reduced by averaging consecutive samples belonging to the same conversion; clearly, this procedure implies a reduction of the maximum acquisition speed.
To better highlight this last aspect, we performed a first characterization of the ADC, to extract its sole jitter contribution (Fig. 9). The measurement is carried out in two steps. First, the ADC inputs are short-circuited to exclude the jitter contributions deriving from the upstream components, i.e., comparator and TAC. Second, the ADC output words are read out by the FPGA through an integrated logic analyzer (ILA), and sorted into a histogram. The same operation is repeated by averaging a different number of consecutive samples. As can be observed from the graph, the jitter contribution strongly decreases from 3.2 ps to 1.2 ps with a higher number of averaged samples. This second result clearly validates the usage of the LTM9011-14 in our system, its jitter being negligible with respect to the one introduced by the TAC. Moreover, the ADC jitter in the case of 16 averaged samples, i.e., 1.2 ps, is comparable to that of the AD9653, which currently represents the best achievable value. If a further improvement of the jitter is required, the AD9653 can be adopted, at the price of a higher PCB complexity in the case of multichannel systems and a higher overall cost.
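The effect of averaging consecutive samples can be illustrated with a small simulation. This is a sketch under a strong assumption, not a model of the measured ADC: the noise is taken as purely white and Gaussian, the 1 ps single-sample sigma is arbitrary, and the function name is hypothetical.

```python
import random
import statistics

FWHM_PER_SIGMA = 2.3548  # 2 * sqrt(2 * ln 2), valid for a Gaussian


def simulated_fwhm_ps(n_avg, sigma_single_ps=1.0, conversions=5000, seed=1):
    """FWHM of the histogram obtained by averaging n_avg noisy samples."""
    rng = random.Random(seed)
    averaged = [
        statistics.fmean(rng.gauss(0.0, sigma_single_ps) for _ in range(n_avg))
        for _ in range(conversions)
    ]
    return FWHM_PER_SIGMA * statistics.pstdev(averaged)


for n in (1, 4, 16):
    print(f"{n:>2} averaged samples -> {simulated_fwhm_ps(n):.2f} ps FWHM")
```

For uncorrelated noise the spread shrinks roughly as the square root of the number of averaged samples, i.e., by about a factor of 4 with 16 samples. The measured reduction from 3.2 ps to 1.2 ps is smaller than that, which may indicate that part of the ADC contribution is not white.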

VII. PCB DESIGN
Once the components have been accurately selected, the last and crucial step consists in the actual design of the PCB. Among the many different aspects, we highlight here the importance of signal integrity and power supply.
Regarding the first one, we selected a six-layer stackup so as to guarantee enough room for the routing of controlled-impedance lines. In particular, the first layer is dedicated to the most delicate analog signals, from the input connector to the TAC input through the comparator. These signals require a 50 Ω impedance for the single-ended tracks and a 100 Ω impedance for the differential ones. The second layer is dedicated to the reference ground plane for the analog signals, while the inner layers are dedicated to the power supplies and to the routing of the 500 MHz DDR tracks between ADC and FPGA. All the critical lines have been simulated with a signal integrity tool (ADS from Keysight). As will be shown in Section VIII-C, one of the issues regarding the routing of the 100 Ω lines between the comparator output and the bonding pads is the need to bring all the lines together in a very small portion of the PCB, so as to respect the bonding pitch requirement of the TAC. As a consequence, the small distance between these tracks can easily represent a major source of crosstalk.
Concerning the second aspect, it is of utmost importance to ensure a stable and clean power supply for the TAC, so as to avoid a degradation of its performance, especially of the precision. For this reason we employed a power architecture composed of DC-DC converters that step down the 12 V input supply, followed by point-of-load linear regulators specifically chosen for their extremely low output noise and high power supply rejection ratio (LT3045 from Analog Devices). These stages are then followed by a filtering network, to eliminate any residual noise.

VIII. EXPERIMENTAL CHARACTERIZATION
After the design phase, the described module has been manufactured and extensively tested, to characterize the parameters presented in Section III. For this purpose, two different versions of the same TAC are employed: the first one [19] without any internal termination resistor, and the second one [20] with an internal resistor of 100 Ω. The results obtained with this new system are compared against the performance of the first version of the TAC mounted on a previous TCSPC module [7], so as to highlight the importance of the presented guidelines.
Fig. 10. Measured timing jitter (FWHM) as a function of the input delay on a FSR of 12.5 ns and a variable number of averaged samples (1, 4 or 8). A jitter of 4.5 ps is achieved with the new system mounting the first version of the TAC (no internal termination).

A. Timing Precision and Resolution
The first and most critical experimental characterization regards the jitter. For this measurement, the input signals are obtained by splitting the output of a pulse generator (PM 5786B from Philips), and by delaying the STOP signal by a variable time interval to scan the whole FSR. With respect to the typical setups already used in [7], [21], particular attention is paid here to preserving the input signal integrity, in order to avoid the introduction of additional external jitter arising from the setup itself. For this reason, we resorted to an active signal splitter instead of a passive one; this new circuit basically consists of a high-speed comparator (HMC675) followed by two differential stages in parallel, constituted by radiofrequency bipolar junction transistors (BFP640). With this solution, it is possible to obtain a START and a STOP signal with falling edges as fast as 100 ps (20%-80%).
In Fig. 10, we illustrate the measured timing precision over a FSR of 12.5 ns with a variable number of consecutive averaged samples from 1 to 8. Besides reducing the ADC jitter contribution, the averaging operation also filters out the white noise introduced by the TAC itself, resulting in a further improvement. In this experiment, three different conditions are considered. First, the first version of the TAC is mounted on the previous TCSPC system [7], achieving a jitter of 5.5 ps FWHM with the maximum number of averages. Second, the same version of the TAC is placed on the new system, showing a remarkable improvement of about 20%, i.e. a precision down to 4.5 ps FWHM. To the best of our knowledge, this result outperforms the current state-of-the-art systems over the selected FSR; moreover, it confirms the importance of a good component selection and board design for high-performance TCSPC modules. Finally, the second version of the TAC is mounted on the new system, showing a degradation of the jitter performance. This can be ascribed to the presence of the internal termination resistor, which reduces the slew rate of the input signal (see Fig. 6).

For the sake of completeness, further jitter measurements have been carried out in various operating conditions, using the first version of the TAC mounted on the new system. In Fig. 11(a) the number of averaged samples is increased to 16 and 32: since no significant improvement is observed in the jitter performance, it is possible to conclude that the optimal number of averages should be limited to 8. In Fig. 11(b) the ADC output is sampled 8 or 9 samples after the STROBE signal. As expected from Fig. 8, with an earlier sampling point the jitter performance is impaired, especially at higher delays. In Fig. 11(c) and (d) the ADC jitter contribution measured in Section VI is subtracted from the overall system jitter of Fig. 10. When only a single sample is employed, the ADC impact cannot be considered negligible, while at high averages the bottleneck of the system is represented by the TAC. This last result clearly demonstrates the suitability of the module for the full exploitation of even more advanced TACs. Lastly, in Fig. 11(e) and (f) the timing precision is also evaluated for other FSRs: in the particular case of 25 ns, the achieved performance is still competitive with respect to the best TCSPC systems currently available.

At this stage, the timing resolution values can be easily inferred from the presented measurements following the same procedure as in [20]. The obtained results are aligned with the expectations: 0.78 ps for a FSR of 12.5 ns, 1.59 ps for a FSR of 25 ns and 3.17 ps for a FSR of 50 ns.
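The two operations used in this subsection, averaging consecutive samples and removing the ADC contribution from the overall jitter, can be illustrated with a minimal Python sketch. It rests on assumptions that the text does not state explicitly: the jitter contributions are treated as independent and Gaussian, so that they combine in quadrature and FWHM = 2√(2 ln 2)·σ, and the noise is treated as purely white, so that averaging N samples scales σ by 1/√N. All function names and the numbers in the demo are illustrative and are not taken from the measurements.

```python
import math
import random
import statistics

# Gaussian conversion factor between standard deviation and FWHM (about 2.355).
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian FWHM into a standard deviation."""
    return fwhm / FWHM_PER_SIGMA


def sigma_to_fwhm(sigma):
    """Convert a Gaussian standard deviation into a FWHM."""
    return sigma * FWHM_PER_SIGMA


def subtract_in_quadrature(total_fwhm, part_fwhm):
    """Remove one independent contribution from a total jitter (assumed model)."""
    if part_fwhm > total_fwhm:
        raise ValueError("the contribution cannot exceed the total jitter")
    return math.sqrt(total_fwhm ** 2 - part_fwhm ** 2)


def averaged_sigma(sigma_single, n_avg, trials=5000, seed=0):
    """Monte Carlo estimate of the spread left after averaging n_avg
    samples affected by independent (white) Gaussian noise."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma_single) for _ in range(n_avg))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


if __name__ == "__main__":
    # Hypothetical values in ps, chosen only to exercise the functions.
    total_fwhm, adc_fwhm = 6.0, 3.0
    print("residual FWHM (ps):", subtract_in_quadrature(total_fwhm, adc_fwhm))
    for n in (1, 4, 8):
        print(n, "averages -> sigma:", averaged_sigma(1.0, n))
```

Under the white-noise assumption the spread keeps shrinking as 1/√N. The measurements of Fig. 11(a) show no significant gain beyond 8 averages, which suggests that the residual jitter is dominated by contributions that this idealized model does not capture.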

B. Acquisition Speed
As discussed before, the maximum acquisition speed and the precision typically present a trade-off. In Table IV, we report the characterization of the system speed as a function of the number of averages at different FSRs (12.5 ns, 25 ns, 50 ns, 100 ns). All results are reported for the worst-case scenario, i.e. when applying the maximum possible input delay for the selected FSR. The maximum achieved speed is 12 MHz, which is aligned with the current state-of-the-art and represents a remarkable improvement with respect to [7].

C. Differential Nonlinearity
Another important characterization concerns the DNL. For this measurement, the START and STOP signals are generated by two uncorrelated pulse sources: the same pulse generator as before (PM 5786B from Philips), and a photodetection module (PD-050-CT from Micro Photon Devices). The ideal response of the system should be a uniform and flat event distribution over the whole FSR. For the sake of simplicity, the presented results refer only to a FSR of 12.5 ns, since the same considerations also apply to the other FSRs.
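As an illustration of how a DNL figure can be extracted from such a measurement, the following Python sketch applies the usual code-density estimate: with uncorrelated START and STOP events every bin should collect the same number of counts, so the relative deviation of each bin from the mean count estimates its DNL in units of LSB. This is a minimal sketch under stated assumptions, not the processing used by the authors; the function names, the bin edges and the event count are hypothetical.

```python
import bisect
import random


def dnl(counts):
    """Per-bin DNL (in LSB) estimated as the relative deviation from the mean count."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        raise ValueError("the histogram is empty")
    return [(c - mean) / mean for c in counts]


def peak_to_peak(values):
    """Peak-to-peak excursion of a sequence (e.g. 0.015 corresponds to 1.5% LSB)."""
    return max(values) - min(values)


def simulate_histogram(bin_edges, n_events, seed=0):
    """Histogram of uniformly distributed intervals for a converter whose
    bins have the given (possibly unequal) edges."""
    rng = random.Random(seed)
    counts = [0] * (len(bin_edges) - 1)
    lo, hi = bin_edges[0], bin_edges[-1]
    for _ in range(n_events):
        t = rng.uniform(lo, hi)
        # Locate the bin containing t; clamp the rare case t == hi.
        idx = min(bisect.bisect_right(bin_edges, t) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical converter with four bins, the third one 20% wider.
    edges = [0.0, 1.0, 2.0, 3.2, 4.2]
    histogram = simulate_histogram(edges, n_events=200_000)
    values = dnl(histogram)
    print("DNL per bin:", [round(v, 3) for v in values])
    print("peak-to-peak DNL:", round(peak_to_peak(values), 3))
```

The statistical noise of the estimate scales as the inverse square root of the counts per bin, so resolving a peak-to-peak DNL at the percent level requires well above 10^4 counts in each bin.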
In Fig. 12, a comparison between the previous TCSPC system [7] and the new one is shown, both mounting the first version of the TAC. Moreover, the application of the dithering technique is highlighted in the two cases: a detailed analysis of this methodology is out of scope for this article, but is described elsewhere [7], [27], [28].
As previously mentioned in Section III, two major effects are evident in the obtained graphs. First, a widespread DNL floor is present over the whole FSR, mainly due to the unequal time bins of the ADC. By employing a different ADC, the new system shows a thinner floor, but with more pronounced peaks in the negative part. As expected, the dithering technique proves effective in strongly reducing the DNL floor, achieving in both systems a peak-to-peak DNL of 1.5%. This result is comparable to that of the best state-of-the-art systems. Once the dithering technique is applied, a second effect is unveiled, consisting of fast and localized oscillations that are temporally correlated to the START and STOP signals, i.e. they appear at a fixed location on the FSR. These oscillations depend on electromagnetic couplings that arise along the signal paths, either internally to the chip as in [7], or externally on the board itself. In the subsequent measurements we investigate some possible board impairments on the DNL, by focusing on two specific case studies: (i) the effect of wire bonding at the TAC input and (ii) the effect of track crosstalk on the PCB.
For the first case study, in Fig. 13, we illustrate a comparison between a direct bonding solution and an indirect bonding one for the first version of the TAC. In the first situation, no internal or external termination resistor is present, and the terminal pads of the 100 Ω differential transmission line are directly wire-bonded to the TAC input. In the second situation, a surface-mount resistor in a 0402 package is inserted and connected through a double wire bonding: from the transmission lines to the resistor itself, and then to the TAC input. As observed, the oscillations are enhanced in the case of a dual bonding path. A precise understanding of the obtained waveform may be the object of future studies.
For the second case study, we employed the first version of the single-channel TAC mounted on the 8-channel module, which is a multichannel replica of the board previously described in the article. A picture of the board is shown in Fig. 14(a), along with a layout inset over the signals of interest (Fig. 14(b)). In Fig. 14(c) we report a schematic representation of the exploited channels. The first channel is directly wire-bonded to the START of the TAC, and the same holds for the STOP channel. The second and eighth channels are exploited instead as disturbers: channel 2 is situated far from the STOP signal and is left floating, while channel 8 is close to the STOP and can be optionally terminated through a 0402 SMD resistor. The experiment consists in performing a DNL measurement while activating the disturbing channel at a fixed time distance after the START, following the timeline illustrated in Fig. 14(d). Since the START and STOP signals are totally uncorrelated, some effects can be observed on the DNL when the STOP signal appears close to the disturbance.
In Fig. 15(a) the measurement is performed by activating the disturbance on channel 2. Since this channel is spatially far from the STOP, no oscillations are present on the DNL. In Fig. 15(b) the disturbance is activated on channel 8, thus inducing a first dip and some subsequent oscillations on the DNL. To understand whether the electric coupling is generated by the comparator or by the subsequent transmission lines, we detached the output pins of the comparator from the lines. At this point, no more oscillations appear on the DNL, thus attributing the source of electric coupling to the transmission lines. In Fig. 15(c) only a single pin of the comparator is re-soldered to a single line. In this situation the first dip is reduced, while the oscillations are maintained. Finally, in Fig. 15(d) the second pin of the comparator is also re-soldered to the second line and the optional 100 Ω termination is connected. Thanks to the lower signal swing, the electric coupling is reduced, with a consequent decrease in the DNL oscillations. In conclusion, in case of crosstalk issues, it is possible to insert a termination resistor to mitigate its impact; however, this choice is traded off against a slight decrease in the timing performance, as observed in Section VIII-A.

D. Power Consumption
Lastly, we characterized the total power consumption of both the single-channel and the 8-channel modules. The former consumes 5.75 W and the latter 12 W, i.e. 1.5 W per channel. Thanks to the presence of shared resources (such as the FPGA), scaling up the number of channels implies a decrease of the power required by each single channel. The achieved power consumption per channel is thus very competitive with respect to the current state-of-the-art commercial modules.
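The per-channel figure quoted above follows from simple arithmetic, reproduced here as a short Python sketch. The two module power values are those reported in the text; the function names and the derived relative saving are illustrative additions.

```python
# Measured module power consumption reported in the text (watts).
SINGLE_CHANNEL_MODULE_W = 5.75
EIGHT_CHANNEL_MODULE_W = 12.0
N_CHANNELS = 8


def power_per_channel(total_w, n_channels):
    """Average power drawn per channel by a multichannel module."""
    if n_channels <= 0:
        raise ValueError("n_channels must be positive")
    return total_w / n_channels


def relative_saving(single_w, per_channel_w):
    """Fraction of power saved per channel with respect to a single-channel module."""
    return 1.0 - per_channel_w / single_w


if __name__ == "__main__":
    per_channel = power_per_channel(EIGHT_CHANNEL_MODULE_W, N_CHANNELS)
    saving = relative_saving(SINGLE_CHANNEL_MODULE_W, per_channel)
    print(f"power per channel: {per_channel:.2f} W")
    print(f"saving vs single-channel module: {saving:.1%}")
```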

IX. CONCLUSION
In this work, the authors discussed the implementation and characterization of a novel high-performance TCSPC system that surpasses the current state-of-the-art by attaining a timing precision of 4.5 ps FWHM, a peak-to-peak DNL of 1.5% and a maximum speed of 12 MHz over a wide FSR of 12.5 ns. The design has been analyzed by providing general guidelines for the correct management of a TCSPC timing chain and by applying them to a specific time-to-amplitude converter. The obtained performance is fully satisfactory with regard to all the considered parameters, meaning that the system can effectively sustain the development and exploitation of novel low-jitter detectors in practical applications. Moreover, the achieved results also prove the validity of the adopted approach. In conclusion, we believe that the presented methodology paves the way to the comprehension and design of even more advanced TCSPC systems, in pursuit of 1 ps jitter.

Fig. 1. Picture of the designed TCSPC acquisition board; the main components of the system are highlighted in different colors. The board is inserted into a mechanical enclosure of size 120 × 59 × 31 mm³.

Fig. 2. Schematic of the implemented acquisition chain. The input signals are routed through the analog front-end toward the TAC. The subsequent ADC converts the TAC output into a digital word, transmitted to the FPGA. The digital section is then intended to communicate with the user interface to visualize histograms, and apply external settings and commands. Finally, the main connections between FPGA and TAC are highlighted.

Fig. 3. Double-entry table showing the correspondence between components and TCSPC parameters: each component of the timing chain can influence multiple parameters.

Fig. 4. Schematic of the input front-end for START and STOP signals. The NIM input signal is acquired through an SMP connector and regenerated by the comparator. The START signal is directly routed to the TAC input, while a monostable circuit is present for the STOP signal. Both monostabilized and non-monostabilized signals are forwarded to the TAC and to the FPGA.

Fig. 6. (a) Schematic for the simulation of signal propagation between comparator and TAC, including transmission line, bondwires and TAC input model. (b) Left: simulated trace on a single terminal of the TAC input. The simulation is performed with three different termination conditions and a repetition rate of 80 MHz (worst case for the STOP signal). Right: detail of the signal rising edge, showing lower oscillations in the case of internal termination and higher slew rate without any termination.

Fig. 7. Top: analog output of the TAC in the different operating phases and corresponding time duration. Bottom: handshaking signals between TAC and FPGA. The issued STROBE is first synchronized (STROBE SYNC) in the FPGA, then edge-detected to generate the RESET signal. Each step takes a maximum of 2 clock cycles, i.e. 16 ns at 125 MHz.

Fig. 11. Measured timing jitter (FWHM) at increasing delays in different conditions, with the first version of the TAC mounted on the new system. (a) With 16 and 32 averaged samples (FSR = 12.5 ns). (b) Acquiring the TAC output value 8 or 9 samples after the STROBE (FSR = 12.5 ns). (c) Including or subtracting the ADC jitter contribution (FSR = 12.5 ns) with a single sample. (d) Including or subtracting the ADC jitter contribution (FSR = 12.5 ns) with 8 averaged samples. (e) With different averages at a FSR of 25 ns. (f) With different averages at a FSR of 50 ns.

Fig. 12. Comparison of DNL results between a previous system [7] and the new system over a FSR of 12.5 ns. Both of them mount the first version of the TAC. The effect of the dithering technique is also shown.

Fig. 13. Comparison of DNL results between a direct bonding connection from transmission lines to TAC, and an indirect bonding connection through an SMD resistor.

Fig. 14. Setup for the measurement of PCB crosstalk effects on DNL. (a) Employed 8-channel module. (b) Layout inset with signals of interest. (c) Schematic of the test structure. (d) Signal timeline for the test. With the STOP signal in position 2, an interaction with the disturber is observed on the DNL.

Serena Farina (Graduate Student Member, IEEE) received the B.S. degree in biomedical engineering and the M.S. degree in electronics engineering from Politecnico di Milano, Milan, Italy, in 2017 and 2020, respectively. Since 2020, she has been working toward the Ph.D. degree in information technology with Politecnico di Milano, working on new solutions and systems for high-speed and low-jitter time-correlated single photon counting.

Ivan Labanca received the M.S. degree in electronic engineering from Politecnico di Milano, Milan, Italy, in 2002. Since 2002, he has been with the Department of Electronics, Information and Bioengineering, Politecnico di Milano. His research interests include the development of electronic systems for high-performance counting and timing based on single photon avalanche diode arrays.

Giulia Acconcia (Member, IEEE) received the B.S. degree in engineering of computing systems and the M.S. degree in electronics engineering from Politecnico di Milano, Milan, Italy, in 2011 and 2013, respectively. In 2017, she received the Ph.D. degree (Hons.) in information technology. She is currently a Senior Researcher with Politecnico di Milano. Her research interests include the development of integrated electronics for high-performance counting and timing with single photon avalanche diodes.

Ivan Rech (Senior Member, IEEE) received the M.S. degree in electronic engineering and the Ph.D. degree (Hons.) in information technology from Politecnico di Milano, Milan, Italy, in 2000 and 2004, respectively. He is currently an Associate Professor with Politecnico di Milano. His research interests include the development of single photon detectors and associated electronics for biomedical, genetic, and diagnostic applications.

TABLE I. STATE-OF-THE-ART OF TCSPC SYSTEMS BASED ON TDC OR TAC

TABLE III. HIGH-PERFORMANCE ADC SELECTION

Fig. 9. ADC jitter contribution (FWHM) with different numbers of averaged samples. The normalized histograms are obtained by collecting the ADC output values with short-circuited inputs. A negligible jitter below 2 ps is observed starting from 4 averaged samples.