From Multiphase to Novel Single-Phase Multichannel Shift-Clock Fast Counter Time-to-Digital Converter

With countless applications, time measurement is among the most important current challenges in industrial electronics. The issue is not precision, which standard architectures have by now brought to the order of picoseconds and therefore to the physical limits of the most common detection systems, but the number of channels, which is growing continuously and enormously. Consider time-of-flight (TOF)-based applications such as 3-D imaging, time-of-arrival real-time locating systems, TOF positron emission tomography, and so on. This drives research on the design of new time-to-digital converters characterized by a huge number of channels and a precision compliant with the detectors' time resolution. In this context, field programmable gate array architectures provide fully digital and completely programmable solutions that meet the demand for flexible setups and speedy prototyping. The innovative contribution of the proposed counter architecture consists of the reduction of the area required for implementation (only 110 SLICEs), with a consequent increase in the number of channels (up to hundreds in a tiny Artix-7), much higher than in the state-of-the-art multiphase solutions available today, and of the complete generation of the time estimates in real time, all while maintaining state-of-the-art low power consumption, high resolution (up to 150 ps), and precision (up to 68.4 ps r.m.s.).

I. INTRODUCTION
Thanks to recent developments in electronics, time interval measurements have advanced quickly, finding use in a wide range of time-of-flight (TOF)-based applications where time and space are connected by the constant speed of light. Consider, to cite the most disruptive ones, 3-D imaging in Industry 4.0 for quality inspection and volumetric detection [1], automotive LiDAR scanners for self-driving [2], time-of-arrival real-time locating systems to automatically identify and track objects and people [3], and TOF positron emission tomography (TOF-PET) in diagnostic imaging [4]. In these applications, precisions of hundreds of picoseconds in time must be offered in order to guarantee the precision in space required by most of the applications listed above. For this reason, the time resolution of the detectors used is modest and limited to what is strictly necessary, in order to contain production costs [5].

The time-to-digital converter (TDC), a fully digital type of time interval meter (TIM), is increasingly successful since it offers all the benefits of digital designs [6]. To cite one of the most obvious advantages, no complex mixed-signal circuitry is needed, which simplifies the design and reduces engineering and production costs. The TDC can be implemented in an application-specific integrated circuit (ASIC) or, because of its fully digital structure, in a field programmable gate array (FPGA), which is currently a key resource due to its adaptability and extremely low nonrecurring engineering (NRE) costs, both in the R&D and production phases [7]. The TDC is most frequently implemented in FPGAs using delay lines or counter systems. The principal architecture of the first type is the tapped delay-line TDC (TDL-TDC), while the most common one of the second type is the shift-clock fast-counter TDC (SCFC-TDC) [8]. Of course, TDL-TDCs and SCFC-TDCs respond to different needs. In short, SCFC-TDCs achieve a coarser resolution but require much less area than TDL-TDCs, which offer the finest resolution at the cost of a large occupied area. Therefore, TDL-TDCs are suited to applications needing very high performance in terms of resolution, while SCFC-TDCs are suited to multichannel applications.

In TDL-TDCs, the TDLs (Fig. 1) consist of chains of buffers, referred to as bins or taps, each one being the unit of time quantization. As a consequence, TDL-TDCs provide resolutions that are, to a first approximation, equal to (or coarser than) the minimum propagation delay of the buffers in the technological node of the host device (e.g., ∼34 ps in 65-nm FPGAs, ∼25 ps in 40-nm FPGAs, ∼16 ps in 28-nm FPGAs) [9]. The TDL-TDC structure requires considerable resources. First of all, the buffer chains of TDLs are intrinsically hardware-consuming, and process, voltage, and temperature (PVT) fluctuations cause mismatches among buffer delays that call for a calibration mechanism (which itself consumes hardware) in order to maintain both high resolution and high linearity [10]. Furthermore, in order to extend the full-scale range (FSR) and to get resolutions orders of magnitude below the minimum buffer delay, two additional stages must be added to the basic structure: the Nutt interpolator [11] for the former and the subinterpolator [9], [12], [13] for the latter. Consequently, only a few channels can be implemented in a single device due to the complexity of the entire resulting design, i.e., a few units in small-size FPGAs (e.g., Xilinx 28-nm Artix-7 family [14], [15]) up to a few tens in medium-size FPGAs (e.g., Xilinx 28-nm Kintex-7 [16]). In large FPGAs (e.g., Kintex-7 [17] and Virtex-7 [18]), hundreds of channels can be achieved only if the processing is done outside the FPGA in order to save hardware and power. Last but not least, the purely asynchronous structure is affected by significant metastability issues on the sampling D-type flip-flops (DFFs) due to setup-and-hold time violations. Indeed, as can be seen from Fig. 1, the D inputs of the DFFs are connected directly, without any synchronization mechanism, to the Start signal and its delayed versions, which are completely asynchronous with respect to the Stop signal present at the CLK inputs. However, the structure does not present any critical issues regarding the skew between signals. In fact, the distribution structure of the clock signal (Stop in Fig. 1) within the FPGA ensures a skew lower than the minimum propagation delay of the buffers [19]. Furthermore, the topology of the TDL itself prevents the output of the $i$th tap from being delayed compared to the output of tap $i - 1$.

On the other hand, the SCFC-TDC is a purely synchronous TDC built using several low hardware-consuming counters, each characterized by a resolution corresponding to the minimum clock period achievable in the technological node of the host device. Therefore, the counters can be driven (Fig. 2) by clocks with the same period but shifted by an offset equal to the desired resolution, which nevertheless always remains orders of magnitude coarser than that of a TDL-TDC [20]. When it comes to area occupancy, SCFC-TDCs use far less space than TDL-TDCs, allowing more channels to be implemented in a single FPGA device, even a small one (e.g., Xilinx 6-Series 40-nm Spartan-6 and 7-Series 28-nm Artix-7 families). Evidence of this at different technological nodes (i.e., 65, 40, and 28 nm) and for different FPGA families (i.e., Spartan, Artix, Kintex, and Virtex) is reported in Table I. The scientific literature reports many examples of so-called multiphase SCFC-TDCs (MP-SCFC-TDCs) [20], [23], [25], [29], [30], [36], [37], [39], [40], in which the clock shifting is obtained by means of one or more phase-locked loops (PLLs) or mixed-mode clock managers (MMCMs) [19]. So, the resolution corresponds to the clock period (e.g., longer than 1.0 ns in Xilinx 28-nm 7-Series [37]) divided by the maximum number of clock nets that the FPGA can manage (e.g., Xilinx 28-nm 7-Series FPGAs offer 32 clock nets, with at most 16 clocks routed in the same clock zone). The weak point of MP-SCFC-TDCs is the high number of clock signals (also known as phases), whose relative time skews [41] must be controlled to maintain the resolution; this increases the system's complexity and reduces its portability between various configurations and FPGA types [24], [33], [34], [35], [38], [42].

A new single-phase SCFC-TDC (SP-SCFC-TDC) design is introduced to get around this restriction. The proposed SP-SCFC-TDC combines the absence of skew issues of the TDL-TDC (due to the presence of the TDL and the distribution structure of the clock signal inside the FPGA) with the synchronous architecture of the MP-SCFC-TDC which, unlike the classical asynchronous TDL-TDC, minimizes the metastability due to setup-and-hold violations of the flip-flops.

The rest of this article is organized as follows. In Section II, an MP-SCFC-TDC used as a reference is presented, and the novel SP-SCFC-TDC is presented in Section III.
The experimental validations of the proposed architecture in 28-nm Xilinx 7-Series Artix-7 35T (i.e., XC7A35T-1CPG236C hosted in a Basys3 evaluation board [43]) and Artix-7 100T (i.e., XC7A100T-1CSG324C hosted in a Nexys4 evaluation board [44]) FPGAs are discussed in Section IV, demonstrating the possibility of implementing up to 64, 32, and 32 channels in the Artix-7 35T (5,200 SLICEs) and 128, 120, and 112 channels in the Artix-7 100T (15,850 SLICEs) with resolutions of 625.5, 317.25, and 156.125 ps, respectively, using only 84, 108, and 110 SLICEs per channel and only one clock phase. Both the MP- and SP-SCFC-TDCs presented are organized as IP-Cores with an intuitive graphical user interface (GUI) that allows the user to easily set the operating parameters, in particular the number of channels and the resolution.

A. Design
According to [20], [23], [25], [29], [30], [36], [37], [39], [40], to save hardware, the MP-SCFC has not been implemented using one $N_C$-bit wide counter per phase (Fig. 2). Instead, referring to the Nutt interpolation, we have used only one $(N_C - 1)$-bit wide counter placed side by side with $N_{PH}$ (i.e., the number of phases) toggle-type flip-flops (TFFs), which are simple 1-bit wide counters. The $(N_C - 1)$-bit wide counter is the coarse counter, while the $N_{PH}$ TFFs are the fine counters. Unlike what happens in TDL-TDCs, to avoid setup-and-hold violations in the flip-flops that constitute the sampler, the Async signal is synchronized by a DFF, one per phase, fed with the corresponding clock. By doing so, we have $N_{PH}$ replicas (named $Sync_0$, $Sync_1$, ..., $Sync_{N_{PH}-1}$), each synchronized to one of the $N_{PH}$ phases. After that, each $Sync_i$ (where $i \in [0; N_{PH} - 1]$ represents the generic phase) is resampled by the same phase in order to generate $OldSync_i$ (one per phase, i.e., $OldSync_0$, $OldSync_1$, ..., $OldSync_{N_{PH}-1}$). So, for each phase, the sampling process is concluded with the assertion of $Valid_i$ (one per phase, i.e., $Valid_0$, $Valid_1$, ..., $Valid_{N_{PH}-1}$) when $Sync_i = 1$ and $OldSync_i = 0$, and with the storage of the status of the $i$th TFF. After that, to avoid operating with multiple phases, all the sampled values are moved from the generic $i$th phase to phase 0 (a.k.a. the master-phase) by means of a clock domain crossing (CDC). The coarse counter, synchronous with the master-phase, is sampled by $N_C - 1$ DFFs when $Valid_0 = 1$. Finally, the timestamp is constructed by combining the coarse and fine measurements once the data have been translated from thermometric to binary code, giving an FSR equal to $2^{N_C} \cdot T_{CLK}$ and a resolution (LSB) equal to $T_{CLK}/N_{PH}$.
Now, a multichannel MP-SCFC-TDC can be easily obtained by replicating the MP-SCFC and the sampling circuit as many times as there are inputs sharing the same MMCM. In addition, the system is set up as a configurable IP-Core, which makes it simple to use.
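The combination of coarse and fine measurements described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the fine sample is assumed to be a plain thermometer code with one bit per phase, and the timestamp is assumed to be the coarse count expressed in LSB units plus the decoded fine value. The actual decoding logic of the IP-Core is not given in the text.

```python
def thermometer_to_binary(bits):
    """Return the number of leading ones in a thermometer code, e.g. [1, 1, 0, 0] -> 2."""
    count = 0
    for bit in bits:
        if bit != 1:
            break
        count += 1
    return count


def timestamp_ps(coarse, fine_bits, t_clk_ps=2500.0):
    """Combine a coarse count and a thermometer-coded fine sample into a time in ps.

    Assumption: one fine bit per phase, so N_PH = len(fine_bits) and LSB = T_CLK / N_PH.
    """
    n_ph = len(fine_bits)
    lsb_ps = t_clk_ps / n_ph
    fine = thermometer_to_binary(fine_bits)
    return (coarse * n_ph + fine) * lsb_ps


def full_scale_range_ps(n_c, t_clk_ps=2500.0):
    """FSR = 2^N_C * T_CLK, as stated in the text."""
    return (2 ** n_c) * t_clk_ps


if __name__ == "__main__":
    print(timestamp_ps(3, [1, 1, 0, 0]))   # coarse = 3, fine = 2 of 4 phases
    print(full_scale_range_ps(8) / 1000)   # FSR in ns for N_C = 8
```

With $T_{CLK}$ = 2.5 ns and $N_C$ = 8, the last call reproduces the 640-ns FSR used in the implementations discussed in the paper.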

B. Implementation Details
Having selected a 28-nm Xilinx 7-Series FPGA device for implementation, at most 16 clock nets can be routed in the same clock zone and 8 clocks can be managed by one MMCM [19]. In fact, MMCMs have eight outputs, so three of them would be required to get 16 clocks, which would weaken the efficiency of the architecture because of the consequent strong clock jitter. For this reason, the maximum allowed $N_{CLK}$ is 8 (i.e., $N_{PH} = 16$). Moreover, the timing analysis returns 2.5 ns as the minimum clock period, corresponding to a resolution of 156.25 ps (i.e., 2.5 ns/16). Here the first drawback of this architecture emerges: the technology of the selected host device, in terms of the number of clock lines and MMCM resources, limits the achievable resolution.
To better focus on the MP-SCFC-TDC features and performance, thanks to the flexibility of the IP-Core, three different implementations have been considered (Table II): #1-MP with $N_{CLK} = 2$ ($N_{PH} = 4$) and 625-ps resolution, #2-MP with $N_{CLK} = 4$ ($N_{PH} = 8$) and 312.5-ps resolution, and #3-MP with $N_{CLK} = 8$ ($N_{PH} = 16$) and 156.25-ps resolution. In all implementations, the same FSR equal to 640 ns (i.e., $N_C = 8$) has been set. However, because the effective resolution cannot be finer than 500 ps, implementations with $N_{CLK} > 2$ bring no benefit. This is due to the second drawback of the MP-SCFC-TDC architecture: the time position of the asynchronous Async input with respect to the $N_{CLK}$ clocks cannot be fixed with a precision better than hundreds of picoseconds. For example, as shown in Fig. 5, if the skew of the time paths ($\Delta Skew$) between two consecutive phases $i$ and $i + 1$ is greater than the LSB (i.e., $\Delta Skew_{i+1,i} > LSB$), the signal for that particular phase will always be sampled at the next clock edge, corrupting the fine part of the measurement.
This second main drawback could be effectively addressed by the "ones counter method" [45] and the "phase resort" technique [46], originally designed to resolve similar issues, such as bubble errors, in high-resolution (picoseconds) TDL-TDCs. Both solutions are also feasible for the MP-SCFC-TDC; however, because they target high resolution, they would result in an excessively complex system for the required resolution (i.e., tens of picoseconds). The ones counter method entails analyzing the data of each channel and compensating for nonidealities with an additional module, which could pose challenges when dealing with numerous channels or limited FPGA resources. On the other hand, the phase resort technique would require intricate manual routing and/or comprehensive knowledge of the implementation delays, potentially leading to a time-consuming process, particularly for multichannel implementations.
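The figures quoted for the three reference configurations follow directly from $LSB = T_{CLK}/N_{PH}$ and $FSR = 2^{N_C} \cdot T_{CLK}$. The short Python sketch below recomputes them and expresses the skew condition as a simple predicate. The relation $N_{PH} = 2 \cdot N_{CLK}$ is inferred from the three ($N_{CLK}$, $N_{PH}$) pairs quoted in the text, and the function names are hypothetical.

```python
def mp_configuration(n_clk, t_clk_ps=2500.0, n_c=8):
    """Return N_PH, LSB (ps) and FSR (ns) of a multiphase configuration.

    Assumption: N_PH = 2 * N_CLK, as in the pairs (2, 4), (4, 8), (8, 16) of the text.
    """
    n_ph = 2 * n_clk
    return {
        "n_ph": n_ph,
        "lsb_ps": t_clk_ps / n_ph,
        "fsr_ns": (2 ** n_c) * t_clk_ps / 1000.0,
    }


def skew_corrupts_fine_code(skew_ps, lsb_ps):
    """True when the skew between two consecutive phases exceeds one LSB."""
    return skew_ps > lsb_ps


if __name__ == "__main__":
    for name, n_clk in (("#1-MP", 2), ("#2-MP", 4), ("#3-MP", 8)):
        cfg = mp_configuration(n_clk)
        # Illustrative skew of 500 ps, i.e., "hundreds of picoseconds"
        print(name, cfg, skew_corrupts_fine_code(500.0, cfg["lsb_ps"]))
```

With an illustrative skew of 500 ps, only the configuration with a 625-ps LSB stays below the threshold, which is consistent with the statement that implementations with $N_{CLK} > 2$ bring no benefit.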

III. NOVEL SINGLE-PHASE SCFC-TDC ARCHITECTURE
To overcome the drawbacks of the MP-SCFC-TDC architecture (i.e., the limited $N_{CLK}$ and the problem of the skew), we propose a single-clock architecture using a chain of buffers that makes up a TDL, one per channel, instead of the MMCM that is common to all channels in the MP-SCFC-TDC. The SP-SCFC-TDC combines the absence of skew issues of the TDL-TDC (intrinsically due to the presence of the delay line and the distribution structure of the clock signal in the FPGA) with the synchronous architecture offered by the MP-SCFC-TDC. As can be seen from Fig. 6, the output of each individual TFF (green) is sampled by the samplers (identical to those in Fig. 4), both driven by the same clock $T_{CLK}$. However, the asynchronous signals $DelAsync_i$, unlike in the TDL-TDC, are appropriately synchronized before being sampled by DFFs (white). In this way, the SP-SCFC-TDC minimizes the metastability due to setup-and-hold violations of the flip-flops.

Fig. 6 depicts the developed architecture. The Async input is distributed through a TDL to the $N_{PH}$ samplers, which are the same as in the MP-SCFC-TDC architecture, while the $(N_C - 1)$-bit coarse counter and only one TFF are supplied with the same clock with period $T_{CLK}$ and phase 0° (i.e., the master-clock of the MP-SCFC-TDC solution). Fig. 7 presents the waveforms of the main nodes of the proposed architecture (SP-SCFC-TDC), which are synchronized to the clock and not affected by skew or metastability, comparing them with those of the MP-SCFC-TDC (synchronous but subject to skew) and the TDL-TDC (asynchronous, hence subject to metastability and setup-and-hold time violations).

In the case of 28-nm Xilinx 7-Series FPGAs, the carry-chain resources (i.e., CARRY4) of the device are used to implement the TDL. Precisely, a proper number of CARRY4 resources ($N_{CARRY4}$) are cascaded to give a TDL composed of $N_{TAP}$ taps (i.e., $N_{TAP} = 4 \cdot N_{CARRY4}$) with a total delay slightly longer than $T_{CLK}$. So, considering that each tap has a propagation delay $t_p$ comparable to the minimum one offered by the technological node (i.e., 16 ps in 28-nm devices), it is possible to choose the step $\Delta N_{TAP}$ at which to extract the $N_{PH}$ delayed copies of Async, called $DelAsync_i$ (one per phase: $DelAsync_0$, $DelAsync_1$, ..., $DelAsync_{N_{PH}-1}$), to be sent to the samplers. Thus, considering that $\Delta N_{TAP}$ taps must introduce a delay equal to the desired LSB, we have to set $\Delta N_{TAP} = LSB/t_p$ and, obviously, $N_{TAP} \geq T_{CLK}/t_p$. In this design, the choice of $N_{TAP} = 256$ (i.e., $256 > 2.5\ \text{ns}/16\ \text{ps} \approx 157$) corresponds to $N_{CARRY4} = 64$.
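The sizing rules above ($\Delta N_{TAP} = LSB/t_p$, $N_{TAP} \geq T_{CLK}/t_p$, and four taps per CARRY4) can be checked with a small Python sketch. The function name is hypothetical, and rounding $\Delta N_{TAP}$ to the nearest integer is an assumption, since the text does not state how the ratio is rounded.

```python
import math


def size_delay_line(t_clk_ps, t_p_ps, lsb_ps):
    """Return (delta_n_tap, min_n_tap, min_n_carry4) for a target LSB.

    delta_n_tap  : taps between two consecutive DelAsync extraction points
    min_n_tap    : smallest tap count whose total delay covers one clock period
    min_n_carry4 : smallest number of CARRY4 primitives (4 taps each)
    """
    delta_n_tap = round(lsb_ps / t_p_ps)          # assumption: nearest integer
    min_n_tap = math.ceil(t_clk_ps / t_p_ps)
    min_n_carry4 = math.ceil(min_n_tap / 4)
    return delta_n_tap, min_n_tap, min_n_carry4


if __name__ == "__main__":
    for target_lsb in (625.0, 312.5, 156.25):
        print(target_lsb, size_delay_line(2500.0, 16.0, target_lsb))
```

For $T_{CLK}$ = 2.5 ns and $t_p$ = 16 ps the lower bound is 157 taps, so the 256 taps (64 CARRY4) chosen in the design leave a wide margin for tap delays shorter than the nominal value.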
However, because it was not intended to be a dedicated timing resource, the CARRY4 primitive displays PVT fluctuations (as anticipated in Section I) in the time delay among the four taps inside the same CARRY4 and with respect to other CARRY4 stages. The propagation delay for the 28-nm Xilinx 7-Series, shown in Fig. 8, can reach nearly 300% of the mean value (i.e., ∼16 ps), corresponding to a maximum of ∼50 ps (i.e., an ultrabin), and can drop to a minimum of ∼1 ps. As a consequence of the central limit theorem, the temporal length of $\Delta N_{TAP}$ taps remains almost constant and numerically tends to $\Delta N_{TAP} \cdot t_p$, even in the presence of considerable dispersion in the propagation delays, provided that $\Delta N_{TAP}$ is sufficiently large. This claim is confirmed experimentally in Section IV. We would like to highlight that the proposed SP-SCFC-TDC employs a merging method for the taps. Similar but more hardware-intensive methods, designed for achieving high resolution (picoseconds) in TDL-TDCs, are proposed in [47] and [48]. However, our proposed solution offers a simpler and more resource-efficient implementation, making it suitable for multichannel applications on smaller FPGAs with resolutions in the tens of picoseconds. In analogy with the MP-SCFC-TDC IP-Core introduced in Section II, here $\Delta N_{TAP}$, and not $N_{CLK}$, fixes the resolution: the FSR is equal to $2^{N_C} \cdot T_{CLK}$ and the LSB is equal to $\Delta N_{TAP} \cdot t_p$. In this way, first, the resolution of the proposed SP-SCFC-TDC is no longer constrained by the clock resources (i.e., $LSB_{MP} = T_{CLK}/N_{PH}$) but depends only on the tap propagation delay (i.e., $LSB_{SP} = \Delta N_{TAP} \cdot t_p$). Second, the problem of the skew is also eliminated, since the sampling is performed along the sequence of buffers and the skew between two samplers is negligible thanks to the structure of the CARRY4 stage.
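The averaging effect invoked above can be illustrated with a small Monte Carlo experiment in Python. This is a sketch under loud assumptions: the tap delays are drawn independently from a uniform distribution between 1 and 31 ps (mean 16 ps), which is a hypothetical model chosen for simplicity and not the measured distribution of Fig. 8, and the function name is illustrative.

```python
import random
import statistics


def group_delay_stats(delta_n_tap, n_groups=2000, seed=1):
    """Return (mean, relative spread) of the total delay of delta_n_tap merged taps.

    Assumption: independent tap delays, uniform in [1, 31] ps (mean 16 ps).
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.uniform(1.0, 31.0) for _ in range(delta_n_tap))
        for _ in range(n_groups)
    ]
    mean = statistics.mean(totals)
    return mean, statistics.pstdev(totals) / mean


if __name__ == "__main__":
    for delta_n_tap in (1, 10, 20, 39):
        mean, rel = group_delay_stats(delta_n_tap)
        print(f"dN_TAP={delta_n_tap:2d}  mean={mean:7.1f} ps  relative spread={rel:.3f}")
```

Under this model the relative spread of the merged delay shrinks roughly as $1/\sqrt{\Delta N_{TAP}}$, while its mean stays close to $\Delta N_{TAP} \cdot t_p$, which is the behavior the text attributes to the central limit theorem.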

IV. MEASUREMENTS
Experimental comparisons are made between the proposed SP-SCFC-TDC, the reference MP-SCFC-TDC, and systems currently available in the literature. Section IV-A reports the measurement setup and the methods used to calculate the figures of merit. Section IV-B compares the three implementations of the MP-SCFC-TDC, i.e., #1-MP, #2-MP, and #3-MP (Section II-B and Table II), with the corresponding SP-SCFC-TDC implementations, named #1-SP, #2-SP, and #3-SP. All the TDCs have the same $T_{CLK}$ equal to 2.5 ns and the same FSR equal to 640 ns (i.e., $N_C - 1 = 8 - 1$). Finally, Section IV-C compares the MP-SCFC-TDC and SP-SCFC-TDC architectures with state-of-the-art systems available in the literature.

A. Measurement Setup
A host computer processes the timestamps from the multichannel ($N_{CH}$) MP-SCFC-TDC and SP-SCFC-TDC implementations in order to determine the resolutions, precisions, and respective differential and integral nonlinearity (DNL/INL). The clock signals required by the TDCs are provided by the oscillator available on the Basys3 and Nexys4 boards used as support hardware, while the Async signals are created by an external function generator. The setup layout is graphically represented in Fig. 9.

The standard deviation ($\sigma_{i0}$) of the distribution of the differences between the timestamps of the $i$th channel and of channel 0 is referred to as the channel-to-channel precision; it mathematically corresponds to the combined contribution of the single-shot precisions of channel $i$ ($\sigma_i$) and channel 0 ($\sigma_0$), i.e., $\sigma_{i0} = \sqrt{\sigma_i^2 + \sigma_0^2}$.

To determine the DNL and INL of the generic $i$th channel, a code-density test (CDT), i.e., the histogram of occurrences, is performed over the fine part of the timestamps [49]. In this manner, the propagation delay of the $N_{PH}$ phases is estimated by normalizing the histogram to the dynamic range of 2.5 ns offered by the fine part (i.e., $T_{CLK}$). The average time duration of the bins of the CDT is taken as the index of resolution, while the bin-by-bin relative DNL and INL curves are derived by differentiation of the CDT, for the DNL, and successive integration, for the INL. Furthermore, according to measurement theory, the precision of the generic $i$th channel is $LSB_i/\sqrt{12}$ if jitter and linearity errors are negligible (i.e., the ENOB of the system coincides with the LSB), which means that $\sigma_{i0} = \sqrt{LSB_i^2/12 + LSB_0^2/12}$. In light of the fact that all channels have, roughly speaking, the same resolution LSB (i.e., $LSB_i \approx LSB_0$), we can assume $\sigma_{i0} = \sqrt{LSB^2/6}$ as the channel-to-channel precision.
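The code-density processing described above can be sketched as follows in Python. The sketch assumes the usual convention in which each bin width is proportional to its share of the histogram counts, the relative DNL is the deviation of each bin from the average width, and the INL is the running sum of the DNL. The function name and the sample histogram are illustrative, not taken from the measurements.

```python
def code_density_test(counts, t_range_ps=2500.0):
    """Estimate bin widths, average LSB, and relative DNL/INL from a CDT histogram.

    counts     : occurrences recorded for each fine code
    t_range_ps : dynamic range covered by the fine part (T_CLK)
    """
    total = sum(counts)
    widths = [c / total * t_range_ps for c in counts]   # bin durations in ps
    lsb = t_range_ps / len(counts)                      # average bin duration
    dnl = [(w - lsb) / lsb for w in widths]             # relative DNL, in LSB
    inl = []
    running = 0.0
    for d in dnl:                                       # INL as cumulative DNL
        running += d
        inl.append(running)
    return lsb, widths, dnl, inl


if __name__ == "__main__":
    lsb, widths, dnl, inl = code_density_test([300, 200, 250, 250])
    print("LSB (ps):", lsb)
    print("bin widths (ps):", widths)
    print("DNL (LSB):", dnl)
    print("INL (LSB):", inl)
```

A perfectly uniform histogram gives zero DNL and INL on every bin, and the largest magnitudes of the two curves are the figures usually reported as DNL/INL errors.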

B. MP-SCFC-TDC versus SP-SCFC-TDC
To compare the MP- and SP-SCFC-TDCs fairly, we must set the $\Delta N_{TAP}$ value of the SP-SCFC-TDC so as to obtain nearly identical LSB values. By setting $\Delta N_{TAP} = 39$, the theoretical LSB of #1-SP is 39 · 16 ps = 624 ps, which is compatible with the #1-MP LSB equal to 625 ps. Table III displays the resources used by the two architectures in the selected target device and in some others belonging to different families.
2) Linearity and Resolution: The linearity and resolution of #1-SP, #2-SP, and #3-SP have been investigated through the CDT between the $N_{PH}$ phases, each composed of $\Delta N_{TAP}$ taps. The measured CDTs and the bin-by-bin relative DNL and INL curves are shown in Fig. 10. Table IV summarizes the measured resolution or LSB (Meas. LSB), the expected one (Exp. LSB), and the DNL/INL errors (i.e., the maximum magnitude of the bin-by-bin curves). The conclusion of Section III is supported by the data in Fig. 10: even if the dispersion of the propagation delays is high, provided that $\Delta N_{TAP}$ is sufficiently large, the time duration of $\Delta N_{TAP}$ taps (i.e., the Meas. LSB) remains almost constant, as a consequence of the central limit theorem, and approximates the Exp. LSB equal to $\Delta N_{TAP} \cdot t_p$. Because the propagation delay shows a maximum dispersion (ultrabin) that is negligible compared to the LSB, it is feasible to operate in a hardware-saving mode without calibration while maintaining low relative DNL and INL.

TABLE V MP-SCFC-TDC VERSUS SP-SCFC-TDC PRECISIONS
3) Precision: Considering both the expected and the measured LSBs, i.e., Exp. LSB and Meas. LSB, respectively (with $LSB_{MP} = T_{CLK}/N_{PH}$ and $LSB_{SP} = \Delta N_{TAP} \cdot t_p$), Table V displays the average values of the single-shot ($\sigma$) and channel-to-channel ($\sigma_{i0}$) precisions measured in the various implementations, together with the related LSBs. The advantage of the proposed SP-SCFC-TDC over the MP-SCFC-TDC is thus made clear. In fact, the observed LSB value of 500 ps corresponds to the measured accuracy limit of 204 ps r.m.s. in the MP-SCFC-TDC implementations (see Section II-B).
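The link between the 500-ps LSB and the 204-ps r.m.s. limit can be verified with the quantization formulas of Section IV-A. The Python sketch below assumes negligible jitter and linearity errors, as the text does, and uses hypothetical function names.

```python
import math


def single_shot_precision(lsb_ps):
    """Ideal quantization-limited precision of one channel: LSB / sqrt(12)."""
    return lsb_ps / math.sqrt(12.0)


def channel_to_channel_precision(lsb_i_ps, lsb_0_ps=None):
    """Quadrature sum of two single-shot precisions: sqrt(LSB_i^2/12 + LSB_0^2/12)."""
    if lsb_0_ps is None:
        lsb_0_ps = lsb_i_ps          # same resolution on both channels
    return math.sqrt(lsb_i_ps ** 2 / 12.0 + lsb_0_ps ** 2 / 12.0)


if __name__ == "__main__":
    for lsb in (500.0, 625.5, 317.25, 156.125):
        print(lsb,
              round(single_shot_precision(lsb), 1),
              round(channel_to_channel_precision(lsb), 1))
```

For two channels with the same LSB the second function reduces to $\sqrt{LSB^2/6}$, which for 500 ps gives approximately 204 ps r.m.s.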
4) Temperature Fluctuation: Unlike the MP-SCFC-TDC, where the MMCM/PLL generating the phases incorporates analog temperature compensation mechanisms, the delay line of the SP-SCFC-TDC is uncalibrated and uncompensated. As a result, the various taps are subject to fluctuations induced by temperature variations, leading to an LSB dispersion. Consequently, the percentage dispersion of the LSB for #1-SP, #2-SP, and #3-SP was characterized by varying the FPGA temperature between 20 °C and 80 °C in 5 °C steps using a climatic chamber. The results, depicted in Fig. 11, reveal that #1-SP and #2-SP exhibit a dispersion lower than 6% (i.e., 37.5 and 19 ps in absolute value over a Meas. LSB of 625.5 and 317.25 ps, respectively), while

C. State of the Art
Table VI compares the state-of-the-art MP-SCFC-TDC implementations available in the literature with the #1-MP in the Artix-7 100T, which represents the best compromise between timing performance (i.e., LSB, DNL/INL, and precision) and area occupancy (i.e., number of channels and number of SLICEs available in the FPGA). Table VII compares the proposed SP-SCFC-TDC with available SCFC-based TDC architectures in which the drawbacks of the SCFC are solved using custom routing [34], [38] (first block of elements in the table), constraints [24], [33] (second block), and SERDES [35], [42] (third block). Moreover, hardware-consuming but high-resolution (picoseconds) subinterpolating architectures, e.g., the PLL delay-matrix TDC [50] (997 SLICEs, LSB of 17.73 ps in a medium-size Virtex-6 FPGA), can be adopted in order to improve the resolution of the SCFC-based TDC down to a few picoseconds. However, these subinterpolating techniques make it possible to implement hundreds of channels only in medium/large FPGAs.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

V. CONCLUSION
The performance requirements for time interval meters are continuously increasing, both in terms of precision and of the ability to integrate numerous channels (up to hundreds) on a single device. The MP-SCFC-TDC architecture meets these requirements well, but it has structural issues (i.e., the limited number of phases $N_{PH}$ and the problem of the skew) that limit the resolution to 500 ps unless hardware-consuming architectures (e.g., the ones counter method, the phase resort, and the PLL delay-matrix, to cite only the most effective) are introduced. This was the motivation for developing and proposing the novel SP-SCFC-TDC architecture, which overcomes all the drawbacks of the MP-SCFC-TDC by granting both high performance and low area occupancy. The proposed SP-SCFC-TDC, similarly to TDL-TDCs, uses a delay line and a single clock distributed through the clock resources of the FPGA to quantize the temporal events for digitization, thereby limiting the effects of skew. However, unlike the purely asynchronous TDL-TDCs, it adopts the sampling techniques of the MP-SCFC-TDC, which is a synchronous structure. This design choice minimizes the metastability issues that may arise from violations of setup-and-hold times. For validation purposes, three implementations (i.e., #1-SP, #2-SP, and #3-SP) in 28-nm Xilinx 7-Series Artix-7 FPGAs (i.e., Artix-7 35T and Artix-7 100T) have been realized, having, respectively, resolutions of 625.5, 317.25, and 156.125 ps and single-shot precisions of 180, 92.3, and 68.3 ps r.m.s. In addition, the relatively low resource usage of the proposed architecture (84 SLICEs for #1-SP, 108 SLICEs for #2-SP, and 110 SLICEs for #3-SP) enables the implementation of up to 32/64 channels in the Artix-7 35T (5 k SLICEs) and 112/128 channels in the Artix-7 100T (15 k SLICEs). Last but not least, DNL and INL are always lower than 26.4% of the LSB. The proposed TDC is also designed as an IP-Core, making it simple for the user to modify each operating parameter separately.

Fig. 5. Example of skew error in the MP-SCFC-TDC considering $N_{CLK} = 2$ ($N_{PH} = 4$). This results in the error of sampling "1100" instead of "1101" due to the skew of Async with respect to the last phase. For simplicity, the sampler has been omitted and Async directly samples the phases.

Fig. 8. Experimental measurement of the distribution of the propagation delays in a TDL ($N_{TAP} = 256$) implemented in a 28-nm Xilinx 7-Series device.

Fig. 9. Block diagram (right) and photo (left) of the experimental setup with the Basys3. To minimize the number of connectors, channel 0 is directly connected to the function generator and all the other $N_{CH} - 1$ channels to a delayed replica.

With $\Delta N_{TAP} = 39$, the theoretical LSB of #1-SP is 39 · 16 ps = 624 ps, against 625.5 ps measured, which is compatible with the #1-MP LSB equal to 625 ps. Similarly, $\Delta N_{TAP} = 20$ in #2-SP corresponds to a theoretical LSB equal to 20 · 16 ps = 320 ps, against 317.25 ps measured, as compared to 312.5 ps of the #2-MP LSB. Finally, $\Delta N_{TAP} = 10$ in #3-SP determines a theoretical LSB equal to 10 · 16 ps = 160 ps, against 156.125 ps measured, as compared to 156.25 ps of the #3-MP LSB.
1) Hardware Occupancy: First of all, we have investigated the hardware occupancy of MP-SCFC-TDCs and SP-SCFC-TDCs, which differ, as Figs. 4 and 6 show, by $N_{PH} - 1$ TFFs per channel and by the presence of one MMCM in MP-SCFC-TDCs, replaced by 64 CARRY4s per channel in SP-SCFC-TDCs.

TABLE I OVERVIEW OF TIMING PERFORMANCE (I.E., LSB AND PRECISION) AND AREA OCCUPANCY (I.E., NUMBER OF CHANNELS IMPLEMENTED) IN TDL AND SCFC TDCS AT DIFFERENT TECHNOLOGICAL NODES AND FPGA FAMILIES (I.E., SLICE REPRESENTS THE TOTAL AVAILABLE UNITS IN THE DEVICE TO PROVIDE A COMPARATIVE METRIC)
In the channel-to-channel precision $\sigma_{i0} = \sqrt{LSB_i^2/12 + LSB_0^2/12}$ of Section IV-A, $LSB_0$ and $LSB_i$ represent the resolutions of channels 0 and $i$.

TABLE VII COMPARISON BETWEEN STATE-OF-THE-ART SCFC-BASED TDC ARCHITECTURES IN LITERATURE AND THE #1-SP, #2-SP, AND #3-SP IMPLEMENTATIONS OF THE PROPOSED SOLUTION