A Highly Linear and Flexible FPGA-Based Time-to-Digital Converter

Time-to-Digital Converters (TDCs) are major components for the measurement of time intervals. Recent developments in Field-Programmable Gate Arrays (FPGAs) have made it possible to implement high-performance TDCs that were previously only possible using dedicated hardware. In order to eliminate empty histogram bins and achieve a higher level of linearity, FPGA-based TDCs typically apply compensation methods, either using multiple delay lines, which consumes more resources, or post-processing, which leads to a permanent loss of temporal information. We propose a novel TDC that uses a single delay line without compensation and achieves high linearity by encoding the states of the delay line instead of the thermometer code used in conventional TDCs. The experimental results show that our states-based approach achieves an improved Differential Non-Linearity (DNL) of [-0.998, 1.533] for a time resolution of 5.00 ps, [-0.44, 0.49] for 10.04 ps, [-0.16, 0.19] for 21.65 ps, [-0.10, 0.11] for 43.87 ps, [-0.06, 0.07] for 64.12 ps, and [-0.07, 0.05] for 87.73 ps, whilst no empty bins have been observed. To our knowledge, the achieved raw linearity, together with the zero empty bins and a simple delay line structure, exceeds that previously reported for FPGA-based TDCs.

high-speed analogue-to-digital converters (ADCs) to digitize the analogue input signals. These ADCs are limited by the gain-bandwidth and aperture jitter of the process technology and introduce inevitable quantization and thermal noise during the conversion stage of the TDC [13]. Generally, digital TDCs are less sensitive to process technology and temperature than their analogue counterparts [1], [14] owing to their robustness to analogue noise and signal drift. A digital TDC can be constructed with a Vernier oscillator or a delay line [8], [15]-[19]. Vernier TDCs use the phase difference between their oscillators to measure time intervals; their conversion rate is limited because one measurement requires numerous clock cycles [15]. A delay-line-based digital TDC performs a conversion at every clock cycle, making it more efficient than the Vernier method.
Delay-line-based TDCs can be implemented in Application-Specific Integrated Circuits (ASICs) or Field-Programmable Gate Arrays (FPGAs). An ASIC-based TDC requires a dedicated implementation, which lengthens design iteration cycles and reduces the flexibility of the TDC for use in various instrument configurations [5], [8], [20]-[24]. FPGA-based TDCs provide an alternative approach with a shorter development cycle for different requirements [14], [15], [25], [26]. Recent developments in FPGAs have also made it possible to implement high-performance digital TDCs. Vernier Delay Lines (VDLs) and Tapped Delay Lines (TDLs) are the main architectures used in digital TDCs [20], [27], [28]. TDL-based TDCs have a simpler structure and a faster conversion rate than VDL-based TDCs, which has made them more popular in recent studies [20], [29].
FPGA-based TDL-TDCs have developed continuously over the past two decades. In 1997, a direct time-to-digital converter based on tapped delay lines was proposed, achieving a resolution of 200 ps with a Differential Non-Linearity (DNL) of [-0.47, 0.47] [30]. In 2000, a TDC with a Least Significant Bit (LSB) of 110 ps and a DNL of 1.88 LSB was published [31]. In 2006, a TDC using Look-up Tables (LUTs) as delay elements was reported with a resolution of 65.8 ps and a DNL within [-0.953, 1.051] [27]. In 2009, a 17 ps TDC using carry logic with a DNL of [-1, 3.55] was reported [16]. In 2013, a TDC with sub-20 ps resolution and a DNL of [-1, 1.5] was implemented [8]. In 2016, a dual-phase TDL-TDC was reported, and the effects of clock skew were discussed in detail [14]. In 2019, the impact of the manufacturing process on the root mean square (RMS) of a sub-TDL topology was studied [29].
arXiv:2107.13053v4 [eess.SY] 10 Nov 2021

The twenty years of development shown in Fig. 1 indicate a clear improvement in resolution from 200 ps to sub-10 ps. However, the raw linearity of TDCs, i.e., the linearity of the raw data prior to any compensation, has not improved over time. Multichain topology and post-processing are the two main methods previously used to optimize linearity. Multichain topology uses parallel delay lines to average the bins [14], [25], [32]. This method can achieve small and uniform bins; however, it increases complexity and leads to higher utilization of hardware resources. The post-processing method, which has appeared in recent studies, calibrates and compensates the raw data to improve the DNL and the Integral Non-Linearity (INL) [2], [29], [33], [34]. However, due to random variations in clock skew and process mismatch, the misplacement of time intervals in the raw TDL output remains unpredictable, leading to a permanent loss of temporal information.
A traditional TDL-based TDC uses the thermometer code to encode time intervals. In an ideal delay line, there is no gap in the code. In a real delay line, however, there may be one or more gaps, as shown in Fig. 2, where $T_1$, $T_2$, $S_1$, and $S_2$ represent the propagation delays of the individual routes. If a Stop edge occurs in the proximity of $D_{0,3}$, then ideally $D_{0,0}$ to $D_{0,3}$ have the same logic state while $D_{0,4}$ to $D_{0,15}$ have the opposite logic state. However, when $T_1 < T_2$ and $S_1 = S_2$, $D_{0,2}$ takes the same logic state as $D_{0,4}$ to $D_{0,15}$; hence, a gap appears in the real code, making it differ from the ideal code. In traditional TDCs that assume the ideal thermometer code, this gap leads to a bubble error. The variations of $T_1$, $T_2$, $S_1$, and $S_2$ are mainly due to clock skew and process mismatch, which are unavoidable in electronic systems.
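To make the bubble error concrete, the sketch below models a single 16-tap CLB ($D_{0,0}$ to $D_{0,15}$) with the Stop edge near $D_{0,3}$, once ideal and once with $D_{0,2}$ flipped as in the $T_1 < T_2$, $S_1 = S_2$ case. It decodes both codes with two generic thermometer decoders (a ones counter and a first-transition search); these decoders are illustrative choices for this sketch, not the encoders used in the cited works.

```python
# Minimal model of one 16-tap CLB; 1 = the Start edge has already reached the tap.

def ones_count_decode(code):
    """Ones-counter decoder: fine code = number of taps reporting 1."""
    return sum(code)

def first_transition_decode(code):
    """Transition decoder: index of the first tap whose value differs from tap 0."""
    for i, bit in enumerate(code):
        if bit != code[0]:
            return i
    return len(code)  # no transition inside this CLB

ideal = [1] * 4 + [0] * 12          # edge between D_{0,3} and D_{0,4}
bubbled = [1, 1, 0, 1] + [0] * 12   # D_{0,2} flips: gap (bubble) in the code

for name, code in (("ideal", ideal), ("bubbled", bubbled)):
    print(name, ones_count_decode(code), first_transition_decode(code))
```

Both decoders return 4 for the ideal code, but 3 and 2 respectively for the bubbled code, so the same physical event is mapped to different, and wrong, bins depending on the decoder.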
The outputs of the delay units, which represent the combined result of all delay effects including clock skew and process mismatch, are referred to as real states; they are the states of the internal carry chain. We propose a novel TDC that interprets and sequences the real states from the TDL and encodes the TDC output using these real states instead of the ideal thermometer code. This method minimizes the combined effects of clock skew and mismatch on the TDL and yields a predictable and flexible raw linearity. Proof-of-concept experiments demonstrate that the state-based approach achieves high raw linearity and significantly reduces empty histogram bins while using only a simple delay line.

II. PROPOSED METHODOLOGY
A 16 nm Xilinx UltraScale+ MPSoC FPGA is used as the platform to implement the TDC and demonstrate the efficiency of the states-based approach. The block diagram of the TDC is shown in Fig. 3. A Mixed-Mode Clock Manager (MMCM) provides a 600 MHz clock as the Start signal for the TDL. The 100 MHz Stop signal used in the time interval measurement is also generated by the MMCM, which is externally controlled by a Personal Computer (PC) through a Gigabit network and an embedded driver within the processor part of the FPGA. Random signals generated by a Single-Photon Avalanche Diode (SPAD) [35] are used as the Stop signal in the code density test. The TDC output consists of a coarse code and a fine code. The coarse code counts the cycles of the Start signal, and each cycle spans the measurement range of the fine code. Two interleaved histograms are generated on-chip and transferred over a Gigabit network through Video Direct Memory Access (VDMA). The main abbreviations and nomenclature applied in this paper are shown in

A. The states-based TDC
The proposed TDC is constructed using Configurable Logic Blocks (CLBs), which are the most widely available resources in Xilinx UltraScale+ MPSoC FPGAs. Each CLB is composed of a carry chain with multiplexers, XOR gates, and D flip-flops (DFFs). The traditional TDL-based TDC applies the ideal thermometer code for encoding and assumes that all gates in the CLBs are identical. It also assumes that the DFFs sample the propagation states simultaneously. However, as Fig. 2 illustrates, the real states contain unavoidable gaps that deviate from the ideal thermometer code. Timestamps generated from the ideal code therefore cannot represent the actual time, which reduces linearity. In this work, all the blocks in Fig. 3 are implemented on a Xilinx Zynq UltraScale+ xczu3cg-sfvc784-1-e FPGA. In the implementation, the real states, which are the outputs of the CLBs, are collected and used to encode the timestamps. As the Start signal (in Fig. 2) propagates through the CLBs, the DFFs register each state as a binary code. Each real state is treated as a bin, and the relative propagation delay of each state is the bin width. Fig. 4 shows the two steps required to collect and apply the real states. In step 1, all states were collected directly from the TDL, and an iterative algorithm, given in (1), was applied to sequence them. In (1), $D_{c,b}$ is the logic state of the CLB output, as shown in Fig. 2, where $b$ is the position of $D_{c,b}$ within the CLB and $c$ is the position of the CLB in the delay line; 28 CLBs are used in total in this work. Each state has a Seq value, which serves as the reference for aligning the states. For the 5 ps time resolution, when two states have the same Seq according to (1), we keep them as independent bins to preserve the high resolution and order them by the position of the latest 1 (High) to 0 (Low), or 0 (Low) to 1 (High), transition within the state.
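Equation (1) is not reproduced in this text, so the following is only a sketch of the sequencing step under a hypothetical stand-in: `seq_value` counts the 1s in a captured state (a bubble-tolerant ordering key), and ties are broken by the position of the last transition within the state, following the tie-break rule described above. With `merge_ties=True`, states with equal Seq are merged into one bin, as is done at the coarser resolutions. All function names are illustrative.

```python
from itertools import groupby

def seq_value(state):
    """HYPOTHETICAL stand-in for Eq. (1): number of taps already reached by the edge."""
    return sum(state)

def last_transition(state):
    """Index of the last position where adjacent bits differ (-1 if the state is uniform)."""
    idx = -1
    for i in range(1, len(state)):
        if state[i] != state[i - 1]:
            idx = i
    return idx

def sequence_states(states, merge_ties=False):
    """Order the unique captured states into bins; optionally merge equal-Seq states."""
    ordered = sorted(set(states), key=lambda s: (seq_value(s), last_transition(s)))
    if not merge_ties:
        return [[s] for s in ordered]  # one bin per real state (5 ps case)
    return [list(g) for _, g in groupby(ordered, key=seq_value)]

captured = [(1, 0, 0, 0), (1, 1, 0, 0), (1, 0, 1, 0), (1, 1, 1, 0), (1, 1, 0, 0)]
print(sequence_states(captured))
print(sequence_states(captured, merge_ties=True))
```

Here the bubbled state `(1, 0, 1, 0)` shares a Seq value with `(1, 1, 0, 0)`; it gets its own bin in the first call and is merged with it in the second.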
At the other time resolutions (10.04 ps, 21.65 ps, etc.), adjacent states with the same Seq are merged to improve the time interval measurement. These sequenced states were then loaded into the Encoder to generate timestamps, as shown in Fig. 4(a). In step 2, the TDL was connected back to the updated Encoder for histogram generation and data output. Code density tests are performed with the real states to calculate the bin widths. Although these tests reveal empty histogram bins, they do not reveal missing codes, and both contribute to non-linearity. A missing code is a CLB output state that is not included in the Encoder and therefore never appears in the histogram: when the TDC input has no matching real state in the Encoder, the time input is not recorded, which may lead to a blank histogram. The code density test is performed by applying uniformly random pulses to the TDC as Stop signals. The Encoder inside the TDC contains the CLB outputs, which represent the states sampled at the Stop signals. It encodes these states into unique numbers and writes them to the histograms. A state that is not included in the Encoder cannot be recorded in the histogram, i.e., that state is missing. The time interval measurement identifies missing codes by searching for blank histograms, in which all bins are zero. Furthermore, the time interval measurements verify the sequential order of the real states and the functionality of the TDC.
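The Encoder behaviour described above can be sketched as a lookup table from captured state to bin index. This is a software model under assumptions: the dictionary layout and the function names are hypothetical, and the hardware Encoder is not described at this level in the text. Events whose state has no entry are counted as unrecorded (missing codes), and a blank histogram is one in which every bin is zero.

```python
def build_encoder(sequenced_bins):
    """Map every state in each sequenced bin to that bin's index (hypothetical layout)."""
    encoder = {}
    for bin_index, states in enumerate(sequenced_bins):
        for state in states:
            encoder[state] = bin_index
    return encoder

def fill_histogram(encoder, samples, n_bins):
    """Histogram the captured states; states missing from the encoder are lost."""
    hist = [0] * n_bins
    unrecorded = 0
    for state in samples:
        b = encoder.get(state)
        if b is None:
            unrecorded += 1  # missing code: no matching real state in the Encoder
        else:
            hist[b] += 1
    return hist, unrecorded

def is_blank(hist):
    """A blank histogram (all bins zero) flags a time step made only of missing codes."""
    return all(c == 0 for c in hist)

enc = build_encoder([[(1, 0)], [(1, 1)]])
print(fill_histogram(enc, [(1, 0), (1, 1), (0, 1)], 2))
```

In a time interval sweep, a step whose events all fall on states absent from the table produces a blank histogram, which is how missing codes become visible.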

B. Half-sized TDL and histogram generator
A 600 MHz signal with a duty cycle of 50% is generated by the MMCM as the Start signal of the TDC. When a Stop event occurs, the output of the CLBs, which is the sampled Start signal, is taken as the fine code. The Low Scale and High Scale are interleaved coarse counters that extend the measurement range beyond the 1.667 ns range of the fine code. A 90-degree phase offset between the coarse counters and the Start signal is used to avoid a timing race condition. Typically, one cycle of the Start signal equals the propagation delay of the TDL. However, because the Start signal has a 50% duty cycle, once half of the cycle has been detected, detecting the remaining half is redundant.
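As a sketch of how a coarse count and a fine bin could be combined into one timestamp, the code below adds whole Start periods (1/600 MHz, about 1666.67 ps) to the centre of the fine bin computed from the real, non-uniform bin widths. It assumes the coarse count has already been resolved; the selection between the Low Scale and High Scale counters with their 90-degree offset is not modelled, and the names are hypothetical.

```python
from itertools import accumulate

START_PERIOD_PS = 1e12 / 600e6  # 600 MHz Start clock, about 1666.67 ps fine range

def fine_offsets_ps(bin_widths_ps):
    """Centre of each fine bin, measured from the start of the fine range."""
    starts = [0.0] + list(accumulate(bin_widths_ps))[:-1]
    return [s + w / 2 for s, w in zip(starts, bin_widths_ps)]

def timestamp_ps(coarse_count, fine_bin, offsets_ps):
    """Hypothetical composition: whole Start periods plus the fine-bin centre."""
    return coarse_count * START_PERIOD_PS + offsets_ps[fine_bin]

offsets = fine_offsets_ps([4.0, 6.0, 10.0])
print(offsets)                        # centres of three unequal fine bins
print(timestamp_ps(3, 0, offsets))    # three Start periods plus the first bin centre
```

Using bin centres from measured widths rather than `index * LSB` keeps the timestamp consistent with the real, unequal bins.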
For this reason, we shortened the TDL to half a cycle of the Start signal. Fig. 5 is an idealized example demonstrating how a half-sized TDL samples every possible phase of the Start signal and generates the states. With the half-sized TDL, which is 833.5 ps long, Start signals such as Case 1, Case 2, and Case 3 in Fig. 5 can be uniquely recorded without the other half of the line, which reduces the resources used by the delay line by 50%.
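The half-sized TDL argument can be checked numerically in an idealized model: equal tap delays `tau`, integer time units, and a Start period of 16 units with a 50% duty cycle (all assumptions of this sketch; the real TDL has non-uniform delays). Tap `k` sees the Start signal delayed by `k * tau`, so a line covering half a period captures a window of 8 samples.

```python
def tap_states(period, n_taps, tau=1):
    """State captured by n_taps equally delayed taps for every Start phase (idealized)."""
    states = []
    for phase in range(0, period, tau):
        state = tuple(
            1 if ((phase - k * tau) % period) < period // 2 else 0
            for k in range(n_taps)
        )
        states.append(state)
    return states

def all_unique(states):
    """True when every Start phase maps to a distinct captured state."""
    return len(set(states)) == len(states)

print(all_unique(tap_states(16, 16)))  # full-period TDL
print(all_unique(tap_states(16, 8)))   # half-period TDL
print(all_unique(tap_states(16, 7)))   # shorter than half a period
```

In this model the half-period line (8 taps) still yields 16 distinct states, one per phase, because the polarity of the captured edge distinguishes the high half from the low half; one tap fewer and two phases collapse onto the same all-zero state.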
An on-chip histogram generator was applied, with histogram A and histogram B in an interleaved configuration for continuous data collection and transfer. As shown in Fig. 6, the duration of each histogram is set by the integration time of the TDC, which can be changed according to user requirements. Interleaving histograms A and B eliminates deadtime and ensures continuous time-to-digital conversion. Each histogram has 1200 bins with a depth of 16 bits. The Sync signal is an output trigger used to synchronize with external devices such as a laser driver in a LiDAR system.
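The interleaved (ping-pong) histogram scheme can be modelled in software as two banks: events always go to the active bank, and at the end of each integration period the banks swap so that the finished one can be read out. This is a behavioural sketch only; saturating at the 16-bit maximum instead of wrapping is an assumption, and in hardware the readout overlaps with filling the other bank rather than happening instantly.

```python
class PingPongHistogram:
    """Two interleaved histogram banks (A and B); a behavioural sketch."""

    def __init__(self, n_bins=1200, max_count=0xFFFF):
        self.banks = [[0] * n_bins, [0] * n_bins]
        self.active = 0              # bank currently receiving events
        self.max_count = max_count   # 16-bit depth; saturation is an assumption

    def record(self, bin_index):
        bank = self.banks[self.active]
        if bank[bin_index] < self.max_count:
            bank[bin_index] += 1

    def swap(self):
        """End of an integration period: switch banks and return the finished frame."""
        finished = self.active
        self.active ^= 1                                   # new events go to the other bank
        frame = list(self.banks[finished])
        self.banks[finished] = [0] * len(frame)            # cleared for its next turn
        return frame

h = PingPongHistogram(n_bins=4)
h.record(1)
h.record(1)
print(h.swap())   # frame from bank A
h.record(3)
print(h.swap())   # frame from bank B
```

Because `record` never waits for a readout, no Stop event arrives while neither bank is accepting data, which is the property that removes deadtime.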

C. Relative Standard Error-based bin configuration
The propagation delay of each state represents the minimum bin width, which is calculated from the code density test [29]. Bin widths can be combined to meet different time resolution requirements; this process is referred to as bin configuration. As shown in Fig. 7, the bin widths of the real states are prepared for a two-pass bin configuration, in which the bins are first combined into groups. In the second pass, the first bin in each group may be moved to the previous group for fine configuration. The calculation is performed according to (3). If SP is positive, the first bin of a group, such as s[n + 1], is attached to the preceding group, and s[n + 2] is selected as the starting bin of the following combination with the value of SP in (3).
Hence, the Second Pass (SP) increases the accuracy of the First Pass (FP). The metric for evaluating the configurations after the second pass is the Relative Standard Error (RSE). As shown in (4) and Fig. 7, $N$ is the total number of groups, while $W[i]$ and $\bar{W}$ are the individual and average widths of the groups, respectively. The average width of the groups is referred to as the time resolution of the TDC, which is also referred to as the Least Significant Bit (LSB).
Each RSE corresponds to a bin configuration, and the bin configuration with the smallest RSE is chosen to simulate the DNL and INL. Based on the simulation, the bin configuration can be written to the Encoder of the TDC for the real DNL and INL calculation. This provides a fast and reliable method to predict linearity before implementation.
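Equations (2)-(4) are not reproduced in this text, so the following sketch of the RSE-based bin configuration uses labelled stand-ins. The first pass greedily groups consecutive bins until a group reaches the target width `ref`. The second pass moves a group's first bin into the preceding group when that brings both widths closer to `ref`, a hypothetical substitute for the sign test on SP in (3). The RSE is assumed to be the standard error of the group widths, $s/\sqrt{N}$, divided by $\bar{W}$. The sweep over `ref` keeps the configuration with the smallest RSE.

```python
import math

def first_pass(widths, ref):
    """Hypothetical FP: group consecutive bins until each group reaches ref."""
    groups, current, acc = [], [], 0.0
    for w in widths:
        current.append(w)
        acc += w
        if acc >= ref:
            groups.append(current)
            current, acc = [], 0.0
    if current:  # fold a short tail into the last group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups

def second_pass(groups, ref):
    """Hypothetical SP: move a group's first bin back if both groups move closer to ref."""
    groups = [list(g) for g in groups]
    for n in range(1, len(groups)):
        prev, cur = groups[n - 1], groups[n]
        if len(cur) < 2:
            continue
        before = abs(sum(prev) - ref) + abs(sum(cur) - ref)
        after = abs(sum(prev) + cur[0] - ref) + abs(sum(cur) - cur[0] - ref)
        if after < before:
            prev.append(cur.pop(0))
    return groups

def rse(groups):
    """Assumed form of (4): standard error of the group widths W[i] over their mean."""
    w = [sum(g) for g in groups]
    n = len(w)
    if n < 2:
        return math.inf
    mean = sum(w) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in w) / (n - 1))
    return (std / math.sqrt(n)) / mean

def best_configuration(widths, refs):
    """Sweep the target width ref and keep the configuration with the smallest RSE."""
    best = None
    for ref in refs:
        groups = second_pass(first_pass(widths, ref), ref)
        score = rse(groups)
        if best is None or score < best[0]:
            best = (score, ref, groups)
    return best
```

For example, `best_configuration(measured_widths, [r / 10 for r in range(50, 1000)])` would sweep `ref` from 5 ps to 100 ps in 0.1 ps steps, where `measured_widths` (hypothetical) holds the per-state widths from the code density test.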

III. EXPERIMENTS AND RESULTS
This section presents the efficiency of the proposed approach, including the experiments and results of the RSE-based bin configuration, the code density test with both fine and coarse codes, and the long-term time interval measurement. A SPAD sensor is used as the event source to provide random signals for the code density test [35].

A. Bin configuration
RSE is used as the metric for characterizing the TDC in order to determine the bin configuration that yields the highest linearity. The bin widths of the states used for the RSE simulation are obtained from the earlier code density test. Following the RSE-based bin configuration described in Section II-C, the bin configuration starts from a value of ref, representing the desired bin width, which is then incrementally increased. The bin configuration with the smallest RSE represents the best linearity for that time resolution. The RSE is given by (4) as the ratio between the standard error and the average bin width (i.e., the time resolution). As the LSB increases from 5 ps to 100 ps (i.e., as the time resolution becomes coarser), both the standard error and the LSB increase. However, the lowest standard error does not grow as fast as the LSB; hence, the lowest RSE improves as the LSB increases from 5 ps to 100 ps.
To verify the accuracy of the RSE simulation, six representative time resolutions, 5.00 ps, 10.04 ps, 21.65 ps, 43.87 ps, 64.11 ps, and 87.73 ps, were selected from Fig. 8. The experimental results show that, for the selected time resolutions, the RSEs from the simulation and from the FPGA implementation agree within a relative variation of 15%. Based on the bin configurations from the RSE simulation, code density tests and time interval tests were performed at the six representative time resolutions.

B. Code density test
The code density test is one of the primary methods for measuring TDC linearity. It uses a random signal source to obtain a uniformly distributed histogram from which the DNL and INL are calculated. In this experiment, a SPAD is exposed to light of constant intensity to generate random pulses [36]. Because the Stop signals from the SPAD are uncorrelated with the Start signal from the oscillator, the histogram has a uniform distribution. The random SPAD pulses allow a histogram frame rate of over 1000 frames per second with more than 10000 counts per frame, which makes the DNL and INL measurement efficient.
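The principle of the code density test can be sketched with synthetic data: uniformly distributed Stop times are dropped onto bins of known, unequal widths, and each width is recovered as $w_i = T \cdot C_i / \sum_k C_k$, where $T$ is the covered range. The widths, hit count, and seed below are made-up values for illustration, not measurements from this work.

```python
import bisect
import random
from itertools import accumulate

def simulate_hits(widths_ps, n_hits, seed=0):
    """Drop uniformly distributed Stop events onto bins with the given (synthetic) widths."""
    rng = random.Random(seed)
    edges = list(accumulate(widths_ps))  # upper edge of each bin
    counts = [0] * len(widths_ps)
    for _ in range(n_hits):
        t = rng.uniform(0.0, edges[-1])
        idx = min(bisect.bisect_right(edges, t), len(counts) - 1)
        counts[idx] += 1
    return counts

def widths_from_counts(counts, range_ps):
    """Code-density estimate: each bin's width is proportional to its share of the hits."""
    total = sum(counts)
    return [range_ps * c / total for c in counts]

counts = simulate_hits([5.0, 10.0, 15.0, 20.0], 200000, seed=1)
print(widths_from_counts(counts, 50.0))  # close to [5, 10, 15, 20]
```

A bin of zero width would receive no hits at all, which is exactly how an empty histogram bin shows up in this test.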
The $DNL_i$ and $INL_i$ are calculated for each group of the bin configuration using (5), where $C_i$ is the count of the individual group, which represents the relative group width, and $C_{Avg} = \sum_{i=1}^{N} C_i / N$, where $N$ is the number of groups.
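A minimal sketch of the DNL and INL calculation, assuming (5) takes the standard code-density form $DNL_i = C_i / C_{Avg} - 1$ and $INL_i = \sum_{k=1}^{i} DNL_k$ (the equation itself is not reproduced in this text). Under this form, a DNL of exactly -1 means an empty group, which is why a minimum DNL above -1 implies no empty bins.

```python
def dnl_inl(counts):
    """Standard code-density DNL/INL (in LSB) from per-group counts C_i."""
    c_avg = sum(counts) / len(counts)            # C_Avg = sum(C_i) / N
    dnl = [c / c_avg - 1.0 for c in counts]      # DNL_i = C_i / C_Avg - 1
    inl, acc = [], 0.0
    for d in dnl:                                # INL_i = running sum of DNL
        acc += d
        inl.append(acc)
    return dnl, inl

dnl, inl = dnl_inl([90, 110, 100, 100])
print([round(d, 3) for d in dnl])   # [-0.1, 0.1, 0.0, 0.0]
print([round(v, 3) for v in inl])   # [-0.1, 0.0, 0.0, 0.0]
```

The INL returns to zero at the last group by construction, because the DNL values always sum to zero.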
After simulating the DNL and INL for the bin configurations, the six selected time resolutions were implemented on the FPGA. The simulation and implementation results are compared in Fig. 9. The implementation results agree closely with the simulated DNL and INL, both of which improve as the LSB increases. Fig. 9 also shows that the relation between the time resolution (LSB) and the linearity follows a trend similar to that between the time resolution and the RSE.
Based on the fine code results shown in Fig. 9, the initial measurement range of 1.667 ns was extended to 5 ns for the 5.00 ps time resolution and to 8 ns for the remaining time resolutions, given that the histogram records a maximum of 1200 bins (1200 bins of 5.00 ps span only 6 ns). As Fig. 10 shows, the long-range (5 ns and 8 ns) DNL and INL maintain a linearity similar to that of the short range (1.667 ns). No empty histogram bins were observed at any of the selected time resolutions (5.00 ps, 10.04 ps, 21.65 ps, 43.87 ps, 64.11 ps, and 87.73 ps). In addition, the traditional thermometer method was applied for comparison with the proposed method, as shown in Fig. 10(a).

C. Time interval measurement
Time interval measurement is an essential method to determine the functionality of a TDC. In this measurement, small time-steps are generated between the Start and the Stop signals, which sweep the entire measurement range of the TDC. In this implementation, both TDC Start and Stop signals were generated inside the FPGA using the MMCM block. The interpolated fine phase shift (IFPS) of the MMCM was used to generate the small time-steps between the Start and the Stop signals with a minimum step size of 14.8 ps.
Of all the histograms collected in the measurement, more than 90% have a Full Width at Half Maximum (FWHM) of less than two bins, while the remaining histograms have fewer than four bins populated. For each step, the histogram bin with the highest count was taken as the TDC output. Time resolutions of 5.00 ps, 10.04 ps, 21.65 ps, 43.87 ps, 64.11 ps, and 87.73 ps were measured with a step size of 14.8 ps. Based on the DNL results in Fig. 9 and according to (5), the smallest bin can be calculated for each time resolution. For time resolutions of 21.65 ps, 43.87 ps, 64.11 ps, and 87.73 ps, the smallest bin is larger than 18.25 ps; hence, the 14.8 ps phase-shift step covers all bins, including the smallest one. However, for time resolutions of 5.00 ps and 10.04 ps, the smallest bins are 0.03 ps and 5.60 ps, respectively, which are smaller than the 14.8 ps phase-shift step; hence, it is not possible to cover all the bins with the current technology. As the primary purpose of the time interval measurement is to confirm the linear trend between the TDC input and the TDC output, 14.8 ps phase-shift steps are sufficient. A linear regression model was applied to determine the linearity of the measurement results. The residual at a given TDC input is the difference between the experimental TDC output and the output predicted by the fitted line, i.e., $TDC_{output} - Predicted_{output}$. Each histogram has over 10k counts, and 1000 histogram frames were saved to calculate the average TDC output, giving approximately 10M total counts per data point. The standard deviation of each TDC output, calculated from the 1000 histogram frames, improves as the LSB increases from 5.00 ps to 87.74 ps.
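The analysis steps above can be sketched as follows. The TDC output for a step is taken as the histogram's peak bin, an ordinary least-squares line is fitted to output versus input, and residuals are $TDC_{output} - Predicted_{output}$. The smallest-bin helper assumes the standard DNL definition, giving a width of $LSB \cdot (1 + DNL_{min})$. The 14.8 ps step is from the text; the function names and the sample data are illustrative.

```python
def tdc_output(hist):
    """TDC output for one step: the bin with the highest count (first one on ties)."""
    return max(range(len(hist)), key=hist.__getitem__)

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(x, y):
    """Residual = measured TDC output minus the output predicted by the fitted line."""
    slope, intercept = linear_fit(x, y)
    return [b - (slope * a + intercept) for a, b in zip(x, y)]

def smallest_bin_ps(lsb_ps, dnl_min):
    """Width of the narrowest bin, assuming DNL_i = w_i / LSB - 1."""
    return lsb_ps * (1.0 + dnl_min)

def step_covers_bins(smallest_ps, step_ps=14.8):
    """A phase-shift sweep can land in every bin only if no bin is narrower than the step."""
    return smallest_ps >= step_ps

steps_ps = [k * 14.8 for k in range(6)]
outputs = [tdc_output(h) for h in ([9, 1, 0], [2, 8, 1], [0, 3, 9])]
print(outputs)
```

In practice `outputs` would be one peak bin per phase step, paired with `steps_ps` as the input of `residuals`.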
No significant differences in standard deviation occur between the traditional thermometer method and the proposed states-based method, which suggests that the standard deviation is mainly affected by the jitter of the Start and Stop signals rather than by the TDC itself.
From Fig. 11(c), the peak-to-peak range of the residuals at the 5.00 ps time resolution is [-5.5, 6.92 … As Fig. 11(c), Fig. 11(d), and Fig. 11(e) show, the distribution of the residuals follows a trend similar to that of the DNL and INL, improving as the LSB increases from 5.00 ps to 87.74 ps. The testing results of the traditional thermometer method at 5.63 ps are shown in Fig. 11(a), Fig. 11(b), and Fig. 11(c), with decreased … Fig. 11(c), Fig. 11(d), and Fig. 11(e) are due to the repetition of the fine code over the entire measurement range. Since the measurement range is 5 ns or 8 ns and the fine measurement range is 1.667 ns, the fine code is repeated 3 or 5 times. As described in Section II-A, no blank histograms were observed in Fig. 11(a); hence, no missing code was found in this experiment.

IV. DISCUSSION
The proposed state-based TDC is compared with a few other approaches in TABLE II [2], [8], [11], [17], [29], [37]. We have demonstrated our results at six separate time resolutions between 5 and 100 ps. The raw DNL measurements show a minimum DNL above -1, which, according to (5), means that the smallest bin is non-zero; hence, no empty bins are recorded. Additionally, the maximum raw DNL shows a significant improvement relative to other TDCs with a single-chain or sub-chain delay structure. The Vernier delay line structure implemented in [37] may result in a better raw DNL than single-chain or sub-chain TDCs; however, the Vernier structure is more complex and has slower frame rates. Additionally, the proposed RSE-based bin configuration uses a simpler RSE equation than the MB approach described in [2]. Furthermore, our approach achieves a better raw DNL at higher resolutions such as 51.28 ps, owing to the proposed state-based concept rather than the traditional thermometer concept implemented in [2]. We have also achieved close agreement between the simulated and experimental DNL, INL, and RSE, which provides a linearity reference before the time-consuming compilation and implementation of the design. Our implementation has a longer on-chip full measurement range of 8 ns (5 ns for the 5.00 ps time resolution) than the previous implementations in TABLE II. The on-chip interleaved histograms in our implementation enable high-frame-rate output without deadtime for real-time applications. Finally, the entire TDC system is implemented on a generic low-cost FPGA development board, enabling potential use in cost-constrained applications, including robotics and the Internet of Things (IoT).
To build a fully embedded states-based TDC system that does not require a PC, future developments may include calculating SP, FP, and the linearity analysis of the RSE-based bin configuration inside the same FPGA.

V. CONCLUSION
We have developed a states-based TDC with improved raw linearity and flexible time resolution. The TDL consumes half of the traditional TDL resources whilst maintaining a continuous conversion without deadtime using interleaved histograms. The proposed RSE-based bin configuration approach is able to predict the combination of the states and inform the implementation of the TDC. We selected six different time resolutions for the verification of this approach. The RSE from the implementation compared to the simulation shows less than 15% variation. The six different time resolutions were evaluated individually using code density measurements, while the implementation results were consistent with the simulated DNL and INL. This RSE-based approach provides a fast and reliable method to predict the linearity of a TDC prior to its implementation. Moreover, an extended measurement range of 8 ns (5 ns for 5.00 ps time resolution) was evaluated, with the experimental results reproducing the linearity of the shorter range of the fine code. Time interval measurements were taken for each selected time resolution to verify the functionality of the TDC. Using a linear regression model, the residuals of the TDC output for each time resolution were found to be in agreement with measured DNL results.
In summary, the main contributions of the proposed TDC are as follows: 1) The concept of the states-based TDC is proposed. Compared with the traditional thermometer-code-based TDC, this method encodes the TDC with real states from the TDL, which contributes to higher raw linearity and eliminates empty histogram bins. Currently, only limited data have been collected, which may not cover all potential states within the TDL. Rarely occurring states may be missed, affecting the bin configuration and reducing linearity. Moreover, the states may be influenced by temperature and by variations in the Start signal of the TDC, which is also the system clock; more tests are needed [38]. In conclusion, the FPGA-based TDC provides a high level of linearity and the flexibility to rapidly develop instrumentation for a large variety of time-of-flight applications, including LiDAR, 3D imaging, and healthcare monitoring.