CMOS SPAD Line Sensor With Fine-Tunable Parallel Connected Time-to-Digital Converters for Raman Spectroscopy

A 256-channel single-photon avalanche diode (SPAD) line sensor was designed for time-resolved Raman spectroscopy in 110-nm CMOS technology. The line sensor consists of an $8\times256$ SPAD array and 256 parallel connected time-to-digital converters (TDCs). The adjustable temporal resolution and dynamic range of TDCs are 25.6–65 ps and 3.2–8.2 ns, respectively. The median timing skew along 256 channels is 43.7 ps, and TDC bin boundaries can be fine-tuned at the ps-level to enable precise timing skew compensation. The sensor is capable of real-time dark count measurement (two dark measurements for each excitation pulse) that gives accurate data for dark count compensation without any increment in measurement time. The maximum excitation pulse rate with real-time dark count measurement is 680 kHz. Raman spectra of six different samples were measured to prove the performance of the sensor in time-resolved Raman spectroscopy.


I. INTRODUCTION
CMOS single-photon avalanche diode (SPAD) sensors are used in a wide variety of application fields, including 3-D time-of-flight imaging [1], [2], fluorescence lifetime imaging [3], [4], super-resolution localization microscopy [5], near-infrared optical tomography [6], and time-resolved Raman spectroscopy [7], [8], [9], [10]. The first CMOS SPAD line sensors developed for time-resolved Raman spectroscopy implemented fluorescence suppression by means of pulsed excitation and time-gated measurement [7], [11]. Instead of simple time gating, more recent line sensors are designed for time-correlated single photon counting (TCSPC) and have a temporal resolution in the range of 20-52 ps [8], [12], [13]. For both time-gated and TCSPC-based measurements, the most important feature of CMOS SPAD line sensors in Raman spectroscopy is that the majority of the hampering fluorescence emission and other background radiation, e.g., ambient light and thermal emission, can be separated from Raman scattering due to their different temporal appearance [14]. This is not possible in a conventional Raman spectrometer that uses a CCD sensor and a continuous wave excitation source. The use of CMOS SPAD line sensors that are capable of TCSPC has some additional advantages when compared to simple time-gated measurement. With TCSPC, instead of rejecting most of the fluorescence photons by means of time gating, all photons can be detected and both a Raman spectrum and a fluorescence lifetime can be calculated from the same TCSPC data in post-processing. This gives additional information about samples that are suitable for both Raman and fluorescence lifetime measurement. In fact, there are several application areas where both time-resolved Raman spectroscopy and fluorescence lifetime imaging have already been used, for example, extracellular vesicle analysis [15], [16], caries detection [17], [18], and basal cell carcinoma detection [19], [20].
Similar to time-of-flight radar, a time-resolved Raman spectrometer is also inherently capable of depth analysis as demonstrated in [21], [22], and [23]. A clear advantage of time-gated sensors is lower complexity when compared to sensors capable of TCSPC. A combination of time gating and an adjustable delay generator can also be a good alternative to a time-to-digital converter (TDC) if a very high background radiation level is expected. A thorough description of time-gated Raman spectroscopy can be found in recent reviews [24], [25], [26]. The usage of a time-resolved CMOS SPAD line sensor in Raman spectroscopy not only gives benefits but also creates some challenges. In time-resolved measurement, any temporal variation between sensor channels causes distortion to the measured Raman spectrum. This has been commonly observed in previous studies and cannot be completely avoided because of the mismatch present in all CMOS processes [10], [27], [28]. However, if the timing skew of a sensor is characterized, most of the distortion caused by it can be eliminated in post-processing [10], [28], [29]. The efficiency of timing skew compensation is somewhat dependent on the temporal resolution of TCSPC measurement, and therefore it is beneficial to have a temporal resolution of tens of picoseconds, or better, even if the overall time gate width is hundreds of picoseconds or more.
Typically, timing skew is the dominant source of spectral distortion for highly fluorescent samples, but for samples without significant fluorescence, dark counts of a sensor limit the spectral quality. Of course, dark count subtraction can be done with an additional dark count measurement, but depending on the dark count rate (DCR) and Raman signal level, a long acquisition time may be required to reduce the shot noise of dark counts to an acceptable level. If the temperature of a sensor is not stabilized, accurate dark count compensation is also challenging due to the temperature dependence of the DCR, which can be ∼12%/°C, for example [30]. Dark counts are not solely a problem of CMOS SPAD sensors, as dark counts in CMOS SPAD sensors are analogous to dark currents in CCD sensors. In traditional Raman spectrometers, CCD sensors are routinely cooled to low temperatures to reduce the effect of dark current.
In this article, we present a 256-channel CMOS SPAD line sensor that is specifically targeted for high performance in time-resolved Raman spectroscopy by reducing the impact of the two above-mentioned factors, timing skew and dark counts. Parallel connected flash TDCs are used to minimize the timing skew by means of mismatch averaging, and TDC bin boundaries are adjustable at the ps-level with an external reference clock signal to enable efficient timing skew compensation. Triple measurement mode is designed to make two dark count measurements for every excitation pulse, effectively providing real-time dark count measurement simultaneously with the actual measurement. This reduces the overall measurement time to a third compared to a separate dark count measurement and eliminates inaccuracy caused by temperature changes between the measurements. For system-level optimization, DCR reduction by means of sensor cooling is also supported with an integrated temperature monitoring circuit. The dynamic range of TDCs is wide enough for fluorescence lifetime measurements for typical fluorescence lifetimes of organic samples (up to ∼10 ns) so that both the Raman spectrum and fluorescence lifetime can be measured simultaneously.
The architecture of the sensor is described in Section II. Characterization results for the SPAD array, TDCs, instrument response function (IRF), and temperature monitoring block are shown in Section III. Application measurement results from six Raman measurements and one fluorescence lifetime measurement are shown in Section IV. Section V gives a comparison to previous CMOS SPAD sensors used in time-resolved Raman spectroscopy and concludes the article.

II. SENSOR ARCHITECTURE
A block diagram of the line sensor is shown in Fig. 1. The sensor consists of 256 identical channels each of which includes eight SPADs, a SPAD controlling block, a TDC, and registers for measurement results. Each block of 64 channels shares a 7-bit databus for data read-out. Two delay-locked loops (DLLs) are used to stabilize the delay elements of TDCs and other blocks against the process, temperature, and supply voltage variation. A triple measurement delay generator block generates three internal trigger pulses from a single external synchronizer pulse, so that three separate measurements can be made for each laser pulse. The control data register is a serial-in parallel-out shift register that stores user-definable operation parameters. The temperature sensor block gives an analog output voltage for sensor temperature monitoring.
The prototype sensor chips were fabricated in LFoundry 110-nm CMOS technology. The size of the chip is 2.13 × 8.44 mm, and a photograph of the sensor chip is shown in Fig. 2.

A. Delay-Locked Loops
Two DLL blocks are used to generate control voltages for three different delay element types. The block diagram of the DLLs and a schematic of the delay elements are shown in Fig. 3(a) and (b). If delay elements that have a delay value below 100 ps are stabilized in a single DLL, either the reference clock frequency value or the length of the delay line can become impractical. As an example, a 60-ps delay requires ∼83 delay elements if a reference clock frequency of 200 MHz is used. Therefore, to keep the reference clock frequency in the range of 50-200 MHz and the lengths of the delay lines in DLLs below 25 elements, two nested DLLs are used in both DLL blocks.
In the coarse DLL of the DLL1 block, the delay of seven slow delay elements is locked to the period of external reference clock signal ref_clk_1. Then, in the fine DLL of the DLL1 block, the delay of 21 (=23 − 2) fast delay elements that have blue sizing in Fig. 3(b) is locked to the delay of one slow delay element. As a result, the delay of a single fast delay element in the DLL1 block is the period of a reference clock signal ref_clk_1 divided by 147 (=7 · 21).
The basic structure of the DLL1 and DLL2 blocks is similar. The second external reference clock signal ref_clk_2 is used for DLL2, and now the delay of 14 (=16 − 2) fast elements that have red sizing in Fig. 3(b) is locked to the delay of one slow delay element [the only difference compared to Fig. 3(a) is that the fast_23 signal is replaced by the output of the 16th delay element, i.e., fast_16 signal, and en_fast is taken from the output of the seventh delay element]. Hence, the delay of a single fast delay element in the DLL2 block is the period of a reference clock signal ref_clk_2 divided by 98 (=7 · 14).
If reference clock signals ref_clk_1 and ref_clk_2 have an equal frequency, the delay value of a fast delay element in DLL2 is 1.5 (=21/14) times the delay value of a fast delay element in DLL1. In the TDC topology that is used in this design, the temporal resolution (one LSB) is determined by the delay difference of these two fast delay elements. Because of that, from the TDC point of view, the delay values of the fast delay elements are two LSB and three LSB, as noted in Fig. 3.  In both DLL blocks, the phase detectors are conventional D flip-flop-based phase detectors, and the charge pumps are current-splitting charge pumps proposed in [28]. Loop filter capacitors C 1 and C 2 are integrated MOS capacitors. By default, any DLL is vulnerable to false-locking. In this design, false-locking is prevented in two ways. To avoid harmonic locking, all control voltages are driven up to Vdd at the beginning of the operation. To prevent a stuck false lock, the phase detector is kept in reset until the enable signal (en_slow, en_fast) from the delay line goes high (otherwise rising edges that are generated by the same clock pulse could be compared). A detailed description of the false-locking mechanisms in DLLs can be found in [32].
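The nominal delay arithmetic of the two DLL blocks described above can be checked with a short Python sketch. The divider values (7, 21, and 14) and the definition of one LSB as the delay difference of the two fast elements come from the text; the function names and the example frequencies are illustrative assumptions, and the sketch assumes ideal locking with no mismatch.

```python
# Nominal delay arithmetic for the nested DLLs (ideal lock, no mismatch).
SLOW_ELEMENTS = 7         # slow elements spanning one reference clock period
FAST_PER_SLOW_DLL1 = 21   # fast elements locked to one slow element in DLL1
FAST_PER_SLOW_DLL2 = 14   # fast elements locked to one slow element in DLL2


def fast_delay_ps(ref_clk_hz, fast_per_slow):
    """Delay of one fast delay element in picoseconds."""
    period_ps = 1e12 / ref_clk_hz
    return period_ps / (SLOW_ELEMENTS * fast_per_slow)


def lsb_ps(ref_clk_1_hz, ref_clk_2_hz):
    """One LSB: delay difference between the DLL2 and DLL1 fast elements."""
    d1 = fast_delay_ps(ref_clk_1_hz, FAST_PER_SLOW_DLL1)
    d2 = fast_delay_ps(ref_clk_2_hz, FAST_PER_SLOW_DLL2)
    return d2 - d1


def single_dll_elements(ref_clk_hz, element_delay_ps):
    """Elements a single, non-nested DLL would need to span one clock period."""
    return (1e12 / ref_clk_hz) / element_delay_ps


if __name__ == "__main__":
    f = 125e6  # example: both reference clocks at 125 MHz
    print(f"DLL1 fast element: {fast_delay_ps(f, FAST_PER_SLOW_DLL1):.2f} ps")
    print(f"DLL2 fast element: {fast_delay_ps(f, FAST_PER_SLOW_DLL2):.2f} ps")
    print(f"LSB:               {lsb_ps(f, f):.2f} ps")
    print(f"Single DLL, 60 ps at 200 MHz: {single_dll_elements(200e6, 60):.1f} elements")
```

With equal reference clocks, the LSB reduces to the clock period divided by 294, which is why the resolution can be set by the reference clock frequency alone.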

B. Triple Measurement Mode and SPAD Control Logic
In normal measurement mode, one measurement is done for each incoming pulse of triggering signal trigger_in. These triggering pulses are typically generated from optical pulses to synchronize measurements with a pulsed laser. In triple measurement mode, a triple measurement delay generator block shown in Fig. 4 generates three internal trigger pulses from one incoming trigger pulse. The pulsewidth and the interval between the three trigger pulses are one period of a reference clock ref_clk_1 and five periods of a reference clock ref_clk_1, respectively. These values are set by means of slow (42 LSB) delay elements (seven elements for one period, 35 elements for five periods). In Fig. 4, gray blocks that include delay elements, an inverter, and an AND-gate are used to set the pulsewidth. Trigger pulses are then spread to 256 channels with the buffer tree where buffer outputs of each stage are parallel connected to minimize the timing skew along the sensor.
All channels include a SPAD controlling block shown in Fig. 5. Quenching and loading signals that are common to all SPADs, and starting signal to the TDC are generated in the circuit shown in the left part of Fig. 5. Loading pulsewidth can be set to 42 or 84 LSB with a load_pulse_width signal. If a longer loading pulse is chosen, TDC_start is also delayed so that the interval from the end of the loading pulse to the start of the TDC operation stays constant. Separate quenching and loading signals are generated for every SPAD by AND-gates shown in the right part of Fig. 5. Each SPAD can be disabled by setting SPAD_N_HPE bit in a control data register (HPE, hot pixel elimination). If SPAD_N_HPE is high, SPAD_N_HPE_inv is low and SPAD-specific loading and quenching signals are permanently low (SPAD stays quenched). An operating voltage of 1.4 V is used throughout the IC except for loading and quenching transistors (and preceding

C. SPADs and SPAD Front-End
The 8 × 256 SPAD array is built from square 27.2 × 27.2-μm P+/N-well SPADs which are available as a library component in LFoundry 110-nm technology. The active area of a single SPAD is 415 μm², and the pitch in the SPAD array is 32.9 μm resulting in a fill-factor of 37.9%. The nominal breakdown voltage and the maximum photon detection efficiency (PDE) for these SPADs are 18.3 V and 30.76% (at 455 nm with 3-V excess bias), respectively.
The schematic of a SPAD front-end circuit is shown in Fig. 6. The anode of the SPAD is connected to an inverter that is made of 3.3-V transistors but is operated at 1.4 V to implement voltage level shifting from 3.3 to 1.4 V. The following NOR-gate acts as another inverter if the SPAD is not disabled, but outputs a constant low for a disabled SPAD. Signals from all eight SPADs of a channel are combined into a TDC with an 8-input pseudo-NMOS OR-gate. A single pseudo NMOS gate was chosen because propagation delay variations between its inputs are significantly lower than in CMOS OR-trees (schematic-level simulations indicated variations up to ∼15 ps for CMOS OR-trees). An obvious drawback of pseudo NMOS gates is their constant current consumption. In this case, the OR-gate is needed only during the measurement, and therefore the constant current consumption can be avoided by activating the PMOS transistor M1 only during the measurement with quench_inv signal.

D. Time-to-Digital Converters and Data Read-Out
In a conventional flash TDC that is based on a delay line and sampling DFFs, the temporal resolution (LSB) is determined by the delay of a single delay element and the dynamic range is n LSB, where n is the number of delay elements in the delay line. The TDC topology chosen for this sensor is a flash TDC but with two modifications that improve its performance for time-resolved Raman spectroscopy. A block diagram of the modified flash TDC is shown in Fig. 7.
First, for each of the 64 delay elements, two DFFs and two stop signals are used. When each delay element is sampled twice with an interval of 1 LSB (suitable stop signals are generated in the Stop signal generator block shown in Fig. 7), a temporal resolution of 1 LSB and a dynamic range of 128 LSB are achieved with 64 2-LSB delay elements. This can be seen in the timing diagram in Fig. 8(a), where 1 LSB stepping of stop signals generates sequential thermometer code results (boundary cases for the first three thermometer codes shown). In addition to improved temporal resolution, by adjusting the frequency of a second reference clock ref_clk_2 that sets the delay for the 3-LSB delay element, every other bin boundary can be fine-tuned. Fig. 8(b) shows an example in which a 3-LSB delay element is adjusted for a 3.5-LSB delay. Now the stop signal step that is needed to change the thermometer code is no longer constant at 1 LSB, but it alternates between 1.5 and 0.5 LSB. In other words, every other bin boundary is shifted by 0.5 LSB. The resulting functionality is illustrated at the bottom left in Fig. 7, where tunable boundaries are drawn with dashed lines and stable boundaries (only affected by ref_clk_1) are drawn with solid lines. As an example, time gating containing bins 1-5 can be considered. The width of bin pairs bin 1-bin 2 and bin 3-bin 4 is a constant 2 LSB, as shown in Fig. 7, but when ref_clk_2 is adjusted, the endpoint of time gating containing bins 1-5 is adjusted because the width of bin 5 changes (and bin 6 is not included in the chosen time gating). In time-resolved Raman spectroscopy, this can be used to implement picosecond-level adjustments to the endpoint of the time gate even if the temporal resolution of the TDC is some tens of picoseconds. This improves the precision of timing skew compensation, as part of the timing skew can be eliminated at the hardware level and smaller adjustments are needed in data post-processing.
This is illustrated in detail with measurement results in Section IV.
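The alternating bin widths can be reproduced with a small Python sketch working in LSB units. It assumes a simplified model in which the tap of delay element k is sampled once at k·d1 and once more (d2 − d1) later, where d1 and d2 are the DLL1 and DLL2 fast element delays; this model and the function names are illustrative assumptions that match the 1.5/0.5-LSB example above, not a description of the actual stop signal generator.

```python
def bin_boundaries(n_elements, d1, d2):
    """Bin boundaries for n_elements delay elements.

    d1: delay of the DLL1 fast element (nominally 2 LSB)
    d2: delay of the DLL2 fast element (nominally 3 LSB, tunable)
    Boundaries at k*d1 depend only on ref_clk_1 (stable); boundaries at
    k*d1 + (d2 - d1) move when ref_clk_2 is changed (tunable).
    """
    boundaries = []
    for k in range(n_elements):
        boundaries.append(k * d1)              # stable boundary
        boundaries.append(k * d1 + (d2 - d1))  # tunable boundary
    return boundaries


def bin_widths(boundaries):
    """Widths of the bins between consecutive boundaries."""
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    print(bin_widths(bin_boundaries(3, 2.0, 3.0)))  # nominal case
    print(bin_widths(bin_boundaries(3, 2.0, 3.5)))  # DLL2 element tuned to 3.5 LSB
```

In this model the sum of two adjacent bin widths always equals d1, so tuning ref_clk_2 redistributes time between neighboring bins without changing the overall range.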
As a second modification, TDC_start signals and delay lines (buffered outputs of delay elements phase_1, phase_2, . . .) of all 256 TDCs are parallel connected. With the parallel connection, most of the temporal variations in delay lines are averaged out and the timing skew along the sensor is reduced compared to 256 totally separate TDCs. The possibility of parallel connection was one of the main motives to choose this TDC topology. For the last three delay elements, the parallel connection was not made. Although this was caused by supply voltage wiring optimization in the layout, it enables an interesting comparison between unconnected and parallel connected structures.
Like any flash TDC, this TDC inherently generates a result as a thermometer code. The 128-bit thermometer code is then encoded to a 7-bit binary number and stored in a result register block shown on the right in Fig. 7. In triple measurement mode, three results are stored in registers 1-3 and in normal measurement, the only result is stored in register 3. The fourth 7-bit registers form four 64-channel long shift registers that lead to four 7-bit output buses, as illustrated in Fig. 1.
Multiplexers are used to select if data to the fourth registers are loaded from other registers of the same channel (to initially fill the shift register) or from the fourth register of the previous channel (to move data in the shift register toward the output bus). It takes 66 clock cycles to read out normal measurement results and 198 clock cycles to read out triple measurement results from all 256 channels through four 7-bit buses.

E. Control Data Register and Temperature Monitoring
A control data register is a serial-in parallel-out shift register for storing the operation parameters of the sensor. With a control data register, the user can set drive strength and slew rate of I/O output buffers, choose a measurement mode (triple measurement mode ON/OFF), set a loading pulsewidth (short/long), and enable or disable SPADs in a SPAD array.
A simple temperature monitoring circuit that consists of four parallel diode-connected 5 × 5-μm PNP transistors in series with a current-limiting 26-kΩ poly resistor is also included in the sensor. According to simulations, the voltage drop over diode-connected transistors has a fairly linear temperature dependence of about −1.5 mV/°C (−1.43 to −1.55 mV/°C, over all process corners and a temperature range of 10 °C-50 °C). As expected, the absolute accuracy of such a simple structure is very low, but as the most probable use for temperature monitoring is to stabilize the temperature of the sensor by means of a thermoelectric cooler, high absolute accuracy is not needed.

III. SENSOR CHARACTERIZATION
Four chips from the same multi-project wafer run were tested in sensor characterization measurements. The most thorough TDC characterization and IRF measurements were made with chip no. 1, while other measurements were conducted for all four chips to receive more comprehensive data. Throughout the measurements, all SPADs in SPAD arrays were enabled and the operating voltages of the sensor were 3.3 V (for loading and quenching the SPADs) and 1.4 V (for the rest of the chip). For data read-out, control data setting, and reference clock generation, an Opal Kelly XEM7310-A200 FPGA board was used. The clock frequency for data read-out was 150 MHz resulting in a maximum pulse rate of 2.0 MHz and 680 kHz for normal measurement mode and triple measurement mode, respectively. The power consumption of the sensor depends on the excitation pulse rate, hit probability, and measurement mode. With typical operation parameters (280-kHz excitation pulse rate, 10% hit probability, triple measurement mode on) the average power consumption of the sensor is ∼28 mW. If the excitation pulse rate is changed, the change in the power consumption is ∼6 mW/100 kHz.
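As a rough cross-check of the figures above, the following Python sketch computes the read-out time from the clock cycle counts given in Section II-D and applies the quoted power slope as a linear model. The linear extrapolation of power consumption and all function names are assumptions made for illustration; the text only gives one operating point and a slope.

```python
# Read-out cycle counts for all 256 channels (from Section II-D).
READOUT_CYCLES = {"normal": 66, "triple": 198}


def readout_time_us(mode, clk_hz=150e6):
    """Time needed to shift out all results, in microseconds."""
    return READOUT_CYCLES[mode] / clk_hz * 1e6


def power_mw(pulse_rate_khz, ref_rate_khz=280.0, ref_power_mw=28.0,
             slope_mw_per_100khz=6.0):
    """Linear power estimate around the quoted operating point (assumption)."""
    return ref_power_mw + slope_mw_per_100khz * (pulse_rate_khz - ref_rate_khz) / 100.0


if __name__ == "__main__":
    for mode in ("normal", "triple"):
        t = readout_time_us(mode)
        print(f"{mode}: read-out {t:.2f} us, read-out-only rate limit {1.0 / t:.2f} MHz")
    print(f"Estimated power at 380 kHz: {power_mw(380.0):.0f} mW")
```

The read-out-only limits come out somewhat above the quoted 2.0 MHz and 680 kHz, which is consistent with the measurement cycle containing more than the read-out itself.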

A. SPAD Array
DCRs of all channels in four chips (4 · 256 = 1024 channels) were measured at different temperatures and with different excess bias voltages. Cumulative channel DCRs for different SPAD cathode voltages (HV) at room temperature (25 °C) are shown in Fig. 9(a). Median DCR for a channel (eight SPADs) is ∼1-5 kHz for an HV range of 19.2-21.6 V (excess bias range of 0.9-3.3 V). At 21 V, the median DCR for a channel has an HV dependence of ∼71%/V. Cumulative channel DCRs for different temperatures are shown in Fig. 9(b). Median DCR for a channel is ∼1.6-29 kHz for a temperature range of 10-50 °C. In addition to the absolute DCR values, the temperature dependence of DCR also increases at higher temperatures. At 20 °C, the median DCR of the channel has a temperature dependence of ∼6.2%/°C but at 40 °C it is already ∼8.8%/°C. DCR values as a function of a sensor channel are shown in Fig. 9(c) (25 °C, HV = 21 V). Obviously, the patterns are different, but none of the chips shows significantly better or worse DCR performance than the others.
Relative PDE of the spectrometer (including the sensor and other optical components) is shown in Fig. 9(d) for HV values of 19.3-21.3 V. A white LED was used for illumination and a reference spectrum for it was measured with a commercial spectrometer. Strong PDE modulation shown in Fig. 9(d) has also been observed for P+/N-well SPADs manufactured in 110-nm CMOS technology in [30].
The temporal resolution of TDCs was tested by targeting optical pulses to two different temporal locations in the TDC range. A known interval between the locations was generated by changing the length of a coaxial cable used for the trigger_in signal that defines the TDC starting moment. Resolutions of four chips (average of all 256 TDCs) for a reference clock range of 50-150 MHz are shown in Fig. 10(a). Over a range of 70-130 MHz, differences between measured resolutions and nominal resolutions are ±2 ps at most, and the largest deviation from the nominal value was 4 ps, observed at 50 MHz for chip no. TDC bin boundary adjustment with the second reference clock signal was characterized by measuring ambient light with different ref_clk_2 frequencies while keeping the ref_clk_1 frequency at 125 MHz. Ideally, hits should distribute equally between odd and even bins when frequencies of ref_clk_1 and ref_clk_2 are the same. Results in Fig. 10(b) show that the ideal 50/50 distribution is achieved with ref_clk_2 frequencies of 124-134 MHz depending on the chip. When both reference clock frequencies are set to 125 MHz, two chips produce very good distributions (∼49/51). In the other two chips, odd and even bins are clearly unequal (∼40/60), presumably due to a mismatch in DLL blocks. This is not a problem, because bin sizes can easily be equalized just by changing the frequency of ref_clk_2.
Thorough TDC characterization was done for chip no. 1 with 125-MHz reference clocks. The precise characterization method described in [29] was used with the exception that attenuated laser pulses were detected directly instead of fluorescence signals from a reference sample (the whole sensor area was covered by a pulsed laser source). In short, this method uses an optical code density test with ambient light to measure relative bin sizes, and pulsed light to two temporal locations to convert relative bin sizes to absolute time scale. TDC bin boundaries are shown for bins 10-20 and for bins 115-127 in Fig. 11(a) and (b). The time scale starts from bin no. 1; bin 0 that covers ∼1 ns period from SPAD loading to TDC start is ignored. In a typical Raman measurement, the Raman signal is targeted at the beginning of the TDC range so that the fluorescence tail also fits into the range. Therefore, bin boundaries of bins 10-20 represent typical time gate endpoints in time-resolved Raman measurement. One of the most important performance parameters for a line sensor that is used in time-resolved Raman spectroscopy is timing skew between sensor channels. Timing skew, i.e., a temporal variation on bin boundaries between the channels, can be seen visually in Fig. 11(a) and (b), and numerically in Fig. 11(c), where ranges and standard deviations are given for every bin boundary. The shape of the bin boundary lines, especially at the beginning of the TDC range [see Fig. 11(b)], consists of a flat horizontal line (ideal shape) combined with random variation. Near the end of the TDC range in Fig. 11(a), a weak linear slope of about 20 ps over 256 channels can be observed, possibly caused by a slight temperature gradient on a PCB. A drop for the last bins of channels 245-256 in Fig. 11(a) is related to the asymmetry of power lines at the end of the array. 
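The two-step idea of the characterization method summarized above (relative bin sizes from a code density test, then scaling to absolute time with two pulse positions a known interval apart) can be sketched in Python as follows. This is a simplified illustration, not the procedure of [29]: the pulse positions are assumed to be given as fractional bin indices, and all names and the synthetic numbers are hypothetical.

```python
def relative_bin_widths(hits):
    """Code density test: under uniform ambient light, hits are proportional to bin width."""
    total = sum(hits)
    if total == 0:
        raise ValueError("no hits recorded")
    return [h / total for h in hits]


def cumulative_position(rel_widths, bin_pos):
    """Relative time of a position given as a fractional bin index."""
    full = int(bin_pos)
    frac = bin_pos - full
    pos = sum(rel_widths[:full])
    if frac > 0:
        pos += frac * rel_widths[full]
    return pos


def absolute_bin_widths_ps(hits, pos_a, pos_b, interval_ps):
    """Scale relative widths so that pos_a and pos_b are interval_ps apart."""
    rel = relative_bin_widths(hits)
    span = cumulative_position(rel, pos_b) - cumulative_position(rel, pos_a)
    scale = interval_ps / span
    return [w * scale for w in rel]


if __name__ == "__main__":
    ambient_hits = [100, 200, 100, 200]  # synthetic histogram of one channel
    print(absolute_bin_widths_ps(ambient_hits, 0.5, 1.5, 30.0))
```

Summing the resulting widths channel by channel gives the bin boundary positions whose spread between channels is the timing skew.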
Due to reasons related to power net routing in the layout, parallel connection of TDCs was not done for the last three delay elements (last six bins). The absence of a parallel connection has a visible effect on the timing skew in Fig. 11(a). The transition from the parallel connected part to the unconnected part is at ∼3500 ps, where the timing skew increases considerably. The transition point is also clearly seen in Fig. 11(c) (at bin no. 123). The fact that timing skew mostly consists of random variation (no gradients, the same shapes are not strongly repeating on every bin boundary) means that the parallel connection of TDCs is performing well, and timing skew is mostly caused by the mismatch in sampling DFFs.
A more conventional presentation for the linearity of TDCs is shown in Fig. 12. Differential and integral nonlinearity (DNL and INL) curves are shown in Fig. 12(a)–(c). Two clear abnormalities are visible in the DNL curve in Fig. 12(c), for bins no. 1 and no. 122. The latter is related to the transition from the parallel connected part to the unconnected part, and the former is most probably caused by smaller capacitive loading of the first delay elements compared to the subsequent delay elements. Except for these anomalies, the average DNL is within −0.08/+0.07 LSB. Finally, the bin width distribution for the whole sensor is shown in Fig. 12(d). Bin widths are approximately normally distributed with a standard deviation of 8.4 ps (0.29 LSB).
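For reference, DNL and INL can be derived from measured bin widths as in the following Python sketch. It uses the common textbook definitions (DNL as the relative deviation of each bin width from one LSB, INL as the running sum of DNL) and, as an assumption, takes the mean bin width as one LSB when no nominal value is supplied; the text does not state which convention was used for Fig. 12.

```python
def dnl_inl(bin_widths_ps, lsb_ps=None):
    """Return (dnl, inl) lists in LSB units for one TDC channel."""
    if lsb_ps is None:
        # Assumption: use the mean measured bin width as one LSB.
        lsb_ps = sum(bin_widths_ps) / len(bin_widths_ps)
    dnl = [w / lsb_ps - 1.0 for w in bin_widths_ps]
    inl = []
    acc = 0.0
    for d in dnl:
        acc += d          # INL is the cumulative sum of DNL
        inl.append(acc)
    return dnl, inl


if __name__ == "__main__":
    # Synthetic widths: alternating wide and narrow bins around a 30-ps LSB.
    dnl, inl = dnl_inl([45.0, 15.0, 45.0, 15.0])
    print(dnl)
    print(inl)
```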

B. Instrument Response Function and Temperature Monitoring
IRF was measured for chip no. 1 by detecting 15 million backscattered laser pulses from white paper with HV values in a range of 19.0-21.6 V. ND filters were used for pulse energy attenuation to reduce hit probability to ∼2.5%. The results are shown in Fig. 13. The intensity values for HV = 21 V, shown in Fig. 13(a), were calculated by dividing raw data (hits) with bin widths (ps) from TDC characterization data. IRF FWHM value was calculated for each channel from Gaussian fits that were made to the intensity curves, and the median FWHM at HV = 21 V is 181 ps. When the contribution of the excitation laser (pulsewidth 140-ps FWHM) is subtracted, the resulting IRF FWHM value for the sensor is 115 ps. In Fig. 13(a), for bins whose width is less than 2 ps, the intensity value was replaced with an average intensity of the previous and next bins to decrease noise caused by very small bin widths. This replacement was only made to the figure for clarity, not to the data that were used for IRF calculations. With lower HV values, the IRF FWHM value increases, and the detected pulse moves forward on the time axis. These alterations are shown in Fig. 13(b). In Fig. 13(a), the intensity varies due to the nonuniform distribution of illumination.
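The two calculations used above, normalizing hits by bin width and removing the laser contribution from the measured FWHM, can be illustrated with a short Python sketch. The subtraction is assumed here to be done in quadrature, which holds for Gaussian pulse shapes and reproduces the quoted 115 ps; the function names and sample numbers are illustrative.

```python
import math


def intensity_per_ps(hits, bin_widths_ps):
    """Convert raw hit counts to intensity by dividing by each bin's width."""
    return [h / w for h, w in zip(hits, bin_widths_ps)]


def sensor_irf_fwhm_ps(measured_fwhm_ps, laser_fwhm_ps):
    """Remove the laser pulse width in quadrature (Gaussian assumption)."""
    return math.sqrt(measured_fwhm_ps ** 2 - laser_fwhm_ps ** 2)


if __name__ == "__main__":
    print(intensity_per_ps([300, 600], [30.0, 20.0]))
    print(f"{sensor_irf_fwhm_ps(181.0, 140.0):.1f} ps")
```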
Temperature monitoring blocks of all four chips were tested in a temperature chamber over a temperature range of 10-50 °C. The minimum and maximum measured temperature coefficients were −1.5 mV/°C and −1.62 mV/°C, respectively. For all four chips, mean temperature coefficients for this temperature range are between −1.52 and −1.56 mV/°C. When temperature monitoring output is digitized with the internal A/D converter of Artix-7 FPGA that is already included in the system, temperature variations can be observed with the resolution of 0.16 °C (12-bit converter, 1 V reference voltage).
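The quoted temperature resolution follows from the converter step size and the temperature coefficient, as the Python sketch below shows. The default coefficient of 1.5 mV/°C is one value from the measured range; the function name and its parameters are illustrative.

```python
def temperature_resolution_c(v_ref=1.0, bits=12, coeff_mv_per_c=1.5):
    """Smallest observable temperature step for an ideal ADC.

    v_ref: ADC reference voltage in volts
    bits: ADC resolution
    coeff_mv_per_c: magnitude of the sensor's temperature coefficient
    """
    adc_lsb_mv = v_ref * 1000.0 / (2 ** bits)
    return adc_lsb_mv / abs(coeff_mv_per_c)


if __name__ == "__main__":
    print(f"{temperature_resolution_c():.2f} degC per ADC code")
```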

A. Measurement Setup
A simplified block diagram for the time-resolved Raman spectrometer used in Raman measurements is shown in Fig. 14. The excitation source, a 532-nm pulsed laser (Teem Photonics ANG-500P-CHS), has a pulse rate of 280 kHz and a pulsewidth of 140 ps (FWHM). The measured average excitation power at a sample is 150 mW. A minor share of the optical power (∼3%) is split to a light detector (Thorlabs DET02AFC) that generates a trigger signal to synchronize the line sensor with laser pulses. The actual spectrometer part is built around a holographic grating with 1800 lines/mm (Wasatch Photonics), and the wavelength range covered by the line sensor is 539.8-588.4 nm. This corresponds to a wavenumber range of 272-1803 cm⁻¹, giving a theoretical spectral resolution of 6.0 cm⁻¹ for a 256-channel line sensor. An Opal Kelly XEM7310 FPGA board is used to control the line sensor and to read out the data.
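The conversion from the covered wavelength range to Raman shift, and from there to the per-channel spectral resolution, can be sketched in Python as follows. The sketch assumes an excitation wavelength of exactly 532.0 nm, under which the quoted wavenumber range is reproduced to within about 1-2 cm⁻¹; the function names are illustrative.

```python
def raman_shift_cm1(excitation_nm, scattered_nm):
    """Raman shift in cm^-1 between excitation and scattered wavelengths."""
    return 1e7 / excitation_nm - 1e7 / scattered_nm


def per_channel_resolution_cm1(shift_min_cm1, shift_max_cm1, channels=256):
    """Average wavenumber span covered by one sensor channel."""
    return (shift_max_cm1 - shift_min_cm1) / channels


if __name__ == "__main__":
    lo = raman_shift_cm1(532.0, 539.8)
    hi = raman_shift_cm1(532.0, 588.4)
    print(f"Range: {lo:.0f} to {hi:.0f} cm^-1")
    print(f"Resolution: {per_channel_resolution_cm1(272, 1803):.1f} cm^-1 per channel")
```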

B. Raman Measurements
In previous work, the performance of CMOS SPAD line sensors in Raman spectroscopy has been demonstrated with various sample types including oils, minerals, water, paracetamol, and diamond [7], [8], [9], [10], [28], [33]. To achieve comparable results and to demonstrate how the different sample types set different requirements for a sensor, Raman measurements are here divided into three categories. The first sample category is highly fluorescent samples, for which the timing skew of the sensor is a significant source of spectral distortion (the higher the intensity at the time gate endpoint, the higher the distortion level caused by timing skew [27]). To demonstrate timing skew compensation capability based on adjustable TDC bin boundaries, highly fluorescent roasted sesame seed oil was measured (roasting increases the fluorescence level). The second sample category contains non-fluorescent weak Raman scatterers. In this category, spectral distortion caused by dark counts of the sensor can be notable due to low signal level. To prove the real-time dark count compensation capability of triple measurement mode, a water sample was measured. The third category is strong Raman scatterers which are easily measured. Four readily available strong Raman scatterers were measured with short (1-100 ms) acquisition times. In addition to the sensor itself, the required acquisition times for clear spectra strongly depend on the excitation source and the optical parts in the spectrometer, and therefore the exposure times can be used for system-level comparisons at some accuracy. In this section, time gating refers to the selection of a TDC bin range in data post-processing, not to hardware-level gating of SPADs.
PDE of the SPAD array strongly depends on the detected wavelength [as shown in Fig. 9(d)], and therefore PDE compensation has been made to the spectra in Figs. 15(c) and 16. DCR compensation has also been made to spectra in Figs. 15(c) and 16 (details for it are given in the description of water measurement). For PDE compensation, a white halogen lamp (ANDO AQ4303B) was used as a reference. SPAD cathode voltage HV = 21.3 V and a constant ref_clk_1 frequency of 113.33 MHz, which corresponds to a nominal resolution of 30 ps, were used throughout the Raman measurements.
The data related to the roasted sesame seed oil measurement are shown in Fig. 15. For timing skew compensation, a reference sample, an aqueous solution of erythrosine B (Sigma-Aldrich no. 198269), was measured. A measurement series of 75 Mpulses (∼4.5 min) was divided into five parts in which different ref_clk_2 frequencies were used. The list of ref_clk_2 frequencies (92.73, 102, 113.3, 127.5, and 145.7 MHz) was chosen to create 10-ps stepping for bin boundaries from odd bins to even bins, i.e., the width of odd bins was adjusted from 10 to 50 ps in 10-ps steps. The spectra of erythrosine B (sum from bins 1-9) for all five ref_clk_2 frequencies are shown in Fig. 15(a). Reference clock frequency stepping, and the consequent time gate endpoint stepping (a 10-ps step in the width of bin no. 9 causes a 10-ps step in the time gate endpoint when the time gate consists of bins 1-9), is visible as a signal increase, and all five spectra are visibly distorted because of timing skew. Erythrosine B is a fluorescent dye with a fluorescence lifetime of 90 ps [34]. Due to its short fluorescence lifetime, the fluorescence signal fits completely in the range of the TDCs and it can be measured without the spectral distortion caused by timing skew. Such a distortion-free spectrum (sum from bins 1-125) is shown by the black line in Fig. 15(a). When the distorted spectra from bins 1-9 are compared to the distortion-free spectrum, the optimal time gate endpoint for minimum distortion can be chosen for every sensor channel. This idea was applied with the addition of 99 interpolated spectra between every two sequential measured spectra, so that finally the values for the least distortion were chosen from 500 spectra. For example, for the channel at 1240 cm$^{-1}$ shown in the zoomed area in Fig. 15(a), the optimal time gate endpoint is achieved with 102 MHz, as the 102-MHz spectrum and the ideal spectrum intersect at that channel.
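The per-channel endpoint selection can be illustrated with a short sketch. It is not the authors' implementation. Linear interpolation between the measured spectra is an assumption (the interpolation method is not stated), the distortion-free spectrum is assumed to be already scaled to the level of the gated spectra, and the function names are hypothetical.

```python
import numpy as np

def interpolate_stack(measured, n_between=99):
    """Linearly interpolate between consecutive rows of `measured`.

    measured: (n_steps, n_channels) spectra, one row per time gate endpoint.
    Returns an array with n_steps + (n_steps - 1) * n_between rows.
    """
    measured = np.asarray(measured, dtype=float)
    n_steps = measured.shape[0]
    n_total = (n_steps - 1) * (n_between + 1) + 1
    pos = np.linspace(0.0, n_steps - 1, n_total)  # fractional row positions
    lower = np.minimum(np.floor(pos).astype(int), n_steps - 2)
    frac = (pos - lower)[:, None]
    return (1.0 - frac) * measured[lower] + frac * measured[lower + 1]

def select_endpoints(reference_stack, ideal):
    """Per channel, index of the row closest to the distortion-free spectrum."""
    diff = np.abs(reference_stack - np.asarray(ideal, dtype=float)[None, :])
    return np.argmin(diff, axis=0)

def apply_endpoints(sample_stack, endpoint_index):
    """Pick, for every channel, the sample value at its selected endpoint."""
    channels = np.arange(sample_stack.shape[1])
    return sample_stack[endpoint_index, channels]
```

In use, `select_endpoints` would be run once on the reference dye data, and the resulting index vector reused with `apply_endpoints` on the interpolated stack of the sample under study.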
A similar measurement series was made for the roasted sesame seed oil sample, and a raw spectrum from bins 1-9 is shown in Fig. 15(b) (green line). To compensate for the timing skew, interpolated spectra were created from the measured sesame seed oil spectra in the same way as for the erythrosine B spectra, and the interpolation value for each channel was selected based on the erythrosine B measurement. The PDE modulation pattern and Raman peaks are visible in the skew-compensated spectrum in Fig. 15(b) (black line), but most of the distortion has disappeared. The Raman spectrum of roasted sesame seed oil after post-processing (DCR, PDE, timing skew, and baseline compensations done, no computational filtering) is shown in Fig. 15(c). All main Raman peaks are clearly visible in Fig. 15(c). The signal-to-distortion ratio (SDR) was calculated for the Raman peak at 1440 cm$^{-1}$ as the ratio of the highest hit count of the peak to the standard deviation of hit counts over the range of 1500-1625 cm$^{-1}$ (a range that includes only the distorted baseline). The resulting SDR is 31.0.
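The SDR definition above translates directly into a few lines of code. This is a sketch under assumptions: the peak search window is a hypothetical parameter, and the population standard deviation (NumPy's default) is used because the text does not say which estimator was applied.

```python
import numpy as np

def signal_to_distortion_ratio(wavenumber, counts, peak_range, baseline_range):
    """Highest hit count within peak_range divided by the standard
    deviation of hit counts within baseline_range (both in cm^-1)."""
    wavenumber = np.asarray(wavenumber, dtype=float)
    counts = np.asarray(counts, dtype=float)
    in_peak = (wavenumber >= peak_range[0]) & (wavenumber <= peak_range[1])
    in_base = (wavenumber >= baseline_range[0]) & (wavenumber <= baseline_range[1])
    return counts[in_peak].max() / counts[in_base].std()
```

For the spectrum in Fig. 15(c), the baseline range would be (1500, 1625) and the peak window a narrow interval around 1440 cm$^{-1}$.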
The fluorescence lifetime and the fluorescence-to-Raman ratio (for the Raman peak at 1440 cm$^{-1}$) without time gating for this roasted sesame seed oil sample are 2.1 ns and 115 (∼625 for the Raman peak at 1750 cm$^{-1}$), respectively. In Fig. 15(b), the time-gated fluorescence-to-Raman ratio is ∼4.7. Time gating thus reduced the fluorescence-to-Raman ratio by a factor of ∼24.5. Generally, the fluorescence levels of edible oils can vary considerably, due to aging, for example [35]. Therefore, it is good practice to report the measured fluorescence lifetime and the fluorescence-to-Raman ratio of an oil sample used to demonstrate the performance of a Raman spectrometer.
The Raman spectrum of water was measured in two parts (1500-2770 and 2770-3750 cm$^{-1}$) because of the required wide wavenumber range. For both parts, a one-minute acquisition time (16.8 Mpulses) was used. The combined spectrum (sum from bins 21-60) is shown in Fig. 16. The best raw spectrum quality was achieved with this ∼1.2-ns time window, which is a compromise between minimizing the distortion caused by dark counts and that caused by timing skew. Spectral distortion is observed in Fig. 16 practically only in the inset, where channels with higher DCR show up. The water spectrum in Fig. 16 was measured using the triple measurement mode so that for each excitation pulse, the Raman signal was detected in the first measurement, while the two other measurements detected only dark counts of the sensor. The blue line in Fig. 16 shows a DCR-compensated spectrum, where the average of the second and third measurements is subtracted from the first measurement. A comparison of the black and blue lines in Fig. 16 shows that DCR compensation based on this integrated DCR measurement performs well.
Fig. 17. Raman spectra of ethanol, polystyrene, titanium dioxide, and diamond samples without any post-processing.
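The DCR compensation used for the water spectrum, in which the average of the two dark measurements is subtracted from the signal measurement, can be written as the following sketch. The function name and the assumption that all three inputs are per-channel hit counts summed over the same TDC bin range are made here for illustration.

```python
import numpy as np

def dcr_compensate(signal_hits, dark_hits_a, dark_hits_b):
    """Subtract the mean of two dark measurements from the signal measurement.

    All inputs are per-channel hit counts, assumed to be summed over the
    same TDC bin range.
    """
    dark = 0.5 * (np.asarray(dark_hits_a, dtype=float)
                  + np.asarray(dark_hits_b, dtype=float))
    return np.asarray(signal_hits, dtype=float) - dark
```

Averaging two dark measurements instead of using one lowers the statistical noise that the subtraction adds to each channel.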
Raman spectra of four strong Raman scatterers are shown in Fig. 17. Hits from bins 23-37 (∼430-ps time window) were summed to form the spectra without any post-processing. Raman spectra of a diamond sample measured with a CMOS SPAD-based spectrometer were recently reported in [33]. The Raman spectrum of diamond obtained in a 1-ms measurement in Fig. 17 shows a higher Raman hit count (276 versus ∼230) and noticeably lower background noise without any post-processing when compared to the 10-ms acquisition time spectrum with background subtraction in [33]. The main factors behind this large difference in required acquisition times are the excitation pulse energy (ratio ∼1:335, higher in this work) and the active area of SPADs in a sensor channel (∼1:3, higher in this work). In fact, Raman scattering from the diamond sample is so strong that 276 Raman photons were detected with 280 excitation pulses in the measurement shown in Fig. 17. For a sample with equally strong Raman scattering and a more complex spectral shape, the pulse energy would have to be attenuated significantly to avoid losing spectral details because of full saturation.
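The degree of saturation implied by 276 hits in 280 pulses can be estimated with a simple pile-up model. This sketch rests on assumptions that the text does not state: the hits are taken to belong to a single sensor channel, a channel is assumed to register at most one hit per excitation pulse, photon arrivals are assumed to be Poisson distributed, and dark counts are ignored.

```python
import math

def hit_probability(hits, pulses):
    """Fraction of excitation pulses that produced a registered hit."""
    return hits / pulses

def mean_photons_per_pulse(hits, pulses):
    """Poisson mean number of detectable photons per pulse, assuming at
    most one registered hit per pulse: P(hit) = 1 - exp(-mu)."""
    p = hits / pulses
    if not 0.0 <= p < 1.0:
        raise ValueError("hit probability must be in [0, 1)")
    return -math.log(1.0 - p)

p = hit_probability(276, 280)          # about 0.986
mu = mean_photons_per_pulse(276, 280)  # about 4.25
```

Under these assumptions the channel is far into saturation, which is consistent with the remark that the pulse energy would need attenuation for samples with finer spectral detail.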

C. Fluorescence Lifetime Measurements
The range of the TDCs mostly determines how long fluorescence lifetimes can be measured. For the best temporal resolution, which is achieved with a ref_clk_1 frequency of 150 MHz, the range of the TDCs is 3.2 ns. This is suitable for ns-scale fluorescence lifetimes and can be expanded up to ∼8.2 ns at the expense of lower temporal resolution (a 50-MHz reference clock gives a range of ∼8.2 ns and a temporal resolution of ∼65 ps). If even longer photoluminescence lifetimes need to be measured, lifetimes up to hundreds of nanoseconds can be measured with the triple measurement mode. To demonstrate this, an aqueous solution of a photoluminescent dye, [Ru(bpy)$_2$]Cl$_2$ (Sigma-Aldrich no. 224758), was measured. The measurement was made with a ref_clk_1 frequency of 100 MHz, which produces 50-ns delays between the triple measurement parts. One million laser pulses were shot at the sample, and the hit counts summed over all channels and the whole TDC range were 1.416, 1.236, and 1.074 Mhits. An exponential fit to these values gives a photoluminescence lifetime of 361.8 ns, which is very close to the literature value of 358 ns [36].
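The lifetime estimate from the three hit counts can be reproduced approximately with a log-linear least-squares fit. This is a sketch, not necessarily the fitting method used by the authors: it assumes a single-exponential decay, equal integration windows starting at 0, 50, and 100 ns, and a negligible dark count contribution. The function name is hypothetical.

```python
import numpy as np

def lifetime_from_counts(delays_ns, counts):
    """Fit counts = A * exp(-t / tau) by linear least squares on log(counts)
    and return tau in nanoseconds."""
    t = np.asarray(delays_ns, dtype=float)
    log_counts = np.log(np.asarray(counts, dtype=float))
    slope, _intercept = np.polyfit(t, log_counts, 1)
    return -1.0 / slope

# Hit counts reported in the text, 50 ns apart.
tau_ns = lifetime_from_counts([0.0, 50.0, 100.0], [1.416e6, 1.236e6, 1.074e6])
print(f"estimated lifetime: {tau_ns:.1f} ns")
```

The result lands within about 0.1 ns of the reported 361.8 ns; a nonlinear fit or different weighting could account for the small difference.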

V. DISCUSSION AND CONCLUSION
The main parameters of CMOS SPAD line sensors that have been used for time-resolved Raman spectroscopy are compared in Table I. In addition to IRF results, the pulse widths of the excitation sources used in the IRF measurements are shown because of their major contribution to the overall IRF result. Estimated IRF values for the electronics are also given, assuming that $\mathrm{IRF}_{\mathrm{measured}}^{2} = \mathrm{IRF}_{\mathrm{excitation}}^{2} + \mathrm{IRF}_{\mathrm{electronics}}^{2}$. In comparison to previous work, the presented line sensor has the least timing skew, the highest output data rate, and good performance in terms of power consumption, temporal resolution, DCR, and IRF. When the Raman measurements are compared to recent studies, significantly better spectral quality was achieved for a water sample than in [9], and the acquisition time could be reduced by 90% for a diamond sample when compared to [33]. To the best of the authors' knowledge, the Raman spectrum of roasted sesame seed oil measured with a CMOS SPAD sensor and 532-nm excitation has not been presented in the literature before.
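The quadrature assumption used for the estimated electronics IRF can be inverted as in the following sketch. The function name is hypothetical, and the numbers in the example are illustrative only; they are not taken from Table I.

```python
import math

def electronics_irf(measured_ps, excitation_ps):
    """Estimate the electronics IRF, assuming the measured IRF is the
    quadrature sum of the excitation pulse width and the electronics IRF."""
    if excitation_ps > measured_ps:
        raise ValueError("excitation width exceeds the measured IRF")
    return math.sqrt(measured_ps ** 2 - excitation_ps ** 2)

# Illustrative values only: a 130-ps measured IRF with a 50-ps pulse.
estimate_ps = electronics_irf(130.0, 50.0)  # 120.0
```

The quadrature model holds when the two contributions are independent and roughly Gaussian, which is why the resulting values are described as estimates.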
A 256-channel CMOS SPAD line sensor for time-resolved Raman spectroscopy was designed and demonstrated. State-of-the-art timing skew performance was achieved by means of parallel connected flash TDCs, and the effectiveness of the functionalities designed for Raman spectroscopy (triple measurement DCR compensation, TDC bin boundary adjustment) was proven in Raman measurements. The faster measurements and improved spectral quality achieved encourage the use of CMOS SPAD-based time-resolved Raman spectroscopy in increasingly challenging applications.