A CMOS Lock-In Pixel Image Sensor With Multisimultaneous Gate for Time-Resolved Near-Infrared Spectroscopy

This article proposes a CMOS lock-in pixel image sensor for time-resolved near-infrared spectroscopy (TR-NIRS). The pixel employs lateral electric field charge modulation (LEFM) with an eight-tap multisimultaneous gate structure and a negative substrate bias. The optimized pixel structure creates a steep potential slope with no barrier, enabling the high-speed transfer of photogenerated charge required in time-resolved applications. The sensor employs a two-stage charge transfer architecture with pinned storage diodes (SDs). The effectiveness of the sensor is demonstrated through simulations and experimental measurements. A prototype sensor with 70 (V) × 110 (H) effective pixels, built to characterize the multisimultaneous gate lock-in pixel, is implemented in a 0.11-µm 1-poly-4-metal CMOS image sensor (CIS) technology. Characterized with two simultaneous gates and a single time window, the lock-in pixel achieves a fast intrinsic response of 240 ps, measured with a 780-nm laser diode with an 80-ps pulsewidth. Performance evaluation using silicone phantoms and a further measurement on a rat demonstrates the feasibility of the proposed sensor and setup for TR-NIRS applications.


I. INTRODUCTION
The continuous development of CMOS image sensors (CISs) has extended their use beyond imaging. Sensing is another area where the information captured by the pixels yields numerous new applications, such as in the biomedical field [1], [2], [3], [4]. One potential application is near-infrared spectroscopy (NIRS), which has long offered noninvasive monitoring of blood oxygenation in various fields by injecting light into a medium and collecting the diffused signal at a distance from the light source [5], [6], [7], [8]. Several major NIRS techniques have been established, such as continuous-wave (CW-NIRS), phase-modulated, and time-resolved spectroscopy (TR-NIRS). While CW-NIRS is the most widely used method owing to its lower complexity and cost, TR-NIRS offers the greatest potential because the collected signal carries the richest information.
The key requirement for time-resolved measurement is a fast detector. The classic TR-NIRS approach uses time-correlated single-photon counting (TCSPC) with a photomultiplier tube (PMT) for detection. The size, cost, and complexity of such a TR-NIRS system limit its use mostly to research environments. Recent advances in TR-NIRS have introduced solid-state detectors based on single-photon avalanche diodes (SPADs) [3], [9], [10], [11], [12]. A CMOS SPAD can function either in TCSPC mode, resembling a miniature multichannel PMT with TCSPC, or in time-gated mode. SPAD-based methods usually suffer from either a limited active area or ineffective gating. Pinned photodiode (PPD)-based CISs have also been developed for time-resolved sensing [13], [14], [15], [16]; however, their intrinsic response and sensitivity in the near-infrared region are not sufficient for TR-NIRS. An intrinsic response on the order of a few hundred picoseconds is desired to obtain meaningful information [17], [18].
Fig. 1. (a) Layout of the eight-tap pixel structure. Two sets of three taps are connected to the LEFM gates G1 and G2, while another two taps are connected to GD. The pixel is made using three n-type doping layers. (b) Bipolar LEFM gate, which creates a steeper potential slope through the work function difference.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this article, we introduce a PPD-based CIS with a highly time-resolved lock-in pixel featuring a multisimultaneous gate based on the lateral electric field charge modulator (LEFM) [19], [20], used in a TR-NIRS setup. A negative substrate bias is applied to the CIS to enhance the charge transfer speed, especially for charges generated deep in the substrate below the PPD. For a photon-shot-noise-limited signal, operating multiple taps concurrently achieves a faster charge transfer time while maintaining the signal-to-noise ratio (SNR). The TR-NIRS CIS with 70 (V) × 110 (H) effective pixels demonstrates the capability to distinguish absorption coefficients through measurements of custom-made phantoms and a rat, with the ultimate aim of developing a miniature wearable TR-NIRS device for real-time continuous monitoring of brain activity. The remainder of this article is structured as follows. Section II describes the pixel and chip architecture. Section III presents the implementation and measurement results. Finally, Section IV concludes this article.

II. PIXEL AND CHIP ARCHITECTURE
A. Multisimultaneous Gate Lock-In Pixel Structure
The pixel is designed to maximize the transfer speed of photoelectrons, as required for TR-NIRS. Fig. 1(a) shows the layout of the eight-tap two-gate-one-drain modulator pixel, which performs two-stage charge transfer for a true correlated double sampling (CDS) readout. The pixel consists of a PPD region, eight pairs of LEFM gates (G1, G2, and GD), eight pairs of charge assist (CA) gates, and eight pinned storage diodes (SDs). Two sets of three taps each are connected to G1 and G2, respectively, and a further set of two taps is connected to GD. The LEFM gates G1 and G2 modulate the potential for charge sampling, while GD is used for charge draining. The LEFM gates are constructed as a bipolar structure with both n-type and p-type polysilicon gates, as depicted in Fig. 1(b) [21]. Owing to the work function difference, the bipolar gates create a large potential slope under the gate that drives electrons toward the SD, and a higher potential barrier at the gate edge, without the need for threshold voltage adjustment underneath the gate. A margin of 0.1 µm for both the n-type and p-type doping from the poly edge is allocated to prevent undesired effects of misalignment.
Three n-type doping layers are used to create a steep electric field toward the SD. The center of the pixel has a square region devoid of n-type doping, creating a low-potential tip pulled by the applied substrate bias. The electric field increases toward the LEFM gates owing to the surrounding n1 layer, which covers all regions of the PPD and SD except the center of the pixel. The n2 regions extend from the opening of each pair of LEFM gates to the entire SD to control and accelerate the charge flow. The third layer, n3, further strengthens the electric field and forms the storage well. The combination and overlap of the n-layers, together with the operation of the bipolar LEFM gates, creates a potential barrier when the LEFM gates are off and removes it when they are on, allowing complete charge transfer from the PPD to the SD. The target pinning voltages are −0.5 V at the PPD and 2.3 V at the SD. A negative bias of −3 V is applied to the substrate through a front contact to achieve full depletion and create an electric field in the vertical direction. Fig. 2 shows the potential profile in the vertical direction at the center of the pixel. Charges generated deep in the substrate below the PPD drift along the potential gradient rather than moving by diffusion, resulting in faster charge transfer. At a depth of about 1 µm, the charges turn toward the horizontal direction by lateral drift. A bias of −3 V is used because further increasing the negative bias does not significantly improve the instrument response function (IRF). The peripheral circuits are isolated by a deep n-well to avoid the effect of the negative substrate bias. The bulk of the modulator at the surface of the active area is biased at −1.5 V, while the in-pixel transistor bulk is biased at 0 V. Fig. 3 shows the equivalent schematic of the proposed pixel, in which virtual switches depict the LEFM gates.
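To give a sense of why drift outpaces diffusion for charges generated deep in the substrate, the following rough Python estimate compares the characteristic diffusion time L²/(2D) with the drift transit time in a uniform field. The electron mobility, the 1-V potential drop over 5 µm, and the uniform-field assumption are illustrative textbook-style values, not device data from this work; the actual field profile is the one in Fig. 2.

```python
# Rough estimate: diffusion-limited vs drift-limited electron transit over a
# vertical distance in silicon. Material values are textbook approximations
# and the uniform field is an illustrative assumption, not device data.

KT_OVER_Q = 0.0259        # thermal voltage at about 300 K [V]
MU_N = 1350.0             # assumed electron mobility, lightly doped Si [cm^2/(V s)]
D_N = MU_N * KT_OVER_Q    # Einstein relation: diffusion coefficient [cm^2/s]
V_SAT = 1.0e7             # approximate electron saturation velocity [cm/s]


def diffusion_time(depth_cm: float) -> float:
    """Characteristic 1-D diffusion time, t ~ L^2 / (2 D)."""
    return depth_cm ** 2 / (2.0 * D_N)


def drift_time(depth_cm: float, potential_drop_v: float) -> float:
    """Transit time in a uniform field, with velocity capped at saturation."""
    field = potential_drop_v / depth_cm        # [V/cm]
    velocity = min(MU_N * field, V_SAT)        # [cm/s]
    return depth_cm / velocity


depth = 5e-4  # 5 um, the starting depth used in the device simulation
print(f"diffusion: {diffusion_time(depth) * 1e12:.0f} ps")
print(f"drift with an assumed 1-V drop: {drift_time(depth, 1.0) * 1e12:.0f} ps")
```

Under these assumptions, diffusion over 5 µm takes a few nanoseconds, whereas drift takes roughly 0.2 ns, which is the order of magnitude the target IRF requires.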
The CA gate signals are the inverse of the TX gate signals, delayed by about 10 ns. During the accumulation phase, the CA gate is turned on to create a larger potential in the storage well; during the readout phase, it is turned off to lower the potential in the storage well and assist the charge transfer to the floating diffusion (FD). The high and low voltages of CA are 2.3 and −1 V, respectively. The transfer gates TX transfer charges from the SDs to the FDs, assisted by the CA gates. To save pixel area, each pair of adjacent SDs shares an FD, a source follower (SF) for readout, and a row-select transistor. For CDS, the reset levels are read out first after the FDs are reset by RT1 and RT2; then the signal levels, which appear after TX transfers the charges from the SDs to the FDs, are read out. In the proposed design, the draining taps' SD3 and SD7 are connected to the FDs rather than to VDD, to ensure uniform sensitivity across all signal taps. Charges are drained through the operation of TX3 and RT2. TX1 and TX3 were designed to be controlled individually to allow different modes of operation.
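The reset-then-signal sequence is what allows CDS to cancel the FD reset (kTC) noise. A minimal behavioral sketch, assuming an ideal FD with a hypothetical conversion gain of 50 µV/e⁻ and a hypothetical 3.3-V reset level (neither value is from this work), shows the random reset offset dropping out of the difference:

```python
import random

# Behavioral model of correlated double sampling (CDS) on a floating
# diffusion (FD). Conversion gain, reset level, and noise are hypothetical.


def read_pixel_cds(signal_electrons: int, conversion_gain_uv: float,
                   v_reset: float, ktc_sigma_v: float,
                   rng: random.Random) -> float:
    """Return the CDS output (reset level minus signal level) in volts."""
    # 1) The FD is reset; a random kTC offset is frozen on it when reset ends.
    v_fd = v_reset + rng.gauss(0.0, ktc_sigma_v)
    reset_sample = v_fd                        # first read: reset level
    # 2) TX moves the charge held in the SD onto the FD, lowering its voltage.
    v_fd -= signal_electrons * conversion_gain_uv * 1e-6
    signal_sample = v_fd                       # second read: signal level
    # 3) The kTC offset is common to both samples and cancels here.
    return reset_sample - signal_sample


rng = random.Random(0)
reads = [read_pixel_cds(1000, 50.0, 3.3, 1e-3, rng) for _ in range(5)]
print(reads)
```

Every read returns about 0.05 V (1000 e⁻ × 50 µV/e⁻) regardless of the kTC draw. This works only because the charge stays in the SD until after the reset level has been sampled, which is what the two-stage transfer provides.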
The proposed pixel design is simulated using the SPECTRA device simulator (Link Research Corporation). Fig. 4(a) shows the simulated 2-D potential plot when G1 is on (2.3 V), G2 and GD are off (−1.5 V), the CA gates are on (2.3 V), and a substrate bias of −3 V is applied. The three red dots A, B, and C indicate the initial positions of electrons at a depth of 5 µm, and the black lines indicate the charge transfer paths. The charges move toward the higher-potential G1 gates with a simulated transfer time of less than 400 ps, even when the initial position is at the edge of the PPD. In contrast with eight discrete taps [16], operating multiple gates simultaneously shortens the distance that charges travel to reach an SD, thus achieving a faster charge transfer time. Fig. 4(b) shows the simulated potential profiles along the X-X′ and Y-Y′ directions in Fig. 4(a). The X-X′ profile (black line) represents the condition when both gates (Tap 3 and Tap 7) are off; the potential barrier created by the gates prevents charge from flowing to the respective SDs. The Y-Y′ profile (red line) has one side (Tap 1) on and the other side (Tap 5) off; there is no potential barrier in the Y′-direction, allowing complete charge transfer from the PPD to the SD. The CA gates are on (2.3 V) to form the storage well. Fig. 4(c) shows the potential profile along X-X′ during the readout phase, where the LEFM and CA gates are off and the TX gates are on. This operation creates a large potential gradient that effectively transfers charges from the SD to the FD.
… inverter tree before arriving at the nonoverlap signal generator as CK_IN, which further generates CKP and CKN. The variable delay between CKN and CKP, which separately drive a pMOS and an nMOS, is set to 220 ps. Each gate in each column has its own nonoverlap signal generator, which avoids mismatch between the two clocks.
The nonoverlap operation prevents short-circuit current from VH_TG to VL_TG. First, CKN goes low before CKP turns on the pMOS column driver to drive the LEFM gate high. Next, CKP transitions from low to high, followed by CKN, to turn off the LEFM gate with the help of the in-pixel nMOS driver. VH_TG, VL_TG, and VSUB are the gate high voltage, gate low voltage, and substrate bias voltage, respectively. Decoupling capacitors C1 and C2 stabilize the gate voltage and create a local current loop at the column level, reducing the time constant of the LEFM gate operation.
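The nonoverlap sequence can be checked with a sampled-time, logic-level model. In this sketch the pMOS column driver conducts when CKP is low and the in-pixel nMOS driver conducts when CKN is high, following the description above; the 220-ps delay is from the text, while the 10-ps time step and the 1-ns test pulse are arbitrary assumptions. It is a behavioral illustration, not the transistor-level generator.

```python
# Logic-level model of a non-overlapping clock generator. The pMOS column
# driver conducts when CKP is low; the in-pixel nMOS driver conducts when
# CKN is high. Time step and test pulse are arbitrary assumptions.

STEP_PS = 10
DEAD_TIME_PS = 220                 # CKN-to-CKP delay quoted in the text
DELAY = DEAD_TIME_PS // STEP_PS    # dead time in samples


def nonoverlap(ck_in: list[int]) -> tuple[list[int], list[int]]:
    """Return (CKP, CKN) for a 0/1 input clock with pulses longer than DELAY."""
    ckp, ckn = [], []
    for i, now in enumerate(ck_in):
        past = ck_in[i - DELAY] if i >= DELAY else ck_in[0]
        # CKP falls (pMOS on) only once CK_IN has been high for DELAY samples.
        ckp.append(0 if (now and past) else 1)
        # CKN rises (nMOS on) only once CK_IN has been low for DELAY samples.
        ckn.append(1 if (not now and not past) else 0)
    return ckp, ckn


ck_in = [0] * 50 + [1] * 100 + [0] * 100   # 1-ns high pulse at 10-ps steps
ckp, ckn = nonoverlap(ck_in)
pmos_on = [c == 0 for c in ckp]
nmos_on = [c == 1 for c in ckn]
both_off = sum(1 for p, n in zip(pmos_on, nmos_on) if not p and not n)
print(f"overlap samples: {sum(p and n for p, n in zip(pmos_on, nmos_on))}, "
      f"dead-time samples: {both_off}")
```

After each CK_IN edge, both transistors are off for 22 samples (220 ps) and they are never on at the same time, which is the condition that blocks the VH_TG-to-VL_TG short-circuit current.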

B. Pixel Driver and Chip Architecture
Each pixel measures 22.4 µm × 22.4 µm. Each pixel column has four outputs connected to four column-parallel analog-to-digital converters (ADCs) with a pitch of 5.6 µm. A 17-bit folding integration/cyclic ADC is employed to provide low-noise readout [22]. Other major blocks include the pixel driver with an inverter tree, a logic block for data processing, a vertical scanner for row selection, a horizontal scanner for column selection, and a four-channel low-voltage differential signaling (LVDS) interface. Gapless microlenses the size of the pixel are used in the pixel array.

III. IMPLEMENTATION AND MEASUREMENT RESULTS
As discussed in Section II, one SF is shared by two adjacent SDs, which therefore require two readout operations. To assess the intrinsic response the sensor can achieve, this characterization uses a single-readout, single-time-window configuration. To do this, gates G2 and GD are modulated, while G1 is always off. The transfer gate TX2 is selected to read out the signal from G2, which is the sum of the signals from SD2 and SD8.

A. Basic Characteristics
The CIS has been implemented in a 0.11-µm 1-poly-4-metal CIS technology with a 20-µm-thick p-epitaxial wafer. The thick epilayer is chosen to increase the sensitivity to near-infrared light, which penetrates deeper into silicon. A low-resistivity epilayer is employed, and the reverse bias of −3 V achieves full depletion, thus realizing high-speed charge transfer. The sensor has a pixel array of 70 (V) × 110 (H) effective pixels. Table I summarizes the sensor specifications and performance. The measurements are performed with the chip assembled in a package with a Peltier device, which cools it from room temperature to about 15 °C.

B. Principles of NIRS Measurement
In a TR-NIRS measurement, a short laser pulse irradiates the target medium, and the temporal response of the light scattered back through the medium is observed with a highly time-resolved detector. Based on diffusion theory, the reflectance R as a function of the source-detector separation ρ and time t is given by [17]
$$R(\rho,t) = (4\pi D c)^{-3/2}\, z_0\, t^{-5/2} \exp(-\mu_a c t)\, \exp\!\left(-\frac{\rho^2 + z_0^2}{4 D c t}\right)$$
where µa is the absorption coefficient, D is the diffusion coefficient, c is the velocity of light in the medium, and z0 is the depth at which the incident photons are initially scattered. By plotting ln{R(ρ,t)} against time, the absorption coefficient can be obtained from the asymptotic slope of the curve:
$$\frac{d}{dt}\ln R(\rho,t) = -\frac{5}{2t} - \mu_a c + \frac{\rho^2 + z_0^2}{4 D c t^2}, \qquad \mu_a = -\frac{1}{c}\lim_{t\to\infty}\frac{d}{dt}\ln R(\rho,t)$$
The time-resolved signal observed by the sensor is the convolution of the reflectance with the IRF. Fig. 6 shows the normalized simulation results obtained by convolving the theoretical reflectance for different µa values with different IRFs at ρ = 30 mm. A faster IRF yields a larger slope difference between different µa values, which improves the accuracy of estimating µa from the TR-NIRS measurement.
Fig. 7 shows the operation timing diagram of the TR-NIRS sensor. All control signals are generated by a field-programmable gate array (FPGA). Each frame consists of accumulation and readout cycles, operating in global exposure mode. Throughout the accumulation cycle, reflectance signals from the sample are accumulated N times to achieve a sufficiently high SNR. CK_GD is complementary to CK_G2, whose pulse lasts 6.6 ns, so that charges arriving outside the time window are drained. During the readout cycle, the charges accumulated in the SDs are digitized. The trigger signal driving the light pulse is synchronized and shifted in every frame, with a step of 104 ps, to scan the full response of the reflectance. The shifting is performed using the phase-locked loop (PLL) of the FPGA. One phase-shifting set is configured as 160 frames and repeated multiple times to obtain the average.
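The diffusion-theory reflectance of [17] for a semi-infinite medium, R ∝ t^(−5/2) exp(−µa c t) exp(−(ρ² + z0²)/(4Dct)), and the slope-based estimate of µa can be sketched in Python with NumPy. The optical properties used here (µs′ = 1.0 mm⁻¹, z0 = 1/µs′, D = 1/[3(µa + µs′)], refractive index 1.4) are assumptions commonly paired with this model, not values reported in this work; µa = 0.0107 mm⁻¹ merely reuses the P1 reference value as an example input.

```python
import numpy as np

# Diffusion-theory reflectance R(rho, t) and a late-time slope estimate of
# mu_a. Optical properties below are assumed example values.

C_MM_PER_PS = 0.299792458 / 1.4   # light speed in the medium, n = 1.4 assumed [mm/ps]


def reflectance(t_ps, rho_mm, mu_a, mu_s_prime):
    """Semi-infinite-medium diffusion solution for R(rho, t)."""
    D = 1.0 / (3.0 * (mu_a + mu_s_prime))   # diffusion coefficient [mm]
    z0 = 1.0 / mu_s_prime                   # depth of first scattering [mm]
    c = C_MM_PER_PS
    t = np.asarray(t_ps, dtype=float)
    return ((4.0 * np.pi * D * c) ** -1.5 * z0 * t ** -2.5
            * np.exp(-mu_a * c * t)
            * np.exp(-(rho_mm ** 2 + z0 ** 2) / (4.0 * D * c * t)))


def mu_a_from_slope(t_ps, r, t_min_ps):
    """mu_a ~ -(1/c) d(ln R)/dt, from a straight-line fit for t >= t_min_ps."""
    t = np.asarray(t_ps, dtype=float)
    late = t >= t_min_ps
    slope, _ = np.polyfit(t[late], np.log(r[late]), 1)   # [1/ps]
    return -slope / C_MM_PER_PS


t = np.arange(100.0, 20000.0, 10.0)                      # 0.1 to 20 ns
r = reflectance(t, rho_mm=30.0, mu_a=0.0107, mu_s_prime=1.0)
for t_min in (4000.0, 10000.0):
    print(f"fit from {t_min / 1000:.0f} ns: mu_a = {mu_a_from_slope(t, r, t_min):.5f} /mm")
```

Because the −5/(2t) and 1/t² terms of d(ln R)/dt vanish only as t → ∞, a straight-line fit over a finite window slightly overestimates µa, and the bias shrinks as the window moves to later times. This is one reason to fit the full model convolved with the IRF rather than rely on the raw slope.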
Shifting the light-pulse phase with the FPGA's PLL is more efficient than using a discrete delay controller [14], [15], and consequently enables a smaller system. In this demonstration, 160 frames are captured to show the overall response across the time window. In a real NIRS application, only the falling edge is needed to obtain the absorption coefficient, which reduces the required number of frames and the measurement time. Fig. 8(a) depicts the experimental setup for the reflectance measurement. The system includes the custom-developed TR-NIRS camera with the proposed imager and an onboard FPGA for signal generation and light-trigger shifting, a short-pulse laser diode module, and a personal computer. Fig. 8(b) shows the setup for obtaining the IRF. The light pulse passes through a neutral-density (ND) filter, a diffuser, and a reflector before being captured by the sensor. The short-pulse laser is scanned across the time window, and the sensor integrates the signal in each frame. To obtain the pure sensor response, the successive sensor outputs are differentiated. The response obtained in this way is the convolution of the laser pulse, the sensor intrinsic response, and the LEFM gate response. Treating the laser pulse as a narrow impulse, the resulting response corresponds largely to the sensor intrinsic and LEFM gate responses, which are exponential [23].
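The scan-and-differentiate procedure can be illustrated with a noise-free simulation. This sketch assumes a single-exponential response with a 240-ps time constant, an ideal impulse laser pulse, and an instantly switching LEFM gate; the 104-ps step, 160 frames, and 6.6-ns window are taken from the text. For readability, the scan variable is the gate start relative to the pulse; shifting the light trigger later, as the hardware does, is equivalent up to the sign of the time axis.

```python
import numpy as np

# Noise-free illustration of the IRF scan. Step, frame count, and window
# are from the text; the response shape and time grid are assumptions.

DT_PS = 1.0              # simulation grid [ps]
STEP_PS = 104.0          # scan step per frame [ps]
N_FRAMES = 160
WINDOW_PS = 6600.0       # time-window length [ps]
TAU_PS = 240.0           # assumed single-exponential response [ps]

t = np.arange(0.0, 3000.0, DT_PS)          # time after the (impulse) pulse
h = np.exp(-t / TAU_PS) / TAU_PS           # assumed intrinsic response


def frame_signal(gate_start_ps: float) -> float:
    """Charge integrated while the window [start, start + WINDOW) is open."""
    inside = (t >= gate_start_ps) & (t < gate_start_ps + WINDOW_PS)
    return float(np.sum(h[inside]) * DT_PS)


gate_starts = -2000.0 + STEP_PS * np.arange(N_FRAMES)
signal = np.array([frame_signal(g) for g in gate_starts])

# Differentiating successive frames: the charge lost per step samples h(t).
response = -np.diff(signal) / STEP_PS
centers = gate_starts[:-1] + STEP_PS / 2.0

# Log-linear fit over the decay (window start between 0 and ~1.4 ns).
fit = (gate_starts[:-1] >= 0.0) & (gate_starts[1:] <= 1500.0)
slope, _ = np.polyfit(centers[fit], np.log(response[fit]), 1)
tau_est = -1.0 / slope
print(f"scan range: {STEP_PS * N_FRAMES / 1000:.2f} ns, "
      f"fitted time constant: {tau_est:.1f} ps")
```

Each frame integrates only the part of the response that still falls inside the window, so the negative frame-to-frame difference samples the response itself, and a log-linear fit over its decay recovers the assumed 240-ps time constant. The 160 steps of 104 ps span 16.64 ns.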

D. Instrument Response Function
The response function that includes all responses other than that of the sample is called the IRF. Since the measured time-resolved NIRS signal is the convolution of the IRF with the reflectance from the sample, the IRF is an important characteristic in TR-NIRS measurement. Once the IRF is known, the absolute µa can be estimated by finding the best fit of the convolution of the IRF and the theoretical reflectance to the measured sample response. A faster intrinsic response of the sensor increases the resolution and accuracy of the detected µa. Fig. 9 plots the measured IRF. A short-pulse 780-nm laser diode (LDB-100, Tama Electric Inc.) with a pulsewidth of 80 ps is used for the measurements. The responses of all pixels in the 60 × 110-pixel region of interest (ROI) are averaged, and each point is the average of 11 sets of measurements. A plot of 100 selected pixels with faster responses is also shown. After differentiating the sensor response, the time constant is 305 ps for the ROI and 240 ps for the selected pixels. The measured IRF of G1 from the same ROI has a similar time constant of 316 ps. Only G2 is used for the subsequent measurements. A large active area is favorable for collecting scattered signals in diffuse optical imaging, because the photons cannot be focused onto a small area. In this implementation, the large active area is realized by the 60 × 110-pixel array, which is equivalent to 3.808 mm², comparable to PMTs for diffuse optics and SPADs [9], [12].

E. Performance Evaluation Using Phantoms
Experimental evaluation is performed to obtain the absorption coefficients of two solid phantoms, custom-made by mixing titanium dioxide, ink, a polymerization agent, and silicone elastomer. The phantoms, which have different absorption coefficients, are labeled P1 and P2. Both phantoms are first measured using a time-resolved spectroscopy (TRS) system (TRS-20H, Hamamatsu Photonics K.K.) [24] at a wavelength of 797 nm, the wavelength closest to the 780 nm of our experimental setup. The reference absorption coefficients obtained for P1 and P2 are 0.0107 and 0.0273 mm⁻¹, respectively.
For the measurement using our custom-developed system, shown in Fig. 8(c), the source-detector separation is fixed at 30 mm, which is commonly used for human brain measurements [6]. Each set of measurements, consisting of 160 frames, requires 2.5 s of accumulation time. Eleven sets are performed, and their average is used for processing. Fig. 10 shows the time-gated outputs obtained from the measurements of the intrinsic response and of the two solid phantoms P1 and P2. A distinct slope difference can be observed between the responses. P2 shows the highest noise level because its higher absorption coefficient reduces the number of detected photons and hence the SNR. The absorption coefficients of the phantoms are retrieved using the least-squares method: the measured data from 90% of the rising edge to 10% of the falling edge are fitted with the convolved response. The retrieved absorption coefficients are plotted against the reference values in Fig. 11, where the error bars depict the standard deviations of five repetitions.
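The fitting step can be sketched as a grid search over µa, solving the amplitude in closed form for each candidate and evaluating the residual only between 90% of the rising edge and 10% of the falling edge. The text specifies only "the least-squares method", so the grid search is an assumption, as are the exponential IRF with the 305-ps ROI time constant, µs′ = 1.0 mm⁻¹, the refractive index of 1.4, zero time offset between IRF and sample response, and the noise-free synthetic curves generated from the P1 and P2 reference values.

```python
import numpy as np

# Least-squares retrieval of mu_a: fit (IRF convolved with R) to a curve over
# the 90%-rising-edge to 10%-falling-edge window. All model parameters and
# the synthetic "measurements" are assumptions for illustration.

DT = 10.0                                # time step [ps]
T = DT * np.arange(1, 801)               # 10 ps to 8 ns (skips t = 0)
C = 0.299792458 / 1.4                    # light speed in the medium [mm/ps]
RHO, MU_S_PRIME, TAU_IRF = 30.0, 1.0, 305.0
IRF = np.exp(-(T - DT) / TAU_IRF)        # assumed exponential IRF, unit peak


def reflectance(mu_a):
    """Diffusion-theory reflectance on the grid T."""
    D = 1.0 / (3.0 * (mu_a + MU_S_PRIME))
    z0 = 1.0 / MU_S_PRIME
    return ((4.0 * np.pi * D * C) ** -1.5 * z0 * T ** -2.5
            * np.exp(-mu_a * C * T)
            * np.exp(-(RHO ** 2 + z0 ** 2) / (4.0 * D * C * T)))


def model(mu_a):
    """Reflectance convolved with the IRF, truncated to the grid length."""
    return np.convolve(reflectance(mu_a), IRF)[: T.size] * DT


def fit_window(y):
    """Index range from 90% of the rising edge to 10% of the falling edge."""
    peak = int(np.argmax(y))
    rising = np.nonzero(y[: peak + 1] >= 0.9 * y[peak])[0]
    falling = np.nonzero(y[peak:] >= 0.1 * y[peak])[0]
    return int(rising[0]), peak + int(falling[-1])


def fit_mu_a(y, grid):
    """Grid search over mu_a; the amplitude is solved in closed form."""
    lo, hi = fit_window(y)
    seg = y[lo : hi + 1]
    best_mu, best_cost = None, np.inf
    for mu_a in grid:
        m = model(mu_a)[lo : hi + 1]
        amp = np.dot(m, seg) / np.dot(m, m)
        cost = float(np.sum((seg - amp * m) ** 2))
        if cost < best_cost:
            best_mu, best_cost = float(mu_a), cost
    return best_mu


grid = np.arange(0.002, 0.04, 0.0001)    # candidate mu_a values [1/mm]
for label, true_mu in (("P1", 0.0107), ("P2", 0.0273)):
    measured = 1e3 * model(true_mu)        # synthetic, noise-free
    print(f"{label}: retrieved mu_a = {fit_mu_a(measured, grid):.4f} /mm")
```

On noise-free synthetic data the search returns the generating µa; with measured data, shot noise and any time offset between the IRF and the sample response would also have to be handled.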

F. Rat Measurement
Measurement on a real biological sample was performed on a rat to verify the sensor's ability to detect hemodynamic responses. The source-detector separation is set to 15 mm, considering the size of the rat's brain. The rest of the measurement setup is identical to that of the phantom measurement. A male Sprague-Dawley rat weighing approximately 300 g, purchased from SLC Inc. (Hamamatsu, Japan), was used. All experiments were performed according to the guidelines for the care and use of animals established by the Physiological Society of Japan. The experimental protocol was approved by the Committee on Ethics of Animal Experimentation of the Hamamatsu University School of Medicine (No. 2019026).
The rat was continuously anesthetized with 1.5% isoflurane and placed in a stereotaxic apparatus. The rectal temperature was maintained at 37 °C with a heating pad. In the first measurement, breathing tubes delivered 100% oxygen (1.5% isoflurane in 100% O2) to the rat. Next, 95% oxygen plus 5% carbon dioxide (1.5% isoflurane in 95% O2 + 5% CO2) was supplied for the second measurement, which aimed to observe increased cerebral blood flow. The results are shown in Fig. 12, where the first measurement is labeled "Control" and the second "CO2." Fig. 12(a) shows that the response under the increased cerebral blood flow condition is steeper than under the control condition, owing to the increase in absorption coefficient that accompanies increased cerebral blood flow [25]. The reflectance of both conditions is deconvolved from the responses and shown in Fig. 12(b). The obtained absorption coefficients for "CO2" and "Control" are 0.0331 and 0.0271 mm⁻¹, respectively.

IV. CONCLUSION
This article proposes a CMOS lock-in pixel image sensor with multisimultaneous gates intended for TR-NIRS systems. The optimized pixel structure creates a steep potential slope with no barrier to achieve a fast intrinsic response. Characterization using two simultaneous gates demonstrates an IRF time constant of 240 ps, measured with a laser diode operating at 780 nm. The fast IRF enables the detection of small slope changes at the falling edge of the measured response. Absorption coefficients of two solid phantoms and of a rat are successfully retrieved as experimental validation. The developed CIS-based TR-NIRS system is intended as an initial step toward compact, wearable TR-NIRS applications.