A 12-Bit Column-Parallel Two-Step Single-Slope ADC With a Foreground Calibration for CMOS Image Sensors

This paper proposes a novel 12-bit column-parallel two-step single-slope (SS) analog-to-digital converter (ADC) for high-speed CMOS image sensors. Together with the output offset storage (OOS) technique, a new correlated double sampling (CDS) scheme is adopted to reduce the non-uniformity among column-level ADCs. In the proposed structure, the decision point of the comparator is independent of the input signal, so the variation of the comparator offset caused by the input level is eliminated. Through a foreground calibration, the non-idealities of both the ramp generator and the column ADC are corrected. Designed and simulated in a 130 nm CMOS process, the proposed ADC achieves a differential nonlinearity (DNL) of +0.76/−0.8 LSB and an integral nonlinearity (INL) of +1.06/−0.84 LSB at a sampling frequency of 100 kS/s with the calibration. The effective number of bits (ENOB) is improved from 4.66 bits to 11.25 bits. A single ADC occupies an active area of $7.5 \times 775\ \mu\text{m}^{2}$ and consumes $72\ \mu\text{W}$.


I. INTRODUCTION
The CMOS image sensor (CIS) is an important component of visual systems and has been widely used in digital single-lens reflex (DSLR) cameras, digital camcorders, and medical equipment. With the development of digital imager technology, the demand for CMOS imagers with high-resolution formats, high frame rates, and high ADC resolution has grown rapidly. This in turn requires a higher conversion speed from each column ADC. Therefore, multiple types of column-parallel ADC architectures have been employed to increase the sampling rate, such as successive approximation register (SAR) ADCs [1]-[4], cyclic ADCs [5], [6], and single-slope (SS) ADCs [7]-[11].
Although SAR ADCs have been utilized in various high-speed image sensors, they occupy a large silicon area due to the DAC capacitor array. Cyclic ADCs need less silicon area while keeping a speed comparable to that of SAR ADCs. However, they consume more power owing to the high-gain, high-speed operational amplifier. SS ADCs are the most widely applied in CIS because of their simplicity, low power dissipation, high linearity, and small area. However, the conversion speed of SS ADCs is slow. A $T$-bit SS ADC requires $2^T$ clock cycles, which is much longer than the $T$ clock cycles required by both SAR and cyclic ADCs. Although several high-speed SS ADCs have been reported in [9]-[11], they use an extremely high clock frequency, which results in high power consumption.
The associate editor coordinating the review of this manuscript and approving it for publication was Dušan Grujić.
To overcome the low-speed problem, multiple two-step (TS) SS ADCs have been proposed [12]-[21]. A $T$-bit TS-SS ADC divides the A/D conversion process into an $M$-bit coarse conversion and an $N$-bit fine conversion, where $T = M + N$. Compared with the standard SS ADC, the conversion time of the TS-SS ADC is reduced to $2^M + 2^N$ clock steps, which greatly improves the conversion speed. With a holding capacitor [13], the final analog coarse voltage can be stored. However, any parasitic capacitance at the holding capacitor distorts the fine ramp slope. With a four-input comparator [14], the conversion paths for the coarse ramp and the fine ramp are separated. Although this prevents the deterioration of the fine ramp slope, the ADC linearity is worsened by signal-dependent charge injection. By connecting each reference voltage of the coarse ramp to the comparator input nodes [16], the two-step conversion is realized without any memory capacitors. However, the number of comparator input paths grows exponentially with the resolution of the coarse conversion [17], so the routing inevitably becomes complicated and more switch errors are introduced. Although these TS-SS ADCs are implemented in different circuit topologies, the signal-dependent comparator offset deteriorates the linearity of the ADC in all of the above designs.
In this work, a novel 12-bit two-step SS ADC is proposed. It eliminates the dynamic comparator offset by merging the sampling and conversion paths. Through a foreground calibration, the proposed TS-SS ADC achieves high-speed conversion while ensuring excellent linearity. The outline of this paper is as follows. In Section II, the architecture and operating principle of the proposed TS-SS ADC are described. The non-ideal factors limiting the system linearity are analyzed in Section III. Section IV and Section V give the foreground calibration method and simulation results, respectively. Finally, conclusions are drawn in Section VI.
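The conversion-time comparison made in this section can be checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it counts ideal ramp steps without any redundancy or settling overhead.

```python
def ss_cycles(total_bits: int) -> int:
    """Clock steps of a standard single-slope conversion: 2^T."""
    return 2 ** total_bits


def two_step_cycles(coarse_bits: int, fine_bits: int) -> int:
    """Clock steps of an ideal two-step conversion: 2^M + 2^N."""
    return 2 ** coarse_bits + 2 ** fine_bits


T = 12  # total resolution used as the example
# Step count for every coarse/fine split with M + N = T.
splits = {m: two_step_cycles(m, T - m) for m in range(1, T)}
best_m = min(splits, key=splits.get)

if __name__ == "__main__":
    print(f"single-slope: {ss_cycles(T)} steps")
    for m, steps in splits.items():
        print(f"M={m:2d}, N={T - m:2d}: {steps} steps")
    print(f"fewest steps at M={best_m}")
```

For $T = 12$, the standard SS ADC needs 4096 steps, whereas an ideal 5/7 split needs 160 and an even 6/6 split needs 128.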

II. THE PROPOSED TWO-STEP SS ADC
A. ARCHITECTURE OF THE PROPOSED TWO-STEP ADC
Fig. 1(a) shows the simplified schematic of the 4-transistor (4-T) active pixel sensor (APS) and the proposed TS-SS ADC. The proposed column ADC consists of a multistage comparator, a sampling capacitor ($C_S$), a holding capacitor ($C_H$), a set of switches, delay control logic, and data memory.
The resistor-DAC-based ramp generator, which is shared by all column ADCs, is composed of two parts: a coarse ramp $V_{RC}$ for the $M$-bit coarse conversion and a fine ramp $V_{RF}$ for the $N$-bit fine conversion. The total resolution ($T$ bits) of the proposed ADC is $T = M + N$. $C_S$ is used to sample the reset voltage and the signal voltage, which are both supplied by the 4-T APS. $C_H$ is connected in series with the ramp generator. The top plate of $C_H$ stores the charge finally determined in the coarse conversion. This charge represents the coarse conversion residue. The bottom plate of $C_H$ is driven by the fine ramp to realize the fine conversion. Through the delay control logic, the decision error that occurs in the coarse conversion can be relaxed. The conversion results are latched by the column memory. $V_{CM1}$ and $V_{CM2}$ are the common-mode voltages of preAmp1 and preAmp2, where $V_{CM1} = 1.6$ V and $V_{CM2}$ = [...].
To remove the pixel offsets, a multistage comparator topology with the output offset storage (OOS) technique [22] is used to conduct the CDS operation. Due to the single-ended structure, the input swing of preAmp1 needs to cover the range of the analog input signal for correct conversion. Hence, in preAmp1, PMOS transistors are used for the input pair and the supply voltage is 3.3 V. To prevent the output of preAmp1 from saturating, the gain of preAmp1 is set to 1. Thanks to the direct-current (DC) isolation provided by the offset storage capacitor $C_D$, the other pre-amplifiers can be designed with 1.2 V supplies to reduce the power consumption. From preAmp2 to preAmp4, NMOS input pairs are used to enhance the gain of each amplifier; the gains are 17 dB, 17 dB, and 62 dB, respectively. Since the gain of preAmp1 is only 1, the noise contributions of both preAmp1 and preAmp2 need to be taken into account.
Thanks to the OOS technique (or auto-zeroing), the input-referred noise of preAmp1 is shaped by a comb filter [23], and the low-frequency noise component is suppressed. The simulation shows that the input-referred noises of preAmp1 and preAmp2 are 49 µV rms and 36 µV rms, respectively. The simplified schematic of the multistage comparator is shown in Fig. 1(b).
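To give a feel for why both pre-amplifier noise figures matter when the first stage has unity gain, the sketch below combines them as a root-sum-square and compares the result with one LSB. It rests on assumptions that are not stated in the text: the two noise sources are taken as uncorrelated, the preAmp2 figure is taken as referred to the preAmp2 input (so it is divided by the preAmp1 gain of 1), and the LSB uses the 1.2 V full-scale range and 12-bit resolution quoted elsewhere in the paper. The constant and function names are hypothetical.

```python
import math

PREAMP1_NOISE_UV = 49.0   # input-referred noise of preAmp1 (from the text)
PREAMP2_NOISE_UV = 36.0   # input-referred noise of preAmp2 (from the text)
PREAMP1_GAIN = 1.0        # linear gain of preAmp1 (from the text)
FULL_SCALE_V = 1.2        # full-scale input range quoted in the paper
RESOLUTION_BITS = 12


def rss(*values: float) -> float:
    """Root-sum-square of uncorrelated noise contributions."""
    return math.sqrt(sum(v * v for v in values))


# Assumption: preAmp2 noise is referred back through the preAmp1 gain.
total_noise_uv = rss(PREAMP1_NOISE_UV, PREAMP2_NOISE_UV / PREAMP1_GAIN)
lsb_uv = FULL_SCALE_V / 2 ** RESOLUTION_BITS * 1e6
# Cascaded gain of preAmp1-4 in dB (0 dB for unity gain, then 17, 17, 62).
cascade_gain_db = 0 + 17 + 17 + 62

if __name__ == "__main__":
    print(f"combined noise: {total_noise_uv:.1f} uV rms")
    print(f"1 LSB: {lsb_uv:.1f} uV")
    print(f"cascaded gain: {cascade_gain_db} dB")
```

Under these assumptions the combined noise is roughly 61 µV rms, about one fifth of the 293 µV LSB.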

B. OPERATION OF THE PROPOSED TWO-STEP SS ADC
With the simplified schematic shown in Fig. 1(a), the main operation of the proposed column ADC is performed in four phases: reset sampling, signal sampling, coarse A/D conversion, and fine A/D conversion. The complete timing diagram of the proposed two-step SS ADC is shown in Fig. 2(a).
Reset Sampling. First, with switches RX, $S_S$, $S_H$, $S_R$, and $S_A$ closed, a reset voltage $V_{reset}$ from the pixel is sampled by $C_S$. Then, when the switch RX is turned off, the error voltage $V_E$ from RX, which includes the charge injection and the clock feedthrough, is also sampled by $C_S$ after a certain time delay. Next, by sequentially turning off $S_R$ and $S_S$, the signal-dependent switch errors caused by the sampling switch $S_S$ are eliminated. Finally, through the coarse ramp switch $S_C$, the maximum voltage $V_{CT}$ of the coarse ramp is loaded onto the left plate of $C_S$ (voltage node $P_2$), as shown in Fig. 1(a). The resulting positive input voltage of the comparator, $V_{P1}$, is given by (1), where $P_1$ is the input node of the comparator. With the OOS technique, the input-referred offset $V_{offset1}$ of preAmp1 and the differential input voltage ($V_{P1} - V_{CM1}$) are both amplified and stored on the output coupling capacitor $C_D$. The right-plate charge of $C_D$ is given by (2). When $S_A$ is switched off, the charge stored on $C_D$ remains constant.
Signal Sampling. With TX closed, the photodiode (PD) charges are generated by the incident light. This causes the pixel output $V_{signal}$ to take the value given by (3), where $V_S$ is the actual signal that corresponds to the incident light intensity. Then, $S_C$ is opened and $S_R$ and $S_S$ are closed, and the signal voltage $V_{signal}$ is sampled by $C_S$. Next, as in the reset sampling phase, $S_R$ is turned off before $S_S$.
Coarse A/D Conversion. With the closure of $S_C$, the left plate of $C_S$ is driven by the coarse ramp $V_{RC}$. Considering the charge stored on $C_D$, the equivalent voltage at the input node $P_1$ is given by (4). The input offset of preAmp1, the reset signal from the pixel, and the charge errors from RX are all canceled. Hence, with the output coupling capacitor $C_D$, the OOS-based CDS operation is realized. The essence of the conversion is to determine the sign of $V_{P1} - V_{CM1}$.
During the coarse A/D conversion, $V_{RC}$ starts in synchrony with the coarse counter and sweeps downward from $V_{CT}$ toward $V_{CT} - V_{FS}$. The comparator output $V_O$ changes to logic low when $V_{CT} - V_{RC}$ becomes $(m + 1)\,\Delta V_{RC}$, as shown in Fig. 2(b). $\Delta V_{RC}$ is the minimum conversion voltage of the coarse ramp, which is $V_{FS}/2^M$. Then, the upper $M$-bit memory stores the coarse counter value as the coarse A/D result $D_M$, and $S_H$ is turned off. At this moment, the bottom-plate charge of $C_H$ (node $P_3$) is given by (6). When $V_{RC}$ drops to $V_{CT} - V_{FS}$, the coarse conversion is over and $S_C$ is opened.
Fine A/D Conversion. When $S_F$ is closed, the fine ramp signal $V_{RF}$ is coupled to the positive input of the comparator through $C_H$. According to charge conservation, the equivalent voltage at the input node $P_1$ is given by (7), where the variation of the resulting coupling voltage $V_{P1}$ is $\Delta V_{RC}$, as illustrated by the gray line in Fig. 2(b). The step of the fine ramp is $\Delta V_{RF}$, which is $\Delta V_{RC}/2^N$. When $V_{P1}$ drops below $V_{CM1}$, the comparator output $V_O$ changes to logic low again and the lower $N$-bit memory stores the fine counter value as the fine A/D result $D_N$. The final digital output $D_{OUT}$ is then calculated from (8).
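The four-phase operation reduces, in the ideal case, to a coarse count followed by a fine count on the residue. The sketch below is a behavioral illustration under stated assumptions, not the authors' model: all non-idealities are ignored, the CDS result $V_S$ is applied directly, the nominal split is taken as $M = 5$ and $N = 7$ without redundancy, and the output is formed by placing $D_M$ in the upper bits and $D_N$ in the lower bits. The function name and default values are hypothetical.

```python
def two_step_convert(v_s, v_fs=1.2, m_bits=5, n_bits=7):
    """Ideal two-step single-slope conversion of a CDS signal v_s (volts).

    Returns (d_m, d_n, d_out). Assumes 0 <= v_s < v_fs and no circuit errors.
    """
    dv_rc = v_fs / 2 ** m_bits    # coarse ramp step, V_FS / 2^M
    dv_rf = dv_rc / 2 ** n_bits   # fine ramp step, dV_RC / 2^N

    # Coarse phase: the counter advances until the next ramp drop,
    # (d_m + 1) * dV_RC, would exceed the signal.
    d_m = 0
    while d_m < 2 ** m_bits - 1 and (d_m + 1) * dv_rc <= v_s:
        d_m += 1

    # The residue left after the coarse decision is what the fine ramp resolves.
    residue = v_s - d_m * dv_rc

    # Fine phase: same counting scheme with the fine step.
    d_n = 0
    while d_n < 2 ** n_bits - 1 and (d_n + 1) * dv_rf <= residue:
        d_n += 1

    # Assumed combination: coarse code in the upper M bits, fine code below.
    d_out = (d_m << n_bits) | d_n
    return d_m, d_n, d_out


if __name__ == "__main__":
    for v in (0.0, 0.3, 0.61, 1.19):
        print(v, two_step_convert(v))
```

In this ideal model the result equals a direct 12-bit quantization of $V_S$; the remainder of the paper deals with the errors that break this equivalence.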

III. ACCURACY CONSIDERATION
A. THE NONLINEARITY OF THE RAMP GENERATOR
Generally, the performance of each column ADC is directly determined by the linearity of the ramp generator. To guarantee the monotonicity of the proposed ADC [12], the resistor-string DAC (RDAC) architecture is adopted for the ramp generator. However, for high-resolution applications (beyond 10 bits), the number of switches in the voltage selector grows exponentially with the number of bits. This results in a larger RC output delay, an unacceptable silicon area, and complex metal routing. Hence, a resistor-resistor-string DAC (RRDAC) [24] with two cascaded resistor strings is utilized in this work, as illustrated in Fig. 3. Through the reference buffers (OP$_5$ and OP$_6$), the fine resistor ladder is connected across a unit coarse resistor. Due to the isolation provided by the operational amplifiers (op-amps), the effective resistance of the unit coarse resistor is not reduced by the fine ladder. Similarly, since the input nodes of the output buffers OP$_3$ and OP$_4$ both have high input impedance, the on-resistance of the switches in the voltage selector does not reduce the effective resistance of the unit coarse resistor either, and the mismatch of the switches hardly deteriorates the linearity of the ramp generator. Because the voltage of the ramp generator drops monotonically, only one pair of switches is switched in each conversion period: one switch turns on and the other turns off. Thus, the clock feedthrough errors neutralize each other and disappear rapidly [25], [26]. The major error sources in the RRDAC are resistor mismatch and amplifier offsets.
To investigate the effect of resistor mismatch on the ADC performance, Fig. 4 shows the simulated average ENOB and SFDR as a function of the mismatch deviation $\sigma(\Delta R/R)$. As can be seen, the linearity of the column ADC deteriorates severely as the resistor mismatch increases. When the mismatch deviation is 1%, the ENOB and SFDR are only 9.42 bits and 62.27 dB, respectively. In addition, the input-referred offsets of the op-amps further degrade the ADC performance. In the fine conversion, the bottom plate of $C_H$ is switched from $V_{ref}$ to $V_{RF}$. Due to the offset difference $V_{diff}$ between OP$_4$ and OP$_7$, the effective fine ramp shifts vertically, as shown in Fig. 5(a). The over-ranged ramp causes a dead-band in the final digital output. Moreover, considering the offsets in OP$_5$ and OP$_6$, not only is the conversion range of the effective fine ramp changed, but the ramp slope is also deteriorated. As shown in Fig. 5(b), the shaded regions represent the probable range of the real effective fine ramp caused by the offsets in OP$_5$ and OP$_6$.
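The sensitivity to resistor mismatch can be illustrated with a small Monte Carlo experiment on a single resistor string. This is a simplified sketch and does not reproduce the ENOB and SFDR values of Fig. 4: it assumes independent Gaussian unit resistors, a 32-tap string, and an end-point INL measured in units of one tap. The function names are hypothetical.

```python
import random


def worst_inl_taps(n_taps: int, sigma: float, rng: random.Random) -> float:
    """Worst-case end-point INL of one resistor string, in units of one tap."""
    resistors = [rng.gauss(1.0, sigma) for _ in range(n_taps)]
    total = sum(resistors)
    acc = 0.0
    worst = 0.0
    for i, r in enumerate(resistors, start=1):
        acc += r
        # Actual tap position minus ideal position, both normalized to full scale.
        deviation = acc / total - i / n_taps
        worst = max(worst, abs(deviation) * n_taps)
    return worst


def mean_worst_inl(n_taps: int, sigma: float, runs: int, seed: int = 0) -> float:
    """Average worst-case INL over several randomly mismatched strings."""
    rng = random.Random(seed)
    return sum(worst_inl_taps(n_taps, sigma, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    for sigma in (0.001, 0.005, 0.01, 0.02):
        inl = mean_worst_inl(n_taps=32, sigma=sigma, runs=500)
        print(f"sigma = {sigma:.3f}: mean worst INL = {inl:.4f} coarse taps")
```

Since one coarse tap spans many fine codes, even a few hundredths of a tap of coarse INL corresponds to several LSBs at 12-bit resolution, which is consistent with the need for calibration.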

B. THE NONLINEARITY OF THE COLUMN ADC
In the column ADC, non-idealities such as charge errors of MOS switches, parasitic capacitors, and comparator offsets can also cause the proposed structure to malfunction. After the reset sampling phase, taking these non-ideal factors into account, the resulting positive input voltage $V_{P1}$ in (1) is rewritten as (9), where $V_{RC,offset}$ is the output offset of the coarse ramp, $Q_{SR}$ is the charge error when $S_R$ is opened, and $C_{Cp1}$ is the parasitic capacitance at node $P_1$, as shown in Fig. 1(a). Similarly, when the signal sampling is over and the coarse conversion begins, the equivalent voltage $V_{P1}$ in (4) is re-calculated as (10), where $V_{offsetr}$ is the residual input-referred offset contributed by preAmp2-4 and $Q_{SA}$ is the mismatch in charge injection from the two switches $S_A$. Thanks to the CDS, the output offset $V_{RC,offset}$ of the coarse ramp and the charge error $Q_{SR}$ are both canceled. When the comparator output $V_O$ changes to logic low, $S_H$ is switched off immediately and the charge error $Q_{SH}$ is injected into the bottom plate of $C_H$. After $V_{RC}$ drops to $V_{CT} - V_{FS}$, $S_C$ is opened and the charge error $Q_{SC}$ flows into node $P_2$. During the fine A/D conversion, the equivalent voltage $V_{P1}$ in (7) is re-calculated as (11), where $C_{Cp2}$ and $C_{Cp3}$ are the parasitic capacitances at nodes $P_2$ and $P_3$, respectively, and $V_{E,SW}$ is the switch error caused by $S_C$ and $S_H$. As seen from (11), the parasitic capacitors cause a severe ramp error, which degrades the linearity of the proposed ADC. The effective coarse ramp voltage used for the fine conversion is no longer the original coarse ramp $V_{RC}$ but $f(V_{RC})$ in (11), as illustrated by the blue line in Fig. 6. The variation in the coarse ramp slope causes a large voltage gap such as $V_D$, which is far beyond the conversion range of the fine ramp. For example, when $V_{RC}$ drops to $V_{H2}$ and $S_H$ is opened at time $T_2$, as shown in Fig. 6, the resulting voltage at node $P_1$ is $V_H$ instead of $V_{H2}$.
To eliminate this voltage gap, the switch $S_H$ would need to be turned off in advance, at time $T_1$. However, this cannot be implemented physically in a causal system. Instead, once $V_{RC}$ has dropped to $V_{CT} - V_{FS}$, it is reset to $V_{CT}$, and $S_C$ is then opened. The resulting effective coarse ramp is changed to (12), which is the red line shown in Fig. 6. The calibrated coarse ramp is a vertical shift of the real ramp. To ensure that the voltage used in the fine conversion is $V_{H2}$, the opening time of $S_H$ is simply delayed by $T_D$. Therefore, by constructing a proper delay chain and expanding the range of the coarse ramp signal ($V_{Re}$), this coarse ramp error can be alleviated. However, there is still a serious slope difference between the effective coarse ramp and the effective fine ramp, whose slopes are denoted as $\beta\gamma$ and $\beta$, respectively, as shown in (11).
Since the input common-mode voltage $V_{CM1}$ is kept at the same level during the whole conversion process, the decision point of the comparator is independent of the input signal, and the variation of the comparator offset caused by the input level is canceled [27], [28]. The residual input-referred offset $V_{offsetr}$ can be viewed as a constant static voltage. Since switches $S_C$, $S_H$, and $S_A$ are all connected to fixed voltages, the resulting charge errors $Q_{SC}$, $Q_{SH}$, and $Q_{SA}$ are also regarded as static errors when these switches are turned off. Hence, the above static errors of the column ADC can be corrected by a foreground calibration, as described in Section IV. Since the residual comparator offset $V_{offsetr}$ and the charge error $Q_{SA}$ exist throughout the whole conversion, they do not introduce an extra conversion dead-band. However, the charge errors $Q_{SC}$ and $Q_{SH}$ exist only in the fine conversion, so they cause a vertical shift of the effective fine ramp, as shown in Fig. 5(a), and thus also generate a dead-band in the digital output.

IV. FOREGROUND CALIBRATION
In this work, the conversion process is re-divided into a 5-bit coarse conversion and an 8.5-bit fine conversion. The conversion range of the fine ramp is extended by $\pm 0.75\,\Delta V_{RC}$ to cover the decision error that occurs in the coarse conversion. Similarly, to calibrate the slope degradation of the coarse ramp caused by parasitic capacitors, the range of the original coarse ramp is also slightly extended, as analyzed in Section III-B. The numbers of conversion steps in the coarse and fine conversions are 41 and 320, respectively. The clock period of the coarse conversion needs to be extended to ensure that the coarse ramp settles to within 0.5 LSB during a coarse conversion cycle. The clock periods of the coarse conversion and the fine conversion are 60 ns and 20 ns, respectively. The main clock frequency of the proposed ADC is 50 MHz. Fig. 7 shows a complete schematic of the proposed ramp generator. A 5-bit redundant R-string with a unit resistor $R_{U1}$ of 190 Ω constitutes the coarse ramp generator, and the fine ramp generator is composed of an 8.5-bit R-string with a unit resistor $R_{U2}$ of 10 Ω.
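The step counts and clock periods quoted above fix the time spent on the two ramps, which can be tallied as follows. This is a back-of-the-envelope sketch: it assumes the 100 kS/s sampling rate reported for this design, a nominal 128 fine codes per coarse step, and that whatever remains of the sample period is available for the reset and signal sampling phases. The constant names are hypothetical.

```python
COARSE_STEPS = 41        # coarse conversion steps (from the text)
FINE_STEPS = 320         # fine conversion steps (from the text)
T_COARSE_NS = 60         # coarse clock period in ns (from the text)
T_FINE_NS = 20           # fine clock period in ns (from the text)
MAIN_CLOCK_MHZ = 50      # main clock frequency (from the text)
SAMPLE_RATE_KSPS = 100   # sampling rate reported for the design

main_period_ns = 1000 // MAIN_CLOCK_MHZ          # 20 ns per main clock cycle
coarse_cycles_per_step = T_COARSE_NS // main_period_ns

coarse_time_ns = COARSE_STEPS * T_COARSE_NS
fine_time_ns = FINE_STEPS * T_FINE_NS
ramp_time_ns = coarse_time_ns + fine_time_ns

sample_period_ns = 1_000_000 // SAMPLE_RATE_KSPS
# Assumption: the rest of the period is left for the two sampling phases.
remaining_ns = sample_period_ns - ramp_time_ns

# Fine range of -0.75 to +1.75 coarse steps at an assumed 128 codes per step.
fine_steps_from_range = int((1.75 + 0.75) * 128)

if __name__ == "__main__":
    print(f"coarse ramp: {coarse_time_ns} ns, fine ramp: {fine_time_ns} ns")
    print(f"total ramp time: {ramp_time_ns} ns of {sample_period_ns} ns")
    print(f"left for sampling phases: {remaining_ns} ns")
```

With these numbers each coarse step lasts three main clock cycles, the two ramps take 8.86 µs in total, and the extended fine range of $2.5\,\Delta V_{RC}$ matches the 320 fine steps.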
Considering the sources of the non-ideal factors (the ramp generator and the column ADC), the foreground calibration is performed in two steps: 1) a coarse correction for the input-referred offsets of the op-amps in the ramp generator, and 2) a fine correction for the ramp slope degradation and the resistor mismatch. In the coarse calibration, the offsets of op-amps OP$_{4-6}$ in the ramp generator are alleviated by adjusting the transconductance of their input transistors according to 6-bit digital codes [29]. Fig. 8 shows a simplified schematic diagram of the op-amp used for OP$_{4-6}$. Because the fine ramp signal is used as the reference element to represent the coarse ramp [30], the fine calibration is performed by the ramp generator itself and requires only a small amount of digital logic.

A. THE COARSE CALIBRATION
Considering the sources of the ramp offsets, as elaborated in Section III-A, the coarse calibration is divided into two major steps: one corrects the offsets of OP$_5$ and OP$_6$, and the other compensates the offset difference between OP$_4$ and OP$_7$. The proposed coarse calibration flow chart is presented in Fig. 9.
Step 1. Ideally, the fine quantization code of the coarse step voltage $\Delta V_{RC}$ is '10000000' (decimal 128). By changing the input transconductance of OP$_{5-6}$, the conversion range of the fine ramp is gradually adjusted. The fine ramp is then used to measure and express the step voltage $\Delta V_{RC}$ of the coarse ramp. When the actual fine quantization code of $\Delta V_{RC}$ is close to the ideal value, the fine ramp is corrected and the input offsets of OP$_{5-6}$ are canceled.
First, to ensure that the conversion range of the fine ramp is $-0.75\,\Delta V_{RC}$ to $1.75\,\Delta V_{RC}$ and to avoid interference from the input offset of OP$_4$, the bottom plate of the holding capacitor $C_H$ needs to be connected to the fine ramp voltage $V_{RF,224}$ instead of $V_{ref}$. $V_{RF,i}$ is the fine ramp output voltage when $V_F\langle i\rangle$ is selected as the fine resistor ladder voltage. The inputs of OP$_5$ and OP$_6$ are connected to $V_C\langle 14\rangle$, $V_C\langle 16\rangle$ and $V_C\langle 16\rangle$, $V_C\langle 18\rangle$, respectively. This places the common-mode voltages of OP$_{5-6}$ close to $V_C\langle 16\rangle$, which is half of the input signal. The calibration range ($4\,\Delta V_{RC}$, i.e., $V_C\langle 14\rangle - V_C\langle 18\rangle$) is much larger than the actual conversion range of the fine ramp ($2.5\,\Delta V_{RC}$), so that the offsets of OP$_{5-6}$ can be corrected.
Then, through $S_C$, the coarse ramp voltage $V_{RC,0}$ is sampled by $C_S$. Similar to the fine ramp voltage $V_{RF,i}$, $V_{RC,i}$ is the coarse ramp output voltage when $V_C\langle i\rangle$ is selected as the coarse resistor ladder voltage. In the coarse conversion, the coarse ramp voltage remains at $V_{RC,0}$. Since $V_{RC,0}$ is treated as the maximum voltage $V_{CT}$ of the coarse ramp, this effective coarse voltage is not affected by parasitic capacitors when the switches $S_F$ and $S_C$ are turned off in sequence, as analyzed in (12). After the fine conversion, the quantization code $D_{F1}$ is obtained. Since auto-zeroing is used to cancel the offset of preAmp1 in each conversion, $D_{F1}$ indicates the residual comparator offset $V_{offsetr}$ and the switch errors. Similarly, with the sampling of $V_{RC,-1}$ and the loading of $V_{RC,0}$, the fine conversion code $D_{F2}$ is generated. The code difference between $D_{F1}$ and $D_{F2}$, denoted $\Delta D$, is the conversion code of the actual coarse step $\Delta V_{RC}$. Since the initial values of $D_{CAL1}$ and $D_{CAL2}$ are '111111' and '000000', respectively, the range of the fine ramp is initially far beyond $2.5\,\Delta V_{RC}$. Therefore, the code gap $\Delta D$ is less than 128, which is the full scale of the ideal fine ramp. Controlled by the counter value, $D_{CAL1}$ is decremented by 1 when the counter is odd and $D_{CAL2}$ is incremented by 1 when the counter is even, which adjusts the fine ramp range. The calibration process is repeated in this fashion until $\Delta D > 127$, as shown in Fig. 9(a), at which point the offsets of OP$_{5-6}$ are both corrected.
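The alternating adjustment of $D_{CAL1}$ and $D_{CAL2}$ in Step 1 can be sketched as a simple loop. The code below is only an illustration of the control flow: `measure_delta_d` is a hypothetical callback standing in for the two fine conversions that produce $\Delta D$, and the linear toy model used in the example is an assumption, not circuit behavior.

```python
def calibrate_fine_range(measure_delta_d, target=127, code_bits=6):
    """Alternately step D_CAL1 down and D_CAL2 up until delta_D exceeds target.

    measure_delta_d(d_cal1, d_cal2) is a hypothetical stand-in for measuring
    the fine code of one coarse step with the given trim codes.
    Returns the final (d_cal1, d_cal2, counter).
    """
    max_code = 2 ** code_bits - 1
    d_cal1 = max_code   # initial value '111111'
    d_cal2 = 0          # initial value '000000'
    counter = 0
    while measure_delta_d(d_cal1, d_cal2) <= target:
        if counter >= 2 * max_code:
            raise RuntimeError("trim codes exhausted before delta_D > target")
        counter += 1
        if counter % 2 == 1:
            d_cal1 -= 1   # odd count: decrement D_CAL1
        else:
            d_cal2 += 1   # even count: increment D_CAL2
    return d_cal1, d_cal2, counter


def toy_measurement(d_cal1, d_cal2):
    """Assumed toy model: delta_D grows by one code per two trim steps."""
    return 110 + ((63 - d_cal1) + d_cal2) // 2


if __name__ == "__main__":
    print(calibrate_fine_range(toy_measurement))
```

With the toy model, the loop stops after 36 iterations with the two trim codes at 45 and 18, when the measured code first reaches 128.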
Step 2. If the sampled signal and the loaded signal are both $V_{CT}$, the charge stored on $C_S$ and $C_H$ represents the residual comparator offset $V_{offsetr}$ and the switch errors at the end of the coarse conversion, as analyzed in Step 1. Next, when the bottom plate of $C_H$ is switched from $V_{RF,224}$ to $V_{ref}$, the switch errors from $S_F$ and $S_C$ and the voltage difference between $V_{RF,224}$ and $V_{ref}$ are all introduced into the stored charge. The offset difference between OP$_4$ and OP$_7$ directly determines the voltage difference between $V_{RF,224}$ and $V_{ref}$.
By changing the input transconductance of OP$_4$ until the output of the comparator flips, these static errors are made to neutralize each other. As $D_{CAL0}$ decreases, $V_{ref}$ gradually approaches $V_{RF,224}$. When $V_O$ changes to low, the offset difference between OP$_4$ and OP$_7$ is canceled, as shown in Fig. 9(b).

B. THE FINE CALIBRATION
Due to the limited correction accuracy (6-bit embedded DAC), the ramp offsets cannot be completely removed. The residual offset of the ramp generator needs to be re-corrected in the fine calibration, along with the resistor mismatch, the parasitic capacitors, and the residual comparator offset $V_{offsetr}$. First, to compensate for the degradation of the original coarse ramp caused by parasitic capacitors, a delay chain needs to be constructed. Then, because the resistor variation in the fine ramp generator contributes insignificant errors to the conversion accuracy [30], the weights of the coarse ramp can be sensed and digitized by the fine ramp. The proposed fine calibration flow chart is presented in Fig. 10, and the operation is as follows.
Step 1. The essence of the delay chain is to let the degraded effective coarse ramp approach the original coarse ramp until the voltage difference between the two is within $0.5\,\Delta V_{RC}$. As the original coarse ramp declines, the voltage gap between the effective ramp and the original ramp gradually increases, as shown in Fig. 6. This causes the delay time corresponding to each original coarse ramp voltage to grow step by step.
First, the maximum delay time $TD_{MAX}$ is reset to 0. To obtain the delay information of any original coarse ramp voltage $V_{RC,i}$, this voltage is sampled by the sampling capacitor $C_S$. As the loaded coarse ramp is gradually reduced in the coarse conversion (that is, as the sequence number $j$ of the coarse ramp increases), the voltage gap between the effective ramp and the original ramp is eventually brought within $0.5\,\Delta V_{RC}$. At this moment, the absolute value $|DE_F - 224|$ of the effective fine conversion code is less than 64. Then, the conversion code $DE_F$ of the ramp voltage $V_{RC,i}$ is stored as $DE_{F,i}$ in the memory, and the delay information $TD$ is the difference between the sequence number $j$ of the loaded ramp and the sequence number $i$ of the sampled ramp. If $TD$ exceeds $TD_{MAX}$, $TD_{MAX}$ is set to $TD$ and the sequence number $i$ of the sampled ramp is stored in the delay latch array. The calibration process is repeated until the delay information of all coarse ramp voltages (except the redundancy) is obtained, as shown in Fig. 10(a). The delay control circuit consists of the delay chain and the delay latch array, as shown in Fig. 11. By comparing the coarse conversion code with the codes stored in the delay latch array, the location of the coarse conversion code is determined and the corresponding delay time is obtained.
Step 2. Ideally, with the sampling of $V_{RC,i+1}$ and the loading of $V_{RC,i}$, the weight information of the $i$-th resistor in the coarse ladder is quantified by the fine ramp. However, considering the parasitic capacitors, the actual loaded coarse voltage is $V_{RC,i+TD}$, where the delay time $TD$ is decided by the delay chain according to the sequence number $i$. After the fine conversion, the weight of the $i$-th resistor, $WR_i$, is the difference between the two fine conversion codes $DW_{Fi}$ and $DE_{Fi}$, as depicted in Fig. 10(b). $DW_{Fi}$ includes not only the weight information of the unit resistor but also the residual conversion error of the delay chain and the non-idealities of the circuit, whereas $DE_{Fi}$ contains only the error information included in $DW_{Fi}$. The weight $W_i$ of the ramp voltage $V_{RC,i}$ is the sum of the prior voltage weight $W_{i-1}$ and the resistor weight $WR_i$. $W_0$ indicates the residual comparator offset and the static switch errors, as described in Section III-B.
When the reset sampling, the signal sampling, and the coarse conversion have been completed in sequence during the normal conversion, the coarse conversion code $DC$ is obtained. Then, through the delay chain, the coarse ramp voltage $V_{RC,DC+TD}$ is loaded onto the top plate of $C_H$, as shown in Fig. 10(c). After the fine conversion, the final digital output is given by (13), where $W_{DC}$ is the calibrated weight of the coarse ramp voltage $V_{RC,DC}$ and $DF$ is the fine conversion code. The proposed foreground calibration thus corrects the nonlinearity errors from both the ramp generator and the column ADC.
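The weight bookkeeping of Step 2 amounts to a running sum of measured resistor weights. The sketch below illustrates only the two relations stated in the text, $WR_i = DW_{Fi} - DE_{Fi}$ and $W_i = W_{i-1} + WR_i$; the function name and the example codes are hypothetical, and the way the weight is combined with the fine code in the final output is deliberately left out.

```python
def coarse_weights(w0, dw_codes, de_codes):
    """Build the calibrated coarse-ramp weights W_0 .. W_n.

    w0        : static error weight W_0 (residual offset and switch errors)
    dw_codes  : fine codes DW_F1 .. DW_Fn (resistor weight plus errors)
    de_codes  : fine codes DE_F1 .. DE_Fn (errors only)
    """
    if len(dw_codes) != len(de_codes):
        raise ValueError("DW and DE code lists must have the same length")
    weights = [w0]
    for dw, de in zip(dw_codes, de_codes):
        wr = dw - de                      # WR_i = DW_Fi - DE_Fi
        weights.append(weights[-1] + wr)  # W_i = W_(i-1) + WR_i
    return weights


if __name__ == "__main__":
    # Made-up example codes centered near the ideal 128 codes per coarse step.
    print(coarse_weights(3, [352, 351, 353], [224, 225, 224]))
```

Because each $W_i$ accumulates all previous $WR_i$, any quantization error in a single resistor weight propagates to every later weight.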

V. SIMULATION RESULTS
A. BEHAVIOR SIMULATION
A behavioral model of the proposed ADC is created in Matlab, and the simulation accounts for the resistor mismatch, the parasitic capacitors, the charge injection and feedthrough of the non-ideal MOS switches, the offsets of the reference op-amps, and the comparator offset and noise.
In the 130 nm CMOS process, the resistor variation $\sigma(\Delta R/R)$ is 1%. Because metal-oxide-metal (MOM) capacitors are used, the top- and bottom-plate parasitic capacitances of the corresponding capacitors are both 3%. From the post-layout extraction, the parasitic capacitance caused by the input pair of the comparator is 25 fF. The standard deviations of the comparator offset and the reference op-amp offsets are all 7 mV. A 1000-run Monte Carlo simulation is performed to estimate the nonlinearity performance, as shown in Fig. 12. Since each calibrated weight of the coarse ramp is the accumulation of the resistor weights, the quantization error in each resistor weight slightly degrades the linearity of the coarse ramp [31].
With the foreground approach, the non-ideal errors of each ADC in the CIS can be corrected in sequence. This allows all column ADCs to share a global calibration engine, which greatly improves the area efficiency of a single ADC. For an ADC array with 1024 columns, the coarse calibration for the ramp offsets and the construction of the delay chain are both based on the first column. The delay latch array in Fig. 11 can also be shared by multiple columns to reduce the silicon area. Then, with a unified delay chain, the voltage information of the coarse ramp can be quantified in each ADC. Considering the size variation of the MOS switches and the mismatch of the comparator offsets, the static error weight $W_0$ of each ADC needs to be measured and stored. However, owing to the different parasitic capacitors, the coarse ramp presents a different effective coarse weight in each ADC. For any original coarse ramp voltage $V_{RC,i}$, there are 1024 corrected results because of the 1024 column ADCs. Storing all of these results in memory is impractical, since it would consume a huge area. Hence, the column-to-column mismatch is mitigated by an averaging operation.
The fine calibration is performed column by column, from ADC$_1$ to ADC$_{1024}$. The effective resistor weights $WR_i$ with the same sequence number $i$ in different columns are accumulated until the correction of the last column is completed. Then, with the averaging operation, the unified weight information of the resistors is generated. Next, the weights of the coarse ramp voltages are given by (14). Similarly, a behavioral model of the ADC array is created in Matlab. The capacitors $C_H$ and $C_S$ are implemented with 400 fF and 1 pF, respectively, in the 130 nm CMOS process. The larger capacitor plate results in a smaller capacitor mismatch $\sigma(\Delta C/C)$ of only 0.1%. The size variation of the MOS transistors is 5%, resulting in different switch errors and parasitic capacitors. In each simulation, the performance of the ADC array is calculated as the distribution (average value $\mu$ and standard deviation $\sigma$) of each specification over all column ADCs. With a 1000-run Monte Carlo simulation, the effect of the foreground calibration on the ADC array is visualized in Fig. 13. Owing to the weighted averaging operation, the quantization error that exists in each resistor weight is reduced. Compared with the single ADC, the dynamic performance of the entire 1024-column ADC array is slightly improved. The small performance deviation in Fig. 13 demonstrates the excellent consistency of the ADC array with the proposed calibration.
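The column-by-column accumulation and averaging of resistor weights can be sketched as follows. This is an assumption-laden illustration rather than the paper's formula: a plain arithmetic mean is used for the averaging, and each column's coarse weights are assumed to be its own stored $W_0$ plus the running sum of the shared averaged resistor weights. The function names and example values are hypothetical.

```python
def averaged_resistor_weights(wr_by_column):
    """Average WR_i over all columns for each sequence number i.

    wr_by_column is a list with one list of resistor weights per column.
    """
    n_cols = len(wr_by_column)
    n_res = len(wr_by_column[0])
    if any(len(col) != n_res for col in wr_by_column):
        raise ValueError("every column must provide the same number of weights")
    return [sum(col[i] for col in wr_by_column) / n_cols for i in range(n_res)]


def column_weights(w0, shared_wr):
    """Assumed per-column weights: own W_0 plus cumulative shared WR_i."""
    weights = [w0]
    for wr in shared_wr:
        weights.append(weights[-1] + wr)
    return weights


if __name__ == "__main__":
    # Made-up resistor weights measured in three columns.
    measured = [[128, 126], [130, 128], [126, 127]]
    shared = averaged_resistor_weights(measured)
    print(shared)
    print(column_weights(2, shared))
```

Only one shared list of resistor weights and one $W_0$ per column need to be stored in this scheme, instead of a full weight table for each of the 1024 columns.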

B. POST-LAYOUT SIMULATION
The proposed two-step SS ADC is implemented in a 0.13-µm CMOS technology as shown in Fig. 14. The layout consists of a single-column ADC, a 5-bit coarse ramp, an 8.5-bit fine ramp, a BGR for reference generator, and digital blocks for timing control. The foreground calibration engine is performed off-chip for algorithm debugging and optimization. The total area of the chip is 1615 × 755 µm 2 , with the active area of the proposed single-column ADC taking up only 7.5 × 775 µm 2 . Within this size, the capacitor array, which includes the sampling capacitor C S and the holding capacitor C H , occupies 18.5% area. The comparator, which consists of the PreAmp1-4 and the CDS capacitor C D , occupies 57.5% area. The logic circuits occupy and 24% area. The full-scale input signal range is 1.2V. The power supplies in the digital and analog domains are 1.2V and 3.3V.
The HSPICE post-layout simulation of the proposed two-step SS ADC is also performed. Fig. 15(a) shows the simulated static performance (DNL and INL). With the digital calibration off, the peak DNL error is +132.93/−1 LSB and the peak INL error is +18.6/−195.63 LSB. Both DNL and INL show significant code gaps when a code transition occurs in the upper 5 bits, which indicates a severe slope difference between the coarse ramp and the fine ramp. Fig. 15(b) shows the simulated fast Fourier transform (FFT) spectrum at a sampling frequency f_s of 100 KS/s with a 48.8 kHz input signal. Before the calibration, the SFDR, SNDR, and ENOB are 35.31 dB, 29.84 dB, and 4.66 bits, respectively.
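The reported pre-calibration ENOB follows from the SNDR through the standard sine-wave relation ENOB = (SNDR − 1.76 dB) / 6.02 dB. The short check below is a sketch for the reader; the function name is hypothetical and the paper does not state which formula was used.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


# Pre-calibration SNDR quoted in the text for Fig. 15(b).
print(round(enob_from_sndr(29.84), 2))  # -> 4.66
```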
With the foreground calibration, the non-ideal errors from the ramp generator and the column ADC are both corrected. The calibrated DNL and INL are shown in Fig. 16(a). The DNL and INL are significantly suppressed to +0.76/−0.8 LSB and +1.06/−0.84 LSB, respectively. Similarly, the dynamic performance is also improved and most tones are suppressed, as illustrated in Fig. 16(b). However, owing to the accumulation of the quantization error, some residual harmonics remain. The simulated SFDR, SNDR, and ENOB are improved to 78.55 dB, 69.49 dB, and 11.25 bits, respectively. The total power consumption of the ADC core is only 72 µW, which comprises 15.6 µW for the digital logic and 56.4 µW for the analog parts. The Walden FoM of the proposed ADC is 296 fJ/step at the sampling frequency f_s of 100 kHz. The Monte Carlo analysis is also performed in the post-layout simulation, as illustrated in Fig. 17 and Fig. 18. With the proposed calibration, the results show that the minimum ENOB and SFDR are 10.54 bits and 67.06 dB, respectively. The maximum |DNL| and |INL| are kept within 1 LSB and 4 LSB, respectively.
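The power budget and the figure of merit quoted above can be cross-checked with the usual Walden definition, FoM = P / (f_s · 2^ENOB). The sketch below takes the reported values as inputs; the function and variable names are illustrative and not from the paper.

```python
def walden_fom_fj(power_w, fs_hz, enob_bits):
    """Walden figure of merit in fJ per conversion step."""
    return power_w / (fs_hz * 2.0 ** enob_bits) * 1e15


# Reported power partition: digital logic + analog parts.
total_power_w = 15.6e-6 + 56.4e-6

# Reported post-calibration ENOB (11.25 bits) at f_s = 100 kS/s.
fom_fj = walden_fom_fj(total_power_w, 100e3, 11.25)

print(round(total_power_w * 1e6, 1))  # -> 72.0 (µW)
print(round(fom_fj))                  # -> 296 (fJ/step)
```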
The performance of the proposed TS-SS ADC is compared with other state-of-the-art column ADCs in Table 1. The power consumption of this work is not the lowest, mainly owing to the high supply voltage of 3.3 V used in the first preamplifier of the comparator. Owing to the delay chain in each column ADC, the area of the proposed ADC is slightly larger than that of the designs in [7], [17]. However, through the foreground calibration and the proposed CDS operation, this TS-SS ADC realizes a good trade-off among readout speed, silicon area, and power consumption. It not only achieves a higher sampling rate and resolution, but also maintains a competitive FoM and area efficiency [2].

VI. CONCLUSION
This paper presents a 12-bit column-parallel TS-SS ADC. The resolutions of the coarse conversion and the fine conversion are 5 bits and 8.5 bits, respectively. With the OOS circuit topology, a novel CDS technique is implemented. The decision point of the comparator remains at V_CM1, which eliminates the dynamic comparator offset. Through the foreground calibration, the resistor mismatch and op-amp offsets in the ramp generator are corrected, as well as the parasitic capacitors and switch errors in the column ADC. The DNL and INL are +0.76/−0.8 LSB and +1.06/−0.84 LSB, respectively. At the sampling frequency f_s of 100 KS/s, the column ADC achieves an SFDR, SNDR, and ENOB of 78.55 dB, 69.49 dB, and 11.25 bits, respectively, with a 48.8 kHz input signal. The FoM and area efficiency of the ADC are 296 fJ/step and 2.39 µm²/code, respectively. The simulation results demonstrate that the proposed two-step SS ADC is suitable for high-resolution, high-speed image sensors in consumer electronics.
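The paper does not spell out how the area efficiency is normalized. As a hedged cross-check, dividing the reported active area by 2^ENOB (an assumed definition, with a hypothetical function name) reproduces the quoted figure, whereas dividing by the nominal 2^12 codes would give a smaller value.

```python
def area_per_effective_code(width_um, height_um, enob_bits):
    """Active area divided by 2**ENOB, in µm² per code (assumed definition)."""
    return width_um * height_um / 2.0 ** enob_bits


# Reported single-column active area (7.5 x 775 µm²) and ENOB (11.25 bits).
print(round(area_per_effective_code(7.5, 775.0, 11.25), 2))  # -> 2.39
```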