Offset-Canceling Current-Latched Sense Amplifier With Slow Rise Time Control and Reference Voltage Biasing Techniques

The current-latched sense amplifier (CLSA) is a promising candidate for detecting stored values in a memory cell. As technology shrinks, however, the input-referred offset voltage (<inline-formula> <tex-math notation="LaTeX">$V_{\mathrm {OS}}$ </tex-math></inline-formula>) in the SA increases, resulting in a degradation of the memory read yield. To obtain a high read yield, <inline-formula> <tex-math notation="LaTeX">$V_{\mathrm {OS}}$ </tex-math></inline-formula> reduction and cancellation techniques have become essential in deep-submicrometer technology nodes. When determining the <inline-formula> <tex-math notation="LaTeX">$V_{\mathrm {OS}}$ </tex-math></inline-formula> in the CLSA, the voltage mismatch of the input NMOS pair is the dominant factor (<inline-formula> <tex-math notation="LaTeX">$\sim $ </tex-math></inline-formula>75%), followed by that of the latch NMOS pair (<inline-formula> <tex-math notation="LaTeX">$\sim $ </tex-math></inline-formula>25%). In this paper, 1) a slow rise time (<inline-formula> <tex-math notation="LaTeX">$T_{\mathrm {RISE}}$ </tex-math></inline-formula>) control technique for the SA enable signal and 2) a reference voltage (<inline-formula> <tex-math notation="LaTeX">$V_{\mathrm {REF}}$ </tex-math></inline-formula>) biasing technique are proposed, and the effectiveness of the proposed techniques is analyzed for the conventional CLSA with footswitch (FS-CLSA) and the offset-canceling CLSA (OC-CLSA). Post-layout HSPICE simulation results using 28 nm model parameters show that the FS-CLSA with size-up strategy (OC-CLSA) achieves a 17.7% (10.5%) reduction of the standard deviation of <inline-formula> <tex-math notation="LaTeX">$V_{\mathrm {OS}}$ </tex-math></inline-formula> (<inline-formula> <tex-math notation="LaTeX">$\sigma _{\mathrm {OS}}$ </tex-math></inline-formula>) when a slow <inline-formula> <tex-math notation="LaTeX">$T_{\mathrm {RISE}}$ </tex-math></inline-formula> of 0.6 ns is employed. 
The measurement results from a 28 nm test chip show that the OC-CLSA with <inline-formula> <tex-math notation="LaTeX">$V_{\mathrm {REF}}$ </tex-math></inline-formula> biasing achieves a 22% reduction of <inline-formula> <tex-math notation="LaTeX">$\sigma _{\mathrm {OS}}$ </tex-math></inline-formula> compared to the conventional OC-CLSA.


I. INTRODUCTION
WHEN designing a memory, the sense amplifier (SA) is an essential peripheral circuit because it senses a small differential input and amplifies it to a digital value (1 or 0). This can significantly reduce the power consumed in a read operation [1]. Because the latch-type SA consists of a cross-coupled inverter structure, its positive feedback enables low power consumption and a high-speed read operation. Therefore, it is widely used in various applications [2], [3], [4]. There are two representative latch-type SAs, namely, a voltage-latched SA with an NMOS footswitch and PMOS access transistors (FSPA-VLSA) and a current-latched SA with an NMOS footswitch (FS-CLSA), as shown in Fig. 1 [5]. The VLSA senses a small input voltage difference (ΔV) between the bit line voltage (V BL ) and the bit line bar voltage (V BLB ). The CLSA senses the difference between the currents flowing through an additional differential input transistor pair (MN3 and MN4 in Fig. 1(b)). The VLSA has better performance in terms of area and speed than the CLSA [5].
However, when a global reference voltage (V REF ) generator circuit, which is shared by all the V REF (= V BLB ) nodes, is used to save power [6], [7], the VLSA can be vulnerable to noise from the output nodes, unlike the CLSA. In other words, noise coupled from the OUTB node to the BLB node makes the output of the global V REF generator temporarily non-constant, because the VLSA's output nodes are directly connected to its input nodes by the access transistors (MP3 and MP4 in Fig. 1(a)). Thus, when the global V REF generator is used, the CLSA, with separate input and output nodes, is better than the VLSA.
In the CLSA, to successfully detect the stored values in a memory cell during the read operation, the following two conditions must be satisfied: 1) V BL and V REF must be greater than the threshold voltage (V TH ) of MN4 and MN3, respectively. If V BL and V REF are smaller than V TH , then MN3/MN4 turn off, leading to sensing failure. This input voltage range is called the sensing dead zone of the SA [5]. 2) ΔV must be greater than the input-referred offset voltage (V OS ). V OS is dominantly generated by the V TH mismatch of transistor pairs [3], [8], [9], [10], which is induced by process variations, such as random dopant fluctuation [11], [12], [13]. Moreover, the read yield can be statistically expressed by these two factors (ΔV and V OS ) modeled by Gaussian distributions. The read yield is represented as the read-access pass yield for a single cell (RAPY CELL ) [14].
However, as the technology node scales down and the supply voltage (V DD ) decreases, the process variation increases significantly, leading to a greater V TH mismatch of the transistor pair. The mismatch ends up having a more significant impact on V OS . If V OS is higher, a larger ΔV is required for accurate sensing, which results in greater power consumption and a delay in correct sensing. Thus, to improve the read yield, V OS must be minimized. The most straightforward way to reduce V OS is to increase the size of the transistors. Another straightforward method is to use a higher V DD for a larger ΔV. However, these two techniques are not desirable in deep-submicrometer technology nodes, because of area overhead and increased power consumption. For this reason, V OS reduction and cancellation techniques have become essential in deep-submicrometer technology nodes.
In this paper, we first analyze the V OS of the conventional CLSA using I-V curves. Then, 1) we propose the slow T RISE control technique of the SAE signal for the CLSA [21], apply it to the FS-CLSA and the OC-CLSA, and compare the σ OS , area, sensing time, and energy of the two SAs using post-layout simulations. In addition, 2) we propose the V REF biasing technique for the OC-CLSA and analyze its effectiveness using a fabricated 28 nm test chip.
The remainder of this paper is organized as follows. Section II describes V OS analysis and operation of the conventional CLSA and OC-CLSA. Section III introduces the proposed slow T RISE control technique of the SAE signal for CLSA. Section IV introduces the proposed V REF biasing technique for OC-CLSA. Section V presents the conclusions.

II. V OS ANALYSIS AND OPERATION OF CONVENTIONAL CLSA AND OC-CLSA
The CLSA consists of the input NMOS transistor pair (MN3/MN4), the latch NMOS transistor pair (MN1/MN2), the latch PMOS transistor pair (MP1/MP2), the precharge PMOS transistor pair (MP3/MP4), and the NMOS footswitch (MNFOOT), as shown in Fig. 1(b). The sensing operation of the CLSA is as follows: When the SAE signal is low (deactivated), MP3/MP4 are turned on. Then, the OUT and the OUTB nodes are precharged to V DD , and the differential input voltages (V BL and V REF ) are captured. V BL and V REF are generated from the BL in a cell array (or sensing circuit) and the global voltage generator, respectively. When the SAE signal becomes high (activated), MP3/MP4 turn off and MNFOOT turns on. Then, as the sensing current begins to flow through MN1/MN2 and MN3/MN4, the voltages of the OUT/OUTB nodes (V OUT and V OUTB ) start to decrease from V DD . The cross-coupled inverter structure (MP1/MN1 for one inverter and MP2/MN2 for the other) begins to compare a small output voltage difference (= |V OUT - V OUTB |) caused by the current difference and amplifies it to a rail-to-rail digital value (V DD or GND). Ideally, the CLSA is symmetric. However, because of process variation, the sensing current is influenced by the V TH mismatch of each transistor pair, which leads to the generation of a V OS . The influence of each transistor pair's V TH mismatch on the V OS varies. Fig. 2 shows the V OS of the CLSA according to each transistor pair's V TH mismatch level, when V BL is 0.8 V [21]. The V TH mismatch of the input NMOS pair increases the V OS by roughly the amount of the mismatch, because the mismatch determines the drain current and because the input NMOS operates in the saturation region, meaning that its small-signal effective resistance (R INPUT ) is relatively large. The input NMOS thus acts as a current source. 
The latch NMOS pair's V TH mismatch has a smaller influence on the V OS than that of the input NMOS pair because of the diode-connected configuration in the early sensing period, meaning that the latch NMOS's small-signal effective resistance (R LATCH ) is smaller than that of the input NMOS (R INPUT ). In contrast, the latch PMOS has little influence on the V OS because it does not operate in the early sensing period. The precharge PMOS pair does not affect the V OS at all since it is completely turned off during the sensing operation. Therefore, V OS_INPUT is the most dominant factor (∼75%) in determining the overall V OS , followed by V OS_LATCH (∼25%) because they are activated during the sensing operation and therefore affect the sensing current. Thus, both must be reduced to minimize the V OS . In the early sensing stage, the CLSA circuit shown in Fig. 3(a) can be simply represented as an equivalent circuit with resistors (R LATCH and R INPUT ), as shown in Fig. 3(b). Figs. 3(c) and (d) show the I-V curves of the input NMOS (MN3 and MN4) and the latch NMOS (MN1 and MN2) for a V TH mismatch of 50 mV in the input NMOS pair and in the latch NMOS pair, respectively. As clearly shown in these figures, the sensing current difference (ΔI = I D1 - I D2 ) is directly affected by the input NMOS pair's V TH mismatch (ΔI = 5 µA) because of the large R INPUT , whereas ΔI is only 1.2 µA when the same V TH mismatch exists in the latch NMOS pair because of the small R LATCH . Thus, the sensing current from OUT/OUTB to GND can be simply expressed as V DD /(R INPUT + R LATCH ). Because R INPUT dominates the sensing current, the expression clearly describes the reason why the input NMOS pair's V TH mismatch dominates V OS .
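The series-resistor expression above can be explored numerically. The following is a minimal sketch, not the authors' model: the resistance values are hypothetical placeholders chosen only so that R INPUT is larger than R LATCH, and a fractional perturbation of each resistance stands in for the effect of a V TH mismatch. It illustrates that the same relative disturbance changes the sensing current more when it is applied to the larger resistance.

```python
def sensing_current(v_dd, r_input, r_latch):
    """Series-resistor model from the text: I = V_DD / (R_INPUT + R_LATCH)."""
    return v_dd / (r_input + r_latch)


def relative_current_change(v_dd, r_input, r_latch, d_input=0.0, d_latch=0.0):
    """Fractional change in sensing current when R_INPUT and/or R_LATCH
    are perturbed by the given fractions (e.g. 0.10 for +10%)."""
    base = sensing_current(v_dd, r_input, r_latch)
    perturbed = sensing_current(
        v_dd, r_input * (1.0 + d_input), r_latch * (1.0 + d_latch)
    )
    return (perturbed - base) / base


# Hypothetical values (assumptions, not taken from the paper's figures).
V_DD = 1.0        # volts, the nominal supply used in the simulations
R_INPUT = 80e3    # ohms, assumed large: input NMOS in saturation
R_LATCH = 20e3    # ohms, assumed small: diode-connected latch NMOS

change_input = relative_current_change(V_DD, R_INPUT, R_LATCH, d_input=0.10)
change_latch = relative_current_change(V_DD, R_INPUT, R_LATCH, d_latch=0.10)

print(f"+10% on R_INPUT changes I by {100 * change_input:.2f}%")
print(f"+10% on R_LATCH changes I by {100 * change_latch:.2f}%")
```

With these assumed values the perturbation on R INPUT moves the current several times more than the same perturbation on R LATCH, which is the qualitative behavior the equivalent circuit is meant to convey.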
In the CLSA, σ OS is large because it is dominated by the standard deviation of V OS_INPUT (σ OS_INPUT ). σ OS can be expressed in terms of σ OS_INPUT and σ OS_LATCH [5], where σ OS_LATCH is the standard deviation of V OS_LATCH . To minimize σ OS , a reduction in σ OS_INPUT is essential. σ OS_INPUT can be reduced using an OC-CLSA, as shown in Fig. 4. By using a diode-connected configuration, the OC-CLSA cancels the offset caused by the mismatch of the input NMOS pair. The operation of the OC-CLSA is as follows: Initially, the PRE signal is high (similar to the initial condition of the CLSA with SAE = low), and the IN (

III. PROPOSED SLOW T RISE CONTROL TECHNIQUE OF SAE SIGNAL FOR CLSA
As described in the previous section, V OS is determined by the V TH mismatch of the input and the latch NMOS pairs (∼75% and ∼25%, respectively) when V BL is 0.8 V. Note that if V BL becomes higher, the saturation current of the input NMOS becomes higher and the operation region moves from the saturation region to the linear region, leading to a decrease in R INPUT at the operating point (I D increases and V D decreases in Fig. 3(d)). Thus, the latch NMOS pair's V TH mismatch has a greater effect on the V OS than before. To minimize the V OS , the effect of the latch NMOS pair's V TH mismatch needs to be reduced as well. To this end, a slow T RISE control technique for the SAE signal is proposed. In addition to the gate voltage, the T RISE of the SAE signal affects the operation region of the input NMOS pair. For the fast T RISE , the COMN node discharges quickly during the initial sensing period, resulting in the input NMOS pair operating on the boundary between the linear and the saturation regions. This means a decrease in R INPUT in the same way as a higher V BL . In contrast, when using the slow T RISE , MNFOOT is slowly turned on, which allows the COMN node voltage to drop slowly and remain high at the beginning of the sensing operation. Thus, the saturation current of the input NMOS pair can be kept sufficiently low, resulting in the input NMOS pair operating in the saturation region. This means an increase in R INPUT . Thus, the sensing current flowing from OUT/OUTB to COMN is determined predominantly by R INPUT and not R LATCH . In other words, by employing the slower T RISE control technique for the SAE signal, the impact of the latch NMOS pair's V TH mismatch on σ OS can be minimized, leading to a decrease in σ OS .
Because the OC-CLSA can cancel σ OS_INPUT effectively, the σ OS in the OC-CLSA is dominated by σ OS_LATCH , and the slow T RISE of the SAE signal can be applied to the OC-CLSA to effectively mitigate the remaining σ OS_LATCH . Therefore, the OC-CLSA with the slow T RISE control technique is suitable for minimizing σ OS .
To verify the proposed slow T RISE control technique of the SAE signal in the conventional CLSA (FS-CLSA) and OC-CLSA, Monte-Carlo HSPICE simulations were performed using industry-compatible 28-nm model parameters with 1.0 V as the nominal V DD . To fairly compare the effect of each transistor pair's V TH mismatch on σ OS , two pMOSCAPs with a width of 2.0 µm and a length of 0.05 µm were used in the OC-CLSA. All the other transistors had a width of 0.1 µm and a length of 0.03 µm. ΔV was set to 20 mV to determine σ OS . The pulse widths of the PRE signal (T PRE ), P1 signal (T P1 ), and P2 signal (T P2 ) were set to 2 ns, 2 ns, and 0.1 ns, respectively. The P3 signal rises with the P2 signal. Note that in an actual application, the PRE signal is initially high, which plays the same role as the initially low SAE signal of the FS-CLSA. Fig. 5 shows the σ OS of the FS-CLSA and OC-CLSA according to the T RISE of the SAE. Generally, the T RISE of an inverter is approximately 0.05 ns. The T RISE can be controlled simply by an inverter loaded with a capacitor in the global signal generator, and the simulations were performed by adjusting the size of this capacitor. As the T RISE increases, the σ OS gradually decreases and saturates at approximately 0.6 ns in both SAs. For a minimum σ OS , the T RISE is selected as 0.6 ns.
Fig. 6 shows the σ OS of the FS-CLSA according to the input voltage (V BL ) with and without the T RISE control technique. When V BL is in the sensing dead zone (V BL < V TH ), the input NMOS pair is not turned on and no sensing operation occurs. Fig. 6 clearly shows the efficacy of the T RISE control technique. When V BL is in the 0.4-0.5 V range, the input NMOS pair already operates in the saturation region without the T RISE control technique. Therefore, the effect of σ OS_LATCH on σ OS is negligible. In this case, when applying the slow T RISE at V BL = 0.4 V, the σ OS decreases by only 0.7%, from 51.36 mV to 51.01 mV. 
In other words, as V BL decreases, the effect of the T RISE control technique on σ OS becomes insignificant. However, as V BL increases, the saturation current of the input NMOS pair increases, leading to the input NMOS pair operating in the linear region. Thus, as V BL increases, σ OS_LATCH increases. Therefore, the sensing current is more affected by the mismatch of the latch NMOS pair. When the T RISE control technique is applied at V BL = 0.7 V, the σ OS decreases by 4.4% from 52.72 mV to 50.48 mV. In other words, the effect of the T RISE control technique on σ OS increases with increasing V BL .
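The capacitor-based T RISE adjustment mentioned above can be approximated with a first-order RC model. This is a rough sketch under stated assumptions, not the circuit used in the paper: the driving inverter is modeled as a fixed resistance (the value `R_DRIVER` is hypothetical), T RISE is taken as the 10%-90% rise time, and the standard result t = ln(9)·R·C for a single-pole RC step response is used. Real inverter edges are nonlinear, so this only gives an order-of-magnitude feel for the load capacitance needed.

```python
import math

LN9 = math.log(9.0)  # 10%-90% rise time of a single-pole RC is ln(9) * R * C


def rise_time_10_90(r_ohm, c_farad):
    """10%-90% rise time of an ideal first-order RC step response."""
    return LN9 * r_ohm * c_farad


def capacitance_for_rise_time(t_rise_s, r_ohm):
    """Load capacitance that yields the requested 10%-90% rise time."""
    return t_rise_s / (LN9 * r_ohm)


# Hypothetical driver resistance (assumption, not from the paper).
R_DRIVER = 10e3  # ohms

# Rise times quoted in the text: ~0.05 ns for a plain inverter, 0.6 ns slow.
c_fast = capacitance_for_rise_time(0.05e-9, R_DRIVER)
c_slow = capacitance_for_rise_time(0.6e-9, R_DRIVER)

print(f"C for 0.05 ns: {c_fast * 1e15:.2f} fF")
print(f"C for 0.60 ns: {c_slow * 1e15:.2f} fF")
print(f"ratio: {c_slow / c_fast:.1f}x")
```

Under this linear model the required load scales directly with the target rise time, so moving from 0.05 ns to 0.6 ns needs roughly twelve times the total capacitance at the SAE driver, whatever the assumed driver resistance.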
Even though σ OS_LATCH can be reduced by employing the slow T RISE control technique of the SAE signal, σ OS in the CLSA is still large because it is dominated by σ OS_INPUT . To minimize σ OS , the OC-CLSA with the T RISE control technique is recommended. Fig. 7 shows the σ OS of the FS-CLSA and OC-CLSA according to the V BL without the T RISE control technique. Fig. 7 clearly shows that the OC-CLSA (blue line) achieves an average σ OS (from V BL = 0 V to V BL = 0.65 V) of 11.92 mV (minimum σ OS = 7.22 mV at V BL = 0.2 V; maximum σ OS = 21.7 mV at V BL = 0.65 V), which is four times lower than that of the FS-CLSA, 53.23 mV (from V BL = 0.35 V to V BL = 1 V). This result is due to the significant decrease in σ OS_INPUT achieved by the OC-CLSA. Because of the decrease in R INPUT with increasing V BL , the OC-CLSA has a sensing dead zone of V BL > 0.75 V. The case of the FS-CLSA with size-up (yellow line) will be explained later. Fig. 8 shows the σ OS of the OC-CLSA with/without the T RISE control technique according to the V BL . When applying the T RISE control technique to the OC-CLSA, the σ OS is reduced by 20.6% on average (0%, from 8.26 mV to 8.26 mV at V BL = 0 V; 35.64%, from 17.59 mV to 11.32 mV at V BL = 0.5 V). It is noted that in the OC-CLSA, the efficiency (20.6%) of the T RISE control technique for the average σ OS improves. This is because the σ OS of the OC-CLSA is dominated by σ OS_LATCH owing to the cancellation of σ OS_INPUT , and the slow T RISE control technique can effectively mitigate the remaining σ OS_LATCH . Fig. 9 shows the average σ OS of the OC-CLSA with/without the T RISE control technique according to the length of C SA (L CSA ) and T P1 , when the width of C SA (W CSA ) is 2.0 µm. As the L CSA increases, the effect of the capacitive coupling increases, owing to the capacitance difference between the parasitic capacitance of the input nodes (IN, INB) and C SA . The L CSA was selected as 0.05 µm, considering area overhead. 
As T P1 increases, C SA becomes more discharged, resulting in a better cancellation of σ OS_INPUT . Considering the performance overhead, T P1 was set to 2.0 ns. Fig. 10 shows the σ OS of the FS-CLSA according to the widths of the input and the latch NMOS when V BL = 0.8 V. According to Pelgrom's research [30], σ OS can be reduced by increasing the size of the input and the latch NMOS pairs. For a fair comparison of the FS-CLSA and the OC-CLSA in terms of area, the widths of the input NMOS and the latch NMOS pairs of the FS-CLSA were increased to reduce the average σ OS . The total pre-layout area of the SA was estimated as the sum of each transistor's area (width × length). To match the average σ OS = 11.92 mV of the OC-CLSA without the T RISE control technique, the width of the input (latch) NMOS of the FS-CLSA must be increased to 4 µm (4.6 µm). In this case, the total pre-layout area of the FS-CLSA (size-up) was estimated to be 0.531 µm², whereas the total area of the OC-CLSA was 0.272 µm². Note that the FS-CLSA (size-up) has a σ OS of 11.92 mV when V BL = 0.8 V and an average σ OS (from V BL = 0.35 V to V BL = 1 V) of 17.08 mV. The yellow line in Fig. 7 shows the σ OS of the FS-CLSA (size-up). Although the OC-CLSA uses an area 10.1 times larger than that of the FS-CLSA (0.027 µm²) when the transistors in both circuits are of minimum size, it uses an area 1.95 times smaller than that of the FS-CLSA (size-up). However, because these calculations are based only on transistor size, layout-based evaluations are required; these are addressed later. Fig. 11 shows the pre-layout transient responses of the FS-CLSA, the FS-CLSA (size-up), the OC-CLSA without the T RISE control technique, and the OC-CLSA with it. In the FS-CLSA, the average (worst-case) sensing time is 0.077 ns (0.128 ns). The FS-CLSA encounters many sensing failures when ΔV = 50 mV, since the σ OS of the FS-CLSA is approximately 50 mV, which corresponds to RAPY CELL = 1σ. 
In contrast, the σ OS values of the FS-CLSA (size-up) and the OC-CLSA are approximately 10 mV, which corresponds to RAPY CELL = 5σ. Thus, there is no sensing failure in these three cases. Compared to the FS-CLSA, the average and the worst-case sensing times of the FS-CLSA (size-up) increase to 0.437 ns and 0.6 ns, respectively, owing to the loading delay. Compared to the FS-CLSA (size-up), the OC-CLSA has 2 ns of additional sensing time owing to the offset cancellation stages of S1 and S2. The T RISE difference between the OC-CLSA with and without the T RISE control technique is 0.55 ns (= 0.6 ns - 0.05 ns). However, the average sensing time difference is only 0.338 ns (= 2.809 ns - 2.471 ns) because of the σ OS reduction.
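The correspondence quoted above between σ OS and the RAPY CELL level (1σ at σ OS ≈ 50 mV and 5σ at σ OS ≈ 10 mV, both for ΔV = 50 mV) can be reproduced with a simple Gaussian margin calculation. The sketch below is an assumption about the form of the metric, since the paper's own expression is not reproduced in this text: it treats ΔV and V OS as independent Gaussians, defines the margin in units of the combined standard deviation, and takes the ΔV spread and mean offset as zero by default. The function names are illustrative only.

```python
import math


def rapy_sigma(delta_v_mv, sigma_os_mv, sigma_dv_mv=0.0, mean_os_mv=0.0):
    """Sensing margin in units of sigma, assuming independent Gaussian
    distributions for delta-V and V_OS (assumed form, see lead-in)."""
    return (delta_v_mv - mean_os_mv) / math.hypot(sigma_dv_mv, sigma_os_mv)


def pass_probability(n_sigma):
    """One-sided Gaussian probability that the margin is not exceeded."""
    return 0.5 * (1.0 + math.erf(n_sigma / math.sqrt(2.0)))


# Values quoted in the text: delta-V = 50 mV, sigma_OS of about 50 mV or 10 mV.
fs_clsa = rapy_sigma(50.0, 50.0)
oc_clsa = rapy_sigma(50.0, 10.0)

print(f"sigma_OS = 50 mV -> {fs_clsa:.1f} sigma, "
      f"pass probability {pass_probability(fs_clsa):.4f}")
print(f"sigma_OS = 10 mV -> {oc_clsa:.1f} sigma, "
      f"pass probability {pass_probability(oc_clsa):.9f}")
```

Under these assumptions a 1σ margin leaves roughly one read in six failing, which is consistent with the many sensing failures reported for the FS-CLSA, while a 5σ margin makes failures rare enough that none appear in the simulated cases.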
As mentioned previously, layout-based estimations of delay, area overhead, and power consumption are required, since circuit complexity can cause a large difference between pre-layout and post-layout results. Figs. 12(a) and (b) show the layouts of the FS-CLSA (size-up) and the OC-CLSA, respectively. Although the pre-layout area of the OC-CLSA was found to be 1.95 times smaller than that of the FS-CLSA (size-up) for the same σ OS , the layouts clearly indicate that this is not the case in reality. Interconnects are the biggest contributors to area overhead. The post-layout area of the OC-CLSA (24.15 µm²) is 72.5% larger than that of the FS-CLSA (size-up) (14 µm²). Fig. 13 shows the σ OS of the OC-CLSA and the FS-CLSA (size-up) with and without the T RISE control technique according to V BL , based on post-layout simulations. When the slow T RISE control technique is applied, the average σ OS of the OC-CLSA decreases by 10.5% (15.4 mV to 13.78 mV), while the average σ OS of the FS-CLSA (size-up) decreases by 17.7% (23.9 mV to 19.68 mV). Table I confirms the above analysis results for the case where the layout area is the same.
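The percentages in the paragraph above follow directly from the quoted post-layout numbers. As a small worked check (the helper names are illustrative, and only figures stated in the text are used):

```python
def percent_reduction(before, after):
    """Reduction of 'after' relative to 'before', in percent."""
    return 100.0 * (before - after) / before


def percent_increase(base, new):
    """Increase of 'new' relative to 'base', in percent."""
    return 100.0 * (new - base) / base


# Average sigma_OS with the slow T_RISE technique (mV), from the text.
oc_reduction = percent_reduction(15.4, 13.78)   # OC-CLSA
fs_reduction = percent_reduction(23.9, 19.68)   # FS-CLSA (size-up)

# Post-layout areas (um^2), from the text.
area_overhead = percent_increase(14.0, 24.15)   # OC-CLSA vs FS-CLSA (size-up)

print(f"OC-CLSA sigma_OS reduction:           {oc_reduction:.1f}%")
print(f"FS-CLSA (size-up) sigma_OS reduction: {fs_reduction:.1f}%")
print(f"OC-CLSA post-layout area overhead:    {area_overhead:.1f}%")
```

Note that although the FS-CLSA (size-up) shows the larger relative reduction, the OC-CLSA still ends with the lower absolute average σ OS (13.78 mV versus 19.68 mV).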

IV. PROPOSED V REF BIASING TECHNIQUE FOR OC-CLSA
When an SA is used for memory (e.g., static random access memory), the input voltage difference ΔV (= |V BL - V REF |) should be sufficiently large relative to σ OS . In general, ΔV is designed to be greater than 200 mV. This means that V REF should be lower than V DD by at least 200 mV so that V BL at state 1 (V BL1 ) is larger than V REF by 200 mV and V BL at state 0 (V BL0 ) is smaller than V REF by 200 mV. However, because of cell leakage (or other non-idealities, such as aging, temperature variation, and noise), V BL1 cannot remain at V DD but decreases as time elapses. For this reason, the V BL range should be at least 500 mV wide (i.e., V DD - 500 mV ≤ V BL ≤ V DD ). Moreover, as V DD is reduced with technology node shrinkage, the range of V BL decreases with it. Furthermore, because non-volatile memories (e.g., MRAM) generate intermediate voltages between V DD and GND, a wide V BL range is required. Therefore, the operational range of V BL must be addressed in order for the SA to operate effectively and adaptably in diverse V DD regions and applications.
Both the OC-CLSA and FS-CLSA designs have limitations on the V BL range, as was noted in Section III. As shown in Figs. 7 and 13, the FS-CLSA cannot operate properly until V BL exceeds the threshold voltage of the input NMOS transistors (e.g., V BL > 0.35 V), and as V BL increases, the FS-CLSA efficiency declines as well. Although the OC-CLSA was able to mitigate the sensing dead zone problem of the FS-CLSA to some extent, its effectiveness also decreases when V BL is raised. To solve the sensing dead zone problem and to improve the efficiency of the OC-CLSA, we propose the V REF biasing technique for the OC-CLSA. As mentioned in Section III, in S2 of the OC-CLSA (see Fig. 4 In addition, as the gate voltage of the input NMOS for the FS-CLSA and OC-CLSA, ΔV + V TH is the optimal voltage for minimizing σ OS . Thus, the average σ OS can be significantly reduced. To further demonstrate the effectiveness of the OC-CLSA with the proposed V REF biasing technique, we present measurement results from the fabricated 28 nm test chip. Fig. 15(a) shows the test chip structure with a 32 × 32 SA array containing 1024 OC-CLSAs (OC-CLSAs with the V REF biasing technique) and 1024 FS-CLSAs (size-up). Fig. 15(b) shows the die photo and layout of the test chip. The test chip includes 1024 FS-CLSAs (size-up) and 1024 OC-CLSAs in which the source node voltage of the MNBIAS transistor can be changed, so that each OC-CLSA can be used both as a conventional OC-CLSA and as an OC-CLSA with the V REF biasing technique, as shown in Fig. 15(c). The following signals are generated inside the signal generator of the test chip from the CLK input: the SAE, PRE, P1, P2, and P3 signals for the OC-CLSA and the SAE signal for the FS-CLSA. The test chip also includes multiplexers and a decoder to select the test cell, and buffers and a D flip-flop (D-FF) to produce a visible output signal for σ OS testing. Table II summarizes the performance of the conventional FS-CLSA (size-up), the OC-CLSA, the OC-CLSA with the V REF biasing technique, and state-of-the-art SAs (DIBBSA-PD [25], VTSA [27], and SOSA [17], [31]), based on post-layout simulations and test chip measurement results. 
TABLE II: POST-LAYOUT/TEST CHIP PERFORMANCE SUMMARY AND COMPARISON BETWEEN CONVENTIONAL FS-CLSA (SIZE-UP), OC-CLSA, STATE-OF-THE-ART SAS, AND OC-CLSA WITH V REF BIASING TECHNIQUE
As indicated in Table II, to fairly compare our proposed designs with state-of-the-art SAs, we performed post-layout analyses of the differential input body-biased SA with predischarged output nodes (DIBBSA-PD) [25], the variation-tolerant SA (VTSA) [27], and the single-ended offset-canceling SA (SOSA) [17], [31]. Fig. 18 shows the layouts used for the comparison. The DIBBSA-PD applies a body-biasing technique to the VLSA to lower σ OS . To offer a fair comparison between the proposed designs and the DIBBSA-PD, two different layouts of the DIBBSA-PD were made. The first layout uses the same transistor sizes as the OC-CLSA and is shown in Fig. 18(a), while the second layout uses larger transistors so that the layout area is similar to that of the OC-CLSA and is shown in Fig. 18(b). Fig. 18(c) shows the layout of the VTSA. It is a hybrid design between the VLSA and the CLSA, and it offers accurate operation at low voltages. For the layout in Fig. 18(c), the transistor sizes were increased so that the VTSA's layout area is similar to that of our proposed design. Fig. 18(d) shows the layout of the SOSA. The SOSA is a VLSA-type design that offers low σ OS while enabling wide-voltage operation. For the layout shown in Fig. 18(d), the transistor sizes were the same as in our proposed design because the SOSA uses two capacitors for offset cancellation and its layout area is already similar to that of the proposed design. Additionally, we changed the NMOS switch transistors used in the SOSA to transmission gates to make the comparison more accurate.
By applying the V REF biasing technique to the OC-CLSA, the average σ OS (test chip) was lowered by 22% (from 21.58 mV to 16.83 mV) because the proposed technique eliminates the sensing dead zone. Even though the V REF biasing technique improves the average σ OS of the OC-CLSA, the lowered voltage degrades the current in S3, which leads to additional delay. However, despite the increase in average sensing time (from 4.7 ns to 8.44 ns), the average power consumption is lowered by 56.9% (from 1.23 µW to 0.53 µW). As a result, the overall energy consumption is lowered by 22.5% (from 5.77 fJ to 4.47 fJ).
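The trade-off described above, longer sensing time but lower energy, can be checked with the relation energy = average power × average sensing time. The sketch below uses only the figures quoted in the text; the variable names are illustrative, and the assumption that the reported energies were obtained as this simple product is ours. The products agree with the reported energies to within rounding of the quoted inputs.

```python
def energy_fj(power_uw, time_ns):
    """Energy in fJ from average power in uW and time in ns
    (1 uW * 1 ns = 1e-6 W * 1e-9 s = 1e-15 J = 1 fJ)."""
    return power_uw * time_ns


def percent_reduction(before, after):
    """Reduction of 'after' relative to 'before', in percent."""
    return 100.0 * (before - after) / before


# Test chip figures quoted in the text.
e_conventional = energy_fj(1.23, 4.7)    # conventional OC-CLSA
e_vref_biased = energy_fj(0.53, 8.44)    # OC-CLSA with V_REF biasing

sigma_reduction = percent_reduction(21.58, 16.83)   # mV
power_reduction = percent_reduction(1.23, 0.53)     # uW
energy_reduction = percent_reduction(5.77, 4.47)    # fJ, reported values

print(f"conventional: {e_conventional:.2f} fJ, "
      f"V_REF biased: {e_vref_biased:.2f} fJ")
print(f"sigma_OS -{sigma_reduction:.1f}%, power -{power_reduction:.1f}%, "
      f"energy -{energy_reduction:.1f}%")
```

The sensing time grows by a factor of about 1.8 while the power falls by more than half, so the product, and hence the energy per read, still decreases.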
When the transistor sizes of the DIBBSA-PD are chosen to be the same as in our proposed design, its average energy consumption is 49.2% lower but its average σ OS is 113% larger than that of the OC-CLSA with V REF biasing. Therefore, we concluded that it is fair to increase the transistor sizes of the DIBBSA-PD so that its layout area is similar to that of our proposed design. For the DIBBSA-PD in Fig. 18(b), the transistor sizes (width/length) were increased to 1.5 µm/0.03 µm. As a result, the average σ OS of the DIBBSA-PD decreases and is 47% lower than that of our proposed design. However, the energy consumption increases dramatically with the increase in transistor size. Also, the DIBBSA-PD has a sensing dead zone of V BL < 0.4 V.
Compared to the proposed design, the average σ OS of the VTSA is 17.9% smaller, but its energy consumption is 50.9% larger. Because the VTSA uses a hybrid design in which the output nodes are connected to V BL and V BLB , its average sensing time is 89% longer than that of the proposed design. Also, the VTSA has a sensing dead zone of V BL > 0.45 V.
The SOSA is the most efficient design in terms of performance among the previously proposed designs, and it eliminates the sensing dead zone. Accordingly, the average σ OS of the SOSA is 31.4% smaller than that of our proposed design. However, as shown in Table II, the average energy consumption of the SOSA is far larger than that of the proposed design, because its auto-zeroing technique consumes excessive short-circuit power during the auto-zeroing period. Note that for the SOSA we analyzed the dependency of σ OS on the PRE and SMP signals and concluded that 5 ns for T PRE and T SMP was reasonable.

V. CONCLUSION
In the first part of this paper, we proposed a slow T RISE control technique for CLSAs, which reduces σ OS_LATCH without area overhead, and conducted a comparative study between OC-CLSA and FS-CLSA using slow T RISE technique on both. Post-layout simulation results showed that the OC-CLSA achieved a 10.5% reduction in σ OS by employing the T RISE control technique, while the FS-CLSA (size-up) achieved a σ OS reduction of 17.7%. In addition, the simulation results clearly proved that the OC-CLSA is more energy efficient and the FS-CLSA (size-up) is more performance and area efficient.