
Sense Amplifier Power and Delay Characterization for Operation Under Low-Vdd and Low-Voltage Clock Swing

Two critical aspects of sense amplifiers (SAs), power consumption and clock-to-data delay, are studied for operation under a low supply voltage and with a low-swing clock. Trade-offs and simulation results are given for a 4-stack StrongARM SA and a 3-stack double-tail SA, showing up to 50% power reduction in the SA itself and 25% in the clock generation circuit, with acceptable delay degradation.



The sense amplifier (SA) is a fundamental component in many applications, such as high-speed serial link receivers [1], [2], [3], A/D converters [4], [5], and SRAMs [6]. With technology scaling, IC design increasingly emphasizes two merits, high data rate and low power dissipation, and SA requirements follow the same trends.

The StrongARM SA is one of the most conventional latch-type voltage SA structures [1], [7], [8], owing to its high input impedance, full output swing, and negligible static power consumption. However, this structure has several drawbacks: limited voltage headroom under a low supply voltage, high input offset due to transistor mismatch, and conflicting tail-current requirements for low offset and fast latching [9]. The double-tail latch-type voltage SA was therefore proposed in [9]. It addresses these problems by using less stacking and by allowing both a large tail current in the latching stage for fast regeneration and a small tail current in the input stage for low offset. Because the StrongARM and double-tail SAs are two typical structures in SA design, both are used as examples to study and verify the low-power techniques that follow.

This work explores two techniques for reducing SA power with little degradation in delay. First, lowering the supply voltage is a straightforward way to reduce the power consumed by the SA itself, but the resulting increase in delay is a critical concern. Second, employing a DC-biased, AC-coupled, low-voltage-swing clock reduces the power consumed in the clock generation circuit while keeping the clock-to-data delay within an acceptable range. Because both techniques trade power against delay, the power-delay product (PDP) is used in this paper as the metric for evaluating them. For both techniques, simulations in 90 nm CMOS have been performed and analyzed for the two SA structures.

Section II presents the detailed system description and the theoretical derivation. Section III gives simulation results for both types of SA under different supply voltages and/or different clock swings with optimized DC bias, and proposes minimum PDP as the criterion for choosing the best operating condition. Section IV draws the final conclusions.


System Description and Derivation

The block diagram of the simulated system of StrongARM and double-tail SAs is shown in Fig. 1.

Figure 1
Fig. 1. Block diagram and schematics of typical SAs.

The first low-power technique, lowering the supply voltage, reduces the power consumed by the SA [10], [11]. However, it adversely affects the clock-to-data delay, defined as the time between the clock edge and the instant when the differential output ΔVout reaches VDD/2.
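The delay definition above can be applied to a sampled transient waveform. The following is a minimal sketch, not the authors' measurement script: the function name `clock_to_data_delay`, the use of the magnitude of ΔVout, and the linear interpolation between samples are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def clock_to_data_delay(
    t: Sequence[float],
    dvout: Sequence[float],
    t_clk_edge: float,
    vdd: float,
) -> Optional[float]:
    """Time from the clock edge until |dvout| first rises through vdd / 2.

    t          -- strictly increasing sample times (s)
    dvout      -- differential output voltage at each sample (V)
    t_clk_edge -- time of the active clock edge (s), assumed known
    Returns None if no upward crossing is found after the edge.
    """
    threshold = vdd / 2.0
    for k in range(1, len(t)):
        if t[k] < t_clk_edge:
            continue  # interval ends before the clock edge
        v0, v1 = abs(dvout[k - 1]), abs(dvout[k])
        if v0 < threshold <= v1:
            # Linear interpolation between the two bracketing samples.
            frac = (threshold - v0) / (v1 - v0)
            t_cross = t[k - 1] + frac * (t[k] - t[k - 1])
            if t_cross >= t_clk_edge:
                return t_cross - t_clk_edge
    return None
```

Taking the magnitude lets the same routine handle either output polarity; a simulator's built-in measurement statements would normally be used instead when available.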

The second low-power technique is to reduce the voltage swing of the clock input. The clock generation circuit is implemented as an inverter-type clock buffer followed by a capacitive divider with a proper DC bias. While the input of the buffer is rail-to-rail, its output is less than full swing:

$$V_{swing}= {C_1 \over C_1 + C_2}\cdot V_{DD}\eqno{\hbox{(1)}}$$

in which C2 is the parasitic capacitance of the SA, and C1 can be varied to control the value of Vswing. Compared with a conventional clock buffer that directly drives the parasitic capacitance C2, the ideal energy consumed in a clock cycle is reduced to:

$$E = V_{DD}\int i\,dt = V_{DD}C_2 V_{swing} = {C_1 \over C_1 + C_2}C_2 V_{DD}^2\eqno{\hbox{(2)}}$$

which is smaller than that of the normal connection by a factor of C1/(C1 + C2). A simulation result for this capacitively divided clock buffer, driven by a 2 GHz full-swing clock, is shown in Fig. 2. The power consumption is not exactly proportional to the output swing, as (2) predicts, because of the direct-path current that flows when both the NMOS and PMOS of the buffer are ON; the relation is nevertheless linear. Note that nearly 25% power reduction can be obtained when the clock swing is only half of the supply voltage, and this ratio does not scale with clock frequency.
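Equations (1) and (2) can be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, the 13 fF load is taken from the Fig. 2 caption, and the 1.2 V supply is an assumption borrowed from the low-swing simulations in Section III. It also inverts (1) to find the C1 needed for a target swing.

```python
def clock_swing(c1: float, c2: float, vdd: float) -> float:
    """Equation (1): swing at the SA clock node for series C1 and load C2."""
    return c1 / (c1 + c2) * vdd


def ideal_energy_per_cycle(c1: float, c2: float, vdd: float) -> float:
    """Equation (2): ideal energy drawn from the supply per clock cycle."""
    return vdd * c2 * clock_swing(c1, c2, vdd)


def series_cap_for_swing(target_swing: float, c2: float, vdd: float) -> float:
    """Invert equation (1): C1 = C2 * r / (1 - r), with r = Vswing / VDD."""
    if not 0.0 < target_swing < vdd:
        raise ValueError("target swing must lie strictly between 0 and vdd")
    ratio = target_swing / vdd
    return c2 * ratio / (1.0 - ratio)


if __name__ == "__main__":
    C2 = 13e-15  # parasitic load from the Fig. 2 caption (F)
    VDD = 1.2    # assumed supply voltage (V)
    c1 = series_cap_for_swing(VDD / 2, C2, VDD)
    e_low = ideal_energy_per_cycle(c1, C2, VDD)
    e_full = C2 * VDD ** 2  # conventional buffer driving C2 directly
    print(f"C1 = {c1:.3e} F, ideal energy ratio = {e_low / e_full:.2f}")
```

For a half-VDD swing, (1) requires C1 = C2, and (2) then predicts half the ideal energy of the direct connection. The simulated saving reported above is smaller (nearly 25%), consistent with the direct-path current that (2) does not model.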

Figure 2
Fig. 2. Simulation result of power consumption vs. output swing for clock buffer connection shown in Fig. 1. (C2 = 13 fF).

The above derivation and simulation show that the power consumed in the clock generation circuit can be reduced by lowering the output swing. The next section examines how far power can be reduced, both in the SA and in the clock generation circuit, while maintaining a reasonable delay.


Simulation Results of SA Performances

A. SA Performances Under Different Supply Voltages

The power and delay of the nonlinear latch-type SA are simulated with a 100 mV differential input, while the supply voltage VDD is varied and the clock swing is kept rail-to-rail, as shown in Fig. 1. Simulation results for both the StrongARM and double-tail SAs are shown in Fig. 3. As expected, the delay increases and the power consumption decreases as the supply voltage falls. Moreover, the PDP reaches a minimum at a supply voltage of around 0.9 V, which is therefore the optimal operating point.
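Selecting the operating point by minimum PDP amounts to a simple search over the simulated sweep. The following is a minimal sketch under the assumption that each sweep point is available as a (VDD, power, delay) record; the `OperatingPoint` type and `min_pdp_point` function are hypothetical names, and no data from Fig. 3 is reproduced here.

```python
from typing import Iterable, NamedTuple


class OperatingPoint(NamedTuple):
    vdd: float    # supply voltage (V)
    power: float  # average SA power (W)
    delay: float  # clock-to-data delay (s)

    @property
    def pdp(self) -> float:
        """Power-delay product (J)."""
        return self.power * self.delay


def min_pdp_point(points: Iterable[OperatingPoint]) -> OperatingPoint:
    """Return the sweep point with the smallest power-delay product."""
    candidates = list(points)
    if not candidates:
        raise ValueError("sweep is empty")
    return min(candidates, key=lambda p: p.pdp)
```

If a delay specification must also be met, the sweep can be filtered on `delay` before calling `min_pdp_point`.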

Figure 3
Fig. 3. Power and delay performances of (a) StrongARM and (b) double-tail SAs under different supply voltages.

The trends of the power and delay curves deserve further attention. The rate at which power consumption falls with decreasing supply voltage does not vary noticeably across the entire range. The dependence of power dissipation on supply voltage lies between linear and quadratic, as expected once non-idealities are taken into account. The delay, by contrast, increases with falling supply in a different way. While the supply voltage is high enough to keep all the transistors in the saturation region, the delay does not increase significantly. However, when the supply is so low that most transistors enter the triode region, the delay increases dramatically, which places the optimal operating point at a VDD of around 0.9 V.
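One way to quantify "between linear and quadratic" is to fit the simulated power to a power law P = k·VDD^n and check that the exponent n falls between 1 and 2. This is an illustrative sketch, not part of the original analysis; the power-law model and the function name `fit_power_law` are assumptions.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(vdd: Sequence[float], power: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of power = k * vdd**n on log-log axes; returns (k, n)."""
    if len(vdd) != len(power) or len(vdd) < 2:
        raise ValueError("need at least two matching (vdd, power) samples")
    x = [math.log(v) for v in vdd]
    y = [math.log(p) for p in power]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("vdd samples must not all be identical")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    n = sxy / sxx                       # slope on log-log axes
    k = math.exp(y_mean - n * x_mean)   # intercept converted back to linear scale
    return k, n
```

An exponent near 2 would indicate purely dynamic CV² behavior, while an exponent near 1 would indicate a power that scales only linearly with the supply.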

B. SA Performances Driven by Low-Swing Clock

Generally, a full-swing clock is used in the conventional operating mode, but it consumes more power, as shown in Fig. 2. Interestingly, a full-swing clock is not necessary to drive every transistor of an SA. For example, the gate voltage of the tail NMOS in the StrongARM or double-tail SA does not need to be VDD to turn it ON, although a smaller value may cause a smaller tail current that degrades speed. Likewise, the gate voltage does not have to be at ground to turn that NMOS OFF, although a larger value may cause extra leakage current that increases the static power. A similar duality exists for the PMOS device.

Simulation results for low-swing clocking are shown in Fig. 4. The supply voltage is kept at 1.2 V while the clock swing is swept from 0.5 V to 1.2 V. For simplicity, the clocks driving the NMOS and PMOS devices are assumed to have the same swing. The DC bias is optimized so that a minimum PDP is obtained for each clock swing. Note that even when both SAs are driven by a clock whose swing is as low as half of VDD, their delays do not exceed 130% of the delay with a full-swing clock, and their power consumption does not exceed 120%. Meanwhile, significant power is saved in the clock generation circuit, as shown in Fig. 2, and such power savings become even greater at higher clock sampling rates.
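The per-swing bias optimization described above can be sketched as a grouped minimum over a two-dimensional sweep. The record layout (`SweepResult`) and the function name are hypothetical, and the sweep data itself would come from the circuit simulator.

```python
from typing import Dict, Iterable, NamedTuple


class SweepResult(NamedTuple):
    swing: float  # clock swing (V)
    bias: float   # clock DC bias (V)
    power: float  # SA power (W)
    delay: float  # clock-to-data delay (s)


def best_bias_per_swing(results: Iterable[SweepResult]) -> Dict[float, SweepResult]:
    """For each clock swing, keep the entry whose power * delay is smallest."""
    best: Dict[float, SweepResult] = {}
    for r in results:
        current = best.get(r.swing)
        if current is None or r.power * r.delay < current.power * current.delay:
            best[r.swing] = r
    return best
```

The returned dictionary maps each swing to its minimum-PDP entry, which is the kind of per-swing optimum that Fig. 4 plots.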

Figure 4
Fig. 4. Power and delay performances of (a) StrongARM and (b) double-tail SAs driven by clocks of different swings.

One thing that cannot be observed from Fig. 4, however, is the static power. There is a trade-off between static power and clock-to-data delay, especially when the clock swing is small. If the clock cannot turn off the tail transistors during the reset phase of the SA, the static power increases, but the delay is reduced, because the output nodes of the SA are then precharged to a potential lower than VDD (in the StrongARM SA) or higher than ground (in the double-tail SA). The time taken to discharge or charge those nodes until one of the two inverters in the latch starts to regenerate is therefore shorter. In most applications, large static power consumption is undesirable, so a second optimal point with slightly larger PDP and delay but smaller static power may be chosen.

In addition, since the DC bias in the clock generation circuit can be set optimally, the actual clock voltage range may extend beyond ground and VDD when the clock swing is large. In some of the above cases, this is where the optimal PDP is achieved. The phenomenon can be explained as follows: during the calculation phase of the SA, the absolute value of the gate-source voltage of the tail NMOS of the StrongARM SA, or of the tail PMOS of the second stage of the double-tail SA, is higher than VDD, its value in the normal case. Hence the tail current is increased and the delay is considerably reduced.
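Whether a given bias and swing push the clock beyond the rails can be checked with a few lines of arithmetic. The sketch below assumes the AC-coupled clock swings symmetrically about its DC bias, which the text does not state explicitly; the function names are hypothetical.

```python
from typing import Tuple


def clock_levels(bias: float, swing: float) -> Tuple[float, float]:
    """Return (low, high) clock voltages, assuming the swing is centred on the bias."""
    return bias - swing / 2.0, bias + swing / 2.0


def rail_excursion(bias: float, swing: float, vdd: float) -> Tuple[float, float]:
    """Return (below_ground, above_vdd) in volts; 0.0 means that rail is not exceeded."""
    low, high = clock_levels(bias, swing)
    return max(0.0, -low), max(0.0, high - vdd)
```

For a tail NMOS with its source at ground, the gate-source voltage during the calculation phase equals the high clock level, so a nonzero `above_vdd` corresponds to the overdrive condition described above.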

C. SA Performances Under Low VDD With Low-Swing Clock

The previous simulation results show that power can be saved either in the SA circuit itself, by using a low supply voltage, or in the clock generation circuit, by using a low-voltage-swing clock, with acceptable degradation in delay. The natural next step is to find out whether both techniques can be employed at the same time.

This combined case is simulated, and the results are shown in Fig. 5. The supply voltage is set to 0.9 V because Fig. 3 shows that, for both types of SA, the optimal PDP is achieved at a supply voltage of around 0.9 V. The clock swing is swept from 0.4 V to 0.9 V, and the DC biases of the clocks driving the NMOS and PMOS devices are swept to find the optimal operating point. As can be seen in Fig. 5, the general trend for both SA structures is similar to that shown in Fig. 4: the PDP and delay decrease as the clock swing increases, while the power consumption stays at a comparable level. However, a small clock swing with a proper DC bias may be preferable if the clock-to-data delay meets the specification, because the power consumed in the clock generation circuit is much smaller, even if that consumed in the SA itself is slightly larger.

Figure 5
Fig. 5. Power and delay performances of (a) StrongARM and (b) double-tail SAs under low supply voltage while driven by low swing clock.


Two low-power techniques for SA design, namely a low supply voltage and a low-swing driving clock, have been proposed. With minimum PDP taken as the criterion for the optimal operating point, power consumption and clock-to-data delay are given for both the StrongARM and the double-tail SA structures in cases where neither, either, or both of the low-power techniques are employed. With both techniques used together, the optimal PDP can be improved nominally by 35% and by up to 70%, and the power consumption in the SA and in the clock generation circuit can be improved by about 50% and by up to 25%, respectively, while the degradation of the SA clock-to-data delay remains acceptable.

Therefore, a normal supply voltage and a full-swing driving clock are not mandatory. If the clock-to-data delay meets the given specification with a lower supply voltage and a low-swing driving clock, significant power can be saved.

Table 1
TABLE I Simulation Results Summary


Tao Jiang and Patrick Y. Chiang are with the School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA (,


1. Low-power area-efficient high-speed I/O circuit techniques

M. J. E. Lee, W. J. Dally, P. Chiang

IEEE J. Solid-State Circuits, vol. 35, p. 1591–1599, 2000-11

2. Low-swing on-chip signaling techniques: Effectiveness and robustness

H. Zhang, V. George, J. M. Rabaey

IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 8, p. 264–272, 2000-06

3. A 0.28 pJ/b 2 Gb/s/ch transceiver in 90 nm CMOS for 10 mm on-chip interconnects

E. Mensink, D. Schinkel, E. Klumperink, E. van Tuijl, B. Nauta

Proc. IEEE ISSCC Dig. Tech. Papers, 2007-02, 414–415

4. A 65 fJ/conversion-step 0-to-50 MS/s 0-to-0.7 mW 9b charge-sharing SAR ADC in 90 nm digital CMOS

J. Craninckx, G. V. der Plas

Proc. IEEE ISSCC Dig. Tech. Papers, 2007-02, 246–247

5. A 1.9 μW 4.4 fJ/conversion-step 10b 1 MS/s charge-redistribution ADC

M. van Elzakker, E. van Tuijl, P. Geraedts, D. Schinkel, E. Klumperink, B. Nauta

Proc. IEEE ISSCC Dig. Tech. Papers, 2008-02, 244–245

6. A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy

N. Verma, A. P. Chandrakasan

IEEE J. Solid-State Circuits, vol. 43, p. 141–149, 2008-01

7. Yield and speed optimization of a latch-type voltage sense amplifier

B. Wicht, T. Nirschl, D. Schmitt-Landsiedel

IEEE J. Solid-State Circuits, vol. 39, p. 1148–1158, 2004-07

8. Offset compensation in comparators with minimum input-referred supply noise

K. L. J. Wong, C. K. K. Yang

IEEE J. Solid-State Circuits, vol. 39, p. 837–840, 2004-05

9. A double-tail latch-type voltage sense amplifier with 18ps setup+hold time

D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, B. Nauta

Proc. IEEE ISSCC Dig. Tech. Papers, 2007-02, 314–315

10. Low-power CMOS digital design

A. P. Chandrakasan

IEEE J. Solid-State Circuits, vol. 27, p. 473–484, 1992-04

11. Low-power digital design

M. Horowitz, T. Indermaur, R. Gonzalez

San Diego, CA
IEEE Symp. Low Power Electron., 1994-10, 8–11



© Copyright 2011 IEEE – All Rights Reserved