A Hybrid Low-Dropout (LDO) Regulator Using a Load Replication Circuit for DRAM Cores

This paper presents a cost-effective hybrid low-dropout (LDO) regulator circuit for state-of-the-art DDR DRAM cores that not only supports various refresh operations, but also meets the JEDEC specification of the refresh period by improving the load-transient response. To guarantee a stable output voltage through precise off-control operation, a load replication circuit with dummy DRAM cells is exploited. The proposed LDO has been implemented and fabricated in a standard 180nm CMOS technology and occupies 0.165mm$^2$. By adopting the hybrid LDO, voltage droop improvements of 62mV and 110mV, and $t_{RFC}$ gains of 100ns and 120ns, are measured with refresh rates of 4K and 8K, respectively. The measured current consumption overhead of the eight hybrid LDOs is $36\mu\text{A}$ during the 8K refresh operation. The peak current efficiency is 99.6% at a supply voltage of 1.2V.


I. INTRODUCTION
Recent trends in dynamic random-access memory (DRAM) manufacturing have emphasized cost reduction by scaling DRAM down to the 10nm class and below. However, the reduction in storage capacitance caused by this process shrinkage has raised concerns regarding the sensing margin and the data retention time that determines the refresh period [1]. To overcome these issues, significant research and development effort has been devoted to sense amplifiers with an offset cancellation function [2] and to in-DRAM refresh solutions including an error correction code (ECC) engine [3]. Moreover, in terms of power management of DRAM cores, low-dropout regulators (LDOs) capable of responding to the severe current consumption associated with the increased refresh rate are required.
A lot of effort [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17] has been devoted to achieving low-power, high-performance LDOs with a smaller die area, higher accuracy, faster transient response, and noise-insensitive characteristics. Conventionally, LDO structures are classified as analog LDOs (ALDOs) or digital LDOs (DLDOs) according to how the current of the power transistors is controlled. Since ALDOs can achieve superior power supply rejection ratios (PSRRs) compared to DLDOs, they are more suitable for noise-sensitive applications. Moreover, pioneering ALDOs have achieved a fast transient response with a low quiescent current, high PSRR, and large bandwidth [4], [5], [6]. On the other hand, digitally controlled DLDOs [7], [8], [9], [10], [11], [12], [13], [14], [15] are increasingly preferred because of their low design effort for low-voltage operation. Furthermore, DLDOs have the advantages of easy scalability across process technologies, fewer stability issues, and a small chip area for the same supply voltage and load current conditions. However, DLDOs have a fundamental tradeoff between transient response speed and noise performance, including output ripple and PSRR, that depends on the LDO operating speed. To improve the overall transient response and noise performance of DLDOs, asynchronous DLDOs (AS-DLDOs) [12], [13] and analog-assisted DLDOs (AA-DLDOs) [14], [15] utilizing analog feedback have been published. However, these LDOs still suffer from the tradeoff between speed and noise performance, along with a power overhead due to the increased quiescent current. State-of-the-art hybrid LDOs (HLDOs) provide the desired PSRR with low output ripple, as well as power efficiency with a fast transient response [16], [17]. Moreover, the HLDO architecture can be reconfigured to optimize PSRR with a power budget similar to that of ALDOs while maintaining a transient response similar to that of DLDOs.
This paper presents a cost-effective hybrid LDO circuit for state-of-the-art DRAM cores with an improved transient response; the circuit not only supports various refresh operations but also meets the DDR4 JEDEC specifications for the refresh period [18]. This paper is organized as follows. In Section II, the architecture and operation of the proposed LDO circuit using a load replication circuit are described. Section III describes the circuit implementation and experimental results of the proposed LDO. Finally, the paper is concluded in Section IV.
The associate editor coordinating the review of this manuscript and approving it for publication was Khaled Hani Ahmed.

II. PROPOSED LDO ARCHITECTURE
A. LDO FOR DRAM CORES
Figure 1(a) shows the physical implementation of a DRAM chip composed of several bank groups, each including DRAM cores and the LDOs that supply their power. Each bank includes an individual LDO, which is implemented inside the column decoder. To keep the DRAM chip small and cost-effective, the die area of each LDO is severely constrained. In commercial DRAM products, an LDO implementation with more than one operational amplifier (OP-AMP) is not allowed. In extreme cases, the die area of the OP-AMP is limited to that of a single-stage configuration. However, as the storage capacitance in state-of-the-art DRAM cores decreases with scaling, LDOs for core sensing and refresh operations are required to be more robust. In particular, when sense amplifiers with the offset cancellation function [2] are employed, the demand for high-performance LDOs for DRAM cores with a fast load-transient response and low output ripple increases.
Figure 1(b) shows the devices and circuits that consume the dominant current in the DRAM core during active and refresh operations, excluding the cell-transistor driving circuit of the word-line driver. The DRAM core comprises arrays of storage cell capacitors and transistors, sense amplifiers, and their driving circuits. Bit-line capacitor charging is the main source of current consumption during DRAM core operation: the sense amplifier amplifies the voltage difference between bit-lines after the corresponding cell transistor is turned on by an active or refresh command. The bit-line capacitance $C_{BL}$, which includes the wiring parasitic capacitance, is determined by the number of cell capacitors and transistors connected to one bit-line and therefore trades off against the total DRAM die area. Increasing the number of cells per bit-line connected to a sense amplifier reduces the area share of the sense amplifiers, which saves overall die area.
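The dependence of the core load on the bit-line capacitance can be illustrated with a first-order charge estimate. The following Python sketch is not taken from the measured design. It assumes that the bit-lines are precharged to half of the core voltage, so that each activated bit-line pair draws a charge of about half the bit-line capacitance times the core voltage from the core supply. The function name and all numeric values (bit-line capacitance, number of pairs, sensing time) are hypothetical placeholders.

```python
def bitline_charge_current(c_bl_farads, n_bitline_pairs, v_core, t_sense_seconds):
    """Average core-supply current while sensing (first-order estimate).

    Assumption: half-V_core precharge, so per bit-line pair one line is
    pulled from V_core/2 up to V_core and the supply delivers
    C_BL * V_core / 2 of charge.
    """
    if t_sense_seconds <= 0:
        raise ValueError("t_sense_seconds must be positive")
    charge = n_bitline_pairs * c_bl_farads * v_core / 2.0
    return charge / t_sense_seconds


# Hypothetical values for illustration only (not reported in this work).
i_small = bitline_charge_current(50e-15, 8192, 1.1, 20e-9)
i_large = bitline_charge_current(50e-15, 16384, 1.1, 20e-9)
print(f"{i_small * 1e3:.2f} mA for 8192 pairs, {i_large * 1e3:.2f} mA for 16384 pairs")
```

Under these assumptions the estimated current scales linearly with both the bit-line capacitance and the number of simultaneously activated bit-lines, which is why a larger refresh load translates directly into a larger current step for the LDO.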
Furthermore, the current consumed by the refresh core operation in response to a refresh command is multiplied according to the refresh rate, which is set in consideration of the DRAM core data retention time. Although DRAM process shrinkage reduces the die area, the accompanying reduction in DRAM storage capacitance significantly reduces the retention time. Therefore, the maximum load current consumption and the load-transient response time must be reduced as the refresh rate increases. Figure 1(c) shows the estimated load-transient response time of the LDO for DDR DRAM cores, considering the reduction in data retention time for each DRAM process generation [1], [3].
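The link between retention time and the time available to the LDO can be made concrete with simple arithmetic. The sketch below assumes that a fixed number of refresh commands must cover the whole array within the retention window and that the commands are evenly spaced. The 64ms window with 8192 commands is the commonly quoted DDR baseline, while the halved retention time is a hypothetical example and not a value reported in this work.

```python
def refresh_interval_us(retention_ms, n_refresh_commands):
    """Average spacing between refresh commands, in microseconds."""
    if n_refresh_commands <= 0:
        raise ValueError("n_refresh_commands must be positive")
    return retention_ms * 1000.0 / n_refresh_commands


baseline = refresh_interval_us(64.0, 8192)   # commonly quoted DDR baseline
shrunk = refresh_interval_us(32.0, 8192)     # hypothetical: retention halved
print(f"baseline: {baseline} us, halved retention: {shrunk} us")
```

When the retention time shrinks, the interval between refresh commands shrinks with it, so the regulator has less time to recover from each refresh current step.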

B. PROPOSED LDO ARCHITECTURE
Compared with conventional LDOs, the main contributions and novelties of this work can be summarized as follows. This work exploits a hybrid LDO that combines a DLDO with an ALDO for DRAM cores, which leads to fast LDO recovery and thereby improves the refresh cycle time $t_{RFC}$. However, the DLDO circuitry added for the hybrid LDO implementation increases the die area. For reliable DLDO control and cost-effective implementation, a load replication circuit implemented in the unused dummy DRAM cells is proposed. The replication circuit complements the precise control of the inverter-based comparator to minimize the die cost overhead of the additional DLDO integration.
Since DLDOs suffer from an intrinsically slow transient response and large output ripple, including low PSRR, conventional LDOs for DRAM cores comprise an ALDO circuit with an OP-AMP feedback loop to achieve low output ripple. Figure 2 shows the architecture of the proposed hybrid LDO with the modeled DRAM cores. The DRAM cells used in the load replication circuit adjacent to the column decoder are unused dummy cells, whereas all other cells are actual load cells. A hybrid LDO located at the column decoder of each bank group employs both ALDO and DLDO circuits, which together provide low output ripple and a fast load-transient response. The DLDO is activated only during a DRAM refresh operation with a large load current and assists the ALDO in achieving a fast load-transient response, which improves $t_{RFC}$.
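The benefit of adding a DLDO path during refresh can be illustrated with a toy behavioral model. The Python sketch below is not the authors' circuit: it assumes a single output capacitor, an ALDO modeled as a proportional current source with a first-order lag, and a DLDO modeled as a fixed current that is switched on while the output is below the target. It senses the output node directly and does not model the load replication circuit. All names and parameter values are hypothetical.

```python
def peak_droop(dldo_enabled, *, v_ref=1.1, c_out=10e-9, i_load=10e-3,
               g_aldo=0.5, tau_aldo=50e-9, i_dldo=8e-3,
               t_refresh=300e-9, dt=0.1e-9):
    """Return the peak droop (V) of a toy hybrid-LDO model during one refresh.

    All parameter values are hypothetical and chosen only for illustration.
    """
    v = v_ref        # output node voltage
    i_aldo = 0.0     # ALDO current, lags the error through tau_aldo
    droop = 0.0
    steps = int(round(t_refresh / dt))
    for _ in range(steps):
        # Refresh is active for the whole window, so REF_CMD is taken as 1.
        i_d = i_dldo if (dldo_enabled and v < v_ref) else 0.0
        # Slow analog loop: first-order lag toward g * error, never negative.
        target = max(0.0, g_aldo * (v_ref - v))
        i_aldo += (target - i_aldo) * dt / tau_aldo
        # Charge balance on the output capacitor (forward Euler step).
        v += (i_aldo + i_d - i_load) * dt / c_out
        droop = max(droop, v_ref - v)
    return droop


droop_aldo_only = peak_droop(False)
droop_hybrid = peak_droop(True)
print(f"ALDO only: {droop_aldo_only * 1e3:.1f} mV, hybrid: {droop_hybrid * 1e3:.1f} mV")
```

In this simplified model the DLDO current cancels most of the load step, so the slow analog loop only has to correct the remainder and the peak droop is reduced accordingly.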
The ALDO consists of PMOS power transistors and a single-stage OP-AMP as an error amplifier, whereas the DLDO consists of NMOS power transistors and a single-bit inverter-based comparator rather than an OP-AMP, in consideration of current consumption and die cost overhead. Since the DDR4 system additionally has an external power supply $V_{PP}$ that is higher than the general power supply $V_{DD}$, an NMOS transistor, which is smaller than its PMOS counterpart, can be used as the power transistor of the DLDO. All of the DLDO circuits except the power transistors use the external $V_{PP}$ in order to drive the NMOS power transistors. To activate the DLDO only during the refresh operation, a refresh command signal REF_CMD from the row decoder is used as a comparator control input. Unlike the DLDO, the ALDO is always activated during core operations. To further optimize the LDO output voltage $V_{core}$ against load current variations that depend on the DRAM cell address being activated, the feedback input voltage of the OP-AMP, $V_{core,fb}$, is taken by wiring a $V_{core}$ signal from inside the DRAM core.
However, while the ALDO operates stably through the feedback loop, the DLDO in the proposed hybrid LDO can become unstable when its comparator input is connected to $V_{core,fb}$ inside the DRAM core, which can lead to DRAM core operation failure due to $V_{core}$ overshoot. Because the $V_{core,fb}$ signal responds slowly owing to its large parasitic components, the DLDO off-operation is delayed, which may drive the output voltage above the target. To eliminate this unstable operation, the proposed DLDO comprises a replication circuit, with an output voltage $V_{core,rep}$, that recreates the DRAM core load current. While previously published works use a replica to achieve a fast load response time by improving the feedback loop delay [4], [6], this work adopts a replica circuit to guarantee a stable output voltage through precise off-control operation of the DLDO. Since the proposed LDO has the refresh command signal as an input, no additional circuit for fast DLDO on-control operation is required. As shown in Fig. 3, the proposed load replication circuit consists of sense amplifier replicas including their drivers M1 and M2, a power-reset switch M3, an NMOS power transistor replica M4, and dummy cells on both edges of the cell mat array in the state-of-the-art DRAM; the DRAM has a folded bit-line structure with 6F$^2$ trench capacitor cells. The blue-highlighted bit-line is connected to the dummy cells. To minimize the power consumption and die cost associated with the replication circuit, the parasitic load of the $V_{core,rep}$ power routing and the number of dummy cells are scaled down in proportion to those of the implemented DRAM core. In this work, the replication circuit is optimized with a reduction ratio of 250:1; four dummy cells are used in a 4K refresh operation and eight dummy cells in an 8K refresh operation.
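The proportional scaling of the replica can be written out as simple arithmetic. The sketch below assumes that the 250:1 reduction ratio applies directly to the cell count and to the routing parasitics. The helper names and the 50pF routing capacitance are hypothetical and serve only as an illustration.

```python
REDUCTION_RATIO = 250  # core-to-replica reduction ratio stated in the text (250:1)


def replica_equivalent(core_quantity, ratio=REDUCTION_RATIO):
    """Scale a core-side quantity (cell count, capacitance, ...) down to the replica."""
    return core_quantity / ratio


def core_equivalent(replica_quantity, ratio=REDUCTION_RATIO):
    """Scale a replica-side quantity back up to the core it represents."""
    return replica_quantity * ratio


# Dummy-cell counts from the text mapped back to the core under the assumption above.
cells_4k = core_equivalent(4)
cells_8k = core_equivalent(8)

# Hypothetical routing capacitance, for illustration only.
c_route_replica = replica_equivalent(50e-12)
print(cells_4k, cells_8k, c_route_replica)
```

Under this assumption, four and eight dummy cells stand in for 1000 and 2000 load cells, respectively, and the replica routing is sized so that its parasitic load keeps the same proportion.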
Power routing in the load replica is implemented with the same proportion of parasitic loads by finely adjusting the width and length, while using the same power routing metal as in the DRAM core. After a 4K or 8K refresh operation is finished, $V_{core,rep}$ is restored to $V_{core}$ via the M3 switch, which is controlled by the pre-charge signal PCG of the sense amplifier. However, if a dummy cell and its adjacent cell operate at the same time, a data read operation failure may occur owing to the signal coupling between bit-lines; this issue is beyond the scope of this paper. Figure 4 shows the design of the inverter-based comparator in the DLDO. Owing to the nature of its digital implementation, a conventional inverter-based comparator is sensitive to process, voltage and temperature (PVT) variations. The inverter-based comparator in this work focuses on reducing performance degradation under temperature variations. The comparator includes self-compensation of the temperature effects on mobility and threshold voltage by means of diode-connected bias transistors M5, M6, and M7 [19]. The bias generated by transistors M5-M7 is applied to the current source transistors M3 and M4 of the inverter, enabling stable comparator operation that is insensitive to temperature changes. To reduce the performance degradation associated with process variations as well as temperature variations, the comparator is designed with transistors larger than the minimum feature size of the process. Moreover, since it uses the $V_{PP}$ power supply, it is implemented with thick-oxide transistors. The post-layout simulation results show the temperature insensitivity of this comparator compared with a conventional inverter with only two transistors M1 and M2. As shown in Fig. 4(a) (right), improvements in temperature insensitivity of 78% and 62% are observed in the DC and transient simulations, respectively.
The validity of the design under process variations is also confirmed through Monte Carlo simulation of the entire LDO, as shown in Fig. 4(b).
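The reason a slow comparator input delays the DLDO off-operation and causes overshoot, as described in this subsection, can be illustrated with a toy recovery model. The Python sketch below does not model the load replication circuit itself. It assumes that the load has already ended, that the DLDO alone recharges the output capacitor with a fixed current, and that the comparator observes the output through a first-order lag `tau_sense`. A large lag stands in for a heavily loaded feedback node and a small lag for a lightly loaded sensing node. All names and values are hypothetical.

```python
def dldo_overshoot(tau_sense, *, v_ref=1.1, v_start=1.0, c_out=10e-9,
                   i_dldo=8e-3, t_sim=400e-9, dt=0.1e-9):
    """Peak overshoot (V) above v_ref while the DLDO recharges the output.

    The comparator sees the output through a first-order lag tau_sense and
    switches the DLDO off once the *sensed* voltage reaches v_ref.
    Requires tau_sense >= dt for a stable forward-Euler update.
    """
    if tau_sense < dt:
        raise ValueError("tau_sense must be at least dt")
    v = v_start          # actual output voltage, starts from a drooped level
    v_sensed = v_start   # voltage at the comparator input
    overshoot = 0.0
    for _ in range(int(round(t_sim / dt))):
        dldo_on = v_sensed < v_ref       # REF_CMD is assumed to be asserted
        if dldo_on:
            v += i_dldo * dt / c_out     # no load: all current charges c_out
        v_sensed += (v - v_sensed) * dt / tau_sense
        overshoot = max(overshoot, v - v_ref)
    return overshoot


slow_node = dldo_overshoot(20e-9)    # hypothetical heavily loaded sense node
fast_node = dldo_overshoot(0.2e-9)   # hypothetical lightly loaded sense node
print(f"slow sensing: {slow_node * 1e3:.2f} mV, fast sensing: {fast_node * 1e3:.2f} mV")
```

In this model the overshoot grows roughly in proportion to the sensing lag multiplied by the recharge slew rate, which is consistent with the motivation for deriving the off-control from a node that tracks the load quickly.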

III. CIRCUIT IMPLEMENTATION AND EXPERIMENTAL RESULTS
The prototype of the proposed hybrid LDO has been implemented and fabricated in a standard 180nm CMOS technology. A prototype die micrograph is shown in Fig. 5(a). The prototype occupies 5mm × 5mm with eight bank groups of DRAM cores, row decoders and column decoders, including monitoring circuitry, I/O pads and coupling capacitors. To verify the performance of the LDO under ×8/×16 active operation and various refresh operations, including 4K and 8K refresh, the DRAM cores of the eight bank groups have been modeled and implemented with various process parameters, such as the $C_{BL}$ of the state-of-the-art DRAM process. The load current of the DRAM core is determined by $C_{BL}$ and the number of activated bit-lines while the DRAM is under active or refresh operation. Since the 180nm CMOS process used in this work can reproduce the process variables of sub-10nm DRAM, including the $C_{BL}$ value, it is effective for verifying the feasibility of the DRAM core LDO IP. The DRAM cell storage capacitor is modeled using a metal-insulator-metal (MIM) capacitor with a value of less than 10fF, whereas the $C_{BL}$ of the connection between the activated DRAM cell and the sense amplifier is modeled using poly wiring, as in commercial DRAM devices. The $V_{core}$ power routing between bank groups is separated to exclude any effect of simultaneous bank group operation, and the coupling capacitance of the LDO output is split into two types for the upper and lower bank groups.
The single-stage OP-AMP of the ALDO, which uses a conventional differential pair, is implemented with an open-loop dc gain of 36dB and a unity-gain frequency of 3.2MHz. Since the OP-AMP current consumption in each bank group dominates the quiescent current of the entire LDO, it is strictly limited to 5µA or less to minimize current consumption during DRAM core operation. Current mirror-based monitoring circuitry is also implemented to monitor the $V_{core}$ voltage without an additional test load. In addition, a DLDO on/off test option input is included, and the gate control signal DLDO_ON of the DLDO NMOS power transistor is monitored to evaluate the contribution of the DLDO to the performance. This chip does not contain DRAM DDR interface circuitry. For the DRAM core operation, the core operation start signal and the REF_CMD signal are applied externally. The chip is mounted on a standard PCB and directly wire-bonded for testing. Figure 5(b) shows the measurement setup with a test board including the prototype of the proposed LDO.
VOLUME 10, 2022
Figure 6 shows the measured load-transient response waveforms with a $V_{DD}$ of 1.2V and a $V_{core}$ of 1.1V, while the DLDO is switched on or off by the external test option input. By adopting the DLDO, voltage droop improvements of 62mV and 110mV, and $t_{RFC}$ gains of 100ns and 120ns, are measured with refresh rates of 4K and 8K, respectively. Figure 7 shows a $t_{RFC}$ shmoo plot against $V_{DD}$ at a low temperature of 0°C and a high temperature of 100°C. The shmoo plot confirms that the proposed hybrid LDO, which includes the temperature-invariant DLDO with the proposed load replication circuit, achieves a significant $t_{RFC}$ gain without any temperature dependency. The current consumption overhead of the eight LDOs is measured as 36µA during the 8K refresh operation. The peak current efficiency is 99.6% with a $V_{DD}$ of 1.2V. Figure 8 shows the measured PSRR over the frequency range from 1Hz to 10MHz.
The measured PSRR is better than −30dB at low frequencies, while the lowest PSRR is −8.5dB at 20MHz. Table 1 summarizes the performance of the proposed LDO and compares it with state-of-the-art LDOs with the standard figure of merit (FoM) and process-scaled FoM [20]. The proposed LDO achieves the lowest FoM among the state-of-the-art LDOs.
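The reported current figures can be related to each other with a short calculation. The sketch below assumes the usual definition of current efficiency as the load current divided by the sum of the load and quiescent currents, and it assumes that the 36µA overhead is shared equally by the eight LDOs. Both assumptions, as well as the function names, are illustrative and are not stated in the measurements above.

```python
def current_efficiency(i_load, i_q):
    """Current efficiency: load current over total (load + quiescent) current."""
    return i_load / (i_load + i_q)


def quiescent_fraction(efficiency):
    """Ratio I_Q / I_load implied by a given current efficiency."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 - efficiency) / efficiency


# Reported overhead of 36 uA for eight LDOs, assuming an equal share per LDO.
overhead_per_ldo_ua = 36.0 / 8

# Quiescent-to-load current ratio implied by the reported 99.6% peak efficiency.
ratio_at_peak = quiescent_fraction(0.996)
print(f"{overhead_per_ldo_ua} uA per LDO, I_Q/I_load = {ratio_at_peak:.4%}")
```

Under these assumptions the overhead amounts to 4.5µA per LDO, and a peak efficiency of 99.6% corresponds to a quiescent current of about 0.4% of the load current.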

IV. CONCLUSION
In this paper, a hybrid LDO for DDR4 DRAM cores is presented. To guarantee a stable output voltage through precise off-control operation, a load replication circuit with dummy DRAM cells is proposed. The prototype has been fabricated using a standard 180nm CMOS process. The experimental results show a peak current efficiency of 99.6% and a maximum $t_{RFC}$ improvement of 120ns, while the die area overhead of the digitally implemented DLDO circuits, including the proposed load replication circuit using the unused DRAM cells, is 38% or less compared to the conventional DRAM core ALDO. Furthermore, we have demonstrated that our LDO outperforms several state-of-the-art LDOs based on FoM comparisons, and that it can be used as an LDO IP capable of responding to various refresh operations for sub-10nm DRAMs.