Power Delivery Networks for Embedded Mobile SoCs: Architectural Advancements and Design Challenges

Conventional power delivery networks (PDNs) and power management techniques using off-chip power converters with bulky passive components cannot meet the ever-evolving power delivery requirements of high-performance modern systems-on-chip (SoCs). In SoCs, heterogeneous components, including multi-core processors and mixed-signal peripheral circuits, require state-of-the-art PDNs to provide high-quality power on demand with minimum latency while simultaneously achieving a small form factor, high conversion efficiency, and minimum current consumption. To satisfy these power delivery requirements, various PDNs have been developed over the past decades, such as the conventional architectures using off-chip power converters, architectures using in-package power converters and fully-integrated power converters, and heterogeneous architectures (off-chip power converters and on-chip regulators). This paper reviews these architectural advancements of the PDNs and their advantages and limitations, which leads us to discuss a heterogeneous PDN structure consisting of a highly efficient off-chip switching-mode power converter and multiple highly precise small linear regulators integrated on chip at point-of-load locations. The heterogeneous PDN has proven to be one of the most suitable architectures for achieving high-quality fine-grained on-chip power delivery and management in SoCs. This paper also discusses unified voltage and frequency regulators (UVFRs), which support dynamic-variation-aware dynamic voltage and frequency scaling (DVFS) for fine-grained power management in multi-core processors. Based on the UVFR, we propose a modified heterogeneous PDN using frequency-referenced digital low-dropout regulators (FR-DLDOs) for more efficient DVFS, eliminating the need for band-gap circuits to provide reference voltages. As an exemplary implementation of FR-DLDO for this PDN, we present an FR-DLDO with a transient-boost control, which accelerates the transient response. The transient-boost control is activated dynamically only when an abrupt change happens out of the steady state. The implemented FR-DLDO fabricated in a 40-nm CMOS process outperforms other FR-DLDOs in the figure-of-merit and peak power efficiency while driving 40 mA of load current.


I. INTRODUCTION
The continuous and aggressive down-scaling of CMOS technologies has led to on-chip integration of various micro-/nano-electronic circuits and systems on an SoC platform [1], [2]. In SoC platforms, heterogeneous components, including multi-core processors and other peripheral circuits, which are mixed-signal (analog and digital) in nature, are all integrated together on a single semiconductor chip [3]-[5]. Such heterogeneous integration enables versatile applications and high performance while maintaining the minimum energy-per-task [6], [7].
However, the integration of these heterogeneous components in close proximity imposes severe signal integrity challenges. Power delivery noise and fluctuations in the supply voltage levels due to load variations impose additional challenges on the SoC platform. One of the main challenges in these heterogeneous platforms concerns the effectiveness of the PDN, which is a critical design component in SoCs. PDNs consist of power converters and regulators that supply the required amount of power to the load circuits from the source (battery). Thus, a robust PDN is required to achieve a high level of power-supply integrity in SoCs. However, with the down-scaling of CMOS technologies, the operating voltage levels in SoCs have also decreased to exploit the quadratic dependence of the dynamic power on the supply voltage ($P = CV^{2}F$). Simultaneously, the current demands of SoCs have increased, thus inducing larger power losses and voltage drops on the board, package, and chip [8]-[10]. Moreover, heterogeneous components in SoCs demand multiple different voltage levels simultaneously for optimal energy consumption and performance of each component [8]. Furthermore, the design of the PDN becomes more complex due to the limited number of passive elements and I/O pins, and the limited impedances on the board and package. Therefore, to supply high-quality power to multiple voltage domains of an SoC with an optimal energy-performance trade-off, high power efficiency, and minimum footprint area, PDNs should be efficiently designed.
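The quadratic dependence of the dynamic power on the supply voltage, and the accompanying rise in supply current for a fixed power budget, can be illustrated with a minimal sketch. The effective capacitance, clock frequency, and voltages below are hypothetical values chosen only for illustration; they are not taken from any of the cited designs.

```python
def dynamic_power(c_eff, v_dd, f_clk):
    """Dynamic switching power P = C * V^2 * F, in watts."""
    return c_eff * v_dd ** 2 * f_clk


def supply_current(power, v_dd):
    """Average current drawn from the supply for a given power and voltage."""
    return power / v_dd


# Hypothetical operating point (assumption): 1 nF switched capacitance at 1 GHz.
C_EFF = 1e-9
F_CLK = 1e9

p_high = dynamic_power(C_EFF, 1.0, F_CLK)  # 1.0 V supply
p_low = dynamic_power(C_EFF, 0.5, F_CLK)   # 0.5 V supply, same C and F

print(f"P at 1.0 V: {p_high:.3f} W, P at 0.5 V: {p_low:.3f} W")
print(f"power ratio after halving the voltage: {p_low / p_high:.2f}")

# For a fixed 1 W budget, halving the supply voltage doubles the current
# that the PDN has to deliver.
print(f"current at 1.0 V: {supply_current(1.0, 1.0):.1f} A")
print(f"current at 0.5 V: {supply_current(1.0, 0.5):.1f} A")
```

Halving the supply voltage at constant capacitance and frequency cuts the dynamic power to a quarter, whereas delivering the same power at half the voltage requires twice the current, which is the origin of the larger resistive losses noted above.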
One of the most traditional PDN configurations is shown in Fig. 1(a). It uses off-chip power converters only. The battery power is first down-converted by off-chip switching-mode power converters, and then the power is delivered through board-level interconnects and I/O pads to the chip [11]. This PDN type with off-chip switching-mode power converters has a high power conversion efficiency. Still, it typically consumes a large board area due to the off-chip converters and passive components. Moreover, it is difficult to meet the huge current demands of modern SoCs, which have a very high density of on-chip circuits. High currents through board-level interconnections and I/O pads inevitably cause large voltage drops and large power losses, severely degrading the overall efficiency and the quality of on-chip power [12].
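As a rough illustration of why high currents through board-level interconnects are costly, the sketch below evaluates the voltage drop (V = I·R) and the resistive loss (P = I²·R) of a single lumped delivery path. The 2 mΩ path resistance and the load currents are assumed values, not measurements from the cited works.

```python
def interconnect_loss(i_load, r_path):
    """Return (voltage drop in V, power loss in W) of a resistive path."""
    v_drop = i_load * r_path        # Ohm's law: V = I * R
    p_loss = i_load ** 2 * r_path   # Joule heating: P = I^2 * R
    return v_drop, p_loss


# Hypothetical lumped board + pad resistance (assumption): 2 milliohms.
R_PATH = 2e-3

for i_load in (1.0, 5.0, 10.0):
    v_drop, p_loss = interconnect_loss(i_load, R_PATH)
    print(f"I = {i_load:4.1f} A -> drop = {v_drop * 1e3:5.1f} mV, "
          f"loss = {p_loss * 1e3:6.1f} mW")
```

Because the loss grows with the square of the current, a tenfold increase in load current raises the drop tenfold but the dissipated power a hundredfold in this simple model.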
To reduce the voltage drops and power losses associated with board-level interconnects, power converters can be placed inside the package together with the chip, as shown in Fig. 1(b). This configuration is called a power supply in package (PSiP). Within the same package as the load device (SoC), different chips such as switching drivers and controllers are placed together with passive components. This arrangement can effectively lower the parasitic impedance effects on both the board and the package. However, the typical in-package integration technologies are still limited in supporting the increased number of on-chip power domains with large load current requirements [13]. PSiPs have been considered a partially off-chip configuration and an intermediate power-supply technology in terms of cost, complexity, and performance [13], [14].
To meet the challenging and ever-increasing power delivery demands of recent SoCs, efforts have been made to advance the full on-chip integration of power supplies and converters, as illustrated in Fig. 1(c). Several state-of-the-art power converters were fully integrated on chip by using on-die MIM capacitors, air-core inductors, or non-magnetic package-trace inductors [15]-[18]. These fully integrated power converters improve the quality of the delivered power and significantly reduce the transient times and the voltage droops during load transients. For example, a fully integrated PDN [15] consisting of multiple fully integrated voltage regulators (FIVRs) is implemented in an Intel Core SoC. It has a driving capacity of 700 A of load current while maintaining a peak power efficiency of 90%. In addition, it enables a >50% improvement in the battery life for mobile products and a 2-3× increase in the peak available power. Although full integration of power supplies has multiple advantages, a single on-chip power converter for each voltage domain cannot supply sufficient regulated current with the required precision.
To maintain a high-quality regulated power supply all across each voltage domain, hundreds of ultra-small micro-power converters/regulators can be integrated on chip at the point-of-load (PoL) locations within each individual voltage domain [10], [19], [20]. This PoL-distributed PDN is illustrated in Fig. 1(d). This fine-grained power delivery can significantly enhance the quality and speed of the power supplies because PoL-embedded small regulators can achieve smooth, fast, and precise voltage regulation. The effectiveness and utilization of the PoL-distributed PDN are evident from recent commercial chip designs [6], [20]. Thanks to the PoL integration of multiple micro-power regulators, high energy-performance trade-offs were achieved [6], [20]. Although full integration of switching-mode power converters has been demonstrated to some extent, the on-chip integration of large passive components remains challenging. Besides, ultra-small regulators are generally less power efficient due to their inherently lossy nature (refer to Sec. II-B for more details). Thus, efforts are being made to mitigate these losses by system-level solutions through on-chip distribution networks [8].
To optimize the performance of PDNs, various equation-based verification methods and mathematical analyses have been presented. Because they are outside the main scope of this paper, some major works are only briefly listed here for readers' reference [12], [21]-[26]. An analysis based on a convex optimization method can be used for a PDN with on-chip/off-chip buck converters, providing an accurate and fast evaluation of important characteristic parameters, such as power efficiency, output stability, and dynamic voltage scaling (DVS) [21]. As another example, geometric programming can be utilized to find the optimal design variables for different power delivery architectures, such as those with on-chip power converters [26].
To determine suitable locations for on-chip power regulators, various optimization algorithms that are widely used for facility location problems have been applied to PDNs [22]. A comprehensive methodology based on Mason's gain formula is presented in [23] for modeling and analyzing distributed linear regulators and their interactions to optimize the overall stability of the PDN. Some other optimization techniques have also been presented to boost power and energy efficiency [12], [24], [25].
As briefly discussed so far, various PDN architectures and techniques have been proposed to date to achieve highly efficient fine-grained power delivery and management in SoCs with robustness and small form factor. These architectures and techniques are comprehensively discussed in the next sections. In the next section (Sec. II), the main power converter topologies are reviewed and compared. Sec. III presents a comprehensive comparison between typical PDNs using off-chip power converters and heterogeneous PDNs with an off-chip power converter and on-chip small regulators. In Sec. IV, state-of-the-art heterogeneous PDNs are discussed. Sec. V presents a recently proposed new approach of power management using UVFRs along with state-of-the-art UVFR architectures for fine-grained dynamic voltage and frequency scaling. The architecture of a proposed heterogeneous PDN is presented in Sec. VI, and Sec. VII presents an implemented example of the regulator based on the principle of UVFR along with some measurement results. The paper is concluded in Sec. VIII.

II. TOPOLOGIES OF POWER CONVERTERS
DC-DC power converters can be largely categorized into two types, switching and linear converters, depending on the regulation control method. Switching-mode converters have traditionally been preferred over linear converters due to their higher power efficiencies (ideally near 100%). However, they normally require large off-chip passive components. In contrast, linear regulators typically consume much less area, so they are considered more suitable for full on-chip integration. Both the switching and linear converters are discussed in more detail in the following subsections.

A. SWITCHING CONVERTERS (BUCK CONVERTERS)
The switching-mode buck converter shown in Fig. 2(a) is widely used to supply a wide range of output voltages with a high current delivery while offering a high power efficiency (e.g., > 90%). However, buck converters intrinsically have large voltage and current ripples at the output because of their switching-based control and operation. To reduce these ripples, two techniques are commonly utilized. First, the size of the LC filter can be increased, but at the cost of more area. Second, the switching frequency can be increased, but at the cost of more power consumption. Moreover, most performance metrics of the switching-mode buck converters, such as the power efficiency, output load current ($I_{LOAD}$), and transient response, are also largely affected by the LC filter size. Hence, on-chip integration of buck converters is greatly hindered by these bulky passive components. The required size of these passive components can be reduced for full on-chip integration by operating at an ultra-high switching frequency $F_{SW}$, but at the cost of increased power dissipation in the converter [11], [27]. As a result, the energy efficiency degrades significantly in on-chip buck converters.
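The trade-off between the LC filter size, the switching frequency, and the output ripple can be sketched with the standard first-order expressions for an ideal buck converter in continuous conduction mode: the inductor current ripple is V_OUT·(1 − D)/(L·F_SW) with duty cycle D = V_OUT/V_IN, and the output voltage ripple is ΔI_L/(8·C·F_SW) when the capacitor ESR is neglected. The component values below are hypothetical and only contrast an off-chip-sized filter with an on-chip-sized one.

```python
def buck_ripple(v_in, v_out, l, c, f_sw):
    """First-order ripple of an ideal CCM buck converter (ESR neglected).

    Returns (inductor current ripple in A, output voltage ripple in V).
    """
    duty = v_out / v_in
    di_l = v_out * (1.0 - duty) / (l * f_sw)
    dv_out = di_l / (8.0 * c * f_sw)
    return di_l, dv_out


# Hypothetical off-chip filter (assumption): 1 uH, 10 uF, 2 MHz.
di_off, dv_off = buck_ripple(3.6, 1.0, 1e-6, 10e-6, 2e6)

# Hypothetical on-chip filter (assumption): 2 nH, 10 nF, 200 MHz.
di_on, dv_on = buck_ripple(3.6, 1.0, 2e-9, 10e-9, 200e6)

print(f"off-chip: dI = {di_off:.3f} A, dV = {dv_off * 1e3:.2f} mV")
print(f"on-chip : dI = {di_on:.3f} A, dV = {dv_on * 1e3:.2f} mV")
```

In this model, doubling the switching frequency halves the current ripple and quarters the voltage ripple, which is why shrinking L and C for integration pushes the converter toward very high switching frequencies and the associated switching losses.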
Recently, some methods to integrate the passives, i.e., inductors and capacitors, on chip for buck converters have been investigated. Such methods include techniques using air-core inductors [28]-[31], non-magnetic package-trace inductors [15], stacked inductors [32], and on-die MIM capacitors [33]-[35]. These FIVRs can achieve faster transient responses than regulators using off-chip passive components. However, the integrated passive components still suffer from significant area overheads and poor quality factors owing to their large equivalent series resistance (ESR, ≈200 mΩ/nH). Moreover, for full integration, the converters should operate at a very high $F_{SW}$ of hundreds of MHz. Such a fast $F_{SW}$ incurs high switching losses in the power switching devices and their corresponding drivers, thereby degrading the overall power efficiency. Due to these area and power overheads, on-chip integration of switching power converters is still considered undesirable for SoCs, especially ones with multiple voltage domains like the one shown in Fig. 1(d).

B. LINEAR CONVERTERS (LOW DROPOUT REGULATORS)
The linear power converters, widely known as low-dropout regulators (LDOs), convert $V_{IN}$ to a regulated $V_{OUT}$ against variations of the load current and the input voltage by comparing the feedback voltage with a reference voltage ($V_{REF}$) using an error amplifier, as shown in Fig. 2(b). LDOs can provide a more precise and ripple-free supply voltage and current to a load circuit with much less area overhead than switching-mode power converters (buck converters) [36]-[39]. They are considered a better candidate for PoL power supply because they are typically easier to integrate on chip: they require only active devices and no passive components to down-convert the voltage [36], [38], [40]-[43]. Moreover, compared to buck converters, LDOs can offer better process and voltage scalability, smaller silicon area overhead, higher rejection capability against power supply noise, and faster response times to load current changes. Due to these advantages, state-of-the-art LDOs have been integrated at PoL locations to implement low-cost, low-power distributed power management (Fig. 1(d)) [40]-[44].
However, the power efficiency of LDOs is intrinsically limited by the dropout voltage $V_{DO}$ ($= V_{IN} - V_{OUT}$). The power loss from the resistive division increases as the dropout voltage increases [3], [45]. Neglecting the quiescent current, the power efficiency of an LDO is directly related to the dropout voltage as follows: $\eta \approx V_{OUT}/V_{IN} = 1 - V_{DO}/V_{IN}$. Thus, to increase the power efficiency, the dropout voltage should be reduced. Also, the performance of analog LDOs (Fig. 2(b)) is mainly dependent on the gain of the error amplifier (EA). Since the circuits should operate at near-threshold voltage (NTV) levels in modern SoCs, it has become greatly challenging to attain a sufficient EA gain. The transient response, line/load regulation, and regulation range of analog LDOs are adversely affected by such insufficient EA gains.
In contrast, digital LDOs (DLDOs) are suitable for operation at NTV levels, so they have been extensively developed over the past few years [46]-[48]. In addition, the dropout voltage of DLDOs is typically quite small (40-50 mV) compared to that of analog LDOs (∼200 mV), making DLDOs a better candidate in terms of the maximum achievable power efficiency. Therefore, distributed PDNs (Fig. 1(d)) using multiple small DLDOs are the most suitable and widely accepted for highly efficient fine-grained power delivery and management.
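The effect of the dropout voltage on the achievable efficiency can be sketched with the usual LDO power balance, η = (V_OUT·I_LOAD)/(V_IN·(I_LOAD + I_Q)), where I_Q is the quiescent current of the regulator. The 1.0 V input, the 40 mA load, and the zero quiescent current below are assumptions for illustration; only the dropout values (about 200 mV for analog LDOs and about 50 mV for DLDOs) follow the figures quoted above.

```python
def ldo_efficiency(v_in, v_dropout, i_load, i_q=0.0):
    """Power efficiency of a linear regulator.

    v_dropout is V_IN - V_OUT; i_q is the regulator's own quiescent current.
    """
    v_out = v_in - v_dropout
    return (v_out * i_load) / (v_in * (i_load + i_q))


V_IN = 1.0      # assumed input rail, in volts
I_LOAD = 40e-3  # assumed load current, in amperes

eta_analog = ldo_efficiency(V_IN, 0.200, I_LOAD)   # ~200 mV dropout
eta_digital = ldo_efficiency(V_IN, 0.050, I_LOAD)  # ~50 mV dropout

print(f"analog LDO : {100 * eta_analog:.1f} %")
print(f"digital LDO: {100 * eta_digital:.1f} %")
```

With the quiescent current neglected, the efficiency reduces to V_OUT/V_IN, so the smaller dropout of a DLDO translates directly into a higher efficiency ceiling at a given input voltage.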

III. CONVENTIONAL PDNs
As discussed above, buck converters are more power-efficient, but they are inappropriate for on-chip integration due to the large physical sizes of the required passive elements. Alternatively, LDOs are more suitable for delivering high-quality power to load circuits with reasonable area overheads, but they suffer from limited power efficiencies due to their dropout voltages. Hence, PDNs with only switching-mode converters or only linear converters suffer from either a significant area overhead or a significant power overhead. These overheads for each PDN type are further illustrated in Fig. 3. In-package power converters (PSiPs) partially trade off the power and area overheads, but neither overhead is small enough yet. As shown in Fig. 3, a desirable PDN should have both a small power loss and a small area.
To utilize the advantages of buck converters and LDOs simultaneously, a heterogeneous PDN is commonly adopted [11], [12], [19], [20], [22], [24], [45], [49], [50]. As illustrated in Fig. 4, the heterogeneous PDN first converts the input power by using off-chip buck converters and then regulates the converted power by using on-chip LDOs. This decoupling of the power conversion from the power regulation lowers the power and area overheads. A more detailed discussion of typical PDNs using off-chip power converters and heterogeneous PDNs is presented below in this section.

A. TYPICAL PDN USING OFF-CHIP CONVERTERS
A simplified diagram of a typical PDN with off-chip buck converters for powering a mobile SoC is shown in Fig. 5(a). Here the off-chip buck converters supply power directly to individual blocks and multiple processor units of the SoC through on-board power supply rails. Across these power rails, intermittent but large IR drops happen when the power density of the multi-core CPUs increases [21], [51]. These large and fluctuating IR drops worsen the response latency of the PDN to load current changes, resulting in even larger voltage drops in the power supply [52]. Such a slow response of the conventional PDN in a voltage scaling scenario is illustrated in Fig. 5(b). In addition, to power multi-core CPUs, one off-chip buck converter is typically implemented using a shared power rail, as shown in Fig. 5(a). However, in the shared power rail, the supply voltage typically remains fixed at a certain voltage level by a single off-chip buck converter. Thus, all the CPU cores that share the power rail have the same power supply level regardless of each core's workload. This results in wasted energy, as shown in Fig. 5(c). Hence, the typical PDN using off-chip buck converters results in a reduction of energy efficiency [52], [53].
VOLUME 9, 2021

B. HETEROGENEOUS PDN
To overcome the limitations of the conventional PDNs, heterogeneous PDNs using on-chip LDOs have been extensively investigated [19], [20], [22], [24], [49], [50]. A simplified block diagram of a heterogeneous PDN for a mobile SoC is shown in Fig. 6(a). Here an off-chip buck converter, which is highly power efficient, serves as a master power converter, while multiple PoL-embedded LDOs serve as slave regulators, each dedicated to a CPU core or block and supplying it with a precise, ripple-free voltage. Each dedicated LDO produces and varies the required voltage for its CPU core depending on the workload condition, thus saving a significant amount of energy [24], as shown in Fig. 6(b). In addition, because these LDOs are embedded at the PoL, i.e., near the target circuit, the IR drop is significantly reduced. Hence, fast and precise voltage scaling can be implemented more easily for each CPU core or block, unlike in the typical PDN using off-chip converters [49], as shown in Fig. 6(b) and (c).
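The energy saving from per-core voltages can be illustrated with a minimal sketch that compares a shared rail, where every core sits at the highest required voltage, with per-core LDOs fed from one off-chip rail. The operating points, the effective capacitance, and the 50 mV dropout are hypothetical. The model assumes purely dynamic power (C·V²·F) and ideal LDOs with no quiescent current, so the power drawn from the rail by each core is the rail voltage times the core current C·V·F.

```python
# Hypothetical per-core operating points (assumption):
# (required core voltage in V, clock frequency in Hz).
CORES = [(1.0, 2.0e9), (0.8, 1.2e9), (0.6, 0.6e9), (0.6, 0.6e9)]
C_EFF = 1e-9      # assumed effective switched capacitance per core, in farads
V_DROPOUT = 0.05  # assumed LDO dropout at the most demanding core, in volts


def shared_rail_power(cores, c_eff):
    """All cores sit at the highest required voltage; only the clocks differ."""
    v_rail = max(v for v, _ in cores)
    return sum(c_eff * v_rail ** 2 * f for _, f in cores)


def per_core_ldo_power(cores, c_eff, v_dropout):
    """Each core gets its own voltage from an LDO fed by one off-chip rail.

    A linear regulator passes the load current through unchanged, so the
    power drawn from the rail is V_rail * I_core with I_core = C * V_core * F.
    """
    v_rail = max(v for v, _ in cores) + v_dropout
    return sum(v_rail * (c_eff * v * f) for v, f in cores)


p_shared = shared_rail_power(CORES, C_EFF)
p_hetero = per_core_ldo_power(CORES, C_EFF, V_DROPOUT)
print(f"shared rail  : {p_shared:.3f} W")
print(f"per-core LDOs: {p_hetero:.3f} W")
print(f"saving       : {100 * (1 - p_hetero / p_shared):.1f} %")
```

Under these assumptions the saving per core scales linearly with the core voltage rather than quadratically, because the dropped voltage is dissipated in the LDO; the benefit therefore depends on how far the lightly loaded cores can be scaled down relative to the rail.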
However, even with substantial research and development in on-chip power delivery and power management, there are still several challenges to address further to achieve energy-efficient fine-grained power delivery and management. These design considerations and state-of-the-art power delivery and management techniques are discussed in the next section.

IV. STATE-OF-THE-ART HETEROGENEOUS PDNs

A. DISTRIBUTED HETEROGENEOUS PDN
The heterogeneous PDN can optimize the performance and power consumption of each core of modern processors to some extent while supplying power to multiple voltage domains, as shown in Fig. 6. However, the IR drops induced across the on-chip parasitic resistances have become substantially large due to the increase in the power density of microprocessors over the past decade [54]. These IR drops can no longer be ignored during the design process in ultra-deep-submicron CMOS processes, where the operating voltages are already very low. Energy losses and fluctuations associated with the IR drops are expected to worsen in the future as workloads in the CPU cores for cognitive computing and artificial intelligence become more heterogeneous and specialized [54]. To guarantee the performance under such workload conditions, the minimum supply voltage of the core must be larger than a critical voltage. Therefore, a guardband above the critical voltage is required, causing an additional large power penalty [25], [54].
In addition, the power efficiency of PDNs can also be degraded by the poor efficiency of LDOs owing to their large dropout voltages. Moreover, on-chip LDOs may also suffer from a slow loop latency, which results in a severe voltage drop during peak load transients, requiring an even larger $V_{DD}$ guardband. To address these challenges of the heterogeneous PDNs, a heterogeneous PDN with multiple distributed digital LDOs was proposed in [54]. Digital LDOs can maintain a low dropout voltage of 40-50 mV, offering a higher power efficiency than their analog counterparts. A simplified block diagram of an exemplary heterogeneous distributed PDN is shown in Fig. 7. Instead of using a single large LDO, each core embeds nine LDOs, all supplied by the common off-chip buck converter. As shown, these multiple LDOs are connected like a power grid to supply power to the core and operate in a cooperative fashion to significantly reduce IR drops compared to typical heterogeneous PDNs. Because the LDOs are placed in close proximity to the load, the noise incurred by the parasitic elements across the power lines can be reduced as well.
Furthermore, by sharing information with neighboring LDOs, the PDN can significantly enhance the transient response [54]. Fig. 8 illustrates the cooperative regulation scheme, which can enhance the transient response in two aspects. First, when a large load-current transient occurs anywhere in the power grid, the load current flows not only from the nearest LDO but also from other neighboring LDOs to quickly mitigate the voltage drop. Second, each local LDO shares the comparison information about its local output voltage with neighboring LDOs. Upon receiving this information, the neighboring LDOs can tune the voltage in consideration of both the shared information and the local voltage. Using this distributed heterogeneous PDN, the overall voltage drop was reduced by more than 66%, and a sub-ns transient response time was achieved against more than 500 mA of load current [54].

B. FULLY INTEGRATED RECONFIGURABLE PDN (RPDN)
As an efficient PDN for complex multi-core processors, per-core FIVRs with on-chip inductors and multiple local power gates (PGs) were demonstrated [15], [20], [28], [55]. The FIVRs can offer high-current, high-power-density, fine-grained power management with a fast transient response because they can operate at a high switching frequency ($F_{SW}$). For instance, in the PDN [15] for an Intel processor, the power conversion stage is implemented on the motherboard, while the power regulation stage comprises 31 FIVRs. These FIVRs are synchronous multiphase buck converters operating at a fast $F_{SW}$ (= 140 MHz) with up to 16 phases. The FIVRs in [15] achieved a current density of 1.3 A/mm$^{2}$ and a power efficiency of 88% while driving four voltage domains of the core with a total load current of 50 A at $F_{SW}$ = 140 MHz. To achieve such performance, however, the integration of the air-core inductors (16 inductors for 16 phases) requires a massive silicon area. Moreover, as the aggressive CMOS scaling continues, each core shrinks, and the number of cores per die increases. Hence, the required number of inductors also increases. In contrast, the required footprint of each inductor does not scale down, posing a severe challenge in integrating these inductors [56]. Therefore, the FIVRs are not suitable for achieving efficient power delivery and power management in multi-core SoCs because of their huge area overhead.
To overcome the area constraints of the PDN using FIVRs only, a fully integrated autonomous reconfigurable PDN (RPDN) was recently proposed [56]. As shown in Fig. 9, this RPDN reduces the number of FIVRs to just two. Each FIVR is dynamically shared among four co-located cores via per-core DLDOs [56]. Each per-core DLDO consists of a local LDO control (LLC) block, which controls two local power gates (PGs), as shown in Fig. 9. With these techniques, the 2-input/4-output RPDN shown in Fig. 9 delivers better performance on demand while maximizing the overall energy efficiency, along with better core-count scalability [56]. However, the fully integrated RPDN using on-chip inductors still occupies a significant amount of valuable area on the package, conflicting with the stringent requirements of modern enterprise microprocessors [55]. Moreover, due to the limited achievable on-chip inductance, the peak output power of the FIVRs is also restricted.
Based on this fully integrated RPDN with 2-input/4-output FIVRs, a fully integrated QOESC converter was proposed in [57]. The top-level structure of the PDN using the QOESC for four CPU cores is shown in Fig. 10. The PDN consists of a switched-capacitor (SC) power converter and four LDOs. The SC converter comprises 32 equal partitions of capacitors and switches and operates as the power conversion stage, while each LDO, which is a phase-locked DLDO in this case, operates as the regulator for one core. This SC design uses an extended binary (EXB) scheme, which uses two flying capacitors to produce three ratios, each with 1/4 resolution. For each output ratio, 32 time-interleaved phases are generated with equal partitions to reduce the voltage ripples at the SC output. The QOESC routes the power on demand by sharing the total capacitance of the SC network across all four cores and delivering power to each core in a time-interleaved manner. For example, if the current demand of a particular core increases, more resources (capacitors and switches of the SC network) are dynamically allocated to the core. If the power demand increases further, the corresponding SCs are dynamically reconfigured to change the conversion ratio and provide a higher output voltage. The QOESC with this resource-sharing technique achieved a power efficiency of >87% and a 2.5× enhancement of the core frequency range. However, the load-current driving capability of the QOESC is only up to 5 mA per core, which is not sufficient for the typically high current demand of multi-core processors.
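The idea of changing the conversion ratio when a core needs a higher voltage can be sketched as a simple selection rule. This is not the control scheme of [57]; it only assumes that the three ratios with 1/4 resolution are 1/4, 1/2, and 3/4, and that the lowest ratio whose ideal output still leaves room for the LDO dropout is preferred, since this minimizes the voltage dissipated in the LDO. The function name, the 1.8 V input, and the 50 mV dropout are hypothetical.

```python
from fractions import Fraction

# Assumed conversion ratios (1/4 resolution); not confirmed by the source.
RATIOS = (Fraction(1, 4), Fraction(2, 4), Fraction(3, 4))


def pick_ratio(v_in, v_core, v_dropout=0.05):
    """Return the lowest ratio whose ideal SC output covers v_core + dropout.

    Returns None when even the highest ratio is insufficient.
    """
    for ratio in RATIOS:
        if float(ratio) * v_in >= v_core + v_dropout:
            return ratio
    return None


V_IN = 1.8  # assumed input voltage, in volts
for v_core in (0.35, 0.70, 1.00, 1.40):
    ratio = pick_ratio(V_IN, v_core)
    if ratio is None:
        print(f"V_core = {v_core:.2f} V -> no ratio is sufficient")
    else:
        ldo_eff = v_core / (float(ratio) * V_IN)  # ideal LDO stage efficiency
        print(f"V_core = {v_core:.2f} V -> ratio {ratio}, "
              f"LDO stage efficiency {100 * ldo_eff:.1f} %")
```

Picking the lowest sufficient ratio keeps the LDO dropout small for lightly loaded cores, while a core that requests a higher voltage forces a step to the next ratio.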

V. UVFRs FOR FINE-GRAINED DVFS
The fully integrated heterogeneous PDNs using distributed LDOs [28], [55]-[57] increase the energy and power efficiencies of multi-core processors. However, as the number of cores in a microprocessor increases with growing computing demands, the performance is often limited by the thermal design power (TDP) of the system rather than by the total number of cores that can be integrated into the processor chip [55], [58]. To achieve optimal use of computing resources, high power efficiency is of paramount importance. In power delivery and management, dynamic voltage and frequency scaling (DVFS) is a well-established method for dynamic thermal management. It is extensively utilized to increase the power efficiency of multi-core processors and SoC platforms [59], [60]. This design strategy is especially needed when the workloads differ across cores. For example, some cores on the processor may not be required to operate at the maximum speed. Then, applying the same supply voltage to all cores would be wasteful. In addition to the TDP, variations of the device, circuit, and system parameters degrade the performance of processor cores in an SoC [61]. These variations are categorized into static and dynamic parameter variations. The static parameter variations occur during manufacturing processes. The parameters differ across dies, but the variations are static over time [61]. In contrast, the dynamic parameter variations happen over time during processor operation as environmental and workload conditions change. The dynamic parameter variations include supply voltage ($V_{DD}$) drops, temperature-dependent variations, transistor aging, and processor power variations due to workload fluctuations [62].
A key feature of the dynamic parameter variations is the time scale over which the parameter changes meaningfully. An example of slow-changing variations is the temperature fluctuation, while the $V_{DD}$ drop is a representative example of fast-changing variations [62].
In commercial processors, static parameter variations can be easily taken care of. Because these variations occur during the manufacturing process, they can be easily detected. For that purpose, each processor is tested for either the maximum clock frequency ($F_{MAX}$) at a constant supply voltage ($V_{DD}$) or the minimum supply voltage ($V_{MIN}$) at a constant clock frequency ($F_{CLK}$) across multiple DVFS conditions per die. These $F_{MAX}$ or $V_{MIN}$ tests make it possible to adapt $F_{CLK}$ and $V_{DD}$ per die to compensate for the static variations. In contrast, the dynamic parameter variations require the processor to operate either at an $F_{CLK}$ lower than the $F_{MAX}$ for a target $V_{DD}$ (i.e., an $F_{CLK}$ guardband) or at a $V_{DD}$ higher than the $V_{MIN}$ for a target $F_{CLK}$ (i.e., a $V_{DD}$ guardband) [62]. Due to these guardbands, the processor cannot exploit the opportunities for higher performance by increasing $F_{CLK}$ or for lower energy by reducing $V_{DD}$. Hence, its performance and energy efficiency are compromised. Furthermore, the guardband often needs to be increased further due to the wide DVFS ranges of state-of-the-art processors.
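The cost of a supply-voltage guardband can be estimated with a short sketch. It assumes that the power is dominated by the dynamic component, which scales with the square of the supply voltage at a fixed clock frequency; leakage and any frequency change are ignored. The voltage values are hypothetical.

```python
def vdd_guardband_overhead(v_min, v_guard):
    """Fractional dynamic-power increase caused by a V_DD guardband.

    The core runs at v_min + v_guard instead of v_min at the same F_CLK,
    so the dynamic power grows by ((v_min + v_guard) / v_min)^2 - 1.
    """
    return ((v_min + v_guard) / v_min) ** 2 - 1.0


# Hypothetical case (assumption): V_MIN = 0.8 V with guardbands of 40-120 mV.
for v_guard in (0.04, 0.08, 0.12):
    overhead = vdd_guardband_overhead(0.8, v_guard)
    print(f"guardband {v_guard * 1e3:5.1f} mV -> "
          f"{100 * overhead:.1f} % more dynamic power")
```

Because of the quadratic dependence, a guardband of 10% of the minimum voltage costs roughly 21% more dynamic power in this model, which is why reducing the guardband is attractive.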
These guardbands cannot be reduced sufficiently by the voltage-referenced voltage regulators that have been discussed so far in this paper. These regulators require on-die parameter monitoring circuits and an adaptive control circuit inside the processor, as shown in Fig. 11, to measure specific dynamic parameters (e.g., temperature, supply voltage, and aging) and adjust $F_{CLK}$ and $V_{DD}$ accordingly to compensate for the dynamic parameter variations. This technique has several drawbacks. One of the most significant is the area and cost overhead caused by the additional sensors and circuits. Another is the long recovery time required to mitigate fast-changing dynamic parameter variations, such as fast-switching load currents, which cause severe $V_{DD}$ drops.
To address these challenges, variation-aware voltage and frequency regulators have been proposed over the past few years [62]-[68]. Due to their main operation principle, i.e., regulating the voltage and frequency simultaneously, they are broadly known as UVFRs. UVFRs can aggressively reduce the guardband by adapting the voltage and frequency together to the $V_{DD}$ and temperature variations. Because UVFRs continuously track not only the $V_{DD}$ variations but also the temperature variations, the voltage margin that conventional systems must add to cover these variations can be removed even at low $V_{DD}$, improving the power efficiency of the SoC. Conventional and state-of-the-art topologies of UVFRs for fine-grained DVFS and variation-aware adaptive voltage and frequency scaling are discussed in the following subsections.

A. TYPICAL VOLTAGE AND FREQUENCY REGULATORS FOR DVFS
An exemplary DVFS system based on two independent loops for the voltage and frequency (V REG and F REG) is shown in Fig. 12. The voltage regulation loop, which can be implemented by a buck converter, switched-capacitor converter, or LDO, generates the regulated voltage V REG based on a reference voltage (V REF), as shown in Fig. 12(a). In the frequency regulation loop, a phase-locked loop (PLL) is usually employed to regulate the clock frequency (F REG) based on a reference clock (F REF). In this system, the two loops are independent and do not affect each other. Thus, when V REG drops, which is commonly caused by load current transients, F REG is not adapted accordingly, as shown in Fig. 12(b). This degrades the timing slack and may cause a timing-margin failure in the processor [62]-[64]. To avoid such issues, a voltage guardband has to be introduced. However, it is highly undesirable because of its overhead in power consumption, design complexity, and area [62], [64].
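The loss of timing slack under a fixed PLL clock can be illustrated with a minimal sketch. It assumes an alpha-power-law delay model for the critical path; the threshold voltage, the exponent, and the two voltage levels are hypothetical values chosen only for illustration.

```python
V_TH = 0.3    # assumed threshold voltage (V)
ALPHA = 1.3   # assumed velocity-saturation exponent


def path_delay(v_reg: float) -> float:
    """Critical-path delay in arbitrary units (alpha-power-law approximation)."""
    return v_reg / (v_reg - V_TH) ** ALPHA


def timing_slack(clock_period: float, v_reg: float) -> float:
    """Positive slack meets timing; negative slack is a timing violation."""
    return clock_period - path_delay(v_reg)


V_NOMINAL = 1.0  # hypothetical regulated voltage (V)
V_DROOP = 0.9    # hypothetical voltage during a load transient (V)

# PLL clock fixed at the nominal-voltage path delay, i.e., no guardband.
fixed_period = path_delay(V_NOMINAL)
slack_nominal = timing_slack(fixed_period, V_NOMINAL)
slack_droop = timing_slack(fixed_period, V_DROOP)

# Alternative: size the fixed clock for the droop voltage (a guardbanded clock).
guardbanded_period = path_delay(V_DROOP)
frequency_loss = 1.0 - fixed_period / guardbanded_period

print(f"slack at nominal voltage: {slack_nominal:+.3f}")
print(f"slack during droop:       {slack_droop:+.3f}")
print(f"frequency given up by guardbanding: {frequency_loss:.1%}")
```

With the independent loops, the clock period stays constant while the path delay grows during the droop, so the slack turns negative unless a guardband is paid for at all times.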

B. STATE-OF-THE-ART UVFRs
To minimize the guardband, it was recently proposed to combine the V REG and F REG regulation loops into a single unified loop. This loop can be based on an LDO [65], a switched-capacitor converter [64], or a buck converter [66]-[68]. Buck- and LDO-based UVFRs are shown in Fig. 13(a) and (b). In both schemes, the basic operation principle of the unified V REG and F REG regulation is almost identical. As illustrated, the UVFR systems generate F REG from a tunable-replica oscillator (TRO). Based on F REG, V REG is regulated and supplied to the digital logic load, which is typically a processor. At the same time, V REG is also supplied to the TRO. This single control loop implements adaptive clocking by using the V REG-powered TRO. The TRO contains a replica of the load circuit's critical path, mimicking both the logic delay and the voltage drop of the critical path [64]-[66]. As a result, when signal propagation in the critical path slows down because of V REG drops or other dynamic variations, the clock slows down accordingly. Such clock stretching prevents timing-margin failures. Similarly, in the event of a voltage overshoot, the clock speeds up accordingly, maintaining the timing margins in the processor efficiently. As such, the V REG-powered TRO guarantees a consistent timing slack regardless of voltage drops caused by load current transients or PVT variations.
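The adaptive clocking described here can be sketched in a few lines. The sketch assumes an alpha-power-law delay model and an ideal replica that matches the critical path exactly; the model parameters, the built-in replica margin, and the droop waveform are hypothetical.

```python
V_TH = 0.3             # assumed threshold voltage (V)
ALPHA = 1.3            # assumed velocity-saturation exponent
REPLICA_MARGIN = 0.05  # hypothetical extra delay built into the replica path


def path_delay(v_reg: float) -> float:
    """Critical-path delay in arbitrary units (alpha-power-law approximation)."""
    return v_reg / (v_reg - V_TH) ** ALPHA


def tro_period(v_reg: float) -> float:
    """Clock period of a V_REG-powered replica oscillator (ideal matching assumed)."""
    return (1.0 + REPLICA_MARGIN) * path_delay(v_reg)


# Hypothetical V_REG samples during a droop followed by an overshoot.
droop_waveform = [1.00, 0.95, 0.90, 0.95, 1.00, 1.05]

relative_slacks = []
for v_reg in droop_waveform:
    period = tro_period(v_reg)
    slack = (period - path_delay(v_reg)) / period
    relative_slacks.append(slack)
    print(f"V_REG={v_reg:.2f} V  period={period:.3f}  relative slack={slack:.3%}")
```

Because the clock and the critical path see the same supply, the period stretches during the droop and shrinks during the overshoot, while the relative slack stays constant in this idealized model.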
The UVFRs in SoCs exhibit several design benefits, including their voltage-reference-free all-digital architecture and effective responses to the dynamic parameter variations in order to maintain the performance. One of their main advantages over the typical regulation scheme is on-the-fly DVFS without interrupting the load circuit operations. In addition, the timing margins of the load circuits are met using UVFRs because the V REG-powered TRO immediately stretches or speeds up the clock in response to any V REG changes during current transients. Thus, unlike in conventional regulation schemes, the V REG margin is no longer determined by the transient V REG response to load variations, but rather by other factors such as I/O and buffering considerations. As a result, these margins are significantly reduced by using UVFRs. For instance, the UVFR in [66] achieved 96% margin recovery even though the loop bandwidth was well below 1 MHz. It reduced the processor's overall energy consumption by 48% while supplying a V DD of 1.0 V. Owing to the reduced guardband, the UVFR in [65] lowered the overall supply voltage of the load circuits by 27%. Furthermore, it should be noted that UVFRs require F REF only. They eliminate the need for V REF [64]-[67]. Hence, they do not require bandgap reference (BGR) circuits.

VI. PROPOSED HETEROGENEOUS PDN USING FREQUENCY-REFERENCED DLDOs
Considering the advantages of UVFRs, we propose a modified heterogeneous PDN that uses FR-DLDOs for fine-grained DVFS. A simplified block diagram of the proposed heterogeneous PDN is shown in Fig. 14. As shown, an off-chip buck converter is utilized as a master power supply to multiple FR-DLDOs, which are all integrated inside the SoC. The FR-DLDOs operate as slave power supplies, each dedicated to one load circuit block. They are fully integrated right at the PoL locations. In the proposed PDN, the FR-DLDOs do not require V REF from a BGR circuit. Instead, a global clock is utilized as the reference frequency (F REF) for each FR-DLDO. Hence, the multiple BGRs required for typical LDOs and PDNs are no longer needed in the proposed heterogeneous PDN, thereby significantly reducing power distribution rails, board area, and component counts. Moreover, each FR-DLDO sets the local supply voltage (V Ci) for the designated load circuit and the local output frequency (F Ci), depending on the workload demand of the load circuit. V Ci and F Ci are the output voltage and frequency of each FR-DLDO, and they are used as the local power supply and the operating clock for the designated local load circuit, respectively. The FR-DLDO generates and regulates this tightly coupled (F Ci, V Ci) pair together, achieving on-the-fly fine-grained DVFS.
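One way to picture the master/slave arrangement is as a set of per-domain frequency targets derived from the single global reference. The sketch below is only an illustration: it assumes that each FR-DLDO locks its output clock to an integer multiple of F REF through a feedback divider, and the class, the function, the domain names, and the demand values are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FrDldoDomain:
    """One slave FR-DLDO at a point-of-load location (hypothetical model)."""
    name: str
    divider: int  # assumed feedback divider ratio N_i, so that F_Ci = N_i * F_REF

    def target_frequency_hz(self, f_ref_hz: float) -> float:
        return self.divider * f_ref_hz


def plan_domains(f_ref_hz: float, demand_hz: dict) -> list:
    """Pick the smallest integer divider that meets each domain's clock demand."""
    return [
        FrDldoDomain(name, max(1, math.ceil(demand / f_ref_hz)))
        for name, demand in demand_hz.items()
    ]


F_REF_HZ = 50e6  # hypothetical global reference clock
domains = plan_domains(F_REF_HZ, {"cpu0": 480e6, "dsp": 120e6})
for domain in domains:
    print(domain.name, domain.divider, domain.target_frequency_hz(F_REF_HZ))
```

In such a scheme no per-domain voltage target is programmed: each local supply V Ci is whatever voltage the loop needs for its oscillator to reach the requested F Ci.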
Some existing FR-DLDOs [65], [69] can be utilized in the proposed heterogeneous PDN. The FR-DLDO in [69] successfully operates at under- or near-threshold voltage levels with dynamic voltage scaling (DVS). However, it can drive only a small load current of up to 1 mA, which is not sufficient to meet SoCs' typical current demands. In addition, its loop response is too slow to meet the fast-switching current demands of SoC load circuits. The UVFR in [65] has an enhanced loop response time and a faster load-transient response time (T R), but its current-driving capability is also limited, at 6 mA. To overcome these limitations of the FR-DLDOs in [65] and [69], a fast-transient FR-DLDO with a large current-driving capacity and a wide regulation range [63] is presented in the next section. This FR-DLDO achieves a faster transient response owing to its transient-boost control. In addition, its 10-bit binary-weighted PMOS switch array can drive a large load current of up to 40 mA.

Fig. 15 shows a block diagram of the implemented FR-DLDO (presented in [63]) for the proposed heterogeneous PDN. The FR-DLDO generates both V REG and F REG by using a single regulation loop with a TRO. This single unified loop tracks the change in V REG, which may happen due to different workload conditions, and adjusts (slows down or speeds up) F REG accordingly. After meeting these workload [...] The VCO does not model the critical path of the target load circuit in terms of the workload requirement. Thus, this FR-DLDO's output voltage scaling is independent of the frequency scaling, and it cannot optimize V REG and F REG according to the variations in the load circuit. In contrast, in the proposed FR-DLDO, the TRO is powered by V REG, generating the core clock F REG. Thus, the clock intrinsically adapts to the voltage (V REG) and temperature variations while variations in the load circuit's critical path delays are compensated.
As a result, the timing margins are maintained nearly constant in the implemented FR-DLDO [63]. However, the critical path modeling of the TRO is entirely load-circuit-specific. If the load circuit changes, the TRO should be reconfigured. In addition, the FR-DLDO should be integrated in close proximity to the load circuits on the same chip to ensure good matching between the critical path delay of the load circuits and the TRO under PVT variations.

VII. IMPLEMENTATION EXAMPLE OF FREQUENCY-REFERENCED DLDO
A. OVERALL ARCHITECTURE AND OPERATION
In the implemented FR-DLDO shown in Fig. 15, the phase difference between the two incoming clocks F REF and F DIV is detected by a bang-bang phase frequency detector (BBPFD). The phase-difference outputs (UP/DN) of the BBPFD are fed to a digital loop filter (DLF). The DLF operates as both a low-pass filter and a loop controller. The filter counts up or down according to the polarity of UP/DN and accordingly turns on/off PMOS power transistors by controlling a 10-bit output signal (SW[9:0]).

A simplified block diagram of the implemented FR-DLDO is shown in Fig. 16(a). The FR-DLDO generates a tightly coupled pair of V REG and F REG. V REG serves as the power supply of the TRO, which generates F REG. The overall behavior of the FR-DLDO can be modeled in small signal as shown in Fig. 16(b). V REG and F REG can be estimated from this small-signal model in terms of K PD, the gain of the BBPFD; K PMOS, the gain of the PMOS output stage in V/rad; K TRO, the gain of the tunable-replica oscillator in Hz/V; and p 1, the output pole frequency in Hz. As Fig. 16(b) shows, the FR-DLDO is a second-order system consisting of 1) a pole at DC due to the accumulator operation and 2) the load-dependent output pole due to C OUT and the load. The steady-state open-loop transfer function O(s) in the s domain is determined by the loop gain H 0, the output pole p 1, and the zero z 1 at the output node, where H 0 mainly depends on K PD, K PMOS, and K TRO, R ESR represents the equivalent series resistance of C L, and g ds is the sum of the output conductances of the PMOS transistors at steady state. The FR-DLDO uses a minimal C L of 10 pF, which makes p 1 and [...]
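The regulation loop described in this subsection can be approximated by a short behavioral model. This is a heavily simplified sketch rather than the implemented circuit: the bang-bang detector is reduced to a frequency comparison, the output capacitor dynamics are omitted, the load is a plain resistor, the TRO is linearized, and every numeric parameter is an assumption chosen only to make the example run.

```python
V_IN = 1.1        # assumed input rail from the off-chip buck converter (V)
G_UNIT = 2e-4     # assumed on-conductance of one PMOS LSB (S)
K_TRO = 1e9       # assumed TRO gain (Hz/V)
V_TRO_MIN = 0.3   # assumed supply below which the TRO stops oscillating (V)
CODE_MAX = 1023   # full scale of the 10-bit switch code SW[9:0]


def regulated_voltage(code: int, r_load: float) -> float:
    """Static divider between the enabled PMOS conductance and a resistive load."""
    g_pmos = code * G_UNIT
    return V_IN * g_pmos / (g_pmos + 1.0 / r_load)


def tro_frequency(v_reg: float) -> float:
    """Linearized TRO: the output frequency rises with its supply V_REG."""
    return max(0.0, K_TRO * (v_reg - V_TRO_MIN))


def run_loop(f_ref: float, n_div: int, r_load: float, code: int = 0, steps: int = 2000):
    """Bang-bang loop: compare F_DIV with F_REF and step the code by one LSB."""
    for _ in range(steps):
        f_div = tro_frequency(regulated_voltage(code, r_load)) / n_div
        step = 1 if f_div < f_ref else -1          # UP enables more PMOS, DN fewer
        code = min(max(code + step, 0), CODE_MAX)  # accumulator with saturation
    v_reg = regulated_voltage(code, r_load)
    return code, v_reg, tro_frequency(v_reg)


# Same frequency target (10 x 50 MHz) under a heavy and a light resistive load.
code_heavy, v_heavy, f_heavy = run_loop(f_ref=50e6, n_div=10, r_load=25.0)
code_light, v_light, f_light = run_loop(f_ref=50e6, n_div=10, r_load=50.0)
print(code_heavy, round(v_heavy, 4), round(f_heavy / 1e6, 2))
print(code_light, round(v_light, 4), round(f_light / 1e6, 2))
```

In this model the loop settles into a one-LSB limit cycle around the code at which F DIV matches F REF. The lighter load reaches the same frequency, and hence nearly the same V REG, with fewer PMOS switches turned on, without any voltage reference.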

1) TRANSIENT-BOOST CONTROL
To enhance the load transient response and reduce the voltage undershoot (ΔV REG) for load current step changes, a transient-boost control was proposed [63]. Its circuit diagram is shown in Fig. 17. As shown in Fig. 17(b), the boosted clock significantly reduces the peak voltage undershoot ΔV REG and the transient response time (T R) by shortening the feedback loop response time (t RES). This is because the peak of ΔV REG mainly depends on t RES for a given load current step ΔI LOAD and output capacitor (C L) [45]. ΔV REG can be roughly calculated as follows:

$$\Delta V_{REG} \approx \frac{\Delta I_{LOAD} \cdot t_{RES}}{C_L}$$

As shown in Fig. 19(a), for a load current step of 37 mA, changing from 3 mA to 40 mA with an edge time (T EDGE) of 25 ns, the FR-DLDO without the transient-boost control recovered V REG with a ΔV REG of 300 mV and settled within a transient response time (T R) of 5 µs. In contrast, with the assistance of the transient-boost control, ΔV REG was reduced to 133 mV, and V REG was fully recovered and settled within a T R of 400 ns, as shown in Fig. 19(b). The proposed transient-boost control reduced ΔV REG and T R by 55.6% and 92%, respectively. In addition, the FR-DLDO achieved a load regulation of 0.014 mV/mA while driving a load current of 40 mA. The performance summary of the FR-DLDO and other state-of-the-art FR-DLDOs is shown in Table 1. The proposed FR-DLDO outperforms the other state-of-the-art FR-DLDOs [65], [69] and the frequency-referenced buck converter [67] in the regulation range, peak power efficiency, and figure-of-merit (FOM).
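The reported improvements can be cross-checked from the rounded measurement values quoted above. The helper below is only an arithmetic sketch, and its name is hypothetical.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Rounded values quoted in the text: undershoot in mV, settling time in ns.
undershoot_reduction = percent_reduction(300.0, 133.0)
settling_reduction = percent_reduction(5000.0, 400.0)

print(f"undershoot reduction: {undershoot_reduction:.1f}%")
print(f"settling-time reduction: {settling_reduction:.1f}%")
```

The rounded values give about 55.7% and 92.0%, in line with the reported 55.6% and 92%; the small difference in the first figure is presumably due to rounding of the quoted undershoot values.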

VIII. CONCLUSION
PDNs and power management techniques have gained prime importance in SoC designs because the integration of diverse heterogeneous circuits and systems on a single chip requires multiple voltage domains and fast, high-quality power delivery. To meet these challenging power delivery and management demands, PDNs and power management systems are required to perform intelligent, energy-efficient, fine-grained, and dynamically controlled on-chip power management. The shortcomings of the typical PDNs using off-chip power converters have been largely overcome over the years by state-of-the-art PSiP techniques and on-chip integration of power converters. However, on-chip integration of just a few power converters is no longer enough to address highly efficient fine-grained power management challenges, while on-chip integration of a large number of power converters drastically increases the occupied silicon area. Therefore, state-of-the-art PDNs with heterogeneous structures have been utilized as a good alternative to fully-on-chip integrated converters. In the heterogeneous PDNs, highly power-efficient switching-mode power converters (buck) are utilized as off-chip converters, and multiple small-sized LDOs are integrated on chip at PoL locations inside the SoC. This technique delivers highly precise on-demand voltages to multiple voltage domains of the SoC. The state-of-the-art PDNs, including the distributed heterogeneous PDN, the fully-integrated reconfigurable PDN, and the quad-output elastic switched-capacitor PDN using per-core digital LDOs, are discussed thoroughly in the paper. These PDNs enable fine-grained power management with high peak power efficiencies and high current densities for multi-core processors. In addition to these PDNs, frequency-operated UVFRs have gained a lot of attention over the past few years due to their capability of variation-aware fine-grained DVFS in multi-core processors.
The UVFRs can achieve on-the-fly DVFS over PVT variations without violating the timing margins of load circuits. The UVFRs help achieve high energy efficiencies in the load circuits thanks to their simultaneous voltage and frequency scaling, which is tailored to the load circuit's workload demand. Based on the UVFR operation, we proposed a distributed heterogeneous PDN with FR-DLDOs to achieve fine-grained DVFS in an SoC. In the proposed PDN, a buck converter is utilized as a master power supply to multiple FR-DLDOs, which are integrated inside the SoC as slave power supplies, each dedicated to one load circuit. An implementation of the FR-DLDO for the proposed heterogeneous PDN is presented. A transient-boost control is used in the FR-DLDO to dynamically reduce the transient response time and voltage drops. It is activated only when the regulated output voltage deviates from its steady-state value. The 0.02-mm² FR-DLDO prototype fabricated in a 40-nm CMOS process outperforms other FR-DLDOs by achieving the lowest figure-of-merit, the highest peak power and current efficiencies, and a wide regulation range.
The distributed heterogeneous and fully-integrated PDNs are expected to be adopted more widely because the distributed digital LDOs in these PDNs can supply uniform power to each core with a small IR drop and help thermal management, as proposed in a recent design [70]. Furthermore, with the advancement of synthesizable digital LDOs for distributed PDNs, PDN designs will become easier and more process-scalable. Some synthesized digital LDOs were recently demonstrated with promising performance metrics [71], [72]. In addition to the distributed PDNs, the synthesizable design of UVFRs has recently been adopted due to its scalability [67]. Moreover, buck-converter-based UVFRs have also been emerging because they are capable of producing multiple output voltages from a single input voltage, i.e., single-input multiple-output (SIMO) operation. A SIMO buck converter with an adaptive-clocking scheme for UVFR operation was proposed, achieving significant performance gains in terms of voltage margin reduction and power efficiency enhancement [73]. Over the decades, PDNs for large power delivery have been improved significantly in terms of power efficiency, cost, area, and many other factors. In addition, there has been another recent trend for PDNs, i.e., a continuously growing demand for ultra-low-power and low-cost power delivery and management systems for wearable and IoT devices.