Early-Stage Planning of Switched-Capacitor Converters in a Heterogeneous Chip

The switched-capacitor converter (SCC) has been widely used for voltage regulation in multicore chips, where energy efficiency is the major concern. However, because the overhead of integrating SCCs in a chip is non-negligible, SCCs should not be overused. Hence, in this paper we propose an early-stage SCC planning framework that obtains the SCC supply scheme together with the optimized Metal-Insulator-Metal (MIM) capacitance allocation and conversion ratio selection for each SCC when the given number of SCCs is less than the number of cores. In addition, our method can explore the best number of SCCs to use for a given chip. The experiments show the results of our SCC planning methods.


I. INTRODUCTION
By fully exploiting the unique advantages of different types of cores (CPU, GPU, accelerators, etc.), a state-of-the-art heterogeneous multi-core chip can achieve both powerful performance and high energy efficiency [1]-[6]. For example, the Apple A12 processor [5] has two high-performance CPU cores, four energy-efficient CPU cores, four GPU cores and eight neural engine cores; the Kirin 980 processor [6] has two high-performance CPU cores, two medium-performance CPU cores, four efficiency CPU cores, one GPU core and two neural processor cores. A heterogeneous chip is typically divided into several power domains [7], and each domain can be powered individually by integrated on-chip voltage regulators such as switched-capacitor converters (SCCs) [8], [9], inductive switching regulators [7], [10] and LDOs [11], [12] (see Fig. 1 for an example). Table 1 compares the different types of voltage regulators. The main problem with the inductive switching regulator is that the inductor cannot easily be integrated on the chip and is usually manufactured on the package. The SCC has advantages such as a wide output voltage range, high energy conversion efficiency and high power density. Therefore, it has been widely studied and applied in recent processors [13]-[15].
The associate editor coordinating the review of this manuscript and approving it for publication was Poki Chen.

Note: ''Step up/down'' means that the output voltage is higher or lower than the input source voltage (i.e., Vdd).

[...] a frequency of $f_{sw}$. The charging and discharging behavior of the flying capacitors results in a supply voltage ripple $\Delta V$ at the output. It should be mentioned that although only a 3:1 SCC is shown in this figure, other conversion ratios can be achieved with different topologies [16].
Energy conversion efficiency is critical for SCC design [17]-[20]. Many prior works have investigated the optimization of the SCC design to achieve better energy efficiency, and have proposed techniques such as tuning the size of the flying capacitors, the operating frequency and the switch width [17], [21]-[26]. However, few works optimize the energy efficiency of SCCs holistically in a multi-core chip. In [9], the authors improve the overall energy efficiency of a many-core system by dynamically adapting the switching frequency $f_{sw}$ of each SCC to its specific output load. The authors in [26] aim not only to achieve the highest energy efficiency but also to suppress supply noise by optimizing the allocation of limited die area between flying capacitance and decoupling capacitance. In [19], the authors propose a system-level efficiency model that characterizes the number, size and distribution of the SCCs, and they solve the optimization problem with mathematical optimization methods.
Although the SCC is a promising technique for implementing fine-granularity power management in a multicore chip, we should not overuse SCCs in a given chip, since integrating them requires control circuits, routing resources, clock power, chip area and so on [19]. On the other hand, in a heterogeneous multi-core chip, the cores with close voltage demands (i.e., voltage demands that differ only slightly) are usually placed together in the physical layout [7]. This implies that, in principle, the SCCs in a domain with a higher supply voltage can also be shared by the cores in an adjacent domain with a slightly lower voltage demand. In this way, we can potentially reduce the number of SCCs used in a given chip. Therefore, in our work, we explore the best planning strategy for the SCCs in a heterogeneous chip.
More specifically, our work tries to optimize the energy efficiency of an SCC-powered heterogeneous chip from the following two important aspects:
• The SCC supply scheme.
Let us examine the two cases shown in Fig. 3. In both cases, two SCCs are used to power a four-core chip (the detailed load information can be seen in Table 4 in Section VI), while Case2 achieves higher efficiency with a different supply scheme (the mapping between the SCCs, i.e., the supply side, and the cores, i.e., the demand side). When several cores are supplied by one SCC, the cores with a lower demand voltage are over-supplied, which leads to extra power consumption, and different supply schemes incur different extra power consumption (see footnote 1). This motivates us to develop a smart SCC supply scheme for better energy efficiency. In addition, we also explore the number of SCCs used in a chip.
• The capacitance allocation and ratio selection of SCCs. The energy efficiency of the SCC is related to its switching capacitance and conversion ratio (see Section III).
Recently, Metal-Insulator-Metal (MIM) capacitors have been utilized as flying capacitors for SCCs [8], [18], [27] (see MIM capacitance in Fig. 1), and more than one conversion ratio is available for an SCC to supply its loads, as long as the ideal output voltage of the conversion ratio is greater than the demanded voltage. Table 2 shows two cases with different capacitance allocations and conversion ratios for the SCCs in Fig. 3. We can see that the energy efficiency is different. Since a different capacitance and ratio in each SCC lead to a different power loss under the given SCC loss mechanism (see Section III), capacitance allocation and ratio selection affect the energy efficiency (see footnote 2). Hence, given a supply scheme, we also need to optimize the capacitance allocation and conversion ratio of the SCCs to improve the efficiency.

Motivated by the aforementioned observations, in this work we propose an early-stage SCC planning framework to improve the energy efficiency of multi-core chips when the number of SCCs is less than the number of cores. Note that when the number of SCCs equals the number of cores, each SCC can supply one core by converting the global voltage to the specific demand voltage of that core. In this way, the finest power management is achieved. However, the overhead associated with integrated SCCs (such as control circuits and routing resources) is non-negligible [19]. As a result, if there is a limited budget for integrating SCCs, we should integrate fewer SCCs (i.e., the number of SCCs is less than the number of cores). The advantage in this scenario is that the overhead of integrating SCCs is reduced. The disadvantage is that some cores with different demand voltages are supplied by the same SCC, which must output the highest demand voltage among them, so there is some extra power loss due to the over-supply of some cores. We also provide a method to guide how many SCCs should be used to obtain the highest efficiency for the system.

The rest of this paper is organized as follows. We formulate the problem in Section II. In Section III, we introduce the basic loss mechanism of SCCs. The proposed SCC planning framework, consisting of the SCC supply scheme and the SCC optimization, is introduced in Section IV. Section V discusses the optimal number of SCCs. We then show the experimental results in Section VI. Finally, we draw conclusions in Section VII.

Footnote 1: In Case1, SCC2 supplies Core2, Core3 and Core4 simultaneously, so the output voltage of SCC2 should be no less than the maximal voltage demand among the three cores (i.e., it outputs 0.90 V for all three cores). Hence in Case1, Core2 and Core3 are over-supplied, which leads to extra power consumption, and the system has an extra power loss of (0.90-0.68)*0.10+(0.90-0.82)*0.30=0.046 W (see Section IV-B for details). In contrast, in Case2 the extra power loss is (0.68-0.55)*0.08+(0.90-0.82)*0.30=0.0344 W.

Footnote 2: In Case1, the MIM capacitance allocated to each SCC ($C_{sw}$) is proportional to the area of the cores this SCC supplies, and the selected ratio is the one whose no-load output voltage is minimal but larger than the demand of the supplied cores. According to the SCC loss equation (see Section III-A), the loss is

$$P_{scc} = P_{scc1} + P_{scc2} = \left(e_{1,scc1} \cdot C_{sw,scc1} + \frac{e_{2,scc1}}{C_{sw,scc1}}\right) + \left(e_{1,scc2} \cdot C_{sw,scc2} + \frac{e_{2,scc2}}{C_{sw,scc2}}\right)$$

where $e_1$ and $e_2$ are the ratio-determined parameters. From this equation, we can see that the ratio-determined parameters significantly affect the loss model, and once these $e_1$ and $e_2$ values are given, the capacitance allocated to each SCC also affects the efficiency. In this situation, the allocated MIM capacitance and the selected ratios of the SCCs in Case1 may not lead to the minimal loss. In contrast, the MIM capacitance allocation and ratio selection obtained with our proposed method (introduced in Section IV-C) achieve a higher efficiency.

VOLUME 8, 2020

II. PROBLEM FORMULATION
In [7], Intel proposed the one-regulator-per-core scheme to achieve the finest power management for a given chip. However, considering the overhead associated with distributing a large number of regulators over the chip, in this work we focus on the scenario in which the number of available SCCs is limited (less than the number of cores).
The problem can be formulated as follows. Given 1) the layout of an $M$-core heterogeneous chip, 2) the minimal supply voltage and maximal demand current of each core, 3) the available number of SCCs $\hat{M}$ ($1 \le \hat{M} < M$), 4) the total MIM capacitance for the flying capacitors of the SCCs and 5) the available ratios for one SCC, our work attempts to maximize the overall energy efficiency of the power supply system by finding 1) the best supply scheme for the specified $\hat{M}$ SCCs, together with 2) the amount of flying capacitance and the conversion ratio of each of the $\hat{M}$ SCCs.
In our work, we also explore the number $\hat{M}$ ($1 \le \hat{M} < M$) to find the best number of SCCs for the given $M$-core chip.
Note that, because MIM capacitors are designed and used as the flying capacitors in SCCs, works such as [18], [28] have studied design techniques to optimize the performance and quality of MIM capacitors, for example by optimizing the ESR (Equivalent Series Resistance). Hence, we do not take the ESR issue into consideration in this paper. Instead, we use the optimal parameters, such as the frequency reported in [18], to significantly reduce the effect of the ESR on the MIM capacitors.

III. ANALYSIS OF THE INHERENT POWER LOSSES OF SCCS
In this section, we will briefly introduce and analyse the loss mechanism of SCCs, and then further discuss the ratios and flying capacitance used in SCCs.

A. THE LOSS MECHANISM OF THE SCCS
The switched-capacitor converter has several inherent losses when it delivers energy from the input side to the output side. These non-negligible power losses include the conduction loss caused by charging the flying capacitors through the switches, the gate-drive loss caused by driving these switches, and the load power loss caused by the output voltage ripple. Each kind of loss can be seen in Fig. 4, and all of them are explained in [17], [19] and [22]. Our problem formulation in Section IV-C is based on these loss mechanisms, whose detailed formulation is introduced in the following.
(1) Conduction loss. For a specific SCC topology, when the switches are turned on, charge is transferred to the flying capacitors through the switches, and part of the power is dissipated in the switches [19], as expressed in Equation (1). Here, $M_{sw}$ is a topology-related parameter, $I_{out}$ is the load current of the SCC and $R_{on}$ is the equivalent resistance density of a switch when it is on. $\sigma$ is a fitting parameter and $\gamma$ is a topology-dependent parameter. $f_{sw}$ is the switching frequency of the SCC.
(2) Gate-drive loss. When the switches are turned on or off, the gate capacitors of these transistors are charged or discharged. The energy lost in each cycle is given in [17]. Since $W_{sw}$, the cumulative width of the switches that are turned ON/OFF in this period, is also proportional to the frequency $f_{sw}$ and can be written as [19] $W_{sw} = \sigma \gamma f_{sw} C_{sw} / N_{phase}$, the gate-drive loss is proportional to $f_{sw}^2$. Hence we obtain the gate-drive loss in Equation (2), where $N_{sw}$ represents the number of switches in an SCC and $C_{gate}$ is the per-unit-width gate capacitance of the switches.
(3) Load power loss. Due to the ripple of the SCC output voltage, the load power loss [19] is $P_{load} = \frac{1}{2} I_{out} \Delta V$. Since the load current of an SCC can be expressed as $I_{out} = M_{topo} \cdot f_{sw} \cdot C_{sw} \cdot N_{phase} \cdot \Delta V$, we have

$$\Delta V = \frac{I_{out}}{M_{topo} f_{sw} N_{phase} C_{sw}} \tag{3}$$

where $M_{topo}$ is topology-dependent and $N_{phase}$ is the number of interleaving stages in an SCC. Finally, the load power loss can be written as

$$P_{load} = \frac{I_{out}^2}{2 M_{topo} f_{sw} N_{phase} C_{sw}} \tag{4}$$

As a result, the inherent power loss of one SCC is

$$P_{scc} = e_1 \cdot C_{sw} + \frac{e_2}{C_{sw}} \tag{5}$$

where $e_1$ and $e_2$ are the coefficients defined in Equations (6) and (7).

Here the parameters are divided into three types: 1) non-topology-dependent parameters, including $f_{sw}$ and $\sigma$ [17], [19], 2) topology-dependent parameters: $N_{sw}$, $M_{sw}$, $M_{topo}$ and $\gamma$, and 3) other parameters: $I_{out}$ and $C_{sw}$. These parameters show that the power loss is related to the topology of the SCC, namely the switches, represented by $N_{sw}$ and $W_{sw}$, and the flying capacitors, represented by $C_{sw}$.

Note: $V_{nl}$ is the no-load output voltage of the SCC ($V_{in}$ = 1.2 V).
As the non-topology-dependent parameters such as $f_{sw}$ (see Table 5 in Section VI) are fixed and the load information $I_{out}$ is also known, the loss mechanisms depend only on the flying capacitance $C_{sw}$ and the topology-dependent parameters (see Table 3) of the SCCs.
As a result, two quantities (the flying capacitance and the conversion ratio) can be optimized to improve the conversion efficiency of the SCCs.
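The dependence of the ripple and the load power loss on the flying capacitance can be checked with a minimal sketch. It assumes only the relation $I_{out} = M_{topo} \cdot f_{sw} \cdot C_{sw} \cdot N_{phase} \cdot \Delta V$ stated above; the function names and all numeric parameter values are placeholders for illustration and are not the values of Table 3 or Table 5.

```python
def ripple(i_out, m_topo, f_sw, n_phase, c_sw):
    """Output voltage ripple: dV = I_out / (M_topo * f_sw * N_phase * C_sw)."""
    return i_out / (m_topo * f_sw * n_phase * c_sw)


def load_power_loss(i_out, m_topo, f_sw, n_phase, c_sw):
    """Load power loss: P_load = 0.5 * I_out * dV."""
    return 0.5 * i_out * ripple(i_out, m_topo, f_sw, n_phase, c_sw)


# Placeholder parameters (assumptions, not taken from the paper's tables).
M_TOPO, F_SW, N_PHASE = 2.0, 100e6, 16
I_OUT = 0.1  # load current in amperes

for c_sw in (2e-9, 4e-9):
    dv = ripple(I_OUT, M_TOPO, F_SW, N_PHASE, c_sw)
    p = load_power_loss(I_OUT, M_TOPO, F_SW, N_PHASE, c_sw)
    print(f"C_sw = {c_sw * 1e9:.0f} nF: ripple = {dv * 1e3:.3f} mV, P_load = {p * 1e6:.2f} uW")
```

Both quantities are inversely proportional to $C_{sw}$, which is why the load power loss contributes to the $1/C_{sw}$ part of the loss model.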
The MOS capacitor has large parasitics, which significantly reduce the efficiency of the SCC [29], and its leakage is also serious [31]. The deep trench capacitor is implemented by dry-etching macro-pore arrays in silicon and filling the pores with the dielectric and electrode [32]. This technique is not part of baseline CMOS, which leads to many additional masks and higher cost [20]. The MOM capacitor (typical density 1.5 fF ∼ 2.8 fF/µm² @65nm [33]) is fabricated in the lower metal layers of the chip, resulting in heavy capacitive coupling to the substrate [34]. In contrast, the MIM capacitor (typical density 1.6 fF ∼ 1.9 fF/µm² @65nm [33]) is fabricated between one upper metal layer and one additional metal layer above it, resulting in small capacitive coupling to the substrate [34]. Although combined capacitors such as MIM and MOS capacitors [13], [35] are sometimes used together as flying capacitors in SCCs, MIM capacitors are widely used in the SCCs of recent high-end processor chips [8], [18], [27].
As a global resource, the MIM capacitance in the top metal layers is shared by all the SCCs in the chip (see Fig. 1). Hence, in our SCC planning method, we should allocate the MIM capacitance carefully to maximize the total conversion efficiency of the SCCs.

FIGURE 5. Using ratio 2:1 to output the demand voltage in the overlap region would need much flying capacitance (Equation (3)), leading to high power loss (Equation (5)). Instead, using ratio 3:2 may lead to better SCC conversion efficiency.

C. THE CONVERSION RATIOS OF THE SCC
As shown in Table 3, different conversion ratios output different no-load voltages. Since there is a voltage ripple at the output of the SCC (see Equation (3)), the voltage received by the core loads, $V_{nl} - \Delta V$, should be no less than the minimal supply voltage of the core, $V_{core}$.
Generally speaking, we choose a ratio for one SCC according to the minimal supply voltage of the load, as shown in Fig. 5 (Vdd is 1.2 V here). However, when the demand voltage is slightly less than 0.6 V (for example, 0.58 V) and we choose the ratio 2:1, the output voltage ripple $\Delta V$ of the SCC is only allowed to reach $V_{nl} - V_{core} = 0.02$ V. The flying capacitance $C_{sw}$ required by this SCC then becomes very large (see Equation (3)), and consequently the power loss is large (see Equation (5)), resulting in low conversion efficiency. For instance, if one SCC with the ratio 2:1 is employed to supply one core, using the parameters in Table 3 and Table 5 we have $e_1 = 7.08 \times 10^{5}$ and $e_2 = 9.13 \times 10^{-12}$ in Equation (5). When the demand voltage is 0.588 V, the required capacitance is at least 2.1 nF (see Equation (16)) and the loss is 5.9 mW. However, when the demand voltage is 0.598 V, the required capacitance is at least 12.5 nF and the loss increases to 9.6 mW. Moreover, as the MIM capacitance is a global resource, the MIM capacitance available to the other SCCs decreases. In such a case, the ratio 3:2 may be the better choice for achieving high efficiency.
Hence we can define an overlap region in which the demand voltage is slightly less than 0.6 V (and likewise for the other $V_{nl}$ values such as 0.8 V, 0.9 V, ...). In these regions, we should determine which ratio is the better choice for the efficiency of all SCCs.
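To make the overlap-region argument concrete, the following minimal sketch evaluates the loss model $P_{scc} = e_1 C_{sw} + e_2 / C_{sw}$ with the $e_1$ and $e_2$ values quoted above for the 2:1 ratio. The function names are illustrative. The computed losses agree with the quoted 5.9 mW and 9.6 mW only approximately, presumably because the quoted capacitances are rounded.

```python
import math

E1 = 7.08e5    # coefficient of the term proportional to C_sw (quoted in the text)
E2 = 9.13e-12  # coefficient of the term proportional to 1/C_sw (quoted in the text)
V_NL = 0.6     # no-load output voltage of the 2:1 ratio with Vdd = 1.2 V


def scc_loss(c_sw, e1=E1, e2=E2):
    """Inherent loss of one SCC: e1 * C_sw + e2 / C_sw."""
    return e1 * c_sw + e2 / c_sw


def unconstrained_optimum(e1=E1, e2=E2):
    """Capacitance that minimises the loss when no lower bound is active."""
    return math.sqrt(e2 / e1)


def bound_scaling(v_core_a, v_core_b, v_nl=V_NL):
    """Ratio C_min(b) / C_min(a); the bound is proportional to 1 / (V_nl - V_core)."""
    return (v_nl - v_core_a) / (v_nl - v_core_b)


for c_nf in (2.1, 12.5):
    print(f"C_sw = {c_nf} nF -> loss = {scc_loss(c_nf * 1e-9) * 1e3:.2f} mW")
print(f"loss-minimising C_sw = {unconstrained_optimum() * 1e9:.2f} nF")
print(f"C_min(0.598 V) / C_min(0.588 V) = {bound_scaling(0.588, 0.598):.1f}")
```

Raising the demand voltage from 0.588 V to 0.598 V shrinks the allowed ripple from 12 mV to 2 mV, so the capacitance lower bound grows by about a factor of six, roughly matching the increase from 2.1 nF to 12.5 nF. Once the lower bound exceeds the loss-minimising capacitance $\sqrt{e_2/e_1}$, the loss grows with the bound.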

IV. PLANNING METHODS OF SCCS
In this section, we show how to perform the SCC planning to achieve better power efficiency at the early design stage of the chip, when the $M$-core system is supplied by a smaller number of SCCs (i.e., $\hat{M}$). We study two steps: 1) the SCC supply scheme (i.e., the mapping between SCCs and cores) when $\hat{M}$ SCCs supply energy to the $M$ cores, and 2) the MIM capacitance allocation and conversion ratio selection of the $\hat{M}$ SCCs. In addition, we provide guidance on how many SCCs should be used to achieve the minimum overall loss.

A. OVERVIEW OF THE SCC PLANNING
In order to plan $\hat{M}$ SCCs ($1 \le \hat{M} < M$) to supply energy to the $M$ cores, we merge the cores into $\hat{M}$ groups, with each group supplied by an individual SCC. Two kinds of loss arise, which are introduced in detail below.

(1) Assume that Group $\hat{m}$ has $L$ cores with minimum supply voltages $V^{1}_{core}, V^{2}_{core}, \ldots, V^{L}_{core}$, respectively. As this group is supplied by one SCC, its minimum supply voltage is $V^{\hat{m}}_{group} = \max\{V^{1}_{core}, V^{2}_{core}, \ldots, V^{L}_{core}\}$. Hence many cores receive a higher supply voltage than necessary, which leads to the extra power loss

$$P^{\hat{m}}_{extra} = \sum_{l=1}^{L} I^{l}_{core} \left(V^{\hat{m}}_{group} - V^{l}_{core}\right) \tag{8}$$

where $I^{l}_{core}$ is the maximum current demand of core $l$. Hence we need to merge the $M$ cores wisely into $\hat{M}$ groups, each supplied by an individual SCC, so that the total extra power loss of all groups, $P^{total}_{extra} = \sum_{\hat{m}=1}^{\hat{M}} P^{\hat{m}}_{extra}$, is minimized to obtain the highest energy efficiency.

(2) We also need to optimize the flying capacitance and the selected ratio of each SCC to minimize the total power loss of all SCCs, $P^{total}_{scc} = \sum_{\hat{m}=1}^{\hat{M}} P^{\hat{m}}_{scc}$, where $P^{\hat{m}}_{scc}$ is the power loss of the $\hat{m}$-th SCC (see Equation (5)).

Our SCC planning therefore tries to minimize these two kinds of losses.
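As a worked illustration of the extra power loss, the sketch below recomputes the two supply schemes of footnote 1. The voltages and currents of Core1 to Core3 are those quoted in the footnote, and the grouping of each case is the one implied there. The current of Core4 is not quoted, so the value used is a placeholder; it does not affect the result because Core4 has the highest voltage in its group. Function names are illustrative.

```python
def extra_power_loss(group):
    """Extra loss of one group (Equation (8)).

    group: list of (v_core, i_core) tuples. The SCC output must reach the
    highest minimum supply voltage in the group, so every other core is
    over-supplied by the difference.
    """
    v_group = max(v for v, _ in group)
    return sum(i * (v_group - v) for v, i in group)


def total_extra_power_loss(groups):
    """Sum of the extra losses over all groups."""
    return sum(extra_power_loss(g) for g in groups)


# (minimum supply voltage in V, maximum current in A)
core1, core2, core3 = (0.55, 0.08), (0.68, 0.10), (0.82, 0.30)
core4 = (0.90, 0.20)  # current is a placeholder, not quoted in the footnote

case1 = [[core1], [core2, core3, core4]]   # SCC2 supplies Core2, Core3, Core4
case2 = [[core1, core2], [core3, core4]]

print(f"Case1 extra loss: {total_extra_power_loss(case1):.4f} W")
print(f"Case2 extra loss: {total_extra_power_loss(case2):.4f} W")
```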

B. STEP 1: GROUPING THE CORES
It is reasonable that only cores/groups that are physical neighbors can be merged into one group and then supplied by one SCC. Hence it is not easy to minimize $P^{total}_{extra} = \sum_{\hat{m}=1}^{\hat{M}} P^{\hat{m}}_{extra}$ (see Equation (8)), because the physical adjacency of the cores in each group acts as a constraint. Therefore, we introduce greedy approaches (two strategies) that merge the cores into $\hat{M}$ groups while inducing a reasonably small $P^{total}_{extra}$. Our grouping method is similar to the hierarchical/agglomerative clustering methods in unsupervised learning [36], [37].

We can treat each group as one core, which has a minimum supply voltage

$$V^{\hat{m}}_{group} = \max\{V^{1}_{core}, V^{2}_{core}, \ldots, V^{L}_{core}\} \tag{9}$$

where core1, core2, ..., coreL ∈ Group $\hat{m}$, and a maximum demand current

$$I^{\hat{m}}_{group} = \sum_{l=1}^{L} I^{l}_{core} \tag{10}$$

In order to represent the adjacency among the cores/groups on a chip and the extra power loss induced by merging two adjacent cores/groups, we use an adjacency graph in which connected nodes $x$ and $y$ represent adjacent cores/groups, and the weight $W_{x,y}$ between nodes $x$ and $y$ represents the extra power loss induced if these two cores/groups merge. According to Equation (8), $W_{x,y}$ can be written as

$$W_{x,y} = I^{x}_{group} \left(V^{x\_y}_{group} - V^{x}_{group}\right) + I^{y}_{group} \left(V^{x\_y}_{group} - V^{y}_{group}\right), \quad V^{x\_y}_{group} = \max\{V^{x}_{group}, V^{y}_{group}\} \tag{11}$$

FIGURE 6. The layout of a 4-core chip and its adjacency graph representing the neighbor relationship.

FIGURE 7. Each time, our approach selects the two adjacent groups connected by the smallest weight in the graph and merges them (assuming that in each of the above graphs the labeled weight is the smallest weight).
Given the layout of all the cores in a multicore chip, we can obtain this adjacency graph by using Voronoi diagrams. In the initial adjacency graph ($M$ nodes), each node represents one core with a minimum supply voltage and a maximum demand current. We follow the loop below to iteratively merge nodes until the number of nodes in the graph is $\hat{M}$.

1. Find the smallest weight $W_{x,y}$ in the adjacency graph. Merge node $x$ and node $y$ into a new node $x\_y$, and calculate ($V^{x\_y}_{group}$, $I^{x\_y}_{group}$) (see Equations (9) and (10), respectively).
2. Update the weights connected to the original nodes $x$ and $y$ with Equation (11). A new adjacency graph is then generated.
3. Go to step 1, until the number of nodes in the adjacency graph is $\hat{M}$.

For example, to represent the neighbor relationship in the layout of the 4-core chip shown in Fig. 6(a), we can use the adjacency graph shown in Fig. 6(b). Each weight in this adjacency graph is the extra power loss incurred if the two connected cores are merged into one group. Our greedy approach always chooses the two groups connected by the smallest weight to merge. Hence each step greedily induces the smallest extra power loss and reduces the number of SCCs used by one. In this way, our approach iteratively reduces the number of SCCs until the targeted number of SCCs is reached. This flow is shown in Fig. 7.

After the grouping, we can calculate the total induced extra power loss with Equation (8):

$$P^{total}_{extra} = \sum_{\hat{m}=1}^{\hat{M}} \sum_{l=1}^{L} I^{l}_{core} \left(V^{\hat{m}}_{group} - V^{l}_{core}\right) \tag{12}$$
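The greedy merging loop can be sketched as follows. This is a minimal illustration, not the authors' implementation: the merge weight is one reading of the induced extra power loss that is consistent with Equation (8), the weights are recomputed in every iteration instead of being updated incrementally, and the adjacency pairs and the current of Core4 are hypothetical (the other core values are those of footnote 1).

```python
def merge_weight(a, b):
    """Extra power loss induced by merging groups a and b.

    Each group is a (v_group, i_group) pair; both groups are raised to the
    higher of the two voltages.
    """
    v = max(a[0], b[0])
    return a[1] * (v - a[0]) + b[1] * (v - b[0])


def greedy_grouping(cores, edges, target):
    """Merge adjacent groups until `target` groups remain.

    cores: dict name -> (v_core, i_core); edges: iterable of adjacent name pairs.
    Returns dict frozenset(core names) -> (v_group, i_group).
    """
    groups = {frozenset([n]): vi for n, vi in cores.items()}
    adj = {g: set() for g in groups}
    for x, y in edges:
        gx, gy = frozenset([x]), frozenset([y])
        adj[gx].add(gy)
        adj[gy].add(gx)

    while len(groups) > target:
        # Step 1: find the adjacent pair with the smallest weight.
        pairs = [(merge_weight(groups[x], groups[y]), x, y)
                 for x in groups for y in adj[x] if sorted(x) < sorted(y)]
        if not pairs:
            break  # the remaining groups are not adjacent to each other
        _, x, y = min(pairs, key=lambda p: (p[0], sorted(p[1]), sorted(p[2])))

        # Merge x and y: maximum voltage, summed current.
        merged = x | y
        new_value = (max(groups[x][0], groups[y][0]), groups[x][1] + groups[y][1])
        neighbours = (adj[x] | adj[y]) - {x, y}
        for g in (x, y):
            del groups[g]
            del adj[g]
        groups[merged] = new_value
        adj[merged] = neighbours

        # Step 2: reconnect the neighbours to the merged node.
        for n in neighbours:
            adj[n] -= {x, y}
            adj[n].add(merged)
    return groups


cores = {"Core1": (0.55, 0.08), "Core2": (0.68, 0.10),
         "Core3": (0.82, 0.30), "Core4": (0.90, 0.20)}  # Core4 current is a placeholder
edges = [("Core1", "Core2"), ("Core2", "Core3"),
         ("Core3", "Core4"), ("Core1", "Core4")]         # hypothetical adjacency
for members, (v, i) in greedy_grouping(cores, edges, target=2).items():
    print(sorted(members), f"V = {v:.2f} V, I = {i:.2f} A")
```

With these assumed inputs the loop first merges Core1 with Core2 and then Core3 with Core4.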

C. STEP 2: OPTIMIZING THE SCCS
When the cores are merged into $\hat{M}$ groups, we treat each group as one core with a minimum supply voltage $V^{\hat{m}}_{group}$ and a maximum demand current $I^{\hat{m}}_{group}$ (see Equations (9) and (10)). Given the load information of these cores, the total available MIM capacitance for the SCCs and the optional ratios for one SCC, the problem is to optimize the capacitance allocation and ratio selection of the $\hat{M}$ SCCs to obtain better energy efficiency.
In this section, we optimize the capacitance allocation and ratio selection of each SCC to achieve the minimal power loss of all SCCs, $P^{total}_{scc}$. We use a binary variable $x_{\hat{m},n}$ to indicate whether the $n$-th ratio is used in the $\hat{m}$-th SCC. According to Equation (5), the total power loss of all SCCs is

$$P^{total}_{scc} = \sum_{\hat{m}=1}^{\hat{M}} \sum_{n=1}^{N} x_{\hat{m},n} \cdot \left(e^{\hat{m},n}_{1} \cdot C^{\hat{m}}_{sw} + \frac{e^{\hat{m},n}_{2}}{C^{\hat{m}}_{sw}}\right) \tag{13}$$

where $\hat{M}$ and $N$ are the number of SCCs and the number of optional ratios, respectively, and $e^{\hat{m},n}_{1}$ / $e^{\hat{m},n}_{2}$ is the value of $e_1$ / $e_2$ in the power loss model of the $\hat{m}$-th SCC when its $n$-th ratio is used. We therefore minimize the objective function in Equation (13).
We have the following constraints:

1) Each of the $\hat{M}$ SCCs chooses exactly one conversion ratio, that is,

$$\sum_{n=1}^{N} x_{\hat{m},n} = 1, \quad \forall \hat{m} = 1, 2, \ldots, \hat{M} \tag{14}$$

2) The total MIM capacitance used as flying capacitance by all SCCs should be no more than the total MIM capacitance $C_{total}$, so we have

$$\sum_{\hat{m}=1}^{\hat{M}} C^{\hat{m}}_{sw} \le C_{total} \tag{15}$$

3) According to Equation (3) and Section III-C, the output voltage ripple of the $\hat{m}$-th SCC, $\Delta V^{\hat{m},n}$, should not exceed the maximal allowed ripple of this core group, $V^{\hat{m},n}_{nl} - V^{\hat{m}}_{group}$ [19]. This is because SCCs with different conversion ratios output different no-load voltages $V_{nl}$ (for example, an SCC with ratio 2:1 outputs a no-load voltage of 1/2*VDD), and the voltage received by the core is $V_{nl} - \Delta V$, where $\Delta V$ is the voltage ripple. If the received voltage $V_{nl} - \Delta V$ is less than the demand voltage of the core, $V_{core}$, the cells in this core malfunction. As a result, the maximum allowed voltage ripple is $\Delta V_{max} = V_{nl} - V_{core} \ge \Delta V = \frac{I_{out}}{M_{topo} f_{sw} N_{phase} C_{sw}}$ (see Equation (3)). This leads to a lower bound on each $C^{\hat{m}}_{sw}$ for the selected ratio $n$:

$$C^{\hat{m}}_{sw} \ge \frac{I^{\hat{m}}_{group}}{M^{n}_{topo} f_{sw} N_{phase} \left(V^{\hat{m},n}_{nl} - V^{\hat{m}}_{group}\right)} \tag{16}$$

Note that $V^{\hat{m},n}_{nl}$ here is the no-load output voltage of the $\hat{m}$-th SCC when it uses the $n$-th ratio.

This is a MINLP (Mixed-Integer Nonlinear Programming) problem, since it has continuous variables $C^{\hat{m}}_{sw}$, binary variables $x_{\hat{m},n}$ and non-linear terms in the objective function. Although solvers such as IBM Cplex [38] can be applied directly, it is still hard to obtain the optimal result within a suitable period of time, especially when $\hat{M}$/$N$ is large (e.g., 32/4). In practice, the number of possible ratios for each SCC is limited by its load demand (see Fig. 5), so we can enumerate the possible conversion ratios of each SCC [39] and thereby eliminate all the binary variables $x_{\hat{m},n}$. This makes the solving process more efficient.
Hence we solve the problem by establishing and solving the following sub-problems:

1. Each SCC has only one or two possible ratios that achieve better power efficiency (Section III-C). Hence, for all SCCs, we enumerate all the possible ratio combinations, which results in a set of sub-problems. As an example, Fig. 8 shows how we choose the possible ratio(s) for a given load voltage. If we have $\hat{M} = 3$ SCCs that supply demand voltages of 0.57 V, 1.05 V and 0.78 V, respectively, the possible ratios of the SCCs are $r^{1}_{pos}$ = {2:1, 3:2}, $r^{2}_{pos}$ = {1:1} and $r^{3}_{pos}$ = {3:2, 4:3}. We choose one ratio from each set $r^{\hat{m}}_{pos}$, so there are in total $K = 2 \times 1 \times 2 = 4$ ratio combinations. In each ratio combination, all the $x_{\hat{m},n}$ are fixed, and we have a specific sub-problem $P^{total}_{scc} = \sum_{\hat{m}=1}^{\hat{M}} \left(e^{\hat{m}}_{1} \cdot C^{\hat{m}}_{sw} + \frac{e^{\hat{m}}_{2}}{C^{\hat{m}}_{sw}}\right)$, which is convex and easy to solve. As a result, we get $K$ simple sub-problems.
2. We solve each of the sub-problems with CVX [40] and obtain the optimal power loss of the SCCs for that case, $P^{total,k}_{scc}$, $k = 1, 2, \ldots, K$. The solution of the original problem is the minimal power loss among the sub-problems, i.e., $P^{total}_{scc} = \min_{k=1,2,\ldots,K} \{P^{total,k}_{scc}\}$, and the capacitance allocation and ratio selection of each SCC are the optimization results of that sub-problem.
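The enumerate-then-solve procedure can be sketched in a few lines. Note the substitution: instead of CVX, this sketch solves each convex sub-problem directly from its optimality conditions (a bisection on the multiplier of the total-capacitance constraint), which is sufficient for this separable objective. All function names are illustrative, and apart from the 2:1 coefficients quoted in Section III-C, every coefficient and lower bound in the example is hypothetical.

```python
import itertools
import math


def solve_subproblem(e1, e2, c_min, c_total):
    """Minimise sum(e1[i]*C[i] + e2[i]/C[i]) subject to
    C[i] >= c_min[i] and sum(C) <= c_total.

    Returns (loss, capacitances), or None if the lower bounds exceed c_total.
    """
    if sum(c_min) > c_total:
        return None

    def allocate(lam):
        # Stationarity: e1 - e2 / C^2 + lam = 0, clipped at the lower bound.
        return [max(lo, math.sqrt(b / (a + lam))) for a, b, lo in zip(e1, e2, c_min)]

    caps = allocate(0.0)
    if sum(caps) > c_total:
        # The capacitance budget is active: bisect on the multiplier lam.
        lo_lam, hi_lam = 0.0, 1.0
        while sum(allocate(hi_lam)) > c_total:
            hi_lam *= 2.0
        for _ in range(200):
            mid = 0.5 * (lo_lam + hi_lam)
            if sum(allocate(mid)) > c_total:
                lo_lam = mid
            else:
                hi_lam = mid
        caps = allocate(hi_lam)
    loss = sum(a * c + b / c for a, b, c in zip(e1, e2, caps))
    return loss, caps


def plan_sccs(candidates, c_total):
    """candidates[m] lists the possible ratios of SCC m as
    (ratio_name, e1, e2, c_min) tuples. Every ratio combination is one
    convex sub-problem; the best feasible one is returned as
    (loss, ratio_names, capacitances), or None if none is feasible.
    """
    best = None
    for combo in itertools.product(*candidates):
        names, e1, e2, c_min = zip(*combo)
        result = solve_subproblem(e1, e2, c_min, c_total)
        if result is not None and (best is None or result[0] < best[0]):
            best = (result[0], names, result[1])
    return best


# SCC 0 may use 2:1 (tight ripple margin, large lower bound) or 3:2;
# SCC 1 has a single candidate. Values other than the 2:1 e1/e2 are hypothetical.
candidates = [
    [("2:1", 7.08e5, 9.13e-12, 12.5e-9), ("3:2", 9.0e5, 1.2e-11, 1.0e-9)],
    [("1:1", 5.0e5, 8.0e-12, 1.0e-9)],
]
best = plan_sccs(candidates, c_total=10e-9)
print(best[1], [f"{c * 1e9:.2f} nF" for c in best[2]], f"{best[0] * 1e3:.2f} mW")
```

In this hypothetical instance the 2:1 option is infeasible under the 10 nF budget, so the enumeration falls back to 3:2 for the first SCC, which mirrors the overlap-region discussion in Section III-C.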

D. THE WHOLE FLOW OF THE FRAMEWORK
After the two steps, the energy efficiency can be expressed as

$$\eta = \frac{P_{load}}{P_{load} + P^{total}_{extra} + P^{total}_{scc}} \tag{17}$$

where $P_{load}$ is the total load power of all the cores in the chip.
To summarize this solution for SCC planning, Algorithm 1 shows the framework of the early-stage SCC planning method.

Algorithm 1 Early-stage planning of SCCs
1: Input: the layout of the $M$ cores, the minimum supply voltage and maximum current of the $m$-th core ($V^{m}_{core}$, $I^{m}_{core}$), the number of SCCs $\hat{M}$.
2: Output: the cores supplied by each of the $\hat{M}$ SCCs, the capacitance $C^{\hat{m}}_{opt}$ and the ratio $r^{\hat{m}}$ of each SCC.
3: // Step 1: Grouping the cores
4: Generate the adjacency graph $G$
5: while the number of nodes in $G$ > $\hat{M}$ do
6:     Find the smallest weight $W_{x,y}$ in $G$, and merge the nodes $x$ and $y$ into a new node $x\_y$
7:     Calculate the minimum supply voltage and maximum demand current ($V^{x\_y}_{group}$, $I^{x\_y}_{group}$) with Equations (9) and (10)
8:     Update the weights connected to the original nodes $x$ and $y$ with Equation (11)
9: end while
10: // Step 2: Optimizing the SCCs
11: Formulate the power loss optimization as a MINLP problem (Equations (13), (14), (15) and (16)).
12: According to the demand voltage of each core group, get the $K$ ratio combinations (sub-problems).
13: for $k$ = 1 to $K$ do
14:     Get the optimal total power loss $P^{total,k}_{scc}$ of this case, and the corresponding MIM capacitance $C^{\hat{m}}_{sw}$ and conversion ratio $r^{\hat{m}}$, $\hat{m} = 1, 2, \ldots, \hat{M}$.
15: end for
16: Get the case of $C^{\hat{m}}_{sw}$ and $r^{\hat{m}}$, $\hat{m} = 1, 2, \ldots, \hat{M}$, that corresponds to the minimal total power loss $P^{total}_{scc} = \min_{k=1,2,\ldots,K} \{P^{total,k}_{scc}\}$.

V. THE OPTIMAL NUMBER OF THE SCCS
In [19], the authors use a penalty term for the power loss of the control circuit and the power consumption of the clock signals. Here, we also use a penalty term for the power loss overhead of integrating the SCCs, since integrating SCCs in a multicore chip requires control circuits, routing resources (including supply routing resources), clock power, chip area and so on. It is reasonable to evaluate the overhead of integrating one SCC as a constant loss $P_0$ (the penalty). As the number of SCCs increases, the integration overhead $\hat{M} \cdot P_0$ increases, while the power loss $P^{total}_{extra} + P^{total}_{scc}$ generally decreases because of finer power management. The overall loss with $\hat{M}$ SCCs is

$$P^{total}_{extra}(\hat{M}) + P^{total}_{scc}(\hat{M}) + \hat{M} \cdot P_0 \tag{18}$$

where $\hat{M}$ is the number of SCCs used in the chip. By varying the number of SCCs $\hat{M}$ from $M$ to 1, we obtain the overall loss with the aforesaid techniques at every granularity of SCCs. The optimal number of SCCs, $\hat{M}_{opt}$, is the one with the minimal overall loss. Therefore, one can explore the number $\hat{M}$ to find the best number of SCCs, which achieves the minimal overall loss.
Note that the optimal number of SCCs is closely related to the layout of the cores in the chip. In Equation (18), the layout information affects the terms $P^{total}_{extra}(\hat{M})$ and $P^{total}_{scc}(\hat{M})$. Therefore, we cannot determine the optimal number of SCCs directly in one step. Instead, we vary the number of SCCs, collect the loss information for each value, and thereby obtain the optimal number of SCCs.
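The sweep over the number of SCCs can be sketched as below. The per-count losses would come from running the grouping and optimization steps for each $\hat{M}$; here they are hypothetical numbers chosen only to illustrate the trade-off against the penalty $P_0$, and the function names are illustrative.

```python
def overall_loss(p_extra, p_scc, n_scc, p0):
    """Overall loss with n_scc SCCs (Equation (18))."""
    return p_extra + p_scc + n_scc * p0


def best_scc_count(losses, p0):
    """losses: dict n_scc -> (p_extra, p_scc) in watts.

    Returns (optimal number of SCCs, minimal overall loss).
    """
    totals = {n: overall_loss(pe, ps, n, p0) for n, (pe, ps) in losses.items()}
    n_opt = min(totals, key=totals.get)
    return n_opt, totals[n_opt]


# Hypothetical losses for a 4-core chip (not results from the paper).
losses = {1: (0.120, 0.030), 2: (0.034, 0.034), 3: (0.010, 0.040), 4: (0.000, 0.046)}
n_opt, loss = best_scc_count(losses, p0=0.010)
print(f"optimal number of SCCs: {n_opt}, overall loss: {loss:.3f} W")
```

A larger $P_0$ shifts the optimum toward fewer SCCs, while a smaller $P_0$ favors finer-grained supply.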

VI. EXPERIMENTAL RESULTS
In this section, we present the results of our SCC planning work.
Heterogeneous multicore benchmarks with 4, 8 and 16 cores are tested. The load information of each core in our benchmarks is obtained by a reasonable scaling of the values in [19] and is shown in Table 4. We first assume that the cores in [19] are of many types (such as a Cortex-A72 at 0.78 V or 0.82 V [41] and a DSP at 0.55 V [42]), and that the current information can be obtained with the system-level simulators GEM5 [43] and McPAT [44], which simulate the hardware behavior and report the power information [45]. The aspect ratios of these cores can then be customized [46]. The layouts of the cores in the three chips are shown in Fig. 9, and simpler versions of such heterogeneous chips can be seen on today's market [47]. The layouts of these three multicore chips are fixed patterns. Algorithm 1 in Section IV-D can be applied to any benchmark as long as the layout of the cores, their minimum supply voltages and maximum currents, and the available number of SCCs are given. As introduced in Section II, given this information, we can formulate the problem and apply the optimization algorithm. The results vary with the layouts of the cores in different benchmarks.
The parameters of the SCCs used in the experiments are listed in Table 5. CVX [40] is used to solve the convex problems.
A. RESULTS OF THE SCC PLANNING

1) THE CORE GROUPING
With our grouping method, the cores in a chip are merged into several groups. Fig. 10 shows the two strategies used to group the cores and the extra power loss obtained with each.
Strategy 1: The proposed grouping strategy in Section IV-B.
Strategy 2: The same grouping flow, but with the weight in Equation (11) replaced by $W_{x,y} = |V^{x}_{group} - V^{y}_{group}|$.
We can see that as the number of groups (i.e., $\hat{M}$) varies from $M$ to 1, the extra power loss increases. This is because with fewer SCCs the power management is coarser, and more power is wasted as there are more mismatches between the core demand voltage and the SCC output voltage. We also see that Strategy 1 is better than Strategy 2, since it yields less extra power loss in most cases. (Note that when $\hat{M} = 1$, both grouping strategies result in a single group that includes all cores, leading to the same extra power loss. Moreover, since there are some homogeneous cores in this chip, both grouping strategies put these cores together first, leading to no extra power loss at $\hat{M}$ = 6 or 7.)

2) THE SCC OPTIMIZATION
In this section, we show the power loss of all SCCs obtained when optimizing the MIM capacitance allocation and ratio selection. To show the effectiveness of our optimization, we also show the power loss obtained when the capacitance and ratio are not optimized. Table 6 gives the power loss results of the 4 SCCs in the 4-core chip without and with optimization. In the w/o methods, we allocate the capacitance to each SCC according to either _1: the percentage of the core area supplied by this SCC, or _2: the percentage of the core current supplied by this SCC. We then select the ratio according to the demand voltage without considering the overlapping voltage regions (see Fig. 5). We mark these two methods as w/o optimizing SCCs _1 and w/o optimizing SCCs _2, respectively. It can be seen that our method allocates the capacitance more effectively and also selects better ratios for some SCCs, which together significantly reduce the total power loss of the SCCs.
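The two non-optimized baselines amount to a proportional split of the available MIM capacitance. A minimal Python sketch, assuming a fixed total capacitance budget and hypothetical per-SCC areas and currents (none of these numbers come from Table 6, and the function name is ours):

```python
def allocate_proportional(total_cap, weights):
    # Split total_cap among SCCs in proportion to the given weights
    # (supplied core area for baseline _1, supplied core current for _2).
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [total_cap * w / total for w in weights]


if __name__ == "__main__":
    total_cap_nf = 100.0            # hypothetical capacitance budget in nF
    areas = [4.0, 4.0, 1.0, 1.0]    # hypothetical supplied core areas
    currents = [1.0, 1.0, 0.5, 0.5]  # hypothetical supplied core currents
    print("by area   :", allocate_proportional(total_cap_nf, areas))
    print("by current:", allocate_proportional(total_cap_nf, currents))
```

Neither split takes the conversion ratio or the resulting SCC loss into account, which is the gap the optimized allocation is meant to close.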

3) RESULTS OF THE WHOLE FRAMEWORK
In our work, we propose the SCC planning framework to obtain better power efficiency when the given number of SC converters is less than the number of cores in the chip. In this part, we show the planning results and the obtained power efficiency when the given number of SC converters varies from M to 1. To the best of our knowledge, no prior work explores supply schemes in which the number of SCCs used is less than the number of cores. We therefore compare our framework against the three methods listed below, which capture the general ideas one might adopt to supply the cores with fewer SCCs.
(0) Ours: we use both the proposed grouping strategy and the SCC optimization technique described in Section IV.
(1) Method 1: we do not use the proposed grouping strategy (but strategy 2 in Section VI-A.1 instead) and use the SCC optimization technique.
(2) Method 2: we do not use the proposed grouping strategy (but strategy 2 in Section VI-A.1 instead) and do not use the SCC optimization technique (w/o optimizing SCCs _1 in Section VI-A.2 instead).
(3) Method 2_2: we do not use the proposed grouping strategy (but strategy 2 in Section VI-A.1 instead) and do not use the SCC optimization technique (w/o optimizing SCCs _2 in Section VI-A.2 instead).
Let us first take the 4-core benchmark as an example. If the given number of SC converters $\hat{M}$ is 3, the results of our SCC planning framework are shown in Table 7. We can see that our methods yield the supply schemes together with the optimized capacitances and ratios for better efficiency of the chip.
The energy efficiency of the chip with the three planning methods is shown in Fig. 11. We can see that the energy efficiency with our method is slightly higher than that with Method 1, and much higher than that with Method 2. Besides, as the given number of SCCs in the 4-core benchmark varies from 1 to 4, the energy efficiency of the chip with our method improves, since more SCCs lead to finer power management. In contrast, as Method 2 does not use the SCC optimization technique, the more SCCs are given, the more unreasonable the capacitance allocation becomes and the worse the efficiency gets.
The energy efficiency of the 8-core and 16-core benchmarks with our methods is shown in Fig. 12 and Fig. 13, respectively. Note that the energy efficiency stays the same when the number of SCCs approaches M. This is because there are many identical (i.e., homogeneous) cores in the heterogeneous chip, and merging two identical cores into one group supplied by one SCC induces no additional $P^{total}_{extra}$ or $P^{total}_{scc}$.

B. THE BEST NUMBER OF SCCS
TABLE 6. Optimizing the four-SCC case of the four-core benchmark. Here * stands for the selected ratio among the possible candidates with our optimization.

TABLE 7. When the 4-core chip is supplied with three SCCs, our SCC planning methods obtain the supply schemes and the optimized capacitance and ratio for each SCC.

Our method also provides a way to find the best number of SCCs, i.e., the number that leads to the minimal overall power loss. We show the results for the 8-core benchmark. If we set the overhead of using one SC converter to 30 mW (a value taken from [19]), then the power loss $P^{total}_{extra}(\hat{M}) + P^{total}_{scc}(\hat{M})$, the constant loss $\hat{M} \cdot P_0$ and the overall loss (i.e., the sum of the two) are shown in Fig. 14 as the number of SCCs grows. We can see that the power loss $P^{total}_{extra}(\hat{M}) + P^{total}_{scc}(\hat{M})$ decreases while the constant loss $\hat{M} \cdot P_0$ increases as the number of SCCs grows. As a result, the minimal overall loss is obtained when two SC converters are used.
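The trade-off can be made concrete with a small Python sketch that adds the per-converter overhead to the planning loss and picks the count with the smallest sum. The loss values in the dictionary are invented for illustration and are not read from Fig. 14; only the 30 mW overhead per converter follows the text, and the function name is hypothetical.

```python
def best_scc_count(planning_loss_mw, p0_mw=30.0):
    # planning_loss_mw maps a number of SCCs to the sum of the extra loss
    # and the SCC loss (in mW) obtained by the planning step for that count.
    overall = {m: loss + m * p0_mw for m, loss in planning_loss_mw.items()}
    best = min(overall, key=overall.get)
    return best, overall


if __name__ == "__main__":
    # Made-up, monotonically decreasing planning losses in mW.
    planning_loss_mw = {1: 120.0, 2: 70.0, 3: 55.0, 4: 50.0}
    best, overall = best_scc_count(planning_loss_mw)
    print("overall loss per count:", overall)
    print("best number of SCCs:", best)
```

Because the planning loss flattens out while the overhead grows linearly with the number of converters, the minimum typically sits at a small count, as in the reported results.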

VOLUME 8, 2020
A similar trend is also observed in the 4-core and 16-core benchmarks, and the results are shown in Fig. 15 and Fig. 16, respectively. We can see that the optimal numbers of SC converters for the 4-core and 16-core benchmarks are 1 and 3, respectively.

VII. CONCLUSION
Since the overhead of integrating SCCs in a chip is non-negligible, SCCs cannot be overused. In this paper, for better energy efficiency, we propose an early-stage SCC planning framework that obtains the SCC supply scheme together with the optimized MIM capacitance allocation and converter ratio selection for each SCC when the given number of SCCs is less than the number of cores. Our method can also be used to find the best number of SCCs for a given chip. The experiments demonstrate the results of our SCC planning methods.