Chip Power-Frequency Scaling in 10/7nm Node

The 10/7nm node has been introduced by all major semiconductor manufacturers (Intel, TSMC, and Samsung Electronics). This article examines the power-performance benefit of the 10/7nm node compared with the previous node (14nm). Specifically, we track power-performance in the high-performance space using Intel’s Core-i7 (Intel’s highest performance consumer microprocessor, which uses the highest performance CMOS technology node) manufactured in Intel’s 10nm node. The paper first looks at the scaling of device power-performance from the Intel 14++nm node to Intel 10nm, using 3D TCAD simulation with dimensions obtained from actual product cross-sections, and at the node-to-node scaling of the interconnect capacitance. Next, the paper compares industry 10/7nm node technologies (from Intel, TSMC, and Samsung Electronics). The paper argues that, for Intel’s 10nm node, the total chip power at constant frequency (energy-per-operation) has scaled by a much smaller amount relative to the 14++ node than the 14++ node did relative to the previous (22 nm) node. The lack of power scaling can be traced to a reduction in current per device perimeter (caused by the increased device parasitic resistance and the reduced device and fin pitch) and to an increase in capacitance per fin (caused by an increase in the FinFET height). Proper scaling of the device is critical for chip power scaling (energy-per-operation) at upcoming nodes, especially for high performance microprocessors; for the data analyzed here, such scaling has not been achieved.


I. INTRODUCTION
Key benefits of CMOS scaling have been density improvement (i.e. more transistors per area), chip frequency improvement (i.e. for single thread tasks), and power reduction at a given frequency [1], [2]. In recent nodes, frequency scaling has slowed down. Focus has shifted to adding cores and functionality with migration to the new node. The extra functionality was enabled by the density increase and drop in power at constant frequency; however, in the most recent nodes, there has been a slowdown in chip power scaling at a given frequency (i.e. energy-per-operation). This has been compensated by design improvements (architecture, place and route, etc.) to maintain power scaling [3], [4].
To evaluate the benefit of CMOS scaling in the high-performance space, we previously followed the evolution of the node-to-node power-performance benefit for Intel's highest performance consumer microprocessor, Intel Core-i7, across many technology nodes [5] (highest performance prior to the introduction of Core-i9 in 14++ nm, which is limited to desktop applications). That work showed a marked reduction in power-performance gain in recent technology nodes (i.e. 22nm through 14nm) as compared to the earlier technology nodes (250nm through 32nm nodes). At the time that work was published, only one Intel 10nm part had been announced, and that part did not show any power-performance gain. Since then, Intel has announced several 10nm parts. The goal of this study is to compare the power-performance of the newly announced parts, and to carry out a systematic TCAD study in order to understand the power-performance behavior in the transition from a well-designed 14 nm technology to a 10/7 nm node. Furthermore, we performed a systematic comparison of Intel's 10nm node with the other industry 7nm nodes that have similar ground rules (TSMC and Samsung 7nm nodes), to determine whether the observed behavior of any 10/7 node is unexpected.
The associate editor coordinating the review of this manuscript and approving it for publication was Muhamamd Aleem.

II. METHODOLOGY AND APPROACH
This work uses total chip power at a given frequency to track the evolution of the total power (or total energy-per-operation) through node transitions. Total chip power, $P_{\mathrm{Total}}$, is approximately $f C V^{2} + P_{\mathrm{Leakage}}$, where $f$ is the frequency, $C$ is the effective switching capacitance, $V$ is the operating voltage, and $P_{\mathrm{Leakage}}$ is the stand-by leakage power (power at stopped clock). $P_{\mathrm{Leakage}}$ is a function of the total device width on the chip, the device off currents (i.e. threshold voltage), and the operating voltage. In this study, the focus is on high performance CMOS, and specifically on higher performance microprocessors. In recent high-performance microprocessors, active power dominates, and the leakage power (i.e. deep sleep power) is about 5-20% of the total power [8], [9]. For the total chip power we use the thermal design power (TDP), which is the highest steady amount of power that the chip can generate while running applications. To keep the number of cores and the amount of cache the same throughout this study, we focus on the TDP-frequency of the 4 core/8 MB cache product family, or of 2 core/4 MB or 6 core/12 MB parts scaled to 4 core/8 MB. The total number of processor core transistors has been fairly constant (about 800 million), and while the graphic engine device count has increased dramatically over many generations, the power of the graphic engine is only a few percent of the total power, especially at high frequencies (circuit details are in [3]).
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In this article we use device technology computer aided design (TCAD) to simulate the expected power drop at a given frequency based on the device structure. Device structures are obtained from cross sections of high-performance microprocessor products in a given node [10]. We calibrate the simulations to match the currents reported for the 14++ and 10 nm nodes. We then extract the device capacitance and resistance as they scale node-to-node and use them (along with the metallization capacitance and resistance) to predict the power-performance at the circuit level.
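As an illustration of the power model used in this section, the following minimal Python sketch evaluates total power as f·C·V² plus leakage, together with the corresponding energy per clock cycle. All numerical values (frequency, effective capacitance, voltage, leakage) are hypothetical placeholders chosen only so that leakage falls inside the 5-20% range quoted above; they are not data from this study.

```python
def total_chip_power(freq_hz, c_eff_farad, vdd_volt, p_leak_watt):
    """Total chip power in watts: active power f*C*V^2 plus leakage power."""
    return freq_hz * c_eff_farad * vdd_volt ** 2 + p_leak_watt


def energy_per_cycle(freq_hz, c_eff_farad, vdd_volt, p_leak_watt):
    """Energy per clock cycle in joules (total power divided by frequency)."""
    return total_chip_power(freq_hz, c_eff_farad, vdd_volt, p_leak_watt) / freq_hz


# Hypothetical operating point (assumed values, not measured data).
FREQ_HZ = 4.0e9      # 4 GHz clock
C_EFF_F = 3.0e-9     # 3 nF effective switching capacitance
VDD_V = 1.0          # 1.0 V operating voltage
P_LEAK_W = 1.5       # 1.5 W stand-by leakage

base_power = total_chip_power(FREQ_HZ, C_EFF_F, VDD_V, P_LEAK_W)
leak_fraction = P_LEAK_W / base_power

# Same frequency and capacitance, but a 5% higher operating voltage
# (leakage held fixed here for simplicity, which is itself an assumption).
raised_power = total_chip_power(FREQ_HZ, C_EFF_F, 1.05 * VDD_V, P_LEAK_W)
power_increase = raised_power / base_power - 1.0

print(f"base power      : {base_power:.2f} W")
print(f"leakage fraction: {leak_fraction:.1%}")
print(f"power increase at +5% voltage: {power_increase:.1%}")
```

Because active power is quadratic in voltage, even a small voltage increase needed to recover frequency shows up directly as higher power at constant frequency, which is the quantity tracked throughout this work.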

III. PROBLEM STATEMENT
Intel introduced its first 10nm processor (Core-i3) in early 2019. That part (Core-i3 8130U, pink data point) had a noticeably higher power at a given frequency (Figure 1), and a lower turbo frequency [6], as compared to the equivalent 14nm part. The degraded power-performance was reflected in the software benchmarks [7]. Late in 2019, Intel released several parts in 10nm, covering the entire Core-i3 through Core-i7 family [6]. The 10nm Core-i3 parts (marked red in Figure 1) still show degraded power-performance as compared to 14nm (even the 10th generation in 14nm, marked in purple), and a reduced turbo frequency.
To carry out a more systematic study of the 10nm parts, and to evaluate the benefit of CMOS scaling in the high-performance space, in this work we focus on the evolution of the node-to-node benefit for the Intel Core-i7 processor over multiple generations (similar to our previous work [5]). Intel introduced the 10th generation of the Core-i7 in both the 14++ node and the 10nm node. The 10th generation parts in 14++ and 10nm are equivalent, except for some minor differences in the graphic core, whose power is a small portion of the total power. Figure 2(a) is a plot of TDP vs. frequency for Intel Core-i7 for the 9th (in the 14++ node) and 10th (in the 14++ and 10nm nodes) generations. The plot shows that the power-frequency relationship is unchanged between the 9th and 10th generation designs built in the 14++ node. For the 10th generation design in the 10nm node, there has been an increase in chip power at the same frequency. Further, to consider the technology node-to-node benefit, we look at the turbo frequency (Figure 2). The turbo frequency of the 10th generation Core-i7 in 10nm shows slight degradation as compared to the 8th and 9th generation Core-i7 in the 14++ node. The 10th generation Core-i7 in 14++ shows some improvement as compared to the 8th and 9th generation designs.
Based on the early parts released by Intel in 10nm, it seems that there may be an issue in improving power-performance node-to-node in the scaling from 14++ to the 10nm node. This contrasts with the expected trends in the previous node.
The next question that we tried to address is whether there is a fundamental difference between Intel's 10nm and other industry 7nm technologies. We carried out a structural comparison as well as TCAD simulation. We found that the 10/7 nm node technologies (Intel 10, and TSMC and Samsung 7) are remarkably similar in terms of structures and device features. Furthermore, to within a few percent, they have similar current drive as well as device capacitance.

IV. TCAD DEVICE SIMULATIONS
We use TCAD to understand the performance limitations of the Intel 10 nm and 14 nm products. Initially we used an ideal FinFET structure with nearly vertical sidewalls, with the dimensions for fin height, pitch, and width as described in Table 1. The Synopsys TCAD Suite [11] was used to model these devices. Although we do not know many of the finer details of the Intel structures, the process and device simulation deck that we used was based on and validated against IBM Research's 7 nm FinFET technology reported in [12]. The conventional Drift-Diffusion model was used, including quantum corrections via a density gradient method found within the Synopsys tool. A mobility model appropriate for thin body MOS devices was used; it takes into account mobility degradation due to thin body and high-K effects as well as mobility enhancement due to stress, and includes the impact of surface orientation and transport direction. Lumped resistances on the source and drain contacts were included to model the effect of the MOL resistance, including contact resistance. The simulation deck was parameterized such that the fin dimensions from Table 1 could be fed into the process simulation tool and provide a reasonable representation of the Intel FinFET structures. The gate work function in our device simulation deck was adjusted to obtain ∼10 nA/µm off current for a nominal gate length device of ∼20 nm. Gate work functions of 4.46 eV and 4.49 eV were used for the Intel 10 nm and 14 nm devices, respectively. Internal device simulations using NEGF-type device transport tools have shown us that the peak internal carrier velocity of such short channel devices exceeds the conventional saturation velocity that is used in Drift-Diffusion models. Even so, we have found that scaling the saturation velocity allows us to match device on currents.
We therefore scaled the saturation velocity from 1.0 × 10^7 cm/s to ∼3.0 × 10^7 cm/s in order to match the reported measured on current for the 10 nm Intel device [13]. For the 14 nm device [14], [15], the MOL resistance is expected to be lower than that for the 10 nm device because the CA area is ∼2X larger, although the CA height is greater. For this reason, we reduced the lumped external resistance values while keeping the transport parameters the same as those for the 10 nm device, and we were able to match the on current. Table 2 shows the comparison between TCAD and reported values.
We then modified the structural simulation decks to take into account some of the other details of the FinFET. In particular, the two main non-idealities can be seen in Fig. 3. In the cross section of the fin, cut through the center of the channel (Fig. 3(a)), there is a foot at the bottom of the fin which may impact the electrostatics and off current. In the cross section parallel to the fin, through the source/drain region (Fig. 3(b)), the epitaxially grown source/drains do not connect fully to the bottom of the fin. Additionally, the gate profile itself is not vertical, with the top of the gate being narrower than the bottom of the gate. We include these features for both the 10 nm node and 14 nm node Intel devices. These cross sections are typical across the 14++ and 10 nm node technologies from Intel (as well as the TSMC 7 nm node and the Samsung Electronics 7 LPP node, which are equivalent technologies to Intel's 10 nm node). The results are summarized in Table 1. Across technologies, the gate length for the logic devices is about 18-21 nm at the top of the fin and about 25-27 nm at the bottom of the fin. The fin width across these technologies is about 5-7 nm in the middle of the fin, and about 10 nm at the bottom of the fin. The fin height for the 14++ nm node is about 44 nm, and for the 10 nm node the reported fin height is between 46 and 52 nm. For our modeling, we used a fin height of 48 nm for the 10 nm node. Figure 4 is a typical structure used for the TCAD simulation. Figure 5 is an overlay of the simulation structure and the actual device cross section. We have included the foot at the bottom of the fin, as well as the broadening observed at the bottom of the gate, in the TCAD structure. In our simulations, we match the actual fin and gate profiles (Figure 5). Based on Intel's 14 nm and 10 nm cross sections (as represented in Figure 3) [10], it appears that the fin and PC have very similar profiles in 14 nm and 10 nm.
The Ion-Ioff curves for nFET devices are shown in Figure 6, and the results are summarized in Table 2. For the migration to 10 nm, the on current per device perimeter is reduced. This is expected, since the reduced fin pitch and contacted poly pitch impact the device resistance and strain. The impact of having a foot and a tapered gate on device performance was also studied. Figure 6 compares the simulated Ioff/Ion characteristics of the Intel 10 nm and 14 nm devices for the idealized fin structures and for the more realistic structures. For the 14 nm Intel node, the impact of the large foot at the bottom of the fin is immediately apparent. The gate cannot control the off current well, so the poor electrostatics reduce the drive current at a given off current. For the Intel 10 nm node, it appears that the fin foot also degrades the drive current, but not as significantly as in the 14 nm node. This smaller impact can be explained by recalling that the gate is tapered, top to bottom. For each of these non-ideal device structures, the gate length near the top of the fin is smaller than the gate length at the bottom of the fin. For the 14 nm Intel node, the gate is not wide enough to control the leakage in the fin foot region. For the 10 nm Intel node, however, the fin process control is seen to be better, so the added gate width at the bottom of the fin yields better electrostatics. Table 3 summarizes the device capacitance node to node. The reduction of fin pitch (effectively a reduced gate height) causes a reduction of the PC-to-PC and junction capacitance. Because of an increase in fin height (or lack of scaling of the fin height), the device width does not scale node to node. If the number of fins per function is not reduced by the scaling factor (and in fact the fin height can even increase by about 10% node-to-node), the result is a net increase in capacitance per fin node-to-node.
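The effect of fin height on device width per fin can be illustrated with a first-order geometric estimate. The sketch below assumes the common approximation that the gated perimeter of a fin is twice the fin height plus the fin width, and assumes a fin width of 6 nm (the middle of the 5-7 nm range quoted above); both are simplifying assumptions and do not reproduce the TCAD extraction. The fin heights are the 44 nm (14++) and 48 nm (10 nm modeling value) figures given in this section.

```python
def fin_perimeter_nm(fin_height_nm, fin_width_nm):
    """First-order gated perimeter of one fin: two sidewalls plus the top."""
    return 2.0 * fin_height_nm + fin_width_nm


FIN_WIDTH_NM = 6.0          # assumed mid-range of the 5-7 nm quoted in the text
FIN_HEIGHT_14PP_NM = 44.0   # 14++ fin height quoted in the text
FIN_HEIGHT_10_NM = 48.0     # fin height used for the 10 nm node modeling

perimeter_14pp = fin_perimeter_nm(FIN_HEIGHT_14PP_NM, FIN_WIDTH_NM)
perimeter_10 = fin_perimeter_nm(FIN_HEIGHT_10_NM, FIN_WIDTH_NM)
perimeter_ratio = perimeter_10 / perimeter_14pp

print(f"14++ perimeter per fin : {perimeter_14pp:.0f} nm")
print(f"10 nm perimeter per fin: {perimeter_10:.0f} nm")
print(f"increase per fin       : {perimeter_ratio - 1.0:.1%}")
```

Under these assumptions the gated perimeter per fin grows by roughly 8 to 9%, which is of the same order as the per-fin device capacitance increase the paper reports from Table 3. The agreement is only indicative, since gate capacitance also depends on gate length, parasitic components, and the fin foot.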

V. BEOL SCALING
RC scaling from node to node is driven by several factors, including pure dimensional scaling as well as structural and materials-based changes. Comparing capacitance between 14 nm and 10 nm, several structural changes are apparent which individually can act to either increase or decrease total capacitance. For example, the height-to-width aspect ratio of metal lines appears to increase from approximately 1.0 to 1.15 for 10 nm metal lines relative to 14 nm metal lines, which increases line-to-line capacitance. In addition, the via height increases from approximately 0.5 to 0.6-0.7, while the line width-to-pitch ratio decreases from approximately 0.7 to 0.6. Both factors act to decrease line-to-line capacitance. Since the metal pitch and the corresponding line width both decrease in going from 14 nm to 10 nm, the metal resistivity and line resistance must also increase. This is compounded by the change in minimum conductor pitch from 52 nm at 14 nm to 36 nm at 10 nm. In the end, the non-linear increase in line resistance dominates the net RC trend, which is projected to increase by ∼2.4X for 1 µm lines and by about 1.2X for scaled lines, assuming a 0.7X average shrink factor for 10 nm relative to 14 nm. This is summarized in Table 4.
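The relation between the two RC figures above can be checked with simple arithmetic. The sketch below assumes that the wire delay follows the distributed-RC form, in which RC is proportional to the product of resistance and capacitance per unit length times the square of the line length; this functional form is an assumption of the sketch, not a statement of how Table 4 was derived. With that assumption, the fixed-length ratio of ∼2.4X and the 0.7X shrink factor reproduce the ∼1.2X figure for scaled lines.

```python
def rc_ratio_for_scaled_line(rc_ratio_fixed_length, shrink_factor):
    """RC ratio for a line whose length shrinks with the node.

    Assumes RC is proportional to (r per length) * (c per length) * length^2,
    so shrinking the length by `shrink_factor` multiplies RC by its square.
    """
    return rc_ratio_fixed_length * shrink_factor ** 2


RC_RATIO_1UM_LINE = 2.4   # node-to-node RC increase for a fixed 1 um line (from the text)
SHRINK_FACTOR = 0.7       # average 10 nm vs. 14 nm shrink factor (from the text)

rc_ratio_scaled = rc_ratio_for_scaled_line(RC_RATIO_1UM_LINE, SHRINK_FACTOR)
print(f"RC ratio for scaled lines: {rc_ratio_scaled:.2f}X")
```

The result, about 1.18X, is consistent with the "about 1.2X" quoted for scaled lines.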

VI. POWER-PERFORMANCE SCALING INTO THE 10nm NODE
Two benefits have been associated with scaling: density and power-performance. In recent nodes there has been a marked decrease in the improvement of power-performance. Nevertheless, the industry has been able to take advantage of density scaling, and at the same time obtain some power-performance benefit at the product level. In the transition from 14++ to 10 nm, there has been a noticeable challenge in power-performance scaling in the high performance space. In Intel's 10 nm node (for the high performance processors), the transition from 14++ brought a full node-to-node density scaling (CPP in 14 was 42 nm), about a 15% drop in Ion/µm device width (from 0.96 to 0.818 µA/µm), and a slight increase in fin height resulting in about an 8% device capacitance increase per fin (Table 3). The BEOL capacitance has scaled properly (by about 70% for scaled lines). In critical path circuits, for reasons of performance and noise, the circuit capacitance ratio of device to BEOL is split in the range of 70/30 to 60/40. With a slight increase in the device capacitance and a more significant decrease in the BEOL capacitance, one would expect near-constant CV² node-to-node. However, since both the device and BEOL resistance have increased, obtaining the same performance requires increasing the operating voltage of the device, which results in power-performance degradation and a reduction in turbo frequency at the same chip power. The 10 nm power-performance behavior is very similar to what was observed in the transition from 22 nm to 14 nm, where there was also a noticeable challenge in power-performance. Intel's 14 nm node was a full node-to-node density shrink from the previous node (22 nm), i.e. CPP in 14 nm was 70 nm vs. 90 nm in 22 nm, with a full BEOL shrink. There was also a 20% drop in Ion/µm device width [5], [14], and a slight increase in fin height.
Nevertheless, power-performance node-to-node (and software benchmarks) were flat, as shown in Figure 1 for generations 5 and 6 of Core-i3 (and Core-i7) [15]. Only with the introduction of 14+ and 14++, and the accompanying increase in Ion [16], was it possible to obtain a ∼25% drop in chip power at the same frequency [17] for generations 7 and 8 of the Intel Core family [5].
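The capacitance argument in this section can be made concrete with a short calculation. The sketch below combines the ∼8% per-fin device capacitance increase with the BEOL capacitance scaling, for the 70/30 and 60/40 device/BEOL splits quoted above. It interprets "scaled by about 70%" as a 0.7X ratio, which is an assumption of this sketch, and it also checks the quoted Ion drop and computes the voltage increase that would cancel the capacitance gain in CV² (an illustrative derived quantity, not a figure from the paper).

```python
def switched_cap_ratio(device_share, device_ratio, beol_ratio):
    """Node-to-node ratio of total switched capacitance.

    `device_share` is the device fraction of the circuit capacitance in the
    older node; the remainder is BEOL (wiring) capacitance.
    """
    return device_share * device_ratio + (1.0 - device_share) * beol_ratio


DEVICE_CAP_RATIO = 1.08   # ~8% device capacitance increase per fin (from the text)
BEOL_CAP_RATIO = 0.70     # assumed reading of "scaled by about 70%" as 0.7X

cap_ratio_70_30 = switched_cap_ratio(0.70, DEVICE_CAP_RATIO, BEOL_CAP_RATIO)
cap_ratio_60_40 = switched_cap_ratio(0.60, DEVICE_CAP_RATIO, BEOL_CAP_RATIO)

# Relative drop in on-current per unit device width, using the quoted values.
ion_drop = 1.0 - 0.818 / 0.96

# Voltage ratio at which C*V^2 returns to its previous-node value (70/30 split).
v_breakeven_70_30 = (1.0 / cap_ratio_70_30) ** 0.5

print(f"capacitance ratio, 70/30 split: {cap_ratio_70_30:.3f}")
print(f"capacitance ratio, 60/40 split: {cap_ratio_60_40:.3f}")
print(f"Ion drop: {ion_drop:.1%}")
print(f"break-even voltage ratio (70/30): {v_breakeven_70_30:.4f}")
```

Under these assumptions the switched capacitance falls by only about 3 to 7%, so a voltage increase of a few percent is enough to erase the CV² benefit, in line with the near-constant CV² and the degraded power-performance described above.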

VII. COMPARISONS OF INDUSTRY 7nm NODE TECHNOLOGIES
As the data in the prior section indicate, it has been difficult for Intel to obtain a noticeable gain in power-performance in migrating from their 14++ nm technology to their 10 nm node. A possible reason for this difficulty is that Intel's 14 nm node exhibits particularly high performance. Table 5 has the key data on several 16-10 nm technologies as practiced by the leading manufacturers in the industry. It can be observed that Intel has the shortest channel length for that generation of the technology, the thinnest fin dimensions, and the straightest sidewalls. Intel's 14++ device is very similar to Intel's 10 nm device. As shown in the previous section (and as expected), it is difficult to obtain a power-performance benefit when the device pitch (CPP) and fin pitch shrink without scaling the channel length or device width. Even though Intel appears to have challenges in obtaining a power-performance benefit in migrating their products to 10 nm, Advanced Micro Devices Corporation (AMD), which has a similar product family to that of Intel, has obtained about 50% power reduction in migrating their microprocessors from their 14 nm node to the 7 nm node [4]. They attributed 9% of the power reduction to a drop in AC capacitance ($C_{\mathrm{AC}}$), and about 12% of the power drop to ''7nm Timing'', both enabled by technology migration [4]. AMD's 14 nm generation product was manufactured using Global Foundries 12 nm FinFET technology. AMD's 7 nm product generation was manufactured using TSMC's 7 nm technology. The key features of these technologies are listed in Table 6. TSMC's 7 nm has a thinner fin (5-10 nm vs. 10-18 nm) and a shorter channel length (22-26 nm vs. 28-30 nm). In the GF 12 nm to TSMC 7 nm migration, the drop in fin thickness and channel length is significantly larger than that observed in Intel's 14 nm to 10 nm migration.
Thus, it appears reasonable to assume a larger drop in device capacitance and in drain induced barrier lowering (DIBL), which results in higher performance. Migration from a relatively low performance technology (as compared to Intel's 14 nm) seems to be the reason AMD was able to obtain a noticeable power drop at the same frequency attributable to technology migration.
It is worth noting that for all the 10/7 nm node technologies implemented by the three major semiconductor fabricators (Intel, TSMC, and Samsung Electronics), the device features are remarkably similar. Table 7 summarizes the fin dimensions and channel lengths for the 10/7 nm node technologies. Looking at the images of the fin and the gate, there is always some taper at the bottom of the fin and an increase in channel length L at the bottom of the gate. For these technologies, the channel length is about 19-20 nm at the top of the FinFET and about 25-26 nm at the bottom of the FinFET. Furthermore, the fin width is about 5 nm at the middle of the fin, and about 10 nm at the bottom of the fin. There can be a number of reasons for the taper in the fin or gate. One method to address the leakage at the bottom of the fin is to increase L in that region. The taper in the gate can also be driven by the process requirements of the gate-last high-K process. The taper at the bottom of the fin can be driven by the requirements for the spacer etch. In order to shut down the punch-through current below the fin in the Si bulk, the very bottom of the fin usually has high doping and a slightly higher threshold. Figures 7 and 8 show the TCAD study, highlighting the effect of the taper at the bottom of the fin on channel length scaling: it causes a 4-5 nm shift in L scaling (i.e. a shift in DIBL or threshold roll-off vs. L). It results in a higher off current at the same Ion and L, and in a larger source/drain resistance. The gate capacitance also increases because of the foot at the bottom of the fin.
The taper in the gate and the fin, seen across 10/7 nm node technologies from the three leading manufacturers, raises the question of whether taper is a requirement for FinFET technologies, needed to control punch-through, and whether it is related to why Intel has been unable to scale channel length. Industry has announced technology roadmaps for 7 nm (Intel) and 5 nm (TSMC and Samsung) nodes, and for 3 nm nodes. The challenges faced by Intel in scaling into 10 nm raise the question of whether power-performance can be maintained, let alone improved, with scaling in the high-performance space. One central issue is how to scale channel length. Scaling L (and ''contacted poly pitch'', CPP) is critical for both performance and density scaling. It is not clear whether the lack of L scaling for 10/7 nm nodes relative to the previous nodes is driven by short channel issues (caused by the FinFET taper at the bottom of the fin), or by process related issues (i.e. the gate-last removal and re-fill, gate doping, etc.). Fin width is probably at its limit. Fin taper at the bottom of the fin is one area that may be improved.

VIII. DISCUSSIONS AND SUMMARY
Going from the 14 nm node to the 10 nm node, for a very well designed 14 nm node, the node-to-node intrinsic drivability (current per channel perimeter) drops and is compensated by an increase in fin height, principally to satisfy timing requirements for critical circuit paths. However, this increases capacitance and is apparently not adequately offset by fin pitch and gate length scaling, resulting in a net power-performance degradation at the product level for the first iteration of that node. Subsequent versions of the same node (10+, 10++, etc.) are expected to primarily improve the intrinsic drivability (e.g., $R_{\mathrm{ext}}$ reduction, transport), and then later reduce device capacitance (e.g., larger CPP), to achieve a net power-performance improvement, assuming history repeats itself. The broader question is whether this is sustainable going forward to 7 nm, 5 nm, etc. Increasing fin height has diminishing returns owing to the increase in capacitance (and lack of device width scaling) and in $R_{\mathrm{ext}}$, which itself increases as the fin pitch is scaled. This points to a floor in fin pitch scaling ($R_{\mathrm{ext}}$-limited), which also limits capacitance scaling as an offset to increasing the fin height. Short of a breakthrough in carrier transport improvement or $R_{\mathrm{ext}}$ engineering, future FinFET nodes may plausibly be defined by an essentially fixed FET structure, with only some wiring and ground rule tricks to achieve density scaling without any tangible performance increase.