Many-Tier Vertical GAAFET (V-FET) for Ultra-Miniaturized Standard Cell Designs Beyond 5 nm

The GAAFET (gate-all-around FET) is expected to replace FinFETs in future nodes due to its excellent channel controllability. GAAFETs can be built with either horizontal or vertical transistor structures. Vertical GAAFETs (V-FETs) are promising compared to horizontal GAAFETs (H-FETs) because their structure allows area reduction and significant parasitic reduction. Moreover, V-FETs can be positioned on top of each other, which allows even greater size reductions. Therefore, this paper studies the overall potential of many-tier V-FETs by investigating the essential design factors from the layout perspective. First, we study the factors that should be considered when designing many-tier V-FETs. Second, we propose an interconnect structure that maximizes the advantages of many-tier V-FETs. Third, we compare 2-tier V-FET standard cells to 1-tier V-FET cells and visualize the advantages that many-tier V-FET cells provide. Our study shows that 2-tier V-FET standard cells reduce area by 35.6% at the cost of a 16.5% increase in wirelength and a 13.2% increase in parasitic capacitance compared to 1-tier V-FET cells. Compared to H-FETs and FinFETs, our cells reduce area by 50.1%, wirelength by 0.3%, and parasitic capacitance by 18.9%. We emphasize that the design freedom to place transistors on top of each other, together with proper interconnect structures, leads to ultra-miniaturized standard cell designs. We note that the increase in wirelength and capacitance relative to 1-tier cells is due to the vertical size increases and detours that must exist in the designs. Thus, careful circuit design is required to obtain the maximum advantages from V-FETs.


I. INTRODUCTION
The so-called ''Moore's Law'' [1] captures the keyword that best explains the growth of today's semiconductor industry: ''miniaturization.'' For more than 50 years, miniaturization of transistors and interconnects has been a core factor for increased business profits and a crucial factor in the evolution of the industry itself. Alongside this miniaturization trend, advances in the semiconductor devices themselves have also been introduced. For example, Intel introduced device technologies such as high-k metal gate and strained silicon in their 45 nm CMOS technology to control increased leakage currents and enhance device performance [2]. Currently, FinFETs have completely replaced planar MOSFETs in the latest nodes of major foundries [3], [4]. These device changes, which began in the last decade, indicate that devices will continue to face challenges from this ongoing scaling trend and that proper breakthroughs must be introduced for transistor evolution.
The associate editor coordinating the review of this manuscript and approving it for publication was Yuh-Shyan Hwang.
In 2011, Intel introduced FinFET as its dominant transistor structure in its 22 nm technology node [5]. A FinFET is a transistor in which the gate covers three sides of the channel, whereas the gate of a planar MOSFET covers only one side. Therefore, FinFETs show better channel controllability and performance than conventional planar MOSFETs. However, the era of FinFETs is not expected to last long due to scaling issues [6]-[8]. A major foundry has started migrating from FinFETs to advanced transistor structures in its future process nodes. For example, Samsung plans a GAAFET (gate-all-around FET) solution in its 3 nm technology node [9].¹ In GAAFETs, which are advanced structures compared to FinFETs, the gate covers all four sides of the channel [11]. Thanks to this channel structure, GAAFETs will have better channel controllability and scalability than FinFETs in future technology nodes [12]-[14]. Thus, we anticipate that GAAFETs will be the dominant transistor structure in the near future [15], [16].
Regarding how a GAAFET gate can surround all four sides of a channel, two dominant models represent the majority of these devices: nanowire FETs (NWFETs) and nanosheet FETs (NSFETs). NWFETs originate from FinFETs, where they evolved from double-gate FinFETs, Omega FinFETs, and nanowire FinFETs, to NWFETs [17]. NWFETs provide a subtle form with the best channel controllability. NSFETs are FET structures in that they use nanosheets as channels. A nanosheet is a form which is similar to a nanowire for which a channel is widespread like a sheet instead of being somewhat like a narrow wire. Studies have reported that nanosheet structures can provide higher current due to broader channel widths [18]. NSFETs are referred to by different names based on the manufacturers and foundries. For example, MBCFET (multi-bridge-channel FET) is an example of how NSFETs are referred to by a particular foundry [19].
In addition to the two dominant channel shapes, various alternative channel structures have also been proposed. C. Dupré et al. proposed an NWFET structure with spacers between the channels (ΦFET, [20]), and P. Feng et al. proposed and compared variously shaped NSFET structures for maximum transistor performance [21]. In the latter paper, the authors proposed nanowires and nanosheets with hexagonal, rectangular, and oval shapes. Both academia and industry are actively searching for the optimal GAAFET channel shape to replace FinFETs in the near future.
From a structural point of view, GAAFETs are reported to be fabricated in two different styles: horizontal GAAFETs (H-FETs) and vertical GAAFETs (V-FETs). H-FETs are ''horizontal'' because the channel is parallel to the substrate surface [22]. In general transistor shape, H-FETs are very similar to FinFETs. Thus, from the designers' perspective, the only difference is that nanowires (NWs) or nanosheets (NSs) replace fins. Therefore, it is reasonable to expect that foundries will soon adopt H-FETs since they do not require significant changes in circuit design and fabrication [23]. V-FETs are so named because their channels are constructed perpendicular to the substrate surface [24] (see Figure 1). The latest studies show that 5 nm diameter nanowires with nano-scale interconnects and 20 nm vertical nanosheets are manufacturable [25], [26]. Due to their unique channel formation, V-FET circuits do not follow conventional circuit design methodologies. In other words, V-FETs require circuits, such as standard cells and SRAMs, to be designed in a completely different manner. However, studies have shown that V-FET circuits that use the new design methodologies provide significant advantages over conventional FinFETs and H-FETs. For example, a V-FET D flip-flop footprint is reduced by 30%, and V-FET SRAM bit cells are 20-30% denser than H-FET SRAMs with 2.6× lower standby leakage current [27], [28]. Based on these reports, we expect that V-FETs will soon be the new type of transistor that follows H-FETs in the ongoing trend of transistor evolution.
¹ On the other hand, TSMC continues FinFET on their 3 nm node [10].
Regarding V-FET circuits, H. Na and T. Endoh proposed a 12-transistor V-FET SRAM cell, which is 26% smaller than a conventional 8T-SRAM cell [29]. This size reduction is possible because of the vertical transistors that the authors have fabricated. The interesting part of this study is that the nanowires of this V-FET have two vertical gate layers for channel control. The device current flows when two gate switches are closed. A 1-tier V-FET is a device in which one gate layer covers the nanowire/nanosheet for current control ( Fig. 1(a)). A 2-tier V-FET provides two vertical gate layers to control the channel current. In the same transistor footprint, a 2-tier V-FET can be considered as two transistors placed on the top and bottom instead of one transistor. The proposed SRAM is reported to provide a 47% power reduction due to its advanced structure in spite of the increased transistor count. This study implies a very important point that the vertical positioning of transistor fabrication is possible (see Figure 1 (b)). This brings the full potential of a true 3-dimensional circuit/layout design in VLSI chips when V-FETs are utilized. Therefore, in this trend, we anticipate that V-FETs will be a bridge to designing actual 3-dimensional transistor structures for ultimate miniaturization and optimization.
In this paper, we extend the research of [30] and investigate the practical benefits of V-FETs when these vertical transistors are designed in many tiers. Few papers describe why GAAFETs should succeed over FinFETs at the layout and circuit level. Therefore, we study the advantages of GAAFETs (both H-FETs and V-FETs) over FinFETs by studying standard cells in the layout and circuit perspectives. Standard cells are vital components in digital VLSI design, and we demystify the impact of GAAFETs by focusing our efforts on V-FETs. Inspired by the study of [29], which leads to the possibility of true three-dimensional transistor design, we focus our efforts on investigating the advantages of fabricating devices on top of each other. In the path of 3-D circuitry, we study the practical concerns when digital circuits are designed in a two-tier vertical device fashion. The points below describe our contributions.
1) This paper is the first study on the impact of many-tier V-FETs for optimal design of standard cells.
2) We compare NWFETs and NSFETs to FinFETs from the perspective of parasitics. Our results report that NSFETs may produce greater capacitance levels than FinFETs under certain conditions.
3) We propose an interconnect structure for optimal standard cell design of many-tier V-FETs.
4) We propose a novel methodology and a set of algorithms for designing many-tier V-FET standard cells.

II. PARASITIC CAPACITANCE COMPARISON BETWEEN NANOWIRE FETs (NWFETs) AND NANOSHEET FETs (NSFETs)
This section discusses the characteristics of FinFETs, H-FETs, and V-FETs from the perspective of parasitics. In detail, we study how significant the parasitics of NWFETs and NSFETs are when these devices are used in an H-FET/V-FET inverter (INV). Devices and layouts are closely related in ICs, and device analyses that do not consider physical layouts have limited meaning. By performing the INV comparison, we gain an understanding of how parasitics differ between these devices in various layouts. Note that our focus in this section is on parasitic capacitance² and not on resistance. This is because parasitic resistances are nearly identical between FinFETs and H-FETs, and a resistance analysis between H-FETs and V-FETs was performed in [30].
² Capacitances in VLSI can be sorted into three categories: (1) device-to-device capacitance, (2) device-to-interconnect capacitance, and (3) interconnect-to-interconnect capacitance. We define parasitic capacitance as the two types of capacitance that form between device and interconnect and between interconnect and interconnect ((2) and (3)). We focus on parasitic capacitance because 1) it has a critical impact on standard cell performance (more capacitance leads to slower speeds and higher power consumption), and 2) it is highly dependent on standard cell layouts.

A. TECHNOLOGY SETTINGS
We designed a set of industrial-grade 5 nm standard cells for comparing FinFETs, H-FETs, and V-FETs. Figure 2 illustrates some important parameters for these FETs, and Table 1 shows the 30 standard cells that we designed.³ The transistors and metals used for pin connections in our standard cells are based on the following references: [6], [7], [31]-[33]. Below are some details of our assumptions.
1) Our standard cell circuitry originates from the Silvaco 45nm Open Cell Library [34]. However, the Silvaco 45nm cell designs are based on planar MOSFETs. Therefore, we heavily modified the netlist to fit the performance requirements for FinFETs, NWFETs, and NSFETs.⁴
2) One minimum-sized transistor consists of two fins.
3) The height of a nanowire and the thickness of a nanosheet are identical to the thickness of a fin. Also, the spacings of stacked nanowires and nanosheets are based on [35].
4) All layouts and design parameters for an H-FET standard cell are identical to those of a FinFET. In an H-FET cell, nanowires and nanosheets replace an equivalent fin.
5) We use Synopsys QuickCap [36] for capacitance extraction.
In brief, these experimental settings are similar to the conditions of [30], which should enable a fair comparison.

B. PARASITIC CAPACITANCE COMPARISON BETWEEN FinFET, H-FET AND V-FET
One key question in designing H-FET and V-FET standard cells is ''How many nanowires/nanosheets do we need to supply sufficient current to a circuit?'' Though the most straightforward method for FinFETs, NSFETs, and NWFETs alike is to increase the layout area to increase the number of fins (or nanowires/nanosheets), this may not be a desirable solution from a layout perspective. FinFETs can increase their fin heights and drive stronger current without adjusting the layout area (see Figure 2 (a)). In H-FETs, devices can be stacked on top of each other. In addition to device stacking, NSFETs allow the width of their nanosheet channel to be widened for stronger driving current (Figure 2 (c)). Therefore, it is essential to understand how each current-increasing method impacts the total parasitic capacitance in a given standard cell layout. FinFET INVs, respectively. The yellow, red, and blue solid lines describe the capacitance change in one-tier, two-tier, and three-tier horizontal NSFETs, respectively, when the widths of their nanosheets change. NWFET capacitance can be considered as that of a nanosheet with 5 nm width, and we normalize all capacitance values to the baseline of a 3-fin FinFET layout. This experiment uses a 44 nm-height FinFET for comparison. The 44 nm fin is based on the maximum width that our layout can support for NSFETs. Figure 3 (a) provides the following information. Despite the device-level performance enhancement reported for NSFETs, it is difficult to state that NSFETs are significantly better than FinFETs from the parasitic perspective. The ratio of capacitance increase in NSFETs is similar to that of FinFETs. In other words, if the channel count and the size of the nanosheets are similar to those of fins, the total parasitic capacitance will be similar. Thus, unless NSFETs show significantly better current driving capability, a similarly-sized NSFET will have a parasitic capacitance similar to that of a FinFET in the layout.
This is reasonable because a similar physical size (e.g., fin or nanosheet) will result in a similar capacitance.

1) H-FET
On the other hand, NWFETs show significant capacitance reductions compared to FinFETs [30]. However, nanowires are reported to have less current-driving capability than nanosheets. Therefore, foundries should consider the trade-offs between nanowires and nanosheets for the best current driving capability while considering layout parasitics. From this perspective, Figure 3 (a) shows how much capacitance nanosheets add in a given standard cell layout compared to FinFETs. For example, a designer who needs an H-FET INV with less capacitance than a 2-fin FinFET INV should use either (1) a 3-tier-stacked nanosheet with a width smaller than 28 nm or (2) a 2-tier-stacked nanosheet with a width smaller than 44 nm. A 1-tier NSFET would also be better if it could drive a sufficient amount of current. Besides, a 3-tier-stacked NS H-FET with a 12 nm width has a total parasitic capacitance similar to that of a 1-tier NS H-FET with a 44 nm width. The total effective channel width is 36 nm (12 × 3) in the 3-tier NSFET, while the total effective width of the 1-tier NSFET is 44 nm, which is 8 nm wider. Also note that the parasitic capacitance increases by 36.2% (normalized capacitance: 1.41 to 1.92) in 1-tier NSFETs, by 66.2% (1.48 to 2.46) in 2-tier NSFETs, and by 89.2% (1.58 to 2.99) in 3-tier NSFETs. These results show that the parasitic capacitance is affected differently in various layout scenarios. In brief, nanowires/nanosheets require a deliberate stacking and width-changing strategy, and these strategies change the capacitance of the total layout. Thus, delicate device engineering combined with layout design should be performed.⁵ We note that the change of device size in NS/NWFETs is critical to transistor performance (e.g., drive current).
Besides, we also emphasize that the transistor drive current will scale with the change of channel dimensions, which is a critical factor in standard cell design. Standard cell design in NS/NWFETs should consider both device strengths and parasitic capacitances for optimal performance.
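The percentages and effective widths quoted for the horizontal NSFETs can be re-derived from the normalized capacitance pairs given in the text. The following is a minimal sketch of that arithmetic; the names `quoted`, `percent_increase`, and `effective_width` are illustrative and not part of the original work.

```python
# Normalized INV capacitance (start, end) quoted in the text for
# 1-, 2-, and 3-tier horizontal NSFETs (baseline: 3-fin FinFET layout).
quoted = {
    1: (1.41, 1.92),
    2: (1.48, 2.46),
    3: (1.58, 2.99),
}


def percent_increase(start, end):
    """Relative increase from start to end, in percent."""
    return (end / start - 1.0) * 100.0


def effective_width(tiers, width_nm):
    """Total channel width of a stack of identical nanosheets, in nm."""
    return tiers * width_nm


# Capacitance increase per tier count, rounded to one decimal place.
increase = {tiers: round(percent_increase(*pair), 1) for tiers, pair in quoted.items()}

for tiers, pct in increase.items():
    print(f"{tiers}-tier NSFET: +{pct}% capacitance")

# 3-tier stack of 12 nm sheets versus a single 44 nm sheet.
print(effective_width(3, 12), effective_width(1, 44))
```

The rounded results match the 36.2%, 66.2%, and 89.2% figures in the text, and the two effective widths differ by the stated 8 nm.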
2) V-FET
Figure 3 (b) shows the parasitic capacitance comparison between a 1-fin FinFET and 1-tier V-FETs (nanowires and nanosheets) in an INV design. The design of a V-FET INV layout follows the guidelines of [30]. Note that a V-FET INV can be designed using 50% less area than a FinFET INV. The blue dotted line refers to the capacitance of a 1-fin FinFET. The yellow, red, and blue solid lines describe the capacitance change in one-tier, two-tier, and three-tier vertical NSFETs, respectively, when the widths of their nanosheets change. The capacitance values in Figure 3 (a) and (b) use the same normalization to a 3-fin FinFET. Figure 3 (b) provides the following information. First, V-FETs significantly reduce parasitic capacitance compared to H-FETs. Note that as the nanosheet width increases, the capacitance of V-FETs does not increase as steeply as that of H-FETs. This is because the standard cell layout of a V-FET INV occupies only 50% of the area of an H-FET INV, and the affected interconnect/device area is significantly smaller than that of the H-FET components. Therefore, V-FET circuit designers can design standard cells that outperform H-FETs in terms of parasitic capacitance.⁶ Second, H-FETs can stack nanowires/nanosheets without increasing the layout area. This stacking is possible because of the layout style that H-FETs offer. However, stacking nanowires/nanosheets is not possible in V-FET layouts, because each nanowire/nanosheet must consume a certain area. Nevertheless, we note that increasing the number of nanowires/nanosheets does not always lead to an increase in the design area. A standard cell typically leaves some amount of design margin in its layout. In other words, a standard cell may be designed to have the same layout area whether it uses one nanowire/nanosheet or three nanowires/nanosheets. Thus, we design our layouts with the same area for V-FETs with one to three nanowires/nanosheets.
In summary, V-FETs provide significant capacitance reductions compared to H-FETs and FinFETs, but careful design is required because stacking nanowires/nanosheets is not possible.

III. INTERCONNECT STRUCTURE AND CIRCUIT DESIGN FOR MANY-TIER VERTICAL FETs
This section discusses an optimal interconnect structure for many-tier V-FET logic cell design. We emphasize that this is the first study that discusses an optimal interconnect structure for many-tier V-FET logic circuits. We compare previously proposed interconnect structures to our new structure and discuss why our proposed structure is necessary for circuit designs. Then, we describe the assumptions behind our proposed structure and its implications for circuit design.

A. PROPOSED INTERCONNECT STRUCTURE
Few studies have mentioned interconnect structures for many-tier V-FET circuits [29], [37]. These two studies discussed how an optimal SRAM can be designed with a 2-tier V-FET structure. In these studies, the authors designed an interconnect structure in which the top and bottom devices are connected such that no metals are used between the devices (see Figure 4 (a)). However, a critical issue with this structure lies in circuit design. From the schematic point of view, this structure is identical to a two-input switch in which the two inputs are connected in series (Y = A * B). In circuit designs, however, there are many scenarios in which it is highly beneficial for designers to have the freedom to use the intermediate terminal for various purposes (see Figure 4 (b)). Therefore, we propose the interconnect structure in Figure 5, which allows 2-way routing for each source/drain terminal (e.g., SD1, SD2, and SD3 in Figure 4 (b)). Note that our structure supports 2-way routing even between two devices (SD2 using M2 in Figure 4 (b)).
FIGURE 4. (a) Interconnect structure of [29], [37], and (b) our structure and schematic. Transistors can be placed in both tiers (G1 and G2) or in one tier only (G1 or G2).

B. ASSUMPTIONS
Our target is to provide an interconnect structure that enables 2-tier V-FETs to use less area than 1-tier V-FETs in their standard cell designs. To fulfill this goal, our proposed interconnect structure is based on the following assumptions:⁷
⁷ Considering the manufacturability of advanced nanowire/nanosheet FETs and interconnect structures in the sub-10 nm node [25], [26], we consider our assumptions for the advanced devices and interconnects to be reasonable.

1) Our structure requires a bidirectional interconnect (both x and y directions) for all of M0 (bottom), M2 (middle), and M4 (top), which is similar to [30]. Unidirectional interconnects in M0 and M2 lead to unwanted area overhead [27]. The middle interconnect of our structure should (1) be metal for low resistance,⁸ and (2) support bidirectional routing. This is for the following reasons: First, we are very close to implementing bidirectional interconnects in advanced technology nodes [39]-[42]. Second, the number of metal layers in the middle layer directly impacts the height of the nanowire/nanosheet. For example, when the middle layer uses two metal layers, it requires three intermediate dielectric layers. However, if the middle layer uses only one metal layer instead, it requires two dielectric layers. Thus, using two metal layers requires roughly 2× the height that one metal layer requires. Assuming that a dielectric or metal layer requires a few tens of nanometers for its height, using two metal layers would require a height of a few hundreds of nanometers. Note that this is the intermediate height required before the second-tier poly is placed. Thus, two metal layers in the middle layer may require significant heights, which may lead to fabrication issues and large resistance increases in the nanowire/nanosheet channel. However, the number of metal layers on the top or bottom is not as critical as the number of layers in the middle layer. Our many-tier V-FET requires that all terminals have bidirectional interconnects for optimal standard cell designs.
2) We assume that V-FET transistors can be placed in both tiers of a channel pillar, but can also be placed separately in only the 1st tier or only the 2nd tier.⁹ Figure 6 illustrates two examples of transistors being placed on either the 1st tier or the 2nd tier in a 2-tier INV design.¹⁰ Details of this assumption will be discussed in Sec. III-C1.
3) Our interconnect inherits some assumptions and characteristics from 1-tier V-FET designs [30]. First, we assume that poly (e.g., M1 and M3) is used to create the gate layer for V-FETs. Due to the high resistance of the poly, we assume unidirectional metal layers in M1 and M3 that directly connect to these poly layers. Second, we assume separate vias connecting from M0 to M4 (via0 to via3). References regarding V-FET circuit designs mention intermediate layers that connect to the top or bottom metal layers (e.g., [37]). Therefore, we assume that these structures are possible.
⁸ O. Kilpi et al. successfully manufactured a metal interconnect in the middle of a V-FET channel in [38]. Based on this reference, we assume that implementing a metal interconnect inside the V-FET channel is possible.
⁹ We assume this is possible because the SRAM design of [29] requires individual placement of V-FET transistors.
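The height argument in assumption 1) can be checked with a short calculation. The sketch below generalizes the two cases stated in the text (one metal layer with two dielectric layers, two metal layers with three dielectric layers) to m metal layers with m + 1 dielectric layers; that generalization, the uniform layer thickness, and the 30 nm example value are assumptions made here for illustration only.

```python
def middle_stack_layers(metal_layers):
    """Layers in the middle interconnect stack.

    Assumes every metal layer is separated and bounded by dielectric,
    i.e. m metal layers need m + 1 dielectric layers (matches the two
    cases given in the text: 1 metal -> 2 dielectric, 2 metal -> 3).
    """
    dielectric_layers = metal_layers + 1
    return metal_layers + dielectric_layers


def middle_stack_height_nm(metal_layers, layer_thickness_nm):
    """Stack height if every layer has the same (assumed) thickness."""
    return middle_stack_layers(metal_layers) * layer_thickness_nm


# Hypothetical thickness of 30 nm per layer ("a few tens of nanometers").
for m in (1, 2):
    print(m, middle_stack_layers(m), middle_stack_height_nm(m, 30))
```

Under these assumptions the stack grows from three to five layers, so the intermediate height added to the channel grows accordingly before the second-tier poly can be placed.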

C. PROPOSED INTERCONNECT STRUCTURE AND ITS IMPLICATIONS FOR CIRCUIT DESIGN
We discuss the physical and logical meaning of our proposed interconnect structure in the circuits' perspective and how it affects the standard cell designs for 2-tier V-FETs. Regarding the number of tracks a particular V-FET standard cell can use, we assume that one pair of transistors (P/N) can use one vertical routing track (see Figure 7). We note that our proposed structure can be extended to a many-tier V-FET interconnect structure based on our guidelines.

1) SELECTABLE TRANSISTORS FOR CIRCUIT DESIGN
As mentioned in Sec. III-A, we assume that transistors can be placed at any tier based on its design. For example, in Figure 4, a channel pillar can have G1 and G2, a separate G1, or a separate G2. In cases where a channel pillar consists of one transistor (e.g., G2 only), we assume the other part to be a regular conducting channel (e.g., the G1 part that does not have a gate). We emphasize that the freedom to design separate transistors in a 2-tier V-FET is very important. As shown in Figure 7 (a) and (b), two styles of INV can be designed by using poly in different tiers (see Figure 6 for side view). More importantly, since an INV requires only one PFET and one NFET, these INVs can be designed only if our assumption is valid.

2) LOGICAL STRUCTURE OF THE MANY-TIER V-FET
As Figure 4 (a) shows, a V-FET channel with two gates is logically identical to a two-transistor device in which a source/drain terminal of one transistor is connected to a source/drain terminal of the other. This means that a netlist with two transistors sharing one net is the best way to utilize a 2-tier V-FET structure. In other words, if a circuit design contains few transistors that share a net, that design will not benefit from using a many-tier V-FET structure. A buffer (BUF) is an excellent example of this characteristic (Figure 7 (c)). A BUF design in a 2-tier V-FET does not show any area reduction compared to a 1-tier V-FET. A buffer consists of two inverters, and this circuitry does not share any net between the source and drain terminals of its PFETs/NFETs. Thus, despite the potential for area reduction in 2-tier V-FETs, the BUF design does not benefit. Note that the logical structure of many-tier V-FETs also follows this concept. Although many-tier V-FETs allow N transistors to be placed vertically, this configuration is not fully advantageous unless the netlist is designed such that the source/drain terminals share the same net (i.e., N serially connected transistors). From this perspective, a BUF design will always consume a two-transistor footprint area in any type of many-tier V-FET design.
More transistors on the same footprint cause another routing issue that is unique to V-FETs. An H-FET footprint supports one net, whereas a 1-tier V-FET footprint supports two nets (e.g., top and bottom), and a 2-tier V-FET footprint supports three nets (e.g., top, middle, and bottom). More nets on the same footprint have two consequences: First, the device layer requires more routing layers. A 2-tier V-FET has three metal layers, whereas a 1-tier V-FET has only two. Second, escape routing becomes more challenging as the number of device routing layers increases.
As shown in Figure 7 (a), input pin A is required to connect all the way from the top metal to the 1st-tier poly. It is crucial to guarantee a position for the I/O pins to perform escape routing from the top metal to its destination. In complex cells, situations occur where cells should use more routing tracks because of the escape routing of these I/O pins.

4) IS BIDIRECTIONAL ROUTING NECESSARY IN THE MIDDLE METAL LAYER (M2)?
Despite some references which mention the latest interconnect structures with advanced width and pitch, there is still some controversy regarding the forecast for bidirectional metal usage in advanced nodes. However, based on some previous references which guide the possibility for bidirectional routing support [39]- [42], we describe how 2-tier V-FETs can be advantageous from advanced interconnect technologies. A comparison between Figure 7 (d) and (e) illustrates how a NAND3 gate reduces area by bidirectional routing in M2.
Thanks to its schematics, NAND3 is a gate that can utilize the 2-tier V-FET structure. With our proposed interconnect, a 2-tier V-FET NAND3 uses two-transistor footprints (Figure 7 (d)), while a 1-tier V-FET NAND3 uses three. However, if the interconnect of a 2-tier V-FET does not support bidirectional routing in M2, this requires the design to use one additional y-directional routing track, which should be considered identical to consuming one additional transistor footprint. Therefore, we emphasize that bidirectional routing is essential in 2-tier (and many-tier) V-FET designs for the minimum area.
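The footprint counts discussed for BUF and NAND3 can be reproduced with a deliberately simplified model. The sketch below assumes that a chain of serially connected transistors of length n occupies ceil(n / tiers) footprints and that independent chains cannot share a pillar; it ignores routing tracks, escape routing, and the M2 directionality discussed above. The function names are hypothetical and are not taken from the original work.

```python
import math


def footprints_for_series_chain(chain_length, tiers):
    """Footprints needed to stack one chain of serially connected transistors."""
    return math.ceil(chain_length / tiers)


def footprints_for_cell(series_chains, tiers):
    """Total footprints for a cell described by its series-chain lengths.

    series_chains: list of chain lengths, one entry per group of
    transistors that share source/drain nets and can be stacked.
    """
    return sum(footprints_for_series_chain(n, tiers) for n in series_chains)


# NAND3: one chain of three stacked devices per P/N pair (simplified).
print(footprints_for_cell([3], tiers=1), footprints_for_cell([3], tiers=2))

# BUF: two inverters whose transistors share no source/drain net.
print(footprints_for_cell([1, 1], tiers=1), footprints_for_cell([1, 1], tiers=2))
```

Under this model a NAND3 drops from three footprints to two when moving from 1-tier to 2-tier, while a BUF stays at two footprints in both cases, which agrees with the text.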

IV. STANDARD CELL DESIGN METHODOLOGY FOR MANY-TIER V-FETs
This section describes the design methodology and a set of algorithms for 2-tier V-FET standard cell designs. We detail our proposed design methodology and algorithms so that we can generally expand to many-tier V-FET designs. Our design methodology is inspired by [30]. The following subsections discuss how the algorithms in this work are expandable and improve on the previous study.

A. GENERAL DESIGN METHODOLOGY
Our methodology follows the design order of Algorithm 1. First, given a standard cell netlist, this netlist is partitioned into clusters (see Sec.IV-A1 for details). Then, Cluster Placement extracts the ordering between clusters. Once we obtain the cluster order, we perform Mini-Cluster Placement to finalize the one-line ordering of the transistors. Then, we perform N-tier V-FET transistor placement and perform manual transistor placement. Finally, we perform Net Routing to finalize our standard cell design. Net Routing is a normal routing step that is performed in conventional standard cell designs. Our methodology assumes that the transistors are placed in one design track. Thus, in this one-track design, we assume that VDD (PFETs) is placed on the top, and VSS (NFETs) is placed on the bottom.

1) CLUSTER PARTITIONING
We define a cluster as a group of transistors (both PFET and NFET) gathered for a special purpose. The concept and formation of a cluster are identical to those of [30]. Given a standard cell netlist, transistors form a network of PFETs and NFETs to generate a specific output signal. For example, an INV requires one PFET and one NFET, and a NAND2 (or a NOR2) requires two PFETs and two NFETs to generate an output signal ZN. Two transistors (1 PFET and 1 NFET) form a cluster in an INV gate, and four transistors (2 PFETs and 2 NFETs) form a cluster in a NAND2 (or a NOR2) gate. We define these gates (INV, NAND2, and NOR2) as single-cluster gates.
In contrast to single-cluster gates, complex-cluster gates require two or more clusters to form a gate. For example, a BUF is a complex-cluster gate that requires two clusters for its gate design. In a BUF, an INV output is cascaded into another INV to export an output Z. Likewise, AND and OR gates are also complex-cluster gates because they require an INV cascaded with a NAND or NOR cluster. Therefore, a standard cell will be either a single-cluster gate or a complex-cluster gate.
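The distinction between single-cluster and complex-cluster gates can be illustrated with a small lookup. This is a minimal sketch: the dictionary names and the decomposition of AND2 and OR2 into a NAND2 or NOR2 cluster followed by an INV are the standard CMOS decompositions, written here as an assumption rather than taken from the partitioning algorithm of the original work.

```python
# Transistors per cluster type, as stated in the text
# (INV: 1 PFET + 1 NFET; NAND2/NOR2: 2 PFETs + 2 NFETs).
CLUSTER_TRANSISTORS = {"INV": 2, "NAND2": 4, "NOR2": 4}

# Hypothetical gate-to-cluster decomposition used for illustration.
GATE_CLUSTERS = {
    "INV": ["INV"],
    "NAND2": ["NAND2"],
    "NOR2": ["NOR2"],
    "BUF": ["INV", "INV"],
    "AND2": ["NAND2", "INV"],
    "OR2": ["NOR2", "INV"],
}


def classify(gate):
    """Label a gate by how many clusters it is built from."""
    return "single-cluster" if len(GATE_CLUSTERS[gate]) == 1 else "complex-cluster"


def transistor_count(gate):
    """Total transistors over all clusters of the gate."""
    return sum(CLUSTER_TRANSISTORS[c] for c in GATE_CLUSTERS[gate])


for gate in GATE_CLUSTERS:
    print(gate, classify(gate), transistor_count(gate))
```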

B. CLUSTER PLACEMENT
VOLUME 8, 2020
Once we partition the clusters, we perform Cluster Placement based on Algorithm 2. A set of clusters can be treated as a graph in which the clusters are vertices and the nets are edges. The goal of this placement is to minimize the wirelength between clusters. In our work, we propose a modified force-directed graph for cluster ordering. A force-directed graph [43], [44] places vertices by finding an equilibrium in the given solution plane based on two forces, which we refer to as the ''Pullforce'' and ''Pushforce''. A Pullforce is an attractive force that occurs between vertices connected by edges (e.g., 'nets' in a netlist). A Pushforce, on the other hand, is a repulsive force between all vertices in the solution plane. The Pullforce follows Hooke's law, and the Pushforce follows Coulomb's law, as in Eq. 1 and 2.
The quantities in these equations are as follows: $x$ is the distance between clusters (in m), $k_1$ is the spring constant (in N/m), $k_2$ is Coulomb's constant ($8.987 \times 10^{9}$ N·m$^2$·C$^{-2}$), and $q_1$ and $q_2$ are the magnitudes of the charges (in C).
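For reference, the textbook forms of the two laws that Eq. 1 and 2 refer to are given below. The labels $F_{\text{pull}}$ and $F_{\text{push}}$ and the use of magnitudes only (no sign convention) are notational assumptions made here for illustration.

```latex
\begin{align}
F_{\text{pull}} &= k_1 \, x \\
F_{\text{push}} &= k_2 \, \frac{q_1 q_2}{x^{2}}
\end{align}
```

The attractive term grows linearly with distance while the repulsive term decays with its square, which is why a connected pair of vertices settles at a finite, nonzero separation.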
We adjust Eq. 1 and 2 based on the following rules:
1) We consider each net between clusters to be a separate edge between vertices. In other words, clusters with more connections between them have stronger Pullforces.
2) We scale the strength of the Pullforce by the sizes of the clusters, so that larger clusters (i.e., clusters with more transistors) have weaker Pullforces. This scaling prevents huge clusters from being placed very close to each other. Note that huge clusters consume significant design area. Thus, giving small clusters the opportunity to be placed at their optimal locations may be more beneficial from the perspective of total wirelength.
With these adjustments, Eq. 1 and 2 become Eq. 3 and 4, where $s_1$ and $s_2$ are the sizes of the clusters, $N_{net}$ is the number of nets between the clusters, and $k_3$ and $k_4$ are adjustable constants. With these forces present in the solution plane, each vertex is subject to a sum of forces from the other vertices. The Sumforce is the sum of the Pullforce and the Pushforce, which can be expressed as a vector in the x- and y-directions. Given an initial location, the vertices move through the solution plane according to the Sumforces acting on them. Eventually, when sufficient time has elapsed, the vertices settle into their equilibrium state.
In our algorithm, we fix one vertex at its initial location to reduce runtime. Unlike [30], we place one additional vertex (the Pulling Vertex) at the far end of the x-axis (−∞) that applies a very weak Pullforce to the leftmost vertex. This Pulling Vertex pulls only the vertex that is farthest from the fixed vertex. This concept has the following advantages. First, the purpose of the force-directed graph is to obtain the ordering of the vertices, and this weak Pullforce helps the solution align naturally in the x-direction. Second, our method requires fewer calculations. [30] placed M vertices on the top and the bottom of the solution plane to guide the clusters in the x-direction, so the Sumforce had to be calculated among N (clusters) + 2×M vertices. Our methodology adds only one vertex for solution guiding, so it evaluates N + 1 vertices, a fraction (N + 1)/(N + 2M) of those in [30]. When the solution converges to equilibrium, we extract the x-coordinates of the clusters and assign the cluster order accordingly.
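The placement loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Algorithm 2: the force expressions stand in for Eq. 3 and 4 (Pullforce assumed to be `k3 * n_net * d / (s_i * s_j)`, Pushforce assumed to be `k4 / d**2`), the Pulling Vertex at −∞ is modeled as a small constant force in the −x direction, and the function names, the step size `dt`, and the per-step displacement cap `max_move` are hypothetical.

```python
import math

def sum_forces(pos, sizes, nets, k3=1.0, k4=1.0):
    """Return the Sumforce [fx, fy] acting on every cluster.

    Assumed force forms (illustrative, not taken from the paper):
      Pullforce = k3 * n_net * d / (s_i * s_j)  (attractive, connected pairs)
      Pushforce = k4 / d**2                     (repulsive, every pair)
    """
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            d = max(math.hypot(dx, dy), 1e-9)   # avoid division by zero
            n_net = nets.get((i, j), 0) + nets.get((j, i), 0)
            pull = k3 * n_net * d / (sizes[i] * sizes[j])
            push = k4 / (d * d)
            f = pull - push                     # > 0: the pair attracts
            ux, uy = dx / d, dy / d             # unit vector from i to j
            forces[i][0] += f * ux
            forces[i][1] += f * uy
            forces[j][0] -= f * ux
            forces[j][1] -= f * uy
    return forces

def order_clusters(sizes, nets, pos, steps=2000, dt=0.05, max_move=0.1,
                   pull_vertex_force=0.01, k3=1.0, k4=1.0):
    """Relax the clusters and return (left-to-right order, final positions).

    Cluster 0 is the fixed vertex. The Pulling Vertex is modeled as a weak
    constant force toward -x on the cluster farthest from cluster 0.
    """
    pos = [list(p) for p in pos]
    for _ in range(steps):
        forces = sum_forces(pos, sizes, nets, k3, k4)
        if len(pos) > 1 and pull_vertex_force:
            far = max(range(1, len(pos)),
                      key=lambda i: math.hypot(pos[i][0] - pos[0][0],
                                               pos[i][1] - pos[0][1]))
            forces[far][0] -= pull_vertex_force
        for i in range(1, len(pos)):            # cluster 0 never moves
            for axis in (0, 1):
                move = dt * forces[i][axis]
                pos[i][axis] += max(-max_move, min(max_move, move))
    order = sorted(range(len(pos)), key=lambda i: pos[i][0])
    return order, pos
```

In this sketch, `nets` maps a pair of cluster indices to the number of nets between them, and `pos` holds the initial (x, y) coordinates. Vertices that start exactly on one line stay on it, because the forces then have no perpendicular component, so initial positions should be given distinct y-offsets when clusters must pass one another to reach their final order.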

C. MINI-CLUSTER PLACEMENT
We propose a Mini-cluster Placement process that determines the order of transistors once the Cluster Placement is complete. Algorithm 3 describes the general flow. Mini-clusters are groups of transistors connected in series or in parallel that we treat as a single unit. The Mini-cluster concept reduces the complexity of the netlist by archiving non-critical nets and transistors. The rules for creating a Mini-cluster are as follows.
1) Each PFET or NFET forms a separate Mini-cluster.
2) The start of a Mini-cluster (= Level 0 parent vertex) is the output net (e.g., ZN = parent net) of a cluster.
3) A Level 0 parent vertex is the lowest-level vertex. A child vertex of the Level 0 parent vertex is a Level 1 parent vertex.
4) A Mini-cluster consists of a parent net and a child net.
5) From a parent net, a Mini-cluster can be a group of series or parallel transistors.
a) Transistors in a series connection form a Mini-cluster until a child net diverges to two (or more) child transistors or a child net becomes VDD/VSS.
b) A group of parallel transistors can be a Mini-cluster if the source and drain terminals of the transistors are connected to the same parent and child nets.

The key concept of our placement is to convert as much parallel transistor data into serial transistor data as possible. A cluster forms a tree of Mini-clusters, as in (a), for both PMOS and NMOS. If the tree structure is as in (a), the Mini-clusters are placed in one line, as shown in (b), because child nodes are always placed to the left of the parent node. However, the designer must still decide which child node should be placed ''closer'' to the parent node than the other. Note that the Lv.1 nodes in (a) carry no information that determines which node should be placed closer to the parent node. However, a standard cell typically has a complementary network between PFETs and NFETs. Therefore, our algorithm scans through the complementary FETs (NFETs in this case) and searches for serial connections in the corresponding Mini-cluster. Figure 8 (c) shows an example of an actual transistor network, and (d) shows how we express it in Mini-clusters. Our P/N Comparison design step traverses the PFETs and NFETs and checks whether a corresponding complementary Mini-cluster pair exists. Note that in (d), the NFET network has two parallel Lv.0 nodes, but the PFET network has one Lv.0 node. Thus, our algorithm starts the ordering from the Lv.0 node in the PFET network. Our algorithm starts from the lower-level node and checks whether its input pins are identical to those of its complementary node.
For example, the NFET Lv.0-2 node has inputs (A and B) identical to those of the PFET Lv.0-1 node. Thus, NFET Lv.0-2 becomes the complementary node for PFET Lv.0-1. Once the target Mini-cluster and its complementary node are set, our algorithm checks for serial connections. Due to the complementary characteristics, either a Mini-cluster or its complementary node will typically consist of a serial transistor connection. The counterpart node then follows the serial order of those transistors. In (d), NFET Lv.0-2 has a serial connection in which the closest transistor has gate = A and is followed by the transistor with gate = B. Thus, PFET Lv.0-1 follows the same order. In (d), the numbers marked with # indicate the order in which our algorithm performs the placement. If our algorithm cannot find a serial connection in the Mini-cluster pair, it places the transistors in their numerical order. In summary, we generate Mini-clusters and scan them for serial information to determine the transistor ordering.
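The ordering rule for a complementary Mini-cluster pair can be sketched as below. This is a simplified illustration of the P/N Comparison idea only: the function name, the `(name, gate)` tuple representation, and the use of transistor names for the ''numerical order'' fallback are assumptions, and Mini-cluster generation itself is not shown.

```python
def order_parallel_by_complement(parallel, complement_series):
    """Order the transistors of a parallel Mini-cluster.

    parallel:          list of (name, gate) tuples, in any order.
    complement_series: list of (name, gate) tuples of the complementary
                       Mini-cluster in serial order (closest transistor
                       first), or None if no serial connection was found.
    """
    if complement_series:
        # Rank each gate input by its position in the serial connection.
        rank = {gate: i for i, (_, gate) in enumerate(complement_series)}
        # The pair is complementary only if the input pins are identical.
        if {gate for _, gate in parallel} == set(rank):
            return sorted(parallel, key=lambda t: rank[t[1]])
    # Fallback: no serial information, so use the numerical (name) order.
    return sorted(parallel, key=lambda t: t[0])
```

For the example in the text, a PFET node with parallel transistors gated by A and B would take its order from the serial NFET node, so the transistor gated by A is placed first and the one gated by B second.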
The advantage of our Mini-cluster Placement is that (1) it is more intuitive for understanding the structure of transistors, and (2) it requires fewer calculations. [30] required an analysis that worked through all possible transistor ordering combinations inside a cluster, which was no different from brute-force tuning for the smallest wirelength. However, our methodology provides a method to analyze the netlist, and the transistor ordering process becomes significantly shorter.

D. N-TIER TRANSISTOR ALLOCATION AND MANUAL TRANSISTOR PLACEMENT
The one-line transistor ordering obtained in the previous subsections is the solution required for placement in a 1-tier V-FET standard cell design. In fact, it is a universal solution because the goal of placing highly connected transistors as closely as possible does not change even when we design V-FETs in many tiers. Our N-tier transistor allocation, proposed in Algorithm 4, gathers closely related transistors in the same footprint to obtain a larger area reduction. First, given an order of transistors, we determine whether the M-th transistor and the following N − 1 transistors meet the requirements for placement on the same footprint. For example, if N = 2, the netlist of those two transistors should be identical to Figure 4. Likewise, N-tier V-FET designs with N > 2 follow the same methodology. Any many-tier V-FET must meet the netlist structure requirement to be placed on the same footprint. Once all of the processes described in Sec. IV are complete, we manually select the location of each transistor for best performance and then perform regular routing. Some details of the manual design are discussed in Sec. V-B7.

Algorithm 4 N-Tier Transistor Allocation
Data: Transistor order of a standard cell
Result: Minimum footprint layout and transistor count per unit footprint
Generate Mini-clusters;
for each transistor TR_i in the given order do
    if TR_i ... TR_(i+N−1) match the required pattern then
        Gather these TRs into one footprint;
    end
end
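The allocation loop of Algorithm 4 can be sketched as a greedy left-to-right scan. This is a hedged illustration, not the authors' code: the `Transistor` record, the function names, and in particular the predicate `shares_sd_net` are hypothetical. The real stacking requirement is the netlist pattern of Figure 4, which is only approximated here by ''same device type and a shared source/drain net between neighbors''.

```python
from collections import namedtuple

# Hypothetical record: kind is "P" or "N"; sd1/sd2 are source/drain nets.
Transistor = namedtuple("Transistor", "name kind sd1 sd2")

def shares_sd_net(window):
    """Stand-in for the Figure 4 pattern check (an assumption)."""
    same_kind = len({t.kind for t in window}) == 1
    linked = all({a.sd1, a.sd2} & {b.sd1, b.sd2}
                 for a, b in zip(window, window[1:]))
    return same_kind and linked

def allocate_tiers(order, n_tiers, can_stack=shares_sd_net):
    """Group an ordered transistor list into footprints of up to n_tiers."""
    footprints = []
    i = 0
    while i < len(order):
        window = order[i:i + n_tiers]
        if len(window) == n_tiers and can_stack(window):
            footprints.append(window)      # N transistors share a footprint
            i += n_tiers
        else:
            footprints.append([order[i]])  # transistor keeps its own footprint
            i += 1
    return footprints

# NAND2: two parallel PFETs and two serial NFETs.
nand2 = [
    Transistor("MP1", "P", "VDD", "ZN"),
    Transistor("MP2", "P", "VDD", "ZN"),
    Transistor("MN1", "N", "ZN", "n1"),
    Transistor("MN2", "N", "n1", "VSS"),
]
# INV: one PFET and one NFET, which cannot be stacked under this predicate.
inv = [
    Transistor("MP", "P", "VDD", "ZN"),
    Transistor("MN", "N", "ZN", "VSS"),
]
```

With `n_tiers=2`, the NAND2 list collapses from four footprints to two, whereas the INV keeps two footprints. The INV outcome mirrors the later observation that INV and BUF gain no area from a second tier, but whether parallel PFETs may actually share a footprint depends on the Figure 4 condition rather than on this simplified predicate.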

V. RESULTS AND DISCUSSIONS
This section presents our results in comparison with [30]. We compare 1-tier and 2-tier V-FET standard cells in terms of area, wirelength, and parasitic capacitance. In the following subsections, we first describe the conditions of our experiment. The results and a detailed analysis, with discussions for future study, then follow. Our primary goal for the standard cell designs is a minimum-area layout.

A. EXPERIMENTAL CONDITIONS
To conduct a fair comparison between 1-tier V-FETs and 2-tier V-FETs, we set our experimental conditions similar to those of [30]. We describe some important details below:
1) The design conditions for the layout followed the details of [30]. The metal width and pitch were identical, and we designed both 1-tier and 2-tier V-FET standard cells to have five horizontal routing tracks in a 5 nm technology node. Vertical connections between two vias were considered as one routing track, and all detailed device/interconnect dimensions were the same as H-FETs for a fair comparison.
2) We compared the results among 30 standard cells. These cells are noted in Table 1, and the GDSII layouts of DFF, AND2, and HA are described in Figure 9. Since all placements could be accomplished in less than one second, we did not report runtimes.
3) The number of channels used for transistor design was identical to [30]. A transistor consists of four nanowires.
4) The standard cell designs followed the flow of Sec. IV. Once we completed these steps, we manually designed the standard cells for the best performance. We illustrate the reasoning for this manual process in Sec. V-B7.
5) Once we designed the standard cells in a GDSII format, we wrote a parasitic technology file that analyzed the 3-D structure of this layout and extracted the parasitics. For this process, we used Synopsys QuickCap [36].

B. RESULTS ANALYSIS

1) AREA
Table 2 shows our comparison results. For the layout area, we see a significant reduction of −35.6% on average. However, it is important to understand that 2-tier V-FET standard cells do not show a 50% area reduction on average compared to 1-tier V-FETs. As mentioned in Sec. IV-D, the footprint reduction of a 2-tier V-FET occurs only for specific layout conditions (as in Figure 4), which not all transistors can meet. INV and BUF are good examples of 2-tier V-FETs that do not result in any area reduction at all. In addition, some standard cells are formed as complex-cluster cells. The boundaries between the clusters of these cells cannot meet these conditions because the cluster boundaries do not share a common SD2 net. Thus, a 50% area reduction is the maximum achievable but is very challenging to reach. However, we emphasize that our result of −35.6% represents a significant area reduction that we achieved via a non-scaling approach. By comparison, the area reduction from H-FETs to 1-tier V-FETs was −22.5%. For the area reduction from H-FETs to 2-tier V-FETs, we observe −50.1%, which is highly significant.
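As a quick consistency check on the reported averages, the two successive reductions can be composed multiplicatively. This is a sketch that treats the averages as if they applied to a single representative cell; averages taken over 30 cells need not compose exactly.

```latex
\begin{align}
(1 - 0.225)\,(1 - 0.356) &= 0.775 \times 0.644 \approx 0.499 \\
1 - 0.499 &\approx 0.501
\end{align}
```

Under that assumption, the H-FET to 1-tier reduction (−22.5%) followed by the 1-tier to 2-tier reduction (−35.6%) agrees with the reported H-FET to 2-tier reduction of −50.1%.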

2) WIRELENGTH - GENERAL
Unlike the significant area reduction, 2-tier V-FET standard cells show an average wirelength increase of 16.5% compared to 1-tier V-FET cells. This result is contrary to the expectation that smaller cells would report smaller wirelength than larger cells. However, as the vertical design space becomes deeper in the z-direction, 2-tier V-FETs face additional, unexpected challenges (see Sec. V-B4 for details). From a general point of view, the increase of wirelength in 2-tier V-FETs provides the insight that 2-tier (or any type of many-tier) V-FETs may not be the optimal structure for wirelength reduction in certain standard cells. In addition, the vertical routing lengths from the top to the bottom metal are now non-negligible. Still, we emphasize that the wirelength of 2-tier V-FETs is similar to that of H-FETs (−0.3%).
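To put these averages on a per-area basis, the ratios below divide the relative wirelength by the relative area. This is a rough sketch that treats the reported averages as one representative cell, which is an assumption; the first ratio compares 2-tier to 1-tier V-FETs and the second compares 2-tier V-FETs to H-FETs.

```latex
\[
\frac{1 + 0.165}{1 - 0.356} = \frac{1.165}{0.644} \approx 1.81,
\qquad
\frac{1 - 0.003}{1 - 0.501} = \frac{0.997}{0.499} \approx 2.00
\]
```

Read this way, the wiring density of a 2-tier cell is roughly 1.8 times that of a 1-tier cell and about twice that of an H-FET cell, which illustrates why routing space becomes the limiting resource as the footprint shrinks.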

3) WIRELENGTH - BREAKDOWN
We provide the insight that the lateral wirelength of 2-tier V-FETs is similar to that of 1-tier V-FETs. Table 3 provides a breakdown of the wirelength. In addition to Table 2, we examined the wirelength of standard cells in two categories: the wirelength for signal routing only, and the signal routing wirelength excluding vertical vias. This table provides the following information. First, the lateral signal wirelength of 2-tier V-FETs is similar to that of 1-tier V-FET designs. Thus, vertical routing accounts for a significant portion of the wirelength in 2-tier designs. Moreover, the average area of 2-tier V-FETs decreases while the lateral wirelength remains almost the same. In terms of wirelength per area, 2-tier V-FET routing is therefore not as efficient as 1-tier routing. Second, the usefulness of the 2-tier V-FET interconnects varies widely among standard cells. Typically, a 2-tier V-FET is very useful for reducing the size and wirelength of large cells, but it is counterproductive in small cells. For example, a 2-tier AOI211 shows a 68.4% increase in signal wirelength compared to its 1-tier counterpart. In contrast, an FA shows a −15.6% reduction in signal wirelength compared to its 1-tier counterpart. Finally, 2-tier V-FETs can be designed to be almost identical to 1-tier V-FETs based on the designer's decision. As shown in Fig. 7 (c), the BUF design in the 2-tier case is nearly identical to a 1-tier BUF. In certain cases, a better design in terms of wirelength is possible for specific cells at the cost of additional area. However, this tradeoff between area and wirelength is the designer's choice.

Figure 10 (a) describes some unwanted scenarios in an AND2 gate with 2-tier V-FET designs. Figure 10 (b) shows a 1-tier V-FET AND2 design for comparison. The following are some highlighted points. First, an AND2 design requires more connections (e.g., height) between the top and bottom metal. More connections translate to increased wirelength and more vias. For VDD/VSS, a reinforced global VDD/VSS stripe using both the top and bottom metal will resolve this issue, but an adequate solution does not exist for signal nets. Second, some inefficient net connections occur due to routing area constraints. An internal net connected to the second gate in an AND2 uses both the top and middle metal. The best routing is to connect the top and middle metal directly, but this is not possible in this AND2 because there is no space (poly1 blocks these two metals). Finally, net A1 faces two issues: (1) the first poly requires more vias to be exposed to the top metal, and (2) A1 requires a detour because the optimal space for escape routing is occupied by other nets. In summary, we emphasize that 2-tier V-FET standard cells face many design issues due to the reduced design space. In addition, we anticipate that these issues will become more severe in many-tier V-FET designs.

5) CAPACITANCE - GENERAL
As reported in Table 2, the capacitance of 2-tier V-FETs increases by 13.2% on average despite the expectation that it would decrease. Here, we highlight some important points from our results. First, the capacitance trends of large cells (e.g., DFF, FA, HA) and small cells (e.g., INV, BUF, NAND) are different. It is difficult to state that the total capacitance follows the general trends of the wirelength increase/decrease. However, we note that the total capacitance and the signal wirelength follow a similar trend. We also emphasize that 2-tier V-FET capacitances are reduced by −18.9% when compared to H-FET standard cells.

6) UNUSED GATES FROM THE PARASITICS PERSPECTIVE
Cells such as INV and BUF show increased capacitance even though there is no increase in area or wirelength. When a channel pillar hosts only one transistor, the designer must understand that a 1-tier V-FET is more favorable than a 2-tier (or a many-tier) V-FET from the perspective of parasitics. Two factors support this statement. First, the channel pillar of a 1-tier V-FET is shorter than a 2-tier pillar. Thus, the resistance of the pillar is smaller in a 1-tier V-FET than in a 2-tier V-FET.
Second, 2-tier V-FETs are potentially exposed to more parasitic capacitance than 1-tier V-FETs. To explain this, we propose the concept of the ''design volume''. Given the same silicon footprint, the 3-dimensional device volume (= ''x-width'' × ''y-width'' × ''height'') is greater in a 2-tier V-FET. Also, note that more metal layers lie within the device volume in a 2-tier V-FET than in a 1-tier V-FET. The greater design volume means that the layout environment of a 2-tier V-FET has more obstacles than a 1-tier environment. This environment leads to more inherent parasitic capacitance, which unexpectedly handicaps 2-tier designs. Given a standard cell that occupies the same area in its 1-tier and 2-tier V-FET versions, the 2-tier version will possibly suffer from more capacitance than the 1-tier version. In the BUF example, the total capacitance of a 2-tier BUF is 24% higher than that of a 1-tier V-FET BUF design for this reason. In brief, the 2-tier V-FET structure is not advantageous compared to a 1-tier V-FET from the perspective of parasitics unless the design footprint is reduced. We expect this trend to be similar for many-tier V-FETs.

7) HOW MANY DESIGN STEPS CAN WE AUTOMATE FOR MANY-TIER V-FET STANDARD CELLS?
Unlike 1-tier V-FET standard cell designs, we performed manual transistor placement in the last steps for 2-tier V-FET standard cells. A general rule of placing the I/O pins on the top metal and the internal nets in the lowest metal layer was the key idea for automation in 1-tier V-FET designs. However, cases occurred in 2-tier standard cells in which better designs were possible without following these rules. Also, the design freedom to select a tier for vertical transistor placement added complexity to the design that our general optimization rules did not cover. Thus, we comment that more studies will be necessary to establish general steps of design automation for many-tier V-FET standard cells.

VI. CONCLUSION
In this paper, we investigated the advantages of many-tier vertical GAAFETs (V-FETs) for logic cell designs. In particular, we investigated the advantages that 2-tier V-FETs provide compared to 1-tier V-FETs at the layout level. We proposed an optimal interconnect structure and a design methodology that optimizes arbitrary many-tier V-FET standard cells. We compared the area, wirelength, and capacitance of these standard cells and showed a −35.6% reduction in area, a 16.5% increase in wirelength, and a 13.2% increase in parasitic capacitance. Compared to H-FETs, our results reduced area, wirelength, and capacitance by −50.1%, −0.3%, and −18.9%, respectively. We report that these results are due to the advanced interconnect structure and optimized designs. However, we also report that the increases in wirelength are due to the unexpected increase in vertical interconnects and that the capacitance increases are due to the increased vertical channels compared to 1-tier V-FETs. We report a general trend that 2-tier V-FETs show better design metrics in larger standard cells but worse metrics in smaller standard cells. In terms of resistance, small 2-tier standard cells are not as efficient as their 1-tier V-FET counterparts. Thus, careful design of interconnects and better devices are needed to achieve better designs and performance.