Monolithic 3D 6T-SRAM Based on Newly Designed Gate and Source/Drain Bottom Contact Schemes

For the first time, we propose a monolithic 3D (M3D) static random access memory (SRAM) with gate and S/D bottom contact (GBC and SDBC) schemes (SRAMSDGBC) and show that it can significantly improve the power, performance, and area (PPA) compared to the conventional M3D SRAM (SRAM3D). SRAM3D could not directly connect the top-tier device to the bottom-tier metal line, so the two tiers had to be connected through a bypassing metal line. As a result, SRAM3D wasted area on placing the monolithic interlayer via and did not reach 50 % area scaling. The GBC and SDBC schemes solve these problems. Although they require additional process steps, they bring significant advantages in interconnect RC and PPA. Based on a 26 nm width nanosheet transistor, SRAM3D showed a 30 % area reduction compared to 2D SRAM (SRAM2D), whereas SRAMSDGBC showed a 50 % area reduction. In the ideal (worst) case, which ignores (considers) the array resistance, the read and write access times of SRAMSDGBC improved by 7.7 % (19 %) and 8.3 % (33 %) relative to SRAM3D, and the write dynamic power improved by 5.9 % (5 %). Notably, SRAMSDGBC showed improved PPA in the worst case even compared to SRAM2D_Cu, which had relatively small interconnect resistivity. The GBC and SDBC schemes are therefore essential to enhance the PPA of M3D cells and are promising for M3D SRAM and other logic cells.


I. INTRODUCTION
Si fin-shaped transistors (FinFETs) and nanosheet transistors (NSFETs) have been scaled down to the 3-nm node and are leading the semiconductor industry [1]. However, despite their excellent gate controllability, it is difficult to scale the contact-poly-pitch (CPP) below 42 nm [2] because the short channel effect degrades device performance. Therefore, various approaches that reduce the cell area in different ways without scaling the device itself, such as source/drain patterning (SDP), buried power rail (BPR), complementary FET (CFET), and monolithic 3D (M3D), are being studied [3]-[6]. Among them, M3D and CFET are promising because stacking devices gives high area scaling, and M3D in particular has the advantage of less difficult device processing than CFET.
The associate editor coordinating the review of this manuscript and approving it for publication was Rocco Giofrè.
However, in previous M3D studies, 50 % area scaling was impossible because additional space is required to place the monolithic interlayer via (MIV) that connects the top and bottom tiers [6], [7]. In particular, when devices need to be compactly integrated, as in a static random access memory (SRAM) array, the additional space consumed by the MIV reduces the advantage of M3D. To solve this problem, new contact schemes that directly connect the bottom-tier metal line to the bottom of the top-tier gate and S/D (GBC and SDBC) are essential. When the devices are connected directly, the cell area and the length of the bypassing metal are reduced, so the interconnect RC is expected to decrease and the power, performance, and area (PPA) to improve. Some previous studies [8]-[10] have already applied similar S/D direct contact schemes, but their advantages in M3D have not yet been investigated.
This study analyzed state-of-the-art devices adopting the GBC and SDBC schemes with a detailed process flow. M3D SRAM with these two schemes (SRAM SDGBC) was then compared to 2D SRAM (SRAM 2D) and conventional (Conv.) M3D SRAM (SRAM 3D) from the PPA perspective using a mixed-mode platform. For a more accurate and comprehensive analysis of the SRAM array, the degradation of the top device and diverse array resistance conditions were considered.

II. DEVICE STRUCTURE AND SIMULATION METHOD
For accurate simulation, mixed-mode simulation was conducted to consider both the front-end-of-line (FEOL) and the back-end-of-line (BEOL) [11]. All TCAD simulation models for the FEOL region were used as described in [3].
The bottom-tier device was fully calibrated to an Intel 10-nm node device [12]. Since the top-tier device had to be made using a low-temperature process, it was partially calibrated to another previous study [13]. Fig. 1 shows the mixed-mode simulation platform. First, we set the device and layout parameters and created 3D FEOL and BEOL TCAD structures. Second, virtual source (VS) modeling [14] and BEOL parasitic RC extraction were performed as inputs to the SPICE simulation. This study investigated two cases: the worst case, which considers the array resistance, and the ideal case, which ignores it. The worst case is relevant because, at the sub-3-nm node, a metal line has a very large resistance due to its small effective conducting area combined with surface and grain boundary scattering [15]. However, since various SRAM assist circuit techniques [16] and methods for lowering the resistivity of BEOL materials [17] are being studied, the low-resistance condition (the ideal case) also needed to be analyzed.
The overall FEOL and BEOL parameters are shown in Table 1. Bi-directional metal placement was assumed to be possible with the EUV technique [18]. The SRAM interconnect consisted of three metal lines (M1-3) and a gate interconnect layer (GIL) [19], [20]. The minimum gap between metals was set to 12 nm, and the SRAM array size was set to 256 by 256. The widths of the word line (WL) and metal line 2 (M2) were set large to mitigate the inevitably large array resistance [16]. In M3D, there is a problem with active sheet resistance due to the post-annealing of the bottom device. Thus, the NFET was placed on the top tier and the PFET on the bottom tier [21]. Separating NFETs and PFETs by tier makes it easy to optimize the devices of each tier and has the advantage of reducing process time and cost [22]. For the interconnect, Ru rather than Cu [4] was used as the material because of the thermal budget and reliability [23] issues caused by post-annealing. For SRAM 2D, since there was no post-annealing process, Cu with a through-cobalt self-forming barrier (tCoSFB), which had a smaller resistivity, a thinner barrier, and improved reliability [24], was also used for analysis. In this study, the access time and dynamic power of the SRAM were analyzed. Read access time (RAT) was defined as the time interval between the point when the WL potential increased to 50 % of the operation voltage (V DD) and the point when the bit line (BL) potential decreased by 50 mV from V DD. Write access time (WAT) was defined as the time interval between the point when the WL potential increased to 50 % of V DD and the point when the internal node (Q) potential fell to 10 % of V DD. Write dynamic power was measured by writing a different value to the SRAM every WL cycle while WL was switched at 1 GHz.
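The RAT and WAT definitions above can be applied directly to sampled waveforms, for example from a SPICE transient run. The following is a minimal Python sketch under stated assumptions: the waveforms are plain lists of time and voltage samples, threshold crossings are located by linear interpolation, and the function names (crossing_time, read_access_time, write_access_time) and the example waveforms are hypothetical, not taken from this study.

```python
def crossing_time(t, v, level, rising):
    """First time at which waveform v crosses `level` in the given direction.

    t, v   : equal-length lists of time and voltage samples (assumed format)
    rising : True for a low-to-high crossing, False for high-to-low
    The crossing is located by linear interpolation between samples.
    """
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            frac = (level - v0) / (v1 - v0)
            return t[i - 1] + frac * (t[i] - t[i - 1])
    raise ValueError("waveform never crosses the requested level")


def read_access_time(t, v_wl, v_bl, vdd):
    """RAT: from WL rising to 50 % of VDD until BL is 50 mV below VDD."""
    t_wl = crossing_time(t, v_wl, 0.5 * vdd, rising=True)
    t_bl = crossing_time(t, v_bl, vdd - 0.05, rising=False)
    return t_bl - t_wl


def write_access_time(t, v_wl, v_q, vdd):
    """WAT: from WL rising to 50 % of VDD until Q falls to 10 % of VDD."""
    t_wl = crossing_time(t, v_wl, 0.5 * vdd, rising=True)
    t_q = crossing_time(t, v_q, 0.1 * vdd, rising=False)
    return t_q - t_wl


# Hypothetical piecewise-linear waveforms (arbitrary time units, VDD = 1 V).
t = [0.0, 1.0, 2.0, 3.0, 4.0]
wl = [0.0, 1.0, 1.0, 1.0, 1.0]   # WL reaches 0.5 V at t = 0.5
bl = [1.0, 1.0, 0.9, 0.8, 0.7]   # BL reaches 0.95 V at t = 1.5
q = [1.0, 1.0, 0.5, 0.0, 0.0]    # Q reaches 0.1 V at t = 2.8

rat = read_access_time(t, wl, bl, vdd=1.0)
wat = write_access_time(t, wl, q, vdd=1.0)
```

Linear interpolation is only a convenient choice here; a finer simulation time step or the simulator's own measurement commands would serve the same purpose.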

Fig. 2(a) shows the TCAD structures of the conventional top-tier and bottom-tier devices (Conv Top and Conv Bot) and the top-tier devices with GBC and SDBC schemes (GBC Top and SDBC Top). The bottom-tier and top-tier devices were bulk-NSFET and SOI-NSFET, respectively. Fig. 2(b) and 2(c) show the key process steps of GBC and SDBC. In the GBC process, a gate cut (GC) was performed using an additional mask after the replacement metal gate (RMG) process to form a contact with the bottom tier. Then the gate material was deposited after etching the dielectric (Fig. 2b). The thickness of the RMG stack is about 6-7 nm, and considering the overlay, the process is feasible if the GC is performed more than 10 nm away from the channel [25]. In the SDBC process, the low-k dielectric was first etched using an additional mask, and a sacrificial layer (SL) was deposited before S/D epitaxial growth. Here, the SL reserved the space for the S/D metal (Fig. 2c). The SL also prevented the S/D epitaxy from growing excessively and blocking the path to the bottom metal. When the S/D epitaxy is not large enough, SL deposition before S/D formation can be omitted. Even in this case, the capping layer prevents exposure of the bottom-tier metal, so S/D formation and annealing are possible. Finally, after the RMG process and isotropic etching of the SL, silicide was formed, and then the capping layer was removed to deposit the S/D metal. The capping layer prevents metal exposure during the silicide process. Since the GBC and SDBC schemes suffer from a high aspect ratio, it is recommended to keep the top-tier gate stack and dielectric as thin as possible. Top-tier devices were set to not benefit from the S/D stressor, because they use a process that has never been fabricated together with a low-temperature process, so unavoidable performance degradation may occur [26].
As shown in Fig. 3 [...]
Fig. 3(b) shows the gate capacitance (C gg) of top-tier devices with different schemes. For GBC Top, when the GC was performed 20 nm away from the channel (inset of Fig. 3b), the gate parasitic capacitance (C para) increased by 1 % compared to Conv Top, because HfO 2 was removed from the RMG stack and metal was filled in its place. For SDBC Top, because the metal was filled all around the S/D, C para was 8.5 to 9 % higher than that of Conv Top. However, after the etching process (inset of Fig. 3b), C para could become smaller than that of Conv Top. A notable advantage is that the C para of SDBC Top can be reduced further as the S/D metal is etched deeper. Reducing the C para of the device itself directly improves the AC performance of the SRAM array [28]. However, to focus on the interconnect aspect, the C para of SDBC Top was set equal to that of Conv Top.
[...] size, a 5 nm width FinFET was also analyzed, but only from the area perspective (Fig. 6).
In Fig. 6, SRAM 2D with FinFET had a similar area to the high-density FinFET SRAM (0.021 µm²) of a previous study [29], which indicates that the SRAM structure and area in this study are reasonable. Since SRAM 3D could not directly connect both tiers, the MIV was connected by bypassing next to the gate. The design rule requires a 12 nm gap between the gate and the MIV, so the 42 nm CPP could not be maintained (inset of Fig. 4, 5a1). Moreover, the cell height could not be scaled significantly because of the central Q, QB node (Fig. 5a2). Hence, the cell area was scaled by only 30 % compared to SRAM 2D in the NSFET case (Fig. 6). Also, since the MIV must not overlap with the adjacent SRAM cell, the MIV had to be placed asymmetrically (Fig. 5a1). In FinFET in particular, only 24 % area scaling was possible because the same metal lines had to fit in the smaller cell area.
SRAM GBC connected the MIV to the gate bottom, which restored the original 42 nm CPP with a symmetric SRAM structure (Fig. 5b1). However, although the cell width was reduced, the cell height increased slightly due to the complexity of the top-tier metal lines, especially the Q, QB node (Fig. 5b2). SRAM SDGBC solved this problem by distributing the top-tier Q, QB node to the bottom tier through an additional metal line (Fig. 4, 5c). As a result, SRAM SDGBC enabled 50 % scaling compared to SRAM 2D regardless of device type and brought a greater scaling advantage in small devices (Fig. 6).
VOLUME 9, 2021
Fig. 7 shows the two resistances (R BL and R WL) and three capacitances (C BL, C WL, and C Q) that have the most significant impact on SRAM performance. The interconnect RC values were similar to those of previous studies, confirming that the extracted RC values were within a reasonable range [16]. Generally, R BL and R WL changed in proportion to the cell width and height (Fig. 7a).
Since tCoSFB had a lower resistivity than Ru, SRAM 2D_Cu showed much smaller R BL and R WL values (−30 to −36 %) than SRAM 2D. However, SRAM SDGBC had a smaller R WL value than SRAM 2D_Cu because its cell height was half that of SRAM 2D. In terms of capacitance, SRAM 3D had a decreased C BL (Fig. 7b) because the number of M2 lines around the BL was reduced from 2 to 1 compared to SRAM 2D (Fig. 4). The decreased WL length reduced C WL, but C Q increased due to the complex top-tier Q, QB node of SRAM 3D. SRAM GBC had reduced C BL and C WL compared to SRAM 3D because it had a smaller cell width and shorter M2 lines. In addition, the Q, QB node was simplified by direct contact, so C Q was also reduced. SRAM SDGBC had a decreased C WL and an increased C BL compared to SRAM GBC because the WL length was reduced by the significant decrease in cell height, while the distance between the M2 lines was also significantly reduced. In particular, since the complex top-tier Q, QB node was distributed to the bottom tier, C Q was smaller than that of SRAM 2D. Consequently, SRAM SDGBC had similar or better interconnect RC values than SRAM 2D&3D.
Fig. 8 shows RAT and WAT in the ideal case (w/o array resistance) and the worst case (w/ array resistance). A fast access time requires the node potential to change rapidly, so a small interconnect capacitance is essential. According to previous studies, RAT and WAT are greatly affected by C BL and C Q in the ideal case [11]. In the ideal case, SRAM 3D showed a 10.4 % increase in RAT because the I eff of the top device was relatively small, even though C BL was reduced compared to SRAM 2D. Similarly, C BL was significantly reduced in SRAM GBC&SDGBC, but RAT was comparable to SRAM 2D because I eff was small. For WAT, SRAM 3D had a larger C Q and a smaller I eff than SRAM 2D, so WAT increased by 18 % (Fig. 8b). SRAM SDGBC had a smaller C Q than SRAM 2D, but its WAT was still larger than that of SRAM 2D due to the small I eff. SRAM 2D and SRAM 2D_Cu had the same access time because array resistance was not considered in the ideal case.
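To illustrate why the array resistance matters once it is included, a word line can be approximated as a uniform RC ladder with one segment per cell. The following Python sketch computes the Elmore delay at the far end of such a ladder. It is a first-order estimate only, not the mixed-mode SPICE flow used in this study; the function name and the per-cell resistance and capacitance values are hypothetical.

```python
def elmore_delay_ladder(r_cell, c_cell, n_cells):
    """Elmore delay (s) at the far end of a uniform RC ladder.

    Each cell contributes a series resistance r_cell (ohm) followed by a
    shunt capacitance c_cell (F). The capacitor of cell i sees the summed
    upstream resistance i * r_cell, so the delay is sum(i * r_cell * c_cell).
    """
    return sum(i * r_cell * c_cell for i in range(1, n_cells + 1))


# Hypothetical per-cell values for a 256-cell line (not from the paper).
R_CELL = 50.0       # ohm per cell
C_CELL = 0.05e-15   # farad per cell
N_CELLS = 256

delay_full = elmore_delay_ladder(R_CELL, C_CELL, N_CELLS)
# Assuming a fixed wire cross-section, halving the per-cell wire length
# halves both r_cell and c_cell, which quarters the Elmore delay.
delay_half = elmore_delay_ladder(R_CELL / 2, C_CELL / 2, N_CELLS)
# Ignoring the wire resistance (as in the ideal case) removes the RC wire delay.
delay_ideal = elmore_delay_ladder(0.0, C_CELL, N_CELLS)
```

For n cells the sum equals r_cell * c_cell * n * (n + 1) / 2, so the wire delay grows roughly quadratically with the line length. This is one way to see why a shorter cell dimension along the line reduces both the resistance and the capacitance terms at once.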

C. M3D SRAM ACCESS TIME AND DYNAMIC POWER
In the worst case, the layout resistance, which greatly affects SRAM performance [30], was considered, so the access time increased significantly. In particular, RC WL affected the switching speed of the access transistor and therefore strongly influenced the access time. As RC WL decreased from SRAM 2D to SRAM SDGBC, RAT and WAT were greatly reduced. Compared to SRAM 2D, SRAM SDGBC improved RAT and WAT by 44 % and 52 %, and compared to SRAM 3D by 19 % and 33 %. Notably, SRAM SDGBC even showed a much shorter access time than SRAM 2D_Cu. The overall access time trend shows that the advantages of GBC and SDBC become more pronounced as the interconnect RC increases.
Fig. 9 shows the dynamic write power at 1 GHz. In the ideal case, since array resistance was not considered, much less power was consumed than in the worst case. Small device current, fast access time, and small interconnect capacitance are among the conditions for low power consumption. Thus, SRAM SDGBC had the lowest power consumption in both cases, consuming 15.5 % and 11.6 % less power than SRAM 2D. In the ideal case, there was no difference between SRAM 2D and SRAM 2D_Cu, but in the worst case, the power consumption of SRAM 2D_Cu increased slightly.
Fig. 10 shows a component analysis, which helps explain the trend of the dynamic write power. Unlike WL and BL, VDD consumed much less power because its potential did not change. Since it was relatively small compared to the other components, there was little change in the ideal case (Fig. 10a). For WL power, SRAM SDGBC consumed the least because WL power decreases as C WL becomes smaller. Interestingly, in the worst case, SRAM 2D_Cu had greater WL power than SRAM 2D (Fig. 10b). Since the worst case had a large RC WL, WL switched to the other state before fully reaching the steady state. If switching occurs before the steady state is reached, a smaller resistance results in greater power consumption, so SRAM 2D_Cu consumed the most power.
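The observation that a smaller resistance draws more power when switching occurs before the steady state can be illustrated with a single lumped RC stage. The sketch below assumes a capacitor charged from 0 V through a resistor by an ideal supply for a finite time t_on, in which case the energy drawn from the supply is C * VDD^2 * (1 - exp(-t_on / (R * C))). The lumped model, the function name, and all numerical values are assumptions for illustration, not the extracted values of this study.

```python
import math


def supply_energy(r_ohm, c_farad, vdd, t_on):
    """Energy (J) drawn from an ideal supply while charging C through R.

    The capacitor starts at 0 V and is charged for t_on seconds:
    E = C * VDD^2 * (1 - exp(-t_on / (R * C))).
    """
    tau = r_ohm * c_farad
    return c_farad * vdd ** 2 * (1.0 - math.exp(-t_on / tau))


# Hypothetical values: same capacitance, two wire resistances.
C_WL = 20e-15      # farad
VDD = 0.7          # volt
R_HIGH = 100e3     # ohm, higher-resistivity wire (tau = 2 ns)
R_LOW = 65e3       # ohm, lower-resistivity wire (tau = 1.3 ns)
T_SHORT = 0.5e-9   # s, shorter than both RC time constants
T_LONG = 100e-9    # s, long enough for both lines to settle

e_high_short = supply_energy(R_HIGH, C_WL, VDD, T_SHORT)
e_low_short = supply_energy(R_LOW, C_WL, VDD, T_SHORT)
e_high_long = supply_energy(R_HIGH, C_WL, VDD, T_LONG)
e_low_long = supply_energy(R_LOW, C_WL, VDD, T_LONG)
```

In this simplified model, the lower-resistance line draws more energy when the charging window is cut short, while both lines converge to C * VDD^2 once they are allowed to settle fully.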
BL power was affected by the decrease in C BL and WAT. A small WAT meant that the time during which current flowed through the access transistor was shortened, resulting in reduced power. Therefore, SRAM SDGBC consumed the least power, and SRAM 2D_Cu showed similar values to SRAM 2D. In this study, we calculated only the SRAM dynamic power in a 1 GHz write operation. However, SRAM SDGBC is expected to show low dynamic power in other cases as well, due to its low capacitance and excellent access time.
FIGURE 10. Component analysis of the dynamic write power in ideal case (a) and worst case (b). SRAM SDGBC shows lowest WL, BL, VDD power due to small access time and capacitance.

IV. CONCLUSION
GBC and SDBC schemes applied to 6T-SRAM were thoroughly analyzed from the PPA perspective. At the device level, GBC Top and SDBC Top had similar I-V and C-V characteristics to Conv Top, and SDBC Top could further reduce C para. At the SRAM level, SRAM SDGBC enabled 50 % area scaling compared to SRAM 2D, which was impossible with SRAM 3D. The interconnect RC was also significantly improved by optimizing the metal lines. As a consequence, SRAM SDGBC showed improved access time and dynamic write power compared to SRAM 3D. Notably, SRAM SDGBC improved these characteristics even compared to SRAM 2D_Cu, except for WAT in the ideal case, despite its higher metal resistivity and the low I on of the top device. The GBC and SDBC methods thus offer a new way to improve PPA and are promising for M3D technology.