Recently, through-silicon-via (TSV) technology for 3D circuits has been developed intensively, making 3D-SoC (3D system-on-chip) increasingly feasible. One of the most advantageous features expected of 3D-SoC is reduced interconnect delay, since interconnect delay increases seriously with CMOS scaling, especially in sub-90 nm CMOS. This means that “interconnect-aware design” is very important for improving performance with 3D-SoC. System-level simulations of 3D-SoC [1], [2] suggest that circuit performance is improved mainly by shortening interconnects. However, such system-level studies do not make the contribution of shorter interconnects to circuit performance sufficiently clear. Hence, it has not been clarified how to decrease interconnect delay effectively. We first analyze in depth the contribution of interconnect delay at the circuit level from an industrial viewpoint, and then, based on this analysis, present an effective layout method for 3D circuit layout design, the Folding Tape Method.

It is well known that interconnect delay is as large as the CMOS FO4 delay in sub-90 nm CMOS. In this paper, we introduce the *critical length* to discuss the effect of reducing interconnect delay, where the *critical length* is defined as the length of an interconnect whose delay is comparable to the FO4 delay. Fig. 1 shows the *critical length* of interconnects with repeaters (Metal 1, intermediate interconnect, and global interconnect), where delay is calculated using data from ITRS [3]. The critical length of Metal 1 is about 110 µm, 70 µm and 40 µm for 65 nm, 45 nm and 32 nm CMOS, respectively. Note that the *target* interconnects that should be shortened by 3D design to reduce delay are not the short ones but those longer than the critical length.

Fig. 2 shows the frequency of interconnect length (the density function of interconnect length) for a CMOS chip with ten million gates in 65 nm CMOS, together with the interconnect delays as a function of length. Arrows denote the critical lengths for each technology. This calculation uses conventional interconnect distribution theory [4]. These figures also show the frequency of interconnects in the same chip using a normal 3D layout simply divided into four layers. As shown in Fig. 2, the number of interconnects at the critical length is almost the same for the 2D layout and the 3D layout. Although very long interconnects of about 1 mm can be shortened by the 3D layout, the frequency of such long interconnects is extremely small compared to that of interconnects at the critical length, as shown in Fig. 2. To decrease the target interconnect length, the die dimension *Z* has to be reduced to near the critical length, and the number of 3D-stacked layers would thus have to be increased to several tens of layers or more. This is, however, *unrealistic* considering heat dissipation and stacking process cost. Another design method is needed to decrease interconnect delay effectively.

We present a 3D layout method, the “Folding Tape (FT) method”, to decrease the target interconnect length with just several 3D layers. As depicted in Fig. 3(a), it starts from a first, tentative partitioning of a 2D layout, which is then changed to a second partitioning to form a 2D “pseudo-one-dimensional” layout, called the “tape” in this work. Next, the tape is “folded” like “Origami,” as shown in Fig. 3(b) to (d). Such a transformation from a 2D layout to a 3D layout is valuable, since conventional 2D-CAD tools can simply be reused. The tape is folded over *N* layers, where *N* is the number of 3D-stacked layers, from the lowest layer of the 3D circuit to the highest and then from the highest layer back to the lowest, one layer at a time, as shown in Fig. 3(d). An interconnect lying over two layers is then shortened by a short-cut through a TSV, as shown in Fig. 3(e)–(f), so that the final interconnect length is shorter than the folded length in the y-direction, *Lc*. At the same time, repeaters can be removed.
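The folding step can be illustrated with a minimal one-dimensional sketch. It is only one possible reading of the description above, not the authors' implementation: the tape is treated as a line of coordinate `s`, cut into segments of length `lc`, each fold reverses the in-plane direction, and the layer order (up through the stack, then back down in a neighbouring "column") is an assumption, since Fig. 3 is not reproduced here. The names `fold`, `FoldedPosition`, `column` and `tsv_shortcut_length` are hypothetical, and the vertical length of the TSV itself is ignored.

```python
from typing import NamedTuple


class FoldedPosition(NamedTuple):
    column: int  # hypothetical index: which pass of n_layers segments the point is in
    layer: int   # 0 = lowest layer, n_layers - 1 = highest layer
    y: float     # in-plane coordinate inside the folded segment, 0 <= y <= lc


def fold(s: float, lc: float, n_layers: int) -> FoldedPosition:
    """Map a coordinate s along the unfolded tape to an (assumed) folded position."""
    if s < 0 or lc <= 0 or n_layers < 1:
        raise ValueError("need s >= 0, lc > 0 and n_layers >= 1")
    segment, offset = divmod(s, lc)
    segment = int(segment)
    # Each fold reverses the direction of travel, so odd segments run backwards.
    y = offset if segment % 2 == 0 else lc - offset
    column, step = divmod(segment, n_layers)
    # Assumed order: climb the stack in even columns, descend in odd columns.
    layer = step if column % 2 == 0 else n_layers - 1 - step
    return FoldedPosition(column, layer, y)


def tsv_shortcut_length(s1: float, s2: float, lc: float, n_layers: int) -> float:
    """In-plane wire length left after a vertical TSV short-cut (same column only)."""
    a = fold(s1, lc, n_layers)
    b = fold(s2, lc, n_layers)
    if a.column != b.column:
        raise ValueError("endpoints fall in different columns; not modelled in this sketch")
    # The TSV provides the vertical hop, so only the y offset remains in-plane.
    return abs(a.y - b.y)
```

Under these assumptions, a wire running from `s = 50` to `s = 130` on a tape folded with `lc = 100` has length 80 before folding, while `tsv_shortcut_length(50, 130, 100, 5)` leaves an in-plane length of 20, which is below `lc` as the text states.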

Although other 3D-folding methods that transform a 2D layout like “Origami” [5], [6] have been proposed, they differ considerably from our FT-method: such folding does not by itself shorten interconnects, so further complex computation for interconnect reduction and for placement and routing is needed. That computation generally takes much longer than our simple method. For a longer interconnect lying over more than 3 layers, as shown in Fig. 3(f), the final length is also less than *Lc*. For an even longer interconnect lying over more than *N* layers (originally longer than *N* × *Lc*), the final length is less than 2 × *Lc*, which means that very long interconnects cannot be made as short as interconnects originally shorter than *N* × *Lc*. However, since the number of such very long interconnects is very small, as described above, this does not affect the reduction in the interconnect delay of the chip.

Using the notation defined in Fig. 4, the average interconnect length shortened by the FT-method is a function of the interconnect length, *L*, and the partial block size, *Lc*, and can be expressed as follows:
$$\frac{1}{L_c}\int_{L_c - L}^{L_c}(L + y - L_c)\,dy = \frac{L^{2}}{2L_c}\eqno{(1)}$$
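Equation (1) can be checked numerically with a short sketch. The reading assumed here, which cannot be confirmed without Fig. 4, is that *y* is a wire's start position, uniformly distributed over a block of size *Lc*, and that the integrand *L* + *y* − *Lc* counts only where it is positive, that is, for *y* ≥ *Lc* − *L* with *L* ≤ *Lc*. The function names are hypothetical.

```python
def mean_overhang(length: float, lc: float, samples: int = 1000) -> float:
    """Midpoint-rule average of max(0, length + y - lc) for y uniform on [0, lc]."""
    if not 0 <= length <= lc:
        raise ValueError("this sketch only covers 0 <= length <= lc")
    dy = lc / samples
    total = 0.0
    for i in range(samples):
        y = (i + 0.5) * dy  # midpoint of the i-th sub-interval
        # Assumption: the integrand contributes only where it is positive.
        total += max(0.0, length + y - lc)
    return total / samples


def closed_form(length: float, lc: float) -> float:
    """Right-hand side of equation (1): L^2 / (2 * Lc)."""
    return length ** 2 / (2 * lc)
```

For example, with `length = 40` and `lc = 100` the closed form evaluates to 8, and the numerical average agrees with it to within rounding error.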

Using equation (1) and the distribution theory of interconnect length used above, the relative frequency of interconnect length in a 3D layout using the FT-method can be calculated. Fig. 5 shows the calculation results, where the *critical length* is chosen as *Lc* for each CMOS technology and the 3D design uses five stacked layers (N = 5).

Fig. 5 indicates that the number of interconnects longer than the *critical length* is dramatically reduced, to less than a few percent of that for 2D. To analyze the contribution of shorter interconnects to chip performance, we introduce the *interconnect delay factor*, defined as the product of frequency and interconnect delay. The *interconnect delay factor* is thought to be strongly related to the total delay of the chip. Fig. 6 shows the *interconnect delay factor* for 65 nm CMOS technology. It indicates that the *delay factor* around the critical length is decreased by a factor of more than 30 by the FT-method. Thus, it has been confirmed that the FT-method is highly effective in reducing interconnect delay, even though the number of 3D-stacked layers is small.
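The definition of the *interconnect delay factor* can be written down directly. The sketch below assumes that frequency and delay are tabulated at the same set of interconnect lengths; the function names are hypothetical, and no values from Fig. 5 or Fig. 6 are reproduced.

```python
from typing import List, Sequence


def delay_factor(frequency: Sequence[float], delay: Sequence[float]) -> List[float]:
    """Per-length product of interconnect frequency and interconnect delay."""
    if len(frequency) != len(delay):
        raise ValueError("frequency and delay must be sampled at the same lengths")
    return [f * d for f, d in zip(frequency, delay)]


def reduction_ratio(factor_2d: Sequence[float], factor_3d: Sequence[float]) -> List[float]:
    """Ratio of the 2D delay factor to the 3D delay factor at each length."""
    if len(factor_2d) != len(factor_3d):
        raise ValueError("both layouts must be sampled at the same lengths")
    # A zero 3D factor means the wires at that length vanished entirely.
    return [a / b if b > 0 else float("inf") for a, b in zip(factor_2d, factor_3d)]
```

Because the delay at a given length is the same in both layouts, the ratio at that length reduces to the ratio of the frequencies; the product matters when delay factors are compared across different lengths.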

Fig. 7 depicts an ideal layout for a CMOS logic circuit with TSVs on a layer designed by the FT-method. It suggests that the area overhead of the TSVs is negligible if the TSV diameter is smaller than the distance between the V_{DD} line and the GND line. For the 45 nm CMOS case, when the TSV diameter is less than 0.5 µm, the 3D circuit does not have any area overhead. The TSV density and pitch needed for this case are about 0.12 µm^{−2} and 8.1 µm, respectively. Such small, high-density TSVs can be fabricated by adapting the fabrication technology for deep trench capacitors of e-DRAM (e.g., diameter: 0.2 µm, depth: 8 µm @ 90 nm CMOS), and TSVs of similar dimensions have already been fabricated in [7]. If some area overhead is accepted, larger TSVs can be used.

Using the FT-method, about 95% of repeaters can be removed for the 45 nm CMOS case, as calculated from the results in Fig. 6 and shown in Fig. 8. The repeaters thus removed make up about 10% of all logic gates in the chip, so an area and power reduction of about 10% can also be expected. To the best of our knowledge, this is the simplest and most effective method for reducing interconnect delay among those reported so far.

Based on interconnect delay-aware design, or critical length-aware design, the CMOS digital circuits for which 3D circuit design effectively reduces interconnect delay can be identified. Fig. 9 shows interconnect delay relative to FO4 gate delay as a function of interconnect length. In this graph, the area of typical digital CMOS circuits is also plotted, which indicates the longest interconnect contained in each circuit. These results show that small circuits like standard cells (NAND, NOR, XOR, flip-flop, etc.) are not targets for 3D circuit design. Even 16-bit counters and multipliers are not targets, as the reduction of interconnect delay by 3D circuit design is negligible. Large circuit blocks like processor element (PE), FFT, general SoC, Processor and CMOS contain long interconnects whose delay is not negligible compared to gate delay, as shown in Fig. 8.

In conclusion, since the decrease in interconnect delay is the most valuable effect of 3D-SoC, “interconnect-delay aware design” is needed. We have systematically analyzed interconnect delay in sub-90 nm CMOS and compared it with gate delay for each CMOS generation, which suggests the importance of critical length-aware design. Based on the analysis, we have presented a new design method, the Folding Tape Method (FTM), to shorten interconnects at the critical length. The improvement of interconnect delay by FTM and the decrease in repeaters with small overhead have been shown. Thus, the performance improvement of 3D circuits depends strongly on the design.