SECTION I

## INTRODUCTION

One of the remarkable technological achievements of the last 50 years is the integrated circuit. Built upon previous decades of research in materials and solid state electronics, and beginning with the pioneering work of Jack Kilby and Robert Noyce, the capabilities of integrated circuits have grown at an exponential rate. Codified as Moore's law, integrated circuit technology has had and continues to have a transformative impact on society. This paper endeavors to describe Moore's law for complementary metal–oxide–semiconductor (CMOS) technology, examine its limits, consider some of the alternative future pathways for CMOS, and discuss some of the recent proposals for successor CMOS technologies. In the spirit of the editorial guidance for this issue, an analysis of the living cell as an information processor is offered and estimates of its performance are given. For comparison, an equal volume CMOS cell is postulated, equipped with extremely scaled technologies, and performance estimates are generated. Indications are that the living cell is architected and operates in such a way that it is extraordinarily energy efficient relative to the performance of the comparison CMOS cell. This analysis is offered with the hope that it will encourage radical rethinking of possible future information processing technologies.

SECTION II

## BENEFITS OF SCALING: MOORE'S LAW FOR SEMICONDUCTORS

In 1965, Gordon Moore [1] observed that the number of transistors on a chip could be expected to double annually for at least ten years. At different points in the ensuing decades, the doubling time has appeared to vary from 18 months to three years. Overall, however, the chip transistor count has continued to increase, and conversely, the size of each transistor has decreased, at an amazing rate, and Gordon Moore's postulate became known as Moore's law. Fig. 1 is a plot of transistor count for a variety of microprocessor chips (listed in Table 1) versus time. Moore was indeed prescient, for the transistor count has, on average, doubled approximately every two years.
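The doubling behavior described above can be sketched numerically. This is a minimal illustration, assuming a constant doubling time; the starting count of 1000 transistors and the function names are hypothetical and are not taken from Table 1.

```python
import math


def transistor_count(n0: float, years_elapsed: float, doubling_years: float = 2.0) -> float:
    """Project a transistor count forward, assuming a constant doubling time."""
    return n0 * 2.0 ** (years_elapsed / doubling_years)


def implied_doubling_time(n_start: float, n_end: float, years_elapsed: float) -> float:
    """Doubling time (in years) implied by two transistor counts a given time apart."""
    return years_elapsed * math.log(2.0) / math.log(n_end / n_start)


# Hypothetical chip with 1000 transistors, projected 10 years ahead
# for the three doubling times mentioned in the text.
for t_double in (1.5, 2.0, 3.0):
    print(t_double, transistor_count(1000.0, 10.0, t_double))
```

Over the same ten years, the projected count differs by more than an order of magnitude between an 18-month and a three-year doubling time, which is why the average doubling time matters so much for long-range projections.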

Fig. 1. The number of transistors per microprocessor chip versus time, showing introduction of new enabling technologies.

TABLE 1 Microprocessor Data Used to Create Fig. 1

As feature sizes have decreased, the areal density of transistors has correspondingly increased, supporting either more functionality for a given chip size or a reduction in chip size for a given level of functionality. The latter benefit enabled the fabrication of more chips per wafer and thus continued cost reduction trends. Cost reductions have also resulted from increased wafer sizes, again allowing the production of more chips per wafer. In addition, the individual transistor switching time and energy decrease with feature size scaling.

Minimum feature sizes circa 1980 were on the order of 3 $\mu$m, while today they are about 32 nm (a 100-fold decrease), giving four orders of magnitude increase in device density. Supply voltages in this time frame have decreased from five to one volt in an effort to reduce power consumption.
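The arithmetic behind this paragraph is simple: a linear shrink by a factor $s$ raises areal device density by $s^{2}$. The sketch below checks the quoted figures under that assumption; the function names are illustrative only.

```python
import math


def linear_shrink(f_old_nm: float, f_new_nm: float) -> float:
    """Factor by which the minimum feature size has decreased."""
    return f_old_nm / f_new_nm


def density_gain(f_old_nm: float, f_new_nm: float) -> float:
    """Areal density gain, assuming device area scales as the feature size squared."""
    return linear_shrink(f_old_nm, f_new_nm) ** 2


shrink = linear_shrink(3000.0, 32.0)   # 3 um -> 32 nm
gain = density_gain(3000.0, 32.0)
print(f"linear shrink: {shrink:.1f}x")
print(f"density gain: {gain:.0f}x (about 10^{math.log10(gain):.1f})")
```

The exact shrink is a little under 100-fold, so the density gain is slightly below $10^{4}$, consistent with the rounded "four orders of magnitude" in the text.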

It has been the history of the semiconductor industry that, as obstacles are encountered, scientific and engineering solutions are developed to continue the cadence more or less as indicated by Moore's law. In the 1990s, it became evident that scaling was encountering a number of barriers, including increasing interconnect power consumption and transistors that were consuming increased power in their off state. This led to the search for new material systems and associated processes to sustain the growth in transistor counts that was providing increasing performance and functionality for the electronics and other industries. An example of technology innovation was the introduction of copper interconnects to replace aluminum-based interconnects on chip [2], [3]. Initially, this was viewed as a difficult task since copper diffuses in silicon (Si) and can be detrimental to metal–oxide–semiconductor field-effect transistor (MOSFET) performance. However, barrier systems were developed so that the higher conductivity of copper could be exploited. As another example, the decrease in the gate oxide thickness to a few nanometers was leading to increased gate leakage currents and higher off-state power consumption. Research led to the incorporation of new gate materials with a higher dielectric constant (e.g., hafnium oxide) so that drive capacitance could be maintained and tunneling reduced [4], [5]. Due to the incompatibility of the high-$k$ dielectric with the traditional polysilicon gate, a new metal gate technology was introduced [5]. In order to increase channel mobility, new strained Si channel technologies were developed [6]. Looking ahead, the pace of innovation continues with, for example, research to determine if higher channel mobility might be achieved by introducing compound III-V channel materials into the Si MOSFET [7].
Even the structure of the MOSFET is under renewed consideration: variations of the multiple-gate FET [8], such as the Trigate device, are now being introduced into production. These innovations have continued to provide increases in integrated circuit performance [9].

One indicator of the ultimate performance of an information processor, realized as an interconnected system of binary switches, is the maximum binary throughput (BIT), that is, the maximum number of on-chip binary transitions per unit time. It is the product of the number of devices $M$ and the clock frequency of the microprocessor $f$:

$$\beta=Mf.\eqno{\hbox{(1)}}$$

(Note that $\beta$ is an aggregate indicator of technology capability.)

The computational performance of microprocessors $\mu$ is often measured in (millions of) instructions per second (IPS) that can be executed against a standard set of benchmarks. There is a strong correlation between system capability for IPS $(\mu)$ and the binary throughput $\beta$, as shown in Fig. 2, and to a good approximation

$$\mu=f(\beta)=k\beta^{p}.\eqno{\hbox{(2)}}$$

For the selected class of microprocessors, $k\sim 0.1$ and $p\sim 0.64$ with a high degree of accuracy (the determination coefficient ${R}^{2}=0.98$). This strong correlation suggests a possible fundamental law behind the empirical observation.
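Equations (1) and (2) can be combined into a short numerical sketch. It uses the fitted values $k\sim 0.1$ and $p\sim 0.64$ quoted above and assumes $\mu$ is expressed in instructions per second and $\beta$ in bits per second, as in the caption of Fig. 2. The device count and clock frequency in the example are hypothetical.

```python
K = 0.1    # fitted prefactor k quoted in the text
P = 0.64   # fitted exponent p quoted in the text


def binary_throughput(m_devices: float, f_clock_hz: float) -> float:
    """Eq. (1): beta = M * f, in binary transitions per second."""
    return m_devices * f_clock_hz


def instructions_per_second(beta: float, k: float = K, p: float = P) -> float:
    """Eq. (2): mu = k * beta**p."""
    return k * beta ** p


# Hypothetical chip: 1e9 switches clocked at 3 GHz (illustrative values only).
beta = binary_throughput(1e9, 3e9)
print(f"beta = {beta:.2e} b/s, mu = {instructions_per_second(beta):.2e} IPS")

# Because p < 1, doubling beta yields less than a doubling of mu.
print(instructions_per_second(2.0 * beta) / instructions_per_second(beta))
```

The sublinear exponent is the notable feature of the fit: each doubling of raw binary throughput buys only about a factor of $2^{0.64}$ in benchmark performance.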

Fig. 2 also shows an estimated capability of the human brain in $\mu$–$\beta$ metrics. While it is difficult to quantify the brain operations, there have been several attempts to estimate the computational performance of the brain. In [10], an estimate of equivalent binary transitions was made from the analysis of the control function of the brain: the equivalent number of binary transitions to support language, deliberate movements, information-controlled functions of the organs, hormone system, etc., resulting in an “effective” binary throughput of the brain $\beta\sim 10^{19}$ b/s. An estimate of the number of equivalent IPSs was made in [11] from the analysis of brain image processing capability, resulting in $\mu\sim 10^{14}$ IPSs. It is clear that the brain is not on the microprocessor trajectory in Fig. 2, giving rise to the hope that there may exist alternate technologies and computing architectures offering higher performance (at much lower levels of energy consumption). In the following sections, some such technologies, complementary and/or alternative to CMOS circuitry, will be discussed in detail.

Fig. 2. Benchmark capability $\mu$ (instructions per second) as a function of $\beta$ (bits per second).

Without customers for the increased volume of transistors and chips, there would be little incentive to continue to drive scaling at an exponential pace. The 300 billion dollar semiconductor industry (circa 2011) provides essential components for the much larger electronics devices and systems industries as well as automotive, entertainment, medical device, and other technology-based industries. One effect of the rapidly increasing capability of integrated circuit systems is to support rapid growth in functionality and this has translated into a feature-driven market. That is, electronics customers, as a rule, replace their older electronic systems well before they are no longer useful because the newer devices offer a compelling increase in capability.
This contrasts with many other industries where purchases are driven by the need for replacement at the end of the useful life of the product. The net effect is that electronic products have been part of a closed cycle where increasing markets have supported the capability of industry to invest in the continuous reduction of semiconductor costs via the introduction of new technologies.

Moore's law is actually connected to a more fundamental premise, known as the learning curve, which relates decreases in the price of a product to corresponding increases in the volume of production of that product. The learning curve for semiconductors has shown much less volatility over the years than Moore's law. The learning curve states that the cost per unit decreases by a fixed percent every time total cumulative volume doubles. This is applicable across a wide range of products, but the striking difference for semiconductors is that the cost per unit has decreased at a much higher rate than in many other industries. Figs. 3 and 4 (courtesy of W. Rhines of Mentor Graphics [12]) illustrate the learning curve for transistors and for microprocessors.

Fig. 3. Transistor cost as a function of the cumulative number of transistors shipped [12].

Fig. 4. Personal computer cost (inflation adjusted) per millions of instructions per second versus cumulative units shipped [12].

Note from Fig. 3 that the per-transistor cost decreases by approximately a factor of two for every doubling of transistor production. To put this in a temporal perspective, the average compound annual rate of cost reduction for transistors is on the order of 35% per year. In Fig. 4, the cost per million instructions per second (MIPS) for personal computers has shown an even more dramatic rate of cost decrease: a reduction on the order of a factor of nine in cost per MIPS every two years.
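The learning-curve statement (a fixed percentage cost reduction per doubling of cumulative volume) corresponds to a power law in cumulative volume. The following is a minimal sketch under that assumption; the function names and the normalized starting cost and volume are illustrative, not data from Figs. 3 and 4.

```python
import math


def learning_curve_cost(c0: float, v0: float, v: float, reduction_per_doubling: float) -> float:
    """Unit cost at cumulative volume v, given cost c0 at volume v0.

    Each doubling of cumulative volume multiplies cost by (1 - reduction_per_doubling).
    """
    exponent = math.log2(1.0 - reduction_per_doubling)
    return c0 * (v / v0) ** exponent


def annual_reduction_from_factor(factor: float, years: float) -> float:
    """Fractional yearly cost reduction equivalent to a `factor`-fold drop over `years`."""
    return 1.0 - factor ** (-1.0 / years)


def halving_time_years(annual_reduction: float) -> float:
    """Years needed for cost to halve at a constant compound annual reduction."""
    return math.log(0.5) / math.log(1.0 - annual_reduction)


# Factor-of-two cost drop per doubling (the Fig. 3 reading), normalized units.
print(learning_curve_cost(1.0, 1.0, 1024.0, 0.5))

# A 35% compound annual reduction, expressed as a cost halving time.
print(halving_time_years(0.35))

# A factor of nine every two years, expressed as a yearly reduction.
print(annual_reduction_from_factor(9.0, 2.0))
```

Expressed this way, a factor of nine every two years is a threefold cost drop per year, roughly twice the 35% annual rate quoted for transistors.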
It is interesting to contemplate whether integrated circuits will be able to sustain these steep learning curves indefinitely. There exist formidable technical and economic challenges to doing so and some of these are considered in this paper. A good example is the search for new lithographic tools that can provide efficient and cost-effective patterning for features less than 10 nm in size. There is a focus today on extreme ultraviolet lithography to replace optical lithography but this technology is not yet production ready. At the same time, research in directed self-assembly [13], [14] continues to make good progress and offers the hope for some relief for optical methods. Moreover, there are clear and fundamental limits for the scaling of electron-based devices [15]. In spite of directed research programs that seek to provide alternatives to the MOSFET switch for logic applications where other nonelectron representations for information might be used, such as the Nanoelectronics Research Initiative (www.src.org/program/nri), no compelling replacement options have yet been identified [16], [17]. It appears that for many of the proposed alternative devices, their unique properties might be utilized to advantage to achieve a special function when integrated with CMOS technologies. On the other hand, continuing research in alternative memory technologies has resulted in the identification of potential replacements offering the potential for smaller sizes that could meet information processing performance specifications. Indeed, progress in memory technology may foretell changes in memory architectures for information processing and could support an increased focus on data-centric as opposed to logic-centric processing. Nevertheless, what will happen when we reach scaling limits for CMOS-like technologies? Can the learning curve shown above be continued at the same rates as have been sustained to date? 
Continuation of the semiconductor learning curve beyond the end of scaling rests on several factors. 1) It is essential to sustain an ever-broadening applications space for integrated circuits since this provides the revenue base for advances in semiconductor technology. 2) As scaling of features becomes more difficult, it will be necessary for advances in design, architectures, 3-D packaging, etc., to play an increased role in cost reduction. 3) Parallel fabrication to decrease manufacturing costs per unit transistor needs to be emphasized, e.g., by increasing wafer size [18]. In this paper, possibilities for continuing the semiconductor virtuous cycle are explored. The perspective is that the future for semiconductor technologies is very bright, even as we face scaling limits, primarily because the opportunities for integration of new functionalities with CMOS are at an early point and the possibilities for expanding applications incorporating new on-chip physical domains of operation (e.g., mechanical, thermal, chemical, optical) are just beginning. Examples include the integration of sensors that respond to a wide range of stimuli, new architectures that can reason from data leading to integrated systems that can assess and respond, inclusion of devices operating in new physical domains to increase the energy efficiency and performance of information processing systems, the introduction of 3-D packaging technologies, etc. All of these opportunities will require advances in science and engineering, but this is the nature of the semiconductor enterprise.

SECTION III

## “MORE MOORE”: EXTREMELY SCALED CMOS LOGIC AND MEMORY DEVICES

In order to assess the performance characteristics for extremely scaled transistors and memory cells, it is instructive to consider the generic structure and the physical layout of a transistor (binary switch) and a nonvolatile memory cell.
The approach taken is to utilize simple physical and geometrical models to make evident the essential performance characteristics of FETs at the limits of scaling.

### A. Electronic Switch (FET)

An energy barrier is used to control electron transport in FETs. The barrier can be formed, e.g., by doping that creates built-in charges in the barrier (channel) region, as shown in Fig. 5(a). The height and the width of the barrier determine essential operational characteristics of transistors such as device size, switching speed, operating voltage, off leakage current, etc. In order to control the barrier height, a gate electrode is coupled to the barrier region, separated by a thin layer of gate insulator, e.g., SiO$_2$ or HfO$_2$. When a voltage is applied to the gate, an electric field is created between the gate and the barrier region. This electric field changes the barrier height, thus allowing electrons to pass through the channel. The width of the barrier is defined by device fabrication, and is represented by geometrical characteristics such as channel length $L_{ch}$ or gate length $L_{g}$. In the following, it will be assumed that $L_{ch}\approx L_{g}\approx F$, where $F$ is the critical feature size. In order to retain gate control with scaling, it is necessary to decrease the gate insulator thickness $T_{ox}$ proportionally to the decrease of the channel length. For an optimized FET structure, a rule of thumb suggests $T_{ox}/L_{ch}\sim 1/30$ [19]. The device platform for modern microelectronics is known as the MOSFET.

Fig. 5. Semiconductor FET: (a) materials system; (b) generic floorplan; (c) scaling limits; and (d) connected binary switches.

The barrier representation of a binary switch (e.g., FET) shown in Fig. 5(a) also suggests a generic topology for the ultimately scaled device [Fig. 5(b)].
The 2-D floor plan of the smallest possible binary switch is a $3F\times F$ rectangle consisting of three square “tiles” of the same size $F$ (representing the source, channel, and drain regions of the MOSFET). Further, it can be assumed that the associated insulator and metal layout elements are also composed of tiles of minimum size $F$. Finally, the metal interconnects, which connect individual devices in more complex logic circuits, can also be represented as a combination of the square tiles, as shown in Fig. 5(b). It is straightforward to show from both topology and physics considerations that in the limiting case, it is useful to consider the size of the interconnect tile as equal to the device tile $F$. The tiling framework is a useful tool for circuit/system physical-level explorations of different scenarios of extreme scaling. A detailed treatment of the tiling framework can be found in [20], and two important examples will be considered in Section III-C.

One fundamental issue that limits physical scaling of MOSFETs, and therefore the minimum tile size $F$, is quantum mechanical tunneling, which dramatically increases the off leakage current, as depicted in Fig. 5(c). A simple estimate of the tunneling limit can be made using the Heisenberg relation [see numerical insert in Fig. 5(c)]. For typical parameters for a Si FET, this estimate results in a minimum channel length of $\sim$4 nm. More detailed calculations yield similar findings: it is argued in a number of studies that tunneling off-state leakage becomes overwhelming for $L_{ch}=$ 4–7 nm, and this is sometimes cited as the “ultimate” FET [21]. These assessments are consistent with the International Technology Roadmap for Semiconductors (ITRS) [22], which projected the minimal physical gate length in high-performance logic FETs to be in the range from 4.5 nm (2007 ITRS) to 5.9 nm (2011 ITRS). Note that the result for the smallest channel length in Fig.
5(c) depends on the mass of the information-bearing particles, e.g., the effective mass of electrons in Si. Heavier particle mass could, in principle, allow for further scaling.

One approach to deal with the severe leakage in a scaled transistor is to develop families of FETs, optimized for specific applications. For example, if highest switching speed is the goal, the smallest channel length and therefore thinnest gate dielectrics are required. As a result, the leakage can be relatively high, which still could be tolerated in some applications. On the other hand, in other applications, such as mobile devices, standby power minimization is mandatory. This can be achieved by increasing $L_{g}$ and $T_{ox}$, thus giving up some performance and device density. A set of parameters projected for extremely scaled transistors developed by ITRS is shown in Tables 2 and 3 for high-performance and low standby power transistors, respectively.

TABLE 2 ITRS Performance Projections for Extremely Scaled High-Performance FETs (2007 ITRS Edition [22])

TABLE 3 ITRS Performance Projections for Extremely Scaled Low Standby Power FETs (2007 ITRS Edition [22])

### B. Nonvolatile Electronic Memory

In memory cells that store electron charge, such as flash, dynamic random-access memory (DRAM), or static random-access memory (SRAM), two distinguishable states 0 and 1 are created by the presence (e.g., state 0) or absence (e.g., state 1) of electrons in a specific location (the charge storage node). In order to prevent losses of the stored charge, the storage node is defined by energy barriers of sufficient height $E_{b}$ to retain charge (as shown in Fig. 6). The properties of the barrier, i.e., barrier height $E_{b}$ and width $a$, determine the retention time of a memory cell.

Fig. 6. The two-energy-barrier model for a memory cell: (a) the principle of storage and sensing; (b) write operation; and (c) read operation.

In order to obtain a nonvolatile memory cell, sufficiently high barriers must be created to retain the charge for a long period of time. As argued in [23], for retention longer than 10 years, the barrier height $E_{b}$ must be more than $\sim$1.7 eV. High barriers are formed by using layers of insulator (I), which surround a metallic storage node (M). Such an I–M–I structure forms the storage node in the floating gate cell, the basic element of flash memory. The barrier height $E_{b}$ is a material-specific property [see the table insert in Fig. 6(a)]. As shown in Fig. 6(a), the stored electrons can “leak” from the storage node either over the barrier (if the barrier is not sufficiently high), resulting in leakage current $I_{o-b}$, or by tunneling through the barrier (if the barrier is not sufficiently wide), resulting in leakage current $I_{T}$. For long retention (e.g., $>$ 10 years), the theoretical barrier width must be $>$ 5 nm for all known dielectric materials (typically $>$ 7 nm in practical devices). The corresponding practical minimum size of the floating gate cell is $\sim$10 nm [23].

The requirement for a large barrier insulator height and thickness also results in a fundamentally high operating voltage, both for write and read. For example, during the write operation, electrons are injected into the storage node, and this requires operation in the Fowler–Nordheim (F-N) tunneling regime for faster injection. The condition for F-N tunneling is $eV_{b}>E_{b}$, i.e., the potential difference across the barrier between the storage node and the external contact must be larger than the barrier height. Since the storage node is isolated from the external contacts by two barriers (i.e., it is floating), this requires the total write voltage applied to the opposite external contacts of the memory cell to be more than the doubled barrier height: $eV_{\rm write}>2E_{b}$, as shown in Fig. 6(b). (A symmetric barrier structure is assumed.)
Thus, the floating gate structure inherently requires high voltage for the write operation: for example, for SiO$_2$ barriers, $V_{\rm write\,min}>6$ V, and the write voltage should be $>$ 10–15 V for faster ($\sim$ms–$\mu$s) operations.

The presence or absence of stored electric charge in the storage node can be detected by an electrometer-type device [shown schematically in Fig. 6(a)]. The sensing device should be in immediate proximity to the storage node. A FET is commonly used as a sensor, and a complete nonvolatile floating gate memory cell consists of a stack of metallic and insulating layers on the top of a FET channel, as shown in Fig. 6(b). The sensing FET is controlled by the voltage $V_{\rm read}$ applied to an external electrode, the control gate. The source-drain current of the FET depends on the presence or absence of charge in the floating gate, thus the memory state can be sensed by measuring the FET current. The control gate allows modulation of the semiconductor channel of the FET by external commands, similarly to the logic FET. However, unlike in a conventional transistor, the degree of accessibility of the channel from the control gate is rather limited. First, the control gate is physically far from the channel, since the minimal thickness of both top and bottom dielectric layers is large due to the retention requirements, and the minimal thickness of the insulator stack is $>$ 10 nm. Second, the control gate affects the channel only indirectly, as the floating gate lies between the control gate and the channel. Therefore, a large read voltage must be applied to the control gate for reliable on/off transitions of the sense transistor. The maximum read voltage is, however, limited by the condition for the F-N tunneling discussed above, and for the read operation, it is $eV_{\rm read}<2E_{b}$; for example, for SiO$_2$ barriers, $V_{\rm read\,max}<6$ V for a nondisturbing read. In practice, a typical read voltage is 4.5–5 V.

### C. 2-D and 3-D Layouts of Logic and Memory Circuits

Binary switches in logic circuits will be assumed to be isolated, thus allowing for arbitrary wiring. From the tiling consideration, the most compact layout for an array of isolated devices (assuming at least one tile between each device for insulation) results in maximum packing density of binary switches on a 2-D plane [Fig. 7(a)]

$$n_{L}={1\over 8F^{2}}.\eqno{\hbox{(3a)}}$$

In the following, (3a) will be assumed as the device density in the logic layout. Next, interconnects need to be added. To estimate the minimum number of interconnect tiles per device, assume that in a three-terminal device, for each terminal, at least one “contacting” interconnect tile (three total) is needed and one “connecting” interconnect tile (three total) is needed. This results in six interconnect tiles per binary switch. Including the contacting tiles would result in eight interconnect tiles per switch. Thus, the average interconnect length obtained from the tiling consideration is $\langle L\rangle=(6\hbox{–}8)F$. This estimate is consistent with the wire-length distribution analysis in practical microprocessors [24]. For this densest arrangement, at least three additional layers of interconnects would be needed.

Fig. 7. Representation for maximum device density for (a) logic and (b) memory circuits.

Memory cells are typically organized in regular $X$–$Y$ arrays, thus only simple regular wiring is needed. Regularly wired memory cells in an array can be connected in series, thereby enabling higher packing density, as shown in Fig. 7(b)

$$n_{M}={1\over 4F^{2}}.\eqno{\hbox{(3b)}}$$

[The serial connection of Fig. 7(b) represents a NAND array, the typical array architecture of mainstream flash memory products.]
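To give a feel for the magnitudes implied by (3a) and (3b), the sketch below evaluates both densities for a chosen tile size. The value $F=10$ nm is used purely as an example (it matches the practical floating gate cell size quoted above, not a logic limit), and the function names are illustrative.

```python
NM_TO_CM = 1e-7  # 1 nm expressed in cm


def logic_density_per_cm2(f_nm: float) -> float:
    """Eq. (3a): isolated binary switches, one per 8 F^2 of area."""
    f_cm = f_nm * NM_TO_CM
    return 1.0 / (8.0 * f_cm ** 2)


def memory_density_per_cm2(f_nm: float) -> float:
    """Eq. (3b): serially connected memory cells, one per 4 F^2 of area."""
    f_cm = f_nm * NM_TO_CM
    return 1.0 / (4.0 * f_cm ** 2)


def mean_interconnect_length_nm(f_nm: float, tiles: int = 6) -> float:
    """Average interconnect length <L> = tiles * F, with 6 to 8 tiles per switch."""
    return tiles * f_nm


f = 10.0  # example tile size in nm
print(f"logic:  {logic_density_per_cm2(f):.3e} switches/cm^2")
print(f"memory: {memory_density_per_cm2(f):.3e} cells/cm^2")
print(f"<L>:    {mean_interconnect_length_nm(f, 6)} to {mean_interconnect_length_nm(f, 8)} nm")
```

Regardless of $F$, the regular wiring of the memory array gives it exactly twice the areal density of the isolated logic layout.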

The tiling framework provides a methodology to estimate the average energy per bit in an arbitrary logic circuit. As was argued in [25], at the limits of scaling, the energy per tile is nearly the same for both device and interconnect tiles and is approximately equal to the device switching energy ($E_{sw}$ in Tables 2 and 3). For the total number of tiles $k$ (devices and interconnects), the average switching energy per bit is

$$E_{\rm bit}(k)={1\over 2}k\cdot E_{sw}.\eqno{\hbox{(4a)}}$$

(The factor 1/2 originates from the assumed 50% activity factor.)

Now assuming the average interconnect length $\langle L \rangle=6F$, the total number of tiles per device is $k=3+6=9$, and from (4a) we obtain

$$E_{\rm bit}={9\over 2}E_{sw}.\eqno{\hbox{(4b)}}$$

The total dynamic energy consumption by a circuit of $N$ binary switches will be $NE_{\rm bit}$. Correspondingly, the energy dissipated by the transistors themselves (without interconnects) is $NE_{sw}$. It follows from (4b) that the transistors account for about 2/9, or 22%, of the total dynamic energy consumed by a logic circuit, which is consistent with the energy breakdown analysis in practical microprocessor chips [26].
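The tile-counting estimate in (4a) and (4b) can be reproduced in a few lines. This is a sketch only: the switching energy is normalized to 1 because the values of $E_{sw}$ from Tables 2 and 3 are not reproduced here, and the function names are illustrative.

```python
def energy_per_bit(k_tiles: int, e_sw: float, activity: float = 0.5) -> float:
    """Eq. (4a): average switching energy per bit for k tiles at a given activity factor."""
    return activity * k_tiles * e_sw


def transistor_energy_fraction(k_tiles: int, activity: float = 0.5) -> float:
    """Ratio N*E_sw / (N*E_bit), following the comparison made in the text."""
    e_sw = 1.0  # normalized; the ratio does not depend on the actual value
    return e_sw / energy_per_bit(k_tiles, e_sw, activity)


DEVICE_TILES = 3        # source, channel, drain
INTERCONNECT_TILES = 6  # <L> = 6F
k = DEVICE_TILES + INTERCONNECT_TILES

print(energy_per_bit(k, 1.0))          # E_bit in units of E_sw, Eq. (4b)
print(transistor_energy_fraction(k))   # about 0.22
```

With the longer estimate $\langle L \rangle=8F$ (so $k=11$), the same calculation gives a transistor share of 2/11, or about 18%, so the conclusion that interconnects dominate dynamic energy is not sensitive to the choice within the stated range.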

For memory arrays, due to the regular wiring, in many instances, the properties of interconnecting array wires determine the operational characteristics of the memory system. A given cell in an array is selected (e.g., for read operation) by applying appropriate signals to both interconnect lines, thus charging them. The relatively large operating voltage of flash results in rather large line charging energy $\sim\! C_{\rm line}V^{2}$, where $C_{\rm line}$ is the line capacitance. For $F=$ 10 nm and a 128 × 128 array, the line capacitance is $\sim\! 10^{-14}$ F [27]. If there is a random access read with $V_{\rm read}\sim 5\ \hbox{V}$, there results an energy per line access (and therefore for random access operation) of $\sim\!10^{-13}$ J, or $\sim\!10^{-15}$ J/bit for serial access. For write operation with $V_{\rm write}\sim 15\ \hbox{V}$, the write energy is $\sim\!10^{-12}$ J/line. (In practical flash memory devices, the read energy is of the order of $10^{-13}$–$10^{-11}$ J/bit and the write energy is of the order of $10^{-10}$–$10^{-9}$ J/bit [28], [29].)
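The order-of-magnitude figures in this paragraph follow directly from $C_{\rm line}V^{2}$. The sketch below reproduces them; it assumes, as an interpretation of the text, that the per-bit energy for serial access is the line energy shared among the 128 cells on a line.

```python
C_LINE_F = 1e-14      # line capacitance for F = 10 nm, 128 x 128 array [27]
CELLS_PER_LINE = 128  # assumption: one line serves 128 cells in serial access


def line_charging_energy(c_line_f: float, voltage_v: float) -> float:
    """Energy scale C * V^2 for charging one array line, in joules."""
    return c_line_f * voltage_v ** 2


read_line = line_charging_energy(C_LINE_F, 5.0)     # random-access read, V_read ~ 5 V
read_bit_serial = read_line / CELLS_PER_LINE        # serial access, per bit
write_line = line_charging_energy(C_LINE_F, 15.0)   # write, V_write ~ 15 V

print(f"read, per line access: {read_line:.2e} J")
print(f"read, per bit (serial): {read_bit_serial:.2e} J")
print(f"write, per line:       {write_line:.2e} J")
```

Because the energy scales as $V^{2}$, raising the voltage from 5 V to 15 V for the write operation costs a factor of nine in line energy.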

A model for a tightly integrated 3-D system is useful as a conceptual tool in estimating the ultimate performance of CMOS systems. The limits for 3-D integration can be conceived using the methodology for stacking of 3-D tiles. For example, in logic circuits, the thickness of the FET layer is $\sim\!3F$ (including vertical extension due to the gate and 1/2 interlayer insulation from each side) and the interconnect layer thickness is $2F$ (with insulation). As was mentioned above, for this densest arrangement at least three additional layers of interconnects would be needed. Thus, the resulting thickness of one logic circuit layer is $9F$. In memory circuits, the thickness of one layer can be $6F$, which includes a layer of array grid interconnects and interlayer insulation. Taking into account (3a) and (3b), we obtain a limiting 3-D density of $1/(72F^{3})$ for logic and $1/(24F^{3})$ for memory. Table 4 summarizes the essential parameters of CMOS logic and memory devices in the limits of scaling. A question that arises is whether alternative technologies exist that could offer further improvements beyond CMOS. Some examples are considered in Section V.
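The 3-D limits above are the 2-D densities divided by the layer thickness. The sketch below makes the stacking arithmetic explicit; $F=10$ nm is again only an example value, and the function names are illustrative.

```python
NM_TO_CM = 1e-7  # 1 nm expressed in cm


def logic_layer_thickness_in_f(fet_layer: int = 3, interconnect_layers: int = 3,
                               interconnect_thickness: int = 2) -> int:
    """Thickness of one logic layer in units of F: FET layer plus interconnect layers."""
    return fet_layer + interconnect_layers * interconnect_thickness


def density_3d_per_cm3(area_in_f2: int, thickness_in_f: int, f_nm: float) -> float:
    """Devices per cm^3 when each device occupies area_in_f2 * thickness_in_f * F^3."""
    f_cm = f_nm * NM_TO_CM
    return 1.0 / (area_in_f2 * thickness_in_f * f_cm ** 3)


logic_t = logic_layer_thickness_in_f()   # 3F + 3 * 2F = 9F
memory_t = 6                             # 6F, as stated in the text

f = 10.0  # example tile size in nm
print(f"logic:  1/({8 * logic_t} F^3) = {density_3d_per_cm3(8, logic_t, f):.3e} per cm^3")
print(f"memory: 1/({4 * memory_t} F^3) = {density_3d_per_cm3(4, memory_t, f):.3e} per cm^3")
```

In this model the memory stack ends up three times denser than logic: a factor of two from the tighter 2-D layout and a factor of 1.5 from the thinner layer.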

TABLE 4 “Ultimate CMOS”: Limiting Density and Energetics

SECTION IV

## NOT JUST MOORE (MORE THAN MOORE)

The possibility of extending the functionality of CMOS circuits by integration with other technologies has been referred to as “more than Moore” [30]. An example that has already encountered extraordinary market success in the last few years is provided by CMOS imagers, which can be found in any cell phone camera [31]. There, Si photodetectors or phototransistors constitute the optical sensors, which are monolithically integrated on a CMOS chip [32]. Another multifunctional combination within Si technology is provided by the integration of microelectromechanical system (MEMS) devices with CMOS [33]. In this case, hybrid integration is already available at a prototype level; monolithic integration will follow once issues related to thermal mismatch between the two processes are solved. Along the same line, digital micromirror devices (DMDs), micrometer-size mirrors realized on top of a CMOS circuit that controls their 3-D movement, are already used in projectors and TV sets [34]. A more ambitious path will be the integration not just of different functions within one material system, but also of different technologies, for instance, Si and III-V semiconductors. Several examples can be provided to indicate how relevant such integration would be. Given that interconnection delay is already the bottleneck for the speed of an integrated circuit, optical links would provide an obvious solution. Nonlinear optical passive components can be fabricated in Si technology, which could provide guiding, routing, and other optical functionality directly on chip. Light sources, though, are still a domain of compound semiconductors like GaAs and InP. Thus, the integration of light-emitting diodes (LEDs) or lasers realized with such materials is necessary. Lattice mismatch of the semiconductors, thermal mismatch of the processes, and fabrication compatibility pose serious challenges to integration, even at a hybrid level.
Beside optics and radio frequency (RF), the integration with CMOS of other functions such as sensing, biological screening, or information storage can be of great interest. A review of some “more than Moore” devices and a brief discussion of the open challenges are provided below. Such a concept is nicely summarized in the scheme of a possible future chip shown in Fig. 8, which would be based on CMOS technology but would incorporate several other functionalities coming from alternative technologies [35]. The integration can be achieved directly on chip, requiring that the new technologies be fully compatible with CMOS. This is referred to as a “system-on-chip” approach. Alternatively, a 3-D integration is possible, where several chips, possibly realized with different technologies are stacked on top of each other (“system on package”).

Fig. 8. Illustration of the integration of many technologies on a single CMOS substrate [35].

### A. RF Technologies

Wireless communication has witnessed an unprecedented (and maybe unexpected) development in recent years. It is therefore more and more important to effectively bridge the two worlds of digital information processing and of RF transmission. If one looks at a modern cellular phone, a variety of specialized modules are present which perform specific tasks. Ideally, one unique chip should incorporate all necessary functionalities. RF functions are provided by antennas, filters, switches, and converters, which in turn use RF transistors, mechanical filters, and other active or passive devices. Many new materials and concepts have been introduced recently, which might have an important impact on RF applications in the future. In addition, they might help in the integration with CMOS technology.

MEMS technology has reached maturity and commercial success in recent years thanks to its application in electronic components for automotive and mobile communication [36]. One of its appealing features is the full compatibility with CMOS technology. By scaling MEMS to the nanoscale, further RF functionalities could be reached (e.g., nanoelectromechanical system (NEMS) resonators). Traditionally, some RF components such as the quartz crystals used in the reference oscillator are kept off-chip. In fact, integration leads to very poor quality factors and temperature instability, mainly due to the poor performance of the integrated inductors and capacitors. The best opportunities for miniaturization and integration of reference oscillators are provided by capacitively transduced microelectromechanical and nanoelectromechanical (MEM/NEM) resonators, which have reached in recent years operating frequencies of several gigahertz. Scaling to nanometer dimensions poses several problems connected, e.g., to fluctuations, friction, and dissipation mechanisms at the nanoscale. Nanostructures such as platinum [37] or Si [38] nanowires and carbon nanotubes (CNTs) [39], [40] have been used and demonstrated in NEMS. They are appealing due to their high stiffness, low density, defect-free structure, and ultrasmall cross section. Recently, graphene has also attracted considerable attention for these applications [41].

The signal produced by this class of devices is very small and impedance matching can be a problem. One alternative is provided by movable gate transistors, which combine the advantage of a vibrating micro/nanostructure with the large output signal provided by the transistor drain current. Highly scaled versions of the in-plane resonant gate transistors with a front-end process have been reported based on silicon-on-nothing technology [42]. However, the sub-100-nm gaps and 400-nm-thick single crystal resonators suffered from poor electron mobility. Nevertheless, full CMOS compatibility is guaranteed. An alternative structure has been proposed, the vibrating-body FET, where a combined effect due to modulation of the carrier density and of the piezoresistance in the channel is achieved. Si nanowires are the ideal channel material due to their pronounced piezoresistive properties.

Another appealing candidate for an RF oscillator that is fully CMOS compatible is provided by spin transfer torque effects [43]. In nanosized magnetic multilayer structures, metallic spin valves and magnetic tunnel junctions can drive uniform precession of the free-layer magnetization under an external input (provided by a magnetic field or an electrical current). This precession produces voltage responses that make those magnetic multilayers high-frequency spin torque oscillators. Oscillation frequencies ranging from several hundred megahertz to tens of gigahertz have been demonstrated [44]. The challenges that need to be overcome for practical applications include 1) reaching output powers in the milliwatt range, 2) improving the spectral purity of the oscillator, and 3) realizing auto-oscillating structures, thus eliminating the need for external magnetic fields.

RF mixers are an important building block of an RF front end. One candidate technology is the resonant tunneling diode [45]. Another candidate that should have provided low-noise solutions at RF, the single-electron transistor, could only demonstrate interesting performance at low temperature [46]. In principle, carbon-based devices possess the necessary nonlinearities in electrical characteristics to demodulate an AM signal [47], [48]. The main challenge consists, once again, in the integration of such technologies with CMOS [49].

It could be mentioned here that an alternative technology for RF applications (at least up to a few megahertz) has been developed in recent years, based on polymeric devices and circuits. The advantages of such components are the low cost of their fabrication, the independence from the type of substrate used, and the possibility of large area manufacturing. Full organic radio-frequency identification (RFID) circuits including electronics and antennas have been realized. Such circuits include more than 1000 organic thin film transistors (OTFTs) and can operate up to 13.5 MHz [50]. Organic electronics [51] will not compete in speed with CMOS-based solutions, since the low mobility of organic semiconductors (typically below 1 cm$^{2}{\hbox {V}}^{-1}{\hbox {s}}^{-1}$) limits the maximum achievable frequency. Nevertheless, applications such as active matrix backbones for OLED displays, ultralow-cost RFID systems, or biocompatible/disposable sensors can be envisaged.
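The link between carrier mobility and achievable frequency can be illustrated with the textbook long-channel estimate of the transit frequency, $f_T \approx \mu V_{ov}/(2\pi L^{2})$. The sketch below is only an order-of-magnitude illustration: the formula neglects parasitic overlap capacitances and contact resistance, and the channel length and overdrive voltage used in the example are hypothetical values, not taken from [50] or [51].

```python
import math

def transit_frequency_hz(mobility_cm2_per_vs: float,
                         overdrive_v: float,
                         channel_length_um: float) -> float:
    """Long-channel estimate f_T ~ mu * V_ov / (2 * pi * L^2).

    Assumption: ideal square-law transistor with no parasitic
    capacitances, so this is an optimistic upper bound.
    """
    mobility_m2 = mobility_cm2_per_vs * 1e-4      # cm^2/Vs -> m^2/Vs
    length_m = channel_length_um * 1e-6           # um -> m
    return mobility_m2 * overdrive_v / (2.0 * math.pi * length_m ** 2)

if __name__ == "__main__":
    # Hypothetical organic TFT: mobility 1 cm^2/Vs, 5 um channel, 10 V overdrive.
    f_t = transit_frequency_hz(1.0, 10.0, 5.0)
    print(f"Estimated f_T: {f_t / 1e6:.1f} MHz")
```

Under these assumed values the estimate lands in the megahertz range, which is consistent with the frequency regime quoted above for organic circuits; because $f_T$ scales as $1/L^{2}$, shrinking the channel is the main lever when mobility is fixed.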

### B. Optical Technologies

As mentioned earlier, interconnections have become one of the limiting factors of the speed of integrated circuits. Moving to optical interconnect would greatly enhance the available bandwidth, reduce the heat dissipation on-chip, and assure immunity from electrical interference. Si or Si-compatible optical components have long been demonstrated [52], [53]. Attempts to obtain efficient Si-based light sources have instead not been very successful. Si-based micro/nanostructured devices have been introduced [54]. Considerable interest has been attracted by the demonstration of a Si laser based on the Raman effect [55]. Currently, only optical pumping has been demonstrated. For any realistic application, electrical pumping has to be achieved. The field of passive components (waveguides, filters, connectors) has witnessed considerable advances with the discovery of photonic bandgap structures [56]. Nanotechnology has allowed researchers to realize 3-D, 2-D, or 1-D periodic structures with tailored spectral transmission properties. By inserting defects into an otherwise perfectly symmetric lattice, it is also possible to deflect a light beam over distances of a few nanometers. Thus, an unconventional way to guide and deflect light on a chip can be fabricated with unprecedented properties [57]. Furthermore, by exploiting the properties of surface plasmons in quantum dots structures, the absorption and/or transmission properties of materials and surfaces can be engineered [58]. Being based on Si or on Si-compatible materials, such optical components can be easily integrated with CMOS.

Another optical component which is fully based on Si technology is the CMOS imager, which has witnessed considerable commercial success in recent years mostly due to its use in camera phones. Keys to this success have been features inherent to CMOS technology, such as size, weight, power consumption, mechanical robustness, and price. Two challenges remain on the agenda: the extension of the detectable range, especially to infrared, and scaling of the optical components to keep pace with the CMOS miniaturization.

Typically, CMOS imagers employ a Si photodetector or phototransistor as optical sensor. Spectrally, the sensitivity of such components is limited to the visible range because of the Si energy gap. In order to move into the infrared and far infrared range, which is very attractive for a variety of applications in the fields of security, screening, and environmental monitoring, a possibility is to use small gap semiconductor materials. Photodetectors and imagers using, e.g., InAs or CdTe have been demonstrated and even commercialized [59]. The main problems with such technologies are, on the one hand, the need to cool down the detector in order to have an acceptable noise level and, on the other hand, the difficulty of integrating III-V and II-VI materials with CMOS. An alternative approach relies on a different concept. Rather than converting the optical signal into an electrical one via creation of electron-hole pairs following photon absorption, one can use the change in some property of the sensing material, for instance, resistance or temperature, under illumination. The most successful component of this type is the bolometer, which can be structured into arrays and driven by CMOS circuitry. A thorough review of infrared detector technologies can be found in [60]. Infrared uncooled cameras based on microbolometers integrated on CMOS are available on the market. They provide acceptable performance but they are quite expensive.
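The role of the energy gap can be made concrete with the photon-energy relation $\lambda_c = hc/E_g$: photons with wavelengths longer than $\lambda_c$ do not carry enough energy to create electron-hole pairs. The following is a minimal sketch; the bandgap values (about 1.12 eV for Si and about 0.354 eV for InAs at room temperature) are commonly quoted textbook figures assumed here for illustration, not values given in this paper.

```python
PLANCK_J_S = 6.62607015e-34        # Planck constant, J*s
LIGHT_SPEED_M_S = 2.99792458e8     # speed of light, m/s
ELECTRON_VOLT_J = 1.602176634e-19  # 1 eV in joules

def cutoff_wavelength_um(bandgap_ev: float) -> float:
    """Longest wavelength (in micrometers) absorbed across a gap of bandgap_ev."""
    wavelength_m = PLANCK_J_S * LIGHT_SPEED_M_S / (bandgap_ev * ELECTRON_VOLT_J)
    return wavelength_m * 1e6

if __name__ == "__main__":
    # Assumed room-temperature bandgaps (textbook values, for illustration only).
    for name, gap_ev in (("Si", 1.12), ("InAs", 0.354)):
        print(f"{name}: E_g = {gap_ev} eV -> cutoff near {cutoff_wavelength_um(gap_ev):.2f} um")
```

With these assumed gaps, the Si cutoff falls just beyond the visible range, while the smaller InAs gap moves it several micrometers into the infrared, which is why small gap materials are considered for infrared imaging.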

Photodetectors can be realized also using conductive polymers [61]. The advantages of such materials are: 1) the possibility to realize devices, circuits, and systems on large area substrates, which in turn can be of different nature (e.g., glass, plastic, textile, and paper); and 2) the reduced fabrication cost since the material can be processed from solutions. Thus, cheap preparation techniques such as ink jet printing, spin coating, and spray casting can be used [62]. The component that has received great attention not only in the research field but also on the market is the organic light-emitting diode (OLED) [63]. In fact, OLED displays have been introduced in several cellular phones and smartphones. This is the first time that an organic device, based on conductive polymers, has entered a large volume market. Besides OLEDs, other electronic and optoelectronic organic components have been demonstrated [64]. Organic solar cells based on blends of conducting polymers have also been fabricated with roll-to-roll processes, displaying a conversion efficiency of a few percent [65]. Due to the flexibility in fabrication methods, organic photodetectors (OPDs) can be integrated onto CMOS. Inverted structures for OPDs that can directly be fabricated on CMOS as end-of-the-line process have been demonstrated [66]. CMOS imagers with OPD active pixels would guarantee a much larger fill factor with respect to Si photodetectors. In connection to IR imagers, either low gap polymers [67] or hybrid systems combining polymers with quantum dots [68] have displayed room temperature sensitivity in this wavelength range. Their use for hybrid CMOS imagers would allow the realization of IR imagers at a cost comparable to conventional CMOS imagers.

Concerning pixel reduction, current technology allows for 2.2-$\mu$m pixel pitch, and demonstrations exist for 1.7 $\mu$m. Pixel size reduction in active pixel sensors is crucial since it leads to higher numbers of pixels at almost constant price. A further reduction is challenging, both due to limited optical capabilities and to signal noise. A large part of today's imager cost is taken by lenses, whose complexity is bound to increase with miniaturization. In order to reverse such a trend: 1) innovative strategies exploiting coherent effects at the nanoscale have to be found, for instance, exploiting plasmonics and photonic bandgaps; and 2) image correction via on-chip computation will have to be implemented, fully exploiting the capability of the CMOS chip.
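The benefit of a smaller pitch follows from simple geometry: for a fixed sensor area, the pixel count scales with the inverse square of the pitch. A minimal sketch of this arithmetic is given below, assuming square pixels that tile the sensor completely; the 4 mm by 3 mm sensor size in the example is a hypothetical value chosen only for illustration.

```python
def pixel_count(sensor_width_mm: float, sensor_height_mm: float, pitch_um: float) -> int:
    """Number of whole square pixels of the given pitch that fit on the sensor."""
    columns = int(sensor_width_mm * 1000.0 // pitch_um)
    rows = int(sensor_height_mm * 1000.0 // pitch_um)
    return columns * rows

def pixel_gain(old_pitch_um: float, new_pitch_um: float) -> float:
    """Ideal ratio of pixel counts at constant sensor area: (old / new) squared."""
    return (old_pitch_um / new_pitch_um) ** 2

if __name__ == "__main__":
    # Hypothetical 4 mm x 3 mm sensor, using the two pitches quoted in the text.
    for pitch in (2.2, 1.7):
        print(f"pitch {pitch} um -> {pixel_count(4.0, 3.0, pitch) / 1e6:.2f} Mpixel")
    print(f"ideal gain from 2.2 um to 1.7 um: {pixel_gain(2.2, 1.7):.2f}x")
```

Under these assumptions, moving from a 2.2-$\mu$m to a 1.7-$\mu$m pitch increases the pixel count by roughly two thirds on the same silicon area.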

### C. Sensing Technologies

The computational power and the maturity of CMOS technology can be of great advantage in the sensor field, where environmental parameters have to be determined and corresponding actions undertaken. Since the signals to be sensed are mostly nonelectrical in nature, appropriate transducer elements are needed. Examples of external physical stimuli are mechanical (pressure, motion, vibration), electrical (voltage), thermal (temperature difference), electromagnetic (light), chemical (presence of a particular chemical species), etc. In response to the external stimulus, the transducer generates an electrical signal that is further processed by accompanying circuitry and is used to provide actionable information to the end user. Sensor metrics include sensitivity, selectivity, and repeatability. Nanotechnology can provide adequate solutions in the form of novel materials and structures with high sensitivity that can be integrated into the Si mainframe technology. Two kinds of transducers are currently receiving considerable attention for sensing: 1-D structures (e.g., nanowires and nanotubes) [69], [70] and NEM devices [71].

Sensors may potentially be everywhere, providing instrumentation for the state of the environment, security, supporting regulation of different processes, etc. It is necessary in many applications for the sensors to communicate their data to a central information collection/decision-making authority. This gives rise to the need to establish sensor communication networks that can be used, for instance, to create autonomic systems that are user-transparent, self-healing, self-configuring, self-optimizing, and self-protecting, analogous to many of the functions of the human nervous system such as control of heart rate, breathing, etc. Of course, sensor networks in general represent an important field of endeavor where issues of configuration, optimum communication protocols, and information carrying capacity are essential concerns [72].

One example of the many application areas for sensors is in fields related to biology. The state of the living system can be monitored by sensing different physical parameters, e.g., chemical, electrical, optical, thermal, magnetic, etc. There are indications that 1-D structures, such as semiconductor nanowires and CNTs, may offer superior sensitivity to planar devices and allow for picomolar detection of biomolecules [73]. An additional attractive feature of 1-D structures is that they might lend themselves to minimally invasive probes to contact or even puncture the cellular membrane, or even to be ingested into the cell itself. This suggests the intriguing possibility of electrically monitoring processes inside the cell [74]. Recently, it has been shown that nanowires and nanotubes can also be used not individually but rather as a conductive film, which can be used as semitransparent electrodes, sensors, transistors, and in general, for flexible and stretchable electronics [75], [76], [77], [78], [79]. Such solution-based technology is compatible with CMOS.

In many applications, sensors should operate in a standalone mode at extremely low levels of energy consumption. In some cases, operational energy could be harvested from the sensor environment, e.g., in the form of solar, thermal, electromagnetic, or mechanical energy. Currently, there is continuing progress in miniature energy harvesting devices to support autonomous operations of sensor units [80], [81]. Understanding of maximum performance potential of such energy harvesting devices, given practical size constraints, requires further studies. In parallel with the need to scavenge energy from the environment, there will be an increasing need to store it in batteries or capacitors.

Nanostructures and nanodevices, as well as novel materials, can be decisive in the search for efficient power solutions for future electronic systems. Some very promising results have already been obtained. Among them, the so-called third-generation solar cells promise enhanced efficiency and/or reduced costs by using quantum nanostructures or organic semiconductors [82]. CNT or graphene sheets provide ideal solutions for compact, long-lasting miniature supercapacitors [83]. Nanowires possess optical, electrical, and thermoelectric properties which can be useful in a number of energy-related applications [84]. In most cases, the technologies and devices just mentioned can be used as standalone technologies or in hybrid systems integrated on CMOS.

SECTION V

## BEYOND MOORE

### A. Terminology and Context

There is an international effort underway to identify an alternative to the CMOS transistor, which within one-to-two decades will no longer submit to feature size and voltage scaling [22]. Many of these alternative devices operate using state variables other than charge and some of them may offer functionalities beyond those of a binary device that could be useful for more complex operations. Indeed, the choice of state variable for a device not only has ramifications for device performance but echoes up the abstraction hierarchy to impact device-to-device communication, achievable chip complexity, and ultimately system performance capability. The dependency is depicted in Fig. 9. The symbols in Fig. 9 have the following meaning:

- $L_{sw}$: the smallest device (switch) feature, e.g., gate length in CMOS;
- $t_{sw}$: device switching time, i.e., time required to change state;
- $E_{sw}$: the energy required to change the device state (switching energy);
- $N_{\rm car}$: the number of information carriers required to transmit state to downstream devices;
- $M$: the device count, a measure of system complexity;
- $\beta$: binary information throughput, a measure of technological capability;
- $\mu$: instructions per second, a measure of information processor capability.

The state of a binary switch is that minimum set of physical variables that fully describe the system and its response to a given set of control variables. In characterizing the functionality of various candidate devices, it is important to draw a distinction between the physical entities used in their realization and the properties of these entities utilized in the operation of the device, which we refer to as variables. For example, physical entities might include electrons, atoms, ferromagnetic (FM) domains, etc. Associated with these physical entities are properties such as charge, spin, magnetic dipoles, etc.; the same entity might be used in two different devices, each exploiting a different property of that entity. In the following, “property” is used as a synonym for the word “variable” to agree with conventional usage.

Fig. 9. State variable and different facets of information processing system.

Now each device has input, state, and output variables: for example, the FET utilizes the electron as the physical entity, and the properties of charge are used for the input, output, and state variables. On the other hand, the spinFET utilizes electrons but it is controlled by electron spin, its state is defined by spin, and its output is transferred as charge.

Table 5 provides a tabulation of physical entities and the properties employed by several of the emerging devices. Also, an expanded taxonomy employed by the ITRS Emerging Research Devices Chapter [22] is shown in Fig. 10.

TABLE 5 Taxonomy for Candidate Information Processing Devices
Fig. 10. ITRS taxonomy for information processing nanotechnologies [22].

### B. Novel Device Examples

#### 1) III-V, Ge Channel, and Nanowire FET

It is well known that III-V compound semiconductors are ideal candidates for high-speed devices (several tens of gigahertz), due to their excellent bulk electron (e.g., 33 000 cm$^{2}{\hbox {V}}^{-1}{\hbox {s}}^{-1}$ for InAs and 80 000 cm$^{2}{\hbox {V}}^{-1}{\hbox {s}}^{-1}$ for InSb) and hole (1250 cm$^{2}{\hbox {V}}^{-1}{\hbox {s}}^{-1}$ for InSb and 850 cm$^{2}{\hbox {V}}^{-1}{\hbox {s}}^{-1}$ for GaSb) mobilities. The integration of GaAs and InP on Si substrates has been long sought but never achieved. Advances in epitaxial techniques have recently offered new perspectives on this challenge. In particular, Sb-based compound semiconductors are seen as realistic CMOS channel replacement materials due to the high mobilities for both electrons and holes [85], [86]. A further appealing system is provided by InAs, which can be grown in the form of nanowires directly on Si substrates with excellent material quality [87]. The major challenges include the need for high-quality, high-$k$ gate dielectrics (if MOSFETs are going to be used), damage-free low-resistivity junctions, and heterointegration on very large-scale integration (VLSI)-compatible Si substrates. Similarly to III-V semiconductors, germanium (Ge) is also a potential channel replacement material because of its excellent bulk electron mobility of 3900 cm$^{2}{\hbox {V}}^{-1}{\hbox {s}}^{-1}$, almost three times higher than in bulk Si. Unfortunately, the poor quality of the Ge/dielectric interface has resulted in much lower mobilities in fabricated transistors [88]. Strain engineering of Ge $n$-channel MOSFETs has also been studied as a performance booster technology and its effectiveness has been demonstrated at a small strain level. An open issue is whether the low electron saturation velocity in Ge will limit the short channel performance of $n$-channel Ge MOSFETs relative to Si $n$-channel MOSFETs. In conclusion, III-V compound semiconductor and Ge FETs are considered viable candidates to extend CMOS to the end of the Roadmap.

Nanowire FETs are structures in which the conventional planar MOSFET channel is replaced with a semiconducting nanowire. Such nanowires have been demonstrated with diameters as small as 0.5 nm. They may be composed of a wide variety of materials, including Si, Ge, various III-V compound semiconductors (GaN, AlN, InN, GaP, InP, GaAs, InAs), II-VI materials (CdSe, ZnSe, CdS, ZnS), as well as semiconducting oxides (${\hbox {In}}_{2}{\hbox {O}}_{3}$, ZnO, ${\hbox {TiO}}_{2}$) [89]. Nanowires can exhibit quantum confinement behavior, i.e., 1-D conduction, that can lead to the reduction of short channel effects and other limitations to the scaling of planar MOSFETs. Vapor–liquid–solid (VLS) growth mechanism has been used to demonstrate a variety of nanowires, including core-shell and core-multishell heterostructures [90], [91]. Heterogeneous composite nanowire structures have been configured in both core-shell and longitudinally segmented configurations using group IV and compound materials. The longitudinally segmented configurations are grown epitaxially so that the material interfaces are perpendicular to the axis of the nanowire. This allows substantial lattice mismatches without significant defects. Vertical transistors have been fabricated in this manner using Si, InAs, and ZnO, with quite good characteristics [92], [93], [94]. The small lateral dimension of the nanowires allows their direct growth on lattice-mismatched substrates without the typical problems of dislocations and defects encountered in films. Thus, for instance, InAs of very good morphological quality has been grown directly on Si. Circuit and system functionality of nanowire devices has been demonstrated, including individual CMOS logic gates and other prototype circuit elements [95], [96]. Still a lot of work is needed to minimize parasitic components and achieve the high frequencies which have been predicted.

One of the crucial parameters controlling the power dissipation of CMOS devices is the subthreshold swing. In conventional MOSFETs, the thermal injection of carriers from the source to the channel sets a room temperature limit value of 60 mV/dec.
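The 60 mV/dec figure follows from the standard expression for the ideal thermionic limit of the subthreshold swing, $S = \ln(10)\,kT/q$. The short sketch below evaluates this expression; it assumes an ideal gate coupling (body factor of one), so real MOSFETs sit at or above the values it prints.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23       # Boltzmann constant, J/K
ELEMENTARY_CHARGE_C = 1.602176634e-19  # elementary charge, C

def thermal_swing_mv_per_decade(temperature_k: float) -> float:
    """Ideal thermionic subthreshold swing ln(10) * kT/q, in mV per decade."""
    thermal_voltage_v = BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C
    return math.log(10.0) * thermal_voltage_v * 1e3

if __name__ == "__main__":
    for temperature in (77.0, 300.0, 400.0):
        swing = thermal_swing_mv_per_decade(temperature)
        print(f"T = {temperature:5.1f} K -> {swing:.1f} mV/dec")
```

At 300 K the expression gives just under 60 mV/dec. Since the limit scales linearly with temperature, it can only be beaten at room temperature by changing the carrier injection mechanism rather than by further scaling of the conventional MOSFET.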

Tunnel FETs based on a gated p-i-n junction are expected to display an abrupt ${I}_{\rm on}/{I}_{\rm off}$ transition, thus lowering the subthreshold swing below the intrinsic MOSFET limit [97]. Such improvement is intrinsically connected to the quantum mechanical band-to-band tunneling process [98], which reacts sharply to variation of the gate voltage. High-performance tunnel FETs have been explored using low bandgap materials like Ge [99], SiGe [100], or based on Si nanowires [101] and CNTs [102]. A major challenge is the integration of such materials and structures on advanced Si platforms [103].

A completely different type of switch can be achieved by exploiting the mechanical displacement of a solid beam controlled electrostatically to create a conducting path between two electrodes [104]. Such a micro/nanoelectromechanical (M/NEM) switch has two major advantages with respect to MOSFETs: negligible leakage and negligible subthreshold swing. Thus, standby energy dissipation as well as dynamic energy consumption can be drastically reduced. The most recent developments suggest that M/NEM switches are attractive for ultralow-power digital logic applications. In addition, it is expected that the energy performance as well as the functional densities can largely improve with scaling. M/NEM switches can be fabricated by top-down approaches using conventional lithography techniques on Si, reaching actuation gaps as small as 15 nm [105]. Alternatively, bottom-up approaches employing CNTs [106] or Si nanowires [107] have been followed. In all cases, the leakage was virtually zero. The main weakness is switching speed, as the beam requires around 1 ns to move from the off position to the on position. A further challenge for M/NEM switches is the control of the surface forces and the reliability of the contacts.

#### 2) Carbon Electronics

The previous century has been the Silicon Century. The pervasion of electronic and optoelectronic devices in whole sectors of society has been made possible mostly due to the success of CMOS technology (along the line of Moore's law). The new century might be the Carbon Century. Diamond has very interesting semiconducting properties, for instance, great heat and charge conductivity. Unfortunately, it is very difficult to obtain diamond in crystalline form at wafer level. Twenty years ago, CNTs were discovered. Some of their attributes make them very appealing in view of the miniaturization of electronic components. Despite a huge research effort, CNTs have not yet found a real application in nano and optoelectronics. Part of the problem is the difficulty of controlling the exact morphology (which in turn determines the CNT electronic properties) in a reliable and reproducible way. Recently, applications for CNT networks have emerged which make such systems competitive with polymer materials for large area, low-cost electronics, and optoelectronics. A new carbon-made material has now appeared on the scene, receiving a great deal of attention. Graphene, a 2-D hexagonal grid of carbon atoms, has unique electronic, electrical, optoelectronic, and mechanical properties. It is therefore an appealing candidate for a variety of components, e.g., transistors, sensors, electrodes, and lasers. Although it is too early to forecast the market impact of graphene, the academic and industrial communities as well as the funding agencies are betting strongly on that novel nanomaterial. In the following, we will briefly discuss some of the important achievements for carbon-based devices and outline the main challenges.

CNT FETs are attractive because of the high mobility of charge carriers, the intrinsically small dimensions, and the possibility of minimizing short channel effects via all-around gate geometry. In the past two years, significant advances have been made in fabricating and characterizing CNT FETs [108], [109]. For instance, transistors with 15-nm channel length displayed no short channel effects and a transconductance of 40 $\mu$S for a single channel [110]. Frequencies as high as 15 GHz have been reached [111]. Nevertheless, major challenges remain, in particular concerning the ability to control 1) bandgap energy and nanotube chirality with sufficient precision for industrial applications; 2) the positioning of the nanotubes in required locations and directions; 3) the deposition of a gate dielectric; and 4) the formation of low-resistance electrical contacts.

Thanks to the extremely high electron mobilities, graphene is an ideal material for RF transistors [112], [113], [114]. Very high values of cutoff frequency have been demonstrated, in excess of 200 GHz [115]. In order to achieve better performance, the quality of the source and drain contacts has to be improved, especially in the top gate configuration. Graphene FETs were first based on exfoliated graphene to form a transistor channel, which offers the highest mobility, but is hardly manufacturable [116]. Recently, epitaxial graphene on SiC substrates and chemical vapor deposition (CVD)-grown graphene on, e.g., copper foils, have been obtained [117], [118]. Back-gated graphene FETs with ${\hbox {SiO}}_{2}$ dielectric were typically shown to have room temperature field-effect mobilities up to around 10 000 cm$^{2}{\hbox {V}}^{-1}{\hbox {s}}^{-1}$ [119]. Suspended graphene or graphene sheets on flat and inert substrates such as boron nitride can reach mobilities above 100 000 cm$^{2}{\hbox {V}}^{-1}{\hbox {s}}^{-1}$ at room temperature [120], [121]. In top gate devices, lower mobilities are found, possibly because of a degradation of the channel properties when the gate dielectric is deposited [122]. Due to the peculiar band structure of graphene, electron and hole mobilities are similar. It also displays no energy gap, at least for extended single sheets. One crucial consequence is bipolar transport, which implies very small ${I}_{\rm on}/{I}_{\rm off}$ ratios. This is of course a major limitation for digital applications. Several methods to open a bandgap have been proposed, as, for instance, through the use of graphene nanoribbons [123].

#### 3) Memristors

Recently, interest in hysteretic devices has risen in the context of nonvolatile memories. Such devices, named memristors, were pioneered in the work of Chua in the 1970s [124]. There, he indicated the memristor as the missing element, in addition to inductors, resistors, and capacitors, needed for a coherent description of electronic circuits. Much later, other groups rediscovered the definition in connection to nonlinear elements embedded in a crossbar architecture [125].

One possible memristor structure can be based on a polymeric film sandwiched between two metal electrodes [126], [127]. As pointed out earlier in connection to polymeric devices, the main motivation for using such material is the low fabrication cost. On the other hand, scaling has not been widely discussed. Although polymeric resistive memory arrays have been demonstrated, including a 3-D stack of three active layers, the memory operation mechanisms are still unclear [128]. Some research suggests that the changes in resistance could be due to intrinsic molecular mechanisms, charge trapping, or redox/ionic mechanisms [129].

Another type of memristive device is the so-called “atomic switch,” basically an electrochemical switch based on the diffusion of metal cations and their reduction/oxidation processes to form/dissolve a metallic conductive path [130]. The metal atoms are introduced into the ionic conductive materials from a reversible electrode. The atomic switch was initially developed as a two-terminal device using sulfide materials that were embedded in a crossbar architecture with scalability down to 20 nm [131]. Later, an atomic switch using fully CMOS compatible materials was developed to enable the formation of these devices in the metal layers of CMOS devices. This configuration resulted in the development of a new type of programmable logic device [132]. Three-terminal atomic switches characterized by high ${I}_{\rm on}/{I}_{\rm off}$ ratio, low on-resistance, nonvolatility, and low-power consumption have also been demonstrated [133]. Several operating mechanisms have been proposed, including gate-controlled formation and annihilation of a metal filament, and gate-controlled nucleation of a metal cluster, but no complete understanding of the process currently exists. Switching speed, cyclic endurance, uniformities of the switching bias voltage, and resistances both for the on-state and the off-state should be improved for general usage as a logic device [134].

In a variety of materials, ion migration combined with a redox process can cause a change in resistance of a metal–insulator–metal structure [135]. For instance, for a silver electrode, ${\hbox {Ag}}^{+}$ cations can drift through the insulator in the presence of an applied voltage, forming a highly conductive filament connecting the metal electrodes and resulting in the on-state of the cell. When the applied voltage is reversed, an electrochemical dissolution of these filaments takes place, resetting the system into the high-resistance off-state [136]. In the case of transition metal oxides, such as ${\hbox {TiO}}_{2}$, the motion of oxygen vacancies is responsible for the change in the cell resistance. In a third class of materials, a unipolar thermochemical mechanism leads to a stoichiometry change due to a current-induced increase of the temperature. In some cases, a formation process is required before the bistable switching can be started. Since the conduction is often of filamentary nature, memories based on this bistable switching process can be scaled to very small feature sizes. The switching speed is limited by the ion transport, typically rather slow. Thus, the distance between the electrodes has to be limited to a few nanometers. Although the microscopic nature of the switching process has yet to be understood in detail, recent experimental demonstrations of scalability, retention, and endurance are encouraging [137].

From an architectural point of view, memristive devices could be coupled with two-terminal select devices in order to build passive memory arrays (crossbars) [138]. The general requirements for such two-terminal switches are sufficient on-currents at proper bias to support read and write operations and a sufficient on/off ratio to enable selection even in the absence of a transistor. These specifications are quite challenging and severely limit the maximum size of a crossbar array [139]. Currently, two approaches to integrating a two-terminal select device with a storage node are being pursued. The first approach integrates the external select device in series with the storage element in a multilayer stack. The second approach uses a storage element with inherent nonlinear properties. The simplest realizations of two-terminal memory select devices use semiconductor diode structures, possibly in a back-to-back configuration for bipolar memory cells. Alternatively, a selector exhibiting resistive switching behavior could be used. That is, the selector works on the same principle as the storage element, the main difference being that it can be volatile. One possible device is based on a metal–insulator transition and exhibits a high resistance for voltages below a given value. As an example, a VO$_{2}$-based device has been demonstrated as a select device for a NiO$_{x}$ resistive random access memory (RRAM) element [140]. The main challenge for switch-type select devices is to identify the right material and the switching mechanism to achieve the required reliability, drive current density, and ${I}_{\rm on}/{I}_{\rm off}$ ratio.

In addition to memories, it has been suggested that logic gates can also be built using memristors [141]. Furthermore, neuromorphic architectures based on memristive crossbars have been investigated [142], [143].

#### 4) Molecular Electronics

One approach to beyond CMOS electronics is based on the use of single conductive molecules [144], [145]. Due to their intrinsically small size and the possibility to use self-assembling techniques, single molecules could be an alternative to Si nanostructures for nonvolatile memories, diodes, or switches [146], [147]. In fact, when properly functionalized, single molecules can display nonlinear electrical characteristics and, in some cases, hysteresis [148]. In a molecular memory, data are stored by applying an external voltage that causes a transition of the molecule into one of two possible conduction states. Data are read by measuring resistance changes in the molecular cell. The concept emphasizes extreme scaling; in principle, one bit of information can be stored in the space of a single molecule, namely, a few nanometers. Computing with molecules as circuit building blocks is an exciting concept with several desirable advantages over conventional circuit elements. Because of their small size, very dense circuits could be built and bottom-up self-assembly of molecules into complex structures could be applied. However, major challenges still exist. First, the very nature of molecular conduction and molecular switching has not been fully understood. The role of the metallic leads is not clear, and parasitic effects due to the environment could appear that might determine the transport characteristics of a molecular device. In any case, prototypical molecular memories have been built, which show remarkable endurance and reproducibility [149], [150]. At an architectural level, both molecular quantum cellular automata (QCA) and crossbar structures have been investigated [151], [152].

#### 5) Magnetic Components

Electronic systems combining computing and storage capabilities could be realized based on magnetic structures. Magnetic random-access memories (RAMs) [153] are a mature technology with some products already on the market. The control of single spins of either atoms or electrons has also been proven a promising new way to achieve electronic functionalities. The possibility to build logic circuits with magnetic nanostructures has been demonstrated at a prototypical level. There, a novel architecture based on field coupling [called magnetic quantum cellular automaton (MQCA)] is adopted [154], where the spatial arrangement of coupled nanomagnets can be used to build logic functions and complete circuits. In the following, we will briefly describe some of the suggested magnetic components.

In spin transistors, the current is controlled by the magnetization configuration of the ferromagnetic electrodes or by the spin direction of the carriers [155]. This feature could lead to low-power circuit architectures that are inaccessible to ordinary CMOS circuits. Recently, an experimental demonstration of a spin FET was reported [156], [157]. Oscillatory spin signals controlled by a gate voltage were observed, implying spin precession of spin-polarized carriers in the channel. However, the origin of the observed spin signals is not yet clear. Spin MOSFETs using ferromagnetic electrodes have also been proposed but not yet demonstrated [158].

Spin wave devices (SWDs) are a type of magnetic logic exploiting collective spin oscillation (spin waves) for information transmission and processing [159]. The spin waves are generated in a magnetoelectric cell which is driven by external voltage pulses. Such a cell also acts as a detector and storage element. The information is encoded into the initial phase of the spin wave. Spin waves propagate through spin wave buses and interfere at the points of junction constructively or destructively, depending on the relative phase. The result of computation can be stored in the magnetization or converted into a voltage pulse by the output magnetoelectric cells. The primary expected advantages of SWDs are: 1) the ability to utilize phase in addition to amplitude for building logic devices with fewer elements than required for a transistor-based approach; 2) nonvolatile magnetic logic circuits; and 3) parallel data processing on multiple frequencies in the same device structure by exploiting each frequency as a distinct information channel. Prototypes operating at room temperature and at gigahertz frequency have been demonstrated.
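The phase-encoding and interference idea can be illustrated with a small numerical sketch. The encoding (bit 0 as phase 0, bit 1 as phase $\pi$), the equal amplitudes, and the function name `spin_wave_output` are assumptions made here for illustration only; they are not taken from [159].

```python
import cmath
import math

def spin_wave_output(bits):
    """Superpose equal-amplitude waves whose initial phase encodes a bit.

    Assumed encoding (illustrative): bit 0 -> phase 0, bit 1 -> phase pi.
    Returns (output_bit, amplitude) read from the summed complex amplitude.
    """
    total = sum(cmath.exp(1j * math.pi * b) for b in bits)
    amplitude = abs(total)
    # Positive real part: phase-0 waves dominate; negative: phase-pi waves dominate.
    output_bit = 0 if total.real > 0 else 1
    return output_bit, amplitude

if __name__ == "__main__":
    for bits in [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]:
        print(bits, spin_wave_output(bits))
```

With an odd number of inputs, the phase of the superposed wave follows the majority of the input phases, while the amplitude shows whether the interference was fully constructive or partly destructive.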

In nanomagnetic devices, binary information can be encoded in the magnetization state. Fringing field interactions between neighboring nanomagnets can be used to perform Boolean logic operations [160]. A functionally complete logic set based on nanomagnets has been demonstrated [161]. In addition, nanomagnetic devices have nonlinear response characteristics, the output of one device is capable of driving another, power amplification (or gain) is present, and dataflow directionality can be obtained. Nanomagnet logic (NML) has therefore a great potential for low-power applications. A clock modulates the energy barriers between magnetization states in an NML circuit. Recently, experimental demonstrations of individual island switching as well as the reevaluation of NML lines and gates with CMOS-compatible clock structures have been reported [162]. Furthermore, NML appears to be scalable to the ultimate limit of using individual atomic spins. Whether a circuit ultimately exhibits reliable and deterministic switching is a function of how it is clocked—and requires additional study.

Field coupling via magnetic interaction belongs to a novel class of architectures called MQCA [154]. A cellular automaton (CA) is an array of cells, organized in a regular grid [163], [164]. Each cell can be in one of a finite number of states from a predefined state set, which is usually a set of integers. The state of each cell is updated according to transition rules, which determine the cell's next state from its current state as well as from the states of the neighboring cells. The functionality of each cell is defined by the transition rules of the CA. Typically, each cell encodes one bit into a single electrical or magnetic dipole. The cell-to-cell communication is guaranteed by magnetic interaction between neighboring dipoles. A QCA architecture has some appealing features: its regular structure has the potential for manufacturing methods that can deliver huge numbers of cells in a cost-effective way. Top-down as well as bottom-up manufacturing methods can be used. Furthermore, the design of a cell can be relatively simple as compared to that of a microprocessor unit, so design efforts are greatly reduced. Wires are completely unnecessary since the cells can interact with their neighboring cells through some physical mechanism. Thus, interconnection delay and power dissipation through interconnects are avoided. Clearly, QCAs also have some drawbacks and challenges. For instance, input and output of data to cells with nanometer dimensions may be difficult. Clocking the cells requires additional wires or external inputs. Speed might be a limiting factor. Room temperature operation has to be assured for any realistic application, which, up to now, has only been demonstrated for magnetic QCAs.
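As a minimal sketch of the transition-rule idea described above, the following code updates a one-dimensional binary cellular automaton in which each cell's next state depends on its own state and its two nearest neighbors. The choice of a 1-D array, wrap-around boundaries, and the particular rule number are assumptions for illustration; they do not model the physics of a magnetic QCA.

```python
def step(cells, rule=110):
    """Advance a 1-D binary cellular automaton by one synchronous update.

    Each cell's next state is determined by (left, self, right). `rule` is a
    rule number whose bit k gives the next state for neighborhood index k.
    Boundaries wrap around (an assumption made for this sketch).
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right
        nxt.append((rule >> index) & 1)
    return nxt

if __name__ == "__main__":
    state = [0, 0, 0, 1, 0]
    for _ in range(4):
        print(state)
        state = step(state)
```

All cells apply the same local rule, which is why the design effort per cell can be small even when the array is very large.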

A concept that combines spin-controlled devices and nanomagnetic logic has been proposed recently [165]. In the all-spin logic (ASL), the information stored in the nanomagnets propagates as spin current in spin coherent channels. Recent advancements have shown that a combination of spintronics and magnetics can provide a low-power alternative to charge-based information processing. Key elements of ASL are the spin injection into metals and semiconductors from magnetic contacts and the switching of magnets by injected spins. Major challenges to be overcome are room temperature operation and a further improvement of the energy-delay product. It should be mentioned that ASL could also provide a natural implementation for biomimetic systems with architectures that are radically different from the standard von Neumann architecture.

#### 6) New Architectures for Beyond CMOS

Research on architectures that exploit the properties of the devices described in this section is at an early stage of development. Many different possibilities exist, two of which are CA and neural-inspired networks. CA typically use a form of nearest neighbor communication and they can be shown to be universal. Theoretical and experimental quantification of CA performance relative to the von Neumann architecture remains an open question [164]. Neural networks take a different approach by seeking to emulate structures in the brain and these have been studied for decades. So far it appears that neural networks can offer advantages for special classes of problems. There are indications that memristors in crossbar arrays may be able to emulate neural behavior. In the next section, a perspective on architectures for computation inspired by the operation of living cells is provided.

SECTION VI

## BIOLOGICAL COMPUTATION: LIVING CELL EXAMPLE

### A. A Basis for Quantitative Comparisons

The reliance of CMOS and many other proposed information technologies on electron charge to support their operations places them at risk as features scale downward into the few nanometer regime. Not only does tunneling become detrimental to performance, but also smaller features usually make the devices more susceptible to minute, manufacturing-induced, variations in material structure and composition. It has been said that the creativity of nature far exceeds that of humans and it seems reasonable to seek inspiration for new information processing technologies from this source. In that which follows, it is argued that the living cell can be viewed as an information processor that is extraordinarily efficient in the execution of its functions. The living cell is, in a sense, a universal constructor as suggested by von Neumann, which is capable of creating copies of itself [166], [167]. The model that is used in the following is the E.coli cell which has dimensions on the order of 1 $\mu$ m and which has been heavily studied so that quantitative estimates of its complexity, performance, and energy efficiency are available. Given this, it is important to point out that many of the mysteries of cell operation are yet unresolved and are the focus of continuing investigations.

In order to provide a benchmark for E.coli cell operation, we first extrapolate the capabilities of a 1-$\mu$ m scale CMOS information processor when end-of-scaling CMOS technology is utilized. Favorable assumptions for the 1-$\mu$ m CMOS cell are offered including the stipulation that no volume is required for energy storage and for communication. A development is then offered, from available data, of the information processing capability of the E.coli cell. It is argued that the information processing capabilities of the E.coli cell far exceed that of the 1-$\mu$ m CMOS cell and inferences are drawn suggestive of directions for future information processing technologies. The terminology in silico is used to refer to the semiconductor benchmark cell and the term in carbo is used to refer to the E.coli cell in the following.

### B. Bio-$\mu$ Cell Information Processor

Are there example information processing systems now extant from which inspiration might be drawn for new technologies? It has been recognized that individual cells, the smallest units of living matter, possess amazing computational capabilities, and are indeed the smallest known information processors [168]. As is argued in a number of studies, individual living cells, such as bacteria, have the attributes of a Turing machine, capable of general-purpose computation [168], [169], [170]. The cell can also be viewed as a universal constructor in the sense of von Neumann because it manufactures copies of itself; it is thus a computer making computers [169].

Just how does the cell go about implementing its information processing system? The cell is a very complex organism and any brief attempt to describe its operations is bound to be inadequate. A vastly oversimplified view of cellular processes is presented below.

A cell's primary functions can be described as follows.

1. Reproduction: making cells by acquiring/processing information from internal storage (DNA) and utilizing the structural building blocks and energy from the nutrients.
The reproduction task requires a massive information processing effort, a crude estimate of which is made in Section VII-A. In short, elementary structural building blocks (22 amino acids and five nucleotides) need to be synthesized or acquired, and then utilized to form functional building blocks, which include different proteins, RNA, and DNA molecules. Finally, all building blocks need to be properly placed within the cell's volume for assembly. A special cell-cycle control mechanism regulates the sequence, timing, etc., of the cell assembly process.
2. Adaptation for survival: Acquiring/processing information from external stimuli with feedback from DNA.
Single-cell organisms, such as E.coli bacteria, could not survive without the ability to sense the environment and adapt to its changes (positive or negative). For example, in response to the external presence of specific nutrients, particular proteins are produced within the cell to facilitate the uptake and digestion of those nutrients. In the absence of nutrients, the cell can switch to a resting mode, where the reproductive process is inhibited. In addition, single-cell organisms can respond to a variety of external stimuli such as temperature, light, presence of toxic chemicals, magnetic field, etc. Many single-cell organisms also possess motility organs (e.g., flagella in the case of E.coli).
3. Extracellular communication: Sending and receiving signals to coordinate community behavior.
Many unicellular organisms communicate with each other by the release and detection of special signal molecules. Cells use chemical signaling to detect population density and to exchange information about the local environment. Cell-to-cell communication coordinates the behavior of a cell population to increase access to nutrients, provide for collective defense, or enable the community to escape in case of threats to its survival.

In the following, we offer a simple estimate of single-cell computational capabilities based on two different approaches. A bottom-up approach counts the cell hardware, i.e., the number of memory and logic elements in the in carbo processor. The top-down approach deals with total amount of computation needed to implement operations to assemble a new cell.

### C. Cell Hardware

Fig. 11 shows a cartoon of a cell as information processor. It contains a localized long-term memory block ${\bf M}$ (DNA molecule), a number of short-term memory and logic units ${\bf L}$ (different protein and RNA molecules), (input) sensors ${\bf S}$ to monitor both outside environment and the cell interior (extracellular and intracellular receptor proteins), and two output units: the ribosomes, where new structural building blocks for reproduction are “printed,” and signaling units that “wirelessly” connect to neighboring cells by sending signal molecules.

Fig. 11. Unicellular organism as information processor.

The cell hardware is made from three types of macromolecules: proteins, DNA, and RNA. Table 6 presents a summary of the statistics and functions of these molecules in E.coli cell. A description of essential features of different parts of the cell hardware is given below.

TABLE 6 Essential Parameters of the E.coli Molecular Processor

#### 1) Logic Hardware

Many proteins (Fig. 12) in living cells have as their primary function the transfer and processing of information, and are therefore regarded as logic elements of the in carbo processor [171], [172], [173], [174]. In fact, as recent studies indicate, the proportion of components devoted to computational networks increases with the complexity of the cell, and such components are absolutely dominant in humans [173]. Proteins can alter their 3-D structural shapes (conformation) in response to external stimuli, and different conformations can represent different logic states. These nanomechanical changes form a state variable, sometimes called a conformon [175]. The essential functions of the protein devices are determined by their conformational states. A simple example of a “binary” conformational change is the ion channel protein, which is embedded in a cell's membrane and acts as a gate for ions, and can be opened or closed depending on commands from either internal or external sources, e.g., light, pressure, chemical signal, etc. Different nanomechanical conformations of these protein devices are recognized by other elements of the in carbo cell circuit by a process based on the selective affinity of certain biomolecules for given conformational states. Molecular recognition implemented with conformons plays a fundamental role in the communication of information packages within the processor, and it facilitates targeted interactions between different elements, e.g., protein–protein, protein–DNA, RNA–ribosome, etc.

Fig. 12. Protein molecule formed from different amino acids (shown as circles of different colors).

The protein conformons control all processes in the cell, such as sensing, signaling, information retrieval, etc. Some examples will be given in the next section.

#### 2) Memory Hardware

All data about the structure and operation of a living cell are stored in the long DNA molecule. DNA coding uses a base-4 (quaternary) system. The information is encoded digitally by using four different molecular fragments, called nucleobases, to represent a state: adenine (A), cytosine (C), guanine (G), and thymine (T). The four molecular state symbols are attached in series to a flexible “tape” or “backbone” made of sugar and phosphate groups. The complete DNA unit consists of two complementary “tapes” forming the so-called double helix. Each state symbol (base) on the first tape forms a pair (base pair) with a complementary state symbol on the second tape: adenine forms a pair with thymine, while cytosine forms a pair with guanine. The information content of each tape is identical, but is written with different (complementary) sequences of symbols. Thus, the base pair (bp) is a natural unit of information stored in DNA. One bp equals two bits of binary information and corresponds to approximately 0.34 nm of length along the tape, as shown in Fig. 13.
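The quaternary coding can be sketched in a few lines. The pairing rules and the 0.34-nm length per base pair come from the description above; the particular assignment of 2-bit patterns to bases (`BASE_FOR_BITS`) is a hypothetical choice made only for this example.

```python
# Assumed (illustrative) mapping of 2-bit values to bases; the text states only
# that one base pair carries two bits, not which pattern maps to which base.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}  # pairing rules
BP_LENGTH_NM = 0.34  # approximate length per base pair along the tape

def encode(bits):
    """Encode a string of '0'/'1' characters (even length) as a base sequence."""
    if len(bits) % 2:
        raise ValueError("bit string must have even length (2 bits per base)")
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand):
    """Recover the bit string from a base sequence."""
    return "".join(BITS_FOR_BASE[base] for base in strand)

def complement(strand):
    """Return the position-wise complementary symbols of the second tape."""
    return "".join(COMPLEMENT[base] for base in strand)

if __name__ == "__main__":
    strand = encode("00011011")
    print(strand, complement(strand), len(strand) * BP_LENGTH_NM, "nm")
```

Because the second tape is fully determined by the first, it adds redundancy rather than additional information, which is why the base pair, not the single base, is the natural unit.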

Fig. 13. A fragment of DNA molecule formed from four different nucleotides.

Some examples of DNA storage capacity (genome size) are given in Table 7. Note that the storage density of molecular DNA memory is $\sim$10 Mb/$\mu$m$^{3}$ or $10^{19}$ b/cm$^{3}$, which is much denser than the density limits for the electronic long-term memory evaluated in Section III-B. Also, it is interesting to note that the single-cell organism Amoeba dubia stores a huge amount of information (1.34 Tb), compared to $\sim$6 Gb stored in the human genome.
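The unit conversions behind these figures are simple enough to check directly. The sketch below uses only the numbers quoted above and the two-bits-per-base-pair rule; the helper names are illustrative.

```python
UM3_PER_CM3 = 1e12  # (1e4 um per cm) cubed
BITS_PER_BP = 2     # one base pair carries two bits

def density_per_cm3(bits_per_um3):
    """Convert a storage density from b/um^3 to b/cm^3."""
    return bits_per_um3 * UM3_PER_CM3

def genome_base_pairs(bits):
    """Number of base pairs corresponding to a genome size given in bits."""
    return bits / BITS_PER_BP

if __name__ == "__main__":
    print(f"{density_per_cm3(10e6):.1e} b/cm^3")               # DNA density
    print(f"{genome_base_pairs(6e9):.2e} bp (human)")          # ~6 Gb genome
    print(f"{genome_base_pairs(1.34e12):.2e} bp (A. dubia)")   # 1.34 Tb genome
```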

TABLE 7 DNA Storage Capacity (Genome Size) for Several Representative Cellular Organisms

##### a) Reading from long-term memory

Different parts of the DNA memory of the cell are continuously accessed to support its operation. One example is signal transduction, which is the DNA-controlled process of cellular response to external stimuli.

##### b) Writing to long-term memory

The view that DNA is a read-only memory has undergone a dramatic change in recent years. The copying of the parental DNA to the offspring, called vertical gene transfer, is the basis for inheritance and until recently was regarded as the only or at least the vastly predominant mechanism for transferring the genetic information. There is, however, an alternative mechanism for information transfer, which is lateral gene transfer. This can happen: 1) by a direct uptake (swallowing) of naked DNA from the cell environment; 2) by a virus; and 3) by direct physical contact between two cells. Fragments of DNA, imported from outside, can be integrated into the host DNA, and thus new information is written in the memory unit. Until the advent of the genome-sequencing era, a prevailing opinion among the research community was that lateral gene transfer was a rare and insignificant event. Currently, it is recognized that in prokaryotic, e.g., bacterial cells, lateral transfer is the predominant form of genetic variation and is one of the primary driving forces for bacterial evolution. In fact, the scale of lateral gene transfer can be very large: for example, two different strains of E.coli differ more radically in their genetic information than all mammals.

SECTION VII

## QUANTITATIVE ESTIMATES FOR THE BIO-$\mu$ CELL AND THE Si-$\mu$ CELL

### A. The Bio-$\mu$ Cell

The overall information content of a material system consists of information about the system's composition and shape [176]. For example, consider the case of a von Neumann universal constructor, i.e., a computer with the task of controlling the assembly of a structure (e.g., another computer) from building blocks. A certain amount of information must be processed, which is related to the complexity of the material system. For each step, the computer must: 1) select the appropriate category of building block; and 2) calculate the $x$-, $y$-, $z$-coordinates of the position of each building block. If there are $N$ different building blocks (in the case of ultimate bottom-up construction, these building blocks could be atoms composing a material structure), the information content of the selection in step 1 is

$$I_{s}=\log_{2}N\ {\hbox {(bit)}}\eqno{\hbox{(5)}}$$

and the information of the $xyz$-positioning is

$$I_{xyz}=3n\eqno{\hbox{(6)}}$$

where $n$ is the length of the binary number representing each coordinate. In this estimate, $n=$ 32 b will be used, which is sufficient for representing numbers with practically arbitrary precision (“floating point” format).

Thus, if the total number of building blocks in a material structure is $K$, the total information processed in assembly is

$$I_{K}=K(\log_{2}N+3n).\eqno{\hbox{(7)}}$$

Now, consider the task of assembling a living cell of the E.coli bacterium from individual atoms. The elemental composition of the bacterial cell is known with high accuracy and is shown in Table 8 [177]. The cell is mainly composed of ten different atoms, with a total of $\sim\! 3\times 10^{10}$ atoms. Thus, using (7), we obtain

$$I_{\rm cell}\sim {\hbox {3}}\times {\hbox {10}}^{10}\times(\log_{2}10+{\hbox {3}}\times {\hbox {32}})\sim {\hbox {3}}\times {\hbox {10}}^{12}\ {\hbox {bit}}.$$
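For readers who wish to reproduce the arithmetic, a minimal sketch of (7) follows. The function name `assembly_information` is illustrative; the inputs are the values quoted above ($K\sim 3\times 10^{10}$ atoms, $N=10$ atom types, $n=32$ b).

```python
import math

def assembly_information(K, N, n=32):
    """Eq. (7): bits processed to select and position K building blocks.

    K: total number of building blocks
    N: number of distinct block categories (selection costs log2(N) bits)
    n: bits per coordinate (three coordinates per block)
    """
    return K * (math.log2(N) + 3 * n)

if __name__ == "__main__":
    i_cell = assembly_information(K=3e10, N=10)
    print(f"I_cell ~ {i_cell:.2e} bit")
```

The positioning term $3n=96$ b dominates the selection term $\log_{2}10\approx 3.3$ b, so the estimate is far more sensitive to the assumed coordinate precision than to the number of atom types.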

TABLE 8 Elemental Composition of E.coli [177]

This result is remarkably close to the experimental estimates of the informational content of bacterial cells based on microcalorimetric measurements, which range from $10^{11}$ to $10^{13}$ bits per cell. In the following, it is assumed that ${I}_{\rm cell}\sim {\hbox {10}}^{11}$ bit, i.e., the conservative estimate is used.

### B. The Si-$\mu$ Cell

For a system-level comparison between extremely scaled Si-based technology and carbon-based computational elements in biosystems, consider a hypothetical computer that is realized in a cube of 1 $\mu$m in size (the volume of the bio-$\mu$ cell). Such a computer, later referred to as the Si-$\mu$ cell, must contain logic circuitry and nonvolatile memory to store the program. Suppose further that all components of the computer are to be implemented in the ultimately scaled Si technology summarized in Table 4. In the following, 3-D-stacked logic and memory circuit layers will be used to fill the 1-$\mu{\hbox {m}}^{3}$ volume. (In Table 4, the thickness of one layer in the stack is assumed to be $9F$ for logic and $6F$ for memory.) The corresponding densest conceivable 3-D arrangement of FETs is ${\hbox {1.5}}\times {\hbox {10}}^{17}\ {\hbox {transistors/cm}}^{3}$, and thus the 1-$\mu{\hbox {m}}^{3}$ volume could contain up to 150 000 logic transistors. For nonvolatile memory, the densest 3-D stack of NAND layers is ${\hbox {4.2}}\times {\hbox {10}}^{16}\ {\hbox {b/cm}}^{3}$, or 42 kb of memory in 1-$\mu{\hbox {m}}^{3}$. Compared to the bio-$\mu$ cell in Table 6, the Si-$\mu$ cell can contain $\sim\! \! 10\times$ fewer logic elements and more than 100× less memory. (Note that this estimate was made even before partitioning the 1-$\mu{\hbox {m}}^{3}$ volume between logic and memory and without including an energy source.)
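The device counts follow from multiplying the volumetric densities by the cube volume. A minimal sketch, with illustrative names and the densities quoted above:

```python
CM3_PER_UM3 = 1e-12  # 1 um^3 expressed in cm^3

def count_in_volume(density_per_cm3, volume_um3=1.0):
    """Number of devices (or bits) that fit in a volume at a given density."""
    return density_per_cm3 * volume_um3 * CM3_PER_UM3

if __name__ == "__main__":
    print(f"logic transistors: {count_in_volume(1.5e17):,.0f}")
    print(f"NAND memory bits:  {count_in_volume(4.2e16):,.0f}")
```

Both figures are upper bounds, since each assumes that the entire cube is devoted to that one function.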

Next, according to Table 4, the off-state leakage power in the ultimate FET circuit is $\sim$2.34 nW per transistor, thus $\sim$357 $\mu$W of total static power dissipation in a system of 152 000 FETs. This results in catastrophic heat densities in the 1-$\mu{\hbox {m}}^{3}$ cube: $q=$ 357 $\mu$W/6 $\mu$m$^{2}$ = 5800 W/cm$^{2}$. This is almost equal to the heat density at the Sun's surface ($\sim$6000 W/cm$^{2}$, as shown in Table 11). Clearly, such a Si-$\mu$ cell computer cannot exist. Therefore, for such a system, larger scale devices and/or a smaller device count must be used.
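The heat-density estimate can be reproduced as total leakage power divided by the surface area of the cube. The sketch below uses the per-transistor leakage and device count quoted above; the function name is illustrative, and the result agrees with the quoted figure only to within rounding of the inputs.

```python
UM2_PER_CM2 = 1e8  # (1e4 um per cm) squared

def heat_flux_w_per_cm2(power_w, side_um=1.0):
    """Heat flux through the six faces of a cube of the given side length."""
    area_cm2 = 6 * side_um ** 2 / UM2_PER_CM2
    return power_w / area_cm2

if __name__ == "__main__":
    leakage_per_fet_w = 2.34e-9   # off-state leakage per transistor
    n_fets = 152_000              # device count used in the text
    total_w = leakage_per_fet_w * n_fets
    print(f"static power: {total_w * 1e6:.0f} uW")
    print(f"heat flux:    {heat_flux_w_per_cm2(total_w):.0f} W/cm^2")
```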

What is the smallest device count that could suffice for the Si-$\mu$ cell? von Neumann has argued that the minimum logic circuit complexity required to implement general-purpose computing is of the order of a few hundred devices [178]. In an attempt for a more accurate estimate of the von Neumann threshold, a model 1-bit minimal Turing machine (MTM) has been constructed with total device count of about 320 binary switches/transistors and requires 8-b instruction words for its operation [25]. In the following, it will be assumed that the logic processor of the Si-$\mu$ cell is implemented by an MTM. This also allows the maximization of the amount of memory in Si-$\mu$ cell (which is much less than the bio-$\mu$ cell).

Suppose the MTM is implemented within the Si-$\mu$ cell using FETs with ${L}_{g}\sim$ 4.5 nm with the parameters listed in Table 4. The remainder of the 1-$\mu$ m cube is available for memory.

Implementation of each of the MTM instructions requires a minimum of three sequential operations/cycles [25]. On average, $\sim$50% of transistors are active during each cycle, thus $\sim$160 switching events per cycle or $\sim$500 switching events per instruction. Since execution of one instruction results in one output bit, the ratio of output bits to total transistor switchings (raw bits) is 1/500. Therefore, in order to generate $10^{11}$ output bits, the typical outcome of biological computation, at least ${\hbox {5}}\times {\hbox {10}}^{13}$ raw bits must be processed in the MTM. It takes ${\hbox {3}}\times {\hbox {10}}^{11}$ MTM cycles to complete the computational task (i.e., three MTM cycles per output bit). If it is required that these events occur over 2400 s (to match the bio-$\mu$ cell), the cycle time is $t_{\rm cycle}=$ 8 ns. This appears to be easily achievable by CMOS technology. The total switching energy and power per MTM cycle are

$$E_{\rm cycle}=N \cdot E_{\rm bit}={\hbox {320}}\times {\hbox {2.93}}\times {\hbox {10}}^{-18}={\hbox {9.36}}\times {\hbox {10}}^{-16}\ {\hbox {J}}\eqno{\hbox{(8a)}}$$

[note that $E_{\rm bit}$ in (8a) corresponds to the 50% activity factor (4b)] and

$$P_{\rm active}={E_{\rm cycle}\over t_{\rm cycle}}={{\hbox {9.36}}\times {\hbox {10}}^{-16} \over {\hbox {8}}\times {\hbox {10}}^{-9}}={\hbox {1.17}}\times {\hbox {10}}^{-7}\ {\hbox {W}}.\eqno{\hbox{(8b)}}$$

(There is also leakage power consumption, which is approximately 749 nW.)

Next, the energy consumed by memory access must also be taken into account. At each cycle, an 8-b instruction must be read from the memory block. Assuming a serial read (typical for NAND memory) with only one line in the memory array charged, the energy for reading eight serial bits is close to $10^{-13}$ J, which gives

$$P_{M_{\rm cycle}}={E_{M}\over t_{\rm cycle}}\sim{{\hbox {10}}^{-13}\ {\hbox {J}}\over {\hbox {8}}\times {\hbox {10}}^{-9}\ {\hbox {s}}}={\hbox {1.25}}\times {\hbox {10}}^{-5}\ {\hbox {W}}.$$
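The chain of estimates for the MTM (cycle count, cycle time, switching, leakage, and memory-access power) can be collected in one short sketch. All inputs are the values quoted in the text; the function and key names are illustrative, and small differences from the quoted results reflect rounding of the inputs.

```python
def mtm_power_budget(output_bits=1e11, cycles_per_bit=3, task_time_s=2400.0,
                     n_devices=320, e_bit_j=2.93e-18,
                     leak_per_fet_w=2.34e-9, e_mem_read_j=1e-13):
    """Cycle time and power components for the minimal Turing machine."""
    cycles = output_bits * cycles_per_bit      # total MTM cycles
    t_cycle = task_time_s / cycles             # cycle time to match the bio cell
    e_cycle = n_devices * e_bit_j              # switching energy per cycle, (8a)
    return {
        "cycles": cycles,
        "t_cycle_s": t_cycle,
        "e_cycle_j": e_cycle,
        "p_active_w": e_cycle / t_cycle,       # (8b)
        "p_leak_w": n_devices * leak_per_fet_w,
        "p_mem_w": e_mem_read_j / t_cycle,     # one 8-b serial read per cycle
    }

if __name__ == "__main__":
    for name, value in mtm_power_budget().items():
        print(f"{name:>11}: {value:.3e}")
```

The memory-access term exceeds the switching term by about two orders of magnitude, which is the basis for the later observation that memory access dominates the power consumption.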

A summary of the energetics of the Si-$\mu$ cell implemented with ultimate high-performance CMOS and also with low standby power technologies is given in Tables 9 and 10. As follows from the tables, the Si-$\mu$ cell cannot operate in this mode due to excessive heat generation. Therefore, the cycle time has to be increased to reduce the power dissipation. Note that the predominant source of power consumption in both cases is the charging of memory access lines.

TABLE 9 Energetics of Si-$\mu$ cell Implemented With Ultimate High-Performance CMOS Technology
TABLE 10 Energetics of Si-$\mu$ cell Implemented With Ultimate Low-Standby Power CMOS Technology

How much heat could be tolerated by a Si-$\mu$ cell computer? Table 11 provides some reference numbers for several model heat generators along with heat removal capabilities for different cooling techniques. If it is postulated that only passive cooling can be used for the Si-$\mu$ cell (i.e., no additional space overheads), the maximum heat flux through the walls of the cube must be < 1 W/cm$^{2}$ ($\sim$ max. free water convection cooling rate).

TABLE 11 Cooling Capabilities for Air and Water and Examples of Representative Heat Generating Systems

If only passive air or water cooling is used, the maximum heat flux should be < 1 W/cm$^{2}$, thus the total power dissipation must be < $6\times 10^{-8}$ W, which limits the cycle time to > 1.70 $\mu$s. For this, the total time needed to emulate the bio-$\mu$ cell task (i.e., the equivalent of $10^{11}$ output bits) will be 510 000 s, which is more than 200× larger than the time needed for the bio-$\mu$ cell.
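A rough reconstruction of this limit is sketched below. It assumes that the energy per cycle is the sum of the memory-read energy ($\sim 10^{-13}$ J) and the switching energy of (8a), and that leakage is neglected; these assumptions are made here for illustration and are not stated explicitly in the text. Function names are illustrative.

```python
def power_budget_w(max_flux_w_per_cm2=1.0, side_um=1.0):
    """Maximum dissipation for a cube cooled only through its six faces."""
    area_cm2 = 6 * side_um ** 2 * 1e-8  # um^2 -> cm^2
    return max_flux_w_per_cm2 * area_cm2

def min_cycle_time_s(energy_per_cycle_j, budget_w):
    """Shortest cycle time that keeps average power within the budget."""
    return energy_per_cycle_j / budget_w

if __name__ == "__main__":
    budget = power_budget_w()                  # about 6e-8 W
    e_cycle = 1e-13 + 9.36e-16                 # memory read + switching (assumed)
    t_min = min_cycle_time_s(e_cycle, budget)
    total_time = 1.70e-6 * 3e11                # quoted cycle time x total cycles
    print(f"power budget:   {budget:.1e} W")
    print(f"min cycle time: {t_min * 1e6:.2f} us")
    print(f"total time:     {total_time:.0f} s ({total_time / 2400:.0f}x the bio cell)")
```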

As follows from the above, the bio-$\mu$ cell outperforms the Si-$\mu$ cell in all respects. A summary of the comparison between the two $\mu$ cells is presented in Section VII-C and some of the implications are discussed.

### C. Comparisons and Implications

As is clear from the previous section, the Si-$\mu$ cell fundamentally cannot match the bio-$\mu$ cell in the density of memory and logic elements, or operational speed, or operational energy. A core challenge is the MTM requirement for a large number of memory accesses per output bit. The above analysis suggests that there is much to be learned from the designs of nature and they may provide hints as to how future technologies could evolve. Fig. 14 provides a brief summary of comparative data. As follows from the analysis, the number of functional elements in the bio-$\mu$ cell is extraordinary and far exceeds foreseeable device densities of semiconductors. This may be at least in part due to different mass of information carriers: As was argued in Section III, the smallest size for both memory and logic devices depends on the mass of the information-bearing particles, e.g., the smallest barrier width in Si devices is $\sim$5 nm due to low effective mass of electrons in Si. Heavier particle mass, in principle, allows for smaller device size, which seems to be realized in the in carbo logic and memory elements. Emerging technologies discussed in Section V-B3, such as memristors, atomic switch, and redox memory, also use heavier mass particles, and their potential for very dense circuits needs to be further explored.

Fig. 14. Comparison of significant parameters of the bio-$\mu$ cell and the Si-$\mu$ cell.

As was shown in the previous section, memory access is the most severe limiting factor of the Si-$\mu$ cell. Not only are there simply not enough nonvolatile memory bits, but access to them to support computations also takes too much energy. In larger scale computers, this problem is easily circumvented by an initial massive serial readout from the nonvolatile media (e.g., hard disk drives or flash memory) and buffering of these data in low-energy SRAM or DRAM. However, at the scale of the 1-$\mu$m cube, there is no space for the buffers, and only direct access to nonvolatile memory was assumed for the Si-$\mu$ cell operation. Another related observation is that organizing solid state memory in crossbar arrays, while an elegant solution at larger scale, also contributes to excessive energy dissipation. In this regard, access to the DNA memory can be viewed as similar to access to hard disk drives [179], [180]. It could be argued that, at least in theory, the serial access principle of hard disk drives might be a better solution for low-energy systems (of course, in practice, the mechanical overheads add significantly to the total energy consumption of hard disk drives).

The architectural organization of computation in in carbo systems appears to be much more efficient than that of Si computers. As was mentioned in Section II, biological processors, such as the brain, are not on the computational trajectory for Si microprocessors in Fig. 2, suggesting that there may exist alternate technologies and computing architectures offering higher performance (at much lower levels of energy consumption). One key factor here is that basic algorithms need to work in very few steps [181]. Indeed, it appears that the bio-$\mu$ cell utilizes fine-grained and massive parallelism per instruction, i.e., by sending multiple copies of DNA instructions out into the cytoplasm via RNA messengers. Also, the bio-$\mu$ cell utilizes undirected thermally driven motion of the mRNA molecules to achieve connectivity to the ribosomes in the cytoplasm. A correct transfer to an appropriate ribosome is achieved by electrostatic attraction arising from molecular conformations, i.e., from the specific shapes of the transmitted and recipient molecular structures. In contrast, data transfer in electrical circuits follows predetermined routes that require an expenditure of energy. Whereas electrical circuits utilize a controllable energy barrier whose operation requires an expenditure of energy and whose physical extent is determined by electron tunneling considerations, it is not clear that there is a similar use of energy barriers in the ribosome's execution of RNA instructions.

As a side remark, although not emphasized in this study, the bio-$\mu$ cell includes within its volume the capability to take in materials from its environment and transform them into energy-yielding molecules in a form accessible to its processes. Such additional processes for energy transformation were not included in the analyses of the Si-$\mu$ cube.

Finally, in 1959, Feynman [182] gave a presentation in which he suggested the possibility of building computers whose dimensions were “submicroscopic.” Although the progress of CMOS technology has been extraordinary, submicroscopic computers remain outside our grasp. As has been indicated above, nature appears to have successfully addressed the submicroscopic design challenge.

SECTION VIII

## SUMMARY

Feature size scaling has enabled a very steep learning curve for CMOS technology that has helped to create a feature-driven marketplace. Although there are compelling physical arguments that physical scaling must end for CMOS, it appears that the benefits of Moore's law will continue for some time, aided by the advent of new materials, processes, and device structures. Very likely, the application space for CMOS technology will continue to grow rapidly as new functionalities are combined with more traditional information processing and communication capabilities.

At the same time, there is intense research underway to find alternatives to CMOS technology that have the potential to extend the benefits of Moore's law scaling for decades into the future. It was pointed out that there are many options at this time, but there is no one-for-one substitute for CMOS technology yet available. Replacement options may eventually be identified, but it appears that a likely scenario is that this research will yield devices with functionalities that can be integrated with CMOS technology to provide unique capabilities or to replace CMOS modules with special-purpose structures based on the novel devices.

It may also be that dramatic improvements in information processing technologies will result from a radical rethinking of both architectures and supporting technologies. A comparative analysis between the bio-$\mu$ cell and the Si-$\mu$ cell was offered to stimulate thinking about alternative scenarios. As the bio-$\mu$ cell goes about its complex task of creating a copy of itself, it does so using fine-grained processes, devices, and architectures that are completely different from, and much more energy efficient than, existing CMOS/von Neumann paradigms. Perhaps the design of nature's information processors can inspire radical breakthroughs in inorganic information processing.

There is substantial momentum to sustain Moore's law for many more decades because of the benefits that it brings to society. The challenges that lie before us in achieving this are substantial, but so is the creativity of scientists and engineers. Although the road ahead is not well marked, there are many indications that there are no insurmountable barriers that would deny progress in information processing technologies for the foreseeable future.

## Footnotes

R. K. Cavin, III and V. V. Zhirnov are with Semiconductor Research Corporation, Research Triangle Park, NC 27709 USA (e-mail: Ralph.Cavin@src.org; Victor.Zhirnov@src.org).

P. Lugli is with Lehrstuhl für Nanoelektronik, Technische Universität München, D-80333 Munich, Germany (e-mail: lugli@nano.ei.tum.de).

1. A practical device is somewhat larger than the ideal shown in Fig. 5(b); e.g., wraparound gates, larger area of source and drain to minimize contact resistance, increased gate width to increase on current, etc. However, the idealized representation shown in Fig. 5(b) will be used in this paper to cast MOSFET technology most favorably for packing density.
