Emerging CMOS Compatible Magnetic Memories and Logic

As scaling of the feature size - the main driving force behind an outstanding increase of the performance of modern electronic circuits - displays signs of saturation, the main focus of engineering research in microelectronics shifts towards finding new paradigms. Any future solution must be scalable and energy efficient while delivering high performance, superior to that of CMOS-based circuits. In order to benefit from the outstanding potential of highly advanced silicon processing technology, any new solution must be CMOS compatible. Emerging nonvolatile memories, including magnetoresistive memories, satisfy the necessary requirements: purely electrical addressability, simple structure, high endurance and fast operation. In this work we present the recent developments in the research of spin-transfer torque and spin-orbit torque random access memories and give a brief overview of spin-based logic. Here, the advantages and challenges of these two main contenders in the magnetic memory field are described and the current technological trends are noted. Areas facing computational challenges due to the long-range interaction of the demagnetizing field are highlighted and an existing solution is presented. The use of reinforcement learning to optimize handling of a purely electrically controllable spin-orbit torque memory cell is introduced and first results showing the switching reliability of an optimized switching pulse sequence under thermal fluctuations are reported.


I. INTRODUCTION
The continuous miniaturization of metal-oxide-semiconductor field-effect transistors is slowing down. Even though single devices with a gate length of a few nanometers have been demonstrated [1], their control, reliability, and fabrication costs, together with the integration of several tens of billions of these devices, pose an almost insurmountable hurdle. In addition, growing stand-by power consumption due to increased leakage and growing dynamic power consumption due to slow V_DD scaling result in rapidly increasing heat generation.
An attractive path to mitigate the undesired trend of increasing power dissipation, especially at the stand-by mode, is to introduce nonvolatility in the circuits.
Nonvolatility is the ability to retain the data, when the power supply is turned off. Introduction of nonvolatility enables standby power-free integrated circuits. Apart from stand-alone applications as well as critical program and data storage devices, nonvolatility is particularly promising for use in the main computer memory as it eliminates the need for data refreshment cycles in conventional CMOS-based dynamic random-access memory (DRAM) [2]. In addition, if the memory is embedded on chip, it removes one of the bottlenecks of the Von Neumann computing architecture, namely the lengthy and energy consuming procedure of data exchange between memory and CPU. However, in order to compete with traditional memory circuits, including nonvolatile flash memory, emerging nonvolatile memory circuits must offer fast switching time, high integration density supported by good scalability, long retention time, high endurance, and low power consumption. The circuits must also possess a simple structure to reduce fabrication costs. An important requirement is that emerging memories must be compatible with CMOS technology to benefit from the advantages provided by the well-developed CMOS fabrication technology.
CMOS operation is intrinsically based on the electron charge. As the downscaling of charge-based CMOS is saturating, another intrinsic characteristic of the electron, the electron spin, attracts attention as a possible candidate for complementing or even replacing the charge degree of freedom in future microelectronic devices [3], [4], [5]. The electron spin is characterized by the two possible spin projections on a reference axis and therefore possesses the property necessary for digital information. Indeed, magnetic hard disk drives (HDD) employ the electron spin or, to be more precise, the magnetization orientation to store the data. However, HDDs require magnetic fields created in magnetic heads in order to read and write the information. As they are not operated by electric currents, they are not CMOS compatible. In order to create a purely electrically manipulated magnetic memory, an efficient coupling between the electric current or field and the magnetization must be achieved. The discovery of the giant magnetoresistance (GMR) effect facilitated a reliable, purely electrical read operation of the information encoded in the magnetization orientation [6], [7]. The emerging generation of storage devices with even higher density is based on the unique properties of magnetic tunnel junctions (MTJs), sandwiches of two ferromagnetic layers separated by a thin tunnel barrier. The tunneling current through the MTJ structure strongly depends on the relative orientation of the magnetization in the ferromagnetic contacts. The difference in the MTJ resistivity reaches a factor of seven [8] in structures with in-plane magnetization and a factor of three [9] in MTJs with the magnetizations perpendicular to the layers, at room temperature. An efficient way of converting the magnetization orientation into an electrical resistance allows using MTJs for data storage. 
As the typical resistance of an MTJ is similar to the one of a MOSFET, no extra amplifiers are required for the converted signal. Thus, MTJ-based magnetoresistive random access memory (MRAM) is compatible with CMOS circuits.
In the following sections, an overview of the developments in research on spin-transfer torque MRAM as well as spin-orbit torque MRAM is given. Additionally, current research findings in the respective areas are presented and the topic of spin-based logic is discussed. This work is an extension of what was previously published in [10].

II. SPIN-TRANSFER TORQUE MRAM
Although a very fast magnetization switching induced by magnetic fields has been achieved, MRAM writing induced by a magnetic field is not fully compatible with scaling. The magnetic field is generated by an electric current passing through write lines next to the cell. For downscaling, the wires' cross sections must be reduced, which makes higher current densities necessary to achieve the same field [11]. Therefore, a purely electrical way of MRAM information writing without magnetic field must be implemented.
The spin-transfer torque (STT) effect [12], [13] has proven to be perfectly suited for purely electrical data writing. When a current is passed through an MTJ, the electrons become spin-polarized along the magnetization of the fixed layer. When they enter the free layer (FL), their spins align with its magnetization almost immediately. Since the total angular momentum is conserved, the change in the electron spin is transferred to the magnetization, which results in a torque acting on the FL magnetization. If the current is strong enough to overcome the damping, this torque causes magnetization switching. STT-MRAM is considered a perfect candidate for future universal memory applications. It is fast (5 ns), possesses high endurance (10^12 cycles), and has a simple structure. It is compatible with CMOS technology and can be straightforwardly embedded in circuits [11]. It is particularly promising for stand-alone as well as for embedded applications as a replacement of conventional volatile CMOS-based and nonvolatile flash memory in Systems on Chip (SoC). For embedded applications, a successful implementation of 8 Mb 1T-1MTJ STT-MRAM on a 28 nm CMOS logic platform [14] was demonstrated. An embedded MRAM solution compatible with Intel's 22FFL FinFET technology is available [15].
One critical aspect of the presently used STT-MRAM technology is its rather high switching current density, which arises because the energy barrier separating the two stable memory states must be overcome. The barrier must be large (at least sixty times the thermal energy) to suppress errors due to undesired thermally agitated magnetization switching in large (1 Gb) memory arrays over long (10 years) retention times. It turns out that in in-plane magnetized structures the barrier which must be overcome during STT switching is higher than the thermal barrier. After the current is applied, the magnetization starts precessing around the magnetic easy axis. At the same time, while precessing, the magnetization deviates further and further from its initial stable state due to the STT. Therefore, during the switching process the magnetization tilts out of plane, which costs demagnetization energy. This demagnetization energy has to be expended in addition to the thermal barrier energy. As the system must be pushed over this larger barrier by STT, reducing the switching current and the current density is challenging.
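The retention requirement above can be made concrete with a short estimate. The following is a minimal sketch, assuming the Néel-Arrhenius law for the mean thermal lifetime of a bit and an attempt time of 1 ns; the attempt time and the function names are assumptions for illustration and do not come from the text.

```python
import math

def retention_failure_probability(delta, t_seconds, attempt_time=1e-9):
    """Probability that a single bit flips thermally within t_seconds.

    delta is the thermal stability factor E_b / (k_B * T).
    attempt_time = 1 ns is an assumed, commonly quoted value.
    """
    mean_lifetime = attempt_time * math.exp(delta)  # Neel-Arrhenius law
    # -expm1(-x) evaluates 1 - exp(-x) accurately for very small x.
    return -math.expm1(-t_seconds / mean_lifetime)

def expected_bit_errors(delta, n_bits, t_seconds, attempt_time=1e-9):
    """Expected number of thermally flipped bits in an array of n_bits."""
    return n_bits * retention_failure_probability(delta, t_seconds, attempt_time)

TEN_YEARS = 10 * 365.25 * 24 * 3600  # retention time in seconds
ONE_GBIT = 2**30                     # array size in bits

for delta in (40, 50, 60, 70):
    errors = expected_bit_errors(delta, ONE_GBIT, TEN_YEARS)
    print(f"delta = {delta}: expected flipped bits in 10 years = {errors:.3g}")
```

Under these assumptions, a stability factor of about sixty keeps the expected number of flipped bits in a 1 Gb array over ten years to a few, while noticeably smaller factors lead to massive data loss.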
Several plausible approaches to address these issues are currently pursued. They include the use of composite free recording layers [23], decoupling the write and read current paths as discussed later, controlling magnetization by voltage, and employing new materials with improved properties and characteristics. However, replacement of the in-plane magnetization orientation in MTJs with perpendicular magnetization is the path adopted by industry.
In perpendicular MTJs (p-MTJs), the magnetization is already out-of-plane. The large demagnetizing energy penalty one has to pay while writing an in-plane structure does not exist for switching perpendicular structures. Because the thermal barrier in p-MTJs equals the switching barrier, the switching current can be reduced. In addition, p-MTJs are better suited for high-density memory. A critical technological step was the discovery of an interface-induced perpendicular anisotropy at the CoFeB/MgO interface [23] with thin perpendicularly magnetized CoFeB layers.
In order to further reduce the switching current density and, at the same time, to preserve the large thermal barrier, one has to reduce the Gilbert damping and increase the spin current polarization. An interface-induced p-MTJ structure with a composite MgO/CoFeB/Ta/CoFeB/MgO free layer with two MgO interfaces [25] allows boosting the thermal barrier while reducing the Gilbert damping. To scale the diameter of the MTJ below 10 nm, the use of shape anisotropy was suggested [26]. It has been shown that the thermal stability can be increased for small diameters without sacrificing the tunneling magnetoresistance (TMR) and without any need for new materials, as FeB for the ferromagnetic free layer and MgO for the tunnel barrier were used.
A large TMR ratio is required for fast reliable reading in MRAM. In addition, the middle reference resistance to which the low and high resistance MTJ states are compared must be well separated from either of them. However, with downscaling it becomes increasingly difficult to control the bit-to-bit resistance. Resistance variation results in sacrificing the read error margin. Therefore, obtaining a large TMR is required to continue scaling the devices down.
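The link between TMR, the mid-point reference, and resistance variation can be illustrated numerically. This is a minimal sketch, assuming the usual definition TMR = (R_AP − R_P)/R_P, a reference placed halfway between the two state resistances, and the same relative resistance spread for both states; the function names and numerical values are hypothetical.

```python
def tmr_ratio(r_p, r_ap):
    """TMR ratio from the parallel (r_p) and antiparallel (r_ap) resistances."""
    return (r_ap - r_p) / r_p

def read_margin_in_sigma(r_p, tmr, rel_sigma):
    """Distance between the mid-point reference and each state,
    expressed in standard deviations of that state's resistance.

    rel_sigma is the assumed relative bit-to-bit resistance spread.
    """
    r_ap = r_p * (1.0 + tmr)
    r_ref = 0.5 * (r_p + r_ap)                       # mid-point reference
    margin_p = (r_ref - r_p) / (rel_sigma * r_p)     # low-resistance state
    margin_ap = (r_ap - r_ref) / (rel_sigma * r_ap)  # high-resistance state
    return margin_p, margin_ap

# Illustrative numbers only: 5 % relative spread, increasing TMR.
for tmr in (0.5, 1.0, 2.0):
    m_p, m_ap = read_margin_in_sigma(r_p=1000.0, tmr=tmr, rel_sigma=0.05)
    print(f"TMR = {tmr:.0%}: margin P = {m_p:.1f} sigma, margin AP = {m_ap:.1f} sigma")
```

In this simple model both margins grow with the TMR ratio and shrink as the resistance spread increases, which is the trade-off described above.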
Micromagnetic simulations can be used to investigate the influence of different materials or novel structures on the switching performance of the memory cells. The Landau-Lifshitz-Gilbert (LLG) equation is the central equation in micromagnetics, describing the magnetization dynamics in switching processes of MRAM devices [27], [28].
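To indicate what solving the LLG equation involves, the following sketch integrates it for a single macrospin, which is a strong simplification of a full micromagnetic simulation. It assumes the explicit form dm/dt = −γ/(1+α²) [m × B + α m × (m × B)], a uniaxial perpendicular anisotropy field as the only contribution to the effective field, and illustrative parameter values; none of these choices are taken from [27], [28].

```python
import math

GAMMA = 1.760859e11  # electron gyromagnetic ratio in rad/(s*T)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(m):
    norm = math.sqrt(sum(c * c for c in m))
    return tuple(c / norm for c in m)

def effective_field(m, b_k):
    """Uniaxial perpendicular anisotropy field (tesla) along z."""
    return (0.0, 0.0, b_k * m[2])

def llg_rhs(m, b_eff, alpha):
    """Right-hand side of the explicit LLG equation for a unit vector m."""
    m_x_b = cross(m, b_eff)
    m_x_m_x_b = cross(m, m_x_b)
    prefactor = -GAMMA / (1.0 + alpha ** 2)
    return tuple(prefactor * (m_x_b[i] + alpha * m_x_m_x_b[i]) for i in range(3))

def relax(m0, b_k=0.2, alpha=0.1, dt=5e-13, steps=10000):
    """Integrate the macrospin with Heun's method, renormalizing every step."""
    m = normalize(m0)
    for _ in range(steps):
        k1 = llg_rhs(m, effective_field(m, b_k), alpha)
        m_pred = tuple(m[i] + dt * k1[i] for i in range(3))
        k2 = llg_rhs(m_pred, effective_field(m_pred, b_k), alpha)
        m = normalize(tuple(m[i] + 0.5 * dt * (k1[i] + k2[i]) for i in range(3)))
    return m

# A magnetization tilted by 45 degrees precesses and relaxes towards +z.
print(relax((1.0, 0.0, 1.0)))
```

A micromagnetic code solves the same equation for every cell or node of the mesh, with exchange, demagnetizing, and torque terms added to the effective field.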
If a finite element method (FEM) discretization is used, the computation of the demagnetizing field, which describes the long-range interaction, is particularly challenging and requires careful computational treatment. A hybrid approach presented in [29] combines the FEM with the boundary element method (BEM) and thus enables a more efficient computation of the demagnetizing field. The demagnetizing field $\mathbf{H}_d$ is commonly described by the introduction of a scalar magnetic potential $u$: $\mathbf{H}_d = -\nabla u$. In the hybrid FEM-BEM approach, the basic ansatz is a splitting of this potential into two parts, $u = u_1 + u_2$, where $u_1$ and $u_2$ satisfy a Poisson and a Laplace equation, respectively, with appropriate boundary conditions. In discretized form, the relation between the two partial potentials results in a matrix-vector multiplication, $u_2 = B u_1$, with a discretized boundary operator $B$, which is used for evaluating $u_2$ at the boundary. These values are used as Dirichlet boundary conditions for the Laplace equation.
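For orientation, the two sub-problems can be written out explicitly. The block below is a sketch of the splitting in its commonly used form, with $\Omega$ denoting the magnetic region (possibly a union of disjoint domains), $\mathbf{n}$ its outward normal, and $S(\mathbf{x})$ the solid angle subtended by the boundary at $\mathbf{x}$; these symbols, the normalization, and the sign conventions are assumptions and may differ from those used in [29].

```latex
\begin{aligned}
\Delta u_1 &= \nabla \cdot \mathbf{M} \quad \text{in } \Omega, \qquad
\frac{\partial u_1}{\partial \mathbf{n}} = \mathbf{n} \cdot \mathbf{M} \quad \text{on } \partial\Omega, \qquad
u_1 = 0 \quad \text{outside } \Omega, \\
\Delta u_2 &= 0 \quad \text{in } \mathbb{R}^3 \setminus \partial\Omega, \qquad
u_2^{\mathrm{out}} - u_2^{\mathrm{in}} = u_1, \qquad
\frac{\partial u_2^{\mathrm{out}}}{\partial \mathbf{n}} - \frac{\partial u_2^{\mathrm{in}}}{\partial \mathbf{n}} = 0 \quad \text{on } \partial\Omega, \\
u_2(\mathbf{x}) &= \frac{1}{4\pi} \int_{\partial\Omega} u_1(\mathbf{y})\,
\frac{\partial}{\partial \mathbf{n}(\mathbf{y})} \frac{1}{|\mathbf{x} - \mathbf{y}|}\, \mathrm{d}S(\mathbf{y})
+ \left( \frac{S(\mathbf{x})}{4\pi} - 1 \right) u_1(\mathbf{x}), \qquad \mathbf{x} \in \partial\Omega .
\end{aligned}
```

The last line is a double-layer potential evaluated on the boundary; discretizing it yields the dense matrix-vector product that gives the boundary values of $u_2$, which explains why only the surface mesh of the magnetic regions is needed for the BEM part.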
In a micromagnetic simulation framework based on the open-source libraries MFEM [30] for FEM discretization and H2Lib [31] for basic BEM functionality, we implemented an algorithm for the calculation of the demagnetizing field of disjoint magnetic domains, as they appear in MTJs. This algorithm makes use of the hybrid FEM-BEM approach and allows for the incorporation of long-range interaction between disjoint magnetic domains. For a multilayered structure (potentially) containing non-magnetic domains, first the surface mesh of the magnetic regions is extracted. The BEM library subsequently creates the discretized boundary operator B. As the boundary operator only depends on the structure of the device, this procedure has to be performed only once at the beginning of the simulation. Fig. 1 shows the magnetic potential calculated for a multilayered structure with uniform magnetization in the two layers as indicated by the arrows in Fig. 1a. The magnetization in the upper layer is m = (1, 1, 1), while in the bottom layer it is m = (0, 0, 1). If there were no interaction between the layers, the potential would vary linearly along the direction of the magnetization. The top view of the cylindrical structure in Fig. 1b shows how the potential is shifted towards the y-direction due to the interaction with the upper layer.
Advanced STT-MRAM is faster than flash memory and is suitable for last-level caches. However, STT-MRAM loses the power consumption competition to SRAM [21]. A new technological solution is thus required to use MRAM in higher level caches.

III. SPIN-ORBIT TORQUE MRAM
Among the somewhat recently discovered physical phenomena suitable for next-generation MRAM is the spin-orbit torque (SOT) assisted switching of a ferromagnetic layer at room temperature in heavy metal/ferromagnetic [32] or topological insulator/ferromagnetic [33], [34] bilayers. In SOT-MRAM cells the MTJ's free layer is grown on a material with a large spin Hall angle/SOT. The SOT is generated by passing the current through this material. The switching current is injected in-plane along the heavy metal/ferromagnetic bilayer and, due to the spin Hall effect, creates a transverse spin current exerting a torque on the magnetization in the ferromagnetic free layer. Although the current density is quite large, the current does not flow through the MTJ. The high current is still a problem, and its reduction is the pressing issue for SOT-MRAM development. Topological insulators (TIs) are promising materials for reducing the switching current as they are characterized by a large efficiency of charge to spin conversion. The electrical conductivity of TIs, however, is usually too low for power-efficient applications. Recently, BiSe [32] and BiSb [34] TIs were reported as suitable candidates for emerging SOT-MRAM. They possess a charge to spin conversion efficiency of 18.8 and 52, respectively, and large conductivities allowing the reduction of the switching current by two orders of magnitude as compared to tungsten-based SOT-MRAM.
Despite the progress in developing and prototyping SOT-MRAM, there is one important issue which has not been convincingly resolved so far. Namely, a static magnetic field is still required to guarantee deterministic SOT switching of a perpendicularly magnetized free layer. In SOT-MRAM prototypes on 300 mm wafers an additional cobalt magnet was incorporated to provide such a field [35]. The external magnetic field free approaches are based on cell mirror symmetry breaking [36] with respect to the plane formed by the normal vector to the free layer and the current direction. In this way, the two left and right in-plane magnetization orientations perpendicular to the current become non-equivalent, and the magnetization relaxes deterministically to the opposite stable states from these orientations. The symmetry can be perturbed by modifying the shape of the dielectric [37] or the free layer [38], by controlling the crystal symmetry of the metal layer [39], and by biasing the free layer via an exchange coupling to an antiferromagnet [40] or to an in-plane ferromagnet [41], [42], [43]. The latter experiments ignited the interest towards surprisingly little explored SOTs in ferromagnetic materials.
Approaches based on the breaking of the cell symmetry demand a considerable adaptation in the fabrication process and may compromise the large-scale integration of memory cells. Recently, a magnetic field free switching was reported in a structure of stacked heavy metal lines with opposite Hall angles [44]. This finding is exciting as the structure is compatible with a CMOS fabrication process. Another promising approach is based on a purely electrical dynamic way to induce the required magnetic field by means of two orthogonal current pulses [45]. In this approach, the part performing the writing of the memory cell consists of a perpendicularly magnetized free layer (FL) sandwiched between two orthogonal heavy metal wires (NM1 and NM2). The NM1 wire at the bottom of the FL is fully overlapping, while the NM2 wire on top only overlaps with a part of the FL (see Fig. 2). Previous publications have investigated the robustness of switching with two pulses with respect to fluctuations in the overlap/delay of the pulses and the corresponding influence on the switching speed [45], [46].
Machine learning (ML) is being increasingly applied in the realm of physics [47]. While the majority of ML applications use supervised learning approaches, which require large amounts of data beforehand to train neural networks, the subbranch of reinforcement learning (RL) [48] has gained interest in recent years. Here, an agent interacts with an environment, trying to maximize the cumulative reward it receives from it based on a certain objective, like balancing an inverted pendulum for as long as possible. The first RL breakthroughs were achieved using games like chess or Go [48], but these types of algorithms have also been applied successfully in physics, e.g., in [50], where strategies for quantum error correction are found through RL.
In order to simplify and automate the search for a faster switching scheme in the two-pulse switching approach, RL is a promising tool, as first research findings confirm. Fig. 3 shows how the two-pulse switching of an SOT memory cell can be transferred into an RL setting. Two main components are needed: the agent and the environment. At the core of the environment is the simulation of the memory cell, for which an in-house developed finite-difference code is used [51]. The dimensions of the structure (Fig. 2) are 40 nm × 20 nm with a free layer thickness of 1.2 nm. The overlap of the NM2 wire is 50%. The simulation was adjusted such that it can be controlled from the outside to change the state of the pulses and that it returns the current state of the simulation together with a reward after every iteration. The policy built up by the agent is approximated by a neural network.
The RL algorithm employed for finding faster switching schemes is the deep Q-network (DQN) algorithm [52]. A neural network is used in the DQN algorithm to approximate the so-called Q-function of the optimal policy, which assigns a quality estimate Q(s, a) to every state-action pair (s, a). Based on this estimate, the agent selects its action. In order to direct the algorithm to take actions such that the switching time is reduced, an appropriate scheme for rewarding the agent must be chosen. The current implementation of the learning algorithm returns a negative reward for every time step in which the memory cell has not yet switched, and once it has switched, i.e., the z-component of the magnetization has reached −0.5, a large positive reward is returned. The inherent working principle of RL algorithms is based on the maximization of the cumulative reward. A faster switching scheme thus accumulates fewer negative rewards and, in total, a larger cumulative reward. The description of the state returned from the environment in every iteration consists of 11 variables described in Table 1. These variables are the basis on which the learning algorithm decides on the next action. One needs to make sure that the state information suffices to deduce which action is best to take. If, for example, the change of the average magnetization components were not included, knowing only the current value of the components would not reveal in which direction the magnetization is moving at this point.
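The reward scheme and its effect on the cumulative reward can be sketched in a few lines. The reward magnitudes, the discount factor, and all function names below are hypothetical choices for illustration; the text only specifies a negative reward per step before switching and a large positive reward once the z-component reaches −0.5. The last function shows the standard bootstrapped Q-learning target on which a Q-network is typically trained.

```python
def step_reward(m_z, threshold=-0.5, step_penalty=-1.0, switch_bonus=100.0):
    """Return (reward, done) for one environment step.

    step_penalty and switch_bonus are assumed values for illustration.
    """
    if m_z <= threshold:
        return switch_bonus, True   # cell has switched: episode ends
    return step_penalty, False      # not yet switched: penalize the time step

def episode_return(m_z_trajectory, gamma=1.0):
    """Discounted sum of rewards along a trajectory of m_z values."""
    total, discount = 0.0, 1.0
    for m_z in m_z_trajectory:
        reward, done = step_reward(m_z)
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total

def q_learning_target(reward, done, next_q_values, gamma=0.99):
    """Target r + gamma * max_a' Q(s', a'); no bootstrapping at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

fast = [1.0, 0.5, -0.6]            # switches in the third step
slow = [1.0, 0.9, 0.5, 0.0, -0.6]  # switches in the fifth step
print(episode_return(fast), episode_return(slow))
```

With these illustrative numbers the faster trajectory collects two fewer step penalties, so maximizing the return is equivalent to minimizing the switching time.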
To simplify the learning process, the number of possible actions has been restricted. The current setup of the experiment allows the agent to individually switch the two pulses on or off, with a minimum pulse width of 100 picoseconds for each pulse. The amplitude of the two pulses was fixed to 130 μA for the first pulse (NM1) and 100 μA for the second pulse (NM2). A learning episode was considered finished once the z-component of the magnetization reached −0.5. Fig. 4 shows the learning curve of the performed experiment. The plot shows the mean switching time over the course of the learning period. After an initial increase in switching time, the agent improves its policy and can switch the memory cell faster, with a final mean value of around 250 picoseconds. Single runs were able to arrive at an even better policy, which resulted in a switching time of 145 picoseconds. The learned policy of the agent switches both pulses on right at the beginning. The first pulse is switched off after 100 picoseconds, while the second pulse stays on until the objective given to the RL algorithm, namely to bring the z-component to −0.5 as fast as possible, is met. When −0.5 is reached, the second pulse is turned off, as shown in Fig. 5. Comparing the best scheme with two slightly adapted pulse sequences (Fig. 5), one can see that, indeed, switching both pulses on right at the beginning leads to the fastest time to reach −0.5 under the given conditions.
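The restricted action space and the learned pulse scheme can be written down compactly. The sketch below uses the amplitudes and the minimum pulse width from the text; the function names are hypothetical, and the default switching time of 145 ps is used purely as an illustrative value for the moment at which the second pulse is turned off.

```python
from itertools import product

I_NM1 = 130e-6             # amplitude of the first pulse (NM1) in A
I_NM2 = 100e-6             # amplitude of the second pulse (NM2) in A
MIN_PULSE_WIDTH = 100e-12  # minimum pulse width in s

# Each action sets the on/off state of the two pulses: (nm1_on, nm2_on).
ACTIONS = list(product((False, True), repeat=2))

def learned_pulse_currents(t, t_switch=145e-12):
    """Currents (I_NM1, I_NM2) at time t for the learned scheme:
    both pulses start at t = 0, NM1 stops after the minimum pulse width,
    NM2 stops once the cell has switched (t_switch, illustrative value).
    """
    nm1_on = 0.0 <= t < MIN_PULSE_WIDTH
    nm2_on = 0.0 <= t < max(t_switch, MIN_PULSE_WIDTH)
    return (I_NM1 if nm1_on else 0.0, I_NM2 if nm2_on else 0.0)

def injected_charge(t_switch=145e-12):
    """Charge (in C) passed through each wire during one write operation."""
    q_nm1 = I_NM1 * MIN_PULSE_WIDTH
    q_nm2 = I_NM2 * max(t_switch, MIN_PULSE_WIDTH)
    return q_nm1, q_nm2

for t in (50e-12, 120e-12, 150e-12):
    print(f"t = {t * 1e12:.0f} ps: currents = {learned_pulse_currents(t)}")
```

Extending `ACTIONS` with several amplitude levels per wire would be one way to give the agent the additional degree of freedom needed for an energy-oriented objective.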
In order to evaluate the reliability of the pulse sequence, simulations of 50 realizations were performed. A random thermal field at 300 K was considered for these simulations. The results are shown in Fig. 6. The variation between the realizations is very small, and the magnetization indeed switches reliably. With this RL approach, further experiments are also conceivable. In the current setup, the current values of the pulses could not be varied by the algorithm. By adding this degree of freedom, the objective for the algorithm could likely be set to finding the most energy-efficient switching scheme. Thus, one can see that, with the assistance of machine learning approaches, tedious manual work can be outsourced to algorithms which can easily cope with large amounts of data and find hidden correlations and working principles.
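A random thermal field is commonly modeled as Gaussian white noise whose strength follows from the fluctuation-dissipation theorem. The sketch below uses the widely used expression σ_B = sqrt(2 α k_B T / (γ M_s V Δt)) for the standard deviation of each field component in tesla. The damping constant, the saturation magnetization, the time step, and the treatment of the whole free layer as one volume are assumptions for illustration; in a finite-difference code, V would be the cell volume, and the model used in [51] may differ.

```python
import math
import random

K_B = 1.380649e-23   # Boltzmann constant in J/K
GAMMA = 1.760859e11  # electron gyromagnetic ratio in rad/(s*T)

def thermal_field_sigma(alpha, temperature, m_s, volume, dt):
    """Standard deviation (tesla) of each thermal field component per time step."""
    return math.sqrt(2.0 * alpha * K_B * temperature / (GAMMA * m_s * volume * dt))

def sample_thermal_field(sigma, rng):
    """Draw one random thermal field vector with independent Gaussian components."""
    return tuple(rng.gauss(0.0, sigma) for _ in range(3))

# Free layer dimensions from the text; alpha, m_s and dt are assumed values.
FREE_LAYER_VOLUME = 40e-9 * 20e-9 * 1.2e-9  # m^3
sigma = thermal_field_sigma(alpha=0.05, temperature=300.0, m_s=1.0e6,
                            volume=FREE_LAYER_VOLUME, dt=1e-12)

# One independently seeded generator per realization keeps runs reproducible.
generators = [random.Random(seed) for seed in range(50)]
first_fields = [sample_thermal_field(sigma, rng) for rng in generators]
print(f"sigma = {sigma * 1e3:.1f} mT, first sample = {first_fields[0]}")
```

Repeating the switching simulation once per generator and comparing the resulting trajectories corresponds to the kind of reliability check over 50 realizations described above.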

IV. SPIN-BASED LOGIC
The introduction of nonvolatility in CMOS circuits reduces the power consumption by half, with an outstanding 90% reduction for specific logic-in-memory circuits [53]. Placing the actual computation into the nonvolatile domain results in a non-conventional in-memory computing architecture. Any two MTJ-based cells can perform the conditional switching of a target MTJ depending on the state of the source MTJ, an operation called material implication (IMP). IMP completed with the FALSE operation covers the whole space of all Boolean operations. A compact IMP-based full adder realization involving only six 1T-1MTJ cells and 27 subsequent FALSE and IMP operations can be realized [54]. Recently, a massively parallel NOT operation based on the IMP implementation was proposed [55].
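The completeness of IMP together with FALSE can be checked at the level of Boolean logic. The sketch below models only the logic states, not the MTJ devices or their switching conditions; the function names and the use of a separate work bit are illustrative assumptions. IMP overwrites the target with (NOT source) OR target, and NAND, which is itself universal, follows from one FALSE and two IMP operations.

```python
from itertools import product

def false_op():
    """FALSE: unconditionally reset a cell to logic 0."""
    return 0

def imp(source, target):
    """Material implication: the new target state is (NOT source) OR target."""
    return int((not source) or target)

def not_via_imp(a):
    """NOT a, computed as a IMP 0 on a work cell."""
    work = false_op()
    return imp(a, work)

def nand_via_imp(a, b):
    """NAND(a, b) from one FALSE and two successive IMP operations."""
    work = false_op()
    work = imp(a, work)  # work = NOT a
    work = imp(b, work)  # work = (NOT b) OR (NOT a) = NAND(a, b)
    return work

for a, b in product((0, 1), repeat=2):
    print(a, b, "IMP:", imp(a, b), "NAND:", nand_via_imp(a, b))
```

Because every Boolean function can be composed from NAND gates, sequences of FALSE and IMP operations such as the 27-step full adder mentioned above suffice for arbitrary logic.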
Another option for nonvolatile computing is to proceed along a more conventional path with memory and computing units separated. Both elements are implemented in the magnetic domain and are nonvolatile. The idea of combining MTJs with a common free layer enables the realization of a nonvolatile magnetic flip-flop [56]. The processing unit consists of an STT-based nonvolatile majority gate and nonvolatile magnetic flip-flops used as memory registers in an entirely nonvolatile processing environment [56]. Recently, substantial progress in fabricating such devices was reported [57].
The availability of high-capacity nonvolatile memory enables new logic-in-memory and computing-in-memory architectures for future artificial intelligence and cognitive computing. Nonvolatile MTJs are suitable for neural network realizations as they can be considered a current-driven programmable resistor, a memristor. MTJ-based neural networks featuring nonvolatile synapses allow for high-speed pattern recognition with a reduction in gate count of about 70% and a 99% improvement in speed as compared to their CMOS counterparts [53]. All-spin binary neuromorphic computing systems with probabilistic inference and online learning can enable a new generation of state-compressed and low-power computing platforms [58]. Neuromorphic computing is becoming a reality, with the first self-learning chips already in production.

V. CONCLUSION
We are witnessing the beginning of nonvolatile MRAM entering the stand-alone and, especially, the embedded memory market, as all major foundries have announced the start of production of embedded STT-MRAM based on advanced silicon-on-insulator FinFET technology nodes. It will likely result in an exponential expansion of the MRAM market with a momentous impact on information storage and processing in the near future. In addition, a successful adoption of nonvolatility in microelectronic circuits and systems by developing various logic-in-memory and in-memory processing architectures promises a disruptive impact on cloud computing and edge applications by introducing new concepts in ultralow-power high-performance computing. Furthermore, it has been shown that modern ubiquitous technologies, like machine learning, not only benefit from the development of application-specific chips, but that the development and control of the underlying circuits themselves can also benefit from learning algorithms.