High-Density Spin–Orbit Torque Magnetic Random Access Memory With Voltage-Controlled Magnetic Anisotropy/Spin-Transfer Torque Assist

This article explores an area-saving scheme for spin–orbit torque (SOT) magnetic random access memory (MRAM) in which the SOT channel and write transistor are shared among multiple magnetic tunnel junctions (MTJs). We use two write mechanisms to selectively write the MTJs: voltage-controlled magnetic anisotropy (VCMA)-assisted write in the presence of an external magnetic field, and field-free spin-transfer torque (STT)-assisted write. Using micromagnetic simulations augmented by rare-event enhancement, we study the trade-offs among write current, write time, write energy, write error rate (WER), and the number of MTJs on an SOT channel. We quantify the IR drop on the SOT channel as a function of the SOT layer thickness and the number of MTJs. Our results show that having more than four MTJs on an SOT channel poses major challenges in terms of IR drop and WER. In addition, we evaluate the impact of the proposed scheme on read performance.


I. INTRODUCTION
Spintronic memories are being actively pursued for various applications, such as last-level cache [1], [2], embedded memory [3], and deep neural networks [4]. Spin-transfer torque (STT)-based and spin-orbit torque (SOT)-based magnetic random access memories (MRAMs) are two major examples of spintronic memories being explored. The STT-MRAM offers high cell density because its compact cell requires only one transistor; however, it suffers from issues such as low read margin, low charge-to-spin conversion efficiency, and oxide degradation. Moreover, the large write current needed for STT-MRAM poses a challenge in terms of scaling. The SOT-MRAM is an emerging alternative to STT-MRAM. The SOT-MRAM has lower write energy and also improves the read operation by decoupling the read and write paths. There have been major advances toward large-scale adoption of the SOT-MRAM technology in recent years. For example, wafer-level integration along with sub-nanosecond magnetization switching has been demonstrated [5]. However, one key issue with SOT-MRAM is its large cell area compared with STT-MRAM, as SOT-MRAM requires two separate transistors for read and write operations.
There have been works on reducing the cell footprint of SOT-MRAM by sharing the SOT channel among multiple magnetic tunnel junctions (MTJs) with the help of STT [6] or the voltage-controlled magnetic anisotropy (VCMA) effect [7], [8]. However, such schemes involve many trade-offs, and a detailed evaluation that demonstrates low write error rates (WERs) and adequate selectivity while accounting for thermal noise, variability, and the IR drop on the SOT layer is missing. Likewise, an assessment of the impact of such schemes on write/read energy and latency as a function of cell density is also lacking.
In this article, we discuss transistor sharing schemes for SOT-MRAM with the help of VCMA effect and STT while considering the limitations in terms of WER and IR drop in the SOT channel. We provide detailed thickness optimization of the SOT layer while considering the effect of IR drop, write energy, and MTJ selectivity. For both the VCMA and STT-assisted write operations, we evaluate the impact of increasing the number of MTJs on an SOT channel in terms of WER and write energy. Moreover, in the case of SOT + STT scheme, we study the impact of pulse timings of the SOT and STT write currents. In addition, we evaluate the read performance of the cell as a function of oxide thickness and present the associated trade-offs in terms of read and write operations.
The rest of this article is organized as follows. After this introduction, Sections II and III describe the SOT + VCMA and SOT + STT schemes, respectively. In Section IV, we evaluate the read performance. Section V presents the optimization and benchmarking results for cell area and write performance, and the key findings of this article are summarized in Section VI.

II. SOT + VCMA
The first write mechanism we discuss uses the VCMA effect to selectively write MTJs on a shared SOT channel.

A. CELL DESIGN AND WRITE OPERATION
The 3-D layout and schematic of the cell are shown in Fig. 1. The SOT channel is shared among multiple MTJs and has a single SOT write transistor. A specific MTJ is selected for writing by applying a voltage to it through the corresponding read/write select transistor. The write operation uses the VCMA effect [9]: applying a voltage across an MTJ lowers its thermal stability (Δ) and thereby its switching current. The applied spin current is then chosen to be large enough to switch MTJs with reduced thermal stability and small enough to avoid flipping nonselected MTJs. Writing to all MTJs on an SOT channel can be accomplished in two cycles, as shown in Fig. 1(b). In Cycle 1, all the 1's are written; in Cycle 2, all the 0's are written by reversing the direction of the SOT current. The write operation requires an external magnetic field, which can be generated on-chip by using a cobalt magnetic hard mask [5]. For driving the SOT channel, the driver design described in earlier work [10] can be used. Fig. 2 shows the schematic of the write driver, which uses eight fin transistors to provide sufficient SOT current. The pitch and height of the write driver are 8F and 28F [11], respectively, with the half metal pitch (F) being 32 nm. The write drivers may occupy ≈7% of the total area for an array size of 256 × 128. Fig. 3 shows the memory array based on the shared SOT channel.
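The two-cycle write described above can be sketched as a small planning routine. This is only an illustration: the function name and the convention that the positive SOT current direction writes a 1 are assumptions, not details taken from the cell design.

```python
from typing import List, Tuple


def two_cycle_write_plan(target_bits: List[int]) -> List[Tuple[int, List[bool]]]:
    """Return, for each cycle, (SOT current direction, per-MTJ V_MTJ enable mask).

    Assumed convention (hypothetical): direction +1 writes a 1, direction -1 writes a 0.
    """
    for bit in target_bits:
        if bit not in (0, 1):
            raise ValueError("target bits must be 0 or 1")
    # Cycle 1: apply V_MTJ only to the MTJs that must store a 1.
    cycle1 = (+1, [bit == 1 for bit in target_bits])
    # Cycle 2: reverse the SOT current and select only the MTJs that must store a 0.
    cycle2 = (-1, [bit == 0 for bit in target_bits])
    return [cycle1, cycle2]


plan = two_cycle_write_plan([1, 0, 0, 1])
for direction, mask in plan:
    print(direction, mask)
```

Each MTJ is selected in exactly one of the two cycles, so every bit on the channel is written once regardless of its previous state.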

B. IR DROP IN THE SOT LAYER
The length of the SOT layer depends on the number of MTJs (N_MTJ) integrated on it. A longer SOT channel has a higher resistance (R_SOT) and hence a larger voltage drop (V_SOT) across it. A large IR drop across the SOT channel requires larger write voltages, which pose several challenges, such as large variation in the effective VCMA voltages and the need for high-voltage transistors. To lower the resistance, the thickness (t_SOT) of the SOT channel can be increased. However, a larger t_SOT may require a larger write current (I_w) to maintain a sufficient current density (J_SOT) in the SOT channel. In addition, the damping-like spin-torque efficiency (ξ_DL) may also change with t_SOT according to the drift-diffusion model of spin generation and transport [12], which expresses ξ_DL in terms of the spin Hall angle (θ_SH), the spin diffusion length in the SOT material (λ_sd), the real part (G_r) of the spin-mixing conductance (G_↑↓), and the conductivity of the SOT material (σ_SOT).
For the SOT channel, we use AuPt [13], which is a well-studied SOT material with low resistivity (83 µΩ·cm) and large ξ_DL. Fig. 4(a) shows the required I_w as well as R_SOT versus t_SOT. The inset plot in Fig. 4(a) shows the variation of ξ_DL with t_SOT. Increasing t_SOT increases I_w: although ξ_DL rises and the required J_SOT therefore falls, the larger cross section outweighs this. The resistance, however, decreases with increasing t_SOT, resulting in an overall reduction in V_SOT, as seen in Fig. 4(b). The write energy (E_w), on the other hand, is nonmonotonic, and the lowest E_w is obtained when t_SOT is 3.5-4 nm.
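The explicit expression from [12] is not reproduced in this text. For orientation only, a commonly quoted drift-diffusion form that uses exactly the quantities named above is given below; it is an assumption here and may differ in detail from the expression used in the original analysis.

```latex
\xi_{\mathrm{DL}} \;=\; \theta_{\mathrm{SH}}\,
\frac{2\,G_r \tanh\!\left(\dfrac{t_{\mathrm{SOT}}}{2\lambda_{sd}}\right)}
     {\dfrac{\sigma_{\mathrm{SOT}}}{\lambda_{sd}} + 2\,G_r \coth\!\left(\dfrac{t_{\mathrm{SOT}}}{\lambda_{sd}}\right)}
```

In this form, ξ_DL vanishes as t_SOT approaches zero and saturates at θ_SH · 2G_r / (σ_SOT/λ_sd + 2G_r) once t_SOT is several times λ_sd, which is the qualitative thickness dependence the paragraph refers to.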
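The thickness trade-off can be illustrated with a rough scaling sketch. It assumes a commonly used drift-diffusion form for ξ_DL(t_SOT) and a fixed required spin current density. Only the AuPt resistivity and the 1-ns pulse come from the text. Every other number (θ_SH, λ_sd, G_r, channel width, MTJ pitch, required spin current density) is a hypothetical placeholder, so the output shows trends and does not reproduce Fig. 4.

```python
import math

RHO = 83e-8            # AuPt resistivity from the text: 83 uOhm*cm = 83e-8 Ohm*m
SIGMA = 1.0 / RHO      # conductivity, S/m
PULSE = 1e-9           # write pulse duration from the text, s
# Hypothetical placeholders below (not values from the article).
THETA_SH = 0.3         # spin Hall angle
LAMBDA_SD = 2e-9       # spin diffusion length, m
G_R = 0.6e15           # real part of spin-mixing conductance, 1/(Ohm*m^2)
WIDTH = 60e-9          # SOT channel width, m
PITCH = 100e-9         # channel length consumed per MTJ, m
J_SPIN = 3e10          # required spin current density, A/m^2


def xi_dl(t_sot):
    """Assumed drift-diffusion form of the damping-like efficiency."""
    num = 2 * G_R * math.tanh(t_sot / (2 * LAMBDA_SD))
    den = SIGMA / LAMBDA_SD + 2 * G_R / math.tanh(t_sot / LAMBDA_SD)
    return THETA_SH * num / den


def channel_metrics(t_sot, n_mtj=4):
    """Return (R_SOT, I_w, V_SOT, E_w) in SI units for a given SOT thickness."""
    length = n_mtj * PITCH
    j_charge = J_SPIN / xi_dl(t_sot)          # charge current density needed
    r_sot = RHO * length / (WIDTH * t_sot)    # channel resistance
    i_w = j_charge * WIDTH * t_sot            # write current
    v_sot = i_w * r_sot                       # IR drop along the channel
    e_w = i_w ** 2 * r_sot * PULSE            # Joule energy during the pulse
    return r_sot, i_w, v_sot, e_w


for t_nm in (2, 3, 4, 5, 6, 8):
    r, i, v, e = channel_metrics(t_nm * 1e-9)
    print(f"t_SOT={t_nm} nm  R={r:.0f} Ohm  I_w={i * 1e6:.1f} uA  "
          f"V={v * 1e3:.1f} mV  E_w={e * 1e15:.3f} fJ")
```

In this model V_SOT scales as 1/ξ_DL and E_w as t_SOT/ξ_DL², so the voltage drop falls monotonically with thickness while the energy passes through a minimum, consistent with the qualitative behavior described above.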

C. DEVICE SIMULATIONS
To obtain the trade-offs among write current, write time, and WER and to evaluate the VCMA selectivity of the MTJs, we use Object Oriented MicroMagnetic Framework (OOMMF) [14] simulations augmented with the rare-event enhancement method [15]. The simulation framework has already been validated against experiments [16]. We use a perpendicular MTJ with a diameter of 51 nm and a free-layer thickness of 1.2 nm. The room-temperature saturation magnetization (M_s) and interface anisotropy (K_i) are 1.257 MA/m and 1.3 mJ/m^2, respectively [17], which give a room-temperature Δ of ≈90. The symmetry breaking required for SOT switching can be achieved by applying a magnetic field of 32 mT [5]. In addition, we assume a field-like to damping-like torque ratio of 0.18 [18]. Fig. 5(a) shows the obtained WER versus applied spin current for various values of the voltage applied across the MTJ (V_MTJ). The duration of the write current is 1 ns. We use a VCMA coefficient of 100 fJ/(V·m) [19]. To quantify the VCMA selectivity, we also calculate the accidental write rate for the nonselected MTJ, as shown in Fig. 5(b). Here, accidental write rate refers to the probability of a nonselected MTJ (V_MTJ = 0) being switched. We have ignored the effect of any STT current due to V_MTJ, as the design requires minimization of the STT current, as discussed next. In addition, field-assisted switching of perpendicular magnets usually requires a larger damping coefficient [20] (≈0.1), which effectively suppresses the effects of the STT current.
One key challenge with regard to the VCMA selectivity of MTJs is the current injected into the SOT channel due to the applied V_MTJ. Applying V_MTJ adds a finite amount of current to the SOT channel, which increases the overall SOT current in the channel. This extra current (ΔI) can help reduce the WER for the selected MTJs; however, it also has the unintended effect of accidentally switching the nonselected MTJs. This extra current can be quantified as a function of N_MTJ and oxide thickness (t_ox). The worst case, corresponding to maximum ΔI, occurs when (N_MTJ − 1) consecutive MTJs are written from parallel (P) to antiparallel (AP) in one cycle, while the remaining MTJ is written from AP to P in another cycle, as shown in Fig. 6. In this case, ΔI can be defined in terms of R_P, the resistance of the MTJs in the P state. The maximum allowable value of ΔI is determined by the available switching margin (I_margin), which is defined as the difference in the write currents for the selected and nonselected MTJs corresponding to a target error rate, as shown in Fig. 5(b). It is also important to note that during the write operation, nonselected MTJs experience a negative voltage (V_MTJ < 0) due to the finite potential of the SOT channel. Thus, the available I_margin is higher than the value depicted in Fig. 5(b). For reliable operation, ΔI < I_margin is required. The obtained values of I_margin for a WER of 10^−6 are ≈88 and ≈127 µA when the voltage across the nonselected MTJ is 0 and −0.42 V, respectively. In comparison, the corresponding I_margin values for a WER of 10^−4 are 97 and 137 µA, respectively. For the nonselected MTJ, we cannot calculate accidental write rates below 10^−4 because of the limitations imposed by the computation time. Improving the VCMA coefficient and lowering the charge-to-spin conversion efficiency can increase I_margin. In addition, I_margin depends on Δ, as shown in Fig. 7.
Further optimization of the magnetic parameters is required to improve I_margin. Reducing V_MTJ can lower ΔI; however, that would also reduce I_margin. The best trade-off among I_w, V_SOT, and I_margin can be achieved by selecting V_MTJ = 1.5 V and t_SOT = 6 nm.
Another way to suppress ΔI is to increase t_ox, which increases the MTJ resistance and lowers the current passing through it. This is shown in Fig. 8(a), where the values of ΔI corresponding to different t_ox values are plotted against N_MTJ. Here, the MTJ resistance values are obtained from experiment [21]. Fig. 8(b) shows ΔI versus N_MTJ for various values of V_MTJ at t_ox = 1.7 nm. Increasing t_ox beyond 1.6 nm can significantly suppress ΔI, allowing the integration of a larger number of MTJs on a single SOT channel. However, a large t_ox comes with a read performance penalty, as discussed in Section IV.
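As a back-of-the-envelope illustration of how V_MTJ lowers the thermal stability, the sketch below uses the standard linear VCMA relation, in which the interface anisotropy changes by the VCMA coefficient times V_MTJ divided by t_ox, together with a single-domain (macrospin) energy barrier. The macrospin simplification, the 300 K temperature, and the 1.5-nm oxide thickness in the example call are assumptions. The article's results come from micromagnetic simulations, which this estimate does not replace.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def vcma_delta_reduction(v_mtj, t_ox, diameter, xi_vcma=100e-15, temperature=300.0):
    """Macrospin estimate of the reduction in thermal stability caused by V_MTJ.

    v_mtj in V, t_ox and diameter in m, xi_vcma in J/(V*m).
    Assumes the voltage polarity that lowers the interface anisotropy.
    """
    dk_i = xi_vcma * v_mtj / t_ox             # change in interface anisotropy, J/m^2
    area = math.pi * (diameter / 2) ** 2      # free-layer area, m^2
    return dk_i * area / (K_B * temperature)  # barrier reduction in units of k_B*T


# Example with the 51-nm MTJ and 100 fJ/(V*m) coefficient; t_ox = 1.5 nm is assumed.
print(vcma_delta_reduction(v_mtj=1.5, t_ox=1.5e-9, diameter=51e-9))
```

Because the estimate scales linearly with the VCMA coefficient and inversely with t_ox, it makes explicit why a thicker oxide weakens the VCMA effect for a given V_MTJ.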
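The reliability condition (extra injected current below I_margin) can be turned into a quick feasibility check. The sketch below assumes a deliberately simplified worst case in which each of the (N_MTJ − 1) selected MTJs is in the P state and injects V_MTJ/R_P, and it ignores the finite potential of the SOT channel. Both this expression and the 60 kΩ value for R_P are illustrative assumptions, not the article's definition or data. The I_margin values of 88 µA and 127 µA are the ones quoted above.

```python
def worst_case_extra_current(n_mtj, v_mtj, r_p):
    """Simplified worst-case injected current: (N_MTJ - 1) selected MTJs in the P state."""
    if n_mtj < 1:
        raise ValueError("n_mtj must be at least 1")
    return (n_mtj - 1) * v_mtj / r_p


def max_mtjs_per_channel(v_mtj, r_p, i_margin, n_limit=64):
    """Largest N_MTJ (up to n_limit) whose worst-case extra current stays below I_margin."""
    n_ok = 1
    for n in range(1, n_limit + 1):
        if worst_case_extra_current(n, v_mtj, r_p) < i_margin:
            n_ok = n
        else:
            break
    return n_ok


# R_P = 60 kOhm is a hypothetical value chosen only to exercise the check.
print(max_mtjs_per_channel(v_mtj=1.5, r_p=60e3, i_margin=88e-6))
print(max_mtjs_per_channel(v_mtj=1.5, r_p=60e3, i_margin=127e-6))
```

The check shows how a larger R_P (thicker oxide) or a larger I_margin directly raises the number of MTJs that one channel can support.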

III. SOT + STT
Another way of sharing the SOT channel among MTJs is to use a small STT instead of the VCMA effect.

A. CELL DESIGN AND WRITE OPERATION
The cell design is the same as the SOT + VCMA scheme. In this case, the deterministic magnetization switching is achieved by applying a small STT current. First, an SOT current is applied to move the magnetizations of all the MTJs toward the in-plane meta-stable direction. After that, the SOT current is stopped, and a small STT current is applied through each MTJ. The direction of the STT current determines the final MTJ state. All the MTJs are written at once by applying appropriate polarities of STT currents. The write scheme is demonstrated in Fig. 9. Also, as the direction of the SOT write current remains the same, a separate driver for SL is not required.

B. DEVICE SIMULATIONS
The diameter and thickness of the free-layer ferromagnet used here are 42 and 1.3 nm, respectively, giving a room-temperature Δ of ≈60. The Δ used here is lower than in the SOT + VCMA case (Δ ≈ 90), because the SOT + VCMA scheme requires a large Δ to effectively suppress the accidental write rate for nonselected MTJs. The SOT + STT scheme has no such restriction, and the value of Δ can be chosen based on the retention time requirement. Fig. 10(a) shows the magnetization switching for a single MTJ, illustrating the write scheme used. The spin current generated by the SOT is fixed at 600 µA and is applied for 1 ns. The magnitude and direction of the STT current are varied to obtain various WERs, as seen in Fig. 10(b). Here, we assume STT efficiencies of 0.6 for AP-P and 0.3 for P-AP switching [22].
Similar to the VCMA-assisted write, the number of MTJs on a single SOT channel is limited by the SOT current in the worst-case scenario for the write operation. During the STT switching phase, a finite amount of current is injected into the SOT channel by the STT current. The current flowing in the SOT channel applies an in-plane torque on the magnetization of the free layer. If this current becomes too large, the magnetization remains stuck in-plane, which suppresses the effect of STT and causes switching errors and an increased WER. The worst-case scenario occurs when all the MTJs are being switched from the P to the AP state, as shown in Fig. 11. To reduce the SOT current seen by the MTJs, we ground both the write bitline (WBL) and SL during the STT phase. This allows the current to flow in both directions within the SOT channel and lowers the voltage drop. To calculate the resulting current density in the SOT channel below each MTJ, we use COMSOL-based finite-element simulations. Fig. 12 shows the obtained SOT current densities below each MTJ for the N_MTJ = 4 and N_MTJ = 6 cases due to an applied STT current of 16.7 µA. The resulting current density data are used in micromagnetic simulations to calculate the WER and find the limit on the number of MTJs. Fig. 13 shows the magnetization dynamics corresponding to the worst-case write operation for the MTJ seeing the most SOT current when N_MTJ is 4, 6, and 8. A large amount of current flowing in the SOT channel results in increased switching failures, as seen in Fig. 13(c). The worst-case WER for the different MTJs on an SOT channel with N_MTJ = 4 is shown in Fig. 14(a). Fig. 14(b) depicts the WER for the MTJs experiencing the largest SOT current in the worst-case write operation for N_MTJ = 4, 6, and 8. The results show that increasing the number of MTJs leads to a higher WER.
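The effect of grounding both ends of the channel can be illustrated with a one-dimensional lumped model: each MTJ is a current source on a uniform resistive line, and by the current-divider rule a source at normalized position x sends a fraction x of its current to the right end and (1 − x) to the left end. This is a rough stand-in for the finite-element simulation, not a reproduction of it. The equal spacing, the half-pitch margin at each end, and the identical injected current for every MTJ are assumptions.

```python
def segment_currents(n_mtj, i_inj):
    """Signed current (positive toward the right end) in each of the n_mtj + 1 segments."""
    positions = [(k + 0.5) / n_mtj for k in range(n_mtj)]  # normalized MTJ positions
    currents = []
    for s in range(n_mtj + 1):
        # Sources to the left of segment s push their right-going share through it.
        right_going = sum(i_inj * x for x in positions[:s])
        # Sources to the right of segment s push their left-going share through it.
        left_going = sum(i_inj * (1.0 - x) for x in positions[s:])
        currents.append(right_going - left_going)
    return currents


def current_under_mtjs(n_mtj, i_inj):
    """Average of the two adjacent segment currents, taken as the current below each MTJ."""
    seg = segment_currents(n_mtj, i_inj)
    return [0.5 * (seg[k] + seg[k + 1]) for k in range(n_mtj)]


I_STT = 16.7e-6  # STT current per MTJ quoted in the text, A
for n in (4, 6, 8):
    worst = max(abs(i) for i in current_under_mtjs(n, I_STT))
    print(n, f"{worst * 1e6:.2f} uA")
```

In this model the MTJs nearest the channel ends carry the largest channel current, which grows with N_MTJ, while the ends together sink the full injected current, half each. With only one end grounded, the MTJ nearest that end would instead see nearly the full sum.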

C. ROBUSTNESS TO WRITE PULSE TIMING
Another key metric for the circuit is its sensitivity to the timing of the SOT and STT pulses. Based on SPICE simulation results, we show that the circuit is robust with regard to variation in the relative timings of the SOT and STT pulses, as shown in Fig. 15. We apply the STT pulse 100 ps before the SOT pulse ends, assuming that the uncertainty due to jitter and skew does not exceed 100 ps. This ensures that as soon as SOT ends, STT begins to switch the magnetization in the desired direction. A delay between SOT and STT may lead to switching errors, as the magnetization remains in the meta-stable state and thermal noise may move it in the unwanted direction. During the SOT phase, the SOT channel remains at a finite potential, while the read bitlines (RBLs) are grounded. If the read wordline (RWL) is enabled before the RBLs are charged [solid lines in Fig. 15(b)], an unintended STT current flows toward the fixed layer for a small amount of time. However, this is not an issue, as this unintended STT current is much smaller (<10%) in magnitude than the SOT current applied on the MTJs and does not affect the magnetization dynamics, as shown in Fig. 15(e).
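The pulse-overlap rule can be sketched as a small scheduling helper. The 1-ns SOT pulse and the 100-ps overlap come from the text. The 2-ns STT pulse width and all names are hypothetical, chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass
class Pulse:
    start: float  # seconds
    end: float    # seconds


def write_pulses(sot_width=1e-9, stt_width=2e-9, overlap=100e-12):
    """Schedule the STT pulse to start `overlap` before the SOT pulse ends."""
    sot = Pulse(0.0, sot_width)
    stt_start = sot.end - overlap
    stt = Pulse(stt_start, stt_start + stt_width)
    return sot, stt


def leaves_gap(sot, stt, timing_error):
    """True if an STT pulse arriving `timing_error` late starts after SOT has ended."""
    return stt.start + timing_error > sot.end


sot, stt = write_pulses()
print(leaves_gap(sot, stt, 80e-12))   # timing error within the 100-ps budget
print(leaves_gap(sot, stt, 150e-12))  # timing error beyond the budget
```

Any timing error smaller than the overlap keeps the two pulses contiguous, so the magnetization is never left in the meta-stable in-plane state without a driving torque.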

IV. READ OPERATION
The read performance is evaluated based on SPICE simulations. We use a differential sensing scheme [23], [24] for the read operation. Only one MTJ on a single SOT channel can be read at a time, as the read current path is shared among them. This is not an issue, as the number of MTJs that can be read at once is limited by the number of sense amplifiers (SAs). We assume one SA for every 64 bitlines, as is commonly done in STT-MRAM arrays. These 64 bitlines are multiplexed together and then compared with the reference MTJ. The bitline voltages corresponding to the MTJ being read and the reference MTJ are compared using a double-tail latch-type voltage SA [25].
The read performance strongly depends on t_ox and the tunnel magnetoresistance (TMR) ratio. To evaluate the read performance, we consider t_ox from 1.2 to 1.9 nm. The resistance-area (RA) product values are obtained from experimental data [21]. We assume a constant TMR ratio of 120% [17]. Fig. 16(a) shows the resistance of the MTJ with a diameter of 51 nm in the P and AP states. Read performance is evaluated using SPICE simulations for a 256 × 128 array with four MTJs on each SOT channel. We use 14-nm FinFET models from the Predictive Technology Model (PTM) by Arizona State University (ASU) [26] with a half metal pitch of 32 nm. Table 1 lists the parasitic resistance and capacitance values used in the simulations. The capacitance values are obtained from prior benchmarking work [24], and the resistance values of the wires are calculated based on the Cu resistivity values reported in [27]. Fig. 16(b) shows the obtained read margins in the P and AP states for the nominal case, where the read margin is defined as the voltage difference seen at the input of the SA. The read margin reduces drastically for t_ox < 1.4 nm and t_ox > 1.7 nm, especially for the AP state.
To account for variation, we use a 3σ variation of 10% in MTJ area and a 10% uniform variation in the supply voltage while also accounting for thermal noise. We use a read time 5σ higher than the mean value to obtain a read error rate below 10^−6. The total read delay can be expressed [24] in terms of R_drive (= 5 kΩ) and R_RWL, the resistances of the drive transistor and RWL, respectively; C_RWL, the capacitance of RWL; and t_sense, which accounts for the delay to reach the required voltage margin. The effective read delay and energy, including the effects of variation, are shown in Fig. 17 for a read margin of 70 mV. For t_ox = 1.3 nm and t_ox = 1.9 nm, the available read margin is <60 mV. Optimal read performance is observed for t_ox within the range of 1.4-1.6 nm. Increasing t_ox initially results in lower read energies because of smaller read currents; however, beyond 1.7 nm, the read energy starts to increase, as the delay goes up rapidly when the read current becomes too small. The choice of t_ox based on the reliability of the write operation differs from the choice that optimizes read performance. The SOT + VCMA scheme requires a larger t_ox to suppress any extra current due to V_MTJ, while the SOT + STT scheme requires a lower oxide thickness to reduce the write energy.
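The way the P and AP resistances follow from the RA product and the TMR ratio can be sketched as follows. The TMR ratio of 120% and the 51-nm diameter come from the text. The RA value in the example call is a hypothetical placeholder, since the experimental RA data of [21] are not reproduced here. The sketch also assumes the usual definition TMR = (R_AP − R_P)/R_P.

```python
import math


def mtj_resistances(ra_ohm_um2, diameter_nm=51.0, tmr=1.2):
    """Return (R_P, R_AP) in ohms from the RA product (Ohm*um^2) and the TMR ratio."""
    area_um2 = math.pi * (diameter_nm * 1e-3 / 2) ** 2  # circular MTJ area in um^2
    r_p = ra_ohm_um2 / area_um2
    r_ap = r_p * (1.0 + tmr)  # assumes TMR = (R_AP - R_P) / R_P
    return r_p, r_ap


# RA = 100 Ohm*um^2 is an illustrative value, not taken from the article.
r_p, r_ap = mtj_resistances(100.0)
print(f"R_P = {r_p / 1e3:.1f} kOhm, R_AP = {r_ap / 1e3:.1f} kOhm")
```

Since the RA product grows rapidly with t_ox, the same relation explains why a thicker oxide reduces the read current and eventually the read margin.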
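The 5σ guard band on the read time can be illustrated with a short helper that works on a set of simulated delays. It assumes the delays are approximately normally distributed, which is a simplification, and the sample values are made up for illustration.

```python
import math
import statistics


def guardbanded_read_time(delay_samples, n_sigma=5.0):
    """Mean delay plus n_sigma sample standard deviations."""
    mean = statistics.fmean(delay_samples)
    sigma = statistics.stdev(delay_samples)
    return mean + n_sigma * sigma


def gaussian_tail(n_sigma):
    """One-sided probability that a normal variable exceeds mean + n_sigma * sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


samples_ns = [2.8, 3.0, 3.1, 2.9, 3.2, 3.0]  # hypothetical Monte Carlo delays in ns
print(guardbanded_read_time(samples_ns))
print(gaussian_tail(5.0))
```

Under the Gaussian assumption, the one-sided tail beyond 5σ is below 10^−6, which is consistent with the read error rate target stated above.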

V. BENCHMARKING
We benchmark the SOT + VCMA/STT scheme against other competing memories, such as SRAM, STT-MRAM, and conventional two-transistor SOT-MRAM based on in-plane magnetic anisotropy (IMA) and perpendicular magnetic anisotropy (PMA). Fig. 18(a) shows the cell area per bit versus the number of MTJs for the SOT + VCMA/STT scheme. In Fig. 18(b), the cell area per bit of the SOT + VCMA/STT scheme with four MTJs on a shared SOT channel is compared against those of other magnetic memory options. In both plots, the 14-nm technology node (F = 32 nm) and the layout rules described in prior benchmarking work [11], [28] are used to calculate the cell areas. Compared with the conventional 2T SOT-MRAM, ≈2× bit density can be achieved. The write energies for the SOT + VCMA and SOT + STT schemes, calculated using SPICE simulations, are shown in Fig. 19. The write voltages and currents for the SOT + VCMA scheme are listed in Table 2, and those for the SOT + STT scheme are listed in Table 3. The write energy values are benchmarked against other memory options [16], as shown in Fig. 20. For the conventional 2T SOT-MRAM cell, the write energy results for various SOT materials are included, such as PtCu [29], AuPt [13], BiSe [30], β-W [31], and BiSb [32]. The SOT + VCMA scheme has a higher write energy but a much lower write delay compared with the SOT + STT scheme. The higher write energy can be attributed to the large Δ (≈90) requirement, as discussed in Section III-B, and to the large energy associated with charging the RBL capacitance when V_MTJ is applied. The higher thermal stability can be useful, as it increases the data retention time. The higher write delay observed in the SOT + STT scheme with SOT channel sharing, compared with the conventional 2T SOT + STT MRAM cell, is due to the lower STT current required to suppress the WER in the worst-case write, as discussed previously.
Overall, both the SOT + VCMA and SOT + STT schemes discussed here provide a major density advantage over the conventional SOT-MRAM at some cost in write performance.
One important question is which material properties, if improved, would most significantly improve the array-level performance of the proposed schemes. The key material properties considered here for benchmarking are the STT efficiency, the SOT efficiency, and the VCMA coefficient. There are no known approaches to improve the STT efficiency, and the commonly used value (60%) is not too far from the ideal value of 100%. On the other hand, improving the SOT efficiency is an active area of research, with many promising materials being explored. For the SOT + VCMA scheme, increasing the SOT efficiency while keeping the SOT layer thickness fixed reduces the available I_margin, resulting in higher error rates. Similarly, for the SOT + STT scheme, a higher SOT efficiency may result in a stronger SOT during the STT phase and an increased error rate. However, a higher SOT efficiency may allow the SOT layer thickness to be increased, which can alleviate the IR drop issue and improve the device performance and reliability. For the SOT + VCMA scheme, improving the VCMA coefficient will have the most impact, as it lowers the write energy and increases I_margin, thereby lowering the WER.

VI. CONCLUSION
This article presents a comprehensive modeling, optimization, and benchmarking of transistor sharing schemes for SOT-MRAM devices using VCMA and STT effects. Using experimentally validated micromagnetic simulations augmented with rare-event enhancement along with SPICE simulations, we demonstrate that the number of MTJs that can be put on a single SOT channel is limited by the write error induced due to the injection of current in the SOT channel through the MTJs and voltage drop on the SOT channel. For the SOT + VCMA scheme, we quantify the WER, unintentional write rate, and the current injection through MTJs as a function of the MTJ oxide thickness. For the SOT + STT scheme, finite-element simulations are used to calculate the SOT current density in the SOT channel underneath each MTJ and the resulting WER. In addition, we quantify the IR drop along the SOT layer in terms of the number of MTJs and provide a way to optimize the SOT layer thickness while considering the write energy, current, SOT channel resistance, and the voltage drop along the SOT layer. Our results indicate that having four to six MTJs on a single SOT channel provides the best trade-off among the write energy, bit density, WER, and IR drop. The SOT + VCMA/STT schemes show a ≈2× bit density improvement over the conventional two transistor SOT-MRAM and a ≈6× bit density improvement over SRAM. While the energies are slightly higher than the conventional 2T SOT-MRAM, the SOT + VCMA/STT schemes are still more energy efficient than STT-MRAM. We also quantify the read performance in terms of oxide thickness and show the read penalty associated with sharing SOT channel among MTJs. Our read simulation results show read times <4 ns for both the schemes. Moreover, since the current through the select transistors is significantly smaller than that of STT-MRAM, this approach may enable adopting SOT-MRAM to more advanced technology nodes.
While both the SOT + VCMA and SOT + STT schemes look promising, there are certain challenges that must be addressed. For the SOT + VCMA scheme, a relatively large VCMA coefficient (>100 fJ/(V·m)) is needed to keep the required VCMA voltage below 1.5 V. Tighter control over the variation in magnetic properties is also required to ensure sufficient I_margin. For the SOT + STT scheme, there is an additional cost associated with the peripheral circuits, which must supply both positive and negative voltages for the write operation.