Improving Bit-Error-Rate Performance Using Modulation Coding Techniques for Spin-Torque Transfer Magnetic Random Access Memory

In non-volatile random-access memory (RAM) technologies, spin-torque transfer magnetic random-access memory (STT-MRAM) is a promising candidate. STT-MRAM has attracted attention owing to its advantages, such as high density, high endurance, and high-speed writing/reading. Moreover, STT-MRAM is considered a replacement for dynamic random-access memory (DRAM) in Internet of Things (IoT) and artificial intelligence (AI) applications. However, because a magnetic tunnel junction is used to store the data, STT-MRAM faces process variation and thermal fluctuation problems, which cause errors in the writing and reading processes. These errors are independent of each other. Therefore, to mitigate them and increase the reliability of STT-MRAM, the data are encoded before being stored in STT-MRAM devices. In this study, we propose a method for designing codewords using modulation coding techniques. Our codewords increase the minimum Hamming distance while inheriting the sparse-code characteristic, which exploits the asymmetric probability of the write errors. The simulation results show that our codewords improve the bit-error-rate performance of STT-MRAM compared with previous works.


I. INTRODUCTION
In recent years, numerous mobile devices have been developed in large quantities, which has driven the development of storage systems with high capacity and fast access. Among non-volatile memories (NVMs), spin-torque transfer magnetic random-access memory (STT-MRAM) has been predicted to be a promising candidate for storage class memory (SCM) [1]. STT-MRAM consumes a low switching energy [2] and is well suited to mobile devices. Additionally, high endurance, high density, and fast write/read speed are key advantages of STT-MRAM. With these features, STT-MRAM is considered a candidate to replace dynamic random-access memory (DRAM). However, it still faces a high write cost [3] (a write pulse width of 10 ns or longer), process variation, and thermal fluctuation problems. For the high write cost, many circuit design- and architecture-level solutions have been proposed to alleviate this issue, including increasing the size of the driving transistors [4], overdriving the word-line voltage [5], adding write buffers, and hybrid memory designs with both static random-access memory (SRAM) and STT-MRAM [6]. For process variation and thermal fluctuation, many studies have analyzed and evaluated the thermal noise and fluctuation model of the magnetic tunnel junction (MTJ) resistance switching process [4], [7], [8].
In STT-MRAM, the data are represented by the value of the MTJ resistance. To change the MTJ resistance, a spin-polarized current is passed through the MTJ, and the state of the MTJ is switched according to the direction of the spin-polarized current. Thermal noise and fluctuations distort the required current value. At the same time, the resistance value can be misread by the sensor that measures the MTJ resistance. This causes independent errors in both the write and read processes and degrades the performance of the STT-MRAM.
From a signal processing perspective, to address the errors of STT-MRAM, researchers have developed and introduced advanced error correction coding techniques to correct memory cell errors and improve the performance of STT-MRAM. A (71, 64) single-error-correcting Hamming code was applied to improve the reliability of Everspin's 16 Mb MRAM [9]. To improve the storage density, BCH codes with multiple error-correction capabilities were proposed in [10], [11], and [12]. In [13] and [14], low-density parity-check (LDPC) codes were proposed to closely approach the Shannon capacity for a wide range of channels. In [15], rate-compatible (RCP) LDPC codes were proposed to improve the performance of the STT-MRAM. However, LDPC decoding can incur long latency because of its iterative decoding algorithms. Adaptive error-correcting schemes were proposed in [16] and [17] to reduce the redundancy of the error correction code (ECC).
Recently, in [18], based on the fact that the error probability of 0 → 1 is higher than that of 1 → 0 in the STT-MRAM channel, Nguyen proposed the sparse code to reduce the error in the writing and reading processes. For the sparse code, the author proposed codewords with Hamming weights of 2 and 4. Therefore, the codewords achieved a minimum Hamming distance (MHD) of 2. Moreover, in each codeword, the number of bit-1 is always less than the number of bit-0.
Inspired by this idea, in this study, we propose a method to increase the MHD of the sparse code (Method 1). In particular, for an MHD of 3, based on Hamming code theory, we propose a faster method to create the sparse code and estimate its code rate (Method 2). Moreover, we propose a trellis method to encode and decode these codewords and improve the bit error rate (BER) performance of the STT-MRAM. In Method 1, to satisfy the sparse code condition, we first select the codeword c0 = 00...0. Then, to design codewords with MHD = d_min, we select codewords with a weight of at least d_min. These codewords and c0 are assigned to set C. Finally, each codeword in C is compared with the other codewords in C to remove those that do not satisfy the MHD condition. This method is presented in Section III. In Method 2, we create a generator matrix; all user data words are then multiplied by the generator matrix to create codewords with an MHD of 3. The major contributions of this study are as follows.
- The general method to increase the MHD of the sparse code is proposed.
- A fast evaluation of the code rate of the sparse code with an MHD of 3 is conducted.
- Focusing on an MHD of 3, we introduce a unique method to create a sparse code with an MHD of 3.
- Trellis modulation coding is introduced for the encoder and decoder of the STT-MRAM.
- We discuss the capability of the sparse code with an MHD of 3 compared with previous codes.
In addition, when applied to STT-MRAM caches, the methods in [19] and [20] can be combined with our proposed codes to obtain an optimal ECC structure in STT-MRAM caches. The remainder of this paper is organized as follows. In Section II, we introduce a cascaded channel model for the STT-MRAM. We then propose methods to increase the MHD of the sparse code in Section III. Section IV presents and discusses the simulation results. Finally, the conclusions are presented in Section V.

II. CASCADED CHANNEL MODEL FOR STT-MRAM

A. STT-MRAM CELL STRUCTURE
In an STT-MRAM cell (Fig. 1), the data are stored in the MTJ, and an nMOS transistor is used to control the current that serves the writing or reading process [2], [21]. The MTJ includes three layers: two ferromagnetic layers and a tunneling oxide layer, with the tunneling oxide layer sandwiched between the two ferromagnetic layers. One ferromagnetic layer is the reference layer, whose magnetization direction is fixed. The other is the free layer, whose magnetization direction can be switched by passing spin-polarized currents. The data stored in the MTJ cell depend on the resistance of the MTJ. When the magnetization directions of the reference and free layers are parallel (P) (Fig. 1a), the MTJ cell is in a low-resistance state. When the magnetization directions are antiparallel (AP) (Fig. 1b), the MTJ cell is in a high-resistance state. We denote the MTJ resistances in the P and AP states by R_P and R_AP, respectively. Therefore, R_P (low resistance) can represent data bit-0, and R_AP (high resistance) can represent data bit-1. Fig. 2 shows how the data are written to and read from STT-MRAM. To write bit-0, i.e., to set the P state in an STT-MRAM cell, the voltage V_DD is connected to the bit-line (BL) and word-line (WL), while the source line (SL) is connected to the ground (GND); current then flows from the free layer to the reference layer. To write bit-1, the direction of the current flow is reversed to obtain the AP state: SL and WL are connected to V_DD, and BL is connected to GND. For the reading process, a current much lower than the write current passes through the MTJ, and the state of the MTJ is detected using a memory-sensing circuit.

B. CASCADED CHANNEL MODEL
In STT-MRAM, process variation and thermal fluctuation significantly affect the data reliability, leading to both write and read errors [2], [4]. For the write errors, owing to its lower spin-transfer efficiency, 0 → 1 switching requires a higher write current than 1 → 0 switching; therefore, the write error probability is asymmetric [21], [22], [23]. Denoting the write error rate for 0 → 1 as P1 and for 1 → 0 as P0, P1 is much higher than P0. In addition, when we consider STT-MRAM caches built from STT-MRAM cells, as in [24] and [25], error detecting and correcting codes impose extra read and write operations on the system, increasing the read disturbance, area, and energy overheads. However, for simplicity, we consider a single STT-MRAM cell. Therefore, we focus on the effect of code types that reduce the number of ''1'' bits in our channel.
The read error is categorized into the read-decision error and read-disturbance error. A read-decision error arises when the MTJ state is incorrectly sensed/detected during reading. The high and low resistances of the MTJ depend exponentially on the tunneling oxide thickness τ and are inversely proportional to the cross-sectional area A [26]. Therefore, the process imperfection-induced variations in τ and A of the MTJ led to widened distributions of the low and high resistances, causing read-decision errors. The read-disturbance error can only flip 1 → 0 in the write-0 direction or 0 → 1 in the write-1 direction. This is because the read current can only be used in one direction (write-0 or write-1), and a large read current occurs owing to the process variation and thermal fluctuation [26], [27].
Based on the above characteristics, a cascaded channel model was introduced for the STT-MRAM in [28] and [29]. The cascaded channel model consists of three parts. The first is a binary asymmetric channel (BAC) [30], which represents the write error. The read-disturbance error is then modeled as a Z-channel. Finally, to represent the read-decision error, the distributions of the low and high resistances of the STT-MRAM cells are modeled as Gaussian distributions [31], [32]. A block diagram of the cascaded model is shown in Fig. 3.
In Fig. 3, u_t ∈ {0, 1} represents the user data, a_t represents the data written into the memory cell, and the crossover probabilities of 0 → 1 and 1 → 0 are P1/2 and P0/2, respectively. Signal b_t is the readback signal without the read-decision error. P_r is the crossover probability of 1 → 0 or 0 → 1 for reading in the write-0 and write-1 directions, respectively. R0 and R1 are the resistance values for data bit-0 and data bit-1, respectively. R0 and R1 follow Gaussian distributions with means µ0 and µ1 and standard deviations σ0 and σ1, respectively. Therefore, a Gaussian mixture channel (GMC) is used to model the read-decision error. Finally, signal y_t is the output signal of the channel model.
We can simplify the cascaded channel model by combining the write error model and read-disturbance error model. In Fig. 4, the combined model is illustrated.
The simple channel models for reading in the write-0 and write-1 directions are similar. However, the probability parameters differ and are calculated as follows:
- Reading in the write-0 direction:

p0 = P0/2 + (1 − P0/2)P_r,  p1 = P1/2.   (1)

- Reading in the write-1 direction:

p1 = P1/2 + (1 − P1/2)P_r,  p0 = P0/2.   (2)

In this study, we used the parameters from [33] for the simulation of the cascaded channel model with the write-1 direction for reading. These parameters correspond to a 45 nm × 90 nm in-plane MTJ under a PTM 45-nm technology node, with µ0 = 1 kΩ, µ1 = 2 kΩ, and σ0/µ0 = σ1/µ1.
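As a concrete illustration, the simplified channel for reading in the write-1 direction can be simulated in a few lines of Python. This is an illustrative sketch, not the authors' simulator; the function and variable names are our own, and the combined crossover expressions are our reading of the model description, using the simulation parameters quoted in Section IV.

```python
import random

# Illustrative channel parameters (our assumption, following Section IV):
P1, P0, PR = 2e-4, 2e-6, 2e-6          # write-error and read-disturbance rates
MU0, MU1 = 1000.0, 2000.0              # mean resistances: 1 kOhm and 2 kOhm
SPREAD = 0.095                          # sigma/mu for both resistance states

def channel(bit, rng):
    """Return the noisy readback resistance for one stored bit
    (write-1 direction for reading)."""
    # Combined write + read-disturbance crossover probabilities:
    p1 = P1 / 2 + (1 - P1 / 2) * PR    # 0 -> 1
    p0 = P0 / 2                         # 1 -> 0
    if bit == 0 and rng.random() < p1:
        bit = 1
    elif bit == 1 and rng.random() < p0:
        bit = 0
    # Read-decision noise: Gaussian-distributed resistance (GMC part).
    mu = MU1 if bit else MU0
    return rng.gauss(mu, SPREAD * mu)

rng = random.Random(1)
samples = [channel(0, rng) for _ in range(1000)]
# With p1 on the order of 1e-4, almost every stored 0 reads back near 1 kOhm.
```

A detector would then threshold the readback resistance between µ0 and µ1 to decide the bit.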

III. PROPOSED METHODS

A. INCREASING MHD FOR SPARSE CODE
Based on the asymmetric error probability of the STT-MRAM (P1 > P0), the sparse code is designed to change the distribution of bit-1 and bit-0. In this case, to improve the BER performance of the STT-MRAM, the number of bit-0s should be much larger than the number of bit-1s. In [18], the author showed that codewords with an MHD of 2 can achieve a code rate of (n − 1)/n, where n is the codeword length. By choosing half of these codewords for the sparse code, the code rate of the sparse code with an MHD of 2 becomes (n − 2)/n. Similarly, we can consider codewords with an MHD of more than 2 and then choose the codewords with more 0s than 1s.
To determine the codewords with an MHD of d_min, we first choose the length of the codewords as n (n ≥ d_min). The total number of codewords is 2^n. First, we select the all-zero codeword c0. To ensure that all the subsequent codewords are at least d_min bits from the first codeword c0, we collect the codewords c_i (i = 1, 2, 3, ...) with a weight of more than d_min − 1 and assign C = {c0, c1, c2, ...}. The number of codewords in C is given as

|C| = 2^n − Σ_{j=1}^{d_min−1} (n choose j).   (3)

In (3), we removed the codewords with weights of 1, 2, ..., d_min − 1 to obtain the elements of C. We then choose c1 as the reference and determine the codewords in C that differ from c1 in at least d_min bits. Here, it is necessary to compare c2 through the end of set C, and we remove the codewords that do not satisfy this condition. After this step, the size of C and the indices of the codewords change; however, c0 and c1 do not change, and we only update from c2 to the end of C. Next, we choose c2 as the reference and compare it with c3 through the end of set C. Similarly, we remove the codewords at a distance smaller than d_min bits while comparing c3 through the end of C and update the size and indices of set C. Generally, we compare c_i with c_{i+1} through the end of set C. As the size and indices of C are updated, the size of C decreases. The stopping condition is that the increasing index i reaches the decreasing size of C (i = |C|). The procedure for Method 1 is as follows.
Step 1: Choose the codeword length n and the MHD d_min.
Step 2: Choose the all-zero codeword as c0.
Step 3: Collect the codewords with a weight of at least d_min and assign C = {c0, c1, c2, ...}.
Step 4: In set C, compare c_i with c_{i+1} through the end of set C, remove the codewords that do not satisfy the condition (at least d_min differing bits), and update the size and indices of C.
Step 5: Repeat Step 4 with the next reference codeword until i = |C|.
Step 6: Set C contains the codewords with an MHD of d_min.
Table 1 lists the number of codewords in C with d_min = 3 when the algorithm is complete.
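The steps above can be sketched as a short greedy search. This is an illustrative Python sketch, not the authors' implementation; the function names are our own.

```python
from itertools import product

def hamming_dist(a, b):
    """Number of differing bit positions."""
    return sum(x != y for x, y in zip(a, b))

def method1(n, d_min):
    """Greedy codeword selection: starting from the all-zero word, keep a
    word only if it is at Hamming distance >= d_min from every word kept
    so far (sketch of Method 1)."""
    # Steps 2-3: the all-zero word plus all words of weight >= d_min.
    candidates = [w for w in product((0, 1), repeat=n)
                  if sum(w) == 0 or sum(w) >= d_min]
    C = []
    for w in candidates:
        # Steps 4-5: discard w if it is too close to a kept codeword.
        if all(hamming_dist(w, c) >= d_min for c in C):
            C.append(w)
    return C

C = method1(n=7, d_min=3)
# For n = 7 and d_min = 3, the Hamming bound limits |C| to at most 16.
assert len(C) <= 16
```

For large n this exhaustive search becomes expensive, which motivates the Hamming-code-based estimate that follows.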
With Method 1, we cannot estimate the size of set C in advance when n increases significantly. Therefore, we analyzed the codewords in C following Hamming code theory to estimate the size of set C and the code rate. In the (n, k) Hamming code, the codewords c are calculated as

c = uG,   (4)

where u is the user data of length k and G is the generator matrix, which has the form

G = [I | P],   (5)

where I is the identity matrix and P is the matrix of the parity-check coefficients. Matrix P includes binary tuples with a weight greater than 1 to achieve an MHD of 3. If the length of the tuples in P is m = n − k, the maximum number of such tuples in P is

2^m − m − 1,   (6)

where m ≥ 2. In (6), we removed the tuples with weights of 0 and 1. The number of rows in matrix G is equal to k. Therefore, we obtain the condition

k ≤ 2^m − m − 1.   (7)

Substituting m = n − k into (7), we obtain

k ≤ 2^(n−k) − (n − k) − 1.   (8)

Then, (8) can be expressed as

k ≤ n − log2(n + 1).   (9)

Because k is an integer and the maximum number is needed, from (9) we can calculate k as

k = ⌊n − log2(n + 1)⌋,   (10)

where ⌊x⌋ is the largest integer ≤ x. The code rate R_H of the codewords with an MHD of 3 is

R_H = ⌊n − log2(n + 1)⌋ / n.   (11)

After obtaining the list of codewords with an MHD of 3, to obtain the sparse code, we apply the following condition to select the codewords in C:

ω(c_i) < n/2,   (12)
where ω(c_i) is the weight of codeword c_i. Because every bit sequence has a corresponding complemented bit sequence (e.g., 010110 and 101001), applying (12) eliminates almost half of the elements in C. In addition, if the set contains more non-sparse elements than sparse ones, we can invert the non-sparse elements to obtain sparse elements, so that half of the elements in C are again eliminated. Finally, the code rate of the sparse code is

R_S = (⌊n − log2(n + 1)⌋ − 1) / n.   (13)

Moreover, we can use the generator matrix G to create the codeword list C with an MHD of 3 instead of using Method 1. This is referred to as Method 2. Method 2 is divided into two parts: finding the rows of matrix P and creating the codewords in C. It is implemented according to the following steps:
Step 1: Find matrix P, which has rows with a weight greater than 1.
Step 2: Create the matrix G from matrix P.
Step 3: List the user data u.
Step 4: Multiply each user data word u by the generator matrix G to create the codewords in C.
Step 5: Remove the codewords that do not satisfy (12) to obtain the sparse code with an MHD of 3.
The time consumption is significantly lower than that of Method 1. Table 2 presents a comparison of the time consumption of the two methods.
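Method 2 can be sketched as follows. This is an illustrative Python sketch under our own naming, not the authors' exact code; for n = 7 it reproduces the (7, 4) Hamming code, and the sparse condition (more 0s than 1s) then keeps half of the codewords.

```python
from itertools import product
from math import floor, log2

def method2(n):
    """Build G = [I | P], where the rows of P are distinct (n - k)-bit
    tuples of weight >= 2, then generate all codewords c = uG (mod 2)."""
    k = floor(n - log2(n + 1))       # largest dimension allowing an MHD of 3
    m = n - k
    # Step 1: rows of P = distinct length-m tuples with weight >= 2.
    p_rows = [t for t in product((0, 1), repeat=m) if sum(t) >= 2][:k]
    # Step 2: generator matrix G = [I | P].
    G = [[1 if j == i else 0 for j in range(k)] + list(p_rows[i])
         for i in range(k)]
    # Steps 3-4: multiply every user word u by G over GF(2).
    return k, [tuple(sum(u[i] * G[i][j] for i in range(k)) % 2
                     for j in range(n))
               for u in product((0, 1), repeat=k)]

n = 7
k, C = method2(n)                    # k = floor(7 - log2(8)) = 4
# Step 5: keep only the sparse codewords (more 0s than 1s).
sparse = [c for c in C if sum(c) < n - sum(c)]
```

Because the rows of P are distinct and have weight at least 2, all columns of the corresponding parity-check matrix are nonzero and distinct, which guarantees a minimum distance of 3.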

B. APPLYING TRELLIS FOR ENCODING/DECODING
In addition to the above methods, we can improve the BER performance of STT-MRAM by using trellis modulation for encoding and decoding the codewords. In trellis modulation, we create relationships between the codewords using the trellis structure in Table 3.
In Table 3, the structure includes 2^k states and one branch between each pair of current and next states. In state S_q (q = 0, 1, 2, ..., 2^k − 1), the outgoing branches labeled 0, 1, 2, ..., 2^k − 1 are allocated the codewords c_{q mod 2^k}, c_{(q+1) mod 2^k}, ..., c_{(q+2^k−1) mod 2^k}, respectively. To encode, we start at state S_0; the encoder outputs the codeword in C corresponding to the k input bits and then moves to the next state. After α − 1 codewords, a parity word (000...0) is input into the encoder to force the process to return to S_0. Thus, the actual code rate becomes k(α − 1)/(nα), where α is the interval between the terminating parity symbols. Fig. 5 shows an example of the encoding process, in which we assume a 5-bit input sequence and α = 5. To decode, the Viterbi algorithm is used in the decoder with the Euclidean distance (ED) metric. The branch metric is calculated as

λ(s_a, s_b) = Σ_{t=1}^{n} (y_t − x_t)^2,   (14)

where s_a is the current state, s_b is the next state, y_t is a bit of the received codeword, and x_t is the corresponding bit of the branch codeword. When reaching the α-th codeword, which returns the state to S_0, we select the path with the minimum cumulative path metric. The decoding procedure then outputs the original data corresponding to each surviving branch. An example of the Viterbi decoding process is shown in Fig. 6, using the encoding example of Fig. 5.
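A minimal sketch of the Viterbi search with the squared Euclidean distance branch metric is shown below. This is illustrative Python over a toy two-state trellis; the `trellis[s][i]` layout is our assumption for illustration, not the exact structure of Table 3.

```python
def viterbi(received, trellis, n_states):
    """Minimal Viterbi decoder over a terminated trellis.
    trellis[s][i] gives (next_state, codeword) for input label i
    in state s; the path starting and ending in S0 is returned."""
    INF = float("inf")
    metric = [0.0] + [INF] * (n_states - 1)      # start in S0
    paths = [[] for _ in range(n_states)]
    for y in received:                           # one received word per step
        new_metric = [INF] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if metric[s] == INF:
                continue
            for label, (nxt, cw) in enumerate(trellis[s]):
                # Squared-ED branch metric between received and branch word.
                bm = sum((yt - xt) ** 2 for yt, xt in zip(y, cw))
                if metric[s] + bm < new_metric[nxt]:
                    new_metric[nxt] = metric[s] + bm
                    new_paths[nxt] = paths[s] + [label]
        metric, paths = new_metric, new_paths
    return paths[0]                              # survivor ending in S0

# Toy 2-state trellis with 2-bit codewords (purely illustrative):
trellis = [
    [(0, (0, 0)), (1, (0, 1))],   # from S0: label 0 -> S0, label 1 -> S1
    [(0, (1, 0)), (1, (1, 1))],   # from S1
]
decoded = viterbi([(0, 1), (1, 0)], trellis, n_states=2)
assert decoded == [1, 0]          # the path S0 -> S1 -> S0 matches exactly
```

In practice the received words would be noisy resistance readings rather than exact bits, and the squared-ED metric handles both cases uniformly.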

IV. SIMULATION AND DISCUSSION
In this section, we describe the simulation of the model using the diagram shown in Fig. 7. The original data u_t are encoded into the encoded signal x_t, which is stored in the MRAM device. The received signal y_t is then decoded by the decoder to recover the original signal u_t. For encoding and decoding, we used the mapping and trellis methods. In the mapping method, a k-bit input (00...00, ..., 11...11) is encoded into an n-bit codeword (c_0, ..., c_{2^k−1}) in C using a one-to-one mapping algorithm. For example, with a 5-bit input, if the input bits are 00111, they are converted into codeword c_7 from the list in C. When decoding with the mapping algorithm, we calculate the Euclidean distance (ED) between the received codeword and each codeword in C, and the received signal is assigned to the codeword with the minimum ED. In the mapping method, the codewords are pre-stored; therefore, encoding does not take much time. However, because the received signal must be compared with all the codewords, decoding is time-consuming. Regarding power consumption, encoding consumes more power in the buffers that store the codewords, and decoding consumes more power to compare the received signal with all the codewords. The trellis decoding method was implemented as described in Section III-B. For the trellis method, we present the theory only, because of its high complexity and long processing time; at present, the trellis method is not suitable for real-time operation. In the future, we will reduce its complexity and develop the trellis method for real-time use.
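The mapping-method decoding described above amounts to a nearest-codeword search, which can be sketched as follows (illustrative Python, not the authors' implementation):

```python
def map_decode(y, codewords):
    """Compare the received word y against every pre-stored codeword and
    return the index of the one with minimum squared Euclidean distance;
    that index recovers the k-bit input of the one-to-one mapping."""
    def ed2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codewords)), key=lambda i: ed2(y, codewords[i]))

# Toy list of two codewords of length 3 (purely illustrative):
C = [(0, 0, 0), (1, 1, 1)]
assert map_decode((0.9, 0.8, 0.1), C) == 1
```

The linear scan over all 2^k codewords is what makes mapping-method decoding slow and power-hungry for large k, as noted above.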

A. CREATING THE CODEWORDS
To evaluate the capability of an MHD of 3 together with the sparse code, we created the following codeword lists:
• The codewords of code rate 16/21 with an MHD of 3.
• The sparse codewords of code rates 14/20, 15/21, and 16/22 with an MHD of 3. Additionally, we compared our proposed code with a previous study on 7/9 sparse codes in [18]. The results are presented in the next section.

B. RESULTS FOR MAPPING AND TRELLIS METHODS
In the simulation, we used the same parameters as in [28]: the write error rate P1 = 2 × 10−4, with P0 and P_r two orders of magnitude lower than P1; hence, p1 = 1.02 × 10−4 and p0 = 10−6. In the first experiment, we compared two factors: the MHD and the sparsity of the codes. Based on the code rate 7/9, we created codewords with a weight of less than 3, allowing an MHD of 1. These codewords increase the sparse characteristic compared with the 7/9 sparse code in [18]. Moreover, we created a 7/9 non-sparse code with an MHD of 2. The results are shown in Fig. 8. It can be observed that the sparse code with an MHD of 1 can hardly improve the BER performance. However, the combination of sparsity and an MHD of 2 yields a significant improvement compared with using only an MHD of 2 or only sparsity. In particular, at a BER of 10−3, the sparse code with an MHD of 2 achieves a gain of 2% compared with the non-sparse code with an MHD of 2. Next, we implemented the encoder and decoder with the mapping method and the codewords of code rates 16/21, 15/21, 16/22, and 7/9. The results are shown in Fig. 9.
In Fig. 9, we varied σ0/µ0 to calculate the BER performance of the codes. When the MHD is increased, the codes significantly improve the performance of STT-MRAM. Specifically, the 16/21 Hamming coding with d_min = 3 achieves gains of 2% and approximately 3% over the 7/9 sparse code and over no coding, respectively, at a BER of 4 × 10−4. In this simulation, we compared the sparse codes with MHD values of 2 and 3. The codewords of code rate 15/21 were chosen from the codewords of code rate 16/21 by applying the sparse code condition. It can be observed that the 15/21 code achieves a gain of 0.5% compared with the 16/21 code. From the results in Fig. 10, it can be observed that applying the trellis method to the encoder and decoder improves the gain by approximately 1.6% at a BER of 10−4 compared with the mapping method. The improvement can reach a gain of 2.5% at a BER of 10−4 for each code rate. Additionally, we created a 14/20 sparse code with d_min = 3. From this, we deduce that the BER performance improves when the code size is increased through k and n, because when n increases, the sample space of the codewords is extended, which gives the codewords a higher separation. However, we must trade this off against the reduced trellis code rate k(α − 1)/(nα); here, we chose α = 30. Additionally, when k increases, the complexity of the trellis decoder becomes very high. Therefore, the trellis method cannot be used when the code rate is increased by increasing k. Table 4 presents the tradeoff between the code rate loss and coding gain for our proposed codes and the 7/9 sparse code in [18]. For the 14/20 to 16/22 sparse codes with an MHD of 3 in Table 4, it can be observed that they achieve a coding gain. In subsequent experiments, we assumed that the size of the MTJ cell decreased.
This leads to a reduction in the thermal stability factor and a lower critical current. Therefore, during the writing process, the probability P1 changes [34]. During the reading process, the MTJ is affected by higher temperatures; thus, the high resistance R1 decreases, whereas the low resistance R0 does not change [35]. In other words, we added an offset to the mean and variance of the Gaussian distribution for R1 in the cascaded channel model [36].
In Fig. 11, we fix σ0/µ0 = σ1/µ1 = 9.5% and vary the probability P1 of the writing process. The results show that the 16/22 sparse code still improves the performance when P1 is less than 10−3. When P1 is greater than 10−3, the performances of the 16/22 and 15/21 codes are mostly the same. Moreover, when P1 is less than 10−5, which is very small, the error in the writing process does not affect the performance of the codes, but the read-decision error still occurs at the GMC with the resistance spread σ0/µ0 = σ1/µ1 = 9.5%. As in [36], we fixed σ0/µ0 = 9.5% and varied the mean offset µ_off from −0.25 to −0.05 with σ_off/µ1 = 5% and 7% in Fig. 12.
From Fig. 12, it can be observed that the 16/22 sparse code achieves the best performance when the mean offset does not change by more than 0.25 for σ off /µ 1 = 5% and 7%. However, the 15/21 sparse code can improve the performance compared with the 7/9 sparse code when the mean offset changes by more than 0.2 and 0.17 for σ off /µ 1 = 5% and 7%, respectively.

V. CONCLUSION
In this study, we proposed methods to improve the BER performance of STT-MRAM using modulation coding techniques. Increasing the MHD of the sparse code is a simple idea for improving the performance; however, increasing the MHD directly involves very high complexity and a long search time for the codewords. Therefore, we proposed a faster method to increase the MHD of the sparse code. In this method, we first used the generator matrix of the Hamming code to obtain codewords with an MHD of 3. These codewords were then filtered by the sparse code condition to select the codewords of the sparse code with an MHD of 3. We used the mapping and trellis methods to encode and decode these codewords. The simulation results showed that the proposed codes outperform those of a previous study. In particular, the gain reaches approximately 2% and 3% compared with the previous study and with no coding, respectively. Moreover, our proposed code is more robust in worse environments when the size of the MTJ cell decreases.