Time to digital sensing for multilevel RRAM cells

Memristors offer the possibility of implementing multilevel cells in Resistive RAMs, providing high-density non-volatile data storage solutions. However, writing and reading these devices once they are integrated into a crossbar is a problem that has not yet been fully solved. Process and temperature variations must be carefully handled when designing reliable interface circuits for RRAM cells. In this work we present a time-domain reading circuit for multilevel RRAM cells. The sensing operation is based on the well-known RC circuit, which generates a pulse whose width depends on the state of the cell, providing excellent results in terms of area, energy per bit and scalability. Temperature and process variations have been analysed in depth to determine the effective number of levels that a cell can store and still be read accurately.


I. INTRODUCTION
Since the scientists at Hewlett-Packard laboratories created the first nanoscale memristor in 2008 [1] the subject has given rise to a lot of interest. The memristor offers a good line of research for new non-volatile memory devices. The analog behaviour of memristors along with their low power consumption and fast switching capability has encouraged their use. Furthermore, memristors have a relatively low cost and allow a fine grain integration with CMOS and a higher density and endurance than other technologies [2]. Nevertheless, memristors remain at an early development stage due to their complex physical mechanisms and the absence of a commercial manufacturing process.
The key feature of memristors is that their resistance can be changed through the application of a given electric field. This property makes memristors a perfect fit for Resistive Random Access Memories (RRAM), which are memories that rely on a change in resistance rather than charge to store information [3] [4]. Different RRAMs have been designed based on memristors and they can be organized into two groups. The first group uses two different memristor states, usually named high and low resistance states (HRS and LRS), as a conventional digital memory in which each cell stores one bit, the key point being the relationship between the resistance states [5]. The advantage of using RRAM is its high integration density and non-volatility. The other group takes advantage of the analog behaviour of the memristor to build multilevel cells (MLC) and significantly increase the storage density. The idea is to reach different resistance states in order to store more than one bit in each cell. Strongly related to the multilevel capability, in-memory computing has appeared, in which the analog properties of RRAM are used to perform calculations at the same time as the values are written or read [6]. Different methods may be used to store different resistance values in an MLC [3]: by changing the current through the memristor during the 'set' operation, by controlling the reset voltage while the current is kept constant or by varying the writing pulse width while the amplitude of the pulse is constant. Once a resistance level is programmed, reading it must leave the stored state unmodified.
Intensive research has been carried out on the design of writing and reading circuits because the solution is not obvious. The memristor model used, the number of elements in the circuit and the variability associated with the manufacturing process have to be taken into account while writing and reading, as they affect the number of levels that can be considered. Environmental variations such as temperature should also be taken into account. In this work we focus on the reading circuitry for MLC storage.
Several reading methods have been published in recent years, most of which are based on carrying out different types of comparison. Some reading circuits determine the stored value by comparing the resistance of the memristor against a set of reference resistances [7] or by using ADCs [8]. Others apply a well-controlled current through the memristor and read the voltage, using different types of pulse chains to avoid changing the stored value [9]. Both approaches require feedback loops and consequently the reading time is not well bounded. Furthermore, ADCs and operational amplifiers are common in these proposals, which results in large readout circuits. There is also a less explored field of research related to time-domain reading [10]. In that type of circuit the main idea is to establish a relationship between the stored memristor state and a time interval. In this way the reading time is bounded and the required circuit area is significantly reduced, since no ADC is required.
In this work we propose a time-domain circuit for reading MLCs based on the well-known behavior of an RC circuit. The main contributions of this proposal can be summarized as follows:
• We introduce and validate a circuit for reading multilevel cells based on the time-domain approach.
• The scalability analysis carried out shows that it may be shared by all of the cells in a crossbar structure.
• The number of reliable levels that can be stored in the MLC has been analyzed in depth in presence of process and temperature variations.
• The resulting circuit is characterized by a small area and bounded reading time, which is essential for applications such as neuromorphic computing.
• The power consumption of the circuit has been analysed and compared with other circuit proposals.
This paper is structured as follows. First we present the fundamentals of the circuit, covering its working principle, digitization and scalability. Second, we introduce two different sources of variation, process and temperature, and analyse their impact. Third, the circuit is compared against other proposals in the literature in terms of complexity and power consumption. Finally, several conclusions are drawn from the simulation results.

A. WORKING PRINCIPLE
One of the critical functions in the design of a reading circuit is to guarantee that the memristor state (resistance) will remain unaltered during the process. The voltage and current ranges in which the memristor state changes can differ among devices because of their manufacturing process, which results in different models. We have used the model developed by Arizona State University (ASU) [11], [12], which is a physics-based model validated against physical devices. This compact model describes the resistive switching mechanism by means of a simplified one-dimensional conductive filament inside a dielectric. The conductive filament is a channel which connects the top and the bottom electrodes of the device. The state of the device will be between the low-resistance state (LRS) and the high-resistance state (HRS), depending on whether the conductive filament is fully formed or dissolved. The gap distance is defined as the average distance between the electrodes; its minimum value, 0.1 nm, corresponds to the LRS (~10 kΩ) and its maximum value, 1.8 nm, corresponds to the HRS (~3.5 MΩ). The R-ratio, defined as HRS/LRS, is ~350. It is important to note that these resistance values are those read at 0.1 V, as the model proposes. From now on the value used as the memristor state will be the gap instead of the resistance. The model is also distributed as a Verilog-A file, which makes it easy to add a memristor component to circuit simulation tools. As explained in [13], no voltages under 2 V change the memristor resistance. We have fixed our working voltage at 1.1 V to make it easily compatible with the commercial transistor technology used in our designs.
The proposed circuit follows the time-domain approach. The idea behind it is to establish a relationship between the resistance of the memristor, which represents its state, and the width of a pulse. Once this relationship is established we can digitize the pulse width and easily obtain the state of the memristor. Our proposal is based on an RC circuit in which the R is represented by one of the most used structures in the RRAM field, the 1T1R cell (one transistor, one memristor). This cell configuration is suitable for avoiding the sneak path problem [14]. We have added a few extra components to control the read-write cycles and the access in a possible crossbar configuration. The schematic of the proposed circuit is shown in Figure 1. Figure 2 shows a chronogram of all the waveforms involved, to illustrate how the circuit works. Our goal when reading a 1T1R cell is to generate a pulse at read out whose width is proportional to the memristor state. Transistor M_c provides a discharge path after reading, therefore it is in cutoff mode while the target cell is selected. Let us assume capacitor C was discharged at the beginning of the cycle, which means that V_inv is "1". When read is activated, the read out signal is also high. If the cell is selected (its pass transistor M_0 is activated with a logic "1"), capacitor C is charged through the memristor. When the capacitor voltage V_c reaches the inverter threshold, the inverter output changes to logic "0" and the read out signal is disabled again. The read out signal thus remains activated from the start of the reading process until the capacitor reaches the inverter threshold; the read out pulse is the time-domain variable that we were looking for, since its width is related to the internal value of the cell.
Once the read operation has finished, the read voltage turns low and the capacitor is discharged through transistor M_c, so that a new reading operation can start under the same conditions as before.
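The relationship between cell resistance and pulse width described above can be illustrated with a first-order RC model. The following is a minimal sketch, not the simulated circuit: it assumes an ideal resistor charging an ideal capacitor from 0 V towards the 1.1 V supply, an inverter threshold at half the supply (an assumption, the text does not give its value), and purely hypothetical resistance values. The real cell also includes the access transistor and a voltage-dependent memristor, so these numbers are not expected to match the simulated pulse widths.

```python
import math

def pulse_width(r_ohm: float, c_farad: float, vdd: float = 1.1, vth: float = 0.55) -> float:
    """Time (s) for an ideal RC node to charge from 0 V to vth when driven by vdd.

    vth = vdd / 2 is an assumed inverter threshold, not a value from the paper.
    """
    if not 0.0 < vth < vdd:
        raise ValueError("threshold must lie strictly between 0 V and vdd")
    return -r_ohm * c_farad * math.log(1.0 - vth / vdd)

def effective_resistance(t_pulse_s: float, c_farad: float, vdd: float = 1.1, vth: float = 0.55) -> float:
    """Inverse of pulse_width: the resistance that would produce a given pulse."""
    return -t_pulse_s / (c_farad * math.log(1.0 - vth / vdd))

if __name__ == "__main__":
    c = 120e-15  # target capacitance quoted in the text
    for r in (1.0e6, 1.5e6):  # hypothetical effective cell resistances
        print(f"R = {r / 1e6:.1f} MOhm -> pulse = {pulse_width(r, c) * 1e9:.1f} ns")
```

In this idealised model the pulse width is strictly proportional to R for a fixed C and threshold, which is the property the time-domain reading relies on.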

B. DIGITIZATION
Before digitizing the output, the relationship between the pulse width and the state of the memristor must be established. Due to the characteristics of the model we use, the state of the memristor can be identified with the gap parameter. This parameter corresponds to the empty distance between electrodes. According to the model, the gap value can change between 0.1 nm (gap_min) and 1.8 nm (gap_max), but in our work we have restricted the available range to the interval from 1.55 nm to 1.7 nm. This restriction is based on the writable area described in [13], in which a temperature-aware writing driver was developed.
The proposed digitization is based on the use of counters. The read out signal enables a digital counter whose count represents the digitized value after the reading process. Figure 3 shows this digitization process, illustrating the difference between the pulse length and the count for the bounds of our range of available resistance values. More details on the clock frequency and the number of bits are given below. Once the gap is bounded it is necessary to fix the other parameter that greatly affects the pulse time: the capacitance C. The capacitor value should be a trade-off between area, time and the available clock frequency, which can be understood as the resolution. In this work the target capacitor value has been fixed at 120 fF, but due to its design the final value is slightly different. In Figure 4, the blue line represents the relationship between the pulse time and the internal gap in the memristor. Two different multilevel proposals are highlighted, one for four levels (red circles) and the other for eight levels (blue crosses). The obtained pulse width goes from ~100 ns (t_min) to ~150 ns (t_max), a range in which digitization by counters is easy. The reading time can be bounded by taking the maximum width as a worst-case scenario and adding a guard time; in this case it could be fixed at ~175 ns. This reading time could be considered the latency of the circuit. The clock frequency proposed for the counter is 1 GHz, which provides a resolution of 1 ns, enough for the range we are dealing with. With a period of 1 ns and time values of less than 175 ns, the counter needs 8 bits. This frequency is a trade-off between resolution and area: a smaller period provides a higher resolution but increases the size of the counter, and problems related to high-speed clocks could appear.
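The counter sizing above can be checked with a few lines of arithmetic. The sketch below assumes the counter simply counts full clock periods while the read out pulse is high; the function names are illustrative and not part of the original design.

```python
def counter_bits(max_count: int) -> int:
    """Smallest number of bits able to represent max_count."""
    if max_count < 0:
        raise ValueError("max_count must be non-negative")
    return max(1, max_count.bit_length())

def pulse_to_count(pulse_ns: float, period_ns: float = 1.0) -> int:
    """Number of full clock periods that fit inside the pulse."""
    return int(pulse_ns // period_ns)

if __name__ == "__main__":
    # 175 ns bounded reading time at a 1 GHz clock (1 ns period)
    print(counter_bits(pulse_to_count(175.0)))           # width needed for the bounded read
    print(pulse_to_count(100.0), pulse_to_count(150.0))  # counts at t_min and t_max
```

With a 1 ns period the largest count is 175, which lies between 128 and 255, hence the 8-bit counter.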

C. SCALABILITY
Most applications would require the proposed circuit to work with a cell that is part of a crossbar with multiple cells. This circuit proposal offers excellent performance in terms of scalability. Any number of 1T1R cells can be placed in parallel, sharing the biggest part of the circuit, which comprises the capacitor, the inverter and the read out signal generation, as shown in Figure 5. To make this crossbar configuration work properly it is necessary, in every reading cycle, to activate only the selection transistor M_i associated with the target memristor and to keep the others in cutoff mode. Each new cell increases the associated pulse width by ~650 parts per million, which means that the difference between 1 and 64 cells is about 4%. In Figure 4 the green line represents the behaviour with 64 cells in parallel. Again, two different multilevel proposals are highlighted, one for four levels (red circles) and the other for eight levels (blue crosses).
It is important to highlight that the curve is only slightly shifted from the 1-cell curve; consequently, all statements made for one cell can be extrapolated to a higher number of cells. From here on, all simulations have been done using the 64-cell array.
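The 4% figure follows directly from the per-cell increment. The sketch below assumes that the ~650 ppm contribution of each added cell accumulates linearly; this is an assumption made for illustration, and a compounding model gives almost the same result for 64 cells.

```python
def width_increase(n_cells: int, ppm_per_cell: float = 650.0) -> float:
    """Relative pulse-width increase of an n-cell array versus a single cell.

    Assumes each extra cell adds the same ppm_per_cell (linear accumulation).
    """
    if n_cells < 1:
        raise ValueError("the array needs at least one cell")
    return (n_cells - 1) * ppm_per_cell * 1e-6

if __name__ == "__main__":
    print(f"{width_increase(64) * 100:.2f} %")  # 63 extra cells at 650 ppm each
```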

III. VARIATION ANALYSIS
Reliability is a key point when dealing with devices as immature and highly nonlinear as memristors. To get a reliable circuit we have performed an in-depth variation analysis considering both process and temperature variations. Two main sources of process variation have been taken into account: memristor and transistor variations. It is important to note that all components belong to a commercial 40 nm technology. An additional source of variation could be the crossbar circuit-level non-idealities, such as wire resistance and capacitance, driver and sensing resistances, etc. Interconnect resistance is considered the most significant for MLC if subnanosecond read operations are carried out [15]. Furthermore, experimental values of wire resistance are reported to be below 1 Ω per feature size of the MLC for a 45 nm technology node [16], which can be considered negligible for the reading circuit presented here. Another source of uncertainty is the cycle-to-cycle variation, which mainly appears when the device is written several times. For these reasons the reliability analysis described below will focus on device and temperature variations.

A. MEMRISTOR PROCESS VARIATION
Memristors are not as well characterised in terms of variability as transistors are, because they are not included in a commercial manufacturing process. When statistical models for process variation are not available, variability can be modelled using injectors to introduce variation into a specific parameter [17]. We have considered the variation of two memristor parameters: gap_min and gap_max. These parameters represent the limits of the non-conductive part of the memristor and therefore they are the limits of the state of the memristor. We have considered a variation of ±5% [18] on these parameters. The device still behaves properly under this variation because, as explained before, our working region goes from 1.55 nm to 1.7 nm and therefore stays within the varied device bounds explained in II-B.
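The statement that a ±5% variation keeps the working region inside the device bounds can be verified numerically. The sketch below uses the gap limits and working region quoted in the text and checks the worst case, in which gap_min grows and gap_max shrinks; the function name is illustrative.

```python
GAP_MIN_NM = 0.1   # lower gap bound of the model
GAP_MAX_NM = 1.8   # upper gap bound of the model
WORK_LO_NM = 1.55  # lower end of the working region
WORK_HI_NM = 1.70  # upper end of the working region

def working_range_is_safe(tol: float) -> bool:
    """True if the working region stays inside the worst-case varied bounds."""
    worst_min = GAP_MIN_NM * (1.0 + tol)  # gap_min pushed upwards
    worst_max = GAP_MAX_NM * (1.0 - tol)  # gap_max pushed downwards
    return worst_min <= WORK_LO_NM and WORK_HI_NM <= worst_max

if __name__ == "__main__":
    print(working_range_is_safe(0.05))  # 1.8 nm * 0.95 = 1.71 nm, just above 1.70 nm
    print(working_range_is_safe(0.06))  # 1.8 nm * 0.94 = 1.692 nm, below 1.70 nm
```

The upper bound is the tight one: with 5% variation the worst-case gap_max is 1.71 nm, which leaves only a small margin above the 1.7 nm end of the working region.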

B. PROCESS VARIATIONS IN CMOS TECHNOLOGY
Process variations in the 40 nm technology may affect the correct operation of the device. As an a priori hypothesis, the critical points are expected to be the capacitor and the inverter placed just after it. This hypothesis is based on the fact that these two components are the ones actually in charge of carrying out the digitization and therefore they work at the interface between both worlds, the analog and the digital. Small changes in devices that work in the analog domain have a greater impact than the same variations in the digital domain. In fact, digitization comes from the threshold voltage of the inverter inv in Figure 1.
After evaluating process variations in each transistor of the circuit through Monte Carlo simulations, we can conclude that this hypothesis was correct and that variations in the inverter's transistors are the ones that matter. The simulations have been carried out using the configuration in Table 1; the pulse width obtained in this case is ~54 ns, with the initial gap (gap_ini) fixed at 1.37 nm. This parameter represents the starting point of the memristor, that is, the previously written value. The pulse width varies in the range of picoseconds when process variations are applied to the AND gate or to the selection transistor, which may be ignored. However, the range grows to nanoseconds when the variation is applied to the inverter and to passive components such as the capacitor.
As the capacitor we have used a MOM (Metal Oxide Metal) structure provided by the manufacturing process. Its target capacitance is ~120 fF and we obtained 124.801 fF after configuring the component. A small difference here is not important because the purpose of the circuit is to distinguish the levels; the exact time corresponding to each of them is not relevant. The area required by this capacitance is 32 µm².
To minimize the impact of process variations in the inverter it is necessary to modify the size of its transistors. Taking a gap_ini of 1.37 nm as a fixed operating point, we analyze the impact of the inverter size on the standard deviation of the pulse width obtained in the Monte Carlo simulation. All of the data obtained from the simulations are summarized in Table 2. A good trade-off between area and accuracy is obtained with the following values: L=480 nm, W=960 nm. For this design it is not necessary to balance the delay of the pull-up and pull-down transistors in the inverter; the key parameter is the threshold voltage of this gate. Once all of the circuit parameters are fixed we run the Monte Carlo simulations with 100 points for every level to analyse the multilevel capability of the circuit. All of the simulations consider variations in both the transistors and the capacitor. The multilevel capability can be assessed through the overlap between levels when the working range is divided into four or eight levels. Results are shown in Figure 6.a and Figure 6.b for four and eight levels respectively. Colours stand for different levels. The X axis represents the pulse width that is read by the circuit. When only four levels are considered there is no overlap, therefore the levels are completely distinguishable and we may state that process variations will not affect the multilevel storage. On the other hand, when the number of levels is increased to eight, a slight interference can be seen in the tails of the Gaussian curves, but the levels still remain distinguishable. In the presence of process variations, no more levels can be handled in this particular working region.
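The effect of splitting the same pulse-width range into more levels can be illustrated with ideal Gaussian distributions. The sketch below is not the Monte Carlo analysis of the paper: the level centres are assumed to be evenly spaced between 100 ns and 150 ns and the 1.5 ns standard deviation is a hypothetical value chosen only for illustration. It uses `statistics.NormalDist.overlap` from the Python standard library to measure the shared area of adjacent levels.

```python
from statistics import NormalDist

def evenly_spaced(lo: float, hi: float, n: int) -> list:
    """n level centres evenly spaced over [lo, hi] (an illustrative assumption)."""
    return [lo + i * (hi - lo) / (n - 1) for i in range(n)]

def adjacent_overlaps(means_ns: list, sigma_ns: float) -> list:
    """Overlapping area (0..1) between each pair of neighbouring levels."""
    dists = [NormalDist(mu, sigma_ns) for mu in means_ns]
    return [a.overlap(b) for a, b in zip(dists, dists[1:])]

if __name__ == "__main__":
    sigma = 1.5  # hypothetical pulse-width standard deviation in ns
    for n_levels in (4, 8):
        worst = max(adjacent_overlaps(evenly_spaced(100.0, 150.0, n_levels), sigma))
        print(f"{n_levels} levels: worst adjacent overlap = {worst:.2e}")
```

Under these assumptions the overlap is negligible for four levels and becomes visible only in the tails for eight, which mirrors the qualitative behaviour described for Figure 6.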

C. TEMPERATURE VARIATION
It is well known that the temperature affects the behaviour of memristors. For this reason it is important to choose a memristor model that considers this parameter, which is the case of the ASU model used in this work. The relationship between the state and the pulse width changes with temperature, as is shown in Figure 7. The temperature curves are almost parallel, which means that they do not cross each other. Knowing the pulse width and the temperature there is only one possibility for the memristor state.
The results obtained for the proposed read circuit after repeating the Monte Carlo simulations with temperature variation are shown in Figure 8.a when four levels are considered and Figure 8.b for the eight-level cell. Again, the colours represent different levels. Analysing both figures it is easy to see that four levels are still clearly distinguishable, but the eight-level plot exhibits serious overlaps that make the reading invalid.
A more exhaustive analysis has been carried out and is shown in Figure 9. In this figure the eight levels of Figure 8 are represented again, but since these Gaussian curves come from the aggregation of Gaussian curves obtained at different temperatures we have also included the temperature curves for 0 °C (in blue) and 100 °C (in red), which are the coldest and hottest temperatures we have considered. As can be seen in the plot, overlap occurs between the coldest temperature of one level and the hottest temperature of the next level, but never at the same temperature. From this analysis it can be deduced that if the temperature is controlled or if it is taken into account together with the pulse width measurement given by the circuit, the eight levels would still be available for this multilevel cell. A tiny smart temperature sensor [19] may be used to obtain the temperature measurement and then work with only one of the lines in Figure 7 for the reading process.
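The temperature-aware reading suggested above can be sketched as a two-step lookup: pick the calibration curve for the measured temperature, then pick the nearest level on that curve. All numbers below are hypothetical (a 7 ns level spacing and a 6 ns shift between 0 °C and 100 °C); they are chosen only to reproduce the situation described in the text, where one level at the cold extreme nearly coincides with the next level at the hot extreme.

```python
def build_calibration(temps_c, n_levels=8):
    """Hypothetical pulse width (ns) of every level at each temperature."""
    return {
        t: [100.0 + 7.0 * level - 6.0 * t / 100.0 for level in range(n_levels)]
        for t in temps_c
    }

def decode_level(pulse_ns: float, temp_c: float, calibration: dict) -> int:
    """Select the curve closest to temp_c, then the level closest to pulse_ns."""
    nearest_temp = min(calibration, key=lambda t: abs(t - temp_c))
    curve = calibration[nearest_temp]
    return min(range(len(curve)), key=lambda i: abs(curve[i] - pulse_ns))

if __name__ == "__main__":
    cal = build_calibration([0, 25, 50, 75, 100])
    # The same 101 ns pulse decodes to different levels at different temperatures
    print(decode_level(101.0, 0, cal), decode_level(101.0, 100, cal))
```

Without the temperature input the 101 ns pulse is ambiguous; with it, each temperature curve maps the pulse to a single level, which is the argument made for keeping eight levels.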

IV. COMPARISON WITH OTHER PROPOSALS
As was mentioned in the introduction, different multilevel reading methods have been studied and published. To make a fair comparison, we will evaluate the proposed circuit against other proposals to find the differences in both positive and negative terms. One of the first MLC reading proposals was based on a reference resistance array [7]. This circuit includes a set of resistances used as references, each with its own comparator for every level, so the structure does not scale well when the number of levels increases. Additionally, comparators and PWM generators are needed to complete the circuit, which is based on the connection of different current mirrors. In this early work there is no information regarding variability effects or power consumption. The circuit explained in [8] is based on a voltage divider built from a couple of memristors and transistors. This is a small configuration but there is a serious area problem associated with the ADC needed to distinguish the generated voltage levels. It consists of two stages, one with a comparator and another with an encoder. There is no information on process variation analysis but there are some data on the power consumption, which will be discussed later. In [10] an interesting mix of current sensing and time-domain circuitry is proposed. The authors generate a current proportional to the memristor state through a current mirror and then use it to create an oscillation whose frequency depends on the memristor state. A counter is required at the final stage to determine which state the memristor was in. The circuit tolerance is evaluated in terms of the percentage of current variation in the sensing process but it is not explained whether this tolerance includes process variations applied to the transistors. Unfortunately the power consumption of the circuit is not provided.
The circuit proposed here does not contain any type of voltage or current references or resistors, and it does not need expensive ADCs or operational amplifiers. The pulse time generation only needs 11 transistors plus a capacitor. Two transistors are significantly bigger than the others, which are minimum size. The time needed to carry out a read operation is larger than in other proposals because of the nature of the architecture, but it is bounded and reasonable (175 ns). The strongest point of our proposal is the variability analysis considering process and temperature variations, which is not found in other works.

A. POWER ANALYSIS
A power analysis of the proposed architecture has been carried out. All simulations related to the readout circuit have been carried out using Cadence, and the analysis of the counter has been made using Synopsys and post-synthesis tools. We have divided the analysis of power consumption into two different blocks. The first one involves only the readout circuit without taking into account the digitization interface. The second measurement also includes the digitization block, for which we have proposed a counter synthesized from a standard cell library as a first approach. Table 3 details all of the data related to power consumption in the proposed circuit compared with other approaches found in the literature. As can be seen, the whole circuit requires an average power of 54.70 µW and stands out for its low energy per read of 8.20 pJ. This results in an energy per bit of 2.73 pJ when eight levels are stored.
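The energy-per-bit figure follows from dividing the energy per read by the number of bits stored in a cell. The sketch below reproduces that step; the second function additionally assumes that the energy per read is the average power multiplied by a 150 ns read window (the maximum pulse width). That window is an assumption made here because it is numerically consistent with the quoted figures, not a statement taken from the text.

```python
import math

def energy_per_bit_pj(energy_per_read_pj: float, levels: int) -> float:
    """Energy per stored bit for a cell holding log2(levels) bits."""
    return energy_per_read_pj / math.log2(levels)

def energy_per_read_pj(power_uw: float, window_ns: float) -> float:
    """Average power times read window; uW * ns = 1e-3 pJ."""
    return power_uw * window_ns * 1e-3

if __name__ == "__main__":
    print(f"{energy_per_bit_pj(8.20, 8):.2f} pJ/bit")      # eight levels, three bits
    print(f"{energy_per_read_pj(54.70, 150.0):.2f} pJ")    # assumed 150 ns window
```

For a four-level cell the same energy per read would be spread over two bits instead of three, so the energy per bit would rise accordingly.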

V. CONCLUSIONS
In this paper we have presented a reading architecture for multilevel RRAM cells. The architecture is based on the time-domain paradigm, in which digitization is carried out from a pulse derived from the charging of a capacitor through the RRAM cell. The proposed circuit has a small area and scales easily when used in a whole crossbar array but, as expected from its nature, the time needed to read any value is greater than in conventional memories. An in-depth reliability analysis has been carried out focusing on process and temperature variations, which may jeopardise the multilevel capabilities of the cell. A power consumption analysis has also been carried out in order to compare it easily with other approaches. The proposed idea can be considered resilient to process variations and it can handle thermal variations with some extra hardware.