Cryogenic In-Memory Computing for Quantum Processors Using Commercial 5-nm FinFETs

Cryogenic CMOS circuits that efficiently connect the classical domain with the quantum world are the cornerstone in bringing large-scale quantum processors to reality. The major challenges are, however, the tight power budget (in the order of milliwatts) and small latency (in the order of microseconds) requirements that such circuits inevitably must fulfill when operating at cryogenic temperatures. In-memory computing (IMC) is rapidly emerging as an attractive paradigm that holds the promise of performing computations efficiently where the data does not need to move back and forth between the CPU and the memory. Hence, it overcomes the fundamental bottleneck in classical von Neumann architectures, which provides considerable savings in power and latency. In this work, for the first time, we propose and implement an end-to-end approach that investigates SRAM-based IMC for cryogenic CMOS. To achieve that, we first characterize commercial 5 nm FinFETs from 300 K down to 10 K. Then, we employ the first cryogenic-aware industry-standard compact model for the FinFET technology (BSIM-CMG) to empower SPICE to accurately capture how cryogenic temperatures alter the electrical characteristics of transistors (e.g., threshold voltage, carrier mobility, sub-threshold slope, etc.). Our key contributions span from (1) carefully calibrating the cryogenic-aware BSIM-CMG against commercial 5 nm FinFET measurements in which SPICE simulations come with an excellent agreement with the experimental data, (2) exploring how the robustness of SRAM cells against noise (during the hold, read, and write operations) changes at extremely low temperatures, (3) investigating how the behavior of SRAM-based IMC circuits changes at 10 K compared to 300 K, and (4) modeling the error probabilities of IMC circuits that are used to calculate the Hamming distance, which is one of the essential similarity calculations to perform classifications.


I. INTRODUCTION
Quantum computing promises to solve a wide range of computational problems that are fundamentally challenging, if not impossible, for classical computing. Synthesizing new materials, optimizing drugs [1], and notably, simulating quantum systems [2] are examples of the types of problems that could be tackled by quantum computing, potentially reshaping the future of mankind. Nevertheless, for this to happen, a large number of high-fidelity qubits are indispensable, and hence, up-scaling quantum computers becomes a necessity. However, this demands CMOS-based compute circuits that effectively connect the classical domain (where information is processed) and the quantum domain (where qubits are present).
These circuits are prerequisites in such scaled-up quantum computers as they are responsible for: (1) processing the measurements received from the qubits, (2) performing the required classification to translate the readout data to the digital world, and (3) performing the necessary error corrections [3], [4].
The inevitable need for cryogenic circuits: Today, qubits operate at near absolute zero (e.g., 10 mK) to ensure they stay in the required superposition state for as long as possible. This coherence time is often short (ranging from nanoseconds to milliseconds) and, more importantly, extremely sensitive to noise and heat due to the fragility of qubits. On the other hand, control circuits currently operate at room temperature (i.e., 300 K), which causes a profound input-output bottleneck for existing quantum computers. This is further exacerbated by the fact that every qubit might even need to be individually controlled [5]. This overwhelming problem has recently been exemplified by a state-of-the-art experiment that demonstrated the need for approximately 200 wide-band coaxial cables along with 45 bulky microwave circulators and a rack of electronic circuits to control merely 53 qubits [5], [6]. Despite the isolation, the significant temperature gradient (300 K ↔ 0.1 K) between the two ends of every wire creates a heat flux that might still leak from the control circuits (outside the fridge) towards the qubits (inside the fridge), jeopardizing the entire quantum system. Further, timing constraints are already tightened by the short coherence time of qubits. As long cables introduce large latencies, satisfying the timing constraints can be challenging or, in the worst case, entirely infeasible. Hence, it becomes inevitable to move the CMOS circuits from room temperature down to cryogenic temperatures to locate them as close as possible to the qubits. Otherwise, scaled-up quantum computers, in which a large number of qubits are coherently and reliably operated, are not possible.
The key challenges behind cryogenic circuits: Operating CMOS circuits at cryogenic temperatures imposes tough power constraints on the circuits. This is because of the highly limited power dissipation capability of these circuits, which becomes consequential at extremely low temperatures. For instance, at 10 K and 0.1 K, the control circuits must operate within a power budget that does not exceed merely 100 mW and 10 mW, respectively [7]. Otherwise, the generated heat can rapidly disturb the state of qubits and even destroy them. Therefore, when operating CMOS circuits at cryogenic temperatures, power optimization becomes the primary goal. In addition, the latency of the performed computation must be short enough to meet the tight timing constraints imposed by the short coherence time of qubits.
In a nutshell, cryogenic circuits required for quantum processors must not only be ultra-low power but also fast.
The need for a cryogenic-aware compact model: Current commercial SPICE tools and compact models are not yet aware of the fundamental changes that cryogenic temperatures cause in the underlying semiconductor physics governing CMOS transistors, and the research is still in its infancy. Some key changes caused at cryogenic temperatures are as follows: leakage current decreases, transistor sub-threshold swing decreases, and carrier mobility improves, while transistor threshold voltage increases. As a result, existing compact models lack the necessary information on how the key electrical characteristics of p- and n-FinFETs are impacted at extremely low operating temperatures.
The promise of in-memory computing: Classical computing using the existing von Neumann architecture inherently suffers from significant power and latency overheads due to the physical separation between the computing units and memory units. In contrast, In-Memory Computing (IMC) is rapidly emerging as an attractive alternative in which the memory is augmented with the capability to perform some types of computation. This eliminates the fundamental bottleneck and considerably accelerates computation while reducing the consumed power. IMC can be realized using either classical CMOS-based SRAM memories [8], [9], [10], [11], [12], [13], [14], [15], [16] or emerging beyond-CMOS nonvolatile memories such as ferroelectric FETs [17], [18], [19], [20], [21], [22]. Unlike beyond-CMOS technologies, which are still in their infancy, classical CMOS-based SRAMs are much more mature and suffer significantly less from process variation and variability effects. Therefore, in this work, we focus on implementing IMC circuits using conventional SRAMs.
Our main contributions within this paper are: (1) We characterize commercial 5 nm FinFETs from 300 K all the way down to 10 K. We then employ the obtained measurements to carefully calibrate the first cryogenic-aware industry-standard compact model for FinFET technologies for accurate SPICE simulations.
(2) Our calibrated FinFET compact model (which enables SPICE simulations that are in excellent agreement with the experimental data) is then used to analyze how the resiliency of 6T SRAMs against noise (during the hold, read, and write operations) changes from 300 K down to 10 K.
(3) We investigate how the behavior of SRAM-based IMC circuits changes at extremely low temperatures and how errors in computations can emerge.
(4) We model the induced error probabilities of IMC circuits (when used to calculate the Hamming distance) at 10 K compared to 300 K, revealing how cryogenic temperature can affect the reliability.

II. RELATED WORK
In-memory computation increases energy efficiency by performing computations within the memory itself instead of transferring enormous amounts of data back and forth to the processing units. Recently, there has been significant research interest in using IMC to perform data-intensive computations, e.g., in deep neural networks and machine learning [8], [9], [10], [11]. It has been studied for both conventional CMOS [8], [9], [10], [11], [12], [13], [14], [15], [16] and beyond-CMOS devices [17], [18], [19], [20], [21], [22]. However, only a few works have studied SRAM in the context of IMC at cryogenic temperatures. The authors in [8], [9] analyzed IMC-based deep neural network and convolutional neural network performance at cryogenic temperatures using 28 nm and 55 nm CMOS technology, respectively. In [10], the authors propose to implement the surface code for Quantum Error Correction (QEC) using SRAM-based IMC. However, these studies were limited to older generations of CMOS technology. Although the authors in [11] reported an SRAM IMC macro on a 7 nm technology platform, they performed the evaluation only at 300 K. In another recent publication, X-SRAM, a modified variant of conventional SRAM, was introduced by [16] for performing IMC. However, the study primarily relied on a predictive technology model card for bulk MOSFETs and restricted the analysis to temperatures of 300 K and above. In summary, previous studies were carried out either with emerging memory technologies or with older-generation CMOS technologies. SRAM is a more mature memory technology and is well-suited for quantum processors due to its compatibility with cryogenic CMOS circuitry [4], [23].
All in all, in this work, we present 5 nm FinFET SRAM-based IMC using Ternary Content Addressable Memory (TCAM) and X-SRAM arrays. We also investigate the impact of process variations at both cryogenic and room temperatures. Our study spans from the transistor level to the circuit level, all the way to the error probabilities that can be induced at the system level.

III. CRYOGENIC MEASUREMENT SETUP
This work presents measurements of minimum-channel-length, multi-fin, multi-finger FinFETs from a commercial 5 nm FinFET technology. We use a cryogenic probe station, the "Lakeshore CRX-VF", to perform the on-wafer DC measurements in the temperature range of 10 K to 300 K (Fig. 1). The primary components of the probe station are a two-stage closed cycle refrigerator (CCR), tungsten probes, probe positioners, the vacuum pump, the sample stage, and a microscope. The first stage of the CCR unit cools the probes. The second stage cools the sample stage. With the help of a temperature sensor and the CCR unit, the probe station automatically cools the ambient down to the desired cryogenic temperature.
In order to minimize the chance of condensation, we keep the sample at a higher temperature during the cooling process. As the probe arm temperature can go as low as 15 K, the thermal load from the probe arms limits the minimum sample stage temperature to between 3.5 K and 8.5 K. To minimize temperature fluctuations during the measurements, we operate the probe station down to a lower limit of 10 K. Once we set the cryogenic temperature for a measurement, the CCR unit starts the cooldown process. The measurement requires voltage/current sources and meters with the smallest possible noise floor. We use Keysight's B1500A parameter analyzer with high-resolution and medium-power source measurement units (SMUs). To avoid human interference in the measurement process, we control all the measuring instruments with a computer using the GPIB interface.

IV. 5 NM FINFET COMPACT MODEL CALIBRATION FROM ROOM TEMPERATURE DOWN TO CRYOGENIC TEMPERATURE
The electrical characteristics of semiconductor devices, such as transistors, diodes, MOS varactors, etc., are highly temperature-dependent due to the temperature-dependent charge carrier concentration and mobility. Hence, setting the appropriate simulation environment becomes the most important step for extracting compact model parameters. To begin the model parameter extraction, we first set the appropriate temperature environment in every simulation setup using the SPICE keyword TEMP (the ambient temperature for each simulation) and the model parameter TNOM (the transistor's nominal temperature). We have utilized a modified cryogenic-aware FinFET compact model to capture the transistor electrical characteristics from 300 K to 10 K [24], [25].
The following subsection describes the model extraction procedure for room and cryogenic temperatures.
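As a small illustration of the temperature setup, the sketch below converts kelvin set-points into the Celsius values that SPICE decks conventionally use for TEMP and TNOM. The helper names, the sweep points, and the card layout are assumptions made for illustration; they are not taken from the authors' simulation decks, and the exact syntax depends on the simulator in use.

```python
# Sketch: convert kelvin set-points into Celsius values for TEMP and TNOM.
# Helper names and the card layout are hypothetical, not the authors' netlist.

KELVIN_OFFSET = 273.15


def kelvin_to_celsius(t_kelvin):
    """Convert a temperature in kelvin to degrees Celsius."""
    if t_kelvin < 0:
        raise ValueError("temperature in kelvin cannot be negative")
    return t_kelvin - KELVIN_OFFSET


def temperature_cards(t_amb_k, t_nom_k=300.0):
    """Return an ambient-temperature card and a reminder for the model's TNOM."""
    return [
        f".TEMP {kelvin_to_celsius(t_amb_k):.2f}",
        f"* model card should carry TNOM={kelvin_to_celsius(t_nom_k):.2f}",
    ]


# Example sweep over three ambient temperatures (assumed set-points).
for t_amb in (300.0, 77.0, 10.0):
    print(t_amb, temperature_cards(t_amb))
```

Keeping the conversion in one helper avoids mixing kelvin values from the measurement log with Celsius values in the deck.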

A. TRANSISTOR MODEL CALIBRATION
After setting the correct simulation environment, we extract the process-dependent model parameters, e.g., doping, oxide thickness, and gate material work function. Interface trap charges and source-drain coupling substantially influence the sub-threshold behavior of the transistors. Using the BSIM-CMG model parameters PHIG, CIT, and CDSC, the simulations accurately reproduce the room-temperature (300 K) sub-threshold characteristics of the measured FinFETs [25]. We extract the low-field mobility and field-dependent mobility degradation parameters (i.e., U0, UA, UD, EU, and ETAMOB) from the transfer characteristics (I_DS − V_GS) when the transistor operates at low drain voltage (V_DS) and moderate inversion (Fig. 2(a)). On the other hand, we extract the series resistance model parameters (RSW, RDW, RSWMIN, and RDWMIN) from the strong-inversion regime (higher gate voltage). The model parameters ETA0, PDIBL2, and CDSCD capture the impact of Drain-Induced Barrier Lowering (DIBL). To extract the influence of DIBL, we focus on the sub-threshold region of operation and optimize the above-mentioned model parameters by simultaneously observing the I_DS − V_GS characteristics (Fig. 2(b)) at lower and higher V_DS. As V_DS increases, the carrier velocity begins to saturate, and the transfer (I_DS − V_GS) and output (I_DS − V_DS) characteristics show only a very small increase in drain current with a further increase in V_DS. The velocity saturation model parameters VSAT, VSAT1, MEXP, and KSATIV capture this effect. At higher V_DS and gate voltage (V_GS), we extract the impact of velocity saturation and channel length modulation by minimizing the error between measurement and simulation data of
The cryogenic operation of MOS transistors improves the transistor performance due to a reduction in carrier scattering [24]. Because the carriers' thermal energy decreases at cryogenic temperatures, reduced over-the-barrier transport lowers the Sub-threshold Swing (SS) and the OFF-state current (I_OFF). Consequently, silicon-based transistor characteristics at cryogenic temperatures differ considerably from those at 300 K. Some dominant effects at cryogenic temperatures are as follows: a nonlinear temperature dependence of the SS, an increase in threshold voltage (V_TH), surface roughness scattering, Coulomb scattering, and a nonlinear velocity saturation effect [24], [26], [27]. To account for these effects in SPICE simulations of FinFETs, we use the model equations presented in [24] along with the industry-standard BSIM-CMG compact model [25]. As the existing BSIM-CMG model is based on Maxwell-Boltzmann (MB) statistics, we use the MB statistics along with the modifications presented in [24]. This allows us to capture the impact of Fermi-Dirac (FD) statistics on the electron density calculation from 300 K down to cryogenic temperatures. Since the effective density of states, surface potential, and charges are highly temperature dependent, we first extract an effective temperature (T_eff) at cryogenic temperatures from the SS behavior with respect to temperature [24]. The SS deviates from the Boltzmann limit of ln(10)·kT/q in the cryogenic range, which is caused by the presence of band-tail states [26], [27]. To model the SS saturation, the ambient temperature (T_amb) is smoothly clamped to the temperature below which SS starts saturating using Equation (1) [24]; the resulting T_eff is subsequently employed to determine the total charge density.
Here, T_0 is the T_amb at which SS saturates, and D_0 is a smoothing parameter. To model the increase in V_TH due to band-tail states and an increase in trap states below the Fermi energy level, a V_TH correction factor is introduced into the existing BSIM-CMG with KT11, KT12, and TVTH as the model parameters [24]. The temperature-dependent lattice vibrations decrease at cryogenic temperatures and thereby improve peak mobility. Nevertheless, as the temperature drops, the thermal velocity of the charge carriers also decreases. At cryogenic temperatures, these low-thermal-velocity carriers experience increased Coulomb and surface roughness scattering, which decreases their effective mobility at very low and high vertical electric fields, respectively. To account for the temperature dependence of these mobility components, the existing mobility models, which rely on simple power-law relationships, are replaced with a linear temperature-dependent power-law formulation, as given in Equation (2) [24].
Here, μ_P refers to the different mobility model parameters, namely U0, UA, and UD. μ_P1 and μ_P2 represent the temperature-independent parameters of U0, UA, and UD. The influence of surface roughness on mobility at higher electric fields is taken into account by EU, which is modeled as a linear temperature-dependent parameter. The impact of Coulomb and surface roughness scattering at cryogenic temperatures has been incorporated into the extracted mobility model by optimizing the following temperature coefficients of the mobility model: UD1, UD2, UA1, UA2, EU1, etc. To account for the non-linear temperature dependency of the saturation velocity, the parameters related to velocity saturation effects, such as VSAT and MEXP, are modeled using Equation (3). This formulation enables a more accurate representation of the impact of temperature on the saturation velocity [24].

P(T) = P(T nom
In Equation (3), the parameters P_T, P_T1, and P_T2 correspond to AT, AT1, and AT2, respectively, for the temperature dependence of both VSAT and VSAT1. Similarly, for the temperature dependency of MEXP, these parameters represent TMEXP, TMEXP1, and TMEXP2, respectively. Lastly, the parameters P_T, P_T1, and P_T2 represent KSATIVT, KSATIVT1, and KSATIVT2 for the temperature dependence of KSATIV. Fig. 2 presents the model validation with experimental data from 10 K to 300 K. At lower gate voltages, the drain current is small and the measured data are noisy; this measurement noise is the likely cause of the discrepancies between the simulated and measured results in that region.
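To make the smooth clamping of the ambient temperature more concrete, the following sketch uses a generic smooth-maximum function of T_amb and T_0 with smoothing parameter D_0. This particular functional form, the value chosen for T_0, and the ideality factor are assumptions for illustration only; the exact Equation (1) is given in [24] and may differ.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over elementary charge, in V/K


def effective_temperature(t_amb, t0, d0):
    """Smooth maximum of t_amb and t0 (an assumed clamping form, not Eq. (1))."""
    return 0.5 * (t_amb + t0 + math.sqrt((t_amb - t0) ** 2 + d0 ** 2))


def subthreshold_swing_mv_per_dec(t_amb, t0, d0, ideality=1.0):
    """SS in mV/decade evaluated at the clamped effective temperature."""
    t_eff = effective_temperature(t_amb, t0, d0)
    return 1e3 * ideality * math.log(10) * K_OVER_Q * t_eff


# Hypothetical saturation temperature and smoothing parameter.
T0_K, D0_K = 40.0, 5.0
for t_amb in (300.0, 77.0, 10.0):
    print(t_amb, round(subthreshold_swing_mv_per_dec(t_amb, T0_K, D0_K), 2))
```

Well above T_0 the effective temperature follows T_amb, so the swing tracks the Boltzmann limit; below T_0 it flattens near T_0, which mimics the SS saturation described above.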

B. IMPACT OF VARIATION ON TRANSISTOR PERFORMANCE
The characteristics of bulk FinFETs are significantly affected by different sources of fluctuation, including interface traps, work function variation, process variations, and random dopants [28]. The impact of device variability is becoming more significant in advanced FinFET technologies and circuits as a result of the stringent circuit design margin driven by the constant down-scaling of transistor dimensions and supply voltage. To characterize the effects of random variations, we have measured the DC transfer characteristics of FinFETs fabricated on different dies of a single wafer. Fig. 3(a) and 3(b) present the I_DS − V_GS measurement results of the p-FinFET and n-FinFET, respectively, operating in the linear and saturation regions. We observe that variations impact the I_DS − V_GS of the p-FinFET and n-FinFET to different extents at different V_DS. For example, the p-FinFET shows a V_TH fluctuation (σ_VTH) of 5 mV and 26 mV in the linear and saturation regions, respectively, while the n-FinFET exhibits a σ_VTH of 9 mV and 18 mV.

V. SRAM RELIABILITY ANALYSIS
The nanoscopic dimensions and complex fabrication process of state-of-the-art FinFET technologies have significantly increased the impact of process variations and inhibited performance improvement. The quantized transistor dimensions in the High-Density Cell (HDC) exacerbate variability-induced yield reduction. The following operational issues in SRAM govern the yield: (1) read failure, defined as switching the cell content during reading; (2) write failure, the inability to write into a cell; (3) hold failure, the flipping of the cell state while holding the data; and (4) access failure, defined as an unacceptable increase in cell access time.
In this work, we focus on the first three issues, i.e., hold, read, and write noise margins. For this purpose, the mismatch or statistical variability of the transistors is included in the SRAM cell analysis by incorporating the measured σ_VTH in the BSIM-CMG compact model. The authors in [29] reported that the silicon lattice constant experiences a mere 0.022 % change at cryogenic temperatures. Therefore, in this work, we assume the lattice constant of silicon to be constant across all temperatures. As the process variations arise during device fabrication, under this assumption the geometry variation will also not change much at cryogenic temperatures. Previous works also report a negligible temperature dependence of V_TH variability [30], [31], [32]. Hence, we apply the σ_VTH of the p-FinFET and n-FinFET extracted at 300 K in the cryogenic SRAM simulations. We use Monte-Carlo analysis in our SRAM framework to characterize the variability impact on the Static Noise Margin (SNM) of the HDC. The Monte-Carlo simulations were performed for varying V_DD and at multiple temperatures ranging from 10 K to 300 K to assess the impact of V_DD and temperature variations.
The schematic view of the conventional six-transistor SRAM (6T-SRAM) cell, comprising two access transistors (PG1 and PG2) and one cross-coupled inverter pair, is shown in Fig. 4. The inverter pair in the SRAM keeps the cell in a bi-stable state while holding the data stored inside it. Voltage fluctuation due to noise at the input node of the inverters degrades the ability of the cell to store the data. The voltage noise that the SRAM cell can withstand during the hold operation before switching its internal state is defined as the Hold Noise Margin (HNM). To read the data stored in the cell, we pre-charge the bit line (BL) up to V_DD. The pre-charged BL starts discharging after Word Line (WL) activation if the data stored in the cell (voltage at node SL) is 0. The maximum amount of static noise at which a cell can retain the stored data during a read operation is known as the Read Noise Margin (RNM). The lowest BL voltage needed to modify the internal cell state during the write operation is called the Write Noise Margin (WNM). During the write operation, initially, the data that needs to be written is transferred onto the BL and the complement bit line (BLB). Subsequently, the access transistors are turned ON by WL activation to access the internal storage nodes of the cell.
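The sampling step of such a Monte-Carlo analysis can be sketched as follows. The sketch draws an independent Gaussian V_TH shift for each of the six transistors of the cell and provides a helper for the normalized standard deviation (σ/μ). The Gaussian and independence assumptions, the choice of the saturation-region σ_VTH values, and all function names are illustrative assumptions; the noise margins themselves would come from SPICE simulations, which are not reproduced here.

```python
import random
import statistics

# Measured sigma(V_TH) in mV; using the saturation-region values is an assumption.
SIGMA_VTH_MV = {"p": 26.0, "n": 18.0}

# Transistors of the 6T cell and their polarity.
CELL = {"PU1": "p", "PU2": "p", "PD1": "n", "PD2": "n", "PG1": "n", "PG2": "n"}


def sample_vth_shifts(rng):
    """Draw one independent Gaussian V_TH shift (mV) per transistor."""
    return {name: rng.gauss(0.0, SIGMA_VTH_MV[kind]) for name, kind in CELL.items()}


def normalized_spread(values):
    """Normalized standard deviation (sigma / mu) of a list of noise margins."""
    return statistics.stdev(values) / statistics.mean(values)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_vth_shifts(rng) for _ in range(2000)]
print(round(statistics.stdev(s["PD1"] for s in samples), 1))
```

Each sampled dictionary would be applied as per-instance V_TH offsets in one SPICE run, and `normalized_spread` would then be evaluated on the resulting HNM, RNM, or WNM values.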
Fig. 5 shows the statistical spread in noise margins for the HDC from 10 K to 300 K at nominal supply voltage (V_DD,nom = 0.75 V). Cryogenic temperatures result in a lower I_OFF and improve the data retention capacity of SRAM cells. Fig. 5(a) shows that the HNM at 10 K has a higher mean (μ) value than at 300 K, which reflects the increase in HNM at cryogenic temperatures.
In a read operation, the n-FinFET of the pull-down network (PD2) and the access transistor (PG2) form a voltage divider, and the voltage of node SR (V_SR) starts increasing. This leads to an increase in the sub-threshold leakage current of the n-FinFET (PD1) and lowers the voltage at node SL (V_SL). It further results in an increase in V_SR, such that if V_SR exceeds the switching threshold of the left-side inverter, the SRAM cell's state flips. This results in a destructive or unstable read operation. Therefore, the worst-case scenario for read stability due to process variations arises when the access transistor (PG2) and pull-up transistor (PU2) become strong and PD2 becomes weak. Fig. 5(b) presents the impact of variations on RNM at three different temperatures. Since the leakage current at cryogenic temperatures is considerably smaller than at 300 K, the read stability of the cell improves, and we observe a relative increase in the RNM at cryogenic temperatures (Fig. 6(a)). However, as SS starts saturating below 77 K (Fig. 2), further lowering the temperature does not yield the same amount of improvement in I_OFF, and subsequently in noise margins, as we observe from 300 K to 77 K. Cryogenic temperature also results in a higher V_TH, which increases the on-resistance (R_ON) of the PD2 and PG2 transistors. The rise in R_ON of PD2 causes an increase in V_SR, whereas the higher V_TH of PG2 decreases the current flowing into node SR. The hindrance imposed by PG2 on the current flow is more crucial and further leads to an increase in read stability.
During the write operation (e.g., writing '0' in the cell), PU1 and PG1 conduct. In this case, if V_SL falls below the right-side inverter's switching threshold, a successful write takes place. At room temperature, a higher drain current helps in discharging V_SL. Hence, the WNM of the SRAM is larger at 300 K compared to cryogenic temperatures (Fig. 5(c)). With an increase of V_TH at cryogenic temperatures, the switching threshold of the right-side inverter of the SRAM increases (which improves the writing ability). However, the higher V_TH of PG1 increases V_SL and leads to a degradation of write stability. As the V_TH increase in the n-FinFET is higher compared to the p-FinFET, we observe a stronger impact of PG1 on the WNM at cryogenic temperatures. Fig. 6(a) shows that the μ of WNM reduces from 178 mV to 162 mV for a temperature change from 300 K to 10 K. The μ of HNM and RNM increases from 244 mV to 297 mV and from 84.7 mV to 109.2 mV, respectively, as the temperature decreases. Fig. 6(b) and 6(c) show that the variation in temperature has a negligible impact on the normalized standard deviation (σ/μ) at V_DD,nom. However, at lower supply voltage (V_DD), we observe a minor temperature dependency of σ/μ. The percentage variation in σ/μ shows that the characterized SRAM cell at V_DD,nom is highly resilient to temperature variations. Table 1 summarizes the impact of process variations and temperature on the SRAM SNM.
TABLE 1. SNM of SRAM for different temperatures at V_DD = 0.75 V.
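The mean noise margins quoted above can be turned into relative changes with a few lines of arithmetic. The sketch below only uses the 300 K and 10 K values stated in the text; the dictionary and function names are illustrative.

```python
# Mean noise margins in mV, as (value at 300 K, value at 10 K), from the text.
MEAN_MARGIN_MV = {
    "HNM": (244.0, 297.0),
    "RNM": (84.7, 109.2),
    "WNM": (178.0, 162.0),
}


def relative_change_percent(at_300k, at_10k):
    """Percentage change of a margin when cooling from 300 K to 10 K."""
    return 100.0 * (at_10k - at_300k) / at_300k


changes = {name: relative_change_percent(*pair) for name, pair in MEAN_MARGIN_MV.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f} %")
```

Under these numbers, the mean HNM and RNM rise by roughly 22 % and 29 %, respectively, while the mean WNM drops by about 9 %.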

VI. SRAM IN-MEMORY COMPUTING
A. TCAM ARRAYS FOR HAMMING DISTANCE COMPUTATION
1) SINGLE TCAM CELL
To implement a single TCAM cell, four CMOS transistors along with two Static Random Access Memory (SRAM) cells (S1 and S2) are used, as shown in Fig. 7(a) [33]. The data of a TCAM cell (C) is stored by the two SRAMs in a complementary fashion. For example, at C = 1, S1 and S2 are in the logical 1 and 0 states, respectively, which are read out at the labeled nodes ('L' and 'R'). Although S1 holds 1, the read-out nodes are placed on the negated side to ensure correct functionality. Therefore, in this example, 'L' holds the inverted value of S1, i.e., 0. Before the lookup, the Match Line (ML) is pre-charged to V_DD. Then the query data Q is applied to the Query Line (QL) (corresponding to left/S1) and the inverted data to the Query Line Bar (QLB) (corresponding to right/S2). When C and Q match (C = Q), the inverted node 'L' (holding the complement of C) and QL (holding Q) are complementary; therefore, no conductive path forms from ML to GND. Similarly, the non-inverted node 'R' (holding C) and QLB (holding the complement of Q) are complementary on the right-hand side. In this case, the TCAM cell is OFF as both discharge paths are blocked, and the ML voltage stays high (output 1). In the case of a miss, either the left or the right discharge path is active because both transistors of that path are driven by a logic 1 at the same time, forming a conductive path that discharges ML. The TCAM cell is then ON, and the output is 0.
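The match and miss behavior of a single cell can be captured by a short logic-level model. The sketch below treats each discharge path as two series transistors that conduct only when both gates see a logic 1; it ignores the "don't care" state and all analog effects, and the function name is an illustrative assumption.

```python
def tcam_cell_discharges(stored_bit, query_bit):
    """Return True if the cell opens a path from ML to GND (a miss)."""
    # The two SRAMs store the cell data in a complementary fashion.
    s1, s2 = stored_bit, 1 - stored_bit
    # Read-out nodes sit on the negated side of each SRAM.
    node_l, node_r = 1 - s1, 1 - s2
    # Query line and its complement.
    ql, qlb = query_bit, 1 - query_bit
    # Each path conducts only if both of its series transistors are ON.
    left_path = node_l == 1 and ql == 1
    right_path = node_r == 1 and qlb == 1
    return left_path or right_path


for c in (0, 1):
    for q in (0, 1):
        print(c, q, "miss" if tcam_cell_discharges(c, q) else "match")
```

In this model the ML discharges exactly when the stored and query bits differ, which is the per-cell XOR that the Hamming distance computation builds on.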

2) TCAM ARRAY FOR HAMMING DISTANCE CALCULATION
Combining multiple TCAM cells via a shared ML forms a block. Inside such a block, all TCAM cells share the same periphery and access logic, as shown in Fig. 7(c). This includes the Bit Line (BL) and Word Line (WL) used to write the data into the cells. Typically, the write operation is a one-time initialization phase for associative memories, whereas this work focuses on the read-out. Hence, we exclude the write operation from the evaluation, and the SPICE circuit implementation is simplified by using individual voltage sources for BL and WL.
The shared ML of all TCAM cells within a single block is an integral part of the block design. After the ML is pre-charged, a query bit string is applied through the respective lines to the block. Subsequently, each individual TCAM cell compares stored and query data as described in Section VI-A1. A miss leads to a conductive path and discharges ML. Due to the parallel circuit configuration of the TCAM block, more misses form more parallel conductive paths, leading to a faster ML discharge as the total resistance decreases. Consequently, the discharge rate is proportional to the number of cells reporting a miss. The sum of these bit-wise similarity checks is widely known as the Hamming distance and can be computed with such a TCAM block. To derive a sharp, distinct output signal depending on the discharge rate, the ML is connected to a Clocked Self-Referenced Sense Amplifier (CSRSA), whose schematic is shown in Fig. 7(b) [33]. It converts the discharge rate from the voltage domain to the temporal domain (i.e., from how fast to when the voltage drops). Thus, the operation latency determines the Hamming distance of the block; an example of a block with 10 cells and all possible Hamming distances is shown in Fig. 7(d). It can be observed that the margins between consecutive numbers of misses decrease as the Hamming distance grows, which makes differentiating the individual misses increasingly harder. With the decreasing margins, the tolerance to variability, e.g., from process variations, decreases, which we will evaluate next.
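The shrinking latency margins can be reproduced with a first-order RC picture of the match line. The sketch below assumes that each missing cell contributes an identical resistive path and that the ML behaves as a single capacitor discharging to a fixed trip voltage; the resistance, capacitance, and trip voltage are hypothetical values, and only V_DD = 0.75 V is taken from the text. The actual circuit, including the CSRSA, is not modeled.

```python
import math


def hamming_distance(stored, query):
    """Number of positions where the stored and query bit strings differ."""
    if len(stored) != len(query):
        raise ValueError("bit strings must have equal length")
    return sum(s != q for s, q in zip(stored, query))


def discharge_time(misses, r_path=10e3, c_ml=10e-15, v_dd=0.75, v_trip=0.375):
    """Time for ML to fall from v_dd to v_trip with `misses` parallel paths."""
    if misses == 0:
        return math.inf  # no conductive path, ML stays high
    return (r_path / misses) * c_ml * math.log(v_dd / v_trip)


distance = hamming_distance("1010", "1001")
margins = [discharge_time(n) - discharge_time(n + 1) for n in range(1, 10)]
print(distance, [f"{m * 1e12:.2f} ps" for m in margins])
```

Because the discharge time scales as 1/n in this picture, the gap between n and n + 1 misses scales as 1/(n(n + 1)), so the margins tighten quickly at larger Hamming distances.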

B. SRAM-BASED IN-MEMORY BOOLEAN COMPUTATION
X-SRAM is an enhanced version of the 8+T SRAM that incorporates a modified peripheral circuit to perform IMC [16]. Fig. 9(a) and 9(b) illustrate the modified 8+T SRAM and the Sense Amplifier (SA), respectively. Unlike the conventional 6T SRAM cell, X-SRAM employs decoupled read-write ports, allowing simultaneous activation of two read word lines (RWLs) without any read disturb issues. During the read operation of X-SRAM, the read bit-lines (RBL and RBLB) are first pre-charged to V_DD. Subsequently, the RWLs corresponding to the desired rows are activated. RBL or RBLB discharges depending on the stored value of the bit-cell (either "1" or "0"). An asymmetric differential SA is utilized to detect the voltage difference between RBL and RBLB and to generate stable voltages at its outputs. To make the differential SA asymmetric, the transistors connected to RBL and RBLB are given different widths. This asymmetry leads to different discharge rates of the SA output nodes (OUT and its complement, written OUTB here) based on the current-carrying capabilities of the transistors. This modification enables the implementation of bit-wise NAND and NOR operations using the stored data.
The Boolean operations within X-SRAM can be demonstrated through the following example, where we discuss the OR/NOR operation. The read operation of X-SRAM is similar to that of a conventional 6T SRAM cell. To perform the read and IMC operation, we first pre-charge both RBLs to V_DD, next activate the RWLs of the respective cells, and then enable the SA. For IMC, two RWLs corresponding to two different bit cells are activated at the same time. When both cells store logic "0" or both store logic "1", RBLB (and subsequently "OUT") or RBL (and subsequently the complementary output "OUTB"), respectively, discharges from V_DD (logic "1") to 0 V (logic "0"). However, when the stored data in the two rows differ (one storing "0" and the other storing "1," or vice versa), both RBL and RBLB discharge simultaneously. If we increase the width of the SA transistor that is connected to RBLB, then the complementary output node of the SA (OUTB) discharges faster, causing the SA output node (OUT) to stabilize at logic "1." It follows that when the transistor connected to RBLB in the SA is wider than the transistor connected to RBL, the SA generates the OR output at node OUT and the NOR output at node OUTB. Similarly, by making the transistor connected to RBL wider than the one connected to RBLB, the AND/NAND output of the stored values can be obtained. Furthermore, by incorporating an additional NOR gate at the AND/NAND and OR/NOR outputs of the SA, the XOR operation can be performed on the stored values within the memory.
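The decision logic of the asymmetric SA can be summarized in a behavioral model. The sketch below abstracts the bit-line discharge and the sizing asymmetry into a simple rule for which output wins when both bit-lines discharge; the function names and the `wide_side` argument are illustrative assumptions, and no timing or analog behavior is modeled.

```python
def xsram_sense(bit_a, bit_b, wide_side):
    """Return (OUT, OUTB) for two simultaneously activated bit cells.

    wide_side is "RBLB" for the OR/NOR sizing or "RBL" for AND/NAND.
    """
    if wide_side not in ("RBL", "RBLB"):
        raise ValueError("wide_side must be 'RBL' or 'RBLB'")
    rbl_discharges = bit_a == 1 or bit_b == 1    # any stored 1 pulls RBL down
    rblb_discharges = bit_a == 0 or bit_b == 0   # any stored 0 pulls RBLB down
    if rbl_discharges and not rblb_discharges:
        out = 1   # both cells store 1: OUTB discharges, OUT settles high
    elif rblb_discharges and not rbl_discharges:
        out = 0   # both cells store 0: OUT discharges
    else:
        # Both bit-lines discharge; the wider transistor decides the race.
        out = 1 if wide_side == "RBLB" else 0
    return out, 1 - out


def xsram_xor(bit_a, bit_b):
    """XOR as a NOR of the AND output and the NOR output."""
    _or_out, nor_out = xsram_sense(bit_a, bit_b, "RBLB")
    and_out, _nand_out = xsram_sense(bit_a, bit_b, "RBL")
    return int(not (and_out or nor_out))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xsram_sense(a, b, "RBLB"), xsram_sense(a, b, "RBL"), xsram_xor(a, b))
```

The XOR follows because the extra NOR gate outputs 1 only when neither "both cells store 1" (AND) nor "both cells store 0" (NOR) holds.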

C. EVALUATION OF IMC TEMPERATURE DEPENDENCE
To evaluate the impact of temperature on the described in-memory compute scheme, we apply process variations to the TCAM cells, which affects their drive strength in the mismatch case. The resulting variation in discharge rate in turn affects the operation latency of the CSRSA. Here, operation latency is the time interval between enabling the CSRSA and its output reaching 10% of VDD. Fig. 8(a) and 8(b) show the operation latency distribution per Hamming distance at 300 K and 10 K, respectively. While the heaps for Hamming distances of 1 bit and 2 bit are clearly separated in Fig. 8(a), the margins shrink as the Hamming distance increases further, and the heaps overlap more.
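As an illustration of the latency definition above, the following Python sketch extracts the operation latency from a sampled output waveform. It is a minimal sketch under stated assumptions: the function name `operation_latency`, the argument names, and the linear interpolation between samples are all hypothetical choices, not the measurement script used in this work. The crossing is detected in either direction, because the direction in which the output approaches 10% of VDD is not assumed here.

```python
def operation_latency(t, v_out, t_enable, vdd, frac=0.10):
    """Time from t_enable until v_out first crosses frac * vdd.

    t, v_out : equally long sequences of time (s) and output voltage (V).
    Returns None if the threshold is never crossed after t_enable.
    """
    thr = frac * vdd
    samples = [(ti, vi) for ti, vi in zip(t, v_out) if ti >= t_enable]
    if not samples:
        return None
    start_above = samples[0][1] > thr
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if (v1 > thr) != start_above:
            # v0 and v1 lie on opposite sides of thr: interpolate linearly
            t_cross = t0 + (thr - v0) / (v1 - v0) * (t1 - t0)
            return t_cross - t_enable
    return None

if __name__ == "__main__":
    # Synthetic, purely illustrative waveform: linear fall from VDD to 0 in 100 ps
    vdd = 0.75
    t = [i * 1e-12 for i in range(101)]
    v = [vdd * (1 - i / 100) for i in range(101)]
    print(operation_latency(t, v, t_enable=0.0, vdd=vdd))
```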

1) IMPACT OF CRYOGENIC TEMPERATURES ON ERROR PROBABILITY IN TCAM ARRAY
To quantify the overlap between the heaps of two Hamming distances, we calculate the error probability for each Hamming distance. First, we use the nominal cases to place boundaries halfway between neighboring operation latencies. These boundaries define the latency interval associated with each Hamming distance. We then sort the Monte-Carlo samples of each Hamming distance into the intervals and count how many samples fall outside the correct one. The distributions of the Hamming distances are closer to each other at 10 K, as shown in Fig. 8(b), than at 300 K, as shown in Fig. 8(a). Fig. 10(a) shows that the error probability increases with increasing Hamming distance. The error probability reaches its maximum at the second-to-last Hamming distance. Because the last Hamming distance is bounded on only one side, its distribution can overlap with a neighbor on only that side, which drastically reduces its error probability.
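The procedure above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' script: the function name `error_probabilities`, the dictionary layout, and all latency values in the usage example are hypothetical and serve only to show how boundaries at the midpoints of neighboring nominal latencies turn Monte-Carlo samples into per-distance error probabilities.

```python
import random
from bisect import bisect_left

def error_probabilities(nominal, samples):
    """Fraction of Monte-Carlo samples outside the correct latency interval.

    nominal : dict {hamming_distance: nominal latency}
    samples : dict {hamming_distance: list of Monte-Carlo latencies}
    """
    # Order the Hamming distances by their nominal latency
    order = sorted(nominal, key=nominal.get)
    latencies = [nominal[hd] for hd in order]
    # Boundaries halfway between neighboring nominal latencies
    bounds = [(a + b) / 2 for a, b in zip(latencies, latencies[1:])]
    probs = {}
    for k, hd in enumerate(order):
        # bisect_left gives the index of the interval a sample falls into;
        # the first and last intervals are open on their outer side
        wrong = sum(1 for x in samples[hd] if bisect_left(bounds, x) != k)
        probs[hd] = wrong / len(samples[hd])
    return probs

if __name__ == "__main__":
    random.seed(0)
    # Hypothetical nominal latencies in ps (not values from this work)
    nominal = {1: 40.0, 2: 30.0, 3: 24.0, 4: 20.0}
    samples = {hd: [random.gauss(mu, 2.0) for _ in range(1000)]
               for hd, mu in nominal.items()}
    for hd, p in sorted(error_probabilities(nominal, samples).items()):
        print(f"Hamming distance {hd}: error probability {p:.3f}")
```

Because the outermost intervals are unbounded on one side, samples of the first and last Hamming distance can be misclassified in only one direction, which matches the drop in error probability at the last Hamming distance noted above.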
Having described how we calculate the error probability in the TCAM array, we now examine the factors behind the increased error probability observed at cryogenic temperatures. The input signals are connected at QL and QLB, which act as the gate voltages of M1 and M4, respectively, as shown in Fig. 7. The rise time (Trise) of the input signals (QL/QLB) determines the rate at which VGS increases for M1 and M4. M1 (or M4) turns on as soon as the input voltage reaches the VTH of the transistor. Once M1 (or M4) is on, a significant current flows in the discharge path and the ML starts discharging. Because the discharge is very fast, part of it takes place while the input signals are still rising. Therefore, the discharge process depends heavily on the rise time of the input signals and of the clock signal (CLK). Due to the increased VTH at cryogenic temperatures, the IDS at 10 K for VGS ≤ 0.55 V in the saturation regime is smaller than the IDS at 300 K (Fig. 2(b)), resulting in a higher operation latency. This increase in operation latency at lower temperatures is especially pronounced for higher Hamming distances (Fig. 8(a) and 8(b)). The higher latencies increase the overlap between the distributions of consecutive Hamming distances (Fig. 8(b)). This leads to a maximum error probability increase of 1.65× at 10 K compared to 300 K (Fig. 10(a)).

2) IMPACT OF CRYOGENIC TEMPERATURES ON POWER DISSIPATION OF TCAM ARRAY
At cryogenic temperatures, due to Fermi-Dirac statistics, the probability of finding an electron in the conduction band reduces significantly. Consequently, there are not enough high-energy electrons to climb the source-to-channel barrier, and hence, at a constant VGS, the electron concentration in the conduction band is very small. This results in a substantial reduction of the OFF-state leakage current of the transistors. High static power consumption is a major drawback of SRAM-based circuits. The significant reduction in IOFF at cryogenic temperatures helps mitigate the static power consumption of SRAM cells and, subsequently, the total power consumption of the SRAM-based TCAM array. Fig. 10(b) presents the total power consumed during the search/read operation of a 1×10 TCAM array. A reduction in IOFF of approximately five orders of magnitude at cryogenic temperatures results in ∼50% lower power consumption.
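To make the Fermi-Dirac argument concrete, the following Python sketch evaluates the textbook occupation probability f(E) = 1 / (1 + exp((E − EF) / kT)) at 300 K and 10 K. It is an illustration only: the energy offset of 0.1 eV above the Fermi level is a hypothetical value chosen for the example, not a barrier height extracted from the measured 5 nm FinFETs, and the sketch does not model the transistor leakage itself.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def fermi_dirac(delta_e_ev, temperature_k):
    """Occupation probability of a state delta_e_ev (eV) above the Fermi level."""
    x = delta_e_ev / (K_B_EV * temperature_k)
    if x >= 0:
        # numerically stable form for states above the Fermi level
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

if __name__ == "__main__":
    delta_e = 0.1  # hypothetical energy offset in eV (assumption)
    for temp in (300.0, 10.0):
        print(f"T = {temp:5.1f} K: f = {fermi_dirac(delta_e, temp):.3e}")
```

For any fixed positive energy offset, the occupation probability collapses by many orders of magnitude between 300 K and 10 K, which is the qualitative reason for the strongly reduced OFF-state leakage.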

3) IMPACT OF TEMPERATURE ON 8 + T SRAM ARRAY PERFORMANCE
In Section VI-B, we discussed the architecture and working principle of X-SRAM for performing basic Boolean operations. Here, we present the performance evaluation of a 1×32 X-SRAM array based on the 5 nm technology at both room and cryogenic temperatures. Fig. 11 illustrates the SA output for different Boolean operations, including OR, NOR, AND, and NAND. Operating at cryogenic temperatures increases the latency of most IMC operations in X-SRAM. Initially, both outputs of the SA (OUT and its complement) decrease simultaneously until they have dropped far enough to cross the threshold of the p-FinFETs in the cross-coupled inverter pair. Beyond this point, the signals diverge, and the positive feedback loop of the cross-coupled inverters drives them into their respective stable states. At cryogenic temperatures, a larger voltage drop is necessary (due to the increased threshold voltage), leading to an increased delay. Table 3 compares our operating latencies at 300 K with the existing literature. The X-SRAM based on the 5 nm technology is faster than the previously reported X-SRAM and is indeed suitable for the fast interface necessary for classification in quantum computers.

VII. DESIGN GUIDELINES FOR CRYOGENIC TEMPERATURES
In the previous section, we observed that lowering the temperature increases the operating latencies and the error probability for higher numbers of mismatches. In this section, we present two methods to improve the reliability of the designed TCAM array.

A. TRANSISTOR OPTIMIZATION
At cryogenic temperatures, the increase in VTH results in a larger overlap between two consecutive heaps of Hamming distances, as depicted in Fig. 8(b). However, this increase in VTH can be mitigated through work function engineering. By doing so, one can achieve iso-IOFF operation at cryogenic temperature (i.e., IOFF at the cryogenic temperature is similar to IOFF at 300 K). Additionally, work function engineering increases the transistor's current at all VGS, as illustrated in Fig. 12. Fig. 10(a) shows that when the transistors operate at the iso-IOFF condition, the error probability at 10 K reduces significantly. The higher current levels achieved through work function engineering contribute to a lower operation latency and an error probability approximately 2.94× lower than that observed at 300 K.

B. SEARCH PULSE RISE TIME OPTIMIZATION
As the transistor IDS has an exponential dependence on VGS in the sub-threshold region and a quadratic dependence in the strong inversion region, one can expect a higher impact of process variations in the sub-threshold region. In a TCAM cell, the ML starts discharging even before the voltage at QL or QLB reaches the VTH of M1 or M4, respectively. For a faster QL signal (smaller Trise), the ML discharge takes place while the transistor operates only around VGS = VDD. In contrast, for a slower QL signal (larger Trise), the ML discharge spans transistor operation from sub-VTH through above-VTH to strong inversion. Hence, we observe a higher impact of process variations for a slower QL signal. Reducing the Trise of QL/QLB results in a higher voltage at the gate terminal of M1/M4 at any given time instant. The transistor therefore operates more in the inversion region than in the sub-threshold region. The result is a lower impact of variations, a higher IDS, and a faster discharge of the ML. The ML discharge with a smaller Trise results in a smaller overlap between the distributions of Hamming distances. Fig. 13(a) demonstrates that a smaller Trise gives lower error probabilities for the TCAM cell at all temperatures. The faster discharge of the ML also lowers the operating latency and reduces the overall power consumption, as shown in Fig. 13(b).
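The different sensitivity of the two operating regions to a threshold-voltage shift can be illustrated with idealized textbook current models. The Python sketch below is a simplification under stated assumptions and is not the calibrated BSIM-CMG model used in this work: the sub-threshold current is taken as IDS ∝ 10^((VGS − VTH)/SS) and the strong-inversion current as IDS ∝ (VGS − VTH)². The 10 mV shift, the 70 mV/decade swing, and the threshold voltage of 0.30 V are hypothetical values; VGS = 0.75 V corresponds to the VDD used for the SRAM analysis.

```python
def rel_change_subthreshold(dvth, ss):
    """Relative IDS change for a VTH shift dvth (V), with IDS ~ 10**((VGS - VTH)/ss)."""
    return 10.0 ** (-dvth / ss) - 1.0

def rel_change_strong_inversion(dvth, vgs, vth):
    """Relative IDS change for a VTH shift dvth (V), with IDS ~ (VGS - VTH)**2."""
    overdrive = vgs - vth
    return ((overdrive - dvth) / overdrive) ** 2 - 1.0

if __name__ == "__main__":
    DVTH = 0.010  # hypothetical VTH shift from process variation (V)
    SS = 0.070    # hypothetical sub-threshold swing (V/decade)
    VGS = 0.75    # gate voltage equal to VDD (V)
    VTH = 0.30    # hypothetical nominal threshold voltage (V)
    sub = rel_change_subthreshold(DVTH, SS)
    strong = rel_change_strong_inversion(DVTH, VGS, VTH)
    print(f"sub-threshold:    {100 * sub:+.1f} % change in IDS")
    print(f"strong inversion: {100 * strong:+.1f} % change in IDS")
```

Under these assumptions, the same VTH shift changes the sub-threshold current several times more strongly than the strong-inversion current, which is consistent with the lower variation impact observed for a smaller Trise.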

VIII. CONCLUSION
In this work, we have analyzed 5 nm FinFET-based IMC circuits at cryogenic temperatures for the first time. To do so, we have experimentally characterized FinFETs of the 5 nm technology from 10 K to 300 K, as well as the impact of process variations at 300 K. We have shown the impact of process variations and cryogenic temperatures on SRAM noise margins and TCAM error probabilities. We have shown that TCAM cells exhibit higher error probabilities at cryogenic temperatures than at 300 K. The impact of process variations on TCAM cells can be minimized by reducing the Trise of the QL/QLB signal and by operating the transistors at iso-IOFF conditions. Transistors operating in the iso-IOFF condition resulted in a 2.94× lower error probability at 10 K compared to 300 K. The delay of only a few ps in the X-SRAM array constructed using the 5 nm technology highlights the suitability of SRAM-based IMC circuits for cryogenic CMOS circuitry in the interfacing layer of quantum computers.

FIGURE 1. On-wafer Lakeshore cryogenic probe station. During measurements, the B-1500A and CCR units are used to precisely control the voltages and temperature, respectively.

FIGURE 2. Transfer characteristics of p- and n-FinFETs for multiple temperatures ranging from 10 K to 300 K in (a) linear (VDS = 50 mV) and (b) saturation (VDS = 750 mV). Symbols and lines show the data from measurements and calibrated model simulations, respectively.

FIGURE 3. (a) and (b) present the measurement results of the p-FinFET and n-FinFET, respectively, and show the impact of process variations on IDS − VGS at 300 K. The measurements are done for both linear (VDS = 50 mV) and saturation (VDS = 750 mV).

FIGURE 4. Schematic view of the 6T-SRAM cell used later in the TCAM array.

FIGURE 5. Impact of process variations on the high-density SRAM cell (a) hold, (b) read, and (c) write noise margins for temperatures ranging from 10 K to 300 K at VDD = 0.75 V.

FIGURE 6. (a) Impact of temperature and supply voltage on the mean value of the SNM. The normalized standard deviation for (b) hold and write noise margin and (c) read noise margin.

FIGURE 7. (a) Standard 16T TCAM cell schematic. (b) Clocked Self-Referenced Sense Amplifier (CSRSA) schematic, with CLK being the enable signal. (c) TCAM block with variable cell count. Only the query line (QL, input) and match line (ML, output) are drawn. (d) Output voltage waveforms of the CSRSA for a block size of 10 bit at room temperature.

FIGURE 8. Impact of process variations on operation latency at (a) 300 K and (b) 10 K. Simulation results are from a block size of 10 bit with an input signal (QL/QLB) rise time (Trise) of 50 ps. We perform 1000 Monte-Carlo SPICE simulations for each Hamming distance.

FIGURE 9. Circuit schematic of (a) the 8 + T SRAM and (b) asymmetric differential sense amplifier.

FIGURE 10. Impact of cryogenic temperatures on (a) error probabilities and (b) power consumption with a Trise of 50 ps for the input signals (QL/QLB). The dashed line with symbols shows the simulation result at 10 K for the iso-IOFF condition. Here, iso-IOFF refers to the condition in which the IOFF of the transistor at 10 K is equal to its IOFF at 300 K.

FIGURE 13. Impact of input signal rise time on (a) error probabilities and (b) power consumption at 300 K and 10 K.