Cryogenic Embedded System to Support Quantum Computing: From 5-nm FinFET to Full Processor

Quantum computing can enable novel algorithms infeasible for classical computers. For example, new material synthesis and drug optimization could benefit if quantum computers offered more quantum bits (qubits). One obstacle for scaling up quantum computers is the connection between their cryogenic qubits at temperatures between a few millikelvin and a few kelvin (depending on qubit type) and the classical processing system on chip (SoC) at room temperature (<inline-formula><tex-math notation="LaTeX">$300 \,\mathrm{K}$</tex-math></inline-formula>). Through this connection, outside heat leaks to the qubits and can disrupt their state. Hence, moving the SoC into the cryogenic part eliminates this heat leakage. However, the cooling capacity is limited, requiring a low-power SoC, which, at the same time, has to classify qubit measurements under a tight time constraint. In this work, we explore for the first time if an off-the-shelf SoC is a plausible option for such a task. Our analysis starts with measurements of state-of-the-art 5-nm fin-shaped field-effect transistors (FinFETs) at 10 and <inline-formula><tex-math notation="LaTeX">$300 \,\mathrm{K}$</tex-math></inline-formula>. Then, we calibrate a transistor compact model and create two standard cell libraries, one for each temperature. We perform synthesis and physical layout of a RISC-V SoC at <inline-formula><tex-math notation="LaTeX">$300 \,\mathrm{K}$</tex-math></inline-formula> and analyze its performance at <inline-formula><tex-math notation="LaTeX">$10 \,\mathrm{K}$</tex-math></inline-formula>. Our simulations show that the SoC at <inline-formula><tex-math notation="LaTeX">$10 \,\mathrm{K}$</tex-math></inline-formula> is plausible but lacks the performance to process more than a few thousand qubits under the time constraint.

Operating CMOS circuits at cryogenic temperatures poses a unique set of challenges for power optimization because the limited cooling capacity restricts how much power the circuits may dissipate. If these power constraints are not met, the resulting heat can not only disturb the state of the qubits but, in the worst case, also destroy the qubits. Hence, the power constraints have the highest priority, even higher than the achievable clock frequency.
Circuits operating at cryogenic temperatures have an extremely tight power budget. The control circuits must stay within an upper power limit of only 100 mW at a temperature of 10 K, which drops further to 10 mW at 0.1 K [5]. In addition, computations have to be fast enough to satisfy the time constraints dictated by the short qubit coherence time. Given these constraints, a reliable CMOS-based circuit operating at cryogenic temperatures has to process qubit information 1) extremely rapidly and 2) at ultralow power.

C. NEED FOR A CRYOGENIC SYSTEM ON CHIP (SOC)
An SoC combines the control and readout circuitry for the manipulation and measurement of qubits with a general-purpose processor. The addition of this processor enables the execution of arbitrary software, removing the dependency on dedicated hardware for every task and on connections with the 300 K domain. This advantage was recently recognized by Intel, which demonstrated the first cryogenic SoC for quantum computing [6]. However, that work focused on the implementation of the qubit circuitry and did not evaluate the capabilities of the included processor. Thus, it is unclear what processing can be done under the strict power budget that can be spared at cryogenic temperatures. Classification of the quantum measurements is not the only task for the classical processing part of the circuit. A general-purpose processor with its high flexibility is required to enable crucial tasks, such as running calibration protocols, loading the next quantum computation, and improving the runtime of popular quantum computing paradigms relying on classical processing, such as dynamic circuits [7] or variational quantum algorithms [8], [9]. Ultimately, to achieve fully error-corrected quantum computers, complex quantum error correction protocols have to be executed.
Dedicated hardware solutions for each of these tasks are costly and slow to develop. In the fast-paced field of quantum computing, the hardware could be outdated before it is even deployed. The processor included in [6] is much more flexible, but it might miss an important instruction or be limited in the available memory. In that case, a new cryogenic SoC would have to be designed. Off-the-shelf SoCs, designed for room temperature use, are available in a wide range of specifications and capabilities and could quickly be swapped in and out, depending on the requirements of the tasks. However, the question arises whether deploying such an SoC is even plausible in a quantum system, with power consumption (i.e., heat dissipation) and processing speed being critical. Both are affected because a CMOS transistor at 10 K exhibits different power and timing characteristics.

D. NEED FOR A CRYOGENIC-AWARE TRANSISTOR COMPACT MODEL
State-of-the-art SPICE models do not capture the strong influence of cryogenic temperatures on the physics of semiconductor transistors. The underlying changes are marked by a decrease of leakage current and transistor subthreshold swing (SS), and an increase of carrier mobility and transistor threshold voltage. Thus, standard SPICE models are ill-equipped to account for these fundamental effects at cryogenic temperatures, and research in this direction is still in its infancy. Without a cryogenic-aware transistor compact model, neither correct SPICE simulations nor standard cell library characterization (which is indispensable for creating cryogenic-aware cell libraries for logic synthesis) is possible.

E. OUR MAIN CONTRIBUTIONS WITHIN THIS WORK
We are the first to explore the plausibility of deploying an off-the-shelf SoC at cryogenic temperatures for the classification of the quantum measurements. Fig. 1 provides an overview of our work and serves as an outline. To enable our exploration, we first measure the characteristics of a state-of-the-art 5-nm fin-shaped field-effect transistor (FinFET) at room temperature (300 K) and at cryogenic temperature (10 K). Using those measurements, we calibrate the modified cryogenic-aware Berkeley Short-channel IGFET Model common multigate (BSIM-CMG) transistor compact model to reproduce the measurements. Two standard cell libraries are characterized by employing this new compact model. We perform logic synthesis and physical design of a RISC-V SoC with the 300 K standard cell library as a baseline for the off-the-shelf system. Then, we perform power and timing analysis using the 10 K library to explore the impact of this significant change in temperature on the SoC. Finally, we simulate the execution of two classification algorithms to answer the question of whether an off-the-shelf SoC can classify the qubit measurements under tight power and time constraints.

II. OPERATION OF QUANTUM COMPUTERS
An $n$-qubit quantum computer can store, manipulate, and measure an $n$-qubit quantum state $|\psi\rangle$ defined as
$$|\psi\rangle = \sum_{x \in \{0,1\}^{n}} \alpha_x |x\rangle \tag{1}$$
where $|x\rangle$ are basis states and $\alpha_x$ are complex probability amplitudes whose squared moduli sum up to one [10]. Upon measuring $|\psi\rangle$, the bit string $x$, corresponding to the basis state $|x\rangle$, is read with probability $|\alpha_x|^2$. The target quantum state must, therefore, be prepared and measured repeatedly to obtain a precise distribution of the measurement bit strings. When using a quantum computer to solve problems, such as integer factorization [2] or the simulation of a molecule [1], one or multiple quantum states are prepared and measured sequentially to estimate the desired solution.
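The need for repeated preparation and measurement can be illustrated with a short simulation. This is a minimal sketch, not part of the original work: the two-qubit example state, the function names `measurement_probabilities` and `sample_bitstrings`, and the fixed random seed are all assumptions made for illustration.

```python
import random
from collections import Counter

def measurement_probabilities(amplitudes):
    """Map each basis bit string x to |alpha_x|^2 and check normalization."""
    probs = {x: abs(a) ** 2 for x, a in amplitudes.items()}
    if abs(sum(probs.values()) - 1.0) > 1e-9:
        raise ValueError("amplitudes are not normalized")
    return probs

def sample_bitstrings(amplitudes, shots, seed=0):
    """Simulate `shots` independent prepare-and-measure rounds."""
    probs = measurement_probabilities(amplitudes)
    rng = random.Random(seed)
    outcomes = rng.choices(list(probs), weights=list(probs.values()), k=shots)
    return Counter(outcomes)

# Hypothetical 2-qubit state (|00> + i|11>) / sqrt(2), chosen for illustration.
state = {"00": 2 ** -0.5, "11": 1j * 2 ** -0.5}
counts = sample_bitstrings(state, shots=1000)
# Relative frequencies approach |alpha_x|^2 as the number of shots grows.
estimate = {x: c / 1000 for x, c in counts.items()}
```

A single shot returns only one bit string, so the distribution has to be estimated from many shots, and every one of those shots has to be classified.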
A quantum computer must typically be calibrated before it can start manipulating and measuring quantum states. During calibration, the quantum state manipulation primitives are fine-tuned to the prevailing operational parameters of the quantum computer, and a classifier is trained that maps the electrical signal of the quantum computer's measurement apparatus to its corresponding bit value [11], [12]. For the IBM quantum computers based on superconducting qubits, typically a boxcar integrator is used to project the measurement signal onto the I/Q plane, where a single-qubit measurement results in a complex number that is represented as an in-phase and a quadrature component [11], [12]. As seen in Fig. 2, the measurement signals of the $i$th qubit are generally in close proximity in the I/Q plane if they correspond to the same measurement outcome. The measurement classifier is trained with the data obtained by preparing and measuring each qubit individually in the $|0\rangle$ and $|1\rangle$ basis states while ignoring the remaining qubits. After calibration, the quantum state manipulation primitives and the measurement classifier are available for quantum computations.
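The calibration step for the measurement classifier can be sketched as follows. This is an illustrative sketch under stated assumptions: the boxcar integrator is modeled as a uniform average of complex readout samples, and the classifier is assumed to keep one center per outcome, computed as the mean of the calibration points. The function names `boxcar_integrate` and `train_centers` are hypothetical.

```python
def boxcar_integrate(samples):
    """Reduce a readout trace (complex samples) to one I/Q point.

    Assumption: uniform (boxcar) weights with normalization by the window length.
    """
    return sum(samples) / len(samples)

def train_centers(points_0, points_1):
    """Return the (I, Q) center for each prepared basis state of one qubit.

    points_0 / points_1 are complex I/Q points measured after preparing
    the qubit in |0> and |1>, respectively.
    """
    def centroid(points):
        n = len(points)
        return (sum(p.real for p in points) / n,
                sum(p.imag for p in points) / n)
    return centroid(points_0), centroid(points_1)

# Hypothetical calibration data for a single qubit.
center_0, center_1 = train_centers([1 + 1j, 3 + 3j], [-1 - 1j, -3 - 3j])
```

Each qubit gets its own pair of centers, so the storage for the trained classifier grows linearly with the number of qubits.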
IBM Quantum currently offers the largest quantum computer based on 127 superconducting qubits over the cloud via their qiskit framework [13] and plans to build a quantum computer with over 4000 qubits by 2025 [14]. The classified and also the raw I/Q plane measurement data of quantum computations are accessible through the qiskit framework and used in this work.

III. CRYOGENIC CMOS TRANSISTORS
Operating a CMOS transistor at cryogenic temperatures offers multiple advantages, including a smaller SS, a lower leakage current, a higher charge carrier mobility, and reductions in thermal noise and parasitic resistances. The smaller SS leads to near-ideal steep switching, the reduction in over-the-barrier charge carrier transport results in a lower off-state current $I_{\mathrm{OFF}}$, and the higher mobility due to lower carrier scattering results in a higher on-state current $I_{\mathrm{ON}}$. These improvements in transistor characteristics are not new and have been an active area of research since the early 1980s [15], [16]. However, cryogenic temperatures also bring challenges, such as a higher threshold voltage $V_{\mathrm{th}}$ of the transistor, carrier freeze-out in the substrate, and a kink in the drain current [15], [16]. Ongoing scaling of the CMOS technology driven by Moore's law has reduced the minimum feature size to 5 nm. These reduced dimensions result in a higher mismatch between the electrical characteristics of two nominally identical transistors fabricated on the same chip. The mismatch in transistor characteristics and the $V_{\mathrm{th}}$ increase at cryogenic temperatures are major challenges faced by circuit designers and affect the circuit design significantly [17].
Authors in [18] and [19] showed that transistors fabricated using 160 and 40 nm bulk CMOS technologies result in an almost equal amount of performance improvement. However, the 40 nm technology with higher gate control and improved short-channel effects outperforms the older generation technologies at both 300 and 4 K. The authors in [20] and [21] showed that FinFETs from both 14- and 10-nm technologies can offer a significant power reduction while operating at cryogenic temperatures for a similar speed. A detailed study on the impact of ionized donor impurities on transistors of the 10 nm technology node suggests that direct transport through individual dopants results in a higher leakage current [22]. However, in our measurements, we have not observed the impact of resonant tunneling due to ionized dopants. Although the authors in [20] and [21] reported cryogenic characterization of FinFETs, these studies were limited to temperatures down to 77 K. Han et al. [23] presented the cryogenic characterization of 16-nm FinFETs from 2.5 to 300 K. In our previous work [24], we have characterized 5-nm FinFET technology at 300 and 10 K.

A. TRANSISTOR COMPACT MODEL CALIBRATION AND VALIDATION FOR 5-NM FINFET TECHNOLOGY
As described in our previous work [24], we obtain the process-dependent model parameters, such as doping, oxide thickness, and gate material work function, after setting up the appropriate simulation environment. The subthreshold behavior of the transistors is affected by interface trap charges and source-drain coupling. Subthreshold characteristics of the measured FinFETs at room temperature (300 K) are captured by BSIM-CMG [25] model parameters for work function (PHIG), interface traps (CIT), and coupling capacitance between source/drain and channel (CDSC). The low-field mobility U0 and field-dependent mobility degradation parameters (i.e., UA, UD, EU, and ETAMOB) are extracted from the transfer characteristics (drain-source current $I_{\mathrm{DS}}$ versus gate voltage $V_{\mathrm{G}}$) when the transistor operates at a low $V_{\mathrm{DS}}$ and moderate inversion (see Fig. 3). Subsequently, series resistance model parameters (RSW, RDW, RSWMIN, and RDWMIN) are obtained from the strong-inversion regime (higher $V_{\mathrm{G}}$).
To capture the impact of drain-induced barrier lowering, we use the ETA0, PDIBL2, and CDSCD model parameters. Model parameter optimization is achieved by observing the $I_{\mathrm{DS}}$-$V_{\mathrm{G}}$ characteristics (see Fig. 3) at lower and higher $V_{\mathrm{DS}}$. As $V_{\mathrm{DS}}$ increases, carrier velocity begins to saturate. With a further increase in $V_{\mathrm{DS}}$, the transfer ($I_{\mathrm{DS}}$-$V_{\mathrm{G}}$) and output ($I_{\mathrm{DS}}$-$V_{\mathrm{DS}}$) characteristics show a slight increase in drain current. This is realized through the velocity saturation model parameters VSAT, VSAT1, MEXP, and KSATIV. At higher $V_{\mathrm{DS}}$ and $V_{\mathrm{G}}$, the impact of velocity saturation and channel length modulation is captured by minimizing the error between measurement and simulation data of $I_{\mathrm{DS}}$-$V_{\mathrm{G}}$ and $I_{\mathrm{DS}}$-$V_{\mathrm{DS}}$.
Metal-oxide semiconductor (MOS) transistor performance at cryogenic temperatures (10 K) improves with the reduction in carrier scattering [26]. Cryogenic operation results in a very small electron concentration in the conduction band at the same $V_{\mathrm{G}}$ because of Fermi-Dirac statistics (the probability of finding an electron in the conduction band reduces drastically with a reduction in temperature), and there are simply not enough high-energy electrons to climb the barrier, which reduces over-the-barrier transport. This decrease results in a large improvement in SS and a reduction of $I_{\mathrm{OFF}}$. It causes a drastic change in the fundamental characteristics of semiconductor transistors at cryogenic temperatures, relative to 300 K. Some dominant effects at cryogenic temperatures are as follows: nonlinear temperature dependence in SS characteristics, increase in $V_{\mathrm{th}}$, surface roughness scattering, Coulomb scattering, and nonlinear velocity saturation effect [26], [27], [28]. For example, our measurements show a 47% and 39% increase in $V_{\mathrm{th}}$ for the n-channel FinFET and p-channel FinFET, respectively, at cryogenic temperatures. Nevertheless, $V_{\mathrm{th}}$ remains at a low value because ultralow-$V_{\mathrm{th}}$ transistors are used. To account for these effects in SPICE simulations of FinFETs, we use the model equations presented in [26] along with the industry-standard BSIM-CMG compact model [25]. It describes the behavior of a transistor through underlying physics-based models that carefully take into account many necessary aspects, such as temperature dependency, short-channel effects, and quantum confinement, among others. This allows for accurate and detailed modeling with which experimental measurements can be reproduced.
As the existing BSIM-CMG model is based on Maxwell-Boltzmann statistics, we use it along with the modifications presented in [26]. For electron density calculation, this model captures the impact of Fermi-Dirac statistics from 300 K to cryogenic temperatures. The effective density of states, surface potential, and charges are highly temperature dependent, and thus, we first obtain an effective temperature at cryogenic temperatures [26]. The nonlinearity in the SS is caused by the band-tail effect [27], [28]. Recently, source-to-drain tunneling has been proposed as a possible mechanism for the SS saturation at low temperatures and could be the major cause of higher leakage current [29]. In this work, the impact of the band-tail effect and traps on SS and $V_{\mathrm{th}}$ is captured using the T0, D0, KT11, KT12, and TVTH model parameters [26].
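To see why SS saturation matters, it helps to compare against the textbook Boltzmann thermal limit $SS = \ln(10)\,kT/q$, which scales linearly with temperature. The following sketch only evaluates that ideal limit; it is not the calibrated model of this work, and the function name is hypothetical. Measured devices saturate above this value at cryogenic temperatures, which is the nonlinearity attributed to the band-tail effect above.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K
Q_E = 1.602176634e-19   # elementary charge in C

def thermal_limit_ss_mv_per_dec(temperature_k):
    """Ideal subthreshold swing ln(10)*k*T/q in mV per decade of current."""
    return math.log(10) * K_B * temperature_k / Q_E * 1e3

ss_300k = thermal_limit_ss_mv_per_dec(300.0)  # roughly 60 mV/dec
ss_10k = thermal_limit_ss_mv_per_dec(10.0)    # roughly 2 mV/dec
```

A purely linear temperature model would therefore predict a 30-fold steeper slope at 10 K, which is why dedicated parameters for the deviation from this limit are needed.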
Peak mobility is enhanced as the temperature-dependent lattice vibration decreases at cryogenic temperatures and the thermal velocity of the charge carriers decreases. The effective mobility of charge carriers with lower thermal velocities decreases at higher vertical fields due to increased surface roughness scattering. Through the optimization of the temperature coefficients for Coulomb scattering (UD1 and UD2) and the temperature coefficients for phonon/surface roughness scattering (UA1, UA2, and EU1), the impact of Coulomb and surface roughness scattering is accounted for in the mobility model. Additional model parameters are used to obtain the nonlinear temperature dependency of the velocity saturation and pinch-off voltage. Those parameters include the effective drain-to-source voltage ($V_{\mathrm{dseff}}$) smoothing TMEXP, its temperature coefficients TMEXP1 and TMEXP2, the temperature coefficients for the saturation velocity (AT, AT1, and AT2), and KSATIVT, KSATIVT1, and KSATIVT2 to model the temperature dependence of the channel pinch-off effect. The model is validated against experimental data, as shown in Fig. 3. Intrinsic randomness of the measurements is observed at lower $V_{\mathrm{G}}$ and is the likely cause of discrepancies between the simulated and measured results at lower current.

IV. CRYOGENIC-AWARE STANDARD CELL LIBRARIES
Standard cell libraries bridge the gap between the modeling of individual transistors and the design of complex digital circuits. They are indispensable for the electronic design automation (EDA) tool flows to perform logic synthesis. In the standard cell characterization process, our calibrated transistor model is placed into a wide range of standard cells and simulated using a commercial SPICE simulator. The resulting figures of merit are then collected to build standard cell libraries that are fully compatible with the existing commercial EDA tool flows to seamlessly perform logic synthesis, timing signoff, and power signoff. During standard cell library characterization, the only parameter changed in the compact model is the number of fins, which acts as a current multiplier [25]. Self-heating is considered in the compact model but has only a negligible impact on the standard cells at 10 K. It is noteworthy that self-heating effects even decrease in advanced bulk FinFET devices at cryogenic temperatures [21]. In contrast, they increase in SOI technologies [30]. Besides the transistor model, SPICE netlists for a wide range of combinational and sequential standard cells are provided to the characterization flow. In this work, we obtain 200 different standard cells from the open-source ASAP7 PDK [31], including parasitic resistances and capacitances. The cells are designed for a 7-nm technology node and are, thus, geometrically very close to our 5-nm transistor model.
When integrated into a larger circuit, a single cell can exhibit very different behavior based on its experienced timing arcs, input signal slews, and output load capacitances. To obtain an adequate model of each standard cell for a wide range of conditions, each cell is characterized under 7 ×

B. IMPACT OF CRYOGENIC TEMPERATURES ON THE STANDARD CELL CHARACTERISTICS
With a transistor model calibrated for a wide temperature range, the corresponding standard cell libraries can be generated by adjusting the temperature in the operating conditions. In this work, we generate standard cell libraries for operation at room temperature (300 K) and cryogenic temperature (10 K). With the standard cell libraries at hand, the impact of cryogenic temperatures can be explored at the cell and circuit level. Fig. 5 shows a histogram of all delays occurring in the standard cell libraries. The histograms span data from all cells and conditions stored in the library, giving a holistic picture of the technology at 300 and 10 K in red and blue bars, respectively. Although slight differences are observable, the histograms overlap to a large degree, indicating only minor differences in delay when operated at cryogenic temperature. In addition, average dynamic power is reduced slightly for some cells and increased for others. Most importantly, leakage power is reduced dramatically at 10 K. This behavior can be explained by the electrical characteristics of the extracted transistor model. While temperature has only a minor impact on the $I_{\mathrm{ON}}$ of the transistor, $I_{\mathrm{OFF}}$ is reduced by multiple orders of magnitude when operating at cryogenic temperatures, as shown in Fig. 3.

V. CRYOGENIC SOC AND APPLICATIONS
To evaluate the plausibility of a full SoC, the whole SoC is synthesized and placed with the room temperature library. Then, the cryogenic-aware standard cell libraries are employed to analyze power and timing at 10 K.

A. SOC DESIGN FLOW
The employed SoC is a fully functional system, including a RISC-V CPU core, caches, and periphery like a memory controller. A single five-stage in-order Rocket CPU [32] is combined with a split L1 cache for data and instructions, each with 16 and a shared L2 cache of 512 . The hardware description language (HDL) code is created with the assistance of the Chipyard framework [33]. Then, a commercial synthesis tool is employed to create a gate-level netlist. At this stage, the previously described 300 K standard cell library is employed. The gate-level netlist is then fed to a commercial place and route tool in combination with the 300 K standard cell libraries.
Static random-access memory (SRAM) arrays, the core building block of L1 and L2 caches among others, are provided through the ASAP7 PDK [31] as intellectual property (IP) cores. However, these IP cores only include the physical size and timing but not their power consumption. We add the missing power values based on our previous work [24]. In [24], we have modeled SRAM cells and peripheral circuitry, such as sense amplifiers and write drivers, based on the same calibrated BSIM-CMG transistor compact model at 300 and 10 K. This enables a complete power estimation for all the SRAM utilized by the SoC. Read and write accesses as well as hold and leakage are included. Quantum computing specific peripheries, such as signal generators, are not included because of their specificity to the physical implementation of the quantum system. The focus of this work is on the impact of a full SoC on the cooling and time budget.

B. QUANTUM MEASUREMENT CLASSIFICATION
To evaluate the dynamic power consumption accurately, two classification algorithms are implemented in C and simulated. While more complex algorithms promise a higher accuracy [34], they are also more computationally demanding. To estimate a baseline, two simpler classification algorithms are selected in this work. First, the k-nearest neighbors algorithm (KNN) is a nonparametric classification method [35]. The calibration phase is performed offline and returns the center points for each qubit, as shown in Fig. 2 and described in Section II. After qubit measurement, the distances of the new data point to its qubit's centers in the I/Q plane are calculated. The label (0 or 1) of the nearest center is returned as the result. In this work, the Euclidean distance $d$ is computed between a center $(x_C, y_C)$ and the measurement $(x_M, y_M)$ with
$$d = \sqrt{(x_C - x_M)^2 + (y_C - y_M)^2}. \tag{2}$$
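The nearest-center rule can be sketched in a few lines. The paper's implementation is in C; Python is used here only for illustration, and the function names and the tie-break toward label 0 are assumptions. The sketch compares squared distances, which yields the same ordering as (2) because the square root is monotonically increasing.

```python
def squared_distance(center, measurement):
    """Radicand of the Euclidean distance between two (I, Q) points."""
    dx = center[0] - measurement[0]
    dy = center[1] - measurement[1]
    return dx * dx + dy * dy

def classify_nearest_center(measurement, center_0, center_1):
    """Return 0 or 1, the label of the closer calibration center.

    Ties are resolved toward 0 (an arbitrary choice for this sketch).
    """
    d0 = squared_distance(center_0, measurement)
    d1 = squared_distance(center_1, measurement)
    return 0 if d0 <= d1 else 1

# Hypothetical centers for one qubit.
label_a = classify_nearest_center((0.1, 0.2), (0.0, 0.0), (1.0, 1.0))
label_b = classify_nearest_center((0.9, 0.8), (0.0, 0.0), (1.0, 1.0))
```

Per measurement, this costs two subtractions, two multiplications, and one addition for each center, plus a single comparison.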
After the distances to the two centers for 0 and 1 are computed, they are compared and the closest center is selected as the classification result. The computation of (2) can be optimized because the square root is a monotonically increasing function. In other words, the radicand is larger for a longer distance and, thus, comparing the radicands is sufficient. Hence, the computationally expensive square root operation is unnecessary and removed.
Hyperdimensional computing (HDC) is a machine learning method based on large vectors called hypervectors [36]. The components of the vectors can be simple bits, making its implementation very lightweight, e.g., the bind operation $\oplus$ is a binary xor. A point $P = (x_P, y_P)$ is encoded into a hypervector with (3), employing the item hypervectors $\mathbf{x}_P$ and $\mathbf{y}_P$
$$\mathbf{P} = \mathbf{x}_P \oplus \mathbf{y}_P. \tag{3}$$
Such item hypervectors are constant and generated once during the program compilation. A size of 128 bits per hypervector is sufficient, and a total of 32 are created to cover the x and y value range. Because each bit in a hypervector is independent, each 128-bit HDC operation can be split into two 64-bit instructions for the 64-bit RISC-V SoC. For each qubit, the center points from the calibration phase are encoded using (3) into $\mathbf{C}_0$ and $\mathbf{C}_1$. After the measurement phase, each measurement is first quantized and encoded into $\mathbf{M}$. Then, the Hamming distances to its $\mathbf{C}_0$ and $\mathbf{C}_1$ are computed, and the closest is selected as the result. The Hamming distance is the popcount of $\mathbf{C} \oplus \mathbf{M}$. This computation can be partially simplified from two xor operations to one, as shown in the following:
$$\mathbf{C} \oplus \mathbf{M} = \mathbf{C} \oplus (\mathbf{x}_M \oplus \mathbf{y}_M) = (\mathbf{C} \oplus \mathbf{x}_M) \oplus \mathbf{y}_M.$$
Since these hypervectors are themselves the products of xor operations, their order can be rearranged. Instead of computing the xor of $\mathbf{x}_M$ and $\mathbf{C}$ every time, $\mathbf{C} \oplus \mathbf{x}_M$ is precomputed and replaces the item hypervectors for the x component. A drawback is the doubling of the memory consumption of the executable to store $\mathbf{C}_0 \oplus \mathbf{x}_M$ and $\mathbf{C}_1 \oplus \mathbf{x}_M$. Because of the few item hypervectors and the small dimension of 128 bits, the memory footprint is increased by only 256 bytes.
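The HDC classification and the xor precomputation can be sketched as follows. Several details are assumptions because the text does not specify them: the 32 item hypervectors are split into 16 levels per axis, the items are built with a level encoding (a few bits flipped per level so that neighboring levels stay similar), and the value range for quantization is $[-1, 1]$. All names are hypothetical. Python integers stand in for the two 64-bit words of each 128-bit hypervector.

```python
import random

DIM = 128     # bits per hypervector (from the text)
LEVELS = 16   # assumed: 32 item hypervectors split evenly between x and y

def make_level_items(seed):
    """Assumed level encoding: flip a few distinct bits from level to level."""
    rng = random.Random(seed)
    order = list(range(DIM))
    rng.shuffle(order)
    step = DIM // (2 * (LEVELS - 1))  # 4 bits flipped per level
    item = rng.getrandbits(DIM)
    items = [item]
    for level in range(1, LEVELS):
        for bit in order[(level - 1) * step:level * step]:
            item ^= 1 << bit
        items.append(item)
    return items

X_ITEMS = make_level_items(seed=1)
Y_ITEMS = make_level_items(seed=2)

def quantize(value, lo=-1.0, hi=1.0):
    """Map a coordinate to a level index; the range is a placeholder."""
    idx = int((value - lo) / (hi - lo) * LEVELS)
    return min(max(idx, 0), LEVELS - 1)

def encode(x, y):
    """Bind the x and y item hypervectors with xor."""
    return X_ITEMS[quantize(x)] ^ Y_ITEMS[quantize(y)]

def hamming(a, b):
    """Popcount of a xor b."""
    return bin(a ^ b).count("1")

def classify(x, y, c0, c1):
    """Direct form: encode the measurement, then compare with both centers."""
    m = encode(x, y)
    return 0 if hamming(c0, m) <= hamming(c1, m) else 1

def precompute(center):
    """Fold a center into the x item table: C xor x_item for every level."""
    return [center ^ item for item in X_ITEMS]

def classify_fast(x, y, table_0, table_1):
    """Precomputed form: one xor per center instead of two."""
    qx, y_item = quantize(x), Y_ITEMS[quantize(y)]
    d0 = bin(table_0[qx] ^ y_item).count("1")
    d1 = bin(table_1[qx] ^ y_item).count("1")
    return 0 if d0 <= d1 else 1

# Hypothetical centers for one qubit.
c0, c1 = encode(-0.5, -0.5), encode(0.5, 0.5)
table_0, table_1 = precompute(c0), precompute(c1)
```

Under these assumptions, the two precomputed tables hold 2 × 16 entries of 16 bytes each and replace one x item table of 16 entries, which is consistent with the 256 additional bytes stated above.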

A. PROCESSOR TIMING ANALYSIS
As a baseline, the physical design of the SoC is synthesized with the 300 K library. The clock period is set to 0 ns to force the EDA tools to optimize as much as possible. The critical path delay reported by the tools (derived from the worst-case slack) determines the possible operating frequency. Guardbands, e.g., for process variation, are assumed to be equal at both temperature corners and therefore cancel out in a comparison. Hence, they are not considered in this work. The gate-level netlist representing the physical design of the SoC is provided to a commercial static timing analysis tool. The timing analysis is repeated with both libraries and the results are reported in Table 1. At 300 K, the critical path has a length of 1.04 ns, which corresponds to a clock frequency of 960 MHz. As shown in Fig. 5, some standard cells are slower at 10 K, which increases the critical path delay to 1.09 ns, corresponding to 917 MHz. This represents a 4.6% slowdown. Fig. 3 shows that the $I_{\mathrm{ON}}$ of n-FinFET and p-FinFET transistors at 10 K is similar to that at 300 K. Therefore, the switching delay of the transistors and, in turn, the propagation delay of the cells are similar, and the hold times of the circuit are not impacted.
FIGURE 6. Average power consumption of the KNN for quantum measurement classification. The dynamic power at cryogenic temperatures is reduced by 10% from 63.5 to 57.4 mW. However, the major contributor is the leakage from SRAM, which is suppressed and reduced to only 0.48 mW at 10 K. This large reduction makes the SoC feasible given a cooling capacity of 100 mW.
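The clock frequencies and the slowdown quoted in the timing analysis follow directly from the two critical path delays. The short calculation below is a sketch for checking the arithmetic; the function name is hypothetical, and the slowdown is interpreted here as the relative drop in clock frequency.

```python
def max_frequency_mhz(critical_path_ns):
    """Highest clock frequency for a given critical path: f = 1 / t."""
    return 1e3 / critical_path_ns

f_300k = max_frequency_mhz(1.04)   # about 961.5 MHz, quoted as 960 MHz
f_10k = max_frequency_mhz(1.09)    # about 917.4 MHz, quoted as 917 MHz
slowdown_pct = (1.0 - f_10k / f_300k) * 100.0  # about 4.6%
```

Measured on the delay instead of the frequency, the increase from 1.04 to 1.09 ns is about 4.8%, so the two ways of expressing the slowdown differ only slightly.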

B. PROCESSOR POWER ANALYSIS
Power analysis is often done with statistical switching activities, e.g., 20% of all cells are activated per cycle. However, such statistical switching activities do not reflect the actual power consumption because, for simpler tasks, such as classifying a measurement, only parts of the SoC have to be engaged. Therefore, the two classification algorithms for quantum measurements are simulated with the gate-level netlist of the physical design. The actual switching activity numbers are extracted from these simulations and, in combination with the physical design, processed by Cadence Voltus to calculate the average power consumption of the whole SoC.
Since the two algorithms for classification represent less demanding workloads, the Dhrystone benchmark [37] is also simulated to report a general average. The dynamic power consumption at both temperatures is similar, as shown in Fig. 6. At 300 K, standard cells for logic contribute about 11 mW to the leakage power, whereas the 581 total on-chip SRAM contributes 193 mW. Operating at nominal supply voltage combined with ultralow-$V_{\mathrm{th}}$ transistors results in such a high SRAM leakage, which is in line with other works [38]. In addition, short-channel effects and quantum tunneling are other key causes. Hence, the SoC would be infeasible for a cryogenic system given the limited cooling capacity of 100 mW [5]. However, the significant reduction of the leakage current of transistors when operated at cryogenic temperatures is reflected at the circuit and SoC level. At 10 K, the leakage from logic and SRAM is almost negligible at 0.48 mW, a reduction of 99.76%. Consequently, the SoC becomes feasible for a cryogenic system, which also demonstrates that on-chip memories can be enlarged for systems at 10 K.
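The feasibility argument can be checked with the reported numbers. The sketch below uses the leakage values from the text and the KNN dynamic power from the caption of Fig. 6; the function and variable names are hypothetical, and total power is taken as the simple sum of dynamic and leakage power.

```python
COOLING_BUDGET_MW = 100.0  # cooling capacity at 10 K [5]

def total_power_mw(dynamic_mw, leakage_mw):
    """Total power as the sum of dynamic and leakage power."""
    return dynamic_mw + leakage_mw

leakage_300k = 11.0 + 193.0   # logic + SRAM leakage at 300 K
leakage_10k = 0.48            # logic + SRAM leakage at 10 K
reduction_pct = (1.0 - leakage_10k / leakage_300k) * 100.0

total_300k = total_power_mw(63.5, leakage_300k)  # KNN workload at 300 K
total_10k = total_power_mw(57.4, leakage_10k)    # KNN workload at 10 K
fits_budget = total_10k <= COOLING_BUDGET_MW
```

With these values, the total drops from 267.5 mW to about 57.9 mW, so only the 10 K operating point stays below the 100 mW budget.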

C. QUANTUM MEASUREMENT CLASSIFICATION: EXECUTION TIME ANALYSIS
The execution times of the two classification algorithms for quantum measurements are evaluated. Table 2 shows the average number of clock cycles needed for a classification of a single measurement. Although HDC comprises simpler binary and logical instructions, it is 3.3× slower than the distance computations with floating point calculations. The main contributor is the lack of a popcount instruction in the RISC-V instruction set architecture, which is essential for the HDC similarity computation. Hardware support would reduce the computation time significantly.
While the time for a single KNN classification is small, the challenge arises from scaling up the quantum system to thousands of qubits that have to be classified within a given time frame. We assume here that this time frame is set by the maximum duration of a continuous quantum computation. Therefore, to avoid stalling the quantum computer and letting the classification become a bottleneck, the data processing has to be faster than the given time frame, as shown in Fig. 2. This time frame is determined by the decoherence time of the quantum computer, which specifies the maximum time in which a quantum state can retain its properties. Our experiments on the IBM Falcon quantum processor indicate that this time is around 110 μs. However, users typically strive toward shorter quantum computation durations to minimize the error from the exponential decay due to decoherence. Hence, the numbers given in Fig. 7 portray a best-case scenario in which the full decoherence time is available for classification and no other tasks have to be performed by the processing system. Such other tasks include loading the next quantum computation, providing the confidence intervals for the different solutions, or executing quantum error correction protocols, among others. In practice, the SoC has to perform these other tasks as well and cannot be fully occupied with classifying measurement results.
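The scaling limit can be estimated with a simple cycle budget. This is a rough sketch under stated assumptions: measurements are classified one after another on a single core, no other task runs, and the cycle count per classification used in the example is a hypothetical placeholder rather than the value from Table 2.

```python
def max_classifiable_qubits(time_frame_us, clock_mhz, cycles_per_classification):
    """Number of sequential classifications that fit into the time frame.

    Microseconds times megahertz gives the number of available clock cycles.
    """
    available_cycles = time_frame_us * clock_mhz
    return int(available_cycles // cycles_per_classification)

# 110 us decoherence time and 917 MHz at 10 K are taken from the text;
# 100 cycles per classification is a hypothetical example value.
qubits = max_classifiable_qubits(110, 917, 100)
```

The bound scales inversely with the cycle count, so halving the cycles per classification doubles the number of qubits that can be served within the same time frame.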

VII. PERSPECTIVE AND DISCUSSION
Currently, quantum computing is only possible at cryogenic temperatures. Cooling any computing system to such low temperatures is challenging and heat dissipation, i.e., power consumption, must be limited to not overwhelm the cooling. This work shows that it is easily possible to deploy an offthe-shelf system developed for room temperatures at cryogenic temperatures. The timing is impacted only marginally and is within expected guard bands. Power consumption is even reduced, especially through the significant reduction in leakage power, demonstrating the plausibility of a cryogenic SoC for quantum measurement classification. Further power reduction could be achieved by work function engineering at the transistor level, supply voltage reduction, alternative SRAM designs, or clock and power gating at the circuit level. However, scaling to large quantum systems with thousands of qubits is still challenging for off-the-shelf SoCs.
The evaluated RISC-V SoC becomes a bottleneck for classifying the quantum measurements at about 1500 qubits while consuming half of the available cooling budget. Other hardware components, such as signal generators and analog-to-digital converters, have to be cooled as well, opening a new field of potential co-optimizations. On the one hand, exceeding the cooling budget and increasing the temperature of the quantum system increases the error rate. On the other hand, faster processing enables more repetitions of the same quantum computations to overcome erroneous computations, or the use of more sophisticated error correction algorithms. Furthermore, heat transfer is comparatively slow, creating the potential for short but high-power processing bursts followed by a low-power idle phase without impacting the qubits. Such tradeoffs and power management strategies can be explored and experimentally evaluated more efficiently and faster with flexible, software-controlled SoCs than with fixed hardware implementations. This work shows that off-the-shelf SoCs are plausible at cryogenic temperatures and that there is no need for dedicated chips only to explore such tradeoffs, error correction algorithms, and power management strategies.
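The burst-and-idle strategy mentioned above can be sketched with a simple time-averaged power model. The sketch assumes that the thermal response is slow enough for only the average power over one burst/idle period to matter, which is a simplification and not a thermal model from this work. The function names and the burst and idle power values are hypothetical; the 100 mW budget corresponds to the cooling capacity considered in this work.

```python
def average_power_mw(burst_power_mw, idle_power_mw, burst_time_us, idle_time_us):
    """Time-averaged power over one period consisting of a burst and an idle phase."""
    period_us = burst_time_us + idle_time_us
    if period_us <= 0:
        raise ValueError("period must be positive")
    energy = burst_power_mw * burst_time_us + idle_power_mw * idle_time_us
    return energy / period_us


def max_burst_fraction(burst_power_mw, idle_power_mw, budget_mw):
    """Largest fraction of time spent in the burst phase that keeps the
    average power within the budget (simplified averaged model)."""
    if idle_power_mw > budget_mw:
        return 0.0  # even idling exceeds the budget
    if burst_power_mw <= budget_mw:
        return 1.0  # the burst power itself already fits the budget
    return (budget_mw - idle_power_mw) / (burst_power_mw - idle_power_mw)


COOLING_BUDGET_MW = 100.0  # cooling capacity considered in this work
BURST_MW = 200.0           # hypothetical burst power, for illustration only
IDLE_MW = 10.0             # hypothetical idle power, for illustration only

print(average_power_mw(BURST_MW, IDLE_MW, burst_time_us=10, idle_time_us=90))
print(max_burst_fraction(BURST_MW, IDLE_MW, COOLING_BUDGET_MW))
```

Whether such averaging is acceptable depends on the thermal time constants of the actual cryostat, which would have to be characterized experimentally.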
Integrating a cryogenic SoC into the quantum computer setup indeed offers a wide range of possibilities and the flexibility to explore several improvements that can have a huge impact on the throughput, result quality, and duration of quantum computations. For hybrid quantum-classical algorithms, such as the quantum approximate optimization algorithm or the variational quantum eigensolver, an integrated SoC decreases the data movement and would, thus, allow for more optimization steps within a specified runtime budget, leading to higher quality results. Furthermore, the time required for the calibration of the quantum computer would decrease, further increasing the throughput.
Potentially, it would also allow more data to be included, e.g., to classify qubit measurements with higher precision. In the near term, an integrated cryogenic SoC would also make it possible to reduce the runtime requirements of dynamic circuits [7] and to apply error mitigation algorithms on the fly, further improving the throughput. In the future, a range of quantum error correction protocols could be evaluated more thoroughly, reducing the time required to achieve fully error-corrected quantum computers.
This work shows that processing a large number of qubits is not feasible with a regular processor but will require hardware support. Dedicated SoCs, such as [6], already include dedicated hardware for quantum measurement classification. Nevertheless, this dedicated hardware is fixed after the design and cannot be improved or replaced without a redesign of the SoC. Hence, an SRAM-based field-programmable gate array (FPGA) fabric could be an interesting addition to the SoC. The SRAM's leakage power is very low at 10 K, and FPGAs offer a large degree of flexibility yet consume comparatively little power. Similar to the exploration of different power management strategies and quantum error correction methods outlined above, the FPGA fabric can be reconfigured to select between a high-power, low-latency and a low-power, high-latency classification algorithm, depending on the complexity and error robustness of the intended quantum computation.

VIII. CONCLUSION
In this work, we explored for the first time how an SoC implemented with a cutting-edge 5 nm technology for room temperature would behave at cryogenic temperatures. Such a general-purpose system can not only classify quantum measurements of the qubits but also support other tasks, such as quantum error correction. Further, by analyzing an off-the-shelf design aimed at room-temperature operation, we show that already existing hardware can be deployed quickly. The SoC's power consumption is reduced and can fit within the 100 mW cooling capacity. We have shown that the timing of the system is comparable to that at room temperature and that the significant reduction of the leakage power enables SoCs with large on-chip memories.