A Scalable Cryo-CMOS Controller for the Wideband Frequency-Multiplexed Control of Spin Qubits and Transmons

Building a large-scale quantum computer requires the co-optimization of both the quantum bits (qubits) and their control electronics. By operating the CMOS control circuits at cryogenic temperatures (cryo-CMOS), and hence in close proximity to the cryogenic solid-state qubits, a compact quantum-computing system can be achieved, thus promising scalability to the large number of qubits required in a practical application. This work presents a cryo-CMOS microwave signal generator for frequency-multiplexed control of $4\times 32$ qubits (32 qubits per RF output). A digitally intensive architecture offering full programmability of phase, amplitude, and frequency of the output microwave pulses and a wideband RF front end operating from 2 to 20 GHz allow targeting both spin qubits and transmons. The controller comprises a qubit-phase-tracking direct digital synthesis (DDS) back end for coherent qubit control and a single-sideband (SSB) RF front end optimized for minimum leakage between the qubit channels. Fabricated in Intel 22-nm FinFET technology, it achieves a 48-dB SNR and 45-dB spurious-free dynamic range (SFDR) in a 1-GHz data bandwidth when operating at 3 K, thus enabling high-fidelity qubit control. By exploiting the on-chip 4096-instruction memory, the capability to translate quantum algorithms to microwave signals has been demonstrated by coherently controlling a spin qubit at both 14 and 18 GHz.

I. INTRODUCTION
QUANTUM computers promise significant advantages over classical computers in solving several computing problems. These include near-term applications requiring hundreds of quantum bits (qubits), such as elucidating the hidden mechanisms of chemical reactions [1], and long-term applications requiring millions of qubits, such as the efficient search in huge databases with Grover's algorithm [2]. While today's quantum computers comprise only a few tens of qubits (<100) [3]-[6], implementing the required large-scale quantum computers ($10^3$-$10^6$ qubits) calls for a scalable approach both for the qubits, e.g., the use of high-fidelity solid-state qubit technologies [7], such as spin qubits and transmons, and for the classical electronics required to drive and read out the qubits.
The most complex state-of-the-art quantum computer (with 53 qubits) requires tens of bulky custom-made electronic modules [digital-to-analog converter (DAC), LNA, ADC, and so on] operating at room temperature and connected to cryogenic qubits via hundreds of coaxial cables [3]. Although it is an impressive engineering feat, such a complex approach is hardly scalable, especially due to the very limited reliability and compactness of the large number of wires required in a million-qubit computer.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
A better alternative would be to integrate the qubits and the control electronics on the same die or package and operate them at the same temperature [8]. Toward this goal, control electronics able to operate at cryogenic temperatures in close proximity to the qubits must be developed. CMOS operating at cryogenic temperatures (cryo-CMOS) has proved to be a reliable technology platform for realizing complex cryogenic integrated circuits, as demonstrated by the cryogenic operation of individual circuit blocks [9]-[16] and a pulse modulator for qubit control [17]. As a stepping stone toward a scalable cryogenic electronic interface for a large-scale quantum processor, this work demonstrates a single-chip cryo-CMOS controller (operating at 3 K) optimized for controlling 128 qubits (operating at 20 mK) and requiring minimum interfacing to room-temperature equipment [18]. To support the effectiveness of the designed system-on-chip (SoC), single-qubit operations on a single-electron spin qubit are demonstrated.
Designing a cryogenic controller for large-scale quantum computing comes with several challenges. First, qubits require highly accurate and low-noise microwave control signals to ensure high-fidelity single-qubit operations. For instance, for a π-rotation with 99.99% fidelity in 50 ns, a carrier with 35-kHz frequency accuracy is required with an SNR of 50 dB in a 10-MHz band around the carrier [19]. Besides that, accurate control of the phase of the microwave signal (<0.2°) with respect to the qubit's phase is essential to perform coherent qubit operations, i.e., rotations around a well-controlled axis, over the entire duration of the quantum algorithm.
Furthermore, the cooling power available at cryogenic temperatures in typically employed dilution refrigerators is strictly limited to a few watts at 3 K and less than 1 mW at temperatures below 100 mK, thus complicating the integration of a large number of high-performance microwave signal generators. In this work, the focus is on the design of a controller operating at 3 K because of the higher available cooling power. This does not restrict a future co-integration with qubits at the same temperature as the electronics since "hot" qubits operating at temperatures above 1 K have recently been demonstrated and are likely to evolve further in the next few years [20], [21].
While cryo-CMOS circuits have been shown to operate down to 30 mK [22], the device characteristics are different at cryogenic temperatures, and no mature models were available at the design time to accurately predict the behavior of passive and active devices at cryogenic temperatures. Consequently, the circuits need to be designed for robustness against these variations and additional tuning circuitry is required. For instance, a higher threshold voltage is expected for CMOS transistors at cryogenic temperatures, limiting the stacking of transistors in analog circuits [23], unless a supply voltage higher than the nominal is adopted, probably at the cost of reduced reliability, based on the limited research available on cryo-CMOS reliability [24]. As an example, Fig. 1 shows the expected change in $g_m/I_D$-efficiency and linearity of an NMOS FET over temperature. The intrinsic gain is enhanced at cryogenic temperatures, in line with what is reported for other CMOS technologies [23]. It has been shown that device matching degrades at cryogenic temperatures [25]. This directly impacts the linearity of ADCs and DACs and leads to increased offset in differential amplifiers. Furthermore, the degraded matching of the switching devices of a mixer allows the second-order nonlinearity of the transconductance device to propagate to the output. On the other hand, carrier mobility increases, offering higher driving currents [23], and thermal noise is lower, potentially allowing a lower power consumption. However, the noise power spectral density does not scale linearly with temperature and is only expected to be approximately 10× lower at 3 K compared with 300 K [26]. Some devices are not strongly affected by the cryogenic operation, e.g., the thin-film resistors used in this work show negligible change at 5 K compared with 300 K. The capacitance of metal-oxide-metal capacitors and the inductance of on-chip inductors are expected to slightly change at cryogenic temperatures, while the inductor quality factor can double [27].

Fig. 1. [...] and (c) $g_m/I_D \times \mathrm{IIP3}$ at 300 and 5 K for the adopted 22-nm FinFET CMOS technology. The 5 K simulation models were developed from preliminary device characterization performed at 5 K instead of 3 K due to limitations in the probe station temperature control and are expected to be valid from 3 to 20 K, as demonstrated in [35] for another CMOS node.
Finally, relocating the controller physically closer to the qubits is advantageous for scaling only if a limited number of control lines from room temperature are required. Hence, all or part of the quantum algorithm execution controller needs to be co-integrated at cryogenic temperatures. To keep such a complex SoC with algorithm-execution capabilities power efficient, this design leverages qubit frequency-division multiple access (FDMA) [6]. However, employing FDMA introduces several additional challenges. First, the required data bandwidth scales with the number of qubits and the qubit operation speed, ultimately requiring a data bandwidth on the order of 1 GHz. To pack more qubits in the available frequency spectrum, pulse shaping needs to be applied to optimize the spectral content of the microwave pulses. Moreover, a high spurious-free dynamic range (SFDR) is required to ensure that no power is delivered to the qubits that are not addressed at a given time, and a mechanism should be incorporated to efficiently track the phase of all qubits to ensure coherent operations. In addition, phase corrections must be applied to all qubits after every operation to compensate for the ac Stark shift in a frequency-multiplexing scheme [28].

II. SYSTEM ARCHITECTURE AND SPECIFICATIONS
The controller contains four transmitters, each designed following the design methodology presented in [29] to achieve a 99.99% fidelity for controlling 32 frequency-multiplexed single-electron spin qubits or transmons in the frequency range from 2 to 20 GHz. In order to drive X- or Y-rotations on a single qubit$^{1}$ in a frequency-multiplexing setup, power needs to be applied only at the frequency of the qubit that needs to perform an operation, with the amount of power setting the speed of the operation, or Rabi frequency ($f_R$). To avoid addressing the wrong qubit in a frequency-multiplexing scheme, pulse shaping and sufficient qubit frequency spacing are required. It can be shown that a 2-GHz output data bandwidth is required to achieve the target fidelity for 32 frequency-multiplexed qubits with an operation speed up to $f_R = 10$ MHz, as a channel spacing larger than $5 f_R$ is required [19].
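As a quick sanity check on the bandwidth budget above, the following sketch multiplies the number of multiplexed qubits by the minimum channel spacing quoted in the text. The function name and the assumption of uniformly spaced channels (one slot per qubit) are illustrative and are not taken from the paper.

```python
def occupied_bandwidth_hz(n_qubits, rabi_hz, spacing_factor):
    """Band spanned by n_qubits channels spaced spacing_factor * rabi_hz apart.

    Illustrative assumption: uniform spacing, one slot per qubit.
    """
    return n_qubits * spacing_factor * rabi_hz


# Values quoted in the text: 32 qubits, f_R up to 10 MHz, spacing of 5 f_R.
occupied = occupied_bandwidth_hz(n_qubits=32, rabi_hz=10e6, spacing_factor=5)
available = 2e9  # 2-GHz output data bandwidth quoted in the text
print(f"occupied {occupied / 1e9:.1f} GHz of {available / 1e9:.1f} GHz")
```

Since the text asks for a spacing larger than $5 f_R$, the computed value is a lower bound on the occupied band rather than an exact allocation.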
As discussed in [29], the architecture employing single-sideband (SSB) modulation in the analog front end and direct digital synthesis (DDS) in the digital back end, as shown in Fig. 2, is the most suitable for multiple reasons. Such a system requires only a single LO, setting the desired output frequency band, while supporting multiple qubits at different frequencies in the 2-GHz output band, because the direct digital synthesizer contains numerically controlled oscillators (NCOs) that keep track of the phase of each qubit. To realize highly linear RF transmitters with low output power, operating up to 20 GHz, multiple-return-to-zero (MRZ) DAC topologies are typically preferred over a standard RF DAC architecture [30]. However, in comparison, a generic SSB architecture offers the widest data bandwidth, as required to control many qubits, and more flexibility in the choice of the output frequency band independently of the data bandwidth so as to efficiently address different qubit types.
The specifications for the digital back end and analog front end to achieve the desired fidelity of 99.99% are summarized in Table I [29]. The 44-dB output power range is dictated by the support of different qubit types, the desired operating speed from 1 to 10 MHz, and the expected variability between different quantum processors. In order to also support transmons, the targeted output frequency range was extended to include 5 to 9 GHz, a range typically used in transmon-based quantum processors. The output power has also been made controllable over a wide range as transmons generally require a lower output power. Finally, baseband polar modulation was added to support the generation of complex envelopes, such as DRAG pulses, which are typically employed in high-fidelity transmon control. The wide output power range, output frequency range, and the support for polar modulation of the envelopes ensure the compatibility with both spin qubits and transmons.

$^{1}$ Z-rotations are obtained by updating the tracked phase.

Fig. 3. Memory organization of the integrated controller comprising instruction lists, instruction tables, and envelope memories to gradually reduce the data rate.
The digital back end is designed for the 10× higher fidelity of 99.999%, in order to relax the analog specifications and to achieve a proper balance between the expected analog and digital power consumption [29]. Consequently, the number of bits in the data path is optimized to obtain an SFDR and SNR of 54 dB, as required to achieve this higher fidelity. The number of bits in the NCO is chosen to ensure a frequency inaccuracy lower than the frequency noise of state-of-the-art qubits, i.e., 1.9-kHz rms, determined by the nuclear spin noise in isotopically purified silicon [31]. The maximum frequency inaccuracy of an NCO can be calculated as $f_{clk}/2^{b_{NCO}-1}$, where $b_{NCO}$ refers to the number of bits in the NCO. An electronics/quantum co-simulation of the system considering only the finite number of bits in the digital circuitry confirms that the desired fidelity is obtained for all qubits [29].
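The NCO word-length requirement can be illustrated with a short calculation. This is only a sketch: it reads the expression in the text as $f_{clk}/2^{b_{NCO}-1}$ and assumes the NCO is clocked at the 2.5-GHz sample rate mentioned in Section III; the helper names are hypothetical.

```python
def nco_max_freq_error_hz(f_clk_hz, n_bits):
    # Expression as read from the text: f_clk / 2^(b_NCO - 1).
    return f_clk_hz / 2 ** (n_bits - 1)


def min_nco_bits(f_clk_hz, max_error_hz):
    """Smallest NCO word length whose worst-case error is below max_error_hz."""
    n_bits = 1
    while nco_max_freq_error_hz(f_clk_hz, n_bits) >= max_error_hz:
        n_bits += 1
    return n_bits


F_CLK_HZ = 2.5e9        # assumed NCO clock (sample rate from Section III)
QUBIT_NOISE_HZ = 1.9e3  # rms qubit frequency noise quoted in the text

bits = min_nco_bits(F_CLK_HZ, QUBIT_NOISE_HZ)
error = nco_max_freq_error_hz(F_CLK_HZ, bits)
print(f"{bits} bits give a worst-case error of {error:.0f} Hz")
```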
The specifications for the analog front end are derived from [19] under the assumption of equal error contributions from the different error sources. A reconstruction filter is required to sufficiently attenuate the DAC replicas that can fall in-band after upconversion. In addition, sufficient in-band flatness should be achieved in order to ease qubit control by removing the need for pre-distorting the microwave pulses.

III. DIGITAL-CIRCUIT DESIGN
The digital back end comprises a controller for algorithm execution and memory management and a digital signal-generation unit. The signal-generation unit employs a DDS, but, unlike the quadrature modulation in [29], polar modulation is adopted to reduce the power consumption by saving two multipliers and an adder. The coefficients used in the I/Q calibration network are selected based on the active qubit channel, i.e., based on the output frequency band, to compensate for the frequency-dependent phase and gain imbalances in the analog circuit. The entire DDS block is replicated to allow for the simultaneous excitation of two qubits (see Fig. 2).
As the two signal-generation units require an input data rate of (8 bit + 10 bit) × 2.5 GHz × 2 = 90 Gb/s, a quantum algorithm execution controller has been integrated, comprising an envelope memory containing the desired pulse envelopes, an instruction table for each qubit referencing the envelopes, and an instruction list containing the sequence of instructions to be executed (see Fig. 3). Since a pulse of 500 ns, or 1250 samples at 2.5 GHz, is required for the lowest operating speed of 1 MHz and the largest rotation angle of π, 2560 samples are available per qubit in the envelope memory (40 960 samples shared over 16 qubits). The envelopes can be efficiently reused for rotations around different axes, as the axis, and the respective phase shift, is defined in the instruction table, which has eight entries per qubit to define the instructions. This is expected to be sufficient, as a typical instruction set will contain a limited set of rotations, e.g., π, π/2, and π/4, around the X- and Y-axes. Moreover, the controller automatically performs the qubit Z-rotations required to compensate for the ac Stark shift in a frequency-multiplexing scheme [19], [32], by applying a phase shift defined from a programmable Z-correction table to all NCOs after each generated pulse. Due to this level of digital integration, the external data rate is lowered to ≈1 kb/s using the instruction list during the quantum algorithm execution.
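The data-rate and memory figures in this paragraph follow from simple arithmetic, reproduced in the sketch below. It assumes that a π-rotation lasts half a Rabi period, which is consistent with the 500 ns quoted for $f_R$ = 1 MHz; the variable names are illustrative.

```python
SAMPLE_RATE_HZ = 2.5e9

# Input data rate of the two signal-generation units.
bits_per_sample = 8 + 10
n_units = 2
input_rate_bps = bits_per_sample * SAMPLE_RATE_HZ * n_units


def pi_pulse_duration_s(rabi_hz):
    # Assumption: a pi-rotation takes half a Rabi period.
    return 1.0 / (2.0 * rabi_hz)


# Longest pulse: pi-rotation at the lowest operating speed of 1 MHz.
longest_pulse_samples = round(pi_pulse_duration_s(1e6) * SAMPLE_RATE_HZ)

# Envelope memory shared over 16 qubits.
samples_per_qubit = 40960 // 16

print(f"input rate: {input_rate_bps / 1e9:.0f} Gb/s")
print(f"longest pulse: {longest_pulse_samples} samples")
print(f"envelope memory per qubit: {samples_per_qubit} samples")
```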
The external interface consists of an SPI interface for programming the various internal memories and triggering the start of the algorithm execution and a dedicated 150-Mb/s shift register to quickly trigger the execution of a single quantum instruction, as often as every ≈75 ns. As an alternative to the execution of the pre-programmed instruction list (see Fig. 3), this operation mode allows for fast feedback and conditional branching in the quantum algorithm execution.
Since a cryogenic model of the standard-cell library was not available at the time of design to close timing for synthesis and automated place-and-route (APR), derating factors were implemented to extrapolate the timing behavior at 3 K from the room-temperature models of the standard-cell library. The derating factor for the delay of combinational gates is extracted by comparing the simulated oscillation frequency of a nine-inverter ring oscillator at room temperature (using the standard foundry device models) and at 5 K (using a preliminary cryogenic dc device model). Similarly, the derating factors for setup and hold times are extracted by transistor-level simulations of a standard D flip-flop. A common derating factor of ∼0.7 is determined for all cases, implying about 30% reduction in gate delay and setup/hold times. Using this derating factor for gate delays during synthesis and APR results in effective timing slacks at 5 K for both min and max delay, i.e., timing margins for hold and setup violations, equal to or greater than the values targeted for room temperature. Interconnect delay should also be scaled accordingly when using room-temperature models to predict 5 K behavior. From transistor measurement de-embedding data, it is evident that interconnect capacitance does not change significantly at 5 K, whereas resistance is reduced by about 50%. A 0.5× derating factor is therefore used for the room-temperature extracted resistances during APR to model 5 K interconnect delays.
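The effect of the two derating steps can be sketched with a first-order delay model. The scale factors below encode the roughly 30% lower gate delay and 50% lower interconnect resistance stated in the text; the lumped RC wire model, the function name, and the example numbers are assumptions made purely for illustration and do not represent the actual sign-off flow.

```python
GATE_DELAY_SCALE = 0.7  # ~30% lower gate delay at 5 K (from the text)
WIRE_R_SCALE = 0.5      # ~50% lower interconnect resistance at 5 K (from the text)


def derated_path_delay_ps(gate_delay_ps, wire_r_ohm, wire_c_ff):
    """Estimate a 5 K path delay from room-temperature values.

    Assumption: wire delay is a lumped R*C product; capacitance is unchanged.
    """
    wire_delay_ps = WIRE_R_SCALE * wire_r_ohm * wire_c_ff * 1e-3  # ohm * fF = 1e-3 ps
    return GATE_DELAY_SCALE * gate_delay_ps + wire_delay_ps


# Hypothetical path: 100 ps of gate delay, 200-ohm wire loaded by 50 fF.
room_temp_ps = 100.0 + 200.0 * 50.0 * 1e-3
cryo_ps = derated_path_delay_ps(100.0, 200.0, 50.0)
print(f"room temperature: {room_temp_ps:.0f} ps, derated for 5 K: {cryo_ps:.0f} ps")
```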
The SoC is implemented as a digital-on-top system with four transmitters sharing one common I/O block. Timing is resolved at 2.5 GHz, with the SRAMs for the envelope memory operating at 1.25 GHz with 2× time interleaving. The SRAM supply voltage can be controlled independently to ensure correct operation in the presence of an increased threshold voltage at cryogenic temperatures. Standard digital-circuit design-optimization techniques, such as pipelining and time interleaving, along with the aforementioned derating, were used to resolve timing at 2.5 GHz.

IV. ANALOG AND RF CIRCUIT DESIGN
Current-mode design prevents the transconductance non-linearity and is thus adopted for the analog baseband, as the baseband circuitry requires a fairly high bandwidth and linearity (>44 dB) and the RF mixer (see Section IV-D) requires an input current. The baseband circuitry comprises a current-steering DAC, a current-mode gm-C reconstruction filter, and a current-mirror-based variable-gain amplifier (VGA) feeding the mixer. The filter is discussed first, as it sets the required baseband signal swing to achieve the desired dynamic range while obtaining the lowest power consumption.

A. Reconstruction Filter
A second-order Chebyshev-I filter with 1.8-GHz cutoff frequency is chosen as it meets the stopband requirement, while its peaking results in the required improved in-band flatness near the end of the passband by compensating for the DAC zero-order-hold filter response. A passive implementation of such a filter, as desirable for low noise, distortion, and power consumption, would require a prohibitively large inductor of a few nH, limiting future scaling of the controller. Instead, an active current-mode gm-C filter implementation [structure in Fig. 4(a)] is considered [33]. Due to the cross-coupled transistor pair, the impedance at the output is effectively negative and the equivalent single-sided circuit of Fig. 4(b) is obtained, from which the transfer function $H(s)$ and input impedance $Z_{in}(s)$ are derived, assuming the same transconductance $g_m$ for all transistors. The transfer function and input impedance of the designed filter are plotted in Fig. 4. The ratios $C_A/g_m$ and $C_B/g_m$ set the transfer function and are therefore fixed by the desired filter response.
The linearity of such a filter is limited by the third-order distortion in the transconductance of the transistor, which leads to a non-linear modulation of the capacitor voltage resulting in non-linear components in the capacitor current and, hence, in the output current. Therefore, the transistors are biased at an overdrive $V_{gt,opt}$ corresponding to the first peak in the IIP3 plot [see Fig. 1(b)] at both 3 and 300 K as guaranteed by the tunable filter bias current. A high intrinsic gain is also obtained, ensuring an accurate filter transfer function.
For a given linearity, and hence a fixed overdrive $V_{gt,opt}$, the maximum signal current swing scales proportionally to the bias current $I_{bias}$. This assumes that the filter components are scaled appropriately to maintain the filter transfer function, i.e., by scaling $C_{A,B} \propto I_{bias}$ and the transistor width $\propto I_{bias}$ so that $g_m \propto I_{bias}$. The current noise of the filter is dominated by the bias current sources and scales $\propto \sqrt{I_{bias}}$. Consequently, the dynamic range of the filter increases by 3 dB when doubling the bias current ($I_{bias}$), and the minimum bias current to achieve the required dynamic range can be found. Moreover, the required bias current is expected to be ∼10× lower at 3 K than at 300 K, as the transistor linearity is not expected to change significantly over temperature, whereas the thermal noise power is expected to be ∼10× lower at 3 K than at 300 K. As it is impractical to design the circuit to work over a decade in bias current change, the minimum bias current is chosen for achieving the desired dynamic range only at 3 K, and the resulting bias and signal current is used over the entire temperature range from 3 to 300 K, with a lower expected dynamic range at 300 K.$^{2}$ Due to the peaking of the filter transfer function, and the DAC sampling replica in the second Nyquist zone, the peak signal swing is about 1.67× higher than the amplitude of the fundamental near the end of the band, requiring a larger bias current for the same linearity. Moreover, as the structure of Fig. 4(a) requires the stacking of four transistors, and the threshold voltage is expected to increase at 3 K, the structure is folded, resulting in a 4× higher power consumption. The final circuit is shown in Fig. 4(c).
The capacitors are tunable ($C_A$ from 50% to 125% and $C_B$ from 75% to 200% of their nominal value, respectively) to account for the DAC output capacitance and changes in the transfer characteristic at cryogenic temperatures, as the transistor transconductance is expected to increase at 3 K [see Fig. 1(a)]. The optimal differential input current of the filter to achieve the required dynamic range at 3 K is 125 μA peak and is used at both 3 and 300 K as guaranteed by the on-chip bias current generator (see Section IV-F3). The single-ended input impedance [see (2)] peaks to a worst case 60 Ω around the corner frequency at 300 K (see Fig. 4).
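The bias-current scaling argument used for the filter can be checked numerically. The sketch below only encodes the proportionalities stated in the text (signal current proportional to $I_{bias}$, noise power proportional to $I_{bias}$ and to a temperature-dependent factor); the function and its parameter names are illustrative assumptions.

```python
import math


def dynamic_range_change_db(bias_ratio, noise_power_ratio=1.0):
    """Change in dynamic range when I_bias and the noise level are scaled.

    bias_ratio:        new I_bias / old I_bias (fixed overdrive, scaled devices)
    noise_power_ratio: new / old thermal-noise power at equal bias current
    """
    signal_power_db = 20 * math.log10(bias_ratio)  # signal current ∝ I_bias
    noise_power_db = 10 * math.log10(bias_ratio * noise_power_ratio)  # noise power ∝ I_bias
    return signal_power_db - noise_power_db


# Doubling the bias current at a fixed temperature.
print(f"2x bias: {dynamic_range_change_db(2.0):+.1f} dB")
# 10x lower noise power allows a 10x lower bias current for the same dynamic range.
print(f"0.1x bias, 0.1x noise: {dynamic_range_change_db(0.1, 0.1):+.1f} dB")
# Keeping the 3 K bias current at 300 K, where noise power is ~10x higher.
print(f"same bias, 10x noise: {dynamic_range_change_db(1.0, 10.0):+.1f} dB")
```

Under these assumptions, reusing the 3 K bias point at 300 K costs roughly 10 dB of dynamic range, in line with the lower dynamic range the text anticipates at room temperature.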

B. Digital-to-Analog Converter
From the system specifications and the filter design, it follows that a 10-bit current-steering DAC is required, with a unit current of 125 μA/$2^{10}$ = 122 nA from a PMOS current source. Due to the significant overdrive voltage to reduce the effect of threshold voltage mismatch and noise, a low $g_m/I_D$ ∼ 5 V$^{-1}$ is expected at 3 K. Moreover, assuming a typical device noise excess factor γ ∼ 2 for short-channel devices and a pessimistic junction temperature T = 30 K when operating at 3 K, the integrated noise in a 10-MHz bandwidth is evaluated for the total DAC with $N = 2^{10}$ current sources and a circuit noise excess factor NEF = 2 to account for the noise from the bias current sources at the DAC output (see Fig. 5, top left). This corresponds to an expected peak SNR of 77 dB for a single tone, making the noise contributed by the DAC negligible.
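The unit-current arithmetic and a rough thermal-noise estimate can be sketched as follows. The noise expression used here (white drain-current noise of $4kT\gamma g_m$ per source, summed over uncorrelated sources and scaled by NEF) is a common textbook model assumed for illustration; it is not necessarily the expression used by the authors, and the result should be read as an order-of-magnitude check only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

# Values quoted in the text.
i_full_scale = 125e-6  # A
n_sources = 2 ** 10
gm_over_id = 5.0       # 1/V
gamma = 2.0            # device noise excess factor
temp_k = 30.0          # pessimistic junction temperature
nef = 2.0              # circuit noise excess factor
bandwidth_hz = 10e6

i_unit = i_full_scale / n_sources
gm_unit = gm_over_id * i_unit

# ASSUMED model: 4kT*gamma*gm per source, uncorrelated sources, scaled by NEF.
noise_psd = 4 * K_B * temp_k * gamma * gm_unit * n_sources * nef  # A^2/Hz
i_noise_rms = math.sqrt(noise_psd * bandwidth_hz)

print(f"unit current: {i_unit * 1e9:.0f} nA")
print(f"integrated noise (assumed model): {i_noise_rms * 1e9:.1f} nA rms")
```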
The DAC is segmented in 5-bit unary and 5-bit binary sections as a tradeoff between differential non-linearity (DNL) and decoder complexity (see Fig. 5). A unit current source matching of 0.5% is targeted to achieve a 99.7% yield for 0.5-LSB integral non-linearity (INL) [34]. To account for the expected increase in mismatch at cryogenic temperatures [35], the area of the current sources is doubled. A Monte Carlo simulation shows about a 3-dB loss in SFDR due to current source mismatch at 3 K, achieving ∼56-dB SFDR for a single tone (half the DAC swing). Another important source of distortion is the code-dependent output impedance [36], whose impact is set by $Z_o$, the output impedance of the unit current source, and $Z_L$, the load impedance, i.e., the input impedance of the filter (see Fig. 5).$^{3}$ Due to the very small DAC unit current and constraints in transistor size, there are settling issues. This is resolved by switching the combined current of three, four, or five current sources using a single switching pair with current bleeding and cascoding. The currents for the two least significant bits are obtained by subtraction of larger currents (i.e., 4 LSB − 3 LSB = 1 LSB and 5 LSB − 3 LSB = 2 LSB) at the output. As the switches are still implemented using minimum size devices, the switch glitch energy is minimized, and a single switch driver can drive up to four switches. The switch driver consists of two latches, with the last stage supplied from the analog supply and with a back-to-back inverter at the output for improved symmetrical switching. For the thermometer decoder, the standard row-column decoder [36] has been extended to a 3-D row-column block decoder, as it only requires trivial 2-bit thermometer decoders and reduces the number of lines routed differentially (for minimum crosstalk) to the switch drivers (see Fig. 5). Although the 3-D decoder is slightly slower due to the increased number of stacked transistors, it is not a limiting factor for the required sample rate in the adopted technology.
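The 0.5% matching target can be related to the INL yield with a widely used statistical relation for current-steering DACs, $\sigma_I/I \le 1/(2C\sqrt{2^N})$, where $C$ is the inverse normal cumulative distribution evaluated at $0.5 + \mathrm{yield}/2$. Whether [34] uses exactly this form is an assumption; the sketch below merely shows that the relation gives a value close to the figure quoted in the text.

```python
from statistics import NormalDist


def required_unit_matching(n_bits, inl_yield):
    """Relative unit-current sigma for INL < 0.5 LSB at the given yield.

    Assumed relation: sigma_I / I <= 1 / (2 * C * sqrt(2^N)),
    with C = inverse normal CDF evaluated at 0.5 + yield / 2.
    """
    c = NormalDist().inv_cdf(0.5 + inl_yield / 2.0)
    return 1.0 / (2.0 * c * (2 ** n_bits) ** 0.5)


sigma = required_unit_matching(n_bits=10, inl_yield=0.997)
print(f"required unit current matching: {sigma * 100:.2f}%")
```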

C. Variable-Gain Amplifier
The VGA is implemented as a tunable current mirror. An additional output branch feeding a buffer is added to monitor the baseband output signal ($I_{out,test}$ in Fig. 6). The filter output current is ∼15× smaller than required by the mixer to generate the maximum required output voltage. Hence, the circuit in Fig. 6 is used to provide a 4-bit tunable gain up to 15×. As the filter bias current is much higher than the mixer bias current, part of it is sunk at the filter output while maintaining sufficient VGA linearity, and the residual excess bias current is removed at the VGA output (i.e., the mixer bleed current in Section IV-D). Both of these current-bleeding sources are tunable to ensure optimal performance at 3 K.
Due to the reduced bias current in the VGA, and the significantly large output transistor, achieving the required linearity over the full bandwidth is difficult, but it is ensured by adding a single-stage amplifier (PMOS differential pair with current-mirror load) that increases the loop gain and delivers the non-linear current required on the large mirror gate capacitance. Finally, to reduce the LO signal leaking back, a 500-Ω thin-film resistor is added in the current mirror, providing first-order filtering.

D. Mixer

1) Lower Frequency Band:
The required bandwidth (e.g., 15 GHz) and the parasitic capacitance of the output driver, mixer, and their interconnection set a maximum limit on the load resistance (e.g., ≤ 70 Ω). On the other hand, based on the linearity requirements of the output driver (as will be discussed in Section IV-E), the output swing ($V_{out,mixer}$) of the mixer has to be less than 35 mV. Hence, the required current swing ($I_{sw}$) to be fed from the VGA to the mixer can be estimated by the ratio of this output swing to the load resistance.
To tackle the voltage-headroom issue due to the stacking of four transistors (operating in saturation region) and a resistor, current bleeding is implemented to lower the current in the switching devices, cascodes, and resistor, without sacrificing the required linearity in the VGA [38]. In addition, the resulting smaller switching devices present a lower load capacitance to the LO driver, thus enabling lower power consumption for the LO driver.

2) Higher Frequency Band:
The output current of the mixer at $3 f_{LO} - f_{BB}$ versus the LO swing is shown in Fig. 7(b). The LO swing is chosen to be 300 mV since a further increase in swing does not significantly improve the conversion gain and would cost higher power consumption in the LO driver. Note that the third-harmonic output current is 15 dB lower than the fundamental at 300-mV LO swing, which is compensated by amplification in the following stages. A tuned inter-stage matching network is designed to amplify the third harmonic while attenuating the fundamental tone. To boost the mixer gain at the third harmonic, a relatively narrowband design is chosen, which is more susceptible to unwanted variation at cryogenic temperatures. Hence, switchable resistor and capacitor tuning networks are employed to compensate for this variation.

E. Output Driver
Fig. 8 shows the schematic of the output driver consisting of a class-A amplifier with an output matching network. This design is used for both lower and higher frequency band outputs with different device sizing and matching networks.
The specification of the output driver is to deliver −16-dBm output power ($P_{out}$) to a 50-Ω load, with 50-dB SFDR setting an OIP3 requirement of 9 dBm. Since $V_{OIP3} = V_{IIP3} \times (g_m/I_D) \times I_D \times R_L$, and both $V_{IIP3}$ and $g_m/I_D$ are determined by the intrinsic device characteristics, the maximum point of this product ($V_{IIP3} \times g_m/I_D$) at 3 K is chosen to obtain the required linearity at the lowest power consumption while considering voltage headroom and signal swing, as shown in Fig. 1(c). Thus, an overdrive voltage $V_{gt} = 0.25$ V has been chosen, leading to $V_{IIP3} = 0.63$ V and $g_m/I_D = 8$. Consequently, the maximum input swing ($V_{in,max}$) to obtain an IM3 of 50 dB can be calculated as $V_{in,max} = V_{IIP3}/10^{IM3/40} \approx 35$ mV. The output matching network can be analyzed as a trans-impedance ($Z_{21}$) network to convert the drain-current swing of the driver transistor to the required voltage swing at the output. The pole ($\omega_{1,2}$) and minimum ($\omega_3$) frequencies of a matching network can be derived from the maxima and minima of $Z_{21}$, respectively, in terms of the secondary inductance $L_s$, the secondary capacitance $C_s$, the primary inductance $L_p$, the primary capacitance $C_p$, and the coupling factor $k_m$, with $\xi = L_s C_s/(L_p C_p)$. In a wideband design, both poles should lie in the bandwidth of interest, and to obtain a flat transfer function, the transimpedance at these poles should be equalized, i.e., $|Z_{21}(\omega_2)/Z_{21}(\omega_1)| = 1$. For a lossless matching network, this can be obtained by setting ξ = 1 [39]. However, with practical quality factors for the inductors, this needs to be increased to, e.g., 1.5, as shown in Fig. 9(a). Note that the effect of a practical quality factor on the transimpedance ratio between the poles and the minimum [$|Z_{21}(\omega_3)/Z_{21}(\omega_1)|$] is negligible. For a lossless matching network, the bandwidth factor (BWF), defined as $\mathrm{BWF} = (\omega_1 - \omega_2)/\omega_3$, is minimum at ξ = 1, as shown in Fig. 9(b).
Hence, to obtain a flat transfer function (ξ ∼ 1) and high BWF, one has to maximize $k_m$, which is ultimately limited by the physical realization of the transformer. To further increase the BWF for the maximum attainable $k_m$, ξ should be increased at the cost of flatness in the transfer function, as shown in Fig. 9(a). However, the flatness can be restored by lowering the quality factor at the cost of passive efficiency. Fig. 9(c) shows the dependence of $|Z_{21}|$ on $N$ ($N = \sqrt{L_s/L_p}$). A higher $|Z_{21}|$ or lower $N$ increases the equivalent resistance seen by the driver transistors. Hence, a relatively lower current swing can produce the same output voltage swing. This in turn would demand lower dc bias current and improve the efficiency, as long as the transistor does not enter the triode region, affecting the linearity. This leads to smaller transistors and, consequently, higher bandwidth of the mixer due to lower input capacitance presented by the output driver. Since ξ is already fixed by the flatness and BWF, minimizing $N$ would require maximizing $C_s$ and minimizing $C_p$, as $N = \sqrt{\xi C_p/C_s}$. The minimum value of $C_p$ is determined by the parasitic capacitance of the output driver, while the optimum $C_s$ can be obtained from the value of loaded quality factor of the secondary side ($Q_L = R_L C_s \omega$) that maximizes the passive efficiency of the matching network at a given frequency [40]. Finally, $N = 0.8$ is obtained.
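The relation between $N$, ξ, and the capacitor ratio, together with the two resonances of a transformer-coupled network, can be explored numerically. The sketch below uses the standard result for two lossless, unloaded, magnetically coupled LC tanks, which may differ from the exact expressions used by the authors; all component values are hypothetical and chosen only so that ξ = 1.5 and $N$ = 0.8.

```python
import math


def turns_ratio(xi, c_p, c_s):
    # N = sqrt(L_s / L_p) = sqrt(xi * C_p / C_s), since xi = L_s*C_s / (L_p*C_p).
    return math.sqrt(xi * c_p / c_s)


def coupled_resonances(l_p, c_p, l_s, c_s, k_m):
    """Resonance frequencies (rad/s) of two lossless coupled LC tanks.

    Assumed textbook model: roots of
    (1 - k^2) w^4 - (w_p^2 + w_s^2) w^2 + w_p^2 w_s^2 = 0.
    """
    w_p2 = 1.0 / (l_p * c_p)
    w_s2 = 1.0 / (l_s * c_s)
    total = w_p2 + w_s2
    disc = math.sqrt(total ** 2 - 4.0 * (1.0 - k_m ** 2) * w_p2 * w_s2)
    denom = 2.0 * (1.0 - k_m ** 2)
    return math.sqrt((total - disc) / denom), math.sqrt((total + disc) / denom)


# Hypothetical component values (not from the paper).
L_P, C_P = 500e-12, 100e-15
L_S, C_S = 320e-12, 234.375e-15
xi = (L_S * C_S) / (L_P * C_P)

n = turns_ratio(xi, C_P, C_S)
w_low, w_high = coupled_resonances(L_P, C_P, L_S, C_S, k_m=0.7)
print(f"xi = {xi:.2f}, N = {n:.2f}")
print(f"resonances: {w_low / (2e9 * math.pi):.1f} GHz and {w_high / (2e9 * math.pi):.1f} GHz")
```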
An increase in the quality factor (Q) of a transformer by a factor of ∼2, expected at cryogenic temperatures due to lower substrate losses and a higher metal conductivity [27], can affect the flatness of the transfer function. In addition, the transfer function can shift toward higher frequencies due to a reduction in the effective inductance and capacitance of the transformer at cryogenic temperatures [27]. To compensate for these variations, which are difficult to predict, capacitor- and resistor-tuning networks were implemented at the windings of all matching networks.
To maintain a better efficiency at lower output voltage swing, a gain control of 24 dB is achieved by selectively switching 15 unit cells, each consisting of a class-A amplifier and cascode transistor. To further improve the power efficiency, the supply voltage of the driver is lowered without significant impact on linearity since the required output voltage swing is significantly lower than the supply voltage.
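The quoted 24-dB gain-control range is consistent with simple counting if one assumes that the output amplitude scales linearly with the number of enabled, identical unit cells. That linear-scaling assumption is made here for illustration only and is not stated explicitly in the text.

```python
import math

UNIT_CELLS = 15  # number of switchable unit cells, from the text

# Gain of k enabled cells relative to the full array, assuming the output
# amplitude is proportional to the number of enabled identical cells.
gain_steps_db = [20.0 * math.log10(k / UNIT_CELLS)
                 for k in range(1, UNIT_CELLS + 1)]

# Range between a single enabled cell and all cells enabled.
gain_range_db = gain_steps_db[-1] - gain_steps_db[0]
print(f"gain-control range is about {gain_range_db:.1f} dB")
```

With this assumption, the range between one and fifteen enabled cells is about 23.5 dB, which rounds to the 24 dB stated above.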

F. Auxiliary Circuits
An LO driver, a clock receiver, and a constant-$g_m$ bias circuit are also implemented in each transmitter (TX). Four transmitters are integrated into a single chip to increase the number of qubits that can be controlled and to allow for the simultaneous control of four qubits at the same frequency through individual transmitter outputs.

1) LO Driver:
An LO driver with 20-dB voltage gain and 15-GHz bandwidth is designed to deliver the required voltage swing to the mixer while incorporating single-ended-to-differential conversion. On-chip coplanar-waveguide transmission lines connect the input of the LO driver to the I/O bumps. This allows the LO driver output to be abutted to the mixer switches, which reduces phase and gain imbalance. Fig. 10(a) shows the schematic of the LO driver. The first stage serves as an active balun converting a single-ended signal into a differential signal while providing wideband input-impedance matching [41]. For proper operation, the input matching is achieved by adjusting the gate bias of $M_1$ such that $1/g_{m,M_1} = 50$ Ω, and the gain of the common-gate (CG) path, $g_{m,M_1} R_{CG}$, is set equal to the gain of the common-source (CS) path, $g_{m,M_2} R_{CS}$.
The required gain of 5× at 15 GHz sets the required gain-bandwidth (GBW) product to 75 GHz. For the active balun to directly drive the mixer switches, a load capacitance $C_L = 40$ fF (due to the parasitic capacitance of the mixer switches, the M1/M2 devices, and the routing traces) limits the maximum load resistance to 180 Ω and, consequently, the gain to 3.6. Hence, to achieve the required GBW, a high-speed differential CML amplifier stage is cascaded with the first stage.
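The numbers in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below assumes that the first-stage gain is simply $g_m R$ with $g_m$ fixed by the 50-Ω input match; this reading is an interpretation that happens to reproduce the quoted gain of 3.6, not a statement of the authors' exact derivation.

```python
R_MATCH = 50.0          # ohm, input-matching condition 1/g_m = 50 ohm (text)
R_LOAD_MAX = 180.0      # ohm, maximum load resistance quoted in the text
REQUIRED_GAIN = 5.0     # V/V at 15 GHz, from the text
REQUIRED_BW_HZ = 15e9   # Hz, from the text

g_m = 1.0 / R_MATCH                        # 20 mS, set by the input match
first_stage_gain = g_m * R_LOAD_MAX        # assumed gain model: g_m * R
required_gbw_hz = REQUIRED_GAIN * REQUIRED_BW_HZ
gain_shortfall = REQUIRED_GAIN / first_stage_gain

print(f"first-stage gain: {first_stage_gain:.1f} V/V")
print(f"required GBW: {required_gbw_hz / 1e9:.0f} GHz")
print(f"gain still needed from a following stage: {gain_shortfall:.2f}x")
```

Since the matched balun alone reaches only 3.6× against the required 5×, the remaining factor has to come from the cascaded CML stage, which is the design choice the paragraph describes.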
The required phase noise specification of −116 dBc/Hz at a 1-MHz offset from the carrier is achieved over the entire frequency range with a power consumption of 7 mW for both I&Q branches.

2) Clock Receiver:
A clock-receiver circuit, powered from the digital supply, provides the rail-to-rail clock signals for the DAC and the digital blocks. All the supplies are substantially decoupled on-chip to reduce the supply-noise feedthrough between different circuit blocks. To share a single external clock signal between all four transmitters, each transmitter is ac coupled with an input termination of 200 Ω ($R_T$), so that the four terminations in parallel present an equivalent input impedance of 50 Ω. A self-biased inverter with a power-down option and a transmission gate are employed to individually switch OFF the clock receiver in each transmitter while preventing feedthrough during the OFF state. A half-period time shift can be introduced between the clock fed to the digital circuits (DIGITAL) and the clock fed to the DAC (DAC_I, DAC_Q); it is enabled by a digitally controlled on-chip register PH via an XOR gate, as shown in Fig. 10(b). This can address any potential data-timing issue at the digital/DAC interface due to layout mismatch and changes in digital propagation delay at 3 K. A fan-out of 3 is maintained at each stage to obtain the required jitter.
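Two points of this paragraph lend themselves to a quick check: four 200-Ω terminations in parallel present 50 Ω, and XOR-ing a 50%-duty-cycle clock with a control bit inverts it, which is equivalent to a half-period shift. The sketch below models the clock as an ideal square wave sampled twice per period; the helper names are hypothetical, and which clock branch actually contains the XOR gate is not specified here.

```python
def parallel(*resistances: float) -> float:
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


R_T = 200.0  # ohm, per-transmitter termination (text)
N_TX = 4     # transmitters sharing one external clock (text)
z_in = parallel(*[R_T] * N_TX)
print(f"equivalent input impedance: {z_in:.0f} ohm")


def xor_clock(clk_bit: int, ph: int) -> int:
    """Clock after the XOR gate: unchanged for PH=0, inverted for PH=1."""
    return clk_bit ^ ph


# Ideal 50%-duty-cycle clock, two samples per period (one per half period).
clk = [0, 1] * 4
clk_ph0 = [xor_clock(b, 0) for b in clk]
clk_ph1 = [xor_clock(b, 1) for b in clk]

# Rotating the sequence by one sample is a half-period shift.
half_period_shift = clk[1:] + clk[:1]
print(clk_ph1 == half_period_shift)
```

For a clock with exactly 50% duty cycle, inversion and a half-period delay are indistinguishable, which is why a single XOR gate suffices as a coarse timing adjustment.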

3) Bias Circuit:
The bias currents are generated by a standard constant-$g_m$ circuit [see Fig. 10(c)]. The desired $g_m = 1/R$ is set by a tunable resistor, which allows the output bias current to be adjusted over a range of 50%-200% relative to the nominal value at 300 K, to ensure the same signal current at 3 and 300 K while accounting for changes in the device transconductance at 3 K. A stack of four diodes is used to start up the bias circuitry. An externally applied bias current can also be selected and used to start up the circuit if the stack of diodes is not sufficiently strong due to the increased threshold voltage at cryogenic temperatures.

[Figure caption fragment] The employed LDO is custom designed using discrete components (AD8086 opamp with TSM2314 MOSFET). The BPF used in the measurement presented in Fig. 22 is placed directly at the RF-low output, before the SP6T switch. A niobium-titanium (NbTi) coax cable, without attenuators, is used between the 3 K and 20 mK stages.

Fig. 11 shows the micrograph of the chip fabricated in Intel 22-nm FinFET (22FFL) technology [42]. The transmitter architecture shown in Fig. 2 is replicated four times (TX0…TX3) with each instance occupying an area of 4 m 2 m with a single shared SPI controller on the die.

A. Measurement Setup
The chip is placed on the 3 K plate of a dilution refrigerator. Double-pole double-throw (DPDT) microwave switches are used in the fridge to select the chip or the room-temperature signal generator on one side and the qubit device or the room-temperature spectrum analyzer on the other side (see Fig. 12). This enables proper characterization of the chip performance and the comparison of the qubit control by the room-temperature equipment and by the designed chip. A field-programmable gate array (FPGA) is used as the master to synchronize the chip with the other instruments used for qubit readout and initialization.
The die is flip-chip bonded to a BGA324 package with impedance-matched traces and on-package discrete capacitors for supply decoupling. A six-layer PCB is designed to route the RF signals on the top layer with RT/duroid 6002 microwave substrate and the dc signals on the bottom layers with FR4 dielectric. The solder-mask areas on the top and bottom layers are minimized to allow better heat transfer. To reduce the number of cables between the room-temperature LO generator and the chip inside the dilution refrigerator, each LO line is shared between two transmitters. A custom-designed Wilkinson power divider (WPD) on the PCB with discrete wire-bonded quadrature hybrids was used to generate the required LO signals for the transmitters, as shown in Fig. 13. All the above-mentioned components were individually tested at 3 K to verify their performance.
A gold-plated copper enclosure housing the PCB acts as a heat sink for proper thermalization of the chip to the 3 K plate in the fridge, as shown in Fig. 13. Indium foils were sandwiched between the die and the enclosure to maximize the contact surface area and minimize thermal resistance. Due to its high malleability compared with other metals, indium can compensate for the mismatch of the thermal expansion between the two mating surfaces (silicon and gold) at cryogenic temperatures.
To monitor the die temperature, on-chip diodes were placed across the chip, as shown in Fig. 14. These are calibrated using an external silicon diode temperature sensor (with an accuracy of 0.25 K) mounted close to the enclosure, with the chip powered down. Fig. 14 shows the junction and plate temperature as a function of the chip power consumption, which is varied by changing the clock frequency and the supply voltage of the digital circuitry. Although the die self-heating increases significantly with power consumption, the plate temperature is only slightly affected. As the dilution unit is connected to a separate plate with an independent pulse tube cooler, the qubit temperature is not affected.

B. Electrical Characterization
While the functionality of all four transmitters has been verified, the performance of one transmitter is reported in the following. Fig. 15 shows the power consumption of the various circuit blocks at the 1-GHz clock frequency, resulting in a total power consumption per qubit of (330 mW + 54 mW)/32 qubits = 12 mW/qubit. The digital back end dominates the power consumption due to the lack of clock gating in a substantial part of the memory, and its consumption would increase further with clock speed. Hence, to limit the temperature increase of the fridge plate, the chip is operated at a maximum clock frequency of 1 GHz, limiting the available data bandwidth to 1 GHz. The analog power consumption is dominated by the output drivers due to the high-linearity requirements and the need to drive a 50-Ω load. The total power consumption of 12 mW/qubit would allow the control of >320 qubits in a state-of-the-art dilution refrigerator, over only ten RF lines, with a single SPI interface wired to room temperature. This is well beyond the number of qubits available in the largest solid-state quantum processor today [3]. Moreover, this work presents a first implementation of the controller, and further power reduction is possible, as significant margins were taken during the design to ensure functionality and to provide the large output power and frequency ranges needed to support multiple qubit technologies; the currently dominating digital power consumption could be reduced by, e.g., clock gating. With such optimizations, scaling to thousands of qubits is expected to be possible in the near term, while a larger cooling power is expected to extend the scaling in the longer term [43]. Due to the integrated digital controller, an external data rate of only ∼1 kb/s over a single trigger line is required, allowing scaling to a large number of controllers sharing a single high-speed connection to room temperature.
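The per-qubit power figure and the scaling estimate quoted above can be reproduced with a short calculation. The sketch below uses only the numbers given in the text; the attribution of the 330-mW term to the digital back end and the 54-mW term to the analog circuits is an assumption based on the statement that the digital part dominates.

```python
P_DIGITAL_MW = 330.0   # mW, assumed to be the digital back end (dominant term)
P_ANALOG_MW = 54.0     # mW, assumed to be the analog circuits
QUBITS_PER_OUTPUT = 32  # frequency-multiplexed qubits per RF output (text)
RF_LINES = 10           # RF lines for the >320-qubit estimate (text)

power_per_qubit_mw = (P_DIGITAL_MW + P_ANALOG_MW) / QUBITS_PER_OUTPUT
qubits_supported = RF_LINES * QUBITS_PER_OUTPUT
total_power_w = qubits_supported * power_per_qubit_mw / 1000.0

print(f"power per qubit: {power_per_qubit_mw:.0f} mW")
print(f"qubits over {RF_LINES} RF lines: {qubits_supported}")
print(f"total dissipation at the 3 K stage: {total_power_w:.2f} W")
```

Ten transmitters at this operating point would dissipate about 3.84 W in total, which is the heat load that the cooling power of the 3 K stage has to absorb for the 320-qubit case.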
Moreover, due to the use of FDMA in this work, the number of connections to the quantum processor is reduced by 32×. Supporting millions of qubits in the future with the proposed approach would nevertheless still require a large number of connections to the quantum processor, although this could be eased by co-integrating the controller and the qubits on the same package or die at the same temperature.

Fig. 16(a) shows the measured output power versus frequency at 3 K for both output paths. The flatness of the transfer function is degraded by the additional ground inductance introduced in the layout between the output matching network and the on-chip solder bumps.
To quantify the attenuation of the sampling replicas and the flatness of the baseband transfer function, the measured output at the baseband monitoring node is shown in Fig. 16(b). An in-band flatness of 1.5 dB is obtained up to 500 MHz, as shown in the inset of Fig. 16(b). The SFDR obtained for single- and two-tone signals at various output frequencies is shown in Fig. 17. From the single-tone spectra shown in Fig. 17(a) and (b), it can be observed that the SFDR is limited by the image-rejection ratio (IRR) of 45 dB obtained after calibration. The SFDR measured for various NCO frequencies over the entire data bandwidth is better than 42 dB, as shown in Fig. 19(a). The achieved LO rejection does not affect the SFDR since the LO leakage can be kept away from the qubit frequencies by a proper choice of the LO frequency.
The SFDR of the two-tone spectrum with a tone spacing of 19 MHz shown in Fig. 17(c) is limited by the second-order intermodulation (IM2) component. Such IM2 can be attributed to the INL of the DAC, which shows a quadratic behavior, as shown in Fig. 18(b). This is due to a linear gradient, i.e., systematic mismatch, in the DAC layout, which, because of practical layout constraints, only approximates a common-centroid arrangement rather than implementing it fully. This systematic mismatch increases at 3 K. Moreover, random mismatch is degraded at 3 K, as can be seen in the DNL plot in Fig. 18(a) [35]. The large jumps in the DNL plot correspond to the unary element transitions in the segmented DAC. The measured IM3 component with a two-tone spacing of 10 MHz is better than 47 dBc at the highest output power over the entire RF-low bandwidth, as shown in Fig. 19(b).
The measured SNR at the maximum output power over a 25-MHz bandwidth is greater than 48 dB, as shown in Fig. 19(a), complying with the system requirements presented earlier.
Engineering the pulse shape is critical for addressing multiple qubits over a frequency-multiplexed line [19], [32], as the shape of the pulse sets a tradeoff between the speed of operation on the addressed qubit and the unwanted energy leaking into the unaddressed qubits. To demonstrate the pulse-shaping capabilities of the chip, various pulse envelopes were applied at different offset frequencies, as shown in Fig. 20, which shows the time-domain (at baseband frequency) and frequency-domain response of the chip output.
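The tradeoff between gate speed and spectral leakage can be made concrete by comparing the spectrum of a rectangular envelope with that of a smooth one. The sketch below evaluates the discrete-time Fourier transform of each envelope at a single offset frequency. The 100-ns pulse length, the 35-MHz offset of the neighboring qubit, and the raised-cosine (Hann) envelope are hypothetical choices made for illustration; they are not the envelopes or spacings used in the measurements of Fig. 20.

```python
import cmath
import math

FS_HZ = 1e9          # sample rate, matching the 1-GHz clock in the text
N_SAMPLES = 100      # hypothetical pulse length: 100 samples = 100 ns
F_NEIGHBOR_HZ = 35e6  # hypothetical offset of an unaddressed qubit


def leakage_db(envelope, f_offset_hz):
    """Spectral magnitude at f_offset_hz relative to the DC value, in dB."""
    dc = abs(sum(envelope))
    tone = abs(sum(a * cmath.exp(-2j * math.pi * f_offset_hz * n / FS_HZ)
                   for n, a in enumerate(envelope)))
    return 20.0 * math.log10(tone / dc)


rect = [1.0] * N_SAMPLES
hann = [0.5 * (1.0 - math.cos(2.0 * math.pi * (n + 0.5) / N_SAMPLES))
        for n in range(N_SAMPLES)]

rect_db = leakage_db(rect, F_NEIGHBOR_HZ)
hann_db = leakage_db(hann, F_NEIGHBOR_HZ)
area_ratio = sum(hann) / sum(rect)  # pulse area sets the rotation angle

print(f"rectangular envelope: {rect_db:.1f} dB at the neighbor")
print(f"raised-cosine envelope: {hann_db:.1f} dB at the neighbor")
print(f"area ratio (shaped / rectangular): {area_ratio:.2f}")
```

For the same duration and peak amplitude, the shaped pulse has only half the area of the rectangular one, so it rotates the addressed qubit more slowly, but it deposits considerably less energy at the neighboring frequency.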

VI. QUBIT EXPERIMENTS
The chip is used to control operations on a single spin qubit [44]. The information is encoded in the spin state of a single electron trapped in a Si/SiGe quantum dot in isotopically purified silicon [see Fig. 21(a)] and can be manipulated by applying a fast-oscillating electric field to the electrode above the quantum dot through electric dipole spin resonance [45], [46]. The qubit die is mounted on a PCB [see Fig. 21(b)] operated at the base temperature (20 mK) of the dilution refrigerator [6].

[Figure caption fragment] (c) Two-tone output at 6.25 and 6.26 GHz, generated using the two DDS banks shown in Fig. 2. For the measured two-tone spectrum around 18 GHz, refer to [18].

A. Rabi Oscillation Experiment
To demonstrate qubit control, the oscillatory behavior of a two-level quantum system can be produced in a Rabi experiment. The amplitude of the pulse applied to the qubit determines the speed of rotation, i.e., the Rabi frequency. By applying pulses with increasing duration, the qubit angle of rotation is increased, producing a typical oscillating pattern. In this experiment, the qubit is first initialized to state $|0\rangle$ and then excited by a rectangular microwave pulse with a given duration, and finally, the quantum state is read out. By varying the pulse duration and averaging the results over multiple runs, Rabi frequencies of 1 MHz and 400 kHz have been measured at 13.4 GHz (RF-low output) and 17.5 GHz (RF-high output), respectively (see Fig. 22). The similar performance obtained with the room-temperature control validates the effectiveness of the cryo-CMOS controller. The visibility of the adopted Elzerman readout [47], i.e., the difference between the highest and lowest probabilities obtained after readout, is affected by noise on the qubit device gates. To improve the readout visibility, a bandpass filter (BPF) with a 2-GHz passband has been added to the chip output to remove out-of-band spectral content. As shown in Fig. 22, this resulted in an improved readout visibility compared to [18], comparable to that obtained with the room-temperature control. The currently used discrete fixed-frequency BPF could be replaced by a surface acoustic wave (SAW) filter on the PCB, by an on-chip higher-order reconstruction filter, and/or by a passive filter at the mixer output when the frequency of the qubits is fixed to a certain range.
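The relation between the Rabi frequency and the pulse duration can be sketched with the ideal two-level model, in which a resonant rectangular pulse of duration $t$ yields an excited-state probability of $\sin^2(\pi f_R t)$. This idealization, which ignores detuning, decoherence, and the finite readout visibility discussed above, is an assumption made here for illustration; the function names are hypothetical.

```python
import math


def excited_probability(f_rabi_hz: float, t_s: float) -> float:
    """Ideal resonant Rabi oscillation: P(|1>) after a pulse of duration t_s."""
    return math.sin(math.pi * f_rabi_hz * t_s) ** 2


def pi_pulse_duration(f_rabi_hz: float) -> float:
    """Duration of a rectangular pulse that rotates the qubit by pi."""
    return 1.0 / (2.0 * f_rabi_hz)


# Rabi frequencies reported in the text for the two RF outputs.
for label, f_rabi in (("RF-low, 13.4 GHz", 1e6), ("RF-high, 17.5 GHz", 400e3)):
    t_pi = pi_pulse_duration(f_rabi)
    print(f"{label}: pi pulse of {t_pi * 1e9:.0f} ns, "
          f"P(|1>) = {excited_probability(f_rabi, t_pi):.3f}")
```

In this model, the measured Rabi frequencies of 1 MHz and 400 kHz correspond to π-pulse durations of 500 ns and 1250 ns, respectively.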

B. Ramsey-Style Experiment
To demonstrate coherent qubit control over two axes, a Ramsey-style experiment is carried out [6]. Here, the qubit is initialized to state $|0\rangle$, and two rotations around the X-axis, $R_X(\pi/2)$, are then applied, with a Z-gate $R_Z(\theta)$ of varying angle from 0° to 360° sandwiched between them. This resulted in a cosinusoidal variation in the measured $|1\rangle$ probability (see Fig. 23), as expected. The X-rotation is implemented by a rectangular microwave pulse with a duration directly proportional to the rotation angle. Since the electron rotates around the Z-axis under the influence of an external magnetic field, a Z-rotation can be achieved by waiting for a certain time proportional to the rotation angle, without generating any signal. However, in this experiment, the Z-rotation is implemented by updating the reference phase of the NCO (applying a digital phase offset), which continuously keeps track of this phase evolution. The experimental data closely track the theoretical expectation, demonstrating coherent qubit control and the capability of correctly executing any type of single-qubit gate.
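The expected cosinusoidal dependence can be verified with an ideal, noise-free simulation of the gate sequence $R_X(\pi/2)\,R_Z(\theta)\,R_X(\pi/2)$ acting on $|0\rangle$, which gives $P(|1\rangle) = (1 + \cos\theta)/2$. The sketch below uses the standard rotation-matrix convention $R_n(\phi) = \exp(-i\phi\,\sigma_n/2)$; it models ideal gates only and does not represent the measured data of Fig. 23.

```python
import cmath
import math


def rx(phi):
    """Rotation by phi around the X-axis as a 2x2 matrix."""
    c, s = math.cos(phi / 2.0), math.sin(phi / 2.0)
    return [[c, -1j * s], [-1j * s, c]]


def rz(theta):
    """Rotation by theta around the Z-axis as a 2x2 matrix."""
    return [[cmath.exp(-1j * theta / 2.0), 0.0],
            [0.0, cmath.exp(1j * theta / 2.0)]]


def apply(gate, state):
    """Multiply a 2x2 gate with a two-component state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]


def ramsey_p1(theta):
    """P(|1>) after R_X(pi/2), R_Z(theta), R_X(pi/2) applied to |0>."""
    state = [1.0 + 0.0j, 0.0j]
    for gate in (rx(math.pi / 2.0), rz(theta), rx(math.pi / 2.0)):
        state = apply(gate, state)
    return abs(state[1]) ** 2


for theta_deg in range(0, 361, 45):
    theta = math.radians(theta_deg)
    print(f"theta = {theta_deg:3d} deg  P(|1>) = {ramsey_p1(theta):.3f}")
```

In hardware, the $R_Z(\theta)$ step costs no pulse time when it is realized as a phase offset in the NCO, since only the reference phase for the subsequent X-rotation changes.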
Based on the measured electrical performance of the controller and co-simulations with the qubits [48], we expect to achieve the targeted fidelity of 99.99%. Ultimately, a randomized benchmarking experiment should be performed to measure the control fidelity. In this work, a Ramsey-style experiment has been employed to demonstrate the controller capabilities, specifically the ability to perform coherent operations and software Z-rotations.

[...] RF output to support multiple qubit technologies, frequency multiplexing for scalability with low power consumption, and a digitally intensive back end with an arbitrary-waveform generation memory of >40k points and the support of an instruction set for low-latency quantum-algorithm execution.

VII. CONCLUSION
By leveraging their very large scale of integration, cryogenic CMOS circuits can help solve the interconnect bottleneck between the quantum processor and its control electronics, thus enabling the number of qubits in quantum computers to be scaled up. The cryogenic microwave signal generator demonstrated in this work comprises an integrated digital controller that can translate qubit gate operations into the microwave signals necessary for the execution of quantum algorithms. Although the qubit fidelity limits the performance of experimentally driving a spin qubit, the chip is capable of controlling 128 qubits with a 99.99% theoretical fidelity due to the spectral purity of the generated signals. The achieved power efficiency (12 mW/qubit), enabled by a digitally intensive architecture and frequency multiplexing, allows the chip to operate at 3 K within the cooling capabilities of standard cryogenic refrigerators. This paves the way toward large-scale quantum computers exploiting control electronics and qubits operating in close proximity at a similar cryogenic temperature.