A 40-nm Cryo-CMOS Quantum Controller IC for Superconducting Qubit

This article presents a cryo-CMOS quantum controller IC for superconducting qubits. The proposed globally synchronized clock system internally generates different local oscillator (LO) frequencies using multiple phase-locked loops (PLLs) driven by a common reference clock. It provides flexibility in spectral management as well as scalability for expansion to a large-scale quantum controller. The test chip includes two PLLs, four pulse modulator channels, and two receiver channels. The chip, implemented in 40-nm CMOS, shows full functionality at 3.5 K. The designed pulse modulator circuits are verified against the specifications for an expected fidelity of 99.99%.


I. INTRODUCTION
WITH a remarkable achievement in the number of superconducting qubits, the next-step research toward the realization of a fault-tolerant scalable quantum computer is addressing challenges in scalable hardware system development [1], [2]. The superconducting qubit is an LC resonator with a nonlinear inductor formed by the Josephson junction whose inductance is quantized by energy states [3], [4], [5], [6]. The superconducting qubit can be designed to have a specific natural frequency that makes a transition between the states |0⟩ and |1⟩. The state of the qubit can be controlled through microwaves at the natural qubit frequency [7].
The state of a single qubit can be represented as a point on a Bloch sphere. Controlling the state corresponds to a rotation of the point on the Bloch sphere [8]. The phase of the driving microwave signal at the qubit frequency is the angle of the rotation projected on the XY-plane [see Fig. 1(a)]. The total amount of microwave energy determines the amount of the rotation [6], [7], [8], [9], [10], [11]. The natural frequency for the superconducting qubit is typically designed to be 4.5-6 GHz with a difference of tens of MHz between neighboring qubits. The state of a qubit is read through an additional resonator whose frequency is tuned at 6.5-8 GHz. Since the resonator is coupled to the qubit, the state of the qubit is reflected at the resonant frequency through dispersive interaction [6], [7], [8], [9], [10], [11], [12]. Therefore, the state of a superconducting qubit can be sensed by measuring either the magnitude or the phase of the probing signal coupled with the resonator [see Fig. 1(b)].
The requirement of multiple cables per qubit for control and readout becomes the major challenge in system development as the number of qubits increases. To reduce the number of cables for the readout, the states of multiple qubits with different resonant frequencies can be read simultaneously using a single microwave feedline [13], [14]. However, reducing the number of cables for the readout only is a temporary measure, and the total number of cables to room temperature eventually increases in proportion to the exponentially growing number of qubits. This scalability issue has led to extensive research to develop a microsystem sitting at the 4 K stage in the same dilution refrigerator to control thousands or millions of qubits at 10 mK [13], [15]. There have been cryo-CMOS IC implementations of microwave pulse modulators for the spin qubits [16], [17], [18], [19].
The pulse modulation with the direct frequency synthesis demonstrated the feasibility of driving multiple spin qubits using frequency-division multiplexing (FDM) and time-division multiplexing (TDM) [16], [17], [18], [19]. Reference [16] reported the first implementation of a cryo-CMOS system-on-chip (SoC) including both read and write chains for the spin qubits. However, the previous works used external local oscillator (LO) sources for frequency synthesis. External feeding of the LO requires precise matching; otherwise, a failure would cause a significant LO leakage to the qubit driving channel through PCB traces, resulting in an ac-Stark shift of the qubit state by unwanted tones at near-resonance.
For the superconducting qubits, the Josephson parametric amplifier (JPA), as the first-stage amplifier in the readout channel, shows a near-quantum-limited noise level in a bandwidth of ∼GHz [20]. The FDM-based readout has been demonstrated using a wideband JPA at 10 mK followed by an HEMT-based low-noise amplifier (LNA) at 4 K [21]. Since this low-noise legacy receiver chain greatly relieves the burden on noise performance, the receiver frontend integrated into the CMOS SoC at 4 K can support a stable wideband amplification for the subsequent processing by analog-to-digital conversion.
Unlike the readout, however, multiple access has hardly been considered for the driving of the superconducting qubits. Compared to the qubits based on the spin energy (e.g., semiconductor, trapped-ion, and diamond), the superconducting qubit has a much shorter energy relaxation time (T_1) of 10-100 μs. Thus, superconducting qubits should be driven with a short pulse of 15-30 ns for each gate operation so that many gate operations can be completed within the given coherence time. However, the shorter pulsewidth sacrifices spectral selectivity, resulting in a nonnegligible spectral leakage. Considering a finite anharmonicity separation of 150-350 MHz between the frequencies ω_01 (for |0⟩ → |1⟩) and ω_12 (for |1⟩ → |2⟩) as well as the spectral leakage, the use of FDM or TDM with a single RF cable would not be feasible. Therefore, the major challenge in the driving of superconducting qubits is the low-power implementation of individual pulse modulators per qubit [22], [23], [24]. In addition to the challenges in low-power implementation, the internal generation of thousands or millions of qubit frequencies presents yet another tough problem in spectrum management. The previous works assume the use of a common LO frequency for the single-sideband (SSB) mixing in direct frequency synthesis [23], [24]. The synchronized driving of thousands or millions of internal LO buffers brings about a huge simultaneous switching noise. The synchronized LO tone will dominate the spectrum of power lines over the whole chip, resulting in a significant LO leakage that could destroy the gate fidelity.
This article presents a cryo-CMOS controller IC with a globally synchronized clock system. It includes multiple phase-locked loops (PLLs) for the internal generation of LOs. A PLL for internal LO generation has previously been demonstrated for the readout of spin qubits with FDM [25]. This work proposes employing multiple PLLs, which enables managing different LO frequencies with each PLL responsible for one LO frequency for a group of qubits, where the group can be selected out of thousands or millions of qubits. As the number of PLLs for different LOs increases, the LO leakage is effectively suppressed by uniformly distributing the LO power over a band of frequencies. It provides an effect similar to spread-spectrum clock generation, which is a common way to overcome the issues of electromagnetic interference (EMI) in wireline communication. In addition, the internal generation of LOs by the PLLs provides an extra benefit of global synchronization. By applying a common reference clock (∼500 MHz) to all the PLLs, the whole system using different LOs can be synchronized to the given reference clock. Though this prototype chip demonstrates a small system with two PLLs for the pulse modulator and the receiver channels, respectively, the architecture is readily expandable to a larger system with a number of PLLs. The chip implemented in 40-nm CMOS shows full functionality at 3.5 K. This article is an extended version of one presented at the International Solid-State Circuits Conference (ISSCC) [26]. The original chip at ISSCC had a problem with configurations, so only a few configurations were allowed in the test.
The measurements reported in this article have been conducted with a revised chip in which the configuration block was modified to have larger setup/hold margins. It enables an optimal setting of PLLs and results in an improvement of the integrated jitter from 274 to 115 fs.

The test chip includes four pulse modulators, two sets of receiver frontend, and two digital PLLs (DPLLs). These blocks and a memory for storing instructions and data are managed by a global controller operating in a single clock system. The pulse modulator uses direct digital synthesis (DDS) and SSB mixing to generate the frequency and the phase needed for individual qubit control. As a result, an internally generated intermediate frequency (IF) is added to or subtracted from an LO frequency provided by one of the two PLLs. The two integer-N DPLLs receive a common reference clock. They have an identical architecture with programmable bandwidth and multiplication factors. One is a low-band (LB) PLL for the qubit driving, and the other is a high-band (HB) PLL for the qubit readout. Each has an LC-based digitally controlled oscillator (DCO) with a dedicated frequency tuning range. The LB- and HB-PLLs have lock ranges of -13 and 1-16 GHz, respectively. Their outputs are divided by 2 to provide quadrature phases. The pulse modulator selects one of the LB- and HB-PLLs as the LO source to synthesize pulses for control and readout, respectively. The PLL output is divided by 8 (LB-PLL) or 16 (HB-PLL) to provide the system clock.
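The clock relationships described above can be illustrated with a minimal sketch. The function names are hypothetical; the numbers are the LB-PLL operating point reported in Section V (10-GHz PLL output, IFs of +100 and −200 MHz), and ideal dividers and ideal SSB mixing are assumed.

```python
def clock_plan(f_pll_hz, sysclk_div):
    """Return (LO, system clock) derived from one PLL output frequency."""
    f_lo = f_pll_hz / 2             # divide-by-2 provides the quadrature LO
    f_sys = f_pll_hz / sysclk_div   # system clock divided from the PLL output
    return f_lo, f_sys


def ssb_output(f_lo_hz, f_if_hz):
    """Ideal SSB mixing: a signed IF is added to (or subtracted from) the LO."""
    return f_lo_hz + f_if_hz


# LB-PLL example: 10-GHz PLL output, divide-by-8 for the system clock.
f_lo, f_sys = clock_plan(10e9, 8)
outputs = [ssb_output(f_lo, f_if) for f_if in (100e6, -200e6)]
print(f_lo, f_sys, outputs)
```

Because every modulator clock is derived from a PLL locked to the common reference, any number of such plans stays tied to one time base.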
The role of the receiver chain is to bring the input signal to an appropriate voltage swing for the baseband processing. The receiver chain consists of a CMOS LNA, a passive quadrature generation network, and two I/Q down-conversion mixers. The receiver operates at 5-7.25 GHz. The superposition of different tones under the FDM raises the peak-to-average power ratio (PAPR) of the input waveform. Therefore, the LNA is required to provide a stable wideband amplification with enough dynamic range. There are two known ways for the down-conversion: 1) differential RF with quadrature LO mixing and 2) quadrature RF with differential LO mixing. In this work, the latter was adopted to reduce the number of LO buffers, hence reducing the power consumption. In addition, the quadrature-balanced layout matching was easier for the RF signal than for the LO signal, which includes longer routings.

III. SPECIFICATION
The gate fidelity of the currently available superconducting quantum processor is at the level of 99.9% [27]. Thus, the target specifications for the overall circuit operation of the proposed work were set for an error rate of less than 0.01%. The error is caused by imperfections of analog circuit operation, finite resolution of digital control, and phase noise at the pulse modulator output. To achieve the target error rate of 0.01%, the effects of individual error contributors need to be much lower than 0.01%.

A. Digital Control
The DDS-based microwave frequency generation is performed by SSB mixing of a digitally synthesized IF and the LO. The waveform of the generated pulse output, P_OUT(t), is defined by A(t), which represents the shape of the pulse amplitude of the microwave, and by φ_IF, which is the initial phase of the IF. The total amount of microwave energy in A(t) is determined by a combined control of the height and width of A(t). Controlling A(t) with more than 9b resolution is required for an error rate of 0.01% [8].
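As a sketch of the waveform described above, assuming ideal SSB mixing and writing the LO and IF angular frequencies as ω_LO and ω_IF (symbols introduced here for illustration only), the pulse output can be modeled as:

```latex
P_{\mathrm{OUT}}(t) = A(t)\,\cos\!\bigl[(\omega_{\mathrm{LO}} \pm \omega_{\mathrm{IF}})\,t + \varphi_{\mathrm{IF}}\bigr]
```

In this form, the sign selects the upper or lower sideband, A(t) sets the rotation amount through the pulse energy, and φ_IF sets the rotation axis.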
The phase error in φ_IF should be less than 0.7° for an error rate of less than 0.01% [8]. The phase of the IF, φ_IF, remains constant at P_OUT(t) by continuous accumulation of a frequency code using a 15b digital accumulator. The 15b representation is enough for an error rate of less than 0.01%. For example, when driven with a 1.5-GHz system clock, the 15b resolution leads to an error rate of 0.002% with a Rabi frequency of 5 MHz [28].
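To give a feel for what the 15b accumulator implies, the following sketch computes the phase and frequency quantization steps at the 1.5-GHz system clock mentioned above. It only evaluates the step sizes; it does not reproduce the 0.002% error-rate figure, which depends on the analysis in [28].

```python
ACC_BITS = 15       # accumulator width given in the text
F_CLK_HZ = 1.5e9    # system clock used in the example above

# Smallest phase increment the accumulator can represent, in degrees.
phase_step_deg = 360.0 / 2**ACC_BITS

# Smallest IF frequency increment (one LSB of the frequency code).
freq_step_hz = F_CLK_HZ / 2**ACC_BITS

print(f"phase step: {phase_step_deg:.4f} deg")
print(f"frequency step: {freq_step_hz / 1e3:.2f} kHz")
```

The phase step is far below the 0.7° bound, so the bound is set by analog and noise contributions rather than by the digital word length.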

B. Spectrum Management
The energy at unwanted tones needs to be suppressed by more than 40 dB for an error rate of 0.01%. The conservative target for the suppression of relative power at ω_12 needs to be set to 50 dB [8]. However, suppressing all the tones other than the target tone by 50 dB is very difficult because of the nonlinear operation of analog circuits, mismatches among the cells in the DAC, and various interferences from a number of other pulse modulators. It should be noted that the tone at ω_12 causing the |1⟩ → |2⟩ transition is the major factor that degrades the gate fidelity. The proposed architecture of multiple LOs gives great benefits in avoiding the spectral leakage at ω_12.
Since ω_12 is 150-350 MHz lower than ω_01, the LO can be set far from ω_12. In addition, by positioning the synthesized ω_01 on the lower (left-hand) side of the LO in SSB mixing, the image frequency of the SSB can also be kept from falling at ω_12.
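The placement argument can be checked with a small sketch. The function name and the example values (a 5.0-GHz qubit, 250-MHz anharmonicity, 200-MHz IF) are hypothetical and chosen only to fall within the ranges quoted in this article.

```python
def lower_sideband_plan(f01_hz, anharmonicity_hz, f_if_hz):
    """Place the LO above the qubit frequency so f01 is the lower sideband."""
    f_lo = f01_hz + f_if_hz           # wanted tone: f_lo - f_if = f01
    f_image = f_lo + f_if_hz          # residual SSB image on the upper side
    f12 = f01_hz - anharmonicity_hz   # |1> -> |2> transition, below f01
    return {
        "f_lo": f_lo,
        "f_image": f_image,
        "f12": f12,
        "lo_to_f12": f_lo - f12,
        "image_to_f12": f_image - f12,
    }


plan = lower_sideband_plan(5.0e9, 250e6, 200e6)
print(plan)
```

With this choice, both the LO leakage and the image sit above ω_01, while ω_12 sits below it, so neither unwanted tone approaches the leakage transition.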

C. Phase Noise
The individual effects of noise are reflected in the phase noise at the output of the pulse modulator. Since the phase noise indicates the spectral density of the synthesized output, the total noise power can be estimated by integrating the phase noise up to a band of interest (∼10 MHz offset). The phase error of less than 0.7° for an error rate of less than 0.01% corresponds to an integrated jitter of 350 fs rms at 5.5 GHz [8]. The jitter at the synthesized output is affected by both the LO and IF paths. Including jitter terms in the expression for P_OUT(t) introduces j_LO(t) and j_IF(t), which represent the jitters at the LO and IF frequencies, respectively. In the LO path, extra circuit-induced thermal noise is added to the initial phase noise of the LO at the PLL output. It is caused by the inevitable delay of routings, LO buffers, and SSB mixer circuits. The jitter at P_OUT(t) is also affected by the IF path. The sinusoidal waveform of the IF is built by the sampling of the DAC. Since the system clock for the sampling is provided by dividing the same LO, the low-frequency parts of j_LO(t) and j_IF(t) are strongly correlated in a frequency band of interest up to 10 MHz. Thus, the base for the best achievable integrated jitter at P_OUT(t) is the same as that of the PLL output. Note that if j_LO(t) and j_IF(t) are not correlated, that is, when two different sources are used for the IF and the LO, the integrated jitter no longer matches that of the PLL output and depends on the selection of the band side of SSB mixing. Taking individual noise contributions into account, the target integrated jitter at the PLL output must be much smaller than 350 fs rms for an error rate of 0.01%. The target jitter for the PLL can be reasonably set to about 100 fs rms [8].
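The numbers in this subsection can be illustrated with a simplified RMS model. This is a sketch under stated assumptions, not the article's own derivation: "correlated" is taken to mean that the LO and IF paths share the same time jitter, "uncorrelated" means their phase errors add in power, and the function names and example frequencies are hypothetical.

```python
import math


def phase_error_to_jitter(phase_deg, f_hz):
    """Time jitter equivalent to a phase error at carrier frequency f_hz."""
    return (phase_deg / 360.0) / f_hz


def output_jitter(f_lo, f_if, j_lo, j_if, correlated, sideband=+1):
    """RMS jitter at f_lo + sideband*f_if for RMS path jitters j_lo and j_if."""
    f_out = f_lo + sideband * f_if
    if correlated:
        # Phase errors (in cycles) add coherently with the sideband sign.
        phase_cycles = abs(f_lo * j_lo + sideband * f_if * j_if)
    else:
        # Independent sources: phase errors add in power.
        phase_cycles = math.hypot(f_lo * j_lo, f_if * j_if)
    return phase_cycles / f_out


spec = phase_error_to_jitter(0.7, 5.5e9)
same = output_jitter(6e9, 200e6, 100e-15, 100e-15, True, sideband=-1)
lower = output_jitter(6e9, 200e6, 100e-15, 100e-15, False, sideband=-1)
upper = output_jitter(6e9, 200e6, 100e-15, 100e-15, False, sideband=+1)
print(spec, same, lower, upper)
```

In this model, a shared clock source returns exactly the source jitter at the output regardless of the sideband, whereas independent sources give a result that differs between the lower and upper sidebands.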

D. Receiver Chain
For a wideband operation, a multistage topology of LNA with a transformer load was proposed to achieve a sub-1-dB NF at cryogenic temperature for a direct readout of spin qubits [29]. This work adopts a similar multistage topology to support the wideband readout of superconducting qubits for the FDM. As the inductive source degeneration provides a favorable structure for the frequency tuning as well as for the noise matching, the three stages were designed to be tuned at different frequency bands for a wideband operation. In addition, care needs to be taken for the amplification of −80 dBm (V_P-P = 63.2 μV at 50 Ω) to have an appropriate output voltage swing for the next baseband processing. In this work, the target gain of the LNA was set to 40 dB.
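The voltage swing quoted above follows from the standard power-to-voltage conversion for a sinusoid in a 50-Ω system, as in this short sketch (the function name is hypothetical):

```python
import math


def dbm_to_vpp(p_dbm, r_ohm=50.0):
    """Peak-to-peak voltage of a sinusoid carrying p_dbm into r_ohm."""
    p_w = 10 ** (p_dbm / 10) * 1e-3      # dBm -> watts
    v_rms = math.sqrt(p_w * r_ohm)       # P = Vrms^2 / R
    return 2 * math.sqrt(2) * v_rms      # sinusoid: Vpp = 2*sqrt(2)*Vrms


vpp_in = dbm_to_vpp(-80)
vpp_out = vpp_in * 10 ** (40 / 20)       # after the 40-dB target gain
print(f"input: {vpp_in * 1e6:.1f} uV, output: {vpp_out * 1e3:.2f} mV")
```

With the 40-dB target gain, the microvolt-level input becomes a swing of a few millivolts for the baseband stages.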
The receiver chain assumes that the −125 dBm output from a qubit readout resonator [30] is preprocessed by the traditional low-noise legacy amplification stages, for example, a 15-dB-gain JPA at 10 mK as the first stage and a 30-dB-gain HEMT LNA at 4 K as the second stage. Thus, the role of the receiver chain is to amplify the −80-dBm signal using a wideband CMOS LNA as the third stage so that the next stage can support the FDM-based readout. Considering the combined noise factor of three amplification stages, F = F_1 + (F_2 − 1)/G_1 + (F_3 − 1)/(G_1 G_2), where F_i and G_i are the noise factor and gain of the ith stage, the effect of the third stage on the overall NF of the whole readout system is less than 0.1 dB as long as the NF of the third stage does not exceed 30 dB. So, the noise performance is no longer a major factor in the design of the LNA. In the down-conversion mixer, gain and phase mismatches in the double-balanced circuit structure can also induce a finite image rejection ratio (IRR). Though it raises the noise floor by a factor of 1 + 10^(−IRR/10), its effect on the input-referred noise of the receiver chain is reduced to less than 0.1 dB by the LNA gain of 40 dB. The whole receiver chain in this prototype was designed to dissipate less than 20 mW. It could be further reduced by the optimization of circuits in continued research.
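The cascade argument can be explored with a short sketch of the standard Friis formula. The stage gains are those given above; the noise figures of the first two stages are not stated in this section, so the 2-dB defaults below are placeholders only, and the resulting penalty depends on them.

```python
import math


def db_to_lin(x_db):
    return 10 ** (x_db / 10)


def cascade_noise_factor(stages):
    """Friis formula; stages is a list of (gain_dB, NF_dB) in signal order."""
    f_total = 1.0
    gain_before = 1.0
    for gain_db, nf_db in stages:
        f_total += (db_to_lin(nf_db) - 1.0) / gain_before
        gain_before *= db_to_lin(gain_db)
    return f_total


def third_stage_penalty_db(nf3_db, nf1_db=2.0, nf2_db=2.0):
    """NF increase caused by adding the CMOS LNA after the JPA and HEMT."""
    front = [(15.0, nf1_db), (30.0, nf2_db)]   # placeholder NFs (assumption)
    f_two = cascade_noise_factor(front)
    f_three = cascade_noise_factor(front + [(40.0, nf3_db)])
    return 10 * math.log10(f_three / f_two)


level_at_cmos_lna_dbm = -125 + 15 + 30
print(level_at_cmos_lna_dbm, third_stage_penalty_db(3.0))
```

Because the third-stage term is divided by the 45 dB of preceding gain, even a modest CMOS LNA noise figure contributes very little to the overall NF.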
According to the analysis of the required measurement time for superconducting qubits, a high fidelity can be achieved for the readout of multiple qubits with 30-40 MHz separations between the resonator frequencies [21]. Considering the IF band of less than 500 MHz, the proposed receiver chain was designed to support an FDM-based readout of up to 8 qubits with 50-MHz separations. The superposition of different tones modulated by phase shift keying (PSK) raises the PAPR of the input signal. The PAPR with an 8-qubit FDM is about 10 dB [31]. The required input P1dB of the LNA can be derived as −80 dBm (each tone) + 10 dB (∼8 tones) + 10 dB (PAPR) = −60 dBm. Using the common rule that IIP3 is 9.6 dB higher than the input P1dB, the corresponding IIP3 should be −50 dBm. Note that IIP3 is a standard term for the indication of linearity with a two-tone test. According to the relationship in the signal-to-distortion ratio between two tones and multiple tones [32], an additional 3 dB margin is required on the IIP3 for the case of eight tones to secure the same level of signal-to-distortion ratio at the band edge. It leads to an IIP3 target of −47 dBm.
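The linearity budget above is plain decibel arithmetic and can be restated as a short sketch. The variable names are hypothetical; all values come from the paragraph above.

```python
import math

P_TONE_DBM = -80.0          # per-tone level at the LNA input
N_TONES = 8                 # qubits read out by FDM
TONE_ALLOWANCE_DB = 10.0    # allowance for ~8 tones, as budgeted in the text
PAPR_DB = 10.0              # PAPR of the 8-tone PSK signal [31]
P1DB_TO_IIP3_DB = 9.6       # common rule relating input P1dB and IIP3
MULTITONE_MARGIN_DB = 3.0   # extra margin for eight tones [32]

p1db_in_dbm = P_TONE_DBM + TONE_ALLOWANCE_DB + PAPR_DB
iip3_two_tone_dbm = p1db_in_dbm + P1DB_TO_IIP3_DB
iip3_target_dbm = iip3_two_tone_dbm + MULTITONE_MARGIN_DB

# Cross-checks: exact power sum of 8 equal tones, and occupied IF bandwidth.
exact_tone_sum_db = 10 * math.log10(N_TONES)   # about 9 dB
occupied_bw_hz = N_TONES * 50e6

print(p1db_in_dbm, iip3_two_tone_dbm, iip3_target_dbm)
```

The unrounded results land within half a decibel of the −50 and −47 dBm figures used in the text.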

IV. CIRCUIT DESCRIPTION
A. Pulse Modulator
1) Direct Digital Synthesizer: The frequency and the phase of the IF tones are processed in the digital domain. Fig. 4 shows the overall block diagram of the proposed pulse modulator, which consists of a digital part for IF synthesis and an analog part for DAC-based sinusoid generation. The phase and the frequency of the IF are generated using a 15b rotational phase accumulator (PACC) operating at the system clock, which is divided by N from the PLL output (N = 8 at LB-PLL, N = 16 at HB-PLL). Starting from an initially given 14b phase code, the PACC accumulates a 14b frequency-code word (FCW). Thus, the PACC output represents the phase at the frequency given by the FCW.
The digital processing gives flexibility in supporting various options for qubit control. The control information includes pulse duration, pulse shape, and the number of times the same pulse is consecutively repeated. A local register file is also employed to store 16 sets of FCW and the amount of phase shift so that the pulse modulator can support up to 16 consecutive switchings of the frequency and the phase of the IF in real time. An extra 3b are allocated to provide eight cases of the phase shift with a π/4 step. A 1b field, named sideband, indicates the selection of the high or the low side in the SSB mixing. This sideband selection is realized by switching the connections of the in-phase (I) and quadrature-phase (Q) components of the IF.
Note that the pulse modulator produces the microwave output only when the qubit is used in a quantum gate. It should be synthesized on demand through an SSB mixing using an LO. Since a qubit can be involved in a number of quantum gates, the phase evolution of the used qubit should be tracked in time so that the qubit can be used by other quantum gates later. Thus, the phase of the driving pulse for a qubit should be coherent throughout the quantum algorithm until the state is measured. In the DDS approach, this can be achieved by holding the phase of the IF signal instead of holding the phase of the microwave frequency. When a qubit goes on standby until the next use, the phase-save is turned on. In this mode, the PACC does nothing but keep accumulating the given FCW.
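A behavioral sketch of the phase accumulator and the phase-save idea is given below. It is not the article's RTL: the class and function names are hypothetical, the output gating is modeled by returning None during standby, and the example FCW and phase code are arbitrary.

```python
ACC_BITS = 15
MASK = (1 << ACC_BITS) - 1


class PhaseAccumulator:
    """15b wrapping accumulator that keeps running while on standby."""

    def __init__(self, phase_code=0):
        self.acc = phase_code & MASK

    def clock(self, fcw, output_enabled=True):
        """Advance one system-clock cycle; return the phase, or None on standby."""
        self.acc = (self.acc + fcw) & MASK
        return self.acc if output_enabled else None


def if_frequency_hz(fcw, f_clk_hz):
    """IF frequency produced by a given FCW at a given system clock."""
    return fcw * f_clk_hz / (1 << ACC_BITS)


def run(enables, fcw=4096, phase_code=100):
    pacc = PhaseAccumulator(phase_code)
    return [pacc.clock(fcw, en) for en in enables]


always_on = run([True] * 12)
with_standby = run([True] * 3 + [False] * 6 + [True] * 3)
print(always_on[-3:], with_standby[-3:])
```

The pulse generated after standby resumes with the phase it would have had if the output had never been turned off, which is the coherence property the text relies on.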
We propose a triangular transformation that converts the PACC output to a triangle-shaped transition. It is a digital preprocessing step that helps build the sinusoidal shaping in digital-to-analog conversion. For the implementation of the triangular transform, the PACC employs two accumulators (ACCs), UP_ACC and DN_ACC. The former increases the output by adding the FCW, while the latter decreases it by subtracting the FCW starting from the same initial phase. The triangular transform is realized by alternately taking the next 8b from the outputs of the two accumulators according to the MSB of the UP_ACC output. These 8b become the four MSBs and four LSBs for the following DAC (see Fig. 5).
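The folding behavior can be sketched as follows. This is a behavioral model under a stated simplification: instead of modeling DN_ACC bit-exactly, the falling half is obtained by complementing the lower 14 bits of the up-counting phase, which stands in for the down-counting accumulator. The function names and the example FCW of 128 are hypothetical.

```python
ACC_BITS = 15
MASK = (1 << ACC_BITS) - 1
RAMP_BITS = ACC_BITS - 1            # bits below the MSB
RAMP_MASK = (1 << RAMP_BITS) - 1


def triangular_code(up_acc):
    """Fold a 15b phase code into an 8b triangle code (0..255)."""
    msb = (up_acc >> RAMP_BITS) & 1
    ramp = up_acc & RAMP_MASK
    if msb:
        # Second half-period: count back down (stand-in for DN_ACC).
        ramp = RAMP_MASK - ramp
    return ramp >> (RAMP_BITS - 8)  # keep the next 8b below the MSB


def split_nibbles(code8):
    """Split the 8b code into the four MSBs and four LSBs sent to the DAC."""
    return code8 >> 4, code8 & 0xF


# One full period with a hypothetical FCW of 128.
codes = [triangular_code((n * 128) & MASK) for n in range(256)]
print(codes[:4], codes[126:130], codes[-2:])
```

The code rises through the first half of the phase range and falls through the second half, so the DAC only needs to map a monotonic ramp onto half a sinusoid.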
2) DAC: Fig. 6 shows a detailed circuit schematic of the DAC. The four MSBs can be interpreted as 16 quantized levels of the triangular waveform, while the four LSBs indicate 16 cases of residue. The MSBs are input to a nonuniform DAC. The LSBs are processed by a Line-to-Curve block for curvature compensation and then passed to an interpolating DAC. The nonuniform DAC has 16 current sources driven by a 16b thermometer code converted from the MSB code. The 16 current sources are responsible for 16 nonuniform steps (steps 1-16), which are precalculated to effectively approximate a sinusoid. Each step corresponds to the incremental amplitude change of the sinusoid for a phase change of π/16. As a result, the triangular transition in the digital domain can be translated into a sinusoidal transition in the analog domain. Though the number of steps is only 16, the optimization of custom-defined steps does not have a limit in resolution. Employing only 16 current sources with differential switching enables a high-speed DAC operation up to 1.5 GHz. The bias for the overall current level can be adjusted with an 8b code.
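The nonuniform step sizes can be illustrated with an idealized calculation. This sketch assumes a unit-amplitude sinusoid normalized to the range −1 to +1 and ideal current sources; the names are hypothetical and the values are mathematical, not the sized devices of the chip.

```python
import math

N_STEPS = 16

# Step k is the amplitude change of -cos(theta) as theta advances
# from (k-1)*pi/16 to k*pi/16.
steps = [
    math.cos((k - 1) * math.pi / N_STEPS) - math.cos(k * math.pi / N_STEPS)
    for k in range(1, N_STEPS + 1)
]


def dac_level(msb):
    """Normalized output after the first `msb` current sources are switched."""
    return -1.0 + sum(steps[:msb])


for k in (0, 4, 8, 12, 16):
    print(k, round(dac_level(k), 4))
```

The steps are smallest near the peaks of the sinusoid and largest near its zero crossing, which is why a linear ramp of thermometer codes traces out a sinusoidal transition.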
The quantized steps are further fractionalized by the interpolating DAC driven by the four LSBs. One of 16 current sources to be fractionalized is bypassed by the interpolating DAC. The selected current source sees the input resistance of the interpolating DAC. Since the differential switch turns on only one transistor, the current source sees a constant input resistance of 1/(16·g_m) regardless of the interpolating code (B0-B3), while the ratio between the two input resistances to OUT and its complement is given by the code. A linear interpolation divides the selected current into two branches of the differential output according to the ratio. Therefore, the interpolating DAC adds the residue to the output of the nonuniform DAC. The differential output of the whole DAC is then the sum of the first k nonuniform steps and the interpolated fraction of the next step, where k represents the value of the MSBs.
The linear interpolation eventually causes a systematic error because it approximates the curved shape with a straight line (see Fig. 7). Simulation reveals >0.5% error in the edge codes (steps 1-4 and 13-16), while the error stays <0.2% in the middle codes (steps 5-12). The large error in the edge codes is caused not only by the inherent difference between the line and the curve, but also by a failure of ideal interpolation. As the difference between OUT and its complement increases, the transistor whose drain node is connected to the lower voltage would fail to keep operating in the saturation region. This becomes more serious in the edge codes, where the voltage swing between OUT and its complement approaches the maximum. Those errors can be reduced by preprocessing the LSB values with a digital-domain Line-to-Curve mapping. A lookup table (LUT) is employed for this compensation. The LUT is used only for the eight edge cases of the MSBs (steps 1-4 and 13-16). The LUT receives the 4b LSBs as the address and outputs a 4b content.
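The geometric part of the interpolation error (the gap between a straight line and the sinusoid within one step) can be estimated with an idealized sketch. It assumes a unit-amplitude sinusoid and ideal current splitting, so it captures only the line-versus-curve difference and not the saturation-related error seen in simulation; the function names are hypothetical.

```python
import math


def ideal(theta):
    """Target half-sinusoid, rising from -1 to +1 over theta in [0, pi]."""
    return -math.cos(theta)


def interpolated(msb, lsb, n_steps=16, n_frac=16):
    """Nonuniform level `msb` plus a linear fraction lsb/16 of the next step."""
    t0 = msb * math.pi / n_steps
    t1 = (msb + 1) * math.pi / n_steps
    return ideal(t0) + (lsb / n_frac) * (ideal(t1) - ideal(t0))


def worst_error(msb):
    """Largest |line - curve| inside one step, relative to the peak amplitude."""
    return max(
        abs(interpolated(msb, lsb) - ideal((msb + lsb / 16) * math.pi / 16))
        for lsb in range(16)
    )


for msb in (0, 3, 7, 12, 15):
    print(msb, f"{100 * worst_error(msb):.3f} %")
```

Even in this ideal model the error is largest in the edge steps, where the curvature of the sinusoid is highest, which is consistent with applying the Line-to-Curve compensation only there.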
There are two LUTs for the concave and convex conversions, respectively. With the total cost of only 128b (=2 LUTs × 16 contents × 4b) for the two LUTs, the error can be reduced to less than 0.2%. The DAC output expression is then modified so that the LUT output replaces the raw LSB value.
3) Envelope Shaping: The DAC output is followed by an optional raised-cosine envelope shaper before being mixed with quadrature LO phases. For the raised-cosine filtering, a shunt-based envelope shaping is applied to the output of the DAC. Fig. 8 shows a circuit schematic for the envelope shaping. Since the DAC generates a sinusoidal current output in a differential form, the amplitude modulation is performed by controlling the conductance of a MOS-based shunt resistance. It can be simply achieved by monotonically decreasing or increasing the number of turned-on MOS switches implemented with 11 predefined-size transistors. The switching is driven by the same system clock, which is divided from the LO. The pulse modulator circuit supports two cases of pulse shaping, that is, the raised cosine by monotonic switching and the rectangular by setting all 11b to 0. The pulse shaping does not require additional power consumption in the analog part. Advanced pulse shaping for derivative removal by an adiabatic gate (DRAG) to reduce spectral leakage was not considered in this work.

4) Considerations for Cryogenic Circuit Design:
A number of studies have characterized CMOS devices at cryogenic temperature. Cryo-CMOS devices exhibit increases in threshold voltage, mobility, and subthreshold slope [15], [33]. With these characteristics, the functionality of CMOS circuits at 4 K has been verified in the prior works [15], [16], [17], [18], [19], [22], [23], [24], [27]. The increased mobility results in about a 30% increase in the on-current of switches, enabling a high-speed digital operation at a faster clock [15], [33]. On the other hand, the increased threshold voltage can increase the sensitivity of digital circuits to supply variations. So, the digital blocks in this work were designed to have more setup and hold margins. For analog circuits, the increase in the subthreshold slope can degrade the device matching characteristics. To mitigate the effect of mismatches among the current sources in the DAC, a larger voltage (0.62 V) was used as the gate-to-source bias with long-channel devices (L = 200 nm) for the current sources. The increased mismatch in the double-balanced mixer circuit causes a larger LO leakage. To compensate for the effect of mismatch, each DAC is accompanied by an extra branch for a variable current offset tuned by a 3b calibration code.

B. Phase-Locked Loop
The operation of an analog PLL strongly relies on the optimized design of the charge pump and the loop filter. However, CMOS at cryogenic temperature suffers from significant changes in threshold voltage, mobility, and subthreshold slope, as well as degraded matching characteristics [15], [33], [34]. Since the SPICE model for cryo-CMOS was not available in this work, we employed an integer-N DPLL architecture that can be characterized by programmable loop filter coefficients and a feedback factor. The two PLLs for HB and LB receive a common reference clock in a range of 300-500 MHz for global synchronization of the whole system to a single time reference. Each PLL includes an LC-DCO with band-tuning MIM capacitors driven by a 5b code (see Fig. 9). The frequency of the LC-DCO is controlled by a 127b thermometer code with each bit setting the bias of an individual MOS capacitor. This digital control has a frequency resolution of 0.5 MHz in the selected frequency band. The TDC was designed to have a time resolution of 0.5 ps in a conversion range of 10 ps. The loop dynamics are programmed by two coefficients of the loop filter, α and β, responsible for the proportional and integral paths, respectively. The former dominantly affects the stability, while the latter effectively changes the loop bandwidth.

C. Receiver
The receiver chain consists of an LNA, a passive quadrature generation network, and two I/Q down-conversion mixers (see Fig. 10). Since the commercial foundry does not provide the process design kit (PDK) for cryo-CMOS, sensitive RF circuits should be designed to secure stable operation using the available models at room temperature. To reduce the sensitivity to the threshold voltage change, the biases were set for the transistors to operate with a higher gate-to-source voltage. Careful considerations were also given to keep enough margin for stability in the design of LNA since it would experience an increased gain at cryogenic temperature. However, overall improvements in gain and noise performance are expected by an increased transconductance and a decreased passive loss at the cryogenic temperature [33], [35].

1) Low-Noise Amplifier:
The LNA is formed with three stages of a single-ended cascode amplifier with an inductive source degeneration. Since the cascode configuration inherently curtails the coupling between the input and the output, it improves the stability and gives more degrees of freedom for input and output matching. The inductive source degeneration has been widely adopted to achieve the input matching with good noise performance. The real part of the input impedance is given as Re{Z_in} ≈ ω_T·L_S, where ω_T represents the transit frequency and L_S is the source degeneration inductance [36]. The nonzero imaginary part of the input impedance can be compensated by adding a reactance part (e.g., an inductor in series). The same circuit topology is used for the three stages with interstage matchings. Placing similar inductors at the drain and the source in each stage gives the extra benefit of a similar layout between the upper and the lower parts of the LNA. It further improves stability by balancing the feedback paths of opposite polarity. The first stage of the LNA includes additional circuits for protection against electrostatic discharge (ESD). A shunt LC and a series inductor are employed to achieve the ESD protection as well as a wideband input matching. Simulation at room temperature shows a gain of over 30 dB in a wide band of 5.5-7 GHz with S_11 < −10 dB. The NF is <3 dB in the frequency band of interest [see Fig. 11(a)].
2) Quadrature Generation Network: The mixer projects the RF signal to the domain of the LO for the down-conversion. There are two ways to perform the down-conversion: 1) mapping a differential RF signal to a quadrature LO and 2) mapping a quadrature RF signal to a differential LO. In this work, the latter was adopted to save power consumption by using only two LO buffers in the mixer stage. In addition, the quadrature-balanced layout for the individual RF signal is easier than for the common LO, which would be used by multiple pulse modulators. The quadrature phases of the RF signal were obtained by a passive quadrature generation network with coupled inductors [37]. The single-ended LNA output is first converted to a differential signal by a balun. It is then processed with two transformers to produce quadrature RF signals. A shunt capacitor is added at each output for fine adjustment of the phase difference between I and Q. This quadrature RF generation scheme based on the passive transformer network causes inevitable mismatches in gain and phase as the frequency band is widened. Simulation shows that the designed quadrature network has a gain mismatch of <2 dB and a phase mismatch of <5° in a frequency range of 5.2-7.3 GHz [see Fig. 11(b)]. However, the effect of mismatches can be easily corrected by baseband digital processing after analog-to-digital conversion. The gain and phase mismatches can also induce a finite tone at the image frequency. Calculation from Section III-D shows that the mismatches result in an IRR of 18.2 dB. This corresponds to a degradation of NF by less than 0.1 dB. Two double-balanced mixers are used for the down-conversion. Fig. 11(c) shows the simulated voltage conversion gain using high-side LO frequencies of 5-7 GHz when the converted IF is in 100-500 MHz.
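The IRR figure quoted above can be reproduced with the widely used textbook expression for image rejection under gain and phase imbalance, as sketched below. The function names are hypothetical, and the worst-case mismatches (2 dB and 5°) are taken from the simulation result above.

```python
import math


def image_rejection_ratio_db(gain_mismatch_db, phase_mismatch_deg):
    """IRR of a quadrature path with the given gain and phase imbalance."""
    g = 10 ** (gain_mismatch_db / 20)   # amplitude ratio between I and Q
    c = math.cos(math.radians(phase_mismatch_deg))
    return 10 * math.log10((1 + 2 * g * c + g * g) / (1 - 2 * g * c + g * g))


def noise_floor_penalty_db(irr_db):
    """Noise-floor increase from image noise: 1 + 10^(-IRR/10), in dB."""
    return 10 * math.log10(1 + 10 ** (-irr_db / 10))


irr = image_rejection_ratio_db(2.0, 5.0)
penalty = noise_floor_penalty_db(irr)
print(f"IRR = {irr:.1f} dB, noise penalty = {penalty:.3f} dB")
```

The gain mismatch dominates this result; with 2 dB of imbalance, tightening the phase match alone would improve the IRR only marginally.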

V. MEASUREMENT
The chip was implemented using a standard 40-nm CMOS process. Fig. 12(a) shows a microphotograph. The chip and electrical components were mounted on a PCB. The PCB is fastened to a frame made of copper and brass, which is fixed on the 4 K flange in the dilution refrigerator for thermal anchoring. SMA connectors are used to feed power and microwave signals to the PCB [see Fig. 12(b)]. Fig. 12(c) shows the test setup. Measurements started when the refrigerator reached 3 K. The temperature rose to at most 3.5 K due to heat dissipation during the measurements. The measurements include the effect of cable connections from 300 to 4 K followed by connections back to 300 K. The loss of the cables and PCB traces is calibrated by comparing the cases with and without the chip on the PCB. The pulse modulator dissipates 8.3 mW in the analog part and 3.8 mW in the digital part while operating at 1.25 GHz. The power consumption of the LB-PLL and the HB-PLL is 12.5 and 14.6 mW, respectively. The receiver chain dissipates 20 mW per channel. Fig. 13 shows measured results while two adjacent pulse modulators were operating simultaneously. The pulse modulators independently control the frequency, phase, and pulsewidth. Fig. 13(a) shows the case in which the LB-PLL was used as the LO source. The LB-PLL was configured to generate 10 GHz, which is then divided by 2 for the quadrature LO at 5 GHz. The system clock for the digital circuits is 1.25 GHz, obtained by dividing the LO by 4. The two pulse modulators generate IF frequencies of +100 and −200 MHz, synthesizing 5.1 and 4.8 GHz, respectively, using the single LO at 5 GHz. The output shows an IRR of >42 dB and an LO rejection ratio (LORR) of >37 dB. Fig. 13(b) shows the case in which the HB-PLL was used as the LO source. The output of 15.16 GHz is divided by 2 for the quadrature LO at 7.58 GHz. A division by 8 provides the system clock of 0.945 GHz. The two pulse modulators synthesize 7.396 and 7.91 GHz with IF frequencies of −184 and +130 MHz, respectively. The output shows an IRR of >40 dB and an LORR of >37 dB. The interchannel interference (ICI) also needs to be considered because its tone might be close to the frequency of the qubit under control. The suppression of ICI was >33 dB in this work. Since the ICI is mainly caused by on-chip coupling from the other output through the common ground line, it could be further reduced by separating the ground of the last-stage transformer of the pulse modulator. The bandwidth of the pulse modulator is 4.6-6.3 and 6.7-8.1 GHz when used with the LB- and HB-PLL, respectively. The measured LORR and SFDR are summarized in Fig. 14(a).
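The frequency plan of the LB-PLL configuration in Fig. 13(a) follows from a fixed division chain and single-sideband mixing. The sketch below reproduces that arithmetic; the function name and argument layout are hypothetical, and frequencies are kept as integers in hertz to avoid rounding.

```python
def frequency_plan(pll_hz: int, clock_div: int, if_offsets_hz: list[int]):
    """Derive LO, digital clock, and synthesized RF tones from one PLL output.

    pll_hz        : PLL output frequency
    clock_div     : divider from the quadrature LO to the digital system clock
    if_offsets_hz : signed IF offsets, one per pulse modulator (SSB mixing)
    """
    lo_hz = pll_hz // 2                        # divide-by-2 yields the quadrature LO
    clk_hz = lo_hz // clock_div                # system clock for the digital circuits
    rf_hz = [lo_hz + f_if for f_if in if_offsets_hz]
    return lo_hz, clk_hz, rf_hz


# LB-PLL case from the text: 10-GHz PLL, LO/4 clock, IFs of +100 and -200 MHz.
lo_hz, clk_hz, rf_hz = frequency_plan(10_000_000_000, 4, [100_000_000, -200_000_000])
print(lo_hz / 1e9, clk_hz / 1e9, [f / 1e9 for f in rf_hz])
```

A positive IF places the output tone above the shared LO and a negative IF places it below, which is how two modulators address two different qubit frequencies from the single 5-GHz LO.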

A. Pulse Modulator
The maximum output power from the pulse modulator is over −17 dBm in the whole frequency range [see Fig. 14(b)]. The power dissipated by the proposed pulse modulator when generating an output power of −20 dBm is kept below 5 mW [see Fig. 14(c)]. Fig. 15 shows the spectrum of the driver output measured with a resolution bandwidth (RBW) of 30 kHz. The noise floor in this figure has little meaning since it was limited by the residual noise of the equipment at room temperature. Nevertheless, the measurement shows that the signal power is sufficient to allow thermalization with attenuators when driving the qubits at 10 mK. In this implementation, the pulse modulator was designed to deliver sufficient power so that the test board could be verified at both 300 and 4 K. Since the driving power for qubit control should be on the order of −70 dBm (at a Rabi frequency of 50 MHz) [38], the use of an additional attenuator was assumed when the chip is used at 4 K. The power consumption of the pulse modulator might be further reduced if the chip is dedicated for use at 4 K. However, extra effort would then be needed for more precise calibration to scale down the LO leakage as well. Fig. 16 shows measured output waveforms for two cases of pulse shaping: rectangular and raised cosine. Fig. 17 shows the phase noise of the synthesized output of the modulator. It is generated with a 6.08-GHz LO from the LB-PLL and an IF of −160 MHz. It includes all the noise from the LB-PLL, the LO buffer, and the pulse modulator. The total integrated jitter from 100 Hz to 20 MHz is 132 fs rms, which meets the specification for the 0.01% error rate.
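To give a feel for what the measured jitter means at the qubit drive frequency, the rms timing jitter can be converted to an rms phase error with the standard relation σ_φ = 2π f_c σ_t. The sketch below only performs this conversion; the function name is hypothetical, and the mapping from phase error to gate error rate is not reproduced here because it depends on the specification derived elsewhere in this article.

```python
import math


def rms_phase_error_rad(carrier_hz: float, jitter_s: float) -> float:
    """Convert rms timing jitter to rms phase error at a given carrier."""
    return 2.0 * math.pi * carrier_hz * jitter_s


# Synthesized tone in Fig. 17: 6.08-GHz LO with a -160-MHz IF.
carrier_hz = 6.08e9 - 160e6
phase_rad = rms_phase_error_rad(carrier_hz, 132e-15)
print(f"carrier = {carrier_hz / 1e9:.2f} GHz, "
      f"rms phase error = {phase_rad * 1e3:.2f} mrad "
      f"({math.degrees(phase_rad):.3f} deg)")
```

At the 5.92-GHz output, 132 fs rms corresponds to roughly 5 mrad of rms phase error.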

B. Phase-Locked Loop
The two PLLs show similar noise performance. Fig. 18 shows the phase noise of the LB-PLL output at 12.16 GHz. The output is externally monitored through a division by 16. The integrated jitter is 115 fs rms, achieving a figure-of-merit (FoM), defined as 10 log₁₀[(σ/1 s)² × (P/1 mW)], of −247.8 dB. Fig. 19 shows the phase noise of the HB-PLL output at 16.08 GHz. The integrated jitter is 119 fs rms with an FoM of −246.7 dB.
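The jitter-power FoM can be evaluated directly from the integrated jitter and the PLL power. The sketch below applies the definition to the LB-PLL, assuming the 12.5-mW LB-PLL power quoted earlier in this section applies at this operating point; the function name is hypothetical.

```python
import math


def pll_fom_db(jitter_s: float, power_mw: float) -> float:
    """Jitter-power figure of merit: 10*log10(sigma^2 [s^2] * P [mW])."""
    return 10.0 * math.log10(jitter_s ** 2 * power_mw)


# LB-PLL: 115 fs rms integrated jitter, 12.5 mW (assumed operating-point power).
lb_fom_db = pll_fom_db(115e-15, 12.5)
print(f"LB-PLL FoM = {lb_fom_db:.1f} dB")
```

A lower (more negative) value is better, since the FoM rewards low jitter achieved at low power.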

C. Receiver
The measurement includes SMA connectors and traces on the PCB. Simulation results at room temperature are also added for comparison. Fig. 20 shows the measured single-tone gain and NF of the whole receiver chain as the input signal frequency varies from 4.5 to 7.5 GHz. Since the equipment is typically terminated at 50 Ω, whereas the designed receiver chain has a 500-Ω load impedance at the output, an external matching network for a 500-Ω-to-50-Ω transformation was implemented on the PCB to measure the voltage gain of the receiver chain. The measured result was then converted to the 500-Ω domain to calculate the voltage conversion gain. Note that the output of the receiver chain is in the IF band (100-500 MHz). Measuring such a low-frequency output is not sensitive to the frequency dependence of the matching network. The gain and noise performances strongly rely on the intrinsic device characteristics at 3.5 K. The increased device gain and decreased loss of passive components at cryogenic temperature elevate the conversion gain by more than 5 dB in the band of interest [33], [35]. The Y-factor method was used for the measurement of NF. To remove the effect of fixtures (cables, connectors, etc.), identical routings to and from the chip at 3.5 K were also installed for post-measurement de-embedding. The measured minimum NF was 1.1 dB at 5.5 GHz. Fig. 21 shows linearity characteristics measured with two tones spaced by 50 MHz. The increased IMD3 compared to the simulated value at room temperature can also be explained by the increased conversion gain. For the verification of the receiver chain in the FDM environment, eight tones from test equipment were applied as the input. Due to the limit in the frequency range of the available equipment, the eight tones are separated by 10 MHz instead of 50 MHz. Because two-tone tests with 10- and 50-MHz spacing gave similar results, this result is expected to match the case with 50-MHz spacing well. Fig. 22 shows the spectrum at the receiver output when the average input power of the eight tones is −57.5 dBm. The input power was intentionally raised during this measurement so that the distortion tone, otherwise below the residual noise floor of the spectrum analyzer, could be elevated and clearly seen. In this case, the signal-to-distortion ratio at the band edge was 27 dB. This corresponds to a worst-case IIP3 of −44 dBm (−57.5 + 0.5 × 27) at the band edge, still meeting the target (>−47 dBm).
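The IIP3 estimate above uses the usual third-order extrapolation, in which the intercept lies half the signal-to-distortion ratio above the per-tone input power. The sketch below reproduces the arithmetic from the text; the function name is hypothetical.

```python
def iip3_dbm(input_power_dbm: float, signal_to_distortion_db: float) -> float:
    """Third-order input intercept: input power plus half the signal-to-IM3 ratio."""
    return input_power_dbm + 0.5 * signal_to_distortion_db


# Band-edge values from the eight-tone measurement.
worst_iip3 = iip3_dbm(-57.5, 27.0)
meets_target = worst_iip3 > -47.0     # target stated in the text
print(f"IIP3 = {worst_iip3:.1f} dBm, meets target: {meets_target}")
```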

D. TX-to-RX Leakage
Leakage from the qubit driving path to the qubit readout path can occur when their circuits operate simultaneously. Fig. 23 shows the measured output of a receiver chain when the input signal is at 6.390 GHz, while a pulse modulator generates 5.882 GHz. The applied power at the receiver input and the generated pulse modulator output were −80 and −39 dBm, respectively. The receiver LO for the down-conversion was 6.032 GHz. The two tones at 150 MHz (−64 dBm) and 358 MHz (−41 dBm) correspond to the unwanted TX-to-RX leakage and the desired input, respectively, showing a 25-dB suppression of the TX output alongside a 39-dB amplification of the RX input. Since the signal level (−41 dBm) is 23 dB higher than the leakage level (−64 dBm), the TX-to-RX leakage barely affects the signal readout. The modulator output of −39 dBm is still high enough to drive qubits with a subsequent attenuation. The small separation of 150 MHz between the TX and RX bands can be regarded as the worst case. When the same test was conducted with a 1500-MHz separation, the TX-to-RX leakage was further reduced from −64 to −86 dBm. Table I compares the performance with the previously reported works. This work proposes a quantum controller SoC architecture synchronized with a single time reference by generating multiple LOs using on-chip PLLs. The chip is verified at 3.5 K, while two adjacent driving pulse modulators were operating simultaneously.
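The TX-to-RX leakage figures reported in this subsection can be cross-checked with simple arithmetic on the quoted frequencies and power levels. The sketch below is only a bookkeeping aid; all variable names are hypothetical, and frequencies are in MHz and powers in dBm.

```python
# Frequencies (MHz) and power levels (dBm) quoted for Fig. 23.
rx_lo_mhz, rx_in_mhz, tx_out_mhz = 6032, 6390, 5882
rx_in_dbm, tx_out_dbm = -80.0, -39.0
if_signal_dbm, if_leak_dbm = -41.0, -64.0

# Both tones are down-converted by the same receiver LO.
if_signal_mhz = abs(rx_in_mhz - rx_lo_mhz)     # desired readout tone at the IF
if_leak_mhz = abs(tx_out_mhz - rx_lo_mhz)      # leaked drive tone at the IF

rx_gain_db = if_signal_dbm - rx_in_dbm         # amplification of the RX input
tx_suppression_db = tx_out_dbm - if_leak_dbm   # suppression of the TX output
margin_db = if_signal_dbm - if_leak_dbm        # desired tone over leaked tone

print(if_signal_mhz, if_leak_mhz, rx_gain_db, tx_suppression_db, margin_db)
```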

E. Comparison
Because experiments with real qubits were not conducted, the term "expected fidelity" is used in this work. In addition, its use is limited to the pulse modulator since there have been no clear specifications with supporting measurements for the fidelity of the readout under the FDM environment. The 99.99% fidelity is based on the spectral purity of the pulse modulator output for driving a single qubit. The high quality factor (>10⁷) of a superconducting qubit indicates that the band of interest is narrow. The tone at ω₁₂, if it exists, would be the major factor that degrades the gate fidelity. Therefore, the SFDR required of each pulse modulator for a single qubit is that the power at ω₁₂ should be at least 40 dB lower than the power at ω₀₁, while keeping a good SNR and a low phase error for the 99.99% fidelity. Although the ICI and LO leakage in this work exceed the allowable levels, their impact can be avoided through the flexibility in the selection of LOs offered by the proposed architecture using multiple PLLs.
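One way to see why a 40-dB requirement pairs naturally with 99.99% is that −40 dBc is a power ratio of 10⁻⁴, that is, 0.01%. The sketch below only performs this unit conversion. Treating the relative spur power as a rough estimate of the gate error is an assumption made here for illustration, not a formula given in this work, and the names are hypothetical.

```python
def relative_power(level_dbc: float) -> float:
    """Convert a level in dBc to a linear power ratio."""
    return 10.0 ** (level_dbc / 10.0)


# Spur at the leakage transition frequency held 40 dB below the drive tone.
spur_ratio = relative_power(-40.0)

# Assumption (illustrative only): error contribution ~ relative spur power.
illustrative_fidelity = 1.0 - spur_ratio
print(f"spur ratio = {spur_ratio:.1e}, illustrative fidelity = {illustrative_fidelity:.4%}")
```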

VI. CONCLUSION
The two major challenges in the design of a cryo-CMOS quantum controller SoC for superconducting qubits are 1) low-power implementation of the pulse modulator per qubit and 2) suppression of unwanted tones at ω₀₁ and ω₁₂. There have been two approaches to addressing those challenges. One is to assign a dedicated LO per qubit for driving pulse modulation. Since the LO frequency itself is the qubit frequency (ω₀₁), this approach provides benefits in spectrum management by minimizing the chances of frequency mixing with different tones. However, the LO cannot be turned off because the phase of the driving pulse for a qubit must be kept coherent throughout the quantum algorithm. Therefore, the power consumption of the LOs for individual qubits would eventually exceed the cooling power budget as the number of qubits grows exponentially. In addition, since the LO frequency is tuned to ω₀₁, even a small leak of the LO power can result in an unwanted state transition. Thus, the LO must be stringently isolated from the qubit when it is not driven. The other approach is the generation of frequencies with DDS through SSB mixing using a common LO. Since the DDS supports frequency generation on demand, the pulse modulator can be turned off when the qubit is not used in a quantum gate. Though this approach is effective for power management, the use of a common LO can produce large synchronized noise at the LO frequency.
This work proposes an approach with PLLs for the internal generation of multiple LOs. Managing different LOs with multiple PLLs provides an effective way of suppressing the LO leakage by uniformly distributing the LO power over a wide frequency range. It not only gives flexibility in optimally profiling the spectra for high gate fidelity, but also enables scalability in building hardware for a large-scale quantum controller. The proposed architecture, including two PLLs that generate two different LOs for 4.5-6 and 6.5-8 GHz, was implemented in 40-nm CMOS. The chip is verified with full functionalities at 3.5 K, while two adjacent driving pulse modulators were operating simultaneously.