Broadband, High-Linearity Switches for Millimeter-Wave Mixers Using Scaled SOI CMOS

This work demonstrates new circuit techniques in distributed-stacked-complementary (DiSCo) switches that enable picosecond switching speed in RF CMOS SOI switches. By using series-stacked devices with optimized gate impedance and voltage swing, both high linearity and fast switching are possible. A theoretical analysis and design framework has been developed and verified through simulation and measurement of two broadband, high-linearity passive mixer designs, one optimized for linearity and the other for bandwidth, using a 45-nm SOI CMOS process. The mixers achieve $P_{1dB}$ values of 16-22 dBm with $IIP3$ values of 25-34 dBm across a bandwidth from 1 GHz up to 30 GHz. This performance exceeds prior SOI RF and microwave mixer performance by more than an order of magnitude and is comparable to III-V device technologies. The mixers include integrated local oscillator (LO) driving amplifiers for high-efficiency operation and low total power consumption. DC power consumption ranges from 250 mW to 1 W for the LO driver. The integrated LO drivers demonstrate a pathway to on-chip LO generation with simplified matching to maximize LO power delivered to the input of the switch.


I. INTRODUCTION
RF SWITCHES are critical to transceiver architectures and are typically used in the front end for switching matrices, mixers, and samplers. Moreover, reconfigurable, multi-band radios are placing more demands on switching as a feature to tune individual components. Depending on the requirements, different device technologies are preferred; for example, SOI CMOS offers the capability of RF and digital integration, while GaN and GaAs devices excel at high power handling and linearity. By stacking SOI CMOS devices in series and adding a large gate resistance on each stacked device, SOI CMOS switches with $P_{1dB}$ up to 40 dBm are possible [1], [2]. However, the gate resistance limits switching time and creates a trade-off between linearity and switching speed. A large commercial market exists for SOI-based RF switches that are used in front-end filter banks and transmit/receive switches, where they operate in quasi-static modes and fast switching is not required.
High linearity mixers are increasingly in demand with the growing use of arrays in commercial communication systems. In the presence of one or more strong signals at the antennas shown in Fig. 1, the combination of array and RF gain in the front end leads to as much as 15 dBm input power to the single mixer after combining depending on the size of the array [33]. In the case of digital arrays, no spatial filtering in the RF front-end demands additional linearity to prevent 3rd-order intermodulation distortion (IM3) products from appearing in-band and in-beam [34]. In addition, re-configurable, wide-band, and multi-standard arrays will be exposed to many interferers further driving the linearity requirements of array front ends [35].
This work proposes a gate capacitive element, $C_{gex}$, as shown in Fig. 1 (c), and device stacking similar to the approaches in power amplifiers [36], [37], [38], [39] to build high-linearity SOI CMOS switches that are capable of the fast switching necessary for passive mixers. A capacitive gate termination creates a frequency-independent voltage division, which allows for high-speed switching while maintaining linearity and achieving a fractional bandwidth (FBW) > 100% that is limited only by the intrinsic (gate) resistances. The drawback of this approach is that the gate-drive voltage must be proportionally increased to achieve the correct gate voltage swing, $V_{gs}$, leading to greater power consumption. An analysis of power consumption and linearity in Section II suggests an optimization of the design. The linearity of the switch is proportional to local oscillator (LO) power, as with other semiconductors, e.g., GaAs or GaN, with higher drive/breakdown voltages.
With recent advances in SOI CMOS amplifiers, high efficiency can be leveraged from integrated driver amplifiers, leading to overall improvements in system efficiency and integration. The high-efficiency broadband LO driver design is outlined in Section III. The application of the RF switch design to broadband, distributed-stacked-complementary (DiSCo), passive mixers is presented in Section IV. Similar to other passive and active distributed mixers [7], [40], [41], the proposed mixers integrate the transistors into a distributed artificial transmission line to improve the operational bandwidth. However, in this case the DiSCo devices are combined in series to increase the mixer linearity far beyond that of a single CMOS device, compared to previous work, which leverages multiple devices to increase only isolation or conversion gain.
A prototype set of microwave and millimeter-wave (mmW) mixers was designed in a GlobalFoundries 45-nm SOI CMOS process with emphasis on comparing the linearity and bandwidth. A high-linearity version was presented recently, and this work expands on that earlier work to demonstrate a generalized design methodology and a high-bandwidth variant that has not been previously reported [42]. Section V discusses the measured mixer performance and compares it with previous CMOS and III-V mixers, where it is clear that the presented mixer outperforms other silicon approaches by more than an order of magnitude and competes with, or exceeds, the performance of costly III-V materials.

II. BROADBAND SWITCH THEORY
Previous work has demonstrated the linearity of an RF switch can be considered independently for both the ON and OFF states [2]. In the conducting (ON) state, compression is avoided with a high impedance on the gate to induce voltage division allowing the gate voltage to swing in unison with the drain and source voltages. In the OFF state, the series, stacked devices distribute the drain-source voltage swing to prevent compression or breakdown. While device stacking is an established technique that has been previously handled comprehensively [43], [44], [45], it conventionally relies on resistive gate terminations which limit switching speed. The approach presented here investigates the choice of gate impedance that allows for maximum linearity and switching speed.

A. GATE IMPEDANCE AND FREQUENCY RESPONSE
The voltage response from the drain to the gate, $V_{RF}$ to $V_g$, and the LO voltage response to the gate, $V_{LO}$ to $V_g$, are shown in Fig. 2 (d) for an ON-state switch model. The total $V_g$ is characterized by (1), where $C_{g,in}$ is the intrinsic gate capacitance and $Z_g$ is the external impedance placed at the gate. This expression indicates that using a resistive or inductive external gate impedance produces an undesirable tradeoff, as plotted in Fig. 2.
[Fig. 2 caption, partially recovered: ... that ranges from 0.1 to 10 times the intrinsic gate capacitance. The driving frequency, $f_{LO}$, is plotted for different cases where $f_{LO}$ ...]
To maintain high linearity, both the transfer functions from $V_{RF}$ to $V_{g,RF}$ and from $V_{LO}$ to $V_{g,LO}$, plotted in Fig. 2 (a), (b), and (c), must remain close to 1 at the given modulation frequency. For the resistive gate termination in Fig. 2 (a), $V_{g,LO}$ declines rapidly as the modulation frequency increases [2]. In Fig. 2 (b), using a low-Q inductive termination allows both gate voltages to remain high at a much higher LO frequency, and peaking is seen in the $f_{LO} = f_{RF}$ contour. However, as $f_{LO}$ exceeds $f_{RF}$, the modulated gate voltage drops rapidly, limiting inductive designs to applications where the LO frequency must be less than the RF frequency [46]. In these two cases, the response at the gate is frequency-dependent, limiting their use in broadband applications.
Alternatively, an extrinsic gate capacitor produces a frequency-independent response, as shown in Fig. 2 (c). There is no extrinsic gate capacitance for which both $V_{g,RF}$ and $V_{g,LO}$ remain high. However, frequency independence allows the drop in modulating gate voltage caused by a high $Z_{C_{gex}}$ to be compensated by increasing the modulating drive voltage uniformly across frequency, thereby maintaining switch linearity.
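The contrast between resistive and capacitive gate terminations can be illustrated with a minimal numerical sketch. It assumes a simple two-element divider in which the RF signal couples to the gate through $C_{g,in}$ and the LO couples through $Z_g$; this divider form, the function names, and all component values (100 fF, 1 kΩ) are illustrative assumptions and are not taken from the design.

```python
import math

def gate_transfer(z_g: complex, c_gin: float, freq_hz: float):
    """Return (|Vg/V_RF|, |Vg/V_LO|) for an assumed divider model:
    RF reaches the gate through c_gin, LO reaches the gate through z_g."""
    z_cgin = 1 / (1j * 2 * math.pi * freq_hz * c_gin)
    rf_to_gate = z_g / (z_g + z_cgin)
    lo_to_gate = z_cgin / (z_g + z_cgin)
    return abs(rf_to_gate), abs(lo_to_gate)

# Hypothetical values, chosen only for illustration.
C_GIN = 100e-15   # intrinsic gate capacitance (F)
R_G = 1e3         # resistive gate termination (ohm)
C_GEX = 100e-15   # extrinsic gate capacitor (F)

def resistive(freq_hz: float):
    """Divider ratios with a resistor at the gate."""
    return gate_transfer(R_G, C_GIN, freq_hz)

def capacitive(freq_hz: float):
    """Divider ratios with a capacitor at the gate."""
    z_cgex = 1 / (1j * 2 * math.pi * freq_hz * C_GEX)
    return gate_transfer(z_cgex, C_GIN, freq_hz)

if __name__ == "__main__":
    for f in (1e9, 10e9, 30e9):
        r_rf, r_lo = resistive(f)
        c_rf, c_lo = capacitive(f)
        print(f"{f / 1e9:5.1f} GHz  R: RF={r_rf:.3f} LO={r_lo:.3f}  "
              f"C: RF={c_rf:.3f} LO={c_lo:.3f}")
```

In this model the capacitive ratios are constant with frequency and sum to one, which is why no single $C_{gex}$ keeps both responses high, while the resistive LO response falls as frequency rises.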

B. DISTRIBUTED, STACKED SWITCHES
The capacitive gate approach is inherently wideband and will both linearize the device and reduce the shunt capacitance of the switch. However, stacking devices causes the total shunt capacitance to increase and introduces a lumped bandwidth constraint. For wideband designs, distributing the switch capacitance across a lumped-element transmission line, as in Fig. 3, compensates for the bandwidth restriction.
Here, two design approaches for drive distribution are considered. First, a single amplifier drives the gate voltage along a distributed transmission line terminated in a resistive load, as shown in Fig. 3 (a). Second, several smaller amplifiers drive each capacitive gate, as shown in Fig. 3 (b). In both cases in Fig. 3, complementary NMOS/PMOS devices are connected in parallel to cancel clock feedthrough.
The maximum tolerable input voltage before compression, $V_{max}$, is proportional to the number of stacked devices, $N$, and is typically given by (2), where $V_{DD}$ is the nominal process voltage, i.e., 1 V for a 45-nm process. If, for a single device, the desired peak-to-peak voltage swing on the gate is $V_{g,pp} = 2V_{DD}$ (−1 V to +1 V) for optimum linearity, and $V_{LO,pp}$ is the peak-to-peak LO square-wave voltage, then (3) applies, where $V_{LO,pp} = (N + 1)V_{DD}$ such that the LO drive voltage is proportional to $N$. The total gate capacitance, $C_{gt}$, equal to $C_{g,in}$ and $C_{g,ex}$ in series, is given by (4), with $C_{g,in}$ scaled proportionally with $N$ because the devices need to be wider with more stacked devices to accommodate increased series resistance. The power consumption of the LO for the first case in Fig. 3 (a) can be determined directly from the characteristic impedance of the artificial line, as in (5). Using well-known expressions for the characteristic impedance and cutoff frequency, $f_c$, of an artificial transmission line yields (6). Assuming that the amplifiers have sufficient gain such that the power of the input signal is negligible, similar calculations can be done for the switch described by Fig. 3 (b), giving (7) for capacitive power dissipation.
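As a consistency check on the scaling described above, the relationships can be sketched under one assumption: that the LO drive reaches the gate through a lossless capacitive divider formed by $C_{g,ex}$ and $C_{g,in}$. The divider form and the unit-device capacitance $C_0$ are assumptions introduced here for illustration and may differ from the exact expressions used in the design.

```latex
% Assumed: lossless capacitive divider from the LO drive to the gate,
% with C_0 a hypothetical intrinsic gate capacitance for N = 1.
\begin{align}
\frac{V_{g,pp}}{V_{LO,pp}} &= \frac{C_{g,ex}}{C_{g,ex} + C_{g,in}}
  = \frac{2V_{DD}}{(N+1)V_{DD}} = \frac{2}{N+1}
  \;\Rightarrow\; C_{g,ex} = \frac{2\,C_{g,in}}{N-1}, \\
C_{gt} &= \frac{C_{g,in}\,C_{g,ex}}{C_{g,in} + C_{g,ex}}
  = \frac{2\,C_{g,in}}{N+1}, \\
C_{g,in} &= N\,C_{0} \;\Rightarrow\; C_{gt} = \frac{2N}{N+1}\,C_{0}.
\end{align}
```

Under this assumption $C_{gt}$ scales as $N/(N+1)$, and $C_{g,ex} \to \infty$ for $N = 1$, i.e., a single device is driven directly with no division.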
Comparing (6) and (7) determines the frequency range where each approach is beneficial for LO power consumption, and there is a crossover frequency defined by $f_{XO} = \frac{\pi}{2N} f_c$. The design in Fig. 3 (a) will consume less power than that in Fig. 3 (b) over a certain range of frequencies so long as $2N > \pi$, regardless of amplifier efficiency. Fig. 4 confirms this insight by plotting the crossover frequency and cutoff frequency as a function of the number of stacked devices for a 45-nm process, assuming 100 μm switches when $N = 1$. For larger stacks of devices, the crossover frequency is relatively low, in the range 5-10 GHz, so for millimeter-wave designs, Fig. 3 (a) is highly favorable.
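The crossover condition can be explored with a short sketch. It reads the crossover expression as $f_{XO} = \pi f_c / (2N)$, which is the reading consistent with the stated condition $2N > \pi$; the function names and the 24 GHz cutoff used in the example are hypothetical.

```python
import math

def crossover_frequency(n_stack: int, f_cutoff_hz: float) -> float:
    """Frequency above which the single-amplifier, resistively terminated
    line (Fig. 3 (a)) is assumed to consume less LO power than the
    per-gate capacitive drive (Fig. 3 (b))."""
    if n_stack < 1:
        raise ValueError("n_stack must be at least 1")
    return math.pi / (2 * n_stack) * f_cutoff_hz

def distributed_drive_has_window(n_stack: int) -> bool:
    """True when the crossover lies below the line cutoff, i.e. 2N > pi."""
    return 2 * n_stack > math.pi

if __name__ == "__main__":
    f_c = 24e9  # hypothetical cutoff frequency of the artificial line
    for n in (1, 2, 6):
        f_xo = crossover_frequency(n, f_c)
        print(n, f"{f_xo / 1e9:.2f} GHz", distributed_drive_has_window(n))
```

For $N = 1$ the crossover falls above the cutoff, so the terminated line never wins within its usable band; from $N = 2$ upward a window between $f_{XO}$ and $f_c$ opens.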
For the resistive optimized case in Fig. 3 (a), both the maximum input power (linearity) and the DC power consumption scale with the same factor, $(N V_{proc})^2$, a common relationship for diode and other III-V based mixers.

III. DRIVER AMPLIFIER DESIGN
While the artificial transmission line approach lowers power consumption across a large bandwidth, the amplifier design is critical for efficient performance. Stacked class-D amplifiers have been used to produce high voltage swings across bandwidth in SOI CMOS [47], [48]. An example of such a stacked amplifier is shown in Fig. 5 (a).
The NMOS/PMOS devices are sized to match the internal resistances of the amplifier in the pull-up and pull-down directions, and thus the external gate capacitance needs to be scaled accordingly following (8), where $N$ is the number of stacked devices in the amplifier, $n$ is the index of the capacitor from 1 to $N − 1$, and, as with the mixer, $C_{gex,n}$ and $C_{gin,n}$ are the extrinsic and intrinsic gate capacitances of each stacked device. Note that for $n = 1$, $C_{gex,n}$ goes to infinity, as this gate impedance should be an AC ground. The theoretical efficiency of a switched amplifier is 100%, so the internal device resistances and the shunt capacitances are the primary sources of inefficiency. The maximum output voltage swing is realized with the minimum ON resistance, requiring large devices. However, this comes with the tradeoff of larger parasitic capacitances, which degrade efficiency at high frequency. Smaller devices result in resistively dominated losses, and the voltage division that occurs within the amplifier is compensated with supply voltage. Because the breakdown of the devices will be determined by the output swing of the amplifier, the overall stress on the devices is only marginally increased. The appropriate supply voltage can be chosen using (9), where $R_{amp,int}$ is the amplifier output resistance, as shown in Fig. 5 (a), and $R_L$ is the load resistance. An example of how this supply voltage scaling impacts output power and efficiency for a five-stack amplifier is shown in Fig. 6. Smaller devices with a higher supply voltage offer better efficiency across a larger bandwidth while supplying the same saturated output power as larger devices, with up to a 15% improvement at 20 GHz.
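The supply-voltage compensation described above can be sketched as a resistive divider between the amplifier output resistance and the load. This divider form is an assumption made here for illustration, not a reproduction of the design equation, and the function name and example values are hypothetical.

```python
def required_supply(v_out_target: float, r_amp_int: float, r_load: float) -> float:
    """Supply voltage needed so that the load sees v_out_target, assuming the
    amplifier behaves as an ideal switched source in series with r_amp_int."""
    if r_load <= 0 or r_amp_int < 0:
        raise ValueError("resistances must be physical (r_load > 0, r_amp_int >= 0)")
    # Divider: v_out = v_supply * r_load / (r_load + r_amp_int)
    return v_out_target * (r_load + r_amp_int) / r_load

if __name__ == "__main__":
    # Hypothetical numbers: 5 V target swing into 50 ohm.
    for r_int in (2.0, 10.0, 25.0):
        print(r_int, required_supply(5.0, r_int, 50.0))
```

Smaller devices raise the internal resistance and therefore the supply needed for the same output swing, which is the trade described in the text against lower parasitic capacitance.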

IV. BROADBAND MIXER DESIGN
The wideband, high-linearity switch capable of fast switching can be demonstrated in a passive ring mixer. CMOS technologies typically have a maximum $P_{1dB}$ of 10 dBm [49] when using linearity-optimized pass gates for switches, and other designs often have significantly lower $P_{1dB}$ [3], [50], [51]. Mixers designed in other technologies such as GaAs and GaN exhibit higher linearity with external drive amplifiers, adding significant burden to the LO generation chain, especially at high frequencies [25], [26]. Thus, the millimeter-wave demonstration of wideband, high-power-handling CMOS SOI mixers with integrated LO drivers is compelling as a demonstration of the limits of CMOS compared to III-V technologies for larger-scale integration. In this section, two mixer designs are presented based on the trade-off between bandwidth and linearity shown by the reduced cutoff frequency with increasing numbers of stacked devices in Fig. 4. The first mixer is a six-stack architecture for maximum linearity, as described in [42], while the second is a two-stack design that operates up to millimeter-wave bands with a lower LO drive voltage.

A. SIX-STACK DISTRIBUTED, COMPLEMENTARY MIXER
To remain competitive with state-of-the-art III-V mixers [16], [17], [18], simulations indicated that $N \geq 6$ is needed to provide $P_{1dB}$ > 20 dBm while keeping the insertion loss (IL) at 5-10 dB across 18 GHz of bandwidth. The width of the devices is chosen based on the cutoff frequency of the distributed lines and the desired IL due to the series resistance of the stacked devices. In a 50 Ω environment, the conversion loss is given by (10), which implies a total series resistance of about 15 Ω. For $N = 6$, 2.5 Ω is allocated for each device, which corresponds to a width of 100 μm and allows the cutoff frequency to exceed 18 GHz. Because the mixer is fully differential and uses complementary devices, there are eight LO and RF paths, which form an artificial ground at each gate such that eq. (1) holds true. To conserve space at the expense of bandwidth, many of the inductors with common LO signals are shared, and the differential inductors are coupled where possible to achieve higher flux density; the layout of the coupled inductors is shown in Fig. 7 (c). The overall schematic of the DiSCo mixer is shown in Fig. 7 (a) and (d). To confirm that the distributed structure of the mixer improves the conversion loss at high frequency, Fig. 7 (b) plots the simulated conversion loss vs. inductor size in both the LO and RF paths, showing that conversion loss is improved by up to 3 dB even within the non-linear and time-varying circuit. In this design, the RF and LO lines use the same sized inductors and are both matched to 50 Ω so as to match the time delay through the distributed lines as closely as possible. However, this is not necessary for operation of the mixer, as a mismatch in delay turns into a slight frequency shift in the LO line as expressed by (11), which gives the shifted LO frequency in terms of the phase delays, θ, through one segment of the distributed RF and LO lines, respectively, and the time delay, τ, of one segment.
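The resistance budget above can be checked with a rough sketch. It assumes that the conversion loss is the ideal square-wave switching loss, $20\log_{10}(\pi/2)$, plus the insertion loss of a series resistance between matched 50 Ω terminations; this decomposition and the function names are illustrative assumptions rather than the conversion-loss expression used in the design.

```python
import math

# Ideal switching loss of a passive mixer driven by a square-wave LO.
IDEAL_SWITCHING_LOSS_DB = 20 * math.log10(math.pi / 2)

def series_resistance_loss_db(r_series: float, z0: float = 50.0) -> float:
    """Insertion loss of a series resistance between matched z0 terminations."""
    return 20 * math.log10(1 + r_series / (2 * z0))

def estimated_conversion_loss_db(r_series: float, z0: float = 50.0) -> float:
    """Assumed total: ideal switching loss plus series-resistance loss."""
    return IDEAL_SWITCHING_LOSS_DB + series_resistance_loss_db(r_series, z0)

def per_device_resistance(r_total: float, n_stack: int) -> float:
    """ON resistance available to each device in a series stack."""
    return r_total / n_stack

if __name__ == "__main__":
    print(f"{estimated_conversion_loss_db(15.0):.2f} dB")
    print(per_device_resistance(15.0, 6), "ohm per device")
```

With a 15 Ω total, this estimate lands near the lower end of the 5-10 dB insertion-loss target, and dividing the budget over six devices reproduces the 2.5 Ω allocation quoted above.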
Within the DiSCo switch, the unit cells can be divided into two categories: edge elements and center elements. The edge elements require slightly different sizing as they interface with non-repeating components within the mixer. Since two transistors connect to a single input or output node, the size of the edge elements needs to be reduced to prevent excessive capacitive loading. C comp maintains the capacitive loading along the LO line. All of the inductors within the differential distributed switch are uniform throughout the device. While the ideal size of the inductors is 250 pH, coupling the differential inductors together adds significant shunt capacitance at each node, approximately equal to 50 pF. The additional shunt capacitance increases the desired inductor value to 360 pH which is then compensated for in the inductor layout. The sizes for each device in the six-stack mixer are listed in Table 1.
To drive the LO lines, the LO signal is brought into the chip differentially, then split to drive the four pseudo-differential amplifiers. A resistive feedback amplifier offers wideband input matching. Generalizing the schematic in [...] Table 2. Fig. 8 plots the performance of the driver amplifier. Note that both the third-harmonic power and the fundamental power are shown as a measure of the overall efficiency and quality of the generation of the desired square wave.

B. TWO-STACK DISTRIBUTED MIXER
Equation (4) shows that the total gate capacitance is proportional to $N/(N + 1)$, implying that the bandwidth of the distributed switch is related to $(N + 1)/N$. The bandwidth of a single device, $N = 1$, will be double that of larger $N$ as $N \to \infty$. Much of the bandwidth limitation exists in the driving amplifier due to the parasitics that come with large devices. As such, a wider-bandwidth mixer is realized with fewer series devices to extend the bandwidth above 30 GHz. In this section, the design of a two-stack version of the mixer is presented to explore the bandwidth-linearity trade-offs.
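The scaling stated above can be tabulated with a few lines of code. The sketch uses only the proportionality $(N + 1)/N$ given in the text, normalized so that the large-$N$ limit equals 1; the function name is hypothetical, and the result says nothing about the additional bandwidth limits imposed by the driver amplifier.

```python
def relative_switch_bandwidth(n_stack: int) -> float:
    """Bandwidth of the distributed switch relative to the large-N limit,
    using the (N + 1) / N proportionality of the total gate capacitance."""
    if n_stack < 1:
        raise ValueError("n_stack must be at least 1")
    return (n_stack + 1) / n_stack

if __name__ == "__main__":
    for n in (1, 2, 6, 100):
        print(n, round(relative_switch_bandwidth(n), 3))
```

On this measure alone, moving from six stacked devices to two improves the switch bandwidth by a factor of roughly 1.29, so most of the bandwidth gained by the two-stack design would have to come from the smaller driver parasitics noted in the text.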
While the same fundamental framework is used for the distributed switch design in Fig. 7, using only two series devices removes the center elements and leaves the edge elements. As a result, the sizing of the LO path passives can be reduced by removing $C_{comp}$ on each side, doubling the LO bandwidth. The new component values are shown in Table 3. This design requires a three-stack driver amplifier, with the schematic shown in Fig. 9. This amplifier enables bandwidths of up to 40 GHz, and the device parameters for the three-stack amplifier are listed in Table 4.

V. MEASUREMENT RESULTS
Measuring accurate 3rd-order input-intercept points (IIP3) and $P_{1dB}$ > 20 dBm over 30 GHz requires attention to the test setup. This section details the test setups for $P_{1dB}$, IIP3, conversion loss, and LO feedthrough in the measurements of both up- and down-conversion. Micrographs of the six-stack and two-stack mixers are shown in Fig. 10. Measurements in this section use a rolling-average smoothing function to reduce uncertainty.

A. SIX-STACK MEASUREMENTS
Since the six-stack mixer was designed to operate at <20 GHz RF frequencies, a Keysight N9030B 26.5 GHz spectrum analyzer was used to measure the output power. For up-conversion, a high IF power was generated with a ZHL-10W-2G+ power amplifier, which operates over 1-2 GHz and saturates above 40 dBm. Additional calibration was performed manually with an HP 437B power meter to confirm the input and output power. For IIP3, two tones were generated by combining the 40 dBm amplifier output with a Keysight N5183B MXG, which has a maximum linear power output of 26.5 dBm at 1 GHz. Both sources were passed through CF1020 circulators before being combined to prevent source cross talk. This setup has an upper limit of 60 dBm IIP3 and a dynamic range of >100 dB.
Down-conversion measurements require an additional wideband input amplifier because the cable losses become significant at high frequency and high RF power is needed across the entire bandwidth. An HMC998APM5E DC-20-GHz, 33-dBm amplifier minimizes distortion for down-conversion $P_{1dB}$ measurements. Similarly, the output of the test setup was calibrated for power, and the mixer output was measured on the spectrum analyzer. Two tones were generated by the IM3 personality of the Keysight N5277A VNA and combined using a Marki Microwave PR-OR636 power combiner/divider. The same Keysight N9030B spectrum analyzer was used to measure the output tones at the required dynamic range. Fig. 11 plots the simulated and measured up-conversion characterization for the six-stack mixer, and Fig. 12 plots the down-conversion operation. In both cases the conversion loss is less than 10 dB from 2-18 GHz at the RF port, where the IF frequency was 1 GHz. The mixer has a 1-dB compression power ranging from 18-22 dBm in both up- and down-conversion modes, with IIP3 ranging from 25-33 dBm. There is some uncertainty in measuring IIP3, as the fitting of the IM3 tones has a large influence on the reported value. For this reason, Figs. 11 and 12 present upper and lower bounds for best- and worst-case extrapolation of the IM3 products, as well as a typical case. An example of this is shown in Fig. 13, where the same data is interpreted two ways. In Fig. 13 (a), the data is fit without constraining the slope of the IM3 products, representing a true interpretation of the data, while in Fig. 13 (b) the slope is constrained to be exactly 3, as is typically done. In Figs. 11 and 12, the upper bound represents fitting with a slope < 3, the lower bound represents fitting with a slope > 3, and the data line in the middle forces the slope in the fit to be bounded between 2.7 and 3.3.
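The sensitivity of IIP3 to the fitting choice can be illustrated with a minimal sketch of the two extrapolations: an unconstrained least-squares fit of the IM3 tones and a fit with the slope forced to 3. The function names and the synthetic data are hypothetical and are not measured values; the fundamental is assumed to follow a slope of exactly 1.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares line fit; returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return slope, y_bar - slope * x_bar

def extrapolate_iip3(p_in_dbm, p_fund_dbm, p_im3_dbm, im3_slope=None):
    """Input power (dBm) where the fitted fundamental and IM3 lines cross.
    im3_slope=None fits the IM3 slope freely; a number constrains it."""
    gain_db = mean(pf - pi for pf, pi in zip(p_fund_dbm, p_in_dbm))
    if im3_slope is None:
        im3_slope, im3_intercept = fit_line(p_in_dbm, p_im3_dbm)
    else:
        im3_intercept = mean(p3 - im3_slope * pi
                             for p3, pi in zip(p_im3_dbm, p_in_dbm))
    # Solve p_in + gain = im3_slope * p_in + im3_intercept for p_in.
    return (gain_db - im3_intercept) / (im3_slope - 1), im3_slope

if __name__ == "__main__":
    # Synthetic example: 7 dB conversion loss, ideal slope-3 IM3, IIP3 = 30 dBm.
    p_in = [0.0, 2.0, 4.0, 6.0, 8.0]
    p_fund = [p - 7.0 for p in p_in]
    p_im3 = [3.0 * p - 67.0 for p in p_in]
    print(extrapolate_iip3(p_in, p_fund, p_im3))
    print(extrapolate_iip3(p_in, p_fund, p_im3, im3_slope=3.0))
```

For ideal data both fits agree; when the measured IM3 slope departs from 3, the intersection moves, which is the source of the upper and lower bounds reported for the extrapolated IIP3.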
The simulated and measured isolation characteristics are shown in Fig. 14. LO-RF and LO-IF isolation remains   the standard deviation of LO isolation is < 1.5 dB showing that the difference between simulated and measured isolation can likely be attributed to the use of non-ideal baluns and other test setup imperfections. The reported values are given relative to the LO power output from the on-chip amplifiers rather than from the LO input to provide comparison with designs without an integrated amplifier.
The power consumption of the on-chip drivers is plotted in Fig. 15. The power nominally increases with frequency due to larger parasitics in the driver amplifier and lower efficiency. The power draw is also affected by the amplifier driving the artificial line, as the output matching changes across frequency. This can be seen in both the measured and simulated power consumption.
To calculate the LO power reaching the mixers on chip, the external cable and balun losses were measured. The loss was subtracted from the output power of the microwave generator to determine the amount of power reaching the LO inputs on the mixer chip, assuming good matching. Additionally, the LO input power required to saturate the on-chip LO amplifiers was simulated, and the two values are plotted as a function of frequency in Fig. 16. In the case of the six-stack mixer, the LO power was sufficient to saturate the amplifiers across the bandwidth of 2-23 GHz. While the gain of the on-chip amplifiers rolls off, in all cases < 5 dBm of input LO power is needed for optimum performance with this design.

B. TWO-STACK MEASUREMENTS
The measurement of the two-stack mixer required a slightly different setup because the bandwidth was larger but the linearity lower when compared with the six-stack device. The same setup as for the six-stack measurements can be used when creating signals for up-conversion $P_{1dB}$ and IIP3; however, the output was measured with the spectrum analyzer personality of the Keysight N5277A 67 GHz VNA. While the VNA has a lower dynamic range than the stand-alone spectrum analyzer, it was sufficient to measure the outputs of the less linear two-stack mixer. The up-conversion metrics are shown in Fig. 17.
To measure down-conversion $P_{1dB}$, the HMC998APM5E amplifier was used from 2-20 GHz and an ADPA7007 18-44 GHz, 30 dBm amplifier generated sufficient power to cause compression. The IIP3 was measured with the VNA IM3 measurement tool and metrics are reported in Fig. 18. LO isolation is shown in Fig. 19. Again, the isolation is better than 40 dB across the entire band. The power consumption is plotted as a function of frequency in Fig. 20.
Similar to the six-stack measurements, the test-bench losses were measured to determine how much power was reaching the LO input of the two-stack mixer. In this case, increasing cable loss, decreasing saturated power of the microwave signal generators, and higher-loss baluns reduce the available power. For the two-stack mixer, the required input LO power could only be generated up to 27 GHz, well below the capable bandwidth of the mixer. As such, in Figs. 17 and 18, the $P_{1dB}$ rolls off sharply as the LO power is no longer sufficient to support high linearity. This issue could be solved by adding more on-chip gain at high frequency.

C. COMPARISON TO OTHER MIXER TECHNOLOGIES
Finally, Table 5 compares the measured parameters of each of the two mixer variations, two-stack and six-stack, to previous work on microwave and millimeter-wave mixers. In both bandwidth and linearity, the presented mixers outperform previous work aiming for similar specifications. When compared to other silicon architectures, the presented mixers achieve record IIP3 across a wide bandwidth. A key consideration when comparing measured IIP3 values is the slope and extrapolation variance discussed earlier in this section. Most previous works do not discuss the extrapolation techniques, allowing for measured IIP3 values greater than $P_{1dB}$ + 9.6 dB. Additionally, peaking IIP3 values that exceed this relationship are often highly bias/temperature/voltage dependent and may not be stable across a variety of operating conditions. As such, $P_{1dB}$ is a more robust measure of linearity for comparison, as it does not depend on fine cancellation of 3rd harmonic terms. In this sense, the presented mixers exceed the $P_{1dB}$ values of any other CMOS mixer by more than 15 dB. Historically, SOI has not provided a particular advantage in linearity over bulk CMOS, as seen when comparing [4] and [5] with [50] and [51], and therefore adapting the presented architectures to bulk CMOS could be valuable for reducing the cost of the designs in future work [52].
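The 9.6 dB relationship between IIP3 and $P_{1dB}$ referenced above follows from the standard memoryless third-order model, sketched here for reference. The coefficients $a_1$ and $a_3$ and the tone amplitude $A$ are symbols of that textbook model, with a compressive nonlinearity ($a_3/a_1 < 0$) assumed; real mixers with higher-order terms or IM3 cancellation can deviate from it.

```latex
% Memoryless third-order model: y = a_1 x + a_3 x^3, with a_3 / a_1 < 0.
\begin{align}
\text{1-dB compression:}\quad
  & 1 - \tfrac{3}{4}\left|\tfrac{a_3}{a_1}\right| A_{1dB}^2 = 10^{-1/20}
  \;\Rightarrow\;
  A_{1dB}^2 = \left(1 - 10^{-1/20}\right)\tfrac{4}{3}\left|\tfrac{a_1}{a_3}\right|, \\
\text{Input intercept:}\quad
  & |a_1| A_{IIP3} = \tfrac{3}{4}|a_3| A_{IIP3}^3
  \;\Rightarrow\;
  A_{IIP3}^2 = \tfrac{4}{3}\left|\tfrac{a_1}{a_3}\right|, \\
\text{Ratio:}\quad
  & 10\log_{10}\frac{A_{IIP3}^2}{A_{1dB}^2}
  = -10\log_{10}\left(1 - 10^{-1/20}\right) \approx 9.6~\text{dB}.
\end{align}
```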
When compared to other technology approaches, such as GaAs Schottky diodes, pHEMTs, or AlGaN/GaN HEMTs, the presented mixers exceed the performance of all the compared designs other than [16], which has comparable performance. However, in the case of [16] the LO input power is 28 dBm. When considering the amplifier that would be required to generate such a high LO power, along with the losses of chip interconnects and cables, the presented mixers with integrated drivers will be significantly more efficient, consuming a maximum total power of 30 dBm in the LO amplifiers, even at 26 GHz. This advantage of integration will open many future opportunities for integrated, high-efficiency systems.

VI. CONCLUSION
This work improves the linearity of an SOI CMOS switch without impacting the fundamental device switching speed by using an extrinsic gate capacitor and device stacking along with co-designed on-chip driver amplifiers. These switches were incorporated into two versions of microwave and millimeter-wave mixers: one with compression points above 20 dBm and IIP3 greater than 30 dBm across 18 GHz of bandwidth, the other with compression points up to 18 dBm and IIP3 greater than 25 dBm across a bandwidth of 30 GHz. The linearity exceeds that of earlier CMOS or SOI work of similar bandwidth by more than an order of magnitude, and matches or outperforms monolithic mixers designed in III-V processes. The high level of integration in CMOS SOI eliminates interconnects to improve the overall system efficiency compared to designs with external LO amplifiers. These results demonstrate the potential for high-performance microwave and millimeter-wave mixers.