A 16.4-dBm 20.3% PAE 22-dB Gain 77 GHz Power Amplifier in 65-nm CMOS Technology

We present a compact W-band power amplifier (PA) for automotive radar applications in 65-nm CMOS technology. The circuit adopts a pseudo-differential push-pull configuration based on transformers (TFs), which offer highly efficient and flexible matching networks with minimal area occupancy. With the optimal output resistance set close to 50 Ω, design guidelines for sizing the active devices of each stage and the corresponding transformers are presented for optimal power efficiency, based on an analysis of the surrounding matching networks. Operating from a 1.3-V supply, the implemented 77-GHz PA achieved a 3-dB gain bandwidth of 9 GHz (72.5–81.5 GHz), a peak gain of 22.4 dB, a saturated output power (Psat) of 16.4 dBm, and a peak power-added efficiency (PAE) of 20.3%. The core layout occupies only 0.05 mm², which demonstrates the highest power density among recently reported W-band CMOS PAs.


I. INTRODUCTION
Nowadays, collision avoidance systems (CAS) utilizing radar, previously known from ships and aircraft, are widely applied in road vehicles to assist drivers in many situations. Such a system aims to enhance driving safety and provide better user convenience [1], [2]. The W-band is suitable for radar sensors in road vehicles owing to two key properties. First, the small wavelength enhances the detection resolution of radar sensors so that they can detect objects as small as a human, a small car, or a traffic pole. The use of high frequencies also helps the sensors capture higher velocities, which is essential for a collision-avoidance system [2]. Second, owing to the strong penetrating properties of electromagnetic waves at high frequency, W-band radars are highly reliable in extreme environments (e.g., bad weather conditions such as heavy rain, snow, or dense fog) [3]. Hence, the ITU recommended a band of 76-81 GHz for automotive radar applications.
Among various integrated circuit technologies, CMOS technology is preferred for full implementations of W-band radar sensors since it can provide adequate power and efficiency performance with the virtues of low cost and high integration capability. However, designing a high-performance PA in CMOS at such high frequencies is challenging due to low device breakdown voltages. Moreover, millimeter-wave circuitry suffers from a lossy substrate environment for passive devices as well as the inferior power gain of the active devices.
To cover the standard detection distance of 250 m, the required transmitter power is estimated to be 13 dBm for a typical channel with the radar cross-section (RCS) of a mid-size car (~30 m²) [4]. Nevertheless, a PA that can provide higher output power is preferred for reliable operation considering packaging losses. In addition, a power level below the saturated output power could be used under normal conditions with the best efficiency, while the peak power might be necessary in adverse environments.
Recently, it has become possible to achieve an output power larger than 13 dBm from a CMOS PA without a complex power-combining network, which could degrade the power efficiency due to the extra loss of the power combiners at the output [4], [5]. Therefore, it is natural to employ a PA without such a complex combiner/splitter to gain advantages in efficiency and area with less design effort [3]. To extract the maximum possible power from a single-way PA, the active device at the output stage should be as large as possible while keeping impedance matching with the surrounding circuits feasible. Therefore, it is critical to select an optimal active device size to achieve the largest possible output power with minimal area occupancy.
This paper presents a transformer-based push-pull PA design at 77 GHz for automotive radar applications in a 65-nm CMOS process whose back-end-of-line (BEOL) provides an ultra-thick metal (UTM) layer of copper. The circuit comprises three push-pull amplifier stages, targeting a power gain higher than 20 dB and an output power greater than 13 dBm. The design procedure is emphasized as a guideline for choosing an optimal active device size with proper transformer (TF) sizing for a highly efficient mm-wave PA.

II. MILLIMETER-WAVE PUSH-PULL PA DESIGN
The simplified schematic of the proposed PA is presented in Fig. 1. It consists of three stages of the push-pull amplifier, including the input stage, driving stage, and output stage. Besides various advantages in a compact size and power delivery, the transformer also provides galvanic isolation and electrostatic discharge (ESD) protection at the input and output ports.
In each amplifying stage, neutralization capacitors are included to improve the stability factor, which eventually results in better impedance matching. This architecture also increases the isolation between the input and output as well as the gain of each push-pull amplifier stage. In this design, metal-oxide-metal (MOM) capacitors were used to achieve a high-precision embedding network instead of more compact MOS capacitors with a lower Q-factor [6]. The value of the neutralization capacitor is chosen to be roughly equal to Cgd of the transistor [7]. Specifically, Rollett's stability factor (K-Δ) and the maximum available gain (Gma) of the amplifier were investigated carefully to ensure stable operation [8].
The gate bias voltage (VGS) for the three stages was chosen considering the trade-offs between the dc-power dissipation and the maximum output power. The VGS for the output (third) stage was chosen to be 0.7 V to achieve a good output power, while the VGS for the input and driving stages was chosen to be 0.6 V for better power gain [8].

A. TRANSFORMER-BASED MATCHING NETWORK DESIGN
For the push-pull amplifier with transformers at the input and output, a proper design of the matching network is crucial in achieving optimal power efficiency. To model an on-chip transformer, the low-frequency model with five parameters (L1, Q1, L2, Q2, and k) has been widely used to characterize the coil inductances, quality factors, and the mutual inductive coupling factor between the two coils [9]. The source and load of the transformer can be a 50-Ω terminal or the gates or drains of the MOSFETs in the PA. The optimal source and load for a given transformer are reported in [9] and are written in terms of admittances in (1),
where YS = GS + jBS is the source admittance and YL = GL + jBL is the load admittance of the transformer. Throughout this work, the impedance seen from a node is represented by its admittance, which characterizes parallel connections effectively. In (1), when Q1 = Q2 >> 1 and k = 1, the expressions for BS and BL simplify to 1/(2ωL1) and 1/(2ωL2), respectively. Although the solution given in (1) indicates specific optimal load and source values that maximize the efficiency of the transformer, the constraints on the real parts (i.e., the source and load conductance values) are not so strict, especially when the quality factors are high. Intuitively, an ideal transformer can be regarded as an impedance transformer that only requires a specific ratio between the load and source resistances, determined by its turns ratio, to maximize its efficiency.
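As a numerical illustration of the limiting case quoted above, the following minimal sketch computes simultaneous conjugate-match terminations for a lossy transformer from the five parameters (L1, Q1, L2, Q2, k). It assumes the textbook series-loss coupled-inductor model, Z11 = ωL1(1/Q1 + j), Z22 = ωL2(1/Q2 + j), and Z12 = jωk√(L1L2); this model is an assumption made here and may differ in form from the expressions of [9]. The function name and the 100-pH coil inductance are hypothetical.

```python
import math

def optimal_terminations(l1, q1, l2, q2, k, freq_hz):
    """Conjugate-match source/load admittances of a lossy transformer.

    Assumed model (not taken from [9]): coupled inductors with series
    loss resistances R = wL/Q. Returns (Y_S, Y_L, G_max), where the
    admittances are complex values G + jB in siemens.
    """
    w = 2 * math.pi * freq_hz
    a = math.sqrt(1 + k**2 * q1 * q2)
    # Series-form optimum: resistance a*R and reactance -wL on each side.
    z_s = w * l1 * complex(a / q1, -1.0)
    z_l = w * l2 * complex(a / q2, -1.0)
    g_max = (a - 1) / (a + 1)  # efficiency under simultaneous match
    return 1 / z_s, 1 / z_l, g_max

F0 = 77e9          # target frequency from the text
L_COIL = 100e-12   # hypothetical coil inductance
W0 = 2 * math.pi * F0

# High-Q, fully coupled limit: B should approach 1/(2*w*L).
y_s, y_l, _ = optimal_terminations(L_COIL, 1000, L_COIL, 1000, 1.0, F0)
bs_norm = y_s.imag * 2 * W0 * L_COIL
bl_norm = y_l.imag * 2 * W0 * L_COIL
print(f"BS*2wL1 = {bs_norm:.4f}, BL*2wL2 = {bl_norm:.4f}")

# Typical on-chip values mentioned in the text: Q = 10, k = 0.7.
_, _, g_typ = optimal_terminations(L_COIL, 10, L_COIL, 10, 0.7, F0)
gmax_typ_db = 10 * math.log10(g_typ)
print(f"matched efficiency for Q = 10, k = 0.7: {gmax_typ_db:.2f} dB")
```

Under this model, both normalized susceptances approach unity for Q = 1000 and k = 1, which is consistent with the 1/(2ωL) limit, and the matched loss for Q = 10 and k = 0.7 is about 1.2 dB.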
For example, an ideal 1:2 TF presents a quarter of the load resistance at its source side. To demonstrate that the real parts do not strongly affect the TF efficiency, we examined the efficiency (i.e., the power gain or S21) of a symmetric 1:1 TF with Q1 = Q2 = Q and L1 = L2 under various source and load resistance values (RSp and RLp, respectively). Owing to the TF's symmetry, the two terminal resistances were set equal and denoted by Rp. For a fair comparison, Rp was varied from its optimal value Rp(opt), and optimal parallel capacitances were used to keep the optimal susceptance in each case. Fig. 2 presents the simulated efficiency of the TF versus the ratio Rp/Rp(opt) for several values of the quality factor Q and coupling factor k at the center frequency. As can be seen, with k = 1 and Q = 50, the efficiency of the TF varied by merely 0.6 dB when Rp was increased or decreased by a factor of 10 from Rp(opt). This insensitivity of the efficiency to Rp weakened drastically as the coupling factor k decreased. Moreover, the quality factor Q strongly affected the intrinsic insertion loss. Nevertheless, the overall trend of the efficiency versus Rp was independent of Q. With the typical values of Q = 10 and k = 0.7, the efficiency of the TF decreased by around 2 dB (from -1.2 dB to -3.2 dB) when Rp was changed by a factor of four from its optimal value (i.e., ±6 dB).
Figure 3 shows the simulated Gma and S21 values for several cases of the implemented TF with Din = 26 μm connected in the differential-to-differential configuration. When YS and YL were set to their optimal values, S21 was maximized at the target frequency of 77 GHz. The operating frequency of the TF shifted to around 108 GHz when the parallel susceptance values were reduced to half of their optimal values.
Although changes in RSp and RLp caused only minor shifts in the peak frequency, they produced a meaningful degradation of the power gain. When the optimal reactance values of the TF were applied at both the source and the load, a decreased Rp provided a wider bandwidth with a larger degradation of the power gain, whereas an increased Rp made the bandwidth narrower with better power gain near the center frequency. Therefore, compensating the imaginary parts of the load and source admittances is crucial for determining the operating frequency of the TF, while matching their real parts has only a minor impact on the TF efficiency. Based on this, we could achieve power matching of the active device by marginally sacrificing the TF efficiency while improving the overall power efficiency of the designed PA.
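The sensitivity of the TF efficiency to Rp can be approximated with a simple lumped calculation. The sketch below is a hedged illustration rather than the simulation behind Fig. 2: it assumes a symmetric 1:1 series-loss coupled-inductor model with normalized reactance (ωL = 1), identical parallel R-C terminations at both ports, and the standard two-port transducer-gain expression in Z-parameters. The function name is hypothetical.

```python
import math

def tf_transducer_gain_db(q, k, rp_ratio, w_l=1.0):
    """Transducer gain (dB) of a symmetric 1:1 transformer.

    Assumed model: coupled inductors with L1 = L2, Q1 = Q2 = q, and
    series loss wL/q. Source and load are identical parallel
    terminations whose susceptance stays at the matched optimum while
    the parallel resistance is Rp = rp_ratio * Rp(opt).
    """
    a = math.sqrt(1 + (k * q) ** 2)
    y_opt = 1 / (w_l * complex(a / q, -1.0))    # matched admittance
    y_term = complex(y_opt.real / rp_ratio, y_opt.imag)
    z_term = 1 / y_term                         # series-equivalent termination
    z11 = w_l * complex(1 / q, 1.0)             # equals z22 by symmetry
    z21 = complex(0.0, k * w_l)                 # equals z12 (reciprocal)
    den = (z11 + z_term) ** 2 - z21 * z21
    gain = 4 * z_term.real ** 2 * abs(z21) ** 2 / abs(den) ** 2
    return 10 * math.log10(gain)

for q, k in [(50, 1.0), (10, 0.7)]:
    for ratio in (0.1, 0.25, 1, 4, 10):
        g_db = tf_transducer_gain_db(q, k, ratio)
        print(f"Q={q:>2}, k={k:.1f}, Rp/Rp(opt)={ratio:>5}: {g_db:6.2f} dB")
```

Under these assumptions, the case Q = 10, k = 0.7 gives about -1.2 dB at Rp(opt) and about -3.3 dB at a fourfold change in Rp, while the case Q = 50, k = 1 changes by roughly 0.7 dB over a tenfold change, in line with the trends described above.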
The significance of a matching network depends on its position in the PA, i.e., at the input, inter-stage, or output. To evaluate this, let us consider a PA with a gain of 20 dB. If the output matching network suffers 1 dB more insertion loss, the PAE of the PA is scaled by ~0.794 (i.e., a 20.6% degradation). By contrast, if the additional 1-dB loss is applied to the input matching network, the PAE is reduced by merely 0.3%. With this understanding, we can perform reasonable trade-offs between the insertion loss and other factors such as the bandwidth or compactness of the matching networks.

B. THE EFFECTS OF MATCHING NETWORK LOSS DEPENDING ON GAIN
Let us consider a PA whose gain stage has input and output matching networks, as presented in Fig. 4. The gain stage has a transducer power gain of GT (= Gma - ILMin - ILMout, in dB) with well-matched input and output ports, where Gma is the maximum available gain of an unconditionally stable device and the input and output matching networks (TMNin and TMNout, respectively) provide sufficiently good impedance matching with insertion losses of ILMin and ILMout, respectively. The effects of TMNin and TMNout on the overall PA performance are quite different.
Let us evaluate their effects by assuming that the insertion loss of either TMNin or TMNout increases by 1 dB. Since the PAE needs to be compared at the same output power level for a fair comparison, we keep the output power level constant. Thus, if ILMin is increased by 1 dB, Pin must be increased by 1 dB accordingly. Therefore, the new PAE (PAEpk(new)) affected by the variation in power gain ΔGTdB from the TMNs can be calculated as
$$\mathrm{PAE}_{pk(new)} = r_{PAE}\,\mathrm{PAE}_{pk}, \qquad r_{PAE} = \frac{1 - 10^{-(G_{TdB} + \Delta G_{TdB})/10}}{1 - 10^{-G_{TdB}/10}} \qquad (2)$$
From (2), the effect of ILMin on the PAE is quite minor when GT is relatively large. If GT is reduced from GTdB = 20 to 19 dB (i.e., ΔGTdB = -1 dB) due to the increase in ILMin, the calculated rPAE is still 0.997, whereas rPAE = 0.88 for a PA with GTdB = 5 dB and the same degradation in TMNin (ΔGTdB = -1 dB). It can be seen that the influence of TMNout on the PAE is more direct and stronger than that of TMNin. Thus, the influence of each matching network on the PAE of any PA can be evaluated from the gain of the PA. The impact of each block on the PAE is roughly inversely proportional to the gain of the stages that provide the overall gain. Since the effect of the TMNs (except for that of the output stage) on the power efficiency is minor, we can perform a reasonable trade-off between the insertion loss and other factors such as the bandwidth or compactness of the matching networks. With this understanding, the resistance matching issue in the inter-stage and input networks discussed in the previous subsection can be alleviated.
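The numbers quoted in this subsection can be checked with a short script. It is a sketch that assumes PAE = (Pout - Pin)/Pdc with Pout and Pdc held constant, so that a gain change affects the PAE only through Pin = Pout/GT; the output-side factor additionally neglects the input-power term. The function names are hypothetical.

```python
def pae_ratio_input_loss(gt_db, delta_gt_db):
    """PAE(new)/PAE when the gain changes by delta_gt_db (dB).

    Assumes constant Pout and Pdc, so PAE is proportional to
    (1 - 1/G_T), with G_T the linear transducer gain.
    """
    g_old = 10 ** (gt_db / 10)
    g_new = 10 ** ((gt_db + delta_gt_db) / 10)
    return (1 - 1 / g_new) / (1 - 1 / g_old)

def pae_ratio_output_loss(extra_loss_db):
    """Approximate PAE scaling for extra output-network loss (dB).

    Assumes the device delivers the same power at the same Pdc and
    neglects Pin, so PAE scales directly with the delivered power.
    """
    return 10 ** (-extra_loss_db / 10)

r_in_20db = pae_ratio_input_loss(20.0, -1.0)
r_in_5db = pae_ratio_input_loss(5.0, -1.0)
r_out = pae_ratio_output_loss(1.0)
print(f"input loss, GT = 20 dB: rPAE = {r_in_20db:.3f}")
print(f"input loss, GT = 5 dB:  rPAE = {r_in_5db:.3f}")
print(f"output loss of 1 dB:    rPAE = {r_out:.3f}")
```

The script gives 0.997 and 0.880 for the two input-loss cases and 0.794 for the output-loss case, matching the values in the text.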

C. OUTPUT STAGE DESIGN CONSIDERATIONS
It is a natural choice to design the PA from the output stage to the input stage consecutively when considering the importance of the larger signal toward the output stage.
There are trade-offs in choosing the active device size for the output stage. A large transistor is preferable for high output power. However, two issues need to be considered regarding its output and input impedance matching. The output impedance of a transistor can be modeled by a parasitic capacitor (Cop) in parallel with an output resistor (Rop), and this model applies to large-signal operation as well. When the output transformer (i.e., TF4) has an impedance transformation ratio of Tim and its primary inductance perfectly resonates out Cop, Rop should equal RopTF = RL·Tim (where RL is the load impedance) to attain the maximum efficiency ηmax. However, the device size can be further increased to enhance the output power at the cost of degraded power efficiency. When the device size is increased by n times (n > 1), the output resistance Rop decreases roughly by a factor of n. Then, the new efficiency η can be calculated from the maximum efficiency ηmax through the ratio re, i.e., η = re·ηmax. Figure 5 presents the ratio of efficiency decrease (re) and power increase (rp) versus n, which shows that rp increases faster than re decreases, particularly for small n. Thus, a small amount of efficiency degradation can be traded for a relatively larger output power.
There is another aspect to consider when choosing the output active device size, which is related to its preceding transformer (i.e., TF3). A larger transistor (M3) requires a smaller transformer (TF3) to resonate out its increased gate capacitance. However, the reduced magnetic coupling of a small transformer results in a high-loss implementation. To investigate the effects of the reduced magnetic coupling, we simulated various transformers of different inner diameters (Din). The realized structure of the transformers is shown in Fig. 6. Herein, the on-chip transformer is constructed from three metal layers. The ultra-thick metal (UTM) layer forms the primary coil, which carries the large drain quiescent current. Meanwhile, the two metal layers below the UTM are combined for the secondary coil. The inner diameter of the transformer is denoted by Din, and the width of the winding is W = 6 μm. The length of the two ports is fixed at 25 μm to keep a certain distance between the windings and the surrounding ground. Each winding of the transformer has a center tap for VDD and gate biasing.
The extracted optimal load susceptance (BLopt in (1.2)) and maximum available gain (Gma) of transformers of different sizes are presented in Fig. 7. We can observe that the transformer efficiency degrades quickly as the transformer size decreases due to the reduced magnetic coupling. When the transformer diameter Din is reduced from 32 μm to 16 μm, Gma drops by about 20%, and the extracted BLopt increases from 14.8 mS to 43.6 mS. This means that the output transistor size supported by the 32-μm transformer is expected to be nearly one third of that supported by the 16-μm transformer.
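To make the susceptance values more tangible, the sketch below converts BLopt into the equivalent parallel capacitance that each transformer could resonate with at 77 GHz. It assumes the load susceptance is purely capacitive (C = B/ω) and that the transistor gate capacitance scales linearly with device width; both are simplifying assumptions, and the function name is hypothetical.

```python
import math

F0_HZ = 77e9  # target frequency from the text

def susceptance_to_capacitance(b_siemens, freq_hz=F0_HZ):
    """Equivalent parallel capacitance (F) for a capacitive susceptance."""
    return b_siemens / (2 * math.pi * freq_hz)

c_32um = susceptance_to_capacitance(14.8e-3)  # Din = 32 um
c_16um = susceptance_to_capacitance(43.6e-3)  # Din = 16 um
size_ratio = c_16um / c_32um

print(f"Din = 32 um: {c_32um * 1e15:.1f} fF")
print(f"Din = 16 um: {c_16um * 1e15:.1f} fF")
print(f"supported device-size ratio: {size_ratio:.2f}")
```

The ratio of about 2.95 corresponds to the "nearly three times" difference in supported transistor size stated above.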
In this analysis, it was assumed that the maximum generated output power and the parasitics of the transistor are linearly proportional to its size. However, in practice, the efficiency of a large transistor can be noticeably degraded due to the long routing lines on the bottom metal layers in the device layout. To investigate this effect, we designed various transistors of different sizes using the "table structure" with eight cells, as shown in Fig. 8. The gate capacitance of the transistors was extracted to select a suitable preceding resonant transformer (i.e., TF3). Load-pull simulations were performed on the output transistors with their selected transformer-based input matching networks, and the simulation results are shown in Table I.
From Table I, the required impedance transformation ratio, Tim, of the output transformer (TF4) is close to unity for optimal power efficiency. Thus, a 1:1 turns ratio is selected for TF4. The optimal size of M3 for the output impedance matching is expected to be around W = 128 μm. Based on the analysis, the width of M3 was increased slightly beyond this optimal size, to W = 168 μm, to achieve higher output power. With the selected output transistor, Din = 18 μm was chosen for TF3, which could resonate with the large output transistor M3 to achieve a good trade-off between the expected output power and efficiency. The output transformer (TF4) was designed to be as large as possible for the given transistor to improve the overall power efficiency. Using the impedance matching formulas for transformers in [9], the output transformer was designed to be 24 μm so that the susceptance of the single-ended terminal compensates for the parasitic capacitance of the RF pad at the output port. Through the proposed approach, the maximum possible size of the output transformer can be chosen for improved power efficiency. On the primary side of TF4, an additional capacitor C4 = 4 fF is required to compensate for its primary coil inductance. A MOM capacitor with a tailored layout was used for compact matching of the primary coil, and its capacitance was extracted using Calibre™. C2 and C3 were implemented in the same way.

D. GAIN STAGES DESIGN
The active device sizes of the first (M1) and second (M2) driving stages were determined considering the optimal efficiency. The device size was reduced compared with that of the output MOSFET, but it must be large enough to drive the load (i.e., the next stage). In this 65-nm CMOS process, each amplifier stage had an estimated gain of around 7 to 8 dB after impedance matching, and a power gain compression of 3 to 4 dB was observed when the output power (Pout) became saturated at a large input power level. Thus, it is roughly estimated that the driving stage should provide an output power 3 to 4 dB lower than that of the output stage to achieve full drive. Assuming that the maximum output power is proportional to the device size, we can initially set the active device size of the driving stage to half of that of the output stage. Because the gate bias voltages for M1 and M2 were set to 0.6 V for improved efficiency, the device size was set slightly larger than the expected size.
[Figure labels: chip dimensions 420 × 120 μm²; S-parameters [dB] versus frequency [GHz], measured S11, S12, S21, and S22.]
To ensure that the two driving stages can drive the output stage to its maximum saturated power and achieve a good OP1dB level, the device sizes of M1 and M2 were refined iteratively, starting from the estimated initial sizes. All other transformer-based matching networks were designed following the same procedure as for TF4 at the output stage. The final device sizes for M1 and M2 were 60 and 88 μm, respectively.
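The initial driver-size estimate described in this subsection amounts to a simple power scaling. The sketch below assumes, as the text does, that the maximum output power is proportional to the device width, and evaluates the width needed for a power back-off of 3 to 4 dB relative to the 168-μm output device. The function name is hypothetical.

```python
M3_WIDTH_UM = 168.0  # output-stage device width from the text

def driver_width_um(output_width_um, backoff_db):
    """Driver width for an output power backoff_db below the output stage.

    Assumes maximum output power scales linearly with device width.
    """
    return output_width_um * 10 ** (-backoff_db / 10)

w_3db = driver_width_um(M3_WIDTH_UM, 3.0)
w_4db = driver_width_um(M3_WIDTH_UM, 4.0)
print(f"3-dB back-off: {w_3db:.1f} um")
print(f"4-dB back-off: {w_4db:.1f} um")
```

The 3-dB case gives about 84 μm, i.e., half of the output device, which is slightly below the final M2 width of 88 μm obtained after iteration.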
Notably, the DC current consumed by M1 is marginal compared with that consumed by M3. Hence, we could choose a larger M1 size than expected to provide a higher gain. The relatively large gate capacitances of M3, M2, and M1 determine the sizes of TF3, TF2, and TF1, respectively, so that each gate capacitance resonates out the secondary inductance of the corresponding transformer. In this way, it was not necessary to add tuning capacitors at the gate of each transistor. However, on the primary sides of TF2 and TF3, additional capacitors C2 = 30 fF and C3 = 45 fF were added at the corresponding drains to ensure matching. In the case of TF1, with its single-ended-to-differential configuration, the center tap of the primary winding is connected to ground to reduce the parasitic capacitance. Because of this connection, an extra capacitor C1 of 34 fF was needed to resonate with the primary inductance of TF1 along with the parasitic capacitance of the input RF pad. The gate bias lines for TF1, TF2, and TF3 were connected in series with 5-kΩ resistors to avoid potential common-mode oscillation caused by the parasitic inductances of the bias lines [10].

E. DESIGN PROCEDURE
To summarize, the design sequence of the initial three-stage push-pull PA in this work is listed below:
• Step 1: Choose M3 by considering the output power and efficiency with the corresponding TF3. Design TF3 based ...
To demonstrate the validity of the design approach, a W-band push-pull PA was fabricated in a 65-nm CMOS process. The photograph of the fabricated chip is presented in Fig. 9. The core size of the designed PA is only 0.05 mm², while the total area including RF pads is 0.435 mm².

III. MEASUREMENT RESULTS
In the measurement, the PA consumed a DC current of 95 mA from a 1.3-V supply without an input signal. The measurement setup for the S-parameters and large-signal performance is illustrated in Fig. 10. A vector network analyzer (VNA), Keysight N5224A (10 MHz to 43.5 GHz), combined with an extension module was used with an on-wafer probe station to measure the S-parameters of the PA. The on-wafer setup was calibrated using a calibration kit (CS-5). The measured S-parameters of the PA are presented in Fig. 11 in comparison with the simulation results. The PA achieved a peak power gain of 22.6 dB at 77 GHz and a 3-dB bandwidth of 9 GHz (72.5-81.5 GHz), which agrees well with the simulation results. The measured reverse isolation (-S12) is better than 45 dB.
In the large-signal measurement, a signal generator with a stand-alone frequency multiplier was used to generate W-band signals, and a tunable attenuator was used to sweep the input power level. The insertion losses of the probe tips and the WR-10 waveguides were measured and calibrated out of the raw data. The measured output power, output 1-dB gain compression point (OP1dB), and power-added efficiency (PAE) as functions of frequency are presented in Fig. 12. The measured output power, gain, and PAE at 77 GHz and 79 GHz are shown in Fig. 13. The fabricated PA achieved a maximum Psat of 16.4 dBm with a peak OP1dB of 13.6 dBm and a peak PAE of 20.3%, recorded at 79 GHz. Over the band of interest (76-81 GHz), the measured saturated output power varies within 0.6 dB of its peak.
The performance of the proposed PA is summarized and compared with recently reported CMOS PAs at similar frequencies in Table II. The implemented 77-GHz PA attained well-balanced small-signal and large-signal performance and, to the best of our knowledge, its power density is among the highest for a bulk CMOS PA in the W-band.
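For reference, the headline figures can be converted to linear units with a few lines of code. This is a sketch that assumes the power density is defined as Psat divided by the core area, which is one common convention and not necessarily the definition used in Table II; the variable names are hypothetical.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)

PSAT_DBM = 16.4        # measured saturated output power
CORE_AREA_MM2 = 0.05   # core layout area
IDC_QUIESCENT_A = 0.095
VDD_V = 1.3

psat_mw = dbm_to_mw(PSAT_DBM)
# Assumed definition: saturated output power per unit core area.
power_density_mw_per_mm2 = psat_mw / CORE_AREA_MM2
pdc_quiescent_mw = IDC_QUIESCENT_A * VDD_V * 1e3

print(f"Psat = {psat_mw:.1f} mW")
print(f"power density = {power_density_mw_per_mm2:.0f} mW/mm^2")
print(f"quiescent DC power = {pdc_quiescent_mw:.1f} mW")
```

With these inputs, Psat is about 43.7 mW, the power density is about 873 mW/mm², and the quiescent DC power is 123.5 mW.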

IV. CONCLUSIONS
This paper reports a three-stage push-pull power amplifier (PA) for 77-GHz automotive radar applications in 65-nm bulk CMOS technology. A design strategy with detailed guidelines was presented for sizing the active devices as well as the transformers to achieve a good trade-off between output power and efficiency. In measurement, the fabricated PA exhibits an output power of 16.4 dBm, a power gain of 22.6 dB, and a peak PAE of 20.3% while occupying only 0.05 mm² for the core block. The well-balanced performance of the implemented W-band PA demonstrates the feasibility of single-way CMOS PAs for automotive radar applications, taking advantage of their low cost and high integration level.