Analysis and Design of Capacitive Voltage Distribution Stacked MOS Millimeter-Wave Power Amplifiers

Stacked MOS power amplifiers (PAs) are commonly used in SOI nodes but also have the potential to be realized in bulk CMOS nodes. In this paper they are analyzed in the millimeter-wave regime. The study focuses on the key limiting factors, in particular the optimum number of transistors at which the key performance parameters, such as maximum possible operating frequency, output power, and efficiency, are achieved. Based on the analysis, the design trade-offs of stacked MOS PAs are presented. The frequency dependency of the optimum load presented to each stack is analyzed to express the overall performance of the mentioned PA topologies, which serves as a new optimization method. Additionally, it is shown how the optimal load variations translate into amplitude-to-amplitude/phase (AM-AM/PM) conversion distortions. The validity of the analysis is examined against simulations. The simulations are based on a 28 nm 8M1P CMOS technology and on electromagnetic simulations in ADS Momentum.


I. INTRODUCTION
SHANNON'S channel capacity theorem states that the higher the bandwidth, the higher the data rate. This is the main motivation for moving towards higher operating frequencies and the emerging 5G and 6G systems, which could offer several advantages such as reduced system size, portability, and hence lower power consumption. Moving to higher frequencies poses several design challenges, including modification of the technology node, mostly in the form of scaling to reach a higher $f_t$/$f_{max}$, which reduces the power density of the corresponding semiconductor components. Thus, the output power density offered by a single transistor is quite limited in practice. A great demand for compact solutions for wireless communications applications has promoted CMOS integrated circuit (IC) design. However, designing an IC at such high frequencies is challenging, as the key performance characteristics of the transistors, including gain, linearity, and signal-to-noise ratio (SNR), are remarkably degraded [1].
On the other hand, the peak-to-average power ratio (PAPR) of higher-order modulation schemes, such as m-QAM, (O)QPSK, and OFDM deployed in communications systems, e.g. LTE or 5G NR, imposes stringent circuit design requirements, which add to the already existing challenges [2], [3]. Power amplifiers are considered one of the most important building blocks of a transmitter, as they largely determine the ultimate performance of wireless communications systems. To achieve both in-band and out-of-band signal integrity, these building blocks must fulfill the required performance. This concerns key characteristics such as AM-AM/PM conversion distortions, reduced desensitization in both in-band and adjacent channels, signal blockage, and a lower bit-error rate (BER). Due to the amplitude variation of the modulated signals, the PA needs a large back-off from saturation to attain sufficient linearity. However, this linearity comes at the cost of poor efficiency, which for the mentioned modulation schemes is on the order of 1%-10% in the desired frequency range. Poor efficiency leads to form-factor drawbacks, thermal management issues, and reduced system lifetime expectancy [2]-[20].
Stacking transistors has been widely adopted, especially in CMOS SOI solutions [4]-[20], and was recently brought to the bulk CMOS domain [21]-[26]. As the topology is being utilized more extensively, an investigation of its characteristics and performance is needed. Since its modern introduction by Ezzeddine [8], all the designs and their analyses have been based on frequency-independent formulations [4]-[26]. They also lack a proper formulation of the AM-AM/PM conversion distortions which, besides the classical transconductance ($g_m$) compression and $C_{gs}$ variation, would advance the understanding of the design trade-offs. It is shown in this paper that the traditional frequency-independent design underestimates the variation of the gain of the stacks, and hence the performance reduction of stacked MOS PAs over a wide frequency band. Furthermore, the impact of amplitude variations on the gain variations, known as AM-AM conversion, as well as on the phase variations, known as AM-PM conversion, is analyzed based on the projected optimal load variations, for the first time in this paper. This paper is organized as follows. In Section II the stacked MOS PA topology is reviewed briefly. A detailed analysis of the design dimensioning of the topology is presented in Section III. Section IV discusses the phase rotation compensation, and AM-AM/PM conversion is studied in Section V. Finally, the analysis is evaluated against simulation results in Section VI.

II. STACKED MOS TOPOLOGY
Scaling compels increased doping concentration to increase the operating frequency of MOSFETs. This has, in the meantime, resulted in decreased junction breakdown levels, which bounds the maximum possible voltage swing across the junctions. An excessive increase in the current density of a single MOS transistor, on the other hand, translates to reduced output impedance, which raises the transformation ratio of the matching network and hence makes it quite lossy. Accordingly, single-MOS-based power amplifier (PA) design is confined to a small available power density.
To alleviate the aforementioned problems, stacking transistors in a series connection on top of each other has been proposed [4]-[27]. This approach is particularly well suited to SOI-based technology nodes [4]-[20], to LDMOS [21] and MMIC [22], and recently to bulk CMOS technologies as well [23]-[27]. As can be seen in Fig. 1, this topology takes advantage of cascading common-source (CS) and common-gate (CG) stages, which at first glance closely resembles a cascode amplifier topology. There is, however, a difference between the two: the gates in the stacked topology are not fully bypassed. The reason is that in the cascode amplifier the fully bypassed gates cause the gain of each single inter-stage amplifier to remain ideally unity. Consequently, the overall signal swing occurs across the drain-source of the last stage, making it more susceptible to breakdown. It should be noted that the focus of all the discussions is on class A/AB operation.
Traditionally, the capacitors at the gates of the MOSFETs in stacked PAs are dimensioned such that part of the signal swing is divided across them, so that neither the gate-source nor the gate-drain junctions reach breakdown levels. Besides equalizing the voltages across the transistors, the stacked circuit also performs an impedance transformation from low at the bottom to high at the top: when the current in each stage is high, the increasing voltage swing causes the apparent load impedance to increase stage by stage. These capacitors were dimensioned so that perfect inter-stage matching, i.e. $R_{opt} = V_{DS,DC}/I_{D,DC}$ [28], was achieved [4]-[20], [23]-[27].
With the increase in operating frequency, as specified in 5G systems, design of stacked MOS PAs has turned into a new challenge. This is not a straightforward procedure anymore, which is due to the fact that more and more high-order parasitics start to manifest with the increase of the operating frequency. Not only that but also scaling the transistors as well as the signal dependency of the parasitics add to the problem.
In the following sections we are investigating the stacked MOS amplifiers in different design aspects to clarify the design trade-offs and performance characteristics of the mentioned PA topologies.

III. STACKED MOS PA ANALYSIS
A detailed analysis of the stacked MOS transistor PAs is discussed in this section. First a small-signal model is analyzed to be further referenced for silicon-on-insulator (SOI), and then a more general small-signal model is introduced to cover the issues corresponding to triple-well bulk CMOS technologies. After indicating the design parameters, tradeoff between output power, efficiency, operating frequency, number of stages, etc. are described.

A. Frequency Dependent Design Consideration
Stacking concept has been utilized to distribute the overall output signal equally among the stages so that none of the transistor junctions experience over-stressed conditions. Additionally, it constructs an internal impedance which ultimately performs load matching at the output. Without loss of generality, the conventional small signal model of Fig. 2(a) has extensively been utilized for the analysis purposes.
Ideally, the equations governing the circuit of Fig. 2 can be expressed by (1), as shown at the bottom of the next page. Due to the capacitive loading of the upper stages, (1) needs to be modified to include the reactive part of the load. To this end, $Y_{n+1}$ is assumed to comprise an optimum conductive part as well as an undesirable capacitive susceptance, i.e. $Y_{n+1} = G_{n+1} + jB_{n+1}$ (Fig. 2(b)). Thus (1) can be rewritten as (2), as shown at the bottom of the next page. Solving (2) for the input admittance $Y_n = -I_{s_n}/V_{s_n}$ yields

$$Y_n = G_{opt_n} + jB_n = \left(g_{m_n} + j\omega C_{gs_n}\right)\frac{\left(C_n + C_{gd_n}\right)Y_{n+1} + j\omega C_{gd_n}C_n}{\left(C_{gs_n} + C_n + C_{gd_n}\right)Y_{n+1} + j\omega C_{gd_n}\left(C_{gs_n} + C_n\right) + g_{m_n}C_{gd_n}} \qquad (3)$$

Fig. 3. The variation of the drain load as a result of frequency variation while using the frequency-independent design rule of (6) vs. the frequency-dependent rule (5). $f_t$ of the transistors is 240 GHz.

For design purposes, the real part of the input admittance $Y_n$ needs to be equal to the desired optimum conductance, i.e. $\mathrm{Re}\{Y_n\} = G_{opt_n} = 1/(nR_{opt})$. As will be explained later, the susceptance of $Y_{n+1}$ can be compensated. Thus, substituting $G_{opt_{n+1}}$ for $Y_{n+1}$, for $1 < n \le N-1$ we have (4),
where

$$a = \left(C_n + C_{gd_n}\right)\frac{G_{opt}}{n+1} \qquad (4.a)$$

$$b = \omega C_{gd_n}C_n \qquad (4.b)$$

$$c = \left(C_{gs_n} + C_n + C_{gd_n}\right)\frac{G_{opt}}{n+1} + g_{m_n}C_{gd_n} \qquad (4.c)$$

$$d = \omega C_{gd_n}\left(C_{gs_n} + C_n\right) \qquad (4.d)$$

and $G_{opt} = 1/R_{opt}$. Equation (4) clearly illustrates the dependence of $\mathrm{Re}\{Y_n\}$ on the operating frequency and, in turn, the frequency dependence of the $C_n$ dimensioning. In other words, the values of the gate capacitances $C_n$ shown in Fig. 1 need to be designed depending on the operating frequency. Accounting for the frequency variations in the design of the $C_n$'s offers the opportunity to control the optimum stack loading. Therefore, the loading can be optimized over the frequency band of interest. This is shown in Fig. 3, wherein the optimal loading is tuned at the desired frequency band. Solving (4) for $C_n$ yields (5), as shown at the bottom of the next page. If the operating frequency is less than $f_t/10$, the gate capacitance values $C_{n+1}$ asymptotically approach (6), compliant with [15]. Although very short, simple, and efficient at low frequencies, (6) lacks the impact of the frequency on the dimensioning of the $C_n$'s, which causes the loads to drop with frequency. Because of this, (6) is misleading as soon as the operating frequency surpasses $f_t/10$. Disregarding the higher-frequency impact on the circuit performance leads to both optimal load reduction and impedance mismatch, and hence to a gain drop at the mentioned bands. As will be shown later, this in turn translates to performance degradation. The optimal loading drop caused by (6) in comparison with that of (5) is shown graphically in Fig. 3. A 4-stacked MOS PA with $C_{gs}$ = 450 fF, $C_{gd}$ = 50 fF, $g_m$ = 600 mS and an operating frequency swept up to 120 GHz, i.e. $f_t/2$, has been utilized.

Coefficient matrix of the page-bottom equations (1) and (2), as recovered from the extracted text:

$$\begin{bmatrix}
-j\omega C_{gd_n} & j\omega\left(C_{gs_n} + C_{gd_n} + C_n\right) & -j\omega C_{gs_n} & 0 & 0 \\
-j\omega C_{ds_n} & -\left(j\omega C_{gs_n} + g_{m_n}\right) & g_{m_n} + j\omega\left(C_{gs_n} + C_{ds_n}\right) & 0 & 1 \\
j\omega\left(C_{gd_n} + C_{ds_n}\right) & g_{m_n} - j\omega C_{gd_n} & -\left(g_{m_n} + j\omega C_{ds_n}\right) & -1 & 0 \\
0 & 0 & 1 & 0 & 0
\end{bmatrix}$$
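As a numerical cross-check of the frequency dependence discussed above, the following minimal Python sketch evaluates $Y_n$ in the rational form $(g_m + j\omega C_{gs})(a + jb)/(c + jd)$, with the coefficients of (4.a)-(4.d), and compares its real part with the expression obtained by rationalizing the denominator. The device values are those quoted for Fig. 3. The optimum resistance `R_OPT`, the stage index `N_STAGE`, and the gate capacitance `C_N` are hypothetical values chosen only for illustration, and the closed-form real part is derived here by plain algebra rather than copied from (4).

```python
import math

# Device values quoted in the text for the 4-stack example.
C_GS = 450e-15   # gate-source capacitance, F
C_GD = 50e-15    # gate-drain capacitance, F
G_M = 0.6        # transconductance, S

# Hypothetical illustration values (assumptions, not from the paper).
R_OPT = 10.0     # optimum load per stage, ohm
N_STAGE = 2      # stage index n
C_N = 70e-15     # gate capacitance of stage n, F
G_NEXT = 1.0 / ((N_STAGE + 1) * R_OPT)  # load conductance G_opt / (n + 1)


def input_admittance(freq_hz, c_n, y_next):
    """Complex Y_n for a load admittance y_next at the drain."""
    w = 2.0 * math.pi * freq_hz
    num = (G_M + 1j * w * C_GS) * ((c_n + C_GD) * y_next + 1j * w * C_GD * c_n)
    den = (C_GS + c_n + C_GD) * y_next + 1j * w * C_GD * (C_GS + c_n) + G_M * C_GD
    return num / den


def real_part_abcd(freq_hz, c_n, g_next):
    """Re{Y_n} from the coefficients a, b, c, d of (4.a)-(4.d)."""
    w = 2.0 * math.pi * freq_hz
    a = (c_n + C_GD) * g_next
    b = w * C_GD * c_n
    c = (C_GS + c_n + C_GD) * g_next + G_M * C_GD
    d = w * C_GD * (C_GS + c_n)
    # Real part of (g_m + j*w*C_gs) * (a + j*b) / (c + j*d).
    return (G_M * (a * c + b * d) + w * C_GS * (a * d - b * c)) / (c * c + d * d)


if __name__ == "__main__":
    for f_ghz in (1, 30, 60, 120):
        y = input_admittance(f_ghz * 1e9, C_N, G_NEXT)
        print(f"{f_ghz:4d} GHz  Re{{Y_n}} = {y.real * 1e3:.3f} mS")
```

With $C_n$ held fixed, the printed conductance changes with frequency, which is the effect that a frequency-aware dimensioning rule has to absorb.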
As can be seen, the optimal value required for $nR_{opt}$ at the drain of the $n$th MOS transistor drops with frequency because of the low-frequency approximation of the gate capacitances, i.e. the $C_n$ predicted by (6). This yields a drastic decrease in gain at high frequencies, and hence in the power delivered to the output of the stacked MOS PA. In contrast, as reflected by the constant curves in Fig. 3, the $C_n$'s designed by (5) keep the load tuned to its optimal value at the desired operating frequency. This is shown to preserve the performance of the PA in Section VI. The frequency dependencies of (5)-(7) are illustrated in Fig. 4. As can be seen, the $C_n$'s designed based on (5) take the operating frequency into account and thus need to be re-dimensioned accordingly, whilst those based on (6) are constant over all frequencies. Specifically, the optimal loading in accordance with (5) remains constant because the corresponding $C_n$'s are adapted to the desired operating frequency, in agreement with Fig. 3. Equation (6), on the other hand, does not guarantee optimal loading at all frequencies, as it is frequency independent. The simulations are conducted for an example 4-stacked MOS PA with $C_{gs}$ = 450 fF, $C_{gd}$ = 50 fF, $g_m$ = 600 mS and an operating frequency swept up to 250 GHz.
To gain more insight, another important extreme value of $\mathrm{Re}\{Y_n\}$, at infinite operating frequency, is obtained and solved for the gate capacitances $C_n$. This puts an upper bound on the $C_n$'s. Accordingly, the values required for dimensioning the gate capacitances $C_n$ vary between (6) and (7) as a function of frequency. Bearing in mind that (6) and (7) put a lower and an upper bound on the $C_n$'s, respectively, one can use any interpolation between the two as an approximation of (5), if minor errors are acceptable in the design and later in the performance of the stacked MOS PA. If the frequency-dependent part of the design is ignored, i.e. only (6) is used as the dimensioning rule, the optimal load presented to a stack starts to degrade dramatically from the desired value beyond a certain frequency, as in Fig. 3. Thus, the gain provided by each stacked transistor, and hence the power delivered to the load, is reduced. The gate capacitances $C_n$ therefore need to be dimensioned in accordance with the operating frequency, as shown in Fig. 4, using either (5) or an estimate/interpolation between (6) and (7).
Considering the small-signal model of the MOS transistor at a certain operating frequency and disregarding the impact of the feedback capacitance $C_{gd}$, a simplified qualitative gain analysis yields

$$A_{v_n} \cong g_{m_n}(n+1)R_{opt} \times \frac{C_n}{C_n + C_{gs_n}}$$

which affirms a homographic behavior with an upper-bound asymptote of $g_m(n+1)R_{opt}$. Equations (6) and (7) define fixed lower and upper bounds for the gain, respectively. The problem with designs based on (7) manifests at lower frequency bands, where the PA experiences breakdown due to the bypass property of the gate capacitances. In this mode the PA behaves more like a (small-signal) cascode amplifier, with its topmost stacked transistor under the maximum junction stress. On the other hand, at higher frequency bands, a design with (6) results in a lack of gain, and hence of transduced power and PAE. This is shown in Fig. 3, wherein the optimal load starts to decrease from its desired value at higher frequencies. Therefore, designs with (6) are appropriate only at lower frequencies, whilst designs with (7) are appropriate only at higher bands. In contrast, (5) predicts the design requirements and the optimal load over the desired frequency band, and thus an optimum performance can be achieved. Moreover, the impact of the proposed design method, i.e. using (5), on the key performance characteristics of the PA, compared to that of the earlier approach, i.e. (6), is shown later in Section VI (Fig. 17).
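To make the homographic behavior concrete, the short Python sketch below evaluates the simplified gain expression $g_m(n+1)R_{opt}\,C_n/(C_n + C_{gs})$ for a few gate capacitances. The values of $C_{gs}$ and $g_m$ are those quoted in the text; `R_OPT`, the stage index, and the swept `c_n` values are hypothetical and serve only to illustrate how the gain saturates towards its asymptote.

```python
C_GS = 450e-15   # F, value quoted in the text
G_M = 0.6        # S, value quoted in the text
R_OPT = 10.0     # ohm, hypothetical illustration value


def stage_gain(c_n, n, c_gs=C_GS, g_m=G_M, r_opt=R_OPT):
    """Simplified stage gain with C_gd neglected (qualitative model)."""
    return g_m * (n + 1) * r_opt * c_n / (c_n + c_gs)


def gain_asymptote(n, g_m=G_M, r_opt=R_OPT):
    """Upper bound approached as C_n grows without limit."""
    return g_m * (n + 1) * r_opt


if __name__ == "__main__":
    n = 2  # hypothetical stage index
    for scale in (0.1, 1.0, 10.0, 1000.0):
        c_n = scale * C_GS
        print(f"C_n = {scale:7.1f} x C_gs -> gain = {stage_gain(c_n, n):6.3f} "
              f"(asymptote {gain_asymptote(n):.1f})")
```

At $C_n = C_{gs}$ the gain sits at exactly half of the asymptote, and a very large $C_n$ (a nearly bypassed gate) approaches the cascode-like limit discussed above.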

B. Phase Shift/Rotation Impact on the Performance of Stacked MOS PA
The real part of the Y n , i.e. Re {Y n } in (3), is the desired term which was discussed in the previous section. The imaginary part of it, i.e. Im {Y n }, however is the unwanted term which directly impacts the performance of the stacked MOS PA topology. This aspect has been mostly ignored in previous analyses.
In fact, the presence of parasitic elements causes the angles, $\theta$, of the drain-source voltage vectors to rotate gradually per stacked MOS transistor along the PA. From (3) we have (9). One can recognize the dependence of (9) on the device dimensioning, biasing, transistor parasitics, operating frequency, and $C_n$. In other words, $\theta$ follows a complex functional dependence on all the mentioned parameters, of the form $\theta = f(W/L, I_{ds}, V_{gs}, g_m, C_{gs}, C_{gd}, C_n, \omega)$. With the assumption of uniform phase rotation, superposition of all the drain-source voltages yields the maximum amplitude of the output of interest ($V_{D_N}$ in Fig. 1), where $V_{DS_n}$ is $V_m e^{jn\theta}$, wherein $V_m$ and $\theta$ express the maximum voltage swing and the uniform phase rotation across the drain-source junctions. The output power can then be expressed by (10). As the load $R_L$ is distributed along the $N$ stages and each stage is designed to match its optimal load, i.e. $R_{opt}$, (10) must be modified by $R_L = NR_{opt}$, which yields (11). The first term in (11), i.e. I, is the maximum power that can be obtained from a non-stacked, i.e. single-device, MOS transistor PA. We call the second term, i.e. II, the Stacking Factor (SF); it shows the dependence of the output power of the mentioned PA topologies on the number of stages as well as on the phase variation across each of them. As explained in the remaining part of this section, the SF is used to define the maximum number of stages in a stacked MOS PA topology.
Plotting the SF versus the number of stacked transistors based on the analytical equations, i.e. the second term in (11), for different phase variations proves highly informative, as it quantifies the relationship between the undesired phase rotation and the output power (Fig. 5). In the absence of phase rotation, i.e. $\theta = 0^\circ$, which represents the ideal condition, the SF increases in direct proportion to the number of transistors, i.e. $SF = N$, whilst this is not the case once phase rotation manifests along the stack. Moreover, the larger the phase rotation, the more drastically the output power degrades from its ideal case.

Fig. 6. Voltage gain curvature as a function of phase rotation per stage. The green area is the optimal region, where voltage/power increases with a gradual efficiency decrease; the yellow area is the non-optimal region, wherein the voltage increases but the efficiency decreases drastically; and the red area is the inoperable region, where both power and efficiency decrease.
Deduced from Fig. 5, it is of great importance to define the optimum number of stacked transistors. The reason can be understood from Fig. 5 and that is, after some point, adding more stages to the PA will start to deteriorate the performance of the PA.
There exist several different approaches to figure out the maximum number of stacked transistors. The first one is to read the optimum number of stacked transistors intuitively from the SF plot of Fig. 5. As shown in Fig. 6, the second one is to plot the voltage signal swing of (9). As another approach, one can maximize (11) with respect to the number of stacked MOS transistors, i.e. N, for a given phase rotation per stack/stage, i.e. θ . Last but not least is to derive a formula that illustrates to what extent the power can be amplified, as will be shown later. All the above approaches are discussed in the following.
The impact of phase rotation along the transistors has already been expressed in Fig. 5. As mentioned earlier, the number of stacks cannot be increased without bound. Rather, after some point the SF starts to decrease as the number of transistors increases. Hence, adding more stages beyond that point will not improve performance anymore. It can already be seen that this point, which is the point of maximum output power, is a function of the phase rotation. For example, if the phase shift along each stage is 10°, the maximum number of stacked transistors, i.e. the point of maximum output power, is around 13, and adding further stages only results in performance degradation. This corresponds to a total phase shift of approximately 130°.
By the same token, for single-stage phase rotations of 15° and 20°, the maximum number of transistors is 9 and 7, which corresponds to total phase rotations of 135° and 140°, respectively. These numbers all lie around a unique optimal total phase rotation boundary, which is described in the following.
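The stage counts quoted above can be reproduced with a few lines of Python. The sketch assumes that the stacking factor takes the geometric-sum form $SF = \sin^2(N\theta/2) / (N\sin^2(\theta/2))$, which is what the superposition of $N$ phasors $V_m e^{jn\theta}$ delivered into $R_L = NR_{opt}$ gives. This closed form is reconstructed here from the description and is not copied from (11); the function names are illustrative.

```python
import math


def stacking_factor(n_stages, theta_deg):
    """Assumed stacking factor: |sum of N unit phasors|^2 / N."""
    half = math.radians(theta_deg) / 2.0
    if half == 0.0:
        return float(n_stages)  # ideal case without phase rotation
    return math.sin(n_stages * half) ** 2 / (n_stages * math.sin(half) ** 2)


def best_stage_count(theta_deg, n_limit=40):
    """Stage count (1..n_limit) that maximizes the assumed stacking factor."""
    return max(range(1, n_limit + 1), key=lambda n: stacking_factor(n, theta_deg))


if __name__ == "__main__":
    for theta in (10, 15, 20):
        n_best = best_stage_count(theta)
        print(f"theta = {theta:2d} deg -> N = {n_best:2d}, "
              f"total phase = {n_best * theta} deg, "
              f"SF = {stacking_factor(n_best, theta):.2f}")
```

Under this assumption the maxima fall at 13, 9, and 7 stages for 10°, 15°, and 20° per stage, in line with the values read from Fig. 5.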
To further analyze the previous statements, (9) is plotted in the polar form of Fig. 6 to show both the phase rotation and the voltage increase per stack. In accordance with Fig. 5, the voltage amplitude increases up to some maximum level, after which the signal starts to degrade when more stages are added. To complete the foregoing discussion, the maximum voltage amplitude occurs at a phase of 180°, after which the amplitude starts to degrade. Exploring Fig. 6 gives more insight into the supply requirement of a stacked MOS PA design. Let us start with the example of a phase rotation of 10° per stage. Adding the first 12 stages increases the output voltage amplitude to a level of approximately $10 \times V_m$, where $V_m$ is the maximum tolerable drain-source voltage of each single stacked MOS transistor. To obtain this signal amplitude at the output, $12 \times V_m$ is required for the DC biasing of the overall structure. This means that the maximum efficiency of the PA reduces to 83 percent of the maximum theoretical efficiency. This corresponds to a total phase rotation of 120° over all the stages. Although adding more stages will increase the maximum amplitude, the output amplitude varies only marginally: adding, for example, the next 8 stages will not even add $2 \times V_m$ more signal swing, while the PA requires $8 \times V_m$ more DC supply. Hence, about 75 percent of the added DC power is lost in such circumstances.
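The numbers in this example follow directly from adding the per-stage phasors. The Python sketch below sums $N$ phasors $V_m e^{jn\theta}$ with $V_m$ normalized to 1; the starting index of the sum (0 here) is an assumption that does not affect the magnitude.

```python
import cmath
import math


def output_amplitude(n_stages, theta_deg, v_m=1.0):
    """Magnitude of the sum of n_stages phasors rotated by theta per stage."""
    theta = math.radians(theta_deg)
    return abs(sum(v_m * cmath.exp(1j * n * theta) for n in range(n_stages)))


if __name__ == "__main__":
    amp_12 = output_amplitude(12, 10)
    amp_20 = output_amplitude(20, 10)
    print(f"12 stages: {amp_12:.2f} x V_m from a 12 x V_m supply "
          f"({100 * amp_12 / 12:.0f}% of the ideal swing)")
    print(f"20 stages: {amp_20:.2f} x V_m, i.e. only {amp_20 - amp_12:.2f} x V_m "
          f"gained from 8 x V_m of extra supply")
```

For 10° per stage this gives roughly $9.9 \times V_m$ after 12 stages and less than $2 \times V_m$ of additional swing for the next 8 stages, matching the figures quoted above.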
Other amounts of phase rotation per stage converge to the same 120° boundary. The cases of 10°, 15° and 20° are shown in Fig. 6. This is still an intuitive method of estimating the optimum number of transistors in a stacked MOS PA. In the following, we provide analytical approaches to characterize the boundary theoretically. It should be noted that stacked MOS PAs may be utilized in this respect as long as no feedback is applied to the PA circuit, which is the case in many applications.
In order to maximize (11) with respect to the number of stacked transistors, i.e. $N$, for a given phase rotation per stage, i.e. $\theta$, the derivative of the SF term in (11) must be calculated. Thus, we have (12). Solving (12) for the maximum number of transistors, i.e. $N_{max}$, results in (13). Equation (13) is nonlinear and must be solved numerically. Any nonlinear root-finding method can be used without loss of generality; here Newton's method has been used. The maximum number of transistors ($N_{max}$) in the stack that optimizes the performance is shown in Fig. 7 as a function of the phase rotation per stage.
Interestingly, the product $N_{max} \times \theta$ is always constant and equal to 133.6°; this is shown in Fig. 8. This is a counterpart of the gain-bandwidth product, which can be used as a rule of thumb in the design of stacked MOS PAs. Given $\theta$, obtained from either simulation or transistor parameters, the maximum and/or optimum number of transistors that can be stacked can be found. Conversely, for a required number of stacked transistors, $\theta$ should be kept below the value implied by this product for optimal performance.
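The constant product can be checked with a short Newton iteration. The sketch assumes the stacking factor $\sin^2(N\theta/2)/(N\sin^2(\theta/2))$ and treats $N$ as continuous; setting its derivative with respect to $N$ to zero gives $\tan x = 2x$ with $x = N\theta/2$, written below as $\sin x - 2x\cos x = 0$ to avoid the pole of the tangent. This condition is derived here and is not copied from (13); the starting point `x0` is an arbitrary choice close to the root.

```python
import math


def solve_total_phase(x0=1.2, tol=1e-12, max_iter=50):
    """Newton's method on g(x) = sin(x) - 2*x*cos(x); returns N_max*theta in rad."""
    x = x0
    for _ in range(max_iter):
        g = math.sin(x) - 2.0 * x * math.cos(x)
        dg = -math.cos(x) + 2.0 * x * math.sin(x)  # derivative of g
        step = g / dg
        x -= step
        if abs(step) < tol:
            break
    return 2.0 * x  # total phase N_max * theta


TOTAL_PHASE_DEG = math.degrees(solve_total_phase())

if __name__ == "__main__":
    print(f"N_max x theta = {TOTAL_PHASE_DEG:.1f} deg")
    for theta in (10, 15, 20):
        print(f"theta = {theta} deg -> N_max = {TOTAL_PHASE_DEG / theta:.2f}")
```

The root is independent of $\theta$, which is why the product stays constant; dividing it by the phase rotation per stage gives a (non-integer) estimate of $N_{max}$.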
Finally, it is also informative to calculate $P_{out}(N)$ from (11) for $N$ and $N-1$ transistors and divide the two, in order to find the optimum number of transistors that still offers additive power gain. In other words, the power ratio $P(N)/P(N-1)$ is solved for $N$ to find the maximum number of transistors for which the power ratio gain is still greater than unity. Thus, we have (14). A plot of (14) against the number of transistors for different values of phase rotation per stage is shown in Fig. 9. Consider the curve corresponding to a 10° phase rotation per stack: for the first 13 stages the power ratio gain is still above unity. In other words, if the phase rotation per stage is 10°, the PA still offers power gain when up to 13 transistors are added. From the 14th stage onwards, the additional stages only act as attenuators. A similar statement can be made for other phase rotation values as well. This is plotted in Fig. 9 for phase rotations of 10°-40° per stack, in 5° steps.
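The power-ratio criterion can be sketched as follows. With the assumed stacking factor $\sin^2(N\theta/2)/(N\sin^2(\theta/2))$, the ratio of the output powers for $N$ and $N-1$ stages reduces to $\frac{N-1}{N}\,\sin^2(N\theta/2)/\sin^2((N-1)\theta/2)$. This expression is reconstructed from the description rather than copied from (14), and the helper names are illustrative.

```python
import math


def power_ratio(n_stages, theta_deg):
    """Assumed P(N)/P(N-1) for a uniform phase rotation theta per stage."""
    half = math.radians(theta_deg) / 2.0
    sin_ratio = math.sin(n_stages * half) / math.sin((n_stages - 1) * half)
    return (n_stages - 1) / n_stages * sin_ratio ** 2


def last_useful_stage(theta_deg):
    """Largest N for which adding the N-th stage still increases the power."""
    n = 1
    while power_ratio(n + 1, theta_deg) > 1.0:
        n += 1
    return n


if __name__ == "__main__":
    for theta in range(10, 45, 5):
        print(f"theta = {theta:2d} deg -> last stage with ratio > 1: "
              f"{last_useful_stage(theta)}")
```

For 10° per stage the ratio stays above unity up to the 13th stage and drops below it at the 14th, consistent with the reading of Fig. 9.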
All the phase rotation calculations discussed tend to converge to the same total phase of approximately 130°, which was already introduced in the previous paragraphs.

IV. PHASE ROTATION COMPENSATION
Initially, the capacitances $C_n$ at the gate of each stacked MOS transistor (Fig. 1) have been designed to fulfill two functions: 1) to act as a capacitive voltage divider that limits the voltage swing across the junctions [7]-[11], [13], [9], [16]-[21], and 2) to tune the real part of the input impedance to the optimal load for the preceding stage, i.e. $R_{opt_n}$, as proposed by (5). However, as pointed out in (1)-(5), the admittance seen by each transistor essentially carries an imaginary part $B_n$, which needs to be compensated for. The impact of $B_n$ was also discussed in the previous section. As was shown through (9)-(14), $B_n$ gives rise to the phase rotation per stack and hence to performance degradation. There also exists a mismatch between the susceptances looking upwards and downwards. In other words, $B_{in_{n+1}}$ is not necessarily equal to $B_{out_n}$. This becomes clear when calculating the imaginary parts of the admittances, which is the very first origin of the discrepancy.
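Thus, using the high-frequency transistor model of Fig. 2 and two stacked MOS transistors in the middle of the structure (Fig. 10), in the presence of the drain-source compensation capacitances $C_{ds_i}$ and neglecting the channel length modulation effect, we have

$$B_{in_{n+1}} \approx \omega_n\left[\frac{C_{gs_{n+1}}}{g_{m_{n+1}}R_{opt}} - C_{ds\text{-}par_{n+1}} - C_{ds_{n+1}}\right] \qquad (15)$$

and

$$B_{out_n} \approx \omega_n\left[C_{gd_n}g_{m_n}R_{opt} + C_{ds\text{-}par_n} + C_{ds_n} + C_{gd_n}\right]. \qquad (16)$$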
The terms $C_{ds\text{-}par_i}$ in (15) and (16) are the MOS transistor drain-source parasitic capacitances. It is seen that (15) differs from (16), which introduces another design inaccuracy due to the discrepancy between the admittances seen in the two directions; this must be compensated for. The issues above were neglected in [8]-[10], [12]-[15] when dimensioning the devices. Equating (16) to the conjugate of (15), we have

$$\omega_n\left[\frac{C_{gs_{n+1}}}{g_{m_{n+1}}R_{opt}} - C_{ds\text{-}par_{n+1}} - C_{ds_{n+1}}\right] = -\omega_n\left[C_{gd_n}g_{m_n}R_{opt} + C_{ds\text{-}par_n} + C_{ds_n} + C_{gd_n}\right]. \qquad (17)$$
Solving (17) for $C_{ds_{n+1}}$ results in

$$C_{ds_{n+1}} = \frac{C_{gs_{n+1}}}{g_{m_{n+1}}R_{opt}} - C_{ds\text{-}par_{n+1}} + C_{gd_n}g_{m_n}R_{opt} + C_{ds\text{-}par_n} + C_{ds_n} + C_{gd_n}. \qquad (18)$$

Fig. 11. Simplified cross-section of the deep n-well (DNW) process and its most dominant parasitics in bulk CMOS technology.
It should be noted that the first stage does not need to perform compensation, so $C_{ds_1}$ is forced to 0. Thus, we have (19). The dimensioning rule of (19) guarantees proper phase detuning along the stack. Equation (19) accounts for the Miller effect of $C_{gd}$, i.e. the term $C_{gd}g_mR_{opt}$, in phase compensation and/or inter-stage matching. Simulations prove its importance in multi-stack PA design in the mm-wave regime. This term is, however, missing from the compensation method proposed in [8] and [15]. Although the calculations were performed for the proposed negative capacitance compensation method, the approach can be applied to other detuning techniques without loss of generality. More importantly, $C_{ds_{n+1}}$ can be dimensioned to compensate for more parasitics as well.
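The recursion implied by (18) with $C_{ds_1} = 0$ can be sketched in Python as shown below. Two assumptions are made and should be checked against the original equations: the gate-source term is read as the ratio $C_{gs}/(g_mR_{opt})$, while the Miller term is read as the product $C_{gd}g_mR_{opt}$. The device values $C_{gs}$, $C_{gd}$, and $g_m$ are those quoted earlier for the 4-stack example; `R_OPT` and the parasitic `C_DS_PAR` are hypothetical, and all stages are taken as identical.

```python
C_GS = 450e-15      # F, value quoted in the text
C_GD = 50e-15       # F, value quoted in the text
G_M = 0.6           # S, value quoted in the text
R_OPT = 10.0        # ohm, hypothetical illustration value
C_DS_PAR = 20e-15   # F, hypothetical drain-source parasitic


def compensation_caps(n_stages, c_gs=C_GS, c_gd=C_GD, g_m=G_M,
                      r_opt=R_OPT, c_ds_par=C_DS_PAR):
    """Drain-source compensation capacitors for identical stages.

    Follows the recursion read from (18); the first stage is forced to zero.
    """
    c_ds = [0.0]
    for _ in range(1, n_stages):
        c_next = (c_gs / (g_m * r_opt)      # gate-source term (assumed ratio)
                  - c_ds_par                # parasitic of the upper stage
                  + c_gd * g_m * r_opt      # Miller term of the lower stage
                  + c_ds_par                # parasitic of the lower stage
                  + c_ds[-1]                # compensation cap of the lower stage
                  + c_gd)
        c_ds.append(c_next)
    return c_ds


if __name__ == "__main__":
    for stage, cap in enumerate(compensation_caps(4), start=1):
        print(f"C_ds_{stage} = {cap * 1e15:.0f} fF")
```

With identical stages the parasitic terms cancel, so each step adds the same increment; the Miller term dominates that increment for the values used here.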

A. Bulk CMOS Considerations
As explained in [21], to reduce the body effect on the AM-AM conversion it is recommended to utilize triple-well technique [7]. The body isolation based on the mentioned process technique however poses two issues: a diode and a parasitic capacitance are formed between the deep N-well (DNW) and P-well, which must be considered when designing the stacked MOS PA (Fig. 11).
The effect of the former can be simply minimized by reverse biasing the PW-DNW junction diode, however, to compensate the effect of the parasitic capacitance, formed by p-well and the DNW, one needs to consider the bias dependence of the mentioned parasitic capacitance (Fig. 12) [21]. Fig. 10 needs modification to take triple-well bulk MOS parasitics into account. This is shown in Fig. 13. Calculating the susceptances looking upwards and downwards, equating them, and solving for compensating drain-source capacitances, yields (20), [21].

V. AMPLITUDE TO AMPLITUDE/PHASE CONVERSION DISTORTION
Up to this point, all the analyses were based on the small-signal approximation, where the transistor parameters vary negligibly, if at all. As soon as the input signal grows beyond this assumption, the PA manifests nonlinear behavior. A direct consequence is gain compression, known as amplitude-to-amplitude (AM-AM) conversion distortion. Also, due to the presence of both intrinsic and extrinsic dynamic components, such as the parasitic capacitances and the gate capacitive voltage division network, amplitude-to-phase (AM-PM) conversion distortion is inevitable [4]-[33].

A. AM-AM Conversion Distortion
After the gate capacitances, i.e. $C_{n+1}$, have been fixed in the PA design based on (5), the values of $R_{opt_n} = 1/\mathrm{Re}\{Y_{in_n}\}$, expressed in (4), are ideally required to remain constant. However, the previous sections showed that this is not the case over frequency. Moreover, $g_m$ and $C_{gs}$ are also amplitude-dependent parameters (Fig. 14), which alter the value of $\mathrm{Re}\{Y_{in_n}\}$ and move $R_{opt_n} = 1/G_{opt_n}$ away from its desired value (Fig. 14(c)). This in turn degrades the gain, which translates to AM-AM conversion distortion. It should be noted that $C_{gd}$ is also an amplitude-dependent parameter, with a minor impact compared to the foregoing parameters (Fig. 14(b)). The impact of $C_{gd}$ on $\mathrm{Re}\{Y_{in_n}\}$ manifests as gain compression through the Miller effect, which is already accounted for in (4). Hence, including only the transconductance compression in the calculations is a fair approximation.
It should be noted that a 4th-order polynomial has been utilized for estimating the $R_{opt}$'s in Fig. 14(c). This order is chosen to keep hand calculations simple. Although such a simple low-order polynomial gives a wrong estimate on the negative signal swing side, increasing the order complicates the calculations. Based on the simulation results presented in Section VI, such lower-order polynomials match the final design very well; however, the values evidently need fine tuning for a better match. It is possible to express the input-output characteristics of the PA based on Volterra/power series [28]. However, to keep the analysis simple enough for hand calculations, the effect of higher-order nonlinearities on the first term of the Volterra series is considered in the following analysis, i.e. $V_{out} = \sum_n a_n V_{in}^n$, wherein the $a_n$'s are yet to be determined for the total amplitude-dependent output signal as well as for the gain of the PA. Given $V_{in} = A_c\cos(\omega t)$, the fundamental harmonic of the output signal, and hence the gain of the PA, can be expressed using the binomial formula. Based on the concept of Fig. 1, the overall output signal of a stacked MOS PA (or, in general, of any technology) is the accumulation/summation of the signals across the drain-source of each single stage, i.e. $v_{ds_n}$. Thus, the overall gain can be expressed as the accumulation of the gains of all the stages, i.e. $A_{V_{total}} = \sum_n v_{ds_n}/V_{in}$.
In case of identical dimensioning the total gain can be written as Since the parameter G m (V in ) is technology dependent and the governing equation of the R opt n (V in ), i.e. (4), is quite complex, their values can be estimated with several different methods. Here, due to its widespread application, the power series approximation method has been used to estimate both (Fig. 14).
where the $g_{i,n}$'s and $r_{i,n}$'s are fitting parameters extracted for the technology node of interest and the optimum load of interest at the desired quiescent bias point, respectively. With a single-tone sinusoidal continuous wave (CW) input of $V_{\mathrm{in}} = A_c \cos(\omega t)$, and given that the higher-order harmonics are filtered out, keeping odd harmonics up to the 5th term and substituting (23) and (24) in (22) yields (25), as shown at the bottom of the next page, which can be simplified for identical stacks. Given the fitting parameters $g_{i,n}$ and $r_{i,n}$, the output voltage and/or gain can be plotted versus input amplitude, and hence the AM-AM conversion distortion can be estimated. This is shown in the simulation results section.
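The AM-AM estimate described above can be sketched numerically. The snippet below is a minimal illustration, assuming that the per-stage output is $v_{\mathrm{out}} = G_m(v_{\mathrm{in}})\,R_{\mathrm{opt}}(v_{\mathrm{in}})\,v_{\mathrm{in}}$ with both factors given as power series in $v_{\mathrm{in}}$; this form, the function name `fundamental_gain`, and the coefficient values are assumptions made for illustration, not values from this work, and the sketch does not attempt to reproduce the exact coefficients of (25). The fundamental is obtained by projecting the output onto $\cos(\omega t)$ over one period, which mimics filtering out the higher harmonics.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def fundamental_gain(g, r, a_c, n_samples=256):
    """Gain at the fundamental for v_out = Gm(v_in) * Ropt(v_in) * v_in.

    g, r : power-series coefficients in ascending order (c0 + c1*v + ...).
    a_c  : amplitude of the single-tone input v_in = a_c * cos(theta).
    """
    theta = 2.0 * np.pi * np.arange(n_samples) / n_samples
    v_in = a_c * np.cos(theta)
    g_m = P.polyval(v_in, g)      # amplitude-dependent transconductance
    r_opt = P.polyval(v_in, r)    # amplitude-dependent optimum load
    v_out = g_m * r_opt * v_in
    # Fourier projection onto cos(theta): keeps only the fundamental.
    a_1 = 2.0 * np.mean(v_out * np.cos(theta))
    return a_1 / a_c


# Hypothetical fitting coefficients (illustrative only, not extracted data).
g_fit = [0.05, 0.0, -0.02]   # g0, g1, g2
r_fit = [12.5, 0.0, -2.0]    # r0, r1, r2

amplitudes = np.linspace(0.05, 0.6, 12)
gains = np.array([fundamental_gain(g_fit, r_fit, a) for a in amplitudes])
# AM-AM curve: gain change relative to the small-signal gain g0 * r0, in dB.
am_am_db = 20.0 * np.log10(gains / (g_fit[0] * r_fit[0]))
```

Plotting `am_am_db` against `amplitudes` gives a gain-compression curve of the kind discussed above. For a stack of identical stages the same curve applies to each stage, so under this assumption the total gain is simply scaled by the number of stages.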

B. AM-PM Conversion Distortion
When calculating (4), $\mathrm{Im}(Y_n) = B_n$ was assumed to be fully compensated, i.e. it is required to be zero. As explained in the previous subsection, the susceptance of the load presented to each stack varies with the amplitude of the input signal. In other words, $B_n$ does not remain zero over the whole input amplitude range. This translates to a phase variation of the load seen by each stack, which must be quantified to assess the impact of AM-PM conversion on the modulation schemes.
To define the variation of the susceptance from the ideal zero value, called $\Delta B_n$, first the non-compensated value is extracted from (26), as shown at the bottom of the page, where $b_m$ and $d_m$ are defined in (27). $b_m$ and $d_m$ are complementary modifications of the coefficients "b" and "d" defined in (4.b) and (4.d), respectively.
Subtracting the compensated nominal value defined by (20) from (26) gives $\Delta B_n$. The susceptance variation $\Delta B_n$ versus input signal is plotted in Fig. 15. By the same token, $\Delta B_i$ can be approximated using third-order polynomials, as depicted in Fig. 15. To proceed with the quantification of the AM-PM conversion distortion, the phase rotation per stage is calculated as
The "eff" subscript in (28) denotes the root mean square (RMS) value calculated from the polynomial estimate of the corresponding parameter. The AM-PM conversion distortion due to input matching is still required to complete the analysis. In this respect, the approach proposed in [31] is followed to support the rest of the analysis in this section. To express the effective values of the terms in (28), i.e. $R_{\mathrm{opt},n,\mathrm{eff}}$ and $\Delta B_{n,\mathrm{eff}}$, their corresponding nominal values expressed in (24) and (29) are first plotted against input amplitude, as shown in Figs. 14(c) and 15, respectively.
The plots are then approximated by power series curve fitting to extract the coefficients (Figs. 14(c) and 15). Substituting $V_{\mathrm{in}} = A_c \cos(\omega t)$ into the extracted polynomials and calculating the RMS values yields $R_{\mathrm{opt},n,\mathrm{eff}}$ and $\Delta B_{n,\mathrm{eff}}$, given in (30) and (31), respectively, as shown at the bottom of the next page. By the same token, the variation of the effective value of the gate-source capacitance, i.e. $C_{gs,\mathrm{eff}}$, is obtained from Fig. 14(b) and expressed in (34), as shown at the bottom of the next page.
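The RMS ("eff") extraction described above can be sketched as follows. This is a minimal illustration, assuming each fitted quantity is a polynomial in $V_{\mathrm{in}}$ evaluated over one period of $V_{\mathrm{in}} = A_c \cos(\omega t)$. The helper names and the coefficient values are hypothetical. The phase expression used here is the standard angle of a parallel load $G + jB$ with $G = 1/R_{\mathrm{opt}}$; it is an assumption and is not necessarily identical to (28).

```python
import numpy as np
from numpy.polynomial import polynomial as P


def rms_over_period(coeffs, a_c, n_samples=256):
    """RMS of a polynomial p(v) over one period of v = a_c * cos(theta).

    coeffs are in ascending order (c0 + c1*v + c2*v**2 + ...).
    """
    theta = 2.0 * np.pi * np.arange(n_samples) / n_samples
    values = P.polyval(a_c * np.cos(theta), coeffs)
    return float(np.sqrt(np.mean(values ** 2)))


def load_phase_deg(r_opt_eff, delta_b_eff):
    """Phase angle (degrees) of a parallel load G + jB with G = 1/r_opt_eff."""
    return float(np.degrees(np.arctan(delta_b_eff * r_opt_eff)))


# Hypothetical fitted polynomials (illustrative only, not extracted data).
r_opt_coeffs = [12.5, 0.0, -2.0, 0.0, 0.5]    # 4th-order fit of R_opt, in ohms
delta_b_coeffs = [0.0, 2e-3, 1e-3, -5e-4]     # 3rd-order fit of the susceptance variation, in siemens

a_c = 0.4  # input amplitude in volts (assumed)
r_eff = rms_over_period(r_opt_coeffs, a_c)
b_eff = rms_over_period(delta_b_coeffs, a_c)
phase_per_stage = load_phase_deg(r_eff, b_eff)
```

Sweeping `a_c` and recording `phase_per_stage` gives a per-stage phase-versus-amplitude curve. Note that an RMS value is non-negative by construction, so the sign of the susceptance variation is not preserved in this sketch.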
Accordingly, the overall phase rotation with respect to amplitude variations, i.e. the AM-PM conversion distortion, throughout the whole structure can be expressed as in (35), as shown at the bottom of the page.
$$\cdots\; g_{0,n} r_{0,n} + A_c^{2}\left(\frac{g_{2,n} r_{0,n}}{4} + \frac{3 g_{1,n} r_{1,n}}{8} + \frac{3 g_{0,n} r_{2,n}}{4}\right) + A_c^{4}\left(\frac{g_{4,n} r_{0,n}}{8} + \frac{5 g_{3,n} r_{1,n}}{32} + \frac{5 g_{2,n} r_{2,n}}{24} + \frac{5 g_{1,n} r_{3,n}}{16} + \frac{5 g_{0,n} r_{4,n}}{8}\right) \qquad (25)$$
Fig. 17. Comparison between the simulated results of the proposed PA design dimensioning rule of (5) and that of the conventional method explained in [15]. In both methods, the phase compensation method of (20) [21] has been applied. Simulations based on the TSMC 28nm bulk CMOS PDK were used for both methods.

VI. SIMULATION RESULTS
In order to verify the proposed design methodology, first the impact of the proposed gate capacitance, $C_n$, dimensioning on the performance of the stacked MOS PA topology was simulated and compared to that of the conventional model presented in [15]. Then, a 28GHz four-stack CMOS PA was designed and simulated in a 28nm bulk 8M1P CMOS technology (Fig. 16). Table I summarizes the power supply rail and the transistor parameters. An $R_L$ of 50 Ω defines the optimum load, $R_{\mathrm{opt}}$, to be 12.5 Ω per stacked transistor. Resistors $R_1$-$R_5$ were used for DC biasing of transistors $M_1$-$M_4$. Their resistance values were chosen to be much higher than the impedances of the gate capacitances $C_2$-$C_4$ in the desired frequency band. Each transistor has 32 gate fingers with a total width of 600nm.
Fig. 17 shows the simulated power gain, $G_p$, saturated output power, $P_{\mathrm{sat}}$, and maximum PAE, $PAE_{\max}$, at different frequencies for a 4-stack MOS PA designed with the method introduced in [15] and with the method proposed in this work. Using (5) along with (20), which was applied in both cases, improves the power gain by approximately 8dB, the maximum output power by 4dB, and the maximum efficiency by 7% at the maximum operating frequency of 80GHz, which corresponds to an increase of almost 200% in gain, 30% in output power, and 450% in efficiency, respectively. This performance increase is essential in mm-wave applications.
Fig. 18 displays the EM structure of the layout of a 28GHz 4-stack MOS PA simulated in ADS Momentum. To approach realistic circuit behavior, the EM structure includes all the interconnects from the signal pads to the matching networks and then to the input/output of the PA, as well as from the biasing pads to the biasing nodes of the PA. In other words, every metallic interconnect from M1 up to and including M8 is included in the EM structure.
It is worth bearing in mind that an ideal ground is assumed only over the extent of the ground pad, consistent with the probe tips, and that the ground laid out inside the chip does not include any ideal ground connection. For this reason, the stability simulations include the effects of the ground network. The simulation was configured to use adaptive frequency sampling (AFS) with the microwave engine (μW-Eng) to better account for radiation losses and for coupling within and between the metallic routings. It should be noted that the active parts were extracted based on the provided PDK model along with the parasitic extraction routine.
As mentioned in section IV, due to the mismatch between the susceptances of consecutive stages as well as the additional pw-dnw capacitance, phase misalignment between the stacked transistors degrades the performance. To compensate for this misalignment, the method presented in [21] was deployed, the impact of which over different frequency bands is shown in Fig. 19. The method offered a ∼20% improvement in phase alignment between the transistors, which improved the gain and input match, while the reverse isolation and output match were somewhat degraded. Large-signal properties of the designed 28GHz 4-stack PA are plotted in Fig. 20. Based on the design guidelines in this work along with the phase compensation method of [21], the performance of the designed PA is improved by almost 30% in power gain and 200% in PAE, and it delivers 5dB more output power.
The AM-AM (orange dash-dotted and yellow dotted lines) and AM-PM (solid blue and gray dashed lines) conversion properties of the designed PA are illustrated in Fig. 21. Based on (21)-(35), the estimations were calculated for each single stage separately and then added up to form the final AM-AM/PM distortion. Fig. 21 shows good agreement between the simulated results and the theoretical analysis described in this work.
To gain insight into the impact of the quality factor, Q, of the gate capacitances $C_n$ and the drain-source capacitances $C_{ds,n}$, the PDK capacitances were replaced by ideal capacitances in series with ideal resistors, and the resistor values were swept in the simulation setup to sweep the Q factor. The corresponding results are shown in Fig. 22.
As can be seen from Fig. 22, as long as the Q factor of these capacitances is above 5, the performance of the PA remains within 0.5dB and/or 0.5% of its infinite-Q counterpart. The simulated Q factors of the capacitances laid out using the mentioned PDK are above 25, which satisfies the design requirements.
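The Q sweep described above relies on the standard relation for a capacitor in series with a resistor, $Q = 1/(\omega R_s C)$. The following is a minimal sketch of how the series resistance for a target Q could be chosen; the capacitance value and the helper names are hypothetical and are not taken from this work.

```python
import math


def series_resistance_for_q(q, capacitance_f, freq_hz):
    """Series resistance (ohms) that gives a capacitor the requested Q."""
    return 1.0 / (2.0 * math.pi * freq_hz * capacitance_f * q)


def q_of_series_rc(resistance_ohm, capacitance_f, freq_hz):
    """Q of an ideal capacitor in series with an ideal resistor."""
    return 1.0 / (2.0 * math.pi * freq_hz * capacitance_f * resistance_ohm)


FREQ_HZ = 28e9      # design frequency of the simulated PA
CAP_F = 100e-15     # hypothetical 100 fF capacitance (assumed value)

# Series resistances corresponding to a sweep of Q values.
q_values = [2, 5, 10, 25, 50]
sweep = {q: series_resistance_for_q(q, CAP_F, FREQ_HZ) for q in q_values}
```

A higher Q corresponds to a smaller series resistance, so sweeping the resistor from large to small values moves the capacitor from lossy towards its ideal, infinite-Q behavior.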

VII. CONCLUSION
The impact of high-frequency operation on stacked MOS PAs was studied in this paper. Based on the analysis, it was shown that the traditional device dimensioning is only valid for frequencies up to $f_t/10$. Above that frequency, it was demonstrated that the optimal load degrades drastically, necessitating a modification of the device dimensioning, which was proposed in this paper. The impact of the phase rotation on the performance as well as the optimum number of stacked transistors were studied, and it was shown that the product $N_{\max} \times \theta$ is always constant and equal to 133°. After reviewing a negative capacitance compensation method, the AM-AM and AM-PM conversion distortion due to the variation of the optimum load was studied based on sensitivity analysis. The theoretical expressions were evaluated against simulations. Finally, a 28GHz 4-stack MOS PA was designed and co-simulated using EM tools along with the passive structures of the circuit. The simulation results confirm the validity of the analysis presented in this work. The results in this paper will help to minimize the inevitable performance degradation as a function of operating frequency in PA design.
Mohammad Hassan Montaseri received the M.Sc. degree in electrical engineering from the University of Mazandaran, Babol, Iran, in 2010.
He is currently with the University of Oulu, Oulu, Finland. His research interests include RF front-end design for mm-wave/(sub) THz ICs for wireless communications applications.
He leads the devices and circuits research area in the 6G flagship program financed by the Academy of Finland. He has authored and coauthored one book, two book chapters, and more than 150 international journal and conference papers, and holds several patents. He is also one of the original contributors to the Bluetooth low energy extension, now called BT LE. His research interests include wireless systems and transceiver architectures for wireless communications, with special emphasis on RF and analog integrated circuit and system design.
Dr. Pärssinen served as a member of the Technical Program Committee of the International Solid-State Circuits Conference from 2007 to 2017, where he was the Chair of the European Regional Committee from 2012 to 2013 and the Chair of the Wireless Sub-Committee from 2014 to 2017. He served as a Solid-State Circuits Society Representative for the IEEE 5G Initiative from 2015 to 2019.