Limitations and Implementation Strategies of Interstage Matching in a 6-W, 28–38-GHz GaN Power Amplifier MMIC

In this article, we summarize the theoretical matching boundaries and show the limitations they imply for real-world amplifier design. Starting with a common schematic prototype, we investigate the question of how to realize its electrical response in a densely routed, massively parallelized layout. To that end, we develop a comprehensive study on the application of space-mapping techniques toward the design of high-power amplifiers (HPAs). We derive three reference design procedures and compare their performance in terms of convergence, speed, and practicality when laying out a densely routed HPA interstage matching network. Subsequently, we demonstrate the usefulness of the study by designing the networks of a compact three-stage eight-way wideband HPA in the Ka-band. The processed monolithic microwave integrated circuit features a 1-dB large-signal bandwidth of more than 11 GHz (a fractional bandwidth of 32.8%) and thus covers most of the Ka-band with an output power exceeding 6 W in 3 dB of gain compression. This demonstrates the highest combination of power and bandwidth to date using a reactively matched topology in the Ka-band.


I. INTRODUCTION
IN RECENT years, as solid-state technologies continue to increase their output power density and efficiency at millimeter-wave frequencies, solid-state amplifiers can be employed in applications that could previously only be realized using traveling-wave tube amplifiers (TWTAs). These applications include intersatellite communications, medical imaging, radar, and mobile communications. In the future, solid-state technology could allow engineers to overcome common drawbacks of traditional TWTAs [1], for example, their warm-up time, limited service life, and rather narrow usable bandwidth. These issues, especially the last point, can be improved upon considerably using III-V semiconductors, such as gallium arsenide or gallium nitride (GaN). In the Ka-band, research toward higher power GaN high-power amplifiers (HPAs) recently produced publications demonstrating up to 40 W of output power [2], [3] on a single monolithic microwave integrated circuit (MMIC). High-power and high-efficiency designs typically utilize a reactive matching network to obtain a large-signal impedance match of the high-electron-mobility transistors (HEMTs) employed in the circuit. These types of MMICs generally achieve up to about 20% of relative bandwidth (RBW) [4], [5]. On the other hand, using traveling-wave topologies, fractional bandwidths exceeding 50% have been shown [6], [7]. However, MMICs employing traveling-wave topologies may exhibit certain disadvantages, among them an uneven heat distribution [6] and comparatively low efficiency figures [8]. These are compelling reasons to strive for a reactive-type MMIC that reaches a relatively large bandwidth and still maintains reasonable power and efficiency figures.
When designing a matching network in a large and densely routed HPA MMIC, extensive coupling of neighboring structures as well as other distributed effects can dominate the RF response of the network. Since these effects are not accounted for in circuit models, it can be challenging to transfer a prototype (schematic) matching network to a layout such that the electrical responses are sufficiently similar, especially over a large bandwidth. Once we approach the fundamental matching boundary given by the Bode-Fano limit [9], [10], the accuracy to which the prototype network must be implemented is even more critical. This is because in a design that is close to the theoretical limit, additional parasitics will deteriorate the obtainable match inside the desired frequency range. In contrast, in a design that is considerably below the theoretical limit, additional parasitic elements can be absorbed without affecting the theoretically obtainable match.
Consequently, dedicated techniques are useful to systematically tweak a layout to approach the originally desired characteristic. Space mapping [11]-[15] is one such technique that has gained significant use in the design of filters [16]. While its usage in MMIC design has been reported before [17], this is the first article showing a detailed study on the properties and usefulness of this concept toward high-power MMIC design. We concentrate on the practical implementation of wideband interstage matching structures. To that end, we derive, implement, and compare two different algorithms: one of them more complex and close to the full formulation of space mapping [11] and a second one that features a reduced number of steps and is simpler to implement in an EDA program such as Keysight ADS.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article is organized as follows. In Section II, we introduce Fraunhofer IAF's GaN10 process and describe the properties of HEMT devices manufactured with it. Section III describes the theoretical boundaries of matching networks and examines the challenges arising from interstage matching networks (ISMN), where a complex impedance is matched to another complex impedance. A common topology of choice for an ISMN is described in Section IV. Section V contains a discussion of implementation approaches. The space-mapping technique is described, followed by the derivation of two concrete algorithms, a reference implementation, and a comparison in terms of performance and usability. Sections VI and VII give details on a wideband HPA MMIC design which was implemented using the study above. A comparison of the processed MMIC to the state of the art is given in Section VIII. Finally, the conclusion and outlook can be found in Section IX.

II. TECHNOLOGY AND DEVICE
Fraunhofer IAF's GaN10 technology features AlGaN/GaN HEMTs with 100-nm T-gates. As a substrate, 100-mm semi-insulating silicon carbide with a thickness of 75 µm is used. Passive circuitry is realized using an evaporated first and a galvanic second metal layer for increased current handling capability. Metal-insulator-metal (MIM) capacitors and nickel-chromium (NiCr) thin-film resistors are available as well as a full backside process with through-substrate vias. The transition frequency $f_T$ and maximum oscillation frequency $f_{max}$ of this process are in the range of 100 and 300 GHz, respectively [18]. This enables MMICs operating at W-band and beyond [19]. With a typical output power density of 2 W/mm at $V_{DS} = 15$ V, high-power designs exhibiting state-of-the-art efficiency can be realized.
For this research, we analyzed the perspective of broadband parallel power combining in the lower mmW frequency band. A previous design has shown promising results in the same frequency band [20] and is improved upon in this work. As shown in [20], the optimum HEMT periphery for the frequency band in question was found to be an eight finger device with a unit gate width of UGW = 60 µm for a total gate width of TGW = 0.48 mm. Therefore, they are used for this work as well.
A major challenge in the design of broadband MMICs is to find a matching network that compensates for (in fact, absorbs) the parasitic capacitances these devices exhibit at high frequencies. To that end, the HEMT gate can typically be modeled as a series (L)CR element, while its drain behaves like a parallel RC circuit [21]. Both of these equivalent circuits are labeled accordingly in Fig. 1. To derive appropriate values for the large-signal equivalent circuits, load-pull (LP) measurements [22] are a valuable basis. As an example, Fig. 2 is generated from an LP measurement at $f_0 = 34$ GHz. It shows the contours of constant $P_{out}$ and PAE in 2-dB gain compression. Each contour represents a step of 50 mW and 2% in $P_{out}$ and PAE, respectively. By selecting appropriate values for $R_{ds,ls}$ and $C_{ds,ls}$, we can synthesize matching networks that result in maximum PAE or maximum $P_{out}$ or choose a tradeoff goal. For a general-purpose PA, a tradeoff impedance in between the two maximum points is most sensible. In this way, we obtain as robust a design as possible and allow for measurement uncertainty and process variations, which could shift the optimum impedance point slightly.
Fig. 2. Device-level LP measurement results at 34 GHz. The red and blue shapes indicate contours of constant $P_{out}$ and PAE, respectively. The equivalent circuit's $S_{11}$ between 28 and 38 GHz is shown in green (a marker indicates its value at 34 GHz). The target match of $\Gamma_m \le -15$ dB is depicted as a black circle.
Using LP measurements at 30 and 34 GHz, we deduce the following values for the optimum device operating at $V_{ds} = 15$ V: the gate capacitance equals $C_{gs} = 443$ fF and the gate resistance is $R_g = 3.32\ \Omega$. For the HEMT drain, we find $C_{ds,ls} = 122$ fF and the drain loadline resistance $R_{ds,ls} = 55.5\ \Omega$. $S_{11}$ of the resulting drain equivalent circuit is shown as a green trace in Fig. 2 (28–38 GHz). The goal in the following will be to design an ISMN that achieves 85% of the maximum $P_{out}$ and PAE in the frequency band of interest. This equals an impedance range as depicted by the black circle in Fig. 2 around the tradeoff impedance, which represents a maximum reflection coefficient of $\Gamma_m \le -15$ dB with respect to the tradeoff impedance.

III. THEORETICAL MATCHING BOUNDARY
Given the passive equivalent circuit of an HEMT device, it turns out that there is an upper limit as to how broadband it can be matched in principle. The Bode-Fano criterion defines this limit, which is the maximum matching bandwidth that can be achieved for a given combination of capacitance and termination resistance [23]. In this context, we define matching bandwidth as the frequency range in which the reflection coefficient is below a certain threshold $\Gamma_m$.
For an RC-parallel element as shown in Fig. 1(b), the reflection coefficient is limited by the inequality [9]

$$\int_0^\infty \ln\frac{1}{|\Gamma(\omega)|}\,d\omega \le \frac{\pi}{\tau} \tag{1}$$

where $\tau$ is the RC time constant [24]. Assuming a constant reflection coefficient $\Gamma_m$ over the band of interest of width $\Delta\omega = \omega_2 - \omega_1$ and total reflection out of band, we can simplify (1) to

$$\Gamma_m \ge \exp\left(-\frac{\pi}{\Delta\omega\,\tau}\right) \tag{2}$$

which bounds the best achievable in-band reflection coefficient. In general, the total area available under the reflection coefficient curve (i.e., $S_{xx}$ in dB) is inversely proportional to the RC time constant $\tau$. Furthermore, the Bode-Fano limit for the HEMT input equivalent circuit, shown in Fig. 1 on the left, is given by

$$\int_0^\infty \frac{1}{\omega^2}\ln\frac{1}{|\Gamma(\omega)|}\,d\omega \le \pi\tau \tag{3}$$

As before, we can simplify this equation assuming a constant in-band reflection coefficient

$$\Gamma_m \ge \exp\left(-\pi\tau\,\frac{\omega_1\omega_2}{\omega_2-\omega_1}\right) \tag{4}$$

Introducing fractional bandwidth, RBW, as

$$\mathrm{RBW} = \frac{\omega_2-\omega_1}{\omega_c} \tag{5}$$

and the center frequency $\omega_c$ as

$$\omega_c = \sqrt{\omega_1\omega_2} \tag{6}$$

we can rearrange (4) to

$$\Gamma_m \ge \exp\left(-\frac{\pi\tau\,\omega_c}{\mathrm{RBW}}\right) \tag{7}$$

In contrast to the HEMT drain equivalent circuit (2), the result of (7) illustrates an improved match for higher center frequencies $\omega_c$. In other words, for the gate, the achievable RBW or alternatively $\Gamma_m$ improves with frequency, while for the drain, it decreases. Due to their higher operating voltage, high-voltage technologies such as GaN feature HEMTs with high $R_{ds,ls}$ and are as such intrinsically more limited in terms of their Bode-Fano matching bandwidth compared to GaAs pHEMT or even silicon devices [25]. An example of the Bode-Fano limit of a low-voltage technology can be calculated using the values supplied in [26], where at $V_{ds} = 1$ V, a two-finger device featuring a UGW of 45 µm exhibits $R_{ds,ls} = 12.5\ \Omega$ and $C_{ds,ls} = 120$ fF.
With the values for GaN10 at $V_{ds} = 15$ V given in Section II, we can compare the Bode-Fano limit for the technologies using (2). The comparison shows that the theoretical limit for GaN10 is roughly 22% of the aforementioned low-voltage technology. In other words, the drain of the low-voltage technology can be matched over 4.5 times the bandwidth with the same reflection coefficient as that of the high-voltage technology. Note that the transistor size is irrelevant to this boundary, as the HEMT's RC time constant $\tau$ stays approximately constant over TGW.
Fig. 3. $S_{11}$ of a lumped bandpass matching network that matches a parallel RC circuit to a series RC circuit [as shown in Fig. 1(c)]. Three different cases are shown where we varied the values of $C_{gs}$ and $C_{ds,ls}$ such that the matching is Bode-Fano limited either on the drain side (-), the gate side (− ·), or both (− −). For each of the cases, an optimization was performed to minimize the in-band reflection coefficient.
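The technology comparison can be checked with a few lines of arithmetic. The following Python sketch evaluates the RC time constants quoted in the text and the brick-wall form of the Bode-Fano bound for a parallel RC (drain) and a series RC (gate) termination. The 28–38 GHz band edges, the brick-wall simplification (constant in-band reflection, total reflection out of band), and all function names are assumptions made for illustration; they are not taken from the authors' tooling.

```python
import math

DB_PER_NEPER = 20.0 / math.log(10.0)  # converts ln(1/|Gamma|) to dB

def tau(r_ohm, c_farad):
    """RC time constant in seconds."""
    return r_ohm * c_farad

def limit_parallel_rc_db(tau_s, f1_hz, f2_hz):
    """Lowest in-band |Gamma| in dB for a parallel RC load (brick-wall case)."""
    d_omega = 2.0 * math.pi * (f2_hz - f1_hz)
    return -DB_PER_NEPER * math.pi / (d_omega * tau_s)

def limit_series_rc_db(tau_s, f1_hz, f2_hz):
    """Lowest in-band |Gamma| in dB for a series RC load (brick-wall case)."""
    w1 = 2.0 * math.pi * f1_hz
    w2 = 2.0 * math.pi * f2_hz
    return -DB_PER_NEPER * math.pi * tau_s * w1 * w2 / (w2 - w1)

# Values quoted in the text
tau_gan_drain = tau(55.5, 122e-15)   # GaN10 drain at 15 V
tau_lv_drain = tau(12.5, 120e-15)    # low-voltage device from [26] at 1 V
tau_gan_gate = tau(3.32, 443e-15)    # GaN10 gate

# Ratio of the drain-side limits: a smaller tau allows a wider bandwidth
limit_ratio = tau_lv_drain / tau_gan_drain

# Assumed band of interest: 28 to 38 GHz
drain_limit_db = limit_parallel_rc_db(tau_gan_drain, 28e9, 38e9)
gate_limit_db = limit_series_rc_db(tau_gan_gate, 28e9, 38e9)

print(f"tau ratio (low-voltage / GaN10): {limit_ratio:.3f}")
print(f"drain-side limit: {drain_limit_db:.1f} dB")
print(f"gate-side limit:  {gate_limit_db:.1f} dB")
```

With these inputs, the ratio of the time constants evaluates to about 0.22, in line with the roughly 22% (or factor of 4.5) stated in the text. The gate-side bound is much closer to the −15 dB target than the drain-side bound, which matches the qualitative statement that the gate limits the design.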
As it turns out, in terms of design complexity, the most critical matching network of a GaN high-power amplifier (HPA) with multiple stages is the ISMN. Its practical realization and the tradeoffs required will be described in detail in the following. From a conceptual point of view, we can use the previously introduced HEMT equivalent circuits to represent this problem, as shown in Fig. 1. Fig. 1(a) and (b) shows the matching of the input and output, respectively, whereas Fig. 1(c) shows the case of an interstage network. This arrangement, a frequency-dependent generator impedance to be matched to a frequency-dependent load impedance, is called a double-matching problem. Analytical and numerical topology synthesis procedures have been studied extensively in the past [27]- [29].
The fundamental matching limit of the formulation by Bode [9] refers to a purely resistive generator impedance that is matched to an RC load. Fano [10] extended this formulation to a load of arbitrary but fixed impedance, again with a purely resistive generator impedance. However, considering the ISMN's matching bandwidth limit, there is a complex source impedance that is matched to a complex load impedance [see Fig. 1(c)]. To investigate a possible penalty in the maximum matching bandwidth when both termination impedances are complex, we compared three different cases in which we varied the values of $C_{gs}$ and $C_{ds,ls}$ in the circuit of Fig. 1(c). The first two cases are those that occur in an input and an output matching network: they are Bode-Fano limited on the gate ($C_{gs}$) or drain ($C_{ds,ls}$) side, respectively. For the third case, $C_{gs}$ and $C_{ds,ls}$ are chosen such that they both equally limit the matching bandwidth. A third-order lumped bandpass was used as the matching network and numerically optimized to exhibit the lowest possible $S_{11}$ between 28 and 38 GHz. As a result, the curves shown in Fig. 3 were obtained. By integrating the area under the curves from Fig. 3 as specified in (1) and (3), we can quantify how close the networks come to the theoretical optimum case, which is given by the right-hand side of (1) and (3). We define the ratio of these experimental values to the Bode-Fano limit as $f_{BF,ds}$ and $f_{BF,gs}$

$$f_{BF,ds} = \frac{R_{ds,ls}C_{ds,ls}}{\pi}\int_0^\infty \ln\frac{1}{|\Gamma(\omega)|}\,d\omega$$

and

$$f_{BF,gs} = \frac{1}{\pi R_g C_{gs}}\int_0^\infty \frac{1}{\omega^2}\ln\frac{1}{|\Gamma(\omega)|}\,d\omega$$

The resulting values for $f_{BF,ds}$ and $f_{BF,gs}$ as well as the capacitance values for each case are listed in Table I. The integration has been performed for frequencies in the range of 0-70 GHz. In the first case, which is limited by $C_{gs}$, the optimized network reaches $f_{BF,gs} = 95.6\%$ of the Bode-Fano limit. For the second case (limited by $C_{ds}$), we obtain a value of $f_{BF,ds} = 98.6\%$. In the third case, limited by both $C_{gs}$ and $C_{ds}$, the network reaches 93.7% and 93.4% of the Bode-Fano limits for series and parallel RC circuits, respectively.
These results suggest that the used bandpass network can absorb both the series and the parallel capacitance equally well. In addition, the presence of a second constraint (i.e., both terminations are complex), as shown in case 3, has only a marginal impact on the values of $f_{BF,ds}$ and $f_{BF,gs}$. For a practical design, we can state that, depending on the device's equivalent circuit values, either the gate or the drain can limit the obtainable matching performance, whichever of them is closer to the Bode-Fano limit.
These results indicate that we are limited by the gate side (below the limit by a factor of 1.8), whereas the drain-side matching limit is significantly above the design parameters required (factor of 4.1). Note that for this analysis, the networks are assumed to be lossless, which, especially for mmW frequencies, will not be the case. The losses introduced either on purpose (e.g., stabilization) or due to parasitics will change the network's return loss (RL).
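To make the ratio of the integrated reflection response to the Bode-Fano limit concrete, the following Python sketch evaluates both integrals numerically on a sampled $|\Gamma|$ curve. It is a minimal illustration under stated assumptions: the function names are hypothetical, the integration uses a plain trapezoidal rule on a 0.01–70 GHz grid, and the test responses are ideal brick-wall curves sitting exactly on the limit, so both ratios should come out close to 1.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal integration of samples y over grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def f_bf_parallel(freq_hz, gamma_mag, r_ohm, c_farad):
    """Integrated ln(1/|Gamma|) relative to the parallel-RC limit pi/tau."""
    omega = 2.0 * np.pi * freq_hz
    area = trapezoid(np.log(1.0 / gamma_mag), omega)
    return area * r_ohm * c_farad / np.pi

def f_bf_series(freq_hz, gamma_mag, r_ohm, c_farad):
    """Integrated ln(1/|Gamma|)/omega^2 relative to the series-RC limit pi*tau."""
    omega = 2.0 * np.pi * freq_hz
    area = trapezoid(np.log(1.0 / gamma_mag) / omega**2, omega)
    return area / (np.pi * r_ohm * c_farad)

# Frequency grid (starts above 0 Hz to avoid dividing by zero)
freq = np.linspace(0.01e9, 70e9, 7000)
f1, f2 = 28e9, 38e9
in_band = (freq >= f1) & (freq <= f2)
w1, w2 = 2.0 * np.pi * f1, 2.0 * np.pi * f2

# Device values quoted in the text
r_ds, c_ds = 55.5, 122e-15
r_g, c_gs = 3.32, 443e-15

# Ideal brick-wall responses that sit exactly on the respective limit
gamma_drain = np.ones_like(freq)
gamma_drain[in_band] = np.exp(-np.pi / ((w2 - w1) * r_ds * c_ds))
gamma_gate = np.ones_like(freq)
gamma_gate[in_band] = np.exp(-np.pi * r_g * c_gs * w1 * w2 / (w2 - w1))

f_ds = f_bf_parallel(freq, gamma_drain, r_ds, c_ds)
f_gs = f_bf_series(freq, gamma_gate, r_g, c_gs)
print(f"f_BF,ds = {f_ds:.3f}, f_BF,gs = {f_gs:.3f}")
```

A simulated $S_{11}$ trace of a real network could be passed in place of the brick-wall arrays; any realizable finite-order network is expected to yield ratios below 1.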

IV. INTERSTAGE MATCHING TOPOLOGY
In an HPA using parallel power combining, certain components are needed as a consequence of the physical and electrical constraints in place, irrespective of the matching function that the network has to provide as well. Fig. 4 shows those components for the case of an ISMN. As seen from the HEMT drain, a bus interconnecting the parallel stages' drains is needed for bias supply. A series capacitor separates the dc path for the gate and drain supply voltages. An n-way power splitter divides the energy toward the next stage and provides the lateral distance required for heat distribution. Furthermore, a gate supply bus is needed to set the HPA's bias point, and in many cases, an RC highpass element is used for gain shaping and stabilization. As the ISMN will be mirrored along the x-axis for the final HPA, the total width of the network ($w_{tot}$ in Fig. 4) is fixed. Also, the locations of the gate pins (shown as red dots) are fixed by the output matching network.
This basic topology can be altered by adding further components or rearranging some of them. In the authors' experience, a series line before the drain bus connections (MSL in Fig. 4) can improve matching considerably. Choosing appropriate dimensions, this topology can form a fourth-order bandpass network. However, in practical applications, a third- or second-order nonminimum network [30] is realized using the same topology since it often features lower insertion loss while still providing adequate RL performance. This is shown in Fig. 5, which compares second- and third-order matching responses of the topology from Fig. 4. The vertical lines represent the band of interest. The entire area under the $S_{11}$ curve counts toward the integrals of (1) and (3). Therefore, the area outside the band of interest (shown hatched) limits the achievable in-band RL. As can be seen, the third-order network features higher insertion loss but satisfies the matching goal of $\Gamma = S_{11} < -15$ dB in the entire frequency range of interest. On the other hand, the second-order network only exhibits an $S_{11} \le -10$ dB but features lower insertion loss. Considering the case laid out in Section III, where we assume a constant in-band reflection coefficient and total reflection out of band, this would only be satisfied by a network of infinite order. However, we can come reasonably close; for the third- and second-order responses, 63.3% and 51.0% of the area under the curve are in the band of interest, respectively.
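The share of the area under the curve that falls inside the band of interest can be estimated from any sampled $S_{11}$ trace. The Python sketch below shows one way to do this, assuming the area is measured on the return loss in dB over a linear frequency axis. The function name and the Gaussian-shaped test response are hypothetical and only serve to exercise the calculation; they do not reproduce the responses of Fig. 5.

```python
import numpy as np

def in_band_fraction(freq_hz, s11_db, f1_hz, f2_hz):
    """Fraction of the area under the return-loss curve lying in [f1, f2]."""
    freq = np.asarray(freq_hz, dtype=float)
    rl_db = -np.asarray(s11_db, dtype=float)          # return loss, >= 0 dB
    # Trapezoid area of every frequency segment
    seg_area = 0.5 * (rl_db[1:] + rl_db[:-1]) * np.diff(freq)
    # A segment counts as in band if its midpoint lies inside the band
    seg_mid = 0.5 * (freq[1:] + freq[:-1])
    in_band = (seg_mid >= f1_hz) & (seg_mid <= f2_hz)
    return float(seg_area[in_band].sum() / seg_area.sum())

# Hypothetical smooth matching response centered at 33 GHz
freq = np.linspace(0.0, 70e9, 7001)
s11_db = -20.0 * np.exp(-((freq - 33e9) / 7e9) ** 2)

fraction = in_band_fraction(freq, s11_db, 28e9, 38e9)
print(f"in-band share of the area: {100 * fraction:.1f} %")
```

A response with steeper skirts concentrates more of its area inside the band, which is why a higher-order network can trade the same total area for a lower in-band reflection coefficient.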
For the experiment described in Section V, we parameterize the width and height of the elements highlighted in Fig. 4. The stabilization element's values are given by the active device and are thus fixed for the purpose of this experiment, as well as the RF blocking capacitors.
To describe the electrical behavior of the structure with a scalable schematic model, we use the components provided by the microstrip library in Keysight ADS wherever viable. In addition, we employ fab models where needed, for example, in case of the MIM capacitors. It is crucial to describe the network as accurately as possible. The parasitic behavior of parts, such as the T junction, the X junction, or a 45° curve, plays an important role in this and must be included from the library. Using an optimizer and the aforementioned goal of $\Gamma < -15$ dB between 28 and 38 GHz as seen by the HEMT drain, we obtain the result shown as a yellow curve in Fig. 6(a). The goal keepout area is illustrated as a hatched box in the same plot. As can be seen, the optimized schematic simulation satisfies the goal adequately in the band of interest.

V. ISMN IMPLEMENTATION STRATEGIES
This section deals with the implementation of the network developed in Section IV, i.e., finding a physical realization that matches or surpasses the predicted reflection coefficient of the schematic simulation. In the experience of the authors, this part of the design process will often be very time-consuming and laborious, especially for layouts in which a relatively large number of transistors are to be parallelized.
To evaluate the initial guess produced by the schematic simulation, we translate the schematic to a layout and then to a 3-D model. In Fig. 6(a), the reflection coefficient of the schematic simulation is plotted over frequency (yellow).
The blue curve in turn shows the result of an electromagnetic (EM) simulation of the same structure. Clearly, the reflection coefficient is severely degraded and the design is in this state unacceptable for production. This is a typical effect in a complex layout like this, where extensive coupling between adjacent structures occurs. As an example, consider the parallel lines of the drain bus (Dbus) and the power splitter (Split) in Fig. 4, which are only separated by the size of the MIM capacitor Blk.
Assuming that the EM simulation is an accurate representation of the network properties, this raises the question of how to improve the EM simulation result. An elegant way of doing this is termed space mapping and has been researched in the past [11]- [15], mainly for passive structures such as microstrip or waveguide filters. In [15], a summary of space-mapping approaches can be found. As the mathematical intricacies have already been derived in the literature, we do not need to repeat them here in detail. Yet, we will give a short summary to facilitate an understanding of the experiment we conducted. The basic concept of space mapping is to use a coarse model R c that is computationally cheap and still implements physical knowledge of the structure in question-such as a detailed schematic. From this coarse model, a surrogate model R s is deduced using a suitable transformation. In the simplest form, R c and R s are identical. While performing the algorithm, in each iteration, we calibrate the surrogate model from a fine model R f (e.g., given by the EM simulation) to constantly decrease the error it contains. In case of convergence, the final iteration yields the design variables where the fine model represents a similar response (e.g., over frequency) as the original coarse model prediction.
The advantage of this procedure is a potential decrease in the time needed to arrive at a usable layout because the fine model is only evaluated once per algorithm iteration, while the surrogate model is used for optimization. Compared to a layout optimizer, the number of EM-model evaluations is reduced by this procedure. This allows the designer to make custom changes to the layout in each algorithm iteration. For example, the procedure allows the designer to decide between passes to fold a line if it violates spatial constraints. To evaluate this approach for the design of large and broadband HPA networks with the boundary conditions mentioned in Section IV, we formulated two algorithms that are variants of the basic idea of input space mapping [14]. In Sections V-A-V-G, we introduce them and propose an efficient implementation using Keysight ADS as the schematic simulation tool.

A. Algorithms A and B
A flowchart of Algorithms A and B is shown in Fig. 7 (top). They share the same flow but vary in the factor $c$. As outlined before, the algorithms require two models: a coarse and a fine model. The coarse model is a detailed schematic employing the microstrip library in ADS. It is used as the surrogate model by introducing a vector $\mathbf{p}_{cal,i}$, which contains an offset value for each design parameter. This offset can be interpreted as a calibration of the coarse model and is initially set to 0.
In each iteration, the surrogate model's offset vector is updated from the previous iteration. The newly calibrated surrogate model is then optimized toward the specified goal, in our case for the desired $\Gamma$ in the band of interest. This optimization yields a set of optimized design variables $\mathbf{p}_{des,i}$, which is used to synthesize a layout. The layout, translated to an EM model, serves as the fine model. Using an EM simulator such as CST or HFSS, the fine model S-parameters are calculated and transferred back to ADS. In the next step, the surrogate model response is fitted to that of the fine model (parameter extraction), yielding a vector $\mathbf{p}_{extr,i}$. If the surrogate and the fine model responses line up perfectly, $\mathbf{p}_{extr,i}$ is equal to the initial surrogate optimization $\mathbf{p}_{opt,0}$. Generally,

$$\mathbf{p}_{dev,i} = \mathbf{p}_{extr,i} - \mathbf{p}_{opt,0} \tag{12}$$

represents the deviation between the initial optimized surrogate and the fine model in iteration $i$, expressed in the design parameter space. Using the deviation vector and the damping factor $c$, we can calculate the next calibration vector $\mathbf{p}_{cal,i+1}$ from $\mathbf{p}_{cal,i}$. With this value, the next iteration starts: using the updated offset values, the surrogate model response is reoptimized, progressively improving its prediction of the fine model. The algorithm converges once the fine/surrogate model response deviation falls below a certain threshold $\epsilon$, which can again be judged by examining the deviation value in the parameter space

$$\lVert \mathbf{p}_{dev,i} \rVert < \epsilon \tag{14}$$

For Algorithm A, we set the damping factor $c$ to 1.0, whereas for Algorithm B, it was set to 0.5.

B. Algorithm C
Algorithm C represents a simplified approach of Algorithms A and B and its evaluation is shown in Fig. 6. It does not make use of design parameter offsets. While in Algorithms A/B, the surrogate model is reoptimized with updated parameter offsets in each iteration, Algorithm C only optimizes the surrogate model once at the start of the procedure. In the subsequent iterations, a parameter extraction [shown as a red curve in Fig. 6(a)] is performed, allowing us to calculate $\mathbf{p}_{dev,i}$ as shown in (12). The inverse of the design parameter deviation is then directly applied to the design parameter set

$$\mathbf{p}_{des,i+1} = \mathbf{p}_{des,i} - c\,\mathbf{p}_{dev,i} \tag{15}$$

The updated design parameter set is then used to reevaluate the fine model. The progression of the ISMN fine model response is shown in Fig. 6(b) for the first three iterations, showing systematic progress toward the specified goal. As in Algorithms A and B, $c$ is a damping factor reducing the impact of the changes applied to the design parameters. If its value is set too high, electrical characteristics that are unaccounted for in the surrogate model can lead to overcompensation and prevent convergence of the algorithm. For Algorithm C, $c$ is set to 0.5. Again, convergence is judged by the similarity of extracted and original design parameters, as shown in (14).
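The loop of Algorithm C (optimize once, then repeatedly extract parameters and subtract the damped deviation from the design variables) can be sketched in a few lines of Python. Everything model-related below is a stand-in chosen for illustration: the coarse model is a hypothetical linear map, the "fine" model is the same map seen through an assumed parameter distortion that mimics layout parasitics, and parameter extraction is a least-squares fit. In the article, these roles are played by the ADS schematic, the EM simulation, and an ADS optimization run, respectively.

```python
import numpy as np

# Hypothetical linear coarse model: response samples = A @ p
x = np.linspace(-1.0, 1.0, 21)
A = np.vander(x, 3)                      # 21 x 3, full column rank

def coarse(p):
    return A @ p

# Hypothetical fine model: coarse model evaluated at distorted parameters,
# standing in for coupling and other effects the schematic does not capture
M = np.array([[1.20, 0.10, 0.00],
              [0.05, 0.90, 0.10],
              [0.00, -0.10, 1.10]])
d = np.array([0.30, -0.20, 0.10])

def fine(p):
    return coarse(M @ p + d)

def extract(response):
    """Parameter extraction: coarse-model parameters that best fit a response."""
    return np.linalg.lstsq(A, response, rcond=None)[0]

def algorithm_c(p_opt0, c=0.5, eps=1e-9, max_iter=200):
    """Damped update of the design variables until the deviation vanishes."""
    p_des = p_opt0.copy()
    for i in range(max_iter):
        p_extr = extract(fine(p_des))    # fit surrogate to fine response
        p_dev = p_extr - p_opt0          # deviation in parameter space
        if np.linalg.norm(p_dev) < eps:  # convergence check
            return p_des, i
        p_des = p_des - c * p_dev        # apply damped inverse deviation
    return p_des, max_iter

p_opt0 = np.array([1.0, 2.0, 0.5])       # result of the initial optimization
p_final, n_iter = algorithm_c(p_opt0)
residual = float(np.max(np.abs(fine(p_final) - coarse(p_opt0))))
print(f"converged after {n_iter} iterations, residual {residual:.2e}")
```

In this toy setting the damping factor mainly sets the convergence rate. For a real layout, where the mapping between schematic and EM model is nonlinear and partly unmodeled, a smaller $c$ trades speed for robustness against overcompensation.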

C. Analysis
Comparing the approaches of Algorithms A/B and C, the fundamental solution found by Algorithms A/B can vary between iterations because an optimization step is performed in each iteration after parameter extraction. In contrast, Algorithm C continually tries to replicate the solution found in the initial optimization step. Therefore, Algorithms A/B may yield a better result in cases where the initial solution of the coarse model is not achievable by the fine model (e.g., due to inaccuracies of the coarse model). As an example, consider a power splitter (fork) as it occurs in the ISMN shown in Fig. 4. If the parallel microstrip lines after the fork T junction are routed very closely, their common-mode characteristic impedance Z 0 will increase considerably. In case a low Z 0 is required by the initial optimization, the solution will not be realizable in some instances. This can render algorithm C unable to solve the problem unless specific constraints are enforced in the initial optimization.

D. Practical Implementation
For an MMIC design procedure to be of practical use to the designer, its implementation is a critical factor. For the algorithms described in Section V, a reference implementation has been developed for this article. Using Keysight ADS, it has been found that Algorithm C is simpler to describe in an electrical schematic than Algorithms A/B. This is due to Algorithm C's simpler structure (see Fig. 7), which features an initial optimization of the coarse model, after which only a parameter extraction step is performed in each iteration.
1) Surrogate Model: The surrogate model aims to approximate the fine model as closely as possible while remaining computationally cheap. In our case, we employ the ADS microstrip library and schematic circuits of Fraunhofer IAF's MIM capacitors. In Fig. 8, the elements highlighted in orange are changeable. In the implementation, each of these elements is parameterized using the variable blocks shown below the model. An element's value is composed of the initial optimization and the current parameter extraction value, for example

$$\mathtt{msl1\_l} = \mathtt{msl1\_il} + \mathtt{msl1\_dl} \tag{16}$$

While the initial optimization variable block remains active throughout the execution of the algorithm, only one of the parameter extraction blocks is active at a time. For each new iteration, the previous block is copied and appended to the array.
2) Fine Model: The fine model is essentially an S-parameter block that serves to include the results obtained by the EM solver in the schematic simulation. It is connected in the same way as the surrogate model: using the FET loadline equivalent circuit on the left and the gate equivalent circuit on the right.
3) Simulation Setup: This section contains two goal setups. For the initial optimization, we use a setup to minimize the reflection coefficient and/or the insertion loss of the surrogate model, i.e., the goal is set to obtain argmin abs($S_{11}$) in the frequency range of interest. For the parameter extraction steps, a second setup is used where the goal is to minimize the difference between the surrogate and the fine model responses. In the schematic of Fig. 8, this means that we are looking for argmin abs($S_{33} - S_{11}$). In addition, we can also include the difference in insertion loss, i.e., argmin abs($S_{43} - S_{21}$).
The algorithm is then performed by first enabling the initial optimization goal and solving for the optimum parameters of the coarse model (parameter extraction block set to 0). Using an optimization controller with a random or gradient optimization in ADS, an adequate solution is usually found within 1 or 2 min. After an initial fine model evaluation, the fine model response is compared to that of the coarse model using the second goal, yielding the values of the first parameter extraction block. To prepare the fine model for its next evaluation, the parameter extraction values are multiplied with the damping factor c and subtracted from the design variable set. Next, a new fine model is generated and evaluated. This procedure is repeated until we obtain a sufficient fine model response.

E. Enforcing Physical Limitations
When realizing a closely spaced network such as the ISMN at hand, we will often find that the resulting dimensions deviate considerably from the initial schematic prediction. This can be a problem if the space available for an element is exhausted, and therefore, its size must be restricted in further algorithm iterations. For Algorithm C, a design variable can be fixed by setting its parameter extraction variable block to 0 and excluding the value in question from further optimization runs (iterations). However, this technique is limited in some cases if the parameter extraction run cannot achieve an adequate match between the previous fine model response R f and the surrogate model R s . In that case, a restart of the algorithm using different constraints on the initial optimization can be in order. For Algorithms A/B, this step is simpler, as we can set boundaries as needed in the coarse model optimization step of each iteration.

F. Experimental Comparison
To evaluate and compare the performance of the three configurations described in the sections above, we employed them separately to realize the ISMN shown in Fig. 4. In all three cases, we used the same schematic prototype as a surrogate model after the same initial optimization as a starting point. The surrogate model initial response is used as calculated in Section IV. To judge the algorithm result progression, we utilize an alternative method to the one outlined in Section V-A. This approach is based on the keepout area shown as a hatched rectangle in Fig. 10. For each iteration, all $N$ frequency points where the keepout area was violated were included in the following squared error sum:

$$e_i = \sum_{n=1}^{N} \left(R_f(n) - g\right)^2 \tag{17}$$

where $g$ is the minimum RL specification and $R_f(n)$ is the fine model response at frequency point $n$. In Fig. 9, the progression of $e_i$ is shown for Algorithms A-C. Algorithm A exhibits a minimum error sum of $e_3 = 4$ in the third iteration, which subsequently increases again, oscillating around a value of 9 ± 5. Even with ten iterations, we could not observe convergence and thus stopped its execution.
Algorithm B, which features the same basic steps as Algorithm A but has a damping factor of c = 0.5, first shows a comparatively slow decay and even an increase in e_i in the first three iterations. However, in the fourth iteration, a considerable improvement can be observed, and in iteration 6, we obtain a satisfactory result of e_6 < 0.1.
Finally, Algorithm C (reduced form with c = 0.5) exhibits a relatively slow but monotonic decay of the error function up to iteration 7, where the remaining error e_7 is less than 0.1.
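The keepout-based error sum used above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function name keepout_error, the positive-dB return-loss convention, and the sample values are assumptions, and the keepout area is taken to be a rectangle bounded by the band edges and the minimum RL specification g.

```python
def keepout_error(freqs_ghz, rl_db, band_ghz, g_db):
    """Squared error sum over all frequency points that violate the keepout area.

    Assumption: RL is given as a positive number in dB, so a point violates
    the keepout area when it lies inside the band and its RL is below g_db.
    """
    f_lo, f_hi = band_ghz
    err = 0.0
    for f, rl in zip(freqs_ghz, rl_db):
        in_band = f_lo <= f <= f_hi
        if in_band and rl < g_db:
            err += (rl - g_db) ** 2
    return err


# Hypothetical sample data: two in-band points miss a 15-dB specification,
# and the out-of-band point at 40 GHz is ignored.
freqs = [28.0, 30.0, 32.0, 34.0, 40.0]
rl = [16.0, 14.0, 13.0, 15.0, 5.0]
e_example = keepout_error(freqs, rl, band_ghz=(28.0, 38.0), g_db=15.0)
```

A response that clears the keepout area everywhere yields an error of zero, so the metric only penalizes specification violations and does not reward additional margin.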
In Fig. 10, the resulting RL curves are plotted for each of the algorithms after the final passes. Interestingly, the resulting responses R_f of Algorithms B and C are reasonably similar and are both an improvement on the original schematic prediction. This is somewhat unexpected at first, considering that the methodology tries to replicate the schematic response using the EM-simulated fine models. On the other hand, the microstrip library models assume the lines and junctions to be perfectly isolated from each other, which is not the case in a tight layout such as the one at hand. Therefore, some parasitic effects that are present in the models affect the full structure differently, permitting an improved reflection coefficient to be realized.
As mentioned before, Algorithm A did not converge and thus does not satisfy the requirements, although its final response R_{f,10} is still an improvement on the first evaluation of the fine model R_{f,1}. Fig. 11 shows the initial and final layouts of the ISMN elements as well as the layouts resulting from unconstrained execution of Algorithms B and C. It is noteworthy that both solutions, although similar in return and insertion loss, do not share the same dimensions. Moreover, comparing the initial layout to the final design values, we can note a substantial deviation. One example is the progression of the dimensions of series line 3, which is significantly shortened and increases in width. We can also see that the lines running in parallel (after the power splitter) are significantly reduced in length and/or increased in width. This change compensates for the coupling between them, as the impedance value of coupled lines increases for a given linewidth.

G. Result Assessment
The results indicate that the damping factor c is an important adjustment to achieve convergence. Algorithm A's fine model evaluations R_{f,i} exhibit jumps in the matching resonances from below to above the band of interest, which is consistent with overcorrections caused by the space-mapping algorithm. A factor of c = 0.5 seems reasonable if the coarse model is a sufficiently accurate approximation of the structure.
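The effect of the damping factor can be illustrated with a toy one-dimensional example. The sketch below is not the ISMN problem and not the exact procedure of Section V: the update rule x ← x + c (x* − p), the linear parameter-extraction function toy_extract, and all numerical values are assumptions chosen so that the coarse model underestimates the sensitivity of the fine model by roughly a factor of two.

```python
def run_space_mapping(x_target, extract, x0, c, n_iter):
    """Iterate a damped space-mapping style update on a single design variable.

    x_target: optimum of the coarse model (the value the extraction should hit)
    extract:  hypothetical parameter extraction, mapping a fine-model dimension
              to the coarse-model parameter that reproduces its response
    c:        damping factor (c = 1 means no damping)
    Returns the misalignment |p - x_target| recorded at each iteration.
    """
    x = x0
    history = []
    for _ in range(n_iter):
        p = extract(x)
        history.append(abs(p - x_target))
        x = x + c * (x_target - p)  # damped correction step
    return history


def toy_extract(x):
    # Assumed toy behavior: the fine model reacts 2.1 times as strongly
    # to a dimension change as the coarse model predicts.
    return 2.1 * x - 1.0


undamped = run_space_mapping(1.0, toy_extract, x0=1.0, c=1.0, n_iter=10)
damped = run_space_mapping(1.0, toy_extract, x0=1.0, c=0.5, n_iter=10)
```

In this toy case, the misalignment is multiplied by (1 − 2.1c) in every step. With c = 1, the factor is −1.1, so each correction overshoots and the error alternates in sign while growing; with c = 0.5, the factor is −0.05 and the error shrinks rapidly.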
In conclusion, given an adequate damping factor c, both the full algorithm B and the reduced algorithm C can solve the problem. However, it turns out that preconditioning of the initial guess is more important for the reduced algorithm C than for the full algorithm. More specifically, when performing the initial optimization, it is important to set boundary conditions that avoid effects not represented in the coarse model. An example of this could be thick microstrip lines that are routed closely next to each other: coupling effects between them will be nontrivial to represent in a schematic. The effect of a missing representation can be more pronounced for the reduced algorithm, as it always aims to reproduce the initial guess.
On the other hand, in the full algorithm B, the parameter extractions are used to gradually improve the calibration. In the case of an unreachable optimum, B can switch to a different local optimum. In this context, the full algorithm B is superior to the reduced one. However, the reduced algorithm is considerably easier to implement and takes less user input to perform. Compared to the simple setup as shown in Fig. 8, the full algorithm B needs an additional parameter block storing the calibration offsets for each iteration. Furthermore, the required user input is increased. However, in the experiment, it converged after six steps instead of seven for the reduced algorithm C. Thus, depending on the complexity of the problem, the additional user input of the more complex algorithm B can be warranted, for example, if the EM simulation time of additional passes is prohibitive. On the other hand, in cases with less significant EM simulation times and those where the limitations of the coarse model are well known, the reduced algorithm can be preferred.

VI. HPA DESIGN
In order to demonstrate the usefulness of the approach outlined in the foregoing sections, we designed an HPA using IAF's 100-nm GaN-on-SiC process (see Section II for details on the technology). The MMIC is intended to be used in a large system and needs to cover most of the Ka-band. A gain magnitude in excess of 20 dB is required. To meet these requirements, a three-stage topology was adopted featuring a staging ratio of 1:2, with eight HEMTs in the final stage. Similar to the concept in [20], eight-finger HEMTs with a unit gate width (UGW) of 60 µm each were employed, which equals a total gate width of 3.72 mm in the final stage. To ensure high RL, a balanced topology was implemented using a four-finger Lange coupler. Due to its comparatively simpler implementation, space-mapping algorithm C was used for each of the required matching networks. We found that for the ISMNs, physical limitations have to be enforced extensively (approach described in Section V-E). Some important constraints include the total network height, the HEMT port distances, and the distance between the HEMTs and the MIM capacitor vias. An overview of the ISMNs after Algorithms B/C and a comparison to the finalized (and constrained) MMIC layout is shown in Fig. 11. The ISMN MMIC layout constitutes a base cell and is mirrored along the Y-axis to create a massively parallelized IC layout. A simplified overview of the final schematic is given in Fig. 12, featuring two of the aforementioned ISMN base cells, input and output matching, and a second ISMN. The schematic is doubled along the mirror plane (dashed line) to create the fully parallelized amplifier.

VII. MEASUREMENTS AND ANALYSIS
A micrograph of the processed MMIC is shown in Fig. 13. We carried out a small-signal wafer mapping with a nominal drain voltage V_D of 15 V and a drain current density i_D of 50 mA/mm. The measured S-parameters over frequency of 15 samples are shown in Fig. 14. As a result of the balanced architecture, both the input and output RL magnitudes measure below −15 dB in the entire Ka-band. This also indicates that the Lange couplers perform as designed. Furthermore, the S_21 curves exhibit a flat characteristic, with a 3-dB small-signal band between 25.7 and 36.6 GHz (small-signal RBW of 35%). Above 37 GHz, the gain decreases at about 5 dB per GHz, which can be attributed mainly to the HEMT maximum available gain (MAG) characteristics. The large-signal frequency characterization was carried out in the same bias point. Fig. 15 shows the output power and power-added efficiency (PAE) in 3 dB of gain compression for the entire Ka-band frequency range. The 1-dB large-signal band ranges between 28.0 and 39.0 GHz, which equates to a fractional bandwidth of RBW = 32.8%. Interestingly, this is only slightly lower than the 3-dB small-signal band and shifted upward in frequency by about 1.5 GHz. A maximum output power of 38.2 dBm was measured at 30 GHz, and the peak PAE of 26.1% at 31 GHz. Similar to the output power curve, the PAE characteristic exhibits a good flatness across the band, with a minimum of 22.4% at 39 GHz.
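The bandwidth and power figures quoted above can be cross-checked with a short calculation. The sketch assumes that the RBW is the width of the band divided by its arithmetic center frequency (the definition in (5) is not repeated in this section); the function names are chosen for illustration only.

```python
def relative_bandwidth(f_lo_ghz, f_hi_ghz):
    """Fractional bandwidth, assuming normalization to the arithmetic center frequency."""
    f_center = 0.5 * (f_lo_ghz + f_hi_ghz)
    return (f_hi_ghz - f_lo_ghz) / f_center


def dbm_to_watt(p_dbm):
    """Convert a power level in dBm to watts."""
    return 10.0 ** (p_dbm / 10.0) / 1000.0


rbw_small_signal = relative_bandwidth(25.7, 36.6)  # 3-dB small-signal band
rbw_large_signal = relative_bandwidth(28.0, 39.0)  # 1-dB large-signal band
p_max_watt = dbm_to_watt(38.2)                     # maximum output power at 30 GHz
```

Under this assumption, the band edges reproduce the stated values of about 35% and 32.8%, and 38.2 dBm corresponds to roughly 6.6 W, which is consistent with the output power exceeding 6 W.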

VIII. STATE OF THE ART
A substantial amount of research has been conducted toward HPAs in the Ka-band frequency range. Fig. 16 shows an overview of recent publications in the power range above 33 dBm, with the large-signal relative 1-dB bandwidth plotted over the maximum recorded output power (see (5) for the definition of the RBW). A more detailed overview of a subset of publications is given in Table II. Note that some papers, such as [4] and [34], do not include full-band large-signal power measurements. In these cases, the available data have been used for Fig. 16. Most publications of interest describe reactively matched topologies (blue symbols), while two traveling-wave/hybrid amplifier publications were included for comparison (green symbols).
In terms of bandwidth, distributed amplifiers [6], [7] surpass reactively matched amplifiers considerably, easily exceeding 50% of RBW. On the other hand, distributed amplifiers also have disadvantages compared to reactively matched topologies, in that they require careful design of the power distribution among the circuit's transistors [6] and often reach limited efficiency [8].
With over 46 dBm, the highest output power was reported by Din et al. [2] and Roberg et al. [3]. Both of these HPAs utilize the highest drain voltage in the set (V_D = 28 V). They both achieve a similar RBW of between 14% and 16% and operate in the lower end of the Ka-band (below 32 GHz). Of course, a higher drain voltage often increases the loadline resistance and thus decreases the achievable bandwidth, especially for amplifiers that are limited by their drain matching bandwidth (see Section III). Typically, the considered publications report an RBW of between 8% and 20%, with some exceptions for amplifiers optimized toward single-frequency operation [34], [39]. Compared to this trend, with an RBW of 32.8%, the HPA introduced in this work exhibits the highest relative large-signal bandwidth of all reactively matched amplifiers.
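The link between loadline resistance and achievable bandwidth can be made concrete with the classical Bode-Fano limit. The following sketch is an illustration under stated assumptions and does not reproduce the analysis of Section III: the drain is approximated by a parallel RC load, the reflection coefficient is assumed constant at gamma_max inside the band and total outside of it, and the resistance and capacitance values are hypothetical.

```python
import math


def bode_fano_max_bandwidth_hz(r_ohm, c_farad, gamma_max):
    """Upper bound on the matching bandwidth of a parallel RC load.

    From the Bode-Fano integral for a parallel RC load, a brick-wall
    passband with constant |Gamma| = gamma_max satisfies
    delta_f <= 1 / (2 * R * C * ln(1 / gamma_max)).
    """
    return 1.0 / (2.0 * r_ohm * c_farad * math.log(1.0 / gamma_max))


gamma = 10.0 ** (-15.0 / 20.0)  # reflection magnitude for a 15-dB RL target
c_out = 0.5e-12                 # hypothetical output capacitance in farads
bw_low_r = bode_fano_max_bandwidth_hz(20.0, c_out, gamma)   # lower loadline resistance
bw_high_r = bode_fano_max_bandwidth_hz(40.0, c_out, gamma)  # doubled loadline resistance
```

For a fixed capacitance and RL target, the bound scales inversely with the resistance, so doubling the loadline resistance halves the maximum matching bandwidth in this simplified model.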

IX. CONCLUSION
In this article, we summarize the theoretical matching boundaries and show the limitations they impose on real-world amplifier design. Starting with a common schematic prototype, we investigate the question of how to realize its electrical response in a densely routed, massively parallelized layout. To that end, we develop a comprehensive study on the application of space-mapping techniques toward the design of HPAs. We derive three reference design procedures and compare their performance in terms of convergence, speed, and practicality when laying out a densely routed HPA ISMN. Subsequently, we demonstrate the usefulness of the study by designing the networks of a compact three-stage eight-way wideband HPA in the Ka-band. The processed MMIC features a 1-dB large-signal bandwidth of more than 11 GHz and thus covers most of the Ka-band with an output power exceeding 6 W in 3 dB of gain compression. This demonstrates the highest combination of power and bandwidth achieved to date using a reactively matched topology in the Ka-band, suggesting that the method developed in Section V is a useful tool to increase the bandwidth of a circuit or, more generally, to reproduce the electrical characteristic of a prototype network with high accuracy.