Hybrid Cascode Frequency Compensation for Four-Stage OTAs Driving a Wide Range of CL

Feedback amplifiers consisting of multiple gain stages are used to establish highly accurate buffered/amplified signals that can drive a wide range of capacitive loads (CLs). This article models, analyzes, and presents the measurement results of a high-gain four-stage operational transconductance amplifier (OTA) that is able to handle a wide range of CL up to infinity. In addition to local compensation capacitors and nulling resistors in the intermediate stages, two high-speed feedback loops formed by parallel Miller capacitors and current buffers provide Miller compensation and the consequent pole splitting in the lower CL range. In the higher CL range, the dominant pole is made dependent on the CL, which maintains the stability conditions up to infinite CL. The proposed amplifier was integrated in a 65-nm CMOS technology, consuming 140-µA static current under a 1.2-V supply and occupying an active area of 0.0086 mm². A dc gain greater than 100 dB was measured, with a unity-gain frequency (UGF) of 4.09, 2.01, and 0.27 MHz for 4.7-, 10-, and 100-nF load capacitors, respectively. The average slew rate (SR) is 0.59 V/µs when the OTA is configured as a buffer targeting CLs higher than 4.7 nF.


I. INTRODUCTION
The continuous trend toward the scaling of MOS transistors has been followed by a reduction of the supply voltage ($V_{DD}$) to guarantee safe operation under very low power constraints. Meanwhile, short-channel MOS devices suffer from reduced intrinsic gain [1], and traditional gain-boosting solutions like cascoding have been progressively abandoned in low-voltage nano-scale bulk CMOS technologies [2]. Although FinFET technologies (i.e., technology nodes lower than 16 nm) exhibit sufficient intrinsic gain, the majority of commercial analog products are still fabricated in conventional bulk technologies above 32 nm. Enlarging the gain of amplifiers in scaled bulk CMOS (down to 32 nm) can therefore be accomplished by cascading multiple gain stages. Efficient implementation of multistage amplifiers has not, however, been straightforward and has thus been the focus of extensive research over the last decades [3], [4], [5]. Within this framework, one of the common applications of high-gain operational transconductance amplifiers (OTAs) is to provide a reliable amplified signal for capacitive loads (CLs) reaching tens of nanofarads. Some principal blocks that call for such specifications are active-matrix liquid crystal displays (LCDs), low-dropout regulators (LDOs), active filters, analog-to-digital converters, and line drivers [6], [7], [8], [9], [10], [11].
Maintaining loop stability is the main challenge of feedback OTAs supporting a wide $C_L$ range [2], [5], [12]. Each stage adds a high-impedance node and, consequently, a dominant pole in the transfer function that potentially depends on the loading conditions, so keeping the OTA stable over a broad $C_L$ range entails a carefully designed compensation network. The size of the CL is highly variable in some applications, which prevents the transistor sizes from being optimized independently of $C_L$. In the case of a headphone driver, for example, the external load depends on the type of cable connected to the output, and its size may vary from a few picofarads to several nanofarads [13]. The $C_L$ variations may not be aggressive in designs used for LCD drivers, but a general-purpose amplifier that supports a wide range of static CLs is highly desirable to avoid multiple design cycles. Single- or two-stage feedback OTAs handling a wide $C_L$ range can be conveniently designed and stabilized using the commonly used architectures [14], [15], [16], [17]. In recent technologies, however, at least three stages are needed to meet the gain requirement of several customary applications. It soon became apparent that two compensation capacitors, both of which depend on $C_L$, are required to stabilize a three-stage amplifier using classical nested Miller compensation (NMC) [4], [18], [19], [20], [21]. As a result, the bandwidth is severely limited, and the power and area efficiencies are inferior because the poles and zeros are placed without regard to the loading conditions of the amplifier. Sophisticated design strategies for three-stage amplifiers thus tend to eliminate the second capacitor and enrich the so-called single-Miller capacitor (SMC) compensation method with auxiliary current buffers, local RC networks, and feedforward stages, so that additional left-half-plane (LHP) zeros are generated and nondominant poles can be moved to higher frequencies for minimum area and power consumption [22], [23], [24], [25]. Capacitor-free compensation strategies have also been reported, at the cost of inferior power efficiency relative to the capacitor-based solutions [26], [27].

Among the possible ways to realize a stable three-stage OTA, hybrid-cascode frequency compensation [28], [29] was originally developed from the idea of Miller compensation employing current buffers [23], [24], and it proved to be very efficient, especially when combined with a local RC network to achieve a wide $C_L$ drivability range [2]. According to the original strategy, the Miller capacitor is split into equal fractions, and each fraction builds a unilateral feedback pathway to the first stage through a current buffer. The original compensation loop is thus divided into parallel loops that detect the output signal with a higher feedback factor, boosting the overall efficiencies in terms of area and power. The additional feedback is incorporated into the circuit topology without sacrificing area or power, exploiting only the original resources of the amplifier [2]. In the case of three-stage OTAs, experimental results reflected substantial improvements in the CL range, the size of $C_C$, the silicon footprint, and the power consumption [2]. As the intrinsic gain and channel length of transistors continue to scale down, four-stage amplification is becoming more essential to maintain both the dynamic range and the gain requirements in nano-scale technologies. The frequency compensation of four-stage amplifiers, however, appears to be much more complicated, especially when a wide $C_L$ range is demanded [30], [31], [32], [33].
In this work, we extend the idea of hybrid-cascode frequency compensation to four-stage OTAs by combining it with local compensation capacitors and resistors so that acceptable stability margins can be maintained over a wide $C_L$ range (from 4.7 nF to infinity, as confirmed by the experimental results). The remainder of this contribution is structured as follows. Section II analyzes the topology, block diagram, and small- and large-signal conditions of the new amplifier. Section III is devoted to the design procedures and the prerequisite stability analysis. In Section IV, we show the simulation and experimental results and compare the performance metrics with the relevant art. Finally, Section V concludes the article.

II. PROPOSED TOPOLOGY
A. Circuit Schematic
Fig. 1 presents a possible implementation of the proposed four-stage OTA, where $v_i = v_{i+} - v_{i-}$ is the input voltage and $v_O$ is the output. The input stage $g_{m1}$ is formed by $M_0$-$M_{10}$, with $M_0$ powering the input devices. The current-mirror devices $M_9$-$M_{10}$ perform the differential to single-ended conversion. The complementary MOS devices $M_{11}$-$M_{12}$, $M_{13}$-$M_{14}$, and $M_{15}$-$M_{16}$ constitute the second, third, and fourth inverting stages between the supply rails, where $g_{m2}$, $g_{m3}$, and $g_{m4}$ are their equivalent transconductances, respectively. Transistors $M_{14}$ and $M_{16}$ implement the feedforward stages with transconductances $g_{mf1}$ and $g_{mf2}$, which improve the large-signal operation by turning the third and fourth stages into class-AB topologies. The quiescent current of the output branch is set by the second-stage current since $M_{11}$ and $M_{16}$ share the same source-gate voltage. To establish a more stable quiescent current for the output stage, either $M_{15}$ or $M_{16}$ can be biased via a low-impedance node in the form of a class-A configuration. Such an approach is not, however, effective in terms of dynamic load drivability and static power consumption. The biasing voltages $V_{B1}$-$V_{B4}$ are provided by biasing circuitry that is not illustrated for the sake of conciseness. It is worth noting that the circuit topology is very simple and entails the same number of transistors as a three-stage OTA, since the second, third, and fourth stages are all inverting.
The main elements of the frequency compensation network are the series $R_{D1}$, $C_{D1}$ and $R_{D2}$, $C_{D2}$ around the second and the third stages, respectively, besides the Miller $C_C/2$ capacitors between $v_O$ and the sources of $M_6$ and $M_8$, which implement the embedded $g_{mC}$ transconductances without any power overhead. These devices, with a relatively low $1/g_{mC}$ input impedance, close the parallel feedback pathways from $v_O$ to the input and move a right-half-plane (RHP) zero to very high frequencies by minimizing the feedforward current that flows to the output via either $C_C/2$ [34]. The hybrid nature of the compensation network also allows designers to achieve a balanced time response during the falls and rises of $v_O$, unlike the classical solution, which connects $C_C$ to the source of either $M_6$ or $M_8$.

In the amplifier diagram of Fig. 2, a front-end differential stage is followed by inverting second, third, and fourth stages, all of which are modeled by a transconductance $g_{mi}$, output resistor $R_i$, and output capacitor $C_i$ ($i = 1, 2, 3, 4$). The feedforward stages $g_{mf1}$ and $g_{mf2}$ have a negligible effect on the small-signal operation but are intended to improve the large-signal operation [2], [24]. The compensation network contains $C_C$ broken into equal fractions, and each fraction forms a feedback loop consistent with the earlier descriptions. It will be shown later that the new arrangement pushes the nondominant poles to higher frequencies and lowers their quality factor, Q, compared with a single feedback loop closed by $C_C$. One-way $g_{mC}$ current buffers are placed in series with the Miller capacitors to represent the contribution of $M_6$ or $M_8$ in Fig. 1. The small input impedance of the current buffers extends the bandwidth of the compensation loop by relaxing the $C_C$ loading on $v_O$. As for the series $C_{D1}$, $R_{D1}$ and $C_{D2}$, $R_{D2}$, they enhance the stability of the inner gain stages while contributing to the overall stability not only by introducing a low-frequency zero but also by reducing the Q factor of the poles, as will be analyzed later.

B. Small-Signal Analysis
A voltage-gain transfer function based on the amplifier diagram in Fig. 2 is a prerequisite to exploring the different aspects of the proposed OTA in the presence of $C_L$ variations. The exact transfer function is excessively complex, containing several terms related to the stages' output impedances, their equivalent transconductances, and the compensation elements. Many terms can be simplified because the output impedances are dominated by $C_L$, $g_{mi}$, and the compensation elements. Under these circumstances, the methodology described in [34] can be used to approximate the transfer function as (1), where (2) and (3) denote the dc gain and the main pole, respectively. The transfer function contains two nondominant poles, whose natural frequency $\omega_0$ and Q factor are given by (4) and (5). These latter equations show the following.
1) Increasing $C_L$ lowers the Q factor, leading to two real poles when the OTA drives an ultralarge load capacitor.
2) The coefficient "2" appearing in the Q factor and $\omega_0$ expressions stems from the parallel loops implemented by the dual $C_C/2$. Not only is $\omega_0$ pushed to higher frequencies by this coefficient, but the Q factor is also reduced, both of which favor stability.
3) The Q factor is governed by $R_{D1}$ and $R_{D2}$ rather than the output resistors $R_2$ and $R_3$. This means that reducing the compensating resistors improves the gain margin (GM) by lowering the Q factor without sacrificing the dc gain. Too low an $R_{D1}$ or $R_{D2}$, however, compromises stability, as it excessively lowers $\omega_0$ and, in turn, the phase margin (PM) of the exterior loop. Therefore, these resistors should be tuned to position the nondominant poles optimally relative to the time and frequency requirements of the application.

The derived transfer function also involves the LHP zero $z_1$ and the RHP zero $z_2$ located at high frequencies, whose magnitudes are given by (6) and (7). The LHP zero depends on $C_{D1}$, $C_{D2}$, $R_{D1}$, and $R_{D2}$, and it should be positioned to partially cancel the negative phase shift caused by the nondominant poles. As for the RHP zero, it is proportional to the compensating resistors, the Miller capacitor, and the $g_m$ factors, and it should be located well beyond the gain-bandwidth (GBW) product so that its contribution does not reduce the stability margins. From (2) and (3), the GBW is expressed by (8). For $C_L \ll g_{m2} g_{m3} g_{m4} R_1 R_2 R_3 C_C$, this expression simplifies to its maximum, $g_{m1}/C_C$, and it gradually reduces to $g_{m1} g_{m2} g_{m3} g_{m4} R_1 R_2 R_3 / C_L$ under heavy capacitive loading. Depending on the size of $C_C$, the classical $g_{m1}/C_C$ relation thus holds only in the lower range of $C_L$ (say, up to a few nanofarads). This means that the role of $C_C$ is taken over by $C_L$ when the amplifier handles a very large load capacitor. As the measurement results confirm, the scaling of the GBW with $C_L$ can be exploited to optimize the stability conditions and to extend the upper $C_L$ limit up to infinity. The movement of the poles and zeros with increasing $C_L$ is sketched in Fig. 3, indicating that a larger CL converts the original complex-conjugate poles into real poles and pushes the first pole toward the origin.
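The two GBW limits quoted above can be explored numerically. The sketch below is illustrative only: it assumes the simplest single expression that reproduces both limits, $\mathrm{GBW} = g_{m1} k / (C_L + k C_C)$ with $k = g_{m2} g_{m3} g_{m4} R_1 R_2 R_3$, which may differ from the article's exact formula, and the component values are hypothetical rather than taken from Table I.

```python
import math


def gbw_hz(c_load, gm1, k_inner, c_c):
    """Gain-bandwidth product in Hz for a load capacitance c_load (farads).

    k_inner stands for g_m2*g_m3*g_m4*R_1*R_2*R_3 (dimensionless).
    Assumed form: GBW = g_m1*k / (C_L + k*C_C), chosen only because it
    reproduces both limits quoted in the text.
    """
    return gm1 * k_inner / (c_load + k_inner * c_c) / (2.0 * math.pi)


# Hypothetical values for illustration (not taken from Table I).
GM1 = 100e-6      # first-stage transconductance, S
K_INNER = 2.0e3   # g_m2*g_m3*g_m4*R_1*R_2*R_3
C_C = 1.0e-12     # total Miller capacitance, F

if __name__ == "__main__":
    for c_load in (10e-12, 1e-9, 4.7e-9, 10e-9, 100e-9):
        gbw_mhz = gbw_hz(c_load, GM1, K_INNER, C_C) / 1e6
        print(f"C_L = {c_load * 1e9:8.3f} nF -> GBW = {gbw_mhz:8.3f} MHz")
```

With these placeholder numbers the corner sits at $k C_C = 2$ nF: below it the GBW stays near $g_{m1}/C_C$, and above it the GBW falls roughly in proportion to $1/C_L$.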
The transconductances of $M_6$ and $M_8$ were assumed to be identical in the foregoing analysis. While perfect matching can never be achieved with different device types for $M_6$ and $M_8$ in Fig. 1, we anticipate that small mismatches between their $g_{mC}$ only trivially alter the positioning of the poles and zeros [2].
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

C. Large-Signal Analysis
The large-signal step response is generally described by the slew rate (SR) instead of the small-signal GBW, GM, and PM parameters. Referred to as the maximum rate of deviation, the SR is defined by the available current that can charge or discharge the parasitic and load capacitors connected to the gain-stage outputs. By denoting $I_{A1} \approx I_C$, $I_{A2} \approx I_2$, $I_{A3} \approx I_3$, and $I_{A4} \approx I_4$ as the upper limits of the currents that can be fed to the loading of the respective stages, and by further assuming that the parasitics are negligible compared with the load and compensation capacitors, the SR is the minimum among the contributors in (9). This relation is nominally restricted by $SR_1$ and $SR_4$ for small and heavy CLs, respectively. As for $M_{16}$ in Fig. 1, it improves the SR by forming an output push-pull stage capable of boosting the output current $I_{A4}$ under the undesired output transitions. The use of SR boosters can help drive ultralarge CLs more effectively [30].
The above SR analysis models only the average trend of the realistic SR. Completely symmetrical positive/negative SR (SR+/SR−) never occurs in practice, as different types of devices with potentially unequal gate inputs switch on and off and affect the charging/discharging rates of the capacitive loading in different stages.

A. Analysis of Stability
Let us apply the key expressions derived in Section II to characterize PM and GM in terms of $C_L$. Using the GM definition as the starting point, the GM can be derived from (1) as (10), where PX is the phase crossover frequency obtained from (11). A solution for (11) is given in (12). For the usual $R_{D1}$, $R_{D2}$, and $g_{mC}$, and provided that (13) holds, (12) reduces to $PX \approx \omega_0$, so the GM can be written as (14). As deduced from (14), the stability will not be compromised by GM at large $C_L$ values, since the GM rises as $C_L$ approaches infinity, as expressed by the limit in (15). With reference to the PM definition, the PM can be formulated as (16). Substituting from (4) to (7) and after some algebra, this relation can be expanded to (17).

Tending $C_L$ to infinity gives the limit in (18). In view of (18), it becomes evident that proper sizing of $C_{D1}$, $C_{D2}$, $R_{D1}$, and $R_{D2}$ leads to an adequately high PM under very heavy CLs. In this sense, while the zeros are nearly unaffected by the load, increasing $C_L$ influences the GBW, and asymptotically the GBW decrease compensates the reduction in the high-frequency poles, which finally become real and separate (one pole remains asymptotically constant while the other falls with $C_L$ at the same rate as the GBW). The lower limit of $C_L$ would not be limited by PM for a carefully sized compensation network, since tending $C_L$ to zero leads to the limit in (19). Decreasing $C_L$ raises $\omega_0$, as is apparent from (4), yielding a higher PM when its effect outweighs the GBW increase in (17). Instead of the PM, the GM dictates the stability conditions under light CLs, as reducing $C_L$ compromises the stability of the internal Miller loop, which causes peaking in the frequency response [35]. The minimum GM usually occurs at the maximum Q factor ($Q_{max}$) for the minimum load capacitor ($C_{L,min}$), which would be equivalent to $Q_{max} = \omega_0/\mathrm{GBW}$ if the contribution of the zeros in (14) were neglected. In this scenario, a maximum $Q_{max}$ may be set in (5) to evaluate $C_{L,min}$ as (20). The minimum $C_L$ depends on $R_{D1}$ and $R_{D2}$ rather than the output resistors, and it can be lowered either by increasing the $g_{mC}$ of the cascode devices or by reducing $R_{D1}$ and $R_{D2}$; the former comes at the cost of increased power consumption, while the latter comes at the price of a compromised PM at higher $C_L$ due to a large $z_1$ frequency (6).
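The PM and GM definitions used above can be checked numerically on a loop gain with the same pole-zero structure: one dominant pole, one nondominant pole pair ($\omega_0$, Q), an LHP zero $z_1$, and an RHP zero $z_2$. The following sketch is an illustration under stated assumptions, not the article's closed-form analysis. The 100-dB dc gain and the 5-MHz GBW echo figures quoted in the text, and the $\omega_0$ and $z_2$ ratios follow the design guidelines ($\beta$ between 2 and 3, $|z_2| > 10\,\mathrm{GBW}$); the values Q = 0.7 and $z_1 = 1.5\,\mathrm{GBW}$ are hypothetical.

```python
import numpy as np


def loop_gain(f, a0, p1, w0, q, z1, z2):
    """Unity-feedback loop gain: one dominant pole p1, one pole pair (w0, q),
    an LHP zero z1 and an RHP zero z2. All corner values are in rad/s."""
    s = 2j * np.pi * f
    num = a0 * (1 + s / z1) * (1 - s / z2)
    den = (1 + s / p1) * (1 + s / (q * w0) + (s / w0) ** 2)
    return num / den


def margins(f, h):
    """Return (PM in degrees, GM in dB) read off a sampled loop gain."""
    mag_db = 20 * np.log10(np.abs(h))
    phase = np.degrees(np.unwrap(np.angle(h)))
    below_unity = mag_db < 0
    beyond_180 = phase < -180
    if not below_unity.any() or not beyond_180.any():
        raise ValueError("frequency grid does not contain both crossovers")
    pm = 180 + phase[np.argmax(below_unity)]   # read at the unity-gain frequency
    gm = -mag_db[np.argmax(beyond_180)]        # read at the phase crossover
    return pm, gm


def pole_pair_is_real(q):
    """s^2/w0^2 + s/(q*w0) + 1 has real roots exactly when q <= 0.5."""
    return q <= 0.5


GBW = 2 * np.pi * 5e6   # rad/s, echoing the 5-MHz GBW quoted for 5 nF
A0 = 1e5                # 100 dB
# Q and the z1 ratio below are hypothetical illustration values.
PARAMS = dict(a0=A0, p1=GBW / A0, w0=2.5 * GBW, q=0.7, z1=1.5 * GBW, z2=10 * GBW)

f = np.logspace(0, 9, 20001)
pm, gm = margins(f, loop_gain(f, **PARAMS))
print(f"PM = {pm:.1f} deg, GM = {gm:.1f} dB, real pole pair: {pole_pair_is_real(PARAMS['q'])}")
```

Lowering `q` toward 0.5 in this model mimics the heavy-load trend described in the text, where the complex pole pair turns into two real poles.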

B. Design Guidelines
The transfer function, along with the derived equations for PM and GM, was used to develop an iterative design procedure based on the targeted bandwidth and stability margins over a prescribed $C_L$ range. For this purpose, the parasitics were extracted and updated with the help of computer simulation at the beginning of each sizing sequence. Starting from the transconductance values, an initial $g_{m1}$ value needs to be calculated following the design specifications that take into account the input-referred noise, matching, and, most importantly, the $g_m/I_D$ ratio based on the power, area, and speed envelopes [36]. A maximum GBW, i.e., $\mathrm{GBW}_{max} = g_{m1}/C_C$, can be evaluated afterward as the maximum GBW limited by $C_{L,min}$ when the feedback factor is set to unity, since the final PM and GM would be large enough to support such an approximation. With $\mathrm{GBW}_{max}$ acquired, an initial $C_C$ can be obtained from $g_{m1}$. The transconductance $g_{mC}$ can subsequently be obtained from $g_{m1}$ by observing that the critical stability margins occur at $C_{L,min}$, so careful placement of the poles and zeros is a prerequisite here. Accordingly, the minimum GM can be estimated from (14) as (21), where $\omega_{0,max}$ is the maximum pole-pair frequency obtained from (4) for $C_{L,min}$. The $g_{mC}/g_{m1}$ ratio must be selected carefully so that $\mathrm{GM}_{min}$ does not become negative in any process-voltage-temperature (PVT) corner. Setting $g_{mC} = \alpha g_{m1}$ and picking, e.g., $\alpha = 4$ gives 18 dB from the first term, a sufficient margin to counteract the adverse contributions of the second and third terms caused by the zeros. Sizing $g_{m2}$, $g_{m3}$, and $g_{m4}$ is the next step and should rely on the designated operating regions and the relevant power/area trade-offs. The resistors $R_{D1}$ and $R_{D2}$ are then sized based on the proper location of the zeros. Placing the first zero right after the maximum GBW frequency, i.e., at $\gamma \times \mathrm{GBW}_{max}$, which occurs at $C_{L,min}$, we get (22). The second zero, the RHP zero $z_2$, should simultaneously be positioned well beyond $\mathrm{GBW}_{max}$. Choosing $|z_2| > 10\,\mathrm{GBW}_{max}$ would be a safe margin for the phase contribution of $z_2$ not to disturb the frequency response, which consequently leads to (23).

The sizing of $C_{D1}$ and $C_{D2}$ relies on $R_{D1} C_{D1} + R_{D2} C_{D2}$, the term that contributes directly to the absolute PM and its limits derived in (18) and (19). Larger $C_{D1}$ and $C_{D2}$ improve the PM but also add to the silicon footprint, and vice versa. After setting $C_{D1}$ and $C_{D2}$, the next step is to modify the initial $g_{m1}$ and the consequent $C_C$ and $g_{mC}$. To estimate the required $C_C$, we set $\omega_{0,max}$ equal to $\beta \cdot \mathrm{GBW}_{max}$ and choose $\beta$ between 2 and 3 to allow for sufficient GM and PM. Combining (8) with (4), we get (24). Substituting $g_{mC} = \alpha g_{m1}$ from the former assumptions, $C_C$ is found by (25). Combining (22) and (24), the coefficients $\alpha$, $\beta$, and $\gamma$ need to be adjusted such that a sufficiently high $\mathrm{GM}_{min}$ results from (26). The above procedure should be iterated and refined by simulations, as the initial assessments of the parasitics and some design parameters are not accurate. Simulation of the zeros and poles under component and process variations would also be necessary in the final phase. A prototype of the proposed OTA was realized by taking into account the described design rules, using an algorithm that optimizes the device sizes for lower power consumption and area. Measurement and simulation results are discussed in Section IV.
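The first sizing steps of the procedure above can be written as a short calculation. This is a minimal sketch under stated assumptions: the function name and the starting $g_{m1}$ are hypothetical, and the first GM term is taken as $20\log_{10}(2\alpha)$ only because that form reproduces the 18 dB quoted for $\alpha = 4$; the article's exact expression is not reproduced here.

```python
import math


def initial_sizing(gm1, gbw_max_hz, alpha=4.0, beta=2.5, z2_factor=10.0):
    """First-pass targets from the design guidelines (hypothetical helper).

    gm1         first-stage transconductance, S (starting guess)
    gbw_max_hz  targeted maximum GBW at C_L,min, Hz
    alpha       g_mC / g_m1 ratio
    beta        w0_max / GBW_max ratio (text suggests 2 to 3)
    z2_factor   minimum |z2| / GBW_max ratio (text suggests 10)
    """
    w_gbw = 2.0 * math.pi * gbw_max_hz
    return {
        "C_C": gm1 / w_gbw,              # from GBW_max = g_m1 / C_C
        "g_mC": alpha * gm1,             # cascode current-buffer transconductance
        "w0_max": beta * w_gbw,          # target pole-pair frequency, rad/s
        "z2_min": z2_factor * w_gbw,     # lower bound for the RHP zero, rad/s
        # Assumed form of the first GM term; it gives about 18 dB for alpha = 4.
        "gm_first_term_db": 20.0 * math.log10(2.0 * alpha),
    }


# Hypothetical starting point: g_m1 = 100 uS and GBW_max = 5 MHz.
targets = initial_sizing(gm1=100e-6, gbw_max_hz=5e6)
for name, value in targets.items():
    print(f"{name:>18s} = {value:.4g}")
```

The remaining steps (sizing $R_{D1}$, $R_{D2}$, $C_{D1}$, and $C_{D2}$ from the zero locations) depend on equations that are not reproduced in this text, so they are left out of the sketch.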

A. Simulation Results
The operation of the proposed amplifier was analyzed through simulations in a 65-nm standard CMOS process under a 1.2-V supply. Optimizations were conducted according to the design guidelines in Section III to reach stable operation with maximum bandwidth over a 5-100 nF $C_L$ range. The final configuration consumes 140 µA in a total area of 0.0086 mm². Table I outlines the transistor sizes and the small-signal parameters. Notably, the value of $g_{mC}$ was set to $4g_{m1}$ in line with the design guidelines described in the previous section. Table II presents the operation details for $C_L$ = 5 nF. The GBW product was adjusted to 5 MHz in this case, while the minimum GM is derived as 9 dB thanks to $\alpha = 4$, as discussed earlier.
Fig. 4(a) shows the loop-gain frequency responses for various load capacitors. The main pole depends on the Miller capacitors for small $C_L$ values and becomes a function of the load capacitor for heavy CLs. Fig. 4(b) presents the step response of the amplifier in buffer configuration to the falling and rising edges of a 400-mV input. The supply current drawn by the output stage ranges from almost zero to a few milliamperes depending on the step size and $C_L$, while the bias current of the output stage shows a variation of ±12% and ±8% across the 0.2-0.6 V output voltage and 1.2-1.5 V supply voltage ranges, respectively.
The OTA is found to be stable with adequately high stability margins over the designated $C_L$ range from 5 nF to infinity, becoming gradually unstable for lighter CLs due to insufficient GM. The stability is typically limited by GM rather than PM for lighter CLs. Nevertheless, it is possible to configure the compensation network for the proposed OTA such that it drives

C. Measurement Results and Comparisons
The proposed OTA, with the chip micrograph shown in Fig. 6, was designed in a standard 65-nm CMOS. The exper-    8). The measured UGFs are lower than their nominal values by up to about 21%, which is attributed to the internal parasitics as well as the layout-dependent effects. For instance, metal-oxide-metal (MOM) capacitors and high-sheet-resistance resistors were used in the compensation network. Such components suffer from large process variations, about 25%-30% in the MOM case in particular, but they were adopted for their compactness and lower equivalent series resistance (ESR).
Fig. 8 exhibits the step response of the unity-gain OTA subject to an input step size of 400 mV. For $C_L$ = 4.7 nF, the mean (µ) positive and negative 1% settling times are 1.79 and 1.74 µs with a standard deviation (σ) of 0.31 and 0.35 µs, respectively. The positive/negative slew rates SR+/SR− were measured as 0.65/0.53 and 0.05/0.03 V/µs for 4.7- and 100-nF CLs, respectively. Altogether, the step responses demonstrate very close agreement with the simulation results. A longer settling time was, however, observed owing to the aforementioned internal parasitics and layout-dependent effects as well as the loading of the bond wires and the experiment setup.
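As a rough plausibility check on the measured slew rates, the current needed to slew a capacitor follows from $I = C_L \cdot dv/dt$. The sketch below applies this relation to the SR+/SR− values quoted above. It assumes that the output slewing is limited by the load capacitor alone, which the large-signal analysis suggests for heavy CLs; the function name is a hypothetical helper.

```python
def slewing_current_ma(sr_v_per_us, c_load_nf):
    """Current in mA needed to slew c_load_nf (nF) at sr_v_per_us (V/us).

    Uses I = C * dV/dt and assumes the load capacitor is the only limit.
    """
    volts_per_second = sr_v_per_us * 1e6
    farads = c_load_nf * 1e-9
    return volts_per_second * farads * 1e3


# Measured (SR+, SR-) in V/us, keyed by load capacitance in nF (from the text).
MEASURED_SR = {4.7: (0.65, 0.53), 100.0: (0.05, 0.03)}

for c_load_nf, (sr_pos, sr_neg) in MEASURED_SR.items():
    average = (sr_pos + sr_neg) / 2.0
    print(
        f"C_L = {c_load_nf:5.1f} nF: average SR = {average:.2f} V/us, "
        f"I(SR+) = {slewing_current_ma(sr_pos, c_load_nf):.2f} mA, "
        f"I(SR-) = {slewing_current_ma(sr_neg, c_load_nf):.2f} mA"
    )
```

The implied currents are in the low-milliampere range for both loads, which is consistent with the simulated output-stage supply current of up to a few milliamperes, and the 4.7-nF average reproduces the 0.59 V/µs quoted in the abstract.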
The figures of merit, $\mathrm{IFOM}_S = \mathrm{GBW} \times C_{L,max}/I_{DD}$ and $\mathrm{IFOM}_L = \mathrm{SR} \times C_{L,max}/I_{DD}$, were employed to quantify the small-signal and large-signal characteristics at the upper limit of $C_L$, whereas $\mathrm{IFOM}_{SA} = \mathrm{IFOM}_S/\mathrm{Area}$ and $\mathrm{IFOM}_{LA} = \mathrm{IFOM}_L/\mathrm{Area}$ were added to include the active area [2]. Tables V and VI compare the performance metrics of the proposed amplifier with some of the state-of-the-art three- and four-stage OTAs based on the above FOMs. The minimum of SR+ and SR− is reported for each $C_L$ and is used to compute the large-signal FOMs.
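The FOM definitions above can be evaluated directly from the measured figures quoted in this article. The sketch below assumes $C_{L,max}$ = 100 nF, uses the measured UGF at that load in place of the GBW, takes the minimum of SR+ and SR−, and adopts MHz·pF/mA and (V/µs)·pF/mA as units. These are assumptions made for illustration; the entries of Tables V and VI are not reproduced here and may be based on different operating points.

```python
def ifom(metric, c_load_pf, i_dd_ma):
    """Generic FOM: metric * C_L / I_DD (metric in MHz or V/us, C_L in pF, I_DD in mA)."""
    return metric * c_load_pf / i_dd_ma


# Figures quoted in the text.
I_DD_MA = 0.140         # 140-uA static current
V_DD = 1.2              # supply voltage, V
AREA_MM2 = 0.0086       # active area
C_L_MAX_PF = 100e3      # 100 nF, assumed to be the C_L,max used for the FOMs
UGF_MHZ = 0.27          # measured UGF at 100 nF, used here in place of GBW
SR_MIN_V_PER_US = 0.03  # minimum of SR+/SR- at 100 nF

ifom_s = ifom(UGF_MHZ, C_L_MAX_PF, I_DD_MA)
ifom_l = ifom(SR_MIN_V_PER_US, C_L_MAX_PF, I_DD_MA)
ifom_sa = ifom_s / AREA_MM2
ifom_la = ifom_l / AREA_MM2
power_uw = I_DD_MA * 1e3 * V_DD

print(f"IFOM_S  = {ifom_s:.0f} MHz*pF/mA")
print(f"IFOM_L  = {ifom_l:.0f} (V/us)*pF/mA")
print(f"IFOM_SA = {ifom_sa:.3g} MHz*pF/(mA*mm^2)")
print(f"IFOM_LA = {ifom_la:.3g} (V/us)*pF/(mA*mm^2)")
print(f"Power   = {power_uw:.0f} uW")
```

The power line simply multiplies the static current by the supply voltage, which matches the 168 µW stated in the conclusion.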
The proposed four-stage OTA does not outperform every three-stage OTA that drives a wide $C_L$ range, which is mainly due to the increased number of gain stages and a more complicated compensation network. However, among the fabricated and tested four-stage amplifiers, it outperforms all the topologies presented in Table VI when taking into consideration the active area, current consumption, range of stability, and the large- and small-signal operation. The proposed OTA can drive the widest $C_L$ range, up to 100 nF, for comparable current consumption and compensation capacitor sizes.

V. CONCLUSION
The principle of hybrid-cascode frequency compensation was applied to a four-stage feedback amplifier, leading to improved performance metrics with respect to the prior art. Besides the local Miller compensation used to stabilize the inner gain stages, two Miller capacitors with current buffers are applied according to this principle. Reliable operation over a wide range of CLs, spanning from 4.7 nF up to infinity, was achieved by shaping the frequency response such that stability is ensured by the Miller capacitors for lower CLs and, gradually, by the load capacitor for higher CLs. The proposed amplifier was fabricated in a 65-nm CMOS, consuming 168 µW while occupying an active area of 0.0086 mm². Experimental results indicate a dc gain higher than 100 dB and a PM of at least 72° for CLs beyond 4.7 nF.

Fig. 2
Fig. 2 depicts the amplifier diagram of the proposed four-stage OTA, where the same symbols as in Fig. 1 are used to represent similar elements. It contains a front-end differential stage followed by inverting second, third, and fourth stages.

Fig. 3. Pole-zero map of the proposed OTA against $C_L$.

Fig. 5. Performance comparison in different process corners for $C_L$ = 5 nF.

TABLE III
BEST AND WORST PARAMETERS OVER CORNERS

TABLE IV
SUMMARY OF MONTE-CARLO ANALYSIS (1000 RUNS @ $C_L$ = 5 nF)

TABLE V
COMPARISON WITH THREE-STAGE OTAS

TABLE VI
COMPARISON WITH FOUR-STAGE OTAS