Analog RF Circuit Sizing by a Cascade of Shallow Neural Networks

A deep neural network architecture for the automatic sizing of analog circuit components is proposed, with a focus on radio frequency (RF) applications in the 2 to 5-GHz region. It addresses two challenges: the typically small number of examples available for network training, and the existence of multiple solutions, some of which have impractical values for integrated circuit implementation. We address these issues by restricting the learning to one component size at a time, thanks to a cascade of dedicated shallow neural networks (SNNs), where each network constrains the prediction of the next ones. Moreover, the SNNs are individually tuned by a genetic algorithm for the prediction order and accuracy. This reduction of the solution space at each step allows the use of small training sets, and the introduced constraints between SNNs handle component interdependencies. The method is successfully validated on three different types of RF microcircuits: 1) a low-noise amplifier (LNA); 2) a voltage-controlled oscillator (VCO); and 3) a mixer, using 180 and 130-nm CMOS implementations. All the predictions were within 5% of the true values, both at the component and performance levels, and all the responses were obtained in less than 5 s, after 4 to 47 min of training on a regular PC station. The obtained results show that the proposed method is fast and applicable to arbitrary analog circuit topologies, with no need to retrain the developed neural network for each new set of desired circuit performances.


Philippe-Olivier Beaulieu, Étienne Dumesnil, Frederic Nabki, Member, IEEE, and Mounir Boukadoum, Life Senior Member, IEEE

Philippe-Olivier Beaulieu and Frederic Nabki are with the EE Department, École de Technologie Supérieure (ÉTS), Montreal, QC H3C 1K3, Canada.
Étienne Dumesnil was with the University of Quebec at Montreal (UQAM), Montreal, QC H2Y 3Y7, Canada. She is now with the Solid State of Mind, Inc., Montreal, QC H3C 1G7, Canada.
Digital Object Identifier 10.1109/TCAD.2023.3282570

I. INTRODUCTION
Currently, the typical design flow of analog circuits still requires substantial human intervention [1], since designer experience must compensate for properties and implementation factors not accounted for by the existing design tools, including nonlinear components, bias requirements, and real-world effects like stray impedances, physical circuit layout, and component coupling. This usually results in a time-consuming iterative design process that must be repeated for each new design, and the situation is likely to worsen with the increasing complexity of electronic circuits and systems, and the shortening of their useful life spans. Many works have proposed circuit design workflows for electronic design automation (EDA), particularly for digital integrated circuits and systems. However, the design and synthesis/sizing of analog circuits, particularly for radio frequency (RF) applications, are still challenging due to the aforementioned limitations.
Circuit design is usually more complicated than analysis. In analysis, the tools uniquely determine a circuit's behavior from its topology and component values, but circuit design addresses the reverse problem: find a feasible circuit topology and component values to match given performances. Formally, two problems are inverses of each other if the formulation of one involves all or part of the solution of the other [2]. Then, the better-understood problem is said to be direct, while the other one is said to be inverse. For analog circuits, analyzing circuit behavior given a topology and component values is a direct problem, while synthesizing the circuit (i.e., finding the topology and component values) for a given behavior is an inverse problem, and it usually affords multiple solutions, not all feasible in practice. Infeasible solutions include those with outsized geometries or component values.
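As a toy illustration of the direct/inverse distinction (not taken from the circuits studied in this article), consider a first-order RC low-pass filter, whose cutoff frequency is 1/(2πRC). The sketch below assumes illustrative values and hypothetical function names; it shows that the direct problem has one answer while the inverse problem admits many, some of them impractical.

```python
import math

def cutoff_hz(r_ohm, c_farad):
    """Direct problem: component values -> behavior (first-order RC low-pass)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

def candidate_sizings(target_hz, r_choices):
    """Inverse problem: behavior -> component values; one C per candidate R."""
    return [(r, 1.0 / (2.0 * math.pi * r * target_hz)) for r in r_choices]

target = 1.0e6  # desired cutoff frequency in Hz (illustrative value)
solutions = candidate_sizings(target, [1.0e2, 1.0e3, 1.0e9])
for r, c in solutions:
    # Every pair meets the target, but not every pair is practical to build.
    print(f"R = {r:.0e} ohm, C = {c:.3e} F, f_c = {cutoff_hz(r, c):.3e} Hz")
```

All three pairs reproduce the target exactly, yet the last one calls for a gigaohm resistor and a sub-femtofarad capacitor, which is the kind of outsized solution a sizing method has to avoid.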
Automating the solution of inverse problems has been a focus of research in many fields since they arise whenever a physical system is to be inferred from property measurements [3], and their well-posedness must be established for a straightforward solution to exist. Formally, a problem must meet three criteria to be well posed [4]: 1) it can be solved; 2) the solution is unique; and 3) the solution is continuous with respect to data and parameter changes. A problem that does not respect those criteria is ill-posed, and this is often the case for inverse problems as they usually afford multiple solutions, and they may also violate the third criterion, as small data or parameter changes in them may lead to wide variations in the output values or solution accuracy.
The inverse problem of circuit design can be simplified if the circuit topology is given, since the design scope then reduces to estimating the component sizes. However, it is difficult to model RF circuits by regular linear equations with lumped parameters at GHz frequencies, since electric and magnetic fields must be accounted for, along with dissipative losses [5], leading to complex wave equations, and secondary effects, such as stray capacitances and layout effects, arise. In this context, the typical approach follows a lengthy iterative process of simulation and analysis, followed by sizing adjustments, and an efficient automatic sizing solver can be of great assistance. There also exist commercial products, such as Neolinear [6], Solido [7], and MunEDA [8], that have been developed to help, but their lack of generality across circuits and technologies, and the need to reconfigure them for each new design, have been obstacles to wide adoption. In practice, the choice of topology and the sizing of components are application specific, making it harder for one algorithm to perform well in all situations. In addition, the setup and configuration costs are significant since the designers must use different tools and design environments. Finally, because of the high number of circuit topologies, technologies, and performance metrics, there are no specific benchmarks to evaluate and compare the available EDA algorithms [9].
Analog RF circuit design is typically a mix of methods and experience where, after selecting a circuit topology, a component sizing process takes place, followed by drawing the circuit layout and extracting the parasitic components and secondary effects to fine-tune it. In this article, a neural network-based methodology is proposed to speed up the initial sizing step by learning from a relatively small set of solved examples. Then, given an RF circuit and a set of desired performances, it automatically sizes the circuit components before the tuning step, to within 5% of both the required values as determined by the usual simulations and the desired performances. The approach is fast and generic for any circuit topology or performance specifications, and our validation results show that it predicts the sought component values within the set accuracy and performance thresholds, without being affected by the implementation geometry.
The proposed approach consists of a cascade of progressively built shallow neural networks (C-SNN), where each shallow neural network (SNN) is individually specified by a genetic algorithm (GA) to predict one component size. Moreover, each SNN output constrains the subsequent SNNs for mutual compatibility. This one-by-one approach simplifies the search space to allow for smaller training sets while forcing the generated component sizes to be compatible.
We define an SNN as one that includes two hidden layers at most. Here, it consists of a multilayer perceptron (MLP) whose hyperparameters are tuned by a GA. Then, each trained MLP adds its output to the desired performances to constrain the learning of the next MLPs in the cascade while compensating for potential coupling with the already predicted values. The prediction order is also determined by the GA.
Three specific analog RF circuits are used for validation, with one of them implemented in 180-nm CMOS technology and the other two in 130-nm CMOS technology, but the proposed methodology is circuit and technology agnostic, and other choices could have been made by simply using the appropriate set of completed designs as training examples. The three circuits were selected mainly to represent basic building block functionalities in RF circuits. As the proposed tool only accomplishes the initial sizing step of the design cycle as implemented, the training data used do not account for the effects of parasitic components or electromagnetic (EM) interference. This will be discussed further in Section VI.
The remainder of this article is organized as follows. Section II reviews the related work on component sizing; Section III presents the C-SNN architecture, along with the GA used to optimize its hyperparameters; Section IV describes the RF microcircuits used for validating the proposed method, with Section V providing the obtained results; and finally, Section VI offers a discussion and concluding remarks.

II. RELATED WORK
Initially, two major approaches could be identified in relation to analog design automation: knowledge-based and metaheuristic-based. Today, artificial neural networks (ANNs) constitute a promising third alternative. Below is a brief survey of previous work devoted to the analog circuit sizing problem, with a justification of the present work at the end.

A. Knowledge-Based Methods
These techniques are among the oldest and strive to imitate the behavior of expert designers [10], for example by using design plans that set the values of the design parameters in steps [11]. The approach is effective at low frequencies and for relatively small circuits such as operational amplifiers [12]. However, larger circuits rapidly increase in complexity, with nonlinear relations and couplings to account for between the components and the circuit behavior, making it difficult to build an efficient design plan [11], [13]. FEATS [14] is a method that uses abstract building blocks to create and evaluate topologies based on known circuits. This methodology is flexible in terms of circuit classes, technology nodes, and performance measurements, but generates many meaningless interim structures. A similar open-source design methodology based on functional blocks is FUBOCO [15]. A library for different functional stages (bias, load, differential pair, etc.) is used in conjunction with composition rules to create and evaluate the topologies. The methodology is more complicated than FEATS but reduces the search space for the desired circuit topology. In all cases, a good knowledge of the design rules is required to use the previous methodologies. Moreover, as knowledge-based methods are essentially expert systems, they suffer from the fundamental difficulty of extracting human expert knowledge (i.e., rendering explicit an essentially implicit procedural knowledge) [16]. This has led many researchers to look for methods that can autonomously learn to find solutions.
Case-based reasoning (CBR) [17], [18] attempts to circumvent the problem by shifting the focus to the outcome of the expert's thinking instead of the thinking itself. CBR can be summarized as the search and retrieval of a "close" solved design and its revision for adaptation to the current case. If the retrieved circuit behavior is the same as the required one, its component values are just reused; if not, knowledge-based methods are used to revise the component parameters [19]. However, this requires many stored designs to be efficient [12], which limits its usefulness for circuit design, synthesis, or sizing, as only a few examples are usually available.

B. Metaheuristics
Metaheuristic-based methods view the sizing problem as multiobjective constrained optimization [13], [20], [21], and they typically find the solution by using a search algorithm.
1) Simulated Annealing: Simulated annealing [22] explores the solution space with increasingly finer granularity until a final solution is reached (e.g., [23] and [24]). The approach has two important limitations in the context of circuit sizing. First, the circuit of interest must be simulated each time for error evaluation, making the algorithm time consuming for complex problems. Second and more importantly, the final sizing solution is specific to the analyzed circuit. Indeed, the method does not learn "how to" size, but only finds the idiosyncratic solution for a single set of required circuit behavior. Thus, the optimization algorithm must be restarted for each new desired behavior. This can be a serious limitation in terms of sizing time. For example, the simulated annealing method proposed in [19] needed between 1 and 3 h on a 2.3-GHz CPU to complete the sizing for each set of low-noise amplifier (LNA) performance criteria it was given. The one proposed in [23] required approximately 25 s to size a new operational amplifier and 1 h to size a new voltage-controlled oscillator (VCO) on a 2.4-GHz CPU.
2) Genetic Algorithm: GA search is similar to simulated annealing, in that a competitive process is used to find a solution (e.g., [25] and [26]). However, instead of two competing neighbors at each iteration, a whole population of potential solutions is involved, each one using a "chromosome" metaphor. In the context of circuit sizing, the chromosomes typically define the circuit component values and their population goes through a series of selection-reproduction-mutation cycles until convergence toward a solution [27]. Like simulated annealing, the returned component values apply to a specific set of performance criteria, and the process can be time consuming due to repetitive circuit simulation. For example, the GA implemented in [28] took approximately 54 s on a 2.8-GHz CPU to complete the sizing of an operational amplifier, while the one implemented in [25] consistently took more than 30 min to size an LNA. In both cases, the process had to be repeated for any new set of performance criteria presented. Genetic algorithms are also used to optimize other approaches [29].
A variant of GA, differential evolution, uses real-valued vectors as genotypes and the difference between vectors in breeding new generations. In [30], it is combined with surrogate models of EM simulations for an end-to-end design. However, the approach must be restarted for each circuit instance and its time consumption for the three provided examples varied between 42 and 106 h on an average desktop station.
3) Genetic Programming: Genetic programming (GP) attempts to solve the synthesis and sizing problems using a dynamically bred program [31], [32]. It has been shown to perform well not only to find an adequate set of values for components but also to find an adequate topology, including selecting the actual components and interconnections of the circuit. As proposed in [33], a tree is used to represent the analog microelectronic circuit. First, an embryonic circuit is generated, from which the final circuit is evolved. The evolution of the circuit reflects the evolution of the functions that constitute the branches of the circuit-constructing program tree (see also [34] for a different coding scheme). The main advantage of GP over GA and simulated annealing is its efficiency at synthesizing the whole circuit instead of being limited to circuit sizing. Many variations of the GP algorithm have also been proposed for circuit sizing (e.g., [34]), reaching efficient sizing optimizations. However, in the context of the present work, they also have the limitations of the preceding metaheuristics.
In all the previous approaches, a design-in-the-loop approach is used, in which the circuit of interest is built or simulated for testing during the algorithm iteration process. The approach can result in very long convergence times [30], [35], even when approximation techniques, such as surrogate circuits, are used for simulation, and final tuning is done by an expert [36].
The preceding sections are just examples of the various techniques used to tackle the optimization problem with metaheuristics, and many other approaches have been reported in the literature. For example, [37] used Bayesian optimization for the design of RF power amplifiers, and [38] used Bayesian model fusion to reuse early stage data when fitting a late-stage performance model of two mixed-signal circuits.

C. Artificial Neural Networks
Neural networks differ from knowledge-based techniques and metaheuristics by viewing the sizing problem as one of classification/regression. Using a set of successful circuit design examples, they use their generalization ability to specify the component values of similar new circuits. This capability makes them more generic than the previous methods, which must be started over for each new design, even when the same circuit topology is used. As the name implies, the approach relies on ANNs, especially those with many layers, known as deep neural networks or DNNs [39].
ANNs have been used with relative success in modeling and designing microwave passive circuits [40], checking the designed circuit's conformity [41], and designing simple circuits [42]. Using more layers, DNNs have also been successfully used to solve inverse problems (e.g., [43]), and contextual problems such as language translation [44].
In the context of circuit sizing, DNNs present the potential to overcome the main limits of the methods described in the previous sections. First, they do not have the explicit knowledge acquisition problem of knowledge-based approaches, as they learn autonomously from the example data. Next, the training data come from already sized circuits that are used as examples, and the optimized solution corresponds to a generic mapping of performance criteria to circuit sizing, instead of being idiosyncratic. Hence, once trained, they do not need to restart for each new set of performance criteria. Therefore, the time invested in sizing a set of circuits for training a DNN is largely repaid by reuse in the long run.
However, the number of already sized circuits needed to train a DNN for efficient prediction is a problem. Because the number of parameters (i.e., connection weights at the input of each neuron) increases as a power law with the number of neural layers, a huge number of examples is usually required for DNN training. Unfortunately, multiple reasons make this difficult, if not impossible, to achieve for analog circuit sizing. First, it takes time to synthesize each training example and there are no publicly available datasets; second, the technology for analog circuit design and implementation is not static, making it necessary to adapt the training sets to each new technological advance.
The lack of availability of large datasets of successfully sized microelectronic analog circuits and the search for a component sizing methodology that is generic in scope are the main motivations for the work presented here. Indeed, when looking at the literature, even in the few cases where researchers gathered tens of thousands of examples to train their DNNs, their models address low to mid-frequency analog circuits, operate within fixed design parameters, or use data augmentation techniques with lower-performance specifications to train the networks (e.g., [45], [46], and [47]). An exception is the work in [48], whose two-model approach is somewhat like ours in that it tries to shrink the solution space before the final classification. However, that work is mainly a proof of concept as presented, and our approach is simpler. As will be presented next, our proposed method uses simple means to greatly reduce the required number of example data to train a neural network for circuit sizing, and it applies to arbitrary performances given a circuit topology.

III. METHODOLOGY
As argued above, DNNs are an attractive method for circuit sizing, but the relatively small size of the available training sets prevents their efficient training due to the large number of parameters to set. The proposed approach circumvents the problem by successively predicting the outputs one at a time: instead of a static DNN architecture to predict all the component sizes at once, the architecture is generated in steps, using a DNN made of a cascade of SNNs (C-SNN). Two variants are presented: a fixed cascade with one SNN per size to predict (FC-SNN), and a dynamic cascade where more than one SNN contributes (DC-SNN). They are described next, along with the GA to optimize the hyperparameters of each SNN, and the validation method used in this work.

A. Fixed Cascade of Shallow Neural Networks
This is the base version of C-SNN, with each component size predicted without concern for the next sizes to predict or their couplings. Therefore, a fixed C-SNN (FC-SNN) comprises as many SNNs as there are component sizes to predict, as shown in Fig. 1.
Each SNN in the figure has one or two hidden layers, with each input layer neuron holding one input value, and each hidden layer neuron $j$ producing an output given by

$$y_j = f_\theta(s_j), \qquad s_j = \sum_{i=1}^{N} \omega_{ji} x_i$$

where $s_j$ is the weighted sum of the $N$ inputs from the previous layer, $x_i$ is the output of the $i$th neuron in that layer, $\omega_{ji}$ is a weight to be determined, and $f_\theta$ is the neural output function.
In this work, that function was either the hyperbolic tangent sigmoid or the rectified linear unit (ReLU)

$$f_\theta(s_j) = \tanh(s_j) \quad \text{or} \quad f_\theta(s_j) = \max(0, s_j).$$

The type of output function and number of hidden neurons are set by the GA, and the same output function applies to all the hidden neurons in a given MLP.
In the output layer, the neural output is the weighted sum of its inputs, given by

$$z = \sum_{j=1}^{N} \omega_{kj} y_j \tag{4}$$

where $y_j$ is the output of the $j$th hidden layer neuron from the previous layer and $\omega_{kj}$ is another weight to be determined by the learning process. The neural weights are optimized using error backpropagation with the gradient descent algorithm, which minimizes the network's output error by propagating it back through the hidden layers and adjusting the different connection weights for minimal contribution to the error [50]. The gradient descent algorithm used the Adam optimizer [51] with regularization for weight setting, with the relevant neural hyperparameters set by the aforementioned GA. After tuning and subsequent training, each MLP in Fig. 1 adds its output to the specified performances to constrain the learning of the next MLP to find a new component size while compensating for potential coupling with the already predicted values.
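The hidden-layer and output-layer computations described above can be sketched in a few lines of plain Python. This is a minimal forward pass only, under the assumption that no bias terms are used (none appear in the text); the function names and the weight values are hypothetical, and training is not shown.

```python
import math

def tanh_act(s):
    return math.tanh(s)

def relu_act(s):
    return max(0.0, s)

def hidden_layer(x, weights, act):
    """y_j = act(sum_i w_ji * x_i) for each hidden neuron j (no bias term assumed)."""
    return [act(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def snn_forward(x, hidden_weights, out_weights, act):
    """One or two hidden layers followed by a single linear output neuron."""
    y = x
    for weights in hidden_weights:
        y = hidden_layer(y, weights, act)
    # The output neuron is a plain weighted sum of the last hidden outputs.
    return sum(w * yj for w, yj in zip(out_weights, y))

# Hypothetical weights for a network with 2 inputs and 2 hidden neurons
w_hidden = [[[1.0, -1.0], [0.5, 0.5]]]
w_out = [2.0, -1.0]
z_relu = snn_forward([3.0, 1.0], w_hidden, w_out, relu_act)
z_tanh = snn_forward([3.0, 1.0], w_hidden, w_out, tanh_act)
```

The same activation is passed to every hidden layer, mirroring the statement that one output function applies to all the hidden neurons of a given MLP.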
The FC-SNN can be seen as a hybrid architecture that combines a GA and an error backpropagation with a gradient descent algorithm, working together to optimize the specification of each MLP stage. At the top level, the GA searches the space of MLP hyperparameters for the optimal values of the number of layers, the number of neurons per layer, the neural output functions, and the training algorithm's parameters. At the bottom level, the backpropagation algorithm with gradient descent searches the space of MLP weights to optimize the prediction of the selected target component size.
FC-SNN operates as follows: MLP$_1$ takes as inputs the performances of the desired circuit and the GA tunes it to predict a first component size. Then, the process repeats to optimize MLP$_2$ for the second component size to predict. MLP$_2$ takes both the performance criteria and the output of MLP$_1$ as inputs. In the next step, the performance criteria and the outputs of MLP$_1$ and MLP$_2$ form the input of MLP$_3$ in the FC-SNN sequence, and the process continues with each remaining MLP until all the component sizes have been correctly predicted.
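The way inputs grow along the cascade can be sketched as follows. The sketch only covers prediction with already trained stages; `cascade_predict` and the three lambda functions are hypothetical stand-ins for trained MLPs, chosen so the data flow is easy to follow.

```python
def cascade_predict(performances, stages):
    """Run the stages in order; stage k sees the specs plus the k earlier predictions."""
    inputs = list(performances)
    sizes = []
    for predict in stages:
        size = predict(inputs)
        sizes.append(size)
        inputs.append(size)   # later stages are constrained by this value
    return sizes

# Hypothetical stand-ins for three trained MLPs
stages = [
    lambda v: v[0] + v[1],    # sees the 2 performance values only
    lambda v: 2.0 * v[2],     # also sees the first predicted size
    lambda v: v[2] + v[3],    # sees both earlier predicted sizes
]
sizes = cascade_predict([1.0, 2.0], stages)
```

With two performance inputs, the first stage receives 2 values, the second 3, and the third 4, which is the input growth described for the successive MLPs.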
One important issue is the component size prediction order. This is accomplished as follows: initially, several chromosome populations are randomly generated, one for each component size to predict. Then the GA starts with the first population to tune MLP$_1$ to predict each component size in turn. The GA is iterated until the 5% prediction accuracy threshold is reached and the best predicted component size is selected as the first one to predict. Then, the process repeats for the remaining component sizes using the second chromosome population and MLP$_2$, and the winner is selected as the second component size to predict. The same procedure is repeated for each of the remaining MLPs until no component size to predict is left, leading to an ordered prediction sequence. Section III-C4 summarizes the algorithm using pseudo-code.
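The ordering procedure amounts to a greedy selection, which the sketch below illustrates in simplified form. It assumes a hypothetical callable `fit_and_score` that hides the GA tuning and MLP training and simply returns a validation error; the component names and the toy error model are invented for the example.

```python
def choose_prediction_order(components, fit_and_score):
    """Greedy ordering: at each stage, keep the component predicted with the lowest error.

    fit_and_score(target, known) is a hypothetical callable that tunes and trains
    one MLP for `target`, given the already ordered components `known`, and
    returns its validation error.
    """
    remaining = list(components)
    order = []
    while remaining:
        errors = {c: fit_and_score(c, tuple(order)) for c in remaining}
        best = min(remaining, key=lambda c: errors[c])
        order.append(best)
        remaining.remove(best)
    return order

# Toy error model: each component has a base error that drops as more sizes are known.
base_error = {"W1": 0.08, "L_g": 0.03, "C_m": 0.05}

def toy_score(target, known):
    return base_error[target] / (1 + len(known))

order = choose_prediction_order(["W1", "L_g", "C_m"], toy_score)
```

In this toy run the easiest component is placed first, and the harder ones are deferred to later stages where more inputs are available.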

B. Dynamic Cascade of Shallow Neural Networks
Because of its sequential prediction, the FC-SNN method may lead to a deadlock when two component sizes are interdependent since each one must be known to determine the other. To overcome that problem, a new version of C-SNN is proposed, called the dynamic C-SNN (DC-SNN). It applies when one or more SNNs in FC-SNN have a prediction error higher than the set threshold (5% component and performance tolerances in this work).
In DC-SNN, more than one SNN may contribute to sizing the same component, with the added SNNs accounting for potential component cross-couplings. The rationale behind this is that, while the SNN that predicts the first component size can only rely on the specified performance criteria for the target circuit as inputs, the same SNN moved forward to position $l$ in the cascade can count on both the performance criteria and the $l-1$ size predictions obtained up to that point, hence potentially producing a lower prediction error.
The DC-SNN method starts by generating a cascade of $M$ MLPs like FC-SNN, with MLP$_i$ predicting the $i$th component size $d_i$. Then, a new MLP, MLP$_{i+M}$, tries to learn $d_i$ again using all the performance criteria and predicted values as inputs. If MLP$_{i+M}$ predicts $d_i$ better than MLP$_i$, it is added to the cascade as the predictor of $d_i$; otherwise, it is ignored. The DC-SNN approach is illustrated in Fig. 2 for $M = 3$.
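The keep-or-ignore rule of the second pass can be sketched as a simple comparison of errors. The function name and the error values below are hypothetical; the sketch assumes that a validation error is already available for every first-pass and second-pass MLP.

```python
def refine_cascade(first_pass_errors, second_pass_errors):
    """Keep the second-pass MLP for a component only when it lowers the error.

    Both arguments map component index i to a validation error: the first for
    MLP_i, the second for MLP_{i+M} trained on the specs plus all predictions.
    Returns, per component, the index of the MLP acting as its final predictor.
    """
    m = len(first_pass_errors)
    chosen = {}
    for i in range(1, m + 1):
        if second_pass_errors[i] < first_pass_errors[i]:
            chosen[i] = i + m      # MLP_{i+M} becomes the predictor of d_i
        else:
            chosen[i] = i          # the extra MLP is ignored
    return chosen

first = {1: 0.09, 2: 0.02, 3: 0.06}     # hypothetical errors, M = 3
second = {1: 0.03, 2: 0.04, 3: 0.05}
chosen = refine_cascade(first, second)
```

Here the first and third components gain from the second pass, while the second component keeps its original predictor.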

C. Genetic Algorithm for MLP Configuration
As mentioned, a GA is used to optimize the hyperparameters of each MLP in the cascade. Each GA starts by generating a population of chromosomes that encode the MLP hyperparameters as a binary vector where the successive bit fields represent different hyperparameters (see Table I). In this work, a population size of 64 randomly initialized chromosomes of 19-bit size is used. Then, the GA's three-step process of selection, reproduction, and mutation is repeated until reaching a stopping criterion.
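As an illustration of the binary encoding, the sketch below initializes 64 random 19-bit chromosomes and decodes one of them into hyperparameter fields. The field names and widths in `FIELDS` are an assumption made for the example so that they add up to 19 bits; the actual layout is the one given in Table I.

```python
import random

# Hypothetical 19-bit layout; the actual bit fields are those of Table I.
FIELDS = [("two_hidden_layers", 1), ("use_relu", 1),
          ("neurons_1", 6), ("neurons_2", 6), ("lr_code", 5)]

def random_population(size=64, nbits=19, rng=random):
    """Randomly initialized population of binary chromosomes."""
    return [[rng.randint(0, 1) for _ in range(nbits)] for _ in range(size)]

def decode(chromosome):
    """Split the bit vector into consecutive fields and read each as an unsigned integer."""
    values, pos = {}, 0
    for name, width in FIELDS:
        bits = chromosome[pos:pos + width]
        values[name] = int("".join(map(str, bits)), 2)
        pos += width
    return values

population = random_population()
example = decode([1, 0] + [0, 0, 1, 0, 1, 0] + [0, 0, 0, 1, 0, 0] + [0, 0, 0, 1, 1])
```

A real decoder would also map each integer onto a valid hyperparameter range, for example by offsetting neuron counts so that zero neurons cannot occur.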
1) Selection Phase: This phase selects the chromosome pairs for reproduction. The deterministic tournament method is used as it has been shown to be more efficient and less time consuming than other selection methods, such as ranking and roulette [52], and it has been suggested that its deterministic ranking has the potential to outperform probabilistic selection [53]. Moreover, deterministic tournament lends itself more easily to hardware implementation in a field-programmable gate array (FPGA), which is a future objective of this work.
In the deterministic tournament, the future parent chromosomes are selected through a series of pairwise matchups, using the mean-squared error between the target and predicted outputs of the associated MLP as a fitness function. For each matchup, the chromosome with the best fitness value proceeds to the next stage [54]. The chromosome population is arbitrarily ranked for fitness at the beginning.
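One round of pairwise matchups can be sketched as follows. This is a simplified reading in which neighbouring chromosomes are paired and the one with the lower error always wins; the pairing scheme and the toy fitness function are assumptions, since in practice the fitness is the mean-squared error of the MLP built from each chromosome.

```python
def tournament_round(population, fitness):
    """Pair up neighbours and keep the chromosome with the lower error in each pair."""
    winners = []
    for a, b in zip(population[0::2], population[1::2]):
        winners.append(a if fitness(a) <= fitness(b) else b)
    return winners

# Toy population of 8 chromosomes; the fitness is a stand-in for the MLP's
# mean-squared error (lower is better).
population = [[1, 1, 0], [0, 0, 0], [1, 0, 0], [1, 1, 1],
              [0, 1, 0], [0, 1, 1], [0, 0, 1], [1, 0, 1]]

def toy_mse(chrom):
    return sum(chrom)

parents = tournament_round(population, toy_mse)
```

Each round halves the population, and the outcome is fully determined by the fitness values, with no random draw involved.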
2) Reproduction Phase: At the end of the selection phase, 32 of the 64 chromosomes have been deleted and the remaining 32 are reranked. Then, to replenish the population, 32 new chromosomes are generated from the remaining ones used as parents, with each pair breeding two children. Each child is created by using two randomly selected crossover points as is commonly done. The chromosome corresponding to the first child is identical to the chromosome of its first parent before the first crossover point and after the second crossover point, and is identical to the chromosome of its second parent in between. The inverse is true for the second child of the same pair of parents.
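Two-point crossover as described above can be sketched in a few lines. The function name is hypothetical, and the choice of two distinct interior cut points is an assumption made so that each child always inherits bits from both parents.

```python
import random

def two_point_crossover(parent_a, parent_b, rng=random):
    """Return two children that swap the segment between two random cut points."""
    n = len(parent_a)
    i, j = sorted(rng.sample(range(1, n), 2))   # two distinct interior cut points
    child_1 = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_2 = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_1, child_2

a = [0] * 19
b = [1] * 19
c1, c2 = two_point_crossover(a, b, random.Random(0))
```

Using all-zero and all-one parents makes the swapped segment visible: the first child is zero outside the cut points and one in between, and the second child is its bitwise complement.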
3) Mutation Phase: The mutation phase inverts each chromosome bit of the population with probability $1/nb$, where $nb$ is the size of the chromosome in bits. This probability was suggested in [53], and it was compared here against various other values in pilot experiments; none of the alternatives consistently provided better results.
4) Final DC-SNN Architecture: The pseudocode for DC-SNN generation is given in Algorithm 1.
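A bit-flip mutation with probability 1/nb per bit can be sketched as follows. The function name and the seeded random generator are illustrative choices; the short experiment simply checks that, on average, about one bit per chromosome is flipped.

```python
import random

def mutate(chromosome, rng=random):
    """Flip each bit independently with probability 1/nb, nb being the chromosome length."""
    p = 1.0 / len(chromosome)
    return [bit ^ 1 if rng.random() < p else bit for bit in chromosome]

rng = random.Random(1)
original = [0] * 19
trials = [mutate(original, rng) for _ in range(2000)]
# Starting from all zeros, the number of ones equals the number of flipped bits.
mean_flips = sum(sum(t) for t in trials) / len(trials)
```

With this rate the expected number of flips per chromosome is one, regardless of the chromosome length.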

IV. VALIDATION
The performances of FC-SNN and DC-SNN were tested with three different analog RF microcircuits: 1) an LNA; 2) a VCO; and 3) a mixer. For each circuit, 200 designs were completed with Cadence Virtuoso Spectre RF to serve as training examples, using randomly chosen, wide-ranging performance specifications. Moreover, to get a perspective on the obtained results, they were compared to those produced by a set of independent MLPs (I-MLPs), each one only predicting one component size from the performance specifications. This makes it possible to evaluate the impact of increasing the communication between MLPs on the sizing performance, starting with an architecture with no communication (I-MLP), to one with only successful MLPs communicating (FC-SNN), to one with all MLPs communicating (DC-SNN). The hyperparameters of the different models were all optimized by GA as described in Section III.
1) Low-Noise Amplifier: The circuit topology of the LNA is shown in Fig. 3. It is the same as in [55] and consists of a cascode common-source stage with source degeneration and inductive load. The cascode configuration was selected for its design simplicity (easier matching, stability, etc.) and widespread use. All the inductances, including those of the matching networks, were assumed to be nonideal with Q-factors of 10, which is consistent with current CMOS fabrication processes. To further simplify the design effort and reduce the number of parameters, only the L-shaped matching network shown in Fig. 3(b) is considered for 50-Ω source and load impedance matching. The source inductance ($L_S$) was also fixed at 0.578 nH for all designs.
Two hundred LNA microcircuits were designed using 180-nm CMOS technology. Table II shows the performance (P) and design (D) variables of the different designs, with the considered ranges of values.

2) Voltage-Controlled Oscillator: The second circuit topology is the symmetrical cross-coupled VCO illustrated in Fig. 4. It is a common VCO configuration that allows almost rail-to-rail output swing, with the cross-coupled pMOS-nMOS pair helping to reduce 1/f noise [56].
Two hundred VCO microcircuits were designed in CMOS 130-nm technology. Table III shows the performance and design variables of the different designs, and the considered ranges of values. The current source and inductance were fixed before each design and given as input to the sizing process, and V tune was varied within each design to get the tuning range.
3) Mixer: The circuit topology considered for the mixer is the double-balanced Gilbert cell presented in Fig. 5. It was selected because of its common use, since it provides good conversion gain and rejection at the input ports [56]. Moreover, the version used here includes a resonator whose parallel inductance L and capacitance C make it possible to tune the mixer for maximum response at the required frequency [57].
Here also, 200 mixer microcircuits were designed using CMOS 130-nm technology, with the lengths of all transistors fixed to 130 nm. Table IV shows the performance and design variables of the different designs, including the considered ranges of values. Resistance R was fixed before the mixer sizing, and the size and gate voltage of transistor Q 3 , which was part of a current mirror, were given as inputs in the sizing process.
4) Procedure: The prediction performances of the proposed FC-SNN and DC-SNN architectures, along with those of the set of I-MLPs, were compared based on the designed LNA, VCO, and mixer microcircuits. For each set of 200 designs previously described, 180 were used for training, 10 for validation, and 10 for testing, with twofold cross-validation. The models were coded in Python using the Theano [58] and Lasagne [59] libraries. The simulations were run on a laptop computer with an Intel i7-7700HQ CPU clocked at 2.8 GHz, with 8-GB RAM and no dedicated graphics card. The GAs were set to run for a maximum of ten generations. The sizing performance of each trained model for the LNA, VCO, and mixer was first assessed with three metrics: 1) the number of correctly predicted component sizes (i.e., with an error below 5% of the normalized target sizes during the test phase); 2) the mean prediction error during the testing phase, which measures the generalization ability of the solutions within the performance ranges in Tables II-IV; and 3) the number of GA generations to reach the testing phase (i.e., to get a size prediction error below 5% for all components during the validation phase). However, as a 5% component size tolerance does not necessarily lead to a 5% performance tolerance in nonlinear circuits, the component sizes predicted by DC-SNN were simulated in Cadence in additional experiments, and the obtained microcircuit performances were compared to the reference ones.
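The first two metrics can be sketched as follows. This is an illustrative reading of the procedure rather than the authors' code: the function names are hypothetical, and the error is assumed to be the absolute deviation expressed as a percentage of each target size, averaged over the test designs.

```python
def relative_errors(predicted, target):
    """Per-component absolute error as a percentage of the target size."""
    return [abs(p - t) / abs(t) * 100.0 for p, t in zip(predicted, target)]

def sizing_metrics(predictions, targets, tolerance_pct=5.0):
    """Return (components under tolerance, mean prediction error in percent).

    predictions, targets: one list of component sizes per test design.
    """
    n_designs = len(targets)
    n_components = len(targets[0])
    per_component = [0.0] * n_components
    for pred, tgt in zip(predictions, targets):
        for k, err in enumerate(relative_errors(pred, tgt)):
            per_component[k] += err / n_designs  # running mean over designs
    n_correct = sum(err < tolerance_pct for err in per_component)
    mean_error = sum(per_component) / n_components
    return n_correct, mean_error

# Two test designs with two components each (made-up numbers).
n_correct, mean_error = sizing_metrics(
    predictions=[[1.02, 10.0], [0.97, 12.0]],
    targets=[[1.0, 10.0], [1.0, 10.0]],
)
```

In this made-up example, the first component has a mean error of 2.5% and counts as correctly predicted, while the second has a mean error of 10% and does not.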

V. RESULTS
Table V summarizes the obtained results, showing that DC-SNN succeeded in predicting all the component sizes of the three different microcircuits with a mean error at test time smaller than 5%. The worst prediction performance was that of I-MLP, showing the superiority of FC-SNN and DC-SNN over predicting the component sizes independently of each other.
The learning time of DC-SNN was 4 min and 37 s for the LNA, 36 min and 21 s for the VCO, and 47 min and 56 s for the mixer. Once trained, the tool took less than 5 s to size each of the ten test circuits for each of the three types. Hence, once trained, the model returns the component sizes for a given set of performance criteria within seconds.
To further confirm the robustness of the DC-SNN algorithm, it was run 20 times for the LNA circuit, leading to a mean prediction error of 3.39% with a standard deviation of 0.6%. Tables VI-VIII present examples of the mean absolute error (MAE) of the predicted component sizes and resulting performance of the LNA, VCO, and mixer microcircuits. As expected, the errors on the component sizes do not generate errors of the same magnitude on the circuit performances, but the MAE remained under 5% for both the component sizes and circuit performances of the three circuit topologies.
Table IX shows the average MAEs of both the predicted parameters and the simulated performances for the ten designs predicted by the algorithm. These results show that the nonlinear dependencies between the component sizes and the circuit performances are not strong enough for the predicted sizes to degrade the desired performance beyond the 5% threshold. Then, starting from the predicted sizes, sweep analysis can be used for fine-tuning, or the designer's insight into the circuit behavior can guide changes to specific component parameters to reach the desired performances.
A single deep MLP was also tested to predict all the component sizes of a circuit at once. This network was allowed between 20 and 50 hidden layers by the GA, and the vanishing gradient problem that often occurs in deep neural networks was attenuated by using residual connections [60]. However, its prediction accuracy was very poor and not worth reporting. This failure may be due to the small training set (200 circuit examples) in comparison to the large number of parameters to learn for such a network. Still, a shallow MLP was tested for the LNA in an earlier experiment with similarly poor results [54], which emphasizes the importance of an architecture such as a C-SNN in the context of small training sets.

VI. DISCUSSION AND CONCLUSION
Our results show that the proposed deep neural network made of a cascade of separately trained shallow MLPs can successfully perform an initial sizing of the components of an analog RF circuit design, as it successfully did so for three different types of RF microcircuits within the set error margin for values and performances. More precisely, the FC-SNN variant predicted all the component sizes of an LNA within five percent error tolerance, and all the component sizes of a VCO and a mixer within eight percent error tolerance, and the DC-SNN variant predicted the component sizes of all three circuit types within five percent error tolerance.
The developed architecture offers an effective way to circumscribe the neighborhoods of the sought component values in solution space, and the presented work is an efficient first step in a two-step solution to the sizing problem, where the second step fine-tunes the predicted values to account for layout and EM issues. As mentioned in Section I, this second step is still a challenge as no automatic procedure exists for arbitrary circuit topologies and performances. Ongoing research to automate the whole process includes techniques based on surrogate models [30], Bayesian fusion [38], layout migration and reuse [61], layout self-organization [62], reinforcement learning [63], and training our deep neural network using a database of successfully fabricated circuits. However, the current cost of IC fabrication prevented us from creating such a database, given the number of end-to-end designs that would have to be completed.
The proposed neural network-based methodology distinguishes itself by the relatively small training set it requires, thus circumventing the difficulty of gathering the large number of microelectronic circuit designs normally required to train a deep neural network, a problem that does not exist in other domains that use deep learning, such as image classification or language translation. Here, only two hundred exemplars were included in each dataset to learn the final MLP sequences. This was made possible by predicting the component sizes one by one and using MLPs with only one or two hidden layers, hence reducing the network complexity and the quantity of parameters to learn at each step. That allowed a progressive process of constrained predictions to take place, with a relatively small training set used to solve the circuit sizing problem.
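The effect of network depth on the number of parameters to learn can be made concrete with a small count of weights and biases in a fully connected MLP. The layer sizes below are hypothetical and chosen only for illustration; they are not the ones selected by the GA.

```python
def mlp_parameter_count(layer_sizes):
    """Weights plus biases of a fully connected MLP.

    layer_sizes: [inputs, hidden_1, ..., hidden_n, outputs].
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical shallow stage: 5 inputs, one hidden layer of 10 units, 1 output.
shallow = mlp_parameter_count([5, 10, 1])           # 71 parameters

# Hypothetical deep network: 5 inputs, 20 hidden layers of 10 units, 6 outputs.
deep = mlp_parameter_count([5] + [10] * 20 + [6])   # 2216 parameters
```

Under these assumed sizes, one shallow stage has fewer parameters than the 180 training examples, whereas the deep network has more than ten times as many parameters as examples.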
Another distinction of the proposed method from the related work is that it goes beyond finding idiosyncratic component values, as it creates a general mapping from the desired performance criteria to the component sizes without being specific to any set of performance values. Indeed, although the exemplars used for testing the model were not seen by the algorithm during the training or validation phases, there was no need to retrain the network for each one of them.
In this respect, the proposed approach is useful to generate a C-SNN that provides the initial sizing conditions to a more precise idiosyncratic optimization method. This would reduce the convergence time of those methods while preserving their accuracy.
The final component values obtained during the test phase were not fabricated and thus, the corresponding specifications were not tested post fabrication. At the present stage, the goal of the proposed method is mainly to help designers decrease the time needed to reach the final values of their analog RF circuit components for IC implementation, by saving lengthy simulation time for the initial sizing before layout. Moreover, as different designers have different styles, the component sizes yielded by the algorithm reflect the design experience and bias of the designers of the microelectronic circuits used for training the algorithm. In the future, standardized databases might be useful to minimize this potential bias. The tool could also be trained with a more complete database that includes the extracted parasitic components, and it could be constrained to specific inductances that were designed with EM simulations; however, such a training database is difficult to build, as already discussed.
Finally, the obtained results show that the proposed method can be applied to many different types of circuits and technologies with appropriate training. Thus, it generalizes well. Here, it was successful in correctly predicting the component sizes of an LNA, a VCO, and a mixer, where the first was designed using CMOS 180-nm technology, and the other two using CMOS 130-nm technology, but different circuits, topologies, and CMOS technologies could have been used as well and for arbitrary performances, by using the appropriate completed designs as training examples.

Manuscript received 10 October 2022; revised 11 January 2023, 16 March 2023, 4 May 2023, and 12 May 2023; accepted 15 May 2023. Date of publication 2 June 2023; date of current version 22 November 2023. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and in part by the Quebec Strategic Alliance for Microsystems (ReSMiQ). This article was recommended by Associate Editor H. E. Graeb. (Corresponding author: Mounir Boukadoum.)

Fig. 1. Block diagram of the C-SNN [49], where MLP denotes a multilayer perceptron ANN and M is the number of component sizes to predict.

Fig. 2. Example of MLP selection in a DC-SNN model with three component sizes to predict, where p is a performance criterion and d is a component size to predict: a) case where MLP 4 makes a better prediction of d 1 than MLP 1 , leading to adding MLP 4 to the MLP sequence; and b) case where MLP 4 's prediction is worse, leading to not adding MLP 4 to the MLP sequence.

TABLE I. DESCRIPTION OF CHROMOSOME BITS FOR THE GA

TABLE II. CONSIDERED RANGES OF PERFORMANCE PARAMETERS AND DESIGN VARIABLES FOR THE LNA

Fig. 4. VCO topology [56].

TABLE III. CONSIDERED RANGES OF PERFORMANCE PARAMETERS AND DESIGN VARIABLES FOR THE VCO

TABLE IV. CONSIDERED RANGES OF PERFORMANCE PARAMETERS AND DESIGN VARIABLES FOR THE MIXER

TABLE VI. EXAMPLE OF PREDICTION ERRORS AND RESULTING PERFORMANCES FOR THE LNA

TABLE VII. EXAMPLE OF PREDICTION ERRORS AND RESULTING PERFORMANCES FOR THE VCO

TABLE VIII. EXAMPLE OF PREDICTION ERRORS AND RESULTING PERFORMANCES FOR THE MIXER

TABLE IX. TEST SET MAE OF THE SIZING PREDICTIONS AND RESULTING PERFORMANCES FOR THE LNA, VCO, AND MIXER CIRCUITS