Efficient Design Optimization of High-Performance MEMS Based on a Surrogate-Assisted Self-Adaptive Differential Evolution

High-performance microelectromechanical systems (MEMS) are playing a critical role in modern engineering systems. Due to computationally expensive numerical analysis and stringent design specifications nowadays, both the optimization efficiency and quality of design solutions become challenges for available MEMS shape optimization methods. In this paper, a new method, called self-<underline>a</underline>daptive <underline>s</underline>urrogate model-assisted <underline>d</underline>ifferential <underline>e</underline>volution for <underline>M</underline>EMS <underline>o</underline>ptimization (ASDEMO), is presented to address these challenges. The main innovation of ASDEMO is a hybrid differential evolution mutation strategy combination and its self-adaptive adoption mechanism, which are proposed for online surrogate model-assisted MEMS optimization. The performance of ASDEMO is demonstrated by a high-performance electro-thermo-elastic micro-actuator, a high-performance corrugated membrane micro-actuator, and a highly multimodal mathematical benchmark problem. Comparisons with state-of-the-art methods verify the advantages of ASDEMO in terms of efficiency and optimization ability.


I. INTRODUCTION
Microelectromechanical systems (MEMS) with high performance requirements are widely used in modern engineering systems [1]. For example, the performance requirements of contemporary micro-actuators/micro-sensors are becoming increasingly stringent for biomedical devices and wearable robotics in recent years [2], [3]. Despite that the overall performance is also affected by supporting electronics and software, shape optimization of the MEMS is often essential.
The associate editor coordinating the review of this manuscript and approving it for publication was Sunith Bandaru . Present MEMS shape optimization methods can be divided into two categories, which are: (1) local optimization informed by design expertise and (2) evolutionary algorithms (EAs), which aim to perform global optimization. In the first category, design expertise is firstly used to simplify the optimization problem by providing an initial design, narrowing down the search ranges and reducing the number of design variables based on sensitivity, [4]- [6]. This kind of method is playing an important role in modern high-performance MEMS optimization. However, the main drawbacks include a lack of generality and limited optimization ability. The design expertise is often case-specific, leading to an adhoc optimization process. The local optimization methods have limited search ability, and may not be able to meet the stringent design specifications when the initial design is not sufficiently good [7].
Compared with the first category, which is more fitted for design experts, the second category, EA-driven MEMS design optimization [8]- [11], has the advantage of being free of an initial design, generality, and robustness. Particularly, EAs often have reasonably high global optimization capability for nonlinear and multimodal problems, which are suitable for high-performance MEMS optimization. At present, besides the traditional genetic algorithms, differential evolution (DE) [12] and particle swarm optimization (PSO) [13] are arguably the most widely used EAs for MEMS shape optimization [4], [8], [14]. DE and PSO show good performance for various high-performance MEMS, but it is also found that their optimization abilities still need improvement when handling complex design cases. In the computational intelligence field, various improved EAs have been proposed. Only considering DE, there are a few popular improved versions, e.g., [15]- [17]. Among them, a stateof-the-art method is Linear Population Size Reduction -Success-History Based Parameter Adaptation for Differential Evolution (L-SHADE). However, our experiments (Section IV) show that even L-SHADE may not be able to achieve the required optimization quality for the targeted problem.
Another important challenge for EA-driven MEMS shape optimization is efficiency. Computationally expensive numerical simulations (e.g., finite element analysis) are often unavoidable to obtain accurate performance estimation of MEMS devices [18]. In contrast with local optimizationbased methods, which are often efficient when there exists a sufficiently good initial design, standard EAs, and their improved versions often need a large number of simulations to obtain the optimum. The optimization time, therefore, becomes very long or even prohibitive in some cases [10], while nowadays most real-world systems including MEMS often require a reduced time-to-market [19], [20]. Thus, substantial optimization efficiency improvement without compromising performance is highly desirable.
To address this problem, off-line surrogate model-based EAs (SAEAs) are introduced into MEMS shape optimization [21]. An initial random sampling (such as using Design of Experiment [22] methods) is firstly carried out. Using the sampled design variables as the input and the performance via numerical simulations as the output, a black-box surrogate model is constructed to approximate the performance of the MEMS device. The surrogate model is often constructed by statistical learning techniques. For example, Lee et al. [21] use artificial neural networks (ANN) to construct a black-box surrogate model for predicting the performance of new candidate designs. In the optimization process, the computationally cheap surrogate model is used to replace the expensive numerical simulation model to improve computational efficiency. Although efficient for various MEMS, high-performance MEMS requires a very accurate surrogate model to meet the stringent constraints and obtain a highly optimized objective function value. When the number of design variables is more than a few, obtaining a sufficiently accurate surrogate model may need a tremendous number of initial samples, canceling out the saved simulation time [23].
Because the optimal region is unknown beforehand, oneshot sampling wastes a lot of samples (i.e., computationally expensive simulations) in regions far away from the optimal region. Infill sampling techniques have been introduced to address this drawback for simulation-driven design optimization, which show successful results for complex problems [24]. Rather than using a one-shot initial sampling to obtain the surrogate model, the surrogate model is updated iteratively in the optimization process. By sampling the "interested" regions based on analysis of previous samples, computationally expensive simulations are saved. When active learning or infill sampling techniques are combined with EAs, on-line SAEAs are constructed. In online SAEA, a very rough surrogate model is firstly built, and simulations are used in each iteration to explore the design space and improve the surrogate model.
Even though efficient state-of-the-art online SAEAs have been proposed [25]- [27], to the best of our knowledge, these methods are arguably not purpose-built shape optimization methods for high-performance MEMS which need to meet highly stringent specifications. Surrogate model-aware evolutionary search (SMAS) is a typical framework introduced by the authors, which premises on surrogate-assisted metaheuristics using infill sampling techniques and it outperforms several popular online SAEAs [32]. A MEMS design optimization method based on SMAS is also proposed [23]. However, our experiments (Section IV) show that although with high efficiency, the optimization ability is insufficient for the targeted problem. Many recent successful SAEAs mainly focus on handling the challenges in higher-dimensional problems [27] or a very limited computing resources [25], which are also not suitable for the targeted problem.
A new SAEA is proposed in this paper, called self-adaptive surrogate model-assisted differential evolution for MEMS optimization (ASDEMO), which aims to: • Substantially improve the optimization efficiency of standard EAs and some modern variants for highperformance MEMS shape optimization; • Improve the optimization ability of standard EAs and some modern variants for high-performance MEMS shape optimization; • Be general enough to handle MEMS shape optimization with about 10 design variables or more without any initial solutions or ad-hoc process.
The remainder of the paper is organized as follows. Section II introduces the basic techniques. Section III introduces the ASDEMO algorithm, including the main innovations and the parameter settings. Section IV tests ASDEMO using two practical high-performance MEMS and a highly multimodal mathematical benchmark problem. The concluding remarks are presented in Section V. VOLUME 8, 2020

II. BASIC TECHNIQUES A. GAUSSIAN PROCESS MACHINE LEARNING AND PRESCREENING
The selected surrogate modeling method in ASDEMO is Gaussian process (GP). GP modeling is a theoretically sound and principled method for determining a much smaller number of free model parameters when compared to many other surrogate modeling approaches such as ANN [28], [29]. It can also provide an estimate of the model uncertainty for each predicted point, which is shown to have a large advantage in SAEAs [30]- [33]. GP has been widely used for surrogate modeling in the electromagnetic machine design optimization field [34], including MEMS. GP works as follows: To model an unknown function y = f (x), x ∈ R d , GP modeling assumes that f (x) at any point x is a Gaussian random variable N (µ, σ 2 ), where µ and σ are two constants independent of x. For any x, f (x) is a sample of µ + (x), where (x) ∼ N (0, σ 2 ). For any x, x ∈ R d , c(x, x ), the correlation between (x) and (x ), depends on x − x . More precisely, where parameter 1 ≤ p i ≤ 2 is related to the smoothness of f (x) with respect to x i , and parameter θ i > 0 indicates the importance of x i on f (x). More details about GP modeling can be found in [35]. Given K points x 1 , . . . , x K ∈ R d and their f -function values y 1 , . . . , y K , then the hyper parameters µ, σ , θ 1 , . . . , θ d , and p 1 , . . . , p d can be estimated by maximizing the likelihood that f (x) = y i at x = x i (i = 1, . . . , K ) [36]: where C is a K × K matrix whose (i, j)-element is c(x i , x j ), y = (y 1 , . . . , y K ) T and 1 is a K -dimensional column vector of ones.
To maximize (2), the values of µ and σ 2 must be: Substituting (3) and (4) into (2) eliminates the unknown parameters µ and σ from (2). As a result, the likelihood function depends only on θ i and p i for i = 1, . . . , d.
(2) can then be maximized to obtain estimates ofθ i andp i . In this work, the maximization of (2) is carried out by sequential quadratic programming [37]. Although the computational complexity grows cubically with the number of training data points, the computing overhead of GP modeling is often low. The reason is that most MEMS shape optimization problems have around or less than 10 design variables. A large number of training data points is therefore not needed to build a reliable surrogate model. With optimizedθ i andp i , the estimatesμ andσ 2 can then be readily obtained from (3) and (4). Given the hyper parameter estimatesθ i ,p i ,μ andσ 2 , one can predict y = f (x) at any untested point x based on the f -function values y i at x i for i = 1, . . . , K . The best linear unbiased predictor of f (x) is [36], [38]: and its mean squared error is: where r = (c(x, x 1 ), . . . , c(x, x K )) T . N (f (x), s 2 (x)) can be regarded as a predictive distribution for f (x) given the function values y i at x i for i = 1, . . . , K .
Considering the predicted valuef (x) and the prediction uncertainty s 2 (x), several prescreening methods are proposed to predict the quality of a candidate design [31]. Lower confidence bound (LCB) is used in this paper and we consider the minimization of f (x). Given the predictive distribution N (f (x), s 2 (x)) for f (x), a LCB prescreening of f (x) can be defined as [30]: where ω ∈ [0, 3] is a constant. The use of LCB prescreening can conduct explorative global search when using a large ω or conduct fast local search when using a small ω. Sincef (x) is Gaussian distributed, according to the 3σ rule, when ω = 2, the confidence level of f lcb (x) to be the LCB off (x) is about 97%. The comparisons of using different ω to define LCB are detailed in [30], [31]. ω = 2 is used in this paper as suggested in [31].

B. DIFFERENTIAL EVOLUTION
DE algorithm is used as the search engine in the proposed ASDEMO algorithm. DE is an effective and popular global optimization algorithm. It uses a differential operator to create new candidate solutions [12]. Suppose that P is a population. Let x = (x 1 , . . . , x d ) ∈ R d be an individual solution in P. To generate a child solution u = (u 1 , . . . , u d ) for x, a donor vector is first produced by mutation: where x r 1 , x r 2 and x r 3 are three different solutions randomly selected from P. F ∈ (0, 2] is a control parameter, often called the scaling factor [12]. The above mutation strategy is called DE/rand/1, which is one of the most widely used strategies. Besides, there are quite a few different mutation strategies, trading off exploration ability and convergence speed. Then the following crossover operator is applied to produce the child u: 1 Randomly select a variable index j rand ∈ {1, . . . , d}, 80258 VOLUME 8, 2020 2 For each j = 1 to d, generate a uniformly distributed random number rand from (0, 1) and set: where CR ∈ [0, 1] is a constant called the crossover rate. In ASDEMO, three other DE mutation strategies are also adopted. The first one is DE/rand-to-best/2 [39], which is shown in (10).
where x i is the i th candidate solution in P and x best is the best candidate solution in P. x r 1 , x r 2 , x r 3 and x r 4 are mutually exclusive candidate solutions randomly selected from the current population and are different from x i and x best . Another strategy is the DE/rand/2/dir mutation strategy [38], which is shown in (11).
where x r 1 , x r 2 and x r 3 are three different solutions selected from P and x r 1 has the smallest objective function value (considering a minimization problem). Another mutation strategy used in ASDEMO is DE/hybrid trigonometric mutation [40]: in 95% rate, the DE/rand/1 strategy is used; while in 5% rate, (12) is used.

III. THE ASDEMO ALGORITHM A. CHALLENGES AND MAIN IDEAS OF ASDEMO
ASDEMO is an on-line SAEA, for which, the surrogate model keeps improving in the optimization process. Hence, the predicted good designs may be wrong especially at the beginning of the optimization when there are insufficient training data points, which may cause wrong convergence.
To address this problem, some available SAEAs begin with a standard EA for a certain number of iterations (expensive simulations are used for the whole population). After a surrogate model with sufficient quality is obtained, the solution quality and the surrogate model quality are then iteratively improved in the consecutive search [41], [42]. This kind of method often has reasonably good solution quality, but the efficiency is sacrificed. In contrast, some other available SAEAs perform expensive simulations to the "optimal" solutions predicted by the existing surrogate model, despite that its quality may not be good enough [31]. The number of expensive simulations is therefore highly reduced, but the solution quality remains a weakness. Hence, neither of them is suitable for high-performance MEMS shape optimization, requiring both high solution quality and high efficiency. As said in Section I, SMAS framework [32] is a new kind of SAEA and advantages are shown compared to the above two kinds of SAEAs [32]. However, it is found that when the expected performance of the MEMS device is high, both the solution quality and efficiency need improvement. It is worth understanding the landscape characteristics of the targeted problem. Compared to normal MEMS shape optimization, high-performance MEMS optimization introduces stringent constraints and higher expectations of the optimal objective function value. To cope with that, both (very) high exploration and exploitation abilities are essential for the optimization algorithm. The reasons include: (1) Stringent constraints and highly optimal objective function value make the optimal region become (very) narrow and the optimization algorithm must be able to search elaborately in that narrow region (i.e., a high exploitation ability). (2) To get access to the optimal region, high exploration ability must be available to jump out of the local optima in the design landscape. More particularly, when several constraints are imposed on, their extent of difficulty to be satisfied varies and the population may be dominated by relatively easier ones among them in an early stage and the diversity may be lost to satisfy the more stringent constraints (e.g., Example 1 in Section IV). Hence, a high exploration ability is essential to maintain population diversity.
In addition, there is a critical balance between the exploration ability, the convergence speed [43] and the surrogate model quality. The higher the exploration ability (i.e., population diversity), the lower the probability of finding the correct search direction (i.e., convergence speed), and more necessary training data points are required to construct a reliable surrogate model (i.e., computationally expensive simulations). Hence, the central questions become: (1) How to build an SAEA framework (Section III B), and (2) How to design the optimization kernel (Section III C) to call for an appropriate balance among these factors?

B. THE NEW SAEA FRAMEWORK
The flow diagram of ASDEMO is shown in Figure 1. The algorithm consists of the following steps.
Step 1: Sample α (often a small number) designs from the design space using the Latin Hypercube sampling method [22], perform numerical simulations to all of them and let them form the initial database.
Step 2: If a preset stopping criterion (e.g., a maximum number of simulations) is met, output the best design from the database; otherwise go to Step 3.
Step 3: Select the λ best designs from the database to form a population P. VOLUME 8, 2020 Step 4: Apply the proposed self-adaptive DE search (Section III C) on P to generate n child populations and each population has λ child solutions.
Step 5: For each candidate design in each population, take the τ nearest designs (based on Euclidean distance) from the database and their performance values as the training data points to construct a local GP surrogate model. (There are n × λ GP models in total.) Step 6: Prescreen the n × λ child solutions generated in Step 4 using the GP models in Step 5 and the lower confidence bound method (7). Select the top n child solutions based on the lower confidence bound values.
Step 7: Simulate the estimated top n child solutions from Step 6. Add them and their performance (via numerical simulation) to the database. Go back to Step 2.
It can be seen that some ideas are borrowed from our SMAS framework [23], [32], [33]. The standard EA process is not used. Instead, only the predicted top candidate designs are simulated and the current best λ candidate designs are used as the new population in each iteration. The goal is to improve the locations of the training data points. Often, the number of training data points is considered as the main factor affecting the quality of a surrogate model, but their locations are overlooked. Intuitively, using the same number of training data points located near to the points waiting to be predicted (child populations in Step 4) can obtain a better surrogate model than using those far away from them. Because the training data points are selected from candidate designs generated by the search operators, the search should generate candidate designs in expected locations (i.e., near to the child population).
To meet this requirement, in each iteration, the λ current best candidate designs construct the parent population (it is reasonable to assume that the search focuses on the promising region) and the top n candidate designs based on prescreening in the child population are selected to replace the few worst ones in the parent population. Hence, only a few candidates at most are changed in the parent population in each iteration, so the top quality candidates in the child solutions among several consecutive iterations may be quite near to each other (they will then be simulated and are used as training data points). Therefore, the training data points describing the current promising region are much denser compared to those generated by a standard EA population updating mechanism, which may spread in different regions (many of them are far from optimal) of the design space, while there may not be sufficient training data points around the candidate solutions to be prescreened.
New candidate design generation and prediction are two other critical issues of the SAEA framework. Generating new candidate designs through a self-adaptive DE mutation method is described in the next subsection. However, correct predictions of the new candidate designs become a new challenge. SMAS builds a single surrogate model using the solutions near the child population to be predicted, while the new candidate designs for the targeted problem are expected to have diversity for exploration or serving as a local improvement for exploitation. A single global surrogate model is therefore inaccurate for the prediction. In ASDEMO, a local GP model is built for each child solution waiting to be predicted using the method in Step 5. Experiments show a sufficient prediction capability.

C. THE OPTIMIZATION KERNEL
To achieve the goals described in Section III A, the optimization kernel is designed as follows: For each child population i = 1, 2, . . . , n: Step 1: If the algorithm is within the learning period (the current number of iterations is smaller than a threshold L), the rate of using DE/rand-to-best/2, (10), DE/rand/1/dir, (11) and DE/hybrid trigonometric mutation strategies, (12) is 1 3 . Otherwise; use the rates in Step 5.
Step 2: Perform a roulette wheel selection [44] based on the rates to determine a DE mutation strategy and generate a child population C i (λ child solutions).
Step 3: Use the local GP surrogate models in Step 5 of the SAEA framework (Section III B) to predict all the candidate designs in C i . Step 4: Compare the predicted value of each solution in C i and the current best design (simulated value). Add the number of solutions that are better than the current best design to N s (the number of successes of (10), (11) or (12) and add λ to N u (the number of uses of (10), (11) or (12)). Until all the n groups of child solutions are generated.
80260 VOLUME 8, 2020 Selecting appropriate DE mutation strategies is critical for the success of the optimization kernel. Some considerations are stated as follows: (1) The selected mutation strategies must have high exploration ability. Besides the reasons regarding stringent constraints and high expectations of the optimal objective function value (Section III-A), the SAEA framework (Section III-B) is another reason. The search focuses on the promising subregion and only the top n candidates are simulated and are used to update the surrogate model, laying emphasis more on exploitation. Thus, the exploration ability mainly comes from the mutation strategy themselves. (2) Given the required exploration ability, the convergence speed should be as high as possible. The reference used is DE/rand/1 mutation, which is popular and successful. The goal of the ASDEMO optimization kernel is to improve the convergence speed and keep the exploration ability of the DE/rand/1 mutation strategy. (3) Different DE mutation strategies have a different tradeoff between the exploration ability and the convergence speed. This trade-off is expected to be similar for the selected strategies; Otherwise, because the selfadaptive mutation strategy selection method is greedy (Step 5), the strategy with the fastest convergence speed may dominate the selection in an early stage, which is harmful to preserving the diversity in the whole search process. Considering multiple constraints with various extent of difficulty to be satisfied, the above consideration is even more important. (4) MEMS shape optimization problems can be separable or non-separable, and ASDEMO aims to be a general method, so both conditions should be considered. (5) Memetic algorithms, which include local search into global search, show largely improved exploitation ability than standard EAs [45]. Different from our previous works using different stages for local and global optimization [46], [47], local search will be embedded in DE mutation operators to help satisfy stringent constraints as well as locally improve the objective function value.
Based on the above considerations, three more efficient DE mutation strategies [38]- [40] compared to DE/rand/1 are employed. A method to improve the convergence speed is to use x best to guide the search. DE mutation strategies employing x best show strong competitiveness for not highly multimodal and separable problems [39]. To keep the exploration ability, two pairs of individuals are used. That is why DE/rand-to-best/2, (10) is selected.
Another convergence speed improvement method is to utilize the objective function value. A typical example is the DE/rand/dir category. One pair of individuals is selected because using two pairs of individuals results in too high diversity compared to the other two selected strategies. It is shown that DE/rand/1/dir (11) can maintain the diversity of DE/rand/1, and shows good performance for multimodal and non-separable problems [38]. This is a complement to DE/rand-to-best/2, (10).
To promote local exploitation, we include local search in DE mutation at a low rate. A typical method is DE/trigonometric mutation, (12) [40]. Using (12), the perturbations are biased towards the candidate designs providing the smallest objective function values, to perform a local search. When hybridizing with DE/rand/1 (Section 2.2), the exploration ability is kept and the exploitation ability is enhanced.
Considering different MEMS design problem landscapes, the most fitted mutation strategy is used with the largest probability based on previous experience of the targeted problem. Our experiments show two observations: (1) For different MEMS optimization problems, the rate of using a certain mutation strategy can be very different, verifying the effectiveness of the self-adaptive method. (2) The optimal design obtained by the self-adaptive method is better than using each selected mutation strategy alone in all of our test cases, which shows the complement of the selected strategies.

D. PARAMETER SETTINGS
In ASDEMO, the algorithm parameters include (1) the DE parameters: λ, F and CR, (2) surrogate modeling parameters: the number of training data points (τ ) for each child solution waiting to be prescreened, the initial sample size (α), and (3) self-adaptive search parameters: the threshold of the learning period (L) and the number of child populations (n).
ASDEMO is a SMAS-based algorithm. Because the population updating scheme in SMAS emphasizes exploitation, large F and CR should be used to maintain the diversity. Empirical studies are in [48]. We suggest F = 0.8, CR = 0.8, λ ∈ [5 × d, 10 × d]. For surrogate modeling parameters, we suggest α = 5×d and τ = 8×d when the size of the database is larger than 8 × d; otherwise, the whole database is used. This is based on the empirical rules in [32], [36] for on-line surrogate modeling. Clearly, L is not sensitive and we suggest it to be within [30,50]. For simplicity, we use L = 30 in all test cases. For n, n = 3 is used according to empirical study.

IV. EXPERIMENTAL RESULTS AND COMPARISONS
In this section, two real-world high-performance MEMS examples (a 4 variable electro-thermo-elastic micro-actuator and a 10 variable corrugated membrane micro-actuator) and a mathematical benchmark problem (the 15 − d Ackley function [49]) are used to demonstrate the ASDEMO method. The reason why the Ackley function is selected among various benchmark problems is that it has a narrow optimal region and a highly multimodal landscape, which has common characteristics with the targeted problem, but it is arguably more complex considering the number of local optima. The forward problems of both MEMS examples are based on finite element analysis. The reference methods are PSO [13], DE [12], SMAS [23], [32], IWO [51], [52], L-SHADE [50] as well as ASDEMO only with a single mutation strategy (i.e., the strategies in (10), (11) and (12), respectively).
The reasons to select the comparison references are as follows: DE and PSO are selected because they are the dominant methods in the MEMS shape optimization area. IWO is selected because it is a relatively new method and it is attracting much attention in shape design optimization [51]- [53]. L-SHADE is selected because it is a state-of-the-art method in the evolutionary computation domain with applications in engineering design optimization [16], [17]. Although ASDEMO is particularly proposed for high-performance MEMS shape optimization, it is interesting to investigate its optimization ability compared with L-SHADE. Standard SMAS is selected because it is a typical surrogate modelassisted EA, and it has shown considerably higher performance compared to several popular surrogate model-assisted single objective optimization techniques [32]. ASDEMO with a single mutation strategy (i.e., the strategies in (10), (11) and (12), respectively) is considered to verify the performance improvement of the proposed hybrid mutation strategy and the self-adaptive adoption mechanism.
The population/swarm/plant size and the maximum number of evaluations are problem dependent and are described in each subsection. The same population size is used for all the methods for a fair comparison. For L-SHADE, the control parameters (i.e., scaling factor and crossover rate) are adaptive in each generation and historical memory with 5 memory cells and an archive rate of 1.4 are used following [50], [54]. The parameters for SMAS, ASDEMO with a single strategy are the same as those used in ASDEMO. Other parameter settings of ASDEMO are discussed in section III. All the tests are run on a workstation with Intel 4-core i7 (4770K) 3.50 GHz CPU and 24GB RAM and the time consumptions reported are wall clock time.

A. EXAMPLE 1
The first example is a silicon electro-thermo-elastic microactuator (Fig. 2) with a Young's modulus of 210 GPa and a Poisson's ratio of 0.22. The coupled equations in (13) to (15) are solved subject to the boundary conditions in (16) to (21).
where J is the current density, γ is the electrical conductivity, E is the electric field, Q is the heat due to Joule-effect, T is the temperature, k is the thermal conductivity, T ref is the reference temperature, h s is the convection coefficient which is different in the upper and in the lower surface of the microactuator (Figure 2), α T is the thermal expansion coefficient and Th s is the thermal part of the total strain. The conduction current problem is: elsewhere; where V A and V B are the terminal voltages at A and B respectively in ( Figure 2) and n v is the normal vector. The thermal problem is: at the simple supports and at the hinges.
elsewhere; where T ext is the external temperature. The mechanical problem is: at the simple supports.
at the hinges; where U is the displacement. The three problems are coupled via the thermal heat Q (electric and thermal problems) and via the temperature T (thermal and elastic problems). Because non-linearities of material parameters are not taken into account, it is possible to solve the three problems subsequently. Therefore, a weaklycoupled analysis problem is dealt with; to solve it, a cascade algorithm such as the successive substitution algorithm can be applied. The micro-actuator is modeled in COMSOL Multiphysics [55]. Its parametric finite-element model has a typical mesh composed of about 8,000 3-D elements and each simulation costs 25 minutes on average.
The design optimization problem is as follows. The design objective is to minimize the controllable area consumption of the micro-actuator (represented by l × (dw + 2d)) subject to specifications on maximum temperature, stress and total displacement as shown in (22). It can be seen that the design specifications are very stringent, as the first and third specifications are narrow bound constraints despite that the constraint on stress is moderate. The ranges of the design variables are shown in Table 1. A geometric constraint defined by dw ≤ 2 × d is used to ensure consistency in the model. This constraint is handled by repairing geometric infeasible candidate designs to the nearest feasible ones before simulation (i.e., setting dw = 2 × d when dw > 2 × d).
Considering the number of design variables, the population/swarm/plant size is set to 30 for all the methods. The external penalty function method is used to handle constraints and the penalty factor for each constraint is 50. To make all the methods converge, the computing budget is as follows: 200 simulations for ASDEMO, SMAS, ASDEMO with a single mutation strategy (i.e., the strategies in (10), (11) and (12), respectively), 1500 simulations for DE, PSO, IWO, and L-SHADE. A total of 10 independent runs are carried out for surrogate model-assisted methods, and two independent runs are carried out for DE, PSO, IWO, and L-SHADE, respectively, because more runs are not affordable. (22) where V min = 1 V , V max = 2 V , T max is the maximum temperature, S is the stress and U is the displacement.
The performances and convergence trends are shown in Table 2 and Figure 3, respectively. ASDEMO and DE satisfy all the design constraints in each run, while other methods fail to satisfy the design constraints in each run. In one PSO run, T max = 702.90 K and U = 2.05 µm and the constraint on S is satisfied. In the other run, the constraints on T max and S are satisfied, but U = 2.03µm. Because the constraint violation for PSO is reasonably small, it is still included in the efficiency comparison. In both runs of IWO, the constraints on T max and U are not satisfied with violations of around 165 K and 0.1 µm, respectively. In both runs of L-SHADE, the constraint on U is not satisfied with violations of around 0.04 µm. Note that in contrast to PSO where only one of the two runs violates the constraint on U with a violation of 0.04 µm, L-SHADE failed in both runs with worse results. In terms of our SMASbased MEMS shape optimization method [23] (i.e., SMAS), and ASDEMO with a single mutation strategy, in the 10 independent runs, none of them satisfy the constraint on U with a violation of around 0.06 µm and they seldom satisfy the constraint on T max with a typical violation of around 1 K . Based on the required performances in (22), it can be seen that the optimization abilities of SMAS, ASDEMO with a single mutation strategy, IWO and L-SHADE are insufficient for this high-performance MEMS shape optimization (i.e., stringent constraints). Hence, they are all excluded from the efficiency comparison.
As revealed in Figure 3, ASDEMO obtains the average objective value of 9.69 nm 2 using 199 simulations (about 4 days). A typical result of ASDEMO is 9.69 nm 2 for the objective function, T max =702.48 K , S=0.62 GPa and U =2.05 µm. The convergence criterion that we defined is that the design specifications are satisfied and the improvement of the objective function is less than 1 nm 2 after 30 simulations. Using

B. EXAMPLE 2
The second example is a silicon corrugated membrane microactuator ( Figure 4) with a Young's modulus of 135 GPa and a Poisson's ratio of 0.33. The micro-actuator is analyzed using Hooke's law and the subsequent equilibrium and geometric equations in (23) to (25).
where σ ij,j is the stress tensor components, ρ is the density and f is the loading vector component.
where ij,j is the deformation tensor components, U is the displacement vector configuration.
where C is the tensor components of the elastic constants. The micro-actuator is modeled in ANSYS [56]. To make 15 runs for each method affordable for comparison, the micro-actuator is configured to have a typical mesh composed of about 500 2-D elements and each simulation costs 1 minute on average.
The ranges of the design variables are shown in Table 3. The design exploration goal is to satisfy specifications on critical pressure (P c ) and deflection at critical pressure (D f (@P c )) as shown in (26). Again, the design specifications are strict. Since this is a constraint satisfaction problem, the weighted sum of the violations of the two constraints of (26) serves as the objective function. To make the level of constraint violations at the same level, the penalty coefficients are 0.1 for P c and 10 for D f @P c . According to the number of  design variables, the population/swarm/plant size is set to 50 for all methods. To make all the methods converge, the computing budget is as follows: 250 simulations for ASDEMO, ASDEMO with a single mutation strategy, SMAS, 1500 simulations for PSO, DE, IWO, and L-SHADE. 15 independent runs are carried out for all the methods.
847.5kPa ≤ P c ≤ 852.5 kPa As shown in Table 4, in all the 15 runs, ASDEMO, SMAS, ASDEMO with a single mutation strategy (i.e., the strategies in (10), (11) and (12), respectively), PSO, DE, and L-SHADE obtain designs satisfying the design specifications in (26). IWO obtain designs satisfying the design specifications in (26) in only 2 runs within the computing budget of 1500 simulations. These two successful runs use 46 and 28 simulations, respectively, which are comparable to the best cases for other methods. Although IWO satisfies the specifications efficiently in two successful runs, its stability is insufficient. Hence, IWO is not included in the efficiency comparison. A typical result for ASDEMO is P c = 847.89 kP a and D f (@P c ) = 171.53 µm.
According to Table 4, SAEA methods (ASDEMO, SMAS, ASDEMO with a single mutation strategy) are more efficient than EA methods (DE, PSO, L-SHADE) given both kinds of methods satisfy the design specifications. Among SAEAs, considering the average number of necessary simulations, ASDEMO's efficiency is slightly better than SMAS and ASDEMO with a single mutation strategy. It can also be observed that the worst case of ASDEMO is 207 simulations. This is similar to SMAS and about 1.2-1.5 times better than ASDEMO with a single mutation strategy (i.e., the strategies in (10), (11) and (12), respectively). This verifies the stability of ASDEMO; in particular, the effect of the three selected DE mutation strategies and the self-adaptive adoption mechanism.

C. BENCHMARK PROBLEM Tests
To show the performance of ASDEMO for problems having similar landscape characteristics with high-performance MEMS optimization, but may have more local optima, a mathematical benchmark problem, the 15 − d Ackley function (27) [49] is used. Often, MEMS shape design optimization problems have fewer than 10 design variables. The use of 15 dimensions shows possible co-design with other devices. The Ackley function has a global minimum at x = 0 d with a function value of 0. It has a narrow peak and numerous local optima. Considering the number of design variables in (27), the population/swarm/plant size is set to 50 for all methods. To make all methods converge, the computing budget is as follows: 650 evaluations for surrogate model-assisted optimization methods, and 7500 evaluations for PSO, DE, IWO, and L-SHADE, respectively. 30 independent runs are carried out for each method.
The performances and convergence trends are shown in Table 5 and Figure 5, respectively. As revealed in Figure 5, ASDEMO obtains an average function value of 0.0895 using the assigned budget of 650 function evaluations. The convergence criterion that we defined is that the improvement of the objective function is less than 1e-4 after 30 function evaluations. Using this criterion, ASDEMO converges using an average of 558 function evaluations. All other reference surrogate model-assisted optimization methods fail to obtain this average value after 650 function evaluations. PSO, DE, IWO, and in particular the famous L-SHADE also fail to obtain this average value after 7500 function evaluations, respectively. To obtain the average function values of SMAS, ASDEMO with strategy in (10)), (11)) and (12)), PSO, DE, IWO and L-SHADE, ASDEMO needs 563, 320, 561, 626, 518, 440, 422 and 617 function evaluations, respectively as shown in Figure 5. Hence, ASDEMO offers 1.16, 2.03, 1.16, 1.04, 14.5, 17.1, 17.8 and 11.8 times speed improvement respectively, while obtaining better average result for this complex problem. In terms of the best result in the 30 runs, although the winner is IWO, the best result obtained by ASDEMO, PSO and L-SHADE are of comparable high quality. In terms of the worst result in the 30 runs, the winner is ASDEMO, showing its stable performance.

D. DISCUSSIONS
The three selected test cases involve various complexity of design landscapes, various number of design variables and various kinds of optimization problems (goal optimization, constraint satisfaction and constrained optimization). Based on the results, the following conclusions can be drawn: (1) ASDEMO can address high-performance MEMS design problems with stringent design specifications efficiently, while other reference methods, including dominant methods in MEMS shape optimization (DE and PSO), a novel heuristic method (IWO) and a state-of-the-art EA (L-SHADE), find these problems difficult to address. (2) ASDEMO is scalable to 15 design variables, which is often enough for MEMS design optimization problems. (3) Besides significantly improved efficiency, ASDEMO even shows improved solution quality compared to DE, PSO, IWO and L-SHADE for the targeted problem. Often, surrogate model-assisted optimization methods are worse than pure EAs in terms of optimization quality due to the no-free-lunch theorem. (4) Compared to a popular surrogate model-assisted EA, SMAS, ASDEMO also shows both improved optimization quality and efficiency.

V. CONCLUSION
In this paper, the ASDEMO algorithm has been proposed for efficient high-performance MEMS design optimization. As demonstrated by two real-world MEMS design problems and a complex mathematical benchmark problem, ASDEMO can provide optimal designs that are better than DE, PSO, IWO and L-SHADE at a far lower computational cost. Therefore, ASDEMO achieves the goals in Section 1. These advantages are achieved by the core ideas of the new on-line SAEA framework, the three DE mutation strategies specially selected and combined for the targeted problem and the selfadaptive adoption of them. The ASDEMO algorithm can also serve as a reference for design optimization with stringent specifications in other engineering domains. Note that the robustness of the generated designs has not been considered in ASDEMO. Future works will focus on addressing robust high-performance MEMS design optimization and developing ASDEMO-based software tools.