WH-MOEA: A Multi-Objective Evolutionary Algorithm for Wiener-Hammerstein System Identification. A Novel Approach for Trade-Off Analysis Between Complexity and Accuracy

Several approaches have been presented to identify Wiener-Hammerstein models, most of them starting from a linear dynamic model whose poles and zeros are distributed around the static non-linearity. To achieve good precision in the estimation, the Best Linear Approximation (BLA) has usually been used to represent the linear dynamics, while static non-linearity has been arbitrarily parameterised without considering model complexity. In this paper, identification of Wiener, Hammerstein or Wiener-Hammerstein models is stated as a multiobjective optimisation problem (MOP), with a trade-off between accuracy and model complexity. Precision is quantified with the Mean-Absolute-Error (MAE) between the real and estimated output, while complexity is based on the number of poles, zeros and points of the static non-linearity. To solve the MOP, WH-MOEA, a new multiobjective evolutionary algorithm (MOEA) is proposed. From a linear structure, WH-MOEA will generate a set of optimal models considering a static non-linearity with a variable number of points. Using WH-MOEA, a procedure is also proposed to analyse various linear structures with different numbers of poles and zeros (known as design concepts). A comparison of the Pareto fronts of each design concept allows a more in-depth analysis to select the most appropriate model according to the user’s needs. Finally, a complex numerical example and a real thermal process based on a Peltier cell are identified, showing the procedure’s goodness. The results show that it can be useful to consider the simultaneously precision and complexity of a block-oriented model (Wiener, Hammerstein or Wiener-Hammerstein) in a non-linear process identification.


I. INTRODUCTION
Mathematical models are used in several engineering fields to describe the behaviour of dynamic systems. These models are based on a set of ordinary or partial differential equations [1], [2] associated with physical phenomena that occur in the system. However, it is not always possible in many practical applications to know the precise relationships that characterize a system. It is thus preferable to use simplified The associate editor coordinating the review of this manuscript and approving it for publication was Bing Li . models obtained experimentally from a set of input-output measurements. This way of building models is referred to in the literature as systems identification [3]- [5].
When the system to be modelled has non-linearities, the identification problem is not easy and selecting the model structure is the most challenging decision. When non-linearity is static, a good way to model nonlinear systems is through block-oriented models consisting of the interaction of linear time-invariant (LTI) dynamic subsystems and static nonlinear elements (NL) [6], [7]. The most common block-oriented models are the Wiener, (MPC) [50], it is well known that controller performance will depend largely on the model quality. However, the higher the model complexity, the higher the computational cost required to calculate the control action. In terms of the control algorithm, there is a significant gap between the MPC based on Wiener models [51], [52] or Hammerstein models [53] and the MPC based on Wiener-Hammerstein models [54]. In this work we thus consider a trade-off analysis between accuracy and complexity in estimating the Wiener-Hammerstein model and the identification problem is stated as an MOP.
Unlike linear models, defining a suitable structure (complexity) in nonlinear models is not a simple task. For this reason, the use of multiobjective optimisation (MO) in nonlinear identification is not new. Thanks to MO, it is possible to generate a Pareto set to compare and analyse the complexity and accuracy of different models in order to represent the same process and avoid over-fitting problems. Applications of MO in nonlinear modelling includes the identification of: the Volterra series [43], radial-basis function (RBF) networks [44]- [46], Nonlinear Auto-Regressive eXogenous (NARX) models [47], [48], Nonlinear Auto-Regressive Moving Average models with eXogenous inputs (NAR-MAX) [49], and Wiener-Hammerstein models [26]. This last approach is based on genetic recombination and particle swarm optimisation. During the search process, the algorithm requires minimal user interaction, but even though a large number of poles and zeros are allowed for both LTI subsystems, good accuracy cannot be achieved since the optimisation problem does not use a linear approximation as a starting point. Furthermore, a polynomial is used for the static nonlinearity, which is not recommended, since the sensitivity of the coefficients increases with the degree of the polynomial.
In this paper, a new Multiobjective Evolutionary Algorithm for Wiener-Hammerstein identification (WH-MOEA) is proposed based on ev-MOGA (epsilon-variable Multi-Objective Genetic Algorithm) [34] but using some new genetic operators and others inherited from WH-EA [16]. WH-MOEA genetic operators perform a smart distribution and fine-tune the linear dynamics while capturing the nonlinearity. To tackle model complexity, two new genetic operators are incorporated into WH-MOEA to increase or decrease the number of points assigned to the static non-linearity.
The entire procedure includes the handling of several design concepts (i.e. alternative initial model structures with different number of poles and zeros) in separate optimisation trials. Nonlinear model complexity is thus handled naturally by the non-linearity with a variable number of points and by the different structures tested in several WH-MOEA runs. Due to the ad-hoc genetic operators of WH-MOEA, different initial linear structures can lead to nonlinear models that do not necessarily have a Wiener-Hammerstein structure, so that Wiener models and Hammerstein models can also be obtained. This may be attractive for the user since a Wiener-Hammerstein model obtained in an optimisation run can be compared to other Wiener or Hammerstein models from other optimisation tests. This procedure can give the user a broader spectrum to decide on the best model. According to the available literature and estimation tools, this is currently only possible using a different method for each type of structure.
Compared with [26], this approach has two significant advantages. It uses a linear approximation as a starting point for nonlinear estimation. Although an additional step is required, knowledge of linear dynamics leads to better results. Also, different Pareto fronts (one for each initial linear structure selected) can be compared in the multiobjective space for an exhaustive trade-off analysis.
By way of summary, the novelty of this paper lies in two main aspects: 1) The development of a new multiobjective algorithm with specific genetic operators for the identification of Wiener-Hammerstein models and their specific cases, i.e., Wiener models and Hammerstein models. 2) Creation of a methodology that can compare different design concepts (alternatives to the dynamic part of the model) from an MO point of view (precision vs complexity) allowing the designer to analyse different model candidates in a more informed way and to choose the most suitable according to his/her preferences. Thanks to this, the user will value the effect of adding or removing poles or zeros from a model. For example, a Wiener-Hammerstein model can be compared to models of similar structure but with a greater or lesser number of poles or zeros, or in case of subtracting complexity in the dynamic part, the model is no longer Wiener-Hammerstein and goes to a Wiener or a Hammerstein model. Comparing different structures can be interesting because differences in precision will justify whether or not to select the most complex structure. Also thanks to WH-MOEA, it will also be possible to compare several models with different complexity in the static nonlinear part.
WH-MOEA has certain advantages over other Wiener-Hammerstein identification methods. Table 1 summarises the most important features of several of the methods proposed in the last decade. As can be seen, most methods use BLA as a starting point for nonlinear estimation and are conditioned to use Gaussian excitation signals, and most methods have a high User Interaction Level (UIL). In these cases, user intervention is required to carry out intermediate procedures, analyse results and make decisions. This aspect is even more critical in methods with a non-parametric approach or where the QBLA/CBLA is required.
In line with previous work [28], WH-MOEA uses multi-step signals, which, unlike Gaussian signals, are easy to design, and almost all processes are enabled to be excited by these signals. Even though estimation starts from a completely unknown static non-linearity, multi-step signals will help to highlight the effects of the non-linearity on the signals used for the identification process. On the other hand, WH-MOEA uses a standard linear model as a starting point for non-linear estimation. This model can easily be obtained from the step response, which is more practical, especially in industrial processes. Rigorous BLA estimation may require the use of more than one Gaussian excitation and each signal must be carefully designed with appropriate frequency content.
Regarding the UIL, WH-MOEA does not require intermediate procedures since the genetic operators are enabled to distribute the dynamics of the initial linear model when fine-tuning the location of the poles and zeros and capturing nonlinearity. Finally, assuming that the process under test is affected by a static non-linearity, the user does not need to enter the structure to be identified in the algorithm. WH-MOEA will find a set of models, which may have a Wiener, Hammerstein, or Wiener-Hammerstein structure.
The rest of this paper is organised as follows. In Section 2 the Wiener-Hammerstein formulation is presented together with the necessary assumptions, after which the multiobjective optimisation problem is stated. In Section 3 the evolutionary algorithm (WH-MOEA) is described in detail. Next, a guide is given for the application of this approach together with a summary of WH-MOEA parameters in Section 4. In Section 5 the application of this approach and results are presented, while the concluding remarks are reported in Section 6.

II. WIENER-HAMMERSTEIN FORMULATION AND PROBLEM STATEMENT A. WIENER-HAMMERSTEIN SYSTEM
A Wiener-Hammerstein model is formed by a static nonlinear block (f (v(t), ρ nl )) sandwiched between two LTI blocks G w (s, ρ w ) and G h (s, ρ h ). The structure of a Wiener-Hammerstein model is shown in Figure 1, where u(t) and y(t) are the model input and output, respectively, while v(t) and w(t) are internal variables that cannot usually be measured. In general terms, a Wiener-Hammerstein model can be mathematically described by: where vectors ρ w and ρ h contain the parameters of G w and G h , respectively, while vector ρ nl contains the static non-linearity parameters. Let the LTI subsystems be represented as rational transfer functions in a continuous-time domain in factorised form (zero-pole-gain): while static non-linearity is represented by a piece-wise linear function: In (3), −p w i with i = 1 . . . n a and −z w i with i = 1 . . . n b represent the poles and zeros of the front subsystem G w , while in (4), −p h i with i = 1 . . . n c and −z h i with i = 1 . . . n d represent those of the G h block. In both LTI subsystems s is the complex Laplace variable. Notice that static gains of both LTI subsystems have been normalized to 1 since the nonlinear block will absorb the real gains. On the other hand, in (5) ρ nl contains the abscissas and ordinates that define the breakpoint locations of the piece-wise linear function. Under this configuration, vectors ρ w , ρ h , and ρ nl would be defined as follows: where the pair (v i , w i ) defines the location of a breakpoint and n is the number of breakpoints assigned to represent the static non-linearity. The identification problem is presented as a search problem in which the elements of vectors ρ w , ρ h , and ρ nl must be found from input and output measurements. It should be taken into account that parameters n a , n b , n c , n d and n define the model complexity but are unknown beforehand. They will be known after the optimisation algorithm has distributed the dynamics of the initial linear model, whose number of poles (n poles = n a + n c ) and zeros (n zeros = n b + n d ) is known in advance. Since parameter n is variable, the user must indicate the minimum and maximum number of points allowed (n min < n < n max ) for the nonlinear block. Thanks to the unified approach proposed in [28], the formulation presented here is also applicable to Wiener and Hammerstein models, which are specific cases of Wiener-Hammerstein structure when one of the two LTI blocks lacks dynamics. In the case of Hammerstein models G w = 1 and ρ w do not exist, while neither do G h = 1 and ρ h in Wiener models. In any case, the user will not need to specify the model structure to be identified, but rather the algorithm will decide the best structure (Wiener, Hammerstein or Wiener-Hammerstein) from the measured data set.

B. GENETIC CODING AND SEARCH SPACE FOR NONLINEAR OPTIMISATION
This approach uses a multiobjective optimisation algorithm to solve the identification problem detailed in the previous section. This algorithm is based on a population of individuals in which each individual represents a solution of the problem. All the individuals contain coded genetic information on the structure and parameters of a model. This genetic information is composed of three segments ( Figure 2). The first one, P, contains information on the pole and zero locations. A second segment C contains binary information classifying poles and zeros, i.e. which ones belong to G w and which to G h . Segment B contains breakpoint coordinates representing the static non-linearity and varies in size between individuals. It can also change from generation to generation. This variation will be exploited to generate a set of optimal solutions (models of different complexity).
In P, the field zr 1 , . . . , zr nr contains the real zeros, while fields zc 1 , . . . , zc nc and zi 1 , . . . , zi nc contain the real and imaginary parts of the complex zeros, respectively. In the same way, pr 1 , . . . , pr mr contains the real poles of the model, while fields pc 1 , . . . , pc mc and pi 1 , . . . , pi mc contain the real and imaginary parts of the complex poles, respectively. The vector C contains a binary code that indicates how the poles and zeros of the initial linear model are distributed between the two LTI subsystems. Field xz 1 , . . . , xz nc+nr indicates how the zeros should be distributed, while field xp 1 , . . . , xp mc+mr indicates the poles' distribution. Values of mr, nr, mc and nc are obtained directly from the structure of the initial linear model and indicate the number of real poles and zeros as well as the number of pairs of complex conjugate poles and zeros respectively. The vector B contains information on the location of the breakpoints used to represent the All the parameters of vector ρ are encoded in vectors P, B and C. Figure 3 shows an example of how an individual has been coded to represent a nonlinear model in which the initial linear model has six poles (four reals and two complexes) and three zeros (one real and two complexes). According to this structure, mr = 4, nr = 1, mc = 1 and nc = 1. On the other hand, it is assumed that static non-linearity is represented with five points (n = 5). Notice the correspondence between vectors C and P: if an element of vector C has the value 1, the corresponding zero (or pole) of P is located in the input LTI subsystem G w (s), while a 0 value in C indicates that it is located in the output block G h (s). In Figure 3, the first element of C is zero, so the first one of P belongs to G h (s). Since the first element of P contains the real part of a complex zero, the output LTI subsystem must include this zero −1.4 + 0.6i and its conjugate −1.4 − 0.6i. Since the second element of C is 1 then the second one of P is located in the input LTI subsystem. Under this same logic, the poles are distributed between both LTI subsystems.
As generations go by, the genetic information of individuals is modified by genetic operations to find the best solutions to the identification problem. A modification in the genetic information of P implies an exploration of new locations for the poles and zeros, while a change in the genetic information of C means that a new distribution of poles and zeros is tested. On the other hand, a modification in B may imply a variation in the number of breakpoints or a coordinates change of the existing ones. In any case, individual modifications must respect a search space, which must be adequately defined to facilitate the convergence of the algorithm.
New pole and zero locations are bounded by P min and P max (components of vector P are element-wise bounded by vector P min and P max ). Bounds in P min and P max are defined around the location of each pole or each zero of the initial linear model. They can be set according to the sensitivity of poles and zeros, i.e. the closer to the imaginary axis, the smaller the search space, and conversely the farther to the imaginary axis, the larger the bounds. By way of illustration, assuming that the poles and zeros in the example of Figure 3 belong to the initial linear model, vectors P min and P max could be set as follows: In this case, bounds have been set so that the pole in −3.2 can move ±1 around its value, the pole in −2.1 can move ±0.5 around its value and all other poles and zeros have a freedom of movement of ±0.3. Since there is no recipe to define the bounds precisely, others could be set for this example. However, it should be noted that too large bounds could cause the algorithm to converge more slowly, while too small bounds could cause an ineffective exploration.
Regarding parameter bounds of non-linearity, WH-MOEA takes the approach defined in [28], where a common search space is defined to face the uncertainty of the location of the static non-linearity around the dynamics. The bounds will be vertically defined by the minimum (y min ) and maximum (y max ) values of the output signal, while the horizontal bounds will be given by the minimum (u min ) and maximum (u max ) values of the multi-step input signal. To prevent disorderly horizontal movement of breakpoints, movement of each breakpoint will be constrained by the position of the neighbouring breakpoints. In addition, to avoid an overlap, a minimum distance between the breakpoints will be considered through the user-defined parameter α. The minimum and maximum bounds that define the vertical and horizontal search space of each breakpoint are expressed through: where vectors W min and W max define the vertical search space (y-coordinate) of the breakpoints and vectors V min and V max define horizontal ones (x-coordinate). The entire search space of vector B is then defined as: On the other hand, the binary code contained in vector C is generated randomly but subject to the following considerations: • The number of poles assigned to a subsystem must always be greater than or equal to the number of zeros assigned to the same subsystem, i.e., the resulting system cannot be improper. Therefore n a ≥ n b and n c ≥ n d .
• The sum of the poles distributed between both subsystems must be equal to the number of poles of the initial linear model (n a + n c = n poles ).
• The sum of the zeros distributed between both subsystems must be equal to the number of zeros of the initial linear model (n b + n d = n zeros ).

C. MULTIOBJECTIVE OPTIMISATION PROBLEM STATEMENT
A MOP with m objectives to minimise can be stated as follows: is the vector-valued objective function and x is the decision variable in the search space D.
Since a MOP usually involves conflicting objectives, there is no single solution that minimises all the objectives. Instead, there will be a set of optimal solutions, known as non-dominated solutions or Pareto solutions.
Definition 1 (Pareto Optimality [35]): An objective vec- Definition 2 (Dominance [36]): The Pareto set (set of optimal solutions) and its corresponding Pareto front are therefore defined as follows: Definition 3 (Pareto set, X p ): The Pareto set is the set of solutions in D that are not dominated by another solution in D.
Usually, X p contains an infinite number of solutions and, for this reason, it is not possible to get it completely. Therefore, a discrete set X * p ⊂ X p such that f (X * p ) characterises f (X p ) is obtained. Note that the set X * p is not unique. For the identification problem, the decision variable x is a vector formed by the concatenation of vectors P, B, and C. Therefore, the optimisation problem would be stated as follows: subject to: P min ≤ P ≤ P max (19) (20) n a ≥ n b (21) n c ≥ n d (22) n a + n c = n poles (23) where Notice how f 1 is related to model accuracy quantified by the MAE between the estimated model output (y) and the real output (y r ) for a set of N samples, whilst f 2 represents complexity model, measured by the number of poles, zeros and points of the static non-linearity.
It should also be taken into account that the metrics defining the objectives f 1 and f 2 are independent of those that can be used in the estimation and selection of linear structures, which, as explained in Subsection IV-B, is a preliminary step to multiobjective optimisation.

III. WIENER HAMMERSTEIN MULTIOBJECTIVE EVOLUTIONARY ALGORITHM (WH-MOEA)
WH-MOEA is an improvement of WH-EA algorithm [16] to address the identification of Wiener, Hammerstein and Wiener-Hammerstein models under a multiobjective optimisation approach. WH-MOEA adopts some genetic operators and coding from the WH-EA algorithm, whilst the structure and functioning are acquired from ev-MOGA algorithm [34]. Arguably, WH-MOEA is therefore an elitist multiobjective evolutionary algorithm based on the concept of -dominance [37] for identification of block oriented models (Wiener, Hammerstein, and Wiener-Hammerstein).
Balancing convergence and diversity is guaranteed in WH-MOEA thanks to features inherited from ev-MOGA. Basically, ev-MOGA tries to ensure that X * p converges toward the Pareto set in a smart distributed manner along the Pareto front with limited memory resources. To do that, 1) the prevalence of dominant solutions in the population Pop and the archive A with respect to the dominated solutions is guaranteed, and 2) the objective space is split into a fixed number of boxes, and only one solution can be stored in each box. This avoids the need to use other clustering techniques to obtain adequate distributions, and so considerably reduces the computational cost [34], [37]. 2) Archive A is used to store the Pareto front approximation f (X * p ). Its size Nind A is variable but bounded because, as already mentioned, the target space is divided into a finite number of boxes where, at most, there can only be one solution.
3) Auxiliary population G is used to store new solutions created in each iteration of the optimisation process. Its size is Nind G , which must be an even number. The pseudocode of the WH-MOEA algorithm is shown in Algorithm 1, whilst its main steps are detailed as follows: Evaluate objectives f for all individuals in G(g); 8: , G(g)); 9: , G(g)); 10: end for 11: Print solution A(MaxGen) . First individual is coded from the initial linear model, as indicated in [16] and [28]. Next, this first individual undergoes all mutation operations Nind P − 1 times to give rise to the rest of the population.
Step 4. Function store checks individuals in Pop(g) that might be included in the archive A(g) taking into account -dominance concept. Step 6. Function create creates new individuals of G(g) by using procedure and genetic operators described in Subsection III-B.
Step 8. Function store checks individuals in G(g) that might be included in the archive A(g) taking into account -dominance concept. Besides, individuals from A(g), which are -dominated by individuals in G(g), will be eliminated.  Increase a breakpoint 4: else if P as > 0.5 and size(B/2) > n min then 5: Compute the slopes of all segments 6: Find the two consecutive segments with the most similar slopes 7: Compute the difference (D s ) between these two slopes 8: if |D s | ≤ 1.0 × 10 −3 then 9: Remove the breakpoint that is common to both segments 10: end if 11: end if C.3. In this last genetic operation the individual selected from Pop(g) and an individual from the archive A(g) randomly selected by r 2 exchange information.
Each genetic operation acts on a certain portion of an individual's information. As can be seen in Algorithm 2, to expand diversity and avoid premature convergence, not all genetic operations are applied at the same time. A random process and control parameters γ (g), η min , η(g), ξ , P cr , and P nl decide which genetic operators should be applied. The random number r pznl ∈ (0, 1] is compared with γ (g) to decide if the location of a pole/zero is mutated with M.1 or whether the location of a breakpoint is mutated with either M.2 or M.3. This last selection depends on the random number r mm ∈ (0, 1] and the control parameter η(g). On the other Individuals r 1 and r 2 cross all genetic information contained in P 4: else if C ga = 2 then 5: Individuals r 1 and r 2 cross all genetic information contained in B 6: else if C ga = 3 then 7: Individuals r 1 and r 2 cross all genetic information contained in C 8: end if hand, parameter ξ defines the probability that an information alteration will occur in C (binary code for the classification of the dynamics). This alteration is handled by mutation M.4, and also its occurrence depends randomly on the value of r c ∈ (0.1]. Similarly, mutation M.5 and crossover C.3 are randomly selected through random numbers r nl ∈ (0, 1] and r cr ∈ (0, 1] respectively. Parameter P nl defines the probability that mutation M.5 occurs, while parameter P cr defines the probability that crossover C.3 occurs. Control parameters adjust the probabilities of mutation and crossover. The parameters γ (g) and η(g) change through generations, while fixed parameters η min , ξ , P cr , and P nl are considered tuning parameters of the algorithm. However, through examples where WH-MOEA and its precursors have been used, it has been possible to establish through trial and error appropriate values depending on the results obtained. These values have been used in both identification problems addressed in this paper.
During the first generations, γ (g) is close to one which gives a high probability of a mutation occurring on the segment that contains genetic information on the position of the breakpoints; the probability of a mutation occurring on the segment containing genetic information on the location of the poles and zeros is therefore low. Parameter γ (g) decreases as the generations pass so that in the last generations the algorithm will modify both portions of genetic information with equal probability. The way the γ (g) parameter works is justified by the fact that static non-linearity is entirely unknown, so the algorithm should focus more on this portion of genetic information during the first generations. As non-linearity takes shape, the algorithm will increase the probability that new positions for the poles and zeros will also be explored, so that the dynamics will be refined. Similarly, the control parameter η(g) allows the probability of selecting between M.2 and M.3 mutations to be variable. During the first generations, mutation M.3 is not so necessary since this genetic operation concentrates breakpoints in the curvatures to achieve higher accuracy. As long as the non-linearity does not take shape, the concentration of points in the curves will have no significant effect. The mutations M.1, M.2, M.3, and M.4 are inherited from the WH-EA, therefore further information on how these genetic operations work as well as the calculation of the parameters γ (g) and η(g) can be found in [16]. In this section, a brief description of these genetic operations and new genetic operators (mutation M.5 and crossover C.3) are briefly explained.
• Mutation M.1. A gene (element) of P is randomly selected to be modified. This modification involves exploring a new position for the corresponding pole or zero within the search space. The following mathematical expression describes this genetic operation: where, N zp (0, σ 2 (g)) is a random number with Gaussian distribution and variable standard deviation σ 2 (g). p act contains the current value of the selected gene to be mutated, while p new contains the result of the mutation.
In each generation g, the standard deviation is calculated with the following expressions: As generations go by, σ 2 (g) will reduce its value from an initial value (σ 2 ini ) to a final value (σ 2 end ). In the last generations, mutations on P will be more subtle to achieve a fine-tuning of the parameters. The rate of decrease of the standard deviation (σ 2 ratio ) depends on σ 2 ini , σ 2 end and the predefined number of algorithm generations (MaxGen). Both σ 2 ini and σ 2 end are user-defined parameters that must be configured before the execution of the algorithm. s is a measure of the scanning space that can be calculated with the upper and lower limit of the search space of the corresponding pole or zero.
• Mutation M.2. A pair of B genes are randomly selected to be modified. These genes are matched and represent the coordinates of a breakpoint. The new values of the selected genes are calculated with the same procedure used in the mutation M.1. Two random numbers with Gaussian distribution (N v (0, σ 2 (g) and N w (0, σ 2 (g))) are thus required. N v (0, σ 2 (g)) is used to mutate the gene of the abscissa of the selected breakpoint, while N w (0, σ 2 (g) is used to mutate the gene of the ordinate of the same breakpoint. As in mutation M.1, σ 2 (g) varies from σ 2 ini to σ 2 end to control the aggressiveness of the mutations and fine-tune the breakpoints.
• Mutation M.3. This genetic operation is applied to B and allows the breakpoints to jump to each other so that they can concentrate on the curvatures. The two genes that define the position of a breakpoint are randomly selected. The selected breakpoint must jump to a new location which will be in a segment defined by two other breakpoints. This segment is also chosen randomly.
The new abscissa is computed as the midpoint between the two breakpoints that form the segment over which the point will jump, while the new ordinate will be calculated using a quadratic interpolation and information on the breakpoints near the location where the jump occurred. Quadratic interpolation will help to make a smooth transition from a point when it jumps to a specific segment.
• Mutation M.4. This genetic operation is applied to C and generates new combinations of poles and zeros to give rise to the dynamics of the two LTI subsystems. Each time this mutation is required, a new binary code subject to (21)-(24) is randomly generated in C.
• Mutation M.5. This genetic operation is applied to B. Under certain conditions it can increase or reduce the number of breakpoints used to represent static nonlinearity. The way this genetic operation works is synthesised in Algorithm 3. A possible increase or decrease is determined by the random number P as ∈ (0, 1]. An increase of one breakpoint will occur as long as the individual selected to mutate does not contain in B the maximum amount of breakpoints allowed (n max ). If an increase is required, a new breakpoint will appear on a randomly selected segment. As with mutation M.3, the abscissa of the new breakpoint will be calculated as the midpoint between the two breakpoints that define the selected segment, while the ordinate will be calculated using a quadratic interpolation and the coordinates of three neighbouring breakpoints. For a decrease to occur, two conditions must be met in addition to P as > 0.5. The first condition is that B must not contain the minimum number of breakpoints allowed (n min ), while, the second condition allows a decrease to occur as long as a breakpoint is located on a straight line, that is, in a redundant position. To verify this condition, the slopes of all segments joining two breakpoints will be calculated. The slope of each segment will be compared with the slope of the segment on the right. The two consecutive segments with the smallest difference in slope will contain the breakpoint that is likely to be eliminated. These two slopes will be compared again; if the absolute value of the difference is close to zero, the breakpoint that is common to the two segments will be removed. Closeness to zero is quantified by a fixed value of 1.0 × 10 −3 . This procedure aims to eliminate unnecessary breakpoints.
• Crossover C.3. This genetic operation allows an individual from the main population Pop(g) to exchange genetic information with an individual from the archive A(g) (front of Pareto), for which two random integers are generated. The first integer (r 1 ∈ [1, size(Pop(g))]) is used to randomly select the individual of Pop(g), while the second integer (r 2 ∈ [1, size(A(g))]) is used to randomly select the individual of A(g). Each individual of both Pop(g) and A(g) has three portions of genetic information (P, B, and C), however, the exchange of information between the two individuals will be of only one portion, for which another random integer is generated (C ag ∈ [1, 3]). This number will decide which portion of the genetic information will be exchanged (see Algorithm 4).

IV. USING WH-MOEA TO COMPARE DESIGN CONCEPTS: A PROCEDURE A. COMPARING DESIGN CONCEPTS BY THE MOP APPROACH
When dealing with a MOP design problem, it is quite common to consider different alternatives (design concepts), i.e. ideas on how to solve the problem. It is possible to define an independent optimisation problem for each of these design concepts [38] and when they share the same objectives they can easily be compared directly in the objective space by simply comparing their respective Pareto fronts. For example in [39], this idea is used to compare alternative slide model control structures, in [40] to compare different loop pairings and in [41] to analyse the performance of different battery models.  Figure 4 shows an example in which the Pareto fronts of three design concepts are compared in a two-dimensional objective space. Notice that as design concept DC2 completely dominates design concept DC3, concept DC2 will be preferred over concept DC3. However, DC2 does not completely dominate concept DC1 or vice versa. If the designer has a preference for solutions in Zone 1, solutions from concept DC2 would be selected, as in this zone concept DC2 dominates concept DC1. Whilst if Zone 2 was preferred, the designer should choose solutions from concept DC1. This illustrative example shows how interesting it is to use the MOP optimisation approach when comparing possible design concepts. In the end, the designer would choose the final solution taking into account the dominance between concepts and his/her specific preferences.
In this paper, an initial linear model is used as a starting point for nonlinear estimation. Most of the time, the model is selected from a ranking based on a specific criterion. However, under the assumption that there are any other useful criteria giving more candidates, the idea of design concepts arises so more than one initial linear model should be tested. Then, each linear structure will give rise to a design concept. VOLUME 8, 2020 Furthermore, if you sort the list of initial candidates by a criterion, several models with similar values could be obtained and therefore one might add other criterion to perform the initial selection. In these case, it is worth to evaluate more than one linear structure.

B. PROCEDURE DEFINITION
Using WH-MOEA as a tool for nonlinear identification can not only get a specific model, but also a set of models with different features for effective decision-making based on particular needs. The suggested procedure is defined as follows: Step 1 (Establish a set of Candidates Using a Standard Linear Identification Method): Since WH-MOEA can fine-tune the initial linear model, a classical step test can be used. The ranking can be made according to the structure of the models (number of poles and zeros) and one or more performance criteria. For example, the mean squared error (MSE) could be used; however, to avoid over-modelling the MDL criteria or any others, mentioned in Section I, can be useful.
Step 2 (Choose the Design Concepts): From this set of candidates, you have to select some initial linear structures to be tested. Each one will correspond to a design concept. It may be that two or more structures present similar values to a performance criterion or you may have two or more conflicting candidates for the best of their respective criteria. However, other less complex linear structures can also be selected. A comparison of the Pareto fronts of various design concepts will help in adequate decision-making. Step

(Prepare a Multi-Step Excitation Signal and Performs an Experiment With the Process Under Identification):
The input signal must ensure that the system output reaches steady state. For static non-linearity to be appropriately captured, different step amplitudes must allow a scan of the entire process operating range. A minimum distance between two consecutive steps must be considered [28] to highlight static non-linearity. Finally, excite the process under identification with this signal and record its output for a given sampling period.
Step 4 (Set the Bounds): From the minimum and maximum values of the input (u min and u max ) and output (y min and y max ) signals, define the search space bounds for static non-linearity using equations (11) to (14).
Step 5 (Run WH-MOEA for Each Design Concept): Table 2 shows the list of WH-MOEA parameters that must be previously assigned to carry out each run. The bounds P min and P max must also be previously defined as a function of the sensitivity of the poles and zeros. At the end of each WH-MOEA execution a Pareto front is obtained.
Step 6 (Perform a Decision Making Analysis): Each Pareto front contains a set of models with different performances. For trade-off analysis, draw all the Pareto fronts on the same graph. Evaluate selected models on the estimation and validation data sets for a final decision.

V. APPLICATION OF WH-MOEA PROCEDURE, RESULTS AND DISCUSSION
To show the effectiveness and usefulness of the methodology presented, two application examples are included in this section. The first consists of a numerical example, a process with a pure Wiener-Hammerstein structure, while the second is a real thermal process based on a Peltier cell. WH-MOEA parameters were set to the same values for both identification problems: ξ = 0.25; P nl = 0.33; P cr = 0.33 and η min = 0.30. The initial and final standard deviations for mutations were set to 20 and 1, respectively. A. APPLICATION 1: NUMERICAL EXAMPLE Figure 5 shows the Wiener-Hammerstein process used for this example. Static non-linearity (S NL ) consists of a sigmoid hyperbolic tangent function ''tansig'' (31), which symmetrically saturates large values of the independent variable. Also, Gaussian noise with a power of −40db was added on the system output to emulate a real situation.
Step 1 (Establish a set of Candidates Using a Standard Linear Identification Method): First, an experiment for initial linear identification was designed. Since input range was between 0 and 100 units a constant signal with amplitude 50 units was injected into the system until it reached steadystate. On this operating point, the process was excited with a small input change of 2.5 units and a duration of 15s ( Figure 6). During the experiment, input and output data were recorded with a period of 50ms. Offset was removed from the data to avoid problems with initial conditions.  Fifteen initial linear models were estimated from 3 rd to 6 th order considering only strictly proper systems and no more than three zeros. For each structure, a linear model was estimated using the Matlab System Identification Toolbox [42] with the tfest command. The MSE and the modified MDL were calculated on the estimation data set. The results of these estimates are shown in Table 3. According to the modified MDL criteria, the best model is two zeros -five poles (G 2z/5p ) which is consistent with the actual dynamics of the WH process under consideration.
Step 2 (Choose the Design Concepts): According to the results in Table 3, the linear structure G 2z/5p might be considered for nonlinear estimation; however, as mentioned in Section IV, others can also be considered. Models with six poles would not be a good option since there are other less complex models that can achieve similar performance. As model G 3z/5p has a zero in −119, far from the system dominant dynamics, all the models more complex than G 2z/5p would be discarded.
The effect of non-minimum phase zero in this example is remarkable. The MSE of model G 1z/3p (4 zero-pole) is better than that of model G 4p having the same number of parameters. It is even better than more complex models without zeros, such as models G 5p and G 6p . For this reason, models without zeros will not be considered for nonlinear estimation.
According to this analysis, linear structures G 1z/3p , G 2z/3p , G 1z/4p , G 2z/4p , G 3z/4p , G 1z/5p , and G 2z/5p would be good options for nonlinear estimation. Each of these structures will give rise to a design concept for nonlinear estimation. Table 4 shows the poles and zeros of each design concept, while Figure 7 shows a comparison of their performance. For convenience sake, from now on each design concept will use a nomenclature as indicated in the first column of Table 4. This nomenclature is preceded by the characters ''DC'' followed by an indication of the number of poles and zeros of the linear structure. Step

(Prepare a Multi-Step Excitation Signal and Performs an Experiment With the Process Under Identification):
Two multi-step signals were designed, one for estimation and one for validation purposes (see Figure 8). The amplitude of the steps in both signals was handled randomly, lasting 25s each. The minimum step amplitude was set to 15 units. The estimation signal was designed with 50 steps, while the validation signal was designed with 30 steps. The proposed WH process was simulated twice in Matlab using estimation and validation multi-step inputs. In both experiments, process input and output were recorded with a sampling period of 50ms.
Step 4 (Set the Bounds): From the estimation data set, the minimum and maximum values of the input and output signals were obtained (u min = 0, u max = 100, y min = −0.024 and y max = 99.933). The vertical search space for VOLUME 8, 2020 TABLE 4. Selected initial linear models (design concepts) for nonlinear identification of WH process.

FIGURE 8. Estimation (top) and validation (bottom) input signals for nonlinear identification of WH process.
all breakpoints was defined with y min and y max using (11) and (12) whilst u min and u max values were used internally by the algorithm to define the variable limits V min and V max using (13) and (14) respectively. It should be noted that the horizontal search space of each breakpoint is variable and depends on the position of the neighbouring breakpoints.
Step 5 (Run WH-MOEA for Each Design Concept): According to selected design concepts (Table 4), seven bi-objective optimisation problems were stated as in Subsection II-C. WH-MOEA was thus executed seven times, each one fed with a different initial linear model (design concept). The configuration parameters in Table 5 were used for all the executions. Table 6 shows how the first individual in the population (P 0 1 ) was coded for each design concept and the lower (P min ) and upper (P max ) exploration bounds to refine the locations of the poles and zeros (defined individually around each pole or zero). It should be borne in mind that there is no recipe to set precisely the bounds but, the higher the interval the more exploration (slower algorithm convergence). A good alternative is to set the bounds based on the sensitivity of each pole or zero. Those closest to the imaginary axis are more sensitive to changes so their exploration bounds may be small, while a greater degree of freedom can be given to the poles and zeros furthest from the imaginary axis.
After the seven WH-MOEA runs, a total of 82 different models were obtained. Table 7 shows the number of models obtained in each design concept. This table also specifies how each model has been labelled.
Step 6 (Perform a Decision Making Analysis): To make an effective analysis of the results obtained, the Pareto fronts of all design concepts must be compared in a two-dimensional objective space. According to the number of design concepts stated for this example, seven Pareto fronts were obtained. Given the high precision achieved by the models of design concept DC − 2z5p, the corresponding Pareto front will be analysed separately. Figure 9 shows a graph of this Pareto front which contains 14 Wiener-Hammerstein models with the same dynamic structure. The distribution of the poles and zeros of these 14 models is consistent with the real dynamics of the proposed example. Undoubtedly, the fact of having the same structure as the real example has meant that the precision of all these models cannot be achieved by any model of the other fronts, even by the models of design concept DC − 3z4p, which has the same complexity level devoted to representing the dynamic part. In this case, the nonlinear estimates have highlighted the different degrees of accuracy that models of these design concepts can achieve. This difference is not so apparent on inspecting Table 3, which contains the results of the linear estimates, i.e. the MAE achieved TABLE 6. Coding of the first individual in the population and exploration bounds to refine the poles and zeros (design concepts for identification of the numerical example). by initial models G 2z/5p and G 3z/4p is 8.143 × 10 −5 and 9.315 × 10 −5 while their MDL is 4.781 × 10 −5 and 5.479 × 10 −5 , respectively. Apparently, it seems that both models have similar performances. However, with nonlinear estimations, the importance of selecting the appropriate structure has been highlighted.
Looking at Figure 9, it can be clearly seen that the best model in terms of accuracy is M 2z5p 1 . The precision (objective f 1 ) achieved by this model is 2.065×10 −2 , while its complexity level (objective f 2 ) is 33 (2 zeros, 5 poles and 26 points for the static non-linearity). This model would undoubtedly be the best option in terms of precision, but it should be borne in mind that it is also the most complex model.
A great advantage of this approach is that the Pareto fronts allow the performance of different models to be compared. In addition, in each Pareto front it is also possible to analyse the contribution of the number of breakpoints in terms of precision. Looking at the Pareto front in Figure 9 it can be seen that the accuracy achieved by model M would be preferred since it has five points less. Figure 10 compares the other Pareto fronts. Analysing these fronts one can notice that the best models in terms of precision are on the front of design concept DC − 3z4p, However, it should be noted that the models of this design concept have the same complexity in the dynamic part as the models of the design concept DC − 2z5p, which achieved better accuracy. It would therefore not make sense to select models from design concept DC −3z4p, as there will always be an equally or less complex model of design concept DC − 2z5p which will always be more accurate.
Something similar occurs with models belonging to design concepts DC − 1z5p and DC − 2z4p, all of which have the same complexity in the dynamic part; however, the difference in accuracy is evident. This would exclude all models that have one zero and five poles from the selection since there will always be a model with the same complexity in the dynamic part that has better performance in terms of accuracy.
In the same way, the models of design concepts DC − 1z4p and DC − 2z3p have equal complexity in the dynamic part, so it would be interesting to compare them. At first glance, the most complex model of each design concept (M  models of this front with a complexity level less than 21 were analysed, their performance would be worse than models of design concept DC − 1z3p, whose dynamic structure has one pole less. According to the analysis presented, there are some design concepts whose models do not offer good performance. However, before discarding them, it would be essential to review their dynamic distribution. For example from the previous analysis, models from design concept DC −3z4p are not good candidates since other models have better precision and the same complexity in the dynamic part. Conversely, if models with a Wiener or Hammerstein structure had appeared as a result of WH-MOEA execution, any of them would have been very attractive especially for control applications. 1 In this regard, the fronts from which models could be selected to represent the system would be those of design concepts DC − 2z5p, DC − 2z4p, DC − 2z3p and DC − 1z3p. If high accuracy were required, the M  Table 8 shows the complexity and accuracy achieved by the three models indicated. The accuracy reached in the validation data set is also calculated (f 1v ).  Figure 11 shows their responses compared with the process under identification. Figure 12 shows the breakpoint locations representing static non-linearity.
As can be seen in Figure 12, distribution of the points in the three models is effective and the non-linearity has been satisfactorily captured. This is reflected in Figure 11, where there are no visible inaccuracies in steady state behaviour. Differences can be seen on the transients due to non-modelled dynamics in models M . This is shown in greater detail in Figure 13, which highlights the differences. Notice that model M can be seen to lack a pole, its performance could be considered acceptable for certain uses. The effect of the non-modelled dynamics is more apparent in model M 2z3p 5 , which tries to reproduce the real dynamics with the effect of the non-minimum phase.

B. APPLICATION 2: THERMAL PROCESS
The second example consists of a lab-scale thermal process based on a Peltier cell. The input system is the voltage (u a ) applied to the Peltier cell actuator, which can vary between 0 and 4.5 Vdc. The temperature gradient between the hot and cold surfaces ( T ) was considered as the output of the system. Further information on the process structure, lab testbench, and the linear model estimation procedure can be found in previous work [28]. This non-linear real identification procedure is shown below: Step1 (Establish a Ranking of Candidates Using a Standard Linear Identification Method): The results of the linear estimation are summarised in Table 9. As in the numerical example, the MSE and the modified MDL criterion were calculated for each model.
Step 2 (Choose the Design Concepts): According to the modified MDL criterion and the MSE values, the best linear model is G 1z/4p (four poles and one zero). However, the MDL of models G 1z/3p and G 1z/5p is very similar. Model G 1z/3p has one pole less than the best model, so it would be interesting to compare the results obtained with the nonlinear estimation from these two linear structures. On the other hand, model   G 1z/5p will be excluded as there are two less complex models with slightly higher performance.
The remaining models G 2p , G 1z/2p , and G 3p have similar performance and any of these could be a good option for obtaining less complex nonlinear models. In this case, the linear structure G 1z/2p will be selected. The location of the poles and zeros of the three selected structures for nonlinear estimation are shown in Table 10. As in the numerical example, each of these structures gives rise to a different design concept. Step

(Prepare a Multi-Step Excitation Signal and Performs an Experiment With the Process Under Identification):
Two multi-step signals were generated, one for estimation and another for validation purposes. The estimation signal was designed with 38 steps and the validation with 24 steps. The steps in both signals lasted 700s, and the amplitude changes were handled randomly within the entire range of variation of signal u a (0 − 4.5V ). To highlight the non-linearity of the process, the minimum step amplitude was constrained to 1.5V . The thermal process was excited with both signals separately. After extinguishing the transient stage of the first step, the input and output data were recorded with a sampling period of 100ms. The input signals recorded for both estimation and validation tests are shown in Figure 14. Step 4 (Set the Bounds): The minimum and maximum values of the input and output signals were obtained from the estimation data set. These values were: u min = 0V , u max = 4.50V , y min = −53.957 • C and y max = −0.181 • C. As the previous example, y min and y max values were used to create W min and W max , respectively and u min and u max to define the variable limits V min and V max .
Step 5 (Run WH-MOEA for Each Design Concept): According to the selected design concepts, three bi-objective optimisation problems were stated, as in Subsection II-C. The configuration parameters in Table 11 were used for all three executions of WH-MOEA. For each design concept, the first individual in the population was coded using the poles and zeros of each initial linear model. The exploration bounds to refine poles and zeros were defined according to its location concerning the imaginary axis. In other words, the poles or zeros furthest from the imaginary axis had a greater exploration margin since they were less sensitive, while those closer to the imaginary axis had a smaller exploration margin. The codification of the first individual and the exploration bounds to refine the poles and zeros of each design concept are shown in Table 12.    On comparing DC − 1z2p front with DC − 1z3p and DC − 1z4p fronts, one can see the effect of omitting the two fast actuator poles in the models. This shortcoming cannot even be compensated by increasing the static non-linearity points.
As final part of the analysis, the designer can evaluate the models of all fronts on a different data set (validation set) than the one used for estimation. A practical example of this is shown in Figure 15. The displacement of the three Pareto fronts (models represented by diamonds) shows lower accuracy than those achieved by the estimation data set.
The Pareto front of the first model concept can be seen to stand out more to the left, indicating that the difference between the four-pole and three-pole models has become wider in terms of precision. This particular situation can also give the designer clues to select a suitable model.
Thanks to this approach it has been possible to compare Wiener-Hammerstein and Hammerstein model structures. Clearly, more complex models will be more accurate, however it will always be important to quantify their differences and select the best, considering not only model accuracy but also complexity. For example, if you require an adequate compromise between precision and complexity, model M . Figure 15 shows a comparison of the performance of these models on the estimation and validation data sets, while Figure 17 shows how the break points were located to represent the static non-linearity of each model.
At first glance ( Figure 16) it seems that the performances of the three selected models are the same; however an enlargement of the image (Figure 18) shows the existing differences mainly in the transient stage. Notice how model M 1z2p 2 (green) has a small difference with the other two models with a Wiener-Hammerstein structure. If a simplified model is required, M 1z2p 2 would therefore be a good option.

VI. CONCLUSIONS
This paper describes a new method of identifying nonlinear block-oriented models (Wiener, Hammerstein or Wiener-Hammerstein) and demonstrates its effectiveness with a complex numerical example and a real thermal process. The procedure is based on WH-MOEA, a new multiobjective evolutionary optimisation algorithm which was formulated ad-hoc to manage this type of block-oriented model without previous knowledge about process structure. The method highlights the importance of generating a set of models with common targets and diverse performance -the generated Pareto fronts can compare and analyse the trade-off between precision and complexity. The procedure is therefore not focused on the selection of a specific model, but rather on showing the benefits of obtaining a wide range of models with different features and giving the engineer a chance to choose a final model according to his/her preferences. The WH-MOEA procedure has two important advantages: on one hand, by studying the Pareto front of a specific design concept, the influence of a variation in breakpoints for the static non-linearity on model precision can be analysed, which is more useful to the user than other methods, as it eliminates uncertainty about the number of breakpoints that must be assigned to represent static non-linearity. On the other hand, the design concept analysis can be used to compare several candidates. For example, simpler models than those recommended by the MDL criteria can be found with acceptable performance. Finally, the procedure described here contributes to solving the issue stated in many control problems, where design requirements and user preferences do not always point to model precision as the only objective, as model simplicity can also be an influential factor in decision-making.
J. SANCHIS received the B.Sc. and Ph.D. degrees in computer science from the Universitat Politècnica de València, Spain, in 1993 and 2002, respectively. He is currently a Professor with the Department of Systems Engineering and Control, Universitat Politècnica de València. He has more than 15 years of experience applying soft computing methods to engineering problems related to process control and optimization. His main research interests are multivariable predictive control, constrained process control, and computational intelligence methods for systems engineering and process control. M. MARTÍNEZ (Member, IEEE) received the B.Sc. and Ph.D. degrees from the Universitat Politècnica de València, Spain. He is currently a Professor with the Department of Systems Engineering and Control, Universitat Politècnica de València, Spain. He has three years of industrial experience in control engineering in cement plants. He has published several chapters of control books, and numerous papers in conference proceedings and journals. His research interests are applied adaptive control, model predictive control, and evolutionary optimization applied to identification and process control.