ResNet and PolyNet Based Identification and (MPC) Control of Dynamical Systems: A Promising Way

This paper deals with model predictive control synthesis that benefits from artificial neural networks to model (non-linear) dynamical systems. More precisely, through a systematic and rigorous methodology, it is shown that the residual network (ResNet) and PolyInception network (PolyNet) architectures, developed initially for image recognition, are very good candidates i) for the identification of dynamical systems and ii) as embedded models in model predictive control laws. Concretely, the widely used non-linear quadruple tank process is used as a benchmark. The neural network architectures studied are i) feedforward networks, as a reference point, and two architectures linked to Euler integration methods, ii) residual networks and iii) PolyInception networks. Network training is performed by combining the classical back-propagation algorithm with hyperparameter optimisation through heuristics. The identification results show that neural networks of types ii) and iii) perform better than the classical type i), with a better generalisation capability. Finally, model predictive controllers are synthesised based on the various trained networks. The simulation results obtained for controlling the water levels of a four-tank benchmark system give interesting insights. They show that residual network based model predictive control is better suited than feedforward network and PolyInception network based ones, taking into account both computation time and set point errors.

Measured process data can be used, when available, to identify a model [13], especially as an increasing amount of process data is nowadays available [14].
Machine learning is used to learn features from data, for instance to identify objects in images, match items, transcribe speech or choose relevant search results. A common approach is supervised machine learning, where the purpose is to learn a function that maps input-output pairs with the highest score [15], [16]. In control systems, system identification usually relies on linear parametric models built from measured data [17]. However, it has been shown that Artificial Neural Networks (ANNs) can outperform linear parametric models and reduce the identification error [18]. Supervised learning has been used to identify complex non-linear dynamical systems [19], [20], and ANNs have been assessed and evaluated for dynamical system identification. Without claiming to be exhaustive, we can cite the FeedForward Neural Network (FNN) [21], [22], the Time Delay Neural Network (TDNN) [23], [24], the Recurrent Neural Network (RNN) [25]-[27], and modern RNNs with gating mechanisms such as the Long Short-Term Memory (LSTM) [18], [28], [29] or the Gated Recurrent Unit (GRU) [30], [31].
The relation between numerical integration methods and ad hoc neural network architectures has been recently observed in [32] and other works [33], [34].
In [32] and [34], the authors noted the link between the Residual Network (ResNet) [35] and the forward Euler integration method [9], while [33] depicts the link between the PolyInception Network (PolyNet) [36] and the backward Euler integration method [9]. A neural network architecture derived from a higher-order integration method is described as a Runge-Kutta neural network [37]; such networks are also considered in [38]. There is a major difference between neural networks derived from integration methods, such as ResNet, and others, such as the FNN: the first group learns the rate of change of the state, while the second group learns the state itself [37], [38].
Some dynamical systems in control engineering can be considered as control benchmarks: they are not full industrial plants but rather plain toy processes. These systems allow highlighting, developing, testing and comparing control methods or algorithms before considering industrial applications in a more sophisticated plant. An example is the inverted pendulum on a cart, well known to every graduate student in control engineering for learning feedback control [39]. Another example is the four-tank system, known as the Quadruple Tank Process (QTP) [40], which has been used to evaluate predictive control in [41], robust economic MPC in [42], sliding mode control in [43] or proportional-integral-derivative auto-tuners from manufacturers in [44]. The continuously stirred tank reactor is another benchmark in chemical engineering [45], used to present a two-layer EMPC control [46], a robust non-linear MPC control [47] or an MPC approximated by a neural network [48]. Another example is the blood glucose control model relevant for type-I diabetes [49], with a robust MPC approach in [50] or contractive MPC in [51]. A further example is the spacecraft, which has been used to evaluate receding horizon control in [52] and robust MPC in [53]. In this work, we chose the QTP to comparatively evaluate candidate neural network architectures and the associated MPC implementation, because the QTP is both simple and popular in the MPC community [41], [42], and its behaviour is non-linear despite its simplicity. The QTP is also naturally linked to issues arising from the processing industry, such as wastewater treatment, nuclear steam generation, chemical treatment or food processing [54].
Finally, our contribution relates to recent work aiming to control the QTP with an ANN, using an FNN in [54] and an RNN in [55]. These works use FNN and RNN architectures, while our objective is to illustrate the use of dedicated architectures articulated with the numerical integration scheme.
Although many ANNs have been used for dynamical system identification, the recent discovery of architectures better suited to numerical integration has not yet led to precise numerical comparisons. The main contribution of this work is to present an accurate comparison of ResNet, PolyNet and feedforward shallow architectures regarding their ability to identify dynamical systems, in terms of computational complexity and forecast accuracy. The aim is to provide a numerical illustration using the QTP case study, which is representative of important industrial problems. An MPC controller is thereby developed for the QTP using input-output data, via neural identification of the process model. Three MPCs are considered: one with the FNN, the second with the ResNet and the third with the PolyNet model, leading to the FNN-MPC, ResNet-MPC and PolyNet-MPC respectively. To the authors' best knowledge, this is the first time that neural networks linked to integration methods are used as a model for an MPC in order to control a dynamical system. The contributions of the present work can be divided into three main elements:
• A presentation of different neural network architectures for non-linear dynamical system identification with supervised machine learning;
• A fair comparison between the three neural networks considered in this work;
• A predictive controller based on neural models to control the process considered in this work.
This paper is organised as follows: Section II presents the system setup; Section III depicts the neural networks; Section IV presents neural tuning; Section V presents the MPC; Section VI depicts the experimental setup; Section VII presents the results; Section VIII discusses the work; and Section IX concludes.

II. DYNAMICAL SYSTEM IDENTIFICATION
The first step is to define the discrete time-invariant system considered, given by (1):

x̄[k + 1] = f(x̄[k], ū[k]),    (1)

with x̄[k] and ū[k] the system state and input respectively; in addition, T_s is the sample time. The absence of an output equation is linked to the assumption that all the states are known and are therefore usable by a possible control law. The objective is to learn from data an appropriate approximation of the non-linear system (1) of the form:

x̂[k + 1] = f̂(x̄[k], ū[k]),

with x̂[k + 1] the predicted state [56].

III. NEURAL NETWORK
The notation used for the neuron units is h for the output of a neural unit at sample instant k and v for the input.

A. ARTIFICIAL NEURON
The first artificial neuron was developed by McCulloch and Pitts [57]. In this work, the artificial neuron considered is defined as [58]:

h = σ(W^T v + b),

with h the output of the cell, σ the activation function, W the weighting vector, v the input of the cell and b the bias.

B. FEEDFORWARD NEURAL NETWORK
When artificial neurons are combined, this leads to a network of neurons, which is formalised by [58]:

h = (g_{h_n} ∘ ⋯ ∘ g_{h_1})(v),  g_{h_i}(h_{i−1}) = σ_i(W_i h_{i−1} + b_i),    (4)

with h the output of the FNN, v the input, g_{h_i} the non-linear transformation of hidden layer i and σ_i the activation function applied elementwise. At hidden layer i, h_i is the output, h_{i−1} is the input, b_i is the bias and W_i the weighting matrix. Note that layers 1 and n in (4) are not hidden layers; rather, they are the input and output layers (merged if only one layer is used in the network).
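For illustration, a minimal Julia sketch of an FNN of the form (4), using the Flux package adopted later in Section VI, is given below; the layer sizes and activation functions are arbitrary placeholders, not the tuned hyperparameters of this work.

```julia
using Flux  # Flux >= 0.14 API assumed

# FNN as in (4): a composition of dense layers h_i = σ_i(W_i h_{i-1} + b_i).
# Illustrative sizes: 6 inputs (4 tank levels + 2 pump inputs), 4 outputs.
fnn = Chain(
    Dense(6 => 16, tanh),   # hidden layer 1: W_1, b_1, σ_1 = tanh
    Dense(16 => 16, tanh),  # hidden layer 2: W_2, b_2, σ_2 = tanh
    Dense(16 => 4),         # output layer, identity activation
)

v = rand(Float32, 6)  # input vector v
h = fnn(v)            # output vector h
```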

C. RESNET
Residual networks have been proposed in [35] to propagate the residual value of the input along the hidden layers, in order to avoid vanishing-gradient issues when training deep neural networks. The main feature of the ResNet is a skip connection with an addition between cells. The ResNet is defined as:

h = v + G(v),    (5)

with h the output and v the input, while G is an FNN as in (4).

D. POLYNET
PolyInception networks were proposed in [36] for image recognition; they have multiple paths and lead to polynomial compositions. The PolyNet is defined as:

h = v + G(v) + G(G(v)),

with h the output and v the input, while G is an FNN as in (4).

E. DEEP RESNET AND DEEP POLYNET
ResNet and PolyNet with one layer are derived from the forward and backward Euler integration methods respectively (see the appendix for details). Multiple layers can be stacked to increase the stage and the integration order. For a ResNet with n layers (see Fig. 2):

h^j = h^{j−1} + G(h^{j−1}),  j = 1, …, n,  h^0 = v.

For a PolyNet with n layers (see Fig. 3):

h^j = h^{j−1} + G(h^{j−1}) + G(G(h^{j−1})),  j = 1, …, n,  h^0 = v,

with h^n the output of layer n (the superscript denotes the layer) and v the input. G is an inner non-linear transformation, considered here as an FNN. Note that this is a modification of the original ResNet [35] and PolyNet [36]: in this work, G is the same FNN in every layer, so the layers share weights and biases.
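A minimal Julia sketch of these weight-shared blocks follows; the inner FNN G, its width and the depth n are illustrative assumptions.

```julia
using Flux

width = 16
G = Chain(Dense(width => width, tanh), Dense(width => width))  # shared inner FNN G

resnet_layer(v)  = v .+ G(v)             # one ResNet layer: h = v + G(v)
polynet_layer(v) = v .+ G(v) .+ G(G(v))  # one PolyNet layer: h = v + G(v) + G(G(v))

# Deep versions with n layers, reusing the same G (shared weights and biases).
deep_resnet(v, n)  = foldl((h, _) -> resnet_layer(h),  1:n; init = v)
deep_polynet(v, n) = foldl((h, _) -> polynet_layer(h), 1:n; init = v)

v  = rand(Float32, width)
h3 = deep_resnet(v, 3)   # output h^3 of a 3-layer ResNet
p3 = deep_polynet(v, 3)  # output h^3 of a 3-layer PolyNet
```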

F. LINK TO DYNAMICAL SYSTEM IDENTIFICATION
To identify a discrete non-linear dynamical system, the prediction rule must first be determined. The network input, which gathers the state x̄[k] and the input ū[k], and the predicted state x̂[k + 1] do not, in general, have the same dimension as the hidden layers. To remedy this problem, an input layer and an output layer are added to the network (see Fig. 4). To adapt the vector dimensions, the neurons of the input and output layers have an identity activation function. The input layer and output layer are defined as:

h_in = W_in v + b_in,  v = [x̄[k]^T, ū[k]^T]^T,
x̂[k + 1] = W_out h^n + b_out,    (14)

with h_in the output of the input layer, W_in, b_in the weighting matrix and bias of the input layer, W_out, b_out the weighting matrix and bias of the output layer, x̄[k] the system state, ū[k] the system input, x̂[k + 1] the predicted state and h^n the output of layer n.
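Putting the pieces together, a sketch of the complete one-step predictor (input layer, weight-shared ResNet core, output layer) could read as follows in Julia; all dimensions and the depth are placeholder assumptions.

```julia
using Flux

nx, nu, width, nlayers = 4, 2, 16, 3

input_layer  = Dense(nx + nu => width)  # identity activation by default
G            = Chain(Dense(width => width, tanh), Dense(width => width))
output_layer = Dense(width => nx)

# Prediction rule: x̂[k+1] = W_out h^n + b_out, with a ResNet core in between.
function predict(x, u)
    h = input_layer(vcat(x, u))  # h_in = W_in [x; u] + b_in
    for _ in 1:nlayers
        h = h .+ G(h)            # shared-weight ResNet layers
    end
    return output_layer(h)       # predicted state x̂[k+1]
end

x̂ = predict(rand(Float32, nx), rand(Float32, nu))
```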

IV. NEURAL TUNING A. TRAINING
Neural network training aims to iteratively tune the neurons' weights and biases to minimise a fidelity measure [54]. Training is usually performed with the gradient-based backpropagation algorithm to adapt neurons' weights from the output layer to the input layer [59], [60].
The gradient-based back-propagation algorithm iteratively tunes the weights and biases of the neurons, such as [61]:

W ← W + ΔW,  b ← b + Δb,

with W the weighting vector, ΔW its increment vector, b the bias and Δb its increment vector. The increment vectors are:

ΔW = −η ∂L/∂W,  Δb = −η ∂L/∂b,

with η the learning rate and L a loss function, defined in Section IV-B. It has been observed that the learning rate is sensitive with regard to stability and training performance: a large value leads to a lack of robustness in convergence, while a small value leads to a lack of identification accuracy. In order to improve training performance, one strategy is to gradually decrease the learning rate during training [61]. This can be achieved, for instance, with an exponential learning rate schedule [62]:

η_j = η_0 e^{−λ j},

with η_j the updated learning rate, η_0 the initial learning rate, λ the decay parameter and j the iteration step. In order to enhance neural network training performance, we selected the stochastic gradient descent Adam optimiser [63] and its derivatives RAdam [64], Nadam [65] and OAdam [66]. Adam fine-tunes the learning rate during training, as it combines the adaptive gradient algorithm and root mean square propagation. The decay parameters of Adam were evaluated in [67], and the findings showed that choosing Adam parameters between 0.9 and 0.999 statistically improves training.
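A compact sketch of such a training loop with Flux, combining back-propagation updates and the exponential learning-rate schedule, is given below; the toy data, the schedule constants and the use of the Flux >= 0.14 explicit API (Flux.setup, Flux.adjust!) are illustrative assumptions.

```julia
using Flux

model = Chain(Dense(6 => 16, tanh), Dense(16 => 4))
η0, λ = 1f-3, 0.05f0
opt_state = Flux.setup(Adam(η0), model)

# Toy data set: input/target pairs standing in for QTP samples.
data = [(rand(Float32, 6), rand(Float32, 4)) for _ in 1:64]

for epoch in 1:20
    Flux.adjust!(opt_state, η0 * exp(-λ * epoch))  # η_j = η_0 e^{-λ j}
    for (v, target) in data
        grads = Flux.gradient(m -> Flux.mse(m(v), target), model)
        Flux.update!(opt_state, model, grads[1])   # back-propagation step
    end
end
```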

B. FIDELITY MEASURE
A loss function L provides a quantitative score of the degree of similarity between the data and the model outputs [68]. The fidelity measure considered in this work is the Mean Squared Error (MSE):

L_MSE = (1/D) Σ_{k=1}^{D} ‖x̂[k + 1] − x̄[k + 1]‖²,    (20)

with D the number of samples, x̂[k + 1] the neural network output and x̄[k + 1] the target. Other loss functions exist but are not considered here: the Mean Absolute Error (MAE), the Root Mean Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE) [69].

C. HYPERPARAMETERS TUNING
The hyperparameters defining the neural network are the number of layers, the number of neurons, the type of activation function, the batch size, the number of epochs and the optimiser used for gradient-based backpropagation. These hyperparameters influence the performance and the complexity of the model [70]. The hyperparameters are optimised according to:

H* = arg min_{H ∈ ℋ} g_mse(g_nn(H)),

with g_mse the cost function (see Eq. (20)), g_nn the trained neural network under consideration and H ∈ ℋ the hyperparameter set of the optimisation problem. This optimisation problem is non-convex and challenging, since it is non-differentiable and involves constrained variables [70]. Metaheuristic algorithms may therefore be considered, such as genetic algorithms or particle swarm optimisation [71]. The optimisation algorithm and its interlinking with the backpropagation algorithm are shown in Fig. 5.
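As a sketch of this outer loop, the snippet below encodes three hyperparameters as a continuous vector and searches over them with BlackBoxOptim's separable NES, as done in Section VI; the stand-in score function (which in practice would train a network and return its validation MSE) and all search ranges are illustrative assumptions.

```julia
using BlackBoxOptim

# H = [hidden layers, neurons per layer, epochs], encoded as reals and rounded.
function score(H)
    nlayers  = round(Int, H[1])
    nneurons = round(Int, H[2])
    epochs   = round(Int, H[3])
    # Stand-in for: build the network, train it by back-propagation,
    # then return the MSE on the validation data (Eq. (20)).
    return abs(nlayers - 2) + abs(nneurons - 12) / 10 + abs(epochs - 30) / 100
end

res = bboptimize(score;
    SearchRange  = [(1.0, 3.0), (4.0, 20.0), (5.0, 50.0)],
    Method       = :separable_nes,
    MaxFuncEvals = 1000)

H_best = best_candidate(res)  # hyperparameters with the lowest validation cost
```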

V. NEURAL NETWORK MODEL PREDICTIVE CONTROL
The neural networks presented in Section III are non-linear and lead to a non-convex optimisation problem when used within an MPC, so the solution found may be a local optimum [72]. A Neural Network Model Predictive Control (NN-MPC) is considered with an additional contractive state constraint, which is called a contractive MPC [73], [74]. The additional contractive constraint aims to ensure that the MPC is stabilising [73]; the optimisation problem solved at each sample takes the standard form

min_{u[k], …, u[k+N_h−1]}  Σ_{i=1}^{N_h} (x̂[k + i] − x_ref)^T Q (x̂[k + i] − x_ref) + Σ_{i=0}^{N_h−1} u[k + i]^T R u[k + i],
subject to  x̂[k + i + 1] = f̂(x̂[k + i], u[k + i]),  x̂[k] = x̄[k],
(x̂[k + N_h] − x_ref)^T P (x̂[k + N_h] − x_ref) ≤ α (x̄[k] − x_ref)^T P (x̄[k] − x_ref).

At each iteration, the first sample of the computed optimal input sequence is applied to the plant's actuators (Fig. 6). N_h denotes the prediction horizon; Q, R and P are the weighting matrices; α ∈ [0, 1[ is the contractive parameter; P, Q and R are positive definite matrices; and ^T denotes the matrix transpose.
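To make the receding-horizon computation concrete, the sketch below evaluates the cost and the contractive constraint of a candidate input sequence over the horizon, using a generic one-step model f̂; the toy linear dynamics, weights and dimensions are illustrative assumptions, not the tuned values of Table 6.

```julia
using LinearAlgebra

nx, nu, Nh, α = 4, 2, 10, 0.9
Q = Matrix(1.0I, nx, nx); R = Matrix(0.1I, nu, nu); P = Matrix(1.0I, nx, nx)

f̂(x, u) = 0.95 .* x .+ 0.05 .* [u; u]  # toy stand-in for the neural model

# Cost and contractive-constraint check for an input sequence U (nu × Nh).
function mpc_cost(x0, xref, U)
    x, J = x0, 0.0
    for i in 1:Nh
        x = f̂(x, U[:, i])  # predicted state x̂[k+i]
        J += (x - xref)' * Q * (x - xref) + U[:, i]' * R * U[:, i]
    end
    contractive = (x - xref)' * P * (x - xref) ≤ α * (x0 - xref)' * P * (x0 - xref)
    return J, contractive
end

J, ok = mpc_cost(zeros(nx), fill(0.5, nx), fill(0.3, nu, Nh))
```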

VI. EXPERIMENTAL SETUP A. QUADRUPLE TANK PROCESS DESCRIPTION
The considered benchmark has four tanks, two pumps and two three-way valves. The control has to maintain the required water levels. Each tank has an orifice from which water leaks.
Pump q a fills tanks 1 and 4, and pump q b fills tanks 2 and 3 (see Fig. 7).

B. MODELLING
The QTP is modelled using Modelica [75] with the Dymola software from Dassault Systèmes, to support the simulation and generate data using fine-scale modelling rather than simplified physical equations. Each component is modelled, including water properties, pump inertia and pipe roughness, using the Modelica standard library [76] and the Buildings library [77]. The components are graphically assembled to build the QTP simulation model. An overflow outlet is added to the four tanks to account for the maximal water level and to avoid simulation failures. The process parameters are shown in Table 1.
The Modelica four-tank program has 1,049 equations and the same number of unknown variables. These equations cannot be used directly by the MPC controller because of some if-then-else conditions: a hybrid MPC controller would be mandatory, leading to complex analysis, design and optimisation techniques [78].

C. DATA GENERATION
The nature of the input signals is essential for dynamical system identification, since the relation between the input signals and the output variations is used to describe the dynamical system. Pseudo-Random Binary Sequences (PRBS) used as input signals are able to excite the system over a widespread frequency spectrum, in order to acquire a unique set of parameters [17]. In this work, we performed the data acquisition of the QTP digitally using Dymola with the fine-scale dynamical model from the Modelica program. The sampling period is equal to 5 s, and the Differential Algebraic System Solver (DASSL) is used during simulations to solve the Modelica model comprising mixed differential and algebraic equations [79]. The PRBS signals applied to pumps q_a and q_b are visible in Fig. 8.a and the output signals in Fig. 8.b; the PRBS parameters are shown in Table 2. During simulations, it appears that the water levels are never in a steady state (Fig. 8.b). In order to allow the ANN to learn the QTP steady state, we chose to apply a second kind of input signal: piecewise constant signals with random values from minimum to maximum, whose parameters are shown in Table 3. The corresponding input and output signals are shown in Fig. 8.c and Fig. 8.d. We chose to consider a large amount of data; as a result, 10,368,001 time steps were generated.
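A sketch of how such excitation signals can be generated is given below; the hold times, amplitudes and lengths are illustrative placeholders, not the values of Table 2 and Table 3.

```julia
# PRBS: signal switching randomly between two levels at multiples of a hold time.
function prbs(nsteps, hold, lo, hi)
    s = Vector{Float64}(undef, nsteps)
    level = rand(Bool) ? hi : lo
    for k in 1:nsteps
        if k % hold == 0 && rand(Bool)
            level = (level == hi) ? lo : hi  # random switch at hold boundary
        end
        s[k] = level
    end
    return s
end

# Piecewise constant signal with random plateaus, used to learn steady states.
function piecewise_constant(nsteps, hold, lo, hi)
    s = Vector{Float64}(undef, nsteps)
    level = lo + (hi - lo) * rand()
    for k in 1:nsteps
        if k % hold == 0
            level = lo + (hi - lo) * rand()  # new random plateau
        end
        s[k] = level
    end
    return s
end

qa = prbs(2000, 20, 0.0, 1.0)                # pump q_a excitation
qb = piecewise_constant(2000, 50, 0.0, 1.0)  # pump q_b excitation
```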

D. DATA SEPARATION
Simulation data are separated into training, validation and test data. Training data are used to optimise weights and biases; validation data are used for hyperparameter optimisation; and test data are used to assess neural network generalisation. In classical approaches, the training, validation and test data are picked randomly from the whole data set. To assess the neural network architectures in this work, the data separation is instead performed according to the tank levels' value ranges. First, the data are processed to keep only data within the water-level constraints (Fig. 9.a and Fig. 9.b); second, the data are divided into sequences of 32 samples in length; third, the sequences are separated into two groups, where the first group gathers the sequences in which the water levels of tanks 1 and 2 (h_1, h_2) remain between 0.4 m and 0.8 m (Fig. 9.c and Fig. 9.d). The remaining sequences constitute the second group and form the test data (e.g., water levels lower than 0.4 m or greater than 0.8 m) (Fig. 9.e and Fig. 9.f). Finally, the first group is divided in two, with 5% for the validation data and the remainder for the training data. This data processing is shown in Fig. 10.
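A minimal sketch of this range-based separation follows; the synthetic level record and the array shapes are assumptions, while the thresholds and the 32-sample sequence length mirror the description above.

```julia
using Random

seqlen = 32
levels = rand(4, seqlen * 1000)  # synthetic record of tank levels h1..h4 (m)

# Divide the record into sequences of 32 samples.
seqs = [levels[:, i:i+seqlen-1] for i in 1:seqlen:size(levels, 2)-seqlen+1]

# Group 1: h1 and h2 stay within [0.4, 0.8] m over the whole sequence.
in_range(s) = all(0.4 .<= s[1:2, :] .<= 0.8)
group1 = filter(in_range, seqs)
test   = filter(!in_range, seqs)  # the remaining sequences form the test data

shuffle!(group1)
nval  = max(1, round(Int, 0.05 * length(group1)))  # 5% for validation
val   = group1[1:nval]
train = group1[nval+1:end]
```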

E. COMPUTING MACHINE IMPLEMENTATION 1) NEURAL NETWORK TRAINING
Neural networks are implemented on computing machines using the open-source Julia programming language [81] and the Flux package [82]. This package makes it easy to implement neurons, shortcut connections for the ResNet, parallel connections for the PolyNet, deep networks with multiple layers and all the necessary training and fidelity measure computations. The hyperparameters are optimised using the BlackBoxOptim package, which implements metaheuristic algorithms [83]. We chose the separable NES, derived from Natural Evolution Strategies [84]. The hyperparameters considered in this work and their values are presented in Table 4. The hyperparameter optimisation is arbitrarily stopped when more than 1,000 network trainings have been performed for each considered architecture (FNN, ResNet, PolyNet), resulting in 3,000 trained networks. The training is performed in parallel on 8 cores of the Central Processing Unit (CPU), which leads to 125 steps for the metaheuristic algorithm. The running environment is composed of Windows 10 on a Dell Workstation with an Intel Xeon Gold 5122 CPU and 128 GB of random access memory.

2) PROCESS CONTROL
Controller simulation is performed using Simulink. The Modelica model is first exported as a Functional Mock-up Unit (FMU) for simulation. The controller interacts with the FMU according to Fig. 12 and is implemented through JuMP [85] using a multiple-shooting numerical method [72]. The neural network is defined within JuMP from the Flux object as a JuMP user-defined function, which allows automatic differentiation to compute the derivatives [87]. It is not necessary to rewrite the neural network within the optimisation model, since Flux and JuMP both belong to Julia's ecosystem. In addition, the chosen optimisation solver is Ipopt [86].

The set points used during the simulations are shown in Table 5. They are selected to evaluate the control performance with different tank water levels. Set points 1 and 3 are located inside the training data area, while set points 2 and 4 are located outside it; this choice makes it possible to assess the extrapolation capability of the ANNs. Set point 1 in Table 5 can be considered as the starting point: it is used to initialise the tank water levels at the beginning of the simulation, and the four water level references are inside the training data area. Set point 2 increases the difficulty: the references get closer to the state constraints and they are outside the training data area, so the generalisation capabilities of the ANN are harnessed. Set point 3 adds another difficulty: the four water levels are not equal, although they remain within the training data area. Set point 4 is the most difficult to reach, as it combines all the difficulties: the references get closer to the state constraints, the four tank water levels are not equal and, the set point being located outside the training data area, the generalisation capabilities of the ANN are harnessed.

Several simulations are performed to analyse the control performance with respect to the implemented neural network. For each neural network architecture (FNN, ResNet, PolyNet), the 25 networks with the lowest MSE on the test data are simulated, and for each simulation run a comparative performance index is calculated to assess the networks' influence on control. This index is defined as the mean squared set-point error over the simulation run:

J = (1/N) Σ_{k=1}^{N} ‖x̄[k] − x_ref[k]‖²,

with N the number of simulation samples and x_ref[k] the set point. In addition, the MPC parameters used during process control are shown in Table 6.

VII. RESULTS A. SYSTEM IDENTIFICATION
The number of neural networks trained with the metaheuristic algorithm is 3,027 (1,009 each for the FNN, ResNet and PolyNet). Neural networks with an MSE greater than 1 are removed. Fig. 13 depicts boxplots of the loss function of the neural networks after training and hyperparameter optimisation: the MSE on the training data is visible in Fig. 13.a, the MSE on the validation data in Fig. 13.b and the MSE on the test data in Fig. 13.c. Fig. 13.a and Table 7 show that the 1st quartile losses are reduced for the neural networks related to numerical integration compared to the FNN. In addition, the lowest MSE is reached by the ResNet for the median and 3rd quartile, and the same is observable in Fig. 13.b and Table 8 for the MSE on the validation data. Note that all networks have relatively equivalent performance in MSE on the training and validation data, but the result is clearer when considering the generalisation properties, according to the results obtained on the test data in Fig. 13.c and Table 9. In this case, both neural networks related to the numerical integration methods achieve a reduction of the median, 1st and 3rd quartile MSE compared to the FNN, and the ResNet achieves the lowest MSE compared to the FNN and PolyNet. The hyperparameters' influence on the MSE with test data is shown in Fig. 14, which presents the MSE with test data over the hyperparameter optimisation, with hidden layers, neurons, activation functions, optimisers and epochs for the FNN (first line), ResNet (second line) and PolyNet (third line).
First, boxplots of the MSE over the number of hidden layers for the FNN, ResNet and PolyNet are depicted in Fig. 14.a, Fig. 14.f and Fig. 14.k. The results show that increasing the number of hidden layers increases the median MSE test loss for the FNN, with 1.34 × 10⁻⁵ for one hidden layer, 3.32 × 10⁻⁵ for two hidden layers and 3.86 × 10⁻⁵ for three hidden layers. Increasing the number of hidden layers reduces the median MSE with test data for the ResNet and PolyNet: 2.59 × 10⁻⁶ and 7.09 × 10⁻⁵ with one hidden layer, 1.83 × 10⁻⁶ and 1.03 × 10⁻⁵ with two hidden layers, and 1.51 × 10⁻⁶ and 4.16 × 10⁻⁶ with three hidden layers.
Second, we can analyse the impact of the number of neurons per hidden layer (see Fig. 14.b, Fig. 14.g, Fig. 14.l). The results show that the algorithm investigates 10 to 18 neurons for the FNN, 6 to 15 neurons for the ResNet and 4 to 15 neurons for the PolyNet, while the allowed values range from 4 to 20 neurons.
Fifth, we can examine the MSE with test data over the number of epochs (Fig. 14.e, Fig. 14.j and Fig. 14.o). The algorithm investigated 13 to 30 epochs for the FNN, 20 to 38 epochs for the ResNet and 3 to 37 epochs for the PolyNet. The allowed values range from 5 to 50 epochs, and thus the algorithm did not consider all possibilities. Fig. 14.e shows that increasing the number of epochs from 15 to 25 reduces the MSE with test data for the FNN, while it grows from 25 to 30. In Fig. 14.j, with the ResNet, the algorithm chose most epochs from 30 to 36, and the median MSE remains equivalent. Fig. 14.o shows that increasing the number of epochs reduces the median MSE with test data until 30 epochs for the PolyNet.
We can also consider the ResNet and PolyNet number of layers (see Fig. 15.a and Fig. 15.b). The algorithm investigated 1 to 4 layers, with a jump to 8 layers, for the ResNet, but 1 to 9 layers for the PolyNet. Fig. 15.a shows that the median MSE plateaus from 1 to 4 layers, with 1.62 × 10⁻⁶ for one layer, 1.85 × 10⁻⁶ for two layers, 1.54 × 10⁻⁶ for three layers and 1.70 × 10⁻⁶ for four layers. Fig. 15.b shows that the median MSE with test data plateaus from 1 to 4 layers and increases from 5 to 9 layers for the PolyNet. The smallest median MSE with test data, 3.62 × 10⁻⁶, is obtained with one layer.
Finally, Fig. 16 depicts the boxplot of the training time for the FNN (Fig. 16.a), ResNet (Fig. 16.b) and PolyNet (Fig. 16.c), with values presented in Table 10.

B. PROCESS CONTROL
Fig. 17 depicts the simulation results with the water levels controlled by the FNN-MPC for the 25 simulations; the corresponding lowest J with FNN-MPC is reported in Table 11. Fig. 18 depicts the simulation results with the water levels controlled by the ResNet-MPC for the 25 simulations. Fig. 18.a and Fig. 18.b show that the state trajectories differ according to the neural model. Fig. 18.c and Fig. 18.d show the lowest J with the state trajectories from 2,700 s to 12,000 s. The lowest J with ResNet-MPC is equal to 2.576 × 10⁻¹ (Table 11). Fig. 19 depicts the simulation results with the water levels controlled by the PolyNet-MPC for the 25 simulations. Fig. 19.a and Fig. 19.b show that the state trajectories differ according to the neural model. Fig. 19.c and Fig. 19.d show the lowest J with the state trajectories from 2,700 s to 12,000 s. The lowest J with PolyNet-MPC is equal to 2.567 × 10⁻¹ (Table 11).

Fig. 20 depicts the boxplot of J for the three controllers, with values presented in Table 11. The lowest J is observed with the FNN-MPC, the lowest median with the ResNet-MPC, the lowest 1st quartile with the PolyNet-MPC and the lowest 3rd quartile with the ResNet-MPC. Fig. 21 depicts the boxplot of the mean computation time for the regulators considered in this work, namely the FNN-MPC (Fig. 21.a), ResNet-MPC (Fig. 21.b) and PolyNet-MPC (Fig. 21.c), with values presented in Table 12. The lowest mean computation time is observed with the ResNet-MPC, the lowest median with the FNN-MPC, the lowest 1st quartile with the FNN-MPC and the lowest 3rd quartile with the ResNet-MPC.

VIII. DISCUSSION
ResNet and PolyNet were originally proposed for computer vision applications such as image classification, for which they are well suited, using hundreds of layers to increase accuracy; they have only recently been used for dynamical system identification. In this work, we observed that, in the case of the ResNet and PolyNet, the best performances were obtained with only a few layers, and that increasing the number of layers (in the range 1 to 9) does not increase the identification accuracy for the dynamical system. The ResNet seems to be a favourable option, while the performance of the FNN for dynamical system identification was shown to depend greatly on the chosen activation function, which is a weakness.
In this work, neural network identification for control purposes was further considered, specifically for MPC control. It was observed that the lowest identification error (MSE) did not systematically produce the lowest process control error (J).
The evaluation criteria are given in Table 13 for each controller. They are rated from + to +++, with +++ better than +. The ResNet-MPC produces the lowest identification error, the best control performance criterion J and the lowest computation time for the MPC. It has the best generalisation ability compared to the FNN-MPC and PolyNet-MPC. In addition, the ResNet-MPC gives the most consistent results in control: the results are good for a large choice of hyperparameters, and homogeneous. The only drawback lies in the increased training time compared to feedforward networks; however, the training was performed on CPUs only, and a valuable reduction of the training time could be achieved using specific hardware such as GPUs [90] or TPUs [91]. The performance of the PolyNet-MPC, on the other hand, is disappointing, in that we sometimes observe erratic simulation results (despite a rather favourable criterion), hence its poor rating (+) compared to the FNN-MPC and ResNet-MPC.
Finally, the ResNet-MPC turns out to be the best solution, and a good candidate for systematic use in neural network based modelling and NN-MPC control of dynamical systems.

IX. CONCLUSION
ResNet and PolyNet are neural network architectures that can be linked explicitly to the Euler integration methods. Thus, the initial motivation of this study was to evaluate their particular interest in dealing with dynamical systems: firstly for non-linear identification, and secondly as a support for MPC implementation, using an ad hoc numerical integration scheme for the prediction part. A contribution of this paper lies in the consideration of these architectures outside their usual context of use, namely as deep networks for image recognition.
To assess the relevance of these networks for regression and control, the following choices were made. The first consisted in considering a representative multivariable system with coupled dynamics, illustrating certain classical characteristics of industrial processes. The second consisted in proposing a methodology to ensure a fair comparison of the ResNet and PolyNet with the classical FNN: common evaluation criteria were defined, a systematic investigation of the hyperparameters via metaheuristic optimisation was proposed and, finally, the data were segmented in an original way to highlight the generalisation capabilities.
The results of this work confirm, first of all, the particular interest of the two architectures studied, ResNet and PolyNet, which achieve equivalent performance on the learned data. It was even observed, unexpectedly, that the generalisation capacities of the ResNet surpass those of the PolyNet, in spite of the more complex architecture of the latter. Beyond that, we analysed the quality of the water-level control obtained via MPC, considering in turn FNN, ResNet and PolyNet based prediction models. The ResNet once again appears to be superior, both in terms of the homogeneity of the results obtained and of the average computation time of the resulting MPC, which is lower than with the PolyNet.
The perspectives of this work are multiple. To be concise, the work now consists of implementing the proposed methodology within the framework of the control of an agricultural greenhouse. At the same time, we are working more fundamentally on the problem of physics-informed learning, in order to enrich the potential of the methodology in the case of a small volume of experimental data.

APPENDIX
For an ordinary differential equation ẏ(t) = g(y(t)), y(0) = y₀, the forward Euler integration method is defined as [9]:

y[k + 1] = y[k] + T_s g(y[k]),

with y[k + 1] the solution at instant k + 1 and T_s the sample time. The link with the ResNet is observable in Eq. (5). In addition, the backward Euler integration method is [9]:

y[k + 1] = y[k] + T_s g(y[k + 1]),

with y[k + 1] the solution at instant k + 1 and T_s the sample time. As y[k + 1] appears on both sides, the backward Euler update is implicit; expanding it by successive substitution yields the polynomial composition used by the PolyNet.
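To illustrate the correspondence numerically, the sketch below integrates the scalar example ẏ = −y one step with forward Euler (the ResNet update v + G(v), with G = T_s · g) and compares the exact backward Euler step with its second-order polynomial expansion, which is precisely the PolyNet block; the dynamics and step size are arbitrary example choices.

```julia
g(y) = -y         # example dynamics ẏ = g(y)
Ts   = 0.1
G(y) = Ts * g(y)  # rate-of-change block, analogous to the inner FNN G

y = 1.0
y_forward  = y + G(y)            # forward Euler ≡ ResNet: h = v + G(v)
y_backward = y / (1 + Ts)        # exact implicit solve of y⁺ = y + Ts*g(y⁺)
y_polynet  = y + G(y) + G(G(y))  # truncated expansion ≡ PolyNet block

# y_polynet = 1 - 0.1 + 0.01 = 0.91, close to y_backward = 1/1.1 ≈ 0.9091
println((y_forward, y_backward, y_polynet))
```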