Robust Development of Active Learning-Based Surrogates for Induction Motor

A robust, open-source, cloud-based workflow is developed for finite element (FE) data generation for active learning (AL)-based surrogate modeling. Special attention is paid to making the FE solution procedure as robust and fast as possible without human intervention, e.g., by implementing special convergence criteria, reliable parallel computation, and variable timestep length. In AL, a surrogate model automatically improves itself by iteratively querying more FE data. Using AL and large datasets generated with parallelized cloud FE simulations, we develop a surrogate model that rapidly predicts the steady-state torque, torque ripple, total losses, and current total harmonic distortion of an induction machine as a function of supply frequency, voltage, and slip. Results show that AL performs better than grid sampling and on average works as well as random sampling, but for some outputs the results vary less between repetitions with AL. In addition, accurate torque ripple estimation requires a much larger training dataset than the other variables.


I. INTRODUCTION
DATA-DRIVEN models offer new ways of modeling and simulating electrical machines (EMs). Machine learning (ML) has long been used to analyze field data for fault diagnosis. More recently, various surrogate [1] and reduced-order models [2] have been developed. The idea of these surrogate models is to distill the accuracy of a high-fidelity physics-based model, e.g., a finite element (FE) model, into a model that is significantly faster to run. In EM applications, surrogate models can be used in, e.g., design optimization, anomaly detection in condition monitoring, control, and digital twins [1]. Developing such a fast and accurate surrogate model from physics simulation data may require running a large number of simulations.
If the physical simulation is slow, it is desirable to minimize the number of such runs. One modern way of doing this is to let the ML algorithm decide which new data points would improve the model, instead of using a traditional design of experiments (DoE). This is called active learning (AL), a subtype of ML in which a learning algorithm can interactively query new data points from a user or from other software [3]. In EM surrogate modeling, the data source can be, e.g., an FE model of the EM. AL surrogates have previously been used, e.g., in material design [4], optimization [5], and engineering [6]. To the best of our knowledge, our study is the first to apply AL surrogates to an electrical engineering application.
In this article, we develop a workflow using open-source tools for surrogate development that combines AL, an FE solver, and cloud computation. We demonstrate the workflow with a case where a surrogate model is developed to predict the average torque T, torque ripple T_rip, total loss P_loss, and current total harmonic distortion (THD) of an induction motor over a wide range of operating points determined by the supply frequency f, voltage U, and slip s. For that, we implemented two new convergence criteria based directly on the steady-state torque, and a scheme for variable timestep length. We aimed for high convergence robustness to obtain reliable results at all feasible operating points without human intervention. The presented surrogate model can be used, e.g., in EM control to reduce losses, torque ripple, or current ripple, and to indicate faults and anomalies when the measured quantities deviate from the surrogate predictions. Moreover, torque and losses together can be used to optimize the operation of a single motor or of a large-scale industrial system with multiple EMs of different sizes and power ratings, resulting in energy savings over the whole product lifecycle. The current THD of drive-operated machines helps to assess the stability of the drive's control system and, more generally, indicates the stability of machine operation and the overall condition of the machine. Since high torque ripple can be correlated with bearing health and high current ripple with high iron losses, real-time estimates of the torque ripple and current THD can be used to estimate bearing health and operating temperature.
The case study motor is a three-phase 11 kW skewed-rotor induction motor, modeled with a nonlinear FE multi-slice model (Fig. 1). The motor has a nominal operating point at f = 50 Hz, U = 400 V, and s = 0.0163. Scalar control was implemented by keeping the U/f ratio constant below 50 Hz and using flux weakening above 50 Hz. The frequency range was 10-100 Hz and the voltage range 40-500 V, with the voltage range depending on the frequency. The slip had an upper limit for each frequency and voltage so that the torque did not significantly exceed the nominal torque of the motor.
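As an illustration of the scalar-control rule above, the nominal voltage reference can be written as a function of frequency. This is a minimal sketch assuming that flux weakening means holding the voltage at its nominal 400 V above 50 Hz; the function name is hypothetical, and the datasets themselves sampled voltages over a wider range (40-500 V) rather than only along this line.

```python
def scalar_control_voltage(f, f_nom=50.0, u_nom=400.0):
    """Nominal supply-voltage reference for constant-U/f scalar control (sketch).

    Below f_nom the U/f ratio is kept constant. Above f_nom the voltage is
    assumed to stay at u_nom, so U/f (and hence the flux) decreases, which is
    the flux-weakening region.
    """
    if f <= f_nom:
        return u_nom * f / f_nom
    return u_nom
```

For example, this gives 80 V at 10 Hz, 400 V at the 50 Hz nominal point, and still 400 V at 100 Hz.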

B. Robust FE Computation
Since our aim is to make the FE computation a black box that produces the desired outputs for any feasible operating point without human intervention, we implement methods to make the FE solution as robust as possible. We also aim for a short FE solution time for surrogate development. For that, we use the parallel computation capabilities of Elmer for multi-slice modeling in the most robust way, with one MPI process per slice, since using more processes per slice would not make the surrogate development substantially faster [7]. Instead, we solve several Elmer runs for different operating points in parallel and thereby reach full parallel efficiency. The Elmer multi-slice FE model has been experimentally validated, e.g., in [7], with a case close to ours.
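Running independent operating points concurrently, rather than adding MPI processes per slice, can be sketched with a thread pool. This assumes a hypothetical blocking callable `simulate(point)` that starts one Elmer run and returns its results; threads are sufficient because each worker only waits on an external job, and `max_workers=10` mirrors the ten concurrent jobs mentioned in the cloud-computation section.

```python
from concurrent.futures import ThreadPoolExecutor


def run_operating_points(points, simulate, max_workers=10):
    """Solve independent operating points concurrently (sketch).

    `simulate` is a hypothetical blocking callable that runs one FE case
    (internally one MPI process per slice) and returns its results.
    Results are returned in the same order as `points`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate, points))
```

With a stand-in such as `lambda p: 2 * p["f"]`, calling `run_operating_points([{"f": 10.0}, {"f": 50.0}], ...)` returns `[20.0, 100.0]`; in practice `simulate` would wrap the actual solver call.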
The case study model has long solution transients even when a harmonic solution is used as the initial guess. To reduce the solution time, a method for geometrically shortening the timestep length was implemented in Elmer. The simulation was started with 50 timesteps per electrical period, the timestep was shortened to 200 timesteps per period over four cycles in the ramp-up phase, and timestepping then continued until the transients caused by the timestep change had attenuated. Such short timesteps also allow the torque ripple to be studied.
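The geometric timestep reduction described above can be illustrated with a small schedule generator. It is a sketch under assumptions: a constant ratio r per step, chosen so that the reduction from T/50 to T/200 spans about four electrical periods. How the Elmer implementation actually spaces the steps is not stated, and the function name is hypothetical.

```python
def geometric_timesteps(f, steps_start=50, steps_end=200, ramp_cycles=4.0):
    """Timestep lengths shrinking geometrically from T/steps_start to
    T/steps_end over roughly `ramp_cycles` electrical periods (sketch).

    Returns the ramp steps followed by one step of length T/steps_end,
    which would then be kept constant for the rest of the simulation.
    """
    if steps_end <= steps_start:
        raise ValueError("steps_end must exceed steps_start (shorter final step)")
    period = 1.0 / f
    dt_start = period / steps_start
    dt_end = period / steps_end
    # Summing dt_start * r**k until the step reaches dt_end gives about
    # (dt_start - dt_end) / (1 - r); solve for r so this equals the ramp time.
    r = 1.0 - (dt_start - dt_end) / (ramp_cycles * period)
    steps = []
    dt = dt_start
    while dt > dt_end:
        steps.append(dt)
        dt *= r
    steps.append(dt_end)
    return steps
```

For f = 50 Hz, the first step is 0.4 ms and the final constant step is 0.1 ms, and the ramp steps add up to just over four periods (80 ms).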
In FE analysis of EMs, more timesteps than necessary are typically solved to ensure sufficient accuracy for all quantities of interest. When running hundreds or thousands of FE simulations at different operating points, the convergence criteria have to be selected carefully, and convergence must be reached at all operating points. In surrogate generation, the convergence criteria should directly measure the surrogate output quantities. We therefore implemented two convergence criteria for the torque. The first measures when the cycle-averaged torque has converged, i.e., when the difference between the cycle-average torques of two successive cycles falls below a limit. The second measures the variance of the torque within each cycle and accepts the result when the variance is below a selected value. These criteria were chosen because we are especially interested in the steady-state average torque and torque ripple of the motor. They were also observed to give sufficiently accurate results for the motor loss components and the stator winding current harmonics.
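The two torque criteria can be sketched on an array of torque samples. This is an illustration under assumptions: the run is in its constant-timestep phase with a fixed number of samples per period, and the thresholds `avg_tol` and `var_tol` are hypothetical, since the limit values used in the study are not given.

```python
import numpy as np


def torque_converged(torque, steps_per_cycle, avg_tol, var_tol):
    """Two torque-based convergence checks on the latest cycles (sketch).

    Criterion 1: |mean(last cycle) - mean(previous cycle)| < avg_tol.
    Criterion 2: variance of the torque within the last cycle < var_tol.
    Both must hold for the run to be accepted.
    """
    n_cycles = len(torque) // steps_per_cycle
    if n_cycles < 2:
        return False  # two complete cycles are needed to compare averages
    # Keep only complete cycles (a trailing partial cycle is dropped).
    cycles = np.asarray(torque[: n_cycles * steps_per_cycle], dtype=float)
    cycles = cycles.reshape(n_cycles, steps_per_cycle)
    previous, last = cycles[-2], cycles[-1]
    average_ok = abs(last.mean() - previous.mean()) < avg_tol
    variance_ok = last.var() < var_tol
    return bool(average_ok and variance_ok)
```

A solver loop would call such a check after each completed period and stop timestepping once it returns `True`, then compute the one extra post-processing cycle.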
After the solution had converged according to both criteria, one additional cycle was computed with post-processing activated, yielding the total losses and the stator winding current harmonic components over that cycle. The core losses were computed with the Bertotti model, the stator losses from the winding current, and the rotor bar losses from the eddy currents. The THD was evaluated from the FFT of the current waveform.
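The THD evaluation can be sketched with NumPy's real FFT, using the usual definition THD = sqrt(sum over h >= 2 of I_h^2) / I_1. This assumes the current is sampled uniformly over exactly one electrical period (as in the single post-processing cycle), so FFT bin k is the k-th harmonic; the harmonic range and normalization used in the study are not stated, and the function name is hypothetical.

```python
import numpy as np


def current_thd(current_one_cycle):
    """Total harmonic distortion of a current sampled uniformly over
    exactly one electrical period (sketch).

    With one period in the window, rfft bin k holds the k-th harmonic.
    The Nyquist bin (even sample count) is skipped because its FFT
    scaling differs from the other bins.
    """
    x = np.asarray(current_one_cycle, dtype=float)
    spectrum = np.abs(np.fft.rfft(x))
    highest = (len(x) - 1) // 2  # last harmonic strictly below Nyquist
    harmonics = spectrum[2:highest + 1]
    return float(np.sqrt(np.sum(harmonics ** 2)) / spectrum[1])
```

For instance, a 200-sample period of sin(θ) + 0.1 sin(5θ) gives a THD of 0.1 (10 %).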

C. Cloud Computation
The Elmer simulations were performed in CSC's Rahti Kubernetes container cloud, which enables easy and quick deployment of fast computing resources without queueing. A generic network API was built for Elmer in Rahti, and Python code was implemented to use the API from a local computer. For each run, the new input parameters (f, U, and s) were written into the Elmer model files and sent to Rahti. After the simulation was completed, the results were available for download. Each job used eight parallel MPI processes, and ten jobs could be run in parallel. One job took 297.5 s on average. Fig. 2 shows an example of the average torque at selected data points. To benchmark the AL method efficiently, the Rahti API was used to generate all the datasets described in Section III before the AL experiments, and the samples chosen by the AL algorithm were picked from one of them, called the pool. In a real setting, the AL algorithm would be connected directly to the Rahti API to query new data.
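The parameter-update step can be sketched with Python's `string.Template`. Everything about the file content is an assumption: the placeholder names and the `$`-prefixed lines (meant to resemble MATC variable definitions in an Elmer .sif file) are hypothetical, and the upload to the Rahti API is left out because that API is not described.

```python
from string import Template

# Hypothetical model-file excerpt with placeholders for one operating point.
# "$$" is string.Template's escape for a literal "$" in the output.
MODEL_TEMPLATE = Template(
    "! operating point written by the client\n"
    "$$ supply_frequency = ${f}\n"
    "$$ supply_voltage = ${U}\n"
    "$$ slip = ${s}\n"
)


def render_model_file(f, U, s):
    """Fill one operating point into the template; repr() keeps full precision."""
    return MODEL_TEMPLATE.substitute(f=repr(float(f)), U=repr(float(U)), s=repr(float(s)))
```

For the nominal point, `render_model_file(50, 400, 0.0163)` produces lines such as `$ supply_frequency = 50.0` and `$ slip = 0.0163`, which a client would then send together with the rest of the case files.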

III. AL SURROGATE
AL provides a data-efficient way to generate an accurate surrogate model by querying new data from an FE model. In AL, a surrogate quickly evaluates a large number of input candidates, of which the most promising, in terms of potential improvement to the surrogate model performance, are selected for querying [3], as illustrated in Fig. 3. This way, the number of data points required to train an accurate ML model can potentially be reduced, which is important when the FE model is computationally slow. In this study, the standard deviation of the ensemble predictions is used to select the inputs.
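The query loop can be sketched as pool-based AL with an eight-member ensemble, querying one candidate per round. This is a minimal sketch, not the authors' code: it assumes scikit-learn random forests whose members differ only in their random seed, and selection of the candidate with the largest prediction standard deviation; the function names and `n_estimators=50` are hypothetical. Here `y_pool` stands in for the FE model, mirroring the benchmark setup in which all pool samples were precomputed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_ensemble(X, y, n_members=8):
    """Fit n_members random forests that differ only in their random seed."""
    return [
        RandomForestRegressor(n_estimators=50, random_state=seed).fit(X, y)
        for seed in range(n_members)
    ]


def active_learning(X_init, y_init, X_pool, y_pool, n_target):
    """Pool-based AL with ensemble-disagreement sampling (sketch).

    Each round, the ensemble predicts every remaining pool candidate and the
    candidate with the largest standard deviation across members is queried.
    Reading y_pool[i] stands in for running one FE simulation.
    """
    X = np.asarray(X_init, dtype=float)
    y = np.asarray(y_init, dtype=float)
    X_pool = np.asarray(X_pool, dtype=float)
    y_pool = np.asarray(y_pool, dtype=float)
    available = list(range(len(X_pool)))
    while len(X) < n_target and available:
        models = fit_ensemble(X, y)
        preds = np.stack([m.predict(X_pool[available]) for m in models])
        pick = available.pop(int(np.argmax(preds.std(axis=0))))
        X = np.vstack([X, X_pool[pick]])  # add the queried input
        y = np.append(y, y_pool[pick])    # and its "FE" output
    return X, y, fit_ensemble(X, y)
```

Replacing `RandomForestRegressor` with `sklearn.neural_network.MLPRegressor` would give an NN ensemble; NN inputs would then normally also be scaled.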
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

TABLE I RMSEs of Models Trained With Different Datasets

The initialization of AL requires an initial dataset for training the first surrogate, which here is an ensemble of eight neural networks (NNs) or random forest (RF) models built using the Scikit-learn Python library [8]. Initial datasets of 8 and 64 samples were generated using grid-like sampling. In addition, Latin hypercube sampling (LHS) was used to generate two datasets of 100 samples for validating and testing the surrogate. To study how much the surrogate's accuracy improves with very large datasets, grid datasets of 216 and 729 samples and an LHS dataset of 1490 samples were generated. The AL experiments started from 8 or 64 samples and queried one sample per round from a pool of 729 samples until the training dataset reached 64 or 216 samples, respectively. In addition to the given input variables, the polynomial (interaction) features s × U, s × f, and U × f were computed and used as inputs to the surrogate models. All experiments were repeated five times with both model types, and each experiment was also repeated using random sampling instead of AL to compare the performance of the two.
The model error was evaluated using the root mean squared error (RMSE), and the results in Section IV are the average and standard deviation of the RMSE over the five repetitions. The RMSEs were calculated on the test dataset to measure the generalization ability of the models.
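For reference, the error metric can be written out explicitly. Assuming the standard definitions, with n = 100 samples in the LHS test set and ŷ^(j) the predictions of the model from repetition j = 1, ..., 5, the reported average is taken over the five RMSE values, and their standard deviation is reported alongside it:

```latex
\mathrm{RMSE}_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i^{(j)}\right)^2},
\qquad
\overline{\mathrm{RMSE}} = \frac{1}{5}\sum_{j=1}^{5}\mathrm{RMSE}_j
```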

IV. SURROGATE MODELING RESULTS
The prediction RMSEs of models trained with different datasets in Table I show that with 216 samples, AL outperforms grid sampling (traditional DoE), except when an NN is used to model T_rip. However, the other results show that RF is better suited for modeling T_rip, whereas NN achieves a lower RMSE for T, P_loss, and THD. The results with the grid and LHS datasets of 729 and 1490 samples, respectively, show that significant improvements in RMSE can be achieved with very large datasets.
Figs. 4-7 show the evolution of the average and standard deviation of the RMSE as more data is added to the training dataset. They also confirm that RF works best for T_rip and NN for the other outputs. A comparison of AL and random sampling (labeled R in the figures) shows only a small difference when modeling T and P_loss. With T (Fig. 4), the RMSE decreases slightly faster with AL, but in the end it is the same as with random sampling. Note, however, that the standard deviation of the RMSE is much larger with random sampling than with AL.
With P_loss (Fig. 5), the RMSE with random sampling remains slightly lower than with AL throughout the sampling process, and the standard deviation of the RMSE is approximately equal for both. The predictions of the best models for T and P_loss trained with 64 samples are visualized as coefficient of determination (R²) plots in Fig. 8, which show that the error is at a sufficient level. R² is the fraction of the variance in the output that is explained by the model.
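The coefficient of determination has the standard definition below, where y_i are the FE values, ŷ_i the surrogate predictions, and ȳ the mean FE value; the text does not restate which variant was used for Figs. 8 and 9, so the standard form is assumed here:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},
\qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```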
Since the RMSEs of T_rip and THD were not sufficient with datasets of 64 samples, the AL process was repeated for them, starting from 64 samples and sampling until 216 samples were reached. The results for T_rip (Fig. 6) show that the RMSE decreases from 1.45 to 0.7 Nm on average. The RMSE decreases rapidly when AL starts, but then its standard deviation increases and the average RMSE stalls before decreasing again at around 155 samples. The results for THD (Fig. 7) show that the RMSE decreases from 1.28 to 0.99 on average, but there are three sudden increases in the RMSE during AL. The reason for this behavior will be studied more closely in future work.
The R² plots (Fig. 9) show that even with 216 samples, the variance of the T_rip predictions is still relatively high, whereas the NN achieves a low error for THD. Table I and Fig. 9 thus show that modeling T_rip accurately requires a large number of training data points: its RMSE decreases by a factor of 2.7 when the amount of training data is increased by a factor of 6.9 (A216 versus L1490).
The CPU times to predict one sample with an NN and an RF model were 0.54 and 0.32 ms, which correspond to predicting 1866 and 3110 samples per second, respectively. The surrogates were therefore approximately 0.5 to 1 million times faster than the FE simulation. In surrogate model generation, the FE simulations dominate the time consumption, which is thus determined by the number of required training data points.
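As a rough cross-check of the speed-up figure, the 297.5 s average FE job time from the cloud-computation section can be divided by the per-sample prediction times. Note that this compares the wall-clock time of one FE job (eight MPI processes) with single-sample surrogate prediction time, and the pool-generation estimate assumes that all ten parallel job slots stay busy:

```latex
\frac{297.5\ \mathrm{s}}{0.54\ \mathrm{ms}} \approx 5.5 \times 10^{5}\ (\mathrm{NN}),
\qquad
\frac{297.5\ \mathrm{s}}{0.32\ \mathrm{ms}} \approx 9.3 \times 10^{5}\ (\mathrm{RF}),
\qquad
t_{\mathrm{pool}} \approx \frac{729 \times 297.5\ \mathrm{s}}{10} \approx 2.2 \times 10^{4}\ \mathrm{s} \approx 6\ \mathrm{h}
```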
V. CONCLUSION
We presented a novel active learning workflow for generating FE data for surrogate model development in a robust and rapid way without human intervention. Induction motor torque, torque ripple, loss, and stator winding current THD estimation results were presented, in which AL was found to improve surrogate accuracy, at least with small amounts of data. However, the difference from random sampling was not substantial, possibly because the test case has only a three-dimensional parameter space; the reason should be investigated further. Modeling torque ripple accurately required a substantially larger training dataset than the other outputs, and torque ripple was also the only studied output variable that could be predicted more accurately with an RF than with an NN model. Furthermore, we observed that the use case needs to be fairly complex for active learning to outperform traditional DoE in terms of the required FE simulation time.

Fig. 1. Multi-slice model of the case study motor with magnetic flux density and flux line solutions, shown in both end and side views.

Fig. 2. FE torque at data points in three different 2-D sections of the 3-D parameter space, with the third parameter limited to the narrow range given above each subfigure. Left: s around 0.015; middle: U around 400 V; right: f around 50 Hz.