Analysis of the ultrasonic signal in polymeric contaminated insulators through ensemble learning methods

Outdoor insulators may experience stress due to severe environmental conditions, such as pollution and contamination. Through the identification of partial discharges by ultrasonic noise, it is possible to assess the possibility of a power grid failure occurring. In this paper, ensemble models are used to analyze an ultrasonic signal from an ultrasonic microphone Pettersson M500. As the insulators are susceptible to developing irreversible failures, it will be evaluated whether the ultrasonic signal will remain over time, so that it is possible to assess whether the discharges being captured can result in a failure in contaminated polymeric insulators, evaluated in a high voltage laboratory under controlled conditions. The ensemble models were used in this paper because they typically require less computational effort than techniques based on deep learning and have acceptable performance for the problem at hand. The bagging, boosting, random subspace, bagging plus random subspace, and stacked generalization ensemble models are evaluated, and the best result of each model is used to compare the differences between the models. The bagging ensemble learning model proved to be faster and have lower error than other ensemble models, long short-term memory (LSTM), and nonlinear autoregressive (NAR).


I. INTRODUCTION
To guarantee the supply of electricity to consumers, it is necessary to have a reliable power grid. Electrical power system inspections are the key to early fault identification and predictive maintenance [1]. Usually, during the insulation degradation process, there are various types of manifestations that can be caught with appropriate equipment. By monitoring such manifestations in regular field inspections, the reliability of the system can be increased [2].
There are several types of insulator degradation processes. One that is commonly observed in inspections is the build-up of contamination on the surface of insulators in rural or polluted areas [3]. When contamination is strongly attached to the surface of the insulator, its conductivity may increase, making this component more vulnerable to flashovers [4]. Some techniques to identify the contamination can be applied for the diagnosis of the insulator's condition [5].
According to Lv, Zhao, and Song [6] the non-soluble deposit density in a fog environment can be analyzed using the finite element method (FEM) to evaluate the electric field in transmission line insulators. The FEM stands out for the flexibility of the evaluation of parameter changes in electrical equipment, thus, profile variations can be evaluated without the need to develop several prototypes [7].
Composite materials show advantages in this application and are being widely used; it therefore becomes increasingly important to assess the influence of contamination on these new components [8]. A major difference of composite materials in relation to porcelain insulators is their hydrophobicity, which reduces the leakage current levels and may also slow the degradation process, thus improving anti-pollution performance [9].
VOLUME 4, 2021
The prediction of time series can be performed using several models [10]-[12]. There is no model that is suitable for all signal variations. Thus, to reach better accuracy it is necessary to compare the models that are used for this purpose [13]. Based on this statement, the contributions of this paper are:
• The ensemble learning methods are adequate for time series prediction because they require less computational effort than deep learning structures. To evaluate that, a comparison between the ensemble models and the long short-term memory (LSTM) is presented. The LSTM is a deep learning algorithm that is widely used for time series prediction.
• The second contribution is related to the possibility of varying the structure: by testing different arrangements, it is possible to find a more efficient structure. In this paper the bagging, boosting, random subspace, bagging plus random subspace, and stacked generalization models are evaluated.
• Finally, the third contribution is related to the analysis of a chaotic time series. Signals with high noise are difficult to analyze, as the high frequencies can have a nonlinear pattern. For this reason, the algorithm needs to be robust to give a reliable result. The feature needed to deal with this type of data is found in the ensemble models that are presented in this paper.
The remainder of this paper is organized as follows: In Section II related works are presented. Section III discusses the characteristics of polymeric insulators and presents the laboratory test carried out to detect the ultrasonic noise of contaminated insulators. In Section IV the ensemble learning models are presented. In Section V the results of the analysis are discussed and, finally, in Section VI the conclusion is presented.

II. RELATED WORK
To improve the diagnosis of possible failures in electrical power components, an approach that has proven to be successful is the prediction of failures [14], which is the specific subject addressed in this paper. Among the techniques used to improve the predictive capacity of the model, the wavelet transform stands out for its ability to reduce the noise in the signal without losing its characteristics [15]. Other techniques widely used nowadays are the LSTM and the support vector machine (SVM), which can be applied to different types of problems [16].
One of the practical difficulties in identifying an insulator with damaged properties is that defects can be hidden under the mooring cables or under contamination [17]. Identifying a failed component during an inspection of the electrical power system requires great experience from the operator [18], even then the human factor may bring uncertainties to the process [19].
Based on the growing use of polymeric insulators and the need to diagnose their conditions [20], this paper presents an evaluation of time series prediction, to determine the development of a fault by verifying the signal variation. The evaluation presented in this paper concerns the ability to predict a chaotic signal recorded by an ultrasound detector, which is specific equipment for inspection of the electrical power system.
Increased contamination on the surface of an insulator results in a cumulative loss of its insulating properties, resulting in a greater likelihood of discharges occurring in its surroundings. Partial discharges, which usually occur during the degradation process, emit ultrasonic noise that can be identified with specific equipment [21]. From the prediction of increased discharges or a variation in the ultrasonic signal, it is possible to predict a flashover before it happens.
Pre-processing the ultrasonic signal is a strategy to improve the ability to predict a failure, however, many algorithms that filter the signal can reduce relevant information. This becomes a major challenge for prediction models, considering that with more nonlinearities it is more difficult to train the model properly. Some studies have been carried out to classify the condition of insulators using acoustic signals [22]. In addition to classification, it is promising to evaluate the capabilities of predicting the signal of a contaminated insulator, considering that it is more susceptible to failure.
Nowadays, one of the approaches that are being increasingly used is the ensemble learning method, mainly because of their lower computational effort compared to other techniques. When the problem is divided into smaller problems to be solved by simpler combined models, an efficient framework is obtained for dealing with complex problems [23]. As it is possible to combine the weak models in different ways to obtain a robust structure, it is possible to carry out variations in the architecture until a suitable model is obtained to be used in the problem in question. Because of this ability to adapt the model structure to the problem and require less computational effort, the ensemble learning models stand out to predict chaotic time series [24].

III. OUTDOOR POLYMERIC INSULATORS
High voltage polymeric insulators that are used outdoors are susceptible to tracking and erosion due to contamination, since hydrophobicity is eventually lost [25]. Once these effects begin, they tend to continue until there is a flashover and the insulator is damaged [26]. As a result, there will be a failure in the electrical power system that may cause a disconnection from the grid, and the components will need to be replaced through corrective maintenance, leaving the system off [27]-[29]. Therefore, it is necessary to detect these effects before a disruptive fault occurs. Ultrasound can be used to assess the level of contamination, making it possible to perform predictive maintenance by cleaning the electrical power grid or changing the component that may develop the failure [30].
The development of the fault can be linked to the increase in a measure of the insulator's condition. The leakage current is a measure that can be used to evaluate the time series measured according to the increase of contamination, thus being possible to forecast the development of a failure by the variation of this measure [2]. Regarding the time series forecasting, several approaches are being studied to obtain a better performance with lower computational cost, such as the ensemble learning models which will be presented in the next section.
According to Meyer and Pintarelli [31] the polymeric insulators are being used in electricity distribution networks in Brazil, considering that they are lighter and therefore easily installed. A long rod 24.2 kV class polymer insulator for anchoring the electrical distribution network [32], with surface contamination was evaluated. This component is presented in Figure 1, where the electrical potential is applied to the top of the insulator and ground is connected to the bottom.

A. LABORATORY SETUP
To simulate the contamination found in the field, a solid contamination method was used with kaolin inside the salt spray chamber (as shown in Figure 2). Kaolin is an ore composed of hydrated aluminium silicates that is used to simulate contamination in outdoor insulators. Initially, the insulators are cleaned with isopropyl alcohol and dried in an oven, and the kaolin is weighed with a precision scale to obtain the exact concentration needed to simulate contamination. The insulator is immersed in a glass beaker containing a kaolin slurry. The quantity of salt used for contamination is determined by measuring the conductivity during sample preparation [33]. After the contamination is evenly distributed over the insulation surface, the insulator is dried in the oven and installed in the chamber for measurements. The chamber has a volume of 8 m³, being 2 m high, 2 m wide, and 2 m deep. The salt spray chamber is designed for laboratory analysis to be carried out under controlled conditions. To perform the artificial contamination, the International Electrotechnical Commission standard IEC 507 [34] was used, which is specific for determining the withstand characteristics of insulators under artificial pollution. This standard is developed by the insulator studies commission for overhead lines and substations, and covers tests on artificial pollution in high voltage insulators [35].
The chamber has a conductive metal arc that is energized with high voltage to simulate the electrical cables, this arc is connected to a porcelain bushing, which in turn is connected to an external muffle which is connected to an isolated medium voltage cable, which in its other termination also has an external muffle connected. The muffle is connected to the high voltage of a single-phase transformer. The low voltage of the transformer is connected to a voltage regulator that is connected to the mains. The voltage is measured by a multimeter connected to a high voltage probe at the high voltage terminal of the transformer.
The chamber was used because it is completely sealed and helps to reduce possible external interference in the ultrasound measurements. In this way, the insulator was connected to the ring with electrical voltage applied and the other side of the insulator was grounded. The microphone was positioned inside the chamber directed to the subject insulator, at a distance of approximately 50 cm from the insulator. The microphone was connected to a notebook and then measurements were made.
In this paper, 10,000 samples recorded in the measurement were considered. In this way, all ultrasonic frequencies are within the analyzed spectrum and the signal becomes processable [36]. Due to the large amount of data, the holdout approach [37] is adopted in this paper: 70% of the data were used for training and 30% for testing the machine learning models.
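The holdout split described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the chronological ordering of the split, the lag embedding, the value `n_lags=4`, and the placeholder signal are all assumptions made for the example.

```python
def holdout_split(samples, train_percent=70):
    """Chronological holdout: the first part trains, the remainder tests."""
    cut = len(samples) * train_percent // 100
    return samples[:cut], samples[cut:]


def make_lagged(series, n_lags):
    """Build (inputs, target) pairs for one-step-ahead prediction."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])  # the n_lags values before time t
        targets.append(series[t])            # the value to be predicted
    return inputs, targets


# Placeholder values standing in for the 10,000 recorded samples.
signal = [((i * 37) % 101) / 101.0 for i in range(10_000)]
train, test = holdout_split(signal)
X_train, y_train = make_lagged(train, n_lags=4)  # n_lags=4 is illustrative only
print(len(train), len(test), len(X_train))
```

With 10,000 samples the split gives 7,000 training and 3,000 test samples; the lag embedding then removes the first `n_lags` samples of each part because they have no complete history.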
For comparative purposes, Figure 3 presents the image of the signal recorded by the ultrasound detector of an insulator in good condition (which is clean and new), and an insulator that is contaminated. A window of the recorded signals was presented with only 500 samples, then it is possible to better visualize the differences. In this work, in particular, the signal evaluated was that of the contaminated insulator.  It can be observed that the signal of the contaminated insulator has a greater amplitude as well as variations containing lower frequencies. The visual difference between the signals is small, which makes it difficult to identify a fault in the field. When an insulator develops its leakage current the signal amplitude becomes even higher with frequencies that have its origin in partial discharges and/or dry band arcing [38].
Due to the large amount of non-linearities in the signal caused by discharges, forecasting the time series of a signal based on ultrasonic noise is a difficult task, making it necessary to use advanced prediction models such as deep learning strategies or techniques that combine several weaker learners to obtain a stronger model, such as ensemble-based techniques that will be the focus of this paper.
The identification that there is an increase in the amplitude of the signal is possible through advanced artificial intelligence techniques [39]. The choice of which technique is most suitable for the problem is difficult, since some techniques have a high computational effort [40]. Based on the problem presented in this section, ensemble models are used to predict the signal produced by ultrasound, these models are presented in the next section.

IV. ENSEMBLE LEARNING MODEL
The ensemble learning models are approaches that combine weak learners to obtain an algorithm with greater regression capability. These methods have currently stood out in applications related to the electrical system, as their main advantage is their superior convergence speed compared to deep learning strategies [41].
In this paper, the weak learners used for the ensemble models are the support vector regression (SVR) type [42].
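The SVR relates the input and output of the data through:

$$ f(x) = w^{\top} \phi(x) + b $$

where $f(x)$ is the forecast value and $\phi$ is the mapping of the input vector $x$ [43]; $w$ and $b$ are the adjustable coefficients, calculated by minimizing the risk function ($R$):

$$ R = \frac{C}{N} \sum_{i=1}^{N} L_{\varepsilon}\big(f(x_i), y_i\big) + \frac{1}{2}\lVert w \rVert^{2} $$

where $N$ is the number of training samples, $C$ is the regularization constant, and $L_{\varepsilon}$ is the loss function that penalizes the training errors, calculated by:

$$ L_{\varepsilon}\big(f(x), y\big) = \begin{cases} \lvert f(x) - y \rvert - \varepsilon, & \lvert f(x) - y \rvert \geq \varepsilon \\ 0, & \text{otherwise} \end{cases} $$

The use of the loss function in the regularized function leads to a quadratic programming problem [44]. The minimization of the regularized function can be rewritten as the equivalent optimization problem, which is often called the primal problem:

$$ \min_{w,\, b,\, \xi,\, \xi^{*}} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \quad \text{subject to} \quad \begin{cases} y_i - w^{\top}\phi(x_i) - b \leq \varepsilon + \xi_i \\ w^{\top}\phi(x_i) + b - y_i \leq \varepsilon + \xi_i^{*} \\ \xi_i,\, \xi_i^{*} \geq 0 \end{cases} $$

Simplifying, the dual problem is:

$$ \max_{\alpha,\, \alpha^{*}} \; -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) K(x_i, x_j) - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^{*}) $$

subject to

$$ \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) = 0, \qquad 0 \leq \alpha_i,\, \alpha_i^{*} \leq C $$

Given the Karush-Kuhn-Tucker (KKT) conditions for the primal problem [45], the dual form of the regression function is:

$$ f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) K(x_i, x) + b $$

wherein $\alpha_i$ and $\alpha_i^{*}$ are the Lagrangian multipliers [46]. The kernel functions $K(x_i, x_j)$ used in this paper were linear (LIN) (10), radial basis function (RBF) (11), and polynomial (POLY) (12):

$$ K(x_i, x_j) = x_i^{\top} x_j \tag{10} $$

$$ K(x_i, x_j) = \exp\left( -\gamma \lVert x_i - x_j \rVert^{2} \right) \tag{11} $$

$$ K(x_i, x_j) = \left( 1 + x_i^{\top} x_j \right)^{p} \tag{12} $$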
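To make the kernel choices and the dual-form prediction concrete, the following sketch implements the three kernel families named above, an epsilon-insensitive loss, and a dual-form predictor. The parameter names `gamma`, `degree`, and `eps`, their default values, and the exact parametrization of each kernel are assumptions for illustration; the paper does not state them here.

```python
import math


def linear_kernel(xi, xj):
    """LIN kernel: plain inner product."""
    return sum(a * b for a, b in zip(xi, xj))


def rbf_kernel(xi, xj, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)


def poly_kernel(xi, xj, degree=2):
    """POLY kernel in the (1 + <xi, xj>)^degree form (assumed)."""
    return (1.0 + linear_kernel(xi, xj)) ** degree


def eps_insensitive_loss(pred, target, eps=0.1):
    """Errors smaller than eps cost nothing; larger ones cost the excess."""
    return max(0.0, abs(pred - target) - eps)


def svr_predict(x, support_vectors, coeffs, b, kernel):
    """Dual-form prediction; coeffs[i] stands for (alpha_i - alpha_i^*)."""
    return sum(c * kernel(sv, x) for c, sv in zip(coeffs, support_vectors)) + b
```

For example, `svr_predict([2.0], [[1.0], [2.0]], [0.5, -0.25], 0.1, linear_kernel)` combines two support vectors with the linear kernel; swapping in `rbf_kernel` or `poly_kernel` changes only the similarity measure, not the prediction rule.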
The optimizers used in the SVR were soft-margin minimization via quadratic programming (L1QP), iterative single data algorithm (ISDA), and sequential minimal optimization (SMO) [47]. Using the L1QP, a linear approximation is considered in the feature space. The approximation function is calculated by minimizing the approximation error for training and response data [48]. This is accomplished by minimizing:

$$ Q(w, b, \zeta) = \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{M} \zeta_i $$

subject to

$$ y_i \left( w^{\top} x_i + b \right) \geq 1 - \zeta_i $$

for $i = 1, \ldots, M$, where $M$ is the number of training data, $w$ is the weight vector, $x_i$ is the training data, $y_i$ is the response variable, $C$ is the margin parameter, and $\zeta_i$ is the positive slack variable. ISDA is designed to avoid the use of typical solvers [49]. The important feature of the algorithm is that it deals with one data point at a time to develop the objective function. SMO breaks the optimization problem into many small subproblems, each involving only two Lagrange multipliers at a time [50].
One of the great difficulties in using ensemble models is that there is a wide range of approaches, and defining the best strategy is a difficult task [51]. Based on this premise, this paper compares variations of these algorithms for a chaotic time series prediction problem; specifically, the bagging, boosting, random subspace, and stacking ensemble learning models are evaluated. These models differ mainly in the way the weak learners are organized.
The bagging ensemble learning focuses on obtaining an ensemble model with less variance than its components, whereas stacking mainly tries to produce strong models that are less biased than their components [52]. Fitting several independent models and averaging their predictions to reduce the variance would require a very large dataset; the bagging model therefore relies on bootstrap samples, which are almost independent, to fit the models [53].
Initially, bootstrap samples are created, each acting as another approximately independent dataset drawn from the true distribution. Each weak learner is then fitted to one of these samples, and the results are aggregated by averaging [54]. Figure 4 presents the bagging ensemble learning model; the strategy consists of fitting several base models to different bootstrap samples to build a model that is the average of their results. The meta-learner is the combination of the weak learners, which can be realized in several ways besides bagging. In an ensemble learning framework that uses SVR for regression, the meta-learner is given by:

$$ S(\cdot) = \frac{1}{L} \sum_{l=1}^{L} w_l(\cdot) $$

where $S(\cdot)$ is the final meta-learner function, $w_l(\cdot)$ is the regression of each weak learner, and $L$ is the number of weak learners [49]. Some strategies like boosting and stacking work similarly to aggregate the weak learners and obtain a model with better performance [55]. The boosting model consists of sequentially fitting several weak learners in an adaptive way, so that more importance is given to observations that were poorly handled by previous models in the sequence [56].
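The bootstrap-and-average procedure can be sketched as below. To keep the example dependency-free, a one-dimensional least-squares line replaces the SVR weak learner used in the paper; the function names, `n_learners=10`, and the seed are illustrative assumptions.

```python
import random


def fit_line(xs, ys):
    """Least-squares line y = a*x + b, used here as a stand-in weak learner."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:                      # degenerate bootstrap sample: constant model
        return lambda x: my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda x: a * x + b


def bagging_fit(xs, ys, n_learners=10, seed=0):
    """Fit one weak learner per bootstrap sample of the training data."""
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(len(xs)) for _ in xs]   # sample with replacement
        learners.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return learners


def bagging_predict(learners, x):
    """Meta-learner: the plain average of the weak-learner predictions."""
    return sum(w(x) for w in learners) / len(learners)
```

Because each learner is fitted independently of the others, the loop in `bagging_fit` could be run in parallel, which is one reason bagging tends to be fast to train.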
The boosting ensemble learning presented in Figure 5 focuses its efforts on observations that are more difficult to fit; therefore, the resulting model has less bias. Weak models tend to have low variance and high bias [49]. Because of this characteristic, the models have few degrees of freedom when parameterized. As the models cannot be fitted in parallel, it can be computationally expensive to sequentially fit complex models [57].
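A minimal sketch of sequential boosting for regression is given below. It is an assumption-laden illustration: single-split regression stumps replace the SVR weak learners, each new learner is fitted to the residuals of the current model, and its coefficient is chosen by least squares. The inputs are one-dimensional and must contain at least two distinct values.

```python
def fit_stump(xs, rs):
    """Best single-split piecewise-constant fit to the residuals rs."""
    best = None
    for thr in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, rs) if x < thr]
        right = [r for x, r in zip(xs, rs) if x >= thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda x: ml if x < thr else mr


def boost_fit(xs, ys, n_rounds=20):
    """Add weak learners one at a time, each fitted to the current residuals."""
    terms = []                                   # list of (coefficient, learner)
    preds = [0.0] * len(xs)                      # the initial model predicts zero
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        w = fit_stump(xs, residuals)
        wx = [w(x) for x in xs]
        denom = sum(v * v for v in wx)
        if denom == 0.0:                         # nothing left to fit
            break
        c = sum(r * v for r, v in zip(residuals, wx)) / denom
        terms.append((c, w))
        preds = [p + c * v for p, v in zip(preds, wx)]
    return terms


def boost_predict(terms, x):
    return sum(c * w(x) for c, w in terms)


def boost_mse(xs, ys, terms):
    return sum((y - boost_predict(terms, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Each round depends on the residuals left by the previous one, so the rounds cannot run in parallel, which illustrates the computational cost noted above.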
To perform a comparison between the ensemble learning models, SVRs weak learners were used. Thus, it is possible to perform a comparison between the ensemble structures to assess which model is more suitable for the problem in question. The SVR determines support vectors close to a hyperplane that maximize the margin between the two-point classes obtained in relation to the difference between a target value and a threshold value [51].
As with other strategies, the stacking ensemble combines weak models to produce a model with greater processing capacity (meta-model). For a classification task, besides the SVR, the weak learners can be, for instance, support vector machine (SVM) [58]- [60], k-nearest neighbors (k-NN) [61]- [63], or decision trees [64]- [66]. The artificial neural network takes as inputs the results of the weak learners and returns the final predictions based on these [67]. The structure of the stacking ensemble model, including the combination of weak learners, is shown in Figure 6. The first step in this algorithm is to fit the weak learners to the input data, and the second step fits the meta-model using the predictions made by these weak learners [68]. The data division is performed in such a way that the data used to train the weak learners are not used to train the meta-model; only the combination of their results influences the training process [69]. To obtain better reliability in the model, cross-validation can be applied to separate the dataset.
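The two-step stacking procedure can be illustrated with the sketch below. It deliberately simplifies the setup described above: two trivial weak learners (a least-squares line and a constant mean) replace the SVRs, and a two-weight least-squares combination replaces the artificial neural network meta-model. The fold boundary `split` and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Weak learner 1: least-squares line (stand-in for an SVR)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: a * (x - mx) + my


def fit_mean(xs, ys):
    """Weak learner 2: constant prediction equal to the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_meta(p1, p2, ys):
    """Meta-model: least-squares weights (a1, a2) for a1*p1 + a2*p2."""
    s11 = sum(u * u for u in p1)
    s12 = sum(u * v for u, v in zip(p1, p2))
    s22 = sum(v * v for v in p2)
    t1 = sum(u * y for u, y in zip(p1, ys))
    t2 = sum(v * y for v, y in zip(p2, ys))
    det = s11 * s22 - s12 * s12          # assumed non-zero for this sketch
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det


def stacking_fit(xs, ys, split):
    # Step 1: the weak learners see only the first fold.
    w1 = fit_line(xs[:split], ys[:split])
    w2 = fit_mean(xs[:split], ys[:split])
    # Step 2: the meta-model sees only their predictions on the second fold.
    hold_x, hold_y = xs[split:], ys[split:]
    a1, a2 = fit_meta([w1(x) for x in hold_x], [w2(x) for x in hold_x], hold_y)
    return lambda x: a1 * w1(x) + a2 * w2(x)
```

Keeping the two folds disjoint is the point of the design: the meta-model learns how much to trust each weak learner from predictions on data those learners never saw.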

[Figure 6: Stacking ensemble learning model, showing the initial dataset, the weak learners, the weak-learner predictions, and the meta-learner.]
To prevent overfitting, a regularization parameter is used. In the boosting structure, new models are iteratively trained, focusing on observations that previous models had greater difficulty in predicting, making this structure predictive. As the goal is to reduce the bias of the simplest predictors, it is suitable to use a simpler model with high bias and low variance [49].
To generate the meta-learner in the boosting approach, weak learners are added one by one in an iterative optimization process, according to:

$$ s_l(\cdot) = s_{l-1}(\cdot) + c_l \, w_l(\cdot) $$

where the coefficient $c_l$ and the weak learner $w_l$ are chosen so that $s_l$ is the model that best fits the training data (improving on $s_{l-1}$). The process is repeated until convergence, when $S(\cdot) = s_l(\cdot)$. The random subspace method is similar to bagging, except that the features are randomly sampled for each learner. This prevents individual learners from over-focusing on features that appear to be highly predictive or descriptive in the training set [70]. Thus, random subspaces are a promising choice for large problems where there are more features than training samples [71].
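The sampling step that distinguishes the random subspace method from bagging can be sketched as follows. Each learner receives a random subset of the input columns rather than a random subset of the rows. For brevity a 1-nearest-neighbor regressor stands in for the SVR weak learner; `n_learners`, `n_features`, and the seed are illustrative assumptions.

```python
import random


def subspace_fit(X, y, n_learners=5, n_features=2, seed=0):
    """Assign each learner a random subset of column indices."""
    rng = random.Random(seed)
    d = len(X[0])
    return [(sorted(rng.sample(range(d), n_features)), X, y)
            for _ in range(n_learners)]


def _nn_predict(cols, X, y, x):
    """1-nearest-neighbor prediction using only the columns in cols."""
    def dist(row):
        return sum((row[j] - x[j]) ** 2 for j in cols)
    nearest = min(range(len(X)), key=lambda i: dist(X[i]))
    return y[nearest]


def subspace_predict(model, x):
    """Average the predictions of the learners, each in its own subspace."""
    return sum(_nn_predict(cols, X, y, x) for cols, X, y in model) / len(model)
```

In a time series setting the columns would typically be lagged values of the signal, so each learner would see a different subset of the lags.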
The random subspace ensemble learning model, shown in Figure 7, was developed to deal with high-dimensional problems. This approach combines weak learners trained in random subspaces in an iterative process, which makes it suitable for problems with a large number of features [72]. For comparison purposes, the LSTM [73] and the nonlinear autoregressive (NAR) [74] models are compared with the ensemble approach. The LSTM is used as a deep learning strategy and the NAR as a classical approach, to provide a complete comparison.

A. LONG SHORT-TERM MEMORY
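The LSTM model stands out for applications in time series, considering its ability to deal with non-linear variations of the system [75]. The LSTM can be calculated through the equations:

$$ i_t = \sigma_g\left( W_i x_t + R_i h_{t-1} + b_i \right) $$

$$ f_t = \sigma_g\left( W_f x_t + R_f h_{t-1} + b_f \right) $$

$$ g_t = \sigma_c\left( W_g x_t + R_g h_{t-1} + b_g \right) $$

$$ o_t = \sigma_g\left( W_o x_t + R_o h_{t-1} + b_o \right) $$

$$ c_t = f_t \odot c_{t-1} + i_t \odot g_t $$

$$ h_t = o_t \odot \sigma_c(c_t) $$

where $i_t$, $f_t$, $g_t$, and $o_t$ are the input gate, forget gate, cell candidate, and output gate, $c_t$ is the cell state, $h_t$ is the hidden state, $R$ and $W$ are learning (weight) matrices, and $b$ is the bias matrix, whose values are assigned by the network training. $\sigma_g$ is the gate activation function, $\sigma_c$ is the state activation function, and $\odot$ denotes element-wise multiplication. To obtain the predicted values of future time steps, the responses of the training sequences are shifted by one time step. Thus, for each input time step, the network learns to forecast the value of the next time step [76].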
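As an illustration of how the gates interact, the sketch below performs one LSTM step on scalar inputs and builds the one-step-shifted training pairs mentioned above. The gate names `i`, `f`, `g`, `o`, the sigmoid gate activation, and the tanh state activation are common conventions assumed here; real networks use matrices instead of the scalar weights shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, R, b):
    """One scalar LSTM step; W, R, b are dicts keyed by gate name."""
    i = sigmoid(W["i"] * x_t + R["i"] * h_prev + b["i"])     # input gate
    f = sigmoid(W["f"] * x_t + R["f"] * h_prev + b["f"])     # forget gate
    g = math.tanh(W["g"] * x_t + R["g"] * h_prev + b["g"])   # cell candidate
    o = sigmoid(W["o"] * x_t + R["o"] * h_prev + b["o"])     # output gate
    c_t = f * c_prev + i * g          # keep part of the old state, add new content
    h_t = o * math.tanh(c_t)          # expose part of the state as the output
    return h_t, c_t


def shifted_pairs(series):
    """Inputs and responses offset by one step, for next-value forecasting."""
    return series[:-1], series[1:]
```

For instance, `shifted_pairs([1, 2, 3, 4])` returns `([1, 2, 3], [2, 3, 4])`, so each input value is paired with the value that follows it.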
For global analysis, two optimizers will be used, these being the stochastic gradient descent with momentum (SGDM) [77] and the adaptive moment estimation (ADAM) [76]. The SGDM is a classic optimizer that has been used by several researchers due to its simplicity and satisfactory result, ADAM is a modern optimizer that is standing out for artificial intelligence applications [78].
In addition to the analysis of the optimizers, the use of deeper layers is evaluated, a strategy that is currently being widely researched given the popularization of the deep learning approach. For comparison purposes, the hyperparameters were set as follows: 200 hidden units, initial learn rate of 0.005, gradient threshold of 1, learn rate drop period of 125, and learn rate drop factor of 0.2. The model used in this paper is a standard sequence-to-sequence regression LSTM network, available in Matlab (MathWorks).
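The learn rate settings listed above can be read as a step schedule. The sketch below assumes a piecewise schedule in which the rate is multiplied by the drop factor after every drop period, with epochs counted from 1; the source gives only the three values, so the schedule type and the indexing are assumptions.

```python
def learn_rate(epoch, initial=0.005, drop_period=125, drop_factor=0.2):
    """Learn rate at a given epoch (1-indexed) under a piecewise step schedule."""
    n_drops = (epoch - 1) // drop_period   # completed drop periods so far
    return initial * drop_factor ** n_drops
```

Under these assumptions the rate stays at 0.005 for epochs 1 to 125, falls to about 0.001 for epochs 126 to 250, and to about 0.0002 afterwards.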

B. NONLINEAR AUTOREGRESSIVE
To perform a comparative analysis with shallow learning structures [79], the NAR was used to predict the time series, given by:

$$ y(t) = f\big( y(t-1), \ldots, y(t-d),\; x(t-1), \ldots, x(t-d) \big) $$

where $y(t)$ is the predicted output of the model given $d$ past values of $y(t)$ and another series $x(t)$ [80].
Like the compared models, NAR is applied for multi-step prediction of a sequence of values in a time series. In this approach, when external feedback is missing, the closed loop can continue to predict using internal feedback.
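Closed-loop multi-step prediction can be sketched as follows: once measured values are no longer available, each prediction is fed back as an input for the next step. The one-step model `f` is a placeholder for a trained NAR network, and the function name and arguments are hypothetical.

```python
def closed_loop_forecast(f, history, d, steps):
    """Predict `steps` values ahead, feeding each prediction back as input."""
    window = list(history[-d:])          # the d most recent known values
    forecast = []
    for _ in range(steps):
        y_next = f(window)               # one-step-ahead prediction
        forecast.append(y_next)
        window = window[1:] + [y_next]   # internal feedback replaces measurements
    return forecast
```

With a toy linear-extrapolation model, `closed_loop_forecast(lambda w: 2 * w[-1] - w[-2], [1, 2, 3], d=2, steps=3)` continues the sequence from its own outputs. Errors made early in the horizon are propagated forward, which is why multi-step prediction of noisy signals is demanding.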
To evaluate the configuration parameters, the Levenberg-Marquardt (LM) [81] and Bayesian regularization (BR) [82] optimizers were applied. All evaluated models were compared using the same settings, following the performance measures presented in the next subsection.

C. PERFORMANCE MEASURES
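The signal error is calculated as the difference between the observed value $y_i$ and the predicted output $\hat{y}_i$ [83]. Thus, for an overall evaluation of the forecast error, the mean-square error (MSE), root-mean-square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) were used, which are calculated as follows:

$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2} $$

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2} } $$

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| $$

$$ \mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| $$

where $n$ is the length of the original signal [84]. In this paper, simulations were performed in Matlab on an Intel Core i5-7400 with 20 GB of random-access memory (RAM).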
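The four measures can be computed directly from their usual definitions, as in the sketch below. The percentage scaling of the MAPE (multiplication by 100) is an assumption about the convention used.

```python
import math


def mse(y, y_hat):
    """Mean-square error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root-mean-square error."""
    return math.sqrt(mse(y, y_hat))


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mape(y, y_hat):
    """Mean absolute percentage error; undefined if any observed value is zero."""
    return 100.0 / len(y) * sum(abs((a - b) / a) for a, b in zip(y, y_hat))
```

Because the MAPE divides each error by the observed value, samples close to zero can dominate it, which may be relevant for signals that oscillate around zero.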

V. ANALYSIS OF RESULTS
In this section, the evaluation of the parameter changes in the considered models is presented and discussed. The change of the base learner kernel in the SVR and of the solver (optimizer) is evaluated. The configuration that obtains the best results is used to compare the differences between the models. The best results for each metric for the used model are highlighted in bold. Table 1 presents the evaluation of the ensemble learning models. The best configuration for the bagging model was obtained using the SMO solver with the LIN kernel as base learner, considering that this configuration gave a significant improvement in MSE and RMSE, and the time to convergence was among the best results. Using SMO, the best MAPE was obtained with the RBF kernel; however, there was a wide variation of results for this metric, which makes its analysis unreliable. The best convergence time was also obtained with the RBF kernel, however, using the ISDA solver.
In this initial analysis, it was noticed that the metrics do not necessarily indicate the same solver and configuration as the best choice. This requires a deeper analysis of the results to determine which metric is more interesting to be evaluated for more consistent analysis.
The boosting model had similar results to the bagging model. The best MSE and RMSE were obtained using the LIN kernel with the SMO solver, which was also among the fastest configurations to converge. In this analysis, the MAPE values were closer to each other, with the exception of the RBF kernel using the L1QP, which resulted in a MAPE considerably higher than the other configurations. This result is not promising, as there is no stability regarding this metric.
In the evaluation of the random subspace ensemble model, there was also a great variation of MAPE, and the best result was obtained using the SMO solver with the POLY kernel function. The best convergence time of the algorithm was also obtained using the SMO solver. Considering that the best MSE and RMSE were obtained using L1QP with the LIN kernel function and a median convergence time, this setting was considered for the final evaluation.
In the bagging plus random subspace model, the best result considering the MSE and RMSE was obtained using the SMO solver with the were also obtained using the SMO solver, however with the RBF kernel function. In this evaluation, there was less MAPE variance in relation to the configuration change of the analyzed model, and the best result obtained in this metric was using the ISDA solver with the LIN kernel function.
The best values for the MSE and RMSE of the stacking ensemble learning model were obtained using the L1QP solver with the RBF kernel function. The best MAPE value was obtained using the ISDA solver and, in this case, the variation was not as pronounced as in the other models. The best convergence time was obtained using the SMO solver with the LIN kernel function. Subsection V-A provides a comparison between all models in order to obtain an overall performance comparison.

A. OVERALL COMPARISON OF THE ENSEMBLE METHODS
Using the best optimizer configuration and kernel function, all simulations were performed again to obtain a comparison between the best results of each model. This evaluation is presented in Table 2.
As can be seen, the bagging and boosting models have similar results in all metrics evaluated in this work. They can be considered the best models for the application in question, since they have the lowest values of RMSE, MSE, MAPE, and time to convergence. Comparing all the results, the best model for this application was the bagging model, which had the best MSE and was the fastest model to converge in this evaluation. The forecast result using the bagging model is shown in Figure 8.

B. LONG SHORT-TERM MEMORY RESULTS
For a comparison using another approach, Table 3 presents the results of the LSTM, an algorithm that has stood out for time series prediction due to its ability to deal with non-linear data and that is widely applied in deep learning. In this evaluation, the SGDM and ADAM optimizers were used and the use of deeper layers (DL) in the LSTM structure was evaluated. The first observation regarding the results is that the LSTM needs a longer time for convergence, considering that it uses more computational effort. This observation becomes even clearer when deeper layers are used. The best result obtained in this evaluation was using the SGDM optimizer, which gave the best MSE, RMSE, and time to convergence. With this optimizer, the inclusion of deeper layers did not reduce the error and resulted in a considerable increase in the time needed for convergence.
Using the ADAM optimizer, the best result was obtained with 3 deeper layers. However, this result was worse than the values obtained using the SGDM optimizer, and the time needed for convergence was considerably higher with this number of deeper layers. Comparing these results with the bagging ensemble model, the biggest noticeable difference is in the time to convergence, as the bagging model was much faster. Another difference is the error: the bagging model had lower MSE and RMSE, making it clear that it is more appropriate for the problem in question.

C. NONLINEAR AUTOREGRESSIVE RESULTS
For a comparative analysis using a nonlinear autoregressive model for time series forecasting, Table 4 presents the results of the use of different optimizers and variations in the number of hidden neurons (HN). Although the NAR model is usually fast in the training process, for this dataset the improvement in the training time was not so significant compared to the LSTM. The training time of all compared ensemble models was shorter. Furthermore, all error results calculated by the MSE and RMSE were higher, so the NAR model did not show promise for this evaluation. For this approach, the variation of parameters did not result in significant variations in the results.

VI. CONCLUSION
Improved fault detection in electrical system inspections can help electrical utility companies increase the reliability of the electrical power system. Ultrasound detectors are promising equipment for fault identification, considering that they allow indirect and directional detection. With a failure prediction model based on an analysis of equipment that is on the threshold of breakdown, it will be possible to identify failures before they occur and perform maintenance. With predictive maintenance, it is possible to reduce the maintenance costs of the network and, mainly, to reduce the need for corrective maintenance, which in some situations can leave the system off and reduce the reliability of the power supply by the electric power utility.
Based on the results obtained from the ensemble learning models analysis, it was found that this type of structure was able to handle highly non-linear data to perform the prediction. Among the compared models, the bagging and boosting resulted in lower error values considering MSE and shorter time to convergence than the other models, with an MSE of 1.12 × 10⁻⁶ and 1.13 × 10⁻⁶, and a convergence time of 0.92 and 1.48 seconds, respectively.
It was also observed that the random subspace and bagging plus random subspace models obtained similar results, with an MSE of approximately 1.3 × 10⁻⁶. Since the results can vary considerably between structures, it is necessary to evaluate different structures to obtain promising results for forecasting time series. Comparatively, ensemble models are faster to converge than the LSTM in all variations of the approaches, considering the use of the best configuration for each one of them: the LSTM with a deep layer using the SGDM optimizer had a time to convergence of 30.93 seconds, being the fastest configuration in this category, while the bagging, boosting, and stacking ensemble models need less than 2 seconds to converge. Another point to be noted is that, despite the greater computational effort of the LSTM, its error values considering the MSE and RMSE were higher than those of the ensemble models, except for the stacking model, which resulted in a greater error than the LSTM. The best result using the LSTM was an RMSE of 1.14 × 10⁻³, while the bagging and boosting models had an RMSE of 1.06 × 10⁻³.
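The reported figures can be cross-checked with a short calculation, since the RMSE is the square root of the MSE. The sketch below uses only the values quoted in the conclusion; the speed ratio is a derived quantity and is not reported in the paper.

```python
import math

mse_bagging, mse_boosting = 1.12e-6, 1.13e-6   # MSE values quoted in the text
rmse_bagging = math.sqrt(mse_bagging)
rmse_boosting = math.sqrt(mse_boosting)

# Fastest LSTM convergence time divided by the bagging convergence time (seconds).
speed_ratio = 30.93 / 0.92

print(f"{rmse_bagging:.2e} {rmse_boosting:.2e} {speed_ratio:.1f}")
```

Both square roots round to 1.06 × 10⁻³, which agrees with the RMSE reported for the bagging and boosting models, and the timing figures correspond to the bagging model converging roughly 34 times faster than the fastest LSTM configuration.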
Considering the promising results presented in this work, it becomes feasible to use filters to further improve its processing capacity as high-frequency noise can increase the forecast error. The wavelet transform, among other filters, can reduce signal noise, and then improve the predictive power of the model. With this, there is room to use hybrid models in future works, thus improving signal prediction.