A comparative study of parameter optimization using swarm algorithms

Adjusting the parameters of a machine learning algorithm can be difficult when the search space of these parameters is large. In addition, if a sensitive parameter is not adjusted correctly, the impact on the final results can be severe, so adjusting it manually is not trivial. In order to adjust these parameters automatically, the current work proposes six models based on the use of optimization algorithms to automatically adjust the models' parameters. These models were built around two machine learning algorithms: an Extreme Learning Machine neural network and Support Vector Regression. The optimization algorithms used are Particle Swarm Optimization, the Artificial Bee Colony, and the Genetic Algorithm. The models were compared with each other based on predictive accuracy, measured by the Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE), and validated with statistical tests. The experimental results on ten datasets indicated that the optimized models improve the performance and robustness of the non-optimized models. Therefore, the automatic adjustment of parameters through optimization algorithms is a powerful tool for analyzing different contexts of data.


I. INTRODUCTION
Neural Networks (NN) are machine learning techniques that can be used to address a wide range of natural or artificial phenomena [1]. One of the main drawbacks of NNs is the lack of fast training algorithms. Usually, the training method used is based on gradient descent and seeks to adjust the weights of the hidden layers of the NN.
Huang [2] proposed a method for training single-hidden-layer feedforward neural networks (SLFNs), called the Extreme Learning Machine (ELM), to address the aforementioned problems. This algorithm tends to provide better generalization performance. It works by inverting, via the Moore-Penrose pseudoinverse, the output matrix of the hidden layer of the neural network, thus performing training that can be thousands of times faster than traditional methods [1].
Although ELM is fast and has good generalization performance, because the Moore-Penrose generalized inverse [3] is used to calculate the output weights from fixed input weights and hidden biases, different sets of input weights and hidden biases may yield different performance [4]. Moreover, ELM learning performance is unstable due to the randomness of the hidden layer biases and of the connection weights between the input and hidden layers; many ELMs perform relatively poorly during the training process [5].
As demonstrated by [6], a neural network with only one hidden layer can learn N distinct observations, with any nonlinear activation function, provided it has N neurons in the hidden layer. Since ELM networks follow this idea, selecting the best number of neurons can be a powerful tool in the search for better results in specific application contexts [7], [8]. Furthermore, the optimal adjustment of the parameters can help minimize the learning instability caused by ELM's random factor generation [8].
Even with the aspects mentioned above, a machine learning model may not achieve optimal results if its parameters are not correctly adjusted [9]. In the case of an ELM neural network, changing parameters such as the number of neurons in the hidden layer can be time-consuming if tests must be run at each adjustment. Since the ELM network has very fast training, building a parameter optimization around models that use this technique can significantly improve the final application.
However, parameter optimization is not restricted to ELM networks; it can also be generalized to regression algorithms whose hyperparameters can be adjusted by some technique. For example, the Support Vector Regression (SVR) algorithm can model linear and nonlinear problems; however, its training process can be time-consuming with large volumes of data [10].
Since an ELM network can provide a value to be minimized, such as the mean absolute error, the parameter adjustment problem can be modeled so that it is approached efficiently. With this in mind, optimization algorithms can automatically adjust parameters if a cost function is provided to guide the algorithm's steps. Three algorithms that share similarities but operate in different ways can be modeled for this purpose: Particle Swarm Optimization (PSO) [11], the Artificial Bee Colony (ABC) [12], and the Genetic Algorithm (GA) [13], [14].
PSO [11] is notable for its applicability to most optimization problems. The algorithm initializes a certain number of particles, each representing a potential solution, in a search space with defined limits; the particles then move according to the particle that provides the best value for the cost function being optimized.
ABC is an optimization algorithm proposed by Karaboga [12]. The algorithm is based on the intelligent behavior of a bee colony. Agents (bees) are separated into employed bees, onlookers, and scouts. These agents have their actions defined around the existence of food sources, which contain the information that guides the colony to the position where the selected cost function is best optimized.
GA is an algorithm used in optimization problems that operates on a population of agents carrying genetic information. These agents undergo the action of several genetic operators, such as selection and recombination, which aim to generate new agents with better characteristics and thus obtain better results through the optimization process [13].
Recently, many modified ELM algorithms have been proposed. Kaloop et al. [15] proposed a Particle Swarm Optimization-Extreme Learning Machine (PSO-ELM) to measure the performance of the base materials of a pavement structure. Zhang et al. [16] presented the Residual Compensation ELM (RC-ELM) for Device-Free Localization (DFL) and Gas Utilization Ratio (GUR) prediction in the blast furnace. Differently from RC-ELM, Zhang et al. [17] proposed the Integrated Multiple Kernel ELM (IMK-ELM) model to strengthen DFL performance in cluttered indoor environments using spatiotemporal information. In addition, Zhang et al. [18] proposed the Robust ELM (R-ELM) to improve modeling capability and robustness under Gaussian and non-Gaussian noise. Multilayer Extreme Learning Machines (ML-ELM), which accelerate the development of deep learning, were reviewed by Zhang et al. [19].
Algorithms inspired by nature have received a lot of attention for solving feature selection problems, among them: the Genetic Algorithm (GA), Memetic Algorithm (MA), Differential Evolution, Ant Colony Optimization (ACO), Bee Colony Optimization (BCO), Gravitational Search Algorithm, Flower Pollination Algorithm, Bat Algorithm, and Particle Swarm Optimization (PSO) [22]. Zhang, Song, and Gong [22] proposed a new swarm intelligence algorithm called the return-cost-based binary firefly algorithm (Rc-BBFA), which aims to improve the ability of the Firefly Algorithm (FFA) [23] to tackle feature selection problems.
This research study proposes six models that use the optimization algorithms (PSO, ABC, GA) combined with the ELM network and the SVR algorithm to automatically tune the parameters of the base algorithms and consequently improve the final results. The performance of these models is statistically evaluated to confirm the results obtained.
The paper is organized as follows: Section I presents an introduction with general contextualization of the problem and the objectives of the work. Section II presents related works. In Section III, the models proposed in the current work are explained and detailed. Section IV presents the measures used to evaluate model accuracy. Section V presents the analysis and results of the models. Finally, Section VI addresses the conclusion and future works.

II. RELATED WORKS
A non-exhaustive overview of the literature on parameter optimization using swarm algorithms is presented in Table 1. This table summarizes the strategy used by each author, the applied modeling techniques, the datasets used, and the evaluation metrics.
Kaloop et al. [15] developed a hybrid PSO-ELM, which was used to design a prediction model for the resilient modulus (Mr), or elastic modulus, that measures the performance of base materials in a pavement structure. The authors investigated two other hybrid models, PSO-ANN [24] and KELM [25], to validate and evaluate the performance of the PSO-ELM model. In the PSO-ELM model, the PSO was used to find the ideal values for the input weight matrix and the hidden bias matrix based on the minimum RMSE value, in order to achieve better ELM learning ability. The PSO-ELM model achieved better results, with less bias and greater precision, than the other two investigated models.
Zhang et al. [16] proposed a novel ELM for regression problems, the Residual Compensation ELM (RC-ELM), for Device-Free Localization (DFL) and Gas Utilization Ratio (GUR) prediction in the blast furnace. The proposed model employs a multilayer structure, with the baseline layer building the feature mapping between the input and the output and the remaining layers performing residual compensation layer by layer, iteratively. RC-ELM was proposed with the primary purpose of improving the modeling capability of the original ELM. Experimental results show that RC-ELM has better generalization performance and robustness than other machine learning approaches. Differently from RC-ELM, Zhang et al. [17] proposed the Integrated Multiple Kernel ELM (IMK-ELM) model to strengthen DFL performance in cluttered indoor environments using spatiotemporal information. IMK-ELM can integrate the temporal dynamics and spatial characteristics of different subregions of a specific cluttered indoor environment and build a DFL model through global optimization. Extensive experiments show that the proposed IMK-ELM-based DFL outperforms existing techniques in cluttered indoor environments.
Zhang et al. [18] proposed a Robust ELM (R-ELM) to improve modeling capability and robustness under Gaussian and non-Gaussian noise. A new R-ELM objective function is built, and a mixture of Gaussians is used to describe the noise characteristics and approximate the mapping between inputs and outputs. Comprehensive experiments were conducted, and the results show that R-ELM is superior to state-of-the-art machine learning methods on the selected benchmark datasets and in practical applications.
Multilayer Extreme Learning Machines (ML-ELM), which accelerate the development of deep learning, were reviewed by Zhang et al. [19]. The authors performed a complete review of ML-ELM development, including the stacked ELM autoencoder (ELM-AE), the residual ELM, and the local receptive field-based ELM (ELM-LRF), and discussed their applications. ML-ELM makes deep learning non-iterative and faster due to its random feature mapping mechanism. Furthermore, the combination of ELM and traditional deep learning methods can significantly improve the computational efficiency of deep learning.
Zhang et al. [5] proposed a modified Residual Extreme Learning Machine (R-ELM) to improve ELM's learning performance. R-ELM is a kind of ensemble method. L2 regularization [20] is applied to the R-ELM (yielding the RR-ELM) to address the problem of overfitting in the training process. In RR-ELM, L2 regularization reduces the contribution of each ELM, helping the R-ELM forget the noise of the training set. Experimental results demonstrate that the proposed RR-ELM and the R-ELM are more stable than a single ELM, and that the generalization ability of the RR-ELM is better than that of the R-ELM.
In High-Frequency Surface Wave Radar (HFSWR), the inaccuracy of azimuth estimation caused by wide beams severely limits detection ability. To address this, Zhang et al. [21] proposed a novel direction-of-arrival (DOA) estimation method based on an Extreme Learning Machine optimized by particle swarm optimization (PSO-ELM), with the objective of improving azimuth estimation accuracy for HFSWR. PSO optimizes the input weights and hidden layer biases of the ELM to obtain optimal parameters that enhance the estimation performance. Training the PSO-ELM network requires samples with accurate outputs, provided by a uniform linear receiver array with eight antennas, where adjacent antennas are spaced 14.5 m apart. The transmitted waveform is a linear frequency-modulated interrupted continuous wave (LFMICW) signal with a working frequency of 4.7 MHz and a coherent integration time (CIT) of 131.072 s. Network performance is evaluated with the Root Mean Squared Error (RMSE) and the Coefficient of Determination (R2). The experimental results show that the new method has a lower root-mean-square error and higher computational efficiency than typical DOA estimation methods.
Likewise, we conduct a comparative study between optimized and non-optimized models as detailed in the following section.

III. PROPOSED MODELS
This section introduces the concepts and details of each model, along with its structure and construction method. In this work, eight different models are proposed, divided into two categories: non-optimized models and optimized models. More details about each group of models are given as follows.

A. NON-OPTIMIZED MODELS
This group comprises the two models based on the basic algorithms without modification and with manually set parameters. Model 1 (Section III-A1) uses an ELM network, while Model 2 (Section III-A2) is based on the SVR algorithm. Details about each algorithm are given below.

1) Model 1: Basic Extreme Learning Machine
Model 1 consists essentially of a standard Extreme Learning Machine [2] network with empirically defined parameters. ELM is an algorithm for training Single-hidden-Layer Feedforward Networks (SLFNs) and has much faster training when compared with traditional methods [1].
According to Yu et al. [26], the ELM is built around the fact that neurons in the hidden layer do not need to be iteratively tuned. Furthermore, the training error ||Hβ − y|| and the weight norm ||β|| are minimized. Given a set of N observations (x_i, y_i), i ≤ N, with x_i ∈ R^p and y_i ∈ R, the ELM model is represented by

$$\hat{y} = \sum_{i=1}^{m} \beta_i \, f(w_i \cdot x + b_i) \qquad (1)$$

where m is the number of neurons in the hidden layer, β_i are the output weights, f is the activation function, w_i are the input weights, and b_i is the activation threshold. Assuming that the model describes the data perfectly, the relationship can be written in matrix form as Hβ = y. The ELM approach is based on randomly initializing w_i and b_i and computing the output weights as β = H†y, where H† is the Moore-Penrose [3] generalized inverse of the matrix H. Therefore, the ELM algorithm can be summarized in three main steps: (i) random generation of the hidden layer input weights and biases, (ii) calculation of the hidden layer output matrix H, and (iii) calculation of the Moore-Penrose generalized inverse of H.
Machine learning techniques have parameters that, in most cases, significantly affect their performance. We used the Extreme Learning Machine implementation available at [27] and applied the same parameters as the author, as shown in Table 2.
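To make the three steps above concrete, the following minimal sketch implements them with NumPy; the sigmoid activation, the Gaussian initialization, and the class interface are illustrative assumptions, not the implementation from [27].

import numpy as np

class ELM:
    """Minimal Extreme Learning Machine for regression (illustrative sketch)."""
    def __init__(self, n_hidden=100, seed=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden layer output H with a sigmoid activation f
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # (i) random generation of input weights w_i and biases b_i
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # (ii) hidden layer output matrix H
        H = self._hidden(X)
        # (iii) output weights via the Moore-Penrose pseudoinverse of H
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

Here, n_hidden is exactly the kind of sensitive parameter that the optimized models in Section III-B adjust automatically.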

2) Model 2: Support Vector Regression
SVR is a regression technique based on Support Vector Machines [28]. In the case of SVR for linear regression, f(x) is given by

$$f(x) = \langle w, \phi(x) \rangle + b \qquad (2)$$

where ϕ is some nonlinear function that maps the input space to a higher-dimensional (perhaps infinite-dimensional) feature space. Three parameters define an SVR model: the complexity parameter, denoted by C; the extent to which deviations are tolerated, denoted by epsilon (ϵ); and the kernel. These parameters directly influence the performance of the SVR [29].
As this model belongs to the non-optimized group, we used the model's default parameters, which can be seen in Table 3.
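As a reference for how these three parameters appear in practice, the sketch below fits an SVR on toy data with scikit-learn; the values shown are the library defaults assumed here, while the values actually used in Model 2 are those listed in Table 3.

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))                 # toy regression data
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

# The three SVR parameters discussed above: kernel, C, and epsilon
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X[:150], y[:150])
print(model.predict(X[150:])[:5])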

B. OPTIMIZED MODELS
Six models were proposed using the optimization algorithms combined with the ELM network and the SVR algorithm. The optimization algorithms were applied with the main objective of automatically tuning the parameters of the base algorithms and consequently improving the final results. Details about each model are given as follows.
1) Models optimized using the PSO
Models 3 and 4 are composed of an ELM network and an SVR algorithm, respectively, optimized using a PSO algorithm. In these models, each particle of the algorithm carries information about a possible best set of values for each selected parameter; with this information stored in each particle, these elements interact with each other in a search space defined by the minimum and maximum values of each parameter. The particles have their velocity and position iteratively updated using the following expressions:

$$v_{id} = w \, v_{id} + c_1 \, \mathrm{rand}() \, (p_{id} - x_{id}) + c_2 \, \mathrm{Rand}() \, (g_d - x_{id}) \qquad (3)$$

$$x_{id} = x_{id} + v_{id} \qquad (4)$$

where w is the inertia value, c_1 and c_2 are two positive constants, and rand() and Rand() are two functions used to generate random values in the range [0, 1]. x_i represents the i-th particle (x_i1, x_i2, ..., x_iD), the best position ever found by the i-th particle is represented by p_i = (p_i1, p_i2, ..., p_iD), the particle with the global best value ever found is represented by g, and the velocity of particle i is represented by v_i = (v_i1, v_i2, ..., v_iD). Since the particles update their position values based on the global best solution ever found, in the proposed models the position is defined using a vector with D dimensions, where D represents the number of parameters being adjusted. The parameters adjusted in Models 3 and 4 are presented in Tables 2 and 3. Figure 1 is the flow diagram of Models 3 and 4. In the initialization step, the parameters population size, initial and final inertia factor, cognitive and social coefficients, and maximum velocity are initialized.
The next step is to define the best individual and the best global particles, that is, the best position visited by each individual and the best particle ever found by the swarm, respectively, since the velocity of each particle depends on these references. When this operation is performed, the Mean Absolute Error is obtained; this value is used as the cost function and provides the fitness value of each particle.
The update velocity step defines the direction and magnitude of the velocity vector of each particle in the swarm. The velocity vector is calculated by Equation 3.
Then, the update position step moves the particle from its current position in the direction and magnitude of the velocity vector v_id, according to Equation 4.
In the evaluation step, each particle represents the hyperparameters of an ELM/SVR. After updating the position and velocity of all particles, the algorithm restarts the fitness evaluation step for each particle. This process is repeated a number of times pre-defined by the user.
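A condensed sketch of this loop is shown below; fitness is assumed to train an ELM/SVR with the candidate hyperparameters and return its MAE, and the bounds, coefficients, and population size are illustrative assumptions.

import numpy as np

def pso(fitness, lo, hi, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))   # positions
    v = np.zeros_like(x)                                   # velocities
    p, p_fit = x.copy(), np.array([fitness(xi) for xi in x])
    g = p[p_fit.argmin()]                                  # global best
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)  # Equation 3
        x = np.clip(x + v, lo, hi)                         # Equation 4
        f = np.array([fitness(xi) for xi in x])
        better = f < p_fit                                 # update personal bests
        p[better], p_fit[better] = x[better], f[better]
        g = p[p_fit.argmin()]                              # update global best
    return g, p_fit.min()

For example, pso(lambda z: mae_of_elm(int(z[0])), [1], [500]) would tune the number of hidden neurons of an ELM between 1 and 500, where mae_of_elm is a hypothetical helper that trains the network and returns its MAE on a validation set.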
2) Models optimized using ABC
Models 5 and 6 are composed of the ELM network and the SVR algorithm, respectively, combined with an ABC algorithm for parameter adjustment. Figure 2 shows the flow diagram of Models 5 and 6. Since the food sources are the elements that store the information about possible solutions in the ABC algorithm, in this adaptation each food source has a vector of D dimensions representing its position, where D is the number of parameters being adjusted. The parameters adjusted in Models 5 and 6 can be observed in Tables 2 and 3, respectively. In the ABC algorithm, agents called bees visit food sources that contain information about a possible solution for the problem being investigated. The bees are divided into three categories: employed bees, onlooker bees, and scout bees.
The employed bees visit a specific food source, obtain the information about its fitness (nectar), and seek a new position for that food source in the neighborhood; if the new position presents a better value for the cost function, the bee relocates the food source to the new position. If the bee fails, the food source has a counter increased, and this counter is used to track the quality of the information in the space around that food source.
After the employed bees, the onlooker bees visit specific food sources selected based on a probability distribution built around the fitness value of each food source, described by the expression

$$p_i = \frac{fit_i}{\sum_{n=1}^{N} fit_n} \qquad (5)$$

where fit_i is the fitness value of solution i, which is proportional to the nectar quantity of the food source at position i, and N represents the number of food sources.
After visiting the food source, the onlooker bees try to relocate the food source in the same way as the employed bees.
Both employed and onlooker bees use the following expression to relocate the food source to a new position:

$$v_{ij} = x_{ij} + \phi_{ij} \, (x_{ij} - x_{kj}) \qquad (6)$$

where i and k are indexes of food source positions and ϕ_ij is a random number in the range [−1, 1]; this way, the new position to be tested is generated in the neighborhood of the current food source.
The nectar of each food source is obtained through the MAE as the cost function, and the bees perform the tasks of analyzing the neighborhood of the food sources as usual, optimizing the result in the process and consequently adjusting the parameters.
After the onlookers, the scout bees look for food sources whose counter, which tracks the quality of the solutions in that area, has reached its limit. Those food sources are then abandoned, and the scout bees create new food sources to explore the search space and increase the possibility of finding an optimal solution.
The tasks performed by each group of bees are repeated until the algorithm reaches a limit defined by some stopping criterion.
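The cycle described above can be sketched compactly as follows, with fitness again returning the MAE; the colony size, the trial limit, and the conversion of the error into a nectar value via 1/(1 + f) are illustrative assumptions.

import numpy as np

def abc(fitness, lo, hi, n_sources=10, iters=100, limit=20):
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    rng = np.random.default_rng(0)
    x = rng.uniform(lo, hi, size=(n_sources, lo.size))   # food sources
    f = np.array([fitness(s) for s in x])
    trials = np.zeros(n_sources, dtype=int)

    def try_move(i):                                     # Equation 6
        k = rng.choice([j for j in range(n_sources) if j != i])
        phi = rng.uniform(-1, 1, size=lo.size)
        cand = np.clip(x[i] + phi * (x[i] - x[k]), lo, hi)
        fc = fitness(cand)
        if fc < f[i]:
            x[i], f[i], trials[i] = cand, fc, 0          # relocate the source
        else:
            trials[i] += 1                               # failed attempt

    for _ in range(iters):
        for i in range(n_sources):                       # employed bees
            try_move(i)
        probs = 1.0 / (1.0 + f)                          # MAE -> nectar (fit_i)
        probs /= probs.sum()                             # Equation 5
        for i in rng.choice(n_sources, size=n_sources, p=probs):
            try_move(i)                                  # onlooker bees
        for i in np.where(trials > limit)[0]:            # scout bees
            x[i] = rng.uniform(lo, hi)
            f[i] = fitness(x[i])
            trials[i] = 0
    best = f.argmin()
    return x[best], f[best]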
3) Models optimized using GA
Models 7 and 8 are composed respectively of an ELM network and the SVR algorithm combined with the GA for parameter adjustment. In the GA, the parameters being adjusted were encoded into a D-dimensional array representing the genetic material of each agent, where D represents the number of parameters being adjusted. The parameters adjusted in Models 7 and 8 can be observed in Tables 2 and 3, respectively.
The algorithm used in this work is composed of three stages: parent selection, crossover, and mutation. Figure 3 shows the flow diagram of Models 7 and 8. Parent selection chooses a subset of individuals from the population whose genetic material is crossed and recombined to generate new individuals. Since the algorithm aims always to obtain better values for the cost function being optimized, parent selection picks individuals that will increase the overall solution quality of the population. In the method used here, a subset of N random individuals is first selected, and the one with the highest fitness value is chosen as a parent. This cycle continues until some criterion is met.
After selecting the parents, the next step is to recombine their genetic material to generate solutions with better results. At the end of this step, a new set of child individuals is generated. These individuals are then compared with those in the current population and replace the individuals with the worst fitness values.
After these steps, the new population, with the new child individuals already inserted, is used to repeat all the stages until some stopping criterion is reached. Since the evolutionary mechanisms seek to reproduce the individuals with the best solutions and fitness values, the MAE is used as the cost function to evaluate the agents of the population. This value guides the entire evolutionary process and the creation of new generations of individuals, consequently improving the final results.
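A minimal sketch of the three stages is given below, assuming tournament parent selection, arithmetic crossover, and random-reset mutation; the specific operators, rates, and sizes are illustrative assumptions.

import numpy as np

def ga(fitness, lo, hi, pop_size=20, iters=50, tour=3, mut_rate=0.1):
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    rng = np.random.default_rng(0)
    pop = rng.uniform(lo, hi, size=(pop_size, lo.size))
    fit = np.array([fitness(p) for p in pop])
    for _ in range(iters):
        # Parent selection: winner of a random tournament of `tour` individuals
        idx = rng.integers(pop_size, size=(pop_size, tour))
        parents = pop[idx[np.arange(pop_size), fit[idx].argmin(axis=1)]]
        # Crossover: arithmetic recombination of shuffled parent pairs
        mates = parents[rng.permutation(pop_size)]
        alpha = rng.random((pop_size, 1))
        children = alpha * parents + (1 - alpha) * mates
        # Mutation: random reset of some genes within the search bounds
        mask = rng.random(children.shape) < mut_rate
        children[mask] = rng.uniform(lo, hi, size=children.shape)[mask]
        # Replacement: children substitute the worst current individuals
        child_fit = np.array([fitness(c) for c in children])
        merged = np.concatenate([pop, children])
        merged_fit = np.concatenate([fit, child_fit])
        keep = merged_fit.argsort()[:pop_size]
        pop, fit = merged[keep], merged_fit[keep]
    return pop[fit.argmin()], fit.min()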

IV. EXPERIMENTAL EVALUATION
The following three criteria were used as performance measures to evaluate the results predicted by the models used in this work.

A. MEAN ABSOLUTE ERROR
MAE is a measure that defines the distance between the predicted values (obtained through the models) and the real values. MAE is defined in Equation 7:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \qquad (7)$$

where y_i is the i-th value of the variable being predicted, ŷ_i its estimate, and y_i − ŷ_i the i-th residual.

B. MEAN SQUARED ERROR
Mean Squared Error (MSE) is the mean quadratic difference between the estimated values and the actual values, as denoted in Equation 8:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (8)$$

C. ROOT MEAN SQUARED ERROR
Root Mean Squared Error (RMSE) is defined as the square root of the MSE, as denoted in Equation 9:

$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (9)$$
The MAE is also used as the cost function to guide the optimization algorithms to search for the best solution.
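For clarity, the three measures can be computed as in the following sketch.

import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))      # Equation 7

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)       # Equation 8

def rmse(y, y_hat):
    return np.sqrt(mse(y, y_hat))          # Equation 9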
During the experiments, the holdout method was used for training the models, with the training set and the testing set comprising 75% and 25% of the data, respectively. Additionally, each experiment was repeated 50 times, and the metrics MAE, MSE, and RMSE were computed. The mean and standard deviation (SD) of all metrics were analyzed, considering the 50 simulations performed for each dataset. All simulations were performed using packages developed in the Python language, version 3.8.8, and executed in the Jupyter Notebook environment, running on an Intel(R) Core(TM) i7 2.1 GHz CPU, 8 GB RAM machine. The recoverable entries of Table 4 are:

TABLE 4. Datasets used in the experiments (recovered entries).
AutoMPG [37]: 9 attributes, 398 instances
Behavior of the urban traffic [38]: 18 attributes, 135 instances
Concrete Slump Test [39]: 10 attributes, 103 instances
Computer Hardware [40]: 9 attributes, 209 instances
Daily Forecasting [41]: 13 attributes, 60 instances
Energy Efficiency [42]: 8 attributes, 768 instances
Real Estate Valuation [43]: 7 attributes, 414 instances
Synchronous Machine [44]: 5 attributes, 557 instances

Table 5 shows the results of applying the eight algorithms to the ten real-world datasets listed in Table 4. The evaluation and comparison were performed using three statistical error measures (MAE, MSE, and RMSE). These results are expressed as the mean and, in parentheses, the SD of the performance metrics. In this study, the models that use optimization algorithms to automatically adjust the parameters are compared with the models without optimization, as described in Section III. The behavior of the optimized models is clearly better than that of the two non-optimized models.
As shown in Table 5, all the statistical measures were used to evaluate the error rate on the test sets. ELM PSO obtained the best results in all indices, with the lowest error rate for the three measures (MAE, MSE, and RMSE). The basic PSO is well suited to static, simple optimization problems and is computationally efficient. The other methods that used ELM as the base algorithm, ELM ABC and ELM GA, scored results close to ELM PSO.
Furthermore, the results obtained by the optimized algorithms in most cases showed a substantial gain in prediction accuracy compared with the non-optimized models. This is because function optimization is how error, cost, or loss is minimized when fitting a machine learning algorithm.
In a predictive modeling project, optimization is also performed during data preparation, hyperparameter tuning, and model selection. We can observe some of these cases in the Airfoil, Automobile, and AutoMPG datasets. The MAE results obtained by the non-optimized ELM model on these datasets were 5.4375, 17.3263, and 13842.0856, respectively. Compared with the optimized ELM models on the same datasets, we have 2.8918 (ELM PSO), 4.4937 (ELM ABC), and 4.4932 (ELM GA) for the Airfoil dataset; 0.8376 (ELM PSO), 0.9231 (ELM ABC), and 0.9090 (ELM GA) for the Automobile dataset; and 2.5019 (ELM PSO), 2.9431 (ELM ABC), and 3.0591 (ELM GA) for the AutoMPG dataset. All optimized ELM models improved accuracy compared with the non-optimized ELM model, with the ELM PSO model standing out, since it obtained better results in all datasets. The basic PSO algorithm is simple in concept, easy to implement, and robust in its control parameters, which gives an idea of the potential gains. Optimization methods are used in many areas of study to find solutions that maximize or minimize some parameter of interest, such as reducing the costs of producing a good or service, maximizing profits, minimizing the raw material used in developing a good, or maximizing production.
Another metric used to evaluate the models was the MSE, calculated from the differences between the real values and the predicted values; the RMSE is the square root of the mean of the squares of these differences. Again, the ELM PSO model obtained the best result in both metrics. We can also see that the standard deviation in some datasets had slight variation, as seen in the boxplots. Figure 4 shows the MAE boxplot for the Airfoil dataset: all models present outliers, except for the ELM model, whose boxplot is less asymmetrical than the others and whose median is closer to zero. Figure 5 shows the MAE boxplot for the Automobile dataset, consistent with the results in Table 5. Hypothesis tests were performed to validate the results and verify the statistical difference between the optimized and the non-optimized models. Observing the results, it is possible to notice the improvements in the final prediction errors obtained through the models that use the optimization algorithms to adjust parameters. This result gives an idea of the power of this kind of application and modeling.
Before performing the hypothesis test, the normality of the values was verified using the Shapiro-Wilk test [45]. Since the values do not follow a normal distribution, the Wilcoxon hypothesis test [46] was used, with a 5% significance level, on the MAE values. Equation 10 describes the hypothesis test:

$$H_0: \mu_1 \le \mu_2 \qquad H_1: \mu_1 > \mu_2 \qquad (10)$$

where H_0 represents the null hypothesis that the compared models present errors equal to or lower than those of the ELM PSO model, and H_1 represents the alternative hypothesis that the ELM PSO model has lower errors than the other models; µ_1 represents the error values of the models being compared and µ_2 represents the error values of the ELM PSO model. Table 6 presents the p-values for the tests. After obtaining the p-value for each compared model, we can confirm, with 95% confidence, that it is possible to reject the null hypothesis in all cases. Thus, it is possible to conclude that the performance of the ELM PSO method in terms of MAE is clearly superior to those of the other methods. The ELM PSO approach has been used increasingly due to several advantages, such as robustness, efficiency, and simplicity; compared with the other stochastic algorithms, ELM PSO required less computational effort than ELM GA and ELM ABC.
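This procedure can be reproduced with SciPy as in the sketch below; the two arrays are synthetic stand-ins for the 50 MAE values of a compared model and of the ELM PSO model.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
other_mae = rng.gamma(2.0, 1.0, size=50)                  # compared model (mu_1)
elm_pso_mae = other_mae - rng.uniform(0.0, 0.5, size=50)  # ELM PSO (mu_2)

_, p_norm = stats.shapiro(other_mae)                      # Shapiro-Wilk normality test
_, p_value = stats.wilcoxon(other_mae, elm_pso_mae,
                            alternative="greater")        # one-sided Wilcoxon test
reject_h0 = p_value < 0.05                                # reject H0 at the 5% level
print(p_norm, p_value, reject_h0)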

VI. CONCLUSION
Different algorithms can be used to tune the parameters of a model automatically, and techniques that can automatically adjust parameters can be powerful allies in improving the results of a given model. The current work seeks to demonstrate the efficiency of optimization algorithms in this kind of task. The experiments were conducted using the ELM network and the SVR algorithm as the base algorithms, with their parameters adjusted by the swarm algorithms PSO, ABC, and GA.
In our simulations, we conducted a series of experiments on ten well-known datasets from different contexts: Airfoil, Automobile, AutoMPG, Behavior of the urban traffic, Concrete Slump Test, Computer Hardware, Daily Forecasting, Energy Efficiency, Real Estate Valuation, and Synchronous Machine.
We compared the results of the non-optimized algorithms, ELM and SVR, with the optimized algorithms, ELM PSO, ELM ABC, ELM GA, SVR PSO, SVR ABC, and SVR GA. Our simulations showed that, in all datasets, the ELM PSO model performed best in all experimental evaluations in terms of MAE, MSE, and RMSE. The experimental results on ten datasets demonstrate that tuning a base algorithm with swarm algorithms is a highly competitive alternative for parameter optimization. Swarm algorithms draw on the collective intelligence of self-organized and decentralized systems, with self-organization and division of labour as two of their fundamental properties.
In summary, optimization algorithms can be used to obtain better results, showing that the automatic adjustment of parameters is a powerful tool for analyzing different contexts of data. Furthermore, the hypothesis tests achieved results that increase the confidence in the optimized models and confirm their robustness.
As future work, we plan to add more optimization algorithms to analyze the efficiency of each one, and to study parameter selection methods such as grid search [47] used together with the parameter tuning approaches developed in the current work.

ACKNOWLEDGMENT
This work was carried out with the support of the Coordination for the Improvement of Higher Education Personnel - Brazil (CAPES) - Finance Code 001.