A Dynamic Predictor Selection Method Based on Recent Temporal Windows for Time Series Forecasting

The development of accurate forecasting systems for real-world time series modeling is a challenging task. Due to the presence of temporal patterns that change over time, the adoption of a single model can lead to underperforming forecasts. In this scenario, Multiple Predictor Systems (MPS) emerge as an alternative to single models, which struggle to learn in the presence of temporal patterns that change over time. Dynamic prediction/ensemble selection is a special case of MPS in which each model is an expert in specific patterns of the time series. In dynamic selection, instead of combining all models, the most competent models are selected per test pattern. A commonly used criterion is to evaluate the models' performance in the region of competence, formed by the patterns in the in-sample set (training or validation sets) that are most similar to the test pattern. Thus, the quality of the region of competence is a key factor in the precision of the MPS. However, adequately defining the similarity criterion and the size of the region of competence is challenging and problem-dependent. Furthermore, there is no guarantee that similar data exist in the in-sample set. This paper proposes a dynamic selection approach entitled Dynamic Selection based on the Nearest Windows (DSNAW) that selects one or more competent models according to their performance in a region of competence composed of the nearest antecedent windows to the new target time window. This strategy assumes that the temporal windows closer to a test pattern behave more similarly to the target than in-sample data. An experimental study using ten well-known time series showed that DSNAW outperforms the literature approaches.


I. INTRODUCTION
Time series forecasting is a central task in many application areas, such as Economics [1], Seismology [2], Meteorology [3], Astronomy [4], Hydrology [5], and Engineering [6]. The development of accurate forecasting systems has been a central goal in the time series modeling area. However, due to the presence of different temporal patterns in real-world time series, the construction of accurate forecasting systems is a challenge. In this sense, classical approaches that focus on seeking the best single model to forecast the whole time series have serious drawbacks, such as the incorrect specification of the model's parameters [7]. Among the alternative approaches that have emerged, multiple predictor systems (MPS) have gained attention because they employ an ensemble of models instead of a single model to improve the system's accuracy [8]. Such an ensemble contains many models that are likely to capture different behaviors of the time series. MPS are composed of three phases [9]: Generation, Selection, and Integration (or Combination). In the Generation phase, a pool of models is generated using a training set. Afterwards, one or more models are selected using some criterion in the Selection phase. In the last phase, the prediction is obtained by combining the forecasts of the selected models.

(The associate editor coordinating the review of this manuscript and approving it for publication was Justin Zhang.)
The Selection phase can be performed during the system's training (offline) or during generalization (online). In either case, this phase plays a crucial role because it is related to the accuracy [10] and the computational complexity [11] of the MPS. Dynamic selection approaches aim to select a subset of the pool containing the most suitable models to predict each new test pattern [9], [12]-[14]. Thus, instead of using the same models for all new patterns, these approaches select a different subset of models per pattern, which is a suitable strategy for modeling real-world time series that exhibit dynamic behavior that changes over time [15]. However, selecting the most suitable subset of models for a given test pattern is complex since there is no well-established criterion [16].
Dynamic selection approaches commonly use the performance of the models in the Region of Competence (RoC) as a criterion to select the most competent ones [9], [12], [13], [17]-[19]. The RoC is composed of the k patterns in the in-sample set (training or validation sets) [12] that are most similar to the test pattern according to some measure, such as the Euclidean distance [20]. This strategy for populating the RoC is applied to different tasks such as classification [21], [22], regression [23], [24], and time series forecasting [9], [12]-[14], [18].
The accuracy of dynamic selection approaches is directly related to the quality of the region of competence, commonly defined using the k-nearest neighbors algorithm (k-NN) to find the k patterns most similar to the test pattern. However, guaranteeing that the optimal similarity measure is adopted for a given time series is a challenging task [25] because the parameter setting of the k-NN is problem-dependent [18], [26]. Moreover, it is difficult to assure the existence of patterns in the RoC with behavior similar to the new test patterns (time windows), due to noise, the available amount of data, or the absence of similar data [27]. In the time series context, the data distribution can change over time [28], [29]; consequently, the most suitable model to forecast a new test pattern can also change [30].
In this paper, we claim that the temporal windows closest to the new test pattern tend to behave more similarly to the target than the RoC defined by the traditional k-NN approach. To validate this hypothesis, the Kolmogorov-Smirnov (KS) test [31] is applied to compare the data distributions produced by the proposal against those of the traditional approach. Based on this claim, we propose a new dynamic selection approach, entitled Dynamic Selection based on the Nearest Windows (DSNAW), that defines the region of competence using the nearest windows to the new test sample. DSNAW consists of four phases: i) definition of the region of competence using the nearest antecedent windows to the new time window; ii) evaluation of each model in the region of competence using a forecasting measure; iii) construction of a ranking of the models by their performance; iv) forecasting of the point using the selected models.
The proposed approach is evaluated in one-step-ahead forecasting using ten real-world time series. The proposal's performance is compared to traditional and state-of-the-art approaches in terms of seven well-known forecasting error measures: Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), Symmetric Mean Absolute Percentage Error (SMAPE), Normalized Root Mean Square Error (NRMSE), Root Mean Square Error (RMSE), Average Relative Prediction Error Variation (ARV), and Mean Absolute Error (MAE) [6], [32]. The Dynamic Selection based on the Nearest Antecedent Windows (DSNAW) is proposed based on the assumption that the temporal windows closest to the new test pattern are promising candidates to compose the region of competence. The main advantages of the proposed approach are:
• The creation of the region of competence is based on the assumption that the temporal windows closest to the new test pattern are promising candidates to compose it;
• DSNAW automatically finds the number (k) of time windows that compose the region of competence (RoC) for each data set;
• DSNAW also automatically defines the number (n) of forecasting models used in the test phase, so it can choose the best approach between Dynamic Predictor Selection and Dynamic Ensemble Selection for each data set;
• DSNAW selects the most suitable way to combine the forecasting models' outputs when more than one model (n > 1) is selected.
The remainder of the paper is organized as follows. Section II formulates the problem and presents an experiment that analyzes the distribution of the region of competence generated by the patterns of the in-sample set. Section III introduces the proposed approach. Section IV describes the methodology used in the experiments. Section V presents a comparative study involving ten real-world time series and the respective discussion. Section VI shows the concluding remarks and future work.

II. PROBLEM DEFINITION
Dynamic selection approaches [9], [12], [13], [17] for time series forecasting have been proposed with the aim of improving the accuracy of MPS. These approaches are based on [23], which seminally proposed a dynamic selection approach for regression tasks, referred to herein as Dynamic Selection by Local Accuracy (DS-LA). In the time series forecasting area, dynamic selection approaches suppose that the most suitable model to forecast the future value z_{t+1}, given the new time window w_t = {z_t, z_{t-1}, z_{t-2}, ..., z_{t-n+1}} with n lags (previous records), is the model with the best performance in the neighbourhood of w_t. This neighbourhood, called the region of competence (RoC), is composed of the k time windows from the training or validation set most similar to w_t.
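As a minimal illustration of the two RoC strategies discussed here, the sketch below builds time windows from a series and defines both neighbourhoods. The function names and array layout are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def make_windows(series, n_lags):
    """Build overlapping time windows; window i predicts series[i + n_lags]."""
    windows = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    targets = series[n_lags:]
    return windows, targets

def literature_roc(test_window, in_sample_windows, k):
    """Literature RoC: the k in-sample windows most similar to the test
    window according to the Euclidean distance (k-NN style)."""
    dists = np.linalg.norm(in_sample_windows - test_window, axis=1)
    return in_sample_windows[np.argsort(dists)[:k]]

def antecedent_roc(windows, t, k):
    """Proposed RoC: the k windows immediately preceding window w_t."""
    return windows[max(0, t - k):t]
```

Note that `antecedent_roc` simply slices out the k previous windows, whereas `literature_roc` must search the whole in-sample set for each new pattern.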
The definition of the RoC plays an important role since its quality can be a limiting factor for the algorithm's performance. As in classification problems, it is not possible to guarantee the existence of patterns really similar to the test sample in the region of competence, since this depends on the distribution of the data set used [27]. Besides, the most suitable similarity measure to define the region of competence for a given temporal behavior is unknown. Figure 1 shows a new time window w_t and its RoC composed of three temporal windows w_x, w_y, and w_z. This RoC was defined using the assumption adopted by literature approaches, named ''Literature RoC''. According to a given distance measure, the Literature RoC searches for the temporal patterns in the training and/or validation sets most similar to the new time window. We claim that this strategy is not appropriate for time series forecasting since it may select a weak model to predict z_{t+1}. In contrast, we suppose that selecting the temporal windows just before w_t can produce better results; this new region of competence is composed of w_{t-1}, w_{t-2}, and w_{t-3}, as shown in Figure 1. It is reasonable to consider that the best model to forecast z_{t+1} changes over time according to the data distribution. Based on this, we hypothesize that the k previous temporal windows (w_{t-1}, w_{t-2}, ..., w_{t-k}) and w_t have a high probability of belonging to the same distribution. Thus, the best forecasting model for the k closest previous windows is a promising candidate to forecast z_{t+1}, since the behavior of a time series can be described through its probability distribution function (PDF) over time [33]. The PDF, denoted here as F(w_t), describes the probability that an observation of the time series falls into a range of values. Hence, any change in F(w_t) affects F(z_{t+1}|w_t), since w_t and z_{t+1} are described by the same distribution.
Hence, a model that attains high performance for w_t is a promising candidate to forecast z_{t+1}. Moreover, this candidate changes depending on w_t, which is a desirable property since the data distribution of real-world time series is expected to change over time [28].
To evaluate the proposed assumption, an experiment was performed to compare the data distribution of the new patterns with the Literature RoC and the Proposed RoC. Ten time series 1 with different behaviors (described in Table 1) are used in this evaluation. Each time series was divided into three sets: the first 50% of the points for the training set, the next 25% for the validation set, and the last 25% for the test set. For this experiment, the training set is defined as in-sample, the validation set is used as out-of-sample, and the test set is not considered.
For the experiment, each window (w_t) contains 20 records from the out-of-sample set, i.e., 20 time lags used to forecast z_{t+1}. Thus, in a data set with 100 records, 80 windows are generated. It is important to stress that two neighboring windows (w_t and w_{t+1}) differ by only one record. For each w_t of the out-of-sample set, two regions of competence, LT_t and PR_t, are generated. Each region of competence is composed of 10 windows (k = 10), where LT_t = (w_{m_1}, w_{m_2}, ..., w_{m_k}) represents the windows from the in-sample set selected by the literature assumption [9], [12], [13], [17] (using the Euclidean distance), and PR_t = (w_{t-1}, ..., w_{t-k}) contains the k closest windows to w_t selected by the proposed assumption. The Kolmogorov-Smirnov (KS) test [31] was chosen to compare the data distributions. The KS test is used to identify similar patterns in data and to detect differences in position, dispersion, or shape of the distributions of two samples [34]. This test is appropriate for samples with sizes varying from 10 to 50 records [35]. The null hypothesis indicates that both samples have the same distribution, regardless of which distribution that is.
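A numpy-only sketch of this comparison follows, together with the occurrence count described next. It assumes the standard two-sample KS statistic with the usual large-sample critical value at α = 0.05; the paper's exact test configuration is not detailed here:

```python
import numpy as np

def ks_same_distribution(a, b, c_alpha=1.358):
    """Two-sample Kolmogorov-Smirnov check: return 1 if the null hypothesis
    (same distribution) is NOT rejected at alpha = 0.05, else 0."""
    a, b = np.sort(a), np.sort(b)
    pooled = np.concatenate([a, b])
    # Empirical CDFs of both samples evaluated on the pooled values.
    cdf_a = np.searchsorted(a, pooled, side='right') / len(a)
    cdf_b = np.searchsorted(b, pooled, side='right') / len(b)
    d = np.max(np.abs(cdf_a - cdf_b))
    # Smirnov's large-sample critical value; c_alpha = 1.358 for alpha = 0.05.
    threshold = c_alpha * np.sqrt((len(a) + len(b)) / (len(a) * len(b)))
    return int(d <= threshold)

def occurrence(test_window, roc_windows):
    """Count how many of the k RoC windows share w_t's distribution."""
    return sum(ks_same_distribution(test_window, w) for w in roc_windows)
```

Averaging `occurrence` over all out-of-sample windows, separately for LT_t and PR_t, yields the per-series statistic compared in the experiment.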
The KS test is applied to compare w_t with both regions of competence (LT_t and PR_t) separately. For example, given the test pattern w_t and the region of competence LT_t, w_t is compared with each pattern in LT_t, and it is counted how many times (out of k, the size of the region of competence) they have the same distribution. This formulation is defined in Equation 1:

O(Θ_t) = Σ_{j=1}^{k} KS(w_t, w_j)   (1)
where Θ_t can be LT_t or PR_t, k is the number of patterns in Θ_t, and the function KS(w_t, w_j) outputs 1 if w_t and w_j have the same distribution; otherwise, the output is 0. Figure 2 shows the experimental results, where the higher the Grand Mean of Occurrence in percent (GMC), the more selected windows have the same distribution as w_t. For most of the time series, the RoC defined by the proposed assumption obtained better results. In other words, the proposed hypothesis selected more windows with the same distribution as w_t than the literature assumption. Besides selecting more promising windows, the proposal is also computationally cheaper, because the search for the region of competence is straightforward.

III. PROPOSED APPROACH
Figure 3 shows the proposed architecture, composed of three phases: generation, dynamic selection, and combination. The first phase generates a pool of forecasting models P, trained using the training dataset (φ). Then, the dynamic selection phase chooses a relevant subset of models (P′ ⊂ P) for each test pattern (w_t ∈ γ). If more than one predictor is selected, their outputs ẑ¹_{t+1}, ..., ẑⁿ_{t+1} are combined in the last phase; if only one predictor is selected, the prediction for w_t is given by that predictor. These three phases are detailed in the next sections.

A. GENERATION
The generation phase's objective is to create a diverse pool of forecasters that can model different time series behaviors. This diversity can be generated using two main strategies [36]: employing different samples for training each model or using different models trained from the same training sample.
Herein, we decided to generate a homogeneous pool in which all the predictors are trained with the same learning algorithm, using Bagging [37]. Since Bagging performs random sampling with replacement to populate each training data set, diversity is achieved by using different samples to train the models.
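The generation phase can be sketched as below, using bootstrap sampling as in Bagging. As an assumption for a self-contained example, a simple least-squares autoregressor stands in for the SVRs the paper actually trains; any regressor with `fit`/`predict` methods would fit the same slot:

```python
import numpy as np

class LinearAR:
    """Stand-in forecaster: linear autoregression fit by least squares.
    (Illustrative only; the paper's pool is built from SVRs.)"""
    def fit(self, X, y):
        Xb = np.hstack([X, np.ones((len(X), 1))])  # add bias column
        self.coef_, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return self

    def predict(self, X):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        return Xb @ self.coef_

def generate_pool(X, y, pool_size, seed=0):
    """Bagging: train each model on a bootstrap sample (with replacement)."""
    rng = np.random.default_rng(seed)
    pool = []
    for _ in range(pool_size):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        pool.append(LinearAR().fit(X[idx], y[idx]))
    return pool
```

Each bag has the same size as the training sample, so models differ only through the bootstrap resampling, which is what produces the pool's diversity.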

B. DYNAMIC SELECTION
The dynamic selection phase consists of selecting a single model or an ensemble of models per new time window (w_t). Algorithm 1 and Figure 4 show the steps of the proposed Dynamic Selection based on the Nearest Antecedent Windows (DSNAW) algorithm.
The first step of DSNAW is the definition of the region of competence PW_t, composed of the k previous windows PW_t = (w_{t-1}, ..., w_{t-k}) closest to w_t (''Region of Competence Definition'' module in Figure 4 and line 1 in Algorithm 1). This strategy of selecting the closest windows is based on the hypothesis presented in Section II, which shows that these windows represent a better choice than the most similar windows, as selected by the DS-LA algorithm.
In the second step (''Models Selection'' module in Figure 4 and lines 2-9 in Algorithm 1), all the models in P are evaluated in the region of competence. Thus, each model p ∈ P is employed to predict the patterns in the region of competence PW_t. Each model's accuracy is calculated using the Sum of Absolute Errors (SAE) [38] metric, chosen for its robustness and reliability [39]. A ranking of the models in ascending order of SAE is returned (line 8 in Algorithm 1); the lower the SAE value, the better the model's accuracy.
The selection of the n best models, i.e., the n models having lower SAE (line 9 in Algorithm 1), can work in two different ways: dynamic predictor selection, in which only the best model is selected (in this case, n = 1), or dynamic ensemble selection, in which a set (n > 1) with the best models of the ranking is chosen.
In the last step (''Forecasting'' module in Figure 4 and lines 10-16 in Algorithm 1), each of the n selected models is applied to predict the value for w_t, generating the forecasts ẑ¹_{t+1}, ..., ẑⁿ_{t+1}.
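Putting the selection and forecasting steps together gives the sketch below. Model objects are assumed to expose a `predict` method; SAE is computed exactly as described, and the combiner (median here) is an assumption for illustration:

```python
import numpy as np

def sae(model, roc_windows, roc_targets):
    """Sum of Absolute Errors of one model over the region of competence."""
    return np.sum(np.abs(model.predict(roc_windows) - roc_targets))

def dsnaw_select(pool, roc_windows, roc_targets, n):
    """Rank the pool by SAE on the RoC (ascending) and keep the n best models."""
    ranking = sorted(pool, key=lambda m: sae(m, roc_windows, roc_targets))
    return ranking[:n]

def dsnaw_forecast(selected, w_t, combiner=np.median):
    """Forecast z_{t+1}: a single model's output if n == 1, otherwise the
    combination of the selected models' outputs."""
    preds = np.array([m.predict(w_t.reshape(1, -1))[0] for m in selected])
    return preds[0] if len(preds) == 1 else combiner(preds)
```

With n = 1 this behaves as Dynamic Predictor Selection; with n > 1 it behaves as Dynamic Ensemble Selection and defers to the combiner.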

C. COMBINATION
In the last phase, the final forecast ẑ_{t+1} is generated. If an ensemble (n > 1) was dynamically selected in the selection phase, the final prediction is given by the combination of the outputs of all models in the ensemble. The combination can be performed using different strategies, such as the average or the median [8], [29]. If only one model is selected (n = 1), its forecast is the final forecast.

IV. EXPERIMENTAL PROTOCOL
An experimental evaluation of the proposed approach was conducted with a set of ten univariate time series (described in Table 1): Goldman Sachs, Sunspot, Star Brightness, Amazon, Apple, Microsoft, Vehicle, Wine, Pollution, and Electricity [40], [41]. 2 These time series were chosen because they are widely used in the literature and have distinct behaviors regarding the presence or absence of seasonality, stationarity, and trend [16]. Figure 5 shows the plot of each time series used in the experimental evaluation. Each time series was normalized into the interval [0, 1]. Then, the observations are organized into sliding time windows composed of a maximum of 20 lags [42], where the lags are selected using the autocorrelation function (ACF) [43]. Each time series was split into three sequential samples with the following proportions: 50% for training, 25% for validation, and the last 25% of the observations for testing.
The bagging algorithm [37], which performs sampling with replacement, was used to generate a diverse pool. Each bag has the same size as the training sample and was split into 67% for training the model and 33% for validating it.
The forecasting model pool is composed of 100 Support Vector Regressors (SVRs), based on [9]. SVR is a stable and robust model [44] that has attained significant accuracy in forecasting tasks [40], [45]. A grid-search approach was employed to select the best combination of hyper-parameters for each model. The DSNAW has two parameters: k (size of the region of competence) and n (number of selected models). For both parameters, the values are defined in the range [1, 20]; these values expand the interval evaluated in the literature [46], [47]. When n = 1, one forecasting model is selected (Dynamic Predictor Selection task), and when n > 1, an ensemble of models is selected (Dynamic Ensemble Selection task). In the latter case (n > 1), the average and median operators are employed in the combination phase. These combination operators are commonly employed because they are fairly simple and obtain accurate forecasts [8], [29]. The parameters k and n and the combination operator are selected using the validation set for each time series (Table 3).
The DSNAW was compared against 12 literature approaches. These approaches can be grouped into four main categories, described in the next paragraphs.
Single Models group is composed of the traditional statistical models and the Support Vector Regression (SVR) model [48]:
• ARIMA [49]: the ARIMA parameters were estimated for each data set using the auto-ARIMA Python library [50];
• Exponential Smoothing (ETS) [51]: the ETS parameters were estimated using the Statsmodels library for Python [52];
• Random Walk Forecasting (RW) [53]: the RW model was modeled as an ARIMA(0,1,0). Its parameters were estimated through the Statsmodels library for Python [52];
• SVR: referred to herein as Monolithic. The hyper-parameters were selected using a grid search (values in Table 2).
Dynamic Predictor Selection class comprises the following approaches:
• DS-LA (1): dynamic selection approach that employs parameters from the literature [46], [47]. The model is selected based on its performance in the region of competence composed of the time windows from the in-sample set that are most similar to the test pattern;
• Temporal-window Framework (TWF) [54]: approach that trains the models using specific partitions of the time series. Given a test pattern, TWF selects the model trained on the partition most similar (calculated via the dynamic time warping algorithm [55]) to this new pattern. The TWF parameters were defined based on [54].
Full Pool group is formed by the approaches that combine all forecasts of the pool generated by the Bagging approach:
• Bagg_A [37]: Bagging of SVRs combined by average;
• Bagg_M [37]: Bagging of SVRs combined by median.
Dynamic Ensemble Selection class is composed of the approaches that select an ensemble and combine its outputs:
• DS-LA: dynamic selection approach [46], [47] employing parameter values selected on the validation sample for each data set;
• DES_A: Dynamic Ensemble Selection combined by mean, using parameters defined in the literature [23];
• DES_M: Dynamic Ensemble Selection combined by median, using parameters defined in the literature [23];
• DES-PALR [9]: DES - Predictor Accuracy over Local Region selects the set of forecasting models with the highest performance in the cluster (region of competence) whose center is closest to the new test pattern.
All models were assessed in one-step-ahead forecasting using the test sample. The evaluation of the results was performed using seven well-known error measures [6], [32].
The following nomenclature is used in the definition of the performance metrics (Equations 2 to 8): N is the number of observations, output_t is the value predicted by the model, target_t is the actual value of the time series, and t corresponds to the respective time in the sample. The average, maximum, and minimum values of the sample are represented by target_avg, target_max, and target_min, respectively.
MSE is a measure commonly used in the literature [32] to evaluate forecasting models and is defined by the following equation:

MSE = (1/N) Σ_{t=1}^{N} (target_t - output_t)²   (2)

RMSE is the square root of the MSE value. The RMSE results are in the scale of the time series and hence can be more interpretable than MSE:

RMSE = √MSE   (3)

The normalized RMSE (NRMSE) is used to compare the performance of models on time series with different value ranges:

NRMSE = RMSE / (target_max - target_min)   (4)

MAPE computes the average forecast error percentage regardless of the scale of the values and indicates the forecast error margin in percentage:

MAPE = (100/N) Σ_{t=1}^{N} |(target_t - output_t) / target_t|   (5)

SMAPE evaluates the percentage of average absolute error independently of the scale of the values:

SMAPE = (100/N) Σ_{t=1}^{N} |target_t - output_t| / ((|target_t| + |output_t|) / 2)   (6)

ARV is a relative metric that compares the model's forecast with forecasting the time series by its mean. If the ARV value is less than 1, the model is more accurate than the mean; if it is greater than 1, the model is worse than forecasting by the mean; if it is equal to 1, the model is equivalent to the mean:

ARV = Σ_{t=1}^{N} (target_t - output_t)² / Σ_{t=1}^{N} (target_t - target_avg)²   (7)

MAE is commonly applied to measure the error of the model on average:

MAE = (1/N) Σ_{t=1}^{N} |target_t - output_t|   (8)
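The seven measures can be implemented directly from the notation above. In this sketch, ARV compares against forecasting by the series mean, as the text describes; the function name is illustrative:

```python
import numpy as np

def forecast_metrics(target, output):
    """Compute the seven error measures used in the evaluation."""
    err = target - output
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    nrmse = rmse / (target.max() - target.min())
    mape = 100.0 * np.mean(np.abs(err / target))
    smape = 100.0 * np.mean(np.abs(err) / ((np.abs(target) + np.abs(output)) / 2))
    # ARV: model error relative to forecasting by the target mean.
    arv = np.sum(err ** 2) / np.sum((target - target.mean()) ** 2)
    mae = np.mean(np.abs(err))
    return {'MSE': mse, 'RMSE': rmse, 'NRMSE': nrmse, 'MAPE': mape,
            'SMAPE': smape, 'ARV': arv, 'MAE': mae}
```

Note that MAPE is undefined when the target contains zeros, one reason SMAPE and the scale-free ARV are reported alongside it.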

VOLUME 9, 2021
The Diebold-Mariano (DM) [56] statistical test was applied to verify whether the proposed approach has performance statistically equal (p-value > 0.05) to the literature approaches. If the performances are statistically different (p-value ≤ 0.05), the best model is the one with the lowest MSE value. Equation 9 was used to measure the percentage difference in performance between the proposed approach and the literature approaches, where ε_a is the error of the base method and ε_b is the error of the proposed approach. The proposed approach is better than the base method when the resulting value is positive, meaning the error of the base method (ε_a) is higher than the error of the proposed approach (ε_b). This measure is interesting because it quantifies the difference in performance between two approaches [16].
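Equation 9 itself is not reproduced in this text; under the description given (ε_a the base method's error, ε_b the proposal's, positive when the proposal wins), a plausible form is:

```python
def percentage_difference(eps_a, eps_b):
    """Percentage gain of the proposed approach over a base method.
    Positive when the base error eps_a exceeds the proposed error eps_b.
    (Reconstructed from the text's description, not copied from Equation 9.)"""
    return 100.0 * (eps_a - eps_b) / eps_a
```

For example, a base MSE of 2.0 against a proposed MSE of 1.0 gives a 50% gain, while the reverse comparison yields a negative value.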
The source code was implemented in the Python programming language using the Sklearn [57] library and executed on a computer with an Intel Core i7-7500 CPU and 20 GB of RAM.

V. EXPERIMENTAL RESULTS
Table 4 shows the results of the proposal (DSNAW) and the literature approaches using seven well-known performance metrics: MSE, MAPE, ARV, MAE, RMSE, NRMSE, and SMAPE. The DSNAW achieved better values in most performance metrics than single models in 6 out of 10 data sets. For the Apple, Pollution, Sunspot, and Wine series, the proposed method attained lower performance than at least one single model. In the Apple series, RW obtained better performance than DSNAW. For the other data sets, the RW, Monolithic, and ARIMA models attained better measure values for Pollution, Sunspot, and Wine, respectively.

Compared with the Dynamic Predictor Selection approaches (DS-LA (1) and TWF), DSNAW reached the best overall results in 9 out of 10 series. DS-LA (1) attained better error metric values only for the Star series. TWF attained better values only for specific metrics in some data sets, such as ARV in Apple, Pollution, and Vehicle.
Regarding the ensemble approaches, the proposed method reached smaller measure values in 9 out of 10 data sets, obtaining a worse overall performance than the two Full Pool approaches (Bagg_A and Bagg_M) only for the Star series.
Concerning the four Dynamic Ensemble Selection approaches employed in the comparison, DSNAW attained the best general performance in 9 out of 10 time series; the DS-LA, DES_A, DES_M, and DES-PALR approaches reached the best accuracy, in terms of the set of measures, only for the Star series. Table 5 shows the percentage difference in performance between DSNAW and the other approaches in terms of MSE, using Equation 9. The proposed approach attained a positive gain compared with the other approaches in most of the time series evaluated, reaching notable performance in several cases. For instance, DSNAW attained a percentage gain greater than 80% over most of the literature approaches for the Amazon, Apple, Electricity, Goldman, Microsoft, and Wine series. For the Pollution and Vehicle sets, the proposed approach reached a gain greater than 50% over most literature approaches.
In the Star and Sunspot series, poorer performance was reached, with DSNAW losses of up to 28%. We can associate the performance percentage difference of the proposed approach with its ability to select windows with the same distribution for the region of competence. In most time series where the literature dynamic selection approaches have a GMC value smaller than 70% (see Figure 2), the DSNAW performance is more than 50% better than that of the other methods. Based on these results, we can correlate the high performance of the proposed approach with the generation of a region of competence with a high occurrence of windows with the same behavior. Table 6 shows the statistical comparison between DSNAW and the literature approaches using the DM hypothesis test. The symbols ''+'' and ''−'' mean that DSNAW attained better and worse accuracy than the competitor, respectively; the remaining symbol indicates statistically equal accuracy. The last three rows summarize the comparison: Win represents the number of time series in which the proposed approach obtained statistically better results (p-value less than 0.05), Loss represents the number of times the proposed approach obtained worse results than the literature approach, and Tie shows in how many time series the performances of the proposed approach and the literature approach were similar. The proposed approach attained a statistically better or equal MSE value than the literature approaches in 8 out of 10 data sets. For the Apple series, DSNAW obtained a lower performance only with respect to the RW model. For the Star series, the Bagg_A, Bagg_M, DES_A, and DES_M approaches reached a better MSE value than the proposed approach. Thus, considering the 120 direct comparisons performed with the literature approaches (10 data sets x 12 approaches), DSNAW obtains 91 wins, 24 draws, and 4 losses. This result shows the significant performance attained by the proposed approach. Figures 6 and 7 show the forecasts of DSNAW and the two best literature approaches for the Star and Amazon series, respectively.
Although there are differences in the performance metrics for the Star series, Figure 6 shows that the forecasts of the selected models are very close. Indeed, Table 4 shows that the forecasting approaches based on ML models attain similar results. On the other hand, Figure 7 shows a significant difference in the forecasts of the models for the Amazon series: the approaches generate clearly distinct forecasts for its test set.

A. DISCUSSION
In the previous section, DSNAW presented promising results in terms of accuracy with respect to several literature approaches. From a general view, MPS (ensemble-based approaches) attained better results than single models. Tables 4, 5, and 6 show the superiority of these MPS-based approaches and corroborate the literature findings [9], [12]-[14], [18]. From the analysis of the accuracy of the MPS, it is possible to verify that the DPS and DES approaches present higher variability in accuracy than the full-pool ensembles. As all MPS used in this work employed the same pool, it can be inferred that this variability is related to how the forecasting models are chosen. This selection is performed from the RoC, and its definition is closely related to system accuracy, as highlighted in the literature [9], [12], [13], [17].
The literature approaches create the RoC using temporal windows of the training and/or validation data according to some similarity criterion. However, there is no guarantee of the existence of patterns really similar to the new test patterns, since the data can be noisy or the phenomenon generating the data may have changed over time. On the other hand, DSNAW creates the RoC using the available temporal records closest to the test pattern. The objective is to increase the chance of using temporal windows more similar to the test pattern, so that an accurate forecasting model in the defined RoC is a promising candidate to forecast the test pattern. This strategy can overcome the issues mentioned earlier, since the RoC is generated from the latest temporal data.
Moreover, DSNAW is an adaptable proposal, since it defines which dynamic selection approach is the most promising, DPS or DES, based on their performance on the validation set.

VI. CONCLUSION
This paper presented a novel dynamic selection algorithm for time series forecasting called Dynamic Selection based on the Nearest Windows (DSNAW, for short). DSNAW is a multiple predictor system that selects the best models per query instance instead of combining all the models.
One key point in dynamic selection algorithms is the definition of the region of competence, a set composed of time windows that belong to the validation or training set. This region of competence is employed to select the best models per test pattern. A common alternative to select the time windows for the region of competence is to search for the patterns that have a minimum distance to the test pattern, as used in [23]. DSNAW is based on the assumption that the time windows close to the test pattern are more likely to have similar behavior to the test pattern than the previous strategy. Thus, a model that performs well in the region of competence formed by the closest windows to the test pattern is a promising candidate to forecast it.
DSNAW employs this new way of defining the region of competence jointly with choosing the most suitable approach, Dynamic Predictor Selection or Dynamic Ensemble Selection, for each data set under study. The experimental evaluation was carried out using ten real-world time series. The proposed approach reached better performance values than the literature models (single models, ensembles, and dynamic selection approaches) in most of the time series evaluated. These promising results show that DSNAW selected more competent models per test pattern than its competitors, since it better defines the composition of the region of competence.
As future work, given that choosing between the two contexts of DSNAW (DPS or DES) is time-dependent, we intend to propose a meta-test that aims to select one context or the other depending on the test pattern under analysis.