Maximum Visibility: A Novel Approach for Time Series Forecasting Based on Complex Network Theory

This article presents Maximum Visibility Approach (MVA), a new time series forecasting method based on the Complex Network theory. MVA initially maps time series data into a complex network using the visibility graph method. Then, based on the similarity measures between the nodes in the network, MVA calculates the one-step-ahead forecasts. MVA does not use all past terms in the forecasting process, but only the most significant observations, which are indicated as a result of the autocorrelation function. This method was applied to five different groups of data, most of them showing trend characteristics, seasonal variations and/or non-stationary behavior. We calculated error measures to evaluate the performance of MVA. The results of statistical tests and error measures revealed that MVA has a good performance compared to the accuracy obtained by the benchmarks considered in this work. In all cases, MVA surpassed other forecasting methods in Literature, which confirms that this work will contribute to the field of time series forecasting not only in the theoretical aspect, but also in practice.


I. INTRODUCTION
Among the strands in the field of Time Series Analysis is the creation of methods to study time-correlated data and make predictions [1]. This is a dynamic research area that has attracted the attention of the scientific community over the past few decades [2]. Applications of time series modeling and analysis encompass several fields of science, including: medicine [3], robotics [4], cyber defense [5], defense strategy [6], army mission analysis [7], finance [8], [9], social sciences [10], economics [11], seismology [12] and criminology [13]. In order to make estimates of the future terms of a time series, it is necessary to make the hypothesis that each observed data is somehow correlated with past data. Thus, it is natural to establish mathematical models that try to The associate editor coordinating the review of this manuscript and approving it for publication was Dost Muhammad Khan . explain such correlations and describe the observed data as a function of time.
Autoregressive integrated moving average (ARIMA) is one of the most popular forecasting methods in the literature [2], [14], [15]. This model is attributed to George Box and Gwilym Jenkins [15] and is essentially a linear regression model that describes movements of a stationary and univariate time series, that is, time series free from trends [16]- [18]. ARIMA is also a combination of other models, such as the autoregressive (AR) [16], moving average (MA) [16], [17], [19] and the autoregressive moving average (ARMA) [16], [20]. In the case of a time series with seasonality, SARIMA, a variation of ARIMA, can be applied [21]. There are also some extensions that are applicable to nonlinear [22] or non-Gaussian [20] time series.
Data mining is defined as the ''process of automatic extraction of knowledge from large databases, using machine learning, pattern recognition, database techniques, and statistics'' [23]. According to [24] the six pure Machine Learning methods submitted in the M4-competition performed poorly, none of them outperforming the combination approaches and only one achieved accuracy better than Naive2. Under the influence of forecasting competitions, several combined methods have been created in the quest to achieve better accuracy [24]. See these examples: in [25] a hybrid technique based on bagging approach is proposed to achieve more accurate forecasts; [26] shows a combination between ARIMA and artificial neural network (ANN) to predict internet traffic data; [27] calibrated an ensemble structure based on ARIMA, multilayer perceptron (MLP) and long short-term memory (LSTM) with the objective of predict healthcare expenditures; and [28] which presents some hybrid approaches, including mixing the exponential smoothing model and ANN in order to forecast the price of agricultural products.
A wealth of content can be found in the literature on time series data mining, some of which are based on complex networks [29]- [32], [33]. In [30], one can find an approach for time-series forecasting based on complex network analysis. Specifically, this method is based on node similarities [34] extracted from a network constructed using the visibility graph method [35]. In this article, it is referred to as the Mao-Xiao approach (MXA). Its accuracy is better than other methods for a variety of applications including methods using fuzzy logic [36], [37], bandwidth intervalbased forecasting methods [38], and hybrid fuzzy logic and visibility graph approach [39].
This study presents Maximum Visibility Approach (MVA) which is a new time series forecasting technique based on the complex network theory. This presentation follows the subsequent steps. Initially, some basic concepts, necessary to understand the method proposed here, are revisited and then MVA is described. The results of the tests carried out with five different datasets are then presented, with the objective of comparing the performances obtained by the MVA and the benchmarks: a state-of-the-art Mao-Xiao approach (MXA) [30], dynamic auto-ARIMA, support vector machine (SVM) [40], long short-term memory (LSTM) [41], multilayer perceptron (MLP) [27], hybrid additive ARIMA-ANN [26], hybrid additive ETS-ANN [42] and naive estimation.

II. BASIC THEORY
This section provides a brief presentation of the basic concepts and techniques required to describe the MVA.

A. VISIBILITY GRAPH
According to [35], the visibility graph transforms a time series into a complex network. Each time-series observation is associated with a vertex of a complex network. Let (t i , y i ) and (t k , y k ), be two distinct points of a time series. Take any other point (t u , y u ) between them. The points (t i , y i ) and (t k , y k ) are considered visually related if the following rule is FIGURE 1. Scheme for visibility graph approach: A 10-elements time series is mapped into a graph where each node represents a bar, and each edge represents the connection between two bars. Source: Drawn by the authors. satisfied: In Fig.1, the ith bar relates to the ith network node. The rule described by equation (1) is illustrated in Fig.1. It can be seen that each connection between two bars corresponds to a link between the two respective nodes in the complex network.
The visibility graph theory has established itself as an efficient approach for probing the dynamics underlying real complex systems from time series [31]. According to [35], the network obtained from this particular mapping inherits several properties of the time series, and its study reveals nontrivial information about the series itself. This mapping makes the structure of the time series be conserved in the graph topology: periodic series convert into regular graphs, random series into random graphs, and fractal series into scale-free graphs [35].

B. PREVIOUS WORK: THE MAO-XIAO APPROACH
The forecasting method presented by [30], here in this paper, called the Mao-Xiao Approach, is an innovative methodology to provide time series forecasting based on complex networks. Essentially, a time series with n observations is transformed into a network through the visibility graph method [35]. Second, the Node Similarity Matrix (NSM), that is, the matrix that contains the similarity measures between each two pairs of nodes, is calculated by the method explained in [34]. The last line of the NSM gives the similarity between the last node and each of the first n − 1nodes, then the node k, with the highest similarity value, is taken and a linear equation is calculated considering both nodes: N n (t n , y n ) and N k (t k , y k ). The first estimation of the (n + 1)-th point is VOLUME 10, 2022 calculated by (2): Next, a refining is proposed according to (3): The weights w n and w n+1 are, respectively, calculated by: and In (5) and (4), the notation d k→n refers to the horizontal distance between observations n and k [39], therefore, considering the points (t k , y k ) and (t n , y n ), hence d k→n = t n − t k .

C. SIMILARITY BETWEEN TWO NODES IN A GRAPH
Two nodes in a graph are said to be similar if they share the same predefined characteristics. Some techniques produce measures of similarities between vertices of a network and according to [34], [43], [44] such measures are the basis of the link prediction theory. For instance, can be cited similarity based on random walk process [30], [33], [34], the cosine similarity [45], the Jaccard similarity [46], and the inverse log-weighted similarity [47].
In this work we use the Dice similarity [48]. In order to present its definition, consider n( ) as the cardinal of the set . In the Complex Network context, given any two vertices of a network, the Dice similarity [48] is calculated taking twice the number of common neighbors over the sum of the degrees of both vertices, where N (v i ) relates to the neighbor set of the vertex v i and ρ(v i ) is the vertex degree of the vertex v i . It is not difficult to conclude that the Dice index range is from zero to 1. The result is zero if both vertices does not share any neighbor and it is 1 if both neighbors sets are equal.

D. SLIDING WINDOW
Both the autocorrelation function (ACF) and the partial autocorrelation function (PACF) quantify the influence of past observations on the current value of the time series. Additionally, ACF or PACF can indicate whether previous values cause, or not, a significant impact on the current observation. Considering that some past values may not be statistically significant to predict the next term of the time series, it is reasonable to disregard such data in the forecasting process. The set of past observations considered in the forecast is defined as a window. Given the set Y = {y 1 , y 2 , . . . , y n }, a sliding window, Z i , over Y is a subset with fixed number of elements, k, which, during the simulation, changes at each iteration i, according to the following rule: . . . , Z n−k+1 = {y n−k+1 , y n−k+2 , . . . , y n } .
The sliding window process is based on the methodology walk-forward validation presented by [49].

E. STATISTICAL HYPOTHESIS TESTING
In order to analyze the quality of the prediction, it is necessary to use hypothesis tests. Initially they are applied to investigate the characteristics of the original time series and finally to evaluate the characteristics of the forecast residuals.
With the purpose of testing whether the data is level or trend stationary we apply the Augmented Dickey-Fuller (ADF) test [50] and confirm the obtained results with the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test [51].
To check whether the data is independently distributed we use the Ljung-Box test [52]. This test is essential to verify whether ARIMA residuals have white noise signal characteristics. With the objective of identifying the data seasonality we perform the Webel-Ollech [53], which is basically a combination of the methods modified QS [54] and Friedmann [55] tests.
2) Mean Absolute Percentage Error (MAPE), 3) Root Mean Squared Error (RMSE), The performance indicators were used to compare the simulation results obtained in this work.

III. MVA: NEW FORECASTING METHOD
The Maximum Visibility approach (MVA) is a new method based on the complex network theory, inspired by the technique proposed in [30], which, in this work, is called the Mao-Xiao approach (MXA). The first step of the MVA considers the autocorrelation function (ACF) related to the given time series. The ACF gives the correlation between the last observation and any kth observation, hence this step guarantees that only the statistically significant elements will be used in the forecasting task. The next step is to transform the given time series into a complex network. The literature presents some methods which are able to map a given time series into a complex network: visibility graph [35], horizontal visibility graph [56], multiscale limited penetrable horizontal visibility graph [31] and phase space reconstruction [57]. In this work, we chose the visibility graph approach due to the technical reasons described in the section II-A, but mostly due its ease of implementation and low computational cost.
After constructing the network, we calculated the node similarity matrix (NSM). In this matrix, each s i,j represents the similarity calculated between nodes N i and N j .
Letŷ (n+1)/k be the pre-estimator for the time-series next term, that is, y (n+1) , considering only the influence of y k . This pre-estimator is calculated using a linear equation, based on the connection between both observations: y k and y n . We introduced a multiplicative parameter, α, to the original angular coefficient, as shown in (11). Consider the points: (t k , y k ) and (t n , y n ). The expression that givesŷ (n+1)/k is: where α is divided into two parts: a constant K ≥ 0 and γ n,k , which are the autocorrelation between y k and y n .
Observing the equation (11), the closer α is to zero, the greater is the effect of y n onŷ (n+1) , approximating the forecasting result to that obtained from the naive estimation.
We define Y m (n+1) as the vector that contains all preestimators for y (n+1) , calculated through the influence of the first m lags. Y m (n+1) is written as: Let s i,n be the similarity value between nodes N i and N n , associated to the time-series terms y i and y n , respectively and calculated based on the Dice similarity, according to described in the equation (6). If the last m nodes, prior to y n are considered in order to estimate y (n+1) , then the NSM has dimension m + 1 and the last row of the NSM contains the similarities between all last m observations and y n . We define S m n as the vector formed by the first m elements of the last row of the NSM. Therefore, S m n is seen as: S m n = s n−m+1,n , s n−m+2,n , . . . , s n−1,n .
It is possible to find s k,n = s i,n with k = i, therefore more than one node can share the highest level of similarity with y n . Define I S m n as the vector composed by the indexes of the elements contained in S m n which present the highest level of similarity to y n . The one-step-ahead MVA estimator,ŷ (n+1) is calculated as: where h (E n ) is a non-linear function of the error obtained in the estimation of y n , that is E n =ŷ (n) − y n . The non-linear function h (E n ) is given by: where J is positive constant. For some time series, (15) is taken by observing the results achieved by MVA in the validation phase.

A. THE MODEL PARAMETERS
According to presented in the equations (12) and (16) there are two parameters, K and J , respectively that must be defined before the forecasting process. In order to find the best values for K and J capable of providing the best fit in the data we proposed a partitioning of the original time series into three sets: training, validation and testing sets. The simulation has to be performed on the training and validation sets many times as necessary and after that, the best values of K and J are applied to the testing set to ascertain the method's ability to make predictions. In order to generate the results presented in the section IV we kept J constant and variate K , until find the best sum of squared error (SSE) considering the validation set. For example, initially take K from 1 to 6, using step of 1. Suppose K = 3 gives the best SSE in the validation set. Secondly refine the search considering K from 2 to 4, in steps of 0.1. Suppose K = 2.6 gives the smallest SSE in the validation set. Now refine the search from 2.5 and 2.7, using steps of 0.01. Suppose K = 2.66 gives the smallest SSE. Refine the pursuance one more time, searching from 2.65 to 2.67, with steps of 0.001. Register the value of K associated to the smallest SSE. This procedure is repeated considering other values of J .
We started with J ∈ {1, 10, 100, 500}. For most cases, higher values of J produces the effect of not considering the non-linear term. After testing some values of J we create a list of the obtained SSE's. Given the randomness of the real time series, there is no guarantee that the pair (J , K ) related to the smallest SSE achieved in the training and validation phase will produce the smallest SSE in the testing set, however, we kept this procedure to find the pair (J , K ) to be used in the testing phase.

B. ALGORITHM DESCRIPTION
We present two algorithms: (1) Steps of the MVA forecasting task, where the only interest is to calculate the MVA onestep-ahead estimator when the parameters K , J are known; and, (2) Steps to provide accuracy comparison against other methods.
Consider a n-element time-series {y 1 , y 2 , . . . , y n }. Firstly we define the window size w. This is done by taking the first (w − 1) lags, which are in the statistical relevance region provided by the ACF graph, see section II-D.
3) Transform the window into a network using the visibility graph approach. See section II-A. 4) Calculate the Node Similarity Matrix (NSM) using the Dice Similarity approach. See (6). 5) Create S (w−1) n , the last row of the NSM. See (14). 6) Find and register the nodes associated to the the greater value in S (w−1) n . 7) Calculate the pre-estimators related to the nodes registered in the previous step. See (11). 8) Calculateŷ MV ,(n+1) , taking the maximum of the pre-estimators calculated in the previous step. See (15). With the purpose of compare MVA accuracy against other forecasting methods, we recommend using the steps presented in the algorithm 2. The sets K and J contains the values of k and J , respectively, which will be tested in order to find the best MVA parameters to be used in the test set. Also, if one is only interested in estimate the next term of the time series using MVA but wants to search for the best parameters, then the algorithm 2 applies, but the validation and training sets must merge to contain (n − w − 1) elements and the test set has to be composed by the last element y n . The search for the best parameters can be done based on the methodology described in the section III-A.

Algorithm 2
Steps for Accuracy Comparison Against Other Methods Read the i-th window: Calculateŷ w+i using the algorithm 1. Calculate and register Take the pair (J * , K * ) associated to the min {SSE}.

C. WHY MVA WORKS
At the beginning of Section III, MVA is defined as a forecasting technique based on the complex network theory, more specifically, MVA seeks for relations between nodes of a complex network to provide forecasts. In a complex network, two nodes tend to be linked if they have a higher similarity index [30]. Therefore, taking into account that MVA provides forecasts based on the previous time-series observations which present the greatest similarity index with y n , we can expect that MVA will succeed in capturing the patterns contained along the time series. Why MVA is called maximum? Because the final estimator is obtained taking the maximum of the pre-estimators. This means that the method is creating a new term that, based on the visibility graph approach, will be linked to y n and to all other y k which has the same highest level of similarity with y n .
The natural visibility graph maps a time series into a complex network based on the direct visibility criteria, that is, two points in the time series becomes a linked pair of vertices in the network, if and only if, there is an unobstructed straight line connecting both points in the time series plot.
Given the definition of the visibility graph mapping, all points y k which has the maximum level of similarity with y n have an unobstructed link toŷ n+1 . Therefore, if we take as one-step-ahead the maximum of the pre-estimators, so we are forcing that all points, which originated the pre-estimators, be connected to the next-term estimator in the complex network domain.

IV. RESULTS AND ANALYSIS
In this section, we present the performance comparison among the forecasting approaches: MVA, MXA [30], ARIMA, SVM, LSTM, MLP, Hybrid additive ARIMA-ANN (HAAA), Hybrid additive ETS-ANN (HAEA) and naive estimation. To measure the accuracy quantitatively, we calculated the error measurements defined in the equations (8), (9), and (10). It is important to point out that all forecasting processes are one-step-ahead type.
MXA was chosen as a comparative method because, according to [30], it is considered a successful complexnetwork-based forecasting method with a good prediction performance. The ARIMA model was chosen because it is widely used as a benchmark in the literature, so if MVA is more accurate than ARIMA, then MVA can be taken as a benchmark against other new prediction methods. The methods SVM, LSTM and MLP are also being considered benchmarks given to the wide amount of works who use these methods to provide time-series forecasts. Inspired on the results of the M4-competition, presented by [24], combination methods tend to perform better than the pure one, therefore, we considered important compare the MVA forecasting results to other succeeded hybrid methods. We choose the Hybrid additive ARIMA-ANN (HAAA) and Hybrid additive ETS-ANN (HAEA). Lastly, we also considered the naive estimation because, despite its simplicity, for some kinds of time series, it has beaten ARIMA and even some hybrid models in terms of performance.
Before starting each forecasting process, we characterized the respective time series. First, by running the ADF and/or KPSS tests, we determine if each time series is level, trend or non-stationary. Next, we tested seasonal behavior using the Webel-Ollech test. Finally, through the Ljung-Box test, we check if there is any dependence between the pairs of the time-series observations.
Here, the only pre-processing job was: (a) to delete or properly replace all ''NA'', non available, data; and (b) normalizing the data according to de following rule: where Y is the original dataset. Actually, the normalization process, or any other is not necessary. We did that with the purpose of having comparable results, free of any scale. In the case of ARIMA, we used the R function,auto.arima, which automatically performs all necessary preprocessing to obtain the best ARIMA model. The size of the window used in each forecasting process was chosen considering the results of the autocorrelation function, so only the most significant terms, statistically speaking, were taken into account. It is important to emphasize that the mathematical derivations to obtain the best K and J to be considered on MVA is beyond the scope of this work. Hence for each forecasting simulation, we found the pair (J * , K * ) using simulation according to explained in section III-A. In order to compare the forecasting performance achieved from MVA, ARIMA, MXA, MLP, LSTM, SVM, HAAA, HAEA and naive estimation we chose the following time series: three classic time-series from different fields, provided by the R datasets package; a Brazilian econometric time series composed of daily closing prices of the Bovespa index; and one recursive time series that was artificially generated.
It is important to point out that the accuracies obtained by each method, considering each time series, were taken from the same test set. We are emphasizing that this measure was taken to ensure that comparisons between methods were made fairly.

A. CHARACTERIZATION OF THE FIVE TIME SERIES
This section is dedicated to showing the particularities of each time series used in this research. The appropriate hypothesis tests, presented in the section II-E, were chosen to characterize the series as stationary, seasonal or independent.
• Consider the monthly number of passengers for international airlines, in thousands of people, from 1949 to 1960. This a 144-element time series, which in the R astsa package is called AirPassengers, but which is also presented by [58]. ACP shows that the first 40 past observations impact the current value, so this set was chosen to compose the first window. In this forecasting process MVA had K = 0.886, J = 6 and the nonlinear term, expressed by equation (16) is considered in the model. The sizes of the training, validation and testing sets are 77, 36 and 31, respectively.
• In agreement with [59], 'lynx' is a time series of annual numbers of lynx trapped in Canada from 1821 to 1934. There are 114 observations in this time series. The ACF indicates that the first 77 past terms may contain significant information with respect to the current element. Additionally, the most significant values of autocorrelation between terms occur from time to time, suggesting that the series has a seasonal behavior, which was confirmed through the application of the WO test. In order to provide the one-step-ahead forecasting, we chose w = 77, and we settled MVA with K = 0.633, J = 3 and the presence of the nonlinear term in the model. The sizes of the training, validation and testing sets are 90, 13 and 11, respectively.
• Ibovespa is the most important performance indicator of stocks traded on the Brazilian financial market [60]. It was created in 1968 and has since served as a reference for investors worldwide [60]. The third time series is made up of daily closing prices of Ibovespa from January 1, 2019 to July 28, 2020. The 389 data used in this simulation were taken from the Yahoo Finance website and the ACF indicates window of length 200. According to the proper statistical tests this is a non-stationary dataset, with no trend, seasonal and with dependency among its observations. In this case, we calibrated MVA with K = 0.844, J = 150 and the presence of the nonlinear term in the model. The sizes of the training, validation and testing sets are 245, 50 and 94, respectively.
• The fourth dataset is the 60-element time series named ''nhtem'' on the R datasets package which regards the mean annual temperature, in degrees Fahrenheit, in New Haven, Connecticut, from 1912 to 1971 [61]. In this forecasting process MVA had K = 1.087, J = 500 and the presence of the nonlinear term in the model. The sizes of the window, training, validation and testing sets are 10, 30, 20 and 10, respectively.
• The last dataset comprises a recurrence-based time series with 100 observations. Each observation is given by the following law: where w t is a random variable normally distributed with zero mean and variance 1. Additionally, y 0 = 10, y 1 = 9.5, y 2 = 9 and t ∈ {0, 1, 2, . . . , 99}. In this forecasting process MVA had w = 85, K = 2.263, J = 600 and the presence of the nonlinear term in the model. The sizes of the window, training, validation and testing sets are 85, 89, 4 and 7, respectively. The Table 1 summarizes the characteristics of the five time series presented in this work, showing all their features, confirmed by proper statistical tests.

B. ADJUSTING THE MACHINE LEARNING MODELS
One of the most determining steps in creating a suitable forecasting model based on Machine Learning approaches such as ANN and SVM models is the choice of the input variables.

1) THE SVM MODELS
In this work we started the search for the best SVM hyperparameters according to was considered in [40]: C = 10 and = 0.1. The kernel choice is for the radial basis function (RBF) starts with γ = 0.1 since, according to [62] and [63], this kernel has good performance in time series forecasting problem, it is able to capture nonlinear characteristics of the time series and presents less numerical problems.
We applied the SVM models in the training set, for multiples combinations of (C, , γ ). The set which provided the smallest MAE was applied to the testing set. Some forecasting strategies were tested to perform the SVM. Firstly we compared the 10-fold cross validation technique [40] to the ''leave one out cross validation'' (LOOCV) [64]. The second one gives better results in the training sets. However, the walk-forward validation technique, presented by [49], and used in all other methods, gave the smallest MAE considering the training sets. The table 2 shows the parameters considered in the SVM models.

2) THE LSTM MODELS
An LSTM model is an recurrent network model with memory, that is, this model can remember information from earlier points in the network [27]. The architecture of an LSTM consists of units called memory cells, self-connections and gates which are special multiplicative units [41]. According to [27] the connections keeps in memory the temporal state of the memory cells and the gates control the flow of information.
We considered a LSTM model according to proposed by [27]. The hyperparameters used to reach the best accuracy in the validation set, for each time series, are described in the table 3. The column called 'hl' gives the number of neurons in the hidden layer.

3) THE MLP MODELS
According to [27], a Multilayer Perceptron (MLP) is a variant of the original model proposed by Rosenblatt in [65]. The MLP mathematical and structural model is presented by [27]. The training assignment is done based on the backpropagation approach, which is a supervised learning method [66].
The neuron functioning occurs according to described in equation (19): where f (τ ) is the activation function, y n−i is the i-th input, w is the number of inputs, and β ki and b k are, respectively, the weight and the bias associated to the input y n−i [27]. We considered a MLP structure composed by three layers of nodes: an input layer, a hidden layer and an output layer. All neurons were processed by the sigmoidal activation function, except for the output layer for which a linear activation function was used. The forecasting tasks were produced by the mlp function included in the nnfor R package. For each time series, the testing set is the same considered to all other forecasting methods and the forecasts were combined using the median operator. The table 4 shows the hyperparameters which provide the best accuracy in the training and validation sets.
To enable reproducibility, we set the seed at 2, so all randomly generated initial weights will always be the same.

C. ADJUSTING THE HYBRID MODELS
The last two time series forecasting methods are hybrid approaches. These are additive combinations of ANN with ARIMA and ANN with ETS. We consider the number of nodes in the ANN model's input layer to be equal to the window size considered for all other forecasting methods. We tried other values of w, but in all cases, the obtained accuracy, relative to the validation sets, only got worse.
According to [40] and [67], a three-layer feed-forward ANN model is commonly used for time series forecasting. We normalized the training and testing data to fit in the interval [0, 1] according to equation (17). The hyperbolic tangent function was used as the transfer function between the input and the hidden layers because it resulted in the best accuracy. We chose the linear function as the transfer function between the hidden and the output layers due to its robustness for continuous output variables [40]. The optimal  number of neurons in the hidden layer, h, was obtained by trial and error, but initially we used previous practical results [68]: h = 0.5i, where i is the number of input data. The initial model weights were chosen considering a uniform distribution in the interval [−0.1, 0.1], with seed 2. Each training assignment was performed minimizing the mean absolute error.

1) THE ARIMA-ANN MODELS
The hybrid additive approach combining ARIMA and ANN was implemented according to described in [26] using the Khashei-Bijari approach [67]. The table 5 shows the best number or nodes in the hidden layer for each time series. These parameters were defined from the accuracy obtained on the training and validation sets.

2) THE ETS-ANN MODELS
The ETS models are composed by a family of time series models with a basal state space model containing an error term (E) and a level, trend (T) or seasonal (S) components. The hybrid additive approach combining ETS and ANN was implemented according to described in [28]. The ETS-models were defined based on the R function ''ets'' from the package forecast. This function is able to calculate the best ETS model which fits with the given time series. The table 5 shows the best number or nodes in the hidden layer for each time series. These parameters were defined from the accuracy obtained on the training and validation sets.

D. COMPARING THE ERROR MEASUREMENTS
In the last three sections we described the five datasets used in this work and we presented all eight forecasting techniques and their respective set of parameters considered in each forecasting process. In this section we show the results obtained in terms of MAE, MAPE and RMSE, according described in the equations (8), (9) and (10)  based on the MAE, MAPE and RMSE values. For this time series, MVA had its superiority confirmed by the Diebold-Mariano(DM) test when compared to all methods except for LSTM and SVM for which the quality of the forecasts are statistically equivalent. Since the DM test did not find any method to be superior to MVA and considering that its errors are the smallest, we can conclude that MVA is the most accurate method, among all those who participated in this study, to make predictions for this time series.
Considering the results of tables 7 and 9 for the Lynx time series, MXA, naive, Additive-ARIMA-ANN (HAAA), LSTM and MLP are less accurate than MVA, quantitative and qualitatively given that their values of MAE, MAPE and RMSE are higher than MVA's, and the DM test indicates MVA's superiority. However, MVA, ARIMA, Additive-ETS-ANN (HAEA) and SVM showed similar results considering error measures and statistical comparison. In order to illustrate the closeness of these methods to the real values in the test set, we have plotted the comparison graph for lynx time series and presented it in Fig. 3.  methods. Naive is the only method which provided similar results, statistically speaking, confirmed by the DM test. Definitely, in order to produce forecasts for the IBOVESPA time series, MVA is the most indicated method among all methods considered in this study.
The fourth time series, was chosen precisely because it is a dataset with characteristics that are difficult to define. How would all methods respond in the forecasting process for this time series? It is classified as trend stationary, seasonal and not-independent dataset, however, the p-values obtained from each hypothesis test is very close to the 10% type I error. It was expected that most methods would present statistically similar results, therefore the most accurate method would have to be defined based on the error measures. According to the results presented in the table 7 MVA outperforms all other methods in terms of MAE, MAPE and RMSE, thus, it is considered the most accurate method in the forecasting assignment for this time series.
Lastly, we present the results for a recurrence based time series artificially created. This dataset was created with the intention of favoring the ARIMA method, since the sequence is based on an AR(3) model. ARIMA was expected to achieve the best results, however, MVA had the best accuracy. The conclusion is similar to that obtained in the case of the air passengers time series, since MVA obtained the smallest errors, but tied with other methods in the statistical comparison of the quality of the estimation. None of the considered methods was classified by the DM test as superior to MVA.
We consider important to emphasize that all MVA forecasts, presented in this work, were obtained based on using the Dice similarity. We considered using other methods, such as Jaccard [46] and the Inverse Log-weighted [47], both cited in the section II-C. In order to compare the results, the obtained RMSE was taken into account, for each time series.
The Jaccard method led to the same results obtained from the Dice similarity index, considering the five time series presented here. The Inverse Log-weighted similarity produced similar results to those presented here. Better if compared to that obtained from Dice, for the Lynx time series and worse in the other cases. The comparative analysis of the MVA accuracy based on the different measures of similarities is beyond the scope of this article, but we ratify that this is an important topic to be considered since improved versions of MVA can be developed.

E. TIME COMPLEXITY OF MVA
As we commented that one of the advantages of MVA is its low computational cost, it is important to present a brief discussion about the time complexity of the method. Defining n as the size of the window to be mapped in a graph, consider the following worst-case time complexity analysis for each step of the MVA algorithm. All these steps are performed sequentially, so the complexity of the whole algorithm is given by the maximum among the individual complexities, that is, O(dn 2 ). Obviously, the maximum value for d is (n − 1), so in the worst case scenario, our code is O(n 3 ). However, this is very unlikely, given the inputs are real time series, so, rarely, is one observation connected to all other observations. In summary, the complexity for MVA depends on the maximum degree of the vertices in the graph, but based on the randomness of the real time series, this maximum degree is relatively low, for example, considering the time series used in this article, the maximum degree were always less than 5% of the time series length. Therefore, MVA can be considered approximately O(n 2 ) in most applications.
Just for an example, we ran the simulation of the MVA with fixed K and J , considering 95% (also very unlikely) of the data to compose the window. So the algorithm just read the data, normalized it, constructed the complex network, calculated the dice similarities of all pairs of vertices, found the greater value among all values, calculated the pre-estimators considering only the points which presented this greater level of similarity with the current observation, and finally took as the MVA onestep-ahead estimator the maximum value among all preestimators. We ran this simulation for an artificial time series AR(1) composed by: 100, 500, 1000, 5000 and 10000 points, which means networks with 95, 475, 950, 4750 and 9500 vertices. The execution times were, respectively, 0.007, 0.183, 0.340, 6.827 and 27.927 seconds, which is in consonance with quadratic time complexity. The simulations were performed in a Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz with 8.00GB (RAM) processor based on x64 under Windows 8.1. VOLUME 10, 2022

F. MVA: MAIN ADVANTAGES AND DISADVANTAGES
Among the advantages observed with the use of MVA in the forecasting task are the ease of implementation, low computational cost and adaptability to different types of time series, considering characteristics of stationarity, seasonality and dependence between observations. This study showed that the only pre-processing necessary is to delete or properly replace the 'NA' data (not available) if any, no further preprocessing is necessary, thus, the prediction process can be carried out considering the original data. Given the inclusion of the non-linear term, MVA is able to quickly correct the error made in the previous prediction. We can point out as a disadvantage the lower capacity to predict the reversal of the trend of the time series. The biggest errors produced by MVA were mainly at trend reversal points.

V. CONCLUSION
In summary, this study presents the Maximum Visibility Approach (MVA) as new time series forecasting method. To calculate the forecast with MVA, we used the visibility graph technique to map a given time series into a complex network, and then calculate the node similarity matrix using the Dice approach. All nodes with the higher similarity index with the node associated to the last time-series observation are eligible to impact the calculation of the MVA one-step-ahead estimator,ŷ MV ,(n+1) .
Just in order to compare the performances, we chose the comparative methods ARIMA, the Mao-Xiao approach [30], naive estimation, support vector machine (SVM), artificial neural networks (ANN), long-short term memory (LSTM), multilayer perceptron (MLP) and the hybrid additive models ARIMA-ANN and ETS-ANN. The nine methods discussed in this work were applied to five different time series. Each dataset has some or none of the characteristics: seasonal behavior, level or trend stationarity, and dependence between pairs of observations. It is important to note that the time series used in this work also have different time horizons: daily, monthly, annual, and weekly data. The performance results were measured using the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE). Additionally, the Diebold-Mariano test [71] was considered to evaluate the statistical comparison between accuracy achieved by MVA and each benchmark.
Considering the MAE, MAPE and RMSE values, MVA outperformed all methods for all time series, except for the Lynx time series, in which the methods ARIMA, additive ARIMA-ETS and SVM reached the same level of accuracy as that obtained by MVA. This set of results reveals that our proposed method is capable of presenting at least similar accuracy when compared to other well-established methods in the literature, considering varied applications and datasets with different time horizons. Therefore, our method can contribute not only in the academic field, but also in practical aspects.
According to discussed in the section IV-F, we can point as advantages of MVA the ease of implementation, low computational cost, in most applications O(n 2 ), and the adaptability to forecast multiple types of time series, considering characteristics of stationarity, seasonality, and dependence between its observations. Another advantage is the fact that, except for excluding or replacing the nonavailable data (NA), the forecasting process can be carried out considering the original data. In the case of performing sequential forecasts MVA is able to quickly correct the error made in the previous forecasting given the inclusion of the nonlinear term in the model. We consider the disadvantage of MVA its low ability to predict trend reversal in time series. The biggest errors produced by the MVA were mainly in the reversal points.
Despite the good forecasting performance of MVA shown in this work, this method can be improved. As future works, we can mention: the investigation of the impact on accuracy obtained by using different time series mapping methods in complex networks; the creation of a hybrid methodology to combine MVA with MLP, LSTM, ANN or other prediction methods based on machine learning, such as the Elman Recurrent Neural Networks (ERNN), similarly to what was proposed by [72]; develop a mathematical formulation that calculates the best parameters J and K, based on the observations which compose the training set; propose a version of MVA which provides h-step-ahead forecasts, with h > 1; study adequate data preprocessing to increase MVA's forecasting accuracy; provide the comparative analysis of the MVA accuracy based on the different measures of similarities; and develop a multidimensional version of MVA, in which more than one time series can provide information in order to improve the accuracy in the forecasting process of one or more time series at the same time.
We will end this article by making it clear that the results presented here do not mean that MVA will be able to outperform all forecasting methods, considering all time series. However, such results certainly indicate that MVA is an excellent option of approach to be considered in time series forecasting processes, both for research purposes and for practical use.