Forecasting Tourism Demand by a Novel Multi-Factors Fusion Approach

The volatility of tourism demand in recent years is often driven by irregular events. Inbound tourists, in particular, are quite sensitive to various factors, including exchange rate fluctuations, the consumer price index, and personal or household income or consumption expenditure. We combine these multivariate time series through an ingenious multi-factor fusion strategy to support precise tourism demand forecasting. A novel hybrid deep learning forecasting approach is developed by integrating several modules: improved complete ensemble empirical mode decomposition with adaptive noise, intrinsic mode function classification, multi-factor fusion and predictor matching. Monthly flows of tourists inbound to Shanghai from the USA, Korea and Japan are used to verify the performance of the proposed approach, which outperforms all benchmark models across different prediction horizons. The experimental results show that introducing external influencing factors improves prediction accuracy significantly, confirming the rationality and validity of the proposed approach.


I. INTRODUCTION
I. INTRODUCTION
In the last few decades, there has been a surge of interest in tourism demand research among academics and practitioners. Although tourism flow grows steadily in the medium and long term, rapid fluctuations in the short term are usually caused by various uncertain obstacles, such as financial crises, regional conflicts and epidemic diseases [1], [2]. These obstacles pose a great challenge to short-term forecasting. Moreover, the perishable nature of tourism products, including vacant hotel rooms, unsold tourism services and surplus employees, can lead to inevitable financial losses without rational market demand guidance [3], [4]. From a macro perspective, scientific tourism demand forecasting can provide a reliable reference for tourism-related departments when formulating tourist policies and sustainable development strategies; from a micro perspective, it can guide relevant tourism enterprises in making development plans and allocating tourism resources rationally [5]. Therefore, achieving reasonable analyses and credible forecasts of short-term tourism demand has always been a goal to strive for [6].
As is well known, changes in tourism demand are affected by many factors. As stated by Song et al. [7], numerous influence factors relating to both origins and destinations, such as tourist income level, tourism prices in a destination or its competing destinations, and the exchange rate, are central to tourism demand analysis; this view is also supported by Jiao et al. [8]. Furthermore, many researchers believe that crucial influence factors can be used to establish latent mapping relationships that compensate for the insufficiency of predictors and thereby improve prediction accuracy [9], [10]. Notably, the prices of competing European city destinations and tourist income [1], and the exchange rate and tourism prices in competing destinations [11], among other similar influencing factors, have been employed to forecast tourism demand. Typically, Ulrich et al. [1] introduced the prices of competing European city destinations and tourist income as influencing factors and combined them with a Bayesian VAR model. Jiao et al. [11] used the exchange rate and tourism prices in competing destinations to forecast inbound visitor arrivals to 26 European cities. However, due to the low availability and time lags of influence factor data, extraction and utilization efficiency is poor. Many existing studies that incorporate influence factors analyze mainly the annual changes of tourism demand, and most predictors used are relatively simple and imprecise. Therefore, this study constructs a set of composite influence factors based on the exchange rate, consumer price index (CPI) and others to analyze monthly fluctuations of tourist arrivals. The aim is to mine and capture the correlation between the inherent components of the historical data and various influence factors, and on this basis to regulate the difficulty of the prediction task.
Undoubtedly, introducing exogenous variables is necessary for improving prediction accuracy, but it is bound to increase the systematic errors of the prediction approach. How to solve this technical problem of multi-factor fusion and balance this trade-off is the primary goal of this paper.
In general, for univariate data (i.e. historical tourism demand data), most current studies attempt to denoise the tourist flow time series and transform irregular fluctuations into a smoother form that is easier to capture and understand, for the purpose of improving forecasting performance [12], [13]. In the field of short-term time series prediction, a popular data preprocessing technology is extracting empirical mode features at different time scales by empirical mode decomposition (EMD) [14]. Related derivative technologies then came into being, such as wavelet decomposition (WD) [15], ensemble EMD (EEMD) [16], [17] and complete EEMD with adaptive noise (CEEMDAN) [2]. Their main theoretical principle is extracting intrinsic mode functions (IMFs) according to the frequencies and time scales of the data, which can approximately represent the laws and trends of the data at different stages. For a single historical tourist time series, the goal of processing is to explore the potential periodic rules, trends and complex changes within the data. However, some emergencies and external factors often have a significant impact on the tourism market that cannot be ignored. These uncertain events (white noise) adversely affect the internal development of the tourism industry and make it difficult to grasp the future trend of tourist flow [6]. Although introducing influence factors into the decomposition strategy to explain these outliers (i.e. emergencies) can effectively capture the dynamics of tourist flow and improve prediction accuracy, a new issue subsequently arises: how to fuse the influence factors into the tourism data through the preprocessing methods.
Generally speaking, the tourism demand data contains regular (trends, cycles, and seasons) and irregular components. These irregular components of the influence factors (such as the change in tourist income, exchange rate, and travel intention) are the main factors causing the fluctuation of tourist arrivals. To solve the issue of multi-source data fusion, this study innovatively proposes a decomposition ensemble learning framework integrating the influence factors into the irregular components of tourism demand data.
Usually, forecasting the most complex feature components is regarded as the most difficult challenge and as the key to improving forecasting accuracy. Existing studies show that the irregularity contained in these IMFs is generally caused by economic development, policy orientation, international situation, and other factors [18]. Given the importance and availability of these factors in prediction, this paper investigates the impact of certain economic factors on tourism passenger flow. First of all, the fluctuation of the exchange rate produces the most direct impact on the travel cost, especially for overseas tourists [19]. For example, the currency devaluation leads to an increase in the cost of studying and shopping abroad, but for foreign tourists, the cost of international travel is reduced and their willingness to journey is enhanced. Secondly, the consumer price index (CPI) is a macroeconomic indicator reflecting changes in household consumption levels over a certain period. It reflects not only the economic growth capacity of a country or region, but also the willingness of residents to consume and travel [20]. The third factor to be considered is the personal or household consumption expenditure or income, which plays a similar role as CPI and includes a large proportion of spending on transport and travel [21]. It reflects short-term intentions of consumers without delay, while there is a time lag in CPI. Considering these three factors causing short-term tourism demand to fluctuate, this study combines them with the IMFs containing the complex irregularity, thereby excavating their internal correlation by machine learning (ML).
At present, the most popular ML technique is deep learning, which mainly adopts large-scale distributed parallel processing and extends the shallow neural network to more than two nonlinear mapping layers by imitating the functioning of the human brain [22]. Its effectiveness in various situations has been verified by many researchers [23], [24]. Among these models, the bidirectional gated recurrent unit (BiGRU) obtains a stronger self-learning capability by solving the problem of gradient vanishing under long-term information retention, and is considered an excellent predictor for mining potential dependencies in historical data [25]. Hence, the BiGRU model is employed to explore the relationships between potential factors and used as a bridge for multi-factor fusion in this study.
Usually, for each IMF, various ML models can be adopted as predictors, such as the auto-regressive integrated moving average (ARIMA) [16], the nonlinear autoregressive neural network with exogenous input (NARX) [15], and the back-propagation neural network (BPNN) [14]. Each predictor of tourism demand has its advantages and drawbacks. Although some have been widely used in empirical analyses according to their respective strengths, no single model can deal with all types of IMF components. Therefore, it is necessary to design a model matching strategy that selects appropriate predictors based on the characteristics of each component [3], [8]. For the low-complexity components, some studies have found two characteristic patterns, namely trend and periodicity. The former represents the long-term trend of tourism demand with low complexity and low volatility, whose future course is easy to grasp [13], [26]; we therefore use the classical ARIMA model to capture the linear features of the data. The latter, although also of low complexity, exhibits nonlinear characteristics, so a suitable nonlinear mapping model is appropriate. Support vector regression (SVR) is a prediction model developed on the basis of statistical learning theory; researchers have found that it has strong competitiveness and generalization ability on small samples, especially in tourism demand, with prediction results closer to the actual trend [27], [28]. Hence, SVR is adopted to deal with the decomposed components with low complexity and obvious nonlinear characteristics. Notably, a well-designed prediction model matching strategy greatly helps to reduce systematic error, and determining an excellent predictor is vital to effectively mine the association information contained in the various historical data [29].
In the face of complex datasets related to tourism demand, comprehensively considering various influence factors is a meaningful attempt. Some short-term factors affecting travel are closely related to the short-term fluctuations of the tourism market, and introducing them can reasonably capture future fluctuations based on more comprehensive information; however, the difficulty and systematic error of prediction are then bound to increase. To solve this issue, the complex task is divided into several modules. It has been confirmed that this divide-and-conquer strategy can deal with intractable problems and achieve better results [16]. On the basis of an in-depth understanding of the historical data and its related influence factors, the core idea of our research is an information architecture that fuses the influence factors causing strong fluctuations with the complex characteristic components of tourism demand. Inspired by the above research, this study proposes a novel multi-factor hybrid ensemble learning approach (ICEEMDAN-LXBSA) based on the improved CEEMDAN (ICEEMDAN), intrinsic mode functions (IMFs) classification and multi-factor fusion. First, aiming to reduce the complexity of the original data, the ICEEMDAN is used to decompose the monthly tourist flow datasets into several IMFs. Second, based on range entropy (RE) and the maximal information coefficient (MIC), the obtained IMFs are classified into three categories: high-complexity (HC), low-complexity low-correlation (LCLC) and low-complexity high-correlation (LCHC) components. Third, some influence factors (such as the exchange rate, consumer price index (CPI) and personal or household consumption expenditure or income) are fused with the high-complexity components, and then the least absolute shrinkage and selection operator (LASSO) is used to filter out redundant variables, thereby obtaining an effective feature combination with strong predictive power.
BiGRU is adopted as the predictor to fit the resulting dynamic relationship. Furthermore, the ARIMA and SVR models are employed to forecast the remaining low-complexity low-correlation (LCLC) and low-complexity high-correlation (LCHC) components, respectively. Finally, the final result is obtained by summing up all the individual predictions. To verify the effectiveness of the proposed approach, an empirical experiment investigating the monthly tourist arrivals data from three main origins (USA, Korea and Japan) to Shanghai is conducted to analyze the related issues.

II. METHODOLOGY
In this section, we first introduce the decomposition algorithm, the classification method, and the prediction models. Then, the procedure of the proposed ensemble learning approach is presented in detail. Finally, the tourism demand forecasting model is described.
A. IMPROVED COMPLETE ENSEMBLE EMPIRICAL MODE DECOMPOSITION WITH ADAPTIVE NOISE (ICEEMDAN)
As a derivative of empirical mode decomposition (EMD), the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) was developed by Colominas et al. [30]. EMD is an empirical signal decomposition method introduced to analyze nonlinear and nonstationary signals, which decomposes a signal (x) into a number of intrinsic mode functions (IMFs). To qualify as an IMF, a signal must satisfy two conditions: (1) the number of extrema and the number of zero crossings must be equal or differ at most by one; and (2) the mean value of the upper and lower envelopes is zero everywhere [31]. However, EMD suffers from mode mixing, i.e. the presence of similar oscillations in different modes or of disparate oscillations in the same mode. To overcome these defects, ensemble empirical mode decomposition (EEMD), an upgraded version of EMD, was introduced as a noise-assisted signal analysis method based on the EMD framework [32]. The IMF component of EEMD is defined as the mean of the corresponding IMFs obtained via EMD over a series of trials, produced by adding different realizations of white noise with finite variance to the raw signal (x_i[n] = x[n] + w_i[n]). The remaining problem is that signals contaminated by different noises may yield different numbers of modes, and the reconstructed signal contains residual noise. CEEMDAN (complete ensemble empirical mode decomposition with adaptive noise), a variation of the EEMD algorithm, addresses these issues, achieving a negligible reconstruction error and solving the problem of different numbers of modes for different realizations of signal plus noise. In spite of that, CEEMDAN still has some problems with residual noise and spurious modes.
The ICEEMDAN is proposed to obtain an exact reconstruction of the original signal and a better spectral separation of the modes by introducing the operators E_k(·) (producing the kth mode obtained by EMD) and M(·) (producing the local mean of the applied signal). For a signal x, the two operators satisfy E_1(x) = x − M(x). The primary steps of ICEEMDAN are described as follows:
1) Construct N noisy realizations x^(n) = x + β_0 E_1(w^(n)), n = 1, ..., N, where w^(n) are independent Gaussian white noise sequences and β_0 = std(x)/std(E_1(w^(n))). The first residue is obtained by applying M(·) and averaging, denoted ⟨·⟩, over the N realizations: r_1 = ⟨M(x^(n))⟩.
2) For k = 1, the first decomposed mode is imf_1 = x − r_1.
3) For k = 2, the second residue is the average of the local means of r_1 + β_1 E_2(w^(n)), i.e. r_2 = ⟨M(r_1 + β_1 E_2(w^(n)))⟩, where the adaptive noise amplitude is β_k = ε_k std(r_k), k = 1, ..., K. The second decomposed mode is imf_2 = r_1 − r_2.
4) The remaining residues r_k (k = 3, ..., K) are calculated in the same manner: r_k = ⟨M(r_{k−1} + β_{k−1} E_k(w^(n)))⟩.
5) The kth decomposed mode is obtained by imf_k = r_{k−1} − r_k.
6) Return to step 4 until the residue no longer satisfies the IMF conditions. The signal can therefore be expressed as x = Σ_{k=1}^{K} imf_k + r_K.
In the field of time series, ICEEMDAN achieves a better spectral separation of the IMFs and, at the same time, effectively mines the intrinsic information with different features, reducing the difficulty of prediction tasks. Therefore, ICEEMDAN is used as the decomposition tool for the tourism demand time series in this study.
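The residue/mode recursion above can be sketched in a few lines of Python. This is an illustrative toy, not the real algorithm: the true local-mean operator M(·) uses spline envelopes of the extrema, which we replace here with a simple moving average, and `iceemdan_like`, `local_mean` and the parameter values are our own names and assumptions. What it does show faithfully is the noise-perturbed averaging and the identity x = Σ imf_k + r_K.

```python
import numpy as np

def local_mean(x, w=5):
    # crude stand-in for the EMD local mean M(.); a real implementation
    # would average spline envelopes through the signal's extrema
    kernel = np.ones(w) / w
    return np.convolve(x, kernel, mode="same")

def iceemdan_like(x, K=3, N=20, eps=0.2, seed=0):
    """Sketch of the ICEEMDAN recursion: each residue is the average of
    local means of noise-perturbed copies; each mode is a residue difference."""
    rng = np.random.default_rng(seed)
    modes, r = [], x.copy()
    for _ in range(K):
        beta = eps * np.std(r)  # adaptive noise amplitude, beta_k = eps * std(r_k)
        r_next = np.mean(
            [local_mean(r + beta * rng.standard_normal(len(r))) for _ in range(N)],
            axis=0,
        )
        modes.append(r - r_next)  # imf_k = r_{k-1} - r_k
        r = r_next
    return np.array(modes), r
```

By construction, the extracted modes and the final residue sum back to the original signal exactly, which is the "complete" property the decomposition family is named for.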

B. INTRINSIC MODE FUNCTIONS (IMFs) CLASSIFICATION METHOD
For the different intrinsic mode functions (IMFs), we first employ range entropy (RE) to evaluate the complexity of the time series. Then, the least absolute shrinkage and selection operator (LASSO) is used to filter the high-complexity components together with the preselected influence factors, and the correlation between the low-complexity components and the original time series is calculated by the maximal information coefficient (MIC) for further partitioning. The specific methods are introduced as follows.

1) RANGE ENTROPY (RE)
Approximate entropy (AE) and sample entropy (SE) are the most commonly used methods for measuring complexity in modern signal processing, especially time series analysis. SE is derived from AE and overcomes some of its limitations, including the inconsistency introduced by the embedding dimension m and tolerance r, and the strong dependence on the length of the input signal. However, both AE and SE are sensitive to changes in signal amplitude. RE was therefore proposed to address this sensitivity while alleviating the need for amplitude correction [33]. In this study, RE is used to test the complexity of the time series, and the embedding dimension and tolerance are set to 5 and 0.8, respectively.
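The idea behind RE can be sketched as a sample-entropy computation in which the usual Chebyshev distance between embedded templates is replaced by a normalized range distance bounded in [0, 1], following Omidvarnia et al. This is our own compact reading of RangeEn-B, not the authors' code, and the function name and defaults are assumptions:

```python
import numpy as np

def range_entropy(x, m=2, r=0.3):
    """Sketch of RangeEn-B: sample entropy with a normalized
    range distance, (max - min) / (max + min), in place of Chebyshev."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def match_count(dim):
        # same number of templates for dim = m and dim = m + 1
        emb = np.array([x[i:i + dim] for i in range(n - m)])
        total = 0
        for i in range(len(emb)):
            d = np.abs(emb[i] - emb)                      # component-wise distances
            dmax, dmin = d.max(axis=1), d.min(axis=1)
            dist = (dmax - dmin) / (dmax + dmin + 1e-12)  # range distance in [0, 1]
            mask = np.ones(len(emb), bool)
            mask[i] = False                               # exclude self-match
            total += np.sum(dist[mask] < r)
        return total

    b, a = match_count(m), match_count(m + 1)
    return -np.log((a + 1e-12) / (b + 1e-12))
```

Because the range distance can only grow when a template gains an extra component, the match count at dimension m+1 never exceeds the count at m, so the estimate is nonnegative; a noisy series loses far more matches than a regular one and thus scores higher.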

2) LEAST ABSOLUTE SHRINKAGE AND SELECTION OPERATOR (LASSO)
The LASSO is a compression estimation technique for dimensionality reduction of high-dimensional data [34]. Its core principle is to construct a penalty function that forces the absolute values of the estimated parameters below a preset threshold, thereby compressing the regression coefficients. A smaller threshold compresses some regression coefficients exactly to 0, and the variables with zero coefficients are eliminated, thereby realizing variable filtering. The LASSO method is equivalent to adding an L1-norm penalty term to the ordinary linear model:

β̂ = argmin_β ||Y − Xβ||², subject to Σ_{j=1}^{d} |β_j| ≤ t,

or, equivalently,

β̂ = argmin_β { ||Y − Xβ||² + λ Σ_{j=1}^{d} |β_j| },

where the covariates are X = (X^(1), ..., X^(d)), the response variable is Y = (y_1, ..., y_n)^T, the regression coefficients are β = (β_1, ..., β_d), and t and λ are adjustment coefficients. Let t_0 = Σ_{j=1}^{d} |β̂_j(OLS)|; when t < t_0, some regression coefficients are compressed to 0, resulting in sparseness and dimension reduction. The adjustment coefficient therefore controls the strength of variable filtering. In this study, we use a combination of grid search, LASSO and SVR to determine the optimal adjustment coefficient according to performance on the training set. The grid search and LASSO methods are performed through an interface of the Python library Scikit-learn (0.23.2).
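The shrink-to-zero mechanism can be made concrete with a small from-scratch coordinate-descent solver (the paper itself calls Scikit-learn; this sketch, including the function names and the 1/2n loss scaling, is our own illustration). Each coordinate update soft-thresholds the partial correlation, which is exactly what zeroes out weak variables:

```python
import numpy as np

def soft_threshold(z, g):
    # shrink z toward zero by g; returns exactly 0 when |z| <= g
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=300):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: remove every feature's contribution except j's
            resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_ss[j]
    return beta
```

On data generated from a single relevant feature, the irrelevant coefficient lands exactly at zero while the relevant one survives (shrunk by roughly lam), which is the variable-filtering effect described above.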

3) MAXIMAL INFORMATION COEFFICIENT (MIC)
In regard to the calculation of correlation, MIC is used in this paper to measure the linear or nonlinear association strength between each IMF and the original series. MIC can capture not only linear functional relationships between variables, but also nonlinear ones (such as exponential or periodic relationships). The larger the MIC value, the higher the correlation. For the paired data {X_m; Y} consisting of the mth decomposed mode X_m and the original data Y, MIC(X_m; Y) is defined as:

MIC(X_m; Y) = max_{a·b < B} I(X_m; Y) / log₂ min(a, b),

where I(X_m; Y) is the mutual information between the two sequences computed on an a-by-b grid, and the maximum is taken over all grids whose number of cells a·b is below a threshold B. The detailed algorithm is available in Reshef et al. [35].
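A heavily simplified, fixed-grid version of this score can be computed with a histogram: estimate mutual information on one equal-frequency a-by-b grid and normalize by log(min(a, b)). Real MIC maximizes over many grids (and minepy-style implementations do much more); this sketch, with its assumed function name and defaults, only illustrates the normalization that puts the score on a 0-to-1 scale:

```python
import numpy as np

def grid_mi_score(x, y, a=4, b=4):
    """Mutual information on one equal-frequency a-by-b grid,
    normalized by log(min(a, b)) -- a fixed-grid stand-in for MIC."""
    n = len(x)
    xe = np.quantile(x, np.linspace(0, 1, a + 1))   # equal-frequency bin edges
    ye = np.quantile(y, np.linspace(0, 1, b + 1))
    counts, _, _ = np.histogram2d(x, y, bins=[xe, ye])
    p = counts / n
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        mi = np.nansum(p * np.log(p / (px * py)))   # 0*log(0) terms dropped
    return mi / np.log(min(a, b))
```

A deterministic relationship concentrates the joint mass on the grid diagonal and drives the score toward 1, while independent inputs stay near 0, mirroring how MIC ranks the IMFs.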

C. FORECASTING MODELS
Each prediction model has its advantages and disadvantages.
To improve the forecasting accuracy of tourism demand, it is necessary to select a suitable predictor for each component based on the strength of different models and the data feature.
In this paper, bidirectional gated recurrent unit (BiGRU) is used to forecast the high-complexity components, while ARIMA and SVR models are applied to fit the future trend of low-complexity low-correlation (LCLC) and low-complexity high-correlation (LCHC) components, respectively.

1) AUTOREGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA)
The ARIMA model is one of the most common regression models in time series forecasting. ARIMA(p, d, q) is developed by combining an autoregressive part (AR(p)), a moving average part (MA(q)) and a differencing degree (d). For a stationary time series x_t (t = 1, ..., T), the ARIMA model can be described as follows:

x_t = δ + Σ_{i=1}^{p} α_i x_{t−i} + ε_t + Σ_{j=1}^{q} β_j ε_{t−j},

where α_i and β_j are the lag parameters of the AR and MA parts of the model, respectively, ε_t represents the error term, and δ is the constant term.
There are three iterative stages in building an ARIMA model. At the first stage, the time series needs to be stabilized; an unstable series is usually stabilized by differencing d times. At the second stage, the model parameters are estimated based on the Akaike information criterion (AIC) and Bayesian information criterion (BIC). The last stage covers the diagnostic checking of the residuals, time series prediction and evaluation. In this study, the ARIMA model is implemented with the Python library Statsmodels (v0.12.1).
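The AR core of these stages can be sketched without Statsmodels: difference the series d times, then fit the AR(p) coefficients by ordinary least squares on a lag matrix. This is a simplified stand-in (no MA part, no AIC/BIC search; `fit_ar` and its return layout are our own choices), but it shows what the parameter-estimation stage actually solves:

```python
import numpy as np

def fit_ar(x, p, d=0):
    """Least-squares fit of the AR(p) part of ARIMA(p, d, 0)
    after d-th differencing. Returns [delta, alpha_1, ..., alpha_p]."""
    z = np.diff(x, n=d) if d > 0 else np.asarray(x, dtype=float)
    n = len(z)
    # design matrix: intercept column, then column k holds z_{t-k}
    X = np.column_stack([np.ones(n - p)] +
                        [z[p - k: n - k] for k in range(1, p + 1)])
    y = z[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

On a long simulated AR(1) series with coefficient 0.7, the least-squares estimate recovers the true parameter closely, which is the sanity check one would also run against a full ARIMA fit.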

2) SUPPORT VECTOR REGRESSION (SVR)
The SVR model is a supervised learning technique [36] built on the principle of structural risk minimization: it uses a kernel function to find, in a high-dimensional feature space, the regression hyperplane that best balances learning ability and generalization ability. Compared with other models, SVR has the advantages of fast convergence and multidimensional function estimation. For a set of learning samples D = {(x_i, y_i)}_{i=1}^{m}, where the inputs are x_i ∈ R^n, the outputs are y_i ∈ R, and m is the number of samples, the regression function f(x) can be presented as:

f(x) = ω^T φ(x) + b,

where ω and b denote the weight coefficient vector and a bias constant, respectively, and the mapping function φ(x) maps x_i into a high-dimensional feature space. Typically, the kernel function of SVR handles this mapping implicitly. In this study, the linear kernel ⟨x, x′⟩ is the most suitable kernel function for the decomposed components of tourism demand time series due to its high computational efficiency. The original problem can therefore be equivalently expressed as:

min_{ω, b, ξ, ξ*} (1/2)||ω||² + C Σ_{i=1}^{m} (ξ_i + ξ_i*)
s.t. y_i − ω^T φ(x_i) − b ≤ ε + ξ_i, ω^T φ(x_i) + b − y_i ≤ ε + ξ_i*, ξ_i, ξ_i* ≥ 0,

where C is the penalty parameter; samples whose deviation from the target exceeds ε are penalized. The fitting accuracy and generalization performance of SVR are mainly affected by C and ε, so a cross-validated grid search is used to optimize these two parameters over a parameter grid. Specifically, the Python library Scikit-learn (0.23.2) is adopted to implement the SVR model, the grid search and the cross-validation.
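The ε-insensitive objective above can be illustrated with a tiny primal solver for the linear-kernel case, fit by subgradient descent. This is a pedagogical sketch, not the QP solved by Scikit-learn's SVR (no dual, no support vectors extracted), and the function name, learning rate and epoch count are assumptions:

```python
import numpy as np

def linear_svr(X, y, C=10.0, eps=0.1, lr=5e-3, epochs=4000):
    """Primal linear eps-insensitive SVR:
    minimize 0.5 * ||w||^2 + C * mean(max(0, |f(x) - y| - eps))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        err = X @ w + b - y
        # subgradient of the eps-insensitive loss w.r.t. f(x):
        # 0 inside the tube, +/-1 outside
        g = np.where(np.abs(err) > eps, np.sign(err), 0.0)
        w -= lr * (w + C * X.T @ g / n)
        b -= lr * C * g.mean()
    return w, b
```

Points inside the ε-tube contribute no gradient, which is the defining property of the loss: the fit only reacts to samples that miss the target by more than ε.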

3) BIDIRECTIONAL GATED RECURRENT UNIT (BiGRU)
As a variant of the long short-term memory (LSTM) network, the gated recurrent unit (GRU) has fewer parameters and faster convergence while maintaining the good learning performance of LSTM. Differing from LSTM, GRU merges the input gate and forget gate of LSTM into an update gate, so its internal structure consists of an update gate and a reset gate (Figure 1). The update gate (z_t) controls the degree to which the hidden state at the previous moment influences the current moment: the larger the update gate value, the higher the influence. The reset gate (r_t) decides how much of the previous hidden state is neglected: the larger its value, the less information is ignored. At time t, the hidden state h_t of the GRU can be calculated as follows:

z_t = σ(W_z x_t + U_z h_{t−1}),
r_t = σ(W_r x_t + U_r h_{t−1}),
h̃_t = tanh(W x_t + U(r_t ⊙ h_{t−1})),
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t,

where σ and tanh are the sigmoid and tanh activation functions, respectively; W_r, W_z, W, U_r, U_z, and U are the weight matrices of the training sample; and ⊙ denotes element-wise multiplication. h̃_t is the candidate vector, a modulation that controls the degree to which new input information enters the cell state. BiGRU extends the GRU with forward and backward passes over the hidden layers, so that the output layer includes information from both previous and following states. Its basic structure is shown in Figure 2. The idea of BiGRU is that the current hidden state h_t is determined by the forward hidden output →h_t in the positive time direction, the backward hidden output ←h_t in the negative time direction, and the current input x_t:

h_t = α_t →h_t + β_t ←h_t + b_t,

where α_t and β_t represent the weights of →h_t and ←h_t in the BiGRU at time t, respectively, and b_t is the bias of h_t at time t. This study constructs the BiGRU network architecture with the Python library PyTorch (1.6.0).
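The gate equations can be traced directly in NumPy. The sketch below implements one GRU step and a bidirectional pass; as a simplification it concatenates the forward and backward states rather than combining them with the learned weights α_t, β_t described above, and the helper names and dimensions are our own:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def make_params(rng, d_x, d_h):
    """Random GRU weights: W_* maps inputs (d_h x d_x), U_* maps states (d_h x d_h)."""
    p = {k: 0.1 * rng.standard_normal((d_h, d_x)) for k in ("Wz", "Wr", "W")}
    p.update({k: 0.1 * rng.standard_normal((d_h, d_h)) for k in ("Uz", "Ur", "U")})
    return p

def gru_step(x_t, h_prev, P):
    z = sigmoid(P["Wz"] @ x_t + P["Uz"] @ h_prev)            # update gate
    r = sigmoid(P["Wr"] @ x_t + P["Ur"] @ h_prev)            # reset gate
    h_cand = np.tanh(P["W"] @ x_t + P["U"] @ (r * h_prev))   # candidate vector
    return (1.0 - z) * h_prev + z * h_cand                   # convex blend

def bigru(xs, Pf, Pb):
    """Forward and backward GRU passes; each output sees past and future."""
    d_h = Pf["U"].shape[0]
    hf, fwd = np.zeros(d_h), []
    for x in xs:
        hf = gru_step(x, hf, Pf)
        fwd.append(hf)
    hb, bwd = np.zeros(d_h), []
    for x in reversed(xs):
        hb = gru_step(x, hb, Pb)
        bwd.append(hb)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

Because each new state is a convex combination of the previous state and a tanh candidate, the hidden values stay bounded in (−1, 1), which is part of why GRUs train stably over long sequences.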

D. PROCEDURE OF THE PROPOSED MODEL
Generally, the exchange rate, tourist income and consumer price index (CPI) can cause short-term changes in tourists' travel intentions. To study the changes in overseas tourist arrivals in Shanghai, a novel multi-factor hybrid ensemble learning framework is established to forecast the monthly overnight tourist flows from three major origins (the USA, Japan, and Korea). This hybrid approach integrates a decomposition ensemble strategy and multi-factor fusion to capture short-term fluctuations. The overall framework is shown in Figure 3. It consists of five fundamental steps: data decomposition, intrinsic mode functions classification, multi-factors fusion, individual prediction and ensemble results.
Step 1: Historical data decomposition. Using ICEEMDAN, the tourism demand time series is decomposed into several intrinsic mode functions with different frequencies and characteristics (denoted IMF_i, i = 1, 2, ..., n).
Step 2: Intrinsic mode functions classification. Based on the RE and MIC results, the obtained IMFs are classified into high-complexity (HC), low-complexity low-correlation (LCLC) and low-complexity high-correlation (LCHC) components.
Step 3: Multi-factors fusion. The preselected influence factors are fused with the HC components, and LASSO is used to filter out redundant variables.
Step 4: Individual prediction. BiGRU is adopted to forecast the time series fusing the influence factors X and the HC components Y, while SVR and ARIMA are applied to fit the future trends of the LCHC and LCLC components, respectively.
Step 5: Ensemble results. The final result is obtained by summing the prediction results (PR) of the individual predictions for each time series in Step 4.

III. EMPIRICAL STUDY
In this section, the inbound overnight tourist volume from three major origins to Shanghai is selected to verify the effectiveness of our approach.

A. DATA DESCRIPTION
The descriptive statistics of the three time series are summarized in Table 1.

B. EXPERIMENT DESIGN
In this paper, all sample data are divided into a training set and a testing set for each origin-destination series (USA-SH, Korea-SH and Japan-SH). In the h-step-ahead forecasting without influence factors, the prediction horizon h is set to 1, 3, and 6, and the mapping function f(·) can be described as follows:

ŷ_{t+h} = f(y_t, y_{t−1}, ..., y_{t−d+1}),

where y_t is the time series of tourism flow and d is the lag order, with d = 10. When the influence factors are considered, the function relationship f(·) of the model can be written as:

ŷ_{t+h} = f(y_t, y_{t−1}, ..., y_{t−d+1}, LASSO(p¹_t, ..., pⁱ_t)),

where pⁱ_t is the time series of the ith influence factor, and LASSO(·) is the feature-extraction operator using the LASSO method.
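The lagged mapping y_{t+h} = f(y_t, ..., y_{t−d+1}) corresponds to a simple windowing of the series into supervised (input, target) pairs; a minimal sketch (function name and array layout are our own) is:

```python
import numpy as np

def make_supervised(y, d=10, h=1):
    """Build inputs of the d most recent values (y_t, ..., y_{t-d+1})
    with h-step-ahead targets y_{t+h}."""
    y = np.asarray(y, dtype=float)
    X, target = [], []
    for t in range(d - 1, len(y) - h):
        X.append(y[t - d + 1: t + 1])   # lag window ending at time t
        target.append(y[t + h])         # value h steps ahead
    return np.array(X), np.array(target)
```

Increasing h shortens the usable sample (the last h observations have no target), which is why longer horizons leave fewer training rows.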
To benchmark the developed approach, six single models, namely ARIMA, SARIMA, MLP, SVR, GRU and BiGRU, are applied for comparison; several hybrid ensemble models are also employed. In this study, five evaluation metrics, namely the normalized root-mean-square error (NRMSE), mean absolute percentage error (MAPE), Theil coefficient (TC), direction accuracy (DA) and coefficient of determination (R²), are adopted to assess the performance of our approach. In more detail, the first three (NRMSE, MAPE, and TC) account for the differences between the target and the predicted output. The DA reflects the consistency of the change trend between the predicted and real values, and R² indicates the degree of fit between the original data and the predicted values; it depends largely on the trend and takes a value between 0 and 1, with values closer to 1 indicating a better fit.
The details of the above evaluation criteria are summarized in Table 2, where M is the total number of out-of-sample observations; ŷ_m and y_m represent the forecasted and observed values, respectively; and ȳ is the mean of the observed values.
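For concreteness, the five criteria can be coded in a few lines. The exact definitions live in Table 2; the conventions below (mean-normalized NRMSE, sign-of-differences DA, and a common form of the Theil coefficient) are assumptions on our part, since each metric has several variants in the literature:

```python
import numpy as np

def metrics(y, yhat):
    """One common reading of NRMSE, MAPE, TC, DA and R^2."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    rmse = np.sqrt(np.mean((y - yhat) ** 2))
    return {
        "NRMSE": rmse / y.mean(),                                   # normalized by mean level
        "MAPE": np.mean(np.abs((y - yhat) / y)) * 100,              # percent error
        "TC": rmse / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(yhat ** 2))),
        "DA": np.mean(np.sign(np.diff(y)) == np.sign(np.diff(yhat))),  # direction hits
        "R2": 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2),
    }
```

A perfect forecast drives the three error metrics to 0 and both DA and R² to 1, which is a quick sanity check before comparing models.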
The model comparison so far simply selects the forecasts with smaller errors under the aforementioned evaluation criteria. However, a further step is needed to determine whether such a difference is statistically significant (for predictive purposes). For this reason, the Diebold-Mariano (DM) test and the Kolmogorov-Smirnov predictive accuracy (KSPA) test are introduced from a statistical perspective.
Here, the DM test detects differences in predictive accuracy between the target model (mod0) and a benchmark model (mod1). The null hypothesis is that the two models have equal forecasting accuracy. In this study, the loss function l_t is defined as:

l_t = (y_t − ŷ_mod0,t)² − (y_t − ŷ_mod1,t)²,

where ŷ_mod0,t and ŷ_mod1,t denote the estimates of y_t calculated by mod0 and mod1, respectively, in period t. The DM statistic H can then be described as:

H = l̄ / √(V̄_l / M),

where l̄ = (1/M) Σ_{t=1}^{M} l_t and V̄_l = γ_0 + 2 Σ_{q=1}^{∞} γ_q, with γ_q = cov(l_t, l_{t−q}). For further details of the DM test, please refer to Diebold et al. [37].
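A bare-bones version of this statistic is easy to compute; the sketch below uses the squared-error loss differential defined above and, as a simplifying assumption, truncates the autocovariance sum at lag h−1 (a standard practical choice for h-step forecasts, rather than the infinite sum in the formula):

```python
import numpy as np

def dm_statistic(y, f0, f1, h=1):
    """DM statistic for l_t = (y_t - f0_t)^2 - (y_t - f1_t)^2.
    Negative values favour model f0 (mod0)."""
    y = np.asarray(y, float)
    l = (y - f0) ** 2 - (y - f1) ** 2        # loss differential series
    M = len(l)
    lbar = l.mean()
    # sample autocovariances of l up to lag h - 1
    gam = [np.mean((l[q:] - lbar) * (l[:M - q] - lbar)) for q in range(h)]
    V = gam[0] + 2.0 * sum(gam[1:])
    return lbar / np.sqrt(V / M)
```

When mod0 is clearly the more accurate model, the loss differential is negative on average and the statistic is strongly negative, which (against a standard normal reference) rejects equal accuracy.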
The KSPA test, in turn, is designed to ascertain whether there is a statistical difference between the distributions of the prediction errors of two models, i.e. whether the model with the lower error indeed produces stochastically smaller errors than its competitor. The one-sided KSPA test is applied in this study; its null hypothesis is that the empirical cumulative distribution function of the tested model's errors is smaller than that of the competing model. When calculating the KSPA statistic, the absolute value of the errors is used as the loss function. For the strict mathematical derivation of the KSPA test, please refer to Hassani et al. [38].

C. DECOMPOSITION AND CLASSIFICATION RESULTS
The ICEEMDAN decomposition results of the tourism flows from the three major origins to Shanghai, together with their RE and MIC results, are displayed in Figure 5 and Table 3, respectively. All IMFs are arranged from the highest to the lowest frequency to represent the changing law of tourism flow from short term to long term. Meanwhile, these IMFs present features ranging from high complexity to low complexity according to the RE results. The first few IMFs of the three time series demonstrate minimal regularity and strong randomness, while the last few possess, to a certain extent, the characteristics of periodicity and long-term trend. In addition, according to the MIC results, the more regular components have a higher correlation with the original time series, which means these IMFs contain more effective information than the others. For example, in the decomposition result of USA-SH, the IMF 2 and IMF 7 components represent the short-term periodicity and long-term trend, respectively, and both have lower complexity and higher correlation.
The decomposed components of the different source markets represent changes in tourist flow caused by different influence factors. The fluctuation of the first few components indicates that the short-term tourism volume is affected by factors such as weather conditions, holiday length, consumer disposable income and exchange rate movements. Therefore, for the high-complexity short-term fluctuation components, we select the CPI, the exchange rate and personal or household expenditure or income as auxiliary variables to assist their forecasting. The changes in the middle components show that emergencies can affect tourism demand over a period of time. For example, due to the global financial crisis in 2008, the international tourist flows from the USA, Korea and Japan to Shanghai decreased simultaneously and did not rebound until 2010; correspondingly, the IMF4 of all three time series shows a significantly downward trend during the financial crisis. In September 2016, the Terminal High Altitude Area Defense (THAAD) incident caused tensions between China and South Korea and also resulted in a decline in tourist volumes; the IMF4 of the Korea-SH time series displays a downward trend from September 2016 to October 2017. Since the Diaoyu Islands incident in 2012, the tourist volume from Japan to Shanghai has dropped rapidly, which can also be seen from the IMF4 of the Japan-SH time series. The long-term fluctuations correspond to the last few decomposed components, which are generally affected by local infrastructure, tourism policies, international conditions, and so on. From the last component of the three time series, it can be seen that the tourist flows from the USA and Korea to Shanghai have grown steadily year by year, while tourist arrivals from Japan have shown a downward trend since 2008 and have gradually stabilized after 2018.
To obtain better prediction accuracy, we select an appropriate predictor for each IMF based on its complexity and correlation. Components whose RE values are greater than that of the original time series are regarded as high-complexity, and the rest as low-complexity. Among the low-complexity components, those whose MIC value is lower than 0.1 are classified as low-correlation, and the rest as high-correlation. The classification results in Table 4 indicate that the first few components exhibit high-complexity characteristics, while the last few possess high correlation with the raw time series in most cases.
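The two-stage classification rule just described (RE against the original series, then a MIC cut-off of 0.1) can be expressed directly in code. This is a sketch assuming the RE and MIC values have already been computed for each IMF; the function name is ours.

```python
def classify_imfs(re_imfs, re_original, mic_imfs, mic_threshold=0.1):
    """Classify IMFs by the rule used in this study:
    RE above the original series' RE  -> high-complexity;
    otherwise MIC below the threshold -> low-correlation;
    otherwise                         -> high-correlation."""
    labels = []
    for re_val, mic_val in zip(re_imfs, mic_imfs):
        if re_val > re_original:
            labels.append("high-complexity")
        elif mic_val < mic_threshold:
            labels.append("low-correlation")
        else:
            labels.append("high-correlation")
    return labels
```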
For the high-complexity components, we first combine each of them with the preselected influence-factor time series to form a matrix, then use LASSO to extract the effective features from the columns of this matrix and fuse them into one time series, thereby determining the input structure. As shown in Table 5, the adjustment coefficients of LASSO are chosen by grid search. Then, BiGRU is applied to forecast each high-complexity component with the same network hyperparameters: number of recurrent layers = 3, hidden units = 50, batch size = 30 and epochs = 200. Since the training of the BiGRU network involves some randomness, the final prediction results are obtained by averaging the values of 10 runs. For the low-complexity and low-correlation components, the optimal parameters of the SVR forecasting model are determined by grid search and cross-validation (see Table 6). For the low-complexity and high-correlation components, the optimal forms of ARIMA, selected by AIC and BIC, are listed in Table 7.
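The LASSO fusion step can be sketched as follows. This is a simplified illustration with a fixed penalty `lam` rather than the paper's grid-searched adjustment coefficients, using plain coordinate descent on standardized columns; `lasso_fuse` is our own name.

```python
import numpy as np

def lasso_fuse(imf, factors, lam=0.1, n_iter=200):
    """Fuse a high-complexity IMF's exogenous factor series into one
    series via LASSO (coordinate descent with soft-thresholding).
    Returns the sparse weights and the fused series."""
    X = np.column_stack(factors)
    X = (X - X.mean(0)) / X.std(0)       # standardize: X[:,j]@X[:,j]/n = 1
    y = np.asarray(imf, float)
    y = y - y.mean()
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]               # partial residual
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-threshold
    return w, X @ w                      # weights, fused exogenous series
```

On a toy example with one informative factor and one irrelevant alternating factor, LASSO keeps the former and shrinks the latter to zero.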

D. EXPERIMENTAL RESULTS ANALYSIS
To verify the effectiveness and robustness of the designed approach, a systematic analysis covering various benchmark models and evaluation methods is implemented from the perspectives of error criteria and statistical tests.

1) ERROR CRITERION ANALYSIS
The forecasting-error-based evaluation directly reflects the differences between predicted and observed values. Table 8, Table 9 and Table 10 show the comparison results of one-step-ahead, three-step-ahead and six-step-ahead forecasting for all considered models, respectively, where bold denotes the best results. Meanwhile, Figure 6, Figure 7 and Figure 8 visualize the forecasting results of all models on the three testing datasets by scatter plots of the forecast values and line plots of the forecast errors. Some meaningful conclusions and analyses are summarized below.
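For reference, the main error criteria reported in these tables can be computed as below. This is a sketch under stated assumptions: NRMSE is taken as RMSE normalized by the mean of the observations, and DA as the share of matching movement directions; the paper's exact definitions may differ in normalization.

```python
import math

def mape(y, yhat):
    """Mean absolute percentage error (%)."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)

def nrmse(y, yhat):
    """RMSE normalized by the mean of the observations (%);
    this normalization is an assumption, see lead-in."""
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))
    return 100.0 * rmse / (sum(y) / len(y))

def directional_accuracy(y, yhat):
    """Share of periods where the forecast moves in the same
    direction as the observation."""
    hits = sum(1 for t in range(1, len(y))
               if (y[t] - y[t - 1]) * (yhat[t] - yhat[t - 1]) >= 0)
    return hits / (len(y) - 1)
```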
First, from the perspective of single prediction models, deep learning has obvious advantages on complex time series. In the experiment of USA-SH, the BiGRU shows a significant advantage over the other single models (i.e., ARIMA, SARIMA, MLP, SVR and GRU). For one-step-ahead forecasting of USA-SH, the MAPE, NRMSE, TC, DA and R² of the BiGRU are 6.5939%, 9.3602%, 0.0393, 0.9730 and 0.8656, respectively, the best values among the single models. Figure 6 also shows that the forecasting-error line plot of BiGRU is closer to zero than those of the other single models. In the experiments of Korea-SH and Japan-SH, the deep learning model (i.e., BiGRU) shows the same advantages in one-step-ahead and multi-step-ahead forecasting. Specifically, compared with the three-step-ahead forecasting of SARIMA, the MAPE of BiGRU is improved by 5.60% (Korea-SH) and 27.28% (Japan-SH), respectively, and compared with that of MLP, the NRMSE of BiGRU is reduced by 1.16% (Korea-SH) and 4.14% (Japan-SH), respectively. The scatter plots of forecasting values against true values are also closer to a straight line in three-step-ahead forecasting in Figure 7 and Figure 8. Therefore, it is reasonable to conclude that BiGRU is more suitable for training nonlinear and highly complex datasets.

Second, the decomposition-ensemble approaches combining different models are superior to the corresponding single models. For example, in USA-SH, Korea-SH and Japan-SH, the decomposition-ensemble models (i.e., EMD-MLP, EEMD-MLP, CEEMDAN-MLP and ICEEMDAN-MLP) obtain more accurate forecasts than the single MLP predictor, with MAPE values improving by about 10% on average. Similar experimental results can be obtained by comparing the other single models. Notably, our experiments show that some decomposition-ensemble approaches with ARIMA (such as EMD-ARIMA, EEMD-ARIMA, CEEMDAN-ARIMA and ICEEMDAN-ARIMA) produce poorer predictions than some single models (such as SARIMA, SVR and GRU) in USA-SH and Japan-SH.
The R² values of these decomposition-ensemble approaches are all below 65%. These facts indicate that an unreasonable choice of model leads to a decrease in prediction performance. When the decomposition-ensemble approach combining BiGRU is used to train and forecast each IMF, the obtained results are significantly better than those of the other decomposition-ensemble approaches with a simplex forecasting model (such as ICEEMDAN-ARIMA, ICEEMDAN-MLP, ICEEMDAN-SVR and ICEEMDAN-GRU). In the experiments of USA-SH, Korea-SH and Japan-SH, the MAPE values of ICEEMDAN-BiGRU are all below 2.90% in one-step-ahead forecasting, and its R² values are all above 80% in three-step-ahead forecasting. From Figure 6, Figure 7 and Figure 8, the line plots also show that the prediction errors of the decomposition-ensemble approaches are significantly reduced compared with the single models. Hence, the decomposition-ensemble approach aids the extraction of complex data features and significantly reduces the difficulty of prediction through modal decomposition, thereby improving the overall prediction performance.
Third, the forecasting accuracy of simplex decomposition-ensemble approaches (such as ICEEMDAN-ARIMA, ICEEMDAN-SVR and ICEEMDAN-BiGRU) can be further improved by matching suitable models to different IMF features. For one-, three- and six-step-ahead forecasting of USA-SH, the MAPE of ICEEMDAN-BSA is 2.6247%, 5.1357% and 7.3573%, respectively, and all DA values are above 95%. For one-step-ahead forecasting of Korea-SH, the MAPE, NRMSE, DA and R² values of ICEEMDAN-BSA improve by 16.74%, 0.78%, 2.71% and 2.15% compared with ICEEMDAN-BiGRU, respectively. In one-step-ahead forecasting of Japan-SH, the direction accuracy of EEMD-BSA and CEEMDAN-BSA even reaches 100%, which can also be seen in Figure 8. In general, the accuracy of all models decreases to a certain extent as the prediction step increases, but selecting an adaptive predictor according to different IMF features helps to slow down this decline. For this reason, the MAPE values of ICEEMDAN-BSA for three-step-ahead forecasting remain roughly between 3.5% and 5.5%, and its R² values are above 0.85. In addition, for six-step-ahead forecasting, the DA values of ICEEMDAN-BSA are above 75% and the NRMSE values are all below 12.2%. Hence, matching adaptive prediction models by using the IMF classification method can significantly enhance the forecasting performance of the hybrid approach.
Fourth, in the empirical analysis, we find that it is difficult to further improve the prediction accuracy of the low-complexity components, while there is still much room for improvement on the high-complexity ones. Here, the influence factors (such as the exchange rate, consumer price index (CPI) and personal or household consumption expenditure or income) are added to the high-complexity components to compensate for the negative effects caused by short-term fluctuations. In fact, the forecasting accuracy (MAPE) and fitting effect (R²) of the proposed approach (ICEEMDAN-LXBSA) are the best in both one-step-ahead and multi-step-ahead forecasting. The MAPE of ICEEMDAN-LXBSA stays low (1.8481%, 1.7987% and 1.7545% for the flows from USA, Korea and Japan to Shanghai in one-step-ahead forecasting, respectively), and its R² stays high (0.9854, 0.9737 and 0.9831, respectively). In terms of direction accuracy, ICEEMDAN-LXBSA also achieves a clear advantage over most of the benchmark models. In particular, the DA values of ICEEMDAN-LXBSA for one-, three- and six-step-ahead forecasting in the experiment from USA to Shanghai are 0.9730, 0.9730 and 1.0000, respectively. Similarly, the DA of ICEEMDAN-LXBSA for the tourist flow from Korea to Shanghai also reaches the optimal value. The line plots and scatter plots (i.e., Figure 6, Figure 7 and Figure 8) likewise indicate that its forecasts are closest to the true values in one-step-ahead and multi-step-ahead forecasting: the line plots nearly coincide with the true series, and the scatter points lie closest to a straight line. Therefore, adding influence factors to the high-complexity (high-frequency) components is an effective way to improve the prediction effect.

2) MULTI-FACTOR FUSION ANALYSIS
To better display the improvement of multi-factor fusion on the prediction accuracy of the high-frequency components, Table 11 presents a detailed comparison of the prediction results of the high-frequency components with and without influence factors. For IMF1 of USA-SH and Japan-SH, the average NRMSE of LX-BiGRU improves by about 25% and 32% compared with BiGRU. For one-step-ahead forecasting of IMF1, IMF2 and IMF3 of Korea-SH, the NRMSE of LX-BiGRU under the ICEEMDAN method improves by 50.57%, 22.49% and 54.50%, respectively. For Korea-SH and Japan-SH, only IMF4 and IMF2 obtained by the CEEMDAN method are high-complexity components; their fitting effect is best when the influence factors are added, with the R² of one-step-ahead forecasting reaching 0.95 and the multi-step-ahead forecasting also achieving a high level. Although a few high-complexity components with influence factors show the opposite effect (marked with an asterisk in the table), there is an overall significant positive effect on one-step-ahead forecasting (all R² are around 90%) and a negative effect only on a few six-step-ahead cases (such as IMF1 of CEEMDAN and EMD, and IMF3 of CEEMDAN and EEMD). In addition, it can be seen that ICEEMDAN yields greater forecasting accuracy than the other decomposition methods: for one thing, ICEEMDAN extracts the short-term fluctuation components (i.e., noise) well; for another, the fitting effect of its high-complexity components is significantly enhanced by incorporating influence factors. In conclusion, no matter which decomposition method is adopted, introducing appropriate influence factors helps to improve the forecasting accuracy of the high-complexity components.

3) STATISTIC TEST ANALYSIS
To prove the feasibility of the developed hybrid approach from a statistical perspective, the DM and KSPA tests are implemented. The test results are listed in Table 12, Table 13 and Table 14, covering one-step-ahead and multi-step-ahead forecasting. A detailed analysis follows.
In the DM test, except for CEEMDAN-BSA on the Korea-SH time series, the results confirm that ICEEMDAN-LXBSA rejects the null hypothesis at the p = 0.1 level against the other models; in other words, the proposed hybrid approach outperforms the other benchmark models at the 90% confidence level. In most cases, moreover, the proposed approach is superior at the p = 0.05 significance level, that is, a notable difference exists between the forecasting errors of our approach and those of the benchmark models.
Similar to the DM test, in one-step-ahead and multi-step-ahead forecasting, the KSPA test results also indicate that the error distribution of ICEEMDAN-LXBSA differs significantly, in the favorable direction, from those of the other models at the p = 0.1 significance level. In other words, the forecasting error of the proposed approach is closer to a normal distribution and shows the characteristics of white noise, while the prediction errors of the other models may still contain unextracted effective information.
Overall, judged by both the DM and KSPA results, ICEEMDAN-LXBSA exhibits high precision and sufficient information extraction (i.e., its prediction error is closer to white noise). From a statistical point of view, the proposed approach has, to a certain extent, remedied some problems of the commonly used decomposition-ensemble framework.

E. DISCUSSION AND COMPARISON WITH BENCHMARK MODELS
To better understand the comprehensive advantages of the hybrid ensemble learning approach constructed in this study, this section discusses in detail the underlying reasons for the impact of each module of our approach and of the benchmark models on forecasting performance.
First, among the single models, SARIMA achieves higher accuracy than ARIMA on all three time series, and the improvement is most obvious on the USA series, because the tourist flow from the USA is more cyclical than that from the other two sources. Meanwhile, SARIMA performs better than the AI-based models (SVR, GRU, MLP) in some cases. This finding is consistent with Claveria et al. [39], whose research indicated that an AI-based model is more likely to cause information loss than linear models in the filtering process, even though it is well equipped to handle nonlinear patterns. In addition, SVR and GRU outperform MLP in most cases in stability and forecasting performance. BiGRU achieves higher prediction accuracy than GRU in one-step-ahead and multi-step-ahead forecasting thanks to the bidirectional transmission mechanism of its hidden layers; this mechanism gives the model stronger learning ability, but also increases the time cost. In general, BiGRU performs best among the single models in all cases, because deep learning can capture the hidden regular components in complex time series and better adapt to their nonlinear components.
Second, the decomposition technique reduces data complexity and prediction difficulty. Across the three time series, each single model performs worse than the decomposition-ensemble model that uses it as the individual predictor. However, the deep learning method BiGRU shows stronger processing capacity than some decomposition-ensemble models (such as EMD-ARIMA, EEMD-ARIMA and EMD-MLP) over different prediction horizons. Previous studies show that the mode-mixing defect of the EMD method causes confusion among the decomposed components; the development of the EEMD, CEEMDAN and ICEEMDAN methods suppresses this problem to varying degrees by adding white noise to the original data to change the distribution of extremum points. In the process of multiple ensemble averages, the added white-noise realizations cancel each other out, gradually revealing the real data components. Among all decomposition-ensemble models, the ICEEMDAN-based ones outperform those based on the other decomposition methods (EMD, EEMD, CEEMDAN) in most cases. For models using the same predictor to forecast every decomposed component, the hybrid approach combining each decomposition method with BiGRU displays the best prediction level and direction accuracy in most cases.
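The noise-cancellation argument can be illustrated numerically: adding independent white-noise realizations to a signal and ensemble-averaging the results recovers the underlying signal. This is a toy sketch of the principle only; real EEMD/CEEMDAN/ICEEMDAN averages the IMFs extracted from each noisy copy, which is not reproduced here.

```python
import math
import random

def noise_assisted_average(signal, n_trials=200, eps=0.1, seed=0):
    """Ensemble-average n_trials noisy copies of `signal`.
    The independent white-noise realizations (std = eps) cancel as
    the trials are averaged, so the output approaches the signal
    with error ~ eps / sqrt(n_trials)."""
    rng = random.Random(seed)
    n = len(signal)
    acc = [0.0] * n
    for _ in range(n_trials):
        for i, s in enumerate(signal):
            acc[i] += s + eps * rng.gauss(0, 1)   # one noisy copy
    return [a / n_trials for a in acc]
```

With eps = 0.1 and 200 trials, the residual noise on each point shrinks to roughly 0.1/√200 ≈ 0.007.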
Third, to adaptively select a reasonable predictor for each IMF, we propose an IMFs classification method that considers both complexity and correlation. In addition, to further improve the forecasting effect, the influence factors are introduced and combined with the high-complexity components. However, in some three-step-ahead and six-step-ahead cases, the forecasts of BiGRU without influence factors are closer to the actual results. The reason may be that the influence factors do not supplement useful information for these high-complexity components and thus play an opposite role, or that the feature selection method does not effectively filter out redundant variables. On the whole, however, the proposed IMFs classification method is reasonable and can logically select a suitable predictor to improve the overall prediction performance.
Finally, compared with all benchmark models, the proposed approach (ICEEMDAN-LXBSA) is the most competitive over the different prediction horizons. The main reasons are as follows: (1) the ICEEMDAN method, as a data decomposition technique, can effectively extract information of different frequencies and amplitudes from the nonlinear and complex tourist-flow time series; (2) the IMFs classification method divides all components into three categories based on the complexity and correlation of each component; (3) the deep learning model (BiGRU), with its powerful learning capability, suits the most complex IMFs, and since the high-complexity components contain random factors and are affected by the short-term market, some reasonable influence factors are selected as supplementary information, on which basis the feature selection method (LASSO) eliminates redundant information, thereby improving forecasting accuracy. The above reasons and the experimental results therefore convincingly prove the superiority and rationality of our approach.

IV. CONCLUSION
Providing an accurate forecast of tourism demand helps tourism market managers provide effective tourism services and helps investors formulate reasonable investment plans and programs. However, due to the limitations of forecasting techniques and the randomness and irregularity that multiple factors impose on the time series, accurate prediction of tourism demand is a challenging task. To ameliorate these shortcomings, a multi-factor hybrid ensemble learning approach is proposed to forecast the monthly overnight tourist flow from three major origins (USA, Korea and Japan) to Shanghai. The main work includes the following aspects: data decomposition, intrinsic mode functions classification, multi-factors fusion, individual prediction and ensemble of results. The empirical results demonstrate that the proposed approach outperforms all benchmark models over different prediction horizons.
The contributions of this study can be summarized as follows. First, unlike previous studies that only use historical data, this study develops a decomposition-ensemble approach with a multi-factor fusion strategy, thereby improving both prediction ability and explainability. Specifically, the IMFs with high complexity usually present high frequency and high volatility; these components are often affected by short-term economic factors and are difficult to extract effectively, which degrades forecasting accuracy to a certain extent. Therefore, we fuse several influence factors (i.e., the exchange rate, consumer price index (CPI) and personal or household consumption expenditure or income) into the high-complexity components in an attempt to better capture the dynamics of the time series. Second, although the advantages of the decomposition-ensemble approach have been demonstrated in many studies, including in tourism demand forecasting, few works conduct further feature analysis on the IMFs. This study innovatively proposes an IMFs classification method based on a comprehensive strategy combining complexity (range entropy) and correlation (maximal information coefficient) to investigate the internal characteristics of all components effectively. Furthermore, we select an appropriate prediction model for each category of IMF according to its features rather than applying the same model throughout, thus improving the adaptability of the prediction model. Third, according to the characteristics of each IMF and predictor, we assign the deep learning technique (BiGRU), the machine learning method (SVR) and the time series model (ARIMA) to predict the corresponding IMFs: BiGRU forecasts the highly complex IMFs combined with influence factors to obtain a better training effect, SVR is applied to the nonlinear and non-smooth IMFs, and ARIMA is employed for the low-complexity IMFs with obvious trend characteristics.
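The three-way predictor assignment just described can be written as a simple dispatch table. This is a schematic mapping only: the labels follow the classification used in this study, and the model implementations themselves are not shown.

```python
def assign_predictor(label):
    """Map an IMF class to its matched forecasting model:
    BiGRU (with LASSO-fused influence factors) for high-complexity
    IMFs, SVR for low-complexity/low-correlation IMFs, and ARIMA
    for low-complexity/high-correlation IMFs."""
    table = {
        "high-complexity": "BiGRU + fused influence factors",
        "low-correlation": "SVR",
        "high-correlation": "ARIMA",
    }
    return table[label]
```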
Based on this decomposition-and-ensemble idea, the advantages of each technical module can be fully utilized, thus improving the overall prediction performance. A novel hybrid approach (ICEEMDAN-LXBSA) integrating ICEEMDAN, IMFs classification and multi-factors fusion is thereby proposed to forecast monthly tourism demand. Compared with the existing models, ICEEMDAN-LXBSA achieves higher precision and stability over different horizons, so it can serve as an effective reference tool for decision-makers to make corresponding plans.
This study reveals some practical implications worthy of further attention in the field of tourism demand forecasting. First, within the theoretical framework, reasonable influence factors can effectively improve the forecasting accuracy of the high-complexity components; the empirical results show that factors that directly or indirectly affect the tourism industry deserve further attention in tourism demand forecasting. Second, from a technical point of view, the proposed classification method based on the features of each decomposed component can provide a basis for selecting a suitable predictor, and the reliability of this strategy in terms of accuracy and stability is verified in this study. Third, for practitioners and managers in the tourism industry, precise forecast results can inform the effective allocation and utilization of resources as well as policy formulation. In addition, grasping the development of tourist arrivals can help tourism-related enterprises, such as hotels, travel agencies and catering services, adjust their business plans effectively.
Although the performance of the presented approach is excellent, some limitations remain. First, the theoretical framework is verified only on tourism demand data from three major origins (USA, Japan and Korea) to Shanghai, and the amount of data is limited; the proposed model may face difficulties when data are missing or very large. Further research should test it on other cities and on expanded datasets. Second, the selected influence factors are mainly macroeconomic indicators; other active data sources should be identified and introduced into the proposed approach. Moreover, since the data in this study are available and complete, how to integrate other influence-factor data into the model needs further consideration. Third, predicting each IMF incurs a large computational cost, and reconstruction strategies and clustering methods are effective means to address this problem. Achieving superior accuracy at lower computational cost is therefore a possible direction for future research.

HONGWEI WANG was born in China, in 1978.
He is currently the Vice Dean of the School of Traffic and Transportation, Lanzhou Jiaotong University. His current research interests include tourism management and planning, intelligent transportation systems, regional comprehensive traffic planning, and transportation planning and management.
WENZHENG LIU was born in Gansu, China, in 1996. He received the M.S. degree from the School of Traffic and Transportation, Lanzhou Jiaotong University, in 2021. His research interests include machine learning, data mining, and tourism demand forecasting.