Predicting Chinese Commodity Futures Price: An EEMD-Hurst-LSTM Hybrid Approach

This paper proposes an EEMD-Hurst-LSTM prediction method based on the ensemble learning framework and applies it to typical commodities in China’s commodity futures market. The method performs ensemble empirical mode decomposition (EEMD) on commodity futures prices and feeds the resulting components, together with an adaptive fractal Hurst index calculated from intraday high-frequency data, into an LSTM model as new features, so that the model can capture the correlation with external markets and detect changes in market conditions. The results show that the EEMD-Hurst-LSTM method has better predictive performance than both the single benchmark models (horizontal comparison) and the other deep learning combined models (longitudinal comparison). Moreover, the trading strategy designed from this ensemble model earns higher returns than the other trading strategies and shows the best level of risk control. The research provides important implications for trend following in commodity markets and for the investment risk management of statistical arbitrage strategies.


I. INTRODUCTION
Financial asset price forecasting is one of the oldest and most challenging problems in time series forecasting because of the large amount of noise and high volatility in financial markets. From the perspective of trading practice, if the prices of financial assets and their trends can be predicted more accurately, traders engaged in fundamental or technical analysis will benefit greatly in maximizing capital gains and minimizing losses. Proponents of the random walk (RW) and efficient market hypothesis (EMH) argue that financial markets are unpredictable based solely on current and historical data [1]. However, many studies argue that financial time series are predictable [2]. In recent years, more and more novel methods have been applied to the prediction of financial asset price behavior, including technical analysis, linear time series models, machine learning and deep learning. Applying the nonlinear methods of deep learning to financial asset price forecasting has become one of the main trends, which indicates that forecasting financial time series is not a worthless exercise. As an improvement over traditional machine learning models, new machine learning models can successfully simulate complex real-world data by extracting robust features that capture relevant information, and have better predictive performance than traditional linear models [3]. Considering the complexity of financial asset prices, combining deep learning with financial market forecasting is considered one of the most attractive topics [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Szidonia Lefkovits.
VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This paper proposes an ensemble learning method based on the idea of divide and conquer, namely the EEMD-Hurst-LSTM combined forecasting method, to forecast commodity futures prices in China. First, we apply Ensemble Empirical Mode Decomposition (EEMD) to the commodity futures price index, using 5-minute high-frequency data within each trading day. Second, from the low-frequency to high-frequency components obtained by EEMD, we calculate the adaptive Hurst index and use it as a novel input feature of the LSTM model. Finally, the method is applied to the price prediction of four typical commodities in the Chinese commodity futures market that are closely connected with the international commodity market: gold, copper, soybean and sugar. This paper addresses the following question: beyond technical indicator factors and external predictors, can adding the EEMD components and the adaptive Hurst index, which capture the inherent structural characteristics of the commodity futures price series, improve forecasting performance? In the robustness test, we compare the prediction results with traditional linear autoregressive models, machine learning support vector regression models, and single-input-feature LSTMs, and we also specifically discuss the prediction effects of single-step and multi-step prediction.
At present, research on commodity futures price forecasting is relatively rare, especially research that combines intraday high-frequency data, an ensemble learning framework and an evaluation of the predictability of the price series, and such forecasting of commodity futures prices in China remains largely unexplored. Compared with the stock market, the commodity futures market has a higher leverage ratio and higher levels of returns and volatility, making it attractive to different traders, including investors attracted by high returns and those who are interested in the market [5]. This paper focuses on the prediction of commodity futures prices in China, mainly because the Chinese commodity futures market has unique research value. For example, it has daily price limits, a unique market supervision mechanism of ''penetrating'' account supervision, a large number of individual investors and violent price fluctuations. This paper forecasts the futures prices of four commodities: gold, copper, soybeans, and sugar. The reasons for choosing these four commodities are as follows. (1) All four are globally traded commodities, with corresponding varieties listed on COMEX in New York, CBOT in Chicago and LME in London. (2) The four commodities are classified as precious metals, non-ferrous metals, agricultural products and soft commodities respectively, with strong liquidity and large transaction volume. Their prices are affected not only by domestic supply and demand, but also by the futures prices of similar commodities abroad, the macro-financial environment and other factors. (3) The four commodities were listed relatively early in the commodity futures market, and we can obtain data from 2008 onward, which helps us to analyze the prediction effect of the model dynamically.
The results of this paper should interest market traders who rely on trend following and arbitrage, because these two types of traders focus on commodity prices and on changes in spreads. The results can also serve as a reference for academic research on the behavior of market traders. As the share of quantitative trading in today's commodity futures market continues to expand, the research in this paper is a useful supplement to the traditional signal trading model based on technical analysis. The main contributions of this paper lie in the innovation of its methodology and the novelty of its data sample. First, this paper takes the lead in applying signal decomposition, market state transition and deep learning models to Chinese commodity futures forecasting, and develops a relatively novel EEMD-Hurst-LSTM combined forecasting model. By decomposing the original commodity futures price series through EEMD into component sub-sequences, we capture the systematic features hidden inside the price series. We also use intraday high-frequency data to calculate the daily adaptive Hurst index, which describes market reversals and changes in trend state. Both are fed into the predictive model as part of the input features.
Second, to obtain more robust forecasting results, this paper examines the results of multi-step forecasting, uses permutation entropy to detect the randomness of the price series, and examines the relationship between the predictability of the price series and the forecasting effect of the combined model. Third, in testing the performance of forecasting models, we find that most forecasting studies use only statistical evaluation criteria to assess the predictability of asset prices and returns, and seldom use economic evaluation criteria (such as the return and risk of simulated trading strategies). Reference [6] pointed out that statistically good forecast accuracy does not mean that investment strategies based on forecast results are profitable. Therefore, to verify the practicability of the model, we add economic evaluation criteria: we use the forecast results to design quantitative trading strategies, test the accuracy of the trading signals and the profitability of trading based on the forecasts, and compare the trading results with other commonly used strategies. Finally, at the level of forecasting methods, the work of this paper is an extension of the ''Forecast Combination Puzzle'' [7]. This paper provides further evidence that, whether compared with traditional linear financial time series models or with non-linear models such as machine learning and deep learning, the prediction performance of the combined model is better than that of a single model. This reflects that the multiple techniques built into the combined model can extract and capture the systematic characteristics of financial asset price series. The empirical results in this paper show that the proposed EEMD-Hurst-LSTM method has better predictive performance than both the single benchmark models (horizontal comparison) and the other deep learning combined models (longitudinal comparison).
In particular, the trading strategy based on this model shows strong risk resistance during periods of sharp market decline and severe volatility, and its maximum drawdown is significantly smaller than that of the other strategies. In addition, since this paper trains and validates the prediction model with a rolling-forward recursive scheme, the model shows good consistency and good generalization ability across different data sets, which is very important for investors and policy makers in the commodity futures market.

II. LITERATURE REVIEW
The research in this paper is related to the literature in two fields. The first is deep learning applications for commodity futures and spot price forecasting. As more and more commodities can be publicly traded through the online trading platforms provided by exchanges, data on continuous commodity futures price indices and main continuous contracts have become increasingly abundant and complete, and commodity futures price forecasting is attracting more and more attention. Earlier research focused on price forecasts for specific important commodities, such as gold, silver, oil and copper. Since the preconditions of traditional linear financial time series forecasting are too restrictive, recent research focuses on forecasting commodity prices using deep learning. Reference [8] used Convolutional Neural Networks (CNN) to predict the next-week and next-month directional movements of commodity prices. Recurrent neural network (RNN) and LSTM models have been used in some commodity forecasting research, and Dixon et al. [48] used a deep neural network (DNN) for commodity price forecasting. Reference [9] used an Elman RNN to predict New York Mercantile Exchange (COMEX) copper spot prices from daily closing prices.
In addition to deep learning forecasting, many studies incorporate new feature variables in commodity price forecasting, such as foreign exchange rates, stock indices, macroeconomic indicators, and crude oil prices [10]. Reference [11] found that foreign exchange rates such as that of the US dollar have significant predictive power for commodity prices. In forecasting precious metals and strategic commodities such as gold, Reference [12] concluded that crude oil prices have significant explanatory power for gold price fluctuations. The predictors included in these studies are raw observations of external variables, but raw predictors may not capture the systematic components of commodity prices and cannot adequately reflect the impact of external markets on commodity prices [13].
In addition, some studies have used hybrid models to forecast commodity prices. Reference [10] combined a Factorization Machine supported Neural Network (FNN) and Stacked Denoising Autoencoders (SDAE) for WTI crude oil price prediction, and compared the results with those of Support Vector Regression (SVR), Random Walk and Markov Regime Switching (MRS) models. Reference [14] combined DNN and reinforcement learning (RL), using fuzzy deep direct reinforcement learning (FDDR) for prediction and trading signal generation. The above studies all found that the hybrid models performed better in the evaluations.
Second is the literature related to financial asset and commodity price forecasting methods. There are many methods for price action prediction of financial time series, which can be mainly divided into the following categories.
(1) Technical analysis forecast. This branch mainly uses technical indicators as predictive model input features, including the Relative Strength Index (RSI), Williams %R (W%R), Commodity Channel Index (CCI), Percentage Price Oscillator (PPOSC), Momentum, and Exponential Moving Average (EMA). Reference [15] pointed out that information on technical indicators and macroeconomic variables can significantly improve the accuracy of exchange rate forecasts. Reference [16] showed that adding technical indicator variables to benchmark autoregressive models can yield more accurate volatility forecasts. Reference [17] used a variety of technical indicators to show that, in predicting commodity price changes, technical indicators have stronger predictive ability than economic indicators, and that their predictive performance is not affected by data mining or by changes over time.
(2) Traditional linear financial time series forecasting. In earlier studies, traditional econometric methods were used to predict commodity prices. Commonly used models are the moving average model (MA), the autoregressive model (AR), the autoregressive moving average model (ARMA) [18] and the generalized autoregressive conditional heteroskedasticity model (GARCH) [13]. A major disadvantage of these models is that they all assume linearity and stationarity, assumptions that are difficult to satisfy given the complexity and non-stationarity of financial markets. Therefore, models built on these assumptions tend to produce poor prediction results [19].
(3) Commodity price predictors. Macroeconomic and financial variables are often used in the literature as predictors of commodity prices. Fundamental factors used to predict commodity prices include futures market open interest [20], futures prices [21], and other commodity-specific variables [22]. Reference [23] used a series of fundamental factors reflecting the stock market and real economic activities to predict commodity prices, and found that commodity currencies have certain predictive power in the short term.
(4) Machine learning and deep learning prediction. To overcome the shortcomings of traditional linear financial time series forecasting, researchers have widely adopted deep learning methods in financial market forecasting. Deep learning is a special type of artificial neural network that consists of multiple hidden layers. Compared with traditional econometric methods, deep learning has enhanced functional representation and exhibits better performance [24]. Commonly used neural networks such as DNN, RNN and CNN, as well as more complex architectures such as LSTM, deep multilayer perceptron (DMLP), restricted Boltzmann machine (RBM) and autoencoder (AE), are widely used in financial time series forecasting [24]. Kuremoto et al. [49] applied Deep Belief Network (DBN) and RBM to achieve good results in stock index prediction. Reference [14] combined the Deep Direct Reinforcement (DDR) method, the FDDR method and a Recurrent Deep Neural Network (RDNN) to forecast the Chinese futures market. Reference [25] applied an LSTM network to predict the market trend of S&P 500 constituent stocks. In the deep learning forecasting of financial time series, forecasting models in which LSTM and its variants are the main components occupy an important position. Since LSTM is designed to exploit the temporal characteristics of time series signals, it is especially suitable for financial time series forecasting and achieves better results.
(5) Combined forecast. To further improve predictive ability, some studies have proposed ensemble learning methods based on the concept of divide and conquer. Reference [26] proposed early on to use feature selection techniques to combine different machine learning methods into a new ensemble learning architecture that aggregates the best performance and strongest features of each method, so that each method plays to its strengths. Building on the advantages of this approach, studies have shown that hybrid models have better prediction and classification performance than the earlier single models, both in technical improvements and in diversified applications. In combined forecasting, the original financial time series is first decomposed into different periodic factors, and then the forecast results of these factors are integrated [27]. Reference [19] was an early study that used the empirical mode decomposition (EMD) technique to extract the high-frequency to low-frequency component features of crude oil prices, and used them as new input features of neural networks to predict crude oil prices. Reference [28] used a hybrid approach of Complementary Ensemble Empirical Mode Decomposition (CEEMD) and Extended Extreme Learning Machine (EELM) to predict crude oil prices. Reference [29] used CEEMD to decompose historical crude oil prices into price components, and combined support vector machines (SVM) and neural networks to predict gold prices. Reference [30] proposed a novel method combining CEEMD and VMD with a BP neural network (BPNN) for time series forecasting. However, the disadvantage of this approach is that prediction errors from the decomposition may accumulate, negatively affecting the predictive ability of the model.
Building on the above literature review, this paper proposes a hybrid deep learning neural network method under the ensemble learning framework, which combines signal decomposition, detection of mean reversion in the series, and a deep neural network to construct a combined prediction model. To further extract the hidden relationship between commodities and external predictors, this paper uses EEMD to decompose the commodity futures price series and the selected external predictors into subsequences of different frequencies, and captures the relationship between markets by modeling the correlations between the market subsequences. Specifically, this paper does the following work. First, we use EEMD as a signal decomposition technique to decompose the original complex and non-stationary time series into simpler, stationary subsequences ranging from low-frequency to high-frequency components, which are input into the prediction model as part of the input features. Second, technical indicators such as EMA and W%R are used as input features of the LSTM model. In addition, the dynamic correlation coefficient series between the decomposed subsequences of the selected external markets and the commodity futures price subsequences is used as a model input feature to represent the internal relationship between commodity futures and external market factors. Third, an adaptive algorithm derives the adaptive Hurst index to identify the state transition patterns of the series, such as mean reversion and structural transition features. Using intraday minute-level high-frequency data and sliding time windows, the adaptive Hurst index is calculated daily and incorporated into the prediction model as a new feature, which can describe the structural changes of commodity futures prices over time.
Finally, to test whether the proposed prediction method is robust, we use intraday minute-level data and sliding windows to calculate the daily permutation entropy, which detects the randomness of prices, and we vary the size of the extrapolation window to examine the prediction effect of multi-day extrapolation. The forecasting method proposed in this paper is completely data-driven and makes no prior assumptions about the data or the model. It is an early attempt to combine signal decomposition, detection of mean reversion and structural change, and a deep neural network for forecasting China's commodity futures market. In addition to the conventional forecast evaluation indicators, to verify the application value of the forecast model, we also design a quantitative trading strategy that uses the forecast output to identify three types of trading opportunities: buy, sell and hold.
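The dynamic correlation coefficient series described in this section can be illustrated with a rolling Pearson correlation between two subsequences. This is only a minimal sketch: the paper does not state how the dynamic correlation is computed, so the rolling-window estimator, the function names `pearson` and `rolling_correlation`, and the window length of 20 are all assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def rolling_correlation(a, b, window=20):
    """Correlation over a trailing window; the window length is a hypothetical choice."""
    if len(a) != len(b):
        raise ValueError("both subsequences must have the same length")
    return [
        pearson(a[t - window + 1:t + 1], b[t - window + 1:t + 1])
        for t in range(window - 1, len(a))
    ]


# Toy subsequences standing in for one commodity component and one external component.
commodity_component = [float(i) for i in range(30)]
external_component = [2.0 * v + 1.0 for v in commodity_component]
dynamic_corr = rolling_correlation(commodity_component, external_component, window=20)
```

Each element of `dynamic_corr` would then be one day's value of the correlation feature for that pair of subsequences.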

III. RESEARCH DESIGN
This section details the research on the combined prediction method of EEMD-Hurst-LSTM proposed in this paper. Section III-A introduces the logical architecture of the forecasting method, Section III-B describes the detailed forecasting steps, Section III-C introduces the methods used in the theoretical framework of the forecasting model, and Section III-D presents the statistical and profitability criteria used to evaluate the forecasting effect.

A. PREDICTION FRAMEWORK
The EEMD-Hurst-LSTM combined prediction method integrates EEMD, the adaptive Hurst index and LSTM, as shown in Figure 1. The framework includes three stages: (1) EEMD preprocesses the data, that is, it decomposes the four commodity futures price series involved in this paper and extracts components ranging from high-frequency fluctuations to low-frequency trends; (2) based on the intraday 5-minute high-frequency component data, the daily Hurst index is calculated to capture turning points in commodity futures prices; (3) based on a long short-term memory neural network with delay and multi-feature input, we predict the one-step-ahead and multi-step-ahead daily closing prices of the four commodities. We examine the predictive performance of this framework on Chinese commodity futures prices. In addition, to further verify the practicability of the forecast model, a quantitative trading strategy is formulated according to the forecast results, the accuracy of the trading signals and the profitability of trading based on the forecasts are tested, and the trading results are compared with other commonly used strategies.

B. PREDICTION PROCESS
1) COMMODITY FUTURES PRICE EEMD DECOMPOSITION
We first perform EEMD decomposition on the continuous price index series of gold, copper, soybean and white sugar. Through EEMD, the intrinsic mode functions (IMFs) at different scales and the residual term are obtained. Components are extracted in order of decreasing frequency, which separates stationary fluctuation terms of different scales from high frequency to low frequency. The high-frequency components reflect the volatility of the original price series, and the low-frequency components reflect its trend. Since different factors influence commodity futures prices over different durations, decomposing the original price into components with different frequencies can capture different systematic patterns within the series, thereby reflecting the influence of different factors on commodity futures price movements.
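To make the decomposition step concrete, the following is a deliberately simplified sketch of EEMD in pure Python. It is not the implementation used in this paper: the envelopes are piecewise linear rather than cubic splines, the number of sifting iterations is fixed, and the function names and the parameters `trials`, `noise_width`, `max_imfs` and `sift_iters` are assumptions chosen for illustration.

```python
import math
import random


def _extrema(x):
    """Indices of interior local maxima and minima."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima


def _envelope(x, knots):
    """Piecewise-linear envelope through the knots, pinned to both end points."""
    knots = [0] + knots + [len(x) - 1]
    env = []
    for a, b in zip(knots, knots[1:]):
        for i in range(a, b):
            env.append(x[a] + (x[b] - x[a]) * (i - a) / (b - a))
    env.append(x[-1])
    return env


def emd(x, max_imfs=4, sift_iters=10):
    """Plain EMD: repeatedly sift out the fastest oscillation as an IMF."""
    residual, imfs = list(x), []
    for _ in range(max_imfs):
        maxima, minima = _extrema(residual)
        if len(maxima) + len(minima) < 3:
            break  # residual is close to monotonic, nothing left to extract
        h = residual[:]
        for _ in range(sift_iters):
            maxima, minima = _extrema(h)
            if not maxima or not minima:
                break
            upper, lower = _envelope(h, maxima), _envelope(h, minima)
            h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
        imfs.append(h)
        residual = [r - v for r, v in zip(residual, h)]
    return imfs, residual


def eemd(x, trials=50, noise_width=0.2, max_imfs=4, seed=0):
    """EEMD: average the EMD results of many white-noise-perturbed copies of x."""
    rng = random.Random(seed)
    n = len(x)
    mean = sum(x) / n
    scale = noise_width * math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    imf_sum = [[0.0] * n for _ in range(max_imfs)]
    res_sum = [0.0] * n
    for _ in range(trials):
        noisy = [v + rng.gauss(0, scale) for v in x]
        imfs, residual = emd(noisy, max_imfs)
        for k, imf in enumerate(imfs):  # trials with fewer IMFs contribute zeros
            for i, v in enumerate(imf):
                imf_sum[k][i] += v
        for i, v in enumerate(residual):
            res_sum[i] += v
    return [[v / trials for v in row] for row in imf_sum], [v / trials for v in res_sum]


# Toy series: a fast oscillation on top of a slow upward trend.
signal = [math.sin(2 * math.pi * i / 10) + 0.01 * i for i in range(200)]
avg_imfs, avg_residual = eemd(signal, trials=20)
```

In this sketch the first rows of `avg_imfs` hold the high-frequency components and `avg_residual` holds the slow trend. The components sum back to the original series only up to the averaged noise, which shrinks as `trials` grows.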

2) DYNAMIC ADAPTIVE HURST INDEX AND PERMUTATION ENTROPY
To capture market state changes, we introduce the Hurst index from fractal market theory, which quantifies the ''memory'' or statistical correlation strength of a price series. The market meaning of the Hurst index is as follows. When 0 < Hurst < 0.5, the price series is anti-persistent: if the price series moved upward in the previous period, it is likely to turn downward in the next period. When Hurst = 0.5, the price series follows a standard random walk, so past price information does not affect the future and the market has no memory. When 0.5 < Hurst < 1, the price series shows trend persistence and memory [31]. According to this interpretation, when the direction of the price series changes, the previous trend weakens and the opposite trend strengthens, so the Hurst index of the price series should decline and show a turning point. Therefore, the dynamic Hurst index calculated over sliding time windows can be compared with the changes in commodity futures prices to examine whether the trend of the futures price series ends and turns when the Hurst index is at a relatively low level below 1/2, so as to detect changes in market conditions.
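As an illustration of how a dynamic Hurst index can be obtained from intraday returns, the sketch below uses the classical rescaled range (R/S) estimator over a trailing window of days. The adaptive algorithm used in this paper is not specified in this section, so R/S is a stand-in, and `hurst_rs`, `daily_hurst`, `min_chunk` and `window_days` are hypothetical names and settings. The R/S estimate is known to be biased upward in small samples, so values slightly above 0.5 for pure noise are expected.

```python
import math
import random


def rescaled_range(chunk):
    """R/S statistic: range of cumulative mean deviations divided by the std."""
    n = len(chunk)
    mean = sum(chunk) / n
    running, cumulative = 0.0, []
    for value in chunk:
        running += value - mean
        cumulative.append(running)
    std = math.sqrt(sum((v - mean) ** 2 for v in chunk) / n)
    if std == 0:
        return None  # a flat chunk carries no scaling information
    return (max(cumulative) - min(cumulative)) / std


def hurst_rs(returns, min_chunk=8):
    """Hurst estimate: slope of log(mean R/S) against log(chunk size)."""
    log_sizes, log_rs = [], []
    size = min_chunk
    while size <= len(returns) // 2:
        chunks = [returns[i:i + size] for i in range(0, len(returns) - size + 1, size)]
        stats = [rs for rs in (rescaled_range(c) for c in chunks) if rs is not None]
        if stats:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(sum(stats) / len(stats)))
        size *= 2
    if len(log_sizes) < 2:
        raise ValueError("not enough usable observations for a slope estimate")
    mean_x = sum(log_sizes) / len(log_sizes)
    mean_y = sum(log_rs) / len(log_rs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_sizes, log_rs))
    den = sum((x - mean_x) ** 2 for x in log_sizes)
    return num / den


def daily_hurst(returns_by_day, window_days=5):
    """One estimate per day from the pooled returns of the trailing window of days."""
    estimates = []
    for d in range(window_days - 1, len(returns_by_day)):
        pooled = [r for day in returns_by_day[d - window_days + 1:d + 1] for r in day]
        estimates.append(hurst_rs(pooled))
    return estimates


# Simulated data only: white-noise returns and a strongly persistent series.
rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(2048)]
persistent, level = [], 0.0
for shock in noise:
    level += shock
    persistent.append(level)
```

Calling `hurst_rs(noise)` and `hurst_rs(persistent)` shows the contrast the text describes: the persistent series yields a clearly higher estimate than the memoryless one, while a strictly alternating series yields an estimate near zero.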
Permutation entropy (PE) measures the degree of randomness of the target time series. It makes no additional assumptions about the time series, such as stationarity or linearity; it depends only on the ordinal patterns of the values and is therefore invariant under monotonic transformations. Normalized PE takes values in the interval from 0 to 1. The closer the PE is to 1, the more random and unpredictable the data generation process is. For a financial time series, a very low permutation entropy mathematically means weak randomness and strong predictability, and its financial implication is a violation of the weak form of the efficient market hypothesis. This paper uses intraday minute-level data to calculate the daily PE, and obtains a dynamic PE over time to evaluate how the randomness of the time series changes [32]. At the same time, by dividing the time series into stages according to their randomness, we can compare the effect of the forecasting models across stages with different degrees of randomness.
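A minimal sketch of normalized permutation entropy is given below, following the standard ordinal-pattern definition. The embedding `order` and `delay` values are illustrative assumptions, since the settings used in this paper are not stated in this section.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1):
    """Normalized permutation entropy in [0, 1] based on ordinal patterns."""
    n_patterns = len(series) - (order - 1) * delay
    if order < 2 or n_patterns < 1:
        raise ValueError("series too short for the chosen order and delay")
    counts = Counter()
    for start in range(n_patterns):
        window = [series[start + k * delay] for k in range(order)]
        # Ordinal pattern: positions of the window sorted by value
        # (ties are broken by position because the sort is stable).
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1
    entropy = sum(
        (c / n_patterns) * math.log(n_patterns / c) for c in counts.values()
    )
    # Divide by the maximum entropy log(order!) so the result lies in [0, 1].
    return entropy / math.log(math.factorial(order))


# A steadily rising series has a single ordinal pattern, hence zero entropy.
trend_pe = permutation_entropy([1, 2, 3, 4, 5, 6, 7, 8])
# A short mixed series contains several patterns and scores higher.
mixed_pe = permutation_entropy([4, 7, 9, 10, 6, 11, 3], order=2)
```

Applied to the minute-level prices of each trading day, this function would give one PE value per day, which is the dynamic PE series described above.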

3) PREDICTIVE MODEL INPUT FEATURE SELECTION
We use four types of variables as the input features of the deep learning model. The first three types are derived from the commodity's own transaction data. Because the market information of the commodity itself is not enough to fully reflect its price trend, we introduce external market information as the fourth type.
The first category is the trading history data of the four commodity futures, including the opening price, the highest price, the lowest price, the closing price, the trading volume, the logarithmic rate of return and the volatility, which directly reflect market price behavior. The logarithmic rate of return of a particular commodity is defined as follows:
$$r_t = \ln\left(\frac{p_t}{p_{t-1}}\right) \tag{1}$$
where $p_t$ represents the futures price at time $t$. Return volatility measures the dispersion of a commodity's returns around their average over a certain period of time. Typically, the greater the volatility, the higher the risk of trading. We measure volatility using the historical standard deviation of minute-by-minute data for each trading day, as shown in equation (2).
$$v_t = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(r_i - \bar{r}\right)^2} \tag{2}$$
where $r_i$ is the logarithmic rate of return at minute $i$ of trading day $t$, $\bar{r}$ is the average logarithmic rate of return during the calculation period, $n$ is the number of minute-level closing prices on each trading day, and $v_t$ is the daily volatility.
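The two definitions above can be sketched in a few lines. This is an illustrative sketch rather than the authors' code: the function names are hypothetical, and the use of the sample standard deviation (denominator one less than the number of returns) is an assumption.

```python
import math


def log_returns(prices):
    """Logarithmic returns ln(p_t / p_{t-1}) of consecutive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def daily_volatility(intraday_closes):
    """Standard deviation of the intraday log returns of one trading day.

    Assumption: sample standard deviation, i.e. the denominator is the
    number of returns minus one.
    """
    returns = log_returns(intraday_closes)
    if len(returns) < 2:
        raise ValueError("need at least three intraday prices")
    mean = sum(returns) / len(returns)
    variance = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(variance)


# Toy intraday closes for one trading day (made-up numbers).
closes = [400.0, 401.2, 400.6, 402.0, 401.5, 403.1]
day_returns = log_returns(closes)
day_vol = daily_volatility(closes)
```

A series that grows at a constant rate has identical log returns and therefore zero volatility, which is a quick sanity check for the implementation.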
The second category consists of two types of dynamic variables: the components of the EEMD decomposition and the daily Hurst index calculated from the intraday 5-minute data. The role of the EEMD components is to capture the inherent systematic patterns of the commodity futures price series, while the Hurst index is used to capture market state transitions or price inflection points.
The third category is technical indicators calculated from commodity futures prices. Reference [33] pointed out that technical analysts usually use indicators to predict future prices. Reference [15] pointed out that combining information from technical indicators and macroeconomic variables significantly improves the prediction of financial time series such as foreign exchange rates. Similar conclusions have also been confirmed in the stock market and the futures market. Following similar studies, we select the following commonly used technical indicators as input features: the difference between the highest price and the lowest price (H-L), the closing price minus the opening price (O-C), the 3-day exponentially smoothed moving average (EMA3), the 10-day exponentially smoothed moving average (EMA10), the 30-day exponentially smoothed moving average (EMA30), the 5-day standard deviation, the relative strength index (RSI) and the Williams indicator (W%R) [5], [15], [16], [34].
The exponentially smoothed moving average (EMA) is an average of the closing price that gives more weight to recent price data. Compared with the commonly used simple moving average, the EMA is more responsive to recent price changes. The EMA is calculated as follows:
$$EMA_t = \left(p_t - EMA_{t-1}\right) \times \frac{2}{n+1} + EMA_{t-1}$$
where $p_t$ is the closing price of the current period, $EMA_{t-1}$ is the EMA of the previous period, $n$ is the number of periods, and $2/(n+1)$ is the weighting multiplier applied to the latest price data.
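The EMA recursion can be sketched as follows. Seeding the recursion with the first closing price is an assumption made for illustration (a simple moving average is another common seed), and the function name is hypothetical.

```python
def ema(closes, n):
    """Exponentially smoothed moving average with multiplier 2 / (n + 1)."""
    if not closes:
        return []
    multiplier = 2 / (n + 1)
    values = [closes[0]]  # assumption: seed the recursion with the first close
    for price in closes[1:]:
        previous = values[-1]
        values.append((price - previous) * multiplier + previous)
    return values


# With n = 3 the multiplier is 0.5, so each step moves halfway to the new price.
ema3 = ema([1.0, 2.0, 3.0], n=3)
```

The EMA3, EMA10 and EMA30 features would be obtained by calling the same function with `n` set to 3, 10 and 30 on the daily closing prices.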
The Relative Strength Index (RSI) is a typical oscillating indicator. It is generally calculated from closing prices, and its value oscillates between 0 and 100. If the RSI value is above 70, the asset price is considered ''overbought;'' if the RSI value is below 30, it is ''oversold.'' The calculation method is as follows:
$$RSI = 100 - \frac{100}{1 + RS}, \qquad RS = \frac{\text{average gain over } n \text{ periods}}{\text{average loss over } n \text{ periods}}$$
The Williams indicator is another oscillator that measures the momentum of asset prices. The indicator value oscillates between −100 and 0. Similar to the RSI, if the value is above −20, the asset price is considered ''overbought'', and if the value is below −80, the asset is considered ''oversold''. The algorithm is as follows:
$$W\%R = \frac{H_n - C_t}{H_n - L_n} \times (-100)$$
where $H_n$ and $L_n$ are the highest high and the lowest low over the past $n$ periods and $C_t$ is the current closing price.
The fourth category is external market information. Some studies have pointed out that commodity futures prices are correlated with prices in other financial markets, and fluctuations in related markets may have a significant impact on commodity futures price fluctuations [11], [12]. Drawing on previous research, we introduce the external market information of several important markets as features into the forecasting model to reflect the hidden internal connection between commodity futures prices and external markets. The external predictors are the China 5-year Treasury bond futures index, the CSI 300 index futures price and the WTI crude oil futures price.
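The two oscillators in the third category, RSI and W%R, can be sketched as follows. This is an illustrative sketch under stated assumptions: the RSI uses simple averages of gains and losses over the window (Wilder's smoothed averages are a common alternative), the 14-period default and the handling of flat windows are choices made here, and the function names are hypothetical.

```python
def rsi(closes, n=14):
    """RSI over the last n price changes, using simple averages (an assumption)."""
    if len(closes) < n + 1:
        raise ValueError("need at least n + 1 closing prices")
    recent = closes[-(n + 1):]
    changes = [curr - prev for prev, curr in zip(recent, recent[1:])]
    avg_gain = sum(c for c in changes if c > 0) / n
    avg_loss = sum(-c for c in changes if c < 0) / n
    if avg_loss == 0:
        # No losses in the window: 100 if there were gains, neutral 50 if flat.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


def williams_r(highs, lows, closes, n=14):
    """Williams %R in [-100, 0] from the highest high and lowest low of n periods."""
    highest, lowest = max(highs[-n:]), min(lows[-n:])
    if highest == lowest:
        return -50.0  # flat window: return the midpoint (a convention chosen here)
    return (highest - closes[-1]) / (highest - lowest) * -100


# Made-up prices: the close sits exactly in the middle of the 3-period range.
wr_mid = williams_r([10.0, 12.0, 11.0], [8.0, 9.0, 9.0], [9.0, 11.0, 10.0], n=3)
rsi_up = rsi([float(i) for i in range(1, 16)])
```

With these conventions a steadily rising series gives an RSI of 100 and a steadily falling one gives 0, matching the overbought and oversold readings described above.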

4) PRICE PREDICTION
We construct forecasting models and extrapolate forecasts using sliding time windows (Figure 2), which use features such as commodity prices, EEMD components, Hurst indices, technical indicators, and external predictors over the past 120 days (approximately half a year of trading days) to predict the daily closing price for the next day. We use the prediction accuracy indicators (MAE, MAPE, MSE, RMSE, and R²) to evaluate the accuracy of the predicted values, and use profitability criteria to evaluate the profitability of the predictions. To evaluate the prediction performance of the model, we compare the EEMD-Hurst-LSTM combined prediction model proposed in this paper with benchmark models such as the Autoregressive Integrated Moving Average (ARIMA), Support Vector Regression (SVR) and LSTM [35]. In the longitudinal comparison, we choose the single LSTM, EEMD-LSTM, Hurst-LSTM and EEMD-Hurst-LSTM models. The advantage of the longitudinal comparison is that it allows us to separate the effects of the EEMD decomposition, the Hurst index, and the inclusion of both types of features, and to compare the forecasting effects under these three scenarios. Horizontally, we compare the combined prediction model with the prediction results of the ARIMA, SVR, and LSTM benchmarks, which represent the linear models of traditional financial econometrics and the benchmark time series forecasting models in machine learning and deep learning, respectively. These models have been successfully applied to financial time series forecasting in different periods. In order to ensure the consistency of the prediction
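The sliding-window sample construction and the five accuracy indicators named above can be sketched as follows. The function names, the layout of the feature rows and the toy numbers are assumptions for illustration only; the 120-day lookback and the one-day horizon follow the text.

```python
import math


def sliding_windows(features, targets, lookback=120, horizon=1):
    """Pair each block of `lookback` daily feature rows with a future close.

    features[i] holds the feature row of day i and targets[i] its closing price.
    The target is the close `horizon` days after the last day in the window.
    """
    samples = []
    for end in range(lookback, len(features) - horizon + 1):
        window = features[end - lookback:end]
        samples.append((window, targets[end + horizon - 1]))
    return samples


def evaluate(actual, predicted):
    """MAE, MAPE (in percent), MSE, RMSE and R^2 of a forecast."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e ** 2 for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - sum(e ** 2 for e in errors) / ss_tot,
    }


# Toy check with made-up closing prices and forecasts.
scores = evaluate([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
samples = sliding_windows(list(range(10)), list(range(10)), lookback=3, horizon=1)
```

Raising `horizon` above 1 gives the multi-step setting: the same windows are paired with closing prices further ahead, and fewer samples remain at the end of the series.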

5) PORTFOLIO EVALUATION
The purpose of the portfolio evaluation is to further assess the predictive and economic performance of the method. Each data point in the sample is labeled as ''hold,'' ''buy,'' or ''sell'' based on the commodity's daily closing price, and we use the generated one-day-ahead forecasts to develop trading strategies according to these labeling rules. If the daily closing price at time t+1 is higher than the closing price at time t, then time t is marked as ''buy'' and time t+1 is marked as ''sell,'' and the labeling algorithm continues from time t+2. Otherwise, time t is marked as ''hold,'' and the algorithm continues from time t+1. After automatic labeling, trades are carried out according to the generated strategy. Finally, we compare the strategy built on the model's forecasts with several other common trading strategies to evaluate its economic performance.
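The labeling rule described above can be written as a short loop. The sketch below is illustrative only; the function name `label_points` is hypothetical, and the final data point is assumed to stay ''hold'' when the loop does not reach it.

```python
def label_points(close):
    """Label each day 'buy', 'sell' or 'hold' following the rule in the text."""
    labels = ["hold"] * len(close)
    t = 0
    while t < len(close) - 1:
        if close[t + 1] > close[t]:
            labels[t] = "buy"        # price rises tomorrow: buy today
            labels[t + 1] = "sell"   # and sell tomorrow
            t += 2                   # continue from t + 2
        else:
            t += 1                   # 'hold' today, continue from t + 1
    return labels
```

When the rule is driven by forecasts, `close[t + 1]` would be replaced by the one-day-ahead prediction made at time t.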
To benchmark the method proposed in this paper, we select the trading strategy built on the LSTM forecasting model and three traditional strategies: ''buy and hold,'' SMA and RSI. In the buy and hold strategy, gold futures are bought at the beginning of each test range and sold at the end of each test range, with no trades in between. In the SMA strategy, the 5-day and 20-day SMA values are calculated each trading day. If the 5-day SMA crosses above the 20-day SMA, a long trade is triggered and the data point is marked as a ''buy'' point. Conversely, if the 5-day SMA crosses below the 20-day SMA, the data point is marked as a ''sell'' point. Otherwise, it is marked as a ''hold'' point. Likewise, in the RSI strategy a 14-day RSI is calculated every trading day. By convention, if the RSI value on the trading day is greater than 70, the commodity is considered ''overbought'' and the day is marked as a ''sell'' point. If the RSI is less than 30, the commodity is considered ''oversold'' and the day is marked as a ''buy'' point. Otherwise, it is marked as a ''hold'' point.
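As an illustration of the SMA benchmark described above, the following sketch marks crossover days. It is not the authors' code: the helper names `sma` and `sma_crossover_labels` are hypothetical, and a ''cross'' is assumed to mean that the sign of the difference between the two averages changes from one day to the next.

```python
import numpy as np

def sma(x, n):
    """Simple moving average; NaN for the first n - 1 days."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    c = np.cumsum(np.insert(x, 0, 0.0))
    out[n - 1:] = (c[n:] - c[:-n]) / n
    return out

def sma_crossover_labels(close, fast=5, slow=20):
    """'buy' when the fast SMA crosses above the slow SMA, 'sell' when below."""
    f, s = sma(close, fast), sma(close, slow)
    labels = ["hold"] * len(close)
    for t in range(slow, len(close)):           # both averages defined at t - 1 and t
        if f[t - 1] <= s[t - 1] and f[t] > s[t]:
            labels[t] = "buy"
        elif f[t - 1] >= s[t - 1] and f[t] < s[t]:
            labels[t] = "sell"
    return labels
```

The RSI benchmark follows the same pattern, with the crossover test replaced by the 70 and 30 thresholds.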

1) EEMD
From the perspective of deep learning and combined forecasting, several remedies for the shortcomings of traditional linear forecasting models have been proposed, one of which is to denoise or smooth the financial time series [34]. However, the common practice is to perform one-time wavelet denoising or smoothing on the entire sample sequence and then divide the denoised sequence into a training set and a test set. This is essentially a static sample split, and it may leak test set information into the training process. A better approach is to use a sliding time window to implement dynamic single-step or multistep processing with forward rolling recursion. A further difficulty in applying the wavelet transform is choosing the wavelet function and the number of decomposition levels. As an improvement on wavelet decomposition, [36] proposed an adaptive time series decomposition technique, Empirical Mode Decomposition (EMD), which decomposes the sequence according to certain rules. As a very effective method for non-stationary nonlinear sequences, EMD is self-adaptive and removes the need to specify a basis function, as wavelet decomposition requires. The EMD algorithm separates the series, through a fixed procedure, into stationary fluctuation terms of different scales and a residual trend term, where each fluctuation term is an intrinsic mode function (IMF). However, the EMD algorithm has two deficiencies: 1. The components decomposed by EMD suffer from mode mixing (modal aliasing), so that an IMF may contain features of different time scales; 2. Extracting each IMF requires multiple iterations, but there is no clear objective standard for the stopping condition. To solve the frequency mixing problem of the IMF components obtained by EMD, [37] proposed ensemble empirical mode decomposition (EEMD), which is an improvement on EMD.
The principle is to add white noise to the signal and average the results of the EMD decompositions; because white noise has zero mean, the added noise cancels out in the average. Since neural networks are susceptible to noise in futures prices, this paper combines EEMD decomposition with neural networks.
Let x(t) denote a financial series. The EEMD algorithm is as follows.
1. Set the number of ensemble trials to M, with i = 1, 2, ..., M.
2. Add random white noise $n_i(t)$, drawn from a standard normal distribution, to x(t) to form a new signal: $x_i(t) = x(t) + n_i(t)$.
3. Apply EMD to the new series $x_i(t)$: $x_i(t) = \sum_{j=1}^{J} c_{i,j}(t) + r_i(t)$, where j indexes the IMFs, J is the number of IMFs, $c_{i,j}(t)$ is the j-th IMF component, and $r_i(t)$ is the residual term.
4. Repeat steps 2 and 3 M times, each time adding a different white noise series, to obtain a collection of IMFs. Averaging the IMFs over the M trials gives the EEMD IMF component $c_j(t)$: $c_j(t) = \frac{1}{M} \sum_{i=1}^{M} c_{i,j}(t)$.
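The four steps above can be sketched with NumPy alone. This is a deliberately simplified illustration, not the decomposition used in this paper. The assumptions are: the envelopes are linearly interpolated rather than fitted with the cubic splines of standard EMD, each IMF is extracted with a fixed number of sifting passes, the number of IMFs is fixed in advance, and the noise scale of 0.2 times the series standard deviation is an arbitrary choice. All function names are hypothetical.

```python
import numpy as np

def _envelope_mean(h):
    """Mean of the upper and lower envelopes, or None if too few extrema."""
    idx = np.arange(1, len(h) - 1)
    maxima = idx[(h[1:-1] > h[:-2]) & (h[1:-1] > h[2:])]
    minima = idx[(h[1:-1] < h[:-2]) & (h[1:-1] < h[2:])]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(h))
    last = len(h) - 1
    up = np.concatenate(([0], maxima, [last]))   # end points used as knots
    lo = np.concatenate(([0], minima, [last]))
    upper = np.interp(t, up, h[up])              # linear envelope (simplification)
    lower = np.interp(t, lo, h[lo])
    return (upper + lower) / 2.0

def emd(x, n_imfs=4, n_sift=10):
    """Simplified EMD: returns (imfs, residual) with x == imfs.sum(0) + residual."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = np.zeros((n_imfs, len(residual)))
    for j in range(n_imfs):
        if _envelope_mean(residual) is None:
            break                                # nothing oscillatory left
        h = residual.copy()
        for _ in range(n_sift):                  # fixed number of sifting passes
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs[j] = h
        residual = residual - h
    return imfs, residual

def eemd(x, n_trials=50, noise_std=0.2, n_imfs=4, seed=0):
    """Steps 1-4: add white noise, decompose, repeat M times, average the IMFs."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    scale = noise_std * x.std()
    imf_sum = np.zeros((n_imfs, len(x)))
    res_sum = np.zeros(len(x))
    for _ in range(n_trials):                    # i = 1, ..., M
        noisy = x + scale * rng.standard_normal(len(x))   # x_i(t) = x(t) + n_i(t)
        imfs, res = emd(noisy, n_imfs)
        imf_sum += imfs
        res_sum += res
    return imf_sum / n_trials, res_sum / n_trials          # c_j(t) and mean residual
```

Because each noisy series is reconstructed exactly by its own IMFs and residual, the averaged components sum to the original series plus the mean of the added noise, which shrinks as the number of trials grows.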

2) ADAPTIVE HURST INDEX
The main methods for calculating the Hurst index include the rescaled range method (R/S statistic) originally proposed by Hurst (1951), periodogram regression [38], wavelet analysis [39] and detrended fluctuation analysis (DFA) [40], among which DFA is a common choice. The advantage of DFA is that it can detect long-range correlations in non-stationary time series while avoiding misjudgment. However, the local trends fitted by DFA are discontinuous at the boundaries between adjacent segments, so DFA may not be a good choice when the series has trend or oscillation characteristics. The more recent Adaptive Fractal Analysis (AFA) can overcome these shortcomings of the DFA method (Gao et al. [50]). The procedure for calculating the Hurst index by the AFA method is as follows. First, from the sequence $\{x_1, x_2, x_3, \ldots, x_N\}$ we construct a random walk process, $u(i) = \sum_{k=1}^{i} (x_k - \bar{x})$, $i = 1, 2, \ldots, N$. Second, we apply an adaptive filter to $u(i)$ to obtain a globally smooth trend $v(i)$. The residual $u(i) - v(i)$ describes the fluctuation around this trend, and for a window size $w$ we get: $F(w) = \left[ \frac{1}{N} \sum_{i=1}^{N} \left( u(i) - v(i) \right)^2 \right]^{1/2} \sim w^{H}$, where H is the Hurst index [41].
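To make the scaling idea concrete, the sketch below estimates the exponent with plain DFA, not with AFA. DFA is substituted here only because it is shorter to write: it fits an independent straight line in each non-overlapping segment, whereas AFA stitches overlapping local fits into a globally smooth trend. The function name `dfa_exponent` and the choice of window sizes are assumptions.

```python
import numpy as np

def dfa_exponent(x, scales=(8, 16, 32, 64, 128)):
    """Slope of log F(w) against log w, using linear detrending per segment."""
    x = np.asarray(x, dtype=float)
    u = np.cumsum(x - x.mean())                  # random walk profile u(i)
    flucts = []
    for w in scales:
        n_seg = len(u) // w
        segs = u[:n_seg * w].reshape(n_seg, w).T # shape (w, n_seg)
        t = np.arange(w)
        slope, intercept = np.polyfit(t, segs, 1)       # one line per segment
        trend = np.outer(t, slope) + intercept          # local trend v(i)
        flucts.append(np.sqrt(np.mean((segs - trend) ** 2)))   # F(w)
    return float(np.polyfit(np.log(scales), np.log(flucts), 1)[0])
```

For uncorrelated increments the estimate should be close to 0.5, with larger values indicating persistence and smaller values indicating anti-persistence.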
When $P_j = 1/m!$, the permutation entropy $E_p(m)$ reaches its maximum value $\ln(m!)$, and the standardized PE is obtained by normalization: $0 \le E_p(m) / \ln(m!) \le 1$.
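The normalization above can be illustrated with a short permutation entropy routine. This is a sketch under assumptions: the function name `permutation_entropy`, the default embedding dimension m = 3, and the delay of 1 are illustrative choices, not parameters reported in this paper.

```python
import numpy as np
from collections import Counter
from math import factorial, log

def permutation_entropy(x, m=3, delay=1):
    """Normalized permutation entropy in [0, 1]."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * delay                 # number of embedded vectors
    patterns = Counter(
        tuple(np.argsort(x[i:i + m * delay:delay])) for i in range(n)
    )
    p = np.array(list(patterns.values()), dtype=float) / n   # P_j
    entropy = -np.sum(p * np.log(p))             # E_p(m)
    return float(entropy / log(factorial(m)))    # divide by ln(m!)
```

A value near 1 indicates that all ordinal patterns are about equally likely (high randomness), while a value near 0 indicates a highly regular series.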

4) LSTM
Traditional linear time series forecasting models, such as ARIMA, usually require strict assumptions about the distribution and stationarity of the time series. Since financial time series are generally considered to be very complex, non-stationary and noisy, traditional time series models are limited by their fixed model frameworks and strict assumptions, and are often unable to produce accurate predictions for complex financial time series data. In contrast, time series methods based on deep learning are driven by the data themselves and can handle nonlinear problems through activation functions, so they cope better with ''non-ideal'' time series data and obtain more accurate forecasts. With the development of deep learning, [43] proposed a recurrent neural network designed for processing sequences, namely LSTM, which can select which past information should be processed and retained. The advent of LSTM has sparked a new series of studies on time series forecasting [44]. The advantage of LSTM is that it is a non-parametric model: it can handle the nonlinear patterns of time series and does not need assumptions about the distribution of the prediction error terms. LSTM does not even require the target sequence to follow a stationary process and is thus not affected by unit roots. LSTM is a type of recurrent neural network (RNN) with additional feedback links on some layers of the network. Unlike traditional RNNs, it is well suited to learning from experience when there are time lags of arbitrary length between relevant events. In addition, LSTM can mitigate the vanishing gradient problem by letting the memory cells retain time-dependent information for an arbitrary amount of time, which has proved more effective than traditional methods.
LSTMs have been used in different applications such as Natural Language Processing (NLP), Language Modeling, Language Translation, Speech Recognition, Sentiment Analysis, Predictive Analysis, and Financial Time Series Analysis, etc., and can achieve greater success in time series data analysis [45].
An LSTM network consists of LSTM cells, which are combined to form an LSTM layer. The LSTM unit consists of an input gate, an output gate, and a forget gate, and the three gates control the flow of information. With these features, each cell can remember the desired value over arbitrary time intervals. Equations 11-15 give the mathematical form of LSTM:
$f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)$ (11)
$i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)$ (12)
$o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)$ (13)
$c_t = f_t * c_{t-1} + i_t * \sigma_c(W_c x_t + U_c h_{t-1} + b_c)$ (14)
$h_t = o_t * \sigma_h(c_t)$ (15)
Here $x_t$ is the input vector of the LSTM unit, $f_t$ is the activation vector of the forget gate, $i_t$ is the activation vector of the input gate, $o_t$ is the activation vector of the output gate, $h_t$ is the hidden state (output) vector of the unit, $c_t$ is the cell state vector, $\sigma_g$ is the sigmoid function, $\sigma_c$ and $\sigma_h$ are the hyperbolic tangent function, $*$ is the element-wise product, W and U are the weight matrices to be learned, and b is the bias vector to be learned.
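A single forward step of the LSTM unit described above can be written directly in NumPy. The sketch uses the standard gate equations with the symbols defined in the text; the function name `lstm_step` and the storage of parameters in dictionaries keyed by gate are illustrative assumptions. It shows the forward computation only, without training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b are dicts with keys 'f', 'i', 'o', 'c'."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output gate
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])    # candidate state
    c_t = f_t * c_prev + i_t * c_hat                            # new cell state
    h_t = o_t * np.tanh(c_t)                                    # new hidden state
    return h_t, c_t
```

Processing a 120-day feature window would amount to calling `lstm_step` once per day, carrying `h_t` and `c_t` forward from one step to the next.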

D. PREDICTION EVALUATION
1) STATISTICAL EVALUATION OF PREDICTIVE MODELS
In building a model for prediction, a good prediction model must not only fit well in the historical backtesting of the training set and validation set, but must also have high predictive ability on the prediction set, so that it can be used for forecasting in the real commodity futures market. In this paper, we use five statistical evaluation indicators to judge the prediction performance, namely MAE, MAPE, MSE, RMSE and R². The definitions of the main indicators are as follows. First, to evaluate the predictive performance of the model, Mean Squared Error (MSE) is chosen as the loss function due to its robustness: $MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value and $n$ is the number of predictions. The smaller the MSE, the better the estimation quality of the predictive model.
Second, in order to measure the fitting performance of the prediction model, the root mean square error (RMSE) is selected: $RMSE = \sqrt{MSE}$.
While pursuing predictive performance, we also take the generalization ability of the model into account and try to avoid overfitting. A simple sign of overfitting is that the error on the training set is significantly smaller than the error on the test set under the same performance criterion. The above indicators are commonly used to evaluate the prediction effect of a model, but they are easily affected by the scale of the output variables (target variables) and input features (independent variables). For example, if the target variables in model A are of the order of 1000, 2000, . . . , and the target variables in model B are of the order of 1, 2, . . . , then the RMSE of model A is likely to be larger than that of model B, but this does not mean that the generalization ability of model A is worse than that of model B. To address this problem, in addition to normalization, a scale-free measure of the degree of fit, the coefficient of determination (R²), can be introduced to describe the fitting ability of the model.
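The five indicators named in this section can be computed together, which also makes the scale argument above easy to check. The sketch uses the textbook definitions; the function name `evaluate` and the expression of MAPE in percent are assumptions.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return MAE, MAPE (percent), MSE, RMSE and R² for two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100.0),  # requires y_true != 0
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
        "R2": float(1.0 - ss_res / ss_tot),
    }
```

Multiplying both the targets and the predictions by 1000 multiplies RMSE by 1000 but leaves R² unchanged, which is the reason given in the text for reporting R² alongside the error measures.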

2) PROFITABILITY OF PREDICTION MODEL
We build a buy-sell trading rule to test the profitability and forecasting accuracy of the model; such rules are widely used to assess profitability performance [5], [34], [35], [46]. The rule is to buy when the forecast value (target) of the next period is greater than the current market closing price, and to sell when the forecast value of the next period is less than the current market closing price. The total revenue of this trading rule can be calculated by the following formula:
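The buy-sell rule above can be turned into a small backtest. The revenue formula of this paper is not reproduced here, so the sketch should be read as one plausible accounting under stated assumptions: the strategy is long-only, the position is either fully invested or flat, a proportional cost is charged whenever the position changes, and returns are compounded. The function name `long_only_backtest` and the default cost of 0.05% per trade are illustrative.

```python
import numpy as np

def long_only_backtest(close, pred_next, cost=0.0005):
    """Cumulative return of the rule 'hold the asset when forecast > current close'.

    close:     closing prices, length T
    pred_next: forecasts of close[1:], made one day earlier, length T - 1
    """
    close = np.asarray(close, dtype=float)
    pred_next = np.asarray(pred_next, dtype=float)
    position = (pred_next > close[:-1]).astype(float)   # 1 = long, 0 = flat
    asset_ret = close[1:] / close[:-1] - 1.0            # next-day return
    trades = np.abs(np.diff(np.concatenate(([0.0], position))))
    strat_ret = position * asset_ret - cost * trades    # cost on each position change
    return float(np.prod(1.0 + strat_ret) - 1.0)        # closing the last position is not charged
```

With perfect forecasts and zero cost, the rule captures every up day and sits out every down day, which gives an upper bound for the strategy on a given price path.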

IV. DATA DESCRIPTION
This section details the data samples and the variables used as input features of the predictive model. We obtained the opening price, highest price, lowest price, closing price and trading volume, at daily frequency and at intraday 5-minute frequency, for the four commodity futures price indices of gold, copper, soybean and sugar from JoinQuant. The sample runs from June 10, 2008 to December 30, 2021. Since a single futures contract has an expiration date, splicing contracts to construct a continuous price series is one of the thorny problems in basic data processing. The usual practice is to roll adjacent futures contracts, that is, to hold only the futures contract with the nearest expiry date in order to construct a continuous time series of futures prices [47]. In practice, however, this still produces ''price gaps'' and is therefore not the best solution. To avoid the data discontinuity caused by contract splicing, we directly use the commodity price indices provided by JoinQuant as continuous futures price series, which avoids the drawbacks of the different contract splicing methods and at the same time preserves the trend structure of the commodities to the greatest extent possible. Our motivation for using intraday minute data is that it allows the daily adaptive Hurst index and permutation entropy to be calculated, from which market conditions and randomness can be assessed over time. In addition, using intraday high-frequency data, we are able to perform EEMD decomposition on the training data in each training window to obtain dynamic EEMD decomposition results, which is better than the traditional method of statically dividing training and test sets. After EEMD decomposition, the components are ordered from high frequency to low frequency, where IMF_1 has the highest frequency and IMF_5 the lowest. Each subsequence represents a hidden oscillatory factor in the time series.
The lowest frequency mode, IMF_5, captures the relatively long-term trend in the data, while the high-frequency modes, IMF_1-IMF_4, describe relatively short-term price fluctuations and can effectively capture the sensitive short-term noise in the market.
To test the predictability of financial time series, unit root tests and autocorrelation tests are required. Table 3 gives the statistics of the two ADF unit root tests, together with the Ljung-Box Q autocorrelation tests for each price series and for its squared process. The two ADF specifications, one without a constant term and one with a deterministic trend, are chosen according to the data. The copper and soybean price series have unit roots and are non-stationary, while the gold and sugar price series reject the null hypothesis of the unit root test and are stationary. At the 1% significance level, the Ljung-Box Q statistics at lags 5 and 10 indicate the absence of autocorrelation in all four commodity price series, whereas the squared process of copper shows autocorrelation at both lags.

A. RESULTS COMPARISON
In order to obtain a more realistic model prediction performance and avoid using future information in the prediction process, this paper adopts the rolling forward prediction method, and performs EEMD decomposition on the closing price sequence of each commodity in each training model window to obtain dynamic EEMD decomposition results. Figure 3 shows the results of EEMD decomposition using intraday high-frequency data, which decomposes the closing price of each commodity into 4 sub-signals with different frequencies.
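The rolling forward scheme can be illustrated by the way the training samples are cut from the feature matrix. The sketch below is an assumption about the data layout rather than the authors' pipeline: `make_windows` is a hypothetical helper, `features` is assumed to hold one row per trading day, and the 120-day look-back follows the window length stated in the methodology.

```python
import numpy as np

def make_windows(features, target, lookback=120):
    """Pair each window of `lookback` past days with the next day's target."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X = np.stack([features[t - lookback:t]               # days t - lookback, ..., t - 1
                  for t in range(lookback, len(features))])
    y = target[lookback:]                                # closing price on day t
    return X, y
```

Any preprocessing that looks across time, such as the EEMD decomposition, would have to be applied inside each window (on `features[t - lookback:t]` only) for the forecast on day t to be free of future information.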
We conduct longitudinal and horizontal comparisons against several benchmark models to evaluate the predictive performance of the models. The longitudinal performance comparison results are shown in Table 4, and the horizontal performance comparison results are shown in Table 5. To facilitate the comparison of the prediction effects across commodities and models, the average price of all commodities is standardized, using a time window of 100 trading days and rolling forward in a single step. In the longitudinal comparison, a single LSTM model and combined models with EEMD and Hurst added to the LSTM were selected. Each model includes technical indicators, closing prices of the commodities, and external market information. For the horizontal comparison, the combined prediction model is compared with benchmark models built from the classic linear model ARIMA and from the classic machine learning models SVR and LSTM. We obtain the following findings.
1. In the longitudinal comparison of the models, the EEMD-Hurst-LSTM model proposed in this paper is superior to the other models. For all four commodities, the four indicators that reflect the prediction error (MSE, MAPE, RMSE and MAE) are smaller than those of the other models, suggesting that introducing the EEMD decomposition and Hurst index features at the same time reduces the prediction error and improves the fitting performance. The internal factors extracted by the EEMD decomposition and the Hurst index, and the correlations between the decomposed market sub-sequences, contain information closely related to the volatility of the commodity futures market, which enables the prediction model to better identify systematic changes in the characteristics of market prices. Furthermore, the proposed method produces better results when the Hurst index is introduced to identify changes in market conditions. Compared with the single LSTM model and the model without the Hurst exponent, this method further reduces the prediction error in all intervals, and its fitting performance is stronger.
2. In the horizontal comparison of models, the performance of the EEMD-Hurst-LSTM combined prediction model is significantly better than that of the linear model and slightly better than that of the single LSTM model, and its prediction accuracy is similar to that of the SVR.

B. ROBUSTNESS CHECK
To verify the reliability of the proposed model, this paper conducts robustness tests from two aspects. First, by adjusting the number of extrapolated forecast steps, we test the forecast effect of the combined forecast model in the multi-step forecast window scenario; second, we calculate the daily permutation entropy from the intraday minute high-frequency data to examine whether the randomness of the price sequence affects the prediction effect. To further examine the effect of the predictability of commodity futures price series on the prediction model, we calculate the daily dynamic permutation entropy using the intraday minute high-frequency data, which evaluates the dynamic changes in the randomness of the sequence. Specifically, we calculate the average value of each forecast evaluation index and the average value of permutation entropy by year (Table 10). Figure 8 shows that there is a positive correlation between the permutation entropy of the four commodities and the MAE: as the permutation entropy increases, the MAE also increases, indicating that the predictability of the sequence, as reflected by its randomness, is closely related to the prediction accuracy of the model. This verifies the theoretical mechanism described earlier.

C. PROFIT ANALYSIS
The forecasting model is also evaluated from the perspective of portfolio profit performance, so that the model is verified not only in theory but also has application value in trading practice. To assess profitability performance, five strategies are compared, in three groups. The first group consists of the strategies based on the EEMD-Hurst-LSTM combined prediction model and on the LSTM prediction model, in which trading signals are executed according to the trading rules designed in Section III-D2. The second is the ''buy and hold'' strategy, which is generally regarded as a benchmark: from the first trading day of the sample, an equally weighted portfolio of the four commodities is constructed. The third group consists of technical timing strategies, including the 5-day and 20-day simple moving average crossover buy signal and the RSI upper and lower threshold buy signal. We uniformly implement the quantitative trading strategies at daily frequency and consider a transaction cost of 0.05%. The performance of all strategies across the entire sample is shown in Table 11 and Fig. 9. Overall, forecast-based trading strategies are significantly better than the ''buy and hold'' strategy and the technical timing strategies, which is due to the better forecasting effect of the forecasting models. The cumulative return of the EEMD-Hurst-LSTM method on the entire test set is 48.18%. By comparison, the net value of the portfolio gained 67.11%, 10.03%, and 9.8% under the LSTM model, buy-and-hold and moving average crossover strategies, respectively, while the RSI timing strategy lost 22.5% of the net value. In terms of the two most important return indicators, cumulative return and Sharpe ratio, the trading strategy based on the LSTM model achieves the highest return and the largest Sharpe ratio, followed by the EEMD-Hurst-LSTM method.
However, to evaluate a quantitative trading strategy, in addition to examining its profitability, it is also necessary to pay attention to its level of risk management. An important risk indicator in quantitative trading is the maximum drawdown, which is the largest decline in the net value of the portfolio from a previous peak to a subsequent trough during the sample period. Among all strategies, only the strategy designed according to the EEMD-Hurst-LSTM method keeps the maximum drawdown of the net value curve of the commodity portfolio at about 22%, while the other strategies exceed 30%, implying that the trading strategy implemented according to the combination model is better in terms of stability and resistance to risk.
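The two risk and return measures discussed above can be computed from a net value curve as follows. This is a generic sketch: the function names are hypothetical, and the annualization with 252 trading days and the zero risk-free rate are assumptions that this paper does not state.

```python
import numpy as np

def max_drawdown(nav):
    """Largest peak-to-trough decline of a net value curve, as a fraction."""
    nav = np.asarray(nav, dtype=float)
    running_peak = np.maximum.accumulate(nav)    # highest value seen so far
    return float(np.max((running_peak - nav) / running_peak))

def sharpe_ratio(returns, periods_per_year=252, rf=0.0):
    """Annualized Sharpe ratio of periodic returns (rf is an annual rate)."""
    excess = np.asarray(returns, dtype=float) - rf / periods_per_year
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))
```

For example, a curve that rises from 1.0 to 1.2, falls to 0.9 and recovers to 1.1 has a maximum drawdown of 0.3 / 1.2 = 25%.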
We want to emphasize that the sample for this article spans a 13-year period from 2008 to 2021, and it is very challenging to consistently outperform the market over a long period of time according to the forecasting model. During this period, the market showed different volatility patterns and experienced major emergencies such as the financial crisis, crude oil crisis, China-US trade war and the COVID-19 shock. The integrated predictive model trading strategy proposed in this paper has advantages compared with other trading strategies. During several time periods when other trading strategies underperformed and the returns fell sharply, the method proposed in this paper performed more robustly, with the smallest portfolio equity drawdown and higher strategy stability. Therefore, the model can effectively reduce the investment risk of commodity futures, and has important application value to market investors and financial institutions.

VI. CONCLUSION AND DISCUSSION
By integrating signal processing, nonlinear science and deep learning models, this study proposes a combined forecasting model, EEMD-Hurst-LSTM, and uses this model to predict the price trend of China's commodity futures market. The method integrates EEMD, the adaptive Hurst index and LSTM. First, we use EEMD to preprocess the data, decomposing the four commodity futures price series involved in this paper into components ranging from high-frequency fluctuations to low-frequency trends. Second, we use the intraday 5-minute high-frequency data of each commodity to calculate the daily Hurst index to capture the price transition of stock index futures, and use LSTM with delay and multi-feature input to make single-step extrapolation forecasts of commodity futures prices at daily frequency. Third, to verify the robustness of the forecasting model, we check the results of multistep forecasting, use permutation entropy to detect the randomness of the price series, and examine the relationship between the predictability of the price series and the forecasting effect of the combined model. Finally, we formulate quantitative trading strategies based on the forecast results, test the accuracy of the trading signals and the profitability of the strategies, and compare the trading results with other commonly used strategies.
The empirical findings in this paper show that the proposed EEMD-Hurst-LSTM method has better predictive performance compared to other horizontal single models and longitudinal deep learning combined models. The quantitative trading strategy designed according to the combination forecast model can obtain returns that exceed those of other trading strategies and achieves the best level of risk control. In particular, during periods of sharp market decline and severe volatility, this strategy shows strong resistance to risk, and its maximum drawdown is significantly smaller than that of other strategies. In addition, since this paper uses rolling-forward recursive training and validation, the model has good consistency and good generalization ability on different datasets. The combination forecasting model proposed in this paper has practical and guiding significance for investors and policy makers who use trend following or arbitrage as the trading mode of commodity futures.
One of the limitations of this study is that, to simplify and facilitate implementation, the quantitative trading strategy proposed in this paper is a long-only strategy. In practical trading, however, short-selling is also an important part of market strategies because of the short-selling mechanism of the commodity futures market, although introducing it will increase the complexity of the strategy. The commodities involved in this article are traditional, major commodities in China's commodity futures market and have a long trading history. Therefore, if short positions can be incorporated into the strategy, more trading opportunities will be generated, which may improve the profitability of the model. Adopting a long-short hybrid strategy will thus be one of the major directions of future work.