A Fast Training Algorithm for Deep Convolutional Fuzzy Systems with Application to Stock Index Prediction

A deep convolutional fuzzy system (DCFS) on a high-dimensional input space is a multilayer connection of many low-dimensional fuzzy systems, where the input variables to the low-dimensional fuzzy systems are selected through a moving window (a convolution operator) across the input spaces of the layers. To design the DCFS based on input-output data pairs, we propose a bottom-up, layer-by-layer scheme. Specifically, by viewing each of the first-layer fuzzy systems as a weak estimator of the output based on only a very small portion of the input variables, we can design these fuzzy systems using the Wang-Mendel (WM) method. After the first-layer fuzzy systems are designed, we pass the data through the first layer and replace the inputs in the original dataset by the corresponding outputs of the first layer to form a new dataset; we then design the second-layer fuzzy systems based on this new dataset in the same way as the first-layer fuzzy systems. Repeating this process, we design the whole DCFS. Since the WM method requires only one pass through the data, this training algorithm for the DCFS is very fast. We apply the DCFS model with the training algorithm to predict a synthetic chaotic plus random time-series and the real Hang Seng Index of the Hong Kong stock market.


I. INTRODUCTION
THE GREAT success of deep convolutional neural networks (DCNN) [1], [2] in solving complex practical problems [3], [4] reveals a basic fact: multilevel structures are very powerful models for representing complex relationships. The main problems of the DCNN are the huge computational load of training its vast number of parameters and the lack of interpretability of this huge number of model parameters [5]. The goal of this article is to develop deep convolutional fuzzy systems (DCFS) and fast training algorithms for the DCFS, to explore the power of multilevel rule-based representations and to overcome the computational and interpretability difficulties of the DCNN.
Hierarchical fuzzy systems were proposed by Raju et al. [6] at roughly the same time that LeCun et al. [1] introduced the DCNN in 1990. In the late 1990s, the basic properties of hierarchical fuzzy systems, such as universal approximation, were proved in [7], and a backpropagation (BP) algorithm was developed in [8] to train the hierarchical fuzzy systems based on input-output data. Then, a wave of research on hierarchical fuzzy systems was conducted in the fuzzy community around the mid-2000s, the same period in which Hinton et al. [2] proposed the celebrated new training algorithm for deep neural networks in 2006 that led to the current AI boom. During this period, the structural and approximation properties of hierarchical fuzzy systems were studied in depth [9]-[12], and many new methods for designing hierarchical fuzzy systems were proposed [13]-[16]. Since then, hierarchical fuzzy models have been applied to a wide variety of practical problems, such as environmental monitoring [17], educational assessment [18], video deinterlacing [19], price negotiation [20], mobile robot automation [21]-[23], self-nominating in peer-to-peer networks [24], linguistic hierarchy [25], hotel location selection [26], smart structures [27], weapon target assignment [28], image description [29], nutrition evaluation [30], spacecraft control [31], photovoltaic management [32], and wastewater treatment [33].
More recently, research on hierarchical fuzzy systems has advanced along many directions, such as fast implementation [34], adaptive control [35], multiobjective optimization [36], interpretability [37], classification [38], [39], and so on. The research works summarized in the last paragraph demonstrate that hierarchical fuzzy models are successful in solving many practical problems. However, the applications of hierarchical fuzzy models have generally been restricted to low-dimensional problems with small datasets. Furthermore, the current training algorithms for hierarchical fuzzy systems are of the same types as those for deep neural networks, which are computationally intensive when applied to high-dimensional problems with big data. The heavy computational load is mainly due to the iterative nature of the training algorithms (multiple passes over the data), which may take a long time to converge. Since the parameters of fuzzy systems have clear physical meanings (a clear connection to the input/output variables and the data) which the neural network parameters do not have, we can take advantage of these physical meanings to develop fast training algorithms for the parameters. The Wang-Mendel (WM) method proposed in [40] and [41] is such a fast training algorithm: it uses the training data in only one pass to determine the fuzzy system parameters. The basic idea of this article is to use the WM method to design the low-dimensional fuzzy systems in a bottom-up, layer-by-layer fashion so that a DCFS is eventually constructed, where the inputs to the low-dimensional fuzzy systems are selected through a convolution operator (a moving window). These low-dimensional fuzzy systems may be viewed as weak estimators [42] of the output variable. But, unlike the classical ensemble methods in machine learning [43] such as bagging [44], random forest [45], or boosting [46], the weak estimators (the low-dimensional fuzzy systems) in our DCFS models are constructed in a layer-by-layer fashion. Specifically, the first-level fuzzy systems may be viewed as ordinary weak estimators, with each fuzzy system using only a very small number of the input variables from the high-dimensional input space. After the first-level fuzzy systems are designed using the standard WM method, they are fixed and their outputs form the input space to the second-level fuzzy systems. By passing the training data through the fixed first-level fuzzy systems, a new dataset is generated, and the second-level fuzzy systems are designed based on this new dataset in the same way as the first-level fuzzy systems. This process continues, layer after layer, until the DCFS is constructed.
To test the DCFS models and the training algorithms, we apply them to predict a synthetic chaotic plus random time-series and the real Hang Seng Index (HSI) of the Hong Kong stock market. Although it was generally believed that stock prices follow random walks [47], [48] and, therefore, are not predictable, many research works have shown that stock prices do not follow random walks [49], [50] and have demonstrated that predicting the market is possible [51]-[55]. Since stock prices are driven by the buying and selling operations of human traders who are influenced by human psychology, such as greed and fear [56], [57], it is reasonable to believe that there are some predictable elements in stock prices. Of course, it is a very challenging task to capture these predictable elements in a timely fashion to make a profit [53], [54].
The rest of this article is organized as follows. In Section II, we give the structural details of the DCFS. In Section III, we develop four training algorithms for the DCFS. In Section IV, we apply the DCFS models with the training algorithms to predict a chaotic plus random time-series and the real HSI of the Hong Kong stock market. Finally, Section V concludes this article; the MATLAB code of the main training algorithm is provided in the Supplemental Material.

II. STRUCTURE OF DCFS
We begin with the definition of the general DCFS.

Definition 1: The general structure of a DCFS is illustrated in Fig. 1, where the input vector $(x_1^0, x_2^0, \ldots, x_n^0)$ to the DCFS is generally of very high dimension, and the output $x^L$ is a scalar (a multioutput DCFS may be designed as multiple single-output DCFSs). Level $l$ ($l = 1, 2, \ldots, L-1$) consists of $n_l$ fuzzy systems $FS_i^l$ ($i = 1, 2, \ldots, n_l$) whose outputs, denoted $x_i^l$, are the inputs to Level $l+1$. The top level, Level $L$, has only one fuzzy system $FS^L$ that combines the $n_{L-1}$ outputs from Level $L-1$ to produce the final output $x^L$. The input sets $I_1^l, I_2^l, \ldots, I_{n_l}^l$ to the fuzzy systems $FS_1^l, FS_2^l, \ldots, FS_{n_l}^l$ ($l = 1, 2, \ldots, L-1$) are selected from the previous level's outputs $x_1^{l-1}, x_2^{l-1}, \ldots, x_{n_{l-1}}^{l-1}$ through a moving window of length $m$, where the window size $m$ is usually a small number such as 3, 4, or 5. The moving window may take a variety of moving schemes. For example, it may move one variable at a time starting from $x_1^{l-1}$ until $x_{n_{l-1}}^{l-1}$ is covered, which gives

$$I_i^l = \{x_i^{l-1}, x_{i+1}^{l-1}, \ldots, x_{i+m-1}^{l-1}\}, \quad i = 1, 2, \ldots, n_l \tag{1}$$

where $l = 1, 2, \ldots, L-1$ (for Level $l = 1$ we have $n_0 = n$). For this one-variable-at-a-time moving scheme, we have

$$n_l = n_{l-1} - m + 1 \tag{2}$$

for $l = 1, 2, \ldots, L-1$ with $n_0 = n$, from which we get

$$n_l = n - l(m-1). \tag{3}$$

If we do not want to use too many fuzzy systems $FS_i^l$ in the construction of the DCFS, so as to improve the efficiency of each fuzzy system $FS_i^l$, we may move the window more than one variable at a time to cover the input variables in the levels. In the extreme case, we may move the window $m$ variables each time for all the fuzzy systems $FS_i^l$, so that an $L$-level DCFS can cover $m^L$ input variables. For an $L = 5$ level DCFS with $m = 5$ inputs to each fuzzy system $FS_i^l$, for example, $m^L = 3125$ input variables can be covered. Also, the window size $m$ may be different for different fuzzy systems $FS_i^l$ to introduce more flexibility into the DCFS model.
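To make the window arithmetic concrete, the following minimal Python sketch (our illustration, not the paper's supplemental MATLAB code; the function name level_windows and the zero-based indexing are our assumptions) enumerates the window index sets $I_i^l$ of (1) and the level widths $n_l$ of (2) and (3) for the one-variable-at-a-time scheme:

```python
def level_windows(n: int, m: int) -> list:
    """Enumerate, level by level, the input index windows I_i^l of (1) for the
    one-variable-at-a-time moving scheme, where each level has
    n_l = n_{l-1} - m + 1 fuzzy systems, i.e., n_l = n - l*(m-1) as in (3)."""
    levels = []
    width = n  # n_0 = n input variables feed Level 1
    while width > m:
        # one window of length m per position, moving one variable at a time
        levels.append([list(range(i, i + m)) for i in range(width - m + 1)])
        width = width - m + 1  # Eq. (2)
    levels.append([list(range(width))])  # the single top-level fuzzy system FS^L
    return levels

# Example: n = 11, m = 3 gives levels with 9, 7, 5, 3, and 1 fuzzy systems,
# the five-level DCFS used for return prediction in Section IV.
for l, wins in enumerate(level_windows(11, 3), start=1):
    print(f"Level {l}: {len(wins)} fuzzy systems")
```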
The fuzzy systems $FS_i^l$ ($i = 1, 2, \ldots, n_l$, $l = 1, 2, \ldots, L-1$) are standard fuzzy systems [58] constructed as follows. For each input variable $x_i^{l-1}, \ldots, x_{m+i-1}^{l-1} \in I_i^l$ of the fuzzy system $FS_i^l$ (note that $x_i^{l-1}$ is the first input variable to $FS_i^l$ and may not be the $i$th input variable to Level $l$), define $q$ fuzzy sets $A^1, A^2, \ldots, A^q$, as shown in Fig. 2, where the centers of the $q$ fuzzy sets are equally spaced and the endpoints $\min x_j$ and $\max x_j$ are determined from the training data (the details will be given in Section III when we develop the training algorithms for the DCFS). The fuzzy system $FS_i^l$ is

$$FS_i^l(x_i^{l-1}, \ldots, x_{m+i-1}^{l-1}) = \frac{\sum_{j_1=1}^{q} \cdots \sum_{j_m=1}^{q} c_{j_1 \ldots j_m}\, A^{j_1}(x_i^{l-1}) \cdots A^{j_m}(x_{m+i-1}^{l-1})}{\sum_{j_1=1}^{q} \cdots \sum_{j_m=1}^{q} A^{j_1}(x_i^{l-1}) \cdots A^{j_m}(x_{m+i-1}^{l-1})} \tag{4}$$

which is constructed from the following $q^m$ fuzzy IF-THEN rules:

IF $x_i^{l-1}$ is $A^{j_1}$ and $\cdots$ and $x_{m+i-1}^{l-1}$ is $A^{j_m}$, THEN $x_i^l$ is $B^{j_1 \ldots j_m}$ (5)

where $j_1, \ldots, j_m = 1, 2, \ldots, q$, the membership functions $A^j$ are given in Fig. 2, and the parameters $c_{j_1 \ldots j_m}$ are the centers of the fuzzy sets $B^{j_1 \ldots j_m}$ and will be designed using the training algorithms in Section III. For the membership functions in Fig. 2, we see that

$$\sum_{j=1}^{q} A^j(x) = 1 \quad \text{for any } x \in [\min x_j, \max x_j] \tag{6}$$

so the denominator of (4) equals 1 and the fuzzy system $FS_i^l$ of (4) is simplified to

$$FS_i^l(x_i^{l-1}, \ldots, x_{m+i-1}^{l-1}) = \sum_{j_1=1}^{q} \cdots \sum_{j_m=1}^{q} c_{j_1 \ldots j_m}\, A^{j_1}(x_i^{l-1}) \cdots A^{j_m}(x_{m+i-1}^{l-1}) \tag{7}$$

where $i = 1, 2, \ldots, n_l$ and $l = 1, 2, \ldots, L-1$. The top-level fuzzy system $FS^L$ is in the same form of (7) with the $n_{L-1}$ input variables $x_1^{L-1}, \ldots, x_{n_{L-1}}^{L-1}$:

$$x^L = FS^L(x_1^{L-1}, \ldots, x_{n_{L-1}}^{L-1}). \tag{8}$$

We claimed in Section I that the DCFS has better interpretability than the DCNN, and now we show the details of how to interpret the DCFS in terms of the fuzzy IF-THEN rules (5) and the parameters $c_{j_1 \ldots j_m}$. First, notice that the $q$ fuzzy sets of Fig. 2 partition the domain of each input variable into $q$ regions, so the $m$-dimensional input space of the fuzzy system $FS_i^l$ is partitioned into $q^m$ cells, and within the cell $(j_1, \ldots, j_m)$ the output of (7) is determined mainly by the single rule (5) of that cell. That is, the fuzzy system takes local actions, with one rule responsible mainly for one cell. Therefore, for a given point $(x_i^{l-1}, \ldots, x_{m+i-1}^{l-1})$, the action of the fuzzy system $FS_i^l(x_i^{l-1}, \ldots, x_{m+i-1}^{l-1})$ of (7) can be represented by a single parameter $c_{j_1 \ldots j_m}$ that represents the fuzzy IF-THEN rule in the form of (5).
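To make (6) and (7) concrete, here is a minimal Python sketch (our illustration, with assumed helper names tri_memberships and fs_eval) that evaluates one low-dimensional fuzzy system of the form (7) with the triangular membership functions of Fig. 2; because the triangles are equally spaced and sum to 1 on $[\min x_j, \max x_j]$, the denominator of (4) drops out exactly as in (6):

```python
import numpy as np

def tri_memberships(x: float, lo: float, hi: float, q: int) -> np.ndarray:
    """Membership values A^1(x), ..., A^q(x) of the q triangular fuzzy sets
    of Fig. 2, with equally spaced centers on [lo, hi]; for any x in [lo, hi]
    they sum to 1, which is the property (6) that reduces (4) to (7).
    Assumes q >= 2."""
    centers = np.linspace(lo, hi, q)
    h = centers[1] - centers[0]          # spacing between neighboring centers
    x = min(max(x, lo), hi)              # clip to the range seen in training
    return np.maximum(0.0, 1.0 - np.abs(x - centers) / h)

def fs_eval(xs, los, his, q: int, c: np.ndarray) -> float:
    """Evaluate FS(x_1, ..., x_m) of (7): the sum over all q^m cells of
    c_{j1...jm} * A^{j1}(x_1) * ... * A^{jm}(x_m), where c is an
    m-dimensional q x ... x q array of rule centers."""
    out = c
    for x, lo, hi in zip(xs, los, his):
        mu = tri_memberships(x, lo, hi, q)
        out = np.tensordot(mu, out, axes=(0, 0))  # contract one input axis
    return float(out)
```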
Because each of the fuzzy systems $FS_i^l$ in the general DCFS of Fig. 1 can be represented by a single parameter $c_{j_1 \ldots j_m}$ (as we demonstrated in the last paragraph), the whole action of the DCFS on any given input point can be interpreted by a graph of connected $c_{j_1 \ldots j_m}$'s. Fig. 3 (the right part) shows an example of a six-level DCFS with $n = 7$ inputs to the DCFS and $m = 2$ inputs to each fuzzy system $FS_i^l$ in the DCFS. For a given input point to the DCFS in Fig. 3, if the DCFS gives a bad output $y$ (such as causing an accident in an autonomous-driving application), then from the DCFS graph in the right part of Fig. 3 we can easily check which rules cause the bad output and make appropriate changes to these rules so that the mistake will not happen again. We will discuss this easy-error-correction property of the DCFS in more detail after we develop the training algorithms in Section III.
Finally, we introduce a special DCFS that can greatly reduce the memory requirement and the computational cost of the training algorithms for the DCFS to be developed in Section III. We see from (7) that each fuzzy system $FS_i^l$ has $q^m$ free parameters $c_{j_1 \ldots j_m}$ to be designed and stored in the computer memory. The total computational and storage requirement is proportional to $q^m \sum_{l=1}^{L} n_l$, which may be a large number for high-dimensional problems. To reduce the computational and storage requirement, we introduce a parameter-sharing scheme where the fuzzy systems in the same level are identical (sharing the same $c_{j_1 \ldots j_m}$ parameters).

Definition 2: A DCFS with parameter sharing is a DCFS where the fuzzy systems $FS_i^l$ in the same level $l$ are identical, i.e., $FS_1^l = \cdots = FS_{n_l}^l$ for $l = 1, 2, \ldots, L$, and we denote this identical fuzzy system in Level $l$ as $FS^l$. A DCFS with parameter sharing has $q^m L$ free parameters $c_{j_1 \ldots j_m}$ to be designed and stored.
We now move to Section III, where we develop a number of fast training algorithms to determine the parameters $c_{j_1 \ldots j_m}$ of the fuzzy systems (7) based on input-output data, for both the general DCFS and the DCFS with parameter sharing.

III. TRAINING ALGORITHMS FOR THE DCFS

Task 1 (Offline training): Given $N$ input-output data pairs

$$(x_1^0(k), x_2^0(k), \ldots, x_n^0(k);\ y^0(k)), \quad k = 1, 2, \ldots, N \tag{9}$$

where $x_1^0(k), \ldots, x_n^0(k)$ are the inputs and $y^0(k)$ is the output, our task is to design a DCFS in Fig. 1 to match these input-output data pairs.
First, we develop a training algorithm for the general DCFS in Definition 1 to match the input-output data pairs of (9) (Training Algorithm 1). Then, we show how to design the DCFS with parameter sharing in Definition 2 in Training Algorithm 2. Finally, in Training Algorithms 3 and 4, we show how to do an online training for the general DCFS and the DCFS with parameter sharing, respectively.
Training Algorithm 1 (For the general DCFS): Given the input-output data pairs of (9), we design the general DCFS in Definition 1, with the fuzzy systems $FS_i^l$ in the form of (7), through the following steps.
Step 1: Choose the moving window size $m$ and the moving scheme (such as moving one variable at a time, or another scheme).
Step 2: Design the Level 1 fuzzy systems $FS_i^1$ in the form of (7) (with the $m$ inputs $x_i^0, \ldots, x_{m+i-1}^0$), $i = 1, 2, \ldots, n_1$, using the WM method [40], [41], where the input-output data pairs used to design the $FS_i^1$ are

$$(x_i^0(k), \ldots, x_{m+i-1}^0(k);\ y^0(k)), \quad k = 1, 2, \ldots, N. \tag{10}$$

2.1: Set the initial values of the weight parameters $w_{j_1 \ldots j_m}$ and the weight-output parameters $u_{j_1 \ldots j_m}$ equal to zero, for $j_1, \ldots, j_m = 1, 2, \ldots, q$.

2.2: Define the $q$ fuzzy sets $A^1, \ldots, A^q$ for each input variable as in Fig. 2 and choose the endpoints as

$$\min x_j = \min_{k=1,\ldots,N} x_j^0(k), \quad \max x_j = \max_{k=1,\ldots,N} x_j^0(k). \tag{11}$$

2.3: For each input-output data pair of (10), starting from $k = 1$, determine the fuzzy sets $A^{j_1^*}, \ldots, A^{j_m^*}$ that achieve the maximum membership values among the $q$ fuzzy sets, i.e.,

$$A^{j_p^*}(x_{i+p-1}^0(k)) \geq A^{j_p}(x_{i+p-1}^0(k)), \quad j_p = 1, 2, \ldots, q, \ p = 1, 2, \ldots, m. \tag{12}$$

2.4: Update the weight and weight-output parameters for the cell $(j_1^*, \ldots, j_m^*)$:

$$w_{j_1^* \ldots j_m^*} \leftarrow w_{j_1^* \ldots j_m^*} + A^{j_1^*}(x_i^0(k)) \cdots A^{j_m^*}(x_{m+i-1}^0(k)) \tag{13}$$

$$u_{j_1^* \ldots j_m^*} \leftarrow u_{j_1^* \ldots j_m^*} + A^{j_1^*}(x_i^0(k)) \cdots A^{j_m^*}(x_{m+i-1}^0(k))\, y^0(k). \tag{14}$$

2.5: Repeat 2.3 and 2.4 for $k = 1, 2, \ldots, N$. For the cells $(j_1, \ldots, j_m)$ with $w_{j_1 \ldots j_m} \neq 0$, determine the parameters $c_{j_1 \ldots j_m}$ in the fuzzy system $FS_i^1$ of (7) as

$$c_{j_1 \ldots j_m} = \frac{u_{j_1 \ldots j_m}}{w_{j_1 \ldots j_m}}. \tag{15}$$

We call the cells $(j_1, \ldots, j_m)$ with $w_{j_1 \ldots j_m} \neq 0$ covered by data.

2.6 and 2.7: For the cells not covered by data, determine the $c_{j_1 \ldots j_m}$ by extrapolating from the covered cells (e.g., setting the $c_{j_1 \ldots j_m}$ of an uncovered cell to the average of the $c$'s of its nearest covered cells, and repeating until all cells are assigned a value); this extrapolation scheme is illustrated in Fig. 4.

Step 3: Design the fuzzy systems in the remaining levels, one level after another, as follows.

3.1: Suppose the fuzzy systems in Levels 1 to $l-1$ have been designed ($l = 2, 3, \ldots, L$). Pass the inputs of the data pairs (9) through the designed levels, i.e., compute

$$x_i^1(k) = FS_i^1(x_i^0(k), \ldots, x_{m+i-1}^0(k)), \quad i = 1, 2, \ldots, n_1 \tag{16}$$

$$x_i^{l'}(k) = FS_i^{l'}(x_i^{l'-1}(k), \ldots, x_{m+i-1}^{l'-1}(k)), \quad i = 1, 2, \ldots, n_{l'}, \ l' = 2, 3, \ldots, l-1 \tag{17}$$

and define

$$(x_1^{l-1}(k), x_2^{l-1}(k), \ldots, x_{n_{l-1}}^{l-1}(k);\ y^0(k)), \quad k = 1, 2, \ldots, N \tag{18}$$

as the new input-output data pairs for designing the Level $l$ fuzzy systems $FS_i^l$.

3.2: Use the same procedure as in Step 2 to design the Level $l$ fuzzy systems $FS_i^l$ in the form of (7), $i = 1, 2, \ldots, n_l$, with the original input-output data pairs (9) replaced by the new input-output data pairs (18), and all the Level 1 variables in Step 2 replaced by the corresponding Level $l$ variables (for example, replace $n_1$ by $n_l$ and $x_i^0(k)$ by $x_i^{l-1}(k)$).

Remark 1 (Viewing the fuzzy systems as weak estimators): Each Level 1 fuzzy system $FS_i^1$ may be viewed as a weak estimator of the output $y^0$, since it uses only $m$ of the $n$ input variables. Similarly, each fuzzy system $FS_i^2$ selects only a small number $m$ of variables from $x_1^1, x_2^1, \ldots, x_{n_1}^1$ as its inputs, and the $n_2$ fuzzy systems $FS_i^2$ ($i = 1, 2, \ldots, n_2$) are viewed as Level 2 weak estimators of the output $y^0$. This process continues up to the top Level $L$, whose output $x^L$ is the final estimate of $y^0$.
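The one-pass design of a single low-dimensional fuzzy system (Steps 2.1-2.5) can be sketched in Python as follows, reusing the tri_memberships helper from the earlier sketch; the function name wm_design and the global-mean fill for uncovered cells are our assumptions (the paper instead extrapolates from the nearest covered cells as in Steps 2.6-2.7 and Fig. 4):

```python
import numpy as np

def wm_design(X: np.ndarray, y: np.ndarray, q: int):
    """One-pass WM design of one fuzzy system (Steps 2.1-2.5).
    X: N x m input matrix, y: length-N output vector.
    Returns the rule-center array c and the endpoints (lo, hi) of (11)."""
    N, m = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)         # endpoints, Eq. (11)
    w = np.zeros((q,) * m)                        # weights, Step 2.1
    u = np.zeros((q,) * m)                        # weight-outputs, Step 2.1
    for k in range(N):                            # the single pass, Steps 2.3-2.5
        mus = [tri_memberships(X[k, p], lo[p], hi[p], q) for p in range(m)]
        cell = tuple(int(mu.argmax()) for mu in mus)   # (j1*,...,jm*), Eq. (12)
        a = float(np.prod([mu.max() for mu in mus]))   # membership product
        w[cell] += a                              # Eq. (13)
        u[cell] += a * y[k]                       # Eq. (14)
    # Stand-in for Steps 2.6-2.7: fill uncovered cells with the global output
    # mean (the paper extrapolates from the nearest covered cells instead).
    c = np.full((q,) * m, float(y.mean()))
    covered = w > 0
    c[covered] = u[covered] / w[covered]          # Eq. (15) for covered cells
    return c, lo, hi
```

Designing the whole DCFS then amounts to calling a routine like this once per fuzzy system, level by level, with X replaced by the previous level's outputs as in Step 3.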
Remark 2 (Physical meaning of the parameter design): From (13)-(15), we see that the parameters $c_{j_1 \ldots j_m}$ in the fuzzy system (7) are designed as the weighted average of the outputs $y^0(k)$ whose corresponding inputs $x_i^{l-1}(k), \ldots, x_{m+i-1}^{l-1}(k)$ fall into the cell $(j_1, \ldots, j_m)$, with the weight equal to the membership value $A^{j_1}(x_i^{l-1}(k)) \cdots A^{j_m}(x_{m+i-1}^{l-1}(k))$. If no data fall into a cell $(j_1, \ldots, j_m)$, then the $c_{j_1 \ldots j_m}$ of the cell is determined through the extrapolation scheme of Steps 2.6 and 2.7 (illustrated in Fig. 4). We can use this simple scheme to design the parameters $c_{j_1 \ldots j_m}$ because of the clear physical meaning of the $c_{j_1 \ldots j_m}$. Specifically, $c_{j_1 \ldots j_m}$ is the center of the THEN-part membership function of the fuzzy IF-THEN rule (5) for the cell $(j_1, \ldots, j_m)$, so $c_{j_1 \ldots j_m}$ may be viewed as the estimate of the desired output $y^0$ given by the fuzzy IF-THEN rule (5) at cell $(j_1, \ldots, j_m)$. Therefore, a good way to design $c_{j_1 \ldots j_m}$ is to set it to the weighted average of the $y^0(k)$'s whose corresponding inputs $x_i^{l-1}(k), \ldots, x_{m+i-1}^{l-1}(k)$ fall into the cell $(j_1, \ldots, j_m)$. The optimality of designing the parameters in this way is studied in [59].
Remark 3 (Fast training with low computational cost): From 2.3 to 2.5 in Training Algorithm 1, we see that to design each of the fuzzy systems $FS_i^l$ in the DCFS, the $N$ input-output data pairs (10) are used only once (just one pass through the data). In the popular gradient-descent-based BP algorithm, multiple passes through the data are needed to ensure the convergence of the parameters, so the computational cost is high and the speed of the algorithm is slow. Since the data are passed through just once in Training Algorithm 1, it is a very fast algorithm. Specifically, the computational load of Training Algorithm 1 (for the general DCFS) is approximately $O(N + q^m) \sum_{l=1}^{L} n_l$, where $O(N)$ accounts for the one pass of the data in the computation of (12)-(14), $O(q^m)$ accounts for the computation of the parameters $c_{j_1 \ldots j_m}$ in Steps 2.5-2.7 (illustrated in Fig. 4), and $\sum_{l=1}^{L} n_l$ is the number of fuzzy systems $FS_i^l$ ($i = 1, 2, \ldots, n_l$, $l = 1, 2, \ldots, L$) in the DCFS.
Since each fuzzy system $FS_i^l$ of (7) in the general DCFS has $q^m$ free parameters $c_{j_1 \ldots j_m}$, the total computational and storage requirement is proportional to $q^m \sum_{l=1}^{L} n_l$, which may be a large number for high-dimensional problems. Therefore, we introduced the DCFS with parameter sharing in Definition 2. We now show how to design the DCFS with parameter sharing based on the input-output data pairs (9).
Training Algorithm 2 (For the DCFS with parameter sharing): Given the input-output data pairs of (9), the DCFS with parameter sharing in Definition 2 is designed as follows.

Step 1: The same as Step 1 in Training Algorithm 1.
Step 2: Design the single Level 1 fuzzy system $FS_1^1 = \cdots = FS_{n_1}^1 = FS^1$ in the form of (7) with the following $n_1 N$ input-output data pairs:

$$(x_i^0(k), \ldots, x_{m+i-1}^0(k);\ y^0(k)), \quad i = 1, 2, \ldots, n_1, \ k = 1, 2, \ldots, N. \tag{19}$$

That is, the shared fuzzy system $FS^1$ is designed, in the same way as in Step 2 of Training Algorithm 1, by pooling the data from all $n_1$ window positions.

Step 3: The same as Step 3 of Training Algorithm 1, except that the Step 2 there is replaced by the Step 2 here.

Remark 4 (Reduced memory requirement): Since the fuzzy systems in the same level of a DCFS with parameter sharing are identical, only $L$ sets of $q^m$ parameters $c_{j_1 \ldots j_m}$ need to be stored, rather than the $\sum_{l=1}^{L} n_l$ sets of the general DCFS. For the DCFS in Fig. 3, for example, we have $L = 6$ and $\sum_{l=1}^{L} n_l = 21$, so the memory requirement is reduced from $21\,O(q^m)$ to $6\,O(q^m)$, a reduction of $(21-6)/21 = 71.4\%$.

In Training Algorithms 1 and 2, the $N$ input-output data pairs (9) are processed in a batch fashion. In many real-world situations, the data are collected in real time, so it is of interest to design the DCFS in a recursive way as new data come in real time. This gives Task 2 as follows.

Task 2 (Online training): Let the input-output data pairs

$$(x_1^0(k), x_2^0(k), \ldots, x_n^0(k);\ y^0(k)), \quad k = 1, 2, 3, \ldots \tag{20}$$

be given online, with $k$ being the time index. Suppose a DCFS has already been designed up to $k-1$, and we denote this DCFS as DCFS($k-1$). Let the initial DCFS(0) be designed either by Training Algorithm 1 for the general DCFS or by Training Algorithm 2 for the DCFS with parameter sharing; our task is to update DCFS($k-1$) based on the new data pair $(x_1^0(k), \ldots, x_n^0(k);\ y^0(k))$ to get DCFS($k$). We now show how to do Task 2 for the general DCFS and the DCFS with parameter sharing in Training Algorithms 3 and 4, respectively.

Training Algorithm 3 (Online training for the general DCFS): For Task 2, let the initial DCFS(0) be a general DCFS designed by Training Algorithm 1, and let $c_{j_1 \ldots j_m}(k-1)$ and $w_{j_1 \ldots j_m}(k-1)$ be the parameters of the fuzzy systems $FS_i^l$ in the form of (7) in DCFS($k-1$). When the new data pair $(x_1^0(k), \ldots, x_n^0(k);\ y^0(k))$ arrives, pass its inputs through the levels of DCFS($k-1$) as in Step 3.1 of Training Algorithm 1 and, for each fuzzy system $FS_i^l$, determine the cell $(j_1^*, \ldots, j_m^*)$ with the maximum membership values as in (12) and update

$$c_{j_1^* \ldots j_m^*}(k) = \frac{w_{j_1^* \ldots j_m^*}(k-1)\, c_{j_1^* \ldots j_m^*}(k-1) + a(k)\, y^0(k)}{w_{j_1^* \ldots j_m^*}(k-1) + a(k)} \tag{21}$$

$$w_{j_1^* \ldots j_m^*}(k) = w_{j_1^* \ldots j_m^*}(k-1) + a(k) \tag{22}$$

where $a(k) = A^{j_1^*}(x_i^{l-1}(k)) \cdots A^{j_m^*}(x_{m+i-1}^{l-1}(k))$, while all the other cells keep their previous parameters:

$$c_{j_1 \ldots j_m}(k) = c_{j_1 \ldots j_m}(k-1), \quad w_{j_1 \ldots j_m}(k) = w_{j_1 \ldots j_m}(k-1) \tag{23}$$

for $(j_1, \ldots, j_m) \neq (j_1^*, \ldots, j_m^*)$.
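A sketch of the recursive updating laws (21)-(23) for one fuzzy system on one new data pair, again in Python with assumed helper names and reusing tri_memberships from the earlier sketch:

```python
import numpy as np

def wm_update(c: np.ndarray, w: np.ndarray, mus: list, y_new: float):
    """Recursive WM update (21)-(23): c and w are the q x ... x q center and
    weight arrays of one fuzzy system, mus is the list of the m membership
    vectors of the new input point (from tri_memberships)."""
    cell = tuple(int(mu.argmax()) for mu in mus)   # dominant cell (j1*,...,jm*)
    a = float(np.prod([mu.max() for mu in mus]))   # degree to which the new
                                                   # point belongs to that cell
    c[cell] = (w[cell] * c[cell] + a * y_new) / (w[cell] + a)   # Eq. (21)
    w[cell] += a                                                # Eq. (22)
    # all other cells keep their previous values, Eq. (23)
    return c, w
```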
Training Algorithm 4 (Online training for the DCFS with parameter sharing): For Task 2, let the initial DCFS(0) be a parameter-sharing DCFS (Definition 2) designed by Training Algorithm 2, and let $c_{j_1 \ldots j_m}(k-1)$ be the parameters of the fuzzy systems $FS^l$ in the form of (7) in DCFS($k-1$). Then, the parameters $c_{j_1 \ldots j_m}(k)$ of the $FS^l$ in DCFS($k$) are updated through the following steps.

Step 1: Update the $c_{j_1 \ldots j_m}(k)$ of the shared Level 1 fuzzy system $FS_1^1 = \cdots = FS_{n_1}^1 = FS^1$ in the form of (7) with the following $n_1$ input-output data pairs:

$$(x_i^0(k), \ldots, x_{m+i-1}^0(k);\ y^0(k)), \quad i = 1, 2, \ldots, n_1. \tag{24}$$

Specifically, for each of the $n_1$ input-output data pairs of (24), update the $c_{j_1^* \ldots j_m^*}(k)$ using (21) and (22); for the rest of the $c_{j_1 \ldots j_m}(k)$, keep them the same using (23). Then, copy this updated $FS^1$ to all the Level 1 fuzzy systems.

Step 2: For $l = 2, 3, \ldots, L$, pass the inputs of the new data pair through the updated Levels 1 to $l-1$ to form the inputs to Level $l$, and update the shared Level $l$ fuzzy system $FS^l$ in the same fashion as in Step 1.

Remark 5 (Online training on simple devices): The updating laws (21) and (22) modify only the parameter of the single cell $(j_1^*, \ldots, j_m^*)$, weighted by the membership value $A^{j_1^*}(x_i^0(k)) \cdots A^{j_m^*}(x_{m+i-1}^0(k))$, which is a measure of the degree that the input data point $(x_i^0(k), \ldots, x_{m+i-1}^0(k))$ belongs to the cell $(j_1^*, \ldots, j_m^*)$. Since only one parameter $c_{j_1^* \ldots j_m^*}(k)$ needs to be updated for each fuzzy system, and the computational load of the updating laws (21)-(22) is very low, Training Algorithms 3 and 4 may be implemented on simple devices such as a mobile phone. Therefore, an initial DCFS designed with the offline Training Algorithm 1 or 2 may be downloaded to the customer's simple device, such as a mobile phone, and the customer may then update the DCFS using the online Training Algorithm 3 or 4 with the new data that the customer observes in real life.
Remark 6 (Easy error correction): As we discussed in Section I, a main problem of the black-box DCNN is that if something goes wrong, we do not know which part of the DCNN should be corrected so that the same mistake will not take place again. The DCFS designed with Training Algorithm 3 or 4 can easily solve this problem. Specifically, if a DCFS makes a mistake at an input point, say the point $(x_1^{0*}, \ldots, x_7^{0*})$ to the DCFS in Fig. 3, then we can update the $c_{j_1 \ldots j_m}$'s of the DCFS in Fig. 3 that are responsible for the mistake, using Training Algorithm 3 or 4 with the correct input-output data.
We now apply the DCFS designed with Training Algorithms 1-4 to the time-series prediction problems in Section IV.

IV. APPLICATION TO HSI PREDICTION
Before we apply the DCFS with the training algorithms to predict the real HSI of the Hong Kong stock market in Examples 2, 4, and 5, we first try it on a synthetic chaotic time-series in Examples 1 and 3 to get a feel for the performance of the method. Specifically, in Examples 1 and 2, we test Training Algorithms 1 and 2 for predicting the Mackey-Glass chaotic time-series and the real HSI, respectively. In Examples 3 and 4, we apply the online Training Algorithms 3 and 4 to predict the Mackey-Glass chaotic time-series and the real HSI, respectively. In Example 5, we add more related stocks to the inputs to get better performance.
Example 1: Consider the Mackey-Glass chaotic time-series generated by the differential equation

$$\frac{dx(t)}{dt} = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t) \tag{25}$$

with $\tau = 50$. Let $r(t)$ be the return (relative change) of the chaotic time-series $x(t)$ plus a white Gaussian noise $n(t)$, i.e.,

$$r(t) = \frac{x(t) - x(t-1)}{x(t-1)} + n(t) \tag{26}$$

and we generate a synthetic chaotic plus random time-series $y(t)$ whose return is $r(t)$, i.e.,

$$y(t) = y(t-1)\,(1 + r(t)). \tag{27}$$

We view $y(t)$ ($t = 0, 1, 2, \ldots$) of (27) as the daily closing prices of a stock index. Fig. 9 plots a realization of 3000 points of $y(t)$.
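For readers who want to reproduce a series like Fig. 9, the following Python sketch generates (25)-(27); the Euler integration step, the constant initial history, the noise standard deviation 0.005, and the initial price y(0) = 100 are all our assumptions, since these details are not listed here:

```python
import numpy as np

def mackey_glass(T: int, tau: int = 50, dt: float = 1.0) -> np.ndarray:
    """Rough Euler integration of the Mackey-Glass equation (25)."""
    hist = int(tau / dt)
    x = np.full(T + hist, 1.2)     # assumed constant initial history
    for t in range(hist, T + hist - 1):
        dx = 0.2 * x[t - hist] / (1.0 + x[t - hist] ** 10) - 0.1 * x[t]
        x[t + 1] = x[t] + dt * dx
    return x[hist:]

rng = np.random.default_rng(0)
x = mackey_glass(3001)
r = np.diff(x) / x[:-1] + 0.005 * rng.standard_normal(x.size - 1)  # Eq. (26)
y = 100.0 * np.cumprod(1.0 + r)   # prices y(t) of (27); y(0) = 100 assumed
```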
We now use the DCFS of Fig. 1 with Training Algorithms 1 and 2 to predict the return sequence $r(t)$ of (26). Specifically, let the $n$ past returns up to day $t-1$, $r(t-1), r(t-2), \ldots, r(t-n)$, be the inputs $x_1^0, x_2^0, \ldots, x_n^0$ to the DCFS, and let the output $x^L$ of the DCFS be the prediction of the return at day $t$:

$$\hat{r}(t) = x^L = DCFS(r(t-1), r(t-2), \ldots, r(t-n)) \tag{28}$$

where $\hat{r}(t)$ is the prediction of $r(t)$. In this case, the input-output data pairs of (9) become

$$(r(k-1), r(k-2), \ldots, r(k-n);\ r(k)) \tag{29}$$

where $k = t-1, t-2, \ldots, t-N+n$ (the current day is $t-1$); that is, at day $t-1$, the $N$ past returns $r(t-1), r(t-2), \ldots, r(t-N)$ constitute the $N-n$ input-output pairs in the form of (29). With $n = 11$, $m = 3$, and the one-variable-at-a-time moving scheme, a five-level DCFS is established, where Levels 1-5 have 9, 7, 5, 3, and 1 fuzzy systems $FS_i^l$, respectively. Using the first 2000 points in Fig. 9 as the training data and the last 1000 points as the testing data, we simulate Training Algorithms 1 and 2 for different values of $q$ ($q$ is the number of fuzzy sets covering each input variable; see Fig. 2). Figs. 10 and 11 plot the training (blue) and testing (red) errors of Training Algorithm 1 (for the general DCFS) and Training Algorithm 2 (for the DCFS with parameter sharing) as a function of $q$, respectively. We see from Figs. 10 and 11 that for both the general DCFS (with Training Algorithm 1) and the DCFS with parameter sharing (with Training Algorithm 2), the training errors (blue curves) keep decreasing as more fuzzy sets are used to cover the input variables. For the testing errors (red curves in Figs. 10 and 11), we see that they first decrease as $q$ increases, but then begin to increase as the DCFS models overfit the data.

We also compare the DCFS predictor with the fuzzy model trained by the BP algorithm [60], [61]. For models with similar complexity (roughly the same number of free parameters), the training and testing errors of the DCFS predictor are about 20% less than those of the BP-trained fuzzy model, with a training speed at least ten times faster. Specifically, for $q = 20$ (roughly $q^m \sum_{l=1}^{L} n_l = 20^3 \times 25 = 200\,000$ free parameters) and the 3000 chaotic time-series data in Fig. 9, the training (first 2000 points) and testing (last 1000 points) errors of the DCFS predictor and the BP-trained fuzzy model are plotted in Fig. 12, where the mean-square training and testing errors of the DCFS predictor are 0.0051 and 0.0063, respectively, and those of the BP-trained fuzzy model are 0.0061 and 0.0077, respectively.
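The sliding-window data pairs (29) can be formed from a return series as in the following sketch (the helper name return_pairs is ours):

```python
import numpy as np

def return_pairs(r: np.ndarray, n: int):
    """Form the pairs (29): inputs are the n most recent past returns
    r(k-1), ..., r(k-n) and the output is r(k), for each feasible k."""
    X = np.array([r[k - n:k][::-1] for k in range(n, len(r))])
    y_out = r[n:]
    return X, y_out

# Example: n = 11 past returns per input vector, as in Example 1.
# X, y_out = return_pairs(r, 11)
```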
Example 2: The same as Example 1, except that the index generated by the Mackey-Glass chaotic time-series plus noise (see Fig. 9) is replaced by the real HSI (see Fig. 13). Specifically, considering the HSI daily closings from 2010/7/6 to 2018/8/30 (2000 data points) plotted in Fig. 13, we use the first 1500 points (from 2010/7/6 to 2016/8/4) as the training data and the last 500 points (from 2016/8/5 to 2018/8/30) as the testing data. Fig. 14 plots the training (blue) and testing (red) errors of Training Algorithm 1 (for the general DCFS) as a function of $q$. Comparing Fig. 14 with Fig. 10, we see that the real HSI is much more difficult to predict than the index generated by the Mackey-Glass chaotic time-series.

Since we view $y(t)$ of (27) as the daily closing prices of a stock index, the value of an index fund tracking $y(t)$ at day $t$, denoted as $Value_{Index}(t)$, is updated daily according to

$$Value_{Index}(t) = Value_{Index}(t-1)\,(1 + r(t)) \tag{30}$$

where $t = 1, 2, \ldots$, and we assume the initial investment $Value_{Index}(0) = 100$. We now propose a trading strategy based on the DCFS prediction $\hat{r}(t)$ of (28).
Trading Strategy Based on the DCFS Prediction: At day $t-1$ (the most recent return available is $r(t-1)$), use Training Algorithm 3 or 4 in Section III to design a DCFS of (28) to get the prediction $\hat{r}(t)$ of the return at day $t$. If $\hat{r}(t) > 0$, meaning that we predict the index $y(t)$ will go up at day $t$, then long (buy) the index at day $t-1$; if $\hat{r}(t) < 0$, meaning that we predict the index $y(t)$ will go down at day $t$, then short (sell) the index at day $t-1$. With this strategy, the value of the DCFS fund, denoted as $Value_{DCFS}(t)$, is updated daily according to

$$Value_{DCFS}(t) = Value_{DCFS}(t-1)\,\big(1 + \mathrm{sign}(\hat{r}(t))\, r(t)\big) \tag{31}$$

with the initial investment $Value_{DCFS}(0) = 100$, where

$$\mathrm{sign}(\hat{r}(t)) = \begin{cases} 1, & \hat{r}(t) > 0 \\ -1, & \hat{r}(t) < 0. \end{cases} \tag{32}$$
The meaning of (31) and (32) is as follows. If $\mathrm{sign}(\hat{r}(t)) = \mathrm{sign}(r(t))$, which means we made the right prediction at day $t-1$, then our fund value will increase by $|r(t)|$ (in relative terms) no matter whether the index $y(t)$ goes up or down at day $t$; on the other hand, if $\mathrm{sign}(\hat{r}(t)) \neq \mathrm{sign}(r(t))$, which means we made a wrong prediction at day $t-1$, then our fund value will decrease by $|r(t)|$ no matter whether the index $y(t)$ goes up or down at day $t$.
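The fund updates (30)-(32) are easy to simulate; here is a minimal sketch (the function name fund_values is assumed), given the realized returns and the DCFS predictions:

```python
import numpy as np

def fund_values(r: np.ndarray, r_hat: np.ndarray, v0: float = 100.0):
    """Daily values of the index fund (30) and the DCFS fund (31)-(32),
    given realized returns r(t) and predictions r_hat(t) for t = 1, ..., T.
    (np.sign returns 0 for a zero prediction, i.e., no position that day.)"""
    v_index = v0 * np.cumprod(1.0 + r)                    # Eq. (30)
    v_dcfs = v0 * np.cumprod(1.0 + np.sign(r_hat) * r)    # Eqs. (31)-(32)
    return v_index, v_dcfs
```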
We now test this trading strategy for the chaotic plus random index of Fig. 9 and the real HSI of Fig. 13 in Examples 3 and 4, respectively.
Example 3: The same as in Example 1, except that Training Algorithm 3 or 4 is used to build the DCFS predictor (28). Figs. 15 and 16 plot the index fund value $Value_{Index}(t)$ of (30) and the DCFS fund value $Value_{DCFS}(t)$ of (31) using Training Algorithms 3 and 4, respectively. We see from Figs. 15 and 16 that in both cases the DCFS fund performs much better than the index fund, and that the general DCFS with Training Algorithm 3 is better than the parameter-sharing DCFS with Training Algorithm 4.

Fig. 19. Two-level DCFS in Example 5, where the HSI returns and the returns of four other major stocks are used to construct the five fuzzy systems in the first level, whose outputs are combined by the second-level fuzzy system to form the final prediction of the HSI return.
Example 4: The same as Example 3, except that the index generated by the Mackey-Glass chaotic time-series plus noise (Fig. 9) is replaced by the real HSI (Fig. 13). Figs. 17 and 18 plot the index fund value $Value_{Index}(t)$ and the DCFS fund value $Value_{DCFS}(t)$ using Training Algorithms 3 and 4, respectively. We see from Figs. 17 and 18 that the DCFS fund performs slightly better than the index fund in both cases, and that the general DCFS with Training Algorithm 3 is better than the parameter-sharing DCFS with Training Algorithm 4.
To improve the performance of the DCFS fund, we notice that there may be other factors that influence the real HSI return $r(t)$ in addition to the past returns $r(t-1), r(t-2), \ldots, r(t-n)$ used in the DCFS model (28). For example, the returns of major stocks at the current day $t-1$ may have a stronger influence on tomorrow's HSI return $r(t)$ than the HSI returns $r(t-n)$ long in the past (when $n$ is large). Indeed, the long-past HSI returns $r(t-n)$ may introduce more noise than useful information for the prediction of tomorrow's HSI return $r(t)$. Hence, in the next example we add the returns of four major stocks in the Hong Kong stock market to the input space of the DCFS model and use only the five most recent returns as the inputs to the prediction model.

V. CONCLUSION
The DCFS models and the training algorithms proposed in this article have the following advantages.
1) It is fast: The data are used only once in the design of the fuzzy systems in the DCFS, and no iterative training is needed.
2) It is highly interpretable: The fuzzy systems in the different levels of the DCFS are weak estimators of the output variable that are constructed in a layer-by-layer, bottom-up fashion.
3) It is very flexible: The size of the moving window, the step of each move, the number of fuzzy sets covering the input variables, and the number of levels can all be easily adjusted for better performance.
4) It is easy to correct mistakes: Because of the clear physical meaning of the parameters, it is easy to redesign the parameters with new data so that the same mistake will not happen again.
5) It may be implemented on simple devices: Due to the simple computation and low memory requirement, the online training algorithms may be run on simple user-end devices such as a mobile phone.
6) It supports online learning: Users can continuously update the DCFS models with their own new data on their own simple devices, so that a wide variety of user-specific intelligent systems can be created.
7) It provides a natural structure for parallel computing: All the fuzzy systems in the same level can be trained in parallel, making the fast training algorithms even faster through parallel computing.
8) It is suitable for high-dimensional problems.

ACKNOWLEDGMENT
The author would like to thank the reviewers for their very insightful comments that helped to improve the article.