Predicting Stock Market Trends Using Machine Learning and Deep Learning Algorithms Via Continuous and Binary Data; a Comparative Analysis

The nature of stock market movement has always been ambiguous for investors because of various influential factors. This study aims to significantly reduce the risk of trend prediction with machine learning and deep learning algorithms. Four stock market groups from the Tehran stock exchange, namely diversified financials, petroleum, non-metallic minerals and basic metals, are chosen for experimental evaluation. This study compares nine machine learning models (Decision Tree, Random Forest, Adaptive Boosting (Adaboost), eXtreme Gradient Boosting (XGBoost), Support Vector Classifier (SVC), Naïve Bayes, K-Nearest Neighbors (KNN), Logistic Regression and Artificial Neural Network (ANN)) and two powerful deep learning methods (Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM)). Ten technical indicators computed from ten years of historical data are our input values, and two ways of employing them are considered: first, calculating the indicators from stock trading values as continuous data, and second, converting the indicators to binary data before use. Each prediction model is evaluated with three metrics for each input approach. The evaluation results indicate that for continuous data, RNN and LSTM outperform the other prediction models by a considerable margin. The results also show that with binary data these deep learning methods are still the best; however, the difference becomes smaller because the other models' performance improves noticeably in the second approach.


I. INTRODUCTION
The task of stock prediction has always been a challenging problem for statistics and finance experts. The main motivation behind this prediction is to buy stocks that are likely to rise in price and to sell stocks that are likely to fall. Generally, there are two ways of predicting the stock market. The first is fundamental analysis, which relies on a company's fundamental information, such as market position, expenses and annual growth rates. The second is technical analysis, which concentrates on previous stock prices and values and uses historical charts and patterns to predict future prices [1], [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Zhe Xiao.
In the past, stock markets were normally predicted by financial experts. However, with the progress of learning techniques, data scientists have started solving prediction problems, and computer scientists have begun using machine learning methods to improve the performance of prediction models and the accuracy of their predictions. Employing deep learning was the next phase in building better-performing prediction models [3], [4]. Stock market prediction is full of challenges, and data scientists usually confront several problems when they try to develop a predictive model. Complexity and nonlinearity are two main challenges, caused by the instability of the stock market and the correlation between investment psychology and market behavior [5]. There are always unpredictable factors, such as the public image of companies or the political situation of countries, that affect stock market trends. Nevertheless, if the data obtained from stock values are efficiently preprocessed and suitable algorithms are employed, the trend of stock values and indices can be predicted. In stock market prediction systems, machine learning and deep learning approaches can help investors and traders with their decisions. These methods aim to automatically recognize and learn patterns in large amounts of data. The algorithms can learn effectively on their own and can tackle the task of predicting price fluctuations in order to improve trading strategies [6].
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In recent years, many methods have been developed to predict stock market trends. Hassan et al. [7] proposed a model combining Genetic Algorithms (GA), Artificial Neural Networks and a Hidden Markov Model (HMM), with the purpose of transforming daily stock values into independent groups of prices as inputs to the HMM. Huang et al. [8] investigated the predictability of financial trends with an SVM model by evaluating the weekly trend of the NIKKEI 225 index. Their goal was to compare SVM with the Linear Discriminant method, Elman Backpropagation Neural Networks and the Quadratic Discriminant method, and the results indicated that SVM was the best classifier. Sun et al. [9] proposed a new financial prediction algorithm based on an SVM ensemble, in which the base classifiers are chosen by considering both diversity analysis and individual prediction performance. Their final results showed that the SVM ensemble was significantly better than an individual SVM for classification. Ou et al. [10] employed ten data mining methods to predict price trends of the Hang Seng index from the Hong Kong market. The methods included tree-based classification, K-nearest neighbor, Bayesian classification, SVM and neural networks, and the results indicated that SVM outperformed the other predictive models. Liu et al. [11] forecasted value fluctuations with an improved Legendre neural network, modeling investors' positions and decisions by analyzing prior price data. They also incorporated a random function (time strength) into the prediction model. Araújo et al. [12] proposed the morphological rank linear forecasting approach and compared its results with the time-delay added evolutionary forecasting approach and multilayer perceptron networks.
From the research background above, it is clear that each of these algorithms can effectively address stock prediction problems. However, it is vital to note that each of them has specific limitations. The prediction results are affected not only by the representation of the input data but also by the prediction method. Moreover, identifying only the prominent features and using them as input data, instead of all features, can noticeably improve the accuracy of the prediction models.
Employing tree-based ensemble methods and deep learning algorithms to predict stock and stock market trends is a new area of research. Using bagging and majority voting, Tsai et al. [13] built two kinds of ensemble classifiers, heterogeneous and homogeneous. They also considered macroeconomic features and financial ratios from the Taiwan stock market to examine the performance of the models. The results demonstrated that, in terms of both investment returns and prediction accuracy, ensemble classifiers were superior to single classifiers. Ballings et al. [14] compared the performance of AdaBoost, Random Forest and kernel factory against single models, including SVM, KNN, Logistic Regression and ANN, in predicting European companies' prices one year ahead. The final results showed that Random Forest outperformed all other models. Basak et al. [15] employed XGBoost and Random Forest for the classification problem of forecasting whether a stock will increase or decrease based on previous values. Their results showed improved prediction performance for several companies compared with existing methods. To examine macroeconomic indicators for accurately predicting the stock market one month ahead, Weng et al. [16] developed four ensemble models: a boosting regressor, a bagging regressor, a neural network ensemble regressor and a random forest regressor. Another aim was to employ a hybrid LSTM approach to show that macroeconomic features are the most successful predictors for the stock market.
Turning to deep learning algorithms, Long et al. [17] investigated a deep neural network that uses transaction records and public market data to assess stock price trends. Their final results indicated that bidirectional LSTM could forecast the future of the market for investors, and this technique attained the best performance. Rekha et al. [18] examined RNN and CNN algorithms and compared their accuracy with real values from stock markets. Pang et al. [19] utilized LSTM with an autoencoder and LSTM with an embedded layer to obtain better stock market estimates. Their experiments indicated that LSTM with an embedded layer performed best on the Shanghai composite index, with 57.2% accuracy. Kelotra and Pandey [20] employed a deep convolutional LSTM algorithm to efficiently predict stock market movements. They used a model with the Rider-based monarch butterfly optimization method and obtained an RMSE of 2.6923 and an MSE of 7.2487. Baek and Kim [21] suggested an LSTM forecasting model and an LSTM module for preventing overfitting to predict the stock market, and showed that the overfitting prevention module makes the results more accurate. Chung and Shin [22] presented a hybrid of LSTM and GA to develop a new stock market prediction method, and their results indicated that the method outperformed the benchmark model.
Overall, the literature above shows that prior studies often concentrated on macroeconomic or technical features combined with recent machine learning methods to detect movements of stock indices or values, without considering appropriate preprocessing methods.
The Tehran stock market has become very popular recently owing to the remarkable growth of its main index in the last decade. An important reason for this is the privatization of most state-owned firms under the general policies of Article 44 of the Iranian constitution. The shares of recently privatized firms can be bought by ordinary people under particular conditions. The market has some special features compared with other countries' stock markets; for example, the trading price of every index is limited to ±5% of its opening price in each trading day. This limit spreads market shocks, irregular market fluctuations, political events, etc. over a longer period and can make the market smoother. However, the effect of fundamental parameters on the market is considerable, and predicting future movements is not easy [23].
This study employs stock market groups (which are important for traders) to investigate the task of predicting future trends. Despite the remarkable progress of the Tehran stock market in the recent decade, there have not been adequate papers on predicting its stock prices and trends with novel machine learning algorithms. One exception is a recent paper by Nabipour et al. [23], who employed tree-based models and deep learning algorithms to estimate future stock prices from 1 to 30 days ahead as a regression problem. Their experimental results indicated that LSTM (the superior model) could successfully predict values from the Tehran Stock Exchange with the lowest error.
In this research, we concentrate on comparing the prediction performance of nine machine learning models (Decision Tree, Random Forest, Adaboost, XGBoost, SVC, Naïve Bayes, KNN, Logistic Regression and ANN) and two deep learning methods (RNN and LSTM) for predicting stock market movement. Ten technical indicators are used as inputs to our models. To investigate the effect of preprocessing, our study includes two different approaches for the inputs, continuous data and binary data: the former computes the indicators from stock trading data (open, close, high and low values), while the latter adds a preprocessing step that converts the continuous data to binary data. Each technical indicator has its own rule for signaling an up or down movement, based on inherent market properties. The performance of the models is compared for both approaches with three classification metrics, and the best tuning parameters for each model (except Naïve Bayes and Logistic Regression) are reported. All experiments use ten years of historical data from four stock market groups of the Tehran stock exchange (petroleum, diversified financials, basic metals and non-metallic minerals), which are crucial for investors. We believe this is a novel study that incorporates multiple machine learning and deep learning methods to improve the prediction of stock groups' trends and movements.
The rest of this paper is organized as follows. Section 2 describes our research data, with some statistics, and the two approaches used for input values. Eleven prediction models, comprising nine machine learning and two deep learning algorithms, are introduced and discussed in Section 3. The prediction results are presented and analyzed in Section 4, and Section 5 concludes the paper.

II. RESEARCH DATA
In this study, ten years of historical data (November 2009 to November 2019) from four stock market groups (petroleum, diversified financials, basic metals and non-metallic minerals) are employed, and all data were obtained from the www.tsetmc.com website. Figures 1-4 show the number of increase and decrease cases for each group during the ten years.
For predicting stock market movement, there are many technical indicators, each with a specific ability to predict future market trends; in this paper, we choose ten technical indicators based on previous studies [24]-[26]. This paper considers two approaches for the input information: continuous data are based on the actual time series, and binary data are obtained with a preprocessing step that converts the continuous data to binary data according to the nature of each indicator.

A. CONTINUOUS DATA
In this method, the input values to the prediction models are computed from the formulas in Table 10 for each technical indicator. The indicators are normalized to the range (0, +1) before use, so that larger values do not overwhelm smaller ones. Figure 5 shows the process of stock trend prediction with continuous data.
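A minimal sketch of this normalization step with scikit-learn's `MinMaxScaler` is shown below. It assumes plain column-wise min-max scaling, x' = (x - min) / (max - min); the indicator values are made up for illustration and do not come from the Tehran stock exchange data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical indicator matrix: rows are trading days, columns are indicators
# on very different scales (e.g. a price-level SMA next to an RSI-like 0-100 value).
indicators = np.array([
    [1520.0, 45.0, -0.8],
    [1535.0, 52.0,  0.3],
    [1510.0, 38.0, -1.2],
    [1550.0, 61.0,  0.9],
])

# Each column is rescaled independently so its minimum maps to 0 and its maximum to 1.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(indicators)
print(scaled.round(3))
```

When the scaler is combined with a train/test split, fitting it on the training rows and reusing it on the test rows keeps the test-set ranges out of training; this is a general practice, not a step stated in the text.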

B. BINARY DATA
In this approach, a new step is added to convert the continuous values of the indicators to binary data based on each indicator's nature and properties. Figure 6 shows the process of stock trend prediction with binary data. Here, each indicator is represented by +1 (up trend) or −1 (down trend). Details of how the indicators are calculated are given in [26]-[28]:
The SMA indicator is the average of values over a particular range, and it helps investors decide whether a price will continue in the same trend. The WMA indicator provides a weighted average of the last n prices, where the weight decreases for each earlier price.
• SMA and WMA: if the current value is below the moving average, the trend is −1; otherwise it is +1.
The MOM indicator measures the speed of a rise or fall in values, and it is a handy indicator of strength (or weakness) in estimating prices.
• MOM: if the value of MOM is positive, the trend is +1; otherwise it is −1.
The STCK indicator is a momentum indicator that compares a specific closing price of a stock with its price range over a certain period. The oscillator's sensitivity to market movements can be reduced by adjusting that period or by taking a moving average of the values. STCD calculates the relative position of the closing values compared with the amplitude of price oscillations over a particular period. It is assumed that as prices rise, the closing price tends towards the upper part of the range of price movements in the previous period, and when prices fall, the reverse is true. The LWR indicator is a momentum indicator that estimates overbought and oversold levels, and investors sometimes use it to find entry and exit times. The MACD indicator, another kind of momentum indicator, shows the relationship between two moving averages of a stock's price. Investors regularly use it to buy the stock after the MACD line crosses above its signal line and to sell the shares in the opposite situation. The ADO indicator is typically used to observe the flow of money into or out of a stock, and investors ordinarily use the ADO line to find buying or selling times or to confirm the strength of a movement.
• STCK, STCD, LWR, MACD and ADO: if the current value (time t) is greater than the previous value (time t-1), the trend is +1; otherwise it is −1.
The RSI indicator is another momentum indicator that measures the magnitude of recent price changes to evaluate overbought or oversold conditions for stock values. RSI is displayed as an oscillator (a line graph that moves between two extremes) and ranges from 0 to 100.
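The conversion rules above can be sketched with pandas as follows. This is an illustrative sketch, not the authors' code: the column names (`close`, `SMA`, `WMA`, `MOM`, `STCK`, `STCD`, `LWR`, `MACD`, `ADO`) are hypothetical, the indicators are assumed to be precomputed, and RSI is left out because its rule is not stated in this section. The first day is dropped because it has no value at time t-1 to compare against.

```python
import numpy as np
import pandas as pd

def to_binary(df: pd.DataFrame) -> pd.DataFrame:
    """Map continuous indicator values to +1 (up) or -1 (down).

    Expects a 'close' column plus precomputed indicator columns
    (hypothetical names: SMA, WMA, MOM, STCK, STCD, LWR, MACD, ADO).
    """
    out = pd.DataFrame(index=df.index)
    # SMA and WMA: price below its moving average -> -1, otherwise +1
    for col in ("SMA", "WMA"):
        out[col] = np.where(df["close"] < df[col], -1, 1)
    # MOM: positive momentum -> +1, otherwise -1
    out["MOM"] = np.where(df["MOM"] > 0, 1, -1)
    # STCK, STCD, LWR, MACD, ADO: value at t greater than value at t-1 -> +1
    for col in ("STCK", "STCD", "LWR", "MACD", "ADO"):
        out[col] = np.where(df[col] > df[col].shift(1), 1, -1)
    # Day 0 has no t-1 value, so drop it instead of labelling it -1
    return out.iloc[1:]

# Toy input for three trading days (made-up numbers)
demo = pd.DataFrame({
    "close": [10.0, 11.0, 10.5],
    "SMA":   [10.5, 10.5, 10.8],
    "WMA":   [9.5, 10.8, 10.9],
    "MOM":   [0.5, 1.0, -0.5],
    "STCK":  [50.0, 60.0, 55.0],
    "STCD":  [40.0, 45.0, 50.0],
    "LWR":   [-50.0, -40.0, -45.0],
    "MACD":  [0.10, 0.20, 0.15],
    "ADO":   [1.0, 1.5, 1.2],
})
print(to_binary(demo))
```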

III. PREDICTION MODELS
In this study, we use nine machine learning methods (Decision Tree, Random Forest, Adaboost, XGBoost, SVC, Naïve Bayes, KNN, Logistic Regression and ANN) and two deep learning algorithms (RNN and LSTM).

A. DECISION TREE
Decision Tree is a common supervised learning approach used for both regression and classification problems. The goal of the technique is to predict a target using simple decision rules inferred from the dataset and its features. Two advantages of this model are that it is easy to interpret and that it can handle problems with multiple outputs; on the other hand, a typical disadvantage is that it can construct over-complex trees that cause overfitting. A schematic illustration of Decision Tree is shown in Figure 7.
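To illustrate both points, the sketch below fits an unrestricted tree and a depth-limited tree with scikit-learn on synthetic data (generated with `make_classification` as a hypothetical stand-in for ten indicators) and prints the learned rules of the small tree. Capping `max_depth` is only one of several ways to limit tree complexity.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for ten technical indicators (hypothetical data)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
names = [f"ind_{i}" for i in range(10)]

# An unrestricted tree keeps splitting until every leaf is pure,
# so it memorises the training set, including its label noise.
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Capping the depth keeps the rules short and readable and curbs overfitting.
small_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

print("full tree depth:", full_tree.get_depth(), "train acc:", full_tree.score(X, y))
print("small tree depth:", small_tree.get_depth(), "train acc:", small_tree.score(X, y))
print(export_text(small_tree, feature_names=names))
```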

B. RANDOM FOREST
A random forest model is made of a large number of decision trees, collectively called a forest, and it basically averages the predictions of these trees. The algorithm also relies on three random ideas: randomly selecting the training data when building the trees, randomly choosing subsets of variables when splitting nodes, and considering only a subset of all variables for splitting each node in each base decision tree. During training, every base tree learns from a random sample of the dataset. A schematic illustration of the model is shown in Figure 8.
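A small scikit-learn sketch of these ideas follows, on synthetic data used for illustration only. `bootstrap=True` gives each tree a random sample of the rows and `max_features="sqrt"` restricts each split to a random subset of the features; the last line checks that the forest's probability estimate equals the average of its trees' estimates, which is how scikit-learn's `RandomForestClassifier` combines trees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for ten indicators (hypothetical data)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    bootstrap=True,       # each tree is trained on a random bootstrap sample of rows
    max_features="sqrt",  # each split considers a random subset of the features
    random_state=0,
).fit(X, y)

# Average the class probabilities of the individual trees by hand
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
averaged = per_tree.mean(axis=0)
print(np.allclose(averaged, forest.predict_proba(X)))
```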

C. ADABOOST
Boosting is the process of converting several weak learners into a strong one. AdaBoost is a specific type of boosting: an ensemble model that can improve the predictions of any learning technique. The goal of boosting is to train weak learners sequentially, each one correcting the predictions of its predecessors. AdaBoost is a meta-estimator that starts by fitting a model on the original dataset and then fits additional copies of it on the same dataset. During training, the sample weights are adjusted according to the current prediction error, so that subsequent models focus on difficult cases.
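The sketch below uses scikit-learn's `AdaBoostClassifier`, whose default weak learner is a depth-1 decision tree (a "stump"), on synthetic data. It illustrates sequential boosting under these assumptions and is not the configuration used in this study; `staged_score` reports the training accuracy after each boosting round, so the first value corresponds to a single stump.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic stand-in for ten indicators (hypothetical data)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)

# Weak learners are added one by one; after each round the misclassified
# samples receive larger weights so the next learner concentrates on them.
booster = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# Training accuracy of the ensemble after 1, 2, ..., n rounds
scores = list(booster.staged_score(X, y))
print(f"1 stump: {scores[0]:.3f}  after {len(scores)} rounds: {scores[-1]:.3f}")
```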

D. XGBoost
XGBoost is a recent ensemble model based on decision trees. Like AdaBoost, it applies the principles of boosting to weak learners. XGBoost was presented to offer better performance and speed than other tree-based models. Its significant benefits include regularization for preventing overfitting, in-built cross-validation capability, efficient handling of missing data, cache awareness, parallelized tree building and tree pruning.

E. SVC
Support Vector Machines (SVMs) are a set of supervised learning approaches that can be employed for classification and regression problems; the classifier version is called SVC. The method's purpose is to find a decision boundary between two classes. The boundary should be as far as possible from the points in the dataset: the support vectors are the observations closest to it, and the gap between them and the boundary is called the margin. In other words, SVM finds the line or hyperplane that best separates the two classes. The decision boundary is defined in Equation 1, where SVMs map input vectors $x_i \in \mathbb{R}^d$ into a high-dimensional feature space $\Phi(x_i) \in H$, and the mapping is applied implicitly through a kernel function $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$. Figure 9 shows a schematic illustration of the SVM method.
SVMs can perform linear or non-linear classification efficiently; for non-linear classification, they use the kernel trick, which maps inputs to high-dimensional feature spaces. SVMs convert non-separable classes into separable ones using kernel functions such as the linear, sigmoid, radial basis function (RBF) and polynomial kernels. The kernel functions are given in Equations 2-4, where γ is the constant of the radial basis function and d is the degree of the polynomial function. In addition, the sigmoid function has two adjustable parameters, the slope α and the intercept constant c.
SVMs are often effective in high-dimensional spaces, including cases where the number of dimensions is greater than the number of samples; however, when the number of features is much greater than the number of samples, choosing the kernel function and regularization term carefully is crucial to avoid over-fitting.
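The sketch below writes common textbook forms of the kernels named above in NumPy and checks them against scikit-learn's pairwise helpers. The polynomial form (x_i · x_j + 1)^d and all parameter values are assumptions and may differ from Equations 2-4. It then rebuilds a fitted `SVC`'s decision function as a kernel-weighted sum over the support vectors plus an intercept, which shows how the kernel enters the decision boundary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel
from sklearn.svm import SVC

def linear(xi, xj):
    # plain dot product x_i . x_j for every pair of rows
    return xi @ xj.T

def polynomial(xi, xj, d=3):
    # assumed form: (x_i . x_j + 1)^d
    return (xi @ xj.T + 1.0) ** d

def rbf(xi, xj, gamma=0.5):
    # exp(-gamma * ||x_i - x_j||^2) via ||a||^2 + ||b||^2 - 2 a.b
    sq = (xi ** 2).sum(1)[:, None] + (xj ** 2).sum(1)[None, :] - 2 * xi @ xj.T
    return np.exp(-gamma * sq)

def sigmoid(xi, xj, alpha=0.1, c=-1.0):
    # tanh(alpha * x_i . x_j + c): slope alpha, intercept constant c
    return np.tanh(alpha * (xi @ xj.T) + c)

# Synthetic stand-in for ten indicators (hypothetical data)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
A, B = X[:5], X[5:9]
assert np.allclose(linear(A, B), linear_kernel(A, B))
assert np.allclose(polynomial(A, B), polynomial_kernel(A, B, degree=3, gamma=1.0, coef0=1.0))
assert np.allclose(rbf(A, B), rbf_kernel(A, B, gamma=0.5))
assert np.allclose(sigmoid(A, B), sigmoid_kernel(A, B, gamma=0.1, coef0=-1.0))

# Decision function of a fitted SVC: sum over support vectors of
# (alpha_i * y_i) * K(x_i, x), stored in dual_coef_, plus the intercept b.
svc = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)
manual = rbf(X, svc.support_vectors_) @ svc.dual_coef_.ravel() + svc.intercept_[0]
print(np.allclose(manual, svc.decision_function(X)))
```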

F. NAÏVE BAYES
The Naïve Bayes classifier belongs to the family of probabilistic classifiers based on Bayes' theorem, with strong independence assumptions between the features given the value of the class variable. Naïve Bayes methods form a set of supervised learning algorithms.
Bayes' theorem states the relationship in Equation 5, where $y$ is the class variable and $x_1$ through $x_n$ form the dependent feature vector.
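For reference, Bayes' theorem in this notation, and the decision rule that follows once the naive assumption $P(x_i \mid y, x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) = P(x_i \mid y)$ is applied, are usually written as below. This is the standard textbook form and may differ cosmetically from Equation 5; the denominator is constant for a given input, so it drops out of the arg max.

```latex
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}

P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
\quad\Longrightarrow\quad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```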
The Naive Bayes classifier can be extremely fast compared with more sophisticated algorithms. Decoupling the class-conditional feature distributions means that each distribution can be estimated independently as a one-dimensional distribution, which in turn helps alleviate problems arising from the curse of dimensionality.

G. KNN
KNN is usually described by two properties: it is a lazy learning algorithm, and it is non-parametric because it makes no assumption about the underlying data distribution. The method finds targets with the following steps: dividing the dataset into training and test data; selecting the value of K; determining which distance function to use; choosing a sample from the test data (as a new sample) and computing its distance to the n training samples; sorting the distances and taking the K nearest training samples; and finally, assigning the sample to the class given by the majority vote of its K neighbors. Figure 10 shows a schematic illustration of the KNN method.
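The steps above can be sketched from scratch in NumPy, assuming Euclidean distance and toy two-dimensional data with labels −1 (down) and +1 (up). The function name `knn_predict` is hypothetical; in practice a library implementation such as scikit-learn's `KNeighborsClassifier` would normally be used.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training samples."""
    # Euclidean distance from the new sample to every training sample
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Sort the distances and keep the indices of the k closest samples
    nearest = np.argsort(dists)[:k]
    # Majority vote over the labels of those neighbours
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

# Toy training data: two well-separated groups
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([-1, -1, -1, 1, 1, 1])   # -1 = down, +1 = up (toy labels)
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # → -1
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # → 1
```

An odd k avoids tied votes in a two-class problem like this one.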

H. LOGISTIC REGRESSION
Logistic regression is a classifier used to assign observations to a discrete set of classes. The algorithm transforms its output with the logistic sigmoid function to return a probability value, and predicts the target based on this probability. Logistic Regression is similar to the Linear Regression model, but it passes the linear combination of the inputs through the sigmoid (logistic) function, which adds complexity. The logistic regression hypothesis limits its output to values between 0 and 1.
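As a quick check of this description, the sketch below fits scikit-learn's `LogisticRegression` on synthetic data and recomputes its predicted probabilities by passing the linear score w · x + b through the sigmoid function. The data and settings are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # logistic function: squashes any real score into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in for ten indicators (hypothetical data)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Linear score w . x + b, then the sigmoid gives P(class 1 | x)
z = X @ clf.coef_.ravel() + clf.intercept_[0]
p_manual = sigmoid(z)
p_sklearn = clf.predict_proba(X)[:, 1]
print(np.allclose(p_manual, p_sklearn))
```

A probability above 0.5 corresponds to a positive score z, which is the threshold `predict` uses.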

I. ANN
ANNs are a prominent subset of machine learning algorithms, usually single- or multi-layer networks whose layers are fully connected. Figure 11 shows an example of an ANN with an input layer, two hidden layers and an output layer. Each node in a layer is connected to all nodes in the next layer, and increasing the number of hidden layers makes the network deeper.
FIGURE 12. An illustration of relationship between inputs and output for ANN [23].
Figure 12 shows the relationship between inputs and output for ANNs. A node computes the weighted sum of its input values and adds a bias to the result. A non-linear function is then commonly applied to calculate the node's output, which becomes an input for the next layer. The computation proceeds from the input layer to the output layer, and the final output is obtained by applying this procedure to all nodes in the network. Training the neural network means learning the weights and biases associated with all nodes. Equation 6 demonstrates the relationship between weights, biases and nodes: the weighted sum of the inputs is passed through a non-linear activation function from one layer to the next. It can be written in vector form, where $n$ is the number of inputs to the final node, $f$ is the activation function, $X_1, X_2, \dots, X_n$ are the inputs, $w_1, w_2, \dots, w_n$ are the weights and $z$ is the final output.
The training process follows these steps: randomly initializing the weights and biases of each node, performing a forward pass with the current weights and biases and computing each node's output, comparing the final output with the real target, and then adjusting the weights and biases accordingly by gradient descent with the backpropagation technique.
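The training steps listed above can be sketched in NumPy for a network with one hidden layer. This is a minimal, hypothetical example (tanh hidden units, a sigmoid output, binary cross-entropy loss and plain full-batch gradient descent on toy data), not the architecture used in this study; each node computes f(sum of w_i X_i + b) as described above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=8, lr=0.3, epochs=1000, seed=0):
    """One-hidden-layer network trained with backpropagation and gradient descent."""
    rng = np.random.default_rng(seed)
    n, n_features = X.shape
    y = y.reshape(-1, 1)
    # 1. Randomly initialize the weights; start the biases at zero
    W1 = rng.normal(0.0, 0.5, (n_features, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # 2. Forward pass: every node computes f(weighted sum of inputs + bias)
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # 3. Compare the output with the real target (binary cross-entropy)
        eps = 1e-12
        losses.append(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
        # 4. Backpropagation: push the error back through the layers
        d_out = (p - y) / n                         # dLoss/d(output pre-activation)
        dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
        d_hidden = (d_out @ W2.T) * (1 - h ** 2)    # chain rule through tanh
        dW1, db1 = X.T @ d_hidden, d_hidden.sum(axis=0)
        # 5. Gradient descent update of weights and biases
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).ravel() > 0.5).astype(float)

# Toy data: 10 random "indicators"; label 1 when the first two sum to a positive value
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
params, losses = train_mlp(X, y)
acc = (predict(params, X) == y).mean()
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, train accuracy {acc:.2f}")
```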

J. RNN
RNN is a very significant type of neural network that is widely employed for different problems. In a typical neural network, the input passes through some layers and an output is produced, under the assumption that two consecutive inputs are completely independent; however, this assumption does not hold in all processes. For instance, to forecast the stock market at a certain time, it is vital to observe the prior samples. RNN is called recurrent because it performs the same task for each element of a sequence, with the output depending on previously computed values. Another important point is that RNN has a memory that stores the information computed so far. In theory, RNN can use information from arbitrarily long sequences, but in practice it is limited to looking back only a few steps. Figure 13 shows the architecture of RNN.
FIGURE 13. An illustration of recurrent network [23].

K. LSTM
LSTM is a particular type of RNN with a wide range of uses, such as document classification, time series analysis, and voice and speech recognition. In contrast to feedforward networks, the predictions created by RNNs depend on prior estimations. In practice, however, plain RNNs are not applied broadly because of a few shortcomings that result in impractical estimations.
Without going into too much detail, LSTM solves these problems by employing dedicated gates for forgetting old information and learning new information. An LSTM layer is made of four neural network layers that interact in a specific way. A common LSTM unit consists of four parts: a cell, an input gate, an output gate and a forget gate. The main task of the cell is to remember values over arbitrary time intervals, and the gates control the flow of information into and out of the cell.
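To make the recurrence and the gates concrete, the NumPy sketch below implements one step of a plain RNN and one step of an LSTM cell in the common textbook formulation, with randomly initialized, untrained weights and hypothetical sizes (5 days of 10 indicators, 8 hidden units). It is meant to show the data flow only; libraries such as Keras implement these layers with trainable weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # Plain RNN: the same weights are reused at every time step, and the
    # new state mixes the current input with the previous state.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W: (4H, F), U: (4H, H), b: (4H,). The four stacked blocks are the four
    # interacting layers: input gate, forget gate, candidate values, output gate.
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])       # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])   # candidate values for the cell
    o = sigmoid(z[3 * H:])        # output gate: how much of the cell to expose
    c = f * c_prev + i * g        # cell state carries information across steps
    h = o * np.tanh(c)
    return h, c

def run_rnn(x_seq, W_x, W_h, b):
    h = np.zeros(W_h.shape[0])
    for x_t in x_seq:
        h = rnn_step(x_t, h, W_x, W_h, b)
    return h

def run_lstm(x_seq, W, U, b):
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in x_seq:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h, c

rng = np.random.default_rng(0)
T, F, H = 5, 10, 8
x_seq = rng.normal(size=(T, F))
W_x, W_h, b_rnn = rng.normal(0, 0.5, (H, F)), rng.normal(0, 0.5, (H, H)), np.zeros(H)
W, U, b_lstm = rng.normal(0, 0.5, (4 * H, F)), rng.normal(0, 0.5, (4 * H, H)), np.zeros(4 * H)

h_rnn = run_rnn(x_seq, W_x, W_h, b_rnn)
h_lstm, c_lstm = run_lstm(x_seq, W, U, b_lstm)
print(h_rnn.shape, h_lstm.shape)   # → (8,) (8,)
```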

L. MODELS' PARAMETERS
Since stock market data are time series, two approaches are used to build the training dataset for the prediction models. Because of the recurrent nature of RNN and LSTM, the technical indicators of one or more days (up to 30 days) are rearranged as input data to be fed into these models. For all other models, the ten technical indicators are fed to the model directly. The output of every model is the stock trend value corresponding to the input data; for the recurrent models, the output is the stock trend value of the last day of each training sample.
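A minimal sketch of this rearrangement follows, assuming a sliding window of `time_steps` consecutive days and a label taken from the last day of each window. The function name `make_windows` and the toy data are hypothetical.

```python
import numpy as np

def make_windows(features, labels, time_steps):
    """Rearrange a (days, n_indicators) matrix into (samples, time_steps, n_indicators).

    Sample i covers days i .. i + time_steps - 1, and its label is the
    trend value of the last day in that window.
    """
    n_samples = len(features) - time_steps + 1
    X = np.stack([features[i:i + time_steps] for i in range(n_samples)])
    y = labels[time_steps - 1:]
    return X, y

# Toy data: 100 days of 10 indicators with random up/down labels
days, n_indicators = 100, 10
features = np.random.default_rng(0).normal(size=(days, n_indicators))
labels = np.random.default_rng(1).choice([-1, 1], size=days)
X3d, y = make_windows(features, labels, time_steps=30)
print(X3d.shape, y.shape)   # → (71, 30, 10) (71,)
```

The resulting three-dimensional array has the (samples, time_steps, features) layout that recurrent layers expect.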
All models except Naïve Bayes have one or more parameters, known as hyper-parameters, that must be adjusted to obtain optimal results. In this paper, one or two parameters of every model (except Decision Tree and Logistic Regression, for which fixed parameters are used) are selected for tuning, based on numerous experiments. Tables 1-3 present all fixed and variable parameters of the tree-based models, traditional supervised models and neural-network-based models, respectively.
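The text does not state how the candidate values were searched. One common option, shown here purely as an assumption, is an exhaustive grid search with cross-validation through scikit-learn's `GridSearchCV`. The grid for KNN's `n_neighbors` and the synthetic data are hypothetical; the parameter values actually used are those listed in Tables 1-3.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the indicator data (hypothetical)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical candidate values for K; each is scored by 5-fold cross-validated F1
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, scoring="f1", cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

For strictly time-ordered evaluation, passing `cv=TimeSeriesSplit(n_splits=5)` (from `sklearn.model_selection`) instead of plain k-fold is a possible alternative.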

IV. EXPERIMENTAL RESULTS
Among classification metrics, Accuracy is a good metric, but it is not sufficient for all classification problems, and it is often necessary to look at other metrics to make sure that a model is reliable. F1-Score, the harmonic mean of Precision and Recall, may be a better metric when the results need to balance Recall and Precision, especially when the class distribution is uneven. ROC-AUC is another powerful metric for classification problems; it is calculated as the area under the ROC curve obtained from the prediction scores.
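The three metrics can be computed with scikit-learn as in the sketch below; the labels and scores are made-up numbers chosen so the results are easy to verify by hand.

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true  = [1, 1, 0, 0, 1, 0]               # actual up (1) / down (0) moves, made up
y_pred  = [1, 0, 0, 0, 1, 1]               # hard predictions from some classifier
y_score = [0.9, 0.4, 0.2, 0.3, 0.8, 0.6]   # predicted probability of an "up" move

acc = accuracy_score(y_true, y_pred)   # 4 of 6 predictions are correct
f1 = f1_score(y_true, y_pred)          # precision 2/3 and recall 2/3
auc = roc_auc_score(y_true, y_score)   # 8 of 9 (up, down) pairs ranked correctly
print(f"accuracy={acc:.3f}  F1={f1:.3f}  ROC-AUC={auc:.3f}")
# → accuracy=0.667  F1=0.667  ROC-AUC=0.889
```

Unlike Accuracy and F1-Score, ROC-AUC uses the continuous scores rather than the thresholded predictions, so it measures how well a model ranks up moves above down moves.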

B. RESULTS
To train the machine learning models, we implement the following steps: normalizing the features (for continuous data only), randomly splitting the main dataset into training and test data (30% of the dataset is assigned to the test set), fitting the models and evaluating them on validation data (with early stopping) to prevent overfitting, and finally using the metrics to evaluate the models on the test data. Building the deep models differs from the machine learning models in that the input values must be three-dimensional (samples, time_steps, features), so we use a function to reshape the input values. Weight regularization and dropout layers are also employed here to prevent overfitting. All code in this study is implemented in Python 3 with the Scikit-Learn and Keras libraries. Based on extensive experiments with both approaches, the following outcomes are obtained. In the first approach, continuous data are used for the features, and Tables 4-6 show the results. For each model, the prediction performance is evaluated by the three metrics, and the best tuning parameters for all models (except Naïve Bayes and Logistic Regression) are reported. For a clearer picture of the experiments, Figure 14 shows the average F1-score against the average running time across the stock market groups. It can be seen that Naive Bayes and Decision Tree are the least accurate (approximately 68%), while RNN and LSTM are the top predictors (roughly 86%), with a considerable margin over the other models. However, the running time of these superior models is longer than that of the other algorithms. In the second approach, binary data are used for the features, and Tables 7-9 show the results. The structure and experiments are similar to the first approach except for the inputs, where an extra layer converts the continuous data to binary data based on the nature and properties of the features.
Similarly, for better understanding, Figure 15 shows the average F1-score against the average running time across the stock market groups. It is clear that the prediction performance of all models improves significantly compared with the first approach, and this improvement is clearly shown in Figure 16. The weakest methods (Naive Bayes and Decision Tree, with roughly 85% F1-score) and the strongest predictors (RNN and LSTM, with approximately 90% F1-score) stay the same, but the difference between them becomes smaller with binary data. In addition, the prediction process is faster for all models in the second approach.
TABLE 7. Tree-based models with best parameters for binary data.
A prominent result is that the deep learning methods (RNN and LSTM) show a strong ability to forecast stock movement in both approaches, especially with continuous data, where the machine learning models perform much worse than with binary data. However, their running time is always longer than that of the other models because they use a large number of epochs and values from several previous days. Overall, all algorithms predict well when they are trained with continuous values (up to 67%), but the models' performance is remarkably improved when they are trained with binary data (up to 83%). This improvement can be interpreted as follows: an extra layer is employed in the second approach, whose duty is to compare each current continuous value (at time t) with the previous value (at time t-1). The up or down trend is thus identified, so when binary data are given as inputs to the predictors, the data already carry a recognized trend based on each feature's property. This critical layer converts the non-stationary values of the first approach into trend-deterministic values in the second, and the algorithms then only need to find the correlation between the input trends and the output movement, which is an easier prediction task.
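The training procedure described at the start of this subsection can be sketched with scikit-learn as follows. This is a hypothetical sketch rather than the study's code: the data are synthetic, scikit-learn's `MLPClassifier` (whose `early_stopping=True` option holds out `validation_fraction` of the training rows and stops when the validation score stops improving) stands in for the Keras networks, and the scaler is fitted on the training split only, whereas the text normalizes before splitting.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the ten-indicator dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Random split with 30% of the rows held out as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Min-max scaling (continuous-data approach) followed by a small network that
# holds out 10% of the training rows for validation and stops early once the
# validation score has not improved for 10 consecutive epochs.
model = make_pipeline(
    MinMaxScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), early_stopping=True,
                  validation_fraction=0.1, n_iter_no_change=10,
                  max_iter=500, random_state=0),
)
model.fit(X_train, y_train)

# Final evaluation on the untouched test set
test_f1 = f1_score(y_test, model.predict(X_test))
print(f"test F1: {test_f1:.3f}")
```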
Despite a careful search for valuable research on the same stock market, no significant paper could be found to report and compare against; this gap therefore represents a novelty of this study, which can serve as a baseline for future articles.

V. CONCLUSIONS
The purpose of this study was to predict stock market movement with machine learning and deep learning algorithms. Four stock market groups from the Tehran stock exchange, namely diversified financials, petroleum, non-metallic minerals and basic metals, were chosen, and the dataset was based on ten years of historical records with ten technical features. Nine machine learning models (Decision Tree, Random Forest, Adaboost, XGBoost, SVC, Naïve Bayes, KNN, Logistic Regression and ANN) and two deep learning methods (RNN and LSTM) were employed as predictors. We considered two approaches for the input values, continuous data and binary data, and employed three classification metrics for evaluation. Our experiments showed a significant improvement in the performance of the models when they used binary data instead of continuous data. Moreover, the deep learning algorithms (RNN and LSTM) were the best models in both approaches.