Stochastic Neural Networks for Cryptocurrency Price Prediction

Over the past few years, with the advent of blockchain technology, there has been a massive increase in the usage of Cryptocurrencies. However, Cryptocurrencies are not seen as an investment opportunity due to the market’s erratic behavior and high price volatility. Most of the solutions reported in the literature for price forecasting of Cryptocurrencies may not be applicable for real-time price prediction due to their deterministic nature. Motivated by the aforementioned issues, we propose a stochastic neural network model for Cryptocurrency price prediction. The proposed approach is based on the random walk theory, which is widely used in financial markets for modeling stock prices. The proposed model induces layer-wise randomness into the observed feature activations of neural networks to simulate market volatility. Moreover, a technique to learn the pattern of the reaction of the market is also included in the prediction model. We trained the Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM) models for Bitcoin, Ethereum, and Litecoin. The results show that the proposed model is superior in comparison to the deterministic models.


I. INTRODUCTION
In a continuously evolving technological landscape, there has been a paradigm shift in the mode of transactions from physical payments like cash and cheques to digital transactions. One important aspect of using currency, either as a medium of transaction or as an asset, is to predict its expected value. To a great extent, the value and stability of any currency depend on the controlling authority, which in the case of fiat currencies is the Government of the country. Detrimental Governmental interference in the financial system can lead to unforeseen consequences of devastating scale, as seen in Venezuela [1]. In the case of digital currencies, however, the value is determined by the consistency and security of the platform that the currency is deployed on. Conventional digital cash is prone to the flaw of double-spending. Digital currencies in cyberspace are exposed to security attacks, which may lead to transaction data manipulation. With an increasing number of such flaws, traditional currencies fall prey to instability and devaluation [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Hong-Ning Dai.
A plausible solution to the aforementioned issues is the usage of blockchain-based Cryptocurrencies. Blockchain is an emerging technology that stores information in an immutable way across a network to provide security, decentralization, and transparency, which is precisely what is needed for an effective currency [3]-[5]. Cryptocurrencies, unlike conventional money, use cryptographic ciphers to conduct financial transactions. Over the past decade, digital finance has grown exponentially, with Cryptocurrencies at the helm of this innovative stride forward [6]. The market capitalization of Cryptocurrencies is estimated at $266 billion and is projected to grow by 11.9% by 2024 according to CAGR reports [7]. The essential feature of a Cryptocurrency is that it cannot be controlled by a central authority, owing to the decentralized nature it inherits from blockchain, and thus it restricts corruption. Due to this, Cryptocurrencies are naturally robust to the corruption-induced devaluations that may occur in fiat currencies. Cryptocurrencies avert the problem of double-spending through multiple verifications from the neighboring nodes in the blockchain network. As the number of confirmations increases, the transaction becomes more and more reliable and irreversible. Transactional records in the blockchain ledger are immutable, as a record is virtually impossible to alter in all network nodes. Thus, after a successful transaction, the record cannot be tampered with.
As a consequence of the aforementioned advantageous characteristics and global access to Cryptocurrencies, they can be used as a medium of transaction as well as a store of wealth [8], [9]. However, the value of Cryptocurrencies still relies heavily on erratic market trends and social sentiment. Also, Cryptocurrencies have a low correlation with major financial assets [10], [11]; thus, traditional methods that have been used in finance are rendered ineffective. Table 1 shows the Pearson correlation of Bitcoin with other major financial assets. It is evident from Fig. 1 that the price of Bitcoin has no correlation with the price of Gold.
Given the vast amounts of openly available data on the Cryptocurrency market and on social trends, machine learning algorithms can be used to forecast the prices of Cryptocurrencies [12]. These algorithms are a set of methods for learning mathematical models from data without explicitly programming the computer to do a specific task. However, with the increasing complexity of Cryptocurrency market data, there is a need for different models that can capture more complex representations of the data. Deep learning models [13], specifically recurrent neural networks, can be used to solve the time-series problem of predicting the prices of Cryptocurrencies. Numerous studies have been carried out by various authors in the past to predict the value of equity and securities using machine learning and deep learning algorithms [14], [15]. However, comparatively little research has been carried out on forecasting the price of Cryptocurrencies.

A. MOTIVATION
Cryptocurrencies are primarily used as a means of money exchange. In the past few years, however, trading in Cryptocurrencies has become an attractive investment opportunity [16]. It is the prevalent opinion of stock market professionals and other investors that the Cryptocurrency market is the most uncertain place for investment due to its volatility and heavy reliance on social sentiment. In addition, effectively predicting the price of various Cryptocurrencies will allow us to predict the total compute power of the blockchain [17]. The value of Cryptocurrencies is affected by a large number of factors such as social sentiment, legislation, past price trends, and trade volumes. A significant amount of research on anticipating Cryptocurrency prices using machine learning techniques has been carried out in the last few years [18], [19]. Motivated by the aforementioned discussion, in this paper we provide a prediction model that predicts the price of different Cryptocurrencies using deep learning models. We address the problem of erratic fluctuations in the prices of Cryptocurrencies by inducing stochastic behaviour in deep neural networks to simulate market volatility.

B. CONTRIBUTION
Following are the research contributions of this paper.
• A technique to predict the prices of prevalent Cryptocurrencies, i.e., Bitcoin, Ethereum, and Litecoin is designed using a stochastic neural network process.
• A mathematical formulation of stochastic layers in deep neural networks is done that characterizes the erratic behavior of a financial system.
• We determine the pattern of the market's reaction to any updated information regarding the Cryptocurrency and exhibit an improvement over existing models in predicting the prices of Cryptocurrencies.

C. ORGANIZATION
The rest of the paper is organized as follows. Section II discusses the previous work that has been done to forecast Cryptocurrency prices. Section III explains the concept of stochasticity and its application in the financial market. In Section IV, we present a mathematical formulation of stochasticity in the context of neural networks. Section V discusses the experimental details of the models for predicting Cryptocurrency prices, and finally, Section VI concludes the paper.

II. RELATED WORK
A substantial amount of research has been carried out on predicting the stability and prices of equity and other market assets over the past decades. However, because Cryptocurrencies are comparatively new, little research has been done on predicting their value. Nonetheless, the research effort devoted to anticipating the prices of Cryptocurrencies is increasing. In this section, we present pivotal milestones in the field and acquaint the reader with a diverse set of machine learning approaches [20] that have been used to predict price trends of various currencies.
VOLUME 8, 2020

A. REGRESSION
The fundamental task in modeling a Cryptocurrency is to predict the price given the priors. A simple approach to forecasting the price over a continuous space is regression.
Regression is a statistical method for determining the relationship between a dependent variable and one or more independent variables. This relationship is represented as a sum of products of the independent variables with constant relational weights. In the context of price prediction, market indicators and social sentiment can be used as the independent variables and the price as the dependent variable.
Saad et al. [21], [22] used a multivariate regression model trained using gradient descent over mean square error. Features such as price, mining difficulty, hash rate, user count, etc. were used to regress and obtain the predicted price. They achieved a mean absolute error of 0.0162 and 0.0563 on Bitcoin and Ethereum, respectively, when testing over half of the dataset. Mittal et al. [23] extended the usage of regression to social sentiment. They exhibited a positive correlation between the price fluctuations of Bitcoin and social sentiment. They showed that there is a significant correlation between Google Trends, tweet sentiment, and tweet volume. Linear regression and polynomial regression were used to predict the prices. They evaluated the models by calculating the frequency of correct predictions within the bounds of margin accuracy. An accuracy of 77.01% and 66.66% was observed in correctly predicting the trend of the price using tweet volume and Google Trends, respectively. An obvious issue with regression models is that they are unable to learn non-linear and multi-leveled dependencies among the features.
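As an illustration of regression trained by gradient descent over mean squared error, the following is a minimal sketch on synthetic data. The function name, learning rate, epoch count, and the two synthetic features are assumptions made for illustration; they are not taken from the cited works.

```python
import random


def fit_linear_regression(X, y, lr=0.1, epochs=500):
    """Fit y ~ w.x + b by batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample.
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += 2.0 * err * xi[j] / n
            grad_b += 2.0 * err / n
        # Step against the gradient of the MSE.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Synthetic, noise-free data standing in for two normalized market indicators.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [3.0 * x1 - 2.0 * x2 + 0.5 for x1, x2 in X]
w, b = fit_linear_regression(X, y)
print(w, b)
```

Because the synthetic target is exactly linear, the recovered weights approach the generating coefficients; real market data would not be fit this cleanly, which is the limitation noted above.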

B. MULTILAYER PERCEPTRON
In line with Moore's Law, faster and more efficient computing power has been harnessed to train Artificial Neural Networks (ANNs). Neural nets can effectively learn and represent linear and non-linear dependencies between key variables and the output variable. They consist of an input layer connected to a hierarchy of hidden layers, which in turn pass the learned information to the output layer. Each edge connecting the neurons in different layers comprises weights and a bias representing the relation between the connected neurons. Activation functions are applied after the linear matrix computation is carried out. These functions are responsible for introducing non-linearities into the network, which can help in price forecasting [24] as required in our case. Commonly used non-linear activation functions are the sigmoid and tanh functions. The architecture of the network determines the kind of dependencies the neural network learns. Neural nets are trained using a learning algorithm known as backpropagation [25]. The training process begins by initializing the network weights and biases randomly, and then iteratively calculating the error and propagating the error signal as gradients with respect to the weights. Weights are updated iteratively until a feasible set of weight values is obtained. Fig. 2 illustrates an MLP in the case of Cryptocurrency price prediction.
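The forward computation described above (a linear matrix computation followed by a non-linear activation at each hidden layer) can be sketched as follows. This is only an illustration: the layer sizes, the initialization range, and the feature values are hypothetical, and training by backpropagation is omitted.

```python
import math
import random


def dense(x, W, b):
    """Affine map: one output per row of the weight matrix W."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]


def mlp_forward(x, layers):
    """Apply (W, b) layers in order: tanh on hidden layers, linear output."""
    h = x
    for k, (W, b) in enumerate(layers):
        h = dense(h, W, b)
        if k < len(layers) - 1:
            h = [math.tanh(v) for v in h]
    return h


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (illustrative initialization)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


rng = random.Random(0)
layers = [init_layer(4, 8, rng), init_layer(8, 1, rng)]
features = [0.2, -0.1, 0.4, 0.0]  # hypothetical normalized indicators
prediction = mlp_forward(features, layers)
print(prediction)
```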
An ensemble of neural networks was used in the model proposed by Sin et al. [26] to predict the upward or downward trend using Bitcoin market data of 50 consecutive days. Each network module in the ensemble was a multi-layer perceptron three layers deep, taking a total of 190 features. They used the Levenberg-Marquardt algorithm for training the MLPs. A Genetic Algorithm based Selective Ensemble (GASEN) was employed to select the five best-performing perceptrons. They achieved an accuracy of 64% in classifying whether an upward or downward trend is to be expected. Their model did not predict the price, but only a green or red signal. Their work was limited to a single Cryptocurrency, Bitcoin.
Using Bayesian theory, Jang et al. [27] explained Bitcoin's high price volatility. They proposed a multi-layer perceptron that maximizes the value of the posterior, instead of maximizing the likelihood as traditional neural architectures do. Their model was trained using the rollover framework, wherein an old price time-step value is discarded for every new price. In this way, they dispose of long-term dependencies that may be irrelevant for anticipating newer prices. Using the rollover strategy means a lower computational cost during training compared with sequential recurrent architectures. They achieved a test Mean Absolute Percentage Error of 1% in predicting the log price of Bitcoin in the surge.

C. RECURRENT NEURAL NETWORKS
Forecasting prices of currencies is an inherently sequential task. To deal with time-dependent data, a class of neural networks termed Recurrent Neural Networks (RNNs) was introduced [28]. In this architecture, a directed graph along the sequence is formed to manage sequential data. Here, the data at the previous time step is fed as input to predict the values at the next time step. Typical RNNs use previous values as well as the inputs at the current time step to predict the values at each time step.
Relating this to price prediction in Fig. 3, new market information, x, is fed into the model at each time step. The model picks up the necessary temporal dependencies, h, and utilizes them to predict the price, $y_t$, of the Cryptocurrency.
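A single recurrent step of the kind described above can be sketched as follows. The recurrence equations are not reproduced in this text, so the sketch assumes the standard Elman-style update with a tanh hidden state and a linear output; the weight names are hypothetical.

```python
import math


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h, W_hy, b_y):
    """One step of a simple RNN: returns (h_t, y_t).

    Assumed form: h_t = tanh(W_xh x_t + W_hh h_prev + b_h),
                  y_t = W_hy h_t + b_y.
    """
    h_t = []
    for i in range(len(b_h)):
        pre = (sum(w * x for w, x in zip(W_xh[i], x_t))
               + sum(w * h for w, h in zip(W_hh[i], h_prev))
               + b_h[i])
        h_t.append(math.tanh(pre))
    y_t = [sum(w * h for w, h in zip(row, h_t)) + b
           for row, b in zip(W_hy, b_y)]
    return h_t, y_t


# Feed a short sequence of (hypothetical) one-dimensional market inputs.
h = [0.0]
for x in ([0.1], [0.3], [-0.2]):
    h, y = rnn_step(x, h, [[1.0]], [[0.5]], [0.0], [[2.0]], [1.0])
print(h, y)
```

The hidden state `h` is carried from one step to the next, which is how temporal dependencies enter the prediction.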

D. LONG SHORT-TERM MEMORY
Recurrent Neural Networks are unable to capture long-term dependencies beyond a certain sequence length, and thus a more powerful architecture known as LSTM was proposed by Hochreiter et al. [35]. In this network, a memory cell state is added along with gating functionality that controls what information is to be discarded and what new information is to be added, so that long-term dependencies can be retained. The cell state is passed across the network and is updated according to how important the information carried from the previous cell state is for the future. Gates are used to modify the cell state as well as to process inputs and produce the output.
In Eq. (4) and (5), the f gate, by observing the previous activations, h, and the current market and social indicators, x, outputs a value between 0 and 1 for each cell state, where 1 means that the information is kept for future use and 0 means that it is discarded [36]. The i gate conveys the amount of current information that will be relevant in the future.
The candidate for the cell update is $C^*$. Using f, i, and $C^*$, a suitable update is performed on the cell state, which is passed throughout the network to capture long-term dependencies.
Based on the cell state, the o gate determines what part of the cell is going to be the output. This output is combined with the current cell state, activated by the tanh layer, to give the current activation h.
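The gate interactions described above can be sketched as a single LSTM step. Since the gate equations are not reproduced in this text, the sketch assumes the standard LSTM formulation; the parameter layout (a dictionary of weight and bias pairs keyed by gate name) is hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, v):
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step under the standard formulation (assumed here).

    params maps "f", "i", "c", "o" to (W, b); each W acts on [h_prev, x_t].
    """
    z = h_prev + x_t  # list concatenation of previous activation and input
    f = [sigmoid(v) for v in affine(*params["f"], z)]         # forget gate
    i = [sigmoid(v) for v in affine(*params["i"], z)]         # input gate
    c_star = [math.tanh(v) for v in affine(*params["c"], z)]  # candidate C*
    o = [sigmoid(v) for v in affine(*params["o"], z)]         # output gate
    # Keep part of the old cell state and add part of the candidate.
    c_t = [fk * ck + ik * sk for fk, ck, ik, sk in zip(f, c_prev, i, c_star)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


# Hidden size 1, input size 2, all-zero parameters for a transparent example.
zero = ([[0.0, 0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "c": zero, "o": zero}
h_t, c_t = lstm_step([0.3, -0.1], [0.0], [2.0], params)
print(h_t, c_t)
```

With all-zero parameters every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved; this makes the roles of f, i, and o easy to trace by hand.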
Mittal et al. [23] proposed an RNN and an LSTM model that utilize Google Trends and tweet volume along with market factors to anticipate the price of Bitcoin. They showed that sequential models outperformed ARIMA (Autoregressive Integrated Moving Average) [37], the standard model for analyzing time-series data in traditional statistics and econometrics. The RNN model achieves an accuracy of 62.45% and 53.46% on trends and volume, respectively. On the other hand, their LSTM model achieves an accuracy of 50% and 49.89% on trends and volume, respectively. Smuts [29] used VADER [38], a sentiment analysis Python library for social media text, to correlate Telegram sentiment with the price of Bitcoin and Ethereum and to predict the price of the currencies using an LSTM. They achieved an accuracy of 63% on Bitcoin data and 56% on Ethereum data.

E. OTHER METHODS
Apart from the previously introduced methods, regression trees, decision trees, support vector regression [39], and other algorithms have been used. For example, Laura et al. [30] designed two models, an ensemble of regression trees and a recurrent neural net. Two versions of the ensemble of regression trees were considered: the first was a single model describing the price change for all currencies combined, and the second constructed an individual model for each currency. The regression tree models were built using the XGBoost algorithm. To exploit temporal dependencies, an LSTM was selected as their second model. They evaluated the models by calculating the return on investment (ROI) [40] and comparing the performance against a simple moving average.

III. PROBLEM FORMULATION AND STOCHASTIC PROCESSES
In this section, we describe the key obstacle restricting effective price prediction, which is the erratic fluctuations in Cryptocurrency prices. As randomness is at the core of the problem, we introduce stochastic processes and show how they are related to the Cryptocurrency market. At the end of the section, we show how the market can be modeled as a random walk.

A. THE PROBLEM OF ERRATIC FLUCTUATIONS
The value of market assets is determined by various factors that include supply and demand, the performance of the economy, growth rate, inflation, political factors, and human psychology. With all of these factors continuously changing, their aggregation generates erratic and irregular fluctuations in the prices of market assets. Cryptocurrencies, like other market assets, are prone to the problem of random-like fluctuations in their prices. The financial market is not inherently stochastic; however, it is sufficiently complex to be incomprehensible to us and our systems. Thus, designing a deterministic model that takes all the socio-economic factors into consideration is not an option. In this section, we describe how to model market assets non-deterministically.

B. STOCHASTIC PROCESSES
All processes in nature can be classified as deterministic given all the information pertaining to them. However, most natural systems are too complicated to be modeled given limited information about them. Thus, stochastic processes come into the picture, wherein partial information of the system can be used to determine a possible outcome over the set of all possibilities in the probability space. Stochastic processes are sets of random variables that evolve over time in an arbitrary manner. Most natural systems can be considered as stochastic processes from our perspective. Examples of stochastic processes include the weather system, audio-video signals, and the financial market.
The value of a market asset, like a Cryptocurrency, is determined by intricate factors that are continuously evolving in a near-random direction at a seemingly random pace [41]. At the core, collective human behavior determines the supply and demand of an asset, which in turn determines the current value of the said asset. Modeling human behavior is an impossible task, and thus we consider the value of an asset to be stochastically determined. Defining the behavior of the market as an erratic process gives us the advantage of using the well-defined mathematical field of stochasticity.
Before we introduce stochasticity into the market scene, we present the types of stochastic processes and how we can relate them to the Cryptocurrency market. The most rudimentary stochastic process is the Bernoulli process, where the random variables hold binary values and the sequence produced by it is Independent and Identically Distributed (IID). The IID property, which states that the random variables are drawn from the same probability distribution and are mutually independent, is at the essence of the market's stochasticity. Building on that, a random walk is a stochastic process that sums up IID random variables and thus has the property of evolution in time. Burton Malkiel [42] proposed that market assets follow a random walk.
A random walk is defined as the path formed by a sequence of random IID steps, given the starting point.
where $\xi_i$ is an IID random variable at time step i and $y_0$ is the starting point of the process. Alternatively, we can obtain the path traced by a random walk as a bootstrapping process that takes into account the most recent state of the system. Consider the previous state of the system, $y_{t-1}$. Moving one step forward as defined by the process, we obtain the next state of the system as shown below.
Eq. (13) is more convenient to use, as it is computationally more efficient when t is extremely large. Hence, this form is used in the implementation of the proposed approach.
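To make the two views of a random walk concrete, the sketch below computes the same path in both ways: by summing all IID steps from the starting point, and by bootstrapping from the most recent state. The ±1 step distribution, the starting value, and the function names are assumptions chosen for illustration.

```python
import random


def random_walk_cumulative(y0, steps):
    """y_t = y_0 + (xi_1 + ... + xi_t), recomputing the full sum for each t."""
    return [y0 + sum(steps[:t]) for t in range(len(steps) + 1)]


def random_walk_recursive(y0, steps):
    """y_t = y_{t-1} + xi_t, reusing only the most recent state."""
    path = [y0]
    for xi in steps:
        path.append(path[-1] + xi)
    return path


rng = random.Random(42)
steps = [rng.choice([-1, 1]) for _ in range(1000)]  # IID binary-valued steps
path_a = random_walk_cumulative(100, steps)
path_b = random_walk_recursive(100, steps)
print(path_a == path_b, path_b[-1])
```

The cumulative form redoes the whole sum at every step, so its cost grows quadratically with the path length, whereas the recursive form needs one addition per step. This is the efficiency argument made above.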
Bringing this into context with the value of a market asset, like a Cryptocurrency, the value can be thought of as being produced by a random walk, where ξ t can be considered as an aggregation of all market factors that may have possibly affected the value of the asset [32]. However, this approach does not take into consideration information regarding essential market factors. We address this issue by introducing neural networks to take into account important market statistics and social sentiment [43], [44].

IV. PROPOSED APPROACH: STOCHASTIC NEURAL NETWORKS
In this section, we incorporate stochasticity into neural networks and formulate the mathematics for a layer-wise stochastic walk. Moreover, we introduce the algorithm for stochastic forward propagation in neural networks. We propose stochastic MLP and LSTM models to predict the prices of Cryptocurrencies.

A. STOCHASTICITY IN NEURAL NETWORKS
According to the efficient market hypothesis by Malkiel [45], all the past information regarding the market asset is reflected in the current value of the asset, and the market will instantly acknowledge new information and react to it accordingly. Under this hypothesis, any effort to predict prices by analyzing information is futile. However, we can observe how the market reacts to information and develop a pattern that exhibits the behavior of the market when new information is widely available. This pattern has to be stochastic [46] so as to accommodate the multiplicity of all possible outcomes to the arrival of new knowledge. Before we introduce stochasticity into the picture, we need a way to distill market features and describe the interdependencies between market statistics and social sentiment. To do this, we use a neural network, because a neural network is a universal function approximator that tries to map dependencies between variables. The final value of a market asset is determined by a hierarchy of features that stems from factors like supply and demand, the economy, and human behavior, and a neural network is an excellent candidate for capturing such a hierarchy [47].
There are two ways to introduce randomness into a neural network: the first is to randomly change the weights by a small degree, and the second is to add randomness to the activations at runtime. The first approach is not ideal because it would mean that feature detection gets noisy as the network evolves, and the network may eventually forget dependencies. Intuitively, the second approach seems fitting because the randomness in activations can be interpreted as random changes in features, which in turn can be thought of as replicating the erratic behavior of the market.
We propose a generalized formulation of the stochastic behavior of a layer in a deep neural network as follows, where $h_i$ denotes the activation values at the $i$-th time step. We define $\gamma$ as a perturbation factor that controls the amount of stochasticity. $\xi$ is an operator that produces a vector of random variables of the same dimensions as the activation. reaction is a general function that determines how the current activations will react with respect to the activations of the previous time step. Finally, $s_i$ is the vector of values obtained after the stochastic operation.
Let us break down each of the terms in the generalized equation of stochasticity in the layers of the neural network. $\gamma$ is the perturbation factor that determines the amount of randomness to be infused into the activations. reaction is a function that determines the direction in which to move based on the current activation values and the previous post-stochastic operation values. If we define $\xi$ to be an operator that produces a vector of IID random variables interpreted as probabilities, i.e., $0 < X < 1, \forall X \in \xi$, then we can interpret each neuron as having its own probability of absorbing randomness.
In determining the reaction function, we include only two parameters, $h_t$ and $s_{t-1}$. This choice is suitable and intuitive due to the Markov property exhibited by the financial markets. This implies that, given the prior stochastic activation $s_{t-1}$, the current stochastic activation $s_t$ is independent of the other past activations. Thus, we model the reaction function as the difference between the current activations and the previous activations, showing the direction in which to move.
Therefore, taking Eq. (15) and Eq. (16) together, Eq. (16) can be thought of as a random-like walk that takes into account the pattern of reaction of the market in progressive time steps. In a continuously evolving market, it is of utmost importance that the direction of movement corresponds to the pattern that has been observed in recent time steps as opposed to the initial time steps. The pattern should adapt to changes in market reaction. Here, we show how this formulation gives priority to recent activations over older activations. Eq. (16) for a system at time step t can be written as follows.
Extending this back to time t = 0, where $s_0 = 0$, we obtain a general form of the equation, which can be written as follows.
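The equations themselves are not reproduced in this text, so the expansion can only be sketched under stated assumptions: that the stochastic update reads $s_t = h_t + \gamma\,\xi_t \odot (h_t - s_{t-1})$, with the reaction taken as the difference $h_t - s_{t-1}$ described above, that $\xi_t$ denotes the random vector drawn at time step t, and that $\odot$ is the element-wise product. The subscript on $\xi$ and the equation layout are choices made here for illustration.

```latex
\begin{align*}
s_t &= h_t + \gamma\,\xi_t \odot (h_t - s_{t-1}) \\
    &= (1 + \gamma\,\xi_t) \odot h_t - \gamma\,\xi_t \odot s_{t-1} \\
    &= (1 + \gamma\,\xi_t) \odot h_t
       - \gamma\,\xi_t \odot (1 + \gamma\,\xi_{t-1}) \odot h_{t-1}
       + \gamma^{2}\,\xi_t \odot \xi_{t-1} \odot s_{t-2} \\
    &\;\;\vdots \\
s_t &= \sum_{k=0}^{t-1} (-\gamma)^{k}
       \Bigl(\prod_{j=0}^{k-1} \xi_{t-j}\Bigr)
       \odot (1 + \gamma\,\xi_{t-k}) \odot h_{t-k},
       \qquad s_0 = 0 .
\end{align*}
```

Under these assumptions, the activation $h_{t-k}$ enters with a factor of magnitude at most $\gamma^{k}(1+\gamma)$, so each step further into the past is damped by an additional factor of $\gamma\,\xi$.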
From the above Eq. (20), we can infer that the first term is given more attention than the second term. We observe that previous activations have exponentially decaying significance because $0 < \gamma, \xi < 1$. Thus, the recent activations have higher priority than the older ones in determining the direction of the stochastic walk. The implementation of Eq. (16) in a neural network during runtime is shown in Algorithm 1. Algorithm 1 describes the forecasting phase of the proposed model, where the processed input data ($X_t$) is fed into the neural network at every time step. A reaction vector is calculated at every layer from the previous time step's stochastic activation ($s_{t-1}$) and the current time step's feature activations ($h_t$). At each layer in the proposed model, a random vector (r) is created and multiplied by a perturbation factor ($\gamma$). Then, the Hadamard product of the reaction vector and the random vector is added to the current feature activation $h_t$. The resultant vector is $s_t$. The $s_t$ of the last layer of the neural network is the predicted price.
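The per-layer step described in words above can be sketched as follows. This is an illustration rather than a reproduction of Algorithm 1: it assumes the reaction is the element-wise difference between the current activations and the previous stochastic activations, that the random vector is drawn uniformly per neuron, and that $s_0 = 0$. The function names and the example activations are hypothetical.

```python
import random


def stochastic_module(h_t, s_prev, gamma, rng):
    """Return s_t = h_t + gamma * r * (h_t - s_prev), element-wise.

    r is drawn per neuron from rng.random(), i.e. uniformly from [0, 1);
    the text specifies values strictly between 0 and 1.
    """
    s_t = []
    for h, s in zip(h_t, s_prev):
        reaction = h - s   # assumed reaction: simple subtraction
        r = rng.random()   # this neuron's share of randomness
        s_t.append(h + gamma * r * reaction)
    return s_t


def run_sequence(activations, gamma, seed=0):
    """Apply the module over successive time steps, starting from s_0 = 0."""
    rng = random.Random(seed)
    s_prev = [0.0] * len(activations[0])
    outputs = []
    for h_t in activations:
        s_prev = stochastic_module(h_t, s_prev, gamma, rng)
        outputs.append(s_prev)
    return outputs


# Hypothetical activations of one two-neuron layer over three time steps.
activations = [[0.5, -0.2], [0.6, -0.1], [0.4, 0.0]]
for s_t in run_sequence(activations, gamma=0.1):
    print(s_t)
```

Setting `gamma` to 0 recovers the deterministic activations, while larger values let each neuron drift further in the direction of the reaction.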

B. STOCHASTIC MODELS
At the heart of the stochastic neural network is the stochastic module, which is appended at the output of every layer in a neural network. After performing a matrix multiplication and activation of the layer, the output is passed on to the stochastic module. Along with current activations, the module also takes in the previous timestep's stochastic activations as input. Fig. 5 is a neural network integrated with stochastic modules.
The stochastic module, as shown in Fig. 6, consists of 3 core components: the reaction submodule, the random variable vector generator, and the perturbation factor. The reaction submodule takes in the current time step's activations and the previous time step's stochastic activations as input. Here, the reaction submodule can be any function capable of capturing the market's pattern of reaction to new information. In our model, we used a simple subtraction operation. However, the reaction submodule could be a neural network itself.

C. PROPOSED SYSTEM MODEL
To predict the price of a Cryptocurrency, we take 3 essential data sources [31], the first being market statistics. Market statistics include the day low/high and volume. Secondly, we use blockchain network information that includes hash rate, transaction count, transaction fee, etc. Finally, we use social sentiment information like Google Trends and tweet volume. All the data from the 3 sources are accumulated into a dataloader. The data features are mean-normalized and N-day data stacks are created. This data is sequentially forwarded to a stochastic neural network, which predicts the (N+1)-th day's price. Fig. 7 illustrates the design of the system model.
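The construction of N-day data stacks paired with the (N+1)-th day's price can be sketched as follows. The function name and the toy data are hypothetical, and the sketch assumes the feature rows are already normalized and ordered by day.

```python
def make_windows(rows, prices, n_days):
    """Pair each stack of n_days consecutive feature rows with the price
    of the day that immediately follows the stack."""
    samples = []
    for start in range(len(rows) - n_days):
        window = rows[start:start + n_days]
        target = prices[start + n_days]
        samples.append((window, target))
    return samples


# Toy example: 10 days, one feature per day, window of 7 days.
rows = [[float(day)] for day in range(10)]
prices = [100.0 + day for day in range(10)]
samples = make_windows(rows, prices, n_days=7)
print(len(samples), samples[0][1])
```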

V. RESULTS AND DISCUSSION
In this section, we describe the features of the data we use to forecast the price of various Cryptocurrencies. Moreover, we explain the preprocessing task of the data. We present the evaluation metrics used to measure the performance of the models and the training process used. Finally, the results obtained by the proposed models are exhibited.

A. DATASET DESCRIPTION
A total of 23 features are used in the proposed models with a window size of 7. We trained the model with the data of the past 7 days to predict the price of the eighth day. We trained the model on data ranging from mid-2017 to the end of 2019. A total of 850 data points were used in the proposed model to extract patterns from the data. We used data available on bitinfocharts [48].

1) TRANSACTIONS
Number of transactions performed on a given particular day. Cryptocurrency, unlike the share markets, is not listed in any stock exchange. Here, there is no opening time or closing time because there is no regulatory body having power over it. We considered the number of the coins traded in a single day as one of our features.

2) MARKET VOLUME
This is the value of market movement on a given day. It is necessary to know the total units of currency flowing through the market. The quantity of coins flowing through the market is an indicator of the value the currency possesses.

3) MINING DIFFICULTY AND HASHRATE
Mining difficulty is the computational difficulty required to mine a single block of the coin. Mining and confirming transactions require special hardware. The better the hardware, the more hashes it can compute to mine a block. Thus, there is a tradeoff between the hash rate and power consumption. So, we considered the mining difficulty and hash rate of the coin, as well as how profitable it is to mine.

4) MINING PROFITABILITY
Profitable income to the miner against the use of resources for consuming power and time. With the increase in the number of miners, reward per miner is decreased exponentially.

5) TRANSACTIONS FEE
It is the average transaction fee paid by the parties for the confirmation for their transactions. The fee is required in order to process the transaction in the network.

6) CONFIRMATION TIME
Average time required to confirm a transaction made by one party to another. A transaction made by one party to another must be confirmed by all the others and logged in the table of blocks. Logging the transaction takes time, known as the confirmation time (usually around 10 minutes). This time depends on the currently active users and their geographical location as they update their block tables.

7) MARKET CAPITALIZATION
Total amount of the Cryptocurrency present in the market on a given particular day in USD.

8) TWEETS AND GOOGLE TRENDS
It is observed that people tend to perform more transactions when tweet volume increases [23], [29]. These relationships are not only correlations but also include causation with a feedback loop [49]. The relationship does not end with tweet volume. People tend to search more about trending topics to remain updated about them [50]. A spike in Google searches also appears to be associated with the price of the coins, which too is a trivial assumption [51].

9) HIGHEST AND LOWEST VALUE
These are the highest and the lowest values that a coin reached on a given day. The market for trading Cryptocurrencies remains open throughout the day, so we considered both the peak and the lowest value attained by the coin in a day.

B. DATA PREPROCESSING
We took the utmost care in choosing factors that may affect the price. Factors that seemed redundant were removed either at the outset or at a later stage, when the trained model was examined. The dataset was normalized in two different ways. In the first, every feature in the dataset is normalized by its respective feature mean; this dataset is referred to as the Norm dataset. In the second, all features except the price are mean-normalized; this dataset is referred to as the UNorm dataset. The reasoning is that the magnitude of the future price is determined by the previous prices, while the deviations from the previous price are captured by the other features described above. The Norm and UNorm datasets were each split into training and testing sets, with 75% of the data used for training and 25% for testing. This was done for all three Cryptocurrencies, viz. Bitcoin, Ethereum, and Litecoin.
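The two normalization schemes can be illustrated with a minimal sketch. It assumes that "normalized by the feature mean" means dividing each feature by its mean over the training rows, and that the 75/25 split is chronological; neither detail is spelled out in the text. The function name `make_norm_unorm` and the `price_col` argument are hypothetical names introduced only for this illustration.

```python
import numpy as np

def make_norm_unorm(data, price_col, train_frac=0.75):
    """Build the Norm and UNorm train/test splits from a (days x features) array.

    Assumptions (not confirmed by the text): rows are in time order, the split
    is chronological, and normalization divides by the training-set mean.
    """
    n_train = int(len(data) * train_frac)
    train, test = data[:n_train], data[n_train:]

    means = train.mean(axis=0)          # per-feature means from training rows only

    norm_scale = means.copy()           # Norm: every feature is scaled
    unorm_scale = means.copy()
    unorm_scale[price_col] = 1.0        # UNorm: the price column is left untouched

    return {
        "Norm": (train / norm_scale, test / norm_scale),
        "UNorm": (train / unorm_scale, test / unorm_scale),
    }
```

Computing the means on the training rows only keeps information from the test period out of the scaling constants.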

C. EVALUATION METRICS
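The trained models are evaluated on the basis of the following performance metrics: the Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Squared Error (MSE). The formulas for these metrics are presented below [52].
\[ \mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \]
\[ \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right| \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2} \]
\[ \mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2 \]
where $y_t$ are the actual values, $\hat{y}_t$ are the forecasted values, and $n$ is the number of observations.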
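The four metrics named in this section can be computed as in the following sketch, which uses their standard textbook definitions. Reporting MAPE as a percentage is an assumption, and the function name `forecast_errors` is hypothetical.

```python
import numpy as np

def forecast_errors(y_true, y_pred):
    """Return MAPE (in percent, by assumption), MAE, RMSE and MSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    return {
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),  # undefined if any y_true is 0
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(mse),
        "MSE": mse,
    }
```

For actual prices of 100 and 200 with forecasts of 110 and 180, this gives a MAPE of 10%, an MAE of 15, and an MSE of 250.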

D. RESULTS
Two classes of prediction models, viz. the Multilayer Perceptron (MLP) and LSTM, were trained on three market-dominating Cryptocurrencies, i.e., Bitcoin, Ethereum, and Litecoin. We trained two model variations for each currency-architecture pair, which differ in how the data is normalized. In the first variation, all data features are normalized by the feature mean of the training data. In the second variation, all features except the price are normalized. The intuition behind leaving the price unnormalized is that we wanted the models to obtain the magnitude of the price of the currency directly from the data instead of from the model. We used MLP and LSTM models because there is a need to capture the non-linear dependencies between the market factors, blockchain data, and social sentiment, and neural networks are well suited to this task. The MLP is the most basic type of neural network, and LSTM models are widely used for time-dependent data such as the market prices of cryptocurrencies. These models were therefore chosen to combine stochasticity with non-linear dependencies.

1) DETERMINISTIC MODELS
In this section, we present the results obtained by the deterministic models on the train and test data. Four models were trained for each currency, leading to a total of twelve models.
For each currency, we trained two models on each of the Norm and UNorm datasets. The model architecture is the same for all the Cryptocurrencies. The networks consist of layers containing 20, 15, 7, and 1 neurons, respectively, with the last layer giving the next-day price as the final output. All activations were ReLU, and the models were trained using the Adam algorithm for 1500 epochs.
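The layer sizes quoted above can be illustrated with a minimal forward-pass sketch. It treats the four sizes as the dense layers of the MLP, applies ReLU to every layer including the output (following "all the activations used were ReLU"), and uses He initialization; these choices, along with the names `init_mlp` and `mlp_forward`, are assumptions made for illustration. Training with Adam is omitted.

```python
import numpy as np

LAYER_SIZES = (20, 15, 7, 1)  # neurons per layer, as stated in the text

def init_mlp(n_features, rng):
    """Create (weight, bias) pairs for each dense layer (He init is an assumption)."""
    params = []
    fan_in = n_features
    for fan_out in LAYER_SIZES:
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((w, b))
        fan_in = fan_out
    return params

def mlp_forward(params, x):
    """Deterministic forward pass: one predicted next-day price per input row."""
    h = x
    for w, b in params:
        h = np.maximum(h @ w + b, 0.0)  # ReLU on every layer, output included
    return h[:, 0]
```

A ReLU output keeps the predicted price non-negative, which is a reasonable fit for a price target.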

2) STOCHASTIC MODELS
In this section, the results obtained by our models on test data using stochastic layers in the neural networks are presented. The parameters of the trained model remain the same as in the deterministic models.
However, here we test the models using non-zero perturbation factors, thus inducing stochasticity in the models. The results are grouped by currency, namely Bitcoin, Ethereum, and Litecoin. Each run of a stochastic model is known as a realization, following the terminology used in the literature on stochastic processes. To test our hypothesis, we ran all twelve models with stochasticity activated for 100 realizations to investigate the effect of inducing randomness into the neural network layers.
We present the probability distribution of the MAPE of the prices predicted by the stochastic neural networks over these realizations. Two values of the perturbation factor, γ, are tested for each trained model to determine the effect of varying γ. With a very small change in γ, the error distribution changed disproportionately, as shown in Fig. 11. Furthermore, the model performed slightly worse than the deterministic model. We speculate that a stochastic model will perform worse when minor changes in γ lead to major changes in the type of error distribution.
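The realization procedure can be sketched as follows. The perturbation used here, multiplying each layer's activations by (1 + γξ) with ξ drawn from a standard normal distribution, is a hypothetical stand-in and is not the formulation defined earlier in the paper; it is chosen only because it reduces to the deterministic model at γ = 0. The function names are likewise hypothetical.

```python
import numpy as np

def stochastic_forward(params, x, gamma, rng):
    """Forward pass with layer-wise noise (hypothetical perturbation form)."""
    h = x
    for w, b in params:
        h = np.maximum(h @ w + b, 0.0)          # deterministic ReLU activation
        noise = rng.standard_normal(h.shape)    # fresh noise for every layer
        h = h * (1.0 + gamma * noise)           # gamma = 0 gives the deterministic model
    return h[:, 0]

def mape(y_true, y_pred):
    """MAPE in percent (percentage scaling is an assumption)."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def mape_over_realizations(params, x, y, gamma, n_realizations=100, seed=0):
    """Run the frozen model repeatedly and collect one MAPE per realization."""
    rng = np.random.default_rng(seed)
    return np.array([
        mape(y, stochastic_forward(params, x, gamma, rng))
        for _ in range(n_realizations)
    ])
```

A histogram of the returned array for two nearby values of γ gives the kind of MAPE distribution comparison discussed above.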
Hyperparameter tuning is the key to choosing the optimal values for γ. Choosing 0.12 over 0.1 significantly improves the performance of our MLP model for Litecoin, as shown in Fig. 12. The LSTM model for Litecoin on the UNorm dataset performed better with γ = 0.12, while the MLP model for Ethereum on the UNorm dataset performed better with γ = 0.1, as shown in Fig. 13. Thus, we show that the magnitude of γ has no correlation with the performance of a stochastic neural net [53].
For every trained model, we see that the MAPE error distribution has no correlation with the value of γ. Therefore, we are faced with the problem of selecting the optimal values of γ for every model. Choosing these values manually is time-consuming, and there is no guarantee of finding the optimum solution.
We can speed up this task by treating γ as a learnable parameter. After training the deterministic neural network, we can freeze all parameters except γ and optimize the values of γ by error backpropagation in combination with gradient descent. Another advantage of this approach is that each layer has its own value of γ, and hence the optimization process picks the maximum value of γ that each individual layer can tolerate. By learning the values of γ, we observe a significant improvement in model performance, as shown in Tables 7, 8, and 9.
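The idea of freezing the trained weights and fitting one γ per layer can be sketched as below. Two simplifications are assumptions of this sketch: the perturbation is a hypothetical multiplicative form, (1 + γξ) applied to each layer's activations, and the gradient with respect to γ is estimated by central finite differences rather than by backpropagation, so that the example needs only NumPy. The function names and default step sizes are hypothetical.

```python
import numpy as np

def forward(params, gammas, x, noises):
    """Forward pass with one perturbation factor per layer (hypothetical form)."""
    h = x
    for (w, b), g, eps in zip(params, gammas, noises):
        h = np.maximum(h @ w + b, 0.0) * (1.0 + g * eps)
    return h[:, 0]

def mse(params, gammas, x, y, noises):
    return np.mean((forward(params, gammas, x, noises) - y) ** 2)

def learn_gammas(params, x, y, gammas0, lr=0.01, steps=200, delta=1e-4, seed=0):
    """Gradient descent on the per-layer gammas; weights and biases stay frozen."""
    rng = np.random.default_rng(seed)
    gammas = np.array(gammas0, dtype=float)
    for _ in range(steps):
        # Same noise sample for both sides of each difference (common random numbers).
        noises = [rng.standard_normal((x.shape[0], w.shape[1])) for w, _ in params]
        grad = np.zeros_like(gammas)
        for i in range(gammas.size):
            step = np.zeros_like(gammas)
            step[i] = delta
            grad[i] = (mse(params, gammas + step, x, y, noises)
                       - mse(params, gammas - step, x, y, noises)) / (2.0 * delta)
        gammas -= lr * grad  # only gamma is updated
    return gammas
```

With an automatic differentiation framework, the finite-difference loop would be replaced by a backpropagated gradient with respect to γ, while the rest of the procedure stays the same.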

VI. CONCLUSION
In this paper, we predict the price of Cryptocurrencies by using a stochastic neural network model. We introduced a technique to adaptively learn the pattern of the market's reaction to any updated information. The results show that the proposed hypothesis was not only valid but also effective in deciphering market volatility. Almost all of the stochastic versions of the neural net models outperformed the deterministic versions. The average relative improvement from using stochastic neural networks over regular neural networks on the Norm dataset is 1.56% at γ = 0.1, 1.73% at γ = 0.12, and 1.76% when γ is set as a learnable parameter. The improvement is much more significant on the UNorm dataset, where the average relative improvement is 3.91% at γ = 0.1, 4.52% at γ = 0.12, and 7.41% when γ is set as a learnable parameter.
In the future, it may be worth exploring an optimization technique that tunes the hyperparameter γ to find its most suitable value. Moreover, alternative reaction functions can be tested to learn the pattern of the market's reaction to fresh data. These functions can themselves be stochastic in nature to better simulate market volatility. In addition, there is a lot of unexplored territory in the cross-disciplinary field of stochastic processes and neural networks that can be exploited in Cryptocurrency markets.
PATEL JAY is currently pursuing the bachelor's degree from Nirma University, Ahmedabad, India. His research interests include computer vision, natural language processing, energy-based models, and reinforcement learning.
VASU KALARIYA is currently pursuing the bachelor's degree with Nirma University, Ahmedabad, Gujarat, India. His research interests are computer vision, energy-based models, reinforcement learning, and quantum machine learning.
PUSHPENDRA PARMAR is currently pursuing the bachelor's degree from Nirma University, Ahmedabad, Gujarat, India. His research interests are computer vision, probabilistic graphical models, and quantum machine learning.
He delivered many invited and keynote speeches, 24 events in 2019 alone. His research is multidisciplinary and focuses on cyber security and digital forensics of computer systems, with an emphasis on cybercrime detection and prevention. He convened and chaired more than 50 conferences and workshops. He works closely with government and industry on many projects, including Northern Territory (NT) Department of Information and Corporate Services, IBM, Trend Micro, the Australian Federal Police (AFP), Westpac, and the Attorney-General's Department. He is the Founding Chair of the IEEE Northern Territory (NT) Subsection.