Fusing Sell-Side Analyst Bidirectional Forecasts Using Machine Learning


Abstract:

Sell-side analysts’ recommendations are primarily targeted at institutional investors mandated to invest across many companies within client-mandated equity benchmarks, such as the FTSE/JSE All-Share index. Given the numerous sell-side recommendations for a single stock, making unbiased investment decisions is often not straightforward for portfolio managers. This study explores the use of historical sell-side recommendations to create an unbiased fusion of analyst forecasts such that bidirectional accuracy is optimised using random forest, extreme gradient boosting, deep neural networks, and logistic regression. We introduce 12-month rolling features generated from standard sell-side recommendations, such as analyst coverage, point and directional accuracy, while avoiding forward-looking biases. We introduce a novel “AI analyst” by fusing forecast features from numerous analysts using machine learning algorithms. We observed the added benefits of using these features from more than one analyst by systematically generating unbiased and incrementally better prediction accuracy from publicly available sell-side recommendations, with the random forest algorithm showing the highest relative performance. In highly volatile sectors, like resources, the machine learning algorithms perform better than in low-volatility sectors, suggesting the importance of rolling features in bidirectional prediction in the presence of high volatility. Using feature importance, we observe the incremental contribution of rolling features, showing the relationships between analyst coverage, volatility, and bidirectional forecast accuracy. Furthermore, parameters from logistic regression identify volatility features and initial and target price as some of the essential features when modelling analysts’ directional predictions.
Published in: IEEE Access ( Volume: 10)
Page(s): 76966 - 76974
Date of Publication: 21 July 2022
Electronic ISSN: 2169-3536

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.
SECTION I.

Introduction

Institutional investors are primarily mandated to invest in an equity benchmark representing the most prominent companies listed on an exchange [1]. To cover the companies within the equity benchmark, brokerage firms hire analysts to produce investment reports called sell-side equity analyst reports. Sell-side analyst reports are highly valued in the investment industry, and generally include three standard quantitative outputs: earnings forecasts, target prices, and buy/sell recommendations [2].

The importance of sell-side reports has been demonstrated in several ways. Sell-side analysts play an essential role in interrogating company results and executive management of their strategies, while ensuring the reduction of any mismanagement or unethical behaviour [3]. Another important aspect of sell-side analyst reports is reducing the cost of capital whenever companies issue new equity to raise funds. This is done mainly by reducing information asymmetry between potential investors and their respective companies, which improves stock liquidity. However, this introduces a potential conflict of interest because analysts can generate forecasts or recommendations for other economic incentives, such as securing underwriting business and boosting trading volume [4]. Other issues with sell-side analysts’ reports were found in Chiang et al. [5], who indicate that analysts tend to herd towards a consensus when issuing recommendations, and this tendency increases with market sentiment. Market- or stock-specific sentiment has been shown to impact stock price movements [6].

Investors typically use sell-side analyst information to determine earnings growth and directional movements of stock prices, and hence, their investment decisions [7]. In summary, this approach is best described as obtaining market benchmark expectations from sell-side analyst reports. Therefore, assuming that equity markets are semi-strong, investors can generate alpha returns using these reports [8]. The question of whether sell-side reports can generate alpha performance has been of interest in previous research. Some studies have shown the existence of potential investment strategies in sell-side reports. Barber et al. [9] found that by using analyst ratings, the resultant investment strategies generate excess returns above 4%. Womack et al. [10] also found that brokerage sell-side analysts tend to produce good directional accuracy for stock price predictions.

Most prior studies on the value and potential alpha of these reports have focused on analyst ratings. In contrast, researchers have only recently begun to focus on earnings and target price analyses because of a lack of data. Major financial data vendors have only recently begun to capture target price data continuously [11]. There has been an increase in sell-side analysts in the South African brokerage market, resulting in more companies being covered and more sell-side reports being issued by various equity analysts. Inevitably, this has presented the need for investors to process sell-side data rapidly and in an unbiased manner while assisting with their investment processes [9]. Big investment houses have the resources to deal with more sell-side data, but smaller investment firms may need to think outside the box in order to remain competitive with the rest of the market.

In this study, we explore the idea of generating superior directional predictions by fusing sell-side data reports within companies belonging to the FTSE/JSE All-Share index using machine learning algorithms. We propose an artificial intelligence analyst or “AI analyst” to assist in using sell-side analyst data and generate investment recommendations, including directional predictions with little human intervention. We do this by considering investment signals from sell-side reports as quantitative input features to various machine-learning algorithms to predict the directional movement of individual stocks.

Machine learning algorithms have seen limited application to sell-side earnings forecast data. To the best of our knowledge, at the time of writing, we are not aware of any studies investigating the application of these algorithms to sell-side earnings forecast data. A better approximation of market consensus estimates, using an iterative filtering algorithm rather than the simple mean of consensus estimates, was introduced in [12]. The closest research on the topic was in [13], where analyst features, including analyst ratings, were used for stock price prediction. Analyst ratings and profitability of recommendations have been the focus of burgeoning literature [14], [15]. However, most prior studies have examined these outputs in isolation rather than studying the possible use of the combined outputs from the reports. Loh et al. [16] provide the value and use cases of sell-side earnings forecast data for investment recommendations.

The large number of resources that generate sell-side earnings forecasts suggests that they may be helpful in potential investment strategies. It is evident in [17] that, in addition to earnings forecasts, several other items in sell-side research data may provide additional information. Stock price forecasts by analysts typically have a short forecast horizon and allow investors to easily observe the expected direction and magnitude of equity price movements. Analysts may release revisions to their forecasts when new information becomes available. We use the methodology and findings of previous studies on sell-side earnings forecast accuracy to conduct our analysis.

Accuracy in target prices or earnings forecasting is helpful for users of sell-side reports. Analysts have different abilities and skills within their respective coverage to forecast earnings and target prices for different time horizons [14]. The accuracy of near-term forecasts may depend on many factors, including analysts’ ability to interpret market sentiment and the current economic and industrial themes [15]. Mikhail et al. [18] found that stock coverage, analyst following, and the ability to extract specific earnings guidance from insiders have a significant influence on analysts’ forecast accuracy. In the long term, analysts’ ability to predict these economic, industry, and company trends determines their forecast accuracy.

Prior evidence on the profitability of using sell-side data is mixed [11], [19]. If analysts with the most profitable recommendations, such as those described in [14], also exhibit superior forecasting skills, this would suggest that their stock-picking ability presents various phenomena occurring in the stock market [20]. Differentiating between sell-side analysts may present an opportunity to introduce different investment strategies based on sell-side data [11]. This study introduces 12-month rolling forecast accuracy scores to differentiate between sell-side analyst reports.

Contributions: Our contributions in this work can be summarised as follows:

  • The first is generation of unbiased and improved directional predictions. Groysberg et al. [21] found that analysts may be inclined to generate sell-side reports to increase their compensation, particularly because factors such as forecast accuracy are not considered when making variable remuneration. This is mainly because of the compensation structure of the investment brokerage firms. The coverage of specific companies can increase brokerage revenue by increasing trading activity. An unbiased AI analyst can provide a solution to investors and analysts facing conflicts of interest.

  • Second, there has been an increase in sell-side analyst reports that are available to investors. Human consumption of these data can be challenging compared to using scalable technologies, such as artificial intelligence or machine learning algorithms. The direct valuation implications of target price forecasts make them potentially useful investment signals regarding directional stock price movements using the AI analyst.

The remainder of this paper is organised as follows. In section II, we discuss the collection of South African sell-side data and describe the forecast accuracy and valuation recommendation metrics used. We also examine how the prior literature considers these metrics, and describe, from a research design perspective, how we introduced 12-month rolling features generated from the standard outputs of sell-side reports. A description of the machine learning algorithms used for the prediction is provided in section III. The results are presented in section IV, starting with the prediction accuracy for all the machine learning models, followed by a discussion of the feature importance of some machine learning algorithms in section V. Section VI concludes the paper and outlines ideas for future research.

SECTION II.

Data and Research Design

Similar to [11], the data used in this manuscript were sourced from Bloomberg. It is a collection of standard outputs from the historical sell-side reports of companies within the FTSE/JSE All-Share index from January 2004 to June 2018. The main features generated from the reports include analysts’ firm of employment at the date of the forecast, stock industry and sector, rank within the FTSE/JSE All-Share index in terms of market capitalisation, forecast stock price including the date, and forecast stock rating, which is split into five categories. We use some of these features to generate additional input features, which are discussed in the next section. To avoid any look-ahead bias, the first year of the dataset was used only to calculate the 12-month rolling features and was then removed. The dataset used for these experiments, including rolling features, ranged from July 2005 to June 2018.

A. 12 Month Rolling Forecast Accuracy Metrics

Before introducing rolling features, we first discuss generic definitions from previous research. Our research focus and experimental design have drawn considerable attention to directional accuracy measures of sell-side analysts. Directional forecast accuracy refers to the analysts’ accuracy in predicting the future direction of stock prices. This measure is significant, as many investors choose to trade based on the expected directional movement (e.g. long/short or option and tactical strategies) rather than the expected price [22].

Directional accuracy is typically measured using a confusion matrix [23]. A confusion matrix contains the number of Type I and Type II errors together with the number of correct forecasts. A Type I error (i.e. a false positive) occurs when the stock price decreases although the analyst report forecast an upward movement. A Type II error (i.e. a false negative) occurs when the stock price increases although the analyst forecast a downward movement. Correct forecasts are then split into true positives and true negatives.

Directional accuracy (DA) can be considered a measurement of classification accuracy. Our research focused on a binary classification problem with two classes: upward and downward. Let true positives (TP) refer to the number of positive classes correctly predicted by the classifier, and true negatives (TN) to the number of negative classes correctly predicted by the classifier. Type I errors, or false positives (FP), count negative classes incorrectly classified as positive, whereas Type II errors, or false negatives (FN), count positive classes incorrectly classified as negative. DA can be defined as \begin{equation*} DA=\frac {TN + TP}{TP + TN + FP + FN } \tag{1}\end{equation*}
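As a minimal sketch of equation (1), the snippet below counts the four confusion-matrix cells from paired forecasts and outcomes and computes DA. The function names and the five sample forecasts are hypothetical and are not taken from the study's data.

```python
def confusion_counts(predicted_up, realised_up):
    """Count TP, TN, FP, FN for paired boolean forecasts and outcomes."""
    tp = tn = fp = fn = 0
    for pred, real in zip(predicted_up, realised_up):
        if pred and real:
            tp += 1
        elif not pred and not real:
            tn += 1
        elif pred and not real:
            fp += 1  # Type I error: forecast up, price fell
        else:
            fn += 1  # Type II error: forecast down, price rose
    return tp, tn, fp, fn


def directional_accuracy(tp, tn, fp, fn):
    """Equation (1): share of forecasts with the correct direction."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no forecasts to score")
    return (tp + tn) / total


# Hypothetical forecasts (True = upward) and realised moves for five reports.
predicted = [True, True, False, True, False]
realised = [True, False, False, True, True]
counts = confusion_counts(predicted, realised)
da = directional_accuracy(*counts)
```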

We introduced the directional accuracy score (DAS) based on the generic or classic definition of directional accuracy. The measurement is a cumulative score based on the number of times the analyst correctly (TP or TN) and incorrectly (FP or FN) predicted directional movement over the previous 12-month period. For a given period, the directional accuracy score can be measured as \begin{align*} DAS = \begin{cases} \sum _{i} x_{i} \times 0.75, & \text {if } x = 1 \\ \sum _{i} x_{i} \times 0.01, & \text {if } x = 0 \end{cases}\tag{2}\end{align*}

where x is the directional prediction and the scaling factor is chosen to differentiate between the analyst scores.
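One possible reading of equation (2) is sketched below: every correct directional call in the trailing window adds 0.75 and every incorrect call adds 0.01. This interpretation, the 365-day window, the rule that only outcomes resolved before the scoring date are counted, and all names and dates are assumptions made for illustration rather than the authors' implementation.

```python
from datetime import date, timedelta

CORRECT_WEIGHT = 0.75    # scaling factor in equation (2) for x = 1
INCORRECT_WEIGHT = 0.01  # scaling factor in equation (2) for x = 0


def directional_accuracy_score(history, as_of, window_days=365):
    """Sum weighted outcomes resolved in the window before `as_of`.

    `history` holds (resolved_on, was_correct) pairs for one analyst.
    Outcomes resolved on or after `as_of` are ignored, so the score
    never uses information from the future (no look-ahead).
    """
    start = as_of - timedelta(days=window_days)
    score = 0.0
    for resolved_on, was_correct in history:
        if start <= resolved_on < as_of:
            score += CORRECT_WEIGHT if was_correct else INCORRECT_WEIGHT
    return score


# Hypothetical track record for a single analyst.
history = [
    (date(2016, 1, 15), True),   # older than the window, ignored
    (date(2016, 9, 1), True),
    (date(2017, 2, 10), False),
    (date(2017, 5, 20), True),
    (date(2017, 8, 1), True),    # resolved after the scoring date, ignored
]
das = directional_accuracy_score(history, as_of=date(2017, 6, 30))
```

Under these assumptions the example analyst has two correct calls and one incorrect call inside the window, so the score is 2 × 0.75 + 0.01.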

We also consider point accuracy to be the percentage difference between the target and realised prices. It measures the accuracy of an analyst’s price prediction. In this study, we only focused on the unsigned point accuracy (UPA) [24], which is the absolute value of the point accuracy, defined as \begin{equation*} UPA =\big |\frac {P_{tg} - P_{12}}{P_{12}}\big |\tag{3}\end{equation*}

where P_{tg} is the target price and P_{12} is the realised price after 12 months.

We introduce the point accuracy score (PAS), which is the average absolute point accuracy for an analyst over a 12-month period. \begin{equation*} PAS = \frac {\sum _{i} |UPA_{i}|}{n}\tag{4}\end{equation*}

where n is the number of reports issued over the period.
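Equations (3) and (4) can be illustrated with a short sketch. The function names and the three (target price, realised price) pairs are hypothetical; the averaging assumes that the denominator in equation (4) is the number of reports in the window.

```python
def upa(target_price, realised_price):
    """Equation (3): absolute percentage gap between target and realised price."""
    if realised_price == 0:
        raise ValueError("realised price must be non-zero")
    return abs((target_price - realised_price) / realised_price)


def point_accuracy_score(reports):
    """Equation (4): mean UPA over one analyst's reports in the window.

    `reports` is a list of (target_price, realised_price_after_12m) pairs.
    """
    if not reports:
        raise ValueError("no reports in the window")
    return sum(upa(target, realised) for target, realised in reports) / len(reports)


# Hypothetical reports: UPA values of 0.10, 0.10 and 0.20.
reports = [(110.0, 100.0), (90.0, 100.0), (60.0, 50.0)]
pas = point_accuracy_score(reports)
```

A lower PAS indicates that the analyst's target prices were, on average, closer to the realised prices.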

We consider two metrics used in prior research to measure the term accuracy [11], [25], [26]. The first metric, MetEnd, measures whether the target price is achieved at the end of the period by comparing the target price with the actual closing price at the end of the horizon. The second metric, MetAny, measures whether the target price is achieved at any point during the forecast period. Both are binary variables, where 1 indicates that the target price was achieved and 0 indicates that it was not achieved during the forecast period.

We introduce a MetEndScore (MES) and a MetAnyScore (MAS), which are the cumulative number of times an analyst prediction meets the target price at the end of the forecast period or at any time during the forecast horizon, respectively. Similar to DAS, these rolling scores are defined as \begin{align*} MES = \begin{cases} \sum _{i} x_{i} \times 0.90, & \text {if } TP \\ \sum _{i} x_{i} \times (-0.05), & \text {if } TN \end{cases} \tag{5}\\ MAS = \begin{cases} \sum _{i} x_{i} \times 0.95, & \text {if } TP \\ \sum _{i} x_{i} \times (-0.15), & \text {if } TN \end{cases} \tag{6}\end{align*}

where x is analysts’ MetEnd in equation (5) and MetAny in equation (6) between periods i and n, with the scaling factor chosen to differentiate between analyst scores.
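The sketch below shows one way the MetEnd and MetAny indicators and the weighted scores in equations (5) and (6) could be computed. It assumes that an upside target is met when the price rises to or above it and a downside target when the price falls to or below it, and that the two cases in each equation correspond to "target met" and "target not met". All function names and price paths are hypothetical.

```python
def _reached(target, start_price, price):
    # Assumption: direction of the target decides which inequality applies.
    if target >= start_price:
        return price >= target
    return price <= target


def met_end(target, start_price, path):
    """1 if the final closing price in `path` has reached the target, else 0."""
    return int(_reached(target, start_price, path[-1]))


def met_any(target, start_price, path):
    """1 if any closing price in `path` has reached the target, else 0."""
    return int(any(_reached(target, start_price, p) for p in path))


def weighted_score(indicators, hit_weight, miss_weight):
    """Cumulative score: hit_weight per met target, miss_weight per miss."""
    return sum(hit_weight if x == 1 else miss_weight for x in indicators)


# Hypothetical closing prices over the forecast horizon, target 110 from 100.
path = [102.0, 111.0, 108.0]
end_flag = met_end(110.0, 100.0, path)   # target not held at horizon end
any_flag = met_any(110.0, 100.0, path)   # target touched during the horizon

# Hypothetical indicator histories for one analyst over the trailing window.
mes = weighted_score([1, 0, 1], hit_weight=0.90, miss_weight=-0.05)
mas = weighted_score([1, 1, 1], hit_weight=0.95, miss_weight=-0.15)
```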

For features relating to stock coverage and earnings momentum, we introduce analyst coverage AC, which shows the total number of reports for the past 12 months on a specific stock, defined as \begin{equation*} AC = \sum _{i}^{n} x_{i}\tag{7}\end{equation*}

where n spans the previous 12 months from the sell-side analyst’s issue date for stock x.

For earnings momentum, we use the price target momentum (PTM), defined as the percentage change in the current issued target price (TP) from the previously published target price. \begin{equation*} PTM = \frac {\text {Current TP}-\text {Previous TP}}{\text {Previous TP}}\%\tag{8}\end{equation*}
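A small sketch of the coverage and momentum features in equations (7) and (8) follows. The 365-day window, the exclusion of the current report from the coverage count, and all names and sample values are assumptions for illustration.

```python
from datetime import date, timedelta


def analyst_coverage(report_dates, issue_date, window_days=365):
    """Equation (7): number of reports on a stock in the prior 12 months."""
    start = issue_date - timedelta(days=window_days)
    return sum(1 for d in report_dates if start <= d < issue_date)


def price_target_momentum(current_target, previous_target):
    """Equation (8): percentage change from the previous target price."""
    if previous_target == 0:
        raise ValueError("previous target price must be non-zero")
    return (current_target - previous_target) / previous_target * 100.0


# Hypothetical report dates for one stock, across all analysts.
report_dates = [
    date(2016, 3, 1),    # older than 12 months, not counted
    date(2016, 11, 15),
    date(2017, 1, 20),
    date(2017, 4, 2),
]
ac = analyst_coverage(report_dates, issue_date=date(2017, 5, 1))
ptm = price_target_momentum(current_target=126.0, previous_target=120.0)
```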

Similar to [11], we include analyst firm and forecast stock ratings as analyst report features. Stock volatility-related features include rolling 20-day volatility (20dvol), monthly Top40 Index cross-sectional volatility (Top40CSV), and the ratio between the two, defined as: \begin{equation*} RelVol = \frac {\text {20dvol}}{\text {Top40CSV}}\tag{9}\end{equation*}
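The text does not state how the two volatility inputs are computed, so the sketch below makes its own assumptions: 20-day volatility is taken as the sample standard deviation of the last 20 daily log returns, and cross-sectional volatility as the population standard deviation of constituent returns within a month. Only the final ratio follows equation (9) directly; the function names and inputs are hypothetical.

```python
import math
import statistics


def rolling_20d_vol(closes):
    """Std dev of the last 20 daily log returns (needs at least 21 closes)."""
    if len(closes) < 21:
        raise ValueError("need at least 21 closing prices")
    window = closes[-21:]
    returns = [math.log(b / a) for a, b in zip(window, window[1:])]
    return statistics.stdev(returns)


def cross_sectional_vol(constituent_returns):
    """Dispersion of index constituents' returns over the same period."""
    return statistics.pstdev(constituent_returns)


def relative_vol(vol_20d, top40_csv):
    """Equation (9): stock volatility relative to index cross-sectional volatility."""
    if top40_csv == 0:
        raise ValueError("cross-sectional volatility must be non-zero")
    return vol_20d / top40_csv


# Hypothetical inputs: a price series alternating between 100 and 101,
# and monthly returns for four index constituents.
closes = [101.0 if i % 2 else 100.0 for i in range(21)]
vol_20d = rolling_20d_vol(closes)
csv = cross_sectional_vol([0.02, -0.01, 0.03, 0.00])
rel_vol = relative_vol(vol_20d, csv)
```

A value above 1 would indicate a stock that is more volatile than the dispersion of the index constituents, subject to the two inputs being expressed on a comparable time scale.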

Table 1 shows a sample of the input features calculated from their definitions.

TABLE 1 Sample of Input Features. * Indicating 12 Month Rolling Features

In this study, supervised machine learning techniques were used. The target to be predicted over the forecast period is calculated using the realised closing price P_{12} of the respective stock using the following formula:\begin{equation*} Y_{i} = Sign(P_{12} - P_{0})\tag{10}\end{equation*}

where P_{0} is the price at the issue date for analyst report i.

The target is labelled as either 1 or 0: a value of 0 indicates that the price of the stock has fallen, whereas a value of 1 indicates that the price of the stock has risen during the sell-side analysts’ forecast period. This directional change framework is used to transform a financial time series into an event-based series [22]. Figure 1 shows the frequency distribution of the potential return target and the realised directional movement. We see that more analysts have positive return potentials. This may be because markets have been trending upwards over the long term.
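The labelling step can be sketched as follows. Mapping the sign in equation (10) to the labels 1 and 0, and treating an unchanged price as 0, are assumptions made here; the prices are hypothetical.

```python
def direction_label(p0, p12):
    """Binary target: 1 if the price rose over the horizon, otherwise 0.

    Assumption: an unchanged price (p12 == p0) is labelled 0.
    """
    return 1 if p12 > p0 else 0


# Hypothetical (issue-date price, realised price after 12 months) pairs.
prices = [(100.0, 112.0), (50.0, 47.5), (80.0, 80.0), (20.0, 26.0)]
labels = [direction_label(p0, p12) for p0, p12 in prices]
share_up = sum(labels) / len(labels)  # class balance of the target
```

Checking the class balance in this way is a quick guard against a classifier that scores well simply by predicting the majority class.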

FIGURE 1. - Class distribution of analyst directional prediction and realised stock directional movement.

FIGURE 2. - Proposed framework for stock price directional prediction.

B. Research Design

Notably, the starting period of the newly generated features was 12 months after the starting period of the sample data. This is because generating these scores for analysts requires forecast accuracy data from previous reports, which prevents any look-ahead bias. From the definitions in Section II-A, we generated 14 input features based on the FTSE/JSE All-Share sell-side data as inputs to the machine learning models, which we discuss in the next section.

In summary, our proposed workflow is depicted in Figure 2.

C. Performance Metrics

Specific performance metrics were used to measure the performance of the classification algorithm. We used classification accuracy and precision scores to evaluate the performance of respective machine learning models in predicting stock price directional movement. The classification accuracy of a model is the ratio of the number of correct predictions made to the total number of predictions made, similar to equation (1).

The precision score is the ratio between the true positive predictions and the total positive predictions, given by \begin{equation*} \text {Precision Score}=\frac {TP}{TP + FP} \tag{11}\end{equation*}
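Equation (11) translates directly into code. In this sketch the function name is hypothetical, and returning 0.0 when the model makes no positive predictions is an assumed convention rather than something stated in the text.

```python
def precision_score(tp, fp):
    """Equation (11): share of upward predictions that were correct."""
    if tp + fp == 0:
        return 0.0  # assumed convention when nothing is predicted positive
    return tp / (tp + fp)


# Hypothetical counts: 30 correct and 10 incorrect upward predictions.
precision = precision_score(tp=30, fp=10)
```

Precision complements accuracy here because the classes are imbalanced towards upward moves, so a model could reach a high accuracy while still issuing many false upward calls.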

SECTION III.

Machine Learning Algorithms

To predict the directional movement of stock prices based on the features discussed, we used machine learning algorithms. These algorithms have become popular with the increase in computing power and data availability. We propose and design a feature-generating process using sell-side analyst data to predict the directional movement of stocks.

A. Random Forest (RF)

Random forest is a machine learning algorithm that uses an ensemble model that combines several base decision trees to produce an optimal predictive model and make point or classification predictions [27]. Decision trees are models with a high variance and low bias. As the number of decision trees increases, the random forest reduces this high variance by keeping the individual decision trees uncorrelated and taking the average class prediction across the trees, while the bias remains constant. In each training step, samples from the training set are selected randomly with replacement, a subset of features is selected randomly, and whichever feature provides the best split or decision is used. The best split or branching of the tree at a node is given by the feature that yields the lowest Gini index in equation (12). \begin{equation*} \text {Gini Index} = 1-\sum _{i=1}^{C}(f_{i})^{2} \tag{12}\end{equation*}

where C is the number of classes and f_{i} is the frequency of label i on a node.
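To make equation (12) concrete, the sketch below computes the Gini index of a node and the size-weighted Gini index of the two child nodes produced by a threshold split. Weighting the children by their size is the usual decision-tree convention and is an assumption here, as are the function names and the toy data.

```python
from collections import Counter


def gini_index(labels):
    """Equation (12): 1 minus the sum of squared class frequencies on a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(feature_values, labels, threshold):
    """Size-weighted Gini index of the children after splitting at `threshold`."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini_index(left) + len(right) / n * gini_index(right)


# Hypothetical feature values with upward (1) and downward (0) labels.
feature = [0.1, 0.2, 0.8, 0.9]
labels = [1, 1, 0, 0]
parent = gini_index(labels)                   # evenly mixed node
clean_split = split_gini(feature, labels, 0.5)
poor_split = split_gini(feature, labels, 0.15)
```

The threshold at 0.5 separates the classes perfectly and so has the lower index, which is why a tree would prefer it over the threshold at 0.15.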

In our study, the stock price movement random forest is a classifier consisting of an ensemble of decision tree-structured classifiers \{h(x, \theta _{k}), k = 1,\ldots \}, where \{\theta _{k}\} are independent identically distributed random vectors. Each decision tree predicts a unit class of either upward or downward movement from the sell-side analyst report features, and the forest takes the average class prediction. The forest consisted of 100 decision trees with no maximum depth specified.

B. Extreme Gradient Boosting (XGB)

State-of-the-art machine learning techniques for classification or regression tasks include extreme gradient boosting, an implementation of the gradient boosting approach first introduced by Friedman (2001) [28]. Gradient boosting is an optimisation problem, where the goal is to minimise the model’s loss function by adding weak learners using gradient descent. During the learning process, decision trees are added at each stage, and the existing trees in the ensemble are not replaced. The contribution of the weak learner to the ensemble is based on the gradient descent optimisation process performed on each weak learner. The calculated contribution of each tree is then based on minimising the overall error of the composite learner.

In our experiment, the inputs x^{p}_{i} and targets y_{i} form the training set \{(x^{p}_{i},y_{i})\}_{i=1}^{n}, with a differentiable loss function L(y,F(x)), where F(x) is the function mapping x^{p} to y, built up from weak learners within the tree ensemble over m iterations. The optimisation problem is defined as follows \begin{align*} \gamma _{jm}=&{arg\,min}_{\gamma }\sum _{x_{i}\in R_{jm}}L(y_{i},F_{m-1}(x_{i})+ \gamma) \tag{13}\\ F_{m}(x)=&F_{m-1}(x) + \sum _{j=1}^{J_{m}}\gamma _{jm}1_{R_{jm}}(x)\tag{14}\end{align*}

where R_{jm} is the j-th of the J_{m} regions (leaves) of the tree added at iteration m, and a separate optimal value \gamma _{jm} is chosen for each of the tree’s regions, instead of a single \gamma _{m}. The XGB model has the benefit of better interpretability, in that we can estimate the relative importance of input features in predicting stock price directional movement. The number of decision trees to be boosted was 950 with a maximum depth of 4 for each tree.
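A single boosting iteration from equations (13) and (14) can be sketched in a few lines. The example assumes a squared-error loss, for which the optimal value in each region is the mean residual; the loss actually used in the study is not stated, and the XGBoost library itself adds regularisation and second-order terms that are not shown. The function names, region assignments and data are hypothetical.

```python
def leaf_values(y, f_prev, regions):
    """Equation (13) under squared-error loss: mean residual per region."""
    gammas = {}
    for j in set(regions):
        residuals = [yi - fi for yi, fi, r in zip(y, f_prev, regions) if r == j]
        gammas[j] = sum(residuals) / len(residuals)
    return gammas


def boosting_update(f_prev, regions, gammas):
    """Equation (14): add each region's value to the previous prediction."""
    return [fi + gammas[r] for fi, r in zip(f_prev, regions)]


# Hypothetical targets, previous-stage predictions, and the leaf (region)
# index that the newly fitted tree assigns to each observation.
y = [1.0, 1.0, 0.0, 0.0]
f_prev = [0.5, 0.5, 0.5, 0.5]
regions = [0, 0, 1, 1]

gammas = leaf_values(y, f_prev, regions)
f_new = boosting_update(f_prev, regions, gammas)
```

Because each region receives its own value, the update moves the first two predictions up and the last two down in a single step, which a single shared value could not do.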

C. Deep Neural Networks (DNN)

We also model stock price directional movement using deep neural networks. These are artificial neural networks (ANNs) with several hidden layers between the input and output layers. ANNs are machine learning algorithms inspired by the human brain that learn from data without explicit programming.

The deep neural network used to classify the directional movement of stock prices has four layers of interconnected neuron units, through which the data are transformed. The input layer, with 14 neurons, represents the generated input features and market volatility features described earlier. The hidden layers have a ReLU activation function, whereas the output layer uses a sigmoid function. The first hidden layer has a dimension of 7, with a total of 105 parameters. The second hidden layer has a dimension of 3, with a total of 24 parameters. The output layer with the sigmoid function has a total of 4 parameters. The Adam optimiser was used to update all 133 trainable parameters. The objective function is a binary cross-entropy loss function, which reduces the uncertainty of the distribution of upward or downward movements in stock prices. Using different input data, such as fundamental company data, researchers have shown that ANNs can successfully classify the directional movement of stock prices with 72% accuracy [29].
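The parameter counts quoted above can be checked with a short calculation, since a fully connected layer has one weight per input-output pair plus one bias per output unit. The sketch also includes a plain forward pass with ReLU hidden layers and a sigmoid output. It is an illustration of the described 14-7-3-1 architecture with zero-valued placeholder weights, not the trained model, and all function names are hypothetical.

```python
import math


def dense_parameter_counts(layer_sizes):
    """Weights plus biases for each fully connected layer."""
    return [n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


def zero_layers(layer_sizes):
    """Placeholder (weights, biases) per layer; weights are n_out rows of n_in."""
    return [
        ([[0.0] * n_in for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def forward(x, layers):
    """Forward pass: ReLU on hidden layers, sigmoid on the output layer."""
    activation = x
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            activation = [max(0.0, value) for value in z]
        else:
            activation = [1.0 / (1.0 + math.exp(-value)) for value in z]
    return activation


sizes = [14, 7, 3, 1]                      # input, two hidden layers, output
counts = dense_parameter_counts(sizes)     # per-layer parameter counts
total = sum(counts)
output = forward([0.1] * 14, zero_layers(sizes))
```

The counts come out as 14 × 7 + 7 = 105, 7 × 3 + 3 = 24 and 3 × 1 + 1 = 4, which sum to the 133 trainable parameters stated in the text.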

D. Logistic Regression (LR)

We modelled the bidirectional accuracy of the AI analyst using a logistic regression (LR). It is a parametric classification model based on a sigmoid function and its properties [30]–[32]. A weighted sum of the input features described in section II-A is passed through the sigmoid function to generate probabilities between 0 and 1 using equation (15). \begin{equation*} p(\textbf {x}) = \frac {1}{1+e^{-(\beta _{0}+\beta _{1}x_{1}+\beta _{2}x_{2}+\ldots +\beta _{m}x_{m})}} \tag{15}\end{equation*}

where p is the predicted probability of an observation belonging to a class, \beta _{i} is the parameter for the i-th feature x_{i}, and m is the number of input features.

To calibrate the model, the loss function l is the cross-entropy loss given by \begin{equation*} l({D_{x}}) = -\sum _{i=1}^{N_{obs}} \left [ Y_{i} \ln (p_{i}) +(1-Y_{i}) \ln (1-p_{i}) \right ]\tag{16}\end{equation*}

where {D_{x}} represents the data, N_{obs} represents the number of observations, and p_{i} represents the predicted probability of the i-th observation. In this study, since there are 14 sell-side report features, m=14. The coefficients or parameters \beta _{i} are chosen such that they minimise the cross-entropy loss.
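Equations (15) and (16) are sketched below in plain Python. The coefficient values and observations are hypothetical, and the clipping constant `eps` is an assumption added only to avoid taking the logarithm of zero.

```python
import math


def predict_proba(x, beta0, betas):
    """Equation (15): sigmoid of the weighted sum of the input features."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(y_true, probs, eps=1e-12):
    """Equation (16): negative log-likelihood summed over observations."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical two-feature model and two observations.
beta0, betas = 0.0, [1.0, -1.0]
observations = [[0.5, 0.5], [2.0, 0.0]]
probs = [predict_proba(x, beta0, betas) for x in observations]
loss = cross_entropy([1, 1], probs)
```

In the study the model has 14 features rather than two, and the coefficients would be fitted by minimising this loss rather than set by hand.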

SECTION IV.

Experiment Results

Data normalisation was applied to bring the input features to the same scale: the data were transformed to a range between −1 and 1. We begin with the entire universe after generating the input score features. The resulting data run from 11 November 2005 to 29 June 2018, giving a total of 29935 analyst reports with their respective forecast accuracy scores. We ran four experiments, using either the complete analyst dataset or a subset thereof. Table 2 outlines the experiments conducted as subsets of the complete universe.
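The text does not name the normalisation method, so the sketch below assumes simple min-max scaling of each feature to the range −1 to 1. The function name and sample values are hypothetical.

```python
def scale_to_unit_range(values):
    """Min-max scale a feature column to [-1, 1] (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant column: map everything to 0
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Hypothetical analyst coverage counts for three reports.
scaled = scale_to_unit_range([10.0, 15.0, 20.0])
```

In practice the minimum and maximum would typically be taken from the training set only and then reused for the validation set, so that no information from the validation data leaks into the features.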

TABLE 2 Experiment Definitions

To evaluate the performance of the model, the data were split 80:20 into training and validation sets. We used accuracy and precision scores to evaluate the performance of each machine learning model in predicting the directional movement of stock prices. Table 3 presents the results.

TABLE 3 Accuracy Score Results

Next, we examined the performance of the model using the precision scores listed in Table 4.

TABLE 4 Precision Score Results

Table 3 shows that the machine learning algorithms performed better than individual analyst predictions for directional movement. The RF and XGB models also outperformed the deep neural network. We believe that the tree-based models outperform because some of the input features are categorical. Table 3 also provides evidence that the accuracy of directional prediction increases with analyst specialisation within sectors. RF showed the highest prediction accuracy of 81.78% in experiment 2, whereas its lowest accuracy score was in experiment 1. The DNN showed a similar differential performance between experiments. The LR showed the lowest performance among the machine learning models, with its highest accuracy score in experiment 3. Analyst consensus performance showed the highest accuracy in experiment 3, whereas the lowest performance was in experiment 2. We believe that this is because resource stocks are more difficult to forecast than financials because of the relatively higher volatility of resource stocks. However, it is important to note that the RF model performs best with resource stocks, indicating the benefits of using fused sell-side analysts’ features. Table 4 shows the precision score results, which were similar to the respective accuracy scores across all the experiments.

SECTION V.

Feature Importance

We explore the feature importance for XGB, RF and logistic regression when generating bidirectional stock price movement predictions. In figure 3, the feature importance highlights Top40CSV, AC and P_{0} as strong features for directional prediction in experiment 1 under both XGB and RF, while LR shows MES and MAS as important parameters for positive and negative class predictions respectively. In LR, only firm, rating, \frac {TP}{P_{0}} and AC were statistically significant (\alpha = 5\%). Ahmed et al. [11] also find that volatility is a pivotal driver of directional accuracy. This is as expected because, given a specific mean return, volatility will drive the dynamics of whether the expected return is negative or positive. We also found that analyst coverage is a vital feature in most experiments under both the RF and XGB models, but not under LR. The different results between experiments also show that the increase in industry coverage had spillover effects on analysts’ forecast accuracy [33], resulting in analyst coverage having a higher importance score and, hence, higher prediction accuracy scores. In figures 4 and 5, the rolling score features are more important in experiments 2 and 3 under both XGB and RF. The rolling accuracy scores under XGB and RF show MAS and PAS with the highest feature importance in most experiments, whereas analysts’ stock ratings are the lowest in most experiments. This indicates that the framework to differentiate between analyst reports using rolling features has added benefits. We expected the PTM feature to have a higher importance score in most experiments because the earnings momentum factor influences changes in analysts’ recommendations or target prices [34].

FIGURE 3. - Feature importance for experiment 1.

FIGURE 4. - Feature importance for experiment 2.

FIGURE 5. - Feature importance for experiment 3.

FIGURE 6. - Feature importance for experiment 4.

SECTION VI.

Conclusion

There is increasing evidence that analysts’ reports are used in different investment processes because of their market influence. We conducted a study to build a generic fused AI analyst to generate directional predictions of stock price movements based on sell-side report outputs. The results show that a generic AI analyst performs better than individual analysts’ directional predictions, with accuracy for the generic fused AI analyst between 75% and 79% and individual analyst accuracy between 54% and 57%, similar to the findings of Ahmed et al. [11]. The precision score showed a similar performance.

Using machine learning algorithms, we have demonstrated that sell-side report outputs including P_{0}, AC and TP, together with the rolling scores PAS, MAS and DAS, are strong features for directional prediction of stock prices. It appears that the machine learning algorithms use P_{0} to establish a reference point while using the other features to predict directional movement.

Possible investment strategies can be explored using AI analyst predictions to determine whether there are other factors or alpha contributions beyond standard factor returns within the FTSE/JSE All-Share universe. We also find that features such as volatility and analyst coverage show heightened importance in all the experiments. It will be worthwhile to explore this relationship because one would suspect that analyst coverage increases accuracy in earnings or price predictions and, hence, lowers the volatility of the stock.
