CBITS: Crypto BERT Incorporated Trading System

Most textual analysis-based trading approaches in cryptocurrency (crypto) rely on lexical, rule-based methods for extracting news sentiment. Furthermore, general-purpose language models (LMs) are not always suitable for the crypto domain because of jargon that general-purpose texts do not cover. This study asks whether a trading system can profit by effectively combining an LM-derived news sentiment score with a chart score in BTC trading, focusing on how much each score contributes to the system's profit. We introduce CBITS (Cryptocurrency BERT Incorporated Trading System), built on pre-trained LMs for Korean crypto sentiment analysis, to aid Bitcoin (BTC) trading models. We pre-trained crypto-specific LMs based on transformer encoder architectures. Along with these pre-trained LMs, we present the custom fine-tuning dataset used to train our BTC sentiment classifier, and show that using sentiment scores alongside BTC chart data boosts the performance of BTC trading models and enables a market-neutral trading strategy.


I. INTRODUCTION
Since the advent of the cryptocurrency (crypto) market, which is now a trillion-dollar market as of 2022, Bitcoin (BTC) [29] has attracted attention in the research community due to its unconventionally volatile price fluctuations and unpredictability. Numerous works have been published to address the challenge of predicting BTC price movements. It is well known in finance that news sentiment helps predict the price fluctuations of financial assets [13], [27]. For capturing crucial trade signals from news articles in an automated manner, this begs the question: why are there no publicly available language models (LMs) for the crypto and blockchain field? Since the emergence of BTC, the crypto market has grown quickly, and many financial investors have been trying to gain an advantage in the market using statistical [16] as well as machine learning [4] methodologies. In particular, methods incorporating textual analysis [1], [31] for crypto price prediction began to emerge. (The associate editor coordinating the review of this manuscript and approving it for publication was Rahim Rahmani.)
Unlike the financial field, which has a domain-adapted pre-trained LM, textual analysis in the crypto domain has mainly been carried out through lexical and rule-based methods [19], [20], [33]. Following the previous success in financial textual analysis achieved by FinBERT [2], we propose CBITS: Cryptocurrency BERT Incorporated Trading System, based on pre-trained LMs applicable to the Korean crypto and blockchain domain. In this research, we focus on computing crypto news sentiment scores and use them to enhance the performance of our BTC trading model.
The Korean language, which is agglutinative in its morphology, is challenging due to its intermediate position between isolating and inflectional languages [18]. An eojeol, the spacing unit, consists of one or more morphemes. The first challenge is that the meaning of an eojeol can vary depending on its grammatical morpheme. For example, the meaning of the noun ' (he)' can change depending on the grammatical morpheme postpended to it, as in ' (he is)' or ' (to him)'. The second challenge is that a syllable, which consists of a consonant and a vowel, can be represented differently depending on the meaning of the morpheme. For example, ' ', composed of ' ' and ' ', changes in meaning to ' ' ('cow') through a variation in the vowel. For these reasons, a domain-specific LM should be designed differently from a general LM, and we built our pre-trained LMs with SentencePiece [21], a language-independent subword tokenization algorithm that requires no language-specific processing. We collected various Korean crypto datasets to pre-train three transformer-based [35] LMs: BERT [11], RoBERTa [26], and DeBERTa [14]. We also created a custom fine-tuning dataset to train our pre-trained LMs. Our fine-tuning corpus is designed to let CBITS learn how a news article may affect BTC prices.
The sentiment classification task is not simply computing the polarity of a news article; it is focused on how the article may affect the movement of BTC prices. Figure 1 gives an overview of CBITS, which combines the sentiment score and the chart score in the BTC trading system. We designed a trading model for the BTC/Tether (USDT) perpetual derivatives market, motivated by the classification approach described in [40], and supplemented this trading bot with news sentiment features from our fine-tuned LMs to show that sentiment scores from pre-trained crypto LMs boost the bot's performance. We also show that TabNet [3] is an optimal choice for the BTC trading model by experimentally verifying that it produces the most robust results among our candidate models. Our trading models were purposely designed to trade in the derivatives market because we wanted a market-neutral strategy (i.e., a model that can profit in both bullish and bearish markets). The contributions are summarized as follows:
1) We propose a novel Korean crypto news sentiment dataset specifically tailored for BTC. News is labeled positive, negative, or neutral depending on its effect on BTC price movements.
2) We design Korean crypto domain-specific pre-trained LMs that are fine-tuned on our dataset.
3) We propose trading models for trading BTC in the derivatives market (BTC/USDT perpetual).
4) We demonstrate the effectiveness of CBITS in enhancing the performance of our BTC trading models.

II. RELATED WORK
Previous studies have incorporated deep learning approaches into textual analysis; a notable example in finance is the state-augmented reinforcement learning (RL) framework [28], [39]. The state augmentation incorporates news sentiment into an RL framework by training a hierarchical attention network [38] with three different word embedding techniques, and the authors showed that this framework beats buy-and-hold as well as other online portfolio selection methods [24].
Recently, Mohan et al. [28] predicted stock prices by feeding a long short-term memory (LSTM) network [15] with stock price data and news headline sentiment from a fine-tuned FinBERT. They showed that the textual information significantly boosted the performance of intraday stock trading models. Other approaches also incorporate sentiment features: Sonkiya et al. [32] leverage news sentiments from BERT and feed these textual features to generative adversarial networks [12], while Chen et al. [7] use contextualized embeddings generated from news headlines for price prediction. Even though other recent approaches include more sophisticated deep learning architectures for crypto textual polarity classification [31], [36], none of them involve pre-trained LMs adapted to the crypto domain.

III. DATA ANNOTATION PROCESS
To improve the quality of annotations, we conduct the annotation process based on expertise in the crypto domain.

A. CRYPTO NEWS CORPUS COLLECTION
To fine-tune the LMs of CBITS, we crawled data from a mainstream Korean crypto news website, Coinness Korea (CK), covering 2018-01-19 14:00 UTC to 2022-04-16 00:00 UTC. CK, which posts about 100 crypto news articles daily, is dedicated to delivering crypto and blockchain trade information around the clock to help actors make better crypto investment decisions.

1) ANNOTATION PROCESS
The dataset was labeled by a professional day trader and was peer-reviewed by another cross-checking day trader. During the labeling process, the annotators were provided with sentiment classification results from FinBERT as well as the information on bear/bull market votes from the CK website for reference. Furthermore, the news data was not analyzed in isolation, but the annotators were instructed to observe both the chart movements and the news together for labeling. Along with the additional information, the annotators adhered to the annotation guidelines when tagging news sentiment labels. Details of the guidelines are described in Section III-A2.

2) ANNOTATION GUIDELINES
• Because our trading bot will trade BTC in the derivatives market, the news sentiments are labeled according to how this particular news may affect BTC prices. So if the news article talks about a crypto coin that is deemed to not affect BTC whatsoever, even if the general sentiment of that news is positive or negative, that news is labeled neutral. In other words, the sentiments are focused on BTC.
• Price increases, bullish news, and investment attraction for large crypto coins (up to the top 15 by market capitalization) are also reflected as good news for BTC and given a positive label. However, if BTC dominance dropped sharply at that point, a negative label is assigned instead.
• News reporting that large amounts of BTC (typically in the tens to a couple hundred million dollars) are deposited to an exchange is given a negative label, because such deposits usually mean that many people are willing to either sell or short BTC. However, if the deposit amount is small or appears to be an inflow of stablecoins, we give it a neutral label.
• Transfers of crypto from large exchanges (e.g., Binance, Bybit (https://www.bybit.com/)) to an anonymous wallet, or from one anonymous wallet to another, are labeled neutral.
• Most of Elon Musk's provocative tweets are labeled positive, as Musk has never led a decline in prices, though we believe his influence over the market is gradually weakening over time.
• Aside from Musk, opinions from other influencers or public figures are labeled neutral. However, statements from the U.S. Senate or Fed officials are deemed important and are labeled either positive or negative.
• News about a company or a country using or investing in a crypto is generally labeled positive. However, if the company is not well known or is deemed insignificant, then we give the news a neutral label.
• Breaking news about the number of times BTC was mentioned/tweeted is carefully analyzed together with the chart data and given either a positive or negative label. A large number of Twitter mentions for a particular crypto usually indicates a significant coming fluctuation for that crypto.
• A rising price index, rising oil prices, rising gold prices, and rising government bond yields are major causes of NASDAQ declines. Since BTC and NASDAQ tend to be coupled, we give such news articles a negative label.
• News about small investment attractions is labeled neutral, whereas large fund and institutional investment attractions are given positive labels.
• Any small-scale investment attraction based on large coins (top 15 market cap) is given a neutral label.
• News that mentions that NFTs or metaverse platforms reached high revenue or trading volume is given a positive label.
• Any indication of decreasing BTC dominance is given a negative label.
• News related to the introduction or use of BTC in any country in the world is generally given positive labels.
• Decrease in BTC hash rate is considered negative, whereas an increase in hash rate is considered positive.
• News about a country's economic collapse is considered negative.
• The lockup release of large cryptocurrencies is in general considered neutral.
• News about token burns of cryptocurrencies with a large market cap is generally considered positive.

B. SILVER DATASET
The annotated gold dataset is described in Figure 2. Label 0 is positive, label 1 is negative, and label 2 is neutral. Approximately 77% of the labels are neutral, 15% positive, and 8% negative. Due to this label imbalance, we leveraged the best fine-tuned LM described in Section V-B to extract only positive and negative news from our unlabeled news data from 2018-01-19 to 2022-04-16. CBITS labeled 18,005 news articles as either positive or negative, and this silver dataset was also cross-checked by the annotators. Once the true positives and negatives were selected from the silver dataset, they were combined with the gold dataset for the statistical testing in Section IV-A.

C. CHART DATA
We collect 4-hour interval BTC/USDT chart trading data from Binance. The features we used are as follows:
• Differencing: This is simply the ratio of raw chart features across different periods. Since we are using 4-hour interval chart data, the difference between a single period (e.g., t and t − 1) is 4 hours. The first differencing of close prices is close_t / close_{t−1}; in general, the K-th differencing of close prices is close_t / close_{t−K}. We carried out this differencing procedure for open, high, low, close, and volume, using K = 1, 2, 3, 4, 5.
• EBSW: The even better sinewave (EBSW) is a variation of the Hilbert sine wave; it is an indicator that informs the model about bullish and bearish price cycles. Typically, an EBSW value greater than 0.85 suggests that the asset is overbought, whereas a value smaller than −0.85 suggests that it is oversold. The pandas-ta library (https://github.com/twopirllc/pandas-ta) was used for the EBSW calculation.
• Chaikin Money Flow: The Chaikin money flow (CMF) is an indicator used to monitor both accumulation and distribution of an asset over a specified period. Its value ranges from −1 to +1, and crosses above or below 0 can be used to identify buying and selling momentum. The pandas-ta library was used for the CMF calculation, with the default period of 20.
• VWAP/Open: Ratio of the volume weighted average price (VWAP) to the open price. Calculating VWAP first involves the typical price TP, which is simply the average of the low, high, and close prices: TP_t = (high_t + low_t + close_t) / 3.
VWAP at time t is then the volume-weighted average of the typical prices: VWAP_t = Σ_i (TP_i × volume_i) / Σ_i volume_i.
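As a concrete illustration, the three chart features above can be sketched in plain Python. This is a hedged sketch rather than the authors' code: the function and variable names are our own, and the VWAP here is anchored cumulatively from the start of the series, since the paper does not state an anchoring window.

```python
def kth_differencing(series, k):
    """K-th differencing as a ratio: series[t] / series[t - k].

    Returns None for the first k entries, where no lag is available.
    """
    return [None if t < k else series[t] / series[t - k]
            for t in range(len(series))]

def chaikin_money_flow(high, low, close, volume, length=20):
    """Chaikin Money Flow over a rolling window (default period 20)."""
    mfv = []
    for h, l, c, v in zip(high, low, close, volume):
        # Money flow multiplier: +1 when close == high, -1 when close == low
        mfm = ((c - l) - (h - c)) / (h - l) if h != l else 0.0
        mfv.append(mfm * v)
    out = []
    for t in range(len(close)):
        if t + 1 < length:
            out.append(None)  # not enough history yet
        else:
            w = slice(t + 1 - length, t + 1)
            out.append(sum(mfv[w]) / sum(volume[w]))
    return out

def vwap_over_open(high, low, close, open_, volume):
    """Ratio of the cumulative volume-weighted typical price to the open."""
    cum_pv = cum_v = 0.0
    out = []
    for h, l, c, o, v in zip(high, low, close, open_, volume):
        tp = (h + l + c) / 3.0  # typical price
        cum_pv += tp * v
        cum_v += v
        out.append((cum_pv / cum_v) / o)
    return out
```

In practice the paper uses pandas-ta for EBSW and CMF; the hand-rolled versions above only show what the features measure.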

IV. METHODOLOGY
In Section IV-A, we describe the strong correlation between news sentiments and BTC returns. Motivated by this fact, we present a classification based approach for our BTC trading model in Section IV-B.

A. SENTIMENT SCORE AND BTC RETURNS
A pervasive problem in trading research is that features are fed to a machine learning algorithm, and the model structure is modified repeatedly until the desired backtest result appears. To avoid this, we first perform a Kolmogorov-Smirnov (KS) test to show that BTC's price changes right after a given sentiment score are in fact distributed differently from the price changes right after a neutral or no sentiment score, establishing the statistical significance of the correlation between news sentiment scores and BTC's price changes [9], [10].
Here, we specify samples A_s and B, where sample A_s is the set of BTC's price changes right after sentiment score s; s = 0 represents positive, s = 1 negative, and s = 2 neutral, following the labels in Figure 2. Sample B is the set of price changes right after a neutral or no sentiment score. The reasoning is intuitive: if news sentiment scores had zero impact on BTC's price changes, the distribution of sample A_s would be similar to that of sample B, i.e., the price changes right after the scored news would be no different from those after neutral or no sentiment scores. Applying the KS test, we show that the probability of observing samples A_s and B, given that they are drawn from the same distribution, is statistically low.
To compute the KS test statistic for samples A and B, we find their respective empirical cumulative distribution functions (ECDFs) by binning each sample and counting the empirical number of values less than or equal to a given value:

F_n(a) = (number of the sample's elements ≤ a) / n    (1)

where n is the total size of the sample. To apply the ECDF in practice, we first have to find appropriate bins into which the sample's elements can be ''binned''. Although the bin size can be arbitrarily small, we apply the Freedman-Diaconis rule, commonly used when building a histogram as a density estimator, as a rule of thumb for choosing the bin size. Given discrete empirical measurements x, the bin width under this rule is

h = 2 · IQR(x) / n^(1/3)

where IQR(x) is the interquartile range of the data and n is the number of elements. The number of bins is then ⌈(max(x) − min(x)) / h⌉. Upon finding the bins, we iterate through each bin and count the number of examples less than or equal to the bin's upper value, giving the ECDFs of sample A and sample B, denoted ECDF_{A,b} and ECDF_{B,b} respectively (e.g., ECDF_{A,5} is sample A's ECDF value at the 5th bin). The KS test is based on the intuitive idea that if two samples are drawn from the same distribution, their ECDFs must be identical, so the ''distance'' between the two ECDFs must be small, with enough data making it arbitrarily close to zero. The KS test statistic for the two empirical cumulative distribution functions is

D = max_b | ECDF_{A,b} − ECDF_{B,b} |.

We then apply a permutation test: samples A and B are combined and permuted, randomly split to create two new samples A′ and B′, and a Monte Carlo simulation estimates the distribution of the ''distance'' between the newly sampled A′ and B′. The idea behind the permutation test is that if samples A and B are in fact drawn from the same distribution, the initially observed ECDF distance should be similar to that of the permuted and newly sampled A′ and B′.
Based on this simulation, we can then calculate the p-value given that samples A and B are in fact sampled from the same distribution.
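The ECDF, KS statistic, and permutation test described above can be sketched in plain Python. This is an illustrative implementation rather than the paper's code: for simplicity the ECDFs are evaluated at the pooled sample values rather than at Freedman-Diaconis bins, which gives the exact two-sample KS distance.

```python
import random

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at each point of sorted `grid`."""
    n = len(sample)
    s = sorted(sample)
    out, i = [], 0
    for g in grid:
        while i < n and s[i] <= g:
            i += 1
        out.append(i / n)
    return out

def ks_statistic(a, b):
    """Maximum distance between the two empirical CDFs."""
    grid = sorted(set(a) | set(b))
    fa, fb = ecdf(a, grid), ecdf(b, grid)
    return max(abs(x - y) for x, y in zip(fa, fb))

def permutation_pvalue(a, b, trials=2000, seed=0):
    """Monte Carlo estimate of P(D' >= D) under the null that a and b
    come from the same distribution."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            count += 1
    return count / trials
```

In production one would use a library routine (e.g., a two-sample KS test from SciPy) rather than this sketch.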
Before directly applying the KS test to samples A and B, we define the range of t, where t = 1 represents hourly prices and t = 2 represents every other hour's prices. The reason for conducting the KS test not only for different sentiment scores (positive or negative) but also over a range of t is that financial prices, such as BTC's, are non-stationary and have a low signal-to-noise ratio [34]. Thus, if a trading strategy were based on a small value of t (e.g., t = 1 minute), the noise in the data might overwhelm the signal in the time series and therefore the predictive impact that crypto news has on BTC's returns. We perform the KS test and present the probability of observing the KS test statistic for each data frequency t and for each sentiment score s = 0 (positive) and s = 1 (negative) in Table 1.

B. BTC TRADING SYSTEM

1) CLASSIFICATION BASED BTC TRADING MODEL
We tackle BTC trading as a classification problem with a three-class target variable. Let close_t be the close price of the current 4-hour interval, and define the maximum upward and downward moves within the next interval as u_{t+1} = high_{t+1} / close_t − 1 and v_{t+1} = low_{t+1} / close_t − 1. The labels are defined as follows:
• c_0: BTC price rises by at least 0.75% within the next 4 hours, i.e., u_{t+1} ≥ 0.0075.
• c_1: BTC price drops by at least 0.75% within the next 4 hours, i.e., v_{t+1} ≤ −0.0075.
• c_2: the BTC price change within the next 4 hours is less than 0.75%, i.e., u_{t+1} < 0.0075 and v_{t+1} > −0.0075.
We set the threshold at 0.0075 to account for trade commissions and the bid-ask spread in practice. Both u_{t+1} ≥ 0.0075 and v_{t+1} ≤ −0.0075 can occur in the same interval; we observed this in approximately 14.5% of the data. In such cases, where both the short and the long position would yield at least a 0.75% profit, we favored long over short.
Class distributions are 51.5% for c_0, 35.7% for c_1, and 12.8% for c_2. When both c_0 and c_1 occur we favored c_0, which explains why most of the labels are c_0. Class c_2 has the fewest occurrences, which suggests that it is quite common for BTC to fluctuate by at least 0.75% (in either direction) within the next 4 hours.

TABLE 2. Crypto-based pre-trained LMs: BERT, DeBERTa, RoBERTa. The hyperparameters in the middle refer to #ML = max sequence length, #L = learning rate, #U = the number of total updates, and #P = perplexity. The right side shows the scores of the fine-tuned LMs on the crypto news sentiment corpus, which was annotated by a professional day trader as described in Section III-A1.
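A minimal sketch of the labeling rule, under our reading that u_{t+1} and v_{t+1} are the next candle's high and low returns relative to the current close (the paper's exact formula is not reproduced, so this is an interpretation):

```python
def label_interval(close_t, high_next, low_next, threshold=0.0075):
    """Assign class c0/c1/c2 for one 4-hour step.

    u = high_next / close_t - 1  (max upward move in the next interval)
    v = low_next  / close_t - 1  (max downward move in the next interval)
    When both thresholds are hit (the ~14.5% case), long (c0) is favored.
    """
    u = high_next / close_t - 1.0
    v = low_next / close_t - 1.0
    if u >= threshold:   # checked first, so c0 wins when both hit
        return "c0"
    if v <= -threshold:
        return "c1"
    return "c2"
```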

2) COMPARISON OF CHART ONLY MODELS
We considered three candidate models to be used for our classification based trading bot.
• LSTM: The long short term memory [15] network is a type of recurrent neural network (RNN) that aims to solve the long-term dependency problem of RNN via gating mechanisms.
• XGBoost: eXtreme Gradient Boosting [8] is a scalable and highly accurate implementation of gradient boosting.
• TabNet: TabNet [3] is an attention-based neural network architecture specifically designed for tabular data. It uses soft feature selection to focus on the important features, accomplished via a sequential multi-step decision mechanism.
We purposefully did not include well-known attention-based time series forecasting models such as the temporal fusion transformer [25] and the Informer [41], since we are dealing with tabular data and the dataset is not large enough to train complicated neural network architectures. To account for the class imbalance, we use balanced class weights when training all three models.
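The balanced class weights mentioned above follow the standard formula n_samples / (n_classes × count_c), as in scikit-learn's "balanced" mode; a minimal stdlib sketch:

```python
def balanced_class_weights(labels):
    """Weight each class c as n_samples / (n_classes * count_c),
    so rare classes receive proportionally larger weights."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    k = len(counts)
    return {c: n / (k * m) for c, m in counts.items()}
```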

V. EXPERIMENT

A. EXPERIMENTAL PRELIMINARIES

1) OPTIMAL PRE-TRAINED LMs FOR SENTIMENT CLASSIFICATION
We begin with a practical preliminary step to identify the optimal fine-tuned LM for the sentiment classification task. As shown in Table 2, we pre-trained three crypto LMs: BERT, RoBERTa, and DeBERTa. Perplexity (PPL), an intrinsic evaluation metric, represents the degree of the model's confusion and is primarily used to evaluate LMs. We also ran a preliminary comparison of sentiment classification effectiveness by fine-tuning the three crypto pre-trained LMs alongside two multilingual LMs: mBERT and XLM-RoBERTa.
As a result, among the five LMs, CryptoRoBERTa shows the best performance on the crypto news sentiment analysis task while also showing a low PPL. In other words, CryptoRoBERTa is the optimal model for computing the news sentiment scores used to aid our BTC trading models, and it is the LM we use in CBITS.

2) PRE-TRAINING CORPUS
To design LMs optimized for the crypto domain, the data used for pre-training CBITS consisted mostly of Korean crypto and blockchain related news. This was intentional: news articles follow a format that is easy to work with and cover the most recent information as well as the technical terms used in the crypto domain. Aside from news sources, texts from crypto blogs, crypto mining community forums, and Wikipedia were used. Overall, we collected 900K texts amounting to 880 MB for pre-training CBITS.
For the crypto sentiment classification task, we used a consistent experimental setup with the following fine-tuning hyperparameters: learning rate ∈ {1e-5, 3e-5, 5e-5}; batch size = 32; and max epochs = 10, except when early stopping occurs. Since model performance may vary with the initialization, we report the average score over the three learning rates and five random initializations.

B. FINE-TUNING CRYPTO SENTIMENT ANALYSIS TASK
The pre-trained crypto LMs were subsequently fine-tuned on our crypto news dataset. To strengthen our experimental procedure, we evaluated with ten-fold cross-validation, i.e., ten runs per model with an 8:1:1 split for training, validation, and testing, respectively. To verify the effectiveness of CBITS, two representative multilingual LMs were also fine-tuned, and our model scored about 3 points higher. The detailed scores are on the right side of Table 2.
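The ten runs with 8:1:1 splits can be realized, for example, by rotating which fold serves as the test and validation set; the paper does not specify the exact fold assignment, so this rotation scheme is one plausible reading:

```python
def rotating_splits(n_examples, n_folds=10):
    """Yield ten (train, val, test) index splits in an 8:1:1 ratio.

    Run i uses fold i as test, fold (i+1) % n_folds as validation, and
    the remaining 8 folds as training data. Leftover examples (when
    n_examples is not divisible by n_folds) go to the last fold.
    """
    idx = list(range(n_examples))
    size = n_examples // n_folds
    folds = [idx[i * size:(i + 1) * size] for i in range(n_folds)]
    folds[-1].extend(idx[n_folds * size:])
    for i in range(n_folds):
        test = folds[i]
        val = folds[(i + 1) % n_folds]
        train = [j for k, f in enumerate(folds)
                 if k not in (i, (i + 1) % n_folds) for j in f]
        yield train, val, test
```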

C. INCORPORATING NEWS SENTIMENT SCORES
Given the performance of the chart-only models in Section IV-B2, we experimented with adding news sentiment scores from our best fine-tuned crypto LM, CryptoRoBERTa. Since the trading bot operates on 4-hour intervals, we collected all CK news uploaded within each 4-hour window, calculated sentiment scores for each article, and added these scores as features. The sentiment scores are fed into the model along with our feature-engineered chart data to output a prediction. We used only positive and negative sentiment scores, treating neutral news as noise. After grouping the CK news into 4-hour intervals, approximately 62.56% of the chart data had news available within its interval. Intervals with no news were given zero positive and negative sentiment scores as inputs.
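The 4-hour aggregation step might look like the following sketch, where news items are bucketed into their interval and positive/negative scores are summed (the paper says the scores are "added"; whether this means summing or averaging is not stated). All names here are our own:

```python
from datetime import datetime, timedelta, timezone

def bucket_start(ts, hours=4):
    """Floor a timestamp to the start of its 4-hour interval."""
    return ts.replace(minute=0, second=0, microsecond=0,
                      hour=(ts.hour // hours) * hours)

def aggregate_scores(news, interval_starts):
    """Sum positive/negative scores of news falling in each interval.

    `news` is a list of (timestamp, pos_score, neg_score) tuples;
    intervals with no news get (0.0, 0.0), mirroring the zero-score
    fallback described in the text.
    """
    agg = {t: [0.0, 0.0] for t in interval_starts}
    for ts, pos, neg in news:
        key = bucket_start(ts)
        if key in agg:
            agg[key][0] += pos
            agg[key][1] += neg
    return {t: tuple(v) for t, v in agg.items()}
```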
The results of adding news sentiment scores are shown in Table 3. The models show performance improvements of +1.37%, +1.96%, and +3.18% in accuracy and +1.20%, +1.39%, and +1.91% in F1 score, respectively, compared to the chart-only models. The larger gains for XGBoost and TabNet are probably due to the fact that they are more suited to problems involving tabular data. XGBoost has the highest accuracy, but TabNet has the highest F1 score, suggesting that TabNet is the more robust model.

D. SEMANTIC SEARCH BASED SENTIMENT SCORE CALCULATION
We observed that some news articles exceed the 512-token length limit of the LMs. This motivated us to use semantic search to find the top K sentences of the content most relevant to the title of the news article, with K = 5, 10.
The model used for calculating semantic similarity was KR-SBERT (https://github.com/snunlp/KR-SBERT), a Siamese network consisting of a pre-trained KR-BERT fine-tuned on the KLUE NLI dataset and augmented with the KorSTS dataset. The top 5 and top 10 sentences most relevant by cosine similarity between the title embedding and the content embeddings were used instead of the entire content for calculating the news sentiment scores. Models trained with these inputs are referred to as RoBERTa top 5 and RoBERTa top 10. The results are shown in Table 4.
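The top-K selection reduces to ranking content sentences by cosine similarity against the title embedding. A minimal sketch, with toy vectors standing in for KR-SBERT embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_sentences(title_emb, sentence_embs, sentences, k=5):
    """Return the k sentences most similar to the title embedding."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(title_emb, sentence_embs[i]),
                    reverse=True)
    return [sentences[i] for i in ranked[:k]]
```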
We observe that the best accuracy and F1 score are achieved by TabNet RoBERTa top 10. In general, the F1 score of all three models improves when using the top 10 semantically similar sentences from the content. Because TabNet shows the most robust results, with the best accuracy and F1 score, we use TabNet as the trading model going forward. The feature importance plot of TabNet RoBERTa top 10 in Figure 3 shows that the largest weights are given to the negative and positive sentiment scores calculated by CBITS, followed by a volume-related feature. This suggests that the sentiment scores play an essential role in the trading model.

E. BACKTEST RESULTS
To verify the performance of the trading model, we conducted a backtest of our TabNet models on the test dataset. Along with these, we also considered some other baseline trading methods for comparison.
• BAH: Buy and Hold strategy. It simply buys BTC and holds it during the entire testing period.
• Monkey Trader: A random agent that chooses long, short, or hold (doing nothing) with uniform probability.
• OLMAR: Online Portfolio Selection with Moving Average Reversion [23]. OLMAR assigns portfolio weights to each asset in the portfolio; in our case, the two assets are cash and BTC. When OLMAR assigns at least 60% of the weight to BTC, we take this as a signal to take a long position; when it places at least 60% of the weight on cash, we take a short position. Otherwise, we hold.
The TabNet models predict which action to take (long, short, hold), and their profits are recorded after every action. When trading with the TabNet models, we assume a 0.75% take profit and no stop loss; no take profits or stop losses are assumed for the other models. Furthermore, commissions of 0.04% are used to simulate trading as a market taker in the Binance USDT market. When calculating the wealth achieved in the backtest, we used 0.08% commissions (twice the actual rate) to account for factors such as the bid-ask spread and slippage that are difficult to simulate. The initial seed money for all models is set to $1000. Figure 4 shows the backtest results (portfolio value) of the various BTC trading models, and Table 5 shows their total percentage profits. Our TabNet models outperform the other trading methodologies by a wide margin, with the best profit of 304.65% achieved by TabNet RoBERTa top 10. All the other trading models produce a negative profit over the testing period.

TABLE 3. Performance of chart-only models and chart + news sentiment score models. The highest score is shown in bold, and the underline indicates the largest gain when using the crypto LMs.
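A simplified sketch of the backtest accounting for the TabNet models, under stated assumptions (entry at the bar's open, exit at take-profit if the bar's range reaches it, otherwise at the bar's close, and a 0.08% fee charged on both entry and exit). This is our illustration, not the authors' backtester:

```python
def backtest_wealth(actions, open_prices, high, low, close,
                    take_profit=0.0075, fee=0.0008, seed=1000.0):
    """Track portfolio value over a sequence of 4-hour bars.

    `actions` contains "long", "short", or "hold" per bar. A position
    is opened at the bar's open and closed at take-profit if reached,
    otherwise at the bar's close. No stop loss, as in the paper.
    """
    wealth = seed
    curve = [seed]
    for act, o, h, l, c in zip(actions, open_prices, high, low, close):
        if act == "hold":
            curve.append(wealth)
            continue
        if act == "long":
            hit = h / o - 1.0 >= take_profit
            ret = take_profit if hit else c / o - 1.0
        else:  # short
            hit = 1.0 - l / o >= take_profit
            ret = take_profit if hit else 1.0 - c / o
        wealth *= (1.0 + ret) * (1.0 - fee) ** 2  # fee on entry and exit
        curve.append(wealth)
    return curve
```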

VI. LIVE RUN RESULTS & LIMITATIONS
As shown in Figure 5, we carried out live-run experiments using our proposed CBITS architecture from 2022/05/26 5:00 p.m. to 2022/06/04 1:00 a.m. The live run was conducted on Bybit's BTCUSDT perpetual market, with the take profit set at 0.75% and the stop loss naively set at 2.0%. The initial seed was approximately $100. We compared the performance of CBITS with the buy-and-hold strategy: CBITS achieved a final profit of +7.926%, a maximum profit of +8.700%, and a maximum drawdown of −1.274%, whereas buy-and-hold resulted in a final profit of −0.527%, a maximum profit of +11.927%, and a maximum drawdown of −7.926%. During this testing period, CBITS outperformed buy-and-hold by a large margin with lower risk. These results also reveal limitations of the current CBITS methodology. Since the take profit is set at 0.75%, CBITS cannot outperform buy-and-hold in extremely bullish markets; similarly, in extremely bearish markets, CBITS cannot outperform a strategy that shorts throughout. A more intricate take profit or stop loss strategy could potentially take our proposed CBITS to the next level.

VII. CONCLUSION AND FUTURE WORK
In this research, we created LMs adapted to the crypto domain to compute crypto news sentiment and showed that these sentiment scores improve the performance of BTC trading models. Our primary focus in this paper was proving the effectiveness of news sentiment, but the trading bot might be improved further by considering other features alongside news sentiment, such as on-chain data [6] and BTC dominance [22]. With more news and chart data, we could experiment with more complicated multi-modal structures such as the work in [5]. Crypto LMs may also be applied to other natural language processing tasks such as detecting pump-and-dump scheme articles, crypto named entity recognition [17], and crypto news clustering based on similarity analysis.

(Gyeongmin Kim, Minsuk Kim, and Byungchul Kim contributed equally to this work.)
GYEONGMIN KIM received the B.S. degree in computer science and information security from Baekseok University, Cheonan, South Korea, in 2017. He is currently pursuing the Ph.D. degree in computer science and engineering with Korea University, Seoul, South Korea. Since 2017, he has been a Researcher with the Natural Language Processing and Artificial Intelligence Laboratory, Korea University. His research interests include natural language processing, multimodal learning, and machine reading comprehension with neural symbolic knowledge. Particularly, his research focuses on how machines can understand like humans.
MINSUK KIM received the B.S. degree in mathematics from Stanford University, Stanford, CA, USA, in 2021. He is currently working as an AI Scientist and an Engineer at MindsLab. His research interests include financial machine learning, deep learning for tabular data, convex optimization, reinforcement learning, pattern matching, and representation learning.
BYUNGCHUL KIM received the B.S. degree in physics from the University of California at Los Angeles, Los Angeles, CA, USA, in 2016. He is currently working as a Chief Technology Officer at HighDev and a Chief Data Officer at Synotex. His work involves architectural design of the research and development pipeline. His research interests include financial machine learning, reinforcement learning, and practical application of machine learning in engineering perspective.
HEUISEOK LIM received the B.S., M.S., and Ph.D. degrees in computer science and engineering from Korea University, Seoul, South Korea, in 1992, 1994, and 1997, respectively. He is currently a Professor with the Department of Computer Science and Engineering, Korea University. His research interests include natural language processing, machine learning, and artificial intelligence. VOLUME 11, 2023