Multi-Agent Deep Reinforcement Learning With Progressive Negative Reward for Cryptocurrency Trading


Schematic diagram of our system overview: (a) data preparation, (b) multi-agent proximal policy optimization (MAPPO), and (c) simulated cryptocurrency market environment.

Abstract:

Recently, reinforcement learning has been applied to cryptocurrencies to make profitable trades. However, cryptocurrency trading is a very challenging task due to the volatility of the market, especially during bearish periods. To address this problem, the existing literature employs single-agent techniques such as deep Q-network (DQN), advantage actor-critic (A2C), and proximal policy optimization (PPO), or their ensembles. Moreover, in the context of cryptocurrencies, the mechanisms for restricting losses during a bearish market are insufficiently robust. Consequently, the performance of reinforcement learning methods for cryptocurrency trading in the existing literature is constrained. To overcome this limitation, we propose a novel cryptocurrency trading method based on multi-agent proximal policy optimization (MAPPO) with a collaborative multi-agent scheme and a local-global reward function that optimizes both the individual and collective performance of the agents. A multi-objective optimization technique and a multi-scale continuous loss (MSCL) reward are used to train the agents with a progressive penalty that discourages consecutive losses of portfolio value. For evaluation, we compared our method against multiple baselines and found that it achieves higher cumulative returns than all of them. The superiority of our method is most evident on the bearish test set, where it is the only method that makes a profit: our method obtains a 2.36% cumulative return, whereas the baseline methods yield negative cumulative returns. Compared to FinRL-Ensemble, a reinforcement learning-based method, our method achieves a 46.05% higher cumulative return on the bullish test set.
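To make the reward design concrete, the following is a minimal sketch of two ideas named in the abstract: a progressive penalty that grows with consecutive portfolio-value losses, and a local-global blend of per-agent and team rewards. The function names, the exponential penalty schedule, and all constants (base_penalty, growth, weight) are illustrative assumptions for exposition, not the authors' MSCL formulation or their MAPPO reward.

import numpy as np

def progressive_loss_penalty(portfolio_values, base_penalty=0.01, growth=2.0):
    """Sketch of a progressive negative reward for consecutive losses.

    Each step whose portfolio value falls below the previous step's value
    incurs a penalty that grows with the length of the current losing
    streak; a profitable step resets the streak. All constants are
    illustrative, not taken from the paper.
    """
    rewards = []
    streak = 0  # number of consecutive losing steps so far
    for prev, curr in zip(portfolio_values[:-1], portfolio_values[1:]):
        step_return = (curr - prev) / prev
        if step_return < 0:
            streak += 1
            # penalty escalates with the losing streak (progressive penalty)
            rewards.append(step_return - base_penalty * growth ** (streak - 1))
        else:
            streak = 0
            rewards.append(step_return)
    return rewards

def local_global_reward(local_rewards, weight=0.5):
    """Blend each agent's own (local) reward with the team average (global)."""
    team_reward = float(np.mean(local_rewards))
    return [weight * r + (1.0 - weight) * team_reward for r in local_rewards]

if __name__ == "__main__":
    values = [100.0, 98.0, 97.0, 95.0, 101.0]  # toy portfolio trajectory
    print(progressive_loss_penalty(values))    # penalties grow over the losing streak
    print(local_global_reward([0.02, -0.01, 0.005]))

In such a scheme, the escalating penalty term discourages a policy from riding out extended drawdowns, while the local-global blend ties each agent's incentive to the collective portfolio outcome.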
Published in: IEEE Access (Volume: 11)
Page(s): 66440 - 66455
Date of Publication: 27 June 2023
Electronic ISSN: 2169-3536
