Abstract:
Flocking formation of unmanned aerial vehicles (UAVs) is an open challenge due to kinematic complexity and uncertainties in complex environments. In this paper, the UAV flocking control problem is formulated as a partially observable Markov decision process (POMDP) and solved by deep reinforcement learning. In particular, we consider a leader-follower configuration, where consensus among all UAVs is used to train a shared control policy, and each UAV performs actions based on the local information it collects. In addition, to avoid collisions among UAVs and guarantee flocking and navigation, the reward function combines a global flocking-maintenance term, a mutual reward, and a collision penalty. We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution to obtain the flocking control policy, using actor-critic networks and a global state-space matrix. The simulation results demonstrate that the trained optimal policy converges to flocking formation without parameter tuning and generalizes well to different UAVs.
Date of Conference: 19-22 November 2021
Date Added to IEEE Xplore: 04 January 2022