A Policy Gradient Based Reinforcement Learning Method for Supply Chain Management | IEEE Conference Publication | IEEE Xplore