Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates | IEEE Conference Publication | IEEE Xplore