HEPPO: Hardware-Efficient Proximal Policy Optimization a Universal Pipelined Architecture for Generalized Advantage Estimation | IEEE Conference Publication | IEEE Xplore