Residual Policy Optimization with Trust Region Constraints: A Learning Framework for Stable and Agile Wheel-Legged Locomotion | IEEE Journals & Magazine | IEEE Xplore