Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay | IEEE Conference Publication | IEEE Xplore