On Improving the Performance of Multi-threaded CUDA Applications with Concurrent Kernel Execution by Kernel Reordering | IEEE Conference Publication | IEEE Xplore