Using Fermi Architecture Knowledge to Speed up CUDA and OpenCL Programs | IEEE Conference Publication | IEEE Xplore