General-purpose computing on graphics processing units (GPGPU), which applies a stream-based computing approach, has come to be regarded as a breakthrough for overcoming the performance bottleneck seen in recent CPU architectures. However, a GPU-based program can incur data transfer overhead when it has recursive I/O, that is, when the output of one iteration becomes the input of the next. During such recursive operation, the output streams are copied to the input streams, and this overhead degrades performance. This paper proposes a method to eliminate the transfer overhead and presents its design and implementation based on CUDA and OpenCL. An experimental evaluation using realistic applications shows that the method eliminates the transfer overhead and exploits the potential performance of the GPU.
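To make the overhead concrete, the following host-side C++ sketch contrasts copying the output buffer back into the input buffer on every iteration with simply exchanging the roles of the two buffers (often called ping-pong buffering). It is an illustration only: the abstract does not say how the proposed method works, so the buffer swap shown here is an assumed, commonly used technique rather than the paper's design. The names `step`, `run_with_copy`, and `run_with_swap` are hypothetical, and `step` merely stands in for a GPU kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU kernel: reads `in`, writes `out`.
void step(const std::vector<float>& in, std::vector<float>& out) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * 2.0f + 1.0f;
    }
}

// Baseline: after every iteration the output stream is copied back
// into the input stream, costing O(n) data movement each time.
std::vector<float> run_with_copy(std::vector<float> input, int iterations) {
    std::vector<float> output(input.size());
    for (int k = 0; k < iterations; ++k) {
        step(input, output);
        input = output;  // element-by-element copy
    }
    return input;
}

// Assumed alternative: exchange the roles of the two buffers instead
// of copying. std::swap on vectors only exchanges internal pointers,
// so no element is moved.
std::vector<float> run_with_swap(std::vector<float> input, int iterations) {
    std::vector<float> output(input.size());
    for (int k = 0; k < iterations; ++k) {
        step(input, output);
        std::swap(input, output);  // O(1): the latest result is now `input`
    }
    return input;
}
```

Both functions return the same values for the same arguments; only the per-iteration data movement differs. On a real device the same idea would apply to the device buffer handles passed as kernel arguments, which is again an assumption about a typical implementation, not a description of the paper's.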