Skip to Main Content
In this paper we evaluate two life science algorithms, namely Needleman-Wunsch sequence alignment and Direct Coulomb Summation, for GPUs. Whereas for Needleman-Wunsch it is difficult to get good performance numbers, Direct Coulomb Summation is particularly suitable for graphics cards. We present several optimization techniques, analyze the theoretical potential of the optimizations with respect to the algorithms, and measure the effect on execution times. We target the recent NVIDIA Fermi architecture to evaluate the performance impacts of novel hardware features like the cache subsystem on optimizing transformations. We compare the execution times of CUDA and OpenCL code versions for Fermi and predecessor models with parallel OpenMP versions executed on the main CPU.