Graphics Processing Units (GPUs) are frequently used to simulate physical and biological systems. These systems are often composed of simple elements that communicate only with their neighbors. In some systems, however, such as large-scale neuronal networks, each element can communicate with any other element in the simulation. In this work, we present an efficient CUDA algorithm that enables this type of communication, even when multiple GPUs are used. We show that the algorithm benefits from the GPU's high memory bandwidth and large number of cores, despite requiring few floating-point operations. We implemented and evaluated this algorithm in a GPU simulator for large-scale neuronal networks. For simulations with 50k neurons and 50M connections, running on a single computer with two graphics boards holding two GPUs each, the communication steps were more than 10 times faster than on a modern quad-core CPU. For the complete neuronal network simulation, execution on the GPUs was nearly 40 times faster than on the CPU.
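The abstract does not describe the algorithm itself. The following is a minimal sequential sketch of what one all-to-all communication step computes. It is not the authors' method. It assumes a hypothetical layout: incoming connections are stored in compressed sparse row (CSR) form, indexed by target neuron, and each neuron's spike from the previous step is stored as a flag. The names `IncomingCSR` and `gatherSynapticInput` are illustrative only. In a CUDA version, each iteration of the outer loop would map to one thread. Gathering inputs per target neuron, instead of scattering spikes per source neuron, avoids write conflicts between threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical incoming-connection table in CSR form, indexed by target neuron.
struct IncomingCSR {
    std::vector<std::size_t> rowStart; // size numNeurons + 1; row i spans [rowStart[i], rowStart[i+1])
    std::vector<int> source;           // presynaptic neuron of each connection
    std::vector<float> weight;         // synaptic weight of each connection
};

// Sums, for every target neuron, the weights of the incoming connections whose
// source neuron fired in the previous step. On a GPU, each outer-loop iteration
// would run in its own thread. Each thread writes only input[target], so no
// atomic operations are needed.
std::vector<float> gatherSynapticInput(const IncomingCSR& conn,
                                       const std::vector<unsigned char>& fired) {
    const std::size_t n = conn.rowStart.size() - 1;
    std::vector<float> input(n, 0.0f);
    for (std::size_t target = 0; target < n; ++target) {
        float sum = 0.0f;
        for (std::size_t k = conn.rowStart[target]; k < conn.rowStart[target + 1]; ++k) {
            if (fired[conn.source[k]]) {
                sum += conn.weight[k];
            }
        }
        input[target] = sum;
    }
    return input;
}
```

The step is memory-bound. Each connection costs one flag read, one weight read, and at most one addition. This is consistent with the abstract's point that the GPU's memory bandwidth matters more than its floating-point throughput. With several GPUs, one plausible arrangement is to partition the target neurons across devices and exchange the `fired` flags at every step. This is an assumption, not a detail given in the abstract.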
Published in:
High Performance Computing (HiPC), 2011 18th International Conference on
Date of Conference: 18-21 Dec. 2011