Optimization of the parallel black-box fast multipole method on CUDA | IEEE Conference Publication | IEEE Xplore