XKBlas: a High Performance Implementation of BLAS-3 Kernels on Multi-GPU Server (IEEE Conference Publication)