Tensor Contractions with Extended BLAS Kernels on CPU and GPU | IEEE Conference Publication | IEEE Xplore