Exploiting Hierarchical Parallelism and Reusability in Tensor Kernel Processing on Heterogeneous HPC Systems | IEEE Conference Publication | IEEE Xplore