Exploring Attention Sparsity to Accelerate Transformer Training on GPUs | IEEE Journals & Magazine | IEEE Xplore