Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling