Enabling Energy-Efficient Inference for Self-Attention Mechanisms in Neural Networks | IEEE Conference Publication | IEEE Xplore