A Low Power Attention and Softmax Accelerator for Large Language Models Inference

IEEE Conference Publication | IEEE Xplore