GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference