A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm