A 118 GOPS/mm² 3D eDRAM TensorCore Architecture for Large-scale Matrix Multiplication | IEEE Conference Publication | IEEE Xplore