A 40GOPS 250mW massively parallel processor based on matrix architecture | IEEE Conference Publication