Optimization of parallel BP implementation: training speed of 1056 MCUPS on the massive | IEEE Conference Publication | IEEE Xplore