ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | IEEE Conference Publication | IEEE Xplore