An Efficient Checkpointing System for Large Machine Learning Model Training | IEEE Conference Publication | IEEE Xplore