
Revisiting LARS for Large Batch Training Generalization of Neural Networks


Impact Statement:
The rapid growth in deep learning, especially in building foundation models (e.g., large (visual) language models), has led to increasing demand for efficient and scalable training techniques, particularly with large batch sizes. Large batch training improves both training throughput and hardware utilization when building AI models. However, it often results in unstable performance. Current methods such as LARS with warm-up, while effective, often face challenges in maintaining performance, especially in the presence of sharp minimizers. This article introduces TVLARS, a novel approach that addresses these limitations by enabling more robust training and improved generalization across diverse settings. With demonstrated improvements of up to 2% in classification tasks and up to 10% in self-supervised learning scenarios, TVLARS has the potential to significantly enhance the efficiency and effectiveness of large-scale, accurate, and reliable deep learning applications.

Abstract:

This article investigates large batch training techniques using layer-wise adaptive scaling ratio (LARS) across diverse settings. In particular, we first show that a state-of-the-art technique, called LARS with warm-up, tends to be trapped in sharp minimizers early on due to redundant ratio scaling. Additionally, a fixed, steep decline in the later phase prevents deep neural networks from effectively navigating the sharp minimizers encountered early on. To address these issues, we propose time varying LARS (TVLARS), a novel algorithm that replaces warm-up with a configurable sigmoid-like function for robust training in the initial phase. TVLARS promotes gradient exploration early on, escaping sharp minimizers and gradually transitioning to LARS for robustness in later stages. Extensive experiments demonstrate that TVLARS consistently outperforms LARS and LAMB in most cases, with up to 2% improvement in classification scenarios. In all self-supervised learning cases, TVLARS achieves up to 10% improvement.
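
For intuition only, the sketch below shows one way a sigmoid-like, time-varying factor could replace warm-up in a layer-wise (LARS-style) update: the factor starts large to encourage early gradient exploration and decays smoothly toward plain LARS behavior. This is not the authors' published formulation; the names and default values (tvlars_factor, alpha, t0, trust_coef) are assumptions for illustration.

```python
# Illustrative-only sketch of a LARS-style layer-wise update with a
# configurable, sigmoid-like time-varying factor in place of warm-up.
import numpy as np

def tvlars_factor(step, alpha=0.01, t0=500):
    """Sigmoid-like schedule: roughly 2 at step 0, decaying smoothly toward 1."""
    return 1.0 + 1.0 / (1.0 + np.exp(alpha * (step - t0)))

def layerwise_step(w, g, step, base_lr=1.0, trust_coef=1e-3, weight_decay=1e-4):
    """One layer-wise update in the spirit of LARS, scaled by the time-varying factor."""
    g = g + weight_decay * w                      # L2 regularization folded into the gradient
    w_norm, g_norm = np.linalg.norm(w), np.linalg.norm(g)
    # Layer-wise trust ratio ||w|| / ||g||, guarded against zero norms.
    trust_ratio = trust_coef * w_norm / g_norm if w_norm > 0 and g_norm > 0 else 1.0
    local_lr = base_lr * trust_ratio * tvlars_factor(step)
    return w - local_lr * g
```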
Published in: IEEE Transactions on Artificial Intelligence (Volume: 6, Issue: 5, May 2025)
Page(s): 1321 - 1333
Date of Publication: 30 December 2024
Electronic ISSN: 2691-4581
