Abstract:
Noise plays an important role in gradient-based optimization methods, and a series of numerical experiments have demonstrated that adding gradient noise improves learning for neural networks. However, a mathematical interpretation of the noise remains a challenge. In this paper, we show that the noise variance can be regarded as a smoothing factor, and we prove that, under certain conditions, noisy gradient descent (NG) enjoys linear global convergence in the expectation sense. We contribute to this problem by introducing an intermediate function that connects the NG method to the smoothed objective. On the one hand, this connection reveals that applying the NG method to a function is equivalent to applying the gradient method to the corresponding function smoothed by the noise; on the other hand, it allows us to establish the convergence behavior of NG in a global sense. Moreover, we also consider under what conditions the global minimizer of the smoothed function is not far from the original global minimizer.
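A minimal sketch of the equivalence described in the abstract (the symbols f, f_sigma, xi, eta, and x_k are our own notation, not taken from the paper, and we assume enough regularity to exchange gradient and expectation):

% Smoothed objective obtained by averaging the original objective over the injected noise \xi
f_\sigma(x) := \mathbb{E}_{\xi}\big[ f(x + \xi) \big],
\qquad
\nabla f_\sigma(x) = \mathbb{E}_{\xi}\big[ \nabla f(x + \xi) \big].
% One noisy-gradient step with step size \eta and noise sample \xi_k
x_{k+1} = x_k - \eta \, \nabla f(x_k + \xi_k),
% which, in expectation over \xi_k, is a plain gradient step on the smoothed function:
\mathbb{E}_{\xi_k}\big[ x_{k+1} \big] = x_k - \eta \, \nabla f_\sigma(x_k).

Under this reading, the magnitude of the noise plays the role of the smoothing parameter: larger noise yields a more heavily smoothed surrogate of the original objective.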
Date of Conference: 09-12 October 2022
Date Added to IEEE Xplore: 18 November 2022