
On the Convergence Proof of AMSGrad and a New Version



Abstract:

The adaptive moment estimation algorithm Adam (Kingma and Ba) is a popular optimizer for training deep neural networks. However, Reddi et al. have recently shown that the convergence proof of Adam is problematic, and they have proposed a variant of Adam called AMSGrad as a fix. In this paper, we show that the convergence proof of AMSGrad is also problematic. Concretely, the problem lies in the handling of the hyper-parameters, which are treated as equal when they are not; the same issue is also overlooked in the convergence proof of Adam. We provide an explicit counter-example in a simple convex optimization setting to illustrate this issue. Depending on how the hyper-parameters are handled, we present several fixes. As the first fix, we provide a new convergence proof for AMSGrad. As another fix, we propose a new version of AMSGrad called AdamX. Our experiments on the benchmark dataset also support our theoretical results.
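For context, the sketch below shows the standard AMSGrad update that the paper analyzes: unlike Adam, it keeps the running maximum of the second-moment estimate so the effective step size cannot increase. This is an illustrative assumption of the usual formulation (without bias correction), not the paper's AdamX variant; the function name, hyper-parameter defaults, and the 1/sqrt(t) step-size schedule are chosen for illustration.

```python
import numpy as np

def amsgrad_step(param, grad, m, v, v_hat, t,
                 lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad step (Reddi et al., ICLR 2018), sketched for a NumPy vector."""
    m = beta1 * m + (1.0 - beta1) * grad        # first-moment estimate (EMA of gradients)
    v = beta2 * v + (1.0 - beta2) * grad**2     # second-moment estimate (EMA of squared gradients)
    v_hat = np.maximum(v_hat, v)                # AMSGrad: running maximum of v, so steps never grow
    step = lr / np.sqrt(t)                      # decaying step size, as in the convergence analysis
    param = param - step * m / (np.sqrt(v_hat) + eps)
    return param, m, v, v_hat

# Usage: minimize the convex function f(x) = x^2 starting from x = 5.
x = np.array([5.0])
m = v = v_hat = np.zeros_like(x)
for t in range(1, 1001):
    grad = 2.0 * x                              # gradient of x^2
    x, m, v, v_hat = amsgrad_step(x, grad, m, v, v_hat, t, lr=0.1)
```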
Published in: IEEE Access (Volume 7)
Pages: 61706-61716
Date of Publication: 13 May 2019
Electronic ISSN: 2169-3536
