Abstract:
The remarkable success of stochastic optimization (SO) in large-scale machine learning, information retrieval, bioinformatics, and related areas has been widely reported, especially in recent years. As an effective acceleration tactic, the conjugate gradient (CG) method has been gaining popularity for speeding up SO algorithms. This paper develops a novel class of stochastic conjugate gradient descent (SCG) algorithms from the perspective of the Powerball strategy and the hypergradient descent (HD) technique. The crucial idea behind the resulting methods is inspired by pursuing the equilibria of ordinary differential equations (ODEs). We elucidate the effect of the Powerball strategy in SCG algorithms. The introduction of HD, on the other hand, equips the resulting methods with an online learning rate. Meanwhile, we establish theoretical guarantees for the resulting algorithms under non-convex assumptions. As a byproduct, we bridge the gap between the learning rate and powered stochastic optimization (PSO) algorithms, which has remained an open problem. Through numerical experiments on numerous benchmark datasets, we test the parameter sensitivity of the proposed methods and demonstrate their superior performance over state-of-the-art algorithms.
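To make the ingredients named in the abstract concrete, the following is a minimal illustrative sketch (not the paper's exact algorithm) of how a Powerball-transformed stochastic gradient, a conjugate-gradient search direction, and a hypergradient-adapted learning rate can be combined. The function names, default values, and the Fletcher-Reeves choice of CG coefficient are assumptions made for illustration only; grad_fn stands for any routine returning a stochastic gradient.

    import numpy as np

    def powerball(g, gamma):
        """Element-wise Powerball transform: sign(g) * |g|^gamma, with 0 < gamma <= 1."""
        return np.sign(g) * np.abs(g) ** gamma

    def scg_powerball_hd(grad_fn, x0, gamma=0.5, lr=0.05, beta_hd=1e-4, n_steps=200):
        """Illustrative sketch of a stochastic CG iteration with Powerball gradients
        and an online (hypergradient-adapted) learning rate. Placeholder defaults;
        this is not a reproduction of the algorithms proposed in the paper."""
        x = np.asarray(x0, dtype=float)
        g = powerball(grad_fn(x), gamma)
        d = -g                                 # initial search direction
        for _ in range(n_steps):
            x = x + lr * d                     # step along the current CG direction
            g_new = powerball(grad_fn(x), gamma)
            # Hypergradient-style update of the learning rate: since dx/d(lr) = d,
            # descending the loss with respect to lr gives lr -= beta_hd * <g_new, d>.
            lr -= beta_hd * np.dot(g_new, d)
            # Fletcher-Reeves coefficient (one common choice) for the next direction.
            fr = np.dot(g_new, g_new) / max(np.dot(g, g), 1e-12)
            d = -g_new + fr * d
            g = g_new
        return x

For example, scg_powerball_hd(lambda x: 2 * x + 0.01 * np.random.randn(*x.shape), np.ones(5)) runs the sketch on a noisy quadratic; the learning rate then grows while successive steps remain well aligned with the descent direction and shrinks otherwise, which is the "online learning rate" behavior the abstract attributes to HD.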
Published in: IEEE Transactions on Big Data (Volume: 9, Issue: 6, December 2023)