
A Difference of Convex Functions Approach to Large-Scale Log-Linear Model Estimation


3 Author(s)
Tsiligkaridis, T. ; Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan, Ann Arbor, MI, USA ; Marcheret, E. ; Goel, V.

We introduce a new class of parameter estimation methods for log-linear models. Our approach relies on the fact that minimizing a rational function of mixtures of exponentials is equivalent to minimizing a difference of convex functions. This allows us to construct convex auxiliary functions by applying the concave-convex procedure (CCCP). We consider a modification of CCCP in which a proximal term is added (ProxCCCP), and extend it further by introducing an ℓ1 penalty. For solving the 'convex + ℓ1' auxiliary problem, we propose an approach called SeqGPSR that is based on sequential application of the GPSR procedure. We present a convergence analysis of the algorithms, including sufficient conditions for convergence to a critical point of the objective function. We propose an adaptive procedure for varying the strength of the proximal regularization term in each ProxCCCP iteration, and show that this procedure (AProxCCCP) is effective in practice and stable under some mild conditions. The CCCP procedure and the proposed variants are applied to the task of optimizing the cross-entropy objective function for an audio frame classification problem. Class posteriors are modeled using log-linear models consisting of approximately 6 million parameters. Our results show that the CCCP variants achieve a much better cross-entropy objective value than direct optimization of the objective function by a first-order gradient-based approach, stochastic gradient descent, or the L-BFGS procedure.
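The CCCP and ProxCCCP iterations described in the abstract can be illustrated with a minimal sketch on a toy problem. This is not the paper's log-linear objective or its implementation: the one-dimensional function f(x) = x^4 - 2x^2, its split into convex parts u(x) = x^4 and v(x) = 2x^2, the fixed proximal weight c, and all function names are assumptions chosen only for illustration. Each step linearizes v at the current iterate, adds the proximal term (c/2)(x - x_t)^2, and minimizes the resulting convex auxiliary function; setting c = 0 gives the plain CCCP step.

```python
def f(x):
    # Toy objective written as a difference of convex functions:
    # f(x) = u(x) - v(x), with u(x) = x**4 and v(x) = 2 * x**2 (illustrative only).
    return x**4 - 2.0 * x**2


def grad_v(x):
    # Gradient of the convex part that is subtracted, v(x) = 2 * x**2.
    return 4.0 * x


def prox_cccp_step(x_t, c):
    # Minimize the convex auxiliary function
    #   u(x) - grad_v(x_t) * x + (c / 2) * (x - x_t)**2,   c >= 0.
    # Its stationarity condition is 4*x**3 + c*x = grad_v(x_t) + c*x_t.
    # The left side is strictly increasing in x, so bisection finds the unique root.
    rhs = grad_v(x_t) + c * x_t
    hi = 1.0 + abs(x_t)  # the root always lies inside [-hi, hi] for this toy problem
    lo = -hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 4.0 * mid**3 + c * mid < rhs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prox_cccp(x0, c=1.0, iters=100):
    # Repeat the auxiliary minimization; returns the full list of iterates.
    xs = [x0]
    for _ in range(iters):
        xs.append(prox_cccp_step(xs[-1], c))
    return xs


xs = prox_cccp(0.3, c=1.0)
print(xs[-1], f(xs[-1]))
```

Started from x0 = 0.3, the iterates increase toward the critical point x = 1 of the toy objective, and the objective value does not increase from one iterate to the next. A larger c makes each step more conservative; the adaptive choice of the proximal weight mentioned in the abstract (AProxCCCP) is not reproduced here.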

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on (Volume: 21, Issue: 11)