Abstract:
In order to reduce the quadratic cost of matrix-vector multiplications in dense and attention layers, Monarch matrices have recently been introduced, achieving sub-quadratic complexity. This factorization expresses a matrix as a product of fixed permutations and learned block-diagonal matrices, at the price of a small performance drop. We propose a more general model in which some of the permutations are learned. The optimization algorithm explores the space of permutations using a Straight-Through Estimator (STE) inspired by the support exploration algorithm designed for sparse support recovery. Our experimental results demonstrate performance improvements in sparse matrix factorization and in end-to-end sparse learning.
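To make the sub-quadratic structure concrete, the following is a minimal sketch of a Monarch-style matrix-vector product: two block-diagonal multiplies interleaved with a permutation. The parameterization shown here (M x = Pᵀ B₂ P B₁ x, block size b ≈ √n) is one common variant and is illustrative only; the paper's exact factorization, and which permutations are learned, may differ.

```python
import numpy as np

def monarch_matvec(x, B1, B2, perm):
    """Compute M x for a Monarch-style matrix M = P^T B2 P B1.

    B1, B2: arrays of shape (n // b, b, b), the dense blocks of two
            block-diagonal matrices (hypothetical names).
    perm:   a length-n permutation of indices, representing P.
    """
    n = x.shape[0]
    b = B1.shape[-1]                    # block size, typically about sqrt(n)
    # First block-diagonal multiply: batched (b x b) matmuls, O(n * b) work.
    y = (B1 @ x.reshape(n // b, b, 1)).reshape(n)
    y = y[perm]                         # apply the permutation P
    # Second block-diagonal multiply, again O(n * b).
    y = (B2 @ y.reshape(n // b, b, 1)).reshape(n)
    return y[np.argsort(perm)]          # apply the inverse permutation P^T
```

With b ≈ √n, each block-diagonal stage costs O(n·b) = O(n^1.5), versus O(n²) for a dense matvec, which is the sub-quadratic complexity the abstract refers to.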
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025