Abstract:
Existing approaches to neural network compression have failed to holistically address the algorithmic (training accuracy) and computational (inference performance) demands of real-world systems, particularly on resource-constrained devices. We present C3-Flow, a new approach that adds non-uniformity to low-rank approximations and is designed specifically to enable highly efficient computation on common hardware architectures while retaining more accuracy than competing methods. Evaluation on two state-of-the-art acoustic models (versus existing work, empirical limit study approaches, and hand-tuned models) demonstrates up to 60% lower error. Finally, we show that our co-design approach achieves up to 14X inference speedup across three Haswell- and Broadwell-based platforms.
Published in: 2019 56th ACM/IEEE Design Automation Conference (DAC)
Date of Conference: 02-06 June 2019
Date Added to IEEE Xplore: 22 August 2019
ISBN Information:
Print on Demand (PoD) ISSN: 0738-100X
Conference Location: Las Vegas, NV, USA