The generic matrix multiply (GEMM) subprogram is the core element of high-performance linear algebra software used in computationally demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based on dynamically adjusting the precision of computation. Our technique employs DSP methods (such as scalar companding and rounding), followed by a new form of tight packing in floating point that allows multiple results to be calculated concurrently. Because the companding process controls how much concurrency the packing can provide, the gain in processing throughput (and the corresponding loss in precision) depends on the input data statistics: low-variance parts of the matrix multiplication are computed faster than high-variance parts, and the error is controlled in a stochastic rather than a worst-case sense. This can convert high-performance numerical DSP libraries into a computation channel in which the output error increases when higher throughput is requested. Potential DSP applications that can benefit from the proposed approach are highlighted.
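To make the packing idea concrete, the following is a minimal sketch and not the authors' actual scheme. It assumes inputs that have already been reduced to small integers (here by plain uniform scaling and rounding, a simplification of companding) and shows how two inner products can then share one floating-point multiply-accumulate chain. The names `quantize`, `packed_dot_pair`, and `pack_scale` are hypothetical, chosen only for illustration.

```python
import math

def quantize(values, step):
    """Scale by 1/step and round to integers; a larger step means lower precision."""
    return [int(round(v / step)) for v in values]

def packed_dot_pair(row_a, row_b, col, pack_scale=2.0 ** 20):
    """Compute dot(row_a, col) and dot(row_b, col) with one floating-point dot product.

    Assumptions (illustrative only): all inputs are integers, the magnitude of
    dot(row_a, col) stays below pack_scale / 2, and every intermediate value
    fits exactly in a double (magnitude below 2**53).
    """
    # Pack the two rows into one: each element holds a + b * pack_scale.
    packed = [float(a) + float(b) * pack_scale for a, b in zip(row_a, row_b)]

    # A single accumulation now yields dot_a + pack_scale * dot_b.
    acc = 0.0
    for p, c in zip(packed, col):
        acc += p * float(c)

    # Unpack: round to the nearest multiple of pack_scale to recover dot_b,
    # then subtract it to recover dot_a (which may be negative).
    dot_b = math.floor(acc / pack_scale + 0.5)
    dot_a = acc - dot_b * pack_scale
    return int(dot_a), int(dot_b)

row_a = quantize([1.5, -1.0, 2.5], step=0.5)   # [3, -2, 5]
row_b = quantize([0.5, 2.0, -3.0], step=0.5)   # [1, 4, -6]
col = [7, 0, -1]
print(packed_dot_pair(row_a, row_b, col))      # (16, 13)
```

In this sketch, a coarser quantization step yields smaller integers, which leaves room to pack more results into each floating-point word at the cost of precision. That mirrors, in simplified form, the throughput-versus-error trade-off described above.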