Abstract:
Reducing power consumption and increasing efficiency is a key concern for many applications. It is well-accepted that specialization and heterogeneity are crucial strateg...Show MoreMetadata
Abstract:
Reducing power consumption and increasing efficiency is a key concern for many applications. It is well-accepted that specialization and heterogeneity are crucial strategies to improve both power and performance. Yet, how to design highly efficient processing elements while maintaining enough flexibility within a domain of applications is a fundamental question. In this paper, we present the design of a specialized Linear Algebra Core (LAC) for an important class of computational kernels, the level-3 Basic Linear Algebra Subprograms (BLAS). We demonstrate a detailed algorithm/architecture co-design for mapping a number of level-3 BLAS operations onto the LAC. Results show that our prototype LAC achieves a performance of around 64 GFLOPS (double precision) for these operations, while consuming less than 1.3 Watts in standard 45nm CMOS technology. This is on par with a full-custom design and up to 50× and 10× better in terms of power efficiency than CPUs and GPUs.
Published in: 2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors
Date of Conference: 09-11 July 2012
Date Added to IEEE Xplore: 10 November 2012
ISBN Information: