Scheduled System Maintenance:
Some services will be unavailable Sunday, March 29th through Monday, March 30th. We apologize for the inconvenience.
By Topic

Generation of Heterogeneous Distributed Architectures for Memory-Intensive Applications Through High-Level Synthesis

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

The purchase and pricing options are temporarily unavailable. Please try again later.
4 Author(s)
Chao Huang ; Virginia Polytech. Inst. & State Univ., Blacksburg ; Ravi, S. ; Raghunathan, A. ; Jha, N.K.

Memory-intensive applications present unique challenges to an application-specific integrated circuit (ASIC) designer in terms of the choice of memory organization, memory size requirements, bandwidth and access latencies, etc. The high potential of single-chip distributed logic-memory architectures in addressing many of these issues has been recognized in general-purpose computing, and more recently, in ASIC design. The high-level synthesis (HLS) techniques presented in this paper are motivated by the fact that many memory-intensive applications exhibit irregular array data access patterns. Synthesis should, therefore, be capable of determining a partitioned architecture, wherein array data and computations may have to be heterogeneously distributed for achieving the best performance speed-up. We use a combination of clustering and min-cut style partitioning techniques to yield distributed architectures, based on simulation profiling while considering various factors including data access locality, balanced workloads, inter-partition communication, etc. Our experiments with several benchmark applications show that the proposed techniques yielded two-way partitioned architectures that can achieve upto 2.1 x (average of 1.9 x) performance speed-up over conventional HLS solutions, while achieving upto 1.5 x (average of 1.4 x) performance speed-up over the best homogeneous partitioning solution feasible. At the same time, the reduction in the energy-delay product over conventional single-memory designs is upto 2.7 x (average of 2.0 x). A larger amount of partitioning makes further system performance improvement achievable at the cost of chip area.

Published in:

Very Large Scale Integration (VLSI) Systems, IEEE Transactions on  (Volume:15 ,  Issue: 11 )