By Topic

Generation of Heterogeneous Distributed Architectures for Memory-Intensive Applications Through High-Level Synthesis

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Chao Huang ; Virginia Polytech. Inst. & State Univ., Blacksburg ; Ravi, S. ; Raghunathan, A. ; Jha, N.K.

Memory-intensive applications present unique challenges to an application-specific integrated circuit (ASIC) designer in terms of the choice of memory organization, memory size requirements, bandwidth and access latencies, etc. The high potential of single-chip distributed logic-memory architectures in addressing many of these issues has been recognized in general-purpose computing, and more recently, in ASIC design. The high-level synthesis (HLS) techniques presented in this paper are motivated by the fact that many memory-intensive applications exhibit irregular array data access patterns. Synthesis should, therefore, be capable of determining a partitioned architecture, wherein array data and computations may have to be heterogeneously distributed for achieving the best performance speed-up. We use a combination of clustering and min-cut style partitioning techniques to yield distributed architectures, based on simulation profiling while considering various factors including data access locality, balanced workloads, inter-partition communication, etc. Our experiments with several benchmark applications show that the proposed techniques yielded two-way partitioned architectures that can achieve upto 2.1 x (average of 1.9 x) performance speed-up over conventional HLS solutions, while achieving upto 1.5 x (average of 1.4 x) performance speed-up over the best homogeneous partitioning solution feasible. At the same time, the reduction in the energy-delay product over conventional single-memory designs is upto 2.7 x (average of 2.0 x). A larger amount of partitioning makes further system performance improvement achievable at the cost of chip area.

Published in:

Very Large Scale Integration (VLSI) Systems, IEEE Transactions on  (Volume:15 ,  Issue: 11 )