Scheduled System Maintenance:
On Wednesday, July 29th, IEEE Xplore will undergo scheduled maintenance from 7:00-9:00 AM ET (11:00-13:00 UTC). During this time there may be intermittent impact on performance. We apologize for any inconvenience.
By Topic

Efficient synthesis of array intensive computations onto FPGA based accelerators

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Shenoy, N. ; Dept. of Electr. & Comput. Eng., Northwestern Univ., Evanston, IL, USA ; Banerjee, P. ; Choudhary, A. ; Kandemir, M.

Array intensive computations are characterized by processing of large arrays stored in external memory in multiple loops. Synthesizing these computations onto FPGAs involves automatic translation of the behavioral description into state machines controlled by a clock such that the execution time of the program as a whole is the minimum and area requirement does not exceed a predefined limit. The synthesis algorithm also needs to efficiently sequence the array, accesses taking into account memory access requirements such as pipelining. In this paper we present two algorithms each with a specific emphasis to handle this synthesis problem. Our heuristic algorithm generates good solutions in a very short time (less than a second), while our mixed integer linear programming (MILP) based algorithm can generate optimal solution given sufficient time. Both try to minimize execution time and area. Our algorithms not only look at individual loops to exploit parallelism but also consider them together while deciding the clock. The overall execution time is minimized and not just the number of cycles or the cycle time. They also efficiently synthesize memory accesses to fully exploit the memory pipelining. We compare these two algorithms in terms of their relative strengths

Published in:

VLSI Design, 2001. Fourteenth International Conference on

Date of Conference: