By Topic

Exploiting memory customization in FPGA for 3D stencil computations

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

6 Author(s)
Shafiq, M. ; Comput. Sci., Barcelona Supercomput. Center, Barcelona, Spain ; Pericas, M. ; de la Cruz, R. ; Araya-Polo, M.
more authors

3D stencil computations are compute-intensive kernels often appearing in high-performance scientific and engineering applications. The key to efficiency in these memory-bound kernels is full exploitation of data reuse. This paper explores the design aspects for 3D-Stencil implementations that maximize the reuse of all input data on a FPGA architecture. The work focuses on the architectural design of 3D stencils with the form n × (n + 1) × n, where n = {2, 4, 6, 8, ...}. The performance of the architecture is evaluated using two design approaches, ¿Multi-Volume¿ and ¿Single-Volume¿. When n = 8, the designs achieve a sustained throughput of 55.5 GFLOPS in the ¿Single-Volume¿ approach and 103 GFLOPS in the ¿Multi-Volume¿ design approach in a 100-200 MHz multi-rate implementation on a Virtex-4 LX200 FPGA. This corresponds to a stencil data delivery of 1500 bytes/cycle and 2800 bytes/cycle respectively. The implementation is analyzed and compared to two CPU cache approaches and to the statically scheduled local stores on the IBM PowerXCell 8i. The FPGA approaches designed here achieve much higher bandwidth despite the FPGA device being the least recent of the chips considered. These numbers show how a custom memory organization can provide large data throughput when implementing 3D stencil kernels.

Published in:

Field-Programmable Technology, 2009. FPT 2009. International Conference on

Date of Conference:

9-11 Dec. 2009