Many compute-intensive applications generate single result values by accessing clusters of nearby points in grids of one, two, or more dimensions. Often, the performance of FPGA implementations of such algorithms would improve if there were concurrent, non-interfering access to all points in each cluster. When clusters contain dozens of points and access patterns are irregular, multiported memories are infeasible and vector-oriented approaches are inapplicable. Instead, the grid points can be distributed across multiple interleaved memory banks so that, when accessing any cluster, each point comes from a different memory bank. We present a general technique based on knowledge of the application's multidimensional indexing. This technique maps access clusters into a custom-interleaved memory using the FPGA's multiple on-chip RAMs and configurable data paths. Case studies examine rectangular and non-rectangular grids of different dimensionality, including performance vs. resource tradeoffs when cluster sizes are not powers of two. We also present a prototype tool for generating interleaved memories automatically from concise, application-specific definitions.
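As an illustration of the interleaving idea, the following minimal sketch uses the textbook modulo scheme for a rectangular cluster on a 2D grid. It is not taken from the paper and may differ from the technique presented there. The function names (`bank_of`, `address_in_bank`, `cluster_is_conflict_free`) and the 3-by-2 cluster size are hypothetical, chosen only to show a case where the cluster size is not a power of two.

```python
from itertools import product


def bank_of(x, y, cw, ch):
    """Bank index for grid point (x, y), assuming a cw-by-ch access cluster.

    Assumption: simple modulo interleaving in each dimension, giving
    cw * ch banks in total.
    """
    return (x % cw) + cw * (y % ch)


def address_in_bank(x, y, cw, ch, grid_w):
    """Word address of (x, y) inside its bank for a grid that is grid_w wide."""
    words_per_row = -(-grid_w // cw)  # ceiling division
    return (x // cw) + words_per_row * (y // ch)


def cluster_is_conflict_free(x0, y0, cw, ch):
    """True if the cw-by-ch cluster anchored at (x0, y0) hits every bank once."""
    banks = {
        bank_of(x0 + dx, y0 + dy, cw, ch)
        for dx, dy in product(range(cw), range(ch))
    }
    return len(banks) == cw * ch
```

Under this scheme, every axis-aligned cluster of the chosen size touches each bank exactly once regardless of its position, so all of its points can be read in the same cycle. The modulo and division by a non-power-of-two cluster dimension are what cost extra logic in hardware, which is one plausible source of the resource tradeoff the abstract mentions.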
Published in:
Field Programmable Logic and Applications, 2006. FPL '06. International Conference on
Date of Conference: Aug. 2006