1 Introduction
Common digital image and signal processing algorithms are structured as a sequence of localized operators (e.g., gradient or min/max) applied over overlapping blocks of data, typically organized as windows. Because the data associated with these algorithms is arranged in n dimensions, these window-based computations are naturally expressed as tight loop nests in popular imperative languages such as C. As the operators in these algorithms tend to exhibit substantial instruction-level parallelism and offer potential for customized implementation (e.g., specific arithmetic formats and/or operations), they are good candidates for efficient hardware implementation on Field-Programmable Gate Array (FPGA) devices.
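As an illustration of the kind of loop nest described above, the following is a minimal sketch of a max operator applied over overlapping 3x3 windows of a small two-dimensional array. The array dimensions, the window size, and the names `IMG_H`, `IMG_W`, `WIN`, and `window_max` are assumptions chosen for this example and are not taken from any particular application.

```c
/* Illustrative sizes only: a 4x5 input scanned by a 3x3 window. */
#define IMG_H 4
#define IMG_W 5
#define WIN   3

/* Apply a max operator over every WIN x WIN window that fits
 * entirely inside the input. Windows at neighbouring positions
 * overlap, so most input elements are read several times. */
void window_max(int in[IMG_H][IMG_W],
                int out[IMG_H - WIN + 1][IMG_W - WIN + 1])
{
    /* Outer two loops: slide the window origin over the image. */
    for (int i = 0; i <= IMG_H - WIN; i++) {
        for (int j = 0; j <= IMG_W - WIN; j++) {
            int m = in[i][j];
            /* Inner two loops: visit every element of the window. */
            for (int wi = 0; wi < WIN; wi++) {
                for (int wj = 0; wj < WIN; wj++) {
                    if (in[i + wi][j + wj] > m)
                        m = in[i + wi][j + wj];
                }
            }
            out[i][j] = m;
        }
    }
}
```

In this sketch, horizontally adjacent windows share `WIN - 1` columns of input, and the two inner loops have fixed trip counts with no dependence between the element reads. These are the properties that make such a nest a plausible target for a parallel or customized hardware implementation.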