Skip to Main Content
Two-dimensional convolutions are local by nature; hence every pixel in the output image is computed using surrounding information, i.e., a moving window of pixels. Although the operation is simple, the hardware is conditioned by the fact that due to bandwidth efficiency full raster rows must be read from the external memory, and that a row-major image scan should be performed to support shift-variant convolutions. When extending the architectures developed in prior-art to support shift-variant convolutions, we realize that they require large amounts of on-chip memory. While this fact may not have a large cost increase in ASIC implementations, it makes field-programmable gate arrays (FPGA) implementations expensive or not feasible. In this paper, we propose several novel FPGA-efficient architectures for generating a moving window over a row-wise print path. Because the proposed concepts have different throughput and resource utilization, we provide a criteria to choose the optimum one for any design point.