Skip to Main Content
Most hardware compilers apply loop pipelining to increase the parallelism achieved, but pipelining is restricted to the only innermost level in a nested loop. In this work we extend and adapt an existing outer loop pipelining approach known as single dimension software pipelining to generate schedules for field-programmable gate-array (FPGA) hardware coprocessors. Each loop level in nine test loops is pipelined and the resulting schedules are implemented in VHDL and targeted to an Altera Stratix II FPGA. The results show that the fastest solution for all but one of the loops occurs when pipelining is applied one to three levels above the innermost loop. Across the nine test loops we achieve an acceleration over the innermost loop solution of up to seven times, with a mean speedup of 3.2 times. The results suggest that inclusion of outer loop pipelining in future hardware compilers may be worthwhile as it can allow significantly improved results to be achieved at the cost of a small increase in compile time.