Many video and image/signal processing applications can be structured as sequences of data-dependent tasks that communicate in a producer/consumer paradigm, and are therefore amenable to pipelined execution. This paper presents an execution technique that speeds up the overall execution of successive, data-dependent tasks on a reconfigurable architecture. The technique pipelines sequences of data-dependent tasks by overlapping their execution subject to data dependences. It decouples the concurrent data-path and control units and uses a custom, application data-driven, fine-grained synchronization and buffering scheme. In addition, the execution scheme supports out-of-order but data-dependent producer-consumer pairs, which previous data-driven pipelining approaches do not allow. The approach has been exploited in the context of a high-level compiler targeting FPGAs. Preliminary experimental results reveal noticeable performance improvements and buffer size reductions over traditional approaches for a number of benchmarks.
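The idea of overlapping a producer and a consumer under fine-grained synchronization can be illustrated with a small software model. The sketch below is not the paper's hardware scheme. It is a hypothetical cycle-count simulation that assumes each task handles one element per cycle, that a per-element ready flag guards every read, and that each element is consumed exactly once. The function names `pipelined_cycles` and `sequential_cycles` are invented for this illustration.

```python
def sequential_cycles(producer_order, consumer_order):
    """Cycles when the consumer starts only after the producer finishes."""
    return len(producer_order) + len(consumer_order)


def pipelined_cycles(producer_order, consumer_order):
    """Cycles when the consumer overlaps with the producer.

    Assumed model (not from the paper): one element per task per cycle,
    and a per-element ready flag that the consumer checks before reading.
    """
    if sorted(producer_order) != sorted(consumer_order):
        raise ValueError("both tasks must touch the same set of elements")

    ready = set()   # indices whose ready flag has been set
    produced = 0    # elements written so far
    consumed = 0    # elements read so far
    cycle = 0
    while consumed < len(consumer_order):
        cycle += 1
        # The consumer may read only data written in an earlier cycle;
        # otherwise it stalls for this cycle.
        if consumer_order[consumed] in ready:
            consumed += 1
        # The producer writes its next element and sets the ready flag.
        if produced < len(producer_order):
            ready.add(producer_order[produced])
            produced += 1
    return cycle


if __name__ == "__main__":
    in_order = [0, 1, 2, 3]
    swapped = [1, 0, 3, 2]   # consumer reads in a different order
    print(sequential_cycles(in_order, in_order))   # 8
    print(pipelined_cycles(in_order, in_order))    # 5
    print(pipelined_cycles(in_order, swapped))     # 6
```

In this toy model, the consumer still overlaps with the producer when it reads elements in a different order than they were written, because each read waits only on the flag of the element it needs. When the consumer reads in fully reversed order, the overlap disappears and the count matches the sequential case.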
Published in:
Field-Programmable Custom Computing Machines, 2007. FCCM 2007. 15th Annual IEEE Symposium on
Date of Conference: 23-25 April 2007