Multimedia and DSP applications contain several computationally intensive kernels that are often offloaded to, and accelerated by, application-specific hardware. This paper presents a speculative loop pipelining technique that overcomes limitations of binary translation for hardware acceleration. Although many compilers have been developed at the source level, translating binaries targeted at popular processors onto hardware offers several practical benefits. However, the translated code can be less optimized. In particular, it is difficult to optimize memory accesses in a binary to exploit pipeline parallelism, since memory optimization techniques require precise dependence information for correctness and efficiency. This information is often unavailable at the binary level, and sometimes even at the source level. Our technique synthesizes the pipeline with memory dependence speculation and postpones some phases of compilation by generating a small piece of dependence analysis code or logic that makes use of runtime values. This speculative optimization exposes a large amount of parallelism and does not depend on any user annotation. Experimental results show a speedup of up to 2.53x over code in which memory accesses are not pipelined because of conservative memory analysis. In addition, we have evaluated our technique in a hardware-level implementation on FPGA devices and achieved clock frequency and power consumption comparable to those of a conservative method, together with a significant improvement in throughput.
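The idea of deferring dependence analysis to a small runtime check can be illustrated in software. The following Python sketch is not the paper's implementation: the kernel, the function names (`ranges_overlap`, `run_sequential`, `run_pipelined`, `run_kernel`), and the range-overlap test are all assumptions made for illustration. It models a loop whose load and store base addresses are known only at run time, and picks an aggressive schedule only when a conservative check on those runtime values finds no possible dependence.

```python
def ranges_overlap(read_base, write_base, n):
    """Conservative runtime dependence check (hypothetical helper).

    Returns True if the read range [read_base, read_base + n) and the
    write range [write_base, write_base + n) share any address.
    """
    return n > 0 and read_base < write_base + n and write_base < read_base + n


def run_sequential(mem, read_base, write_base, n):
    """Reference schedule: each iteration loads, then stores, in order."""
    for i in range(n):
        mem[write_base + i] = mem[read_base + i] + 1


def run_pipelined(mem, read_base, write_base, n):
    """Models an aggressive schedule in which loads run ahead of stores.

    Issuing every load before any store is only safe when no store
    feeds a later load, which the runtime check is meant to establish.
    """
    loaded = [mem[read_base + i] for i in range(n)]
    for i in range(n):
        mem[write_base + i] = loaded[i] + 1


def run_kernel(mem, read_base, write_base, n):
    """Choose a schedule from runtime values instead of static analysis."""
    if ranges_overlap(read_base, write_base, n):
        run_sequential(mem, read_base, write_base, n)
        return "sequential"
    run_pipelined(mem, read_base, write_base, n)
    return "pipelined"


if __name__ == "__main__":
    mem = list(range(10))
    print(run_kernel(mem, 0, 5, 5), mem)
    mem = [0] * 6
    print(run_kernel(mem, 0, 1, 5), mem)
```

A check of this kind is deliberately conservative: it also rejects overlapping ranges that would in fact be safe to reorder. A speculative hardware pipeline, as described in the abstract, would place such a test in generated code or logic next to the pipeline rather than in a software wrapper.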