Skip to Main Content
Loops are the most important sections for embedded applications. To achieve high performance, two loop transformation techniques are often applied, namely loop pipelining and loop partitioning, loop pipelining is an effective approach to increase parallelism and reduce schedule length. Loop partitioning with prefetching increases data locality and hides memory latency. However, loop pipelining increases register pressure and loop partitioning increases local memory requirement. As most embedded systems have limited number of registers and limited memory, without careful study, these two techniques can not be applied effectively. In this paper, we propose an effective scheduling framework, Register and Memory Sensitive Partition-ing(RMSP), to minimize average schedule length per iteration under register and memory dual constraints for parallel embedded systems. Experiments show that RMSP reduces schedule length by 14.1% in average compared to previous methods applied directly.