The presence of multiple active threads on the same processor can mask latency by rapid context switching, but it can adversely affect performance due to competition for shared datapath resources. In this paper we present Macro Software Pipelining (MSWP), a loop scheduling technique for multithreaded processors, which is based on the loop distribution transformation for loop pipelining. MSWP constructs loop schedules by partitioning the loop body into tasks and assigning each task to a thread that executes all iterations for that particular task. MSWP is applied top-down on a hierarchical program representation, and utilizes thread-level speculation for maximal exploitation of parallelism. We tested MSWP on a multithreaded architectural model, Coral 2000, using synthetic and SPEC benchmarks. We obtained speedups of up to 30% with respect to highly optimized superblock-based schedules on loops with unpredictable branches, and a speedup of up to 25% on perl, a highly sequential SPEC95 integer benchmark.
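
As an illustration only, the partitioning idea described above (split the loop body into tasks, then let one thread run all iterations of each task) can be sketched in a few lines of Python. This is not the authors' implementation: the loop body, the two-way split, the names `sequential`, `pipelined`, `stage_a`, and `stage_b`, and the bounded queue used to pass values between tasks are all assumptions made for the example. Thread-level speculation and the hierarchical top-down application are not modeled.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the iteration stream (hypothetical)


def sequential(xs):
    """Original loop: both tasks run back to back inside every iteration."""
    out = []
    for x in xs:
        t = x * x + 1        # task A: produce an intermediate value
        out.append(t % 7)    # task B: consume the intermediate value
    return out


def pipelined(xs):
    """Distributed loop: each task runs in its own thread over all iterations."""
    q = queue.Queue(maxsize=8)  # bounded buffer carrying task A results to task B
    out = []

    def stage_a():
        # Thread 1 executes task A for every iteration.
        for x in xs:
            q.put(x * x + 1)
        q.put(_DONE)

    def stage_b():
        # Thread 2 executes task B for every iteration, in iteration order.
        while True:
            t = q.get()
            if t is _DONE:
                break
            out.append(t % 7)

    threads = [threading.Thread(target=stage_a), threading.Thread(target=stage_b)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return out


if __name__ == "__main__":
    data = list(range(20))
    print(pipelined(data) == sequential(data))
```

Because the dependence between the two tasks flows in one direction only, task A for a later iteration can overlap with task B for an earlier one. The sketch shows only the structure of such a schedule; it makes no claim about performance, which in the paper depends on the multithreaded hardware model.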