Skip to Main Content
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. In our previously proposed framework we have speculatively executed a loop as a doall, and applied a fully parallel data dependence test to determine if it had any cross-processor dependences; if the test failed, then the loop was re-executed serially. While this method exploits doall parallelism well, it can cause slowdowns for loops with even one cross-processor flow dependence because we have to re-execute sequentially. Moreover, the existing, partial parallelism of loops is not exploited. We now propose a generalization of our speculative doall parallelization technique, called the Recursive LRPD test, that can extract and exploit the maximum available parallelism of any loop and that limits potential slowdowns to the overhead of the runtime dependence test itself. We present the base algorithm and an analysis of the different heuristics for its practical application and a few experimental results on loops from Track, Spice, and FMA3D.