I. Introduction
With the growing adoption of multicore architectures, extracting parallelism from programs has become a central task in improving program performance. Because loops typically account for most of a program's execution time, loop parallelization has long been considered a key optimization problem.