Skip to Main Content
Auto-parallelizing compilers for embedded applications have been unsuccessful due to the widespread use of pointer arithmetic and the complex memory model of multiple-address space digital signal processors (DSPs). This work develops, for the first time, a complete auto-parallelization approach, which overcomes these issues. It first combines a pointer conversion technique with a new modulo elimination transformation for program recovery enabling later parallelization stages. Next, it integrates a novel data transformation technique that exposes the processor location of partitioned data. When this is combined with a new address resolution mechanism, it generates efficient programs that run on multiple address spaces without using message passing. Furthermore, as DSPs do not possess any data cache structure, an optimization is presented which transforms the program to both exploit remote data locality and local memory bandwidth. This parallelization approach is applied to the DSPstone and UTDSP benchmark suites, giving an average speedup of 3.78 on four analog devices TigerSHARC TS-101 processors.
Parallel and Distributed Systems, IEEE Transactions on (Volume:16 , Issue: 3 )
Date of Publication: March 2005