This paper describes a new approach for automatically generating efficient parallel programs from sequential blocked linear algebra programs. By exploiting recent progress in fine-grain parallel architectures such as iWarp, and in libraries based on matrix-matrix block operations such as LAPACK, the approach is expected to be effective in parallelizing a large class of linear algebra computations. An implementation of LAPACK on iWarp is under development. In this implementation, block routines are executed on the iWarp processor array using highly parallel systolic algorithms. Matrices are distributed over the array so that a parallel block routine can be substituted wherever the original program calls a sequential block routine. This data distribution scheme significantly simplifies parallelization, and as a result, efficient parallel versions of programs can be generated automatically. We discuss experiences and performance results from our preliminary implementation, and present the design of a fully automatic system.
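To make the substitution idea concrete, the following is a minimal sketch (not taken from the paper; all names are hypothetical) of a cyclic block-to-processor mapping over a two-dimensional grid, the kind of distribution that keeps each block's operands resident where a parallel block routine can operate on them:

```python
# Hypothetical sketch: an n_blocks x n_blocks block matrix is mapped onto a
# P x Q processor grid by assigning block (i, j) to processor
# (i mod P, j mod Q). Because the mapping is fixed and uniform, every call
# to a sequential block routine in the original program can be replaced by
# a parallel block routine running on the processors that own the blocks.

def owner(i, j, P, Q):
    """Grid coordinates of the processor owning block (i, j)."""
    return (i % P, j % Q)

def distribute(n_blocks, P, Q):
    """Map every block of an n_blocks x n_blocks block matrix to a processor."""
    return {(i, j): owner(i, j, P, Q)
            for i in range(n_blocks)
            for j in range(n_blocks)}

# Example: 4x4 blocks on a 2x2 grid. Blocks in the same block row share a
# processor row, and blocks in the same block column share a processor
# column, so row- and column-oriented block routines need no redistribution.
layout = distribute(4, 2, 2)
```

This is only an illustration of the general distribution principle; the paper's actual scheme is tailored to the iWarp array and its systolic block algorithms.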