We describe modifications of the LAPACK dense linear-algebra algorithms that use recursion. Recursion leads to automatic variable blocking, and it transforms LAPACK's level-2 versions into level-3 codes. The new recursive codes are written in FORTRAN 77, which does not support recursion as a language feature. Gaussian elimination with partial pivoting and Cholesky factorization are considered. Very clear algorithms emerge with the use of recursion. The recursive codes do exactly the same computation as the LAPACK codes, and a single recursive code replaces both the level-2 and level-3 versions of the corresponding LAPACK codes. We present an analysis of the recursive algorithm in terms of both FLOP count and storage usage. With recursion, the matrix operands are more “squarish,” and the total area of the submatrices used in the recursive algorithm is less than the total area used by the LAPACK level-3 right-/left-looking algorithms. We quantify the difference; we also quantify how the FLOPS are computed. We also show that the algorithms exhibit high performance on RISC-type processors. In fact, except for small matrices, the recursive version outperforms the level-3 LAPACK versions of DGETRF and DPOTRF on an RS/6000™ workstation. For the level-2 versions, the performance gain approaches a factor of 3. Finally, we demonstrate that a change to the LAPACK DLASWP routine can improve the performance of both the recursive version and DGETRF by more than 15 percent.
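To make the idea of recursive blocking concrete, the following is a minimal Python sketch of a recursive Cholesky factorization. It is an illustration only, not the authors' FORTRAN 77 code: the function name `rchol`, the split at the midpoint, and the list-of-lists storage are all assumptions made for this example. The matrix is halved at each level, so the block sizes vary automatically with the recursion depth, and the two middle steps (a triangular solve and a symmetric update) are the ones a tuned implementation would hand to level-3 routines.

```python
import math


def rchol(a, lo=0, hi=None):
    """Overwrite the lower triangle of a[lo:hi][lo:hi] with its Cholesky factor L.

    `a` is a square list of lists holding a symmetric positive definite
    matrix. Only the lower triangle is read and written; the strict upper
    triangle is left untouched. (Hypothetical helper, for illustration.)
    """
    if hi is None:
        hi = len(a)
    n = hi - lo
    if n <= 0:
        return
    if n == 1:
        # Base case: a 1-by-1 block is factored by a square root.
        if a[lo][lo] <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[lo][lo] = math.sqrt(a[lo][lo])
        return

    mid = lo + n // 2  # split into A11 (lo:mid) and A22 (mid:hi)

    # Step 1: factor the leading block recursively, A11 = L11 * L11^T.
    rchol(a, lo, mid)

    # Step 2: solve L21 * L11^T = A21 for L21 (triangular solve).
    for i in range(mid, hi):
        for j in range(lo, mid):
            s = a[i][j] - sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = s / a[j][j]

    # Step 3: update the trailing block, A22 := A22 - L21 * L21^T
    # (lower triangle only).
    for i in range(mid, hi):
        for j in range(mid, i + 1):
            a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, mid))

    # Step 4: factor the updated trailing block recursively.
    rchol(a, mid, hi)


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0],
         [12.0, 37.0, -43.0],
         [-16.0, -43.0, 98.0]]
    rchol(A)
    for i, row in enumerate(A):
        print(row[: i + 1])  # rows of the lower-triangular factor
```

In this sketch the same routine covers every problem size down to 1 by 1, which mirrors the abstract's point that one recursive code can stand in for separate unblocked and blocked versions. A production version would presumably stop the recursion at a small block and call optimized kernels for steps 2 and 3.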
Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.