A fast, efficient, parallel algorithm for stable matrix inversion based on Givens plane rotations is described. The algorithm is implemented on a VLSI systolic architecture capable of inverting any n × n nonsingular dense matrix in 5n units of time, including I/O time. The array architecture consists of n² + n processing elements (PEs) arranged as a cascade of two triangular arrays of (n² + n)/2 PEs each. The parallel algorithm involves three processes: QR decomposition by the Givens rotation technique, inversion of the upper triangular matrix R, and multiplication of R⁻¹ by Q. All three components of the algorithm are maximally overlapped, with no need for intermediate I/O or global communication. A novel technique for concurrently inverting distinct matrices on the same array is also proposed. This technique, based on two levels of pipelining, significantly improves the array throughput. The execution speed of the algorithm matches that of the fastest systolic implementation of matrix inversion reported to date, which uses Gaussian elimination without pivoting and is therefore unstable in most cases.
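The three processes named above can be illustrated with a small sequential sketch. This is not the systolic implementation: it has no PEs, pipelining, or overlap, and it only shows the arithmetic being computed. The function names (`givens_qr`, `invert_upper`, `matmul`, `invert`) are hypothetical, and the sketch assumes the convention that Q is the accumulated product of rotations, so that QA = R and hence A⁻¹ = R⁻¹Q.

```python
import math


def givens_qr(a):
    """Return (q, r) with q*a == r, r upper triangular, q orthogonal.

    Assumed convention: q is the accumulated product of Givens rotations.
    """
    n = len(a)
    r = [[float(v) for v in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            x, y = r[j][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # entry is already zero, no rotation needed
            c, s = x / h, y / h
            # Rotate rows j and i of both r and q by the same plane rotation.
            for m in (r, q):
                row_j = [c * m[j][k] + s * m[i][k] for k in range(n)]
                row_i = [-s * m[j][k] + c * m[i][k] for k in range(n)]
                m[j], m[i] = row_j, row_i
            r[i][j] = 0.0  # clear rounding residue in the annihilated entry
    return q, r


def invert_upper(r):
    """Invert an upper triangular matrix by back substitution, column by column."""
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, -1, -1):
            rhs = 1.0 if i == j else 0.0
            acc = rhs - sum(r[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = acc / r[i][i]
    return inv


def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def invert(a):
    """A^-1 = R^-1 * Q, given Q*A = R (sequential sketch only)."""
    q, r = givens_qr(a)
    return matmul(invert_upper(r), q)
```

For a nonsingular input such as `[[2, 1, 1], [1, 3, 2], [1, 0, 0]]`, multiplying the input by `invert(...)` of it should give the identity up to rounding error. In the array described in the abstract these three stages are overlapped rather than run one after another.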