A matrix multiplication algorithm on a linear array of processing elements is described. The local storage required by the processing elements and the I/O bandwidth required to drive the array are both constants that are independent of the sizes of the matrices being multiplied. The algorithm is therefore modular, that is, arbitrarily large matrices can be multiplied on a large array built by cascading smaller arrays. Each of the matrix elements is read only once from a fixed I/O port and the algorithm does not use global broadcasting. It is also shown that the proposed algorithm computes the n^3 scalar products (where n is the size of the two matrices being multiplied) using an optimal number of processing elements.
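To make the n^3 figure concrete, the following is a minimal sequential sketch, not the linear-array algorithm described above. It multiplies two n x n matrices with the standard triple loop and counts the scalar products performed. The function name `matmul_counting` and the use of nested Python lists are assumptions made only for this illustration.

```python
def matmul_counting(a, b):
    """Multiply two n x n matrices (lists of lists) with the standard
    triple loop, and count the scalar multiplications performed.

    Illustrative only: this is a sequential reference computation,
    not the linear-array algorithm from the abstract.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    products = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # One scalar product a[i][k] * b[k][j] per (i, j, k) triple.
                c[i][j] += a[i][k] * b[k][j]
                products += 1
    return c, products


if __name__ == "__main__":
    c, products = matmul_counting([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(c)         # [[19, 22], [43, 50]]
    print(products)  # 8, i.e. n^3 for n = 2
```

Each (i, j, k) triple contributes exactly one scalar product, which is where the total of n^3 comes from. Any array implementation of this standard method has to schedule those same n^3 operations across its processing elements.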