We propose two vectorized LU decomposition algorithms for the unstructured sparse matrices that arise in large-scale circuit simulation. Implemented on our S810 supercomputer, either algorithm runs LU decomposition 11 to 82 times faster, and the total simulation 2.1 to 8.9 times faster, than a conventional algorithm. Both algorithms detect operational parallelism in the irregular structure of the matrix. One limits the scope of parallelism detection to each set of consecutive columns, so that it can take advantage of the dense matrix method applicable to the lower right corner; the other tries to locate parallelism all over the matrix in every step, without switching to a faster linear-index vector.
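As a rough illustration of what "operational parallelism" can mean in a sparse LU step, the following minimal Python sketch gathers, for each pivot, the update operations whose targets are all distinct and applies them as one batch, standing in for a single gather/multiply-subtract/scatter vector operation. It is not either of the two algorithms summarized above: the dictionary storage, the absence of pivoting, and the names `sparse_lu_batched` and `multiply_lu` are assumptions made only for this example.

```python
def sparse_lu_batched(a, n):
    """Factor a sparse n x n matrix in place (no pivoting; hypothetical helper).

    `a` maps (row, col) -> value for the structural nonzeros. On return the
    strictly lower part holds L (unit diagonal implied) and the rest holds U.
    Returns the number of batched update operations issued at each pivot.
    """
    batch_sizes = []
    for k in range(n):
        pivot = a[(k, k)]  # assumed nonzero, since no pivoting is done
        rows = [i for i in range(k + 1, n) if (i, k) in a]
        cols = [j for j in range(k + 1, n) if (k, j) in a]

        # Scale the pivot column; these divisions do not depend on each other.
        for i in rows:
            a[(i, k)] /= pivot

        # Gather phase: every (i, j) target is distinct, so the updates are
        # mutually independent and could be issued as one vector operation.
        batch = [(i, j) for i in rows for j in cols]
        updates = [a.get((i, j), 0.0) - a[(i, k)] * a[(k, j)] for i, j in batch]

        # Scatter phase: write results back (this may create fill-in entries).
        for (i, j), value in zip(batch, updates):
            a[(i, j)] = value
        batch_sizes.append(len(batch))
    return batch_sizes


def multiply_lu(a, n):
    """Rebuild the dense product L * U from the in-place factors (for checking)."""
    prod = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else a.get((i, k), 0.0)
                total += l_ik * a.get((k, j), 0.0)
            prod[i][j] = total
    return prod


if __name__ == "__main__":
    matrix = {(0, 0): 4, (0, 2): 1, (1, 1): 3, (1, 3): 2,
              (2, 0): 2, (2, 2): 5, (3, 1): 1, (3, 3): 4}
    print(sparse_lu_batched(matrix, 4))  # prints [1, 1, 0, 0]
```

The batch sizes give a crude picture of how much independent work each pivot step exposes. In a very sparse matrix they are small, which is why looking for independent operations beyond a single pivot step, as both algorithms in the abstract do in different scopes, matters for vector efficiency.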