Skip to Main Content
Sparse matrix-vector multiplication is an important computational kernel in scientific applications. However, it performs poorly on modern processors because of a low compute-to-memory ratio and its irregular memory access patterns. This paper discusses the implementations of sparse matrix-vector algorithm using OpenMP to execute iterative methods on the Dawning S4800A1. Two storage formats (CSR and BCSR) for sparse matrices and three scheduling schemes (static, dynamic and guided) provided by the standard OpenMP are evaluated. We also compared these three schemes with non-zero scheduling, where each thread is assigned approximately the same number of non-zero elements. Experimental data shows that, the non-zero scheduling can provide the best performance in most cases. The current implementation provides satisfactory scalability for most of matrices. However, we only get a limited speedup for some large matrices that contain millions of non-zero elements.