Existing and emerging parallel computing clusters are built from nodes with multi-core CPUs, so distributed memory across nodes coexists with shared memory within each node. This hybrid architecture can be exploited well by combining MPI (Message Passing Interface) with OpenMP; the combination reduces both memory usage and communication cost relative to either approach alone. In addition, the proposed hybrid static-dynamic load scheduling yields excellent load balance without introducing extra overhead. Careful implementation of the OpenMP threading further diminishes parallel overhead and accelerates the iterative solver in several ways. Numerical experiments confirm the high performance of the presented hybrid approach.