Feasibility of Running Singularity Containers with Hybrid MPI on NASA High-End Computing Resources


Abstract:

This work investigates the feasibility of a Singularity container-based solution to support a customizable computing environment for running users' MPI applications in “hybrid” MPI mode (where the MPI on the host machine works in tandem with the MPI inside the container) on NASA's High-End Computing Capability (HECC) resources. Two types of real-world applications were tested: traditional High-Performance Computing (HPC) and Artificial Intelligence/Machine Learning (AI/ML). On the traditional HPC side, two JEDI containers built with Intel MPI for Earth science modeling were tested on both HECC in-house and HECC AWS Cloud CPU resources. On the AI/ML side, an NVIDIA TensorFlow container built with OpenMPI was tested with a Neural Collaborative Filtering recommender system and the ResNet-50 computer image system on the HECC in-house V100 GPUs. For each of these applications and resource environments, multiple hurdles were overcome after lengthy debugging efforts. The most significant of these stemmed from conflicts between the host MPI and the container MPI and from the complexity of the underlying communication layers. Although porting containers to run on a single node using just the container MPI is quite straightforward, our exercises demonstrate that running across multiple nodes in hybrid MPI mode requires knowledge of Singularity, MPI libraries, the operating system image, and the communication infrastructure such as the transport and network layers, which are traditionally handled by support staff of HPC centers and by hardware or software vendors. In conclusion, porting and running Singularity containers on HECC resources, or at other data centers with similar environments, is feasible, but most users would need help to run them in hybrid MPI mode.
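
The difference between the two launch modes named in the abstract can be illustrated with a minimal sketch. It only builds the command lines and runs nothing. The image name `jedi.sif`, the executable `./model.x`, and both helper functions are hypothetical and are not taken from the paper; the sketch assumes the common pattern in which the host `mpirun` starts one `singularity exec` process per rank for hybrid mode, while single-node runs call `mpirun` from inside the container.

```python
import shlex


def hybrid_mpi_command(num_ranks, image, app, app_args=()):
    """Hybrid mode: the host mpirun launches one container instance per rank."""
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    return ["mpirun", "-n", str(num_ranks),
            "singularity", "exec", image, app, *app_args]


def container_only_command(num_ranks, image, app, app_args=()):
    """Single-node mode: only the MPI installed inside the container is used."""
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    return ["singularity", "exec", image,
            "mpirun", "-n", str(num_ranks), app, *app_args]


# Hypothetical image and executable names, for illustration only.
hybrid = hybrid_mpi_command(4, "jedi.sif", "./model.x")
single = container_only_command(4, "jedi.sif", "./model.x")

print(shlex.join(hybrid))
print(shlex.join(single))
```

In the hybrid form the host launcher and the MPI library inside the image must be compatible with each other, which is where the conflicts described in the abstract can arise. The container-only form avoids that dependency but is limited to a single node.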
Date of Conference: 14 November 2021
Date Added to IEEE Xplore: 28 December 2021
Conference Location: St. Louis, MO, USA
