Abstract:
Experimental and observational instruments for scientific research (such as light sources, genome sequencers, accelerators, telescopes, and electron microscopes) increasingly require High Performance Computing (HPC) scale capabilities for data analysis and workflow processing. Next-generation instruments are being deployed with higher resolutions and faster data capture rates, creating a big data crunch that cannot be handled by modest institutional computing resources. Often these big data analysis pipelines also require near-real-time computing and have higher resilience requirements than the simulation and modeling workloads more traditionally seen at HPC centers. While some facilities have enabled workflows to run at a single HPC facility, there is a growing need to integrate capabilities across HPC facilities to enable cross-facility workflows, whether to provide resilience for an experiment, to increase analysis throughput, or to better match a workflow to a particular architecture. In this paper we describe the barriers to executing complex data analysis workflows across HPC facilities and propose an architectural design pattern for enabling scientific discovery using cross-facility workflows that includes orchestration services, application programming interfaces (APIs), data access, and co-scheduling.
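The abstract names four architectural elements (orchestration services, APIs, data access, and co-scheduling) without elaborating on them. As a purely illustrative sketch of the kind of pattern being described, the Python snippet below imagines a client that uses a facility-agnostic REST API to stage a dataset and schedule dependent analysis jobs at more than one HPC center. All endpoint paths, payload fields, facility names, and function names here are hypothetical; they are not taken from the paper or from any real facility API.

```python
# Hypothetical sketch of a cross-facility orchestration client.
# Endpoints, payload fields, and facility URLs are illustrative only;
# they do not correspond to any API described in the paper.
import requests

FACILITIES = {
    "facility_a": "https://api.facility-a.example/v1",
    "facility_b": "https://api.facility-b.example/v1",
}

def submit_analysis(base_url: str, dataset_url: str, token: str) -> str:
    """Stage a dataset and submit a dependent analysis job at one facility."""
    headers = {"Authorization": f"Bearer {token}"}
    # 1. Ask the facility to pull the dataset from the instrument's data store.
    xfer = requests.post(f"{base_url}/transfers",
                         json={"source": dataset_url},
                         headers=headers, timeout=30)
    xfer.raise_for_status()
    # 2. Submit the analysis job, co-scheduled as a dependency on the transfer.
    job = requests.post(f"{base_url}/jobs",
                        json={"script": "analyze.sh",
                              "depends_on": xfer.json()["transfer_id"]},
                        headers=headers, timeout=30)
    job.raise_for_status()
    return job.json()["job_id"]

def run_everywhere(dataset_url: str, tokens: dict) -> dict:
    """Submit the same workflow to every facility, e.g. for resilience."""
    return {name: submit_analysis(url, dataset_url, tokens[name])
            for name, url in FACILITIES.items()}
```

In this framing, resilience comes from submitting the same workflow to more than one facility, while co-scheduling is expressed as a dependency between the data transfer and the compute job; a real implementation would also need federated authentication and status polling across facilities.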
Notes: As originally submitted and published there was an error in this document. The authors subsequently provided the following text: "This research used resources of the Argonne Leadership Computing Facility and the Advanced Photon Source, which are U.S. Department of Energy (DOE) Office of Science User Facilities operated for the DOE Office of Science by Argonne National Laboratory under Contract No. DE-AC02-06CH11357; the Linac Coherent Light Source (LCLS), SLAC National Accelerator Laboratory, is supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Contract No. DE-AC02-76SF00515; the National Energy Research Scientific Computing Center (NERSC) using NERSC award ASCR-ERCAP0016375 and the Advanced Light Source, which are U.S. Department of Energy Office of Science User Facilities located at Lawrence Berkeley National Laboratory, operated under Contract No. DE-AC02-05CH11231; the Genomic Science Program in the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research (BER) under contract numbers DE-AC02-05CH11231 (LBNL), 89233218CNA000001 (LANL), DE-AC05-00OR22725 (ORNL), and DE-AC05-76RL01830 (PNNL); and the Compute and Data Environment for Science (CADES) and the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725." The original article PDF remains unchanged.
Date of Conference: 15-18 December 2021
Date Added to IEEE Xplore: 13 January 2022