Abstract:
Modern scientific instruments generate enormous amounts of data. Typically, the data collected from the instruments are stored in one or more files that are then moved to a distant supercomputer for processing, and the final results are sent back to the user. To make effective use of time on expensive instruments, experimenters want to process the data as they are generated: they want to stream the data from the instruments’ memory directly to a supercomputer’s memory for analysis. Since the compute nodes in a supercomputer are not connected directly to the wide area network, the data streams need to pass through intermediate gateway nodes. Unlike best-effort file transfers, data streaming applications require resources at a specific time and for a specific period. In this paper, we present a system model for enabling data streaming through gateway nodes and an algorithm to efficiently allocate gateway node resources along with compute nodes. We evaluate the algorithm using real-world traces on the Chameleon Cloud. The results show that our system can schedule compute and gateway resources efficiently for streaming analysis.
Date of Conference: 03-06 August 2020
Date Added to IEEE Xplore: 30 September 2020