The huge data requirements of today's large science and engineering applications make optimised and scalable data placement mechanisms essential. To this end, we propose a scheduling scheme based on efficient data locality management for data-intensive workflows. Transfer and placement decisions are made based on constructs in the workflow that represent the inter-relationships between inputs and outputs at its different levels. When running large applications, most of the input data is not shipped, which keeps the data close to the jobs and results in much lower communication and transfer overheads. We have implemented these techniques for the YML workflow system. This paper presents results showing a substantial improvement in the performance of many interdependent multi-level workflows through these data placement optimisations.