Skip to Main Content
This paper presents a light-weight and scalable framework that enables non-privileged users to effortlessly and instantaneously describe, deploy, and execute data-intensive workflows on arbitrary computing resources from clusters, clouds, and supercomputers. This framework consists of three major components: GXP parallel/distributed shell as resource explorer and framework back-end, GMount distributed file system as underlying data sharing approach, and GXP Make as the workflow engine. With this framework, domain researchers can intuitively write workflow description in GNU make rules and harness resources from different domains with low learning and setup cost. By investigating the execution of real-world scientific applications using this framework on multi-cluster and supercomputer platforms, we demonstrate that our processing framework has practically useful performance and are suitable for common practice of data-intensive workflows in various distributed computing environments.