
2010 IEEE Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS)

Date: 15 Nov. 2010


  • Table of contents

    Page(s): 1
  • Many Task Computing for modeling the fate of oil discharged from the Deep Water Horizon well blowout

    Page(s): 1 - 7

    The Deep Water Horizon well blowout on April 20, 2010 discharged between 40,000 and 1.2 million tons of crude oil into the Gulf of Mexico. In order to understand the fate and impact of the discharged oil, particularly on the environmentally sensitive Florida Keys region, we have implemented a multi-component application consisting of many individual tasks that utilize a distributed set of computational and data management resources. The application consists of two 3D ocean circulation models, of the Gulf and of South Florida, and a 3D oil spill model. The ocean models used here resolve the Gulf at 2 km and the South Florida region at 900 m. This high-resolution information on the ocean state is then integrated with the oil model to track the fate of approximately 10 million oil particles. The individual components execute as MPI-based parallel applications on a 576-core IBM Power 5 cluster and a 5040-core Linux cluster, both operated by the Center for Computational Science, University of Miami. Data exchange and workflow coordination between these components are handled by a custom distributed software framework built around the Open-source Project for a Network Data Access Protocol (OPeNDAP). In this paper, we present this application as an example of Many Task Computing, report on its execution characteristics, and discuss the challenges presented by a many-task distributed workflow involving heterogeneous components. The application is a typical example from the ocean modeling and forecasting field and imposes soft timeliness and output-quality constraints on top of traditional performance requirements.

  • Many-task applications in the Integrated Plasma Simulator

    Page(s): 1 - 10

    This paper discusses the Integrated Plasma Simulator (IPS), a framework for coupled multiphysics simulation of fusion plasmas, in the context of many-task computing. The IPS supports multiple levels of parallelism: individual computational tasks can be parallel, components can launch multiple tasks concurrently, tasks from multiple components can be executed concurrently within a simulation, and multiple simulations can be run simultaneously. Each level of parallelism is constructed on top of the many-task computing capabilities implemented in the IPS, which also form the foundation for the parallelism at the multiple-simulation level. We show that a modest number of simultaneous simulations, with appropriately sized resource allocations, can provide a better trade-off between resource utilization and overall execution time than running them as separate jobs. This approach is highly beneficial for situations in which individual simulation tasks differ significantly in parallel scalability, as is the case in many scientific communities where coupled simulations rely substantially on legacy code.

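    The many-task pattern described in this abstract, where a component launches several parallel tasks concurrently within its allocation, can be illustrated with a short Python sketch. This is not the actual IPS services API; the helper names, placeholder commands, and the use of concurrent.futures are illustrative assumptions only.

```python
# Hypothetical sketch (not the actual IPS services API): a component that
# launches several parallel tasks concurrently within its core allocation,
# illustrating the many-task pattern the abstract describes.
from concurrent.futures import ThreadPoolExecutor, as_completed
import subprocess

def run_task(cmd):
    """Launch one (possibly MPI-parallel) task and report its exit status."""
    try:
        return subprocess.run(cmd, capture_output=True).returncode
    except FileNotFoundError:
        return -1  # executable not present on this machine (placeholder command)

def run_component_tasks(task_cmds, max_concurrent=4):
    """Execute a component's tasks with bounded concurrency."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {pool.submit(run_task, cmd): cmd for cmd in task_cmds}
        for fut in as_completed(futures):
            results[tuple(futures[fut])] = fut.result()
    return results

if __name__ == "__main__":
    # Example: four independent physics tasks, each itself an MPI job
    # (command lines are placeholders).
    cmds = [["mpiexec", "-n", "32", "./physics_kernel", f"--case={i}"] for i in range(4)]
    print(run_component_tasks(cmds))
```
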
  • Compute and data management strategies for grid deployment of high throughput protein structure studies

    Page(s): 1 - 6

    The study of macromolecular protein structures at atomic resolution is the source of many data- and compute-intensive challenges, from simulation, to image processing, to model building. We have developed a general platform for the secure deployment of structural biology computational tasks and workflows into a federated grid that maximizes robustness, ease of use, and performance while minimizing data movement. This platform leverages several existing grid technologies for security and web-based data access, adding protocols for VO, user, task, workflow, and individual job data staging. We present the strategies used to deploy and maintain tens of GB of data and applications across a significant portion of the US Open Science Grid, and the workflow management mechanisms used to optimize task execution, both for performance and for correctness. Significant observations are made about real operating conditions in a grid environment from automated analysis of hundreds of thousands of jobs over extended periods. We specifically focus on one novel application that harnesses the capacity of national cyberinfrastructure to dramatically accelerate the process of protein structure determination. This workflow requires 20,000 to 50,000 compute hours across roughly 100,000 tasks, consumes tens of GB of input data, and produces commensurate output. We demonstrate the effectiveness of our platform through the completion of this workflow in half a day on the Open Science Grid.

  • Processing massive sized graphs using Sector/Sphere

    Page(s): 1 - 10

    Data-intensive computing is attracting increasing attention among computer science researchers. As data sizes grow even faster than Moore's Law, many traditional systems are failing to cope with extremely large datasets. In this paper we use a real-world graph processing application to demonstrate the challenges posed by emerging data-intensive computing, and we present a solution in the form of Sector/Sphere, a system we have developed over the last several years. Sector provides scalable, fault-tolerant storage using commodity computers, while Sphere supports in-storage parallel data processing with a simplified programming interface. This paper describes the rationale behind Sector/Sphere and how to use it to process massive graphs effectively.

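    The Sphere-style programming pattern mentioned above, applying a user-defined function (UDF) to each data segment in parallel, can be sketched in a few lines of Python. This is not the real Sector/Sphere client API (which is C++); the example UDF, segment layout, and function names are assumptions made for illustration.

```python
# Hypothetical sketch of the Sphere-style processing pattern: apply a
# user-defined function (UDF) to each data segment in parallel and merge
# the partial results. Not the real Sector/Sphere API.
from multiprocessing import Pool

def count_out_edges(segment):
    """UDF: count outgoing edges per vertex in one segment of an edge list."""
    counts = {}
    for src, _dst in segment:
        counts[src] = counts.get(src, 0) + 1
    return counts

def sphere_style_process(segments, udf, workers=4):
    """Run the UDF over every segment in parallel and merge the results."""
    merged = {}
    with Pool(processes=workers) as pool:
        for partial in pool.map(udf, segments):
            for key, value in partial.items():
                merged[key] = merged.get(key, 0) + value
    return merged

if __name__ == "__main__":
    # Toy graph split into two segments of (src, dst) edges.
    segments = [[(0, 1), (0, 2), (1, 2)], [(2, 0), (2, 1), (0, 3)]]
    print(sphere_style_process(segments, count_out_edges, workers=2))
```
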
  • Easy and instantaneous processing for data-intensive workflows

    Page(s): 1 - 10

    This paper presents a lightweight and scalable framework that enables non-privileged users to effortlessly and instantaneously describe, deploy, and execute data-intensive workflows on arbitrary computing resources drawn from clusters, clouds, and supercomputers. The framework consists of three major components: the GXP parallel/distributed shell as resource explorer and framework back end, the GMount distributed file system as the underlying data sharing layer, and GXP Make as the workflow engine. With this framework, domain researchers can intuitively write workflow descriptions as GNU make rules and harness resources from different domains with low learning and setup cost. By investigating the execution of real-world scientific applications using this framework on multi-cluster and supercomputer platforms, we demonstrate that it delivers practically useful performance and is suitable for common data-intensive workflows in a variety of distributed computing environments.

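    The workflow-as-make-rules idea described in the abstract can be sketched in Python rather than GNU make: each rule names a target, its prerequisites, and a command, and rules run in dependency order. This is unrelated to the actual GXP Make implementation; the rules, file names, and shell commands are illustrative assumptions (it presumes a local corpus.txt and standard Unix tools) and it omits make's timestamp-based staleness checks.

```python
# Hypothetical sketch of a rule-driven workflow in the spirit of GNU make,
# written in Python. Each rule: target -> (prerequisite targets, shell command).
import subprocess

RULES = {
    "tokens.txt": ([],             "tr ' ' '\\n' < corpus.txt > tokens.txt"),
    "counts.txt": (["tokens.txt"], "sort tokens.txt | uniq -c > counts.txt"),
    "top10.txt":  (["counts.txt"], "sort -rn counts.txt | head -10 > top10.txt"),
}

def build(target, done=None):
    """Build prerequisites first, then the target itself (no staleness check)."""
    done = set() if done is None else done
    if target in done:
        return
    prereqs, command = RULES[target]
    for prereq in prereqs:
        build(prereq, done)
    subprocess.run(command, shell=True, check=True)
    done.add(target)

if __name__ == "__main__":
    build("top10.txt")
```
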
  • Detecting bottlenecks in parallel DAG-based data flow programs

    Page(s): 1 - 10

    In recent years, several frameworks have been introduced to facilitate massively parallel data processing on shared-nothing architectures such as compute clouds. While these frameworks generally offer good support for task deployment and fault tolerance, they provide little assistance in finding reasonable degrees of parallelization for the tasks to be executed. However, because cloud billing models make using many resources for a short period cost the same as using few resources for a long time, proper levels of parallelization are crucial to achieving short processing times while maintaining good resource utilization, and therefore good cost efficiency. In this paper, we present and evaluate a solution for detecting CPU and I/O bottlenecks in parallel DAG-based data flow programs, assuming capacity-constrained communication channels. The detection of bottlenecks is an important foundation for manually or automatically scaling out and tuning parallel data flow programs in order to increase performance and cost efficiency.

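    One simple heuristic in the spirit of this abstract (not the authors' method) is to compare, for each task in the data flow, measured CPU utilization against the saturation of its output channel. The sketch below uses hypothetical thresholds and field names.

```python
# Illustrative heuristic only: flag a task as a CPU bottleneck if its cores
# stay busy while its output channel is not saturated, and as an I/O
# bottleneck if the opposite holds.
from dataclasses import dataclass

@dataclass
class TaskStats:
    name: str
    cpu_utilization: float       # fraction of allocated CPU time kept busy (0..1)
    output_channel_usage: float  # fraction of the channel's capacity used (0..1)

def classify(stats, cpu_thresh=0.9, io_thresh=0.9):
    """Return (task, label) pairs for suspected bottlenecks."""
    findings = []
    for t in stats:
        if t.cpu_utilization >= cpu_thresh and t.output_channel_usage < io_thresh:
            findings.append((t.name, "CPU bottleneck: consider a higher degree of parallelization"))
        elif t.output_channel_usage >= io_thresh and t.cpu_utilization < cpu_thresh:
            findings.append((t.name, "I/O bottleneck: the attached channel is capacity constrained"))
    return findings

if __name__ == "__main__":
    measured = [
        TaskStats("decompress", 0.97, 0.40),
        TaskStats("join",       0.35, 0.95),
        TaskStats("aggregate",  0.50, 0.30),
    ]
    for name, label in classify(measured):
        print(f"{name}: {label}")
```
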
  • Improving Many-Task computing in scientific workflows using P2P techniques

    Page(s): 1 - 10

    Large-scale scientific experiments are usually supported by scientific workflows that may demand high performance computing infrastructure. Within a given experiment, the same workflow may be explored with different sets of parameters. However, parallelizing workflow instances is hard to accomplish, mainly because of the heterogeneity of their activities. The Many-Task Computing paradigm is a candidate approach to supporting workflow activity parallelism. However, scheduling a huge number of workflow activities on large clusters is susceptible to resource failures and overloading. In this paper, we propose Heracles, an approach that applies consolidated P2P techniques to improve Many-Task Computing of workflow activities on large clusters. We present a fault tolerance mechanism, dynamic resource management, and a hierarchical organization of computing nodes to handle workflow instance execution properly. We have evaluated Heracles through an experimental analysis of the benefits of the P2P techniques for workflow execution time.

  • Dynamic task scheduling for the Uintah framework

    Page(s): 1 - 10

    Uintah is a computational framework for fluid-structure interaction problems that combines the ICE fluid flow algorithm, adaptive mesh refinement (AMR), and MPM particle methods. Uintah uses domain decomposition with a task-graph approach for asynchronous communication and automatic message generation. The Uintah software has been used for a decade with its original task scheduler, which ran computational tasks in a predefined static order. In order to improve the performance of Uintah on petascale architectures, a new dynamic task scheduler allowing better overlap of communication and computation is designed and evaluated in this study. The new scheduler supports asynchronous, out-of-order scheduling of computational tasks by placing them in a distributed directed acyclic graph (DAG) and by isolating task memory and keeping multiple copies of task variables in a data warehouse when necessary. A new runtime system has been implemented with a two-stage priority queuing architecture to improve scheduling efficiency. The effectiveness of this new approach is shown through an analysis of the performance of the software on large-scale fluid-structure examples.

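    The two-stage queuing idea can be sketched as follows. This is a greatly simplified, hypothetical Python model rather than the actual Uintah C++ scheduler; the priority rule, task names, and data-arrival test are assumptions.

```python
# Much-simplified sketch of two-stage out-of-order scheduling: tasks whose DAG
# dependencies are satisfied wait in an "external ready" stage until their
# remote data has arrived, then move to an "internal ready" priority queue and
# may execute out of order.
import heapq

class Task:
    def __init__(self, name, deps, priority, needs_remote_data=False):
        self.name, self.deps, self.priority = name, set(deps), priority
        self.needs_remote_data = needs_remote_data

def schedule(tasks, data_arrived):
    """Run tasks out of order; data_arrived(name) says if remote data is in."""
    completed, external_ready, internal_ready = set(), [], []
    pending = {t.name: t for t in tasks}
    while pending or external_ready or internal_ready:
        # Stage 0: DAG dependencies satisfied -> external ready.
        for name in [n for n, t in pending.items() if t.deps <= completed]:
            external_ready.append(pending.pop(name))
        # Stage 1: remote data present -> internal ready (priority queue).
        for t in list(external_ready):
            if not t.needs_remote_data or data_arrived(t.name):
                external_ready.remove(t)
                heapq.heappush(internal_ready, (-t.priority, t.name, t))
        # Stage 2: execute the highest-priority internally ready task.
        if internal_ready:
            _, _, t = heapq.heappop(internal_ready)
            print("running", t.name)
            completed.add(t.name)

if __name__ == "__main__":
    tasks = [Task("advect", [], 2), Task("pressure", [], 5, needs_remote_data=True),
             Task("update", ["advect", "pressure"], 1)]
    schedule(tasks, data_arrived=lambda name: True)
```
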
  • Automatic and coordinated job recovery for high performance computing

    Page(s): 1 - 9

    As the scale of high-performance computing systems continues to grow, the impact of failures on these systems becomes increasingly critical. Research has been performed on fault prediction and associated precautionary actions. While this approach is valuable, it is not adequate because failures are inevitable. Post-failure recovery is equally important; however, most current work relies mainly on checkpoint/restart and does not address the problem at the system level. We propose AuCoRe, an automatic and coordinated job recovery framework. AuCoRe provides a coordination mechanism for failed-job recovery that takes the execution of regular jobs into account; users specify a recovery policy for their jobs, and an incentive mechanism minimizes gaming. We have implemented AuCoRe in Cobalt, a production resource manager, and evaluated it using real workloads from the Blue Gene/P system at Argonne National Laboratory. Experimental results demonstrate that AuCoRe improves system performance by efficiently managing job recovery.

  • Scheduling many-task workloads on supercomputers: Dealing with trailing tasks

    Page(s): 1 - 10

    In order for many-task applications to be attractive candidates for running on high-end supercomputers, they must be able to benefit from the additional compute, I/O, and communication performance provided by high-end HPC hardware relative to clusters, grids, or clouds. Typically this means that the application should use the HPC resource in such a way that it reduces time to solution beyond what is possible otherwise. Furthermore, it is necessary to make efficient use of the computational resources, achieving high levels of utilization. Satisfying these twin goals is not trivial, because while the parallelism in many-task computations can vary over time, on many large machines the allocation policy requires that worker CPUs be provisioned, and also relinquished, in large blocks rather than individually. This paper discusses the problem in detail, explaining and characterizing the trade-off between utilization and time to solution under the allocation policies of the Blue Gene/P Intrepid at Argonne National Laboratory. We propose and test two strategies to improve this trade-off: scheduling tasks in order of longest to shortest (applicable only if task runtimes are predictable) and downsizing allocations when utilization drops below some threshold. We show that both strategies are effective under different conditions.

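    The two strategies named in the abstract can be illustrated with a toy discrete-event simulation in Python. The block size, utilization threshold, and task runtimes below are assumed values for illustration, not the paper's experimental setup or implementation.

```python
# Toy simulation: dispatch tasks longest-first onto a block-allocated pool of
# workers, and release idle blocks once no tasks are waiting and utilization
# falls below a threshold.
import heapq

def simulate(runtimes, workers, block_size=64, util_threshold=0.5):
    """Return (makespan_seconds, core_seconds_charged)."""
    tasks = sorted(runtimes, reverse=True)          # longest-to-shortest
    busy = []                                       # heap of task finish times
    now, charged, allocated = 0.0, 0.0, workers
    while tasks or busy:
        while tasks and len(busy) < allocated:      # fill idle workers
            heapq.heappush(busy, now + tasks.pop(0))
        finish = heapq.heappop(busy)                # advance to next completion
        charged += allocated * (finish - now)       # pay for the whole allocation
        now = finish
        utilization = len(busy) / allocated if allocated else 0.0
        if not tasks and utilization < util_threshold and allocated > block_size:
            allocated = max(block_size,             # relinquish idle blocks
                            ((len(busy) + block_size - 1) // block_size) * block_size)
    return now, charged

if __name__ == "__main__":
    runtimes = [300.0] * 10 + [60.0] * 500 + [1800.0] * 3   # a few trailing tasks
    print(simulate(runtimes, workers=256))
```
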
  • Blue Gene/Q resource management architecture

    Page(s): 1 - 5

    As supercomputers scale to a million processor cores and beyond, the underlying resource management architecture needs to provide a flexible mechanism to manage the wide variety of workloads executing on the machine. In this paper we describe the novel approach of the Blue Gene/Q (BG/Q) supercomputer to addressing these workload requirements by providing resource management services that support both the high-performance computing (HPC) and high-throughput computing (HTC) paradigms. We explore how the resource management implementations of the prior-generation Blue Gene systems (BG/L and BG/P) evolved and led us down the path to developing services on BG/Q that focus on scalability, flexibility, and efficiency. We also provide an overview of the main components comprising the BG/Q resource management architecture and how they interact with one another. The paper introduces BG/Q concepts for partitioning I/O and compute resources to provide I/O resiliency while at the same time enabling faster block (partition) boot times. New features, such as the ability to run a mix of HTC and HPC workloads on the same block, are explained, and the advantages of this type of environment are examined. Similar to how Many-Task Computing (MTC) [1] aims to combine elements of HTC and HPC, the focus of BG/Q has been to unify the two models in a flexible manner where hybrid workloads having both HTC and HPC characteristics are managed simultaneously.
