2011 12th IEEE/ACM International Conference on Grid Computing (GRID)

Date: 21-23 Sept. 2011

Displaying Results 1 - 25 of 43
  • [Title page i]
    Publication Year: 2011, Page(s): i
    PDF (19 KB) | Freely Available from IEEE

  • [Title page iii]
    Publication Year: 2011, Page(s): iii
    PDF (52 KB) | Freely Available from IEEE

  • [Copyright notice]
    Publication Year: 2011, Page(s): iv
    PDF (121 KB) | Freely Available from IEEE

  • Table of contents
    Publication Year: 2011, Page(s): v - viii
    PDF (133 KB) | Freely Available from IEEE

  • General Chair's message
    Publication Year: 2011, Page(s): ix
    PDF (90 KB) | HTML | Freely Available from IEEE

  • Program Committee Chair Message
    Publication Year: 2011, Page(s): x
    PDF (96 KB) | HTML | Freely Available from IEEE

  • Organizing Committee
    Publication Year: 2011, Page(s): xi - xii
    PDF (74 KB) | Freely Available from IEEE

  • Program Committee
    Publication Year: 2011, Page(s): xiii - xiv
    PDF (90 KB) | Freely Available from IEEE

  • External Reviewers
    Publication Year: 2011, Page(s): xv - xvi
    PDF (88 KB) | Freely Available from IEEE

  • A WS-Agreement-Based QoS Auditor Negotiation Mechanism for Grids
    Publication Year: 2011, Page(s): 1 - 8
    Cited by: Papers (2)
    PDF (174 KB) | HTML

    Abstract: High performance platforms composed of commodity computing resources, such as grids and peer-to-peer systems, have greatly evolved and assumed an important role in the last decade. Nevertheless, their wide commercial use still depends on the establishment of an effective quality of service (QoS) infrastructure in those environments. For this reason, a variety of proposals have recently emerged in which consumer and provider monitor and control grid resources in order to guarantee previously established service level agreements. However, in many cases there is a lack of trust between provider and consumer with respect to monitoring those agreements. In such cases, it becomes necessary to introduce a third entity - an impartial and trustworthy QoS auditor - in order to resolve conflicts of interest. Moreover, as there may be several auditors trusted by both provider and consumer, we claim that the QoS auditor needs to be negotiated and established just as the service level agreement is negotiated by the parties. To address this issue, the present paper proposes and evaluates a negotiation mechanism for QoS auditors in computational grids. Among the proposed mechanism's characteristics are low intrusiveness and the use of open standards such as WS-Agreement. Experimental analysis on a prototype of the proposed negotiation mechanism has shown that the auditor negotiation process took less than a minute to finish, which is far less than the service execution time in most grid computing use cases. A minimal illustrative sketch of the auditor-selection idea follows this entry.

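A minimal sketch of the auditor-selection idea, under the assumption that each party simply ranks the auditors it trusts and the parties agree on the common auditor with the best combined rank. The paper's actual mechanism negotiates this via WS-Agreement, which is not modelled here; all auditor names are made up.

```python
# Toy auditor negotiation: pick the common auditor most trusted by both parties.

def negotiate_auditor(consumer_ranked, provider_ranked):
    """Each argument is a list of auditor names, most trusted first."""
    common = set(consumer_ranked) & set(provider_ranked)
    if not common:
        return None                                   # negotiation fails
    # lower combined rank = more trusted by both parties
    return min(common, key=lambda a: consumer_ranked.index(a) + provider_ranked.index(a))

consumer = ["audit-org-A", "audit-org-B", "audit-org-C"]
provider = ["audit-org-B", "audit-org-C", "audit-org-D"]
print(negotiate_auditor(consumer, provider))          # -> "audit-org-B"
```
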
  • Mediation of Service Overhead in Service-Oriented Grid Architectures
    Publication Year: 2011, Page(s): 9 - 18
    Cited by: Papers (1)
    PDF (346 KB) | HTML

    Abstract: Grid computing applications and infrastructures build heavily on the Service-Oriented Computing development methodology and are often realized as Service-Oriented Architectures. The Grid Job Management Framework (GJMF) is a flexible Grid infrastructure and application support tool that offers a range of abstractive and platform-independent interfaces for middleware-agnostic Grid job submission, monitoring, and control. In this paper we use the GJMF as a test bed for characterization of Grid Service-Oriented Architecture overhead, and evaluate the efficiency of a set of design patterns for the overhead mediation mechanisms featured in the framework.

  • Mutual Job Submission Architecture That Considered Workload Balance Among Computing Resources in the Grid Interoperation
    Publication Year: 2011, Page(s): 19 - 25
    PDF (919 KB) | HTML

    Abstract: Computing resource federation among collaborators is necessary for the smooth promotion of collaborations. However, this is difficult for collaborators who use different types of grid infrastructures, because of incompatibilities among grid middleware. Therefore, an inter-grid job submission specification named HPC Basic Profile (HPCBP) has been defined by the Open Grid Forum (OGF) and implemented by many grid projects. However, there are still many problems in grid interoperation using the HPCBP. One of them is the workload disruption problem. The interoperation architecture that is popular in many prototype implementations has a race condition between detection of a job submission from another grid and resource allocation for a job submitted by a local client. This race condition disrupts the workload balance among the computing resources and increases the number of waiting jobs. In this paper, we explain and analyze the workload problem through an experiment and a simulation, propose an architecture that solves the problem, and show the effectiveness of the architecture by simulation. The race condition is illustrated by a short sketch after this entry.

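A minimal sketch of the kind of race condition the abstract describes, assuming a single free node contended for by a remote (HPCBP) submission and a local one. It is not the paper's proposed architecture; it only shows why the check-and-allocate step must be atomic.

```python
# Two submission paths racing for one free node; the lock makes the
# check-and-allocate step atomic so only one of them gets the node.

import threading

free_nodes = ["node1"]
lock = threading.Lock()
log = []

def allocate(source):
    with lock:                       # without this lock, both callers may see "node1" free
        if free_nodes:
            node = free_nodes.pop()
            log.append(f"{source} -> {node}")
        else:
            log.append(f"{source} -> queued")

t1 = threading.Thread(target=allocate, args=("remote grid (HPCBP)",))
t2 = threading.Thread(target=allocate, args=("local client",))
t1.start(); t2.start(); t1.join(); t2.join()
print(log)                           # exactly one submission gets node1
```
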
  • Energy-Aware Ant Colony Based Workload Placement in Clouds
    Publication Year: 2011, Page(s): 26 - 33
    Cited by: Papers (20)
    PDF (155 KB) | HTML

    Abstract: With increasing numbers of energy-hungry data centers, energy conservation has become a major design constraint. One traditional approach to conserving energy in virtualized data centers is to perform workload (i.e., VM) consolidation: the workload is packed onto the smallest number of physical machines and over-provisioned resources are transitioned into a lower power state. However, most of the workload consolidation approaches applied until now are limited to a single resource (e.g., CPU) and rely on simple greedy algorithms such as First-Fit Decreasing (FFD), which perform resource-dissipative workload placement. Moreover, they are highly centralized and known to be hard to distribute. In this work, we model the workload consolidation problem as an instance of the multi-dimensional bin-packing (MDBP) problem and design a novel, nature-inspired workload consolidation algorithm based on Ant Colony Optimization (ACO). We evaluate the ACO-based approach by comparing it with one frequently applied greedy algorithm (i.e., FFD). Our simulation results demonstrate that ACO outperforms the evaluated greedy algorithm, as it achieves superior energy gains through better server utilization and requires fewer machines. Moreover, it computes solutions that are nearly optimal. Finally, the autonomous nature of the approach allows it to be implemented in a fully distributed environment. A minimal sketch of the FFD baseline follows this entry.

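A minimal sketch of the First-Fit Decreasing baseline the paper compares its ACO approach against, naively extended to two resource dimensions (CPU and RAM). The VM demands and host capacity are made-up example values; the ACO algorithm itself is not shown.

```python
# First-Fit Decreasing for a toy two-dimensional VM consolidation instance.

def ffd_consolidate(vms, host_capacity):
    """Pack VMs (list of (cpu, ram) demands) onto as few hosts as possible."""
    # Sort VMs by total demand, largest first (the "decreasing" step).
    vms = sorted(vms, key=lambda v: v[0] + v[1], reverse=True)
    hosts = []        # each host is (remaining_cpu, remaining_ram)
    placement = []    # placement[i] = index of host chosen for the i-th sorted VM
    for cpu, ram in vms:
        for i, (free_cpu, free_ram) in enumerate(hosts):
            if cpu <= free_cpu and ram <= free_ram:    # first host that fits
                hosts[i] = (free_cpu - cpu, free_ram - ram)
                placement.append(i)
                break
        else:                                          # no host fits: open a new one
            hosts.append((host_capacity[0] - cpu, host_capacity[1] - ram))
            placement.append(len(hosts) - 1)
    return placement, len(hosts)

if __name__ == "__main__":
    vms = [(2, 4), (1, 1), (4, 8), (2, 2), (1, 4)]     # (cores, GB RAM)
    mapping, used = ffd_consolidate(vms, host_capacity=(8, 16))
    print(mapping, "hosts used:", used)
```
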
  • Graph-Cut Based Coscheduling Strategy Towards Efficient Execution of Scientific Workflows in Collaborative Cloud Environments
    Publication Year: 2011, Page(s): 34 - 41
    Cited by: Papers (3)
    PDF (242 KB) | HTML

    Abstract: Recently, cloud computing has emerged as a promising infrastructure for performing scientific workflows by providing on-demand resources. It is also convenient for scientific collaboration, since the different cloud environments used by researchers are connected through the Internet. However, the significant latency arising from frequent access to large datasets and the corresponding data movements across geo-distributed data centers has hindered the efficient execution of data-intensive scientific workflows. In this paper, we propose a novel graph-cut based data and task coscheduling strategy for minimizing data transfer across geo-distributed data centers. Specifically, a dependency graph is first constructed from workflow provenance and cut into subgraphs, according to the datasets that must appear in fixed data centers, by a multiway cut algorithm. The subgraphs may then be recursively cut into smaller ones by a minimum-cut algorithm, guided by data correlation rules, until all of them fit the capacity constraints of the data centers where the fixed-location datasets reside. In this way, the datasets and tasks are distributed to the target data centers while the total amount of data transfer between them is minimized. Additionally, a runtime scheduling algorithm dynamically adjusts the data placement during execution to prevent the data centers from overloading. Simulation results demonstrate that the total volume of data transfer across different data centers can be significantly reduced and the cost of performing scientific workflows on clouds is reduced accordingly. A small minimum-cut sketch follows this entry.

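A minimal sketch of the minimum-cut step on a toy task/dataset dependency graph, using networkx. The paper's full strategy (multiway cut plus recursive refinement under capacity constraints) is not reproduced, and all node names and sizes below are invented.

```python
# Split a small dependency graph between two data centers so the total size
# of data crossing the cut is minimized. Edge capacities stand for the GB of
# data that would move if the two endpoints end up in different data centers.

import networkx as nx

G = nx.DiGraph()
edges = [("dc1", "task_a", 5), ("task_a", "task_b", 2),
         ("task_b", "dc2", 4), ("task_a", "task_c", 1),
         ("task_c", "dc2", 3)]
for u, v, size in edges:
    G.add_edge(u, v, capacity=size)

# "dc1" and "dc2" pin the datasets that must stay in fixed data centers.
cut_value, (side1, side2) = nx.minimum_cut(G, "dc1", "dc2")
print("data moved across centers (GB):", cut_value)   # 3
print("placed with dc1:", side1)
print("placed with dc2:", side2)
```
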
  • Optimizing Resource Consumptions in Clouds
    Publication Year: 2011, Page(s): 42 - 49
    Cited by: Papers (3)
    PDF (182 KB) | HTML

    Abstract: This paper considers the scenario where multiple clusters of Virtual Machines (termed Virtual Clusters) are hosted in a Cloud system consisting of a cluster of physical nodes. Multiple Virtual Clusters (VCs) cohabit in the physical cluster, with each VC offering a particular type of service for incoming requests. In this context, VM consolidation, which strives to use a minimal number of nodes to accommodate all VMs in the system, plays an important role in reducing resource consumption. Most existing consolidation methods in the literature regard VMs as "rigid" during consolidation, i.e., the VMs' resource capacities remain unchanged. In VC environments, QoS is usually delivered by a VC as a single entity. Therefore, there is no reason why a VM's resource capacity cannot be adjusted as long as the whole VC is still able to maintain the desired QoS. Treating VMs as "mouldable" during consolidation may make it possible to consolidate VMs onto even fewer nodes. This paper investigates this issue and develops a Genetic Algorithm (GA) to consolidate mouldable VMs. The GA is able to evolve an optimized system state, which represents the VM-to-node mapping and the resource capacity allocated to each VM. After the new system state is calculated by the GA, the Cloud transitions from the current system state to the new one. The transition time represents overhead and should be minimized. In this paper, a cost model is formalized to capture the transition overhead, and a reconfiguration algorithm is developed to transition the Cloud to the optimized system state at low transition overhead. Experiments are conducted to evaluate the performance of the GA and the reconfiguration algorithm. A toy GA sketch for the VM-to-node mapping follows this entry.

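A toy genetic algorithm for the VM-to-node mapping part of the problem, assuming fixed ("rigid") VM capacities. The paper's GA additionally evolves the capacity allocated to each mouldable VM, which is omitted here; all demands and capacities are example values.

```python
# Evolve a VM-to-node mapping that minimizes the number of nodes used,
# with a penalty for nodes whose CPU capacity is exceeded.

import random

VM_CPU = [2, 4, 1, 3, 2, 2, 1, 4]    # example CPU demands, one entry per VM
NODES = 8
NODE_CAP = 8

def fitness(mapping):
    load = [0] * NODES
    for vm, node in enumerate(mapping):
        load[node] += VM_CPU[vm]
    used = sum(1 for l in load if l > 0)
    overload = sum(max(0, l - NODE_CAP) for l in load)
    return used + 10 * overload       # lower is better

def evolve(pop_size=40, generations=200):
    pop = [[random.randrange(NODES) for _ in VM_CPU] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(VM_CPU))
            child = a[:cut] + b[cut:]                   # one-point crossover
            if random.random() < 0.2:                   # mutation
                child[random.randrange(len(child))] = random.randrange(NODES)
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
print("mapping:", best, "fitness:", fitness(best))
```
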
  • A Strategy to Improve Resource Utilization in Grids Based on Network-Aware Meta-scheduling in Advance
    Publication Year: 2011, Page(s): 50 - 57
    Cited by: Papers (1)
    PDF (270 KB) | HTML

    Abstract: The provision of Quality of Service (QoS) in Grids (systems made of heterogeneous, geographically dispersed computing resources) is still a challenging task that needs the attention of the research community. Since reservations of resources may not always be possible, another way of enhancing the QoS perceived by Grid users is to perform meta-scheduling of jobs in advance, where jobs are scheduled some time before they are actually executed. Hence, it becomes more likely that the appropriate resources are available to run the job whenever needed. One drawback of this scenario is that fragmentation appears in the allocation of jobs to resources, and this fragmentation in turn causes poor resource utilization. For these reasons, a new technique has been developed to tackle the fragmentation problem, which consists of rescheduling already-scheduled tasks. To this end, heuristics have been implemented to determine which intervals need replanning and to select the jobs involved in the rescheduling process. On top of that, another heuristic has been implemented to place rescheduled jobs as close together as possible so that fragmentation is avoided or reduced to a minimum. This technique has been tested using a real test bed involving heterogeneous computing resources from different organizations. An evaluation is presented that illustrates the efficiency of this approach in meeting users' QoS requirements. A short sketch of how such fragmentation can be measured follows this entry.

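A short sketch of how fragmentation on a single resource's reservation timeline can be measured: free time exists, but it is split into gaps too short to host a new job. The reservations and job length are made-up examples; the paper's replanning heuristics are not shown.

```python
# Find the free gaps left between advance reservations on one resource.

def free_gaps(reservations, horizon):
    """reservations: list of non-overlapping (start, end) tuples."""
    gaps, cursor = [], 0
    for start, end in sorted(reservations):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < horizon:
        gaps.append((cursor, horizon))
    return gaps

reservations = [(10, 20), (25, 40), (55, 60)]         # minutes from now
gaps = free_gaps(reservations, horizon=70)
job_length = 12
usable = [g for g in gaps if g[1] - g[0] >= job_length]
print("gaps:", gaps)                                   # [(0, 10), (20, 25), (40, 55), (60, 70)]
print("gaps able to host a 12-minute job:", usable)    # only (40, 55)
```
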
  • A Highly Scalable Decentralized Scheduler of Tasks with Deadlines
    Publication Year: 2011, Page(s): 58 - 65
    Cited by: Papers (1)
    PDF (193 KB) | HTML

    Abstract: Scheduling of tasks in distributed environments, such as cloud and grid computing platforms, using deadlines to provide quality of service is a challenging problem. The few existing proposals suffer from scalability limitations because they try to maintain full knowledge of the system state. To our knowledge, there is no implementation yet that reaches scales of a hundred thousand nodes. In this paper, we present a fully decentralized scheduler that aggregates information about the availability of the execution nodes throughout the network and uses it to allocate tasks to those nodes that are able to finish them in time. Through simulation, we show that our scheduler is able to operate in different scenarios, from many-task applications in cloud computing sites to volunteer computing projects. Simulations on networks of up to a hundred thousand nodes show very competitive performance, reaching allocation times of under a second and very low overhead in low-latency gigabit networks. A minimal deadline-feasibility sketch follows this entry.

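A minimal, centralized sketch of the deadline-feasibility test a node-selection step might apply: a node is eligible if its current queue still lets the task finish before the deadline. The paper's scheduler is fully decentralized and aggregates availability information over the network, which is not modelled here; node speeds and availability times are invented.

```python
# Pick the eligible node that finishes the task earliest before its deadline.

import time

def pick_node(nodes, task_length, deadline):
    """nodes: dict name -> (available_at, speed in task-units/second)."""
    now = time.time()
    best, best_finish = None, None
    for name, (available_at, speed) in nodes.items():
        start = max(now, available_at)
        finish = start + task_length / speed
        if finish <= deadline and (best_finish is None or finish < best_finish):
            best, best_finish = name, finish
    return best, best_finish

nodes = {"n1": (time.time() + 5, 1.0),    # busy for 5 more seconds
         "n2": (time.time(), 0.5),        # idle but slower
         "n3": (time.time() + 30, 2.0)}   # busy for a long while
print(pick_node(nodes, task_length=10, deadline=time.time() + 25))   # n1 wins
```
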
  • Adaptive Scheduling on Power-Aware Managed Data-Centers Using Machine Learning
    Publication Year: 2011, Page(s): 66 - 73
    Cited by: Papers (3)
    PDF (198 KB) | HTML

    Abstract: Energy-related costs have become one of the major economic factors in IT data centers, and companies and the research community are currently working on new, efficient power-aware resource management strategies, also known as "Green IT". Here we propose a framework for autonomic scheduling of tasks and web services on cloud environments, optimizing the profit computed as revenue for task execution minus penalties for service-level-agreement violations and the cost of power consumption. The principal contribution is the combination of consolidation and virtualization technologies, mathematical optimization methods, and machine learning techniques. The data-center infrastructure, the tasks to execute, and the desired profit are cast as a mathematical programming model, which can then be solved in different ways to find good task schedules. We use an exact solver based on mixed integer linear programming as a proof of concept but, since the problem is NP-complete, we show that approximate solvers provide valid alternatives for finding approximately optimal schedules. Machine learning is used to estimate the initially unknown parameters of the mathematical model. In particular, we need to predict a priori the resource usage (such as CPU consumption) of different tasks under current workloads, and to estimate task service-level-agreement metrics (such as response time) given workload features, host characteristics, and contention among tasks on the same host. Experiments show that machine learning algorithms can predict system behavior with acceptable accuracy, and that their combination with the exact or approximate schedulers manages to allocate tasks to hosts, striking a balance between revenue for executed tasks, quality of service, and power consumption. A toy version of the profit objective follows this entry.

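A toy version of the profit objective described above (revenue minus SLA penalties minus power cost), with made-up numbers. In the paper these quantities are predicted with machine learning and optimized by exact or approximate solvers; here the objective is merely evaluated for a fixed schedule.

```python
# Profit of a given schedule: revenue - SLA penalties - power cost of hosts in use.

def profit(schedule, hosts, power_price=0.10):
    revenue = penalties = 0.0
    hosts_on = set()
    for task in schedule:
        revenue += task["revenue"]
        if task["predicted_resp"] > task["sla_resp"]:          # SLA violated
            penalties += task["penalty"]
        hosts_on.add(task["host"])
    power_cost = sum(hosts[h]["watts"] for h in hosts_on) / 1000.0 * power_price
    return revenue - penalties - power_cost

hosts = {"h1": {"watts": 300}, "h2": {"watts": 250}}
schedule = [
    {"host": "h1", "revenue": 2.0, "penalty": 0.5, "predicted_resp": 0.8, "sla_resp": 1.0},
    {"host": "h1", "revenue": 1.5, "penalty": 0.5, "predicted_resp": 1.4, "sla_resp": 1.0},
]
print(profit(schedule, hosts))   # 2.0 + 1.5 - 0.5 - 0.03 = 2.97
```
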
  • Exploiting Inherent Task-Based Parallelism in Object-Oriented Programming
    Publication Year: 2011, Page(s): 74 - 81
    PDF (239 KB) | HTML

    Abstract: While object-oriented programming (OOP) and parallelism originated as separate areas, there have been many attempts to bring the two paradigms together. Few of them, though, meet the challenge of programming for parallel architectures and distributed platforms: offering good development expressiveness while not hindering application performance. This work presents the introduction of OOP in a parallel programming model for Java applications which targets productivity. In this model, one can develop a Java application in a totally sequential fashion, without using any new library or language construct, thus favouring programmability. We show how this model offers a good trade-off between ease of programming and runtime performance. A comparison with other approaches is provided, evaluating the key aspects of the model and discussing results for a set of the NAS parallel benchmarks. The general idea is illustrated by the sketch after this entry.

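The general idea of task-based parallelism sketched with explicit Python futures. The paper's model achieves the equivalent for plain sequential Java without any new library calls or language constructs, so this is an analogy rather than the paper's API.

```python
# Independent method calls become asynchronous tasks; results are
# synchronized only where they are actually needed.

from concurrent.futures import ProcessPoolExecutor

def simulate(block):
    return sum(i * i for i in range(block))   # stand-in for real work

if __name__ == "__main__":
    blocks = [200_000, 300_000, 400_000, 500_000]
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(simulate, b) for b in blocks]   # tasks run in parallel
        total = sum(f.result() for f in futures)               # synchronization point
    print(total)
```
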
  • MARIANE: MApReduce Implementation Adapted for HPC Environments
    Publication Year: 2011, Page(s): 82 - 89
    Cited by: Papers (6)
    PDF (211 KB) | HTML

    Abstract: MapReduce is increasingly becoming a popular framework and a potent programming model. The most popular open-source implementation of MapReduce, Hadoop, is based on the Hadoop Distributed File System (HDFS). However, as HDFS is not POSIX compliant, it cannot be fully leveraged by applications running on a majority of existing HPC environments, such as TeraGrid and NERSC. These HPC environments typically support globally shared file systems such as NFS and GPFS. On such resourceful HPC infrastructures, the use of Hadoop not only creates compatibility issues, but also affects overall performance due to the added overhead of HDFS. This paper not only presents a MapReduce implementation directly suitable for HPC environments, but also exposes the design choices behind its performance gains in those settings. By leveraging the functions of the underlying distributed file systems, and abstracting them away from the MapReduce framework, MARIANE (MApReduce Implementation Adapted for HPC Environments) not only allows the model to be used in an expanding number of HPC environments, but also delivers better performance in such settings. This paper shows the applicability and high performance of the MapReduce paradigm through MARIANE, an implementation designed for clustered and shared-disk file systems and, as such, not dedicated to a specific MapReduce solution. The paper identifies the components and trade-offs necessary for this model, and quantifies the performance gains exhibited by our approach over Apache Hadoop in a data-intensive setting, on the Magellan test bed at the National Energy Research Scientific Computing Center (NERSC). The shared-file-system flavour of MapReduce is sketched after this entry.

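A minimal sketch of MapReduce-style processing over files that every worker can already see on a shared (POSIX) file system, which is the setting MARIANE targets. The file path is hypothetical, and this is not MARIANE's implementation; it only illustrates why no HDFS-style data shipping is needed when the file system is globally shared.

```python
# Word count in map/reduce style over a shared file system: workers read the
# input files directly instead of pulling blocks from a distributed store.

from collections import Counter
from multiprocessing import Pool
import glob

def map_phase(path):
    with open(path) as f:
        return Counter(f.read().split())

def reduce_phase(partials):
    total = Counter()
    for c in partials:
        total.update(c)
    return total

if __name__ == "__main__":
    files = glob.glob("/shared/project/logs/*.txt")   # hypothetical shared path
    with Pool() as pool:
        counts = reduce_phase(pool.map(map_phase, files))
    print(counts.most_common(10))
```
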
  • Benchmarking MapReduce Implementations for Application Usage Scenarios
    Publication Year: 2011, Page(s): 90 - 97
    Cited by: Papers (6)
    PDF (125 KB) | HTML

    Abstract: The MapReduce paradigm provides a scalable model for large-scale data-intensive computing and the associated fault tolerance. With data production increasing daily due to ever-growing application needs, scientific endeavors, and consumption, the MapReduce model and its implementations need to be further evaluated, improved, and strengthened. Several MapReduce frameworks with various degrees of conformance to the key tenets of the model are available today, each optimized for specific features. HPC application and middleware developers must therefore understand the complex dependencies between MapReduce features and their applications. We present a standard benchmark suite for quantifying, comparing, and contrasting the performance of MapReduce platforms under a wide range of representative use cases. We report the performance of three different MapReduce implementations on the benchmarks, and draw conclusions about their current performance characteristics. The three platforms we chose for evaluation are the widely used Apache Hadoop implementation; Twister, which has been discussed in the literature; and LEMO-MR, our own implementation. The performance analysis we perform also sheds light on the design decisions available for future implementations, and allows Grid researchers to choose the MapReduce implementation that best suits their application's needs.

  • Adjustable Module Isolation for Distributed Computing Infrastructures
    Publication Year: 2011, Page(s): 98 - 105
    PDF (217 KB) | HTML

    Abstract: Cloud computing infrastructures and Grid computing platforms are representatives of a new breed of systems that leverage the modularity paradigm to assemble large-scale dynamic applications from modules contributed by different, possibly untrustworthy, providers. Increased susceptibility to faults, diminished accountability, and complex system configuration are major challenges when assembling and operating such systems. In this paper, we describe how to address these problems by retrofitting module management systems with the ability to deploy modules to execution environments with an adjustable degree of isolation. We give a formal definition of the underlying hierarchical Module Isolation Problem and devise an online algorithm to solve it in an incremental fashion. We discuss how to apply our approach to a state-of-the-art module management system and demonstrate its effectiveness in an experimental evaluation.

  • Improved Grid Security Posture through Multi-factor Authentication
    Publication Year: 2011, Page(s): 106 - 113
    Cited by: Papers (1)
    PDF (259 KB) | HTML

    Abstract: While methods of securing communication over the Internet have changed from clear text to secure encrypted channels over the last decade, the basic username-password combination for authentication has remained the mainstay in academic research computing and grid environments. Security incidents affecting grids, such as the TeraGrid stakkato incident of 2004 and 2005, have demonstrated that the use of reusable passwords for authentication can be readily exploited and can lead to a widespread security incident across the grid [1, 2]. The University of Tennessee's National Institute for Computational Sciences (NICS), founded in 2008, has provided resources to the TeraGrid, including Kraken, a 1.17-petaflops Cray XT5, and has implemented and promoted the use of multi-factor authentication mechanisms since its founding. The benefits of this stronger authentication method have been higher productivity and resource availability for users, because no known compromises of NICS user credentials have led to disabled accounts or system resources. NICS has been developing and experimenting with expanding its use of multi-factor authentication to the grid. NICS has integrated multi-factor authentication with its certificate authority, so that users can now run MyProxy and receive a multi-factor-authenticated certificate. NICS is also exploring the federation of multi-factor authentication systems, with the goal of "one user, one token". This is especially important, as new grid resources, such as Blue Waters, will allow only multi-factor authentication, and we want users to carry one token, not many. XSEDE, the TeraGrid successor, will also deploy multi-factor authentication in addition to the other existing authentication methodologies. XSEDE will also work closely with science gateways and workflows to develop and maintain secure frameworks for the highest level of security possible. A small one-time-password sketch follows this entry.

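A small sketch of one common multi-factor mechanism, a time-based one-time password (TOTP, RFC 6238) check. This is illustrative only and is not claimed to be the token system NICS or XSEDE deploys.

```python
# Generate and verify a TOTP code from a shared base32 secret.

import base64, hashlib, hmac, struct, time

def totp(secret_b32, period=30, digits=6, at=None):
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((at if at is not None else time.time()) // period)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                               # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def verify(secret_b32, submitted):
    # accept the current and the previous time step to tolerate clock skew
    now = time.time()
    return any(hmac.compare_digest(totp(secret_b32, at=now - d), submitted)
               for d in (0, 30))

secret = "JBSWY3DPEHPK3PXP"                                  # example base32 secret
print(verify(secret, totp(secret)))                          # -> True
```
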
  • Detecting Credential Abuse in the Grid Using Bayesian Networks
    Publication Year: 2011, Page(s): 114 - 120
    PDF (196 KB) | HTML

    Abstract: Proxy credentials serve as a principal for authentication and authorization in the Grid. Despite their limited lifetime, they can be intercepted and abused by an attacker. We counter this threat by enabling Grid users to track their credentials' use in Grid infrastructures, reporting all authentication and delegation operations to an auditing service. Our approach combines modifications to the security infrastructure with a Bayesian classifier in order to provide a reliable method for detecting abusive use of Grid credentials and alerting the legitimate user. To validate this approach we created an extensive Grid simulation, simulating different types of legitimate and illegitimate credential use. Our experiments show that we can detect 99.5% of all abuse, so our solution can help increase security in the Grid. A toy classifier in the same spirit follows this entry.

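A toy classifier in the same spirit as the paper's approach: Naive Bayes over a few discretized features of credential-use events (hour bucket, source site, operation type), flagging uses that look unlike the legitimate owner's history. The features, training events, and labels are all invented; the paper's Bayesian network and auditing infrastructure are not reproduced.

```python
# Tiny Naive Bayes over categorical features with Laplace smoothing.

from collections import Counter, defaultdict

class NaiveBayes:
    def fit(self, events, labels):
        self.prior = Counter(labels)
        self.counts = defaultdict(Counter)        # (feature_idx, label) -> value counts
        for ev, y in zip(events, labels):
            for i, v in enumerate(ev):
                self.counts[(i, y)][v] += 1
        return self

    def score(self, ev, label):
        p = self.prior[label] / sum(self.prior.values())
        for i, v in enumerate(ev):
            c = self.counts[(i, label)]
            p *= (c[v] + 1) / (sum(c.values()) + len(c) + 1)   # Laplace smoothing
        return p

    def predict(self, ev):
        return max(self.prior, key=lambda y: self.score(ev, y))

events = [("night", "siteA", "delegate"), ("day", "siteA", "auth"),
          ("day", "siteB", "auth"), ("night", "siteC", "delegate")]
labels = ["abuse", "legit", "legit", "abuse"]
model = NaiveBayes().fit(events, labels)
print(model.predict(("night", "siteC", "auth")))   # -> "abuse"
```
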
  • Scalable and Distributed Processing of Scientific XML Data
    Publication Year: 2011, Page(s): 121 - 128
    Cited by: Patents (1)
    PDF (143 KB) | HTML

    Abstract: A seamless and intuitive search capability for the vast amount of data generated by scientific experiments is critical to ensure the effective use of such data by domain scientists. Currently, searches over enormous XML datasets are done manually via custom scripts, or by using hard-to-customize queries developed by experts in complex and disparate XML query languages. Such approaches, however, do not provide acceptable performance for large-scale data, since they are not based on a scalable distributed solution. Furthermore, it has been shown that databases are not optimized for queries on XML data generated by scientific experiments, as term kinship, range-based queries, and constraints such as conjunction and negation need to be taken into account. There is a critical need for an easy-to-use and scalable framework, specialized for scientific data, that provides natural-language-like syntax along with accurate results. As most existing search tools are designed for exact string matching, which is not adequate for scientific needs, we believe that such a framework will enhance the productivity and quality of scientific research through the data reduction capabilities it can provide. This paper presents how the MapReduce model should be used in XML metadata indexing for scientific datasets, specifically TeraGrid Information Services and the NeXus datasets generated by Spallation Neutron Source (SNS) scientists. We present an indexing structure that scales well for large-scale MapReduce processing. We present performance results using two MapReduce implementations, Apache Hadoop and LEMO-MR, to emphasize the flexibility and adaptability of our framework in different MapReduce environments. A minimal indexing sketch follows this entry.

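A minimal sketch of building an inverted index from XML metadata in a map/reduce style, using only the Python standard library. The directory, tags, and query term are made-up examples and do not reflect the TeraGrid or NeXus schemas, nor the paper's Hadoop/LEMO-MR implementations.

```python
# Map step: emit (term, file) pairs from each file's element tags and
# attribute values. Reduce step: merge the pairs into an inverted index.

import glob
import xml.etree.ElementTree as ET
from collections import defaultdict

def map_file(path):
    pairs = []
    for _, elem in ET.iterparse(path):            # streaming parse keeps memory flat
        pairs.append((elem.tag.lower(), path))
        for value in elem.attrib.values():
            pairs.append((value.lower(), path))
    return pairs

def reduce_pairs(all_pairs):
    index = defaultdict(set)
    for term, path in all_pairs:
        index[term].add(path)
    return index

if __name__ == "__main__":
    files = glob.glob("metadata/*.xml")           # hypothetical metadata directory
    index = reduce_pairs(p for f in files for p in map_file(f))
    print(index.get("temperature", set()))        # files mentioning "temperature"
```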