
IEEE Transactions on Parallel and Distributed Systems

Issue 10 • October 2011

  • [Front cover]

    Page(s): c1
  • [Inside front cover]

    Page(s): c2
  • Efficient Algorithms for Topology Control Problem with Routing Cost Constraints in Wireless Networks

    Page(s): 1601 - 1609

    Topology control is a vital factor in a wireless network's efficiency. A Connected Dominating Set (CDS) can serve as a useful basis for constructing a backbone topology. In this paper, a special CDS, named α Minimum routing Cost CDS (α-MOC-CDS), is studied to improve the performance of CDS-based broadcasting and routing. We prove that constructing a minimum α-MOC-CDS is NP-hard in general graphs, and we propose a heuristic algorithm for constructing an α-MOC-CDS.

    (A small sketch of the connected-dominating-set check follows this entry.)

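    The paper above builds on the notion of a Connected Dominating Set. As a small, hedged illustration of that underlying concept (not the authors' heuristic, and with illustrative names), the Python sketch below checks whether a candidate vertex set is a CDS of an undirected graph given as an adjacency dictionary.

        from collections import deque

        def is_connected_dominating_set(graph, cds):
            """Check whether `cds` is a connected dominating set of `graph`.

            graph: dict mapping each vertex to a set of neighbours (undirected).
            cds:   iterable of candidate vertices.
            """
            cds = set(cds)
            if not cds:
                return False
            # Domination: every vertex is in the set or adjacent to a member of it.
            for v in graph:
                if v not in cds and not (graph[v] & cds):
                    return False
            # Connectivity: BFS restricted to the candidate set must reach every member.
            start = next(iter(cds))
            seen, queue = {start}, deque([start])
            while queue:
                u = queue.popleft()
                for w in graph[u] & cds:
                    if w not in seen:
                        seen.add(w)
                        queue.append(w)
            return seen == cds

        # Tiny example: on the path a-b-c-d, {b, c} dominates all vertices and is connected.
        g = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'c'}}
        print(is_connected_dominating_set(g, {'b', 'c'}))  # True
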
  • Efficient and Scalable Starvation Prevention Mechanism for Token Coherence

    Page(s): 1610 - 1623

    Token Coherence is a cache coherence protocol that simultaneously captures the best attributes of the traditional approaches to coherence: direct communication between processors (like snooping-based protocols) and no reliance on bus-like interconnects (like directory-based protocols). This is possible thanks to a class of unordered requests that usually succeed in resolving cache misses. The drawback of unordered requests is that they can cause protocol races, which prevent some misses from being resolved. To eliminate races and ensure the completion of unresolved misses, Token Coherence uses a starvation prevention mechanism named persistent requests. This mechanism is highly inefficient and, moreover, endangers the scalability of Token Coherence, since it requires storage structures at each node whose size grows proportionally to the system size. As multiprocessors continue to include an increasing number of nodes, both the performance and the scalability of cache coherence protocols will remain key concerns. In this work, we propose an alternative starvation prevention mechanism, named priority requests, that outperforms persistent requests. The mechanism reduces application runtime by more than 20 percent on average in a 64-processor system. Furthermore, thanks to the flexibility of priority requests, it is possible to drastically reduce their storage requirements, thereby improving the overall scalability of Token Coherence. Although this comes at the expense of a slight performance degradation, priority requests still outperform persistent requests significantly.

    (A toy sketch of the retry-then-escalate idea behind starvation-prevention requests follows this entry.)

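    Persistent and priority requests are both escalation mechanisms for misses that repeatedly lose protocol races. The toy sketch below only conveys that general retry-then-escalate idea; it is not the priority-request protocol proposed in the paper, and the retry threshold, class, and method names are hypothetical.

        RETRY_LIMIT = 4  # hypothetical number of failed transient requests before escalating

        class MissTracker:
            """Toy escalation logic: ordinary transient requests first, then an escalated request."""
            def __init__(self):
                self.failed_attempts = {}  # cache line -> number of failed transient requests

            def next_request(self, line):
                attempts = self.failed_attempts.get(line, 0)
                if attempts < RETRY_LIMIT:
                    return ('transient', line)   # ordinary unordered request
                return ('priority', line)        # starvation-prevention request

            def record_failure(self, line):
                self.failed_attempts[line] = self.failed_attempts.get(line, 0) + 1

            def record_success(self, line):
                self.failed_attempts.pop(line, None)

        tracker = MissTracker()
        for _ in range(5):
            tracker.next_request('0x80')
            tracker.record_failure('0x80')
        print(tracker.next_request('0x80'))  # ('priority', '0x80') after repeated lost races
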
  • Chemical Reaction Optimization for Task Scheduling in Grid Computing

    Page(s): 1624 - 1631
    Multimedia

    Grid computing solves high-performance and high-throughput computing problems by sharing resources ranging from personal computers to supercomputers distributed around the world. One of the major problems is task scheduling, i.e., allocating tasks to resources. In addition to makespan and flowtime, we also take the reliability of resources into account, and task scheduling is formulated as an optimization problem with three objectives. The problem is NP-hard, and metaheuristic approaches are therefore employed to search for optimal solutions. In this paper, several versions of the Chemical Reaction Optimization (CRO) algorithm are proposed for the grid scheduling problem. CRO is a population-based metaheuristic inspired by the interactions between molecules in a chemical reaction. We compare these CRO methods with four other well-established metaheuristics on a wide range of instances. Simulation results show that the CRO methods generally perform better than existing methods, and the performance improvement is especially significant in large-scale applications.

    (A sketch of the makespan/flowtime computation follows this entry.)

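    Two of the three objectives above, makespan and flowtime, can be computed directly from a candidate schedule. The hedged sketch below scores a given task-to-resource assignment (it is not the CRO metaheuristic itself); the execution-time values are hypothetical.

        def makespan_and_flowtime(assignment, exec_time):
            """assignment: dict resource -> ordered list of tasks.
            exec_time:  dict (task, resource) -> execution time."""
            completion = {}
            for resource, tasks in assignment.items():
                clock = 0.0
                for task in tasks:
                    clock += exec_time[(task, resource)]
                    completion[task] = clock
            makespan = max(completion.values())   # finish time of the last task
            flowtime = sum(completion.values())   # sum of all task completion times
            return makespan, flowtime

        # Hypothetical schedule: four tasks on two resources.
        assign = {'r1': ['t1', 't3'], 'r2': ['t2', 't4']}
        times = {('t1', 'r1'): 3, ('t3', 'r1'): 2, ('t2', 'r2'): 4, ('t4', 'r2'): 1}
        print(makespan_and_flowtime(assign, times))  # (5, 17)
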
  • Parallel Frequent Item Set Mining with Selective Item Replication

    Page(s): 1632 - 1640
    Multimedia

    We introduce a transaction database distribution scheme that divides the frequent item set mining task in a top-down fashion. Our method operates on a graph where vertices correspond to frequent items and edges correspond to frequent item sets of size two. We show that partitioning this graph by a vertex separator is sufficient to decide a distribution of the items such that the subdatabases determined by the item distribution can be mined independently. This distribution entails some data replication, which may be reduced by assigning appropriate weights to vertices. The data distribution scheme is used in the design of two new parallel frequent item set mining algorithms. Both algorithms replicate the items that correspond to the separator. NoClique replicates the work induced by the separator, while NoClique2 computes the same work collectively. Computational load balancing and minimization of redundant or collective work may be achieved by assigning appropriate load estimates to vertices. The experiments show favorable speedups on a system with a small-to-medium number of processors for synthetic and real-world databases.

    (A sketch of the frequent-item graph construction follows this entry.)

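    The method above operates on a graph whose vertices are frequent items and whose edges are frequent item sets of size two. The hedged sketch below only builds that graph from a toy transaction database for a given absolute support threshold; it does not perform the vertex-separator partitioning or the parallel mining.

        from collections import Counter
        from itertools import combinations

        def frequent_item_graph(transactions, min_support):
            """Return (vertices, edges): frequent items and frequent 2-item sets."""
            item_counts = Counter()
            pair_counts = Counter()
            for t in transactions:
                items = sorted(set(t))
                item_counts.update(items)
                pair_counts.update(combinations(items, 2))
            vertices = {i for i, c in item_counts.items() if c >= min_support}
            edges = {p for p, c in pair_counts.items()
                     if c >= min_support and p[0] in vertices and p[1] in vertices}
            return vertices, edges

        # Toy transaction database with an absolute support threshold of 3.
        db = [['a', 'b', 'c'], ['a', 'b'], ['b', 'c'], ['a', 'c'], ['a', 'b', 'c']]
        print(frequent_item_graph(db, min_support=3))
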
  • A Deadlock-Free Dynamic Reconfiguration Scheme for Source Routing Networks Using Close Up*/Down* Graphs

    Page(s): 1641 - 1652

    Computer performance has increased significantly in recent years and, consequently, communication subsystems have become bottlenecks within systems. To counter this problem, current high-performance distributed systems employ switch-based interconnection networks. In this scenario, after a topological change occurs, a management mechanism must reestablish connectivity between network devices. This requires a network reconfiguration, which consists of updating the routing function. The main challenge in network reconfiguration is reducing performance degradation during the change-assimilation process. As shown in the performance evaluation section, previous reconfiguration techniques significantly reduce network service because application traffic is temporarily stopped in order to avoid deadlocks. In addition, current solutions are designed only for networks that use distributed routing. In this paper, we propose and evaluate the first reconfiguration method for source routing networks that does not restrict the injection of packets during the change-assimilation process. Without requiring additional network resources, our scheme is able to recover topology connectivity while maintaining network throughput.

  • Athanasia: A User-Transparent and Fault-Tolerant System for Parallel Applications

    Page(s): 1653 - 1668

    This article presents Athanasia, a user-transparent and fault-tolerant system for parallel applications running on large-scale cluster systems. Cluster systems have been regarded as a de facto standard for achieving multi-teraflop computing power, but they are inherently prone to failures that can cause computations to fail. The reliability of parallel computing systems has therefore been studied for a relatively long time in the literature, and extensive research has produced many theoretical promises. Despite these rigorous studies, however, practical and easily deployable fault-tolerant systems have not been successfully adopted commercially. Athanasia is a user-transparent checkpointing system for a fault-tolerant Message Passing Interface (MPI) implementation, primarily based on the sync-and-stop protocol. Athanasia supports three critical functionalities necessary for fault tolerance: a lightweight failure-detection mechanism, dynamic process management that includes process migration, and a consistent checkpoint-and-recovery mechanism. The main features of Athanasia are that it does not require any modification of the application code and that it preserves many of the high-performance characteristics of high-speed networks. Experimental results show that Athanasia can be a good candidate for practically deployable fault tolerance in very large, high-performance clusters and that its protocol can easily be applied to a variety of parallel communication libraries.

  • Conditional-Fault Diagnosability of Multiprocessor Systems with an Efficient Local Diagnosis Algorithm under the PMC Model

    Page(s): 1669 - 1680

    Diagnosis is essential to the reliability of multiprocessor systems. Under the PMC diagnosis model, Dahbura and Masson proposed a polynomial-time algorithm with time complexity O(N^2.5) to identify all faulty processors in a system with N processors. In this paper, we present a novel method to diagnose a conditionally faulty system by applying the concept of local diagnosis, introduced by Somani and Agarwal and formalized by Hsu and Tan. The goal of local diagnosis is to correctly identify the fault status of any single processor. Under the PMC diagnosis model, we give a sufficient condition to estimate the local diagnosability of a given processor. Furthermore, we propose a helpful structure, called the augmenting star, to efficiently determine the fault status of each processor. For an N-processor system in which every processor has degree O(log N), the time complexity of our algorithm to diagnose any given processor is O((log N)^2), provided that each processor can construct an augmenting star structure of full order in O((log N)^2) time and that the time for one processor to test another is constant. The total time for diagnosing the whole system is therefore O(N(log N)^2).

    (A sketch of test outcomes under the PMC model follows this entry.)

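    Under the PMC model, a fault-free processor reports the true status of the processor it tests, while a faulty tester may report anything. The sketch below merely generates a test syndrome under that rule for a toy test assignment; it is not the paper's augmenting-star diagnosis algorithm, and the names are illustrative.

        import random

        def pmc_syndrome(tests, faulty, seed=0):
            """tests: list of (tester, tested) pairs; faulty: set of faulty processors.
            Returns dict (tester, tested) -> outcome, 0 = 'fault-free', 1 = 'faulty'."""
            rng = random.Random(seed)
            syndrome = {}
            for tester, tested in tests:
                if tester in faulty:
                    outcome = rng.choice([0, 1])            # faulty testers are unreliable
                else:
                    outcome = 1 if tested in faulty else 0  # fault-free testers tell the truth
                syndrome[(tester, tested)] = outcome
            return syndrome

        tests = [(0, 1), (1, 2), (2, 3), (3, 0)]            # a 4-processor test cycle
        print(pmc_syndrome(tests, faulty={2}))
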
  • Adaptive Power Control with Online Model Estimation for Chip Multiprocessors

    Page(s): 1681 - 1696

    As chip multiprocessors (CMPs) become the main trend in processor development, various power and thermal management strategies have recently been proposed to optimize system performance while keeping the power or temperature of a CMP chip below a constraint. The availability of per-core dynamic voltage and frequency scaling (DVFS) also makes it possible to develop advanced management strategies. However, most existing solutions rely on open-loop search or optimization under the assumption that power can be estimated accurately, while others adopt oversimplified feedback control strategies that control power and temperature separately, without theoretical guarantees. In this paper, we propose a chip-level power control algorithm that is systematically designed based on optimal control theory. Our algorithm can precisely control the power of a CMP chip to the desired set point while maintaining the temperature of each core below a specified threshold. Furthermore, an online model estimator is designed to achieve analytical assurance of control accuracy and system stability, even in the face of significant workload variations or unpredictable chip or core variations. To further improve system performance, we also integrate dynamic cache resizing into our control framework so that power can be shifted among CPU cores and the shared L2 cache. Empirical results on a physical testbed show that our controller outperforms two state-of-the-art control algorithms, achieving better SPEC benchmark performance and more precise power control. In addition, extensive simulation results demonstrate the efficacy of our algorithm for various CMP configurations.

    (A simplified feedback-control sketch follows this entry.)

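    The paper designs its controller with optimal control theory and an online model estimator. The sketch below is deliberately much simpler: a hedged, integral-style feedback loop that nudges a core frequency toward a chip power budget, included only to convey the closed-loop idea; the power model, gain, and limits are hypothetical.

        def control_step(freq, measured_power, power_budget, gain=0.005,
                         f_min=0.8, f_max=3.0):
            """One closed-loop step: adjust frequency in proportion to the power error."""
            error = power_budget - measured_power
            return min(f_max, max(f_min, freq + gain * error))

        def toy_power_model(freq, activity=1.0):
            """Hypothetical power model: static power plus a term that grows with f^3."""
            return 10.0 + activity * 8.0 * freq ** 3

        freq, budget = 3.0, 120.0
        for step in range(20):
            power = toy_power_model(freq)
            freq = control_step(freq, power, budget)
        print(round(freq, 2), round(toy_power_model(freq), 1))  # settles near the 120 W budget
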
  • Requirement-Aware Strategies with Arbitrary Processor Release Times for Scheduling Multiple Divisible Loads

    Page(s): 1697 - 1704

    This paper investigates the problem of scheduling multiple divisible loads in networked computer systems, with particular emphasis on capturing two important real-life constraints: arbitrary processor release times (or ready times) and the heterogeneous processing requirements of different loads. We study two distinct cases of interest: the static case, where processors' release times are predetermined and known, and the dynamic case, where release times are unknown until processors are released. To address these cases, we propose two novel scheduling strategies, referred to as the Static Scheduling Strategy (SSS) and the Dynamic Scheduling Strategy (DSS), respectively. In addition, our strategies capture a task's processing requirements, a unique feature applicable to networks that run proprietary applications only on certain nodes; thus, in our formulation each task can be processed only by certain nodes. To handle the contention of multiple applications that have different processing requirements but share the same processing nodes, we propose an efficient load-selection policy, referred to as Most Remaining Load First (MRF). We integrate MRF into SSS and DSS to address the problem of scheduling multiple divisible loads with arbitrary processor release times and heterogeneous requirements. We evaluate the strategies using extensive simulation experiments.

    (A sketch of the MRF selection rule follows this entry.)

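    Most Remaining Load First picks, among the loads that a newly released processor is allowed to serve, the one with the largest unprocessed remainder. A hedged sketch of that selection rule is below (it is not the full SSS/DSS strategies); the field names are hypothetical.

        def select_load_mrf(loads, processor):
            """loads: list of dicts with 'name', 'remaining', and 'eligible' (set of processors).
            Return the eligible load with the most remaining work, or None."""
            eligible = [l for l in loads
                        if l['remaining'] > 0 and processor in l['eligible']]
            if not eligible:
                return None
            return max(eligible, key=lambda l: l['remaining'])

        loads = [
            {'name': 'L1', 'remaining': 40.0, 'eligible': {'p1', 'p2'}},
            {'name': 'L2', 'remaining': 75.0, 'eligible': {'p2', 'p3'}},
            {'name': 'L3', 'remaining': 60.0, 'eligible': {'p1'}},
        ]
        print(select_load_mrf(loads, 'p2')['name'])  # 'L2' – the largest eligible remainder
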
  • An Effective Memory Optimization for Virtual Machine-Based Systems

    Page(s): 1705 - 1713

    Utilizing popular virtualization technology (VT), users can benefit from server consolidation on high-end systems and flexible programming interfaces on low-end systems. In these virtualization environments, intensive memory multiplexing for the I/O of Virtual Machines (VMs) significantly degrades system performance. In this paper, we present a new technique, called Batmem, that effectively reduces the memory multiplexing overhead of VMs and emulated devices by optimizing the operations of the conventional emulated Memory-Mapped I/O in the Virtual Machine Monitor (VMM)/hypervisor. To demonstrate the feasibility of Batmem, we conduct a detailed taxonomy of the memory optimization on selected virtual devices. We evaluate the effectiveness of Batmem on Windows and Linux systems. Our experimental results show that 1) on high-end systems, Batmem operates as a component of the hypervisor and significantly improves the performance of the virtual environment, and 2) on low-end systems, Batmem could be exploited as a component of a VM-based malware/rootkit (VMBR) to cloak malicious activities from users' awareness.

  • Fast Secure Communications in Shared Memory Multiprocessor Systems

    Page(s): 1714 - 1721

    Protection and security are becoming essential requirements in commercial servers. To provide secure memory and cache-to-cache communication, we previously presented the Interconnect-Independent Security Enhanced Shared Memory Multiprocessor System (I2SEMS), focusing mainly on how to manage a global counter to encrypt, decrypt, and authenticate data messages with little performance overhead. However, I2SEMS was vulnerable to replay attacks on data messages and to integrity attacks on control and counter messages. This paper proposes three authentication schemes to remove these security vulnerabilities. First, we prevent replay attacks on data messages by inserting a Request Counter (RC) into request messages. Second, we also use the RC to detect integrity attacks on control messages. Third, we propose a new counter, referred to as the GCC Counter (GC), to protect the global counter messages. We simulated our design with SPLASH-2 benchmarks on up to 16-processor shared memory multiprocessor systems using Simics with the Wisconsin Multifacet General Execution-driven Multiprocessor Simulator (GEMS). Simulation results show that the overall performance slowdown is 4 percent on average, with a highest keystream hit rate of 78 percent.

    (A generic replay-detection sketch follows this entry.)

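    The first fix above inserts a Request Counter into request messages so that replayed messages can be detected. The sketch below shows only the generic monotonic-counter replay check; it is not tied to the I2SEMS message formats, and the class and field names are hypothetical.

        class ReplayDetector:
            """Reject any message whose per-sender request counter does not increase."""
            def __init__(self):
                self.last_seen = {}   # sender id -> highest request counter accepted so far

            def accept(self, sender, request_counter):
                if request_counter <= self.last_seen.get(sender, -1):
                    return False      # stale or replayed message
                self.last_seen[sender] = request_counter
                return True

        det = ReplayDetector()
        print(det.accept('node7', 1))  # True  – fresh
        print(det.accept('node7', 2))  # True  – fresh
        print(det.accept('node7', 1))  # False – replayed request counter
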
  • Mobility in IPv6: Whether and How to Hierarchize the Network?

    Page(s): 1722 - 1729

    Mobile IPv6 (MIPv6) offers a basic solution to support mobility in IPv6 networks. Although Hierarchical MIPv6 (HMIPv6) was designed to enhance the performance of MIPv6 by hierarchizing the network, it does not always outperform MIPv6; in fact, the two solutions have different application scopes. Existing work studies the impact of various parameters on the performance of MIPv6 and HMIPv6 but does not analyze their application scopes. In this paper, we propose a model to analyze the application scopes of MIPv6 and HMIPv6, from which an Optimal Choice of Mobility Management (OCMM) scheme is designed. Unlike existing work that either proposes new mobility management schemes or enhances existing ones, OCMM chooses the better alternative between MIPv6 and HMIPv6 according to the mobility and service characteristics of users, addressing whether to hierarchize the network. In addition, OCMM chooses the best mobility anchor point and regional size when HMIPv6 is adopted, addressing how to hierarchize the network. Simulation results demonstrate the impact of key parameters on the application scopes of MIPv6 and HMIPv6 as well as on the optimal regional size of HMIPv6. Finally, we show that OCMM outperforms MIPv6 and HMIPv6 in terms of total cost, including average registration and packet delivery costs.

    (A toy cost-comparison sketch follows this entry.)

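    OCMM chooses between MIPv6 and HMIPv6 by comparing modeled total costs. The sketch below is a purely hypothetical cost comparison, meant only to show the pick-the-cheaper-scheme decision; the cost formulas and constants are placeholders, not the paper's analytical model.

        def total_cost(registration_cost, registrations_per_hour,
                       delivery_cost_per_packet, packets_per_hour):
            """Hypothetical hourly cost: signaling (registrations) plus packet delivery."""
            return (registration_cost * registrations_per_hour
                    + delivery_cost_per_packet * packets_per_hour)

        def choose_scheme(registrations_per_hour, packets_per_hour):
            # Placeholder assumption: HMIPv6 makes registrations cheaper (they stay local)
            # but adds a small per-packet detour cost through the mobility anchor point.
            mipv6 = total_cost(10.0, registrations_per_hour, 1.0, packets_per_hour)
            hmipv6 = total_cost(3.0, registrations_per_hour, 1.2, packets_per_hour)
            return 'HMIPv6' if hmipv6 < mipv6 else 'MIPv6'

        print(choose_scheme(registrations_per_hour=50, packets_per_hour=100))   # highly mobile, light traffic -> HMIPv6
        print(choose_scheme(registrations_per_hour=2, packets_per_hour=5000))   # nearly static, heavy traffic -> MIPv6
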
  • Effective Delay-Controlled Load Distribution over Multipath Networks

    Page(s): 1730 - 1741
    Multimedia

    Owing to the heterogeneity and high degree of connectivity of modern networks, multiple paths are likely to be available between a source and a destination. An effective model of delay-controlled load distribution is essential to utilize such parallel paths efficiently for multimedia data transmission and real-time applications, which are known to be sensitive to packet delay, packet delay variation, and packet reordering. Recent research on load distribution has focused on load-balancing efficiency, bandwidth utilization, and packet-order preservation; however, most solutions do not address delay-related issues. This paper proposes a new load distribution model that aims to minimize the difference among end-to-end path delays, thereby reducing packet delay variation and the risk of packet reordering without additional network overhead. In general, the lower the risk of packet reordering, the smaller the extra delay induced by the reordering recovery process. Therefore, our model can reduce not only the end-to-end delay but also the packet reordering recovery time. Finally, our proposed model is shown, via analysis and simulation, to outperform existing models.

    (A toy delay-aware dispatch sketch follows this entry.)

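    The model above aims to keep the end-to-end delays of the parallel paths close to one another. As a hedged toy (not the proposed model), the dispatcher below sends each packet to the path with the smallest predicted delay, which tends to even out the per-path delays; all numbers are hypothetical.

        def dispatch(packet_size, paths):
            """paths: list of dicts with 'base_delay' (s), 'backlog' (bytes), 'rate' (bytes/s).
            Send the packet on the path with the smallest predicted delay."""
            def predicted_delay(p):
                return p['base_delay'] + p['backlog'] / p['rate']
            best = min(paths, key=predicted_delay)
            best['backlog'] += packet_size          # account for the newly queued packet
            return best, predicted_delay(best)

        paths = [
            {'name': 'A', 'base_delay': 0.020, 'backlog': 0.0, 'rate': 1.25e6},
            {'name': 'B', 'base_delay': 0.050, 'backlog': 0.0, 'rate': 2.50e6},
        ]
        for _ in range(60):
            dispatch(1500, paths)
        print([round(p['base_delay'] + p['backlog'] / p['rate'], 4) for p in paths])
        # after 60 packets the two predicted path delays end up nearly equal
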
  • Coordinated Locomotion and Monitoring Using Autonomous Mobile Sensor Nodes

    Page(s): 1742 - 1756

    Stationary wireless sensor networks (WSNs) fail to scale when the area to be monitored is unbounded and the physical phenomenon to be monitored may migrate through a large region. Deploying mobile sensor networks (MSNs) alleviates this problem, as a self-configuring MSN can relocate to follow the phenomenon of interest. A major challenge, however, is to maximize sensing coverage in an unknown, noisy, and dynamically changing environment with nodes that have limited sensing range and energy and move under distributed control. To address these challenges, we propose a new distributed algorithm, Causataxis, which enables the MSN to relocate toward interesting regions and adjust its shape and position as the sensing environment changes. (In Latin, causa means motive or interest. A taxis (plural taxes) is an innate behavioral response by an organism to a directional stimulus. We use Causataxis to refer to interest-driven relocation behavior.) Unlike conventional cluster-based systems with backbone networks, a unique feature of our approach is its biosystem-inspired growing and rotting behaviors with coordinated locomotion. We compare Causataxis with a swarm-based algorithm that uses virtual spring forces to relocate mobile nodes based on local neighborhood information. Our simulation results show that Causataxis outperforms the swarm-based algorithm in terms of sensing coverage, energy consumption, and noise tolerance, with a slightly higher communication overhead.

    (A sketch of the virtual-spring baseline follows this entry.)

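    Causataxis is compared against a swarm-based baseline that relocates nodes with virtual spring forces computed from local neighborhood information. The sketch below illustrates only that generic virtual-spring step, not Causataxis itself; the spring constant, rest length, and step size are hypothetical.

        import math

        def spring_step(positions, neighbors, rest_length=10.0, k=0.1, step=1.0):
            """One relocation step: each node moves along the sum of the spring forces
            F = k * (distance - rest_length) acting toward or away from each neighbour."""
            new_positions = {}
            for node, (x, y) in positions.items():
                fx = fy = 0.0
                for other in neighbors[node]:
                    ox, oy = positions[other]
                    dx, dy = ox - x, oy - y
                    dist = math.hypot(dx, dy) or 1e-9
                    force = k * (dist - rest_length)   # stretched spring pulls, compressed pushes
                    fx += force * dx / dist
                    fy += force * dy / dist
                new_positions[node] = (x + step * fx, y + step * fy)
            return new_positions

        pos = {'a': (0.0, 0.0), 'b': (4.0, 0.0), 'c': (30.0, 0.0)}
        nbrs = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
        for _ in range(50):
            pos = spring_step(pos, nbrs)
        print({n: (round(x, 1), round(y, 1)) for n, (x, y) in pos.items()})
        # the chain drifts toward the 10-unit rest spacing
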
  • General Maximal Lifetime Sensor-Target Surveillance Problem and Its Solution

    Page(s): 1757 - 1765
    Multimedia

    We address a new and general maximal lifetime problem in sensor-target surveillance. We assume that each sensor can watch at most k targets (k ≥ 1) and that each target should be watched by h sensors (h ≥ 1) at any time. The problem is to schedule sensors to watch targets and forward the sensed data to a base station such that the lifetime of the surveillance network is maximized. This general problem includes the existing formulations (k = 1 and h = 1, and k = 1 and h ≥ 2) as special cases. It is also important in practice because some sensors can monitor multiple or all targets within their surveillance ranges, and multisensor fusion (i.e., watching a target with multiple sensors) gives better surveillance results. The problem involves several subproblems, one of which is a new matching problem called (k, h)-matching. The (k, h)-matching problem is a generalized version of the classic bipartite matching problem: when k = h = 1, (k, h)-matching reduces to bipartite matching. We design an efficient (k, h)-matching algorithm to solve the (k, h)-matching problem and then solve the general maximal lifetime problem. As a byproduct of this study, the (k, h)-matching problem and the proposed (k, h)-matching algorithm can potentially be applied to other problems in computer science and operations research.

    (A max-flow feasibility sketch for (k, h)-matching follows this entry.)

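    (k, h)-matching generalizes bipartite matching: each sensor may watch up to k targets and each target needs h watchers. One standard way to check feasibility of such a degree-constrained assignment (a hedged sketch, not the paper's algorithm) is a max-flow construction: source-to-sensor edges of capacity k, sensor-to-target edges of capacity 1, and target-to-sink edges of capacity h; a full assignment exists exactly when the maximum flow equals h times the number of targets.

        from collections import deque, defaultdict

        def max_flow(capacity, source, sink):
            """Edmonds-Karp on a residual-capacity dict: capacity[u][v] = remaining capacity."""
            flow = 0
            while True:
                parent = {source: None}
                queue = deque([source])
                while queue and sink not in parent:
                    u = queue.popleft()
                    for v, cap in capacity[u].items():
                        if cap > 0 and v not in parent:
                            parent[v] = u
                            queue.append(v)
                if sink not in parent:
                    return flow
                # Find the bottleneck along the augmenting path, then update residual capacities.
                path, v = [], sink
                while parent[v] is not None:
                    path.append((parent[v], v))
                    v = parent[v]
                bottleneck = min(capacity[u][v] for u, v in path)
                for u, v in path:
                    capacity[u][v] -= bottleneck
                    capacity[v][u] = capacity[v].get(u, 0) + bottleneck
                flow += bottleneck

        def kh_matching_feasible(watchable, k, h):
            """watchable: dict sensor -> set of targets it can watch."""
            capacity = defaultdict(dict)
            targets = set()
            for s, ts in watchable.items():
                capacity['src'][('s', s)] = k
                for t in ts:
                    capacity[('s', s)][('t', t)] = 1
                    targets.add(t)
            for t in targets:
                capacity[('t', t)]['sink'] = h
            return max_flow(capacity, 'src', 'sink') == h * len(targets)

        sensors = {'s1': {'t1', 't2'}, 's2': {'t1'}, 's3': {'t2', 't3'}, 's4': {'t3'}}
        print(kh_matching_feasible(sensors, k=1, h=1))  # True – classic bipartite matching case
        print(kh_matching_feasible(sensors, k=2, h=2))  # True – every target watched by two sensors
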
  • Tracking Dynamic Boundaries Using Sensor Network

    Page(s): 1766 - 1774
    Multimedia

    We examine the problem of tracking dynamic boundaries occurring in natural phenomena using a network of range sensors. The two main challenges of boundary tracking are accurate estimation of the boundary from noisy observations and continuous tracking of the boundary over time. We propose Dynamic Boundary Tracking (DBTR), an algorithm that combines spatial and temporal estimation techniques. A regression-based spatial estimation technique determines discrete points on the boundary and estimates a confidence band around the entire boundary, while a Kalman filter-based temporal estimation technique tracks changes in the boundary and aperiodically updates the spatial estimate to meet accuracy requirements. DBTR provides a low-energy solution, compared to similar periodic update techniques, for tracking boundaries without requiring prior knowledge of their dynamics. Experimental results demonstrate the effectiveness of our algorithm; estimated confidence bands indicate a loss of coverage of less than 2 to 5 percent for a variety of boundaries with different spatial characteristics.

    (A one-dimensional Kalman filter sketch follows this entry.)

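    DBTR uses a Kalman filter to track temporal changes in the boundary. The sketch below is a generic one-dimensional Kalman filter that tracks a single boundary coordinate from noisy readings; it is not the paper's estimator, and the noise parameters are hypothetical.

        def kalman_track(measurements, process_var=0.5, meas_var=4.0):
            """Track a scalar boundary position from noisy measurements.
            Constant-position model: x_k = x_{k-1} + process noise."""
            x, p = measurements[0], 1.0          # initial state estimate and variance
            estimates = [x]
            for z in measurements[1:]:
                # Predict: the boundary may have drifted, so uncertainty grows.
                p = p + process_var
                # Update: blend prediction and measurement using the Kalman gain.
                gain = p / (p + meas_var)
                x = x + gain * (z - x)
                p = (1.0 - gain) * p
                estimates.append(x)
            return estimates

        noisy = [10.2, 10.8, 11.9, 13.1, 12.6, 14.0, 15.3, 15.1]   # drifting boundary readings
        print([round(e, 2) for e in kalman_track(noisy)])
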
  • A Comment on “A Necessary and Sufficient Condition for Deadlock-Free Adaptive Routing in Wormhole Networks”

    Page(s): 1775 - 1776

    The purpose of this comment is to show that Duato's condition for deadlock freedom is only sufficient, not necessary. We propose a fix that keeps the condition necessary. The issue is subtle but essential: in a wormhole network, worms necessarily do not intersect.

  • [Cover 3]

    Page(s): c3
  • [Back cover]

    Page(s): c4

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.


Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology