
IEEE Transactions on Parallel and Distributed Systems

Issue 8 • August 2012


  • Cover1

    Page(s): c1
  • Cover2

    Page(s): c2
  • A Sequentially Consistent Multiprocessor Architecture for Out-of-Order Retirement of Instructions

    Page(s): 1361 - 1368

    Out-of-order retirement of instructions has been shown to be an effective technique for increasing the number of in-flight instructions. This form of runtime scheduling can reduce pipeline stalls caused by head-of-line blocking effects in the reorder buffer (ROB). Enlarging the effective instruction window can be highly beneficial to multiprocessors that implement a strict memory model, especially when both loads and stores encounter long latencies due to cache misses, whose stalls must be overlapped with instruction execution to hide the memory latency. Based on the Validation Buffer (VB) architecture (a previously proposed out-of-order retirement, checkpoint-free architecture for single processors), this paper proposes a cost-effective, scalable, out-of-order retirement multiprocessor capable of enforcing sequential consistency without impacting the design of the memory hierarchy or interconnect. Our simulation results indicate that utilizing a VB can speed up both relaxed and sequentially consistent in-order retirement in future multiprocessor systems by between 3 and 20 percent, depending on the ROB size.

  • A Survey of Parallel Programming Models and Tools in the Multi and Many-Core Era

    Page(s): 1369 - 1386

    In this work, we present a survey of the different parallel programming models and tools available today, with particular attention to their suitability for high-performance computing. We review the shared and distributed memory approaches, as well as the current heterogeneous parallel programming model. In addition, we analyze how the partitioned global address space (PGAS) and hybrid parallel programming models are used to combine the advantages of shared and distributed memory systems. The work is completed by considering languages with specific parallel support and the distributed programming paradigm. In all cases, we present characteristics, strengths, and weaknesses. The study shows that the availability of multi-core CPUs has given new impetus to the shared memory parallel programming approach. In addition, we find that hybrid parallel programming is the current way of harnessing the capabilities of computer clusters with multi-core nodes. Heterogeneous programming, in turn, is an increasingly popular paradigm, driven by the availability of systems that combine multi-core CPUs with GPUs. The use of open industry standards such as OpenMP, MPI, or OpenCL, as opposed to proprietary solutions, appears to be the way to standardize and broaden the use of parallel programming models.

  • Cashing in on the Cache in the Cloud

    Page(s): 1387 - 1399

    Over the past decades, caching has become the key technology for bridging the performance gap across memory hierarchies by exploiting temporal and spatial locality; the effect is particularly prominent in disk storage systems. Applications that involve heavy I/O activity, which are common in the cloud, are likely to benefit the most from caching. The use of local volatile memory as a cache might be a natural alternative, but well-known restrictions, such as capacity and the utilization of host machines, hinder its effective use. Beyond these technical challenges, providing cache services in clouds faces a major practical issue of pricing, a matter of quality of service and service level agreements. Currently, (public) cloud users are limited to a small set of uniform and coarse-grained service offerings, such as High-Memory and High-CPU in Amazon EC2. In this paper, we present the cache as a service (CaaS) model as an optional addition to typical infrastructure service offerings. Specifically, the cloud provider sets aside a large pool of memory that can be dynamically partitioned and allocated to standard infrastructure services as disk cache. We first investigate the feasibility of providing CaaS with a proof-of-concept elastic cache system (using dedicated remote memory servers) built and validated on a real system, and then thoroughly study the practical benefits of CaaS for both users and providers (performance and profit, respectively) with a novel pricing scheme. Our CaaS model greatly helps leverage the cloud economy in that 1) the extra user cost for the I/O performance gain is minimal, if it exists at all, and 2) the provider's profit increases due to improvements in server consolidation resulting from that performance gain. Through extensive experiments with eight resource allocation strategies, we demonstrate that our CaaS model can be a promising cost-efficient solution for both users and providers.
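
    As a rough illustration of the disk-cache idea that CaaS builds on, the following is a minimal sketch of a fixed-size LRU block cache placed in front of a slow read path. It does not model the elastic remote-memory system, the pricing scheme, or the eight allocation strategies studied in the paper; all names and parameters are illustrative.

```python
from collections import OrderedDict

# Minimal sketch (illustrative only): a fixed pool of memory caches recently
# read disk blocks so repeated reads skip the slow device.
class BlockCache:
    def __init__(self, capacity_blocks, read_from_disk):
        self.capacity = capacity_blocks
        self.read_from_disk = read_from_disk      # slow backing-store read
        self.blocks = OrderedDict()               # block_id -> data, LRU order
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)     # mark as recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.read_from_disk(block_id)      # slow path
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)       # evict least recently used
        return data

cache = BlockCache(capacity_blocks=2, read_from_disk=lambda b: f"data-{b}")
for b in [1, 2, 1, 3, 1]:
    cache.read(b)
print(cache.hits, cache.misses)   # 2 3
```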

  • Cost-Driven Scheduling of Grid Workflows Using Partial Critical Paths

    Page(s): 1400 - 1414

    Recently, utility Grids have emerged as a new model of service provisioning in heterogeneous distributed systems. In this model, users negotiate with service providers on their required Quality of Service (QoS) and on the corresponding price to reach a Service Level Agreement. One of the most challenging problems in utility Grids is workflow scheduling, i.e., the problem of satisfying the QoS requirements of users while minimizing the cost of workflow execution. In this paper, we propose a new QoS-based workflow scheduling algorithm based on a novel concept called Partial Critical Paths (PCP), which tries to minimize the cost of workflow execution while meeting a user-defined deadline. The PCP algorithm has two phases: in the deadline distribution phase, it recursively assigns subdeadlines to the tasks on the partial critical paths ending at previously assigned tasks; in the planning phase, it assigns the cheapest service to each task that still meets its subdeadline. The simulation results show that the performance of the PCP algorithm is very promising.
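
    As a simplified illustration of the two-phase structure described above, the sketch below handles only a linear workflow: the deadline is split among tasks in proportion to their fastest runtimes, and each task then receives the cheapest service that still meets its subdeadline. This is not the authors' PCP algorithm, which operates on general workflow graphs via partial critical paths; the deadline-distribution rule and all names here are illustrative assumptions.

```python
# Simplified sketch (not the paper's PCP algorithm): deadline distribution
# followed by cheapest-service planning, for a linear workflow only.
def schedule_linear_workflow(tasks, services, deadline):
    """tasks: ordered task names; services: task -> [(cost, runtime), ...]."""
    # Phase 1 (deadline distribution): split the deadline among tasks in
    # proportion to their fastest possible runtimes.
    fastest = {t: min(rt for _, rt in services[t]) for t in tasks}
    total = sum(fastest.values())
    subdeadline = {t: deadline * fastest[t] / total for t in tasks}

    # Phase 2 (planning): pick the cheapest service that meets each subdeadline.
    plan, total_cost = {}, 0.0
    for t in tasks:
        feasible = [(c, rt) for c, rt in services[t] if rt <= subdeadline[t]]
        if not feasible:
            raise ValueError(f"no service meets the subdeadline of task {t}")
        cost, runtime = min(feasible)            # cheapest feasible option
        plan[t] = (cost, runtime)
        total_cost += cost
    return plan, total_cost

services = {"A": [(5, 2), (2, 6)], "B": [(8, 1), (3, 4)], "C": [(4, 3), (1, 9)]}
print(schedule_linear_workflow(["A", "B", "C"], services, deadline=14.0))
```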

  • Distributed Diagnosis of Dynamic Events in Partitionable Arbitrary Topology Networks

    Page(s): 1415 - 1426

    This work introduces the Distributed Network Reachability (DNR) algorithm, a distributed system-level diagnosis algorithm that allows every node of a partitionable arbitrary-topology network to determine which portions of the network are reachable and unreachable. DNR is the first distributed diagnosis algorithm that works in the presence of network partitions and healings caused by dynamic fault and repair events. Both crash and timing faults are assumed, and a faulty node is indistinguishable from a network partition. Each link is tested alternately by one of its adjacent nodes at successive testing intervals. Upon detection of a new event, the new diagnostic information is disseminated to reachable nodes; new events can occur before the dissemination completes. Whenever a new event is detected or reported, a working node can compute the network reachability from its local diagnostic information. The bounded correctness of DNR is proved, including bounded diagnostic latency, bounded startup, and accuracy. Simulation results are presented for several random and regular topologies, showing the performance of the algorithm under highly dynamic fault situations.
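
    The sketch below covers only the last step mentioned above, computing reachability from a node's local diagnostic view of which links are currently fault-free; the testing intervals and dissemination machinery of DNR are not modeled, and the data layout is an illustrative assumption.

```python
from collections import deque

# Illustrative sketch: a node's local reachability computation over its current
# diagnostic view (working_links maps each node to the neighbors whose
# connecting link is diagnosed as fault-free).
def reachable_from(node, working_links):
    seen, queue = {node}, deque([node])
    while queue:
        u = queue.popleft()
        for v in working_links.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

# Example: node 0's view of a partitioned 5-node network.
view = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
print(reachable_from(0, view))   # {0, 1, 2}; nodes 3 and 4 are unreachable
```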

  • Distributed Privacy-Preserving Access Control in Sensor Networks

    Page(s): 1427 - 1438

    The owner and the users of a sensor network may be different parties, which necessitates privacy-preserving access control. On the one hand, the network owner needs to enforce strict access control so that the sensed data are accessible only to users willing to pay. On the other hand, users wish to protect their respective data access patterns, whose disclosure may be used against their interests. This paper presents DP2AC, a Distributed Privacy-Preserving Access Control scheme for sensor networks, which is the first work of its kind. Users in DP2AC purchase tokens from the network owner with which they can query data from sensor nodes, which reply only after validating the tokens. The use of blind signatures in token generation ensures that tokens are publicly verifiable yet unlinkable to user identities, so privacy-preserving access control is achieved. A central component of DP2AC prevents malicious users from reusing tokens, for which we propose a suite of distributed token reuse detection (DTRD) schemes that do not involve the base station. These schemes share the essential idea that a sensor node checks with some other nodes (called witnesses) whether a token has been used, but they differ in how the witnesses are chosen. We thoroughly compare their performance with regard to TRD capability, communication overhead, storage overhead, and attack resilience. The efficacy and efficiency of DP2AC are confirmed by detailed performance evaluations.
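
    The following is a textbook RSA blind-signature sketch (toy key, no padding, Python 3.8+), included only to convey why tokens can be publicly verifiable yet unlinkable to the purchasing user. It is not the DP2AC protocol itself, which additionally specifies token structure, token deposit at sensor nodes, and the distributed reuse-detection schemes.

```python
import secrets
from math import gcd

# Toy RSA key of the network owner (real deployments need proper key sizes
# and padding; this is an illustrative textbook construction only).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))              # owner's private exponent

def blind(token, r):
    return (token * pow(r, e, n)) % n          # user blinds the token

def sign_blinded(blinded):
    return pow(blinded, d, n)                  # owner signs without seeing it

def unblind(blinded_sig, r):
    return (blinded_sig * pow(r, -1, n)) % n   # user removes the blinding factor

def verify(token, sig):
    return pow(sig, e, n) == token % n         # any sensor node can check this

token = 1234
while True:                                    # random blinding factor coprime to n
    r = secrets.randbelow(n - 2) + 2
    if gcd(r, n) == 1:
        break
sig = unblind(sign_blinded(blind(token, r)), r)
print(verify(token, sig))                      # True
```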

  • Dynamic Beacon Mobility Scheduling for Sensor Localization

    Page(s): 1439 - 1452

    In mobile-beacon assisted sensor localization, beacon mobility scheduling aims to determine the best beacon trajectory so that each sensor receives sufficient beacon signals and is localized with minimum delay. We propose a novel DeteRministic dynamic bEAcon Mobility Scheduling (DREAMS) algorithm that requires no prior knowledge of the sensory field. In this algorithm, the beacon trajectory is defined as the track of a Depth-First Traversal (DFT) of the network graph and is therefore deterministic. The mobile beacon performs the DFT dynamically, under the instruction of nearby sensors on the fly, moving from sensor to sensor in a heuristic manner guided by Received Signal Strength (RSS)-based distance measurements. We prove that DREAMS guarantees full localization (every sensor is localized) when the measurements are noise-free, and we derive an upper bound on the beacon's total moving distance in this case. We then suggest applying node elimination and the Local Minimum Spanning Tree (LMST) to shorten the beacon tour and reduce delay. Further, we extend DREAMS to multibeacon scenarios, where beacons with different coordinate systems compete to localize sensors: losing beacons adopt the winning beacons' coordinate system and cooperate in subsequent localization, so all sensors are finally localized in a commonly agreed coordinate system. Through simulation we show that DREAMS guarantees full localization even with noisy distance measurements, and we evaluate its localization delay and communication overhead in comparison with a previously proposed static path-based scheduling method.
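
    The sketch below captures only the depth-first traversal that defines the beacon trajectory: the beacon advances to an unvisited neighbor when one exists and backtracks otherwise. RSS-based distance measurement, the on-the-fly instruction by sensors, and the localization computation itself are not modeled; the graph and start node are illustrative.

```python
# Illustrative sketch of the DFT-defined beacon tour (backtracking included).
def dft_beacon_tour(graph, start):
    """graph: node -> list of neighbor nodes; returns the visit sequence."""
    tour, visited = [start], {start}

    def visit(u):
        for v in graph[u]:
            if v not in visited:
                visited.add(v)
                tour.append(v)     # beacon moves forward to an unvisited sensor
                visit(v)
                tour.append(u)     # beacon backtracks to u afterwards

    visit(start)
    return tour

graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(dft_beacon_tour(graph, 0))   # [0, 1, 3, 1, 0, 2, 0]
```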

  • Efficient Hardware Barrier Synchronization in Many-Core CMPs

    Page(s): 1453 - 1466

    Traditional software-based barrier implementations for shared memory parallel machines tend to produce hotspots, in terms of memory and network contention, as the number of processors increases. This could limit their applicability to future many-core CMPs, in which several dozen cores may need to be synchronized efficiently. In this work, we develop GBarrier, a hardware-based barrier mechanism aimed at providing efficient barriers in future many-core CMPs. Our proposal deploys a dedicated G-line-based network to allow fast and efficient signaling of barrier arrival and departure. Since GBarrier does not have any influence on the memory system, we avoid all the coherence activity and barrier-related network traffic that traditional approaches introduce and that restrict scalability. Through detailed simulations of a 32-core CMP, we compare GBarrier against one of the most efficient software-based barrier implementations for a set of kernels and scientific applications. Evaluation results show average reductions of 54 and 21 percent in execution time, 53 and 18 percent in network traffic, and 76 and 31 percent in the energy-delay² product for the full CMP when the kernels and the scientific applications, respectively, are considered.
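
    For context, the sketch below implements a centralized sense-reversing software barrier (using a condition variable rather than spinning), i.e., the style of shared-memory barrier whose contention motivates GBarrier; GBarrier itself is a hardware mechanism and is not reproduced here.

```python
import threading

# Illustrative software baseline: a centralized sense-reversing barrier.
class SenseReversingBarrier:
    def __init__(self, n):
        self.n = self.count = n
        self.sense = False
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            local_sense = not self.sense
            self.count -= 1
            if self.count == 0:                  # last arriver releases everyone
                self.count = self.n
                self.sense = local_sense
                self.cond.notify_all()
            else:
                while self.sense != local_sense:
                    self.cond.wait()

barrier = SenseReversingBarrier(4)

def worker(i):
    print(f"worker {i} arrived")
    barrier.wait()
    print(f"worker {i} released")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```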

  • Enabling Secure and Efficient Ranked Keyword Search over Outsourced Cloud Data

    Page(s): 1467 - 1479

    Cloud computing economically enables the paradigm of data service outsourcing. However, to protect data privacy, sensitive cloud data have to be encrypted before being outsourced to the commercial public cloud, which makes effective data utilization a very challenging task. Although traditional searchable encryption techniques allow users to securely search over encrypted data through keywords, they support only Boolean search and are not yet sufficient to meet the data utilization needs inherently demanded by the large number of users and the huge volume of data files in the cloud. In this paper, we define and solve the problem of secure ranked keyword search over encrypted cloud data. Ranked search greatly enhances system usability by returning results ranked by relevance instead of undifferentiated results, and further ensures file retrieval accuracy. Specifically, we explore the statistical measure of relevance scores from information retrieval to build a secure searchable index, and develop a one-to-many order-preserving mapping technique to properly protect the sensitive score information. The resulting design is able to facilitate efficient server-side ranking without losing keyword privacy. Thorough analysis shows that our proposed solution enjoys an “as-strong-as-possible” security guarantee compared to previous searchable encryption schemes, while correctly realizing the goal of ranked keyword search. Extensive experimental results demonstrate the efficiency of the proposed solution.
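
    The sketch below computes plaintext relevance scores with a common TF×IDF measure from information retrieval, the kind of score such a scheme ranks on. The paper's actual contributions, the secure index and the one-to-many order-preserving mapping that protects these scores, are not reproduced here, and the exact scoring formula is an assumption rather than the paper's.

```python
import math
from collections import Counter

# Illustrative TF x IDF relevance scoring (plaintext side only).
def relevance_scores(files, keyword):
    """files: file_id -> list of terms; returns file_id -> score."""
    n = len(files)
    containing = sum(1 for terms in files.values() if keyword in terms)
    if containing == 0:
        return {}
    idf = math.log(1 + n / containing)
    scores = {}
    for fid, terms in files.items():
        tf = Counter(terms)[keyword]
        if tf:
            scores[fid] = (1 + math.log(tf)) * idf / len(terms)
    return scores

files = {
    "f1": "cloud data cloud search".split(),
    "f2": "keyword search over encrypted data".split(),
    "f3": "cloud storage pricing".split(),
}
ranked = sorted(relevance_scores(files, "cloud").items(), key=lambda kv: -kv[1])
print(ranked)   # f1 ranks above f3; f2 does not contain the keyword
```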

  • Energy-Efficient Topology Control in Cooperative Ad Hoc Networks

    Page(s): 1480 - 1491

    Cooperative communication (CC) exploits spatial diversity by allowing multiple nodes to cooperatively relay signals to the receiver so that the combined signal at the receiver can be correctly decoded. Because CC can reduce the transmission power and extend the transmission coverage, it has been considered in topology control protocols [1], [2]. However, prior research on topology control with CC focuses only on maintaining network connectivity and minimizing the transmission power of each node, while ignoring the energy efficiency of paths in the constructed topologies. This may lead to inefficient routes and hurt the overall performance of cooperative ad hoc networks. In this paper, to address this problem, we introduce a new topology control problem, energy-efficient topology control with cooperative communication, and propose two topology control algorithms that build cooperative energy spanners in which the energy efficiency of individual paths is guaranteed. Both algorithms can be executed in a distributed and localized fashion while maintaining globally efficient paths. Simulation results confirm the good performance of both proposed algorithms.

  • Exploring the Optimal Replication Strategy in P2P-VoD Systems: Characterization and Evaluation

    Page(s): 1492 - 1503

    P2P Video-on-Demand (P2P-VoD) is a popular Internet service that aims to provide scalable, high-quality service to users. At the same time, content providers of P2P-VoD services also need to ensure that the service is operated at a manageable cost. Given the volume-based charging model of ISPs, P2P-VoD content providers would like to reduce peers' access to the content server so as to reduce the operating cost. In this paper, we address an important open problem: what is the “optimal replication ratio” in a P2P-VoD system such that peers receive service from each other and, at the same time, access to the content server is reduced? We address two fundamental issues: 1) what is the optimal replication ratio of a movie given its popularity, and 2) how to achieve these optimal ratios in a distributed and dynamic fashion. We first formally show how movie popularities impact the server's workload, and formulate video replication as an optimization problem. We show that the conventional wisdom of using a proportional replication strategy is “suboptimal,” and expand the design space to both a “passive replacement policy” and an “active push policy” to achieve the optimal replication ratios. We consider practical implementation issues, evaluate the performance of P2P-VoD systems, and show how to greatly reduce the server's workload and improve streaming quality via our distributed algorithms.
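
    As a purely numerical illustration of why proportional replication can be suboptimal, the sketch below assumes a toy availability model (a request for movie i misses each of k independently contacted peers with probability (1 - ρ_i), so the expected server load is the sum over movies of demand_i · (1 - ρ_i)^k) and compares popularity-proportional replication with a simple greedy allocation of the same storage budget. The model, the parameters, and the greedy rule are assumptions made for illustration and are not the paper's formulation or results.

```python
# Illustrative toy model only (not the paper's formulation).
def server_load(demand, rho, k=4):
    return sum(d * (1 - r) ** k for d, r in zip(demand, rho))

def proportional(demand, budget):
    total = sum(demand)
    return [budget * d / total for d in demand]

def greedy(demand, budget, k=4, step=0.001):
    rho = [0.0] * len(demand)
    for _ in range(int(budget / step)):
        # give the next slice of storage to the movie with the best marginal gain
        gains = [d * ((1 - r) ** k - (1 - min(r + step, 1.0)) ** k)
                 for d, r in zip(demand, rho)]
        i = max(range(len(demand)), key=gains.__getitem__)
        rho[i] = min(rho[i] + step, 1.0)
    return rho

demand = [50, 25, 15, 10]          # movie popularities
budget = 1.5                       # total replication ratio to distribute
print(round(server_load(demand, proportional(demand, budget)), 1))   # ~14.6
print(round(server_load(demand, greedy(demand, budget)), 1))         # ~12.0
```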

  • Hamiltonian Decomposition of the Rectangular Twisted Torus

    Page(s): 1504 - 1507

    We show that the 2a × a rectangular twisted torus introduced by Cámara et al. [5] is edge decomposable into two Hamiltonian cycles. In the process, we show that the 2a × a × a prismatic twisted torus is edge decomposable into three Hamiltonian cycles, and that the 2a × a × a prismatic doubly twisted torus admits two edge-disjoint Hamiltonian cycles.

  • Load Balancing Hashing in Geographic Hash Tables

    Page(s): 1508 - 1519

    In this paper, we address the problem of balancing the network traffic load when the data generated in a wireless sensor network are stored on the sensor nodes themselves and accessed by querying a geographic hash table. Existing approaches balance the network load by changing the georouting protocol used to forward queries in the geographic hash table. However, this comes at the expense of considerably complicating the routing process, which no longer occurs along (near) straight-line trajectories but requires computing complex geometric transformations. In this paper, we demonstrate that it is possible to balance the network traffic load in a geographic hash table without changing the underlying georouting protocol. Instead of changing the (near) straight-line georouting protocol used to send a query from the node issuing the query (the source) to the node managing the queried key (the destination), we propose to “reverse engineer” the hash function used to store data in the network, implementing a form of “load-aware” assignment of key ranges to wireless sensor nodes. This methodology is instantiated into two specific approaches: an analytical one, in which the destination density function yielding quasiperfect load balancing is characterized analytically under uniformity assumptions on the locations of nodes and query sources; and an iterative, heuristic approach that can be used whenever these uniformity assumptions do not hold. To demonstrate the practicality of our load-balancing methodology, we have performed extensive simulations resembling realistic wireless sensor network deployments, showing the effectiveness of the two proposed approaches in considerably improving load balance and extending network lifetime. Simulation results also show that our technique achieves better load balancing than an existing approach based on modifying georouting.
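
    The sketch below is a one-dimensional simplification of the "load-aware" hashing idea: keys are first hashed uniformly to [0, 1) and then pushed through the inverse CDF of a desired destination-density function, so that chosen regions absorb a larger share of keys. The paper's two-dimensional setting, its analytical density characterization, and the iterative heuristic are not reproduced; cell boundaries and weights are illustrative.

```python
import bisect
import hashlib

# Illustrative 1-D "load-aware" hash: uniform hash followed by an inverse-CDF
# transform toward a desired per-cell load distribution.
def uniform_hash(key):
    h = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def make_loadaware_hash(cells, weights):
    """cells: sorted 1-D cell boundaries; weights: desired key share per cell."""
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)

    def hash_to_position(key):
        u = uniform_hash(key)
        i = bisect.bisect_left(cdf, u)          # inverse-CDF lookup
        lo, hi = cells[i], cells[i + 1]
        prev = cdf[i - 1] if i else 0.0
        return lo + (hi - lo) * (u - prev) / (cdf[i] - prev)

    return hash_to_position

# Four equal-width cells; the middle two should absorb most of the keys.
h = make_loadaware_hash(cells=[0.0, 0.25, 0.5, 0.75, 1.0], weights=[1, 3, 3, 1])
print([round(h(f"key{i}"), 3) for i in range(5)])
```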

  • New Memoryless Online Routing Algorithms for Delaunay Triangulations

    Page(s): 1520 - 1527

    Memoryless online routing (MOR) algorithms are suitable for applications that use only local information to find paths, and Delaunay triangulations (DTs) are a class of geometric graphs widely proposed as network topologies. Motivated by these two facts, this paper presents a variety of new MOR algorithms that work for Delaunay triangulations, thus greatly enriching the family of such algorithms. The paper also evaluates and compares these new algorithms with three existing MOR algorithms. The experimental results shed light on their performance in terms of both the Euclidean and the link metric, and also reveal certain properties of Delaunay triangulations. Finally, the paper poses three open problems and explains their importance.
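
    For reference, the sketch below implements the classic greedy memoryless rule, forward to the neighbor closest to the destination, which is one of the best-known MOR algorithms and is known to always reach the destination on a Delaunay triangulation. The new algorithms proposed in the paper are not reproduced; the small graph is illustrative.

```python
import math

# Illustrative greedy memoryless routing: always forward to the neighbor
# closest (in Euclidean distance) to the destination.
def greedy_route(positions, neighbors, src, dst):
    dist = lambda a, b: math.dist(positions[a], positions[b])
    path, current = [src], src
    while current != dst:
        nxt = min(neighbors[current], key=lambda v: dist(v, dst))
        if dist(nxt, dst) >= dist(current, dst):
            raise RuntimeError("stuck in a local minimum")   # cannot happen on a DT
        path.append(nxt)
        current = nxt
    return path

positions = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
neighbors = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
print(greedy_route(positions, neighbors, 0, 3))   # [0, 1, 3]
```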

  • On Maximizing the Lifetime of Wireless Sensor Networks Using Virtual Backbone Scheduling

    Page(s): 1528 - 1535

    Wireless Sensor Networks (WSNs) are key to applications that involve long-term, low-cost monitoring and actuation. In these applications, sensor nodes use batteries as the sole energy source, so energy efficiency becomes critical. We observe that many WSN applications require redundant sensor nodes to achieve fault tolerance and Quality of Service (QoS) of the sensing. However, the same redundancy may not be necessary for multihop communication, because of the light traffic load and the stable wireless links. In this paper, we present a novel sleep-scheduling technique called Virtual Backbone Scheduling (VBS). VBS is designed for WSNs that have redundant sensor nodes. VBS forms multiple overlapped backbones that work alternately to prolong the network lifetime. In VBS, traffic is forwarded only by backbone sensor nodes, and the remaining sensor nodes turn off their radios to save energy. The rotation of multiple backbones ensures that the energy consumption of all sensor nodes is balanced, which fully utilizes the available energy and achieves a longer network lifetime than existing techniques. The scheduling problem of VBS is formulated as the Maximum Lifetime Backbone Scheduling (MLBS) problem. Since the MLBS problem is NP-hard, we propose approximation algorithms based on the Schedule Transition Graph (STG) and the Virtual Scheduling Graph (VSG). We also present an Iterative Local Replacement (ILR) scheme as a distributed implementation. Theoretical analyses and simulation studies verify that VBS is superior to existing techniques.
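
    The toy sketch below conveys only the rotation idea: several precomputed overlapping backbones take turns being active, so forwarding energy is spread across nodes and the time until the first node dies grows. Backbone construction, the MLBS formulation, and the STG/VSG/ILR algorithms from the paper are not modeled; the energies and backbones are illustrative.

```python
# Illustrative toy model: backbones take turns forwarding; only active
# backbone nodes spend energy in a round.
def rounds_until_first_death(backbones, energy, cost_per_round=1.0):
    """backbones: list of node-id sets; energy: node -> initial energy."""
    energy = dict(energy)
    rounds = 0
    while True:
        active = backbones[rounds % len(backbones)]
        if any(energy[v] < cost_per_round for v in active):
            return rounds
        for v in active:
            energy[v] -= cost_per_round
        rounds += 1

nodes = {v: 10.0 for v in range(6)}
single = [{0, 1, 2}]                      # one fixed backbone
rotated = [{0, 1, 2}, {3, 4, 5}]          # two alternating backbones
print(rounds_until_first_death(single, nodes))    # 10
print(rounds_until_first_death(rotated, nodes))   # 20
```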

  • Scalable RFID Systems: A Privacy-Preserving Protocol with Constant-Time Identification

    Page(s): 1536 - 1550

    In the RFID literature, most “privacy-preserving” protocols require the reader to search all tags in the system in order to identify a single tag. In another class of protocols, the search complexity is reduced to be logarithmic in the number of tags, but this comes with two major drawbacks: it requires a large communication overhead over the fragile wireless channel, and the compromise of a tag in the system reveals secret information about other, uncompromised, tags in the same system. In this work, we take a different approach to address the time complexity of private identification in large-scale RFID systems. We exploit the special architecture of RFID systems to propose a symmetric-key privacy-preserving authentication protocol with constant-time identification. Instead of increasing the communication overhead, the large storage available in RFID systems, the back-end database, is utilized to improve the time efficiency of tag identification.
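
    The sketch below illustrates constant-time identification with a precomputed pseudonym table: the back-end database maps each tag's next expected reply to its identity, so a reply is resolved with a single hash-table lookup rather than a search over all tags. This is a simplified stand-in rather than the paper's protocol; desynchronization handling, mutual authentication, and tag-side constraints are omitted, and all names are illustrative.

```python
import hashlib
import hmac
import os

# Illustrative sketch: O(1) identification via a precomputed pseudonym table.
def pseudonym(key, counter):
    return hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).hexdigest()

class Backend:
    def __init__(self, tag_keys):
        self.keys = dict(tag_keys)                 # tag_id -> secret key
        self.counters = {t: 0 for t in tag_keys}
        # next expected pseudonym -> tag_id (constant-time lookup table)
        self.table = {pseudonym(k, 0): t for t, k in self.keys.items()}

    def identify(self, reply):
        tag = self.table.pop(reply, None)          # single hash-table lookup
        if tag is not None:                        # roll the pseudonym forward
            self.counters[tag] += 1
            self.table[pseudonym(self.keys[tag], self.counters[tag])] = tag
        return tag

keys = {f"tag{i}": os.urandom(16) for i in range(3)}
backend = Backend(keys)
reply = pseudonym(keys["tag1"], 0)                 # tag1 answers a reader query
print(backend.identify(reply))                     # tag1
```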

  • Trustworthy Coordination of Web Services Atomic Transactions

    Page(s): 1551 - 1565

    The Web Services Atomic Transactions (WS-AT) specification makes it possible for businesses to engage in standard distributed transaction processing over the Internet using Web Services technology. For such business applications, trustworthy coordination of WS-AT is crucial. In this paper, we explain how to render WS-AT coordination trustworthy by applying Byzantine Fault Tolerance (BFT) techniques. More specifically, we show how to protect the core services described in the WS-AT specification, namely the Activation service, the Registration service, the Completion service, and the Coordinator service, against Byzantine faults. The main contribution of this work is that it exploits the semantics of the WS-AT services to minimize the use of Byzantine Agreement (BA), instead of applying BFT techniques naively, which would be prohibitively expensive. We have incorporated our BFT protocols and mechanisms into an open-source framework that implements the WS-AT specification. The resulting BFT framework for WS-AT is useful for business applications that are based on WS-AT and require a high degree of dependability, security, and trust.

  • CPS Handles the Details for you [advertisement]

    Page(s): 1566
  • Stay Connected with the IEEE Computer Society [advertisement]

    Page(s): 1567
  • New Transactions Newsletter [advertisement]

    Page(s): 1568
  • [Inside back cover]

    Page(s): c3
  • [Back cover]

    Page(s): c4

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.


Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology