
IEEE Transactions on Parallel and Distributed Systems

Issue 3 • March 2013


16 articles in this issue
  • Autogeneration and Autotuning of 3D Stencil Codes on Homogeneous and Heterogeneous GPU Clusters

    Page(s): 417 - 427

    This paper develops and evaluates search and optimization techniques for autotuning 3D stencil (nearest-neighbor) computations on GPUs. Our observations indicate that parameter tuning within the search space is necessary for GPUs of different architectures to reach their best performance. Our framework takes a concise specification of stencil behavior from the user as a single formula, autogenerates tunable code from it, systematically searches for the best parameter configuration, and generates code with that configuration for each target GPU. This autotuning approach adapts performance across generations of GPUs while greatly enhancing programmer productivity. Experimental results show that the delivered floating-point performance is very close to previous handcrafted work and outperforms other autotuned stencil codes by a large margin. Furthermore, heterogeneous GPU clusters achieve their highest performance when each GPU is tuned separately and the workload is partitioned in proportion to single-GPU performance.

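    As a concrete (and deliberately tiny) illustration of the autotuning loop described above -- not the paper's framework -- the Python sketch below times a 7-point 3D Jacobi sweep for several candidate z-slab thicknesses, a hypothetical stand-in for the GPU block-size parameters being searched, and keeps the fastest:

```python
import time
import numpy as np

def jacobi7(u, slab):
    """One 7-point Jacobi sweep over the interior of a cubic grid,
    processed in z-slabs of thickness `slab` (a crude stand-in for a
    GPU thread-block/tile parameter)."""
    out = u.copy()
    n = u.shape[0]
    for z0 in range(1, n - 1, slab):
        z1 = min(z0 + slab, n - 1)
        out[z0:z1, 1:-1, 1:-1] = (
            u[z0:z1, 1:-1, 1:-1] +
            u[z0 - 1:z1 - 1, 1:-1, 1:-1] + u[z0 + 1:z1 + 1, 1:-1, 1:-1] +
            u[z0:z1, :-2, 1:-1] + u[z0:z1, 2:, 1:-1] +
            u[z0:z1, 1:-1, :-2] + u[z0:z1, 1:-1, 2:]) / 7.0
    return out

def autotune(n=96, candidates=(1, 2, 4, 8, 16, 32)):
    """Search the (tiny) parameter space and return the slab thickness
    that runs fastest on this machine."""
    u = np.random.rand(n, n, n)
    best, best_t = None, float("inf")
    for slab in candidates:
        t0 = time.perf_counter()
        jacobi7(u, slab)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best, best_t = slab, elapsed
    return best, best_t

print(autotune())
```
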
  • An Iterative Divide-and-Merge-Based Approach for Solving Large-Scale Least Squares Problems

    Page(s): 428 - 438

    Singular value decomposition (SVD) is a popular method for solving least-squares estimation (LSE) problems. However, for large data sets, applying SVD directly to the coefficient matrix is very time and memory consuming. In this paper, we propose an iterative divide-and-merge estimator for solving large-scale LSE problems. In each iteration, the LSE problem is transformed into an equivalent but smaller one: the input matrices are subdivided into a set of small submatrices, each submatrix is decomposed by SVD, and the results are merged to form the input of the next iteration. The process repeats until the resulting matrices are small enough to be solved directly and efficiently by SVD. The number of iterations is determined dynamically by the size of the input data set. As a result, the time and space required to find least-squares solutions are greatly reduced. Furthermore, the decomposition and merging of the submatrices in each iteration can be done independently in parallel. The idea can be easily implemented in MapReduce, and experimental results show that the proposed approach solves large-scale LSE problems effectively.

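    The reduction step behind such divide-and-merge schemes is easy to sketch. The serial toy version below relies on the standard identity ||Ax - b||^2 = ||S V^T x - U^T b||^2 + const for A = U S V^T, so each tall row block (A_i, b_i) can be replaced by an equivalent problem with only n rows; the block size and the final direct solve are assumptions, and the paper runs the per-block reductions in parallel (e.g., as MapReduce tasks):

```python
import numpy as np

def reduce_block(A, b):
    """Replace (A, b) by an equivalent small problem via the SVD:
    ||A x - b||^2 = ||S V^T x - U^T b||^2 + const, so (S V^T, U^T b)
    has the same least-squares minimizer."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return s[:, None] * Vt, U.T @ b

def divide_and_merge_lse(A, b, block=1000):
    """Iteratively shrink a tall LSE problem until it fits in one
    block, then solve it directly. Serial sketch only."""
    while A.shape[0] > block:
        parts = [reduce_block(A[i:i + block], b[i:i + block])
                 for i in range(0, A.shape[0], block)]
        A = np.vstack([Ai for Ai, _ in parts])       # merge
        b = np.concatenate([bi for _, bi in parts])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20000, 50))
b = rng.standard_normal(20000)
x = divide_and_merge_lse(A, b)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```
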
  • Buffer Management for Aggregated Streaming Data with Packet Dependencies

    Page(s): 439 - 449

    In many applications, the traffic traversing the network has interpacket dependencies due to application-level encoding schemes. For some applications, e.g., multimedia streaming, dropping a single packet may render a whole sequence useless. In such environments, the algorithm that decides which packet to drop on buffer overflow must be carefully designed to avoid goodput degradation. We present a model that captures such interpacket dependencies and design algorithms for performing packet discard. Traffic consists of an aggregation of multiple streams, each of which is a sequence of interdependent packets. We provide two guidelines for designing buffer management algorithms and demonstrate their effectiveness. We devise an algorithm according to these guidelines and evaluate its performance analytically, using competitive analysis. A simulation study further shows that the performance of our algorithm is within a small fraction of that of the best known offline algorithm.

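    One plausible reading of dependency-aware discard, as a toy policy and not the paper's algorithm: on overflow, evict the frame with the least buffered progress, since losing it wastes the least already-buffered work. The packet format and the greedy rule below are assumptions for illustration:

```python
from collections import defaultdict, deque

class DependencyAwareBuffer:
    """Bounded buffer for packets tagged (stream, frame, seq). A frame
    is useful only if delivered completely, so on overflow we evict all
    packets of the least-advanced frame. Hypothetical greedy policy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()              # packets in arrival order
        self.buffered = defaultdict(int)  # (stream, frame) -> count

    def push(self, pkt):
        stream, frame, _seq = pkt
        key = (stream, frame)
        if len(self.queue) >= self.capacity:
            counts = dict(self.buffered)
            counts[key] = counts.get(key, 0) + 1   # include arriving frame
            victim = min(counts, key=counts.get)
            if victim == key:
                return                    # arriving frame is least advanced
            self.queue = deque(p for p in self.queue
                               if (p[0], p[1]) != victim)
            del self.buffered[victim]
        self.queue.append(pkt)
        self.buffered[key] += 1

    def pop(self):
        if not self.queue:
            return None
        pkt = self.queue.popleft()
        key = (pkt[0], pkt[1])
        self.buffered[key] -= 1
        if self.buffered[key] == 0:
            del self.buffered[key]
        return pkt
```
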
  • Design and Performance Evaluation of Overhearing-Aided Data Caching in Wireless Ad Hoc Networks

    Page(s): 450 - 463

    Wireless ad hoc networks are a promising technology for providing users with Internet access anywhere, anytime. To cope with their resource constraints, data caching is widely used to reduce data access cost. In this paper, we propose an efficient data caching algorithm that exploits the overhearing property of wireless communication to improve caching performance. Because of the broadcast nature of wireless links, a packet can be overheard by any node within the transmission range of the transmitter, even if that node is not the intended receiver. Our algorithm uses the overheard information, including data requests and data replies, to optimize both cache placement and cache discovery. To the best of our knowledge, this is the first work to consider the overhearing property of wireless communication in data caching. Simulation results show that, compared with a representative algorithm and a naive overhearing algorithm, our algorithm significantly reduces both message cost and access delay.

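    A minimal sketch of the overhearing idea, with hypothetical data structures rather than the paper's protocol: every overheard reply both seeds the local cache (subject to a placement policy) and records who holds the item, so later lookups can be answered locally or redirected instead of flooded:

```python
class OverhearingNode:
    """Sketch of overhearing-aided caching: a node learns from every
    packet it can hear, not just those addressed to it."""

    def __init__(self, node_id, cache_size):
        self.node_id = node_id
        self.cache_size = cache_size
        self.cache = {}        # item_id -> data
        self.locations = {}    # item_id -> last known holder

    def on_packet(self, pkt):
        # pkt: {'kind': 'reply'|'request', 'item', 'data', 'src', 'dst'};
        # called for delivered AND overheard packets alike
        if pkt['kind'] == 'reply':
            self.locations[pkt['item']] = pkt['src']   # overheard holder
            if pkt['dst'] == self.node_id or self._worth_caching():
                self._insert(pkt['item'], pkt['data'])

    def lookup(self, item):
        """Cache discovery: local copy first, then an overheard holder,
        and only then a network-wide search."""
        if item in self.cache:
            return ('local', self.cache[item])
        if item in self.locations:
            return ('forward-to', self.locations[item])
        return ('flood', None)

    def _worth_caching(self):
        return len(self.cache) < self.cache_size       # placeholder policy

    def _insert(self, item, data):
        if len(self.cache) >= self.cache_size and item not in self.cache:
            self.cache.pop(next(iter(self.cache)))     # naive FIFO eviction
        self.cache[item] = data
```
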
  • Dynamic Optimization of Multiattribute Resource Allocation in Self-Organizing Clouds

    Page(s): 464 - 478

    By leveraging virtual machine (VM) technology, which provides performance and fault isolation, cloud resources can be provisioned on demand in a fine-grained, multiplexed manner rather than in monolithic pieces. By integrating volunteer computing into cloud architectures, we envision a gigantic self-organizing cloud (SOC) being formed to reap the huge potential of untapped commodity computing power over the Internet. Toward this new architecture, in which each participant may autonomously act as both resource consumer and provider, we propose a fully distributed, VM-multiplexing resource allocation scheme to manage decentralized resources. Our approach not only maximizes resource utilization using the proportional share model (PSM), but also delivers provably and adaptively optimal execution efficiency. We also design a novel multiattribute range-query protocol for locating qualified nodes. Contrary to existing solutions, which often generate bulky messages per request, our protocol produces only one lightweight query message per task on the Content Addressable Network (CAN). It effectively finds qualified resources for each task under a randomized policy that mitigates contention among requesters. We show that the SOC with our optimized algorithms improves system throughput by 15-60 percent over a P2P grid model. Our solution also exhibits high adaptability in a dynamic node-churning environment.

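    The proportional share model the scheme builds on is simple to state: each colocated task receives a slice of a node's resource proportional to its weight (or bid). A minimal sketch of just that allocation rule, with hypothetical task names:

```python
def proportional_share(capacity, weights):
    """Proportional-share model (PSM): divide a node's capacity among
    colocated tasks in proportion to their weights."""
    total = sum(weights.values())
    return {task: capacity * w / total for task, w in weights.items()}

# Example: three VMs multiplexing one node's 8 cores.
print(proportional_share(8.0, {'vm-a': 2, 'vm-b': 1, 'vm-c': 1}))
# {'vm-a': 4.0, 'vm-b': 2.0, 'vm-c': 2.0}
```
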
  • Enabling Efficient WiFi-Based Vehicular Content Distribution

    Page(s): 479 - 492

    For better road safety and driving experience, content distribution to vehicle users through roadside Access Points (APs) is becoming an important and promising complement to 3G and other cellular networks. In this paper, we introduce the Cooperative Content Distribution System for Vehicles (CCDSV), which operates upon a network of infrastructure APs to collaboratively distribute content to moving vehicles. CCDSV solves several important issues in a practical system, such as robustness to mobility-prediction errors, the limited resources of APs, and the distribution of shared content. Our system organizes the cooperative APs into a novel structure, the contact map, which is based on the vehicular contact patterns observed by APs. To fully utilize the wireless bandwidth provided by APs, we propose a representative-based prefetching mechanism, in which a set of representative APs is carefully selected and then shares its prefetched data with the others. The selection process explicitly takes into account each AP's storage capacity, storage status, inter-AP bandwidth, and traffic load on the backhaul links. We apply network coding in CCDSV to augment the distribution of shared content. The shared content to be prefetched on an AP is selected based on the storage status of neighboring APs in the contact map, so as to increase the information utility of each prefetched data piece. Through extensive simulations, we show that CCDSV is effective in vehicular content distribution under various scenarios.

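    To fix intuition for the representative-selection step, here is a toy greedy scorer; the attributes mirror those listed in the abstract, but the weights and the linear scoring form are assumptions (the paper's selection also consults the contact map):

```python
def pick_representatives(aps, k):
    """Greedy sketch of representative-AP selection: favor APs with
    free storage and backhaul bandwidth, penalize loaded ones.
    Scoring weights are hypothetical."""
    def score(ap):
        return (0.4 * ap['free_storage'] +
                0.4 * ap['backhaul_bw'] -
                0.2 * ap['load'])
    return sorted(aps, key=score, reverse=True)[:k]

aps = [{'name': 'ap1', 'free_storage': 0.9, 'backhaul_bw': 0.5, 'load': 0.7},
       {'name': 'ap2', 'free_storage': 0.3, 'backhaul_bw': 0.9, 'load': 0.2},
       {'name': 'ap3', 'free_storage': 0.6, 'backhaul_bw': 0.6, 'load': 0.4}]
print([ap['name'] for ap in pick_representatives(aps, 2)])
```
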
  • Flexible Symmetrical Global-Snapshot Algorithms for Large-Scale Distributed Systems

    Page(s): 493 - 505

    Most existing global-snapshot algorithms in distributed systems use control messages to coordinate the construction of a global snapshot among all processes. Since these algorithms typically assume the underlying logical overlay topology is fully connected, the number of control messages exchanged among all processes is proportional to the square of the number of processes, which raises the likelihood of network congestion. Hence, such algorithms are neither efficient nor scalable for a large-scale distributed system composed of a huge number of processes. Recent efforts significantly reduce the number of control messages, but at the cost of higher response time. In this paper, we propose an efficient global-snapshot algorithm that lets every process finish its local snapshot within a given number of rounds; the algorithm thereby allows a tradeoff between response time and message complexity. Moreover, our global-snapshot algorithm is symmetrical in the sense that every process executes identical steps, which yields better workload balance and less network congestion. Most importantly, based on our framework, we demonstrate that the minimum number of control messages required by a symmetrical global-snapshot algorithm is Ω(N log N), where N is the number of processes. Finally, our algorithm does not require FIFO channels.

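    The rounds-versus-messages tradeoff can be seen with back-of-envelope arithmetic (illustrative only, not the paper's algorithm or bound): if each of n processes contacts about k = ceil(n^(1/r)) peers per round and forwards aggregated state, everyone's information can spread to all n processes in r rounds, at roughly r * n * k control messages overall:

```python
import math

def snapshot_message_cost(n, rounds):
    """Rough message count for a round-based symmetric exchange with
    per-round fan-out k = ceil(n ** (1/rounds)). Illustrative only."""
    k = math.ceil(n ** (1.0 / rounds))
    return rounds * n * k

n = 4096
for r in (1, 2, 3, int(math.log2(n))):
    print(f"rounds={r:2d}  messages~{snapshot_message_cost(n, r):,}")
# Fewer rounds (lower response time) -> many more messages; at log2(n)
# rounds the count is O(n log n), the order of the lower bound above.
```
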
  • Hardware Signature Designs to Deal with Asymmetry in Transactional Data Sets

    Page(s): 506 - 519

    Transactional Memory (TM) systems must track the memory accesses made by concurrent transactions in order to detect conflicts. Many TM implementations use signatures for this purpose: they summarize reads and writes in fixed-size bit registers at the cost of false positives (detection of nonexistent conflicts). Signatures are commonly implemented as two separate same-sized Bloom filters, one for reads and the other for writes. Transactions, however, frequently exhibit read and write sets of uneven cardinality, and this mismatch between data sets and filter storage makes signatures inefficient in ways that can hurt performance. This paper presents signature designs that serve as alternatives to the common scheme and deal with the asymmetry in transactional data sets in an effective way. Specifically, we analyze two classes of new signatures, called multiset and reconfigurable asymmetric signatures. The first class uses a single Bloom filter to track both the read and write sets, while the second uses Bloom filters of configurable size for reads and writes. The main focus of this paper is a thorough study of these alternative designs, including a statistical analysis of false positives and an experimental evaluation, providing performance results and hardware area, time, and energy requirements.

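    For readers unfamiliar with signatures, the sketch below shows the underlying mechanism: a Bloom filter over a fixed-size bit register, which never yields false negatives but may yield false positives. Emulating a shared read/write filter by salting the hash with the operation type is our illustrative take on the single-filter idea, not the paper's exact multiset design:

```python
import hashlib

class BloomSignature:
    """Fixed-size Bloom-filter signature for conflict detection:
    inserts never fail; membership tests may report false positives
    but never false negatives."""

    def __init__(self, bits=1024, hashes=4):
        self.bits = bits
        self.hashes = hashes
        self.array = 0  # the bit register

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def insert(self, key):
        for p in self._positions(key):
            self.array |= 1 << p

    def maybe_contains(self, key):
        return all(self.array >> p & 1 for p in self._positions(key))

# One shared filter tracking both sets, multiset-style: salt the key
# with the operation type instead of keeping two separate filters.
sig = BloomSignature()
sig.insert(("w", 0xDEADBEEF))                  # record a write
print(sig.maybe_contains(("r", 0xDEADBEEF)))   # almost surely False
print(sig.maybe_contains(("w", 0xDEADBEEF)))   # True
```
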
  • Improve Efficiency and Reliability in Single-Hop WSNs with Transmit-Only Nodes

    Page(s): 520 - 534

    Wireless Sensor Networks (WSNs) will play a significant role at the "edge" of the future "Internet of Things." In particular, WSNs with transmit-only nodes are attracting increasing attention because they support applications requiring dense, long-lasting deployments at very low cost and energy consumption. However, the lack of receivers in transmit-only nodes renders most existing MAC protocols invalid. Building on our previous study of WSNs with purely transmit-only nodes, this work proposes a simple yet cost-effective and powerful single-hop hybrid WSN cluster architecture that contains not only transmit-only nodes but also standard nodes (with transceivers). Along with the hybrid architecture, this work also proposes a new MAC-layer protocol framework called Robust Asynchronous Resource Estimation (RARE), which efficiently and reliably manages the densely deployed single-hop hybrid cluster in a self-organized fashion. Through analysis and extensive simulations, the proposed framework is shown to meet or exceed the needs of most applications in terms of data delivery probability, QoS differentiation, system capacity, energy consumption, and reliability. To the best of our knowledge, this work is the first that brings reliable scheduling to WSNs containing both nonsynchronized transmit-only nodes and standard nodes.

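    Because transmit-only nodes cannot listen or sense the channel, reliability must come from redundant, randomized transmissions sized to the load. The Monte Carlo sketch below (our illustration, not RARE's scheduling) estimates the chance that a node gets at least one collision-free copy through when n nodes each send k copies per frame of S slots -- exactly the kind of quantity a cluster head's resource estimation must track:

```python
import random

def delivery_probability(n_nodes=100, copies=3, slots=1000, trials=2000):
    """Estimate P(node 0 delivers) when every transmit-only node sends
    `copies` redundant packets in uniformly random slots of a frame;
    a copy succeeds iff no other transmission lands in its slot."""
    delivered = 0
    for _ in range(trials):
        slot_load = {}
        picks_per_node = []
        for _node in range(n_nodes):
            picks = random.sample(range(slots), copies)
            picks_per_node.append(picks)
            for s in picks:
                slot_load[s] = slot_load.get(s, 0) + 1
        if any(slot_load[s] == 1 for s in picks_per_node[0]):
            delivered += 1
    return delivered / trials

print(delivery_probability())   # rises with slots, falls with n_nodes
```
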
  • Improving the Reliability of MPI Libraries via Message Flow Checking

    Page(s): 535 - 549

    Despite the success of the Message Passing Interface (MPI), many MPI libraries suffer from software bugs. These bugs severely impact the productivity of a large number of users, causing program failures or other errors; as a result, MPI application developers often spend days or even weeks fruitlessly debugging their own code. To address this problem, this paper presents a new method called FlowChecker, which detects communication-related bugs in MPI libraries. First, FlowChecker extracts the program's message-passing intentions (MP-intentions), which specify the messages to be delivered from sources to destinations. Then FlowChecker tracks the message flows that actually occur in the underlying MPI library. Finally, FlowChecker checks whether the messages are correctly delivered from the sources to the destinations by comparing the observed message flows against the MP-intentions. If a mismatch is found, FlowChecker reports a bug and provides diagnostic information to help MPI library developers understand and fix it. We have built a FlowChecker prototype on Linux and evaluated it with five real-world and two injected bug cases in three widely used MPI libraries: Open MPI, MPICH2, and MVAPICH2. Our experimental results show that FlowChecker effectively detects all seven evaluated bug cases and provides useful diagnostic information for narrowing down, or even pinpointing, their root causes. Moreover, our experiments with High-Performance Linpack and the NAS Parallel Benchmarks show that FlowChecker incurs low runtime overhead (0.9-5.6 percent on Open MPI, 0.9-8.1 percent on MPICH2, and 1.6-9.7 percent on MVAPICH2).

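    The final checking step reduces to comparing two artifacts. A toy version of that comparison, with a hypothetical data layout (FlowChecker's real instrumentation and flow representation are more involved):

```python
def check_message_flows(intentions, observed_flows):
    """Compare MP-intentions (msg, src, dst) against observed flows
    (the chain of internal hops a message took inside the library).
    Returns a diagnostic for every violated intention."""
    bugs = []
    for msg, src, dst in intentions:
        flow = observed_flows.get(msg)
        if flow is None:
            bugs.append((msg, "message never left the send path"))
        elif flow[0] != src or flow[-1] != dst:
            bugs.append((msg, f"flow {' -> '.join(flow)} "
                              f"does not realize {src} -> {dst}"))
    return bugs

intentions = [("m1", "rank0", "rank1"), ("m2", "rank0", "rank2")]
observed = {"m1": ["rank0", "eager-buf", "rank1"],
            "m2": ["rank0", "eager-buf", "rank1"]}   # misdelivered
print(check_message_flows(intentions, observed))
```
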
  • Modeling and Optimization of the IEEE 802.15.4 Protocol for Reliable and Timely Communications

    Page(s): 550 - 564

    Distributed processing through ad hoc and sensor networks is having a major impact on the scale and applications of computing. The creation of new cyber-physical services based on wireless sensor devices relies heavily on how well communication protocols can be adapted and optimized to meet quality constraints under limited energy resources. The IEEE 802.15.4 medium access control protocol for wireless sensor networks can support energy-efficient, reliable, and timely packet transmission through parallel and distributed tuning of its medium access control parameters. Such tuning is difficult because simple and accurate models of how these parameters influence the probability of successful packet transmission, packet delay, and energy consumption are not available. Moreover, it is not clear how to adapt the parameters to changes in the network and traffic regimes using algorithms that can run on resource-constrained devices. In this paper, a Markov chain is proposed to model these relations through simple expressions without giving up accuracy. In contrast to previous work, the model accounts for a limited number of retransmissions, acknowledgments, unsaturated traffic, packet size, and the packet-copying delay due to hardware limitations. The model is then used to derive a distributed adaptive algorithm that minimizes power consumption while guaranteeing a given successful-packet-reception probability and delay constraints on packet transmission. The algorithm requires no modification of the IEEE 802.15.4 medium access control and can easily be implemented on network devices. It has been implemented and evaluated experimentally on a testbed with off-the-shelf wireless sensor devices. Experimental results show that the analysis is accurate, that the proposed algorithm satisfies the reliability and delay constraints, and that it reduces the energy consumption of the network under both stationary and transient conditions. Specifically, even when the number of devices and the traffic configuration change sharply, the proposed parallel and distributed algorithm keeps the system operating close to its optimal state by estimating the busy-channel and channel-access probabilities. Furthermore, the results indicate that the protocol reacts promptly to errors in the estimates of the number of devices and of the traffic load that can arise from device mobility. It is also shown that the effect of imperfect channel and carrier sensing on system performance depends heavily on the traffic load and on the limited range of the protocol parameters.

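    The adaptive tuning step has a simple overall shape, sketched below: enumerate the small legal grid of MAC parameters -- macMinBE (0-7), macMaxCSMABackoffs (0-5), macMaxFrameRetries (0-7) -- and keep the energy-minimal triple whose predicted reliability and delay meet the constraints. The `toy_model` plug-in is a made-up stand-in for the paper's Markov-chain expressions, and the constraint values are arbitrary:

```python
import itertools

def tune_mac(reliability_req=0.95, delay_req=0.05, model=None):
    """Brute-force the legal 802.15.4 MAC parameter grid; return the
    energy-minimal (macMinBE, macMaxCSMABackoffs, macMaxFrameRetries)
    whose predicted reliability/delay meet the constraints."""
    model = model or toy_model
    best, best_e = None, float("inf")
    for be, m, n in itertools.product(range(8), range(6), range(8)):
        r, d, e = model(be, m, n)
        if r >= reliability_req and d <= delay_req and e < best_e:
            best, best_e = (be, m, n), e
    return best

def toy_model(be, m, n):
    """Hypothetical smooth stand-in: more backoff and retries buy
    reliability at the cost of delay and energy. NOT the paper's model."""
    effort = (2 ** be) * (m + 1) * (n + 1)
    reliability = 1 - 0.6 / (1 + 0.05 * effort)
    delay = 0.0001 * effort
    energy = 0.001 * effort
    return reliability, delay, energy

print(tune_mac())   # e.g., (3, 3, 6) under the toy model
```
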
  • Optimal Client-Server Assignment for Internet Distributed Systems

    Page(s): 565 - 575

    We investigate an underlying mathematical model and algorithms for optimizing the performance of a class of distributed systems over the Internet. Such a system consists of a large number of clients who communicate with each other indirectly via a number of intermediate servers. Optimizing the overall performance of such a system can be formulated as a client-server assignment problem whose aim is to assign the clients to the servers so as to satisfy prespecified requirements on communication cost and load balancing. We show that 1) the total communication load and load balancing are two opposing metrics, so their tradeoff is inherent in this class of distributed systems; 2) in general, finding the optimal client-server assignment for prespecified requirements on total load and load balancing is NP-hard; and therefore 3) we propose a heuristic based on relaxed convex optimization for finding an approximate solution. Our simulation results indicate that the proposed algorithm outperforms other heuristics, including the popular Normalized Cuts algorithm.

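    The two opposing metrics are easy to exhibit in code. The toy local search below (a naive baseline, not the paper's relaxed-convex-optimization heuristic; the weighted objective is an assumption) reassigns one client at a time to whichever server most reduces a blend of inter-server communication and load imbalance:

```python
import itertools, random

def objective(assign, traffic, n_servers, alpha=0.5):
    """alpha-weighted blend of the two opposing metrics: traffic
    between clients on different servers, and the maximum server load."""
    comm = sum(t for (i, j), t in traffic.items() if assign[i] != assign[j])
    loads = [sum(1 for s in assign if s == k) for k in range(n_servers)]
    return alpha * comm + (1 - alpha) * max(loads)

def local_search(n_clients, n_servers, traffic, alpha=0.5, iters=2000):
    """Greedy single-client reassignment until the budget runs out."""
    assign = [random.randrange(n_servers) for _ in range(n_clients)]
    for _ in range(iters):
        i = random.randrange(n_clients)
        assign[i] = min(range(n_servers),
                        key=lambda s: objective(assign[:i] + [s] + assign[i+1:],
                                                traffic, n_servers, alpha))
    return assign

random.seed(1)
traffic = {(i, j): random.random()
           for i, j in itertools.combinations(range(12), 2)}
print(local_search(12, 3, traffic))
```
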
  • Resilient Self-Compressive Monitoring for Large-Scale Hosting Infrastructures

    Page(s): 576 - 586

    Large-scale hosting infrastructures have become the fundamental platforms for many real-world systems such as cloud computing infrastructures, enterprise data centers, and massive data processing systems. However, it is a challenging task to achieve both scalability and high precision while monitoring a large number of intranode and internode attributes (e.g., CPU usage, free memory, free disk, internode network delay). In this paper, we present the design and implementation of a Resilient self-Compressive Monitoring (RCM) system for large-scale hosting infrastructures. RCM achieves scalable distributed monitoring by performing online data compression to reduce the cost of remote data collection, and it provides failure resilience for robust monitoring in dynamic distributed systems where host and network failures are common. We have conducted extensive experiments using real monitoring data from NCSU's virtual computing lab (VCL), PlanetLab, a Google cluster, and real Internet traffic matrices. The experimental results show that RCM can achieve up to 200 percent higher compression ratio and several orders of magnitude less overhead than existing approaches.

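    To see why online compression cuts collection cost, consider the classic tolerance-based suppression trick (a standard technique in the same spirit as RCM's self-compression, not RCM's actual algorithm): the agent transmits a metric only when it drifts beyond a tolerance from the last value the collector holds:

```python
class ToleranceCompressor:
    """Monitoring agent that suppresses updates within `tol` of the
    last transmitted value; the collector keeps using that value, so
    suppressed samples cost nothing on the wire."""

    def __init__(self, tol):
        self.tol = tol
        self.last_sent = {}   # metric -> last transmitted value

    def update(self, metric, value):
        """Return the value to transmit, or None to suppress."""
        prev = self.last_sent.get(metric)
        if prev is None or abs(value - prev) > self.tol:
            self.last_sent[metric] = value
            return value
        return None

agent = ToleranceCompressor(tol=2.0)
stream = [("cpu", v) for v in (10.0, 10.5, 11.9, 14.2, 14.8, 20.0)]
print([agent.update(m, v) for m, v in stream])
# [10.0, None, None, 14.2, None, 20.0] -- 3 of 6 samples transmitted
```
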
  • Service Provision Control in Federated Service Providing Systems

    Page(s): 587 - 600

    Unlike in traditional P2P systems, the individual nodes of a Federated Service Providing (FSP) system play a more active role, offering a variety of domain-specific services. The service provision control (SPC) problem is an important problem for FSP systems, and this paper tackles it within a stochastic optimization framework through several steps. The first step uses stochastic differential equations (SDEs) to model and analyze the dynamic evolution of service demand. Driven by the SDE model, the expected future performance of an FSP system is analytically evaluated in the second step. Step three uses the differential evolution (DE) algorithm to identify near-optimal service-providing policies for each node. A service subscription protocol is proposed in step four to help every node adjust its local policy in accordance with the services provided by other nodes. Together, the four steps form a complete solution to the SPC problem, which we call the SDE-based service-provision control (SSPC) mechanism. We report an experimental evaluation of the mechanism; the results show that our approach is effective in tackling the SPC problem and may therefore be suitable for many practical applications.

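    To make step one tangible, here is an Euler-Maruyama simulation of a mean-reverting SDE, dD_t = kappa (mean - D_t) dt + sigma dW_t, as a hypothetical stand-in for the paper's demand model (its actual SDE and parameters are not reproduced here); averaging many paths gives the kind of expected future demand that the policy evaluation in step two would consume:

```python
import numpy as np

def simulate_demand(d0=50.0, mean=80.0, kappa=0.3, sigma=8.0,
                    horizon=10.0, dt=0.01, seed=0):
    """Euler-Maruyama discretization of a mean-reverting demand SDE:
    D[t+1] = D[t] + kappa*(mean - D[t])*dt + sigma*sqrt(dt)*N(0,1)."""
    rng = np.random.default_rng(seed)
    steps = int(horizon / dt)
    d = np.empty(steps + 1)
    d[0] = d0
    for t in range(steps):
        dw = rng.normal(0.0, np.sqrt(dt))
        d[t + 1] = d[t] + kappa * (mean - d[t]) * dt + sigma * dw
    return d

# Expected demand at the horizon, estimated over 200 sample paths.
paths = np.stack([simulate_demand(seed=s) for s in range(200)])
print("expected demand at horizon:", paths[:, -1].mean())
```
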
  • Social Similarity Favors Cooperation: The Distributed Content Replication Case

    Page(s): 601 - 613

    This paper explores how the degree of similarity within a social group can dictate the behavior of individual nodes, so as to best trade off individual against social benefit. More specifically, we investigate the impact of social similarity on the effectiveness of content placement and dissemination. We consider three schemes that represent well the spectrum of behavior-shaped content-storage strategies: the selfish, the self-aware cooperative, and the optimally altruistic. Our study shows that when the social group is tight (high degree of similarity), the optimally altruistic behavior yields the best performance both for the entire group (by definition) and for the individual nodes (contrary to typical expectations). When the group members have almost no similarity, altruism or cooperation brings little benefit to either the group or the individuals, and selfish behavior emerges as the preferable choice due to its simplicity. Notably, from a theoretical point of view, our "similarity favors cooperation" argument is in line with sociological interpretations of human altruistic behavior. On a more practical note, the self-aware cooperative behavior could be adopted as an easy-to-implement distributed alternative to the optimally altruistic one: it performs close to optimal for tight social groups and has the additional advantage of never mistreating any node, i.e., its induced content-retrieval cost is always smaller than that of the selfish strategy.

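    A toy contrast between the selfish and a self-aware cooperative placement (our simplified rules, not the paper's schemes): the selfish node caches its own top-k items, while the cooperative node skips items a neighbor already holds, assuming a remote hit is cheaper than duplicating. With highly similar preferences the cooperative rule stores more distinct items group-wide:

```python
def selfish_placement(prefs, k):
    """Each node caches its own k most-preferred items."""
    return {n: set(sorted(p, key=p.get, reverse=True)[:k])
            for n, p in prefs.items()}

def cooperative_placement(prefs, k, neighbors):
    """Fill each cache in preference order, skipping items already
    held by a neighbor (toy self-aware cooperative rule)."""
    placement = {n: set() for n in prefs}
    for n, p in prefs.items():
        nearby = set().union(*[placement[m] for m in neighbors[n]])
        for item in sorted(p, key=p.get, reverse=True):
            if len(placement[n]) == k:
                break
            if item not in nearby:
                placement[n].add(item)
    return placement

prefs = {'a': {'x': 3, 'y': 2, 'z': 1},
         'b': {'x': 3, 'y': 2, 'w': 1}}          # highly similar tastes
neighbors = {'a': ['b'], 'b': ['a']}
print(selfish_placement(prefs, k=2))              # both cache {x, y}
print(cooperative_placement(prefs, k=2, neighbors=neighbors))
```
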
  • SPOC: A Secure and Privacy-Preserving Opportunistic Computing Framework for Mobile-Healthcare Emergency

    Page(s): 614 - 624

    With the pervasiveness of smartphones and advances in wireless body sensor networks (BSNs), mobile healthcare (m-Healthcare), which extends the operation of healthcare providers into pervasive environments for better health monitoring, has recently attracted considerable interest. The flourishing of m-Healthcare, however, still faces many challenges, including information security and privacy preservation. In this paper, we propose a secure and privacy-preserving opportunistic computing framework, called SPOC, for m-Healthcare emergencies. With SPOC, smartphone resources, including computing power and energy, can be opportunistically gathered to process the computation-intensive personal health information (PHI) during an m-Healthcare emergency with minimal privacy disclosure. Specifically, to balance privacy disclosure against the need for highly reliable PHI processing and transmission in an emergency, we introduce an efficient user-centric privacy access control into the SPOC framework; it is based on attribute-based access control together with a new privacy-preserving scalar product computation (PPSPC) technique, and allows a medical user to decide who may participate in the opportunistic computing that helps process his overwhelming PHI data. Detailed security analysis shows that the proposed SPOC framework achieves user-centric privacy access control in m-Healthcare emergencies. In addition, performance evaluations via extensive simulations demonstrate SPOC's effectiveness in providing highly reliable PHI processing and transmission while minimizing privacy disclosure during m-Healthcare emergencies.

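    A privacy-preserving scalar product can be built from any additively homomorphic cipher: Bob sees only encryptions of Alice's vector, exploits Enc(a)^b = Enc(a*b) to assemble Enc(sum x_i*y_i), and Alice alone decrypts. The sketch below uses textbook Paillier with a toy key size (our illustration of the PPSPC idea, not the paper's protocol; requires Python 3.8+ for modular inverse via pow):

```python
import random

def is_prime(n):
    """Deterministic Miller-Rabin for word-sized integers."""
    small = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    if n in small:
        return True
    if any(n % p == 0 for p in small):
        return False
    d, r = n - 1, 0
    while d % 2 == 0:
        d, r = d // 2, r + 1
    for a in small:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def rand_prime(bits):
    while True:
        p = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_prime(p):
            return p

class Paillier:
    """Textbook additively homomorphic Paillier with g = n + 1.
    Toy key size -- for illustration only."""
    def __init__(self, bits=64):
        p, q = rand_prime(bits), rand_prime(bits)
        while q == p:
            q = rand_prime(bits)
        self.n, self.n2 = p * q, (p * q) ** 2
        self.lam = (p - 1) * (q - 1)          # private key
        self.mu = pow(self.lam, -1, self.n)   # private key

    def enc(self, m):
        r = random.randrange(1, self.n)
        return pow(self.n + 1, m, self.n2) * pow(r, self.n, self.n2) % self.n2

    def dec(self, c):
        x = pow(c, self.lam, self.n2)
        return (x - 1) // self.n * self.mu % self.n

def ppspc(n2, cx, y):
    """Bob's side: from ciphertexts Enc(x_i) alone, assemble
    Enc(sum_i x_i * y_i). Bob never sees Alice's x."""
    c = 1
    for ci, yi in zip(cx, y):
        c = c * pow(ci, yi, n2) % n2
    return c

alice = Paillier()
x, y = [3, 1, 4], [2, 7, 1]
cx = [alice.enc(xi) for xi in x]        # Alice -> Bob
c = ppspc(alice.n2, cx, y)              # Bob -> Alice
print(alice.dec(c))                     # 3*2 + 1*7 + 4*1 = 17
```
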

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.


Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology