
IEEE Transactions on Parallel and Distributed Systems

Issue 11 • Nov. 2012


Displaying Results 1 - 20 of 20
  • [Front cover]

    Publication Year: 2012, Page(s): c1
    Freely Available from IEEE
  • [Inside front cover]

    Publication Year: 2012, Page(s): c2
    Freely Available from IEEE
  • A Distributed Constraint Satisfaction Problem Approach to Virtual Device Composition

    Publication Year: 2012, Page(s): 1997 - 2009
    Cited by:  Papers (2)

    The dynamic composition of networked appliances, or virtual devices, enables users to generate complex, robust, and specific systems. Current MANET-based composition schemes use service discovery mechanisms that depend on periodic service advertising by controlled broadcast, resulting in the unnecessary depletion of node resources. The assumption that a virtual device, once generated, remains static is false; the device should gracefully degrade and upgrade along with the conditions in the user's environment, particularly the network's current performance. Presently, schemes for infrastructure-less virtual device composition and management do not consider this adaptation. We present a distributed constraint satisfaction problem (distCSP) formulation for virtual device composition in MANETs that addresses these issues, together with simulations that show its effectiveness and efficiency.

  • Adaptive-Tree Multicast: Efficient Multidestination Support for CMP Communication Substrate

    Publication Year: 2012, Page(s): 2010 - 2023

    Multidestination communication is an essential capability for many coherence protocols seeking to minimize on-chip hit latency. Although CMPs share this need, few suitable proposals have been developed to date. The combination of resource scarcity and the common belief that multicast support requires a substantial amount of extra resources is responsible for this situation. In this work, we propose a new approach for on-chip networks that manages multidestination traffic in hardware efficiently and with negligible complexity. We introduce a novel multicast routing mechanism, adaptive-tree multicast, that circumvents many of the limitations of conventional multicast schemes. It maintains correctness for multiflit multicast messages without routing restrictions, while coupling correctness and performance in a natural way. Replication restrictions not only guarantee enough resources to avoid deadlock, but also dynamically adapt the tree shape to network conditions, routing multicast messages through noncongested paths. Performance results, using a state-of-the-art full-system simulation framework, show that it improves the average full-system performance of a CMP by 20 percent and network ED2P by 15 percent, compared to a state-of-the-art router with conventional multicast support and similar implementation cost.

  • An Effective Execution Time Approximation Method for Parallel Computing

    Publication Year: 2012, Page(s): 2024 - 2032
    Cited by:  Papers (2)

    In performance modeling of parallel synchronous iterative applications, the longest individual execution time among parallel processors determines the iteration time and often must be estimated for performance analysis. This involves calculating the mean maximum, a long-standing challenge in computer modeling. For large systems, numerical methods are not suitable because of heavy computation requirements and inaccuracy caused by rounding. On the other hand, previous approximation methods face challenges of accuracy and generality, especially for heterogeneous computing environments. This paper presents a property of extreme values that enables Effective Mean Maximum Approximation (EMMA). Compared to previous mean maximum execution time approximation methods, EMMA is more accurate and generalizes better across computational environments.
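
    The classical extreme-value route to this estimate can be sketched in a few lines. The following Python snippet illustrates the general idea only; it is not the paper's EMMA algorithm, and the normal task-time model and all parameters are assumptions. It approximates the mean maximum of n i.i.d. processor times via the Gumbel limit and checks the result against Monte Carlo.

        import random
        from statistics import NormalDist

        def mean_max_evt(mu, sigma, n, euler_gamma=0.5772156649):
            """Gumbel-limit approximation of E[max of n i.i.d. N(mu, sigma^2)]."""
            std = NormalDist()
            q = std.inv_cdf(1.0 - 1.0 / n)        # location a_n of the maximum
            b = sigma / (n * std.pdf(q))          # scale b_n of the maximum
            return mu + sigma * q + euler_gamma * b

        def mean_max_mc(mu, sigma, n, trials=20000):
            """Monte Carlo estimate of the same mean maximum."""
            return sum(max(random.gauss(mu, sigma) for _ in range(n))
                       for _ in range(trials)) / trials

        mu, sigma, n = 100.0, 10.0, 64            # hypothetical iteration times (ms)
        print(mean_max_evt(mu, sigma, n))         # approx. 123.8
        print(mean_max_mc(mu, sigma, n))          # should land close to the above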

  • An Optimized High-Throughput Strategy for Constructing Inverted Files

    Publication Year: 2012, Page(s): 2033 - 2044
    Cited by:  Papers (1)

    Current high-throughput algorithms for constructing inverted files all follow the MapReduce framework, which presents a high-level programming model that hides the complexities of parallel programming. In this paper, we take an alternative approach and develop a novel strategy that exploits the current and emerging architectures of multicore processors. Our algorithm is based on a high-throughput pipelined strategy that produces parallel parsed streams, which are immediately consumed at the same rate by parallel indexers. We have performed extensive tests of our algorithm on a cluster of 32 nodes and achieved a throughput close to the peak throughput of the I/O system: 280 MB/s on a single node, and between 5.15 GB/s (1 Gb/s Ethernet interconnect) and 6.12 GB/s (10 Gb/s InfiniBand interconnect) on the 32-node cluster for processing the ClueWeb09 data set. This performance represents a substantial gain over the best known MapReduce algorithms, even when comparing the single-node performance of our algorithm to MapReduce algorithms running on large clusters. Our results shed light on the performance cost that may be incurred by using the simpler, higher level MapReduce programming model for large-scale applications.
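
    As a rough illustration of the pipelined parse/index organization (not the authors' implementation; the documents, queue bound, and thread counts below are invented), parser code can feed postings through a bounded queue that indexer threads drain at the same rate:

        import threading, queue, collections

        docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick dog"}
        postings = queue.Queue(maxsize=64)   # bounded: parsers stall if indexers lag
        index = collections.defaultdict(set)
        lock = threading.Lock()

        def parse(doc_ids):
            for d in doc_ids:
                for term in docs[d].split():
                    postings.put((term, d))

        def index_worker():
            while True:
                item = postings.get()
                if item is None:                 # poison pill ends the worker
                    return
                term, d = item
                with lock:
                    index[term].add(d)

        indexers = [threading.Thread(target=index_worker) for _ in range(2)]
        for t in indexers: t.start()
        parse(list(docs))                        # parsing overlaps with indexing
        for _ in indexers: postings.put(None)
        for t in indexers: t.join()
        print(sorted(index["quick"]))            # -> [1, 3]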

  • Autotuning GEMM Kernels for the Fermi GPU

    Publication Year: 2012, Page(s): 2045 - 2057
    Cited by:  Papers (5)

    In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering applications, even more so since the introduction of NVIDIA's Fermi architecture, with features essential to numerical computing, such as fast double precision arithmetic and memory protected with error correction codes. As a crucial component of numerical software packages such as LAPACK and ScaLAPACK, the general dense matrix multiplication routine is one of the most important workloads to implement on these devices. This paper presents a methodology for producing matrix multiplication kernels tuned for a specific architecture through a canonical process of heuristic autotuning: generating multiple code variants and selecting the fastest through benchmarking. The key contribution of this work is the method for generating the search space and, specifically, pruning it to a manageable size. Performance numbers match or exceed other available implementations.
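
    The generate-prune-benchmark loop can be caricatured in a few lines. In this sketch, NumPy tiling stands in for the generated GPU kernel variants, and the tile candidates and 1 MB fast-memory budget are invented; it is an illustration of heuristic autotuning, not the paper's generator.

        import itertools, time
        import numpy as np

        N = 512
        A, B = np.random.rand(N, N), np.random.rand(N, N)

        def blocked_gemm(tm, tn):
            C = np.empty((N, N))
            for i in range(0, N, tm):
                for j in range(0, N, tn):
                    C[i:i+tm, j:j+tn] = A[i:i+tm, :] @ B[:, j:j+tn]
            return C

        # Search-space generation with pruning: keep only tiles whose working
        # set (two input panels plus the output tile, float64) fits the budget.
        candidates = [(tm, tn)
                      for tm, tn in itertools.product([32, 64, 128, 256], repeat=2)
                      if (tm * N + N * tn + tm * tn) * 8 <= 1 << 20]

        def bench(tm, tn, reps=3):
            t0 = time.perf_counter()
            for _ in range(reps):
                blocked_gemm(tm, tn)
            return (time.perf_counter() - t0) / reps

        best = min(candidates, key=lambda t: bench(*t))   # keep the fastest variant
        print("fastest tile:", best)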

  • Compiler-Assisted Data Distribution and Network Configuration for Chip Multiprocessors

    Publication Year: 2012, Page(s): 2058 - 2066

    Data access latency, a limiting factor in the performance of chip multiprocessors, grows significantly with the number of cores in nonuniform cache architectures with distributed cache banks. To mitigate this effect, we use a compiler-based approach to leverage data access locality, choose an optimized data placement, and efficiently configure the on-chip network. The proposed experimental compiler framework employs novel compilation techniques to discover and represent multithreaded memory access patterns (MMAPs). At runtime, symbolic MMAPs are resolved and used by a partitioning algorithm to choose a partition of allocated memory blocks among the forked threads in the analyzed application. This partition is used to enforce data ownership by associating the data with the core that executes the thread owning the data. Based on the partition, the communication pattern of the application can be extracted. We demonstrate how this information can be used in an experimental architecture to accelerate applications. In particular, our compiler-assisted data partitioning approach shows a 20 percent speedup over shared caching and a 5 percent speedup over the closest runtime approximation, first touch. By leveraging the communication pattern, we achieve performance comparable to a system that uses a complex centralized network configuration system at runtime. Our final system thus saves significant runtime complexity and achieves a 5.1 percent additional speedup through the addition of the reconfigurable network.
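
    The ownership idea can be illustrated with a toy affinity table (the numbers and the first-touch baseline below are invented; the paper derives such counts from compile-time MMAPs rather than a runtime table): assign each block to the thread that accesses it most, then compare the remote-access count against first-touch ownership.

        access = {                 # block -> {thread: access count} (made-up data)
            "blk0": {0: 900, 1: 40},
            "blk1": {0: 10,  1: 700},
            "blk2": {0: 300, 1: 310},
        }
        first_toucher = {"blk0": 0, "blk1": 0, "blk2": 0}   # thread 0 touches first

        def remote_accesses(owner):
            """Accesses served from a bank owned by a different thread's core."""
            return sum(cnt for blk, counts in access.items()
                       for t, cnt in counts.items() if t != owner[blk])

        by_affinity = {blk: max(c, key=c.get) for blk, c in access.items()}
        print(by_affinity, remote_accesses(by_affinity))       # 350 remote accesses
        print(first_toucher, remote_accesses(first_toucher))   # 1050 with first touch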

  • Datacenter at the Airport: Reasoning about Time-Dependent Parking Lot Occupancy

    Publication Year: 2012, Page(s): 2067 - 2080
    Cited by:  Papers (8)

    Recently, Olariu et al. [3], [7], [18], [19], [20] proposed to refer to a dynamic group of vehicles whose excess computing, sensing, communication, and storage resources can be coordinated and dynamically allocated to authorized users as a vehicular cloud. One of the characteristics that distinguishes vehicular clouds from conventional clouds is the dynamically changing amount of available resources, which in some cases may fluctuate rather abruptly. In this work, we envision a vehicular cloud involving cars in the long-term parking lot of a typical international airport. The patrons of such a parking lot are typically traveling for several days, providing a pool of cars that can serve as the basis for a datacenter at the airport. We anticipate a park-and-plug scenario where the cars that participate in the vehicular cloud are plugged into a standard power outlet and provided an Ethernet connection to a central server at the airport. In order to schedule resources and assign computational tasks to the various cars in the vehicular cloud, a fundamental prerequisite is an accurate picture of the number of vehicles expected to be present in the parking lot as a function of time. What makes the problem difficult is the time-varying nature of the arrival and departure rates. In this work, we concern ourselves with predicting the parking occupancy given time-varying arrival and departure rates. Our main contribution is to provide closed forms for the probability distribution of the parking lot occupancy as a function of time, for the expected number of cars in the parking lot and its variance, and for the limiting behavior of these parameters as time increases. In addition to analytical results, we have obtained a series of empirical results that confirm the accuracy of our analytical predictions.
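
    For readers who want the standard queueing background: if arrivals form a nonhomogeneous Poisson process with rate λ(t) and dwell times are exponential with rate μ (an M(t)/M/∞ model, which may differ in detail from the paper's), the occupancy of an initially empty lot at time t is Poisson with mean m(t) = ∫₀ᵗ λ(u) e^(-μ(t-u)) du. The sketch below evaluates m(t) numerically and checks it by simulation; the sinusoidal rate and all parameters are invented.

        import math, random

        lam = lambda t: 20 + 10 * math.sin(2 * math.pi * t / 24)  # arrivals/hour
        LAM_MAX = 30.0                                            # bound on lam(t)
        mu = 1 / 72.0                                             # mean stay: 72 hours

        def mean_occupancy(t, steps=10000):
            h = t / steps                      # left Riemann sum for m(t)
            return sum(lam(i * h) * math.exp(-mu * (t - i * h)) * h
                       for i in range(steps))

        def simulate(t, runs=2000):
            total = 0
            for _ in range(runs):
                u, present = 0.0, 0
                while True:
                    u += random.expovariate(LAM_MAX)  # thinning: propose at LAM_MAX
                    if u > t:
                        break
                    if (random.random() < lam(u) / LAM_MAX and
                            random.expovariate(mu) > t - u):
                        present += 1          # arrived at u and still parked at t
                total += present
            return total / runs

        print(mean_occupancy(48.0), simulate(48.0))   # the two should agree closely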

  • Efficient Data Migration to Conserve Energy in Streaming Media Storage Systems

    Publication Year: 2012, Page(s): 2081 - 2093
    Cited by:  Papers (1)

    Reducing energy consumption has been an important design issue for large-scale streaming media storage systems. Existing energy conservation techniques are inadequate to achieve high energy efficiency for streaming media computing environments due to high data migration overhead. To address this problem, we propose a new energy-efficient method called Explicit Energy Saving Disk Cooling, or EESDC. EESDC significantly reduces data migration overhead for two reasons. First, a set of disks referred to as Explicit Energy Saving Disks (EESD) is explicitly fixed according to the temporal system load. Second, all data migrated by EESDC directly contributes to extending the idle time of the EESD, conserving energy more efficiently. The EESDC method therefore saves more energy by quickly reaching energy-efficient data layouts without unnecessary data migrations. We implement EESDC in a simulated disk system, which is validated against a prototype system powered by EESDC. Our experimental results using both real-world traces and synthetic traces show that EESDC can save 28.13-29.33 percent of energy consumption for typical streaming media traces, and that the energy efficiency of streaming media storage systems can be improved by 3.3-6.0 times when EESDC is employed.
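
    A toy version of the disk-cooling decision (the policy details below are invented; the real EESDC fixes the EESD set from temporal load and migrates far more carefully) might pick the most lightly loaded disks as EESD and move only the blocks that would otherwise interrupt their idleness:

        disk_load = {"d0": 0.70, "d1": 0.55, "d2": 0.08, "d3": 0.05}  # utilization
        placement = {"a": "d2", "b": "d0", "c": "d3", "d": "d1"}      # block -> disk
        upcoming = ["a", "b", "c"]            # blocks predicted to be read soon

        k = 2                                 # disks to cool, given current load
        eesd = set(sorted(disk_load, key=disk_load.get)[:k])          # {'d3', 'd2'}
        active = [d for d in disk_load if d not in eesd]

        migrations = []
        for blk in upcoming:                  # migrate only idle-breaking blocks
            if placement[blk] in eesd:
                target = min(active, key=disk_load.get)
                migrations.append((blk, placement[blk], target))
                placement[blk] = target
        print(migrations)                     # [('a', 'd2', 'd1'), ('c', 'd3', 'd1')]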

  • Efficient Misplaced-Tag Pinpointing in Large RFID Systems

    Publication Year: 2012, Page(s): 2094 - 2106
    Cited by:  Papers (10)

    Radio-Frequency Identification (RFID) technology brings many innovative applications. Of great importance to RFID applications in production economics is misplaced-tag pinpointing (MTP), because misplacement errors defeat optimal inventory placement and thus significantly decrease profit. The existing MTP solution [1], originally proposed from a data-processing perspective, collects and processes a large amount of data and suffers from time inefficiency (and energy inefficiency as well if active tags are in use). Finding efficient solutions to the MTP problem from the communication protocol design perspective has not been investigated before. In this paper, we propose a series of protocols toward efficient MTP solutions in large RFID systems. The proposed protocols detect misplaced tags using reader positions instead of tag positions; because readers are far fewer than tags, this guarantees efficiency and scalability as the system grows. Considering applications that employ active tags, we further propose a solution requiring responses from only a subset of tags to save energy. We also design a distributed protocol that enables each reader to independently detect misplaced tags, and we investigate how to apply the proposed protocols in scenarios with tag mobility. To evaluate the proposed protocols, we analyze their optimal performance to demonstrate their efficiency potential and also conduct extensive simulation experiments. The results show that the proposed protocols can increase time efficiency and energy efficiency by over 70 percent on average compared with the best existing work.

  • Energy-Efficient Tree-Based Multipath Power Control for Underwater Sensor Networks

    Publication Year: 2012, Page(s): 2107 - 2116
    Cited by:  Papers (4)

    Due to the use of acoustic channels with limited available bandwidth, Underwater Sensor Networks (USNs) often suffer from significant performance restrictions such as low reliability, low energy efficiency, and high end-to-end packet delay. Provisioning reliable, energy-efficient, and low-delay communication in USNs has become a challenging research issue. In this paper, we take noise attenuation in deep water areas into account and propose a novel layered multipath power control (LMPC) scheme to reduce energy consumption and enhance reliable and robust communication in USNs. To this end, we first formalize an optimization problem to manage transmission power and control data rates across the whole network. The objective is to minimize energy consumption while simultaneously guaranteeing the other performance metrics. After proving that this optimization problem is NP-complete, we solve the key subproblems of LMPC, including construction of the energy-efficient tree and management of energy distribution, and develop a heuristic algorithm to obtain a feasible solution. Finally, extensive simulation experiments are conducted to evaluate the network performance under different working conditions. The results reveal that the proposed LMPC scheme significantly outperforms the existing mechanism.

  • Hamiltonian Embedding in Crossed Cubes with Failed Links

    Publication Year: 2012, Page(s): 2117 - 2124
    Cited by:  Papers (3)

    The crossed cube is a prominent variant of the well-known, highly regular hypercube. In [24], it is shown that, due to the loss of regularity in link topology, generating Hamiltonian cycles, even in a healthy crossed cube, is a more complicated procedure than in the hypercube, and fewer Hamiltonian cycles can be generated in the crossed cube. Because of the importance of fault tolerance in interconnection networks, in this paper we treat the problem of embedding Hamiltonian cycles into a crossed cube with failed links. We establish a relationship between the faulty link distribution and the crossed cube's tolerability. A succinct algorithm is proposed to find a Hamiltonian cycle in CQn (the n-dimensional crossed cube) tolerating up to n-2 failed links.
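
    The structure is concrete enough to script. The sketch below builds CQn from the commonly used adjacency rule of Efe (assumed here; the paper's notation may differ) and then backtracks for a Hamiltonian cycle that avoids a given set of failed links; the backtracking search is a generic illustration, not the paper's succinct algorithm.

        # Pairs (u1 u0, v1 v0) that are "pair-related" in the crossed cube.
        PAIR = {((0, 0), (0, 0)), ((1, 0), (1, 0)),
                ((0, 1), (1, 1)), ((1, 1), (0, 1))}

        def adjacent(u, v, n):
            ub = [(u >> i) & 1 for i in range(n)]
            vb = [(v >> i) & 1 for i in range(n)]
            diff = [i for i in range(n) if ub[i] != vb[i]]
            if not diff:
                return False
            l = max(diff)                      # most significant differing bit
            if l % 2 == 1 and ub[l - 1] != vb[l - 1]:
                return False
            return all(((ub[2*i + 1], ub[2*i]), (vb[2*i + 1], vb[2*i])) in PAIR
                       for i in range(l // 2))

        def hamiltonian_cycle(n, failed=frozenset()):
            V = list(range(1 << n))
            nbrs = {u: [v for v in V if u != v and adjacent(u, v, n)
                        and frozenset((u, v)) not in failed] for u in V}
            path, seen = [0], {0}
            def extend():
                if len(path) == len(V):
                    return 0 in nbrs[path[-1]]     # close the cycle back to 0
                for v in nbrs[path[-1]]:
                    if v not in seen:
                        path.append(v); seen.add(v)
                        if extend():
                            return True
                        path.pop(); seen.remove(v)
                return False
            return path if extend() else None

        # One failed link (n - 2 = 1 for n = 3): prints [0, 1, 7, 6, 4, 5, 3, 2].
        print(hamiltonian_cycle(3, failed={frozenset((0, 4))}))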

  • Maintaining Data Consistency in Structured P2P Systems

    Publication Year: 2012, Page(s): 2125 - 2137
    Cited by:  Papers (1)

    A fundamental challenge of supporting mutable data replication in a Peer-to-Peer (P2P) system is to efficiently maintain consistency. This paper presents a framework for Balanced Consistency Maintenance (BCoM) in structured P2P systems with heterogeneous node capabilities and various workload patterns. Replica nodes of each object are organized into a tree structure for disseminating updates, and a sliding window update protocol is developed for consistency maintenance. We present an analytical model to optimize the window size according to dynamic network conditions, workload patterns, and resource limits. In this way, BCoM balances consistency strictness, object availability for updates, and update propagation performance for various application requirements. On top of the dissemination tree, two enhancements are proposed: (1) a fast recovery scheme to strengthen robustness against node and link failures, and (2) a node migration policy to remove and prevent bottlenecks, allowing more efficient update delivery. Simulations are conducted using P2PSim to evaluate BCoM in comparison to SCOPE [1]. The experimental results demonstrate that BCoM outperforms SCOPE with lower discard rates: BCoM achieves a discard rate as low as 5 percent in most cases, while SCOPE has an almost 100 percent discard rate.
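
    A minimal sliding-window skeleton conveys the trade-off the window size controls (the policy and numbers below are invented; BCoM optimizes the window analytically from network conditions): a small window keeps replicas tightly consistent but discards writes under load, while a large one admits writes but lets replicas lag.

        class ReplicaRoot:
            """Root of an object's dissemination tree with window size w."""
            def __init__(self, w):
                self.w = w
                self.in_flight = []      # updates still propagating down the tree
                self.applied = 0         # last update acked by all replicas
                self.discarded = 0

            def submit(self, seq):
                if len(self.in_flight) >= self.w:
                    self.discarded += 1  # window full: the write is discarded
                    return False
                self.in_flight.append(seq)
                return True

            def ack_oldest(self):        # every replica applied the oldest update
                if self.in_flight:
                    self.applied = self.in_flight.pop(0)

        root = ReplicaRoot(w=3)
        for seq in range(1, 9):
            root.submit(seq)
            if seq % 2 == 0:             # acks return half as fast as writes arrive
                root.ack_oldest()
        print(root.applied, root.discarded)   # -> 4 2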

  • Mining Frequent Trajectory Patterns for Activity Monitoring Using Radio Frequency Tag Arrays

    Publication Year: 2012, Page(s): 2138 - 2149
    Cited by:  Papers (15)

    Activity monitoring, a crucial task in many applications, is often conducted expensively using video cameras. Effectively monitoring a large field by analyzing images from multiple cameras remains a challenging issue. Other approaches generally require the tracked objects to carry special devices, which is infeasible in many scenarios. To address the issue, we propose to use RF tag arrays for activity monitoring, where data mining techniques play a critical role. RFID technology provides an economically attractive solution due to the low cost of RF tags and readers. Another novelty of this design is that the tracked objects do not need to be equipped with any RF transmitters or receivers. By developing a practical fault-tolerant method, we offset the noise of RF tag data and mine frequent trajectory patterns as models of regular activities. Our empirical study using real RFID systems and data sets verifies the feasibility and the effectiveness of this design.
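
    The mining step itself is easy to picture. The snippet below is a plain frequent-subsequence counter on invented cell-ID sequences; the paper's fault-tolerant handling of noisy RF readings is the hard part and is not reproduced here.

        from collections import Counter

        episodes = [                        # object paths over the tag array
            ["c1", "c2", "c5", "c8"],
            ["c1", "c2", "c5", "c9"],
            ["c3", "c2", "c5", "c8"],
        ]

        def frequent_trajectories(episodes, min_support=2, min_len=2):
            counts = Counter(tuple(ep[i:i + k])        # all contiguous subsequences
                             for ep in episodes
                             for k in range(min_len, len(ep) + 1)
                             for i in range(len(ep) - k + 1))
            return {t: c for t, c in counts.items() if c >= min_support}

        for traj, cnt in sorted(frequent_trajectories(episodes).items()):
            print(traj, cnt)   # ('c2', 'c5') -> 3, ('c2', 'c5', 'c8') -> 2, ...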

  • Privacy-Preserving Decentralized Key-Policy Attribute-Based Encryption

    Publication Year: 2012, Page(s): 2150 - 2162
    Cited by:  Papers (3)

    Decentralized attribute-based encryption (ABE) is a variant of a multiauthority ABE scheme in which each authority can issue secret keys to the user independently, without any cooperation or a central authority. This is in contrast to previous constructions, where multiple authorities must be online and set up the system interactively, which is impractical. A decentralized ABE scheme thus eliminates the heavy communication cost and the need for collaborative computation in the setup stage; furthermore, every authority can join or leave the system freely without reinitializing the system. In contemporary multiauthority ABE schemes, a user's secret keys from different authorities must be tied to his global identifier (GID) to resist collusion attacks. However, this compromises the user's privacy: multiple authorities can collaborate to trace the user by his GID, collect his attributes, and then impersonate him. Constructing a privacy-preserving decentralized ABE scheme therefore remains a challenging research problem. In this paper, we propose a privacy-preserving decentralized key-policy ABE scheme where each authority can issue secret keys to a user independently without knowing anything about his GID. Therefore, even if multiple authorities are corrupted, they cannot collect the user's attributes by tracing his GID. Notably, our scheme requires only standard complexity assumptions (e.g., decisional bilinear Diffie-Hellman) and no cooperation between the multiple authorities, in contrast to the previous comparable scheme, which requires nonstandard complexity assumptions (e.g., q-decisional Diffie-Hellman inversion) and interaction among multiple authorities. To the best of our knowledge, it is the first privacy-preserving decentralized ABE scheme based on standard complexity assumptions.

  • Program Regularization in Memory Consistency Verification

    Publication Year: 2012, Page(s): 2163 - 2174

    A widely adopted methodology for verifying the memory subsystem of a Chip Multiprocessor (CMP) is to verify executions of parallel test programs on the CMP against the given memory consistency model, which has long been known to be time consuming in both theory and practice. To accelerate memory consistency verification, previous approaches have had to sacrifice either availability (e.g., relying on dedicated hardware support that many commodity CMPs do not offer) or completeness (e.g., missing some bugs). Meanwhile, the impact of the parallel test programs themselves on memory consistency verification has been largely overlooked; few investigations have been dedicated to finding test programs that enable more efficient verification. From this novel perspective, we devise a practical technique called "program regularization," which can effectively reduce the computation time of memory consistency verification. The key intuition is that any parallel program, if reformed appropriately, can enable efficient memory consistency verification. More specifically, for an original program, program regularization introduces some auxiliary memory addresses and periodically inserts load/store operations accessing these addresses into the original program. With the regularized program, memory consistency verification can be accomplished in linear time (with respect to the number of memory operations) when the number of processors is fixed. Experimental results show that program regularization can significantly accelerate memory consistency verification. Last but not least, our technique relies on no concrete verification algorithm or dedicated hardware support, so it can be smoothly integrated into existing presilicon/postsilicon verification platforms of industrial CMPs to speed up memory consistency verification.
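
    The transformation is easy to convey in miniature (the operation encoding, the constant k, and the address scheme below are all invented; the paper's actual construction is what guarantees the linear-time bound): every k operations, each thread's test program receives a store/load pair on a fresh auxiliary address, giving the checker periodic, unambiguous anchor points.

        def regularize(program, k, aux_base=0x10000):
            """Insert a store/load on a fresh auxiliary address every k ops."""
            out, aux = [], aux_base
            for i, op in enumerate(program):
                if i and i % k == 0:
                    out.append(("store", aux, 0xA5))  # write a known value...
                    out.append(("load", aux))         # ...then observe it
                    aux += 8
                out.append(op)
            return out

        thread0 = [("store", 0x100, 1), ("load", 0x104), ("store", 0x108, 2),
                   ("load", 0x100), ("store", 0x104, 3), ("load", 0x108)]
        for op in regularize(thread0, k=2):
            print(op)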

  • X-BOT: A Protocol for Resilient Optimization of Unstructured Overlay Networks

    Publication Year: 2012, Page(s): 2175 - 2188
    Cited by:  Papers (2)

    Gossip, or epidemic, protocols have emerged as a highly scalable and resilient approach to implementing several application-level services such as reliable multicast, data aggregation, and publish-subscribe. All these protocols organize nodes in an unstructured random overlay network. In many cases, it is desirable to bias the random overlay in order to optimize some efficiency criterion, for instance, to reduce the stretch of overlay routing. In this paper, we propose X-BOT, a new protocol that biases the topology of an unstructured gossip overlay network. X-BOT is completely decentralized and, unlike previous approaches, preserves several key properties of the original (nonbiased) overlay, most notably the node degree and, consequently, the overlay connectivity. Experimental results show that X-BOT can generate more efficient overlays than previous approaches, independent of the underlying physical network topology.
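
    The degree-preserving trick can be shown on a toy overlay (the link costs and role names below are invented, and X-BOT additionally relies on the passive/active views of its underlying membership protocol): node i swaps its costliest neighbor o for a cheaper candidate c only when a matching relink between o and the neighbor d that c sheds keeps every node's degree unchanged.

        cost = {frozenset(p): c for p, c in [((1, 2), 9), ((1, 3), 2), ((1, 4), 1),
                                             ((2, 3), 3), ((2, 4), 8), ((3, 4), 7)]}
        link = lambda a, b: cost[frozenset((a, b))]   # stand-in latency oracle

        adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}   # 4-cycle overlay

        def exchange(adj, i, c):
            if c == i or c in adj[i]:
                return False
            o = max(adj[i], key=lambda x: link(i, x))        # i's worst neighbor
            rest = adj[c] - {i, o}
            if not rest:
                return False
            d = max(rest, key=lambda x: link(c, x))          # c's shed neighbor
            if link(i, c) + link(o, d) >= link(i, o) + link(c, d):
                return False                                 # no improvement
            for a, b in ((i, o), (c, d)):                    # drop two links...
                adj[a].discard(b); adj[b].discard(a)
            for a, b in ((i, c), (o, d)):                    # ...add two links back
                adj[a].add(b); adj[b].add(a)
            return True                                      # all degrees preserved

        print(exchange(adj, 1, 4))                           # True: cost 26 -> 14
        print({n: sorted(v) for n, v in sorted(adj.items())})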

  • TPDS: information for authors [inside back cover]

    Publication Year: 2012, Page(s): c3
    Freely Available from IEEE
  • Table of contents, continued [Back cover]

    Publication Year: 2012, Page(s): c4
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.


Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology