
IEEE Transactions on Parallel and Distributed Systems

Issue 7 • July 2011


Displaying Results 1 - 23 of 23
  • [Front cover]

    Page(s): c1
    PDF (241 KB)
    Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    PDF (253 KB)
    Freely Available from IEEE
  • Cooperative Channelization in Wireless Networks with Network Coding

    Page(s): 1073 - 1084
    PDF (1158 KB) | HTML

    In this paper, we address congestion of multicast traffic in multihop wireless networks through a combination of network coding and resource reservation. Network coding reduces the number of transmissions required in multicast flows, thus allowing a network to approach its multicast capacity. In addition, it efficiently repairs errors in multicast flows by combining packets lost at different destinations. However, under conditions of extremely high congestion the repair capability of network coding is seriously degraded. We therefore propose cooperative channelization, in which portions of the transmission medium are allocated to links that are congested to the point where network coding cannot efficiently repair loss. A health metric is proposed to allow comparison of the need for channelization across different multicast links. Cooperative channelization considers the impact of channelization on overall network performance before resource reservation is triggered. Our results show that cooperative channelization improves overall network performance while being well suited for wireless networks using network coding.
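
    The repair mechanism above relies on a standard network coding trick: a packet formed by combining several originals can simultaneously replace different losses at different receivers. The sketch below illustrates the principle only (it is not the paper's protocol; packet names are hypothetical): a single XOR-coded retransmission repairs two distinct losses.

        # One XOR-coded packet repairs different losses at two destinations.
        def xor(a: bytes, b: bytes) -> bytes:
            return bytes(x ^ y for x, y in zip(a, b))

        p1, p2 = b"packet-one!!", b"packet-two!!"   # equal-length payloads

        repair = xor(p1, p2)               # the single coded retransmission
        recovered_at_A = xor(repair, p2)   # destination A lost p1, holds p2
        recovered_at_B = xor(repair, p1)   # destination B lost p2, holds p1
        assert recovered_at_A == p1 and recovered_at_B == p2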

  • Video Streaming Distribution in VANETs

    Page(s): 1085 - 1091
    Multimedia
    PDF (491 KB)

    Streaming applications are expected to develop rapidly and to contribute a significant amount of traffic in the near future. A problem scarcely addressed so far is how to distribute video streaming traffic from one source to all nodes in an urban vehicular network. This problem differs significantly from previous work on broadcast and multicast in ad hoc networks because of the highly dynamic topology of vehicular networks and the strict delay requirements of streaming applications. We present a solution for intervehicular communications, called Streaming Urban Video (SUV), that 1) is fully distributed and dynamically adapts to topology changes, and 2) leverages the characteristics of streaming applications to yield a highly efficient, cross-layer solution.

  • Consensus and Mutual Exclusion in a Multiple Access Channel

    Page(s): 1092 - 1104
    PDF (1383 KB) | HTML

    We consider the deterministic feasibility and time complexity of two fundamental tasks in distributed computing: consensus and mutual exclusion. Processes have different labels and communicate through a multiple access channel. The adversary wakes up some processes in possibly different rounds. In any round, every awake process either listens or transmits. The message of a process i is heard by all other awake processes if i is the only process to transmit in that round. If more than one process transmits simultaneously, there is a collision and no message is heard. We consider three characteristics that may or may not be available in the channel: collision detection (listening processes can distinguish collision from silence), a global clock showing the round number, and knowledge of the number n of all processes. If none of these three characteristics is available, we prove that consensus and mutual exclusion are infeasible; if at least one of them is available, both tasks are feasible, and we study their time complexity. Collision detection is shown to cause an exponential gap in complexity: if it is available, both tasks can be performed in time logarithmic in n, which is optimal; without collision detection, both tasks require linear time. We then investigate both consensus and mutual exclusion in the absence of collision detection, but under the alternative presence of the two other features. With a global clock, we give an algorithm whose time complexity depends linearly on n and on the wake-up time, and an algorithm whose complexity does not depend on the wake-up time and differs from the linear lower bound only by a factor O(log² n). If n is known, we also show an algorithm whose complexity differs from the linear lower bound only by a factor O(log² n).
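
    To make the channel model concrete, here is a minimal simulation sketch, under two assumptions: the channel semantics are as summarized above, and every process is awake by the round of the first lone transmission. It implements the simple linear-time idea available when labels are bounded by a known n and a global clock exists: the process with label r transmits in round r, so the first round with exactly one transmitter yields a message heard by all, on which everyone can agree.

        # Toy simulation of consensus over a multiple access channel.
        def consensus(wakeup, inputs, n):
            """wakeup[p]: round process p wakes up; inputs[p]: its value."""
            for rnd in range(n):
                awake = [p for p in inputs if wakeup[p] <= rnd]
                transmitters = [p for p in awake if p == rnd]
                if len(transmitters) == 1:      # exactly one: heard by all
                    value = inputs[transmitters[0]]
                    return {p: value for p in awake}
            return {}                           # no lone transmission occurred

        # Processes 0, 2, 4 exist; 2 transmits alone in round 2 and wins.
        print(consensus(wakeup={0: 1, 2: 0, 4: 1},
                        inputs={0: "a", 2: "b", 4: "c"}, n=5))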

  • HaRP: Rapid Packet Classification via Hashing Round-Down Prefixes

    Page(s): 1105 - 1119
    PDF (1887 KB) | HTML

    Packet classification is central to a wide array of Internet applications and services, with existing approaches mostly involving either hardware support or the optimization steps needed by software-oriented techniques (to add precomputed markers and insert rules in the search data structures). Unfortunately, an approach with hardware support is expensive and has limited scalability, whereas one with optimization fails to handle incremental rule updates effectively. This work deals with rapid packet classification, realized by hashing round-down prefixes (HaRP): the source and destination IP prefixes specified in a rule are rounded down to “designated prefix lengths” (DPL) for indexing into hash sets. HaRP exhibits superb hash storage utilization, not only outperforming earlier software-oriented classification techniques but also accommodating dynamic creation and deletion of rules well. HaRP makes it possible to hold all of its search data structures in the local cache of each core within a contemporary processor, dramatically elevating its classification performance. Empirical results measured on an AMD 4-way 2.8 GHz Opteron system (with 1 MB cache per core) under six filter data sets (each with up to 30 K rules) obtained from a public source show that HaRP achieves up to some 3.6× the throughput of the best-known decision-tree-based counterpart, HyperCuts (HC).
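
    The core indexing idea lends itself to a short sketch. Below is a minimal, hypothetical rendering of round-down prefix hashing on a single IP field (the DPL set, rule format, and single-field lookup are simplifying assumptions; the paper's HaRP handles both prefix fields plus further rule matching):

        # Round a rule's prefix DOWN to a designated prefix length (DPL)
        # and use the truncated bits as the hash key.
        DPL = [32, 24, 16, 8, 0]                  # probed longest-first

        def mask(bits):
            return (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF if bits else 0

        table = {}                                # (trunc_prefix, dpl) -> rules

        def insert(prefix, plen, rule):
            dpl = max(l for l in DPL if l <= plen)     # round down to a DPL
            key = (prefix & mask(dpl), dpl)
            table.setdefault(key, []).append((prefix, plen, rule))

        def lookup(addr):
            for dpl in DPL:                            # one probe per DPL
                for prefix, plen, rule in table.get((addr & mask(dpl), dpl), []):
                    if addr & mask(plen) == prefix:    # verify the full prefix
                        yield rule

        insert(0x0A000000, 10, "rule-A")     # 10.0.0.0/10 lands in the /8 set
        print(list(lookup(0x0A200001)))      # 10.32.0.1 -> ['rule-A']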

  • The Small World of File Sharing

    Page(s): 1120 - 1134
    PDF (3058 KB) | HTML

    Web caches, content distribution networks, peer-to-peer file-sharing networks, distributed file systems, and data grids all have in common that they involve a community of users who use shared data. In each case, overall system performance can be improved significantly by first identifying and then exploiting the structure of the community's data access patterns. We propose a novel perspective for analyzing data access workloads that considers the implicit relationships that form among users based on the data they access. We propose a new structure, the interest-sharing graph, that captures common user interests in data, and we justify its utility with studies on four data-sharing systems: a high-energy physics collaboration, the Web, the Kazaa peer-to-peer network, and a BitTorrent file-sharing community. We find small-world patterns in the interest-sharing graphs of all four communities. We investigate analytically and experimentally some of the potential causes of this pattern and conclude that user preferences play a major role. The significance of small-world patterns is twofold: they provide rigorous support for intuition, and they suggest the potential to exploit these naturally emerging patterns. As a proof of concept, we design and evaluate an information dissemination system that exploits the small-world interest-sharing graphs by building an interest-aware network overlay. We show that this approach leads to improved information dissemination performance.
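
    The interest-sharing graph itself is easy to state precisely: users are vertices, and an edge joins two users whose sets of accessed items intersect. A minimal sketch follows (the toy trace and the intersect-at-least-once threshold are assumptions); it builds the graph and computes the local clustering coefficient, one half of the small-world signature:

        from collections import defaultdict
        from itertools import combinations

        requests = {                      # toy trace: user -> items accessed
            "u1": {"a", "b", "c"}, "u2": {"b", "d"},
            "u3": {"c", "d"},      "u4": {"e"},
        }

        graph = defaultdict(set)
        for u, v in combinations(requests, 2):
            if requests[u] & requests[v]:           # shared interest => edge
                graph[u].add(v)
                graph[v].add(u)

        def clustering(node):
            """Fraction of the node's neighbor pairs that are linked."""
            nbrs = list(graph[node])
            if len(nbrs) < 2:
                return 0.0
            linked = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
            return linked / (len(nbrs) * (len(nbrs) - 1) / 2)

        print({n: clustering(n) for n in graph})    # high clustering plus short
                                                    # paths indicate a small world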

  • Nonnegative Tensor Factorization Accelerated Using GPGPU

    Page(s): 1135 - 1141
    PDF (774 KB) | HTML

    This article presents an optimized algorithm for Nonnegative Tensor Factorization (NTF), implemented in the CUDA (Compute Unified Device Architecture) framework, that runs on contemporary graphics processors and exploits their massive parallelism. The NTF implementation is primarily targeted at the analysis of high-dimensional spectral images, including dimensionality reduction, feature extraction, and other tasks related to spectral imaging; however, the algorithm and its implementation are not limited to spectral imaging. The speedups measured on real spectral images are around 60-100× compared to a traditional C implementation compiled with an optimizing compiler. Since common problems in the field of spectral imaging may take hours on a state-of-the-art CPU, the speedup achieved using a graphics card is attractive. The implementation is publicly available in the form of a dynamically linked library, including an interface to MATLAB, and thus may be of help to researchers and engineers using NTF on large problems.
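
    For readers unfamiliar with NTF, the following CPU reference sketch shows one common formulation, nonnegative CP decomposition with multiplicative updates (an assumption: the abstract does not pin down the variant or update rule the authors use). This is exactly the kind of dense, data-parallel arithmetic that maps well onto a GPU:

        import numpy as np

        def ntf(X, rank, iters=200, eps=1e-9):
            """Nonnegative CP: X (IxJxK) ~ sum_r A[:,r] o B[:,r] o C[:,r]."""
            I, J, K = X.shape
            rng = np.random.default_rng(0)
            A, B, C = (rng.random((d, rank)) for d in (I, J, K))
            X1 = X.transpose(0, 2, 1).reshape(I, K * J)    # mode-1 unfolding
            X2 = X.transpose(1, 2, 0).reshape(J, K * I)    # mode-2 unfolding
            X3 = X.transpose(2, 1, 0).reshape(K, J * I)    # mode-3 unfolding
            kr = lambda P, Q: np.einsum("pr,qr->pqr", P, Q).reshape(-1, rank)
            for _ in range(iters):                         # multiplicative updates
                A *= (X1 @ kr(C, B)) / (A @ ((C.T @ C) * (B.T @ B)) + eps)
                B *= (X2 @ kr(C, A)) / (B @ ((C.T @ C) * (A.T @ A)) + eps)
                C *= (X3 @ kr(B, A)) / (C @ ((B.T @ B) * (A.T @ A)) + eps)
            return A, B, C

        X = np.abs(np.random.default_rng(1).random((8, 7, 6)))
        A, B, C = ntf(X, rank=4)
        approx = np.einsum("ir,jr,kr->ijk", A, B, C)
        print("relative error:", np.linalg.norm(X - approx) / np.linalg.norm(X))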

  • Processor Array Architectures for Scalable Radix 4 Montgomery Modular Multiplication Algorithm

    Page(s): 1142 - 1149
    Multimedia
    PDF (712 KB)

    This paper presents a systematic methodology for exploring possible processor arrays for the scalable radix-4 Montgomery modular multiplication algorithm. In this methodology, the algorithm is first expressed as a regular iterative expression; then the algorithm's data dependence graph and a suitable affine scheduling function are obtained. Four possible processor arrays are obtained and analyzed in terms of speed, area, and power consumption. To reduce power consumption, we apply low-power techniques for reducing the glitches and the Expected Switching Activity (ESA) of high fan-out signals in our processor array architectures. The resulting processor arrays are compared to other efficient ones in terms of area, speed, and power consumption.
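
    As background for the architectures above, the radix-4 algorithm processes two bits of the multiplier per iteration. Here is a plain-Python model of digit-serial radix-4 Montgomery multiplication, a sketch of the underlying arithmetic only; the paper's contribution is the processor array mapping, which this sketch does not attempt:

        def mont_mul_radix4(a, b, n, m):
            """Return a*b*R^{-1} mod n with R = 4^m; n odd, a, b < n <= 4^m."""
            n_prime = (-pow(n, -1, 4)) % 4       # -n^{-1} mod 4 (Python 3.8+)
            s = 0
            for i in range(m):
                a_i = (a >> (2 * i)) & 3         # i-th radix-4 digit of a
                t = s + a_i * b
                q = (t * n_prime) & 3            # makes t + q*n divisible by 4
                s = (t + q * n) >> 2             # exact division by the radix
            return s - n if s >= n else s        # single final correction

        n, m = 0xC5A7, 8                         # odd modulus, R = 4^8 = 2^16
        R = 1 << (2 * m)
        a, b = 0x1234, 0x5678
        assert mont_mul_radix4(a, b, n, m) == (a * b * pow(R, -1, n)) % n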

  • Utilization-Based Resource Partitioning for Power-Performance Efficiency in SMT Processors

    Page(s): 1150 - 1163
    PDF (1823 KB) | HTML

    Simultaneous multithreading (SMT) increases processor throughput by allowing the parallel execution of several threads. However, fully sharing processor resources may cause resource monopolization by a single thread or other misallocations, resulting in overall performance degradation. Static resource partitioning techniques have been suggested, but they are not as effective as dynamic ones, since program behavior changes over the course of execution. In this paper, we propose an Adaptive Resource Partitioning Algorithm (ARPA) that dynamically assigns resources to threads according to changes in thread behavior. ARPA analyzes the resource usage efficiency of each thread in a given time period and assigns more resources to threads that can use them more efficiently. Its purpose is to improve the efficiency of resource utilization, thereby improving overall instruction throughput. Our simulation results on a set of 42 multiprogramming workloads show that ARPA outperforms the traditional fetch policy ICOUNT by 55.8 percent with regard to overall instruction throughput and achieves a 33.8 percent improvement over Static Partitioning. It also outperforms the current best dynamic resource allocation technique, Hill-climbing, by 5.7 percent. Considering the fairness accorded to each thread, ARPA attains 43.6, 18.5, and 9.2 percent improvements over ICOUNT, Static Partitioning, and Hill-climbing, respectively, using a common fairness metric. We also explore the energy efficiency of dynamically controlling the number of powered-on reorder buffer entries for ARPA. Compared with ARPA, our energy-aware resource partitioning algorithm achieves 10.6 percent energy savings, while the performance loss is negligible.
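
    The policy at ARPA's core can be summarized in a few lines. Below is a toy sketch of one epoch of efficiency-driven repartitioning; the step size, floor, and efficiency metric are illustrative assumptions, not the paper's exact parameters:

        def repartition(shares, committed, step=4, floor=8):
            """shares[t]: resource entries held by thread t this epoch;
            committed[t]: instructions it committed with them."""
            eff = {t: committed[t] / shares[t] for t in shares}
            donor = min(eff, key=eff.get)        # least efficient thread
            taker = max(eff, key=eff.get)        # most efficient thread
            if donor != taker and shares[donor] - step >= floor:
                shares[donor] -= step            # reclaim underused entries
                shares[taker] += step            # grant them where they pay off
            return shares

        print(repartition({"t0": 64, "t1": 64}, {"t0": 120, "t1": 400}))
        # {'t0': 60, 't1': 68}: t1 used its share better, so its share grows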

  • Unraveling the BitTorrent Ecosystem

    Page(s): 1164 - 1177
    PDF (4197 KB) | HTML

    BitTorrent is the most successful open Internet application for content distribution. Despite its importance, both in terms of its footprint in the Internet and the influence it has on emerging P2P applications, the BitTorrent Ecosystem is only partially understood. We seek to provide a nearly complete picture of the entire public BitTorrent Ecosystem. To this end, we crawl five of the most popular torrent-discovery sites over a nine-month period, identifying all 4.6 million torrents and 38,996 trackers that the sites reference. We also develop a high-performance tracker crawler and, over a narrow window of 12 hours, crawl essentially all of the public Ecosystem's trackers, obtaining peer lists for all referenced torrents. Complementing the torrent-discovery site and tracker crawling, we further crawl the Azureus and Mainline DHTs for a random sample of torrents. Our resulting measurement data are more than an order of magnitude larger (in terms of number of torrents, trackers, or peers) than any earlier study. Using this extensive data set, we study in depth the Ecosystem's torrent-discovery, tracker, peer, user behavior, and content landscapes. For peer statistics, the analysis is based on one typical snapshot obtained over 12 hours. We further analyze the fragility of the Ecosystem upon the removal of its most important tracker service.

  • Lightweight Chip Multi-Threading (LCMT): Maximizing Fine-Grained Parallelism On-Chip

    Page(s): 1178 - 1191
    PDF (2016 KB) | HTML

    Irregular and dynamic applications, such as graph problems and agent-based simulations, often require fine-grained parallelism to achieve good performance. However, current multicore processors only provide architectural support for coarse-grained parallelism, making it necessary to use software-based multithreading environments to effectively implement fine-grained parallelism. Although these software-based environments have demonstrated superior performance over heavyweight, OS-level threads, they are still limited by the significant overhead involved in thread management and synchronization. To address this, we propose a Lightweight Chip Multi-Threaded (LCMT) architecture that further exploits thread-level parallelism (TLP) by incorporating direct architectural support for an “unlimited” number of dynamically created lightweight threads with very low thread management and synchronization overhead. The LCMT architecture can be implemented atop a mainstream architecture with minimal extra hardware, to leverage existing legacy software environments. We compare the LCMT architecture with a Niagara-like baseline architecture. Our results show up to 1.8X better scalability, 1.91X better performance, and, more importantly, 1.74X better performance per watt for irregular and dynamic benchmarks, when compared to the baseline architecture. The LCMT architecture delivers similar performance to the baseline architecture for regular benchmarks.

  • The Complexity of Optimal Job Co-Scheduling on Chip Multiprocessors and Heuristics-Based Solutions

    Page(s): 1192 - 1205
    PDF (1276 KB) | HTML

    In Chip Multiprocessor (CMP) architectures, it is common for multiple cores to share some on-chip cache. The sharing may cause cache thrashing and contention among co-running jobs. Job co-scheduling is an approach to tackling the problem by assigning jobs to cores appropriately so that the contention and consequent performance degradation are minimized. Job co-scheduling includes two tasks: the estimation of co-run performance and the determination of suitable co-schedules. Most existing studies in job co-scheduling have concentrated on the first task but rely on simple techniques (e.g., trying different schedules) for the second. This paper presents a systematic exploration of the second task. The paper uncovers the computational complexity of the determination of optimal job co-schedules, proving its NP-completeness. It introduces a set of algorithms, based on graph theory and Integer/Linear Programming, for computing optimal co-schedules or their lower bounds in scenarios with or without job migrations. For complex cases, it empirically demonstrates the feasibility of effectively approximating the optimum by proposing several heuristics-based algorithms. These discoveries may facilitate the assessment of job co-schedulers by providing necessary baselines, as well as offer insights into the development of co-scheduling algorithms in practical systems.
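
    One special case makes the graph-theoretic formulation tangible: with dual-core chips and no migration, co-scheduling 2n jobs to minimize total co-run degradation is a minimum-weight perfect matching problem on a graph whose edge weights are pairwise degradations. A sketch using networkx follows (the degradation numbers are made up; negating the weights turns the library's max-weight matching into min-cost):

        import networkx as nx

        degradation = {                     # measured co-run slowdown per pair
            ("j1", "j2"): 0.30, ("j1", "j3"): 0.05, ("j1", "j4"): 0.20,
            ("j2", "j3"): 0.25, ("j2", "j4"): 0.10, ("j3", "j4"): 0.35,
        }

        G = nx.Graph()
        for (u, v), cost in degradation.items():
            G.add_edge(u, v, weight=-cost)  # max weight == min total cost

        pairs = nx.max_weight_matching(G, maxcardinality=True)
        total = sum(degradation[tuple(sorted(p))] for p in pairs)
        print(pairs, "total degradation:", round(total, 2))
        # optimal pairing: (j1, j3) with (j2, j4), total 0.15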

  • Throughput Optimization in Multihop Wireless Networks with Multipacket Reception and Directional Antennas

    Page(s): 1206 - 1213
    Multimedia
    PDF (823 KB)

    Recent advances in the physical layer have enabled the simultaneous reception of multiple packets by a node in wireless networks. We address the throughput optimization problem in wireless networks that support multipacket reception (MPR) capability. The problem is modeled as a joint routing and scheduling problem, which is known to be NP-hard. The scheduling subproblem deals with finding the optimal schedulable sets, which are defined as subsets of links that can be scheduled or activated simultaneously. We demonstrate that any solution of the scheduling subproblem can be built with |E| + 1 or fewer schedulable sets, where |E| is the number of links of the network. This result is in contrast with previous works that stated that a solution of the scheduling subproblem is composed of an exponential number of schedulable sets. Due to the hardness of the problem, we propose a polynomial time scheme based on a combination of linear programming and approximation algorithm paradigms. We illustrate the use of the scheme to study the impact of design parameters on the performance of MPR-capable networks, including the number of transmit interfaces, the beamwidth, and the receiver range of the antennas.

  • Attribute-Based Access Control with Efficient Revocation in Data Outsourcing Systems

    Page(s): 1214 - 1221
    Multimedia
    PDF (456 KB)

    Some of the most challenging issues in the data outsourcing scenario are the enforcement of authorization policies and the support of policy updates. Ciphertext-policy attribute-based encryption is a promising cryptographic solution to these issues for enforcing access control policies defined by a data owner on outsourced data. However, applying attribute-based encryption in an outsourced architecture introduces several challenges with regard to attribute and user revocation. In this paper, we propose an access control mechanism using ciphertext-policy attribute-based encryption to enforce access control policies with efficient attribute and user revocation capability. Fine-grained access control is achieved by a dual encryption mechanism that takes advantage of attribute-based encryption and selective group key distribution in each attribute group. We demonstrate how to apply the proposed mechanism to securely manage outsourced data. The analysis results indicate that the proposed scheme is efficient and secure in data outsourcing systems.

  • Network Immunization with Distributed Autonomy-Oriented Entities

    Page(s): 1222 - 1229
    Multimedia
    PDF (863 KB)

    Many communication systems, e.g., the Internet, can be modeled as complex networks. For such networks, immunization strategies are necessary to prevent malicious attacks or viruses from percolating from a node to its neighboring nodes along their connections. In recent years, various immunization strategies have been proposed and demonstrated, most of which rest on the assumptions that the strategies can be executed in a centralized manner and/or that the complex network at hand is reasonably stable (its topology will not change over time). In other words, it would be difficult to apply them in a decentralized network environment, as often found in the real world. In this paper, we propose a decentralized and scalable immunization strategy based on a self-organized computing approach called autonomy-oriented computing (AOC) [1], [2]. In this strategy, autonomous behavior-based entities are deployed in a decentralized network and are capable of collectively finding those nodes with high degrees of connectivity (i.e., those that can readily spread viruses). Through experiments involving both synthetic and real-world networks, we demonstrate that this strategy can effectively and efficiently locate highly connected nodes in decentralized complex network environments of various topologies, and that it is also scalable in handling large-scale decentralized networks. We have compared our strategy with some of the well-known strategies, including the acquaintance and covering strategies, on both synthetic and real-world networks.
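
    A decentralized walker that relies only on local neighborhood information illustrates the flavor of such entities. This is a deliberately simplified sketch (a degree-greedy random walk on a toy network); the paper's AOC entities follow richer behavior rules:

        import random

        adj = {                              # toy network: node -> neighbors
            1: [2, 3, 4, 5], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3], 5: [1],
        }

        def walker(start, steps=10):
            node, best_seen = start, start
            for _ in range(steps):
                nbrs = adj[node]             # only local degree info is used
                node = (max(nbrs, key=lambda v: len(adj[v]))
                        if random.random() < 0.8 else random.choice(nbrs))
                if len(adj[node]) > len(adj[best_seen]):
                    best_seen = node         # remember the best hub met so far
            return best_seen

        random.seed(0)
        print({walker(s) for s in adj})      # each entity reports the hub, node 1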

  • Optimization of Rate Allocation with Distortion Guarantee in Sensor Networks

    Page(s): 1230 - 1237
    Multimedia
    PDF (1064 KB)

    Lossy compression techniques are commonly used by long-term data-gathering applications that attempt to identify trends or other interesting patterns in an entire system, since a data packet need not always be completely and immediately transmitted to the sink. In these applications, a nonterminal sensor node jointly encodes its own sensed data and the data received from its nearby nodes. The tendency for these nodes to have a high spatial correlation means that these data packets can be efficiently compressed together using a rate-distortion strategy. This paper addresses the optimal rate-distortion allocation problem, which determines an optimal bit rate for each sensor based on the target overall distortion so as to minimize the network transmission cost. We propose an analytically optimal rate-distortion allocation scheme and also extend it to a distributed version. Based on the presented allocation schemes, a greedy heuristic algorithm is proposed to build the most efficient data transmission structure to further reduce the transmission cost. The proposed methods were evaluated using simulations with real-world data sets. The simulation results indicate that the optimal allocation strategy can reduce the transmission cost to 6-15% of that of the uniform allocation scheme.
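
    For intuition about what a distortion-constrained rate allocation looks like, here is the classic reverse water-filling solution for independent Gaussian sources, a textbook stand-in rather than the paper's scheme: given a total distortion target, every source's distortion is capped at a common water level, and rates follow from the Gaussian rate-distortion function.

        import math

        def allocate(variances, D_target, tol=1e-9):
            lo, hi = 0.0, max(variances)
            while hi - lo > tol:                 # bisect on the water level
                lam = (lo + hi) / 2
                if sum(min(lam, v) for v in variances) < D_target:
                    lo = lam
                else:
                    hi = lam
            D = [min(lo, v) for v in variances]  # per-source distortions
            R = [0.5 * math.log2(v / d) if d < v else 0.0
                 for v, d in zip(variances, D)]  # bits per sample
            return D, R

        D, R = allocate([4.0, 1.0, 0.25], D_target=1.5)
        print([round(x, 3) for x in D], [round(x, 3) for x in R])
        # -> [0.625, 0.625, 0.25] [1.339, 0.339, 0.0]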

  • Performance of Acyclic Stochastic Networks with Network Coding

    Page(s): 1238 - 1245
    Multimedia
    PDF (537 KB)

    Network coding allows a network node to code information flows before forwarding them. While it has been theoretically proved that network coding can achieve maximum network throughput, the theoretical results usually do not consider the burstiness of data traffic, delays, and the stochastic nature of information processing and transmission. There is currently no theory to systematically model and evaluate the performance of network coding, especially when a node's capacity (i.e., for coding and transmission) is stochastic. Without such a theory, the performance of network coding under various system settings is far from clear. To fill this gap, we develop an analytical approach by extending stochastic network calculus theory to tackle the special difficulties in the evaluation of network coding. We prove new properties of the stochastic network calculus and design an algorithm to obtain performance bounds for acyclic stochastic networks with network coding. The tightness of the theoretical bounds is validated with simulation.

  • Correction to "Editor's Note"

    Page(s): 1246
    PDF (40 KB)
  • Call for Papers for a Special Issue of IEEE Transactions on Parallel and Distributed Systems (TPDS) on Cyber-Physical Systems (CPS)

    Page(s): 1247
    PDF (51 KB)
    Freely Available from IEEE
  • IEEE Computer Society OnlinePlus Coming Soon to TPDS

    Page(s): 1248
    PDF (229 KB)
    Freely Available from IEEE
  • TPDS Information for authors

    Page(s): c3
    PDF (253 KB)
    Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    PDF (241 KB)
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.


Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology