Parallel and Distributed Systems, IEEE Transactions on

Issue 3 • Date March 2012

  • [Front cover]

    Publication Year: 2012, Page(s): c1
    Freely Available from IEEE
  • [Cover 2]

    Publication Year: 2012, Page(s): c2
    Freely Available from IEEE
  • (ε, δ)-Approximate Aggregation Algorithms in Dynamic Sensor Networks

    Publication Year: 2012, Page(s): 385 - 396

    Aggregation operations are important in WSN applications. Since many applications require only approximate aggregation results rather than exact ones, several approximate aggregation algorithms have been proposed to save energy. However, the error bounds of these algorithms are fixed and cannot be adjusted automatically, so they cannot meet the arbitrary precision requirements of different users. Thus, a uniform sampling-based algorithm was proposed by the authors of this paper to satisfy arbitrary precision requirements. Unfortunately, this uniform sampling-based algorithm is only suitable for static sensor networks. To overcome this shortcoming, this paper proposes four distributed, Bernoulli sampling-based approximate aggregation algorithms to process snapshot and continuous aggregation queries in dynamic sensor networks. Theoretical analysis and experimental results show that the proposed algorithms have high performance in terms of accuracy and energy consumption.
  • A Novel Parallel Scan for Multicore Processors and Its Application in Sparse Matrix-Vector Multiplication

    Publication Year: 2012, Page(s): 397 - 404

    We present a novel parallel algorithm for computing scan operations on x86 multicore processors. The best known existing parallel scan for the same platform requires the number of processors to be a power of two; our proposed method removes this constraint. In the design of the algorithm, architectural considerations for x86 multicore processors are taken into account so that the rate of cache misses is reduced and the cost of thread synchronization and management is minimized. Results from tests made on a dual-socket machine with quad-core Intel Xeon E5405 processors showed that the proposed solution outperformed the best known parallel reference. A novel approach to sparse matrix-vector multiplication (SpMV) based on the proposed scan is then explained. Unlike existing approaches, which make use of backward segmented operations, this approach uses forward ones for more efficient caching. An implementation of the proposed SpMV was tested against the SpMV in Intel's Math Kernel Library (MKL), and merits were found in the proposed approach.
  • A Survey and Evaluation of Topology-Agnostic Deterministic Routing Algorithms

    Publication Year: 2012, Page(s): 405 - 425
    Cited by: Papers (14)

    Most standard cluster interconnect technologies are flexible with respect to network topology. This has spawned a substantial amount of research on topology-agnostic routing algorithms, which make no assumption about the network structure, thus providing the flexibility needed to route on irregular networks. Such irregularity should often be interpreted as a minor modification of some regular interconnection pattern, such as one induced by faults. In fact, topology-agnostic routing algorithms are also becoming increasingly useful for networks on chip (NoCs), where faults may make the preferred 2D mesh topology irregular. Existing topology-agnostic routing algorithms were developed for varying purposes, giving them different and not always comparable properties. Details are scattered among many papers, each with distinct conditions, making comparison difficult. This paper presents a comprehensive overview of the known topology-agnostic routing algorithms. We classify these algorithms by their most important properties, and evaluate them consistently. This provides significant insight into the algorithms and their appropriateness for different on- and off-chip environments.
  • A Systematic Approach toward Automated Performance Analysis and Tuning

    Publication Year: 2012, Page(s): 426 - 435
    Cited by: Papers (1)

    High productivity is critical in harnessing the power of high-performance computing systems to solve science and engineering problems. It is a challenge to bridge the gap between the hardware complexity and the software limitations. Despite significant progress in programming languages, compilers, and performance tools, tuning an application remains largely a manual task, and is done mostly by experts. In this paper, we propose a systematic approach toward automated performance analysis and tuning that we expect to improve the productivity of performance debugging significantly. Our approach seeks to build a framework that facilitates the combination of expert knowledge, compiler techniques, and performance research for performance diagnosis and solution discovery. With our framework, once a diagnosis and tuning strategy has been developed, it can be stored in an open and extensible database and thus be reused in the future. We demonstrate the effectiveness of our approach through the automated performance analysis and tuning of two scientific applications. We show that the tuning process is highly automated, and the performance improvement is significant.
  • Aho-Corasick String Matching on Shared and Distributed-Memory Parallel Architectures

    Publication Year: 2012, Page(s): 436 - 443
    Cited by: Papers (2)

    String matching requires a combination of (sometimes all) the following characteristics: high and/or predictable performance, support for large data sets and flexibility of integration and customization. This paper compares several software-based implementations of the Aho-Corasick algorithm for high-performance systems. We focus on the matching of unknown inputs streamed from a single source, typical of security applications and difficult to manage since the input cannot be preprocessed to obtain locality. We consider shared-memory architectures (Niagara 2, x86 multiprocessors, and Cray XMT) and distributed-memory architectures with homogeneous (InfiniBand cluster of x86 multicores) or heterogeneous processing elements (InfiniBand cluster of x86 multicores with NVIDIA Tesla C1060 GPUs). We describe how each solution achieves the objectives of supporting large dictionaries, sustaining high performance, and enabling customization and flexibility using various data sets.
  • An Intelligent Task Allocation Scheme for Multihop Wireless Networks

    Publication Year: 2012, Page(s): 444 - 451
    Cited by: Papers (4)

    Emerging applications in Multihop Wireless Networks (MHWNs) require considerable processing power, which may often be beyond the capability of individual nodes. Parallel processing provides a promising solution, which partitions a program into multiple small tasks and executes each task concurrently on independent nodes. However, multihop wireless communication is inevitable in such networks and it could have an adverse effect on distributed processing. In this paper, an adaptive intelligent task mapping together with a scheduling scheme based on a genetic algorithm is proposed to provide real-time guarantees. This solution enables efficient parallel processing by considering only those node collaborations that have cost-effective communications. Furthermore, in order to alleviate the power scarcity of MHWNs, a hybrid fitness function is derived and embedded in the algorithm to extend the overall network lifetime via workload balancing among the collaborative nodes, while still meeting arbitrary application deadlines. Simulation results show significant performance improvement in various testing environments over existing mechanisms.
  • Balancing Performance and Cost in CMP Interconnection Networks

    Publication Year: 2012, Page(s): 452 - 459
    Cited by: Papers (1)

    This paper presents an innovative router design, called Rotary Router, which successfully addresses CMP cost/performance constraints. The router structure is based on two independent rings, which force packets to circulate either clockwise or counterclockwise, traveling through every port of the router. These two rings constitute a completely decentralized arbitration scheme that enables a simple, but efficient way to connect every input port to every output port. The proposed router is able to avoid network deadlock, livelock, and starvation without requiring data-path modifications. The organization of the router permits the inclusion of throughput enhancement techniques without significantly penalizing the implementation cost. In particular, the router performs adaptive routing, eliminates HOL blocking, and carries out implicit congestion control using simple arbitration and buffering strategies. Additionally, the proposal is capable of avoiding end-to-end deadlock at coherence protocol level with no physical or virtual resource replication, while guaranteeing in-order packet delivery. This facilitates router management and improves storage utilization. Using a comprehensive evaluation framework that includes full-system simulation and hardware description, the proposal is compared with two representative router counterparts. The results obtained demonstrate the Rotary Router's substantial performance and efficiency advantages.
  • Bounding the Impact of Unbounded Attacks in Stabilization

    Publication Year: 2012, Page(s): 460 - 466
    Cited by: Papers (3)

    Self-stabilization is a versatile approach to fault-tolerance since it permits a distributed system to recover from any transient fault that arbitrarily corrupts the contents of all memories in the system. Byzantine tolerance is an attractive feature of distributed systems that permits coping with arbitrary malicious behaviors. Combining these two properties has proved difficult: it is impossible to contain the spatial impact of Byzantine nodes in a self-stabilizing context for global tasks such as tree orientation and tree construction. We present and illustrate a new concept of Byzantine containment in stabilization. Our property, called Strong Stabilization, makes it possible to contain the impact of Byzantine nodes if they actually perform too many Byzantine actions. We derive impossibility results for strong stabilization and present strongly stabilizing protocols for tree orientation and tree construction that are optimal with respect to the number of Byzantine nodes that can be tolerated in a self-stabilizing context.
  • Consensus in Sparse, Mobile Ad Hoc Networks

    Publication Year: 2012, Page(s): 467 - 474
    Cited by: Papers (4)

    Consensus is central to several applications, including collaborative ones that a wireless ad hoc network can facilitate for mobile users in terrains with no infrastructure support for communication. We solve the consensus problem in a sparse network in which a node can at times have no other node in its wireless range, and useful end-to-end connectivity between nodes can be just a temporary feature that emerges at arbitrary intervals of time for any given node pair. Efficient one-to-many dissemination, essential for consensus, now becomes a challenge: a sufficient number of destinations cannot deliver a multicast unless nodes retain the multicast message for opportunistic forwarding. Seeking to keep storage and bandwidth costs low, we propose two protocols. An eventually relinquishing (◇RC) protocol that does not store messages for long is used for attempting consensus, and an eventually quiescent (◇QC) one that stops forwarding messages after a while is used for concluding consensus. Use of the ◇RC protocol poses additional challenges for consensus when the fraction, f/n, of nodes that can crash is 1/4 ≤ f/n < 1/2. Consensus latency and packet overhead are measured through simulations, and both decrease considerably even for a modest increase in network density.
  • Coverage and Connectivity in Duty-Cycled Wireless Sensor Networks for Event Monitoring

    Publication Year: 2012, Page(s): 475 - 482
    Cited by: Papers (5)

    In duty-cycled wireless sensor networks (WSNs) for stochastic event monitoring, existing efforts have concentrated mainly on energy-efficient scheduling of sensor nodes to guarantee coverage performance, ignoring the equally crucial issue of connectivity. The connectivity problem is extremely challenging in duty-cycled WSNs because the link connections between nodes are transient and thus unstable. In this paper, we propose a new kind of network, the partitioned synchronous network, to jointly address the coverage and connectivity problems. We analyze the coverage and connectivity performance of the partitioned synchronous network and compare it with that of the existing asynchronous network. We perform extensive simulations to demonstrate that the proposed partitioned synchronous network has better connectivity performance than the asynchronous network, while the coverage performance of the two types of networks is close.
  • Cut Detection in Wireless Sensor Networks

    Publication Year: 2012, Page(s): 483 - 490
    Cited by: Papers (5)

    A wireless sensor network can get separated into multiple connected components due to the failure of some of its nodes, which is called a “cut.” In this paper, we consider the problem of detecting cuts by the remaining nodes of a wireless sensor network. We propose an algorithm that allows 1) every node to detect when the connectivity to a specially designated node has been lost, and 2) one or more nodes (that are connected to the special node after the cut) to detect the occurrence of the cut. The algorithm is distributed and asynchronous: every node needs to communicate with only those nodes that are within its communication range. The algorithm is based on the iterative computation of a fictitious “electrical potential” of the nodes. The convergence rate of the underlying iterative scheme is independent of the size and structure of the network. We demonstrate the effectiveness of the proposed algorithm through simulations and a real hardware implementation.
  • DCS: Distributed Asynchronous Clock Synchronization in Delay Tolerant Networks

    Publication Year: 2012, Page(s): 491 - 504
    Cited by: Papers (14)

    In this paper, we propose a distributed asynchronous clock synchronization (DCS) protocol for Delay Tolerant Networks (DTNs). Unlike existing clock synchronization protocols, the proposed DCS protocol can achieve global clock synchronization among mobile nodes within the network over asynchronous and intermittent connections with long delays. Convergence of the clock values can be reached by compensating for clock errors using mutual relative clock information that is propagated in the network by contacted nodes. The level of clock accuracy is depreciated with respect to time in order to account for long delays between contact opportunities. Mathematical analysis and simulation results for various network scenarios are presented to demonstrate the convergence and performance of the DCS protocol. It is shown that the DCS protocol achieves faster clock convergence and, as a result, reduces the energy cost of neighbor discovery by half.
  • Determination of Wireless Networks Parameters through Parallel Hierarchical Support Vector Machines

    Publication Year: 2012, Page(s): 505 - 512
    Cited by: Papers (1)

    We consider the problems of 1) estimating the physical locations of nodes in an indoor wireless network, and 2) estimating the channel noise in a MIMO wireless network, since knowing these parameters is important to many tasks of a wireless network such as network management, event detection, location-based service, and routing. A hierarchical support vector machines (H-SVM) scheme is proposed with the following advantages. First, H-SVM offers an efficient, distributed evaluation procedure due to its hierarchical structure. Second, H-SVM can determine these parameters based only on simple network information, e.g., hop counts, without requiring particular ranging hardware. Third, the exact mean and variance of the estimation error introduced by H-SVM are derived, which are seldom addressed in previous works. Furthermore, we present a parallel learning algorithm to reduce the computation time required for the proposed H-SVM. Thanks to a quicker matrix diagonalization technique, our algorithm can reduce the traditional SVM learning complexity from O(n^3) to O(n^2), where n is the training sample size. Finally, the simulation results verify the validity and effectiveness of the proposed H-SVM with the parallel learning algorithm.
  • Distributed Throughput Optimization for ZigBee Cluster-Tree Networks

    Publication Year: 2012, Page(s): 513 - 520
    Cited by: Papers (5)

    ZigBee, a unique communication standard designed for low-rate wireless personal area networks, has extremely low complexity, cost, and power consumption for wireless connectivity in inexpensive, portable, and mobile devices. Among the well-known ZigBee topologies, ZigBee cluster-tree is especially suitable for low-power and low-cost wireless sensor networks because it supports power saving operations and light-weight routing. In a constructed wireless sensor network, the information about some area of interest may require further investigation such that more traffic will be generated. However, the restricted routing of a ZigBee cluster-tree network may not be able to provide sufficient bandwidth for the increased traffic load, so the additional information may not be delivered successfully. In this paper, we present an adoptive-parent-based framework for a ZigBee cluster-tree network to increase bandwidth utilization without generating any extra message exchange. To optimize the throughput in the framework, we model the process as a vertex-constraint maximum flow problem, and develop a distributed algorithm that is fully compatible with the ZigBee standard. The optimality and convergence property of the algorithm are proved theoretically. Finally, the results of simulation experiments demonstrate the significant performance improvement achieved by the proposed framework and algorithm over existing approaches.
  • Dynamic Fractional Resource Scheduling versus Batch Scheduling

    Publication Year: 2012, Page(s): 521 - 529

    We propose a novel job scheduling approach for homogeneous cluster computing platforms. Its key feature is the use of virtual machine technology to share fractional node resources in a precise and controlled manner. Other VM-based scheduling approaches have focused primarily on technical issues or extensions to existing batch scheduling systems, while we take a more aggressive approach and seek to find heuristics that maximize an objective metric correlated with job performance. We derive absolute performance bounds and develop algorithms for the online nonclairvoyant version of our scheduling problem. We further evaluate these algorithms in simulation against both synthetic and real-world HPC workloads and compare our algorithms to standard batch scheduling approaches. We find that our approach improves over batch scheduling by orders of magnitude in terms of job stretch, while leading to comparable or better resource utilization. Our results demonstrate that virtualization technology coupled with lightweight online scheduling strategies can afford dramatic improvements in performance for executing HPC workloads.
  • Energy-Efficient Scheduling of Periodic Real-Time Tasks on Lightly Loaded Multicore Processors

    Publication Year: 2012, Page(s): 530 - 537
    Cited by: Papers (3)

    For lightly loaded multicore processors that contain more processing cores than running tasks and have dynamic voltage and frequency scaling capability, we address the energy-efficient scheduling of periodic real-time tasks. First, we introduce two energy-saving techniques for the lightly loaded multicore processors: exploiting overabundant cores for executing a task in parallel with a lower frequency and turning off power of rarely used cores. Next, we verify that if the two introduced techniques are supported, then the problem of minimizing energy consumption of real-time tasks while meeting their deadlines is NP-hard on a lightly loaded multicore processor. Finally, we propose a polynomial-time scheduling scheme that provides a near minimum-energy feasible schedule. The difference of energy consumption between the provided schedule and the minimum-energy schedule is limited. The scheme saves up to 64 percent of the processing core energy consumed by the previous scheme that executes each task on a separate core.
  • Equivalent Disk Allocations

    Publication Year: 2012, Page(s): 538 - 546
    Cited by: Papers (1)

    Declustering techniques reduce query response times through parallel I/O by distributing data among multiple devices. Except for a few cases, it is not possible to find declustering schemes that are optimal for all spatial range queries. As a result, most of the research on declustering has focused on finding schemes with low worst-case additive error. Number-theoretic declustering techniques provide low additive error and high threshold. In this paper, we investigate equivalent disk allocations and focus on number-theoretic declustering. Most of the number-theoretic disk allocations are equivalent and provide the same additive error and threshold. Investigation of equivalent allocations simplifies schemes to find allocations with desirable properties. By keeping one of the equivalent disk allocations, we can reduce the complexity of searching for good disk allocations under various criteria such as additive error and threshold. Using the proposed scheme, we were able to collect the most extensive experimental results on additive error and threshold in 2, 3, and 4 dimensions.
  • Exploiting Jamming-Caused Neighbor Changes for Jammer Localization

    Publication Year: 2012, Page(s): 547 - 555
    Cited by: Papers (5)

    Jamming attacks are especially harmful to the dependability of wireless communication. Finding the position of a jammer will enable the network to actively exploit a wide range of defense strategies. In this paper, we focus on developing mechanisms to localize a jammer by exploiting neighbor changes. We first conduct jamming effect analysis to examine how the communication range alters with the jammer's location and transmission power using the free-space model. Then, we show that a node's affected communication range can be estimated purely by examining its neighbor changes caused by jamming attacks, and thus we can estimate the jammer's location by solving a least-squares (LSQ) problem that exploits the changes of communication range. Compared with our previous iterative-search-based virtual force algorithm, our LSQ-based algorithm exhibits lower computational cost (i.e., one step instead of iterative searches) and higher localization accuracy. Furthermore, we analyze the localization challenges in real systems by building the log-normal shadowing model empirically and devising an adaptive LSQ-based algorithm to address those challenges. The extensive evaluation shows that the adaptive LSQ-based algorithm can effectively estimate the location of the jammer even in a highly complex propagation environment.
  • Improving End-to-End Routing Performance of Greedy Forwarding in Sensor Networks

    Publication Year: 2012, Page(s): 556 - 563
    Cited by: Papers (4)

    Greedy forwarding is a simple yet efficient technique employed by many routing protocols. It is ideal for realizing point-to-point routing in wireless sensor networks because packets can be delivered by maintaining only a small set of neighbors' information regardless of network size. It has been successfully employed by geographic routing, which assumes that a packet can be moved closer to the destination in the network topology if it is forwarded geographically closer to the destination in the physical space. This assumption, however, may lead packets to a local minimum where no neighbors of the sender are closer to the destination, or to low-quality routes that comprise long-distance hops with low packet reception ratios. To address the local minimum problem, we propose a topology aware routing (TAR) protocol that efficiently encodes a network topology into a low-dimensional virtual coordinate space where hop distances between pairwise nodes are preserved. Based on precise hop distance comparison, TAR can assist greedy forwarding to find the right neighbor that is one hop closer to the destination and achieve a high success ratio of packet delivery without location information. Further, we improve the routing quality by embedding a network topology based on the metric of expected transmission count (ETX). ETX embedding accurately encodes both a network's topological structure and channel quality into nodes' small virtual coordinates, which helps greedy forwarding guide a packet along the optimal path that has the fewest transmissions. We evaluate our approaches through both simulations and experiments, showing that routing performance is improved in terms of routing success ratio and routing cost.
  • Optimally Maximizing Iteration-Level Loop Parallelism

    Publication Year: 2012, Page(s): 564 - 572

    Loops are the main source of parallelism in many applications. This paper solves the open problem of extracting the maximal number of iterations from a loop to run in parallel on chip multiprocessors. Our algorithm solves it optimally by migrating the weights of parallelism-inhibiting dependences on dependence cycles in two phases. First, we model dependence migration with retiming and formulate this classic loop parallelization problem as a graph optimization problem, i.e., one of finding retiming values for its nodes so that the minimum nonzero edge weight in the graph is maximized. We present our algorithm in three stages, each built incrementally on the preceding one. Second, the optimal code for a loop is generated from the retimed graph of the loop found in the first phase. We demonstrate the effectiveness of our optimal algorithm by comparing it with a number of representative nonoptimal algorithms using a set of benchmarks frequently used in prior work and a set of graphs generated by TGFF.
  • Advertisement - Computer Magazine Now Available in Digital Format

    Publication Year: 2012, Page(s): 573
    Freely Available from IEEE
  • Take the CS Library wherever you go! [advertisement]

    Publication Year: 2012, Page(s): 574
    Freely Available from IEEE
  • IEEE Computer Society OnlinePlus Video Tutorial

    Publication Year: 2012, Page(s): 575
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology