IEEE Transactions on Parallel and Distributed Systems

Issue 12 • December 1995

  • Processor mapping techniques toward efficient data redistribution

    Page(s): 1234 - 1247

    Run-time data redistribution can enhance algorithm performance in distributed-memory machines. Explicit redistribution of data can be performed between algorithm phases when a different data decomposition is expected to deliver increased performance for a subsequent phase of computation. Redistribution, however, represents increased program overhead, as algorithm computation is suspended while data are exchanged among processor memories. In this paper, we present a technique that minimizes the amount of data exchanged for BLOCK to CYCLIC(c) (or vice versa) redistributions of an arbitrary number of dimensions. Preserving the semantics of the target (destination) distribution pattern, the technique manipulates the data-to-logical-processor mapping of the target pattern. When implemented on an IBM SP, the mapping technique demonstrates redistribution performance improvements of approximately 40% over traditional data-to-processor mapping. Relative to the traditional mapping technique, the proposed method also affords greater flexibility in specifying precisely which data elements are redistributed and which remain on-processor.
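
    The abstract describes the mapping technique only at a high level. Purely as an illustration of what a BLOCK-to-CYCLIC(c) redistribution moves, the sketch below (a hypothetical one-dimensional example with made-up helper names) computes element ownership under both distributions and counts how many elements a naive redistribution would exchange; the paper's remapping of data to logical processors in the target pattern is aimed at reducing exactly this kind of exchange, but is not reproduced here.

    ```python
    # Hypothetical 1-D illustration of BLOCK vs. CYCLIC(c) ownership.
    # The paper's logical-processor remapping technique is NOT implemented here.

    def block_owner(i, n, p):
        """Owner of element i when n elements are BLOCK-distributed over p processors."""
        block = (n + p - 1) // p          # ceil(n / p) elements per processor
        return i // block

    def cyclic_owner(i, c, p):
        """Owner of element i under a CYCLIC(c) distribution over p processors."""
        return (i // c) % p

    def elements_moved(n, c, p):
        """Elements whose owner changes in a naive BLOCK -> CYCLIC(c) redistribution."""
        return sum(1 for i in range(n) if block_owner(i, n, p) != cyclic_owner(i, c, p))

    if __name__ == "__main__":
        n, c, p = 64, 2, 4
        print(f"{elements_moved(n, c, p)} of {n} elements change owners")
    ```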

  • Sorting n² numbers on n×n meshes

    Page(s): 1221 - 1225

    We show that by folding the data from an n×n mesh onto an n×(n/k) submesh, sorting on the submesh, and finally unfolding back onto the entire n×n mesh, it is possible to sort on bidirectional and strict unidirectional meshes using a number of routing steps that is very close to the distance lower bound for these architectures.
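
    The routing schedules that achieve the near-optimal step counts are not given in the abstract. The sketch below is only a toy, sequential model of the fold / sort-on-submesh / unfold structure: the submesh sort is simulated by an ordinary in-memory sort, so it says nothing about routing steps, and the function names and choice of k are illustrative assumptions.

    ```python
    import random

    def fold(mesh, k):
        """Fold an n x n mesh onto an n x (n/k) submesh: each submesh cell
        collects the k keys from its k horizontally adjacent source cells."""
        n = len(mesh)
        return [[[mesh[r][c * k + j] for j in range(k)] for c in range(n // k)]
                for r in range(n)]

    def sort_submesh(folded):
        """Stand-in for the submesh sort: gather all keys and sort them.
        On the real machine this step is a mesh sort on n*(n/k) processors."""
        return sorted(key for row in folded for cell in row for key in cell)

    def unfold(keys, n):
        """Unfold the sorted sequence back onto the full n x n mesh, row-major."""
        it = iter(keys)
        return [[next(it) for _ in range(n)] for _ in range(n)]

    if __name__ == "__main__":
        n, k = 8, 2
        mesh = [[random.randint(0, 99) for _ in range(n)] for _ in range(n)]
        result = unfold(sort_submesh(fold(mesh, k)), n)
        assert all(result[r][c] <= result[r][c + 1] for r in range(n) for c in range(n - 1))
        assert all(result[r][-1] <= result[r + 1][0] for r in range(n - 1))
        print("row-major sorted:", result[0][:4], "...", result[-1][-4:])
    ```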

  • Generalized algorithms for systematic synthesis of Branch-and-Combine clock networks for meshes, tori, and hypercubes

    Page(s): 1283 - 1300

    Branch-and-Combine (BaC) clock distribution has recently been introduced. The most interesting aspect of the new scheme is its ability to bound skew by a constant irrespective of network size. In this paper, we introduce algorithms for the systematic synthesis of BaC networks for clocking meshes, tori, and hypercubes of different dimensionalities. For meshes, our approach relies on tiling techniques. We start by identifying basic proper tiles satisfying certain criteria, and we define a set of valid transformations on tiles; by applying an appropriate sequence of transformations to a basic proper tile, one can synthesize a valid BaC network. We formally introduce methods and procedures for applying these steps to systematically construct different valid BaC network designs for 2D and 3D meshes. To construct BaC networks for clocking hypercubes of any dimensionality, we describe a formal methodology based on an approach called replication, which builds larger hypercube clocking networks from smaller ones. We combine the techniques for 2D and 3D meshes with the replication techniques to formulate a methodology applicable to meshes and tori of dimensionality greater than three. We provide proofs of correctness for the algorithms we introduce. In addition, we formally define an optimality criterion based on link costs, which is used to check the optimality of the synthesized network designs. In the case of meshes, we show that the majority of synthesized networks are optimal with respect to our criterion; for the suboptimal networks, we describe a procedure for identifying and removing unnecessary (redundant) links. The procedure is guaranteed to optimize the network without changing its behavioral parameters.

  • Dynamic task allocation models for large distributed computing systems

    Page(s): 1301 - 1315

    Dynamic task allocation for distributed computing systems (DCSs) is an important goal for engineering applications. The purpose of dynamic task allocation is to increase system throughput in a dynamic environment, which can be done by balancing the utilization of computing resources and minimizing communication between processors at run time. In this paper, we propose two dynamic task allocation models: 1) the clustering simulated annealing model (CSAM), and 2) the mean field annealing model (MFAM). Both models combine characteristics of statistical and deterministic approaches: they provide the rapid convergence of deterministic approaches while preserving the solution quality afforded by simulated annealing. In simulation, the CSAM and MFAM produce stable, balanced systems in 50% and 10%, respectively, of the convergence time needed by simulated annealing. These results are important in that they demonstrate the feasibility of applying statistically based task allocation models to large DCSs in a dynamic environment. Solutions of these models depend on the annealing process rather than on the structure of the input data, offering the possibility of obtaining better solutions with more efficient computing hardware.
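
    Neither the CSAM nor the MFAM is specified in this abstract, so the sketch below is only a plain simulated-annealing baseline for the objective the abstract describes: balance processor utilization while minimizing interprocessor communication. The cost weighting, cooling schedule, and workload are all made-up assumptions for illustration.

    ```python
    import math
    import random

    def cost(assign, n_procs, load, comm, weight=1.0):
        """Toy objective: processor load imbalance plus weighted interprocessor traffic."""
        per_proc = [0.0] * n_procs
        for task, proc in enumerate(assign):
            per_proc[proc] += load[task]
        imbalance = max(per_proc) - min(per_proc)
        remote = sum(vol for (a, b), vol in comm.items() if assign[a] != assign[b])
        return imbalance + weight * remote

    def anneal(n_tasks, n_procs, load, comm, t0=10.0, alpha=0.95, sweeps=200):
        """Plain simulated annealing over task-to-processor assignments."""
        assign = [random.randrange(n_procs) for _ in range(n_tasks)]
        current = cost(assign, n_procs, load, comm)
        t = t0
        for _ in range(sweeps):
            for _ in range(n_tasks):
                task, new_proc = random.randrange(n_tasks), random.randrange(n_procs)
                old_proc = assign[task]
                if new_proc == old_proc:
                    continue
                assign[task] = new_proc
                candidate = cost(assign, n_procs, load, comm)
                if candidate <= current or random.random() < math.exp((current - candidate) / t):
                    current = candidate          # accept the move
                else:
                    assign[task] = old_proc      # reject and undo
            t *= alpha                           # geometric cooling schedule
        return assign, current

    if __name__ == "__main__":
        random.seed(1)
        n_tasks, n_procs = 20, 4
        load = [random.uniform(1.0, 5.0) for _ in range(n_tasks)]
        comm = {(i, (i + 1) % n_tasks): random.uniform(0.0, 2.0) for i in range(n_tasks)}
        assignment, final_cost = anneal(n_tasks, n_procs, load, comm)
        print("final cost:", round(final_cost, 2))
    ```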

  • A fault-tolerant algorithm for replicated data management

    Page(s): 1271 - 1282

    We examine the tradeoff between message overhead and data availability that arises in the design of fault-tolerant algorithms for replicated data management in distributed systems. We propose a property called asymptotically high resiliency, which is useful for evaluating the fault tolerance of replica control algorithms and of distributed mutual exclusion algorithms. We present a new algorithm for replica control that can be tailored (through a design parameter) to achieve the desired balance between low message overhead and high data availability. Further, we show that for a message overhead of O(√(N log N)), our algorithm can achieve asymptotically high resiliency.
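
    The paper's replica control algorithm and its resiliency analysis are not reproduced here. As a generic illustration of the availability side of the tradeoff it studies, the sketch below computes the probability that a quorum can be assembled when replicas fail independently; it uses a plain majority quorum rather than the paper's parameterized construction, and the per-replica availability p is an assumption. "Asymptotically high resiliency" roughly corresponds to this probability approaching 1 as N grows.

    ```python
    from math import comb

    def quorum_availability(n, q, p):
        """Probability that at least q of n replicas are up when each replica is up
        independently with probability p, i.e., that some quorum of size q exists."""
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(q, n + 1))

    if __name__ == "__main__":
        p = 0.9                                   # per-replica availability (assumed)
        for n in (5, 25, 125):
            q = n // 2 + 1                        # simple majority quorum
            print(f"N={n:4d}  P(quorum available) = {quorum_availability(n, q, p):.6f}")
    ```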

  • A parallel simulated annealing algorithm with low communication overhead

    Page(s): 1226 - 1233

    In this paper, we propose a parallel simulated annealing algorithm based on the technique presented by Witte et al. (1991), but with low communication overhead. The performance of the proposed algorithm is significantly better than that of the method of Witte et al., particularly for optimization problems where the time required to communicate a solution is comparable to the time required to evaluate it. The efficiency of the technique is demonstrated on two case studies with good results.

  • Resource-constrained software pipelining

    Page(s): 1248 - 1270

    This paper presents a software pipelining algorithm for the automatic extraction of fine-grain parallelism from general loops. The algorithm accounts for machine resource constraints in a way that integrates the management of those constraints smoothly with software pipelining. Furthermore, generality in the software pipelining algorithm is not sacrificed to handle resource constraints, and scheduling choices are made with truly global information. Proofs of correctness and the results of experiments with an implementation are also presented.
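
    The paper's scheduling algorithm itself is not summarized in the abstract. The sketch below only computes the standard lower bounds that any resource-constrained software pipeline must respect: the resource-constrained minimum initiation interval and the recurrence-constrained one. These bound definitions are textbook material, not the paper's method, and the machine description is hypothetical.

    ```python
    from math import ceil

    def res_mii(op_resource_use, resource_count):
        """Resource-constrained bound: for each resource class, the total uses per
        loop iteration divided by the number of available units, rounded up."""
        totals = {}
        for uses in op_resource_use.values():
            for res, n in uses.items():
                totals[res] = totals.get(res, 0) + n
        return max(ceil(totals[r] / resource_count[r]) for r in totals)

    def rec_mii(cycles):
        """Recurrence-constrained bound: for each dependence cycle, the total latency
        around the cycle divided by its total loop-carried distance, rounded up."""
        return max(ceil(latency / distance) for latency, distance in cycles)

    if __name__ == "__main__":
        # Hypothetical loop body: two adds and one multiply per iteration,
        # scheduled on a machine with one adder and one multiplier.
        ops = {"a1": {"adder": 1}, "a2": {"adder": 1}, "m1": {"mult": 1}}
        units = {"adder": 1, "mult": 1}
        cycles = [(5, 2)]   # one recurrence: 5 cycles of latency over a distance of 2
        print("minimum initiation interval >=", max(res_mii(ops, units), rec_mii(cycles)))
    ```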

  • Synchronous bandwidth allocation in FDDI networks

    Page(s): 1332 - 1338

    It is well known that an FDDI token ring network provides a guaranteed throughput for synchronous messages and a bounded medium access delay for each node/station. However, this fact alone cannot effectively support many real-time applications that require the timely delivery of each critical message, because FDDI guarantees a medium access delay bound to nodes, not to the messages themselves. Message delivery delays may exceed the medium access delay bound even if a node transmits synchronous messages at a rate no greater than the guaranteed throughput. We address this problem by developing a synchronous bandwidth allocation (SBA) scheme which calculates the synchronous bandwidth necessary for each application to satisfy its message delivery delay requirement. The result obtained in this paper is essential for the effective use of FDDI token ring networks in supporting real-time communication such as digital video/audio transmission and distributed control/monitoring.
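
    The abstract does not state which allocation rule the paper derives. The sketch below shows one classical baseline, normalized proportional allocation of the usable target token rotation time (TTRT), together with the FDDI protocol constraint and a conservative, approximate delay check; treat the stream parameters and the delay bound as assumptions for illustration, not as the paper's SBA scheme.

    ```python
    def proportional_sba(streams, ttrt, overhead):
        """Normalized-proportional allocation (a classical baseline, not necessarily
        this paper's scheme): give stream i a share of the usable TTRT in proportion
        to its utilization C_i / P_i."""
        total_util = sum(c / p for c, p in streams)
        usable = ttrt - overhead
        return [usable * (c / p) / total_util for c, p in streams]

    def protocol_constraint_ok(alloc, ttrt, overhead):
        """FDDI protocol constraint: allocated synchronous time plus ring overhead
        must fit within one target token rotation time."""
        return sum(alloc) + overhead <= ttrt

    def deadline_ok(alloc, streams, ttrt):
        """Conservative check: within a deadline P_i, node i is guaranteed roughly
        (floor(P_i / TTRT) - 1) token visits worth of synchronous time H_i."""
        return all((p // ttrt - 1) * h >= c for h, (c, p) in zip(alloc, streams))

    if __name__ == "__main__":
        # (C_i, P_i): message transmission time and period/deadline in ms (hypothetical).
        streams = [(1.0, 20.0), (2.0, 50.0), (0.5, 10.0)]
        ttrt, overhead = 4.0, 0.5
        alloc = proportional_sba(streams, ttrt, overhead)
        print([round(h, 3) for h in alloc],
              protocol_constraint_ok(alloc, ttrt, overhead),
              deadline_ok(alloc, streams, ttrt))
    ```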

  • Comparative modeling and evaluation of CC-NUMA and COMA on hierarchical ring architectures

    Page(s): 1316 - 1331

    Parallel computing performance on scalable shared-memory architectures is affected by the structure of the interconnection network linking processors to memory modules and by the efficiency of the memory/cache management system. Cache Coherent Nonuniform Memory Access (CC-NUMA) and Cache Only Memory Access (COMA) are two effective memory systems, and the hierarchical ring structure is an efficient interconnection network to implement in hardware. This paper focuses on comparative performance modeling and evaluation of CC-NUMA and COMA on a hierarchical-ring shared-memory architecture. Analytical models of the two memory systems are presented for comparative evaluation, and intensive performance measurements of data migration have been conducted on the KSR-1, a COMA hierarchical-ring shared-memory machine. The experimental results support the analytical models, and we present practical observations and comparisons of the two cache-coherent memory systems. Our analytical and experimental results show that a COMA system balances the workload well; however, the overhead of frequent data movement may offset the gains obtained from the improved load balance. We believe these performance results can be further generalized to the two memory systems on other hierarchical network architectures. Although a CC-NUMA system may not automatically balance the load at the system level, it gives the user the option of explicitly managing data locality for a possible performance improvement.


Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology