Parallel and Distributed Systems, IEEE Transactions on

Issue 4 • Date Apr 1994

  • Analysis of processor allocation in multiprogrammed, distributed-memory parallel processing systems

    Page(s): 401 - 420

    A main objective of scheduling independent jobs composed of multiple sequential tasks in shared-memory and distributed-memory multiprocessor computer systems is the assignment of these tasks to processors in a manner that ensures efficient operation of the system. Achieving this objective requires the analysis of a fundamental tradeoff between maximizing parallel execution, suggesting that the tasks of a job be spread across all system processors, and minimizing synchronization and communication overheads, suggesting that the job's tasks be executed on a single processor. The authors consider a class of scheduling policies that represent the essential aspects of this processor allocation tradeoff, and model the system as a distributed fork-join queueing system. They derive an approximation for the expected job response time, which includes the important effects of various parallel processing overheads (such as task synchronization and communication) induced by the processor allocation policy.

  • Optimal processor assignment for a class of pipelined computations

    Page(s): 439 - 445

    The availability of large-scale multitasked parallel architectures introduces the following processor assignment problem. We are given a long sequence of data sets, each of which is to undergo processing by a collection of tasks whose intertask data dependencies form a series-parallel partial order. Each individual task is potentially parallelizable, with a known experimentally determined execution signature. Recognizing that data sets can be pipelined through the task structure, the problem is to find a “good” assignment of processors to tasks. Two objectives interest us: minimal response time per data set, given a throughput requirement, and maximal throughput, given a response time requirement. Our approach is to decompose a series-parallel task system into its essential “serial” and “parallel” components; our problem admits the independent solution and recomposition of each such component. We provide algorithms for the series analysis, and use an algorithm due to Krishnamurti and Ma for the parallel analysis. For a p processor system and a series-parallel precedence graph with n constituent tasks, we give an O(np^2) algorithm that finds the optimal assignment (over a broad class of assignments) for the response time optimization problem; we find the assignment optimizing the constrained throughput in O(np^2 log p) time. These techniques are applied to a task system in computer vision.
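
As a rough illustration of the series case only, the response-time objective can be sketched as a dynamic program over a chain of tasks. This is a simplified sketch under stated assumptions, not the authors' algorithm: it assumes a purely serial chain, that response time is the sum of the per-task execution times, and that the throughput requirement reduces to a cap on each task's execution time. The names `exec_time`, `max_stage_time`, and `min_response_time` are hypothetical. The triple loop runs in O(np^2) time, the same order as the bound quoted in the abstract.

```python
import math


def min_response_time(exec_time, p, max_stage_time):
    """Split at most p processors among a chain of tasks.

    exec_time[i][k - 1] is the (assumed known) time of task i on k processors.
    max_stage_time caps every task's time, standing in for a throughput
    requirement. Returns (total_time, allocation) or None if infeasible.
    """
    n = len(exec_time)
    inf = math.inf
    # best[i][q]: minimum total time of the first i tasks using at most q processors.
    best = [[inf] * (p + 1) for _ in range(n + 1)]
    choice = [[0] * (p + 1) for _ in range(n + 1)]
    best[0] = [0.0] * (p + 1)  # zero tasks cost nothing for any budget

    for i in range(1, n + 1):
        signature = exec_time[i - 1]
        for q in range(1, p + 1):
            for k in range(1, min(q, len(signature)) + 1):
                t = signature[k - 1]
                if t > max_stage_time:
                    continue  # this stage would be too slow for the pipeline
                candidate = best[i - 1][q - k] + t
                if candidate < best[i][q]:
                    best[i][q] = candidate
                    choice[i][q] = k

    if best[n][p] == inf:
        return None

    # Walk the recorded choices backwards to recover the allocation.
    allocation, q = [], p
    for i in range(n, 0, -1):
        k = choice[i][q]
        allocation.append(k)
        q -= k
    return best[n][p], allocation[::-1]
```

For example, with two tasks whose times on 1, 2, and 3 processors are `[8, 4, 3]` and `[6, 3, 2]`, four processors, and a stage cap of 5, the only feasible split gives two processors to each task.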

  • A pairwise substitutional fault tolerance technique for the cube-connected cycles architecture

    Page(s): 433 - 438

    With all of the salient features of hypercubes, the cube-connected cycles (CCC) structure is an attractive parallel computation network suited for very large scale integration (VLSI) implementation because of its layout regularity. Unfortunately, the classical CCC structure tends to suffer from considerable performance degradation in the presence of faults. The authors deal with a fault-tolerant CCC structure obtained by incorporating a spare PE in each cycle and by adding extra links among PE's to realize dimensional substitutes for failed PE's in the immediate lower dimension. A unique feature of this design lies in that a faulty PE and its laterally connected PE are always replaced at the same time by their immediate vertical successor pair, achieving pairwise substitution to elegantly maintain the rigid full CCC structure after faulty PE's arise. The proposed structure improves reliability substantially without incurring large overhead in layout area. This design is compared with earlier fault-tolerant CCC designs in terms of normalized reliability, which takes area overhead into account. An extension to this fault-tolerant structure is also discussed.

  • Allocating tree structured programs in a distributed system with uniform communication costs

    Page(s): 445 - 448

    Studies the complexity of the problem of allocating m modules to n processors in a distributed system to minimize total communication and execution costs. When the communication graph is a tree, Bokhari has shown that the optimum allocation can be determined in O(mn^2) time. Recently, this result has been generalized by Fernandez-Baca, who has proposed an allocation algorithm that runs in O(mn^(k+1)) time when the communication graph is a partial k-tree. The author shows that in the case where communication costs are uniform, the module allocation problem can be solved in O(mn) time if the communication graph is a tree. This algorithm is asymptotically optimum.
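
One way to see why an O(mn) bound is plausible for trees is a bottom-up dynamic program. The sketch below is an illustration under stated assumptions, not necessarily the author's algorithm: "uniform" is taken to mean that two communicating modules pay a fixed non-negative cost whenever they sit on different processors, regardless of which processors those are. The names `allocate_tree`, `children`, `exec_cost`, and `comm_cost` are hypothetical.

```python
def allocate_tree(children, exec_cost, comm_cost, root=0):
    """Minimum total execution plus communication cost for a tree of modules.

    children[v]        -> list of child modules of v (missing key means leaf)
    exec_cost[v][p]    -> cost of running module v on processor p
    comm_cost[(v, c)]  -> cost paid if v and its child c are on different
                          processors (assumed uniform and non-negative)
    """
    n = len(exec_cost[root])

    # Pre-order traversal; reversing it visits children before parents.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    cost = {}  # cost[v][p]: best cost of v's subtree with v placed on p
    for v in reversed(order):
        row = list(exec_cost[v])
        for c in children.get(v, []):
            # With uniform costs, the best "different processor" option is the
            # child's overall minimum plus the edge cost, computed once per
            # edge. This keeps the work at O(n) per edge, so O(mn) in total.
            split = min(cost[c]) + comm_cost[(v, c)]
            for p in range(n):
                row[p] += min(cost[c][p], split)
        cost[v] = row
    return min(cost[root])
```

Computing `split` once per edge replaces the inner minimization over all processor pairs that a general tree dynamic program would need.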

  • Improved lower bounds on the reliability of hypercube architectures

    Page(s): 364 - 378

    The hypercube topology, also known as the Boolean n-cube, has recently been used for multiprocessing systems. The paper considers two structural-reliability models, namely, terminal reliability (TR) and network reliability (NR), for the hypercube. Terminal (network) reliability is defined as the probability that there exists a working path connecting two (all) nodes. There are no known polynomial time algorithms for exact computation of TR or NR for the hypercube. Thus, lower-bound computation is a better alternative, because it is more efficient computationally, and the system will be at least as reliable as the bound. The paper presents algorithms to compute lower bounds on TR and NR for the hypercube considering node and/or link failures. These algorithms provide tighter bounds for both TR and NR than known results and run in time polynomial in the cube dimension n, specifically, within time O(n^2).

  • Using processor affinity in loop scheduling on shared-memory multiprocessors

    Page(s): 379 - 400

    Loops are the single largest source of parallelism in many applications. One way to exploit this parallelism is to execute loop iterations in parallel on different processors. Previous approaches to loop scheduling attempted to achieve the minimum completion time by distributing the workload as evenly as possible while minimizing the number of synchronization operations required. The authors consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to nonlocal data. They show that traditional algorithms for loop scheduling, which ignore the location of data when assigning iterations to processors, incur a significant performance penalty on modern shared-memory multiprocessors. They propose a new loop scheduling algorithm that attempts to simultaneously balance the workload, minimize synchronization, and co-locate loop iterations with the necessary data. They compare the performance of this new algorithm to other known algorithms by using five representative kernel programs on a Silicon Graphics multiprocessor workstation, a BBN Butterfly, a Sequent Symmetry, and a KSR-1, and show that the new algorithm offers substantial performance improvements, up to a factor of 4 in some cases. The authors conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory speeds.

  • Structuring fault-tolerant object systems for modularity in a distributed environment

    Page(s): 421 - 432

    The object-oriented approach to system structuring has found widespread acceptance among designers and developers of robust computing systems. The authors propose a system structure for distributed programming systems that support persistent objects and describe how properties such as persistence and recoverability can be implemented. The proposed structure is modular, permitting easy exploitation of any distributed computing facilities provided by the underlying system. An existing system constructed according to the principles espoused here is examined to illustrate the practical utility of the proposed approach to system structuring.

  • Partitioning message patterns for bundled omega networks

    Page(s): 353 - 363

    Considers a strategy for dealing with communication conflicts in omega networks. Specifically, the authors consider the problem of partitioning a set of conflicting messages into a minimum number of subsets, called rounds, each free of communication conflicts. In addition to standard omega networks, they consider this problem for a more general class of networks called bundled omega networks, where interconnection links in the network are replaced by bundles of wires. Although the partitioning problem has previously been considered in the literature, its computational complexity has remained open. The authors show that for a number of cases, the problem is NP-complete, but for certain special cases, it is solvable in polynomial time. In addition, they present a class of distributed, on-line heuristics for the problem. Finally, they give a lower bound of Ω(log N) on the performance ratio for one of these heuristics.
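
To make the partitioning problem concrete, the sketch below places messages into rounds with a centralized first-fit rule. This is a generic greedy heuristic shown for illustration, not one of the distributed on-line heuristics from the paper, and it does not guarantee a minimum number of rounds. It assumes standard destination-tag routing in an omega network with N = 2^n inputs, and that a bundle of width `bundle` can carry that many messages on one link in the same round. The names `link_path` and `first_fit_rounds` are hypothetical.

```python
from collections import Counter


def link_path(src, dst, n):
    """Links used by a message src -> dst in an omega network with 2**n inputs.

    After stage k, destination-tag routing leaves the message on the link whose
    address is the low (n - k) bits of src followed by the top k bits of dst.
    """
    mask = (1 << n) - 1
    return [(k, ((src << k) | (dst >> (n - k))) & mask) for k in range(1, n + 1)]


def first_fit_rounds(messages, n, bundle=1):
    """Greedily place each (src, dst) message in the first round that has room."""
    rounds = []  # each entry: (messages in the round, per-link load counter)
    for src, dst in messages:
        path = link_path(src, dst, n)
        for members, load in rounds:
            if all(load[link] < bundle for link in path):
                break  # every link on the path still has spare capacity
        else:
            members, load = [], Counter()
            rounds.append((members, load))
        members.append((src, dst))
        load.update(path)
    return [members for members, _ in rounds]
```

In an 8-input network, the messages 0 to 0 and 4 to 1 share a first-stage link, so they fall into separate rounds when `bundle=1` and fit into a single round when `bundle=2`.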

  • Reconfiguration with time division multiplexed MIN's for multiprocessor communications

    Page(s): 337 - 352

    Time division multiplexed multistage interconnection networks (TDM-MIN's) are proposed for multiprocessor communications. Connections required by an application are partitioned into a number of subsets, called mappings, such that connections in each mapping can be established in an MIN without conflict. Switch settings for establishing connections in each mapping are determined and stored in shift registers. By repeatedly changing switch settings, connections in each mapping are established for a time slot in a round-robin fashion. Thus, all connections required by an application may be established in an MIN in a time division multiplexed way. TDM-MIN's can emulate a completely connected network using N time slots. They can also emulate regular networks such as rings, meshes, cube-connected-cycles (CCC), binary trees, and n-dimensional hypercubes using 2, 4, 3, 4, and n time slots, respectively. The problem of partitioning an arbitrary set of requests into a minimal number of mappings is NP-hard. Simple heuristic algorithms are presented and their performances are shown to be close to optimal. The flexibility of TDM-MIN's allows for the support of run-time requests through dynamic reconfigurations. The techniques are especially suitable for hybrid electro-optical systems with optical interconnects.

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology