
Parallel and Distributed Systems, IEEE Transactions on

Issue 7 • Date July 1994

  • Fault-tolerant algorithms for fair interprocess synchronization

    Publication Year: 1994 , Page(s): 737 - 748
    Cited by:  Papers (2)

    The implementation of nondeterministic pairwise synchronous communication among a set of asynchronous processes is modeled as the binary interaction problem. The paper describes an algorithm for this problem that satisfies a strong fairness property that guarantees freedom from process starvation. This is the first algorithm for binary interactions with strong fairness whose message cost and response time are independent of the total number of processes in the system. The paper also describes how the fair algorithm may be extended to tolerate detectable fail-stop failures. Finally, we show how any solution to the dining philosophers problem can be embedded to design a fair algorithm for binary interactions. In particular, this embedding is used to derive a fair algorithm that can cope with undetectable fail-stop failures.

  • A framework for mapping periodic real-time applications on multicomputers

    Publication Year: 1994 , Page(s): 778 - 784
    Cited by:  Papers (5)

    This short paper presents a framework for periodic execution of task-flow graphs that enables schedulability analysis of the communication requirements. The analysis segments messages, assigns the segments to specific links and time intervals, and orders them within the intervals to generate node switching schedules that provide contention-free message routing at run-time. The analysis is also used to integrate task allocation with message routing using a contention-based objective function. The usefulness of the proposed scheme in ensuring guaranteed communication performance is demonstrated by an example.

  • Spoken language recognition on a DSP array processor

    Publication Year: 1994 , Page(s): 697 - 703
    Cited by:  Papers (1)  |  Patents (9)

    A new architecture is presented to support the general class of real-time large-vocabulary speaker-independent continuous speech recognizers incorporating language models. Many such recognizers require multiple high-performance central processing units (CPUs) as well as high interprocessor communication bandwidth. This array processor provides a peak CPU performance of 2.56 giga-floating point operations per second (GFLOPS) as well as a high-speed communication network. To utilize these resources efficiently, algorithms were devised for partitioning speech models for mapping into the array processor. Also, a novel scheme is presented for a functional partitioning of the speech recognizer computations. The recognizer is functionally partitioned into six stages, namely, the linear predictive coding (LPC) based feature extractor, mixture probability computer, (phone) state probability computer, word probability computer, phrase probability computer, and traceback computer. Each of these stages is further subdivided as many times as necessary to fit the individual processing elements (PEs). The functional stages are pipelined and synchronized with the frame rate of the incoming speech signal. This partitioning also allows a multistage stack decoder to be implemented for reduction of computation.

  • Design and evaluation of effective load sharing in distributed real-time systems

    Publication Year: 1994 , Page(s): 704 - 719
    Cited by:  Papers (20)

    In a distributed real-time system, uneven task arrivals temporarily overload some nodes and leave others idle or underloaded. Consequently, some tasks may miss their deadlines even if the overall system has the capacity to meet the deadlines of all tasks. An effective load-sharing (LS) scheme is proposed as a solution to this problem. Upon arrival of a task at a node, the node determines whether it can complete the task in time under the minimum-laxity first-served policy. If the task cannot be guaranteed, or if guarantees of some other tasks are to be violated as a result of the addition of this task to the existing schedule, the node looks up the list of loss-minimizing decisions and determines the best node among a set of nodes in its physical proximity, called its buddy set, to which the task(s) may be transferred. This list of decisions is periodically updated using Bayesian decision analysis and prior/posterior state distributions. These probability distributions are derived from the information collected via time-stamped state-region change broadcasts within each buddy set. By characterizing the inconsistency between a node's “observed” state and the corresponding true state with prior and posterior distributions, the node can first estimate the states of other nodes, and then use them to reduce the probability of transferring a task to an “incapable” node. Moreover, the use of prior and posterior distributions and Bayesian analysis has made the proposed scheme robust to the variation of design parameters that usually require fine-tuning for adaptive LS. The performance of the proposed scheme is evaluated via simulation, along with five other schemes: no LS, LS with state probing, LS with random selection, LS with focused addressing, and perfect LS.
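
The local guarantee test described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes non-preemptive execution, orders tasks once by their laxity at arrival time, and leaves out the Bayesian buddy-set selection entirely. The names `Task`, `laxity`, and `violated_tasks` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    exec_time: float  # remaining execution time (assumed known)
    deadline: float   # absolute deadline


def laxity(task, now):
    """Slack left if the task started right now."""
    return task.deadline - now - task.exec_time


def violated_tasks(queue, new_task, now=0.0):
    """Names of tasks that would miss their deadline if new_task were accepted.

    Assumption: tasks run back to back, without preemption, in order of
    increasing laxity computed at time `now`.
    """
    schedule = sorted(queue + [new_task], key=lambda t: laxity(t, now))
    clock = now
    missed = []
    for task in schedule:
        clock += task.exec_time
        if clock > task.deadline:
            missed.append(task.name)
    return missed
```

An empty result means the node can guarantee the new task locally. A non-empty result corresponds to the case where the node would consult its list of loss-minimizing decisions and transfer work to a buddy.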

  • Reading many variables in one atomic operation: solutions with linear or sublinear complexity

    Publication Year: 1994 , Page(s): 688 - 696
    Cited by:  Papers (1)

    We address the problem of reading several variables (components) $X_1, \ldots, X_c$, all in one atomic operation, by only one process, called the reader, while each of these variables is being written by a set of writers. All operations (i.e., both reads and writes) are assumed to be totally asynchronous and wait-free. For this problem, only algorithms that require at best quadratic time and space complexity can be derived from the existing literature. (The time complexity of a construction is the number of suboperations of a high-level operation and its space complexity is the number of atomic shared variables it needs.) In this paper, we provide a deterministic protocol that has linear (in the number of processes) space complexity, linear time complexity for a read operation, and constant time complexity for a write. Our solution does not make use of time-stamps. Rather, it is the memory location where a write writes that differentiates it from the other writes. Also, introducing randomness in the location where the reader gets the value that it returns, we get a conceptually very simple probabilistic algorithm. This algorithm has an overwhelmingly small, controllable probability of error. Its space complexity, and also the time complexity of a read operation, are sublinear. The time complexity of a write is constant. On the other hand, under the Archimedean time assumption, we get a protocol whose time and space complexity do not depend on the number of writers, but are linear in the number of components only. (The time complexity of a write operation is still constant.)

  • A systolic-based parallel bin packing algorithm

    Publication Year: 1994 , Page(s): 769 - 772
    Cited by:  Papers (2)

    A systolic-based parallel approximation algorithm that obtains solutions to the 1-D bin packing problem is presented. The algorithm has an asymptotic error bound of 1.5 and time complexity $O(n)$. An experimental study demonstrates that the heuristic offers improved packing and execution performance over parallelizations of two well-known serial algorithms.
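
For readers unfamiliar with the 1-D bin packing problem, the sketch below shows one common serial heuristic, first-fit decreasing. It is an assumption chosen for illustration: the abstract does not name the two serial algorithms it compares against, and this is not the systolic algorithm the paper presents. The function name `first_fit_decreasing` is hypothetical.

```python
def first_fit_decreasing(items, capacity):
    """Pack item sizes into bins of the given capacity.

    Items are considered from largest to smallest, and each one goes into
    the first bin that still has room. Returns a list of bins, where each
    bin is the list of item sizes placed in it.
    """
    bins = []
    for item in sorted(items, reverse=True):
        for contents in bins:
            if sum(contents) + item <= capacity:
                contents.append(item)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([item])
    return bins
```

For example, `first_fit_decreasing([4, 8, 1, 4, 2, 1], 10)` fills two bins completely, which is optimal for this input because the item sizes sum to 20.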

  • A fast selection algorithm for meshes with multiple broadcasting

    Publication Year: 1994 , Page(s): 772 - 778
    Cited by:  Papers (16)

    One of the fundamental algorithmic problems in computer science involves selecting the $k$th smallest element in a collection $A$ of $n$ elements. We propose an algorithm design methodology to solve the selection problem on meshes with multiple broadcasting. Our methodology leads to a selection algorithm that runs in $O(n^{1/8}(\log n)^{3/4})$ time on a mesh with multiple broadcasting of size $n^{3/8}(\log n)^{1/4} \times n^{5/8}/(\log n)^{1/4}$. This result is optimal over a large class of selection algorithms. Our result shows that just as for semigroup computations, selection can be done faster on suitably chosen rectangular meshes than on square meshes.
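
As a quick consistency check on the mesh dimensions quoted in this abstract, the two side lengths multiply out to exactly $n$, so the rectangular mesh has as many processors as there are input elements:

```latex
\[
n^{3/8}(\log n)^{1/4} \times \frac{n^{5/8}}{(\log n)^{1/4}}
  = n^{3/8 + 5/8}\,(\log n)^{1/4 - 1/4}
  = n .
\]
```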

  • Hierarchical compilation of macro dataflow graphs for multiprocessors with local memory

    Publication Year: 1994 , Page(s): 720 - 736
    Cited by:  Papers (3)

    This paper presents a hierarchical approach for compiling macro dataflow graphs for multiprocessors with local memory. Macro dataflow graphs comprise several nodes (or macro operations) that must be executed subject to prespecified precedence constraints. Programs consisting of multiple nested loops, where the precedence constraints between the loops are known, can be viewed as macro dataflow graphs. The hierarchical compilation approach comprises a processor allocation phase followed by a partitioning phase. In the processor allocation phase, using estimated speedup functions for the macro nodes, computationally efficient techniques establish the sequencing and parallelism of macro operations for close-to-optimal run-times. The second phase partitions the computations in each macro node to maximize communication locality for the level of parallelism determined by the processor allocation phase. The same approach can also be used for programs consisting of multiple loop nests, when each of the nested loops can be characterized by a speedup function. These ideas have been implemented in a prototype structure-driven compiler, SDC, for expressions of matrix operations. The paper presents the performance of the compiler for several matrix expressions on a simulator of the Alewife multiprocessor.

  • Communication aspects of the star graph interconnection network

    Publication Year: 1994 , Page(s): 678 - 687
    Cited by:  Papers (13)  |  Patents (3)

    Basic communication algorithms for star graph interconnection networks are developed by using the hierarchical properties of the star graph, with the assumption that one input channel can drive only one output communication channel at a time. With this constraint, communication algorithms for each node can be expressed only as sequences of generators corresponding to the communication channels. Sequences that are identical exploit the symmetry and hierarchical properties of the star graph and can be easily integrated in communication hardware. Their time complexities are evaluated and compared with the corresponding results for the hypercube.
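
The idea of routing by generator sequences can be illustrated with the standard definition of the star graph, which the abstract itself does not spell out: nodes are permutations of $n$ symbols, and generator $i$ (for $2 \le i \le n$) swaps the first symbol with the symbol in position $i$. The sketch below is only that definition in code, not the paper's communication algorithms, and the function names are hypothetical.

```python
def apply_generator(node, i):
    """Swap the first symbol with the symbol at position i (1-indexed, 2 <= i <= n)."""
    symbols = list(node)
    symbols[0], symbols[i - 1] = symbols[i - 1], symbols[0]
    return tuple(symbols)


def neighbors(node):
    """All n - 1 nodes reachable over one communication channel."""
    return [apply_generator(node, i) for i in range(2, len(node) + 1)]


def follow(node, generator_sequence):
    """Node reached by sending a message along a sequence of generators."""
    for i in generator_sequence:
        node = apply_generator(node, i)
    return node
```

Because every node applies generators in the same way, the same sequence can be used from any starting node. For instance, `follow((1, 2, 3), [2, 3])` passes through `(2, 1, 3)` and arrives at `(3, 1, 2)`.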

  • Optimal VLSI networks for multidimensional transforms

    Publication Year: 1994 , Page(s): 763 - 769
    Cited by:  Papers (2)

    This paper presents a new class of $AT^2$-optimal networks for computing the multidimensional discrete Fourier transform. Although optimal networks have been proposed previously, the networks proposed in this paper are based on a new methodology for mapping large $K$-shuffle networks, $K \geq 2$, onto smaller area networks that maintain the optimality of the DFT network. Such networks are used to perform the index-rotation operations needed by the multidimensional computation. The resulting networks have simple regular layouts, and can be easily partitioned among several chips in order to reduce the number of input-output pins per chip.

  • Shared memory multimicroprocessor operating system with an extended Petri net model

    Publication Year: 1994 , Page(s): 749 - 762
    Cited by:  Papers (1)  |  Patents (3)

    We propose a methodology for programming multiprocessor event-driven systems. This methodology is based on two programming levels: the task level, which involves programming the basic actions that may be executed in the system as units with a single control thread; and the job level, on which parallel programs to be executed by the complete multiprocessor system are developed. We also present the structure and implementation of an operating system designed as the programming support for software development under the proposed methodology. The model that has been chosen for the representation of the system software is based on an extended Petri net, which provides a well-established conceptual model for the development of the tasks, thus allowing a totally independent and generic development. This model also facilitates job-level programming, since the Petri net is a very powerful description tool for the parallel program.


Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology