Parallel and Distributed Processing, 1991. Proceedings of the Third IEEE Symposium on

Date 2-5 Dec. 1991

Displaying Results 1 - 25 of 119
  • Data communication and computational geometry on the star and pancake interconnection networks

    Page(s): 415 - 422

    The star and pancake networks were recently proposed as attractive alternatives to the hypercube topology for interconnecting processors in a parallel computer. However, little has been done to design parallel algorithms on these networks. The paper presents several data communication algorithms that are fundamental to designing algorithms on these two networks. These algorithms are then used to develop parallel solutions to various computational geometry problems on both networks. Computational geometry is just one area where the proposed data communication algorithms can be applied. It is believed that these algorithms are interesting and important in their own right, and are basic to the design of solutions on the star and pancake networks to a host of other problems.

  • Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing (Cat. No.91TH0396-2)

    Freely Available from IEEE
  • Parallel algorithms for information dissemination by packets

    Page(s): 274 - 281

    Each vertex of an undirected graph possesses a piece of information which must be sent to every other vertex. The method of communication is to send bounded-size packets of messages from one vertex to another. We describe parallel algorithms to accomplish the desired tasks for five prominent architectures. The algorithms are optimal, or nearly so, in every case.

  • Object oriented Fortran for development of portable parallel programs

    Page(s): 608 - 615

    Parallel programming has to date remained inaccessible to the average scientific programmer. Parallel programming languages are generally foreign to most scientific applications programmers who only speak Fortran. Automatic parallelization techniques have so far proved unsuccessful in extracting large amounts of parallelism from sequential codes and do not encourage development of new, inherently parallel algorithms. In addition, there is a lack of consistency of programmer interface across architectures which requires programmers to invest a lot of effort in porting code from one parallel machine to another. This paper discusses the object oriented Fortran language and support routines developed at Mississippi State in support of parallelizing complex field simulations. This interface is based on Fortran to ease its acceptance by scientific programmers and is implemented on top of the Unix operating system for portability.

  • Synthesis of parallel Ada code from a knowledge base of rules

    Page(s): 600 - 607

    A synergistic approach utilizing compilation, compaction, and parallelization is described to achieve real-time computing throughput from rule-based expert systems. The methodology involves synthesizing a set of concurrently executable Ada tasks from a knowledge base of rules. Compaction of code size is accomplished by eliminating the overhead associated with inference engine control constructs not utilized by a particular knowledge base. Heuristics are used to customize the generated Ada code for optimum performance gains given the characteristics of the source knowledge base and the target processor. The effectiveness of this approach depends on both the characteristics of the knowledge base and the efficiency of the Ada compiler's task invocation mechanism. A prototype compilation system based on this multifaceted approach has demonstrated speedups in excess of 100× for certain knowledge bases, as well as additional benefits in terms of increased embeddability and maintainability of the knowledge base.

  • Performance evaluation of a local area network for real-time applications

    Page(s): 504 - 512

    This paper presents the performance (upper bound for the transmission waiting time, capacity, expected message delay and loss) of a controlled access protocol for real-time LANs. The protocol guarantees mutual exclusion for access to the transmission medium, an upper bound for the transmission waiting time, and priority management. The expected message delay, obtained by simulations, is compared to the delays of some of the best or most widely used protocols (token ring, token bus, MiniSlotted Alternating Priorities) and to M/D/1, which represents an upper bound on performance. End-to-end response time has also been evaluated for a case study to point out the influence of scheduling, communication architecture and application software, and to obtain data for the design of real-time systems based on the examined LAN. For each LAN node, hardware architectures based on one or two processors have been considered and the main differences with reference to their performance have been pointed out.

  • Stabilizing Petri nets

    Page(s): 352 - 356

    A fundamental criterion of a robust distributed system is its ability to recover from perturbations that can possibly corrupt the state of the system. In a Petri net model, system perturbations may affect the marking of the net in an unpredictable manner. The paper shows that for certain classes of nets, it is possible to devise a self-stabilizing extension, so that regardless of the initial marking, the system automatically restores its liveness and safety properties.

  • Performance enhancement of multistage interconnection network with nonuniform traffic

    Page(s): 796 - 804

    Nonuniform traffic in a multistage interconnection network (MIN) degrades the performance of the network. The degradation is more significant when tree saturation occurs due to hot spot contention. The authors reveal that the number of processors generating hot requests has more influence on network performance than the hot request rate. To reflect this characteristic in the MIN design, they propose to detect and block potential hot spot contention early in each switch node, while allowing uniform traffic to proceed to the succeeding stages. Thus their design can be efficient for both uniform and nonuniform traffic. The processor-memory bandwidth of their scheme was found to be consistently better than that of earlier designs for any traffic condition, without a significant implementation overhead.

  • An analytical model for predicting performance in a heterogeneous system

    Page(s): 334 - 341

    Presents an analytical model for predicting the response time of tasks in a heterogeneous system with load sharing. The model is general and can be used to evaluate the performance of any algorithm and any system configuration as long as task transfer decisions are based on the load levels of the sending and receiving processors. The methodology is applied to three load sharing algorithms in a system comprised of processors with different speeds.

  • SIMD-emulations of hypercubes and related networks

    Page(s): 656 - 659

    This paper addresses a problem in emulation theory. The author shows how processor-array networks with simple topologies can efficiently emulate the computations of complex topologies. This is possible by trading off parallelism for time. Such emulations are advantageous since processor-array networks of simple topologies are cost-effective to build on a large scale. The challenge is to perform these emulations optimally, without the loss of too much parallelism. The author presents emulations of generic computations programmed in a SIMD fashion which are all optimal (up to constant factors). Specifically, they present emulations of the order-n cube-connected-cycles network and the order-n shuffle-exchange network by an n-node ring-connected processor array. They also present an emulation of an order-n hypercube network by an n/log n-node linear processor array.

  • Broadcast ring sandwich networks

    Page(s): 788 - 795

    The authors describe the analysis and constructive design of a new class of rearrangeable broadcast networks called ring sandwich networks. They present analytical results which permit the rearrangeability of ring sandwich networks to be evaluated on the basis of fundamental parameters associated with the ring sandwich structure so that the trade-off between the network rearrangeability and the network cost can be determined. These results permit ring sandwich broadcast networks to be designed for which the average number of rearrangements (i.e., disturbed connections) in making a connection can be reduced to a small constant. Moreover, this is accomplished with less overall circuitry that must actually perform the broadcast function than other comparable designs. Ring sandwich networks are highly attractive for providing broadcast capability in parallel and distributed computing environments wherein a limited degree of rearrangeability can be tolerated.

  • Duplicating keys to streamline sorting on a mesh-connected computer

    Page(s): 296 - 300

    Introduces a model of a mesh-connected computer in which multiple-key packets can be exchanged between processors in single routing steps. The author develops a sorting algorithm for such an enhanced model, which has time performance better than optimal algorithms in the traditional model. The technical contributions of this paper are as follows. The first is a new 5n-o(n) lower bound for sorting in a row-major ordering. This improves the best previously known bound of 4n-4. The second contribution is a new sorting algorithm utilizing multiple-packet routing capabilities. The time complexity of this algorithm, measured in the number of concurrent routing steps, can be arbitrarily close to the absolute distance lower bound of 2n-2, provided the number of keys that fit in one routing packet is sufficiently large.

  • Circuit partitioning using parallel mean field annealing algorithms

    Page(s): 534 - 541

    The mean field annealing (MFA) algorithm, recently proposed for solving combinatorial optimization problems, combines the characteristics of neural networks and simulated annealing. Previous work on MFA resulted in successful mappings of the algorithm to some classic optimization problems such as the travelling salesman problem and the graph partitioning problem. In this paper, MFA is formulated for the circuit partitioning problem (CPP) by using both graph and network models. Initial results of the implementations show that MFA can be used as an efficient alternative heuristic for CPP. The MFA algorithms proposed for solving CPP are parallelized and implemented on an iPSC/2 hypercube multicomputer. Experimental results show that the proposed heuristics can be efficiently parallelized on hypercube multicomputers, which is crucial for algorithms solving such computationally hard problems.

  • Multiprogramming on multiprocessors

    Page(s): 590 - 597

    Many solutions have been proposed to the problem of multiprogramming a multiprocessor. However, each has limited applicability or fails to address an important source of overhead. In addition, there has been little experimental comparison of the various solutions in the presence of applications with varying degrees of parallelism and synchronization. The authors explore the tradeoffs between three different approaches to multiprogramming a multiprocessor: time-slicing, coscheduling, and dynamic hardware partitions. They implemented applications that vary in the degree of parallelism, and the frequency and type of synchronization. They show that in most cases coscheduling is preferable to time-slicing. They also show that although there are cases where coscheduling is beneficial, dynamic hardware partitions do no worse, and will often do better. They conclude that under most circumstances, hardware partitioning is the best strategy for multiprogramming a multiprocessor, no matter how much parallelism applications employ or how frequently synchronization occurs.

  • Prototyping parallel and distributed programs in Proteus

    Page(s): 26 - 34

    This paper presents Proteus, an architecture-independent language suitable for prototyping parallel and distributed programs. Proteus is a high-level imperative notation based on sets and sequences with a single construct for the parallel composition of processes. Although a shared-memory model is the basis for communication between processes, this memory can be partitioned into shared and private variables. Parallel processes operate on individual copies of private variables, which are independently updated and may be merged into the shared state at specifiable barrier synchronization points. Several examples are given to illustrate how the various parallel programming models, such as synchronous data-parallelism and asynchronous control-parallelism, can be expressed in terms of this foundation. This common foundation allows prototypes to be tested, evolved and finally implemented through refinement techniques targeting specific architectures.

  • On a new approach for enhancing the fault coverage of conformance testing of protocols

    Page(s): 428 - 435

    Proposes a new approach for enhancing the fault coverage of protocol testing using unique input/output (UIO) sequences. UIO sequences can be efficiently employed in checking the conformance specifications of protocols by transition testing and an optimization process based on the rural Chinese postman tour algorithm. The proposed approach is based on a new set of conditions for UIO sequence generation, namely singularity, i.e. uniqueness between UIO sequences in the traversal of the FSM and their ability to avoid masking and undetection of faults.

  • Off-line routing with small queues on a mesh-connected processor array

    Page(s): 301 - 304

    Presents an efficient and simple algorithm for off-line routing that uses very small queues. Our main result is that there exists an off-line algorithm for permutation routing on the n×n mesh-connected processor array that takes 2.2n+5 steps and uses queues of size not more than 14. The algorithm uses novel and interesting techniques, and the bound on the queue size is smaller than those of known algorithms with the same time complexity.

  • Reliability oriented allocation of files on distributed systems

    Page(s): 886 - 893

    One of the important features of distributed computing systems (DCSs) is the potential for high reliability. When the hardware configuration of a DCS is fixed, the system reliability mainly depends on the allocation of various resources. Files are one of the important resources used in a DCS. The authors have developed a reliability-oriented file allocation scheme for distributed systems. In this scheme various files are allocated to different nodes of a DCS such that the reliability of executing a program which requires files from remote node(s) is maximized. Several variations of this problem are solved to illustrate the genetic algorithm based solution approach. The paper also provides the relation between the degree of redundancy of files and the maximum achievable reliability of executing a program.

  • Caching and writeback policies in parallel file systems

    Page(s): 60 - 67

    Improvements in the processing speed of multiprocessors are outpacing improvements in the speed of disk hardware. Parallel disk I/O subsystems have been proposed as one way to close the gap between processor and disk speeds. Such parallel disk systems require parallel file system software to avoid performance-limiting bottlenecks. The authors discuss cache management techniques that can be used in a parallel file system implementation. The authors examine several writeback policies, and give results of experiments that test their performance.

  • Explicit construction for reliable reconfigurable array architectures

    Page(s): 640 - 647

    This paper describes some explicit constructions for reconfigurable array architectures. Given a working architecture (application graph), the authors add redundant hardware to increase reliability. The degree of reconfigurability, DR, of a redundant graph is a measure of the cost of reconfiguration after failures. When DR is independent of the size of the application graph, the authors say the graph is finitely reconfigurable, FR. They present a class of simple layered graphs with a logarithmic number of redundant edges, which can maintain both finite reconfigurability and a fixed level of reliability for a wide class of application graphs. By sacrificing finite reconfigurability, they show that by using expanders they can construct highly reliable structures with the asymptotically optimal number of edges for one-dimensional and tree-like array architectures.

  • A new algorithm-based fault tolerance technique for computing matrix operations

    Page(s): 452 - 455

    The paper proposes a new algorithm-based fault tolerance (ABFT) technique for computing matrix operations. The scheme provides fault tolerant capability to linear or rectangular processor arrays so that all single PE faults can be tolerated. It also shows that the effect of implementation problems (overflow, round-off errors, and hardware overhead which includes encoding/decoding logic) on the proposed scheme is significantly less than that on other existing schemes.

  • Three disjoint path paradigms in star networks

    Page(s): 400 - 406

    Star networks have been recently proposed as an attractive choice for interconnection networks. They have sublogarithmic node degree and diameter and, like hypercubes, have a highly recursive structure. Several researchers have endeavored to prove that star networks are as versatile as hypercubes. The paper is an effort in the same direction. It presents optimal algorithms for computing disjoint paths in star graphs for two well known paradigms. It also studies the problem of disjoint connecting paths and presents an efficient algorithm for finding a limited number of such paths in the star network.

  • Super critical tree numbering and optimal tree ranking are in NC

    Page(s): 767 - 773

    This paper places the optimal tree ranking problem in NC. A ranking is a labeling of the nodes with natural numbers such that if nodes u and v have the same label then there exists another node with a greater label on the path between them. An optimal ranking is a ranking in which the largest label assigned to any node is as small as possible among all rankings. An O(n) sequential algorithm is known. Researchers have speculated that the problem is P-complete. The authors show that for an n node tree, one can compute an optimal ranking in O(log^2 n) time using n^2/log n EREW PRAM processors. In fact, their ranking is super critical in that the label assigned to each node is absolutely as small as possible. They achieve their results by introducing and showing that a more general problem, which they call the super critical numbering problem, is in NC. No NC algorithm for the super critical tree ranking problem, approximate or otherwise, was previously known; the only known NC algorithm for optimal tree ranking was an approximate one.

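The ranking condition stated in the abstract above can be checked directly on a small tree. The following is a minimal sequential sketch, not the authors' NC algorithm: it only verifies whether a given labeling is a ranking. The adjacency-dict representation and the names `tree_path` and `is_ranking` are assumptions made for illustration.

```python
from itertools import combinations


def tree_path(adj, u, v):
    """Return the unique path from u to v in a tree given as an adjacency dict."""
    parent = {u: None}
    stack = [u]
    while stack:
        node = stack.pop()
        if node == v:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                stack.append(nxt)
    # Walk back from v to u through the recorded parents.
    path = []
    node = v
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def is_ranking(adj, label):
    """Check the condition: any two nodes with equal labels must have a
    node with a strictly greater label on the path between them."""
    for u, v in combinations(adj, 2):
        if label[u] == label[v]:
            interior = tree_path(adj, u, v)[1:-1]
            if not any(label[w] > label[u] for w in interior):
                return False
    return True


# Hypothetical example: the three-node path a - b - c.
path3 = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
good = {"a": 1, "b": 2, "c": 1}  # b separates the two nodes labeled 1
bad = {"a": 1, "b": 1, "c": 2}   # a and b are adjacent with the same label
print(is_ranking(path3, good), is_ranking(path3, bad))
```

The check examines every pair of nodes and is meant only to make the definition concrete; it says nothing about how an optimal ranking is found.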
  • An NC algorithm for the general planar monotone circuit value problem

    Page(s): 196 - 203

    A planar monotone circuit (PMC) is a Boolean circuit that can be embedded in a plane and that has only AND and OR gates. Although a special case of the planar monotone circuit value problem (PMCVP) has been shown to be in NC^2, it was not known whether the general PMCVP is in NC. In the paper, the author first gives an NC^2 algorithm to evaluate a layered one-input-face PMC using the straight-line code parallel evaluation technique. He then applies the algorithm to a less restrictive one-input-face PMC. Finally, he gives an NC^3 algorithm for the general PMCVP.

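To make the circuit value problem in the abstract above concrete, here is a small sequential evaluator for a monotone circuit, that is, one built only from AND and OR gates. It is a sketch under stated assumptions: it does not check planarity, it is not the parallel NC algorithm of the paper, and the dict-based circuit encoding and the name `eval_monotone_circuit` are hypothetical.

```python
def eval_monotone_circuit(gates, inputs, output):
    """Evaluate a circuit of AND/OR gates.

    gates:  dict mapping a gate name to (op, operand_names), op in {"AND", "OR"}
    inputs: dict mapping an input name to a bool
    output: name of the gate (or input) whose value is wanted
    """
    values = dict(inputs)  # memo table, seeded with the input values

    def value(name):
        if name not in values:
            op, operands = gates[name]
            if op not in ("AND", "OR"):
                raise ValueError("monotone circuits allow only AND and OR gates")
            bits = [value(x) for x in operands]
            values[name] = all(bits) if op == "AND" else any(bits)
        return values[name]

    return value(output)


# Hypothetical circuit: g2 = (x1 AND x2) OR x3
gates = {
    "g1": ("AND", ["x1", "x2"]),
    "g2": ("OR", ["g1", "x3"]),
}
low = eval_monotone_circuit(gates, {"x1": True, "x2": False, "x3": False}, "g2")
high = eval_monotone_circuit(gates, {"x1": True, "x2": False, "x3": True}, "g2")
print(low, high)
```

Because there are no NOT gates, switching any input from false to true can never switch the output from true to false, which is the sense in which the circuit is monotone.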
  • An evaluation of concurrent priority queue algorithms

    Page(s): 518 - 525

    This paper describes the design and experimental evaluation of two novel concurrent priority queue algorithms, a parallel Fibonacci heap and a concurrent priority pool, and compares them with the concurrent binary heap. The parallel Fibonacci heap is based on the sequential Fibonacci heap, which is theoretically the most efficient data structure for sequential priority queues. The concurrent priority pool is based on the concurrent B-tree and the concurrent pool. Both new algorithms scale better and have larger throughput than the concurrent binary heap, as shown by the performance results. These performance advantages are achieved by relaxing the semantics of the extract-min operation, allowing it to return items that are close to the minimum but not necessarily minimal. Two applications of concurrent priority queues, the vertex cover problem and the single source shortest path problem, are tested.

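The relaxed extract-min semantics mentioned in the abstract above can be illustrated with a single-threaded sketch. This is not the parallel Fibonacci heap or the concurrent priority pool from the paper; it only shows what "close to the minimum but not necessarily minimal" can mean. The class name `RelaxedMinQueue` and the `slack` parameter are assumptions introduced here.

```python
import heapq
import random


class RelaxedMinQueue:
    """Sequential illustration: extract returns one of the `slack` smallest items."""

    def __init__(self, slack=2, seed=None):
        if slack < 1:
            raise ValueError("slack must be at least 1")
        self._heap = []
        self._slack = slack
        self._rng = random.Random(seed)

    def insert(self, item):
        heapq.heappush(self._heap, item)

    def extract_near_min(self):
        if not self._heap:
            raise IndexError("extract from an empty queue")
        # Pull out the smallest few items, return one of them at random,
        # and put the others back.
        count = min(self._slack, len(self._heap))
        candidates = [heapq.heappop(self._heap) for _ in range(count)]
        chosen = candidates.pop(self._rng.randrange(count))
        for item in candidates:
            heapq.heappush(self._heap, item)
        return chosen

    def __len__(self):
        return len(self._heap)


queue = RelaxedMinQueue(slack=3, seed=0)
for key in [5, 1, 4, 2, 3]:
    queue.insert(key)
first = queue.extract_near_min()  # one of 1, 2, 3
rest = [queue.extract_near_min() for _ in range(len(queue))]
print(first, rest)
```

With `slack=1` the queue behaves as an exact priority queue. In a concurrent setting, loosening the requirement lets threads avoid contending for the single minimum element, which is the kind of trade-off the abstract describes.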