Proceedings of 9th International Parallel Processing Symposium

25-28 April 1995

  • Performance measurements of a concurrent production system architecture without global synchronization

    Publication Year: 1995, Page(s):790 - 797

    The use of the serializability criterion of correctness allows the elimination of global synchronization in production system machines. We present an extensive performance evaluation of a concurrent production system architecture that is based on serializability and takes advantage of modern associative memory devices to allow parallel production firing, concurrent matching, and overlap among matc...
  • Divide-and-conquer programming on MIMD computers

    Publication Year: 1995, Page(s):734 - 741
    Cited by:  Papers (3)

    We have developed a programming template to implement divide and conquer algorithms on MIMD computers. The template is based on the parallel divide and conquer function of Z.G. Mou and P. Hudak (1988). We explore the programmability and performance of this approach by solving some well known numerical problems on a shared memory multiprocessor and a multicomputer. A byproduct of this work is a new...
  • &ACE: a high-performance parallel Prolog system

    Publication Year: 1995, Page(s):564 - 571
    Cited by:  Papers (8)

    In recent years a lot of research has been invested in parallel processing of numerical applications. However, parallel processing of Symbolic and AI applications has received less attention. This paper presents a system for parallel symbolic computing, named ACE, based on the logic programming paradigm. ACE is a computational model for the full Prolog language, capable of exploiting Or-parallelis...
  • Proceedings of 9th International Parallel Processing Symposium

    Publication Year: 1995
    Freely Available from IEEE
  • Characterizing parallel file-access patterns on a large-scale multiprocessor

    Publication Year: 1995, Page(s):165 - 172
    Cited by:  Papers (17)

    High-performance parallel file systems are needed to satisfy tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill t...
  • VIP-FS: a VIrtual, Parallel File System for high performance parallel and distributed computing

    Publication Year: 1995, Page(s):159 - 164
    Cited by:  Papers (5)  |  Patents (2)

    In the past couple of years, significant progress has been made in the development of message-passing libraries for parallel and distributed computing, and in the area of high-speed networking. These advances in computing technology have also led to a tremendous increase in the amount of data being manipulated and produced by scientific and commercial application programs. Despite their popularity...
  • Index translation schemes for adaptive computations on distributed memory multicomputers

    Publication Year: 1995, Page(s):812 - 819
    Cited by:  Papers (1)

    Current research in parallel programming is focused on closing the gap between globally indexed algorithms and the separate address spaces of processors on distributed memory multicomputers. A set of index translation schemes has been implemented as part of the CHAOS runtime support library, so that the library functions can be used for implementing a global index space across a collection of separ...
  • Performance of the Vesta parallel file system

    Publication Year: 1995, Page(s):150 - 158
    Cited by:  Papers (3)  |  Patents (3)

    Vesta is an experimental parallel file system implemented on the IBM SP1. Its main features are support for parallel access from multiple application processes to a file, and the ability to partition and re-partition the file data among these processes. This paper reports on a set of experiments designed to evaluate Vesta's performance. This includes basic single-node performance, and performance us...
  • Capturing and automating performance diagnosis: the Poirot approach

    Publication Year: 1995, Page(s):606 - 613
    Cited by:  Papers (4)

    Performance diagnosis, the process of finding and explaining performance problems, is an important part of parallel programming. Effective performance diagnosis requires that the programmer plan an appropriate method, and manage the experiments required by that method. This paper presents Poirot, an architecture to support performance diagnosis. It explains how the architecture helps automatically...
  • Broadcasting on the star and pancake interconnection networks

    Publication Year: 1995, Page(s):660 - 665
    Cited by:  Papers (7)

    Broadcasting is an important data communication operation in a parallel computer. In this paper, we first give a short survey on various broadcasting schemes on the star and pancake interconnection networks. We then present a broadcasting algorithm on the star and pancake networks, which can broadcast m messages of fixed length on an n-star or n-pancake in time O(n log n+m), improving the previous...
  • Exploiting spatial regularity in irregular iterative applications

    Publication Year: 1995, Page(s):820 - 826
    Cited by:  Papers (10)

    The increasing gap between the speed of microprocessors and memory subsystems makes it imperative to exploit locality of reference in sequential irregular applications. The parallelization of such applications requires special considerations. Current RTS (Run-Time Support) for irregular computations fails to exploit the fine grain regularity present in these applications, producing unnecessary tim...
  • Experience with active messages on the Meiko CS-2

    Publication Year: 1995, Page(s):140 - 149
    Cited by:  Papers (20)  |  Patents (1)

    Active messages provide a low latency communication architecture which on modern parallel machines achieves more than an order of magnitude performance improvement over more traditional communication libraries. This paper discusses the experience we gained while implementing active messages on the Meiko CS-2, and discusses implementations for similar architectures. During our work we have identifi...
  • Multi-phase array redistribution: modeling and evaluation

    Publication Year: 1995, Page(s):441 - 445
    Cited by:  Papers (18)  |  Patents (1)

    Array redistribution is used in languages such as High Performance Fortran to allow programmers to dynamically change the distribution of arrays across processors. Distributed-memory implementations of several scientific applications require array redistribution. In this paper, efficient methods for performing array redistribution are presented. Precise closed forms for determining the processors ...
  • Monitoring and controlling remote parallel computations using Schooner

    Publication Year: 1995, Page(s):614 - 620

    Scientific visualization systems such as AVS have the potential to help users of parallel systems monitor and control their computations. Unfortunately, the machines most suitable for visualization systems are not the parallel systems on which the computation executes, often leading to the use of two distinct machines and the viewing of results only after the computation has completed. Here, an ap...
  • Time synchronization on SP1 and SP2 parallel systems

    Publication Year: 1995, Page(s):666 - 672
    Cited by:  Papers (1)

    We describe an experimental time utility for synchronizing the operating system clocks on the SP1 and SP2 parallel system nodes. It typically synchronizes the node clocks to within 5 microseconds of each other, utilizing the synchronous feature of the SP1 and SP2 interconnection network. This is 2 to 3 orders of magnitude better than what can be achieved by previous methods. Synchronized clocks are u...
  • Bicriterion scheduling of identical processing time jobs by uniform processors

    Publication Year: 1995, Page(s):276 - 279

    The problem of bicriterion scheduling of jobs with identical processing times by uniform processors is considered. The first criterion is the minimization of either total or maximum costs, the second one is the minimization of maximum cost with different cost functions. Polynomial time algorithms are presented to determine all efficient solutions and the optimal solution for a given global criteri...
  • Data parallel programming in an adaptive environment

    Publication Year: 1995, Page(s):827 - 832
    Cited by:  Papers (11)

    For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at runtime. In this paper, we discuss runtime support for data parallel programming in such an adaptive environment. Executing data-parallel programs in an adaptive environment requires redistributing data when the number of processors chan...
  • A workload characterization for coarse-grain multiprocessors

    Publication Year: 1995, Page(s):393 - 397
    Cited by:  Papers (1)

    Scalable shared memory multiprocessors commonly employ replication and the associated coherency maintenance of memory blocks, but differ in the granularity from fine-grain (cache-coherent multiprocessors) to coarse-grain (page-based distributed shared memory systems). Regardless of the size of coherency blocks, attaining good performance may depend on the number of copies staying small. Previous w...
  • PCODE: an efficient and reliable collective communication protocol for unreliable broadcast domain

    Publication Year: 1995, Page(s):130 - 139

    Existing programming environments for clusters are typically built on top of a point-to-point communication layer (send and receive) over local area networks (LANs) and, as a result, suffer from poor performance in the collective communication part. For example, a broadcast that is implemented using a TCP/IP protocol (which is a point-to-point protocol) over a LAN is obviously inefficient as it i...
  • A synthesis method of LSGP partitioning for given-shape regular arrays

    Publication Year: 1995, Page(s):234 - 238
    Cited by:  Papers (1)  |  Patents (3)

    This paper presents a method to partition and map a computational polytope onto processor arrays. Based on the theoretical framework of an existing LSGP method, a systematic design procedure is proposed which constructs an activity matrix, proposed by Darte, according to the shapes of the computational polytope and the processor array and derives a valid timing vector. By this method the given-sha...
  • Toward data distribution independent parallel matrix multiplication

    Publication Year: 1995, Page(s):436 - 440
    Cited by:  Papers (3)

    To eliminate or reduce initial data redistribution overheads for distributed memory parallel computers, this paper considers the problem of writing data distribution independent (DDI) programs whose functionality and execution time are independent of initial data distributions. Relations between time-space mappings and input data distributions are established. These relations are the basis of a sy...
  • The Mcube: a symmetrical cube based network with twisted links

    Publication Year: 1995, Page(s):11 - 16
    Cited by:  Papers (17)

    The Mcube network proposed in this paper is a highly recursive and symmetrical interconnection network based on twisted links (Abraham and Padmanabhan, 1989). However, unlike other twist-based networks which are asymmetrical, the Mcube has a uniform distance distribution. In addition the Mcube is immune to the adverse effects of skewed traffic patterns that occur in asymmetrical structures. Mcubes...
  • Parallel implementation of ray-tracing algorithm on the Intel Delta parallel computer

    Publication Year: 1995, Page(s):688 - 692
    Cited by:  Patents (2)

    Ray tracing is one of the computer graphics techniques used to render high quality images. Ray tracing complex scenes can require large amounts of CPU time and memory storage. We present a parallel implementation of the ray tracing algorithm on the Intel Delta parallel computer. Two key issues of efficient implementation are load balancing and database distribution. In our database distribution, o...
  • A performance comparison of fast distributed mutual exclusion algorithms

    Publication Year: 1995, Page(s):258 - 264
    Cited by:  Papers (9)

    Several fast and low-overhead distributed mutual exclusion algorithms have been proposed. Each of these algorithms required O(log n) messages per critical section entry and O(log n) bits of storage per processor. In this paper, we make a comparative performance study of four distributed mutual exclusion algorithms. Since the algorithms we study are the basis for distributed synchronization, distri...
  • Symbolic range propagation

    Publication Year: 1995, Page(s):357 - 363
    Cited by:  Papers (10)  |  Patents (2)

    Many analyses and transformations in a parallelizing compiler can benefit from the ability to compare arbitrary symbolic expressions. In this paper, we describe how one can compare expressions by using symbolic ranges of variables. A range is a lower and upper bound on a variable. We describe how these ranges can be efficiently computed from the program text. Symbolic range propagation has been im...
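
The last abstract above describes comparing expressions by way of lower and upper bounds on variables. As a rough illustration of that general idea only, the sketch below uses constant numeric bounds rather than symbolic ones; the `Range` class and the `compare` function are hypothetical names introduced here and are not taken from the paper or from any compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Closed interval [lo, hi] bounding the value of a variable (assumed numeric)."""
    lo: float
    hi: float

    def __add__(self, other: "Range") -> "Range":
        # Bounds of a sum are the sums of the bounds.
        return Range(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "Range") -> "Range":
        # Smallest difference pairs our lo with their hi, and vice versa.
        return Range(self.lo - other.hi, self.hi - other.lo)


def compare(a: Range, b: Range):
    """Return '<', '<=', '>', '>=' when the bounds prove it, else None."""
    d = a - b
    if d.hi < 0:
        return "<"
    if d.lo > 0:
        return ">"
    if d.hi <= 0:
        return "<="
    if d.lo >= 0:
        return ">="
    return None  # ranges overlap: nothing can be concluded


# Hypothetical loop index i in [1, 10] and array size n in [11, 100].
i, n = Range(1, 10), Range(11, 100)
print(compare(i, n))                 # i < n is provable
print(compare(i + Range(1, 1), n))   # only i + 1 <= n is provable
```

A `None` result means the bounds are too loose to decide the comparison, not that the comparison is false; an analysis built on this idea would fall back to a conservative assumption in that case.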