Proceedings of IPPS '96, the 10th International Parallel Processing Symposium, 1996

Date: 15-19 April 1996

Entries 1-25 of 135
  • Proceedings of International Conference on Parallel Processing

    Publication Year: 1996
    Freely Available from IEEE
  • Ocean circulation on the Intel Paragon: modeling and implementation

    Publication Year: 1996
  • Nested parallel call optimization

    Publication Year: 1996, Page(s):225 - 229
    Cited by:  Papers (1)

    We present a novel optimization called Last Parallel Call Optimization (LPCO) for parallel systems. The last parallel call optimization can be regarded as a parallel extension of last call optimization found in sequential systems. While the LPCO is fairly general, we use and-parallel logic programming systems to illustrate it and to report its performance on multiprocessor systems. The last parall...

  • Panel on "For a Massive Number of Massively Parallel Machines: What are the Target Applications, Who..."

    Publication Year: 1996
    Freely Available from IEEE
  • Author index

    Publication Year: 1996
    Freely Available from IEEE
  • Resource placement in torus-based networks

    Publication Year: 1996, Page(s):327 - 331
    Cited by:  Papers (5)

    This paper investigates methods to locate system resources, such as expensive hardware or software modules, so as to provide the most effective cost/performance tradeoffs in a torus parallel machine. It presents solutions for perfect distance-t and perfect/quasi-perfect j-adjacency placement in a k-ary n-cube and a torus using Lee (1958) distance error-correcting codes.

  • Broadcasting multiple messages in the multiport model

    Publication Year: 1996, Page(s):781 - 788

    Considers the problem of broadcasting multiple messages from one processor to many processors in the k-port model for message-passing systems. In such systems, processors communicate in rounds, where in every round, each processor can send k messages to k processors and can receive k messages from k processors. In this paper, we first present a simple and practical algorithm based on variations of...

  • The evolution of a massively parallel vision system for real-time automotive image processing

    Publication Year: 1996, Page(s):724 - 728
    Cited by:  Papers (4)

    This paper presents the first prototype of the PAPRICA massively parallel system which is integrated in the MOB-LAB experimental land vehicle for real-time vision-based road marking detection. Its main bottlenecks are highlighted and its evolution toward a linear array is discussed. This system has been enhanced with a simple but powerful interprocessor communication network for the exchange of in...

  • Determining asynchronous acyclic pipeline execution times

    Publication Year: 1996, Page(s):568 - 572
    Cited by:  Papers (5)

    Pipeline execution is a form of parallelism in which sub-computations of a repeated computation, such as statements in the body of a loop, are executed in parallel. A measure of the execution time of a pipeline is needed to determine if pipelining is an effective form of parallelism for a loop, and to evaluate alternative scheduling choices. We derive a formula for precisely determining the asynch...

  • SWEB: towards a scalable World Wide Web server on multicomputers

    Publication Year: 1996, Page(s):850 - 856
    Cited by:  Papers (46)  |  Patents (31)

    We investigate the issues involved in developing a scalable World Wide Web (WWW) server on a cluster of workstations and parallel machines. The objective is to strengthen the processing capabilities of such a server by utilizing the power of multicomputers to handle the huge number of simultaneous access requests from the Internet. We have implemented a system called SWEB on a distributed memory machi...

  • Maximizing speedup through self-tuning of processor allocation

    Publication Year: 1996, Page(s):463 - 468
    Cited by:  Papers (7)  |  Patents (1)

    Addresses the problem of maximizing application speedup through run-time self-selection of an appropriate number of processors on which to run. Automatic run-time selection of processor allocations is important because many parallel applications exhibit peak speedups at allocations that are data- or time-dependent. We propose the use of a run-time system that: (a) dynamically measures job efficien...

  • NAS experiences of porting CM Fortran codes to HPF on IBM SP2 and SGI Power Challenge

    Publication Year: 1996, Page(s):873 - 880

    Current Connection Machine (CM) Fortran codes developed for the CM-2 and the CM-5 represent an important class of parallel applications. Several users have employed CM Fortran codes in production mode on the CM-2 and the CM-5 for the last five to six years, constituting a heavy investment in terms of cost and time. With Thinking Machines Corporation's decision to withdraw from the hardware bus...

  • PACK/UNPACK on coarse-grained distributed memory parallel machines

    Publication Year: 1996, Page(s):320 - 324

    PACK/UNPACK are Fortran 90/HPF array construction functions which derive new arrays from existing arrays. We present algorithms for performing these operations on coarse-grained parallel machines. Our algorithms are relatively architecture independent and can be applied to arrays of arbitrary dimensions with arbitrary distribution along every dimension. Experimental results are presented on the CM...

  • Converse: an interoperable framework for parallel programming

    Publication Year: 1996, Page(s):212 - 217
    Cited by:  Papers (18)

    Many different parallel languages and paradigms have been developed, each with its own advantages. To benefit from all of them, it should be possible to link together modules written in different parallel languages in a single application. Since the paradigms sometimes differ in fundamental ways, this is difficult to accomplish. This paper describes a framework, Converse, that supports such multi-...

  • A direct block-five-diagonal system solver for the VLSI parallel model

    Publication Year: 1996, Page(s):886 - 890

    A VLSI algorithm for solving a special block-five-diagonal system of linear algebraic equations is presented. The algorithm is considered for a VLSI parallel computational model where both the time of the algorithm and the area of its design are components of the complexity estimations. The linear system arises from the finite-difference approximation of the first biharmonic boundary value problem...

  • Effects of multithreading on data and workload distribution for distributed-memory multiprocessors

    Publication Year: 1996, Page(s):116 - 122
    Cited by:  Papers (5)  |  Patents (3)

    While data and workload distribution can be tailored to fit a particular problem to a particular distributed-memory architecture, it is often difficult to do so for various practical reasons. This paper presents our study on multithreading for distributed-memory multiprocessors. Specifically, we investigate the effects of multithreading on data distribution and workload distribution with variable t...

  • Parallel implementations of irregular problems using high-level actor language

    Publication Year: 1996, Page(s):857 - 862
    Cited by:  Papers (1)

    We present our experience in implementing several irregular problems using a high-level actor language. The problems studied require dynamic computation of object placement and may result in load imbalance as the computation proceeds, thereby requiring dynamic load balancing. The algorithms are expressed as fine-grained computations providing maximal flexibility in adapting the computation load to...

  • Profiling optimized code: a profiling system for an HPF compiler

    Publication Year: 1996, Page(s):469 - 473
    Cited by:  Papers (1)  |  Patents (1)

    High Performance Fortran (HPF), a portable data-parallel language, is based on a high-level model which abstracts programming details away from the user. To achieve high performance, the HPF compiler must optimize the code, which may result in a significant change to the original code structure. Because the performance of optimized and non-optimized code differs, profiling HPF programs with c...

  • Parallel synthetic aperture radar processing on workstation networks

    Publication Year: 1996, Page(s):716 - 723
    Cited by:  Papers (10)  |  Patents (2)

    Synthetic aperture radar (SAR) signal processing poses a significant challenge due to its very large computation and data storage requirements. This paper presents the computational requirements of a typical high resolution satellite SAR data processing scenario. A classification of approaches to partitioning the SAR problem for parallel processing is given. The suitability of networks of workstat...

  • The effects of network contention on processor allocation strategies

    Publication Year: 1996, Page(s):268 - 273
    Cited by:  Papers (7)

    Various processor allocation strategies have been proposed for scalable parallel computers (SPCs). These strategies try to maximize overall system utilization while avoiding network contention among different processor partitions. This paper presents an extensive simulation study investigating whether contention-free processor allocation strategies are indeed important. Ou...

  • Eliminating stale data references through array data-flow analysis

    Publication Year: 1996, Page(s):4 - 13
    Cited by:  Papers (3)

    We develop a compiler algorithm for detecting references to stale data in shared-memory multiprocessors. The algorithm consists of two key analysis techniques, stale reference detection and locality preserving analysis. While the stale reference detection finds the memory reference patterns that may violate cache coherence, the locality preserving analysis minimizes the number of such stale refere...

  • On embedding various networks into the hypercube using matrix transformations

    Publication Year: 1996, Page(s):650 - 654
    Cited by:  Papers (2)

    Various researchers have shown that the binary n-cube (or hypercube) can embed any r-ary m-cube having the same number of nodes with dilation 1. Their construction method is primarily based on the reflected Gray code. We present a different embedding method based on matrix transformation schemes that achieves the same results. In addition, this method has a nice property that makes it suitable ...

  • A method for register allocation to loops in multiple register file architectures

    Publication Year: 1996, Page(s):28 - 33
    Cited by:  Papers (1)  |  Patents (2)

    Multiple instruction issue processors place high demands on register file bandwidth. One solution to reduce this bottleneck is the use of multiple register files. Register allocation for these architectures then becomes exceedingly important as spill code increases memory bandwidth demands and decreases performance, especially within loops. Previously, we have addressed the issue of finding an opt...

  • Categorizing network traffic in update-based protocols on scalable multiprocessors

    Publication Year: 1996, Page(s):142 - 151
    Cited by:  Papers (1)

    Categorizes the coherence traffic in update-based protocols and shows that, for most applications, more than 90% of all updates generated by such a protocol are unnecessary. We identify application characteristics that generate useless update traffic, and compare the isolated and combined effects of several software and hardware techniques for eliminating useless updates. These techniques include ...

  • An optimal algorithm for the angle-restricted all nearest neighbor problem on the reconfigurable mesh

    Publication Year: 1996, Page(s):687 - 691
    Cited by:  Papers (1)

    Given a set S of n points in the plane and two directions r1 and r2, the angle-restricted all nearest neighbor problem (ARANN) asks to compute for every point p in S the nearest point in S lying in the planar region bounded by two rays in the directions r1 and r2 emanating from p. The ARANN problem generalizes the well-known ANN problem and finds appli...

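The hypercube-embedding entry above notes that earlier constructions for embedding r-ary m-cubes into the binary n-cube with dilation 1 rely on the reflected Gray code. As a minimal sketch of that idea in its simplest case, assuming a ring of 2^n nodes (a 2^n-ary 1-cube), the Python below maps ring node i to hypercube node i XOR (i >> 1) and checks that every ring edge, including the wraparound edge, lands on a hypercube edge. The function names are illustrative; this is the Gray-code baseline, not the paper's matrix-transformation method.

```python
def gray(i: int) -> int:
    """Binary reflected Gray code of i."""
    return i ^ (i >> 1)


def ring_embedding(n: int) -> list[int]:
    """Hypothetical helper: map ring node i (0 <= i < 2**n) to hypercube node gray(i)."""
    return [gray(i) for i in range(2 ** n)]


def dilation(mapping: list[int]) -> int:
    """Largest Hamming distance between the images of adjacent ring nodes,
    including the wraparound edge from the last node back to node 0."""
    size = len(mapping)
    return max(
        bin(mapping[i] ^ mapping[(i + 1) % size]).count("1")
        for i in range(size)
    )


if __name__ == "__main__":
    emb = ring_embedding(3)
    print(emb)            # [0, 1, 3, 2, 6, 7, 5, 4]
    print(dilation(emb))  # 1
```

Because the mapping is a permutation of the 2^n hypercube addresses and adjacent codes differ in exactly one bit, each ring edge maps onto a single hypercube link.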
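The resource-placement entry measures distance in a k-ary n-cube with the Lee distance, which for two nodes a and b is the sum over dimensions of min(|a_i - b_i|, k - |a_i - b_i|), the hop count on a torus with wraparound links. The sketch below computes it and brute-forces whether a placement is perfect distance-t, taken here as an assumption (following the usual perfect-code reading) to mean that every node lies within distance t of exactly one resource. The helper names and the 5 x 5 example placement x + 2y ≡ 0 (mod 5) are illustrative choices, not taken from the paper.

```python
from itertools import product


def lee_distance(a, b, k):
    """Lee distance between two nodes of a k-ary n-cube (torus): the minimal
    number of hops when every dimension is a ring with wraparound links."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))


def is_perfect_placement(resources, k, n, t):
    """Hypothetical check: True if every node is within Lee distance t of
    exactly one resource node (brute force over all k**n nodes)."""
    for node in product(range(k), repeat=n):
        close = sum(1 for r in resources if lee_distance(node, r, k) <= t)
        if close != 1:
            return False
    return True


if __name__ == "__main__":
    k, n = 5, 2
    # Illustrative placement on a 5 x 5 torus: nodes (x, y) with x + 2y ≡ 0 (mod 5).
    resources = [(x, y) for x, y in product(range(k), repeat=n) if (x + 2 * y) % k == 0]
    print(len(resources))                              # 5
    print(is_perfect_placement(resources, k, n, t=1))  # True
```

The example works because, for any node (x, y), the values of x + 2y at the node and its four neighbors are five distinct residues mod 5, so exactly one of those five nodes is a resource.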
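The broadcasting entry defines the k-port model, in which each processor can send k messages per round. For a single message this gives a simple bound: an informed processor can inform at most k others per round, so the number of informed processors can at most multiply by k + 1, and reaching N processors takes at least ceil(log_{k+1} N) rounds. The sketch below counts rounds for the greedy schedule that meets this bound; it illustrates the model only, not the paper's multiple-message algorithm, and `single_message_rounds` is a hypothetical name.

```python
def single_message_rounds(num_procs: int, k: int) -> int:
    """Rounds to broadcast one message from one source to num_procs processors
    in the k-port model, assuming every informed processor sends the message
    to k uninformed processors each round (informed count grows by a factor k + 1)."""
    if num_procs < 1 or k < 1:
        raise ValueError("need num_procs >= 1 and k >= 1")
    informed, rounds = 1, 0
    while informed < num_procs:
        informed = min(num_procs, informed * (k + 1))
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(single_message_rounds(1000, 1))  # 10
    print(single_message_rounds(1000, 3))  # 5
```

With k = 1 this reduces to the familiar binomial-tree broadcast taking ceil(log2 N) rounds.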
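The PACK/UNPACK entry concerns the Fortran 90/HPF intrinsics. For reference, a one-dimensional sequential sketch of their semantics follows: PACK keeps the elements whose mask is true, in order, and UNPACK scatters a vector back into the true positions of a mask, taking the remaining positions from FIELD. Computing each selected element's destination as an exclusive prefix sum (scan) of the mask is a common way to parallelize PACK; the abstract does not say whether the paper's algorithms work this way, and the Python function names are hypothetical. Multidimensional arrays (which Fortran traverses in column-major array element order), PACK's optional VECTOR argument, and a scalar FIELD are not handled.

```python
from itertools import accumulate


def pack(array, mask):
    """1-D sketch of Fortran 90 PACK(ARRAY, MASK) without the optional VECTOR
    argument: the elements of array whose mask entry is true, in order."""
    flags = [1 if m else 0 for m in mask]
    # Exclusive prefix sum: dest[i] counts the true entries before position i,
    # which is the output slot for array[i] when mask[i] is true.
    dest = [0] + list(accumulate(flags))[:-1]
    result = [None] * sum(flags)
    for value, flag, d in zip(array, flags, dest):
        if flag:
            result[d] = value
    return result


def unpack(vector, mask, field):
    """1-D sketch of Fortran 90 UNPACK(VECTOR, MASK, FIELD) with an array FIELD:
    true mask positions take successive vector elements, the rest keep field."""
    out, j = [], 0
    for m, f in zip(mask, field):
        if m:
            out.append(vector[j])
            j += 1
        else:
            out.append(f)
    return out


if __name__ == "__main__":
    a = [10, 20, 30, 40]
    m = [True, False, True, True]
    packed = pack(a, m)
    print(packed)                           # [10, 30, 40]
    print(unpack(packed, m, [0, 0, 0, 0]))  # [10, 0, 30, 40]
```

In a distributed setting, the scan step is where processors must cooperate: each needs the count of selected elements held by the processors before it.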