Proceedings of International Conference on Parallel Processing

15-19 April 1996

  • Proceedings of International Conference on Parallel Processing

    Publication Year: 1996
  • Ocean circulation on the Intel Paragon: modeling and implementation

    Publication Year: 1996
  • Nested parallel call optimization

    Publication Year: 1996, Page(s):225 - 229
    Cited by:  Papers (1)

    We present a novel optimization called Last Parallel Call Optimization (LPCO) for parallel systems. The last parallel call optimization can be regarded as a parallel extension of last call optimization found in sequential systems. While the LPCO is fairly general, we use and-parallel logic programming systems to illustrate it and to report its performance on multiprocessor systems. The last parall...
  • Panel on "For a Massive Number of Massively Parallel Machines: What are the Target Applications, Who

    Publication Year: 1996
  • Author index

    Publication Year: 1996
  • Mapping linear recurrences onto systolic arrays

    Publication Year: 1996, Page(s):891 - 897
    Cited by:  Papers (1)

    Presents an automatic method for mapping a system of linear recurrence equations onto systolic architectures. First, we show that systolic architectures can be derived from linear recurrence equations using the notion of directed recurrence equations. Next, we provide a procedure called 'cubization' to achieve better performance while mapping such equations. The cubization procedure is completely ...
  • A direct block-five-diagonal system solver for the VLSI parallel model

    Publication Year: 1996, Page(s):886 - 890

    A VLSI algorithm for solving a special block-five-diagonal system of linear algebraic equations is presented. The algorithm is considered for a VLSI parallel computational model where both the time of the algorithm and the area of its design are components of the complexity estimations. The linear system arises from the finite-difference approximation of the first biharmonic boundary value problem...
  • Application load imbalance on parallel processors

    Publication Year: 1996, Page(s):836 - 842
    Cited by:  Papers (1)  |  Patents (1)

    This paper addresses the issue of dynamic load imbalance in a class of synchronous iterative applications, and develops a model to represent their workload dynamics. Such models of application load dynamics help in more accurate performance prediction and in the design of efficient load balancing algorithms. Our model captures the workload dynamics across iterations, and predicts the workload dist...
  • Commutativity analysis: a technique for automatically parallelizing pointer-based computations

    Publication Year: 1996, Page(s):14 - 22

    This paper introduces an analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views computations as composed of operations on objects. It then analyzes the program to discover when operations commute, i.e. leave the objects in the same state regardless of the order in which they execu...
  • Distributing tokens on a hypercube without error accumulation

    Publication Year: 1996, Page(s):573 - 578
    Cited by:  Papers (2)

    The problem of load balancing on the hypercube is considered. A number of tokens are placed at the nodes and the goal is to redistribute them evenly throughout the network. Initially, each of the p nodes stores up to m tokens. A simple algorithm is presented, operating in 𝒪(log p+m·log log p) time on average
  • Random seeking: a general, efficient, and informed randomized scheme for dynamic load balancing

    Publication Year: 1996, Page(s):881 - 885
    Cited by:  Papers (3)

    Proposes a completely general, informed, randomized, dynamic load-balancing method called random seeking (RS), which is suitable for parallel algorithms with characteristics found in many of the search algorithms used in artificial intelligence and operations research and in many divide-and-conquer algorithms. In this method, source processors randomly seek out sink processors for load balancing b...
  • A new technique for 3-D domain decomposition on multicomputers which reduces message-passing

    Publication Year: 1996, Page(s):831 - 835
    Cited by:  Papers (2)

    Algorithms for many geometric and physical problems rely on a decomposition of 3D space. The cubical decomposition that is typically used can lead to costly communication overheads when implemented on multicomputers since each cubical cell is adjacent to, and may interact with, as many as 26 neighbouring cells. We explore an alternate decomposition based on truncated octahedra that, along with oth...
  • Eliminating stale data references through array data-flow analysis

    Publication Year: 1996, Page(s):4 - 13
    Cited by:  Papers (3)

    We develop a compiler algorithm for detecting references to stale data in shared-memory multiprocessors. The algorithm consists of two key analysis techniques, stale reference detection and locality preserving analysis. While the stale reference detection finds the memory reference patterns that may violate cache coherence, the locality preserving analysis minimizes the number of such stale refere...
  • Determining asynchronous acyclic pipeline execution times

    Publication Year: 1996, Page(s):568 - 572
    Cited by:  Papers (4)

    Pipeline execution is a form of parallelism in which sub-computations of a repeated computation, such as statements in the body of a loop, are executed in parallel. A measure of the execution time of a pipeline is needed to determine if pipelining is an effective form of parallelism for a loop, and to evaluate alternative scheduling choices. We derive a formula for precisely determining the asynch...
  • Performance of asynchronous linear iterations with random delays

    Publication Year: 1996, Page(s):625 - 629
    Cited by:  Papers (1)

    In this paper we investigate the speedup potential of asynchronous iterative algorithms over their synchronous counterparts for the special case of linear iterations. The space of linear iterations of size two is explored by simulation and analytical methods. We find cases and conditions for high asynchronous speedups. However, averaging asynchronous speedups over the whole set of iteration matric...
  • Simultaneous compression of makespan and number of processors using CRP

    Publication Year: 1996, Page(s):332 - 338
    Cited by:  Papers (2)

    This paper presents a new 2D compression (2DC) method for solving the multiprocessor scheduling (MS) problems to simultaneously achieve both objectives of minimizing the makespan and the number of processors used. Most existing approaches tend to focus on a very specific range of the MS problems, while risking the loss of the solution quality elsewhere. 2DC synthesizes two main classes of compress...
  • Efficient run-time support for irregular task computations with mixed granularities

    Publication Year: 1996, Page(s):823 - 830
    Cited by:  Papers (2)

    Many irregular scientific computing problems can be modeled by directed acyclic task graphs (DAGs). We present an efficient run-time system for executing general asynchronous DAG schedules on distributed memory machines. Our solution tightly integrates the run-time scheme with a fast communication mechanism to eliminate unnecessary overhead in message buffering and copying, and takes advantage of ...
  • A randomized algorithm for Voronoi diagram of line segments on coarse grained multiprocessors

    Publication Year: 1996, Page(s):192 - 198
    Cited by:  Papers (2)

    We present a randomized parallel algorithm for the Voronoi diagram of line segments on coarse grained parallel machines, which, for any input of n line segments on P processors (n=Ω(P^(3+ε)), for any ε>0) requires (with high probability) O([n log n/P]) local operations, O(n/P) global-operations (or messages) per processor, O(1) levels of global data dependency (or O(1) c...
  • Optimizing COOP languages: study of a protein dynamics program

    Publication Year: 1996, Page(s):235 - 240
    Cited by:  Papers (2)

    Fine-grained concurrent object-oriented programming (COOP) models can simplify the programming of irregular parallel applications but are often perceived as inefficient. In this paper, we study implementation techniques to obtain efficient parallel execution of fine-grained COOP languages using a medium-sized protein dynamics program. We found that even with high data locality and good sequential ...
  • Implementing parallel processing in a rugged embeddable environment

    Publication Year: 1996, Page(s):496 - 501

    Litton Guidance and Control Systems, together with MasPar Computer Corporation and support from the Advanced Research Projects Agency (ARPA), Information Technology Office (ITO), is addressing the problem of our military not having a fieldable, high performance, parallel processor. We are packaging MasPar's commercially successful, massively parallel processing system to minimize its size and maxi...
  • NAS experiences of porting CM Fortran codes to HPF on IBM SP2 and SGI Power Challenge

    Publication Year: 1996, Page(s):873 - 880

    Current Connection Machine (CM) Fortran codes developed for the CM-2 and the CM-5 represent an important class of parallel applications. Several users have employed CM Fortran codes in the production mode on the CM-2 and the CM-5 for the last five to six years, constituting a heavy investment in terms of cost and time. With Thinking Machines Corporation's decision to withdraw from the hardware bus...
  • Broadcasting multiple messages in the multiport model

    Publication Year: 1996, Page(s):781 - 788

    Considers the problem of broadcasting multiple messages from one processor to many processors in the k-port model for message-passing systems. In such systems, processors communicate in rounds, where in every round, each processor can send k messages to k processors and can receive k messages from k processors. In this paper, we first present a simple and practical algorithm based on variations of...
  • A partitioning programming environment for a novel parallel architecture

    Publication Year: 1996, Page(s):544 - 548
    Cited by:  Papers (6)  |  Patents (2)

    The paper presents a partitioning and parallelizing programming environment for a novel parallel architecture. This universal embedded accelerator is based on a reconfigurable datapath hardware. The partitioning and parallelizing programming environment accepts C programs and carries out both a profiling-driven host/accelerator partitioning for performance optimization in a first step, and in a se...
  • Routing a permutation in the hypercube by two sets of edge-disjoint paths

    Publication Year: 1996, Page(s):561 - 567

    Consider a hypercube regarded as a directed graph, with one edge in each direction between each pair of adjacent nodes. We show that any permutation on the hypercube can be partitioned into two partial permutations of the same size, so that each of them can be routed by edge-disjoint directed paths. This result implies that the hypercube can be made rearrangeable by virtually duplicating each edge...
  • Mapping techniques for parallel evaluation of chains of recurrences

    Publication Year: 1996, Page(s):620 - 624
    Cited by:  Papers (1)

    This paper examines the parallelization of a technique for speeding up the evaluation of potentially-complex real-valued functions at a large number of points. The technique being parallelized generates a chain of recurrences (CR) which is then used to compute the function incrementally (i.e., by using the results of one iteration in calculating the value of the function in the next iteration). Th...