Fourth Symposium on the Frontiers of Massively Parallel Computation, 1992

Date 19-21 Oct. 1992

  • Fourth Symposium on the Frontiers of Massively Parallel Computation (Cat. No.92CH3185-6)

    Publication Year: 1992
    Freely Available from IEEE
  • Combining switches for the NYU Ultracomputer

    Publication Year: 1992, Page(s):521 - 523
    Cited by:  Papers (3)

    A pairwise combining switch has been implemented for use in the 16×16 processor/memory interconnection network of the NYU Ultracomputer prototype. The switch design may be extended for use in very large systems by providing greater combining capability. Methods for doing so are discussed.
  • Routing algorithms on a mesh-connected computer

    Publication Year: 1992, Page(s):524 - 527

    The authors present two algorithms for the 1-1 routing problem on a mesh-connected computer. The first algorithm, with a queue size of 28, solves the 1-1 routing problem on an n×n mesh-connected computer in 2n+O(1) steps. This improves on the previous result, which required a queue size of 75. The second algorithm solves the problem in 2n-2 steps with queue size 12...
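For context on the 2n-2 figure in the entry above: on an n×n mesh where each step moves a packet across one link, a packet travelling between opposite corners has to cross 2(n-1) links, so 2n-2 steps is the best any 1-1 routing algorithm can guarantee. The Python sketch below only checks that distance argument by breadth-first search. The function name `mesh_diameter` and the 4-neighbour, no-wraparound mesh model are assumptions made for illustration; the sketch does not reproduce either of the paper's routing algorithms or their queue-size bounds.

```python
from collections import deque


def mesh_diameter(n):
    """Largest shortest-path distance between any two nodes of an n x n mesh.

    Assumes 4-neighbour links and no wraparound (a plain mesh, not a torus).
    """
    def farthest_from(start):
        # Standard BFS over the grid, recording hop counts from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < n and 0 <= nc < n and (nr, nc) not in dist:
                    dist[(nr, nc)] = dist[(r, c)] + 1
                    queue.append((nr, nc))
        return max(dist.values())

    return max(farthest_from((r, c)) for r in range(n) for c in range(n))
```

Calling `mesh_diameter(n)` for small n gives the same value as the closed form 2n-2, which is why a 2n-2 step algorithm matches the distance lower bound.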
  • Automatic data distribution for nearest neighbor networks

    Publication Year: 1992, Page(s):178 - 185
    Cited by:  Papers (3)

    An algorithm for mapping an arbitrary, multidimensional array onto an arbitrarily shaped multidimensional nearest-neighbor network of a distributed memory machine is presented. The individual dimensions of the array are labeled with high-level usage descriptors that either can be provided by the programmer or can be derived by sophisticated static compiler analysis. The presented algorithm achieve...
  • Network design and performance for a massively parallel SIMD system

    Publication Year: 1992, Page(s):186 - 193

    It is shown that a nearest neighbor communication network can be complemented with a log-diameter multistage network to handle different communication patterns. This is especially useful when the pattern of data movement is not uniform. The designed network is evaluated for two cases: a dense case with many processing elements communicating and a sparse case. For 32-bit data, the algorithm for comp...
  • Towards efficient parallelizations of a computer algebra algorithm

    Publication Year: 1992, Page(s):67 - 74

    The authors summarize the results of a preliminary study that examines the feasibility of implementing computer algebra systems on massively parallel single-instruction multiple-data architectures. On serial computers, these systems rely on B. Buchberger's (1970, 1985) algorithm for computing Gröbner bases. A parallelization of this algorithm that addresses the potential growth in the number of pol...
  • Input/output for fine grain multiprocessor systems

    Publication Year: 1992, Page(s):545 - 547

    While extensive investigations on how multiple processing elements (PEs) in a parallel system can be utilized efficiently have been carried out, the I/O (input/output) into and from the system has been ignored in most cases. However, the time for downloading input data or uploading results would not be negligible, especially when a large number of PEs such as those in a massively parallel system a...
  • The LINPACK benchmark on the Fujitsu AP 1000

    Publication Year: 1992, Page(s):128 - 135
    Cited by:  Papers (2)

    The author describes an implementation of the LINPACK benchmark on the Fujitsu AP 1000. Design considerations include communication primitives, data distribution, use of blocking to reduce memory references, and effective use of the cache. The LINPACK benchmark results show that the AP 1000 is a good machine for numerical linear algebra, and that one can consistently achieve close to 80 percent of...
  • Off-line permutation scheduling on circuit-switched fixed routing networks

    Publication Year: 1992, Page(s):389 - 396
    Cited by:  Papers (5)

    The problem of offline permutation scheduling on linear arrays, rings, hypercubes, and two-dimensional arrays, assuming the CSFR (circuit-switched fixed routing) model, is examined. Optimal permutation scheduling involves finding a minimum number of subsets of nonconflicting source-destination paths. Every subset of paths can be established to run in one pass. Optimal permutation scheduling on lin...
  • Representations of Borel Cayley graphs

    Publication Year: 1992, Page(s):194 - 201

    It is shown that all degree-4 Borel Cayley graphs can also be represented by more restrictive chordal rings (CRs) through a constructive proof. All bidirectional, degree-4 Borel Cayley graphs have the more restrictive CR representations, and hence Hamiltonian cycles always exist for these graphs. A step-by-step algorithm to transform any degree-4 Borel Cayley graph into CR graphs is provided. Exam...
  • A CPU utilization limit for massively parallel MIMD computers

    Publication Year: 1992, Page(s):83 - 92
    Cited by:  Papers (7)  |  Patents (7)

    Massively parallel computer systems based on off-the-shelf CPU chip-sets have become commercially available. The authors demonstrate a theoretical limit on the silicon (or other circuitry media) utilization of such architectures as the number of processors is scaled up. In addition, case studies of the Thinking Machines Corporation CM-5 and of the Intel Touchstone are presented in order to quantif...
  • On the physical design of butterfly networks for PRAMs

    Publication Year: 1992, Page(s):202 - 209
    Cited by:  Papers (1)

    The design of networks for massively parallel computers is strongly influenced by available technology. The network latency, critical for many applications, is significantly increased by packaging constraints, i.e., many connections between switches involving pad drivers or even line drivers. The authors concentrate on reducing those influences for a butterfly network related to Ranade's routing al...
  • Throughput analysis of pipelined multiprocessor modules

    Publication Year: 1992, Page(s):548 - 550

    A feasible form of parallel architecture would be one which consists of several pipeline stages, each of which is a multiprocessor module of a large number of processing elements (PEs). In many applications, such as real-time image processing and dynamic control, the optimized computing structure would be in this form. In the present study, the performance of a parallel processing model of such an...
  • An algorithm for a class of direct and inverse scattering problems

    Publication Year: 1992, Page(s):237 - 243
    Cited by:  Papers (1)

    A novel, highly parallel algorithm for a class of direct and inverse scattering problems is proposed. It is shown that this algorithm reduces the noise propagation exhibited by the existing algorithms, and produces error terms that are proportional to the square of the discrete step size. Unlike the conventional algorithms, this new formulation decouples the reflection kernel in a given layer. Due...
  • Massively parallel sparse LU factorization

    Publication Year: 1992, Page(s):136 - 140
    Cited by:  Papers (1)

    The multifrontal algorithm for sparse LU factorization has been expressed as a data parallel program that is suitable for massively parallel computers. A new way of mapping data and computations to processors is used, and good processor utilization is obtained even for unstructured sparse matrices. The sparse problem is decomposed into many smaller, dense subproblems, with low overhead for communi...
  • The MetaMP approach to parallel programming

    Publication Year: 1992, Page(s):562 - 565

    The authors are researching techniques for the programming of large-scale parallel machines for scientific computation. They use an intermediate-level language, MetaMP, that sits between High Performance Fortran (HPF) and low-level message passing. They are developing an efficient set of primitives in the intermediate language and are investigating compilation methods that can semi-automatically r...
  • Hardware support for the Seamless programming model

    Publication Year: 1992, Page(s):353 - 360
    Cited by:  Papers (1)

    The communication latency problem is presented with special emphasis on RISC (reduced instruction set computer) based multiprocessors. An interprocessor communication model for parallel programs based on locality is presented. This model enables the programmer to manipulate locality at the language level and to take advantage of currently available system hardware to reduce latency. A hardware nod...
  • Scientific visualization theatre

    Publication Year: 1992

    Summary form only given. Discusses the latest in massively parallel processing (MPP) applications' results through high-resolution graphics and animation. Three themes are represented, demonstrating the relationship between massively parallel computing and scientific visualization. Results of applications computed on MPPs and visualized on graphics workstations are shown for many of the cases. Exa...
  • A parallel software package for solving linear systems

    Publication Year: 1992, Page(s):397 - 401
    Cited by:  Patents (1)

    A problem arising in scientific computation is the solution of Ax=b, where A is a large, sparse matrix. One of the most robust algorithms for solving the above equation is the conjugate gradient method, especially when combined with a preconditioner. The authors discuss a new software package, MP-PCGPAK2, that implements a parallel version of the conjugate gradient metho...
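As background for the entry above, the following is a minimal serial sketch of the textbook conjugate gradient iteration for Ax=b. It is not the MP-PCGPAK2 package and makes several simplifying assumptions: A is symmetric positive definite, it is stored as a dense list of rows rather than in a sparse format, no preconditioner is applied, and nothing runs in parallel. The function name `conjugate_gradient` and its parameters are illustrative.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Approximately solve A x = b for a symmetric positive definite A.

    A is a list of rows, b a list of floats. Plain (unpreconditioned) CG.
    """
    n = len(b)

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # residual norm small enough: stop
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old       # makes the next direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` approximates the exact solution (1/11, 7/11). In a parallel setting the matrix-vector product and the dot products are the steps that would be distributed across processors.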
  • A rank-two divide and conquer method for the symmetric tridiagonal eigenproblem

    Publication Year: 1992, Page(s):402 - 410

    A rank-two divide and conquer algorithm is developed for calculating the eigensystem of a symmetric tridiagonal matrix. This algorithm is compared to the LAPACK-recommended path for this problem and the rank-one divide and conquer algorithm. The timing results on a Sequent Symmetry S81b show that this algorithm has potential as a parallel alternative to the QR algorithm.
  • Massively parallel solution of quantum transport problems

    Publication Year: 1992, Page(s):506 - 507

    A numerically intensive program for the simulation of quantum transport in small structures has been implemented on a MasPar MP-1. The high degree of parallelism inherent in numerically intensive sections of the problem has been exploited, and devices with realistic dimensions and operating conditions have been investigated.
  • Parallel pulse correlation and geolocation

    Publication Year: 1992, Page(s):541 - 542

    The identification and location of ground-based radars via orbiting receivers require the correlation of pulses, the determination of time differences of arrival, and geolocation. Data rates in emitter-rich environments would swamp single-CPU processors performing this operation. The authors present an innovative parallel algorithm developed specifically for this application on massively parallel ...
  • Communication overhead on the CM5: an experimental performance evaluation

    Publication Year: 1992, Page(s):108 - 115
    Cited by:  Papers (10)

    The authors present experimental results for communication overhead on the scalable parallel machine CM-5. It is observed that the communication latency of the data network is 88 μs. It was also observed that the communication cost for messages that are a multiple of 16 bytes is much smaller than for messages that are not, and therefore, for better performance, a user should pad messages to mak...
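The padding advice in the entry above comes down to rounding a message length up to a multiple of 16 bytes. The Python sketch below illustrates that arithmetic only. It assumes that "pad" means rounding up to the next multiple of 16 (the abstract is truncated at that point), and the helper names `padded_length` and `pad_message`, along with the zero-byte filler, are hypothetical and unrelated to any CM-5 library call.

```python
def padded_length(nbytes, block=16):
    """Smallest multiple of `block` that is greater than or equal to nbytes."""
    if nbytes < 0 or block <= 0:
        raise ValueError("nbytes must be >= 0 and block must be > 0")
    return -(-nbytes // block) * block   # ceiling division, then scale back up


def pad_message(payload, block=16, fill=b"\x00"):
    """Return `payload` extended with `fill` bytes up to a multiple of `block`.

    The zero-byte filler is an assumption; the receiver must learn the true
    payload length by some other means (for example a length header).
    """
    return payload + fill * (padded_length(len(payload), block) - len(payload))
```

A 17-byte payload, for instance, is padded to 32 bytes, while a payload that is already 16 bytes long is returned unchanged.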
  • A hyper-pyramid network topology for image processing

    Publication Year: 1992, Page(s):224 - 229
    Cited by:  Papers (3)

    The authors describe a novel network topology for image processing, called the hyper-pyramid network topology. This structure is hierarchical and implements local, inside-region communications at each level, and upward/downward communications in the whole structure. Intraregion communications are shown by an image processing algorithm study. The authors display the implementation of a component la...
  • Parallel holographic image calculation and compression

    Publication Year: 1992, Page(s):557 - 559

    The authors describe the parallel implementation of an algorithm suitable for hologram creation on a 16384-processor SIMD (single-instruction multiple-data) MasPar machine. When computing an image of typical complexity, the parallel implementation sacrifices up to 11% efficiency in data compression to gain a performance up to 250 times greater than that achieved on a uniprocessor workstation. The ...