
Programming Models for Massively Parallel Computers, 1993. Proceedings

Date: 20 Sept. 1993

  • An evaluation of coarse grain dataflow code generation strategies

    Page(s): 63 - 71

    Presents top-down and bottom-up methods for generating coarse grain dataflow or multithreaded code, and evaluates their effectiveness. The top-down technique generates clusters directly from the intermediate data dependence graph used for compiler optimizations. Bottom-up techniques coalesce fine-grain dataflow code into clusters. We measure the resulting number of clusters executed, cluster size, and number of inputs per cluster, for Livermore and Purdue benchmarks. The top-down method executes fewer clusters and instructions, but incurs a higher number of matches per cluster, which underlines the need for efficient matching of more than two inputs per cluster.
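
    The bottom-up strategy can be pictured in a few lines. Below is a minimal Python sketch of coalescing a toy fine-grain dataflow graph into clusters with one simple merge rule (fuse a node into its producer when it has a single predecessor); the graph and the rule are illustrative, not the paper's actual algorithm.

        from collections import defaultdict

        # Fine-grain dataflow graph: node -> successor nodes (hypothetical).
        graph = {"a": ["c"], "b": ["c"], "c": ["d", "e"], "d": ["f"], "e": ["f"], "f": []}

        def coalesce(graph):
            """Fuse each node into its producer when it has exactly one
            predecessor; the surviving groups are the coarse-grain clusters."""
            preds = defaultdict(list)
            for n, succs in graph.items():
                for s in succs:
                    preds[s].append(n)
            cluster_of = {n: n for n in graph}
            def find(n):                          # follow merge links to the root
                while cluster_of[n] != n:
                    n = cluster_of[n]
                return n
            for n, succs in graph.items():
                for s in succs:
                    if len(preds[s]) == 1:        # s depends only on n: fuse them
                        cluster_of[find(s)] = find(n)
            clusters = defaultdict(set)
            for n in graph:
                clusters[find(n)].add(n)
            return list(clusters.values())

        for c in coalesce(graph):                 # cluster count and size fall out
            print(sorted(c))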

  • Proceedings of Workshop on Programming Models for Massively Parallel Computers

  • Parallel programming models and their interdependence with parallel architectures

    Page(s): 2 - 11

    Because of its superior performance and cost-effectiveness, parallel computing will become the future standard, provided we have the appropriate programming models, tools and compilers needed to make parallel computers widely usable. The dominant programming style is procedural, given in the form of either the memory-sharing or the message-passing paradigm. The advantages and disadvantages of these models and their supporting architectures are discussed, as well as the tools by which parallel programming is made machine-independent. Further improvements can be expected from very high level coordination languages. A general breakthrough of parallel computing, however, will only come with parallelizing compilers that enable the user to program applications in the conventional sequential style. The state of the art in parallelizing compilers is outlined, and it is shown how they will be supported by higher-level programming models and multithreaded architectures.
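
    The two procedural paradigms the paper weighs can be contrasted in a few lines. The toy Python below computes one sum both ways, with threads standing in for processors; it is a sketch of the two styles only, not of any system discussed in the paper.

        import threading, queue

        data = list(range(8))

        # Memory-sharing style: workers update one shared variable under a lock.
        total = 0
        lock = threading.Lock()
        def shared_worker(chunk):
            global total
            s = sum(chunk)
            with lock:                    # the shared location is the meeting point
                total += s

        workers = [threading.Thread(target=shared_worker, args=(data[i::2],)) for i in range(2)]
        for w in workers: w.start()
        for w in workers: w.join()
        print("memory sharing:", total)

        # Message-passing style: no shared state; partial results are sent explicitly.
        inbox = queue.Queue()
        def mp_worker(chunk):
            inbox.put(sum(chunk))         # explicit send

        workers = [threading.Thread(target=mp_worker, args=(data[i::2],)) for i in range(2)]
        for w in workers: w.start()
        for w in workers: w.join()
        print("message passing:", inbox.get() + inbox.get())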

  • Modeling parallel computers as memory hierarchies

    Page(s): 116 - 123

    A parameterized generic model that captures the features of diverse computer architectures would facilitate the development of portable programs. Specific models appropriate to particular computers are obtained by specifying parameters of the generic model. A generic model should be simple, and for each machine that it is intended to represent, it should have a reasonably accurate specific model. The Parallel Memory Hierarchy (PMH) model of computation uses a single mechanism to model the costs of both interprocessor communication and memory hierarchy traffic. A computer is modeled as a tree of memory modules with processors at the leaves. All data movement takes the form of block transfers between children and their parents. The paper assesses the strengths and weaknesses of the PMH model as a generic model.
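
    A minimal sketch of the PMH costing mechanism, assuming hypothetical level parameters: the machine is a tree of memory modules, and every data movement is a whole-block transfer across a parent/child link, so one formula covers both memory traffic and interprocessor communication.

        # Leaf-to-root links of the module tree; blocksize and per-block
        # transfer time are made-up values, not taken from the paper.
        levels = [
            {"link": "cache<-local",  "blocksize": 64,   "t_block": 20},
            {"link": "local<-remote", "blocksize": 1024, "t_block": 500},
        ]

        def transfer_cost(words, level):
            """Whole blocks move between a child and its parent, so round up."""
            blocks = -(-words // level["blocksize"])   # ceiling division
            return blocks * level["t_block"]

        # Moving a 10,000-word array down to a leaf processor pays the
        # block-transfer cost at every level it crosses.
        for lvl in levels:
            print(lvl["link"], "costs", transfer_cost(10_000, lvl))
        print("total:", sum(transfer_cost(10_000, lvl) for lvl in levels))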

  • Overall design of Pandore II: an environment for high performance C programming on DMPCs

    Page(s): 28 - 34

    Pandore II is an environment designed for parallel execution of imperative sequential programs on distributed memory parallel computers (DMPCs). It comprises a compiler, libraries for different target distributed computers, and execution analysis tools. No specific knowledge of the target machine is required of the user: only the specification of the data decomposition is left to the programmer. The purpose of the paper is to present the overall design of the Pandore II environment. The high-performance C input language is described, and the main principles of the compilation and optimization techniques are presented. A running example illustrates the development process from a sequential C program within the Pandore II environment.
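
    The one decision left to the user, the data decomposition, can be illustrated with a toy block distribution. The helper below is a hypothetical Python sketch of the concept, not Pandore II's actual annotation syntax.

        def block_owner(i, n, p):
            """Owner of element i of an n-element array split into p blocks."""
            block = -(-n // p)                  # ceiling of n / p
            return i // block

        # With the decomposition specified, the compiler can derive which
        # processor holds (and hence computes on) each element.
        n, p = 16, 4
        for proc in range(p):
            local = [i for i in range(n) if block_owner(i, n, p) == proc]
            print(f"processor {proc} owns elements {local[0]}..{local[-1]}")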

  • A test bed for experimenting with visualization of parallel programs

    Page(s): 53 - 62

    Because of the lack of software tools to assist with concurrent programming, programming for parallel computers has been a significant technical problem for a diverse range of users. We concentrate on techniques that allow computing and non-computing experts to define what they need and then automatically generate the specified visual language. Accordingly, our visual language research aims at developing a test bed for conducting experiments in language design and at speeding up the implementation of tools to assist in parallel computing.

  • Structured parallel programming

    Page(s): 160 - 169

    Parallel programming is a difficult task involving many complex issues such as resource allocation and process coordination. We propose a solution to this problem based on the use of a repertoire of parallel algorithmic forms, known as skeletons. The use of skeletons enables the meaning of a parallel program to be separated from its behaviour. Central to this methodology is the use of transformations and performance models. Transformations provide portability and implementation choices, whilst performance models guide the choices by providing predictions of execution time. We describe the methodology and investigate the use and construction of performance models by studying an example.
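
    A minimal sketch of the skeleton idea in Python: two algorithmic forms written as higher-order functions, one of which is paired with a toy performance model. The forms chosen and the cost parameters are illustrative, not the paper's repertoire.

        def farm(worker, inputs):
            """FARM skeleton: apply one worker to independent inputs."""
            return [worker(x) for x in inputs]        # sequential stand-in

        def pipe(stages, inputs):
            """PIPE skeleton: feed every input through a chain of stages."""
            out = inputs
            for stage in stages:
                out = [stage(x) for x in out]
            return out

        def farm_time(n_tasks, t_task, n_workers, t_comm):
            """Toy performance model: tasks spread over workers, plus a
            per-task communication charge, predicting execution time."""
            return -(-n_tasks // n_workers) * t_task + n_tasks * t_comm

        print(farm(lambda x: x * x, [1, 2, 3]))
        print(pipe([lambda x: x + 1, lambda x: 2 * x], [1, 2, 3]))
        print("predicted farm time:", farm_time(100, t_task=10, n_workers=8, t_comm=1))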

  • Compiling data parallel programs to message passing programs for massively parallel MIMD systems

    Page(s): 100 - 107

    The currently dominant message-passing programming paradigm for MIMD systems is difficult to use and error-prone. One approach that avoids explicit communication is the data-parallel programming model. This model provides a single thread of control, a global name space, and loosely synchronous parallel computation. It is easy to use, and data-parallel programs usually scale very well. Based on experience with an existing compilation system for data-parallel Fortran programs, it is shown how to design such a compilation system and which optimization techniques are required to make data-parallel programs competitive with their handwritten message-passing counterparts.
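
    The core compilation step can be sketched for a single statement. Assuming a block distribution and the owner-computes rule (both standard in such systems, though not necessarily this paper's exact scheme), the toy Python below derives each processor's local iterations and the remote operands it must receive for `forall i: b[i] = a[i-1]`.

        n, p, blk = 12, 3, 4          # 12 elements, 3 processors, block size 4

        def owner(i):
            """Block distribution: element i lives on processor i // blk."""
            return i // blk

        for me in range(p):
            lo, hi = me * blk, (me + 1) * blk
            my_iters = range(max(lo, 1), hi)   # owner-computes: I own b[i] here
            # any right-hand-side element a[i-1] owned elsewhere must be received
            recvs = [i - 1 for i in my_iters if owner(i - 1) != me]
            note = f"receive a{recvs}" if recvs else "all operands local"
            print(f"proc {me}: compute b[{my_iters.start}..{my_iters.stop - 1}], {note}")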

  • PROMOTER: an application-oriented programming model for massive parallelism

    Page(s): 198 - 205

    The article deals with the rationale and concepts of a programming model for massive parallelism. We identify the basic properties of massively parallel applications and develop a programming model for data parallelism on distributed-memory computers. Its key features are a suitable combination of homogeneity and heterogeneity aspects, a unified representation of data-point configurations and interconnection schemes through explicit virtual data topologies, and a variety of synchronization schemes and nondeterminisms. An outline of the linguistic representation and of the abstract execution model is given.

  • An experimental parallelizing systolic compiler for regular programs

    Page(s): 92 - 99

    Systolic transformation techniques are used for parallelization of regular loop programs. After a short introduction to systolic transformation, an experimental compiler system is presented that generates parallel C code by applying different transformation methods. The system is designed as a basis for development towards a systolic compiler generating efficient fine-grained parallel code for regular programs or program parts.
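
    A minimal sketch of what a systolic space-time mapping does to a regular 2-deep loop nest: a schedule assigns each iteration a time step and an allocation assigns it a processor. The particular mapping below (t = i + j, p = i, valid for the uniform dependences (1,0) and (0,1)) is a textbook choice, not this compiler's output.

        N = 4
        def schedule(i, j): return i + j      # time step  t(i, j)
        def allocate(i, j): return i          # processor  p(i, j)

        steps = {}
        for i in range(N):
            for j in range(N):
                steps.setdefault(schedule(i, j), []).append((allocate(i, j), (i, j)))

        for t in sorted(steps):
            # iterations sharing a time step must land on distinct processors
            procs = [p for p, _ in steps[t]]
            assert len(procs) == len(set(procs))
            print(f"t={t}: " + ", ".join(f"P{p} runs {it}" for p, it in steps[t]))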

  • Structuring data parallelism using categorical data types

    Page(s): 110 - 115

    Data parallelism is a powerful approach to parallel computation, particularly when it is used with complex data types. Categorical data types are extensions of abstract data types that structure computations in a way that is useful for parallel implementation. In particular, they decompose the search for good algorithms on a data type into subproblems; all homomorphisms can be implemented by a single recursive, and often parallel, schema; and they are equipped with an equational system that can be used for software development by transformation.
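
    The "single recursive, often parallel, schema" can be written down directly for list homomorphisms: since h(x ++ y) = h(x) (op) h(y), one divide-and-conquer pattern evaluates every such homomorphism, and the two recursive calls are independent. A minimal Python sketch of the idea:

        def hom(f, op, xs):
            """Evaluate the homomorphism given by f on singletons and op on
            concatenation; the two halves could be computed in parallel."""
            if len(xs) == 1:
                return f(xs[0])
            mid = len(xs) // 2
            return op(hom(f, op, xs[:mid]), hom(f, op, xs[mid:]))

        xs = [3, 1, 4, 1, 5, 9, 2, 6]
        print(hom(lambda x: x, max, xs))                   # maximum via the schema
        print(hom(lambda x: [x], lambda a, b: a + b, xs))  # identity via the schema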

  • Massively parallel programming using object parallelism

    Page(s): 144 - 150

    We introduce the concept of object parallelism. Object parallelism offers a unified model in comparison with traditional parallelisation techniques such as data parallelism and algorithmic parallelism. In addition, two fundamental advantages of the object-oriented approach are exploited. First, the abstraction level of object parallelism is application-oriented, i.e., it hides the details of the underlying parallel architecture; thus the portability of parallel applications is inherent, and program development can take place on monoprocessor systems. Second, the concept of specialisation (through inheritance) enables the integration of the given application code with advanced run-time support for load balancing and fault tolerance.

  • The Modula-2* environment for parallel programming

    Page(s): 43 - 52

    Presents a portable parallel programming environment for Modula-2*, an explicitly parallel, machine-independent extension of Modula-2. Modula-2* offers synchronous and asynchronous parallelism, a global single address space, and automatic data and process distribution. The Modula-2* system consists of a compiler, a debugger, a cross-architecture make, a graphical X Windows control panel, run-time systems for different machines, and sets of scalable parallel libraries. The existing implementation targets the MasPar MP series of massively parallel processors (SIMD), the KSR-1 parallel computer (MIMD), heterogeneous LANs of workstations (MIMD), and single workstations (SISD). We describe the important components of the Modula-2* environment and discuss selected implementation issues, focusing on how we achieve a high degree of portability for our system while at the same time ensuring efficiency.

  • Formal methods for concurrent systems design: a survey

    Page(s): 12 - 21

    Concurrency is frequently employed as a means to increase the performance of computing systems: a conventional sequential program is designed first, to be parallelised later on. This contribution is intended to show that concurrent systems can also differ essentially from conventional sequential systems, with respect to the kind of problems to be solved and even to the principal limits of capability and performance. The paper surveys particular concepts and properties of concurrent systems, followed by a selection of models that more or less reflect those properties. Finally, the author discusses a typical example of an algorithm for concurrent systems.

  • MANIFOLD: a programming model for massive parallelism

    Page(s): 151 - 159

    MANIFOLD is a coordination language for orchestrating the communications among independent, cooperating processes in a massively parallel or distributed application. The fundamental principle underlying MANIFOLD is the complete separation of computation from communication. This means that in MANIFOLD, computation processes know nothing about their own communication with other processes, while coordinator processes manage the communications among a set of processes but know nothing about the computation they carry out. This principle leads to more flexible software made of more reusable components, and supports open systems. MANIFOLD is a new programming language based on a number of novel concepts: it is concerned with the concurrency of cooperation, as opposed to the classical work on concurrency, which deals with the concurrency of competition. In order to better understand the fundamentals of this language and its underlying model, we focus on the kernel of a simple sub-language of MANIFOLD, called MINIFOLD.
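
    The separation principle can be sketched with queues standing in for streams: workers compute over anonymous ports, and only the coordinator knows who is connected to whom. This is an illustration of the principle in Python, not MANIFOLD code.

        import threading, queue

        def worker(f, inp, out):
            """Computation process: reads its input port, writes its output
            port, and knows nothing about the other end of either."""
            for x in iter(inp.get, None):
                out.put(f(x))
            out.put(None)               # propagate end-of-stream

        # Coordinator: owns the topology; here it pipes a doubler into an adder.
        a, b, c = queue.Queue(), queue.Queue(), queue.Queue()
        threading.Thread(target=worker, args=(lambda x: 2 * x, a, b)).start()
        threading.Thread(target=worker, args=(lambda x: x + 1, b, c)).start()

        for x in [1, 2, 3]:
            a.put(x)
        a.put(None)
        print([y for y in iter(c.get, None)])   # [3, 5, 7]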

  • On the implementation of virtual shared memory

    Page(s): 172 - 178

    The field of parallel algorithms has demonstrated that a machine model with virtual shared memory is easy to program. Most work in this field has been carried out on the PRAM model. Theoretical results show that a PRAM can be simulated optimally on an interconnection network. We discuss implementations of some of these PRAM simulations and evaluate their performance.
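
    The standard simulation technique such implementations build on is to spread the shared address space over the network's memory modules with a hash function. A minimal sketch, with a toy hash standing in for the universal hashing real simulations draw on to bound congestion:

        P = 8                                   # processors == memory modules
        memory = [dict() for _ in range(P)]     # one module per network node

        def h(addr):
            """Toy stand-in for a universal hash spreading addresses."""
            return (2654435761 * addr) % P

        def write(addr, value):                 # a shared-memory access becomes
            memory[h(addr)][addr] = value       # a message to module h(addr)

        def read(addr):
            return memory[h(addr)][addr]

        for a in range(32):
            write(a, a * a)
        print(read(5), "stored on module", h(5))
        print("cells per module:", [len(m) for m in memory])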

  • Parallel symbolic processing-can it be done?

    Page(s): 24 - 25

    My principal answer is: yes, but it depends. Parallelization of symbolic applications is possible, but only for certain classes of applications. Distributed memory may prevent parallelization in cases where the ratio of communication overhead to computation becomes too high, but it may also be an advantage when applications require much garbage collection, which can then be done in a distributed way. There are also applications whose degree of parallelism is higher than shared memory can support, and which are therefore candidates to profit from massively parallel architectures.

  • A programming model for reconfigurable mesh based parallel computers

    Page(s): 124 - 133

    The paper describes a high-level programming model for reconfigurable mesh architectures. We analyze the engineering and technological issues in implementing reconfigurable mesh architectures and define an abstract architecture called the polymorphic processor array (PPA). We define both a computation model and a programming model for polymorphic processor arrays, and design a parallel programming language, called Polymorphic Parallel C, based on this programming model, for which we have implemented a compiler and a simulator. We have used these tools to validate a number of PPA algorithms and to estimate the performance of the corresponding programs.
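
    The primitive underlying such architectures can be sketched in one dimension: each processing element either fuses or splits its two bus ports, the fused runs form buses, and a value written to a bus reaches every member in one step. The switch settings below are arbitrary illustrative values, not a PPA program.

        fuse = [True, True, False, True, True, True, False, True]  # per-PE switch

        # Split points partition the row of PEs into bus segments.
        segments, current = [], [0]
        for i in range(1, len(fuse)):
            if fuse[i - 1]:                 # PE i-1 passes the signal through
                current.append(i)
            else:                           # PE i-1 splits: a new bus starts
                segments.append(current)
                current = [i]
        segments.append(current)

        for seg in segments:
            leader = seg[0]                 # say the leftmost PE writes the bus
            print(f"bus {seg}: all PEs read the value written by PE {leader}")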

  • Interprocedural heap analysis for parallelizing imperative programs

    Page(s): 74 - 82

    The parallelization of imperative programs working on pointer data structures is possible by using extensive heap analysis. We therefore present a new interprocedural version of the heap analysis algorithm with summary nodes of Chase, Wegman and Zadeck (1990). Our analysis handles arbitrary call graphs, including recursion, works on a realistic low-level intermediate language, and uses a modified propagation method to correct an inaccuracy of the original algorithm. Furthermore, we discuss how loops and recursions over heap data structures can be parallelized based on the analysis information.
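
    The summary-node abstraction the analysis builds on can be sketched briefly: all cells allocated at the same program point collapse into one abstract node, and points-to edges are kept between abstract nodes, so even a list-building loop yields a finite heap graph. A minimal, hypothetical Python rendering of that abstraction (not the paper's interprocedural machinery):

        from collections import defaultdict

        points_to = defaultdict(set)        # variable or (node, field) -> nodes

        def alloc(var, site):
            """All cells allocated at `site` share one summary node."""
            points_to[var] = {f"heap@{site}"}

        def store(src_var, field, dst_var):
            """src.field = dst, over abstract nodes (a weak update)."""
            for node in points_to[src_var]:
                points_to[(node, field)] |= points_to[dst_var]

        # p = new@L1; q = new@L2; p.next = q  -- every later new@L1 cell
        # lands in the same summary node, so the abstraction stays finite.
        alloc("p", "L1"); alloc("q", "L2")
        store("p", "next", "q")
        print(dict(points_to))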

  • Beyond the data parallel paradigm: issues and options

    Page(s): 179 - 190

    Currently, the predominant approach to compiling a program for parallel execution on a distributed-memory multiprocessor is driven by the data-parallel paradigm, in which user-specified data mappings are used to derive computation mappings via ad hoc rules such as owner-computes. We explore a more general approach, driven by the selection of computation mappings from the program's dependence constraints and by the selection of dynamic data mappings from the localization constraints in the program's different computation phases. We believe that this approach provides promising solutions beyond what can be achieved by the data-parallel paradigm. The paper outlines the general program model assumed for this work, states the optimization problems addressed by the approach, and presents solutions to these problems.

  • Reduced interprocessor-communication architecture for supporting programming models

    Page(s): 134 - 143

    The paper presents an execution model and a processor architecture for general-purpose massively parallel computers. To construct an efficient massively parallel computer, the execution model should be natural enough to map actual problem structures onto the processor architecture; each processor should have an efficient and simple communication structure; and computation and communication should be tightly coupled and highly overlapped. To meet these requirements, we derive a simplified architecture with a continuation-driven execution model, which we call RICA. RICA combines a simplified message-handling pipeline, a continuation-driven thread invocation mechanism, a RISC core for instruction execution, a message generation pipeline that can send messages asynchronously with other operations, and a low-overhead thread switching mechanism, all fused into a simple architecture. We then show how, and how efficiently, RICA realizes the parallel primitives of programming models; the primitives examined are shared-memory primitives, message-passing primitives and barriers.
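
    A minimal sketch of continuation-driven invocation as the abstract describes it: an arriving message names a continuation (a thread plus an operand slot), and the thread fires once all of its operands are present. The Python below is an illustration of the execution model only, not the RICA pipeline.

        pending = {}                            # thread -> {slot: value}
        arity   = {"add_then_print": 2}         # operands each thread needs
        threads = {"add_then_print": lambda a, b: print("result:", a + b)}

        def deliver(message):
            """Message handler: store the operand, then invoke the
            continuation's thread once the operand count reaches its arity."""
            thread, slot, value = message
            slots = pending.setdefault(thread, {})
            slots[slot] = value
            if len(slots) == arity[thread]:
                threads[thread](*(slots[s] for s in sorted(slots)))
                del pending[thread]

        deliver(("add_then_print", 0, 40))      # first operand: thread stays idle
        deliver(("add_then_print", 1, 2))       # second operand: thread fires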

  • The DSPL programming environment

    Page(s): 35 - 42

    Gives an overview of the principal concepts employed in the DSPL (Data Stream Processing Language) programming environment, an integrated approach to automating the system design and implementation of parallel applications. The programming environment consists of a programming language and the following set of integrated tools: (1) the modeling tool automatically derives a software model from the given application program; (2) the model-based optimization tool uses the software model to compute design decisions such as network topology, task granularity, task assignment and task execution order; (3) finally, the compiler/optimizer transforms the application program into executable code for the chosen processor network, reflecting the design decisions.

  • Virtual shared memory-based support for novel (parallel) programming paradigms

    Page(s): 83 - 90

    Discusses the implementation of novel programming paradigms on virtual shared memory (VSM) parallel architectures. A wide spectrum of paradigms (data-parallel, functional and logic languages) has been investigated in order to achieve, within the context of VSM parallel architectures, a better understanding of the underlying support mechanisms for the paradigms and to identify commonality among the different mechanisms. An overview of VSM is given in the context of a commercially available VSM machine, the KSR-1. The correspondence between the features of the high-level languages and the VSM features that assist efficient implementation is presented. Case studies are discussed as concrete examples of the issues involved.

  • Performance analysis of distributed applications by suitability functions

    Page(s): 191 - 197

    A simple programming model of distributed-memory message-passing computer systems is first applied to describe the architecture/application pair by two sets of parameters. The node timing formula is then derived on the basis of scalar, vector and communication components. A set of suitability functions, extracted from the performance formulae, is defined. These functions are applied, as an example, to the performance analysis of the 1-dimensional FFT benchmark from the GENESIS benchmark suite. The suitability functions could also be useful for comparative performance analysis of both existing distributed-memory systems and new architectures under development.
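
    The shape of such a node timing formula, and of one suitability function derived from it, can be sketched as follows. All parameter values and the particular ratio chosen are hypothetical, not those of the paper or of the GENESIS suite.

        def node_time(n_s, n_v, n_c, t_s, t_v, t_c):
            """Scalar ops x scalar time + vector ops x vector time +
            communicated words x per-word communication time."""
            return n_s * t_s + n_v * t_v + n_c * t_c

        def compute_suitability(n_s, n_v, n_c, t_s, t_v, t_c):
            """Fraction of node time spent computing rather than communicating;
            closer to 1 suggests the machine suits the application."""
            total = node_time(n_s, n_v, n_c, t_s, t_v, t_c)
            return (total - n_c * t_c) / total

        # An FFT-like workload on two made-up machines differing only in network speed.
        for name, t_c in [("fast network", 0.5), ("slow network", 5.0)]:
            s = compute_suitability(n_s=1e4, n_v=1e5, n_c=2e3,
                                    t_s=1.0, t_v=0.1, t_c=t_c)
            print(f"{name}: suitability = {s:.2f}")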
