
Parallel and Distributed Systems, 1994. International Conference on

Date 19-21 Dec. 1994

  • Proceedings of 1994 International Conference on Parallel and Distributed Systems

  • Panel 1: Are We Providing The Right Education For Computer Science/engineering Students?

    Page(s): 14
  • Panel 2: Is It Possible to Fairly Compare Interconnection Networks?

    Page(s): 16 - 18
  • Panel 3: Taiwan's Information Superhighway: Technical Issues And Social Impacts

    Page(s): 20
  • Panel 4: What Types Of Research Papers Should We Be Writing?

    Page(s): 22
  • Panel 5: Parallel Processing: What Have We Done Wrong?

    Page(s): 24
  • Panel 6: Parallel And Distributed Processing Research In Some Asian Countries

    Page(s): 26
  • Author index

    Page(s): 769 - 771
  • Extracting the parallelism in program with unstructured control statements

    Page(s): 264 - 270

    Program parallelization is inhibited by unstructured control statements such as GOTOs, which cause interacting and overlapping execution trajectories. In this contribution, a program restructuring method is proposed to convert unstructured control statements into block if statements and while loops. Furthermore, an algorithm is presented to transform a common type of while loop into a do loop. The technique works for while loops whose control variables satisfy a linear recurrence relation. As a result, the loop-carried dependencies generated by the control variables are removed. If there are no other loop-carried dependencies, the do loop may then be converted into a doall loop. The algorithm has been used to test and convert a significant number of while loops into doall loops for a suite of well-known numerical benchmarks.

  • Stochastic modeling of scaled parallel programs

    Page(s): 272 - 279

    Testing the performance scalability of parallel programs can be a time-consuming task, involving many performance runs for different computer configurations, processor numbers, and problem sizes. Ideally, scalability issues would be addressed during parallel program design, but tools are not presently available that allow program developers to study the impact of algorithmic choices under different problem and system scenarios. Hence, scalability analysis is often restricted to existing (and available) parallel machines and already implemented algorithms. In this paper we propose techniques for analyzing scaled parallel programs using stochastic modeling approaches. Although stochastic modeling allows more generality and flexibility in analysis, modeling large parallel programs this way is difficult because of solution tractability problems. We observe, however, that the complexity of parallel program models depends significantly on the type of parallel computation, and we present several computation classes for which tractable, approximate graph models can be generated.

  • Semigroup computation and its applications on mesh-connected computers with hyperbus broadcasting

    Page(s): 34 - 39

    Let $\oplus$ be an associative operation on a domain $D$. The semigroup problem is to compute $a_0 \oplus a_1 \oplus \cdots \oplus a_{N-1}$, where $a_i \in D$ for $0 \le i < N$. The algorithm described here runs on SIMD mesh-connected computers with hyperbus broadcasting using $p$ processors in $O(N/p + \log p)$ time, where $p \le N$. It is shown to be optimal when $p = N$ and to achieve optimal speedup when $p \log p = N$. Based on the proposed semigroup algorithm, other problems such as matrix multiplication, all-pairs shortest paths, shortest-path spanning trees, topological sorting, and connected components can also be solved in logarithmic time using $N^3$ processors.

  • A basis approach to loop parallelization and synchronization

    Page(s): 326 - 332

    Loop transformation is a crucial step in parallelizing compilers. We introduce the concept of a positive coordinate basis for deriving loop transformations. The basis serves to find suitable loop transformations that change the dependence vectors into the desired forms. We demonstrate how this approach can systematically extract maximal outer-loop parallelism. Based on the same concept, we can also construct a minimal set of deadlock-free synchronization vectors to transform the inner serial loops into doacross loops.

  • Load balancing in pipelined processing of multi-join queries

    Page(s): 670 - 675

    This paper looks at how to effectively exploit pipelining for multi-join queries in shared-nothing systems. A multi-join query can be processed using an iterative approach: in each iteration, several relations are selected and joined in a pipelined fashion. However, algorithms based on this approach have traditionally assumed that the relations are uniformly distributed or only slightly skewed. When this assumption is relaxed, i.e., when the data is skewed, some nodes may be assigned more data than can fit into their memories. As a result, pipelining cannot be effectively exploited, and performance may degrade drastically. We propose four skew handling techniques to deal with data skew in multi-join queries. The results of a performance study show that a hybrid technique is superior in most cases.

  • Multicast communication in 2-D mesh networks

    Page(s): 63 - 68

    Multicast refers to the delivery of a message from a source node to an arbitrary number of destination nodes in a communication network. The 2D mesh topology has become increasingly popular as an interconnection network for multicomputers and distributed systems. Two multicast algorithms are proposed for 2D mesh networks, and their computational complexity is analyzed. The performance of the proposed algorithms is evaluated through extensive simulations, and the proposed algorithms are compared with two existing algorithms.

  • Simulation and performance evaluation of a modularly configurable attached processor

    Page(s): 88 - 94

    A new architecture for high-performance parallel attached processors is studied in this paper. Its unique features are that the attached processor can be configured to match a set of algorithms and its memory controllers can be programmed to fit the access patterns those algorithms require. As a result, high utilization of the processing logic can be obtained for a given set of algorithms. A simulator with an interactive graphical interface is designed to study the performance of the proposed architecture, and an example based on matrix multiplication is used for illustration. The simulation results show that a sustained execution rate as high as 95% of peak speed can be achieved for 128×128 matrices in the proposed attached processor architecture. If CMOS technology is chosen to implement the MCAP architecture, a sustained speed of 190 MFLOPS can be obtained for matrix multiplication with four multipliers and four adders.

  • Optimizing entity join queries by extended semijoins in a wide area multidatabase environment

    Page(s): 676 - 681

    We consider processing entity join queries in a wide area multidatabase environment where the query processing cost is dominated by the cost of data transmission. An entity join operation "integrates" tuples representing the same entities from different relations in which inconsistent data may exist. The semijoin technique has been used successfully in distributed database systems to reduce the cost of data transmission. However, it cannot be directly applied to entity join queries. An extension of the traditional semijoin, named the extended semijoin, is proposed to reduce the cost of data transmission for entity join query processing in a wide area multidatabase environment.

  • Experiments on high-priority cold requests in the presence of tree saturation

    Page(s): 70 - 75

    In large-scale shared memory multiprocessors, when a multistage interconnection network (MIN) is used for communication between processors and memory modules, hot spots and tree saturation severely delay memory requests and degrade memory bandwidth. We propose the Cold-First scheme, which is based on priority control and virtual channel flow control, to reduce the delay of cold requests in the presence of hot spots. Simulation results show that the Cold-First scheme reduces the delay of memory requests, especially cold requests, and improves memory bandwidth. In addition, we study the effect of the long delay of hot requests on the lock and unlock mechanisms generally used for synchronization.

  • A parallel programming environment based on message passing

    Page(s): 724 - 729

    With the development of parallel processing technology, more and more high-performance parallel computer systems have been developed. A convenient and flexible parallel programming environment plays an important role in the spread of parallel computing. How to write efficient parallel code and how to convert existing sequential applications into parallel code have become very important issues in parallel processing. We introduce a parallel programming environment based on message passing that simplifies the development of parallel applications and delivers high performance.

  • Implementation of a portable parallelizing compiler with loop partition

    Page(s): 333 - 338

    We have implemented a portable FORTRAN parallelizing compiler with loop partition on our experimental target system, an Acer Altos 10000 running the OSF/1 operating system. We have defined a minimal set of thread-related functions and data types, called B Threads, that is required to support the execution of this parallelizing compiler. Our compiler is highly modularized so that porting it to other platforms should be easy, and it can partition parallel loops into multithreaded code based on several loop partition algorithms. We have also proposed a general model of parallel compilers, which extends a previous model and is useful in constructing a parallelizing compiler for a particular language. The experimental results show that the best speedups are 3.75, 3.46, and 3.81 for matrix multiplication, adjoint convolution, and an increasing-workload sample, respectively, when the number of processors is four. These results show that the approach works and that its performance is satisfactory.

  • Implementation of fast Hartley transform on multiple bus cache coherent multiprocessors

    Page(s): 96 - 101

    The use of multiple buses as the interconnection network for multiprocessors has shown attractive features compared with existing networks, and the addition of cache memory raises the performance of the architecture further. In this paper we consider the implementation of Hou's FHT on multiple bus cache coherent multiprocessors. Analytical formulas are developed, and performance is analysed in terms of speedup using these formulas. We also study the limitations imposed by interprocessor communication overhead and propose a modification to the signal flow graph in order to minimise the multiprocessor execution time and hence improve the speedup of the system.

  • A graph model for investigating memory consistency

    Page(s): 516 - 523

    The complexity of a multiprocessor memory system grows with the efforts made to improve its performance. The pseudo and real execution graphs introduced here can formally describe the complex event-ordering behavior of the multiprocessor memory system and verify the correctness of a parallel program under a consistency model. A pseudo execution graph represents the programmer's abstraction of an execution in which memory accesses are simple, atomic operations; a loop in the pseudo execution graph indicates an incorrect execution. A real execution graph represents the hardware designer's abstraction of an execution in which each memory access is a causal sequence of events; a loop in the real execution graph indicates that the execution cannot occur. A program is correct if every loop in its pseudo execution graphs causes a loop in the corresponding real execution graphs.

  • Obtaining nondominated k-coteries for fault-tolerant distributed k-mutual exclusion

    Page(s): 582 - 587

    A k-coterie is a family of sets (called quorums) in which any k+1 quorums contain at least one pair of intersecting quorums. A k-coterie can be used to develop distributed k-mutual exclusion algorithms that are resilient to node and/or communication link failures. A k-coterie is said to dominate another k-coterie if and only if every quorum in the latter is a superset of some quorum in the former. Clearly, the dominating k-coterie has a better chance than the dominated one of successfully forming a quorum in an error-prone environment. Thus, we should concentrate on nondominated k-coteries, i.e., those that no other k-coterie dominates. We introduce a theorem for checking the nondomination of k-coteries, define a class of special nondominated k-coteries called strongly nondominated (SND) k-coteries, and propose two operations to generate new SND k-coteries from known SND k-coteries.

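Sketch for "Extracting the parallelism in program with unstructured control statements": the abstract converts while loops whose control variables satisfy a linear recurrence into do loops. The Python below covers only the simplest such recurrence and is not the paper's algorithm: it assumes the control variable is updated as x = x + c with a constant step c > 0 and tested as x < limit, and the names while_version, trip_count and do_version are hypothetical. Computing the trip count up front and replacing the recurrence with the closed form x0 + i*c is what removes the loop-carried dependence on x.

```python
def while_version(x0: int, c: int, limit: int) -> list[int]:
    """Original form: each iteration needs the x produced by the previous one."""
    out = []
    x = x0
    while x < limit:
        out.append(x * x)  # loop body reads the control variable
        x = x + c          # linear recurrence: loop-carried dependence on x
    return out


def trip_count(x0: int, c: int, limit: int) -> int:
    """Iterations executed by 'while x < limit: x += c', assuming c > 0."""
    if c <= 0:
        raise ValueError("this sketch assumes a positive step c")
    if x0 >= limit:
        return 0
    return (limit - x0 + c - 1) // c  # integer ceil((limit - x0) / c)


def do_version(x0: int, c: int, limit: int) -> list[int]:
    """Counted loop: iteration i computes its own x, so iterations are independent."""
    n = trip_count(x0, c, limit)
    out = [0] * n
    for i in range(n):   # no loop-carried dependence left: a doall candidate
        x = x0 + i * c   # closed form replaces the recurrence
        out[i] = x * x
    return out


print(do_version(3, 4, 20) == while_version(3, 4, 20))  # True
```

Once each iteration derives its own x, the counted loop becomes a doall candidate only if its body carries no other dependence, which matches the condition stated in the abstract.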
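Sketch for "Semigroup computation and its applications on mesh-connected computers with hyperbus broadcasting": the O(N/p + log p) bound suggests two phases, each of p processors reducing a block of about N/p elements, followed by about log p rounds that combine the p partial results. The Python below simulates those phases sequentially under stated assumptions: it does not model the mesh or the hyperbus, the combining phase is a generic binary tree rather than the paper's broadcasting scheme, and semigroup_reduce is a hypothetical name. Partial results are combined in index order, so op only needs to be associative, as in the abstract.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def semigroup_reduce(a: Sequence[T], op: Callable[[T, T], T], p: int) -> tuple[T, int]:
    """Sequentially simulate a p-processor semigroup computation.

    Returns a[0] op a[1] op ... op a[N-1] and the number of combining rounds used.
    """
    n = len(a)
    if n == 0 or not 1 <= p <= n:
        raise ValueError("need a non-empty input and 1 <= p <= N")
    # Phase 1: processor j reduces its own contiguous block of about N/p elements.
    partial = []
    for j in range(p):
        lo, hi = j * n // p, (j + 1) * n // p  # non-empty because p <= N
        acc = a[lo]
        for x in a[lo + 1:hi]:
            acc = op(acc, x)
        partial.append(acc)
    # Phase 2: combine neighbouring partial results pairwise, ceil(log2 p) rounds.
    rounds = 0
    while len(partial) > 1:
        merged = [op(partial[i], partial[i + 1]) for i in range(0, len(partial) - 1, 2)]
        if len(partial) % 2:  # an odd element waits for the next round
            merged.append(partial[-1])
        partial = merged
        rounds += 1
    return partial[0], rounds


print(semigroup_reduce(list("abcdefghij"), lambda u, v: u + v, 4))  # ('abcdefghij', 2)
```

String concatenation is associative but not commutative, which makes it a convenient check that the operand order is preserved.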
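Sketch for "A basis approach to loop parallelization and synchronization": the abstract does not spell out the positive coordinate basis, so the Python below shows only the standard unimodular loop-transformation checks that such an approach works against. A transformation matrix T (assumed unimodular) is legal when every transformed dependence vector T·d is lexicographically positive, and a transformed outer loop can run as a doall when every T·d has a zero in that position. The function names and the example dependence are hypothetical.

```python
from typing import Sequence

Vector = tuple[int, ...]
Matrix = Sequence[Sequence[int]]


def transform(t: Matrix, d: Vector) -> Vector:
    """Apply transformation matrix t to dependence vector d (matrix-vector product)."""
    return tuple(sum(a * b for a, b in zip(row, d)) for row in t)


def lex_positive(v: Vector) -> bool:
    """True if the first non-zero component of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False


def is_legal(t: Matrix, deps: Sequence[Vector]) -> bool:
    """Every transformed dependence must stay lexicographically positive."""
    return all(lex_positive(transform(t, d)) for d in deps)


def doall_outer_loops(t: Matrix, deps: Sequence[Vector]) -> int:
    """Count leading transformed loops that carry no dependence (all T*d are 0 there)."""
    tds = [transform(t, d) for d in deps]
    k = 0
    while k < len(t) and all(td[k] == 0 for td in tds):
        k += 1
    return k


deps = [(1, 1)]              # hypothetical nest: a[i][j] depends on a[i-1][j-1]
identity = [[1, 0], [0, 1]]
skew = [[1, -1], [0, 1]]     # new outer index i - j, new inner index j
print(is_legal(identity, deps), doall_outer_loops(identity, deps))  # True 0
print(is_legal(skew, deps), doall_outer_loops(skew, deps))          # True 1
```

Skewing the nest with T = [[1, -1], [0, 1]] moves the dependence (1, 1) entirely into the inner loop, so the new outer loop carries no dependence; the abstract's doacross synchronization for the remaining inner loop is not shown.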
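Sketch for "Optimizing entity join queries by extended semijoins in a wide area multidatabase environment": the abstract does not define the extended semijoin, so the Python below illustrates only the traditional two-site semijoin it extends. It assumes exact-match join keys (so it does not handle the inconsistent data an entity join must integrate) and, as a simplifying cost model, counts transmitted attribute values; semijoin_reduce, join and the sample relations are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def semijoin_reduce(r: list[Row], s: list[Row], key: str) -> tuple[list[Row], int]:
    """Classic semijoin R ⋉ S between two sites.

    Step 1: S's site ships the projection of S on key to R's site.
    Step 2: R's site keeps only the rows whose key value appears there and ships them back.
    Returns the reduced R and the number of attribute values shipped in both steps.
    """
    keys = {row[key] for row in s}                    # projection shipped to R's site
    reduced = [row for row in r if row[key] in keys]  # R ⋉ S, shipped to S's site
    shipped = len(keys) + sum(len(row) for row in reduced)
    return reduced, shipped


def join(r: list[Row], s: list[Row], key: str) -> list[Row]:
    """Local equijoin at S's site after the reduction."""
    index: dict[Any, list[Row]] = {}
    for row in s:
        index.setdefault(row[key], []).append(row)
    return [{**a, **b} for a in r for b in index.get(a[key], [])]


emp = [{"id": 1, "name": "Lin"}, {"id": 2, "name": "Wu"}, {"id": 3, "name": "Chen"}]
pay = [{"id": 2, "salary": 50}]
reduced, cost = semijoin_reduce(emp, pay, "id")
print(cost, sum(len(row) for row in emp))  # 3 6
print(join(reduced, pay, "id"))            # [{'id': 2, 'name': 'Wu', 'salary': 50}]
```

In this toy case the semijoin ships 3 attribute values instead of the 6 needed to send all of R, which is the kind of transmission saving the abstract relies on.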
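Sketch for "Implementation of a portable parallelizing compiler with loop partition": the abstract does not describe the B Threads interface or name its loop partition algorithms, so the Python below uses concurrent.futures.ThreadPoolExecutor as a stand-in thread layer and shows two common static schemes, block and cyclic partitioning. The function names are hypothetical. Because of CPython's global interpreter lock, this illustrates how iterations are distributed, not the speedups reported in the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def block_partition(n: int, t: int) -> list[range]:
    """Contiguous chunks whose sizes differ by at most one iteration."""
    return [range(j * n // t, (j + 1) * n // t) for j in range(t)]


def cyclic_partition(n: int, t: int) -> list[range]:
    """Round-robin: thread j gets iterations j, j + t, j + 2t, ..."""
    return [range(j, n, t) for j in range(t)]


def run_chunk(body: Callable[[int], None], chunk: range) -> None:
    for i in chunk:
        body(i)


def parallel_for(
    body: Callable[[int], None],
    n: int,
    t: int,
    partition: Callable[[int, int], list[range]] = block_partition,
) -> None:
    """Run body(i) for i in range(n) on t threads, one chunk per thread.

    Assumes the iterations are independent (a doall loop).
    """
    chunks = partition(n, t)
    with ThreadPoolExecutor(max_workers=t) as pool:
        # list() waits for every chunk and re-raises any exception from a worker
        list(pool.map(run_chunk, [body] * len(chunks), chunks))


xs = list(range(10))
ys = list(range(10, 20))
out = [0] * 10


def body(i: int) -> None:
    out[i] = xs[i] * ys[i]  # each iteration writes only its own element


parallel_for(body, 10, 4, cyclic_partition)
print(out == [x * y for x, y in zip(xs, ys)])  # True
```

Block partitioning gives each thread a contiguous range, which suits loops whose iterations cost about the same; cyclic partitioning interleaves iterations, which spreads the work more evenly when the cost grows with i.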
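Sketch for "Implementation of fast Hartley transform on multiple bus cache coherent multiprocessors": the abstract does not reproduce Hou's FHT, so the Python below gives only the transform it computes, the discrete Hartley transform H[k] = Σ x[n] cas(2πnk/N) with cas θ = cos θ + sin θ, evaluated directly in O(N²) operations. It is an assumed reference for checking a fast or parallel implementation, not the paper's algorithm; dht and cas are hypothetical names.

```python
import math


def cas(theta: float) -> float:
    """Hartley kernel: cas(theta) = cos(theta) + sin(theta)."""
    return math.cos(theta) + math.sin(theta)


def dht(x: list[float]) -> list[float]:
    """Direct O(N^2) discrete Hartley transform: H[k] = sum_n x[n] * cas(2*pi*n*k / N)."""
    n = len(x)
    return [sum(x[m] * cas(2 * math.pi * m * k / n) for m in range(n)) for k in range(n)]


x = [1.0, 2.0, 0.0, -1.0]
back = [v / len(x) for v in dht(dht(x))]  # the DHT is its own inverse up to 1/N
print(all(math.isclose(u, v, abs_tol=1e-9) for u, v in zip(back, x)))  # True
```

Applying the transform twice returns N times the input, so the DHT is its own inverse up to a factor 1/N; the round-trip check relies on that property.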
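Sketch for "A graph model for investigating memory consistency": the abstract's correctness checks reduce to finding loops (directed cycles) in execution graphs. The Python below detects a cycle with a standard depth-first search. The example graph is hypothetical: its edges encode program order plus the assumption that a read returning an old value precedes the write it missed, which is an illustrative rule rather than the paper's definition of pseudo or real execution graphs.

```python
def has_cycle(graph: dict[str, list[str]]) -> bool:
    """Detect a directed cycle with depth-first search (white/gray/black colouring)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color: dict[str, int] = {}

    def visit(u: str) -> bool:
        color[u] = GRAY                  # u is on the current DFS path
        for v in graph.get(u, []):
            state = color.get(v, WHITE)
            if state == GRAY:            # back edge: v is an ancestor of u
                return True
            if state == WHITE and visit(v):
                return True
        color[u] = BLACK                 # every path from u has been explored
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(graph))


# Hypothetical store-buffering execution: P1 writes x then reads y,
# P2 writes y then reads x, and both reads return the initial value.
sb = {
    "P1:W(x)": ["P1:R(y)"],  # program order
    "P1:R(y)": ["P2:W(y)"],  # assumed rule: R(y) saw the old value, so it precedes W(y)
    "P2:W(y)": ["P2:R(x)"],  # program order
    "P2:R(x)": ["P1:W(x)"],  # assumed rule: R(x) saw the old value, so it precedes W(x)
}
print(has_cycle(sb))  # True
```

The store-buffering pattern forms a loop, so under the abstract's rule it would indicate an incorrect execution if it appeared in a pseudo execution graph.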
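Sketch for "Obtaining nondominated k-coteries for fault-tolerant distributed k-mutual exclusion": the Python below checks the two definitions given in the abstract by brute force, the intersection property (any k+1 quorums contain an intersecting pair) and domination. It tests only those properties, not every condition in the full definition of a k-coterie, and its cost is exponential in the number of quorums, so it is meant for small examples; the paper's nondomination theorem and SND operations are not reproduced. The function names and the sample families are hypothetical.

```python
from itertools import combinations
from typing import Iterable


def normalize(family: Iterable[Iterable[int]]) -> set[frozenset[int]]:
    """Represent a family of quorums as a set of frozensets of node ids."""
    return {frozenset(q) for q in family}


def has_k_intersection(family: Iterable[Iterable[int]], k: int) -> bool:
    """True if every choice of k + 1 quorums contains at least one intersecting pair."""
    qs = list(normalize(family))
    for group in combinations(qs, k + 1):
        if all(a.isdisjoint(b) for a, b in combinations(group, 2)):
            return False  # found k + 1 pairwise disjoint quorums
    return True


def dominates(c: Iterable[Iterable[int]], d: Iterable[Iterable[int]]) -> bool:
    """C dominates D if C != D and every quorum of D is a superset of some quorum of C."""
    c_set, d_set = normalize(c), normalize(d)
    return c_set != d_set and all(any(q >= p for p in c_set) for q in d_set)


coterie_c = [{1, 2}, {2, 3}, {3, 4}, {4, 1}]  # hypothetical family on nodes 1..4
coterie_d = [{1, 2, 3}, {3, 4}, {4, 1}]
print(has_k_intersection(coterie_c, 2), has_k_intersection(coterie_d, 2))  # True True
print(dominates(coterie_c, coterie_d), dominates(coterie_d, coterie_c))    # True False
```

Because every quorum of coterie_d contains a quorum of coterie_c, any set of live nodes that can form a quorum of coterie_d can also form one of coterie_c, which is why the abstract favors the dominating k-coterie.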