Parallel and Distributed Systems, IEEE Transactions on

Issue 3 • Date Mar 1993

  • Performance evaluation of dynamic sharing of processors in two-stage parallel processing systems

    Page(s): 306 - 317

    The performance of job scheduling is studied in a large parallel processing system where a job is modeled as a concatenation of two stages which must be processed in sequence. P_i is the number of processors required by stage i, and P is the total number of processors in the system. A large parallel computing system is considered where Max(P_1, P_2)⩾P≫1 and Max(P_1, P_2)≫Min(P_1, P_2). For such systems, exact expressions for the mean system delay are obtained for various job models and disciplines. The results show that priority should be given to jobs working on the stage which requires fewer processors. The large parallel system (i.e. P≫1) condition is then relaxed to obtain the mean system time for two job models when priority is given to the second stage. Moreover, a scale-up rule is introduced to obtain the approximate delay performance when the system provides more processors than the maximum number of processors required by both stages (i.e. P>Max(P_1, P_2)). An approximation model is given for jobs with more than two stages.

  • A parallel algorithm for random walk construction with application to the Monte Carlo solution of partial differential equations

    Page(s): 355 - 360

    Random walks are widely applicable in statistical and scientific computations. In particular, they are used in the Monte Carlo method to solve elliptic and parabolic partial differential equations (PDEs). This method holds several advantages over other methods for PDEs as it solves problems with irregular boundaries and/or discontinuities, gives solutions at individual points, and exhibits great parallelism. However, the generation of each random walk in the Monte Carlo method has been done sequentially because each point in the walk is derived from the preceding point by moving one grid step along a randomly selected direction. A parallel algorithm for random walk generation in regular as well as irregular regions is presented. The algorithm is based on parallel prefix computations. The communication structure of the algorithm is shown to ideally fit on a hypercube of n nodes, where n is the number of processors.
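
    The link between random walks and prefix computations can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes an unbounded regular 2-D grid (no boundaries or irregular regions), and the names `inclusive_scan`, `add_points`, and `random_walk` are hypothetical. The scan is simulated sequentially; it only mirrors the round structure that a parallel machine could exploit.

```python
import random

# Unit moves on a 2-D grid (assumed geometry for this sketch).
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def inclusive_scan(values, op):
    """Hillis-Steele inclusive prefix scan, simulated sequentially.

    Uses about log2(n) rounds. Within a round every element is updated
    independently of the others, so each round could run in parallel.
    """
    result = list(values)
    offset = 1
    while offset < len(result):
        prev = result  # snapshot that every "processor" reads this round
        result = [
            op(prev[i - offset], prev[i]) if i >= offset else prev[i]
            for i in range(len(prev))
        ]
        offset *= 2
    return result


def add_points(a, b):
    """Component-wise addition of two grid points."""
    return (a[0] + b[0], a[1] + b[1])


def random_walk(start, length, rng):
    """Return the length+1 points of a walk starting at `start`."""
    # All step directions are drawn independently up front ...
    steps = [rng.choice(STEPS) for _ in range(length)]
    # ... so every position is a prefix sum of the start and the steps.
    return inclusive_scan([start] + steps, add_points)


if __name__ == "__main__":
    print(random_walk((0, 0), 8, random.Random(0)))
```

    Drawing all directions first removes the point-to-point dependency: position k is simply the sum of the first k steps, which is exactly what a prefix computation produces.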

  • A generalized scheme for mapping parallel algorithms

    Page(s): 328 - 346

    A generalized mapping strategy that uses a combination of graph theory, mathematical programming, and heuristics is proposed. The authors use the knowledge from the given algorithm and the architecture to guide the mapping. The approach begins with a graphical representation of the parallel algorithm (problem graph) and the parallel computer (host graph). Using these representations, the authors generate a new graphical representation (extended host graph) on which the problem graph is mapped. An accurate characterization of the communication overhead is used in the objective functions to evaluate the optimality of the mapping. An efficient mapping scheme is developed which uses two levels of optimization procedures. The objective functions include minimizing the communication overhead and minimizing the total execution time which includes both computation and communication times. The mapping scheme is tested by simulation and further confirmed by mapping a real world application onto actual distributed environments.

  • Performance analysis of buffer coherency policies in a multisystem data sharing environment

    Page(s): 289 - 305

    Six buffer coherency policies for a multisystem transaction processing environment are compared. These policies differ in their basic approaches on how and when the invalidated pages are identified or if the updated pages are propagated to the buffers of the remote nodes. They can be classified as detection, notification (of invalid pages), and (update) propagation oriented approaches. The policies trade off CPU overhead of coherency messages with buffer hit probability in different ways, resulting in a tradeoff of response time and maximum throughput. The main contribution is to develop analytical models to predict buffer hit probabilities under various buffer coherency policies assuming the LRU replacement policy and the independent reference model (IRM). The buffer models are validated using simulation models and show excellent agreement. Integrated analytic models capturing buffer hit probability and CPU overhead are developed to predict the overall response times under these coherency policies. The differences in buffer hit probabilities among the various policies are found to be very sensitive to the skewness of the data access.

  • An efficient heuristic for permutation packet routing on meshes with low buffer requirements

    Page(s): 270 - 276

    Even though exact algorithms exist for permutation routing of n² messages on an n×n mesh of processors which require constant size queues, the constants are very large and the algorithms very complicated to implement. A novel, simple heuristic for the above problem is presented. It uses constant and very small size queues (size=2). For all the simulations run on randomly generated data, the number of routing steps that is required by the algorithm is almost equal to the maximum distance a packet has to travel. A pathological case is demonstrated where the routing takes more than the optimal, and it is proved that the upper bound on the number of required steps is O(n²). Furthermore, it is shown that the heuristic routes inversion, transposition, and rotations in optimal time; these three special routing problems appear very often in the design of parallel algorithms.

  • An optimal implementation of broadcasting with selective reduction

    Page(s): 256 - 269

    A model of parallel computation called broadcasting with selective reduction (BSR) can be viewed as a concurrent-read concurrent-write (CRCW) parallel random access machine (PRAM) with one extension. An additional type of concurrent memory access is permitted in BSR, namely the BROADCAST instruction by means of which all N processors may gain access to all M memory locations simultaneously for the purpose of writing. At each memory location, a subset of the incoming broadcast data is selected and reduced to one value finally stored in that location. For several problems, BSR algorithms are known which require fewer steps than the corresponding best-known PRAM algorithms, using the same number of processors. A circuit is introduced to implement the BSR model, and it is shown that, in size and depth, the circuit presented is of the same order as an optimal circuit implementing the PRAM. Thus, if it is reasonable to assume that CRCW PRAM instructions execute in constant time, the assumption of a constant time BROADCAST instruction is no less reasonable.
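
    The select-then-reduce behavior of BROADCAST can be sketched as a sequential simulation. The abstract does not say how the subset is selected; this sketch assumes each processor sends a (tag, datum) pair and each memory location holds a limit that tags are compared against. The function name `broadcast`, its parameters, and the choice to leave a location unchanged when nothing is selected are all assumptions made for illustration.

```python
import operator
from functools import reduce


def broadcast(tags, data, limits, select, combine, memory):
    """Sequentially simulate one BROADCAST step (illustrative only).

    Processor i sends (tags[i], data[i]) to every memory location.
    Location j keeps the data whose tag satisfies select(tag, limits[j])
    and stores their reduction under `combine`.
    """
    new_memory = list(memory)
    for j, limit in enumerate(limits):
        chosen = [d for t, d in zip(tags, data) if select(t, limit)]
        if chosen:  # assumption: location is unchanged if nothing is selected
            new_memory[j] = reduce(combine, chosen)
    return new_memory


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    indices = list(range(len(data)))
    # Location j sums the data of all processors i with i <= j,
    # so a single simulated BROADCAST yields all prefix sums.
    print(broadcast(indices, data, indices, operator.le, operator.add,
                    [0] * len(data)))
```

    In the model each location performs its selection and reduction simultaneously; the loop over locations here only stands in for that parallelism.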

  • Fault-tolerant embedding of complete binary trees in hypercubes

    Page(s): 277 - 288

    The focus is on the following graph-theoretic question associated with the simulation of complete binary trees by faulty hypercubes: if a certain number of nodes or links are removed from an n-cube, will an (n-1)-tree still exist as a subgraph? While the general problem of determining whether a k-tree, k < n, still exists when an arbitrary number of nodes/links are removed from the n-cube is found to be NP-complete, an upper bound is found on how many nodes/links can be removed and an (n-1)-tree still be guaranteed to exist. In fact, as a corollary of this, it is found that if no more than n-3 nodes/links are removed from an (n-1)-subcube of the n-cube, an (n-1)-tree is also guaranteed to exist.

  • Clocking arbitrarily large computing structures under constant skew bound

    Page(s): 241 - 255

    A scheme for global synchronization of arbitrarily large computing structures such that clock skew between any two communicating cells is bounded above by a constant is described. The scheme utilizes clock nodes that perform simple processing on clock signals to maintain a constant skew bound irrespective of the size of the computing structure. Among the salient features of the scheme is the interdependence between network topology, skew upper bound, and maximum clocking rate achievable. A 2-D mesh framework is used to present the concepts, introduce three network designs, and prove some basic results. For each network the (constant) upper bound on clock skew between any two communicating processors is established, and its independence of network size is shown. Simulations were carried out to verify correctness and to check the workability of the scheme. A 4×4 network was built and successfully tested for stability. Such issues as node design, clocking of nonplanar structures such as hypercubes, and the concept of fuse programmed clock networks are addressed.

  • A study of achievable speedup in distributed simulation via NULL messages

    Page(s): 347 - 354

    The results of an experimental study on distributed simulation of three open queuing networks are reported. The distributed simulation scheme considered is a simple variation of the scheme given by K.M. Chandy and J. Misra (1979) using NULL messages. A new approach is used to study the relationship between the overhead and performance of a distributed simulator, and the approach is illustrated by studying these three example networks. Two measures of ideal speedup of distributed simulation over sequential simulation are defined and measured. These values of ideal speedup are much less than simply the number of processors, and hence provide a more realistic value for the ideal speedup.

  • On process migration and load balancing in Time Warp

    Page(s): 318 - 327

    A load balancing algorithm for a discrete event simulation executed under Time Warp is presented. The algorithm rests upon recent developments in active process migration, which permit the use of dynamic strategies. Dynamic load balancing allows for readjustments when resource requirements vary during simulation. It is also useful when initial resource predictions are unknown or incorrect. A simulated multiprocessor environment (PARALLEX) was developed in order to evaluate the algorithm. The results indicate that substantial performance gains may be realized with the algorithm.

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology