
IEEE Transactions on Parallel and Distributed Systems

Issue 3 • March 2001

  • Exploiting wavefront parallelism on large-scale shared-memory multiprocessors

    Page(s): 259 - 271

    Wavefront parallelism, in which parallelism is limited to hyperplanes in an iteration space, can arise when compilers apply tiling to loop nests to enhance locality. Previous approaches to scheduling wavefront parallelism focused on maximizing parallelism, balancing workloads, and reducing synchronization. In this paper, we show that on large-scale shared-memory multiprocessors, locality is a crucial factor. We distinguish between intratile and intertile locality and show that, as the number of processors grows, intertile locality becomes more important. We consider and experimentally evaluate existing strategies for scheduling wavefront parallelism. We show that dynamic self-scheduling can be used efficiently on a small number of processors, but performs poorly at large scale because it does not enhance intertile locality. By contrast, static scheduling strategies enhance intertile locality for small tiles while maintaining parallelism, resulting in better performance at large scale. Results from a Convex SPP1000 multiprocessor demonstrate the importance of taking intertile locality into account: static scheduling outperforms dynamic self-scheduling by a factor of up to 2.3 on 30 processors.

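    As a rough illustration of the scheduling trade-off discussed above, the sketch below statically assigns tile rows to processors and sweeps the tile space wavefront by wavefront, so each processor revisits the same tile rows on successive wavefronts (the intertile locality the paper argues for). It is a sequential simulation under assumed tile and processor counts, not the paper's scheduler or its Convex SPP1000 implementation.

        /* Sketch: static scheduling of tiles along wavefronts (anti-diagonals).
         * Tiles (ti, tj) with the same ti + tj are mutually independent; assigning
         * tile row ti to processor ti % P statically means each processor touches
         * the same tile rows on every wavefront, which favors intertile reuse.
         * NT, P, and tile_compute() are assumptions for illustration only.
         */
        #include <stdio.h>

        #define NT 8   /* tiles per dimension (assumed) */
        #define P  4   /* number of processors (assumed) */

        static void tile_compute(int proc, int ti, int tj) {
            /* placeholder for the tiled loop-nest body */
            printf("wavefront %2d: processor %d computes tile (%d,%d)\n",
                   ti + tj, proc, ti, tj);
        }

        int main(void) {
            for (int w = 0; w <= 2 * (NT - 1); w++) {      /* wavefronts in order   */
                for (int p = 0; p < P; p++) {              /* conceptually parallel */
                    for (int ti = p; ti < NT; ti += P) {   /* static: row ti -> p   */
                        int tj = w - ti;
                        if (tj >= 0 && tj < NT)
                            tile_compute(p, ti, tj);
                    }
                }
                /* a barrier would separate wavefronts in a real parallel run */
            }
            return 0;
        }
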
  • An improved generalization of mesh-connected computers with multiple buses

    Page(s): 293 - 305

    Mesh-connected computers (MCCs) are an important class of parallel architectures due to their simple and regular interconnections. However, their performance is restricted by their large diameters. Various augmenting mechanisms have been proposed to enhance the communication efficiency of MCCs. One major approach is to add nonconfigurable buses for improved broadcasting; a typical example is the mesh-connected computer with multiple buses (MMB). We propose a new class of generalized MMBs, the improved generalized MMBs (IMMBs). We compare IMMBs with MMBs and with a class of previously proposed generalized MMBs (GMMBs), and show the power of IMMBs by considering semigroup and prefix computations. Specifically, as our main result we show that for any constant 0

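    The sketch below is only a naive illustration of why row and column buses help with computations like the semigroup operations mentioned above: operands are reduced along rows and then along a column, and the buses let the final result reach every node in two broadcast steps rather than an O(n) mesh-hop broadcast. The mesh size and the reduce-then-broadcast phasing are assumptions; the MMB/IMMB algorithms and bounds studied in the paper are considerably more involved.

        /* Sketch: a semigroup computation (here, +) on an N x N mesh with one bus
         * per row and per column, simulated sequentially. Phases 1 and 2 reduce the
         * operands onto node (0,0); phase 3 stands in for the bus broadcast, which
         * would take two bus steps instead of roughly 2(N-1) mesh hops.
         * Illustration only, not the paper's MMB/IMMB algorithm.
         */
        #include <stdio.h>

        #define N 4   /* mesh is N x N (assumed) */

        int main(void) {
            int v[N][N], row_sum[N], total = 0, result[N][N];

            for (int i = 0; i < N; i++)              /* each node holds one operand */
                for (int j = 0; j < N; j++)
                    v[i][j] = i * N + j + 1;

            for (int i = 0; i < N; i++) {            /* phase 1: reduce each row     */
                row_sum[i] = 0;                      /* onto its leader (column 0)   */
                for (int j = 0; j < N; j++)
                    row_sum[i] += v[i][j];
            }

            for (int i = 0; i < N; i++)              /* phase 2: reduce the leader   */
                total += row_sum[i];                 /* column onto node (0,0)       */

            for (int i = 0; i < N; i++)              /* phase 3: (0,0) writes on its */
                for (int j = 0; j < N; j++)          /* column bus, then row leaders */
                    result[i][j] = total;            /* write on their row buses     */

            printf("semigroup (+) result known at every node: %d\n", result[N - 1][N - 1]);
            return 0;
        }
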
  • An efficient buffer memory system for subarray access

    Page(s): 316 - 335

    Many current graphical display systems use a buffer memory system to hold a two-dimensional image array that is modified and displayed. To speed up updates, the buffer memory system must access many image points within an image subarray in parallel. This paper proposes an efficient buffer memory system for a fast, high-resolution graphical display system. The memory system provides parallel access to pq image points within a block (p×q), horizontal (1×pq), vertical (pq×1), forward-diagonal, or backward-diagonal subarray of an M×N two-dimensional image array, where the design parameters p and q are both powers of two. In the address calculation and routing circuit of the proposed buffer memory system, the address differences for the five subarray types are prearranged according to the index numbers of the memory modules and stored in two static random access memories (SRAMs), so that each module's address is obtained simply by adding its address difference to the base address. In addition, for fast address calculation, the single multiplication in the base-address calculation is replaced by an SRAM access, so that, when N is not a power of two, the multiplication can be performed during the SRAM access for the address differences. The proposed address calculation and routing circuit improves on previous circuits in hardware cost, control complexity, and speed.

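    The core design problem in the entry above is a module-assignment (skewing) function under which every required subarray shape hits pq distinct memory modules. The harness below, using an assumed simple interleaved assignment, checks which access shapes are conflict-free; columns show conflicts under this naive assignment, which is exactly why a more carefully skewed scheme, such as the one the paper builds its address calculation circuit around, is needed. The assignment function and parameters here are assumptions, not the paper's design.

        /* Sketch: checking whether a candidate module-assignment function gives
         * conflict-free access to p*q points for several subarray shapes.
         * The naive assignment module(i,j) = (i*Q + j) mod (P*Q) is an assumption;
         * running the harness shows rows, blocks, and diagonals are conflict-free
         * under it while columns are not, motivating a skewed scheme.
         */
        #include <stdio.h>
        #include <string.h>

        #define P 4            /* block height, power of two (assumed) */
        #define Q 8            /* block width, power of two (assumed)  */
        #define M (P * Q)      /* number of memory modules             */

        static int module(int i, int j) {
            return ((i * Q + j) % M + M) % M;        /* candidate assignment */
        }

        static int conflict_free(int pts[][2], int n) {
            int used[M];
            memset(used, 0, sizeof used);
            for (int k = 0; k < n; k++) {
                int m = module(pts[k][0], pts[k][1]);
                if (used[m]) return 0;               /* two points share a module */
                used[m] = 1;
            }
            return 1;
        }

        int main(void) {
            int i0 = 5, j0 = 3;                      /* arbitrary unaligned origin */
            int pts[M][2];
            int n = 0;

            for (int a = 0; a < P; a++)              /* block (P x Q) */
                for (int b = 0; b < Q; b++) { pts[n][0] = i0 + a; pts[n][1] = j0 + b; n++; }
            printf("block    : %s\n", conflict_free(pts, n) ? "conflict-free" : "CONFLICTS");

            for (int a = 0; a < M; a++) { pts[a][0] = i0;     pts[a][1] = j0 + a; }   /* row      */
            printf("row      : %s\n", conflict_free(pts, M) ? "conflict-free" : "CONFLICTS");

            for (int a = 0; a < M; a++) { pts[a][0] = i0 + a; pts[a][1] = j0;     }   /* column   */
            printf("column   : %s\n", conflict_free(pts, M) ? "conflict-free" : "CONFLICTS");

            for (int a = 0; a < M; a++) { pts[a][0] = i0 + a; pts[a][1] = j0 + a; }   /* fwd diag */
            printf("fwd diag : %s\n", conflict_free(pts, M) ? "conflict-free" : "CONFLICTS");

            for (int a = 0; a < M; a++) { pts[a][0] = i0 + a; pts[a][1] = j0 - a; }   /* bwd diag */
            printf("bwd diag : %s\n", conflict_free(pts, M) ? "conflict-free" : "CONFLICTS");
            return 0;
        }
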
  • An analytical model of adaptive wormhole routing in hypercubes in the presence of hot spot traffic

    Page(s): 283 - 292

    Analytical models of fully adaptive routing for common wormhole-routed networks (e.g., hypercubes) under the uniform traffic pattern have recently been reported in the literature. However, many studies have revealed that the performance advantages of adaptive routing over deterministic routing are more noticeable when the traffic is nonuniform, due, for example, to the existence of hot spots in the network. This paper proposes a new queueing model of fully adaptive routing in the hypercube in the presence of hot spot traffic. The analysis focuses on Duato's algorithm, but can easily be applied to other fully adaptive routing algorithms. Results from simulation experiments are presented to validate the model.

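    For readers unfamiliar with the traffic model, the sketch below generates hot-spot traffic in the usual way: each message is addressed to a designated hot node with probability h and to a uniformly chosen node otherwise, which concentrates load on the hot node. The cube dimension and the value of h are assumptions; the paper's contribution, a queueing model of Duato's fully adaptive routing under such traffic, is not reproduced here.

        /* Sketch: hot-spot destination generation for a hypercube. With probability
         * h a message targets the hot node; otherwise the destination is uniform.
         * The short tally below shows how the destination distribution skews as h
         * grows; DIM, h, and the choice of hot node are assumptions.
         */
        #include <stdio.h>
        #include <stdlib.h>

        #define DIM 6                     /* 6-cube: 64 nodes (assumed) */
        #define NODES (1 << DIM)
        #define MSGS  100000

        int main(void) {
            double h = 0.05;              /* hot-spot probability (assumed) */
            int hot = 0;                  /* node 0 is the hot spot */
            long count[NODES] = {0};

            srand(1);
            for (int m = 0; m < MSGS; m++) {
                int dst;
                if ((double)rand() / RAND_MAX < h)
                    dst = hot;                       /* hot-spot traffic component */
                else
                    dst = rand() % NODES;            /* uniform background traffic */
                count[dst]++;
            }
            printf("hot node received %.1f%% of messages (uniform share: %.1f%%)\n",
                   100.0 * count[hot] / MSGS, 100.0 / NODES);
            return 0;
        }
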
  • Adaptive parallel rendering on multiprocessors and workstation clusters

    Page(s): 241 - 258

    This paper presents the design and performance of a new parallel graphics renderer for 3D images. The renderer is based on an adaptive supersampling approach designed for time- and space-efficient execution on two classes of parallel computers. Our rendering scheme takes subpixel supersamples only along polygon edges, which significantly reduces both rendering time and buffer memory requirements. Furthermore, we provide a balanced rasterization of all transformed polygons. Experimental results demonstrate these advantages on both a shared-memory SGI multiprocessor server and a Unix cluster of Sun workstations. We examine the performance of the new rendering scheme with respect to subpixel resolution, polygon count, scene complexity, and memory requirements. The balanced parallel renderer shows scalable performance with respect to increases in graphics complexity and machine size, and it outperforms Crow's scheme in the benchmark experiments performed. The improvements are on three fronts: (1) reduced rendering time, (2) higher efficiency through balanced workloads, and (3) adaptation to the available buffer memory size. The balanced renderer can be cost-effectively embedded within many 3D graphics algorithms, such as those for edge smoothing and 3D visualization. Our parallel renderer is MPI-coded, offering high portability and cross-platform performance. These advantages can greatly improve the QoS of 3D imaging and real-time interactive graphics.

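    The sketch below illustrates the central idea of supersampling only along polygon edges: a pixel whose corners all agree on inside/outside for a triangle gets a single sample, while a pixel straddling an edge gets an S x S subsample grid. It is a sequential, single-triangle toy (the triangle, resolution, and subsample rate are assumptions) and does not reflect the paper's renderer, its load balancing, or its MPI parallelization.

        /* Sketch: adaptive supersampling along polygon edges, sequential version.
         * Interior and exterior pixels are sampled once; pixels straddling a
         * triangle edge are supersampled on an S x S grid. The program prints a
         * coarse coverage map and the total sample count versus full supersampling.
         */
        #include <stdio.h>

        #define W 16
        #define H 16
        #define S 4          /* subsamples per dimension on edge pixels (assumed) */

        typedef struct { double x, y; } Pt;

        static double edge(Pt a, Pt b, double x, double y) {
            return (b.x - a.x) * (y - a.y) - (b.y - a.y) * (x - a.x);
        }

        static int inside(Pt v0, Pt v1, Pt v2, double x, double y) {
            double e0 = edge(v0, v1, x, y), e1 = edge(v1, v2, x, y), e2 = edge(v2, v0, x, y);
            return (e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0);
        }

        int main(void) {
            Pt v0 = {1.5, 1.0}, v1 = {14.0, 5.0}, v2 = {6.0, 14.5};  /* example triangle */
            long samples = 0;
            for (int py = 0; py < H; py++) {
                for (int px = 0; px < W; px++) {
                    int c = inside(v0, v1, v2, px, py)
                          + inside(v0, v1, v2, px + 1, py)
                          + inside(v0, v1, v2, px, py + 1)
                          + inside(v0, v1, v2, px + 1, py + 1);
                    double cov;
                    if (c == 0 || c == 4) {          /* interior or exterior pixel */
                        cov = (c == 4) ? 1.0 : 0.0;
                        samples += 1;
                    } else {                         /* edge pixel: supersample    */
                        int hit = 0;
                        for (int sy = 0; sy < S; sy++)
                            for (int sx = 0; sx < S; sx++)
                                hit += inside(v0, v1, v2,
                                              px + (sx + 0.5) / S, py + (sy + 0.5) / S);
                        cov = (double)hit / (S * S);
                        samples += S * S;
                    }
                    putchar(cov > 0.5 ? '#' : (cov > 0.0 ? '+' : '.'));
                }
                putchar('\n');
            }
            printf("samples taken: %ld (vs %d with full supersampling)\n",
                   samples, W * H * S * S);
            return 0;
        }
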
  • An optimal index reshuffle algorithm for multidimensional arrays and its applications for parallel architectures

    Page(s): 306 - 315

    Reshuffling the elements of a multidimensional array according to an index operation traditionally requires an auxiliary buffer the same size as the original array. We describe a new in-place algorithm based on vacancy tracking cycles with minimal memory access; it eliminates the buffer array and the associated copy-back, speeding up the reshuffle significantly for large arrays. The algorithm can be parallelized with a multithreaded approach on shared-memory multiprocessor computers. On distributed-memory multiprocessor computers, the index reshuffle of a distributed multidimensional array amounts to a remapping of processor domains and is carried out by combining the in-place local algorithm with a global exchange algorithm. Implementation and test results on the CRAY T3E and IBM SP demonstrate the effectiveness of the algorithm.

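    The cycle idea behind the in-place reshuffle is easy to show for the special case of a 2D transpose: sending the element at linear index i to (i*n1) mod (n1*n2 - 1) is a permutation of the index range, and following each of its cycles lets the array be rearranged with a single carried element instead of a full-size buffer. The sketch below is a sequential illustration that, for simplicity, uses a visited bitmap; the paper's vacancy-tracking algorithm, its multithreaded parallelization, and the distributed global-exchange variant go further.

        /* Sketch: in-place transpose of an n1 x n2 row-major array by following
         * permutation cycles, so no full-size auxiliary buffer is needed, only one
         * carried element plus a small visited bitmap in this simplified version.
         */
        #include <stdio.h>
        #include <stdlib.h>

        /* destination index of source index i when transposing n1 x n2 -> n2 x n1 */
        static size_t dest(size_t i, size_t n1, size_t n2) {
            size_t m = n1 * n2 - 1;
            return (i == m) ? m : (i * n1) % m;
        }

        static void transpose_inplace(double *a, size_t n1, size_t n2) {
            size_t n = n1 * n2;
            unsigned char *done = calloc(n, 1);
            if (!done) return;
            for (size_t start = 1; start + 1 < n; start++) {
                if (done[start]) continue;
                size_t i = start;
                double carried = a[start];        /* lift one element out           */
                do {                              /* walk the cycle until it closes */
                    size_t j = dest(i, n1, n2);
                    double next = a[j];
                    a[j] = carried;               /* drop the carried element       */
                    carried = next;
                    done[j] = 1;
                    i = j;
                } while (i != start);
            }
            free(done);
        }

        int main(void) {
            size_t n1 = 3, n2 = 4;
            double a[12];
            for (size_t i = 0; i < 12; i++) a[i] = (double)i;
            transpose_inplace(a, n1, n2);
            for (size_t r = 0; r < n2; r++) {     /* a is now n2 x n1 row-major */
                for (size_t c = 0; c < n1; c++) printf("%4.0f", a[r * n1 + c]);
                putchar('\n');
            }
            return 0;
        }
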
  • Submesh determination in faulty tori and meshes

    Page(s): 272 - 282

    Torus- and mesh-based machines have received increasing attention. Because the execution time of a parallel algorithm tends to depend on the size of the assigned submesh, it is natural to identify the maximum healthy submeshes in a faulty torus/mesh so as to limit potential performance degradation. This paper proposes an efficient approach for identifying all the maximum healthy submeshes present in a faulty torus/mesh. The approach is based on manipulating set expressions, with the search space reduced considerably by exploiting structural properties of the faulty torus/mesh. The procedure is distributed: every healthy node performs the same procedure independently and concurrently. We show that the proposed scheme may outperform previous methods.

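    As a small, centralized illustration of the problem being solved above, the sketch below scans a health map of a 2D mesh and reports a maximum-area fault-free submesh by brute force. The fault pattern and mesh size are assumptions, and this approach is nothing like the paper's distributed, set-expression-based procedure (which also handles tori); it only makes the notion of a maximum healthy submesh concrete.

        /* Sketch: finding a maximum-area healthy submesh in a faulty 2D mesh from a
         * health map, using a brute-force O(R^2 * C) search: for each pair of top
         * and bottom rows, find the longest run of columns that are healthy in
         * every row between them.
         */
        #include <stdio.h>

        #define R 6
        #define C 8

        int main(void) {
            /* 1 = healthy node, 0 = faulty node (example fault pattern, assumed) */
            int healthy[R][C] = {
                {1,1,1,1,1,1,1,1},
                {1,1,0,1,1,1,1,1},
                {1,1,1,1,1,1,0,1},
                {1,1,1,1,1,1,1,1},
                {0,1,1,1,1,1,1,1},
                {1,1,1,1,1,1,1,1},
            };
            int best = 0, br0 = 0, bc0 = 0, br1 = 0, bc1 = 0;

            for (int top = 0; top < R; top++) {
                int col_ok[C];
                for (int c = 0; c < C; c++) col_ok[c] = 1;
                for (int bot = top; bot < R; bot++) {
                    int run = 0;   /* col_ok[c]: column c healthy in rows top..bot */
                    for (int c = 0; c < C; c++) {
                        col_ok[c] = col_ok[c] && healthy[bot][c];
                        run = col_ok[c] ? run + 1 : 0;
                        int area = run * (bot - top + 1);
                        if (area > best) {
                            best = area;
                            br0 = top; bc0 = c - run + 1; br1 = bot; bc1 = c;
                        }
                    }
                }
            }
            printf("largest healthy submesh: rows %d-%d, cols %d-%d (%d nodes)\n",
                   br0, br1, bc0, bc1, best);
            return 0;
        }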

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology