IEEE Transactions on Parallel and Distributed Systems

Issue 10, October 1993

  • A new graph approach to minimizing processor fragmentation in hypercube multiprocessors

    Publication Year: 1993, Page(s): 1165-1171
    Cited by:  Papers (6)

    The authors propose a new approach to subcube and noncubic processor allocation in hypercube multiprocessors. The main idea is to represent the available processors in the system by means of a prime cube graph (PC-graph). The PC-graph maintains the interrelationships between free subcubes and hence reduces both internal and external processor fragmentation. Their simulation results show that the PC-graph approach outperforms existing allocation strategies by 25% to 50% under certain load conditions.

  • A family of parallel prefix algorithms embedded in networks

    Publication Year: 1993, Page(s): 1179-1184

    This paper presents a family of algorithms for producing, from $(\upsilon_0, \upsilon_1, \ldots, \upsilon_{n-1})$, all initial prefixes $x_i = \upsilon_0 \,\theta\, \upsilon_1 \,\theta \cdots \theta\, \upsilon_i$ ($i = 0, 1, \ldots, n-1$) in parallel in interconnection networks such as the omega network and the hypercube, where $\theta$ is an associative binary operator. Each algorithm can be embedded in the switches and interconnections of the network, and can be executed in $O((\log_2 r + 1)\log_r n)$ time steps, provided that the network connecting $n$ processors is constructed from $r \times r$ switches and that parallelism both within and among individual switches is exploited. The objective of these algorithms is to attain a communication pattern that fits the topology of the network. Because one type of network can be made equivalent to, or embedded in, another type, a family of algorithms can be derived from one basic algorithm. In the basic algorithm, every processor $p_i$ multicasts $\upsilon_i$ upward to processors $p_k$ ($k = i+1, i+2, \ldots, n-1$). En route to $p_i$, the values $\upsilon_j$ ($j = 0, 1, \ldots, i-1$) are combined in the switches to produce the $(i-1)$th initial prefix $x_{i-1}$, which is received by $p_i$; $p_i$ can then compute the $i$th initial prefix $x_i = x_{i-1} \,\theta\, \upsilon_i$.

  • Improving memory utilization in cache coherence directories

    Publication Year: 1993, Page(s): 1130-1146
    Cited by:  Papers (3)  |  Patents (7)

    Efficiently maintaining cache coherence is a major problem in large-scale shared-memory multiprocessors. Hardware directory coherence schemes have very high memory requirements, while software-directed schemes must rely on imprecise compile-time memory disambiguation. Recently proposed dynamically tagged directory schemes allocate pointers to blocks only as they are referenced, which significantly reduces their memory requirements, but they still allocate pointers to blocks that do not need them. The authors present two compiler optimizations that exploit the high-level sharing information available to the compiler to further reduce the size of a tagged directory by allocating pointers only when necessary. Trace-driven simulations show that the performance of this combined hardware-software approach is comparable to that of other coherence schemes, but with significantly lower memory requirements. In addition, these simulations suggest that this approach is less sensitive than software-only coherence schemes to the quality of the memory disambiguation and interprocedural analysis performed by the compiler.

  • An availability model for MIN-based multiprocessors

    Publication Year: 1993, Page(s): 1118-1129
    Cited by:  Papers (4)

    System decomposition is a novel technique for modeling the dependability of complex systems without constructing a single-level Markov chain (MC). This paper demonstrates it for the availability computation of a class of multiprocessors that use $4 \times 4$ switching elements in the multistage interconnection network (MIN). The availability model is known as task-based availability: a system is considered operational as long as the task requirements are satisfied. The authors develop two simple MCs, one for the processors and one for the memories, and solve them using a software package called HARP. The probabilities that $i$ processing elements (PEs) and $j$ memory modules (MMs) are working at time $t$, denoted $P_i(t)$ and $P_j(t)$, are obtained from their corresponding MCs. The effect of the MIN is captured by finding the number of switches required to connect $i$ PEs and $j$ MMs. A third MC is then developed for the switches to find the probability that the MIN provides the required $(i \times j)$ connection. Multiplying this term by $P_i(t)$ and $P_j(t)$ gives the probability of an $(i \times j)$ working group. The methodology is generalized to model arbitrary as well as larger systems. Transient and steady-state availabilities are computed for a variety of MIN configurations, and the results are validated through simulation.

  • HARP: an open architecture for parallel matrix and signal processing

    Publication Year: 1993, Page(s): 1081-1091

    This paper describes and analyzes the Hybrid Array Ring Processor (HARP) architecture. HARP is an application-specific architecture built around a host processor, shared memory, and a set of memory-mapped processing cells that are connected both to an open backplane and to a bidirectional systolic ring. The architecture is analyzed through detailed simulation of a system implementation based on the Texas Instruments TMS34082 floating-point RISC. A bus controller is designed that provides a tightly coupled DMA function, which accelerates systolic communication and supports new interleaved transparent communication and reduced-overhead message passing. The architecture is benchmarked with the matrix multiplication, FFT, QRD, and SVD algorithms.

  • Optimal routing algorithm and the diameter of the cube-connected cycles

    Publication Year: 1993, Page(s): 1172-1178
    Cited by:  Papers (6)

    Communication between processors is one of the most important issues in parallel and distributed systems. The authors study the communication aspects of a well-known multiprocessor structure, the cube-connected cycles (CCC). Earlier work presented only nonoptimal routing algorithms and bounds on the diameter of restricted subclasses of the CCC. The authors present an optimal routing algorithm for the general CCC, with a formal proof of its optimality. Based on this routing algorithm, they derive the exact network diameter of the general CCC.

  • Performance analysis and scheduling of stochastic fork-join jobs in a multicomputer system

    Publication Year: 1993, Page(s): 1147-1164
    Cited by:  Papers (12)  |  Patents (1)

    The authors model a parallel processing system comprising several homogeneous computers interconnected by a communication network. Jobs arriving at this system have a linear fork-join structure. Each fork of a job gives rise to a random number of tasks that can be processed independently on any of the computers. Since exact analysis of fork-join models is known to be intractable, the authors derive analytical bounds on the mean job response time. For jobs with a single fork-join and probabilistic allocation of the job's tasks to the N processors, they obtain upper and lower bounds on the mean job response time. The upper bounds are obtained using the concept of associated random variables and are found to be a good approximation to the mean job response time. A simple lower bound is obtained by neglecting queueing delays; they also find two lower bounds that include queueing delays. For jobs with multiple fork-joins, they study an approximation based on associated random variables. Finally, two versions of the join-the-shortest-queue (JSQ) allocation policy (JSQ by batch and JSQ by task) are studied and compared via simulation and diffusion limits.

  • Multicast communication in multicomputer networks

    Publication Year: 1993, Page(s): 1105-1117
    Cited by:  Papers (68)

    Efficient routing of messages is key to the performance of multicomputers. Multicast communication refers to the delivery of the same message from a source node to an arbitrary number of destination nodes. Although many applications demand multicast communication, most existing multicomputers do not support it directly; instead, it is supported indirectly by multiple one-to-one or broadcast communications, which generate more network traffic and waste system resources. The authors study routing evaluation criteria for multicast communication under different switching technologies and formulate multicast communication in multicomputers as a graph-theoretic problem. Depending on the evaluation criteria and switching technology, they study three optimal multicast communication problems, which are equivalent to finding three kinds of subgraph of the host graph defined by the multicomputer's interconnection: an optimal multicast path, an optimal multicast cycle, and a minimal Steiner tree. They show that all these optimization problems are NP-complete for the popular 2D-mesh and hypercube host graphs, and they propose heuristic multicast algorithms for these routing problems.

  • NETRA: a hierarchical and partitionable architecture for computer vision systems

    Publication Year: 1993, Page(s): 1092-1104
    Cited by:  Papers (4)

    Computer vision is regarded as one of the most complex and computationally intensive problem domains. In general, a computer vision system (CVS) attempts to relate scenes to models. A typical CVS employs algorithms from a very broad spectrum, including numerical computation, image processing, graph algorithms, symbolic processing, and artificial intelligence. The authors present a multiprocessor architecture for computer vision systems called "NETRA." NETRA is a highly flexible architecture. Its topology is recursively defined and is therefore easily scalable from small to large systems. It is a hierarchical architecture with a tree-type control hierarchy. Its leaf nodes consist of clusters of processors connected by a programmable crossbar with selective broadcast capability, which provides the desired flexibility. The processors in a cluster can operate in SIMD, MIMD, or systolic-like modes. Other features of the architecture include integration of limited data-driven computation within a primarily control-flow mechanism, block-level control and data flow, decentralization of memory management functions, and hierarchical load balancing and scheduling. The paper also presents a qualitative evaluation and preliminary performance results for a NETRA cluster.

  • Optimal architectures and algorithms for mesh-connected parallel computers with separable row/column buses

    Publication Year: 1993, Page(s): 1073-1080
    Cited by:  Papers (12)

    A two-dimensional mesh of processing elements (PEs) with separable row and column buses (i.e., broadcast mechanisms for rows and columns that can be logically divided into a number of local buses through PE-controlled switches) has been shown to be quite effective for semigroup computation, prefix computation, and a wide class of other computations that do not require excessive communication or data routing. For meshes with separable row/column buses, the authors show how semigroup and prefix computations can be performed with the same asymptotic time complexity without providing buses for every row and every column, and they discuss the VLSI implications of this new architecture.

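For "A new graph approach to minimizing processor fragmentation in hypercube multiprocessors," a small brute-force sketch makes the fragmentation problem concrete. It is not the PC-graph method: it enumerates every free subcube of a small n-cube and keeps the maximal ones, under the assumption that a "prime cube" corresponds to a maximal free subcube (by analogy with prime implicants). For contrast, it includes the classic buddy strategy, which only considers subcubes spanning the k lowest dimensions. The `(mask, base)` encoding and all function names are illustrative choices, not taken from the paper.

```python
from itertools import combinations, product
from typing import FrozenSet, List, Set, Tuple

# Illustrative encoding (an assumption of this sketch): a subcube of the n-cube is
# (mask, base); bits set in `mask` are the "*" dimensions, and `base` fixes every
# other bit (it is 0 on `mask`).
Subcube = Tuple[int, int]


def subcube_nodes(sc: Subcube) -> FrozenSet[int]:
    """Return every processor address covered by the subcube."""
    mask, base = sc
    dims = [d for d in range(mask.bit_length()) if (mask >> d) & 1]
    nodes = set()
    for bits in product((0, 1), repeat=len(dims)):
        addr = base
        for d, bit in zip(dims, bits):
            addr |= bit << d
        nodes.add(addr)
    return frozenset(nodes)


def free_subcubes(free: Set[int], n: int, k: int) -> List[Subcube]:
    """Exhaustive recognition: every free k-subcube over any k of the n dimensions."""
    found = []
    for dims in combinations(range(n), k):
        mask = sum(1 << d for d in dims)
        for base in range(1 << n):
            if (base & mask) == 0 and subcube_nodes((mask, base)) <= free:
                found.append((mask, base))
    return found


def buddy_subcubes(free: Set[int], n: int, k: int) -> List[Subcube]:
    """Buddy strategy: only k-subcubes spanning the k lowest dimensions."""
    mask = (1 << k) - 1
    return [(mask, base) for base in range(0, 1 << n, 1 << k)
            if subcube_nodes((mask, base)) <= free]


def maximal_free_subcubes(free: Set[int], n: int) -> List[Subcube]:
    """Free subcubes that are not strictly contained in a larger free subcube."""
    cubes = [sc for k in range(n + 1) for sc in free_subcubes(free, n, k)]
    nodes = {sc: subcube_nodes(sc) for sc in cubes}
    return [a for a in cubes if not any(nodes[a] < nodes[b] for b in cubes)]


if __name__ == "__main__":
    free = {1, 3, 5, 7}  # all odd addresses of a 3-cube are free
    print(buddy_subcubes(free, 3, 2))   # buddy finds no free 2-cube
    print(free_subcubes(free, 3, 2))    # exhaustive search finds **1
    print(maximal_free_subcubes(free, 3))
```

With the odd addresses {1, 3, 5, 7} free in a 3-cube, the buddy strategy finds no free 2-cube, while the exhaustive search finds the subcube `**1` (mask `0b110`, base 1). The exhaustive search is exponential and only suitable for toy sizes.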
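The prefix problem in "A family of parallel prefix algorithms embedded in networks" can be illustrated with recursive doubling, a standard parallel-prefix scheme swapped in here for clarity; it is not the paper's switch-embedded multicast algorithm. The loop simulates synchronous rounds sequentially: in each round every position i >= d combines with position i - d, so after ⌈log2 n⌉ rounds position i holds x_i. Because only associativity is assumed, the earlier segment is always the left operand, which keeps the result correct for non-commutative operators such as string concatenation.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def recursive_doubling_scan(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive prefixes x_i = v_0 op v_1 op ... op v_i by recursive doubling.

    Each pass of the while-loop stands for one synchronous parallel round, so
    ceil(log2 n) rounds are needed. `op` only has to be associative.
    """
    x = list(values)
    d = 1
    while d < len(x):
        prev = list(x)  # snapshot: every "processor" reads last round's values
        for i in range(d, len(x)):
            # prev[i - d] covers an earlier segment than prev[i], so it goes on the left.
            x[i] = op(prev[i - d], prev[i])
        d *= 2
    return x


if __name__ == "__main__":
    print(recursive_doubling_scan([1, 2, 3, 4, 5], lambda a, b: a + b))  # [1, 3, 6, 10, 15]
    print(recursive_doubling_scan(["a", "b", "c"], lambda a, b: a + b))  # ['a', 'ab', 'abc']
```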
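The pointer-allocation idea in "Improving memory utilization in cache coherence directories" can be sketched with a toy directory that allocates sharer pointers only for blocks that are actually referenced. The `needs_pointer` flag is a hypothetical stand-in for a compiler hint that an access can never require an invalidation. The class, its methods, and the flag are illustrative assumptions rather than the paper's design, and the capacity limits and entry replacement that a real tagged directory must handle are omitted.

```python
from typing import Dict, Set


class TaggedDirectory:
    """Toy dynamically tagged directory: sharer pointers exist only for referenced blocks."""

    def __init__(self) -> None:
        self.sharers: Dict[int, Set[int]] = {}  # block address -> processors holding it

    def read(self, block: int, proc: int, needs_pointer: bool = True) -> None:
        # needs_pointer=False models a hypothetical compiler hint that this copy can
        # never need invalidating; skipping the pointer is only safe if that holds.
        if needs_pointer:
            self.sharers.setdefault(block, set()).add(proc)

    def write(self, block: int, proc: int) -> Set[int]:
        """Record proc as sole holder; return processors whose copies must be invalidated."""
        stale = self.sharers.get(block, set()) - {proc}
        self.sharers[block] = {proc}
        return stale

    def pointer_count(self) -> int:
        return sum(len(s) for s in self.sharers.values())


if __name__ == "__main__":
    d = TaggedDirectory()
    d.read(10, 0)
    d.read(10, 1)
    d.read(20, 2, needs_pointer=False)  # hinted access: no pointer allocated
    print(d.pointer_count(), d.write(10, 1), d.pointer_count())
```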
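The product rule described in "An availability model for MIN-based multiprocessors" (the probability of an (i × j) working group is P_i(t) · P_j(t) · P(the MIN provides the connection)) can be sketched in steady state. The paper solves Markov chains with HARP; this sketch instead assumes independent, identical repairable units with steady-state availability mu / (lambda + mu), so P_i and P_j become binomial, and treats the MIN term as a user-supplied function `p_min(i, j)` (hypothetical). Summing over every (i, j) that meets the task requirement gives a task-based availability under those simplifying assumptions only.

```python
from math import comb
from typing import Callable


def unit_availability(failure_rate: float, repair_rate: float) -> float:
    """Steady-state availability of one two-state repairable unit: mu / (lambda + mu)."""
    return repair_rate / (failure_rate + repair_rate)


def prob_exactly(k: int, total: int, a: float) -> float:
    """P(exactly k of `total` independent units are up), each up with probability a."""
    return comb(total, k) * a**k * (1 - a) ** (total - k)


def task_availability(n_pe: int, n_mm: int, a_pe: float, a_mm: float,
                      need_pe: int, need_mm: int,
                      p_min: Callable[[int, int], float]) -> float:
    """Sum P_i * P_j * P_MIN(i, j) over every (i, j) meeting the task requirement."""
    total = 0.0
    for i in range(need_pe, n_pe + 1):
        for j in range(need_mm, n_mm + 1):
            total += prob_exactly(i, n_pe, a_pe) * prob_exactly(j, n_mm, a_mm) * p_min(i, j)
    return total


if __name__ == "__main__":
    a = unit_availability(failure_rate=1e-3, repair_rate=1e-1)
    # Hypothetical MIN term: the network delivers any required connection with prob. 0.99.
    print(task_availability(16, 16, a, a, 12, 12, lambda i, j: 0.99))
```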
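For "Optimal routing algorithm and the diameter of the cube-connected cycles," a brute-force check is handy when working out routes by hand. The sketch uses the common labeling of CCC(n) nodes as (k, x), where k is the position on a cycle of length n and x is an n-bit cube address; node (k, x) is joined to its two cycle neighbors and, by a cube edge, to (k, x with bit k flipped). Breadth-first search then gives exact shortest-path distances and the diameter for small n. This is not the paper's optimal routing algorithm: it runs a full BFS per source node, so it is only practical for small n.

```python
from collections import deque
from typing import Dict, Iterator, Tuple

Node = Tuple[int, int]  # (k, x): position k on the cycle, n-bit cube address x


def ccc_neighbors(node: Node, n: int) -> Iterator[Node]:
    k, x = node
    yield ((k + 1) % n, x)   # next node on the same cycle
    yield ((k - 1) % n, x)   # previous node on the same cycle
    yield (k, x ^ (1 << k))  # cube edge: flip bit k of the address


def bfs_distances(src: Node, n: int) -> Dict[Node, int]:
    """Shortest hop counts from src to every node of CCC(n)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in ccc_neighbors(u, n):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def ccc_diameter(n: int) -> int:
    """Largest shortest-path distance, found by one BFS per node (small n only)."""
    nodes = [(k, x) for k in range(n) for x in range(1 << n)]
    return max(max(bfs_distances(s, n).values()) for s in nodes)


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, n * 2**n, ccc_diameter(n))  # dimension, node count, diameter
```

For n = 2 the graph degenerates into an 8-node ring and `ccc_diameter(2)` returns 4; larger n can be used to check hand-derived routes against exact distances.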
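The simplest lower bound mentioned in "Performance analysis and scheduling of stochastic fork-join jobs in a multicomputer system" neglects queueing delays. Under an added assumption that is not from the paper, namely that task service times are i.i.d. exponential with rate mu, ignoring queueing means every task starts at once, so a job with K tasks finishes at the maximum of K exponentials, whose mean is H_K / mu (H_K being the K-th harmonic number). The sketch averages this over a hypothetical distribution of K and includes a seeded Monte Carlo check.

```python
import random
from typing import Dict


def expected_max_exponential(k: int, mu: float) -> float:
    """E[max of k i.i.d. Exp(mu) service times] = H_k / mu."""
    return sum(1.0 / m for m in range(1, k + 1)) / mu


def no_queueing_lower_bound(task_count_pmf: Dict[int, float], mu: float) -> float:
    """Mean job response time if every task starts service immediately.

    task_count_pmf maps a task count K >= 1 to its probability (hypothetical input).
    """
    return sum(p * expected_max_exponential(k, mu) for k, p in task_count_pmf.items())


def simulate_no_queueing(task_count_pmf: Dict[int, float], mu: float,
                         samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, for cross-checking the formula."""
    rng = random.Random(seed)
    counts, probs = zip(*task_count_pmf.items())
    total = 0.0
    for _ in range(samples):
        k = rng.choices(counts, weights=probs)[0]
        total += max(rng.expovariate(mu) for _ in range(k))
    return total / samples


if __name__ == "__main__":
    pmf = {1: 0.5, 3: 0.5}
    print(no_queueing_lower_bound(pmf, mu=1.0), simulate_no_queueing(pmf, mu=1.0))
```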
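The traffic argument in "Multicast communication in multicomputer networks" (separate one-to-one messages waste bandwidth compared with a single multicast route) can be illustrated on a 2D mesh. The sketch below is a simple nearest-neighbor heuristic chosen for illustration, not one of the paper's algorithms: it builds one multicast walk by repeatedly heading to the closest undelivered destination with dimension-ordered (X then Y) routing, delivers to any destination it passes en route, and compares its hop count with the total hops of separate unicasts. The walk may revisit nodes, so it is not guaranteed to be a simple multicast path.

```python
from typing import Iterable, List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates in the 2D mesh


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Dimension-ordered (X then Y) hops from src to dst, excluding src."""
    x, y = src
    tx, ty = dst
    hops = []
    while x != tx:
        x += 1 if tx > x else -1
        hops.append((x, y))
    while y != ty:
        y += 1 if ty > y else -1
        hops.append((x, y))
    return hops


def greedy_multicast_walk(source: Node, destinations: Iterable[Node]) -> List[Node]:
    remaining = set(destinations) - {source}
    current = source
    walk = [source]
    while remaining:
        # Closest remaining destination by Manhattan distance; ties broken by coordinates.
        nxt = min(remaining, key=lambda d: (abs(d[0] - current[0]) + abs(d[1] - current[1]), d))
        for hop in xy_route(current, nxt):
            walk.append(hop)
            remaining.discard(hop)  # destinations passed en route are delivered too
        current = nxt
    return walk


def unicast_traffic(source: Node, destinations: Iterable[Node]) -> int:
    """Total hops if every destination gets its own one-to-one message."""
    return sum(abs(d[0] - source[0]) + abs(d[1] - source[1])
               for d in set(destinations) - {source})


if __name__ == "__main__":
    dests = [(2, 0), (1, 0), (2, 2)]
    walk = greedy_multicast_walk((0, 0), dests)
    print(walk, len(walk) - 1, unicast_traffic((0, 0), dests))
```

For the source (0, 0) and destinations (2, 0), (1, 0), and (2, 2), the walk uses 4 hops, while three separate unicasts use 7.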

Aims & Scope

IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers.


Meet Our Editors

Editor-in-Chief
David Bader
College of Computing
Georgia Institute of Technology