Computers, IEEE Transactions on

Issue 11 • Date Nov 1988

  • The distribution of waiting times in clocked multistage interconnection networks

    Page(s): 1337 - 1352

    Analyzes the random delay experienced by a message traversing a buffered, multistage packet-switching banyan network. The authors find the generating function for the distribution of waiting time at the first stage of the network for a very general class of traffic, assuming messages have discrete sizes. For example, traffic can be uniform or nonuniform, messages can have different sizes, and messages can arrive in batches. For light-to-moderate loads, the authors conjecture that delays experienced at the various stages of the network are nearly the same and are nearly independent. This allows the total delay distribution to be approximated. Better approximations for the distribution of waiting times at later stages of the network are attained by assuming that in the limit a sort of spatial steady state is achieved. Extensive simulations confirm the formulas and conjectures.
  • Carry-free addition of recoded binary signed-digit numbers

    Page(s): 1470 - 1476

    Signed-digit number representation systems have been defined for any radix r⩾3 with digit values ranging over the set {-α,···,-1,0,1,···,α}, where α is an arbitrary integer in the range r/2<α<r. Such number representation systems possess sufficient redundancy to allow for the annihilation of carry or borrow chains and hence result in fast, propagation-free addition and subtraction. The original definition of signed-digit arithmetic precludes the case of r=2, for which α cannot be selected in the proper range. Binary signed-digit numbers are known to allow limited-carry propagation with a somewhat more complex addition process. The author shows that a special `recoded' representation of binary signed-digit numbers not only allows for carry-free addition and borrow-free subtraction but also offers other important advantages for the practical implementation of arithmetic functions. The recoding itself is totally parallel and can be performed in constant time, independent of operand lengths. It is also shown that binary signed-digit numbers compare favorably to other redundant schemes such as stored-carry and higher radix signed-digit representations.
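
    The propagation-free addition described for radix r⩾3 can be illustrated with a minimal sketch. It assumes the classical two-step signed-digit scheme (interim digit plus a transfer of -1, 0, or +1), with the threshold set at α; the function names, the little-endian digit lists, and the choice r=10, α=6 are illustrative assumptions. It does not attempt the paper's binary recoding, which the abstract does not spell out.

```python
def sd_value(digits, radix=10):
    """Integer value of a little-endian signed-digit list."""
    return sum(d * radix**i for i, d in enumerate(digits))


def sd_add(x, y, radix=10, alpha=6):
    """Add two little-endian signed-digit numbers without carry propagation.

    Assumes radix >= 3, radix/2 < alpha < radix, and that every input digit
    lies in [-alpha, alpha]. The result has one more digit than the inputs.
    """
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))

    # Step 1 (each position independently): split the digit sum into an
    # interim digit and a transfer of -1, 0, or +1 for the next position.
    interim = [0] * (n + 1)
    transfer = [0] * (n + 1)
    for i in range(n):
        p = x[i] + y[i]
        t = 1 if p >= alpha else -1 if p <= -alpha else 0
        interim[i] = p - radix * t
        transfer[i + 1] = t

    # Step 2 (again independently): absorb the incoming transfer. The interim
    # digit has magnitude at most alpha - 1, so the final digit stays within
    # [-alpha, alpha] and no further transfer is generated.
    return [interim[i] + transfer[i] for i in range(n + 1)]


x = [6, -3, 5]   # 6 - 30 + 500 = 476
y = [5, 6, -2]   # 5 + 60 - 200 = -135
print(sd_add(x, y), sd_value(sd_add(x, y)))
```

    Each output digit depends only on the operand digits at its own position and the one below it, which is why the addition time does not grow with operand length.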
  • The design of totally self-checking TMR fault-tolerant systems

    Page(s): 1450 - 1454

    A totally self-checking triple modular redundancy (TSC-TMR) system consists of a conventional TMR system monitored by a TSC circuit with two outputs indicating information errors and internal faults. The internal fault indication is independent of the output information errors and indicates masked errors of modular units or faults in the monitoring circuit itself. The information error indication depends mainly on the output information errors and it can be used as a stop signal preventing the propagation of erroneous outputs. This scheme of fault-tolerant systems is very simple and reliable compared to the known TMR systems with self-checking capabilities and achieves a high degree of availability and maintainability. The TSC error checking circuit has been designed by applying a new algebraic technique with two basic operator blocks performing the AND and XNOR (complemented XOR) operations on error indication variables.
  • Broadcast normalization in systolic design

    Page(s): 1428 - 1434

    When a sequential algorithm is directly mapped into an array of processing elements, data broadcasts are quite likely to be required, and their source locations vary during the computation. The authors introduce a normalization method to fix the positions of the broadcast sources so that the derived design can be further transformed by retimings into a systolic array. The method is fully illustrated in designing systolic arrays for enumeration sort, solving simultaneous linear equations, and transitive closure.
  • Connectivity of Imase and Itoh digraphs

    Page(s): 1459 - 1461

    An important problem in the design of efficient interconnection networks consists of finding digraphs with a minimal diameter for a given number of nodes n and a given degree d. The best family known at present, denoted by G(n,d), has been proposed by Imase and Itoh, ibid., vol.C-32, p.782-4 (1983). Its vertex set is the set of integers modulo n and its arc set A is defined as A={(x,y) : y≡-dx-a (mod n), 1⩽a⩽d}. The authors determine the connectivity of these digraphs, which proves that they are highly reliable. More precisely, the authors show that, provided that the diameter is greater than 4, the connectivity of G(n,d) is d if n=k(d+1) and gcd(n,d)>1, and d-1 otherwise.
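
    The arc set of G(n,d) can be made concrete with a short sketch that builds the digraph from the stated rule, y ≡ -dx-a (mod n) for 1⩽a⩽d, and measures its diameter by breadth-first search. The function names are illustrative assumptions, and the sketch does not compute connectivity, which is the paper's actual result.

```python
from collections import deque


def successors(n, d):
    """Out-neighbours of each vertex x: (-d*x - a) mod n for a = 1..d."""
    return [[(-d * x - a) % n for a in range(1, d + 1)] for x in range(n)]


def diameter(n, d):
    """Largest shortest-path distance, or None if some vertex is unreachable."""
    succ = successors(n, d)
    worst = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in succ[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if len(dist) < n:
            return None
        worst = max(worst, max(dist.values()))
    return worst


print(successors(5, 2))  # arcs of G(5,2)
print(diameter(5, 2))
```

    Python's `%` operator returns a non-negative result for a positive modulus, so the negative expression maps directly onto the integers modulo n.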
  • Optimal chaining in expression trees

    Page(s): 1366 - 1374

    Chaining is the ability to pipeline two or more vector instructions on Cray-1-like machines. The authors show how to optimally use this feature to compute (vector) expression trees in the context of automatic code generation. They present a linear time scheduling algorithm for finding an optimal order of evaluation for a machine with a bounded number of registers.
  • Performance analysis of multistage interconnection networks with hierarchical requesting model

    Page(s): 1438 - 1442

    Analyzes the performance of the multistage interconnection networks (MINs) for interconnecting N processors or N processors to N commonly shared memory modules in a multiprocessor system. A general model, called the hierarchical requesting model, is proposed. The performance of the MINs with respect to their memory bandwidth is analyzed and is compared to that of a crossbar under the proposed model. Based on the analytical results, the authors present a task allocation strategy to increase the memory bandwidth of the MINs.
  • Transmission delays in hardware clock synchronization

    Page(s): 1465 - 1467

    Various methods, both software and hardware, have been proposed to synchronize a set of physical clocks in the system. Software methods are very flexible and economical but suffer an excessive time overhead, whereas hardware methods require no time overhead but are unable to handle transmission delays in clock signals. The effects of nonzero transmission delays in synchronization have been studied extensively in the communication area in the absence of malicious or Byzantine faults. The authors show that it is easy to incorporate the ideas from the communication area into the existing hardware clock synchronization algorithms in order to take into account the presence of both malicious faults and nonzero transmission delays.
  • A systolic array for the assignment problem

    Page(s): 1422 - 1425

    A pure systolic realization of an algorithm for solving the n × n assignment problem is presented. This systolic algorithm can be implemented on a homogeneous hexagonal processor array and requires O(n^2) area complexity and O(n^2) time complexity.
  • A distributed algorithm for fault diagnosis in systems with soft failures

    Page(s): 1476 - 1480

    The problem of diagnosis of soft failures at the system level in large and fully distributed networks of processors (or units) is considered. A system model in which each of the network's units is assumed to possess the ability to test (or evaluate) certain other units for the presence of failures is employed. Using this model and assuming that the total number of faulty units does not exceed a given bound, a distributed algorithm is presented which allows all the fault-free units to independently converge to correct and consistent diagnoses of the system status. This algorithm is also shown to be applicable to bounded fault situations where both units and communication links can be faulty.
  • Two-phase deadlock detection algorithm

    Page(s): 1454 - 1458

    A deadlock detection algorithm utilizing a transaction-wait-for (TWF) graph is presented. It is a fully distributed algorithm which allows multiple outstanding requests. The proposed algorithm can achieve improved overall performance, using multiple disjoint controllers coupled with the two-phase property, while maintaining the simplicity of centralized schemes. The detection step is divided into two phases. Phase 1 analyzes the conditions of the system of interacting transactions, invoking phase 2 only if those conditions make deadlock possible. Phase 2 performs the actual cycle detection. The proposed algorithm can be used in transaction-based distributed processing systems. Some results on the complexity of the algorithm are given.
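
    The cycle detection that phase 2 is said to perform can be illustrated, in a much simplified form, by a depth-first search over a wait-for graph held at a single site. This is a generic textbook technique and not the paper's distributed two-phase algorithm; the dictionary representation, the function name, and the transaction labels are all assumptions made for the example.

```python
def find_cycle(wait_for):
    """Return one cycle as a list of transactions, or None if there is none.

    wait_for maps a transaction to the transactions it is waiting for.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, done
    nodes = set(wait_for) | {u for waits in wait_for.values() for u in waits}
    color = dict.fromkeys(nodes, WHITE)
    path = []

    def visit(t):
        color[t] = GREY
        path.append(t)
        for u in wait_for.get(t, ()):
            if color[u] == GREY:          # back edge: u is on the current path
                return path[path.index(u):]
            if color[u] == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in sorted(nodes):               # sorted for a deterministic result
        if color[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


print(find_cycle({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))
print(find_cycle({"T1": ["T2", "T3"], "T2": ["T3"]}))
```

    A transaction may wait for several others at once, which matches the abstract's allowance for multiple outstanding requests.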
  • Interstitial redundancy: an area efficient fault tolerance scheme for large area VLSI processor arrays

    Page(s): 1398 - 1410

    In the proposed scheme, spare PEs are located at interstitial sites within the array. Each spare can functionally replace any one of the neighboring primary PEs that are connected to it. Because spares are physically close to the PE that they replace, restructured interconnections are short, minimizing performance degradation. This structure can incorporate different levels of redundancy depending on how many of the interstitial sites are used to locate spares, and also how many spares are placed at each site. The author gives a polynomial time algorithm for assigning operational spares to failed primary PEs. He also gives area-efficient layouts for such structures, and designs for implementing the switching network needed for reconfiguration. A procedure for deciding the optimum level of redundancy so as to maximize chip area utilization is also shown. The main attractive features of interstitial redundancy are short (fixed length) PE interconnections and high utilization of failure-free PEs. The analysis shows that for a wide range of array sizes and PE survival probabilities, 45-55 percent utilization of failure-free PEs on the chip can be achieved.
  • Accurate low-cost methods for performance evaluation of cache memory systems

    Page(s): 1325 - 1336

    Trace-driven simulation is a simple way of evaluating cache memory systems with varying hardware parameters. But to evaluate realistic workloads, simulating even a few million addresses is not adequate and such large scale simulation is impractical from the consideration of space and time requirements. New methods of simulation based on statistical techniques are proposed for decreasing the need for large trace measurements and for predicting true program behavior. In the method, sampling techniques are applied while collecting the address trace from a workload. This drastically reduces the space and time needed to collect the trace. New simulation techniques are developed to use the sample data not only to predict the mean miss rate of the cache, but also to provide an empirical estimate of its actual distribution. Finally, a new concept of primed cache is introduced to simulate large caches by the sampling-based method.
  • High-speed CAM-based architecture for a Prolog machine (ASCA)

    Page(s): 1375 - 1383

    A content addressable memory (CAM)-based machine architecture is proposed for a high-speed Prolog machine. This Prolog machine attempts to speed up the total Prolog execution performance by using a hierarchical pipelined scheme and a CAM-based backtracking scheme. The hierarchical pipelined scheme reduces the total number of Prolog execution steps to half of that using the conventional method. The CAM-based backtracking is efficiently and quickly achieved by using CAM's sophisticated garbage collection function, which eliminates the need for stacks and additional operation cycles. In this machine, all Prolog execution can be simply controlled by a semantic information `inference depth' without any address handling by storing all working information, binding and control information, in CAMs. This machine attains a performance of 100 KLIPS (kilo logical inferences per second) on the deterministic append program in the interpretive mode, and also attains high performance in the nondeterministic program.
  • On the diagnosability of multicomputer systems with homogeneous and incomplete tests

    Page(s): 1419 - 1421

    A generalized PMC model, where only a subset of possible unit faults (the same for each unit) can be detected by unit tests, is considered. A category of visible faults is introduced and the necessary and sufficient condition for the new class of tv-diagnosable systems is determined. Moreover, an O(|T|) diagnosis algorithm is presented.
  • Relationship between P-valued majority functions and P-valued threshold functions

    Page(s): 1442 - 1445

    In previous papers (Trans. IECE Japan, vol.J63-D, p.493-500, 1980, vol.J64-D, p.172-3, 1981 and vol.E-67, p.47-8, 1984), the authors defined a new class of multiple-valued logic functions, called multiple-valued majority functions. The authors clarify the distinction between multiple-valued majority functions and multiple-valued threshold functions through the difference between a number function and an inner product of an input vector and a weight vector.
  • A multiple fault-tolerant processor network architecture for pipeline computing

    Page(s): 1414 - 1418

    Certain fault-tolerant multiprocessor networks that can emulate linear array interconnections are considered. The system tolerates (m-1) node and link failures. One particularly attractive feature of this network is that it allows for a linear array structure starting with any node, even in the presence of (m-2) faults. The configuration algorithm is fully distributed, and is performed on the basis of test results obtained from nonfaulty processors only. A simple fault identification procedure is developed using the above routing algorithm.
  • The extra stage Gamma network

    Page(s): 1445 - 1450

    The augmented data manipulator (ADM), inverse augmented data manipulator (IADM), and the gamma network are based on the plus-minus-2^i (PM2I) connection patterns. In such a network, there exist multiple paths to connect a source S to a destination D except when S=D. The number of paths for (S,D) is a function of the tag value (D-S) modulo N, and the size of the network N. It is shown that by adding an extra stage to the PM2I interconnection network, multiple paths are provided for all the tag values including 0. The extra stage can be any stage out of the n=log2 N stages of the original network. The analyses on the distribution of the number of paths for various tag values are performed for n possible choices of the extra stage. It is shown that the extra stage of 0, +1, -1 connection patterns gives the most uniform distribution, and also results in a 1-fault tolerant interconnection network.
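
    The dependence of the path count on the tag value can be sketched by counting routes through the original (no extra stage) network. The sketch assumes that stage i offers the three links -2^i, 0, and +2^i modulo N, and that the +2^(n-1) and -2^(n-1) links of the last stage are counted as distinct links even though they reach the same node; the function name is illustrative. The extra-stage variants analyzed in the paper are not modeled.

```python
def gamma_path_counts(n):
    """counts[tag] = number of paths for tag (D - S) mod N, where N = 2**n."""
    size = 2 ** n
    counts = [0] * size
    counts[0] = 1                      # zero offset before the first stage
    for i in range(n):                 # stage i uses links -2^i, 0, +2^i
        step = 2 ** i
        nxt = [0] * size
        for offset, ways in enumerate(counts):
            if ways:
                for move in (-step, 0, step):
                    nxt[(offset + move) % size] += ways
        counts = nxt
    return counts


print(gamma_path_counts(3))  # path counts for tags 0..7 when N = 8
```

    Under these assumptions tag 0 always has exactly one path, which is the case the extra stage is meant to address.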
  • Test pattern generation for API faults in RAM

    Page(s): 1426 - 1428

    The algorithm for detecting pattern-sensitive faults in memories, as presented by K.K. Saluja and K. Kinoshita (ibid., vol.34, no.3, p.284-7, 1985), is simplified. In addition, a new algorithm is presented which has a near optimal WRITE sequence.
  • Performance prediction and calibration for a class of multiprocessors

    Page(s): 1353 - 1365

    A model for predicting multiprocessor performance on iterative algorithms is developed. Each iteration consists of some amount of access to global data and some amount of local processing. The iterations may be synchronous or asynchronous, and the processors may or may not incur waiting time, depending on the relationship between the access time and processing time. The effect on performance of the speed of the processor, memory, and the interconnection network is studied. The model also illustrates the significant impact on performance of decomposing an algorithm into parallel processes. The model's predictions are calibrated with experimental measurements.
  • Fault-tolerant matrix triangularizations on systolic arrays

    Page(s): 1434 - 1438

    Examines the checksum methods of Abraham et al. for LU decomposition on multiprocessor arrays. Their methods are efficient for detecting a transient error, but expensive for correcting it due to the need for a computation rollback. The authors show how to avoid the rollback by using matrix updating techniques, and they introduce new checksum methods for Gaussian elimination with pairwise pivoting and for QR decomposition on systolic arrays.
  • Binary decision tree test functions

    Page(s): 1461 - 1465

    A class of multivariable logic functions which are suitable for use as binary decision tree node test functions is considered. These functions can be regarded as a natural extension of the polarity test normally used. The properties of these functions are discussed. A method for obtaining a reduced but not necessarily optimal tree is presented.
  • On functional testing of array processors

    Page(s): 1480 - 1484

    This correspondence presents a new testing method for single instruction multiple data (SIMD) VLSI arrays. A new fault model is presented. Faults are defined at the functional level. A systematic test generation procedure is derived. Testing is performed by sequences of instructions. Two criteria are used. The first criterion establishes the external observability and controllability of the instructions. The second criterion uses instruction cardinality as a metric of instruction complexity. An example of the application of the proposed technique to an existing parallel scheme is described.
  • A new class of fault-tolerant static interconnection networks

    Page(s): 1468 - 1470

    The authors present a new class of interconnection networks using combinatorial block designs. These networks are highly structured and have strong fault-tolerant properties. They also have a free parameter that allows tradeoffs to be made between performance and cost in a fairly continuous way.
  • Heuristic algorithms for task assignment in distributed systems

    Page(s): 1384 - 1397

    Investigates the problem of static task assignment in distributed computing systems, i.e., given a set of k communicating tasks to be executed on a distributed system of n processors, to which processor should each task be assigned? The author proposes a family of heuristic algorithms for Stone's classic model of communicating tasks, whose goal is the minimization of the total execution and communication costs incurred by an assignment. In addition, she augments this model to include interference costs which reflect the degree of incompatibility between two tasks. Whereas high communication costs serve as a force of attraction between tasks, causing them to be assigned to the same processor, interference costs serve as a force of repulsion between tasks, causing them to be distributed over many processors. The inclusion of interference costs in the model yields assignments with greater concurrency, thus overcoming the tendency of Stone's model to assign all tasks to one or a few processors. Simulation results show that the algorithms perform well and, in particular, that the highly efficient Simple Greedy Algorithm performs almost as well as more complex heuristic algorithms.

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Editor-in-Chief
Albert Y. Zomaya
School of Information Technologies
Building J12
The University of Sydney
Sydney, NSW 2006, Australia
http://www.cs.usyd.edu.au/~zomaya
albert.zomaya@sydney.edu.au