Computers, IEEE Transactions on

Issue 5 • Date May 1995

  • Efficient stack simulation for set-associative virtual address caches with real tags

    Publication Year: 1995 , Page(s): 719 - 723
    Cited by:  Papers (10)

    Stack simulation is a powerful cache analysis approach that generates the number of misses and write backs for various cache configurations in a single run. Unfortunately, none of the previous work on stack simulation provides an efficient stack algorithm for virtual address caches with real tags (VIR-type caches). In this paper, we devise an efficient stack simulation algorithm for analyzing VIR-type caches. Using markers with a valid range for synonym lines, our algorithm is able to keep track of stack distances for different cache configurations. In addition to cache miss ratios and write back ratios, our approach generates the pseudonym frequency for all cache configurations under investigation.
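
    The single-run property of stack simulation can be illustrated with a minimal sketch. It assumes a fully associative LRU cache, not the VIR-type caches the paper targets, and does not model synonym markers or write backs. The function names `stack_distances` and `misses_by_size` are hypothetical.

```python
from collections import Counter


def stack_distances(trace):
    """Return the LRU stack distance of each reference (None on first touch)."""
    stack = []       # most recently used line sits at index 0
    distances = []
    for line in trace:
        if line in stack:
            depth = stack.index(line)   # 0-based depth in the LRU stack
            distances.append(depth + 1)
            stack.pop(depth)
        else:
            distances.append(None)      # cold miss for every cache size
        stack.insert(0, line)
    return distances


def misses_by_size(trace, sizes):
    """Derive the miss count for every cache size from one pass over the trace."""
    histogram = Counter(stack_distances(trace))
    result = {}
    for size in sizes:
        # A reference hits in a cache of `size` lines iff its distance <= size.
        hits = sum(count for dist, count in histogram.items()
                   if dist is not None and dist <= size)
        result[size] = len(trace) - hits
    return result
```

    One pass over the trace builds the distance histogram, and the miss count for each cache size is then read off that histogram without re-simulating.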
  • Equivalence proofs of some yield modeling methods for defect-tolerant integrated circuits

    Publication Year: 1995 , Page(s): 724 - 728
    Cited by:  Papers (1)

    In this paper, two equivalence proofs of yield modeling methods for defect-tolerant integrated circuits (ICs) are presented. These proofs are generalizations of those found in Koren and Stapper (1989); one of the proofs presented in this paper is valid for any defect-tolerant IC, while the other one is valid for defect-tolerant ICs with two levels of hierarchy.
  • Improved digital signature algorithm

    Publication Year: 1995 , Page(s): 729 - 730
    Cited by:  Papers (3)

    A digital signature algorithm is developed which is an improved version of the digital signature algorithm (DSA) proposed by the NIST (1991). The improved version is as secure as the original one, while it benefits the signer and performs more efficiently.
  • Delay-insensitive pipelined communication on parallel buses

    Publication Year: 1995 , Page(s): 660 - 668
    Cited by:  Papers (12)  |  Patents (1)

    Consider a communication channel that consists of several subchannels transmitting simultaneously and asynchronously. As an example of this scheme, we can consider a board with several chips. The subchannels represent wires connecting the chips, where differences in the lengths of the wires might result in asynchronous reception. In current technology, the receiver acknowledges reception of the message before the transmitter sends the following message. That is, pipelined utilization of the channel is not possible. Our main contribution is a scheme that enables transmission without an acknowledgment of the message, therefore enabling pipelined communication and providing a higher bandwidth. However, our scheme allows for a certain number of transitions from a second message to arrive before reception of the current message has been completed, a condition that we call skew. We have derived necessary and sufficient conditions for codes that can tolerate a certain amount of skew among adjacent messages (therefore allowing for continuous operation) and detect a larger amount of skew when the original skew is exceeded. These results generalize previously known results. We have constructed codes that satisfy the necessary and sufficient conditions, studied their optimality, and devised efficient decoding algorithms. To the best of our knowledge, this is the first known scheme that permits efficient asynchronous communications without acknowledgment. Potential applications are in on-chip, on-board, and board-to-board communications, enabling much higher communication bandwidth.
  • Conflict-free access for streams in multimodule memories

    Publication Year: 1995 , Page(s): 634 - 646
    Cited by:  Papers (15)

    Address transformation schemes, such as skewing and linear transformations, have been proposed to achieve conflict-free access for streams with constant stride. However, this is achieved only for some strides. In this paper, we extend these schemes to achieve this conflict-free access for a larger number of strides. The basic idea is to perform an out-of-order access to a stream of fixed length. This stream is then stored in a local memory and used in subsequent instructions. This mode of operation is suitable for vector processors and for processors with decoupled access. The scheme and mode of operation proposed produce the largest possible number of conflict-free strides. Memory systems with any ratio between the number of memory modules and memory latency are considered. The hardware for address calculations and access control is described and shown to be of similar complexity to that required for in-order access.
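
    Why only some strides are conflict-free can be seen in a small sketch. It assumes plain low-order interleaving (module = address mod number of modules) with in-order issue of one access per cycle, which is the baseline the paper improves on, not its out-of-order scheme. The names `module_of`, `is_conflict_free`, and `busy_cycles` are hypothetical.

```python
def module_of(address, num_modules):
    """Low-order interleaving: consecutive addresses map to consecutive modules."""
    return address % num_modules


def is_conflict_free(base, stride, length, num_modules, busy_cycles):
    """Issue one access per cycle, in order, and report whether any access
    reaches a module that is still busy with an earlier access."""
    free_at = [0] * num_modules   # first cycle at which each module is free again
    for cycle in range(length):
        module = module_of(base + cycle * stride, num_modules)
        if free_at[module] > cycle:
            return False          # module still busy: a conflict
        free_at[module] = cycle + busy_cycles
    return True
```

    With 8 modules that each stay busy for 8 cycles, odd strides pass this check while stride 2 revisits module 0 after only 4 cycles and fails.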
  • Adaptive unanimous voting (UV) scheme for distributed self-diagnosis

    Publication Year: 1995 , Page(s): 730 - 735
    Cited by:  Papers (4)

    The distributed self-diagnosis approach proposed for multiprocessor systems is also effective for integrated circuit wafers containing a number of identical circuits. Here the testing of each node is based on the majority voting on the test results from itself and neighboring nodes. In this paper, we show that the unanimous voting (UV) approach always outperforms the individual voting (IV) approach, irrespective of the number of voting cells and fault rate. Based on the UV approach, the optimal number of tests is obtained. We also introduce an adaptive voting scheme by which the test overhead of the traditional voting schemes can be significantly reduced.
  • Unidirectional bit/byte error control

    Publication Year: 1995 , Page(s): 710 - 714
    Cited by:  Papers (1)

    This paper defines a new class of unidirectional errors, named t/1-unidirectional errors, which affect at most t bits confined to at most t bytes of the code word. Codes that are capable of detecting, locating and correcting t/1-unidirectional errors are presented. Lower bounds on the number of checkbits required for t/1-unidirectional error detection and location are also presented.
  • A 3D skewing and de-skewing scheme for conflict-free access to rays in volume rendering

    Publication Year: 1995 , Page(s): 707 - 710
    Cited by:  Papers (3)  |  Patents (2)

    We extend a 2D linear skewed memory organization to 3D and introduce the associated de-skewing scheme designed to provide conflict-free access to projection rays of voxels for use in a volume rendering architecture. This is an application of a 3D linear skewing scheme which supports real-time axonometric projection from 26 primary orientations.
  • Optimal realization of any BPC permutation on K-extra-stage Omega networks

    Publication Year: 1995 , Page(s): 714 - 719
    Cited by:  Papers (5)

    An N×N k-Omega network is obtained by adding k more stages in front of an Omega network. An N-permutation defines a bijection between the set of N sources and the set of N destinations. Such a permutation is said to be admissible to a k-Omega if N conflict-free paths, one for each source-destination pair defined by the permutation, can be established simultaneously. When an N-permutation is not admissible, it is desirable to divide the N pairs into a minimum number of groups (passes) such that the conflict-free paths can be established for the pairs in each group. Raghavendra and Varma solved this problem for BPC (Bit Permutation Complement) permutations on an Omega without extra stages. This paper generalizes their result to a k-Omega where k can be any integer between 0 and n-1. An O(NlgN) algorithm is given which realizes any BPC permutation in a minimum number of passes on a k-Omega (0⩽k⩽n-1).
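
    The notion of admissibility can be sketched for the plain Omega network (k = 0) only. The sketch assumes standard destination-tag routing, under which the link used after stage i is given by the low n-i bits of the source followed by the top i bits of the destination. It is not the paper's multi-pass algorithm, and the names `omega_admissible` and `bit_reversal` are hypothetical.

```python
def omega_admissible(perm):
    """Check whether a permutation passes a plain Omega network in one pass.

    perm[src] is the destination of source src; len(perm) must be a power of 2.
    """
    num_ports = len(perm)
    n = num_ports.bit_length() - 1
    if num_ports < 2 or (1 << n) != num_ports:
        raise ValueError("number of ports must be a power of two")
    mask = num_ports - 1
    for stage in range(1, n + 1):
        used_links = set()
        for src, dst in enumerate(perm):
            # Link after this stage: remaining source bits, then routed
            # destination bits (destination-tag routing assumption).
            link = ((src << stage) | (dst >> (n - stage))) & mask
            if link in used_links:
                return False      # two paths need the same link
            used_links.add(link)
    return True


def bit_reversal(n):
    """Bit-reversal permutation on 2**n ports, a simple example of a BPC permutation."""
    return [int(format(i, f"0{n}b")[::-1], 2) for i in range(1 << n)]
```

    Under these assumptions the identity permutation is admissible, while bit reversal on 8 ports is not and would therefore need more than one pass.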
  • Design of space-optimal regular arrays for algorithms with linear schedules

    Publication Year: 1995 , Page(s): 683 - 694
    Cited by:  Papers (1)

    The problem of designing space-optimal 2D regular arrays for N×N×N cubical mesh algorithms with linear schedule ai+bj+ck, 1⩽a⩽b⩽c, and N=nc, is studied. Three novel nonlinear processor allocation methods, each of which works by combining a partitioning technique (gcd-partition) with different nonlinear processor allocation procedures (traces), are proposed to handle different cases. In cases where a+b⩽c, which are dealt with by the first processor allocation method, space-optimal designs can always be obtained in which the number of processing elements is equal to N^2/c. For other cases where a+b>c and either a=b or b=c, two other optimal processor allocation methods are proposed. In addition, closed-form expressions for the optimal number of processing elements are derived for these cases.
  • Safety levels-an efficient mechanism for achieving reliable broadcasting in hypercubes

    Publication Year: 1995 , Page(s): 702 - 706
    Cited by:  Papers (7)

    We consider a distributed broadcasting algorithm for injured hypercubes using incomplete spanning binomial trees. An injured hypercube is a connected hypercube with faulty nodes. The incomplete spanning binomial tree proposed in this paper is a useful structure for implementing broadcasting in injured hypercubes. It is defined as a sub-tree of a regular spanning binomial tree that connects all the nonfaulty nodes. We show that in an injured n-dimensional hypercube with m faulty nodes, there are at least 2n-2m source nodes (called l-nodes), each of which can generate an incomplete spanning binomial tree. A method is proposed to locate a large subset of the l-node set using the concept of safety level. The safety level of each node in an n-dimensional hypercube can be easily calculated through n-1 rounds of information exchange among neighboring nodes. An optimal broadcast initiated from a safe node is proposed. When a nonfaulty source node is unsafe and there are at most n-1 faulty nodes in an injured n-dimensional hypercube, the proposed broadcasting scheme requires at most n+1 steps.
  • Frames: a simple characterization of permutations realized by frequently used networks

    Publication Year: 1995 , Page(s): 695 - 697
    Cited by:  Papers (7)

    Rearrangeable multistage networks such as the Benes network realize any permutation, yet their routing algorithms are not cost-effective. On the other hand, there exist inexpensive routing algorithms for nonrearrangeable networks, but no simple technique exists to characterize all the permutations realized on these networks. This paper introduces the concept of frame and shows how it can be used to characterize all the permutations realized on various multistage interconnection networks. They include subnetworks of baseline, Benes, and cascaded baseline and shuffle-exchange networks.
  • A coordinated location policy for load sharing in hypercube-connected multicomputers

    Publication Year: 1995 , Page(s): 669 - 682
    Cited by:  Papers (7)  |  Patents (1)

    Uneven task arrivals in a hypercube-connected multicomputer may temporarily overload some nodes while leaving others underloaded. This problem can be solved or alleviated by load sharing (LS); that is, some of the tasks arriving at overloaded nodes, called overflow tasks, are transferred to underloaded nodes. One important issue in LS is to locate underloaded nodes to which the overflow tasks can be transferred. This is termed the location policy. Any efficient location policy should distribute the overflow tasks to the entire system instead of 'dumping' them on a few underloaded nodes. To reduce the overhead for collecting state information and transferring tasks, each node is required to maintain the state information of only those nodes in its proximity, called a buddy set. Several location policies (random probing, random selection, preferred lists, and the bidding algorithm) are analyzed and compared for hypercube-connected multicomputer systems. Under the random-selection and preferred-list policies, an overloaded node can select, without probing other nodes, an underloaded node within its buddy set, while under the random probing policy and the bidding algorithm the overloaded node needs to probe other nodes before transferring the overflow task. A task collision is said to occur if two or more overflow tasks are transferred (almost) simultaneously to the same underloaded node. The performances of these location policies are analyzed and compared in terms of the average number of task collisions. Our analysis shows that use of preferred lists allows the overflow tasks to be shared more evenly throughout the entire hypercube than the other two location policies.
  • Optimal 2-bit branch predictors

    Publication Year: 1995 , Page(s): 698 - 702
    Cited by:  Papers (7)

    This paper presents an efficient technique to analyze finite-state machines to determine an optimal one for branch prediction. It also presents results from using this technique to determine optimal 4-state branch predictors for applications in the SPEC89 benchmark suite running on the IBM RS/6000. The paper concludes that the simple 2-bit counter is the only machine that performs consistently well and close to the optimal over all applications.
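
    The simple 2-bit counter mentioned in the abstract is one particular 4-state machine, sketched below. The state encoding (0 and 1 predict not taken, 2 and 3 predict taken) and the function name `count_correct_predictions` are assumptions made for illustration; the paper's search technique over all 4-state machines is not shown.

```python
def count_correct_predictions(outcomes, initial_state=2):
    """Run a 2-bit saturating counter over a sequence of branch outcomes.

    outcomes: iterable of booleans, True meaning the branch was taken.
    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Saturating update: move toward 3 on taken, toward 0 on not taken.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct
```

    For a loop branch that is taken nine times and then falls through, repeated three times, this counter mispredicts only the loop exits, because a single not-taken outcome does not flip the prediction.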
  • The Mobius cubes

    Publication Year: 1995 , Page(s): 647 - 659
    Cited by:  Papers (69)

    The Mobius cubes are hypercube variants that give better performance with the same number of links and processors. We show that the diameter of the Mobius cubes is about one half the diameter of the equivalent hypercube, and that the average number of steps between processors for a Mobius cube is about two-thirds of the average for a hypercube. We give an efficient routing algorithm for the Mobius cubes. This routing algorithm finds a shortest path and operates in time proportional to the dimension of the cube. We also give efficient broadcast algorithms for the Mobius cubes. We show that the Mobius cubes contain ring networks and other networks. We report results of simulation studies on the dynamic message-passing performance of the hypercube, the Twisted Cube of P.A.J. Hilbers et al. (1987), and the Mobius cubes. Our results are in agreement with S. Abraham (1990), showing that the Twisted Cube has worse dynamic performance than the hypercube, but our results show that the 1-Mobius cube has dynamic performance superior to that of the hypercube. This contradicts current literature, which implies that twisted cube variants will have worse dynamic performance.
  • Fast combinatorial RNS processors for DSP applications

    Publication Year: 1995 , Page(s): 624 - 633
    Cited by:  Papers (36)  |  Patents (2)

    It is known that RNS VLSI processors can parallelize fixed-point addition and multiplication operations by the use of the Chinese remainder theorem (CRT). The required modular operations, however, must use specialized hardware whose design and implementation can create several problems. In this paper, a modified residue arithmetic called pseudo-RNS is introduced in order to alleviate some of the RNS problems when digital signal processing (DSP) structures are implemented. Pseudo-RNS requires only the use of modified binary processors and exhibits a speed performance comparable with other traditional RNS approaches. Some applications of pseudo-RNS to common DSP architectures, such as multipliers and filters, are also presented in this paper. They are compared in terms of the area-time square product versus other RNS and weighted binary structures. It is proven that existing combinatorial or look-up table approaches for RNS are tailored to small designs or special applications, while the pseudo-RNS approach also remains competitive for complex systems.
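
    How conventional RNS splits addition and multiplication into independent channels, with the CRT used to recover the result, can be shown in a few lines. This is a sketch of classical RNS only, not of the pseudo-RNS arithmetic the paper introduces. The moduli (7, 11, 13) and all function names are hypothetical choices, and the modular inverse via `pow` needs Python 3.8 or later.

```python
from math import prod

MODULI = (7, 11, 13)   # pairwise coprime; dynamic range is 7 * 11 * 13 = 1001


def to_rns(x):
    """Encode an integer as its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Add channel by channel; no carries cross between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Multiply channel by channel, again fully in parallel."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Reconstruct the integer in [0, 1001) with the Chinese remainder theorem."""
    total_range = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = total_range // m
        total += r * partial * pow(partial, -1, m)   # modular inverse of partial mod m
    return total % total_range
```

    Results are correct only while they stay inside the dynamic range; for example, 25 × 17 = 425 fits within 1001 and is recovered exactly.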
  • Effective hardware-based data prefetching for high-performance processors

    Publication Year: 1995 , Page(s): 609 - 623
    Cited by:  Papers (120)  |  Patents (19)

    Memory latency and bandwidth are progressing at a much slower pace than processor performance. In this paper, we describe and evaluate the performance of three variations of a hardware function unit whose goal is to assist a data cache in prefetching data accesses so that memory latency is hidden as often as possible. The basic idea of the prefetching scheme is to keep track of data access patterns in a reference prediction table (RPT) organized as an instruction cache. The three designs differ mostly in the timing of the prefetching. In the simplest scheme (basic), prefetches can be generated one iteration ahead of actual use. The lookahead variation takes advantage of a lookahead program counter that ideally stays one memory latency time ahead of the real program counter and that is used as the control mechanism to generate the prefetches. Finally, the correlated scheme uses a more sophisticated design to detect patterns across loop levels. These designs are evaluated by simulating the ten SPEC benchmarks on a cycle-by-cycle basis. The results show that 1) the three hardware prefetching schemes all yield significant reductions in the data access penalty when compared with regular caches, 2) the benefits are greater when the hardware assist augments small on-chip caches, and 3) the lookahead scheme is the preferred one in terms of cost-performance.
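
    The idea of tracking access patterns per instruction can be sketched as a stride detector in the spirit of the basic scheme. This is a simplified illustration, not the paper's design: it assumes an unbounded table keyed by instruction address rather than a cache-organized RPT, and it replaces the paper's entry states with a single "stride seen twice in a row" rule. The class and field names are hypothetical.

```python
class ReferencePredictionTable:
    """Toy stride predictor keyed by the address of the load/store instruction."""

    def __init__(self):
        self.entries = {}   # pc -> {"prev": last address, "stride": last stride}

    def access(self, pc, address):
        """Record a data access and return an address to prefetch, or None."""
        entry = self.entries.get(pc)
        if entry is None:
            # First time this instruction is seen: nothing to predict yet.
            self.entries[pc] = {"prev": address, "stride": 0}
            return None
        stride = address - entry["prev"]
        # Assumed rule: prefetch only when the same nonzero stride repeats.
        confirmed = stride != 0 and stride == entry["stride"]
        entry["prev"] = address
        entry["stride"] = stride
        return address + stride if confirmed else None
```

    With this rule, a load that walks an array in 8-byte steps starts producing prefetch addresses on its third access, one iteration ahead of use.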

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org