Computers, IEEE Transactions on

Issue 11 • Date Nov 1997

  • Minimizing area cost of on-chip cache memories by caching address tags

    Publication Year: 1997, Page(s): 1187 - 1201
    Cited by: Papers (4) | Patents (2)

    This paper presents a technique for minimizing the chip-area cost of implementing on-chip cache memory in microprocessors. The main idea of the technique is Caching Address Tags, or CAT cache for short. The CAT cache exploits the locality property that exists among addresses of memory references. By keeping only a limited number of distinct tags of cached data, rather than having as many tags as cache lines, the CAT cache can reduce the cost of implementing tag memory by an order of magnitude without noticeable performance difference from ordinary caches. Therefore, CAT represents another level of caching for cache memories. Simulation experiments are carried out to evaluate the performance of the CAT cache as compared to existing caches. Performance results of SPEC92 programs show that the CAT cache, with only a few tag entries, performs as well as ordinary caches, while the chip-area saving is significant. Such area saving will increase as the address space of a processor increases. By allocating the saved chip area to larger cache capacity or more powerful functional units, CAT is expected to have a great impact on overall system performance.
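
As a rough illustration of why sharing tags can save area, the sketch below compares tag-storage bits for a conventional direct-mapped cache with a hypothetical organization in which each line stores only a short index into a small table of distinct tags. The organization, the function names, and the example parameters (32-bit addresses, 8 KB cache, 32-byte lines, 4 tag entries) are assumptions made for illustration; they are not taken from the paper.

```python
from math import ceil, log2

def conventional_tag_bits(addr_bits, cache_bytes, line_bytes):
    """Tag storage for a direct-mapped cache: one full tag per line."""
    lines = cache_bytes // line_bytes
    # Index and offset bits together address every byte of the cache.
    tag_bits = addr_bits - int(log2(cache_bytes))
    return lines * tag_bits

def shared_tag_bits(addr_bits, cache_bytes, line_bytes, tag_entries):
    """Hypothetical shared-tag layout: a small tag table plus one pointer per line."""
    lines = cache_bytes // line_bytes
    tag_bits = addr_bits - int(log2(cache_bytes))
    pointer_bits = ceil(log2(tag_entries))  # bits needed to select a tag entry
    return tag_entries * tag_bits + lines * pointer_bits

conv = conventional_tag_bits(32, 8 * 1024, 32)   # 256 lines x 19 tag bits
shared = shared_tag_bits(32, 8 * 1024, 32, 4)    # 4 x 19 tag bits + 256 x 2 pointer bits
print(conv, shared)
```

Under these assumed parameters the per-line cost drops from a full tag to a two-bit pointer, and the gap widens as the address width grows because only the few table entries get wider.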

  • Hybrid CORDIC algorithms

    Publication Year: 1997, Page(s): 1202 - 1207
    Cited by: Papers (41) | Patents (4)

    Each coordinate rotation digital computer (CORDIC) iteration selects the rotation direction by analyzing the results of the previous iteration. In this paper, we introduce two arctangent radices and show that about 2/3 of the rotation directions can be derived in parallel without any error. Some architectures exploiting these strategies are proposed.
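
The sequential dependency mentioned in the abstract can be seen in a conventional rotation-mode CORDIC loop. The sketch below is the textbook floating-point formulation, not the hybrid scheme the paper proposes; the function name and the default iteration count are assumptions.

```python
import math

def cordic_rotate(angle, iterations=32):
    """Approximate (cos(angle), sin(angle)) with rotation-mode CORDIC.

    Valid for |angle| up to roughly 1.74 rad, the sum of all elementary angles.
    """
    # Elementary rotation angles atan(2^-i).
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; pre-divide by the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        # The direction depends on the residual angle left by the previous
        # iteration, which is what forces the iterations to run in sequence.
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]
    return x, y

c, s = cordic_rotate(math.pi / 6)
print(c, s)
```

Because `d` in iteration `i` is read from `z` after iteration `i - 1`, the directions cannot be computed ahead of time in this form; the abstract reports that about 2/3 of them can be derived in parallel.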

  • Concurrent error detection in nonlinear digital circuits using time-freeze linearization

    Publication Year: 1997, Page(s): 1208 - 1218
    Cited by: Papers (6)

    Concurrent error detection in digital circuits is very important in applications where errors in processed data can have catastrophic effects. Typically, error detection is performed by a small amount of additional hardware called the checking circuit. In the past, researchers have developed techniques for concurrent error detection in linear digital state variable circuits. In this paper, we investigate concurrent error detection techniques for nonlinear digital circuits that compute polynomial functions of multiple variables. Such circuits have widespread use in the design of various classes of nonlinear digital filters. The proposed error detection schemes are possible due to the use of a new linearization method called time-freeze linearization. In this method, a nonlinear circuit is modeled as a linear circuit for each individual time frame corresponding to the time taken to process a given set of input data. The defining parameters of this linear model change from one time frame to another but are regarded as fixed or frozen in any given time frame. This allows the use of real number checksum codes for fault detection. As opposed to duplicating the entire nonlinear part of the circuit, our approach allows us to use the nonlinear functions to drive the check circuitry, while achieving full fault coverage at low hardware cost.

  • Indexed BDDs: algorithmic advances in techniques to represent and verify Boolean functions

    Publication Year: 1997, Page(s): 1230 - 1245
    Cited by: Papers (12)

    A new Boolean function representation scheme, the Indexed Binary Decision Diagram (IBDD), is proposed to provide a compact representation for functions whose Ordered Binary Decision Diagram (OBDD) representation is intractably large. We explain properties of IBDDs and present algorithms for constructing IBDDs from a given circuit. Practical and effective algorithms for satisfiability testing and equivalence checking of IBDDs, as well as their implementation results, are also presented. The results show that many functions, such as multipliers and the hidden-weighted-bit function, whose analysis is intractable using OBDDs, can be analyzed efficiently using IBDDs. We report efficient verification of Booth multipliers, as well as a practical strategy for polynomial time verification of some classes of unsigned array multipliers.

  • Analytical prediction of performance for cache coherence protocols

    Publication Year: 1997, Page(s): 1155 - 1173
    Cited by: Papers (3)

    In this paper, we introduce new analytical models for predicting the performance of parallel applications under various cache coherence protocol assumptions. The purpose of these models is to determine which protocols are to be used for which data blocks, and, in the case of dynamic protocols, also to determine when to change protocols. Although we focus on tightly-coupled multiprocessor systems, similar models can be derived for loosely-coupled distributed systems, such as networks of workstations. Our models are unique in that they lie between a large body of theoretical models that assume independence and a uniform distribution of memory accesses across processors, and a large body of address-trace oriented models that assume the availability of a precise characterization of interleaving behavior of memory accesses. The former are not very realistic, and the latter are not suitable for compile-time and run-time usage. In contrast, our models enable us to choose different input parameters depending on how the models will be used and depending on the needed accuracy in performance prediction. We present the models and show how the required parameters can be obtained. We assess the accuracy of our models on 15 parallel applications. For these applications, our most complete model predicts performance within a 10 percent margin when compared to a simulation of a sequentially consistent multiprocessor system. As part of this study, we also show the potential advantage of using dynamic hybrid protocols.

  • Synthesis of hazard-free asynchronous circuits based on characteristic graph

    Publication Year: 1997, Page(s): 1246 - 1263
    Cited by: Papers (7)

    To synthesize hazard-free asynchronous circuits from Signal Transition Graphs (STGs), we present a new Characteristic Graph (CG) that encapsulates all feasible solutions of the original STG in reduced size, which compares favorably with the state graph approach. Based on the CG, we are able to explore the design space, as well as develop a necessary and sufficient condition for hazard-free realization on a predefined general circuit model, which has not yet been reported. The exact optimization for synthesis is shown to be NP-hard. A heuristic method is thus proposed which results in efficient solutions while requiring very little CPU time.

  • Forward and inverse transformations between Haar spectra and ordered binary decision diagrams of Boolean functions

    Publication Year: 1997, Page(s): 1272 - 1279
    Cited by: Papers (9)

    Unnormalized Haar spectra and Ordered Binary Decision Diagrams (OBDDs) are two standard representations of Boolean functions used in logic design. In this article, mutual relationships between those two representations are derived. A method of calculating the Haar spectrum from an OBDD is presented. A decomposition of the Haar spectrum in terms of the cofactors of Boolean functions is introduced. Based on this decomposition, another method to synthesize an OBDD directly from the Haar spectrum is presented.
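
For readers unfamiliar with the unnormalized Haar spectrum, the sketch below computes it directly from a truth vector by repeated pairwise sums and differences. This is the generic fast Haar transform, not the OBDD-based method of the article; the function name and the coefficient ordering (global sum first, then coarse-to-fine differences) are assumptions, and other orderings appear in the literature.

```python
def haar_spectrum(truth_vector):
    """Unnormalized Haar spectrum of a truth vector whose length is a power of two."""
    n = len(truth_vector)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = list(truth_vector)
    details = []
    while len(current) > 1:
        sums = [current[i] + current[i + 1] for i in range(0, len(current), 2)]
        diffs = [current[i] - current[i + 1] for i in range(0, len(current), 2)]
        # Coarser differences go in front of the finer ones collected so far.
        details = diffs + details
        current = sums
    # current[0] is the number of ones; details[0] is the difference between
    # the numbers of ones in the two cofactors of the first variable.
    return current + details

# Truth vector of a two-input OR: f(0,0)=0, f(0,1)=1, f(1,0)=1, f(1,1)=1.
print(haar_spectrum([0, 1, 1, 1]))
```

Each difference compares the counts of ones in two halves of a block of the truth vector, which is how the coefficients relate to cofactors of the function.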

  • Cellular automata for weighted random pattern generation

    Publication Year: 1997, Page(s): 1219 - 1229
    Cited by: Papers (7)

    Fault testing random-pattern-resistant circuits requires that BIST (built-in self-test) techniques generate large numbers of pseudorandom patterns. To shorten these long test lengths, this study describes a cellular automata-based method that efficiently generates weighted pseudorandom BIST patterns. This structure, called a weighted cellular automaton (WCA), uses no external weighting logic. The design algorithm MWCARGO combines generation of the necessary weight sets and design of the WCA. In this study, WCA pattern generators designed by MWCARGO achieved 100 percent coverage of testable stuck-at faults for benchmark circuits with random-pattern-resistant faults. The WCA applies complete tests much faster than existing test-per-scan techniques. At the same time, the hardware overhead of WCA proves to be competitive with that of current test-per-clock schemes.
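
As background, the sketch below shows a plain (unweighted) one-dimensional cellular automaton of the kind commonly used as a BIST pattern source, mixing rule 90 and rule 150 cells with null boundaries. It does not reproduce the WCA structure or the MWCARGO algorithm; the function names, the rule vector, and the seed are assumptions chosen for illustration.

```python
def ca_step(state, rules):
    """Advance a null-boundary hybrid rule-90/150 cellular automaton by one clock."""
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0       # null boundary on the left
        right = state[i + 1] if i < n - 1 else 0  # null boundary on the right
        if rules[i] == 90:
            nxt.append(left ^ right)              # rule 90: XOR of neighbours
        elif rules[i] == 150:
            nxt.append(left ^ state[i] ^ right)   # rule 150: neighbours XOR self
        else:
            raise ValueError("only rules 90 and 150 are supported")
    return nxt

def ca_patterns(seed, rules, count):
    """Return `count` successive states, starting with the seed itself."""
    patterns, state = [], list(seed)
    for _ in range(count):
        patterns.append(state)
        state = ca_step(state, rules)
    return patterns

for pattern in ca_patterns([1, 0, 0, 0], [90, 150, 90, 150], 5):
    print(pattern)
```

Every cell produces a new bit each clock, so the whole state can serve as one test pattern per clock cycle.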

  • Compression-based program characterization for improving cache memory performance

    Publication Year: 1997, Page(s): 1174 - 1186
    Cited by: Papers (3) | Patents (1)

    It is well known that compression and prediction are interrelated in that high compression implies good predictability, and vice versa. We use this correlation to find predictable properties of program behavior and apply them to appropriate cache management tasks. In particular, we look at two properties of program references: (1) Inter Reference Gaps, defined as the time interval between successive references to the same address by the processor, and (2) Cache Misses, references which access the next level of the memory hierarchy. Using compression, we show that these two properties are highly predictable and exploit them to improve Cache Replacement and Cache Prefetching, respectively. Using trace-driven simulations on SPEC and Dinero benchmarks, we demonstrate the performance of our predictive schemes and compare them with other methods for doing the same. We show that, using our predictive replacement scheme, miss ratio in cache memories can be improved up to 43 percent over the well-known Least Recently Used (LRU) algorithm, which covers more than 84 percent of the gap between the LRU and the off-line optimal (MIN) miss ratios. For cache prefetching, we show that our scheme eliminates up to 62 percent of the total misses in D-caches. An equivalent sequential prefetch scheme only removes up to 42 percent of the misses. For I-caches, our scheme performs almost the same as the sequential scheme and removes up to 78 percent of the misses.
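
The inter-reference gap defined in the abstract can be extracted from an address trace in a single pass. In the sketch below, time is measured as the position of a reference in the trace; that choice, the function name, and the sample trace are assumptions, and the compression-based predictor itself is not shown.

```python
from collections import defaultdict

def inter_reference_gaps(trace):
    """Map each address to the list of gaps between its successive references."""
    last_seen = {}             # address -> position of its most recent reference
    gaps = defaultdict(list)
    for t, addr in enumerate(trace):
        if addr in last_seen:
            gaps[addr].append(t - last_seen[addr])
        last_seen[addr] = t
    return dict(gaps)          # addresses referenced only once have no entry

trace = [0x10, 0x20, 0x10, 0x30, 0x20, 0x10]
print(inter_reference_gaps(trace))
```

A replacement policy driven by such gaps would try to predict the next gap for each cached address and evict the line whose next reference is expected furthest in the future.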

  • Error analysis and reduction for angle calculation using the CORDIC algorithm

    Publication Year: 1997, Page(s): 1264 - 1271
    Cited by: Papers (4)

    In this paper, we consider the errors appearing in angle computations with the CORDIC algorithm (circular and hyperbolic coordinate systems) using fixed-point arithmetic. We include errors arising not only from the finite number of iterations and the finite width of the data path, but also from the finite number of bits of the input. We show that this last contribution is significant when both operands are small and that the error is acceptable only if an input normalization stage is included, which makes other previous proposals to reduce the error unsatisfactory. We propose a method based on the prescaling of the input operands and a modified CORDIC recurrence and show that it is a suitable alternative to input normalization with a smaller hardware cost. This solution can also be used in pipelined architectures with redundant carry-save arithmetic.

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org