IEEE Transactions on Computers

Issue 1 • January 2008

Displaying Results 1 - 16 of 16
  • [Front cover]

    Page(s): c1
    PDF (121 KB) | Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    PDF (76 KB) | Freely Available from IEEE
  • State of the Journal

    Page(s): 1 - 6
    PDF (802 KB) | Freely Available from IEEE
  • On-Demand Solution to Minimize I-Cache Leakage Energy with Maintaining Performance

    Page(s): 7 - 24
    PDF (9619 KB) | HTML

    This paper describes a new on-demand wake-up prediction policy for reducing leakage power. The key insight is that branch prediction can be used to selectively wake up only the needed cache line. This achieves better leakage savings than the best prior policies while avoiding the performance overheads of those policies, without needing an extra prediction structure. The proposed policy reduces leakage energy by 92.7 percent with only 0.08 percent performance overhead on average. The branch-prediction-based approach requires an extra pipeline stage for wake-up, which adds to the branch misprediction penalty. Fortunately, this cost is mitigated because the extra wake-up stage is overlapped with misprediction recovery. This paper assumes the superdrowsy leakage control technique using reduced supply voltage because it is well suited to the instruction cache's criticality. However, the proposed policy can also be applied to other leakage-saving circuit techniques.

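    The wake-up idea above can be illustrated with a small simulation sketch (Python, purely illustrative and not the authors' implementation): every cache line idles in a low-leakage drowsy state, the branch predictor names the line expected to be fetched next, and only that line is woken, while a mispredicted line pays a small wake-up penalty. The line count, penalty, and accuracy figures below are assumptions for the toy model.

      # Toy model of on-demand wake-up for a drowsy I-cache (illustrative only).
      import random

      LINES = 64            # number of I-cache lines (assumed)
      WAKE_PENALTY = 1      # extra cycles when the needed line is still drowsy (assumed)

      class DrowsyICache:
          def __init__(self):
              self.awake = set()                 # lines currently at full supply voltage

          def predict_and_wake(self, predicted_line):
              # Wake only the line the branch predictor expects to execute next;
              # every other line falls back to the drowsy (low-leakage) state.
              self.awake = {predicted_line}

          def fetch(self, line):
              # Return the cycle cost of fetching from `line`.
              if line in self.awake:
                  return 1                       # already awake: no penalty
              self.awake.add(line)               # on-demand wake-up after a misprediction
              return 1 + WAKE_PENALTY

      def run(trace, predictor_accuracy=0.95):
          cache, cycles = DrowsyICache(), 0
          for line in trace:
              predicted = line if random.random() < predictor_accuracy else random.randrange(LINES)
              cache.predict_and_wake(predicted)
              cycles += cache.fetch(line)
          return cycles

      trace = [random.randrange(LINES) for _ in range(10_000)]
      print("total fetch cycles:", run(trace))
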
  • RACE: A Robust Adaptive Caching Strategy for Buffer Cache

    Page(s): 25 - 40
    PDF (6621 KB) | HTML

    Although many block replacement algorithms for buffer caches have been proposed to address the well-known drawbacks of the LRU algorithm, they are not robust and cannot maintain a consistent performance improvement over all workloads. This paper proposes a novel and simple replacement scheme, called the Robust Adaptive buffer Cache management schemE (RACE), which differentiates the locality of I/O streams by actively detecting access patterns that are inherently exhibited in two correlated spaces, that is, the discrete block space of program contexts from which I/O requests are issued and the continuous block space within files to which I/O requests are addressed. This scheme combines the global I/O regularities of an application and the local I/O regularities of individual files that are accessed in that application to accurately estimate the locality strength, which is crucial in deciding which blocks are to be replaced upon a cache miss. Through comprehensive simulations on 10 real-application traces, RACE is shown to have higher hit ratios than LRU and all other state-of-the-art cache management schemes studied in this paper.

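    As a rough illustration of the pattern-detection idea (a hedged sketch, not the RACE implementation), the snippet below keeps one small detector per program context or file and classifies its block stream as sequential, looping, or other; a replacement policy could then prefer to evict blocks from streams classified as sequential. The thresholds and class names are assumptions.

      # Toy access-pattern detector in the spirit of RACE (illustrative only).
      from collections import defaultdict

      class PatternDetector:
          def __init__(self):
              self.last_block = None
              self.seen = set()
              self.sequential_hits = 0
              self.reuse_hits = 0
              self.total = 0

          def record(self, block):
              self.total += 1
              if self.last_block is not None and block == self.last_block + 1:
                  self.sequential_hits += 1
              if block in self.seen:
                  self.reuse_hits += 1            # evidence of a looping pattern
              self.seen.add(block)
              self.last_block = block

          def classify(self):
              if self.total < 4:
                  return "unknown"
              if self.reuse_hits / self.total > 0.5:
                  return "looping"                # strong reuse: keep these blocks cached
              if self.sequential_hits / self.total > 0.8:
                  return "sequential"             # little reuse: good eviction candidates
              return "other"

      # One detector per program context and one per file, combined by the policy
      # to estimate locality strength, as the abstract describes.
      detectors = defaultdict(PatternDetector)
      for block in range(1, 7):
          detectors["context@0x4008f0"].record(block)
      print(detectors["context@0x4008f0"].classify())   # -> "sequential"
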
  • Memory Data Flow Modeling in Statistical Simulation for the Efficient Exploration of Microprocessor Design Spaces

    Page(s): 41 - 54
    PDF (238 KB) | HTML

    Microprocessor design is both complex and time-consuming: exploring a huge design space for identifying the optimal design under a number of constraints is infeasible using detailed architectural simulation of entire benchmark executions. Statistical simulation is a recently introduced approach for efficiently culling the microprocessor design space. The basic idea of statistical simulation is to collect a number of important program characteristics and to generate a synthetic trace from them. Simulating this synthetic trace is extremely fast as it contains only a million instructions. This paper improves the statistical simulation methodology by proposing accurate memory data flow models. We propose 1) cache miss correlation, or measuring cache statistics conditionally dependent on the global cache hit/miss history, for modeling cache miss patterns and memory-level parallelism, 2) cache line reuse distributions for modeling accesses to outstanding cache lines, and 3) through-memory read-after-write dependency distributions for modeling load forwarding and bypassing. Our experiments using the SPEC CPU2000 benchmarks show substantial improvements compared to current state-of-the-art statistical simulation methods. For example, for our baseline configuration, we reduce the average instructions per cycle (IPC) prediction error from 10.9 to 2.1 percent; the maximum error observed equals 5.8 percent. In addition, we show that performance trends are predicted very accurately, making statistical simulation enhanced with accurate data flow models a useful tool for efficient and accurate microprocessor design space explorations.

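    The "cache miss correlation" ingredient can be sketched as follows (a hedged toy, not the authors' framework): profile the probability that an access misses, conditioned on the last k global hit/miss outcomes, then sample a synthetic hit/miss stream from that conditional profile. The history length and fallback miss rate below are assumptions.

      # Illustrative sketch of cache miss correlation for synthetic trace generation.
      import random
      from collections import defaultdict

      K = 4  # hit/miss history length (assumed)

      def profile(real_stream):
          # real_stream: iterable of booleans, True = miss.
          counts = defaultdict(lambda: [0, 0])          # history -> [hits, misses]
          hist = tuple()
          for miss in real_stream:
              if len(hist) == K:
                  counts[hist][1 if miss else 0] += 1
              hist = (hist + (miss,))[-K:]
          return {h: m / (hits + m) for h, (hits, m) in counts.items()}

      def synthesize(miss_prob, length, fallback=0.05):
          # Generate a synthetic miss stream reproducing the conditional statistics.
          out, hist = [], (False,) * K
          for _ in range(length):
              miss = random.random() < miss_prob.get(hist, fallback)
              out.append(miss)
              hist = (hist + (miss,))[-K:]
          return out

      real = [random.random() < 0.2 for _ in range(50_000)]
      synthetic = synthesize(profile(real), 10_000)
      print("synthetic miss rate:", sum(synthetic) / len(synthetic))
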
  • Self-Adaptive Configuration of Visualization Pipeline Over Wide-Area Networks

    Page(s): 55 - 68
    PDF (366 KB) | HTML

    Next-generation scientific applications require the capability to visualize large archival data sets or on-going computer simulations of physical and other phenomena over wide-area network connections. To minimize the latency in interactive visualizations across wide-area networks, we propose an approach that adaptively decomposes and maps the visualization pipeline onto a set of strategically selected network nodes. This scheme is realized by grouping the modules that implement visualization and networking subtasks and mapping them onto computing nodes with possibly disparate computing capabilities and network connections. Using estimates for communication and processing times of subtasks, we present a polynomial-time algorithm to compute a decomposition and mapping to achieve minimum end-to-end delay of the visualization pipeline. We present experimental results using geographically distributed deployments to demonstrate the effectiveness of this method in visualizing data sets from three application domains.

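    A much-simplified version of the decomposition-and-mapping step (a hedged sketch under strong assumptions: a linear pipeline mapped onto a chain of nodes, with data always flowing forward) can be written as a small dynamic program over module/node pairs. The paper's algorithm handles the general setting; every number below is made up for illustration.

      # Hedged sketch: place m pipeline modules on a chain of n nodes to minimize
      # end-to-end delay = compute time + data transfer time. Runs in O(m n^2) time.
      import math

      def min_delay(compute, link_delay, data_size):
          # compute[i][j] : time of module i on node j
          # link_delay[j] : per-unit transfer time from node j to node j+1
          # data_size[i]  : size of the data produced by module i
          m, n = len(compute), len(compute[0])
          best = list(compute[0])                       # delay with module 0 on node j
          for i in range(1, m):
              nxt = [math.inf] * n
              for j in range(n):                        # node chosen for module i
                  for k in range(j + 1):                # node of module i-1 (no backward hops)
                      transfer = data_size[i - 1] * sum(link_delay[k:j])
                      nxt[j] = min(nxt[j], best[k] + transfer + compute[i][j])
              best = nxt
          return min(best)

      # Tiny example: filter -> render -> display over three nodes.
      compute    = [[4, 2, 9], [8, 3, 1], [9, 9, 2]]
      link_delay = [1, 2]
      data_size  = [5, 1, 1]
      print("minimum end-to-end delay:", min_delay(compute, link_delay, data_size))
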
  • Optimal Power/Performance Pipeline Depth for SMT in Scaled Technologies

    Page(s): 69 - 81
    PDF (4025 KB) | HTML

    Performance and power act as opposing constraints for the optimal pipeline depth of a processor. Although increasing the pipeline depth may enable performance improvement, the higher clock speed associated with a deeper pipeline also increases the power dissipation. Previous papers have shown that the optimal pipeline depth for superscalars considering both power and performance is 18 to 20 fan-out-of-four (FO4) inverter delays. As simultaneous multithreading (SMT) becomes increasingly important for modern high-end processors, there is a need to quantify the optimal power-performance pipeline depth for SMT. Although previous work has shown that SMT retains the performance-optimal pipeline depth in near-future technologies, this result does not take power into account. The intricate interplay between the relative impacts of changing pipeline depth on power and performance makes it difficult to predict the scaling trends for optimal SMT pipeline depths considering both power and performance. Using simulations, we quantify the optimal SMT pipeline depths based on the well-known power-performance metric PD^3. Our analysis is novel and provides the following key results about the scaling trends for SMT pipelines considering both power and performance: 1) SMT has a deeper PD^3-optimal pipeline as compared to superscalar. 2) The PD^3-optimal SMT pipeline depth increases with an increase in the number of programs. 3) The PD^3-optimal SMT pipeline becomes shallower with technology for a given number of programs. Based on these results, we provide the following insights into SMT designs for future technologies: 1) To retain the PD^3-optimal pipeline depth across technology generations while being energy-efficient, the number of programs running on an SMT must increase. 2) To maintain a constant power dissipation across technology generations, SMT pipelines must become shallower.

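    The PD^3 metric weights delay (the inverse of performance) three times as heavily as power. The toy analytical model below (a hedged sketch with invented constants, not the paper's simulation infrastructure) shows how sweeping pipeline depth and picking the minimum power x delay^3 point yields an interior optimum: deeper pipes shorten the cycle but add hazard stalls and latch power.

      # Toy PD^3 (power x delay^3) sweep over pipeline depth; all constants assumed.
      def metrics(stages, base_logic_fo4=180.0, latch_fo4=3.0,
                  hazard_penalty=0.08, base_power=10.0, latch_power=0.6):
          cycle_fo4 = base_logic_fo4 / stages + latch_fo4   # shorter cycle as depth grows
          cpi = 1.0 + hazard_penalty * stages               # more stall cycles in deeper pipes
          delay = cpi * cycle_fo4                           # time per instruction, in FO4 units
          power = base_power + latch_power * stages         # latch/clock power grows with depth
          return delay, power

      def best_depth(max_stages=40):
          scored = []
          for s in range(5, max_stages + 1):
              delay, power = metrics(s)
              scored.append((power * delay ** 3, s))        # the PD^3 product
          return min(scored)[1]

      print("PD^3-optimal depth in the toy model:", best_depth())
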
  • Predicting and Exploiting Transient Values for Reducing Register File Pressure and Energy Consumption

    Page(s): 82 - 95
    PDF (795 KB) | HTML

    High-performance microprocessors use large, heavily ported physical register files (RFs) to increase the instruction throughput. The high complexity and power dissipation of such RFs mainly stem from the need to maintain each and every result for a large number of cycles after the result generation. We observed that a significant fraction (about 45 percent) of the result values are never read from the register file and are not required to reconstruct the precise state following branch mispredictions. In this paper, we propose Speculative Avoidance of Register allocations to Transient values (SPARTAN) - a set of microarchitectural extensions that predicts such transient values and, in many cases, completely avoids physical register allocations to them. We show that the transient values can be predicted as such with more than 97 percent accuracy, on average, across simulated SPEC 2000 benchmarks. We evaluate the performance of SPARTAN on a variety of configurations and show that significant improvements in performance and energy efficiency can be realized. Furthermore, we directly compare SPARTAN against a number of previously proposed schemes for register optimizations and show that our technique significantly outperforms all those schemes.

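    A transient value is a result that is never read back from the register file and is not needed to rebuild precise state. The sketch below (hedged: table size, indexing, and thresholds are assumptions, and it is not the SPARTAN design itself) shows the kind of PC-indexed confidence predictor that could flag such results so the rename stage skips allocating a physical register for them.

      # Toy transient-value predictor: saturating confidence counters indexed by PC.
      class TransientPredictor:
          def __init__(self, entries=4096, threshold=3, max_count=7):
              self.counters = [0] * entries
              self.entries, self.threshold, self.max_count = entries, threshold, max_count

          def _index(self, pc):
              return (pc >> 2) % self.entries

          def predict_transient(self, pc):
              # True -> do not allocate a physical register for this result.
              return self.counters[self._index(pc)] >= self.threshold

          def train(self, pc, was_read):
              # Called at retirement, once it is known whether the result was ever read.
              i = self._index(pc)
              if was_read:
                  self.counters[i] = 0               # be conservative after a live value
              else:
                  self.counters[i] = min(self.counters[i] + 1, self.max_count)

      # Usage: skip allocation at rename when predict_transient(pc) is True; a wrong
      # prediction is repaired with the existing misprediction-recovery machinery.
      p = TransientPredictor()
      for _ in range(4):
          p.train(0x400100, was_read=False)
      print(p.predict_transient(0x400100))           # True after repeated unread results
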
  • Minimum Deadline Calculation for Periodic Real-Time Tasks in Dynamic Priority Systems

    Page(s): 96 - 109
    PDF (684 KB) | HTML

    Real-time systems are often designed using a set of periodic tasks. Task periods are usually set by the system requirements, but deadlines and computation times can be modified in order to improve system performance. Sensitivity analysis in real-time systems has focused on changes in task computation times using fixed priority analysis. Only a few studies deal with the modification of deadlines in dynamic-priority scheduling. The aim of this work is to provide a sensitivity analysis for task deadlines in the context of dynamic-priority, preemptive, uniprocessor scheduling. In this paper, we present a deadline minimization method that computes the shortest deadline of a periodic task. As undertaken in other studies concerning computation times, we also define and calculate the critical scaling factor for task deadlines. Our proposal is evaluated and compared with other works. The proposed deadline minimization strongly reduces the jitter and response time of control tasks, which can lead to a significant improvement in system performance.

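    For dynamic-priority (EDF) scheduling, the textbook feasibility test is the processor-demand criterion: the demand bound function must not exceed t at any absolute deadline up to the hyperperiod. The sketch below uses that standard test to search for the smallest deadline of one task that keeps the set schedulable; it is a hedged illustration of the problem being solved, not necessarily the paper's own minimization algorithm, and the task set is invented.

      # Minimum feasible deadline for one periodic task under EDF (illustrative sketch).
      from math import floor, gcd
      from functools import reduce

      def dbf(tasks, t):
          # Demand of all jobs whose release and deadline both fall inside [0, t].
          return sum(max(0, floor((t - D) / T) + 1) * C for (C, T, D) in tasks)

      def feasible(tasks):
          if sum(C / T for (C, T, _) in tasks) > 1:
              return False
          hyper = reduce(lambda a, b: a * b // gcd(a, b), [T for (_, T, _) in tasks])
          deadlines = {D + k * T for (_, T, D) in tasks
                       for k in range(hyper // T + 1) if D + k * T <= hyper}
          return all(dbf(tasks, t) <= t for t in sorted(deadlines))

      def minimum_deadline(tasks, index):
          C, T, _ = tasks[index]
          for d in range(C, T + 1):                  # smallest first; binary search also works
              trial = list(tasks)
              trial[index] = (C, T, d)
              if feasible(trial):
                  return d
          return None

      tasks = [(2, 5, 5), (2, 7, 7), (3, 10, 10)]    # (C, T, D); utilization ~ 0.99
      print("minimum deadline for the third task:", minimum_deadline(tasks, 2))
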
  • Efficient Prefix Updates for IP Router Using Lexicographic Ordering and Updatable Address Set

    Page(s): 110 - 125
    PDF (6134 KB) | HTML

    Dynamic IP router table schemes, which have recently been proposed in the literature, perform an IP lookup or an online prefix update in O(log₂|T|) memory accesses (MAs). In terms of lookup time, they are still slower than the full expansion/compression (FEC) scheme (compressed next-hop array/code word array (CNHA/CWA)), which requires exactly (at most) three MAs, irrespective of the number of prefixes |T| in a routing table T. The prefix updates in both FEC and CNHA/CWA have a drawback: Inefficient offline structure reconstruction is arguably the only viable solution. This paper solves the problem. We propose the use of lexicographically ordered prefixes to reduce the offline construction time of both schemes. Simulations on several real routing databases, run on the same platform, show that our approach constructs FEC (CNHA/CWA) tables 2.68 to 7.54 (4.57 to 6) times faster than previous techniques. We also propose an online update scheme that, using an updatable address set and selectively decompressing the FEC and CNHA/CWA structures, modifies only the next hops of the addresses in the set. Recompressing the updated structures, the resulting forwarding tables are identical to those obtained by structure reconstructions, but are obtained at much lower computational cost. Our simulations show that the improved FEC and CNHA/CWA outperform the most recent O(log₂|T|) schemes in terms of lookup time, update time, and memory requirement.

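    The role of lexicographic ordering in the offline construction can be shown on a toy full-expansion table (a hedged sketch for 8-bit addresses; real forwarding tables use 32-bit IPv4 prefixes and the compressed CNHA/CWA structures): because a prefix sorts before all of its extensions, writing next hops in lexicographic order lets more specific prefixes simply overwrite their ancestors' address ranges in a single pass, which realizes longest-prefix matching.

      # Toy full expansion of a prefix table using lexicographically ordered prefixes.
      WIDTH = 8

      def expand(prefixes):
          # prefixes: list of (bit_string, next_hop); returns a 2^WIDTH next-hop array.
          table = [None] * (1 << WIDTH)
          for bits, hop in sorted(prefixes, key=lambda p: p[0]):
              lo = int(bits.ljust(WIDTH, "0"), 2)    # first address covered by the prefix
              hi = int(bits.ljust(WIDTH, "1"), 2)    # last address covered by the prefix
              table[lo:hi + 1] = [hop] * (hi - lo + 1)
          return table

      routes = [("", 0), ("1", 1), ("101", 2), ("10110", 3)]
      next_hop_array = expand(routes)
      print(next_hop_array[0b10110011])   # longest match 10110 -> 3
      print(next_hop_array[0b10011111])   # longest match 1     -> 1
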
  • Sequential Circuit Design for Embedded Cryptographic Applications Resilient to Adversarial Faults

    Page(s): 126 - 138
    PDF (1593 KB) | HTML

    In the relatively young field of fault-tolerant cryptography, the main research effort has focused exclusively on the protection of the data path of cryptographic circuits. To date, however, we have not found any work that aims at protecting the control logic of these circuits against fault attacks, which thus remains the proverbial Achilles' heel. Motivated by a hypothetical yet realistic fault analysis attack that, in principle, could be mounted against any modular exponentiation engine, even one with appropriate data path protection, we set out to close this remaining gap. In this paper, we present guidelines for the design of multifault-resilient sequential control logic based on standard Error-Detecting Codes (EDCs) with large minimum distance. We introduce a metric that measures the effectiveness of the error detection technique in terms of the effort the attacker has to make in relation to the area overhead spent in implementing the EDC. Our comparison shows that the proposed EDC-based technique provides superior performance when compared against regular N-modular redundancy techniques. Furthermore, our technique scales well and does not affect the critical path delay.

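    The control-logic protection can be pictured with a toy encoded state machine (a hedged sketch: a simple r-fold repetition code stands in for the stronger error-detecting codes the paper considers, and the transition function is invented). With code distance r, an injected fault must flip at least r bits of the stored state to escape detection; anything less triggers the error response.

      # Toy EDC-protected FSM state register (illustrative only).
      R = 3                        # code distance: detects up to R-1 flipped bits (assumed)
      STATE_BITS = 4

      def encode(state):
          return [state] * R                         # R redundant copies of the raw state

      def check_and_decode(word):
          # Return the state, or raise if the stored word is not a valid codeword.
          if any(copy != word[0] for copy in word):
              raise RuntimeError("fault detected in control logic: entering safe state")
          return word[0]

      def next_state(word, inp):
          state = check_and_decode(word)             # every transition re-validates the code
          nxt = (state + inp) % (1 << STATE_BITS)    # stand-in transition function
          return encode(nxt)

      w = encode(0)
      w = next_state(w, 5)
      w[1] ^= 0b0010                                 # adversarial single-bit fault injection
      try:
          next_state(w, 1)
      except RuntimeError as err:
          print(err)
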
  • Reviewers List

    Page(s): 139 - 144
    PDF (32 KB) | Freely Available from IEEE
  • Annual index

    Page(s): not in print
    PDF (462 KB) | Freely Available from IEEE
  • TC Information for authors

    Page(s): c3
    PDF (76 KB) | Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    PDF (121 KB) | Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Albert Y. Zomaya
School of Information Technologies
Building J12
The University of Sydney
Sydney, NSW 2006, Australia
http://www.cs.usyd.edu.au/~zomaya
albert.zomaya@sydney.edu.au