
IEEE Transactions on Computers

Issue 6 • June 2006


Displaying Results 1 - 15 of 15
  • [Front cover]

    Page(s): c1
    PDF (137 KB)
    Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    PDF (91 KB)
    Freely Available from IEEE
  • Program counter-based prediction techniques for dynamic power management

    Page(s): 641 - 658
    PDF (3643 KB) | HTML

    Reducing energy consumption has become one of the major challenges in designing future computing systems. This paper proposes the novel idea of using program counters to predict I/O activities in the operating system. It presents a complete design of a program-counter access predictor (PCAP) that dynamically learns the access patterns of applications and predicts when an I/O device can be shut down to save energy. PCAP uses path-based correlation to observe a particular sequence of program counters leading to each idle period and predicts future occurrences of that idle period. PCAP differs from previously proposed shutdown predictors in its ability to: 1) correlate I/O operations to particular behavior of the applications and users, 2) carry prediction information across multiple executions of the applications, and 3) attain higher energy savings while incurring fewer mispredictions. We perform an extensive evaluation study of PCAP using a detailed trace-driven simulation and an actual Linux implementation. Our results show that PCAP achieves lower average mispredictions and higher energy savings than the simple timeout scheme and the state-of-the-art learning tree scheme.
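    To make the path-based correlation idea concrete, the Python sketch below keys a prediction table on the sequence of program counters observed at recent I/O calls and predicts a shutdown when the idle periods previously seen after that path exceed the device's break-even time. It is an illustration only, not the paper's PCAP design; the names, table size, and break-even threshold are assumptions.

        from collections import deque, defaultdict

        PATH_LEN = 4            # number of recent I/O-call PCs forming a path signature (assumed)
        BREAK_EVEN_SECS = 2.0   # assumed device break-even time for a shutdown

        class PathPredictor:
            def __init__(self):
                self.recent_pcs = deque(maxlen=PATH_LEN)
                self.idle_history = defaultdict(list)   # path signature -> observed idle lengths

            def on_io_request(self, pc, idle_since_last_io):
                """Record the idle period that just ended, then extend the PC path."""
                sig = tuple(self.recent_pcs)
                if len(sig) == PATH_LEN:
                    self.idle_history[sig].append(idle_since_last_io)
                self.recent_pcs.append(pc)

            def predict_shutdown(self):
                """Shut the device down only if the learned idle time beats break-even."""
                history = self.idle_history.get(tuple(self.recent_pcs))
                if not history:
                    return False                         # no correlation learned yet
                return sum(history) / len(history) > BREAK_EVEN_SECS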

  • A hardware Gaussian noise generator using the Box-Muller method and its error analysis

    Page(s): 659 - 671
    PDF (2384 KB) | HTML

    We present a hardware Gaussian noise generator based on the Box-Muller method that provides highly accurate noise samples. The noise generator can be used as a key component in a hardware-based simulation system, such as for exploring channel code behavior at very low bit error rates, as low as 10^-12 to 10^-13. The main novelties of this work are the accurate analytical error analysis and bit-width optimization for the elementary functions involved in the Box-Muller method. Two 16-bit noise samples are generated every clock cycle and, due to the accurate error analysis, every sample is analytically guaranteed to be accurate to one unit in the last place. An implementation on a Xilinx Virtex-4 XC4VLX100-12 FPGA occupies 1,452 slices, three block RAMs, and 12 DSP slices, and is capable of generating 750 million samples per second at a clock speed of 375 MHz. The performance can be improved by exploiting concurrent execution: 37 parallel instances of the noise generator at 95 MHz on a Xilinx Virtex-II Pro XC2VP100-7 FPGA generate seven billion samples per second, over 200 times faster than a software implementation running on an Intel Pentium-4 3 GHz PC. The noise generator is currently being used at the Jet Propulsion Laboratory, NASA, to evaluate the performance of low-density parity-check codes for deep-space communications.
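    For reference, the Box-Muller transform underlying the generator is shown below as a minimal software sketch in Python; the paper's actual contribution (fixed-point bit-width optimization and the analytical one-ULP accuracy guarantee in hardware) is not reproduced here.

        import math, random

        def box_muller(u1, u2):
            """Map two uniforms in (0, 1) to two independent N(0, 1) samples."""
            r = math.sqrt(-2.0 * math.log(u1))   # radial part
            theta = 2.0 * math.pi * u2           # uniformly distributed angle
            return r * math.cos(theta), r * math.sin(theta)

        # Two noise samples per call, mirroring the generator's two samples per cycle.
        z0, z1 = box_muller(random.random() or 1e-12, random.random())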

  • Reducing rename logic complexity for high-speed and low-power front-end architectures

    Page(s): 672 - 685
    PDF (5943 KB) | HTML

    In modern-day high-performance processors, the complexity of the register rename logic grows along with the pipeline width and leads to larger renaming delay and higher power consumption. Renaming logic in the front-end of the processor is one of the largest contributors to peak temperatures on the chip and so demands attention to reduce its power consumption. Further, with the advent of clustered microarchitectures, the rename map table at the front-end is shared by the clusters and, hence, its critical path delay should not become a bottleneck in determining the processor clock cycle time. Analysis of the characteristics of SPEC2000 integer benchmark programs reveals that, when the programs are processed in a 4-wide processor, none or only one two-source instruction (an instruction with two source registers) is renamed in a cycle for 94 percent of the total execution time. Similarly, in an 8-wide processor, none or only one two-source instruction is renamed in a cycle for 92 percent of the total execution time. Thus, the analysis shows that the rename map table port bandwidth is highly underutilized for a significant portion of the time. Based on this analysis, we propose a novel technique to significantly reduce the number of ports in the rename map table. The novelty of the technique is that it is easy to implement and succeeds in reducing the access time, power, and area of the rename logic without any additional power, area, or delay overheads in any other logic on the chip. The proposed technique performs the register renaming of instructions in the order of their fetch, with no significant impact on the processor's performance. With this technique in an 8-wide processor, as compared to a conventional rename map table in an integer pipeline with 16 ports to look up source operands, a rename map table with nine ports results in a reduction in access time, power, and area by 14 percent, 42 percent, and 49 percent, respectively, with only a 4.7 percent loss in instructions committed per cycle (IPC). Implementing the technique in a 4-wide processor results in a reduction in access time, power, and area by 7 percent, 38 percent, and 59 percent, respectively, with an IPC loss of only 4.4 percent.

  • An efficient dynamic algorithm for maintaining all-pairs shortest paths in stochastic networks

    Page(s): 686 - 702
    PDF (3840 KB) | HTML

    This paper presents a new solution to the dynamic all-pairs shortest path routing problem, using a linear reinforcement learning scheme. The particular instance of the problem that we investigate concerns finding the all-pairs shortest paths in a stochastic graph, where there are continuous probabilistically-based updates in edge-weights. We present the details of the algorithm with an illustrative example. The algorithm can be used to find the all-pairs shortest paths for the "statistical" average graph, and the solution converges irrespective of whether there are new changes in edge-weights or not. The existing algorithms, on the other hand, do not exhibit such behavior and must recalculate the affected shortest paths after each edge-weight update. The proposed algorithm makes two important contributions. The first is that not all the edges in a stochastic graph are probed and, even when they are, they are not all probed equally often. Indeed, the algorithm attempts to almost always probe only those edges that will be included in the final list involving all pairs of nodes in the graph, while probing the other edges minimally. This increases the performance of the proposed algorithm. The second contribution is the design of a data structure whose elements represent the probability that a particular edge lies on the shortest path between a pair of nodes in the graph. All the algorithms were tested in environments where edge-weights change stochastically and where the graph topologies undergo multiple simultaneous edge-weight updates. The proposed algorithm's superiority in terms of the average number of processed nodes, scanned edges, and the time per update operation, when compared with the existing algorithms, was experimentally established.
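    As a hedged illustration of the kind of linear reinforcement update such a scheme relies on, the Python sketch below applies a standard linear reward-inaction rule to entries of the probability data structure described above; the learning-rate value, the keying by node pair, and the function names are assumptions, not the paper's algorithm.

        LAMBDA = 0.1  # assumed learning-rate parameter of the linear scheme

        def reward_edge(edge_probs, chosen_edge):
            """Reinforce the probed edge that proved favourable; scale the rest down."""
            for edge in edge_probs:
                if edge == chosen_edge:
                    edge_probs[edge] += LAMBDA * (1.0 - edge_probs[edge])
                else:
                    edge_probs[edge] *= (1.0 - LAMBDA)

        # Candidate first edges for the pair (u, v); probing (u, w) proved favourable.
        probs = {('u', 'v'): {('u', 'w'): 0.5, ('u', 'x'): 0.5}}
        reward_edge(probs[('u', 'v')], ('u', 'w'))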

  • Risk-resilient heuristics and genetic algorithms for security-assured grid job scheduling

    Page(s): 703 - 719
    PDF (4623 KB) | HTML

    In scheduling a large number of user jobs for parallel execution on an open-resource grid system, the jobs are subject to system failures or delays caused by infected hardware, software vulnerability, and distrusted security policy. This paper models the risk and insecure conditions in grid job scheduling. Three risk-resilient strategies, preemption, replication, and delay tolerance, are developed to provide security assurance. We propose six risk-resilient scheduling algorithms to assure secure grid job execution under different risky conditions. We report the simulated grid performance of these new grid job scheduling algorithms under the NAS and PSA workloads. The relative performance is measured by the total job makespan, grid resource utilization, job failure rate, slowdown ratio, replication overhead, etc. In addition to extending known scheduling heuristics, we develop a new space-time genetic algorithm (STGA) based on faster searching and protected chromosome formation. Our simulation results suggest that, in a wide-area grid environment, it is more resilient for the global job scheduler to tolerate some job delays than to resort to preemption or replication or to take a risk on unreliable allocated resources. We find that delay-tolerant min-min and STGA job scheduling achieve 13-23 percent higher performance than risky, preemptive, or replicated algorithms. The resource overhead for replicated job scheduling is kept at a low 15 percent. The delayed job execution is optimized with a delay factor, which is 20 percent of the total makespan. A Kiviat graph is proposed for demonstrating the quality of grid computing services. These risk-resilient job scheduling schemes can upgrade grid performance significantly at only a moderate increase in extra resources or scheduling delays in a risky grid computing environment.

  • Simple error detection methods for hardware implementation of Advanced Encryption Standard

    Page(s): 720 - 731
    PDF (2161 KB) | HTML

    To prevent the Advanced Encryption Standard (AES) from suffering differential fault attacks, error detection can be adopted to detect errors during encryption or decryption and then provide the information needed to take further action, such as interrupting the AES process or redoing it. Because errors occur within a function, the output is not easy to predict; therefore, general error control codes are not suited to AES operations. In this work, several error-detection schemes are proposed. These schemes are based on the (n+1, n) cyclic redundancy check (CRC) over GF(2^8), where n ∈ {4, 8, 16}. Because of the good algebraic properties of AES, specifically the MixColumns operation, these error-detection schemes are well suited to AES and efficient in hardware implementation; they may be designed using round-level, operation-level, or algorithm-level detection. The proposed schemes have high fault coverage. In addition, they are scalable and symmetrical. Scalability makes the schemes suitable for AES circuits implemented in 8-bit, 32-bit, or 128-bit architectures. Symmetry allows the encryption and decryption processes to share the same error-detection hardware, and the schemes are also suitable for encryption-only or decryption-only cases. Error detection for the AES key schedule is also proposed, based on the results derived for the data procedure of AES.
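    As a hedged illustration of a single CRC check symbol over GF(2^8) (not the paper's actual generator polynomials or their interplay with MixColumns), the Python sketch below uses the simplest generator, g(x) = x + 1, for which the (n+1, n) check symbol reduces to the bytewise XOR of the n data bytes.

        from functools import reduce

        def crc_byte(data):
            """Check symbol for an (n+1, n) code with g(x) = x + 1 over GF(2^8)."""
            return reduce(lambda a, b: a ^ b, data, 0)

        def encode(data):
            return bytes(data) + bytes([crc_byte(data)])

        def check(codeword):
            """Detects every error pattern whose bytewise XOR is nonzero,
            in particular any single corrupted byte."""
            return crc_byte(codeword[:-1]) == codeword[-1]

        block = bytes(range(16))          # one 128-bit AES state, i.e., n = 16
        assert check(encode(block))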

  • A new reliability-oriented place and route algorithm for SRAM-based FPGAs

    Page(s): 732 - 744
    PDF (1875 KB) | HTML

    The very high integration levels reached by VLSI technologies for SRAM-based field programmable gate arrays (FPGAs) lead to a high occurrence rate of transient faults induced by single event upsets (SEUs) in the FPGA's configuration memory. Since the configuration memory defines which circuit an SRAM-based FPGA implements, any modification induced by SEUs may dramatically change the implemented circuit. When such devices are used in safety-critical applications, fault-tolerant techniques are needed to mitigate the effects of SEUs in the FPGA's configuration memory. In this paper, we analyze the effects induced by SEUs in the configuration memory of SRAM-based FPGAs. The analysis shows that SEUs in the FPGA's configuration memory are particularly critical since they are able to escape well-known fault-masking techniques such as triple modular redundancy (TMR). We then present a reliability-oriented place and route algorithm that, coupled with TMR, is able to effectively mitigate the effects of the considered faults. The effectiveness of the new reliability-oriented place and route algorithm is demonstrated by extensive fault-injection experiments showing that the capability of tolerating SEU effects in the FPGA's configuration memory increases by up to 85 times with respect to a standard TMR design technique.
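    For context, the TMR masking that a configuration-memory SEU can defeat is the simple bitwise majority vote sketched below in Python; the routing-aware placement constraints that are the paper's actual contribution are not shown.

        def tmr_vote(a, b, c):
            """Bitwise majority of three replica outputs: a single faulty replica is masked."""
            return (a & b) | (b & c) | (a & c)

        # One flipped bit in replica b is outvoted by replicas a and c.
        assert tmr_vote(0b1010, 0b1011, 0b1010) == 0b1010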

  • On optimization of e-textile systems using redundancy and energy-aware routing

    Page(s): 745 - 756
    PDF (2115 KB) | HTML

    Recent advances in electronic device manufacturing technology have opened many research opportunities in pervasive computing. Among the emerging design platforms, "electronic textiles" (or e-textiles) make possible a wide variety of novel applications, ranging from consumer electronics to aerospace devices. Due to the harsh environment in which e-textile components operate and to battery size limitations, low-power and redundancy techniques are critical for successful e-textile applications. In this paper, we consider a platform which consists of dedicated components for e-textiles, including computational modules, dedicated transmission lines, and thin-film batteries on fiber substrates. As a theoretical contribution, we address the problem of energy-aware routing for e-textile platforms and propose an efficient algorithm to solve it. Furthermore, we derive an analytical upper bound for determining the maximum number of achievable jobs over all possible e-textile routing frameworks. From a practical standpoint, for the Advanced Encryption Standard (AES) cipher, the routing technique we propose achieves close to or more than 75 percent of this theoretical upper bound. Moreover, compared to its non-energy-aware counterpart, the new routing technique increases the number of encryption jobs by one order of magnitude.

  • Toward a theory for scheduling dags in Internet-based computing

    Page(s): 757 - 768
    Multimedia
    PDF (1524 KB) | HTML

    Conceptual and algorithmic tools are developed as a foundation for a theory of scheduling complex computation-dags for Internet-based computing. The goal of the schedules produced is to render tasks eligible for allocation to remote clients (hence, for execution) at the maximum possible rate. This allows one to utilize remote clients well and to lessen the likelihood of the "gridlock" that ensues when a computation stalls for lack of eligible tasks. Earlier work has introduced a formalism for studying this optimization problem and has identified optimal schedules for several significant families of structurally uniform dags. The current paper extends this work via a methodology for devising optimal schedules for a much broader class of complex dags, which are obtained via composition from a prespecified collection of simple building-block dags. The paper provides a suite of algorithms that decompose a given dag G to expose its building blocks, together with an execution-priority relation ▷ on those building blocks. When the building blocks are appropriately interrelated under ▷, the algorithms specify an optimal schedule for G.

  • Measuring benchmark similarity using inherent program characteristics

    Page(s): 769 - 782
    PDF (4075 KB) | HTML

    This paper proposes a methodology for measuring the similarity between programs based on their inherent, microarchitecture-independent characteristics, and demonstrates two applications for it: 1) finding a representative subset of programs from benchmark suites and 2) studying the evolution of four generations of SPEC CPU benchmark suites. Using the proposed methodology, we find a representative subset of programs from three popular benchmark suites: SPEC CPU2000, MediaBench, and MiBench. We show that this subset of representative programs can be effectively used to estimate the average benchmark suite IPC, L1 data cache miss rates, and speedup on 11 machines with different ISAs and microarchitectures; this enables one to save simulation time with little loss in accuracy. From our study of the similarity between the four generations of SPEC CPU benchmark suites, we find that, other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have remained more or less unchanged.
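    As a hedged sketch of one way such a similarity measure can be computed (the feature names, values, and statistical treatment below are illustrative assumptions, not the paper's), the Python snippet normalizes each microarchitecture-independent characteristic across the programs and then compares programs by Euclidean distance.

        import math

        def normalize(features):
            """Z-score each characteristic across all programs."""
            names = next(iter(features.values())).keys()
            out = {prog: {} for prog in features}
            for name in names:
                vals = [features[prog][name] for prog in features]
                mean = sum(vals) / len(vals)
                std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
                for prog in features:
                    out[prog][name] = (features[prog][name] - mean) / std
            return out

        def distance(a, b):
            """Euclidean distance between two normalized characteristic vectors."""
            return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

        # Toy, made-up characteristics for three programs (illustrative values only).
        progs = {
            "gcc":   {"branch_ratio": 0.19, "temporal_locality": 0.62, "ilp": 2.1},
            "mcf":   {"branch_ratio": 0.21, "temporal_locality": 0.31, "ilp": 1.4},
            "bzip2": {"branch_ratio": 0.15, "temporal_locality": 0.70, "ilp": 2.4},
        }
        norm = normalize(progs)
        print(distance(norm["gcc"], norm["bzip2"]))   # smaller distance = more similar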

  • Comment on "Computing the shortest network under a fixed topology"

    Page(s): 783 - 784
    PDF (230 KB)

    A linear programming formulation was previously given for the problem of computing a shortest network under a fixed topology (under the λ-metric). We point out a nontrivial error in that paper and give a correct and simpler linear programming formulation. We also show that the result can be generalized to any distance function given by a Minkowski unit circle that is a centrally symmetric polygon.

  • TC Information for authors

    Page(s): c3
    PDF (91 KB)
    Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    PDF (137 KB)
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.


Meet Our Editors

Editor-in-Chief
Albert Y. Zomaya
School of Information Technologies
Building J12
The University of Sydney
Sydney, NSW 2006, Australia
http://www.cs.usyd.edu.au/~zomaya
albert.zomaya@sydney.edu.au