Computers, IEEE Transactions on

Issue 1 • Date Jan. 2000

  • 1999 reviewers list

    Page(s): 95 - 96
    Freely Available from IEEE
  • Probabilistic loop scheduling for applications with uncertain execution time

    Page(s): 65 - 80

    One of the difficulties in high-level synthesis and compiler optimization is obtaining a good schedule without knowing the exact computation time of the tasks involved. Uncertain computation times normally arise when conditional instructions are employed and/or when the inputs of a task influence its computation time. The relationship between these tasks can be represented as a data-flow graph in which each node models a task with an associated probabilistic computation time, and a set of edges represents the dependencies between tasks. In this research, we study scheduling and optimization algorithms that take the probabilistic execution times into account. Two novel algorithms, called probabilistic retiming and probabilistic rotation scheduling, are developed for solving the underlying non-resource-constrained and resource-constrained scheduling problems, respectively. Experimental results show that probabilistic retiming consistently produces a graph with a smaller longest-path computation time for a given confidence level than the traditional retiming algorithm, which assumes fixed worst-case or average-case computation times. Furthermore, under resource constraints and in probabilistic environments, probabilistic rotation scheduling gives a schedule whose length is guaranteed to satisfy a given probability requirement. This schedule is better than the schedules produced by other algorithms that consider only worst-case or average-case scenarios.
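
    The idea of a path length "for a given confidence level" can be illustrated with a minimal sketch. It is not the paper's probabilistic retiming or rotation scheduling; it only convolves per-task time distributions along one path, assuming the task times are independent discrete random variables. The function names and the three-task example are hypothetical.

```python
from collections import defaultdict


def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent discrete random variables.

    Each distribution is a dict mapping computation time -> probability.
    """
    out = defaultdict(float)
    for time_a, prob_a in dist_a.items():
        for time_b, prob_b in dist_b.items():
            out[time_a + time_b] += prob_a * prob_b
    return dict(out)


def path_length_distribution(node_dists):
    """Distribution of the total computation time along one path."""
    total = {0: 1.0}
    for dist in node_dists:
        total = convolve(total, dist)
    return total


def length_at_confidence(dist, confidence):
    """Smallest L such that P(path length <= L) >= confidence."""
    cumulative = 0.0
    for time in sorted(dist):
        cumulative += dist[time]
        if cumulative >= confidence - 1e-12:  # tolerate rounding error
            return time
    return max(dist)


# Hypothetical path of three tasks (assumed independent).
path = [{1: 0.8, 3: 0.2}, {2: 0.5, 4: 0.5}, {1: 0.9, 5: 0.1}]
dist = path_length_distribution(path)
worst_case = sum(max(d) for d in path)
bound_90 = length_at_confidence(dist, 0.9)
print(worst_case, bound_90)
```

    In this made-up example the worst-case length is well above the length that holds with 90 percent probability, which is the kind of gap a confidence-based schedule can exploit.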
  • On finding a minimal functional description of a finite-state machine for test generation for adjacent machines

    Page(s): 88 - 94

    In some applications, it is desirable to find for a circuit a minimal partial description that allows a certain task to be carried out. A partial circuit description allows the task to be carried out more efficiently since fewer decision points exist based on a partial description compared to the full circuit description. We consider this problem with respect to finite-state machines and the following tasks. Starting from a functional description of a finite-state machine M in the form of a state table ST, we select a minimal subset of state-transitions STpart ⊂ ST such that every output sequence that can be produced using state-transitions out of ST can also be produced using state-transitions out of STpart. We also formulate a similar problem related to the propagation of fault effects from the inputs to the outputs of M and describe a procedure for solving this problem. Applications of these tasks include test generation for circuits described as interconnections of finite-state machines. Experimental results presented show that STpart contains a small fraction of the state-transitions of ST.
  • An approach for detecting multiple faulty FPGA logic blocks

    Page(s): 48 - 54

    An approach is proposed to test FPGA logic blocks, including part of the configuration memories used to control them. The proposed AND tree and OR tree-based testing structure is simple and the conditions for constant testability can easily be satisfied. Test generation for only a single logic block is sufficient. We do not assume any particular fault model. Any number of faulty blocks in the chip can be detected. Members of the Xilinx XC3000, XC4000, and XC5200 families were studied. The proposed AND/OR approach was found to reduce the number of FPGA reprogrammings needed for testing by up to a factor of seven versus direct methods of multiple faulty block detection.
  • Buffer assignment algorithms on data driven ASICs

    Page(s): 16 - 32

    Data driven architectures have significant potential in the design of high performance ASICs. By exploiting the inherent parallelism in the application, these architectures can maximize pipelining. The key consideration involved with the design of a data driven ASIC is ensuring that throughput is maximized while a relatively low area is maintained. Optimal throughput can be realized by ensuring that all operands arrive simultaneously at their corresponding operator node. If this condition is achieved, the underlying data flow graph is said to be balanced. If the initial data flow graph is unbalanced, buffers must be inserted to prevent the clogging of the pipeline along the shorter paths. A novel algorithm for the assignment of buffers in a data flow graph is proposed. The method can also be applied to achieve wave-pipelining in digital systems under certain restrictions. The algorithm uses a new application of the retiming technique; the number of buffers here is shown to be equal to the minimum number of buffers achieved by integer programming techniques. We also discuss an extension of this algorithm which can further reduce the number of buffers by altering the DFG without affecting functionality or performance. The time complexities of the proposed algorithms are O(V×E) and O(V²×log V), respectively, a considerable improvement over the existing strategies. Also proposed is a novel buffer distribution algorithm that exploits a unique feature of data driven operation. This procedure maximizes throughput by inserting substantially fewer buffers than other techniques. Experimental results show that the proposed algorithms outperform the existing methods.
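
    The balancing condition (all operands arrive simultaneously at their operator node) can be sketched with a naive longest-path leveling. This is not the retiming-based algorithm of the paper and does not in general yield the minimum number of buffers; it assumes an acyclic graph in which every operator and every buffer takes one time unit. The function name and the example graph are hypothetical.

```python
from collections import defaultdict, deque


def balance_buffers(nodes, edges):
    """Return {(u, v): buffer_count} that equalizes operand arrival times.

    Assumptions: acyclic data flow graph, unit delay per operator and
    per buffer. Each node is placed at its longest-path level, and every
    edge is padded with buffers to span the level difference.
    """
    successors = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Longest-path level of each node, computed in topological order.
    level = {n: 0 for n in nodes}
    ready = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while ready:
        u = ready.popleft()
        visited += 1
        for v in successors[u]:
            level[v] = max(level[v], level[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle")

    return {(u, v): level[v] - level[u] - 1 for u, v in edges}


# Hypothetical unbalanced graph: short paths a->d and a->c bypass the chain.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("a", "c")]
buffers = balance_buffers(nodes, edges)
print(buffers)
```

    Edges on the longest chain need no buffers, while the bypass edges are padded so that both operands of `c` and of `d` arrive in the same step.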
  • Filtering memory references to increase energy efficiency

    Page(s): 1 - 15

    Most modern microprocessors employ one or two levels of on-chip caches in order to improve performance. Caches typically are implemented with static RAM cells and often occupy a large portion of the chip area. Not surprisingly, these caches can consume a significant amount of power. In many applications, such as portable devices, energy efficiency is more important than performance. We propose sacrificing some performance in exchange for energy efficiency by filtering cache references through an unusually small first level cache. We refer to this structure as the filter cache. A second level cache, similar in size and structure to a conventional first level cache, is positioned behind the filter cache and serves to mitigate the performance loss. Extensive experiments indicate that a small filter cache still can achieve a high hit rate and good performance. This approach allows the second level cache to be in a low power mode most of the time, thus resulting in power savings. The filter cache is particularly attractive in low power applications, such as the embedded processors used for communication and multimedia applications. For example, experimental results across a wide range of embedded applications show that a direct mapped 256-byte filter cache achieves a 58 percent power reduction while reducing performance by 21 percent. This trade-off results in a 51 percent reduction in the energy-delay product when compared to a conventional design.
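
    Why a very small direct mapped cache can still reach a high hit rate can be seen in a minimal simulation sketch. The default cache size, the 16-byte line size, and the loop-shaped address trace are illustrative assumptions, not parameters or workloads taken from the paper, and the sketch models hit rate only, not power.

```python
def filter_cache_hit_rate(addresses, cache_bytes=256, line_bytes=16):
    """Hit rate of a direct mapped cache over a sequence of byte addresses.

    cache_bytes and line_bytes are assumed values chosen for illustration.
    """
    if not addresses:
        return 0.0
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines  # one stored tag per cache line
    hits = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_lines   # which line the block maps to
        tag = block // num_lines    # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag       # miss: fetch from the next level
    return hits / len(addresses)


# Hypothetical trace: a 64-byte loop body of 4-byte instructions, run 100 times.
trace = [addr for _ in range(100) for addr in range(0, 64, 4)]
rate = filter_cache_hit_rate(trace)
print(f"hit rate: {rate:.4f}")
```

    For a tight loop like this one, only the first touch of each line misses, so nearly every reference is served by the small cache and the larger cache behind it is rarely accessed.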
  • An IEEE compliant floating-point adder that conforms with the pipeline packet-forwarding paradigm

    Page(s): 33 - 47

    This paper presents a floating-point addition algorithm and adder pipeline design employing a packet forwarding pipeline paradigm. The packet forwarding format and the proposed algorithms constitute a new paradigm for handling data hazards in deeply pipelined floating-point pipelines. The addition and rounding algorithms employ a four stage execution phase pipeline with each stage suitable for implementation in a short clock period, assuming about 15 logic levels per cycle. The first two cycles are related to addition proper and are the focus of this paper. The last two cycles perform the rounding and have been covered in a paper by D.W. Matula and A.M. Nielsen (1997). The addition algorithm accepts one operand in a standard binary floating-point format at the start of cycle one. The second operand is represented in the packet forwarding floating-point format; namely, it is divided into four parts: the sign bit, the exponent string, the principal part of the significand, and the carry-round packet. The first three parts of the second operand are input at the start of cycle one and the carry-round packet is input at the start of cycle two. The result is output in two formats that both represent the rounded result as required by the IEEE 754 standard. The result is output in the packet forwarding floating-point format at the end of cycles two and three to allow forwarding with an effective latency of two cycles. The result is also output in standard binary floating-point format at the end of cycle four for retirement to a register. The packet forwarding result is thus available with an effective two cycle latency for forwarding to the start of the adder pipeline or to a cooperating multiplier pipeline accepting a packet forwarding operand. The effective latency of the proposed design is two cycles for successive dependent operations while preserving IEEE 754 binary floating-point compatibility.
  • Evaluation of fault tolerance latency from real-time application's perspectives

    Page(s): 55 - 64

    Information on Fault Tolerance Latency (FTL), which is defined as the total time required by all sequential steps taken to recover from an error, is important to the design and evaluation of fault-tolerant computers used in safety-critical real-time control systems with deadline information. In this paper, we evaluate FTL in terms of several random and deterministic variables accounting for fault behaviors and/or the capability and performance of error-handling mechanisms, while considering various fault tolerance mechanisms based on the trade-off between temporal and spatial redundancy, and use the evaluated FTL to check if an error-handling policy can meet the Control System Deadline (CSD) for a given real-time application.
  • Period-based load partitioning and assignment for large real-time applications

    Page(s): 81 - 87

    We propose a new approach to the problem of workload partitioning and assignment for very large distributed real-time systems, in which software components are typically organized hierarchically, and hardware components potentially span several shared and/or dedicated links. Existing approaches for load partitioning and assignment are based on either schedulability or communication. The first category attempts to construct a feasible schedule for various assignments and chooses the one that minimizes task lateness (or other similar criteria), while the second category partitions the workload heuristically in accordance with the amount of intertask communication. We propose, and argue for, a new third category based on task periods, which, among other things, combines the ability to handle heterogeneity with excellent scalability. Our algorithm is a recursive invocation of two stages: clustering and assignment. The clustering stage partitions tasks and processors into clusters. The assignment stage maps task clusters to processor clusters. A later scheduling stage computes a feasible schedule, if one exists, when the size of the processor clusters reduces to one at the bottom of the recursion tree. We introduce a new clustering heuristic and evaluate elements of the period-based approach using simulations to verify its suitability for large real-time applications. Also presented is an example application drawn from the field of command and control that has the potential to benefit significantly from the proposed approach.

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org