Computers and Digital Techniques, IEE Proceedings -

Issue 6 • Date Nov 2000

  • Bubble-sort approach to channel routing

    Page(s): 415 - 422

    An efficient bubble-sort technique for solving the two-layer non-Manhattan channel-routing problem is presented. The time and space complexities of the algorithm are O(kn) and O(n), respectively, where k is the number of sorting passes required and n is the total number of two-terminal nets in a routing channel. The algorithm is easily extended to handle cases with multiterminal nets distributed in a channel. Various tests verify the efficiency of the bubble-sort based router. Experimental results indicate that the router is time-efficient for routing. A three-layer algorithm having O(kn) time, based on an identical problem formulation, is proposed for solving the non-Manhattan channel-routing problem.
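
    The O(kn) bound can be illustrated with plain bubble sort. The sketch below is not the router itself: the abstract does not say how nets are mapped to sort keys, so the input is simply a list of comparable keys, and the name `bubble_sort_passes` is hypothetical. Each pass scans n - 1 adjacent pairs, so k swapping passes cost O(kn) comparisons, and the sort is done in place on one list of n keys.

```python
def bubble_sort_passes(keys):
    """Bubble-sort a copy of `keys`.

    Returns (sorted_list, k), where k counts the passes that performed
    at least one swap. Illustration only; the mapping from nets to keys
    is an assumption and is not taken from the paper.
    """
    a = list(keys)
    k = 0
    swapped = True
    while swapped:
        swapped = False
        # One pass: compare every adjacent pair once (n - 1 comparisons).
        for i in range(len(a) - 1):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if swapped:
            k += 1
    return a, k
```

    A fully reversed list of three keys needs two swapping passes, while an already ordered list needs none; the loop always runs one extra pass to confirm that no swaps remain.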

  • Two systolic architectures for multiplication in GF(2^m)

    Page(s): 375 - 382

    Two new systolic architectures are presented for multiplications in the finite field GF(2^m). These two architectures are based on the standard basis representation. In Architecture-I, the authors attempt to speed up the operation by using a new partitioning scheme for the basic cell in a straightforward systolic architecture to shorten the clock cycle period. In Architecture-II, they eliminate the one clock cycle gap between iterations by pairing off the cells of Architecture-I. They compare their architectures with previously proposed systolic architectures and a semisystolic architecture, and show that their Architecture-I offers the highest speed and Architecture-II the lowest hardware complexity.
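
    For readers unfamiliar with standard-basis arithmetic, the following is a minimal bit-serial software sketch of multiplication in GF(2^m). It is not either of the systolic architectures described above; it only shows the shift, conditional add (XOR) and modular reduction that such hardware computes. The function name `gf2m_mul` and the choice of the AES field polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) in the usage note are assumptions made for illustration.

```python
def gf2m_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m) using the standard (polynomial) basis.

    a, b : field elements as integers below 2**m (bit i = coefficient of x^i)
    poly : irreducible polynomial as an integer, including the x^m term
    m    : field degree
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a      # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a & (1 << m):
            a ^= poly        # reduce modulo the field polynomial
    return result
```

    With m = 8 and poly = 0x11B, `gf2m_mul(0x57, 0x13, 0x11B, 8)` evaluates to 0xFE, which matches the worked example in the AES specification.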

  • Check bit prediction scheme using Dong's code for concurrent error detection in VLSI processors

    Page(s): 467 - 471

    The authors describe the application of Dong's Code to the implementation of a checkbit prediction scheme for concurrent error detection (CED) in VLSI processors. Checkbit prediction is the only method which will permit the detection of both data transfer and data processing errors. Dong's Code has the advantage that its error detection capability is a function of the number of checkbits used, independent of the number of databits being processed; that is, the error detection capability of the code can be made application specific. The applicability of the scheme for implementing a 'CED' test strategy in VLSI circuits is demonstrated by integrating this test method into a 32 bit RISC processor. The impact of the test scheme on the design is subsequently analysed in terms of area overheads and effect on performance. A comparison is made with two self-testing ALUs, one using Berger Code and the other Bose-Lin Code; Dong's Code shows a reduction in the gate count required for checkbit prediction hardware for the ALU of 27% and 11%, respectively. When Dong's Code was used for CED in the 32 bit RISC processor, the area overhead incurred amounted to 55.5%, which is much less than duplication.

  • Simultaneous optimisation of dynamic power, area and delay in behavioural synthesis

    Page(s): 383 - 390

    Concern over power dissipation, coupled with the continuing rise in system size and complexity, means that there is a growing need for high-level design tools capable of automatically optimising systems to take into account power dissipation, in addition to the more conventional metrics of area, delay and testability. Current methods for reducing power consumption tend to be ad hoc: for example, slowing down or turning off idle parts of the system, or a controlled reduction in power supply. The behavioural synthesis system described here features an integrated incremental power estimation capability, which makes use of activity profiles, generated automatically through simulation of a design on any standard VHDL simulator; accurate circuit-level cell models (generated, again automatically, via SPICE simulation); and a comprehensive system power model. This data, along with similar estimators for area and delay, guides the optimisation of a design towards independent user-specified objectives for final area, delay, clock speed, and energy consumption. In addition, a range of power-reducing features is included, encompassing supply voltage scaling, clock gating, input latching, input gating, low-power cells, and pipelined and multicycle units. These are automatically exploited during optimisation as part of the area/delay/power dissipation trade-off process. The resulting system is capable of reducing the estimated energy consumption of several benchmark designs by factors of between 3.5 and 7.0. Furthermore, the design exploration capability enables a range of alternative structural implementations to be generated from a single behavioural description, with differing area/delay/power trade-offs.

  • Author Index

    Page(s): 473

  • Heuristic AND-OR-EXOR three-level minimisation algorithm for multiple-output incompletely-specified Boolean functions

    Page(s): 451 - 461

    Given a logic function f, an AND-OR-EXOR representation of f comprises a pair of sum-of-products expressions connected by a single two-input EXOR operator such that the resulting expression realises f. The AND-OR-EXOR form of a logic function has been observed to give a more compact representation for functions such as arithmetic, control and ALU circuits, compared to both sum-of-products and EXOR-sum-of-products based representations. The problem of AND-OR-EXOR minimisation of logic functions is to find a suitable pair of sum-of-products expressions which will reduce the size of the representation and hence the resulting hardware. A new heuristic AND-OR-EXOR minimisation algorithm for multiple-output incompletely-specified logic functions has been developed. The algorithm is divided into two parts. The first part constitutes a new heuristic algorithm for decomposition of multiple-output incompletely-specified logic functions for AND-OR-EXOR minimisation. The second part constitutes AND-OR-EXOR optimisation algorithms for multiple-output incompletely-specified logic functions. Using benchmark PLAs the authors show that their new AND-OR-EXOR minimisation algorithm is able to find better solutions than previous techniques.
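
    A small worked example may make the AND-OR-EXOR form concrete. The sketch below is not the authors' minimisation algorithm; it only evaluates a given pair of sum-of-products expressions joined by one EXOR and checks it against a flat sum-of-products for the same function. The example function f = (a AND b) EXOR c, the cube encoding (a dict from variable name to required value), and all names are assumptions chosen for illustration.

```python
from itertools import product

def eval_sop(sop, env):
    """Evaluate a sum-of-products: a list of cubes, each a dict var -> bool."""
    return any(all(env[v] == val for v, val in cube.items()) for cube in sop)

def eval_and_or_exor(sop1, sop2, env):
    """Evaluate SOP1 EXOR SOP2 under the variable assignment `env`."""
    return eval_sop(sop1, env) != eval_sop(sop2, env)

VARS = ("a", "b", "c")

# AND-OR-EXOR form of f: (a b) EXOR (c), two products and three literals.
SOP1 = [{"a": True, "b": True}]
SOP2 = [{"c": True}]

# Flat sum-of-products for the same f: a b c' + a' c + b' c.
FLAT_SOP = [
    {"a": True, "b": True, "c": False},
    {"a": False, "c": True},
    {"b": False, "c": True},
]

def forms_agree():
    """Exhaustively compare both forms over all 2^3 input assignments."""
    for values in product([False, True], repeat=len(VARS)):
        env = dict(zip(VARS, values))
        if eval_and_or_exor(SOP1, SOP2, env) != eval_sop(FLAT_SOP, env):
            return False
    return True
```

    Here the AND-OR-EXOR form uses two product terms where the flat form needs three, which is the kind of saving the abstract refers to.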

  • Partitioning methodology for dynamically reconfigurable embedded systems

    Page(s): 391 - 396

    Hardware-software partitioning is a crucial step in the design of embedded reconfigurable systems. Typically the objective is to optimise the speed performance of the application while minimising cost. Recently, FPGAs have been employed in embedded systems to increase the computational power of the system by customising the reconfigurable platform to the hardware partition's requirements. A new method for hardware-software partitioning is presented, which considers partitioning applications within resource-limited embedded systems that utilise the runtime reconfiguration capabilities of FPGAs. The partitioning approach is demonstrated using example speech and image processing applications.

  • Limited maximum fault-multiplicity diagnosis procedure for scan designs

    Page(s): 423 - 433

    A new effect-cause approach to multiple fault diagnosis in digital circuits is presented. The previous elimination algorithm is extended to sequential circuits. A new procedure capable of diagnosing any 'stuck-at' type fault with limited maximum multiplicity in full scan or partial scan designs is developed. The limit is established for each logic partition of the circuit. With this new algorithm, any trial that would lead to faults of multiplicity greater than the established limit is determined a priori and is rejected. This greatly reduces the diagnostic computing time, without affecting its utility, and without probing any internal line. It can also obtain masked faults with or without a multiplicity limit. Experimental results for some full scan circuits from the ISCAS'89 benchmarks are included.

  • Multinode broadcasting in a wormhole-routed 2-D torus using an aggregation-then-distribution strategy

    Page(s): 403 - 413

    An efficient multinode broadcasting algorithm for a wormhole-routed 2-D torus is presented, where an unknown number s of source nodes, located at unknown positions, each intend to broadcast a message of size m bytes to the rest of the network. The torus is assumed to use the all-port model and the popular dimension-ordered routing. Most existing results are derived based on finding multiple edge-disjoint spanning trees in the network. The main technique used here is an aggregation-then-distribution strategy. First, the broadcast messages are aggregated into some positions of the torus. Then, a number of independent subnetworks are constructed from the torus. These subnetworks, which are responsible for distributing the messages, can exploit both the communication parallelism and the characteristics of wormhole routing. It is shown that such an approach is more appropriate than those using edge-disjoint trees for fixed-connection networks such as tori. This is justified by performance analysis.

  • Design of time-stamped proxy signatures with traceable receivers

    Page(s): 462 - 466

    A proxy signature scheme is a method which allows an original signer to delegate his signing power to a proxy signer. Most proxy signature schemes use a warrant appearing in the signature verification equation to declare the valid delegation period. However, the declaration in the warrant is useless because no-one can know the exact time when the proxy signer signed a message. To prevent the proxy signer from abusing the signing capability, the original signer may also wish to know the identity of whoever received the proxy signature from the proxy signer. Recently Sun and Chen proposed the concept of time-stamped proxy signatures with traceable receivers to solve these two problems. A time-stamped proxy signature scheme with traceable receivers is a proxy signature scheme which can ascertain whether a proxy signature is created during the delegation period, and can trace who actually received the proxy signatures from the proxy signer. The author shows that Sun and Chen's scheme suffers from weaknesses and consequently proposes a new time-stamped proxy signature scheme which does not suffer from the same weaknesses.

  • Switching activity estimation under real-gate delay using timed Boolean functions

    Page(s): 444 - 450

    A probabilistic method to estimate the switching activity of a combinational circuit under a real-gate delay model, considering temporal, structural and input pattern dependencies, is introduced. It is proved that the switching activity evaluation problem is reduced to the zero-delay problem at specific time instances. A mathematical model based on Markov stochastic processes, which describes the temporal and spatial correlation in terms of the associated zero-delay parameters, is presented. To handle the influence of time on glitch generation, the theory of the timed Boolean function (TBF) is adopted. Additionally, an algorithm to evaluate the switching activity at specific time instances using TBF-ordered binary decision diagrams (TBF-OBDDs) is given. A comparative study of benchmark circuits demonstrates the accuracy and efficiency of the proposed method.

  • Subject Index

    Page(s): 474 - 475

  • Fault-tolerant wormhole routing using a variation of the distributed recovery block approach

    Page(s): 397 - 402

    A fault-tolerant wormhole routing technique that incorporates a variation of the distributed recovery block (DRB) approach is described. The section of a parallel system that spans between the source and destination nodes is dynamically partitioned into overlapping DRB groups. A DRB group consists of a current node, a primary successor node and an alternate successor node. The message packets travel towards the destination from one DRB group to the next. A prototype of the routing system is implemented for mesh and hypercube topologies; however, the method can be used for topologies with a minimum node connectivity of three. The simulation results indicate that DRB-based wormhole routing tolerates both node and link failures.

  • Supporting object accesses in a Java processor

    Page(s): 435 - 443

    Due to Java's object-based nature and support for garbage collection, efficient object manipulation and relocation are critical to the execution speed of Java code. Java Virtual Machine implementations that utilise handle representation, such as Sun's Java Development Kit 1.1, enable efficient object relocation at the cost of an additional indirection for each object access. The direct address object representations such as those used in CACAO and NET compiler eliminate the indirection overhead, but updating references during object relocation is complex. A virtual address object cache that reduces the indirection overhead while maintaining the efficiency of object relocation is proposed. The objects in the virtual address cache are addressed directly using the object reference and field offset pair. This eliminates the indirection overhead and offset-addition overhead associated with the handle representation model. A hardware object table that maintains the handles is used to obtain the actual object location on a virtual address cache miss. The performance of the virtual address cache is analysed using various Java programs, and it is found to save 1.5 cycles per object access on average compared to the handle representation model for the benchmarks studied.

  • Evaluating trace cache on moderate-scale processors

    Page(s): 369 - 374

    Trace cache is a new instruction supply mechanism for wide-issue superscalar processors. It caches dynamic instruction traces, each of which consists of multiple basic blocks. As a result, a number of basic blocks are combined into a single large unit, and hence the effective fetch bandwidth is enlarged. The trace cache has been evaluated only on highly aggressive superscalar processors. However, an efficient instruction supply mechanism is also required for moderate-scale processors, as media-specific applications grow in importance. The paper evaluates the trace cache fetch mechanism on moderate-scale superscalar processors. A simpler trace cache, named the nonconsecutive basic block buffer (NCB), is proposed for these processors. In experimental evaluation, using the NCB improves the instruction supply efficiency by approximately 10%, and hence processor performance is also improved by approximately 10% for integer programs.
