
Computers & Digital Techniques, IET

Issue 2 • Date March 2007

  • Comparative analysis of GALS clocking schemes

    Page(s): 59 - 69

    Because of the increasing complexity of distributing a global clock over a single die, globally asynchronous, locally synchronous (GALS) systems are becoming an efficient alternative technique for designing distributed system-on-chip (SOC). A number of independently clocked synchronous domains can be integrated by clock pausing, clock stretching or data-driven clocking techniques. Such techniques are applied to point-to-point inter-domain communication schemes. Presented here is a comparison of these schemes and how they can be applied to an existing partitioned synchronous architecture to obtain reliable, low-latency and efficient clock control architectures. The comparison highlights the advantages and disadvantages of one scheme over the other in terms of logical correctness, circuit implementation, performance and relative power consumption. Also presented are circuit solutions for stretchable and data-driven clocking schemes. These circuit solutions can be easily plugged into existing partitioned synchronous islands. To enable early evaluation of functional correctness, the use of Petri net modelling techniques is also proposed to model the asynchronous control blocks that constitute the interface between the synchronous islands.

  • FPGA-based fault emulation of synchronous sequential circuits

    Page(s): 70 - 76

    A feasibility study of accelerating fault simulation by emulation on field programmable gate arrays (FPGAs) is described. Fault simulation is an important subtask in test pattern generation and it is frequently used throughout the test generation process. The problems associated with fault simulation of sequential circuits are explained. Alternatives that can be considered as trade-offs in terms of the required FPGA resources and accuracy of test quality assessment are discussed. In addition, an extension to the existing environment for reconfigurable hardware emulation of fault simulation is presented. It incorporates hardware support for fault dropping. The proposed approach allows a simulation speed-up of 40-500 times compared with the state of the art in software-based fault simulation. On the basis of the experiments, it can be concluded that it is beneficial to use emulation for circuits/methods that require large numbers of test vectors while using simple but flexible algorithmic test vector generating circuits, for example built-in self-test.
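
As a rough illustration of what fault dropping means, here is a minimal software sketch. It is not the paper's FPGA emulation environment and it handles only a toy combinational circuit, y = (a AND b) OR (NOT c), rather than a sequential one. The circuit, the net names and the function names are all assumptions chosen for the example. Once a test vector detects a stuck-at fault, that fault is removed from the list, so later vectors are not simulated against it.

```python
NETS = ("a", "b", "c", "n1", "n2", "y")  # hypothetical net names for the toy circuit


def simulate(vector, fault=None):
    """Evaluate y = (a AND b) OR (NOT c); `fault` is an optional (net, stuck_value) pair."""
    def net(name, value):
        # Override the net's value if this is the faulty net.
        return fault[1] if fault and fault[0] == name else value

    a = net("a", vector[0])
    b = net("b", vector[1])
    c = net("c", vector[2])
    n1 = net("n1", a & b)
    n2 = net("n2", 1 - c)
    return net("y", n1 | n2)


def fault_simulate(vectors):
    """Return (detected_by, remaining) for single stuck-at faults, with fault dropping."""
    remaining = {(n, v) for n in NETS for v in (0, 1)}
    detected_by = {}
    for vec in vectors:
        good = simulate(vec)
        for fault in sorted(remaining):  # sorted() copies, so dropping below is safe
            if simulate(vec, fault) != good:
                detected_by[fault] = vec
                remaining.discard(fault)  # fault dropping: never simulate it again
    return detected_by, remaining
```

Calling `fault_simulate` with all eight input vectors leaves no undetected faults for this circuit, and each fault is simulated only until the first vector that exposes it.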

  • Embedded high-resolution delay measurement system using time amplification

    Page(s): 77 - 86

    The rapid pace of change in IC technology, specifically in the speed of operation, demands sophisticated design solutions for IC testing methodologies. Moreover, current System-on-Chip technology makes great demands on the accurate testing of internal timing parameters as access to internal nodes through input/output pins becomes more difficult. This work presents two high-resolution time measurement schemes for digital Built-in Self-Test (BIST) applications, namely the Two-Delay Interpolation Method and the Time Amplifier. The two schemes are subsequently combined to produce a novel design for BIST time measurement which offers two main advantages: a small time interval measurement capability which advances the state of the art, and a small footprint, occupying 0.2 mm² or the equivalent of 3020 transistors, compared with a recent design which has the equivalent of 4800 transistors.

  • Built-in time measurement circuits -- a comparative design study

    Page(s): 87 - 97

    An increasingly important issue in the implementation of high-performance circuits using either System-on-Chip or System-in-Package technology is ensuring the correct timing performance at the input/output interfaces of cores or chips. These interfaces are not accessible to conventional Automatic Test Equipment (ATE). Moreover, even if these nodes were accessible, the limitations of the ATE in making accurate measurements would necessitate the use of tight guard bands, adversely affecting yield. To address this issue of internal time parameter measurement, the circuitry normally resident in the ATE to perform the measurements is incorporated into the design itself. This paper is a case study of three time measurement techniques potentially suitable for circuit integration, namely the Time Difference Measurement (TDM), Successive Approximation Time Measurement (SATM) and Time Delay Interpolation Measurement (TDIM) methods. The techniques are analysed and compared for a number of design parameters such as area overhead, ease of calibration, timing resolution, and robustness to processing, temperature and supply voltage variations. The results of the analysis indicate that TDIM is the most efficient of the three circuits analysed; this method has been incorporated in a high-resolution time measurement system in the sub-picosecond range and has subsequently been fabricated by Sun Microsystems.

  • Comparison of the cost metrics through investigation of the relation between optimal NCV and optimal NCT three-qubit reversible circuits

    Page(s): 98 - 104

    A breadth-first search method for determining optimal three-qubit circuits composed of quantum NOT, CNOT, controlled-V and controlled-V+ (NCV) gates is introduced. Results are presented for simple gate count and for technology-motivated cost metrics. The optimal NCV circuits are also compared with NCV circuits derived from optimal NOT, CNOT and Toffoli (NCT) gate circuits. This work provides basic results and motivation for continued study of the direct synthesis of NCV circuits, and establishes relations between function realizations in different circuit cost metrics.
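
The idea of breadth-first search for gate-count-optimal circuits can be sketched on the simpler classical NCT library mentioned in the abstract. This is an illustrative assumption, not the paper's NCV search, which also has to handle the controlled-V gates. Each reversible three-bit function is a permutation of the eight basis states, and a search from the identity reaches every function with the fewest possible gates. The function names below are hypothetical.

```python
from collections import deque

N_BITS = 3
STATES = range(1 << N_BITS)


def gate_library():
    """Return the 12 NCT gates on 3 wires as permutations of the 8 basis states."""
    gates = []
    for t in range(N_BITS):
        others = [w for w in range(N_BITS) if w != t]
        # NOT on target wire t.
        gates.append(tuple(s ^ (1 << t) for s in STATES))
        # CNOT on target t, one gate per choice of control wire.
        for c in others:
            gates.append(tuple(s ^ (((s >> c) & 1) << t) for s in STATES))
        # Toffoli on target t, controlled by both remaining wires.
        c1, c2 = others
        gates.append(
            tuple(s ^ ((((s >> c1) & 1) & ((s >> c2) & 1)) << t) for s in STATES)
        )
    return gates


def optimal_gate_counts():
    """Map every reachable 3-bit reversible function to its minimal NCT gate count."""
    gates = gate_library()
    identity = tuple(STATES)
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        perm = queue.popleft()
        for g in gates:
            # Append gate g to the circuit that realises perm.
            nxt = tuple(g[perm[s]] for s in STATES)
            if nxt not in dist:  # first visit in BFS gives the minimal count
                dist[nxt] = dist[perm] + 1
                queue.append(nxt)
    return dist
```

Because every gate has unit cost here, the first visit to a function gives its minimal gate count. A weighted cost metric such as the technology-motivated ones in the abstract would need a priority queue in place of the plain queue.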

  • Hybrid cache architecture for high-speed packet processing

    Page(s): 105 - 112

    The exposed memory hierarchies employed in many network processors (NPs) are expensive in terms of meeting the worst-case processing requirement. Moreover, it is difficult to effectively utilise them because of the explicit data movement between different memory levels. Also, the effectiveness of traditional cache in NPs needs to be improved. A memory hierarchy component, called split control cache, is presented that employs two independent low-latency memory stores to temporarily hold the flow-based and application-relevant information, exploiting the different locality behaviours exhibited by these two types of data. Just like conventional cache, data movement is manipulated by specially designed hardware so as to relieve the programmers from the details of memory management. Software simulation shows that compared with conventional cache, a performance improvement of up to 90% can be achieved by this scheme for OC-3c and OC-12c links.

  • Reducing power of functional units in high-performance processors by checking instruction codes and resizing adders

    Page(s): 113 - 119

    A hardware technique to reduce static and dynamic power consumption in functional units of 64-bit high-performance processors is presented here. The instructions that require an adder have been studied, and it can be concluded that there is a large percentage of instructions where one of the two source operands is always narrow and does not require a 64-bit adder. Furthermore, by analysing the executed applications, it is feasible to classify their internal operations according to their bit-width requirements and select the appropriate adder type that each instruction requires. This approach is based on substituting some of the 64-bit power-hungry adders with 32-bit ones, which consume much less power, and modifying the protocol to issue as many instructions as possible to these low-power units, while incurring negligible performance penalties. Five different configurations were tested for the execution units. Results indicate that this technique can save up to 50% of the power consumed by the adders and up to 21% of the overall power consumption in the execution unit of high-performance architectures. Moreover, the simulations show good results in terms of power efficiency (IPC/W) and it can be affirmed that the technique could prevent the creation of hot spots in the functional units.

  • Design-time application mapping and platform exploration for MP-SoC customised run-time management

    Page(s): 120 - 128

    In a Multi-Processor System-on-Chip (MP-SoC) environment, a customised run-time management layer should be incorporated on top of the basic Operating System services to alleviate the run-time decision-making and to globally optimise costs (e.g. energy consumption) across all active applications, according to application constraints (e.g. performance, user requirements) and available platform resources. To that end, to avoid conservative worst-case assumptions while also eliminating large run-time overheads on state-of-the-art RTOS kernels, a Pareto-based approach is proposed that combines a design-time application and platform exploration with a low-complexity run-time manager. The design-time exploration phase of this approach is the main contribution of this work. It is also substantiated with two real-life applications (image processing and video codec multimedia). These are simulated on an MP-SoC platform simulator and used to illustrate the optimal trade-offs offered by the design-time exploration to the run-time manager.
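
A minimal sketch of the general Pareto idea follows, under the assumption that each candidate mapping is summarised as an (energy, execution time) pair with lower being better on both axes. The paper's actual cost model and manager are not given in the abstract, so the function names and the two-metric representation are hypothetical. The design-time step keeps only non-dominated points, and a lightweight run-time step then picks the cheapest point that meets the current deadline.

```python
def pareto_front(points):
    """Design time: keep the (energy, time) points not dominated by any other point."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1]  # q is no worse on both axes
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def pick_operating_point(front, deadline):
    """Run time: choose the lowest-energy Pareto point whose time meets the deadline."""
    feasible = [p for p in front if p[1] <= deadline]
    return min(feasible) if feasible else None
```

For example, `pareto_front([(10, 5), (8, 7), (12, 4), (9, 9)])` discards `(9, 9)` because `(8, 7)` is better on both axes. The run-time lookup then only scans the short list of surviving points, which is what keeps the run-time manager low in complexity.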

  • Design of power-efficient pipelined truncated multipliers with various output precision

    Page(s): 129 - 136

    An energy-efficient multiplier is very desirable for multimedia and digital signal processing systems. In many of these systems, the effective dynamic range of input operands for multipliers is generally limited to a small range and the case with maximum range seldom occurs. In addition, the output products of multipliers are usually rounded or truncated to avoid growth in word size. Based on these features, a low-power signed pipelined truncated multiplier is proposed that can dynamically detect multiple combinations of input ranges and deactivate a large number of unnecessary transitions in non-effective ranges to reduce the power consumption. Moreover, the proposed multiplier can trade output precision against power consumption so as to further reduce power consumption. Experimental results show that the proposed multiplier consumes up to 90% less power than the conventional standard multiplier while still maintaining an acceptable output precision and quality.
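
To make the precision side of this trade-off concrete, the following sketch models only the truncation idea. It is an unsigned, bit-level software model in which the lowest partial-product columns are simply not generated. It does not model the paper's signed pipelined design or its dynamic range detection, and the function name and parameters are assumptions.

```python
def truncated_multiply(x, y, n_bits, dropped_columns):
    """Unsigned n_bits x n_bits product with the lowest partial-product columns omitted."""
    total = 0
    for i in range(n_bits):
        for j in range(n_bits):
            # Partial-product bit x_i * y_j has weight 2**(i + j); skip low columns.
            if i + j >= dropped_columns:
                total += (((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
    return total


def max_truncation_error(n_bits, dropped_columns):
    """Worst-case error: every dropped partial-product bit equal to 1."""
    return sum(
        min(col + 1, n_bits, 2 * n_bits - 1 - col) << col
        for col in range(dropped_columns)
    )
```

With `dropped_columns=0` the model reproduces the exact product. Dropping more columns removes more partial-product logic, which is where a hardware implementation would save power, at the cost of a bounded underestimate of the result.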

  • HW-SW optimisation of JPEG2000 wavelet transform for dedicated multimedia processor architectures

    Page(s): 137 - 143

    The discrete wavelet transform is essential for several image compression applications such as JPEG2000 encoding. In this area, time-effective execution represents a hot-spot for software realisations targeting microprocessor-based System-on-Chip platforms. The discrete wavelet transform software was redesigned to enhance its performance over a parallel single-instruction-multiple-data (SIMD) architecture built onto very-long-instruction-word processing elements. All major aspects affecting the processing speed were addressed: the penalties deriving from data-cache stalls have been massively eased, whereas optimal exploitation of the platform's vector processing capabilities has been pursued by instruction-level optimisations. Overall, a runtime saving of up to 72% was attained with respect to a conventional implementation.


Aims & Scope

IET Computers & Digital Techniques publishes technical papers describing recent research and development work in all aspects of digital system-on-chip design and test of electronic and embedded systems.


IET Research Journals
iet_cdt@theiet.org