
System-on-Chip, 2003. Proceedings. International Symposium on

Date 19-21 Nov. 2003

  • A code compression scheme for improving SoC performance

    Page(s): 35 - 40

    Code compression is an effective technique for reducing the instruction memory requirement in an embedded system. This paper presents a code compression approach in which the boundary between compressed and uncompressed space lies between the instruction cache (ICache) and the microprocessor core. The approach achieves better compression ratios (around 0.57) than other reported implementations, and, as the ICache holds compressed instructions, its effective size is increased and the hit ratio is improved. The implementation of branch prediction as part of the decompression hardware further improves the system's performance. The work has required the resolution of issues that arise from both memory and ICache data misalignment and from the compressed-to-uncompressed address mapping.

  • AMBA based multiprocessor system

    Page(s): 41 - 42

    This paper proposes an AMBA-based multiprocessor system using dual ARM processor cores. AMBA was not originally intended for use in multiprocessor systems. To implement a multiprocessor system on the AMBA AHB bus, a revised AHB bus arbiter, special AHB slave hardware, and registers that store the bus master number and enable the right of bus usage are designed. The proposed multiprocessor system is implemented and tested using dual ARM922T cores in a 0.18 μm standard cell process.

  • Twisted differential on-chip interconnect architecture for inductive/capacitive crosstalk noise cancellation

    Page(s): 93 - 96

    A simple generic interconnect architecture is presented to allow effective cancellation of inductive and capacitive noise in high-speed on-chip interconnect lines. The approach is based on the principle of constructing periodically twisted differential line pairs for parallel interconnect segments in order to eliminate the mutual coupling influences. Detailed 3-D simulations show that a crosstalk noise reduction of up to 60 dB is achievable with this approach.

  • Complexity analysis of spatially scalable MPEG-4 encoder

    Page(s): 57 - 60

    The computational complexity of an MPEG-4 encoder supporting spatial scalability is presented. The encoder is partitioned into atomic functions whose complexities are estimated in terms of millions of RISC-like operations per second (MOPS). A detailed listing of arithmetic, logic and control flow operations is given. The complexity estimates are used to identify the most computationally demanding encoding tasks. The results indicate that approximately 6600 RISC MOPS are required to encode CIF/4CIF-sized video layers at a frame rate of 30 frames/s with a low-complexity motion estimation algorithm. Furthermore, encoding both spatial layers with the new H.264/AVC imposes a computational load of 27000 RISC MOPS.

  • Analysis and design of level-converting flip-flops for dual-Vdd/Vth integrated circuits

    Page(s): 151 - 154

    This paper investigates the performance and energy consumption of six fully static CMOS edge-triggered level-converting flip-flops (LCFFs). These flip-flops provide the necessary voltage level conversion from a lower to a higher supply without incurring leakage currents in dual-Vdd systems while maintaining good speed. In particular, we propose two novel designs and extend two previous non-level-converting flip-flops to intrinsically perform level conversion. In addition, the robustness of the newly proposed LCFFs is investigated based on worst-case process corners as well as power supply noise. Results show delay improvements of up to 50% and energy-delay product reductions of 15-50% compared to a conventional level-converting master-slave flip-flop.

  • Implementing user and application specific algorithms within IP-methodology: a coarse-grain-approach

    Page(s): 61 - 64

    A method for implementing application-specific algorithms within an IP-based methodology is presented. Two well-known VLSI design methods, IP-based design methodology and coarse-grain computing, are combined, making it possible to design a complete chip (i.e., a system-on-a-chip) from IP blocks and to change the functionality of the chip after manufacturing, even dynamically. In addition, an actual implementation of the presented method is shown, and a modern computation-intensive algorithm is mapped onto it. The results show that modern silicon technologies already make it realistic to use the presented programmable and configurable IP block on an SoC.

  • Highly scalable network on chip for reconfigurable systems

    Page(s): 79 - 82

    An efficient methodology for building the billion-transistor systems on chip of tomorrow is a necessity. Networks on chip promise to be the solution for the numerous technological, economic and productivity problems. We believe that different application domains require different types of networks. Our approach is therefore to have a very flexible, highly scalable network design that easily accommodates the various needs. This paper presents the design of our network on chip, which is part of the platform we are developing for reconfigurable systems. The present design allows us to instantiate arbitrary network topologies and offers low latency and high throughput.

  • A system level IP integration methodology for fast SOC design

    Page(s): 127 - 130

    In the system-on-chip (SOC) era, the growing number of functionalities included on a single chip requires the development of new design methodologies to keep the design complexity under control. Intellectual property reuse has been commonly employed as a technique to address this problem, but a new system-level approach is needed to integrate the IP-reuse methodology into the design flow, in order to improve the designer's productivity. In this paper, a SOC design platform is proposed as a solution to this problem, providing a library of reusable IP blocks and a high-level tool for SOC design development. An IP library based on the AMBA bus architecture was built, featuring a collection of devices with homogeneous interfaces described with VHDL language constructs that enable hardware configurability. A system-level assembler (SLA) was then developed to provide a hardware configuration tool and a suite of utilities to support the designer's work. Once the system structure is defined, the SLA allows automatic generation of the environments used for software development, simulation, synthesis and verification tasks.

  • Immediate optimization for compressed transport triggered architecture instructions

    Page(s): 65 - 68

    Program code size has become a critical design constraint of embedded systems. Code compression is one of the approaches to reduce the program code size; it results in smaller memories and reduced cost of the chip. In this paper, long immediate encoding of compressed transport triggered architecture instructions is optimized to further improve the code density. Six applications taken from different application domains are used for benchmarking.

  • New adaptive routing algorithm for extended generalized fat trees on-chip

    Page(s): 113 - 118

    Network topology and routing algorithm are the issues that have the most significant effect on the performance of packet-switched networks. The throughput of fat trees can be considerably improved by connecting the topmost switches together and by replacing the most commonly used minimal (shortest-path) routing algorithms with a new turn-back-when-possible (TBWP) algorithm. This paper shows how the TBWP algorithm, after a modification, can also be used in extended generalized fat trees (XGFTs). The XGFTs are one version of fat trees with improved scalability. Simulation results are presented to show the improved throughput of the modified XGFTs when the TBWP algorithm is used in them. This paper also addresses issues related to scalability.

  • Dynamic clamping: on-chip dynamic shielding and termination for high-speed RLC buses

    Page(s): 97 - 100

    This paper presents a novel approach called dynamic clamping for minimizing crosstalk noise and inductive effects in global buses. A simple circuit is shown that can be used to dynamically shield and terminate high-speed RLC buses. Unlike traditional passive shielding and parallel termination, dynamic clamping has no area overhead and no static power dissipation. Dynamic clamping enables significant reductions in noise (∼35%) and inductive overshoot (∼90%) with a small delay penalty (∼10%). We also propose using bus-invert coding with our approach as dynamic clamping is seen to give excellent results for low to moderate bus activity.
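
The bus-invert coding mentioned in this abstract can be illustrated with a minimal sketch of the general technique. This is not the paper's circuit: the function names, the all-zero initial bus state, and the 8-bit width used in the usage note are assumptions made for illustration. A word is sent inverted, with an extra invert line raised, whenever sending it unchanged would toggle more than half of the data lines.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Encode words for a bus of `width` data lines plus one invert line.

    Returns a list of (value_on_bus, invert_flag) pairs.
    Assumption: the bus starts in the all-zero state.
    """
    mask = (1 << width) - 1
    prev = 0
    encoded = []
    for word in words:
        word &= mask
        if hamming(prev, word) > width // 2:
            # Sending the complement toggles fewer data lines.
            sent, invert = (~word) & mask, 1
        else:
            sent, invert = word, 0
        encoded.append((sent, invert))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (value_on_bus, invert_flag) pairs."""
    mask = (1 << width) - 1
    return [(~sent) & mask if invert else sent for sent, invert in encoded]
```

For example, `bus_invert_encode([0x00, 0xFF, 0x0F, 0xF0], 8)` sends `0xFF` as `0x00` with the invert flag set, so the data lines do not toggle for that transfer.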

  • CTL based DFT solution to accelerate design to test development for system on chip devices

    In this presentation, we will explain how CTL has been designed as a very rich and powerful language that can be leveraged in a DFT solution. While CTL has been designed with the primary focus of communicating information between the core providers and the SoC test integrators, it is proving very useful as a communication mechanism from the EDA world to the ATE. CTL does not only support cores tested with the standard scan approach; IP blocks with embedded logic BIST and memories can also be described in CTL. An EDA DFT solution based on CTL enables full automation of scan and BIST at the core level and test integration at the SoC level.

  • Mappability estimate: a measure of the goodness of a processor-algorithm pair

    Page(s): 119 - 122

    A quick way of measuring the goodness of a processor-algorithm pair is presented. The main emphasis in this paper is on the reasoning behind the mappability factors of a processor and an algorithm. Typical algorithm properties and how they affect the usability of the corresponding architecture characteristics are considered. The mappability estimation approach is demonstrated using MiBench benchmark algorithms and the SimpleScalar processor simulator with the ARM instruction set. The estimation results are consistent with the simulations, and the estimates correctly predicted the most suitable architectures for three of the four algorithms.

  • Performance of dynamically scheduling VLIW instructions

    Page(s): 7 - 10

    This paper evaluates the performance of the dynamically instruction scheduled VLIW (DISVLIW) processor architecture. The DISVLIW processor architecture is designed for dynamically scheduling VLIW instructions using dependency information. Features such as explicit parallelism, balanced scheduling effort, and dynamic scheduling of VLIW instructions can be used to provide a sound structure for supercomputing. We simulate the DISVLIW processor architecture and show that the DISVLIW processor performs significantly better than the VLIW processor across various numerical benchmark applications.

  • xICU - an interrupt control unit for a configurable DSP core

    Page(s): 75 - 78

    The increasing complexity of SoC applications leads to a strong demand for powerful software-programmable embedded cores. Low-cost applications do not allow adding more than one core to the application. Depending on the application, a DSP or a microcontroller is used. DSP cores therefore also have to handle interrupts typically served by microcontroller sub-systems, with low latency and small overhead in terms of cycle count and code density. This paper describes the architecture of an ICU (interrupt control unit) for a configurable DSP core. The main architectural features of the ICU can be configured to reduce the consumed silicon area to an application-specific optimum. Priority morphing is introduced to enable control of the execution order of pending interrupt sources at run time and to prevent the loss of interrupt information. A smooth integration into the program sequencer allows short interrupt latency and low overhead for serving ISRs (interrupt service routines). xICU is a part of a configurable DSP core.

  • A C-based algorithm development flow for a reconfigurable processor architecture

    Page(s): 69 - 73

    Reconfigurable processors are an appealing option to achieve high performance and low energy consumption in digital signal processing, but their utilization often involves hardware issues unfamiliar to algorithm developers proficient in high-level languages. This paper presents a C-based algorithm development flow for XiRisc, a reconfigurable processor architecture targeted at embedded systems that couples a VLIW RISC core with a custom-designed programmable hardware unit optimized for being programmed starting from data flow graph (DFG) descriptions. Starting from C code, the flow produces both executable code for the processor core and configuration bits for the embedded programmable unit. The proposed flow was used to implement a set of DSP algorithms on a prototype 0.18 μm XiRisc test chip, obtaining performance speed-ups of up to 10x and energy consumption reductions of up to 75%.

  • A programmable platform for software-defined radio

    Integrated circuits for communication systems are usually designed with the primary goal of minimizing silicon area and power consumption. Usually, this yields a solution where each functional block in the algorithm is mapped directly into a dedicated hardware block. The hardware blocks provide limited flexibility through parameter settings, but only as far as has been specified during the design phase. For these reasons, we believe that more flexibility is necessary, which requires fully programmable solutions based on digital signal processors (DSPs). For the development of an SDR platform architecture, our primary design goal is to find the most flexible and easy-to-program solution within a specified power budget for the baseband processing.

  • VLIW operation refinement for reducing energy consumption

    Page(s): 131 - 134

    The demand for mobile computing power has exploded in recent years. Variable-length VLIW processors offer the necessary performance at low power. Software optimizations are necessary to further decrease the energy consumption. In this article we present a compiler optimization which reduces the dynamic power dissipation resulting from the switching activities during instruction fetch. Energy consumption can be reduced by minimizing the Hamming distance between successively fetched instruction words. Using a dynamic programming approach, we first compute a set of optimal instruction arrangements of the execution bundles in a basic block. These sets are used in an enumerative optimal algorithm and a genetic evolution in order to minimize an objective function for the Hamming distance. We evaluated our algorithms on different variable-length VLIW architectures with 3 to 6 parallel functional units. On a large set of DSP benchmark programs, the Hamming distance can be reduced by about 10% on average. Maximum reductions range up to 30%.
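
The quantity this abstract minimizes, the Hamming distance between successively fetched instruction words, can be sketched as follows. The greedy reordering shown is a deliberately simple stand-in, not the dynamic programming, enumerative, or genetic algorithms the paper describes. The names `pack`, `fetch_toggles`, and `greedy_arrange`, the fixed slot width, and the assumption that operations within a bundle may be freely permuted are all hypothetical.

```python
from itertools import permutations


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def pack(ops, slot_bits):
    """Concatenate operation encodings into one instruction word."""
    word = 0
    for op in ops:
        word = (word << slot_bits) | op
    return word


def fetch_toggles(words):
    """Total Hamming distance between successively fetched words."""
    return sum(hamming(a, b) for a, b in zip(words, words[1:]))


def greedy_arrange(bundles, slot_bits):
    """Pick, bundle by bundle, the slot order closest to the previous word.

    Assumption: every operation may occupy any slot of its bundle.
    The first bundle is kept in its given order.
    """
    words = []
    for ops in bundles:
        if not words:
            words.append(pack(ops, slot_bits))
            continue
        prev = words[-1]
        candidates = (pack(p, slot_bits) for p in permutations(ops))
        words.append(min(candidates, key=lambda w: hamming(prev, w)))
    return words
```

With two 4-bit slots, the bundles `[[0xA, 0x5], [0x5, 0xA]]` fetch as `0xA5` then `0x5A` in their given order, toggling all eight bits; swapping the slots of the second bundle removes every toggle. A greedy choice like this is not guaranteed to be globally optimal over a basic block.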

  • Design of a parametrizable low cost Ethernet MAC core for SoC solutions

    Page(s): 139 - 142

    This paper describes an efficient Ethernet medium access control (MAC) design, conforming to the IEEE 802.3 standards for Fast Ethernet (100 Mbps) and 10Base-T (10 Mbps) implementations in full and half duplex modes, suitable for re-use as an intellectual property (IP) core in system-on-chip (SoC) designs. The description and results show that the main advantages of this design are a very low logic cost, a high tolerance for external clock frequencies and domains, and high configurability and flexibility in the master processor interface.

  • A driver load model for capacitive coupled on-chip interconnect buses

    Page(s): 101 - 104

    An analytical model is derived for capacitively coupled on-chip interconnect buses. If an estimate of the wire length is available, the driver output can be estimated using the load model, taking coupling effects on the interconnect tree into account. The load is modeled as a π-network which depends on the first three moments of the exact admittance function of the bus system. Simulations with SpectreS have shown that the model is a good approximation of the driver load of a distributed interconnect bus tree. It is proven mathematically that the model remains passive.
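
As background for the π-network load mentioned in this abstract, the sketch below shows the classic moment-matching construction for a single uncoupled line, where the driving-point admittance is expanded as Y(s) = y1·s + y2·s² + y3·s³ + ... and a C-R-C network is fitted to the first three coefficients. The paper's coupled-bus model is not reproduced here; the function names and the use of exact fractions are assumptions for illustration.

```python
from fractions import Fraction


def pi_model(y1, y2, y3):
    """Fit a C-R-C pi network to the first three admittance moments.

    Returns (c_near, r, c_far): the capacitor at the driver, the series
    resistor, and the capacitor behind the resistor.
    """
    c_far = y2 * y2 / y3
    c_near = y1 - c_far
    r = -(y3 * y3) / (y2 ** 3)
    return c_near, r, c_far


def pi_moments(c_near, r, c_far):
    """First three moments of Y(s) = s*c_near + s*c_far / (1 + s*r*c_far)."""
    y1 = c_near + c_far
    y2 = -r * c_far ** 2
    y3 = r ** 2 * c_far ** 3
    return y1, y2, y3
```

Feeding the moments of a known π-network back through `pi_model` returns the original element values, which is a convenient sanity check: `pi_model(*pi_moments(Fraction(2), Fraction(3), Fraction(5)))` yields the values 2, 3 and 5 again.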

  • Moustique: smaller than an ASIC and fully programmable

    Low-cost and fast time-to-market are the most pressing requirements for embedded systems solutions. Silicon Hive's "Moustique" product family of ultra-low-cost and highly programmable cores addresses both needs. An MPEG4 decoder plus a complex de-blocking algorithm can be mapped onto a 0.7 mm² (0.12 μm process) accelerator subsystem, preserving full programmability, and operating with CIF resolution at 30 frames per second. The accelerators are supported by the Hivecc ANSI-C compiler suite, which automatically extracts the intrinsic parallelism.

  • Updating matrix inverse in fixed-point representation: direct versus iterative methods

    Page(s): 45 - 48

    VLSI implementations of digital signal processing algorithms gain large performance improvements when fixed-point arithmetic is used. Motivated by this fact, fixed-point algorithms for both direct and iterative methods to update the inverse of a matrix were implemented and compared. In addition, an algorithm that approximates an overdetermined system for an efficient and fast implementation of the Sherman-Morrison formula is proposed.
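
The Sherman-Morrison formula named in this abstract updates a known inverse after a rank-one change: (A + u·vᵀ)⁻¹ = A⁻¹ − (A⁻¹·u·vᵀ·A⁻¹) / (1 + vᵀ·A⁻¹·u). The sketch below is a direct update on plain nested lists; it is not the fixed-point implementation or the approximation algorithm the paper proposes, and the function name is an assumption.

```python
from fractions import Fraction


def sherman_morrison(a_inv, u, v):
    """Return the inverse of (A + u v^T) given a_inv = A^-1.

    a_inv is a square matrix as a list of rows; u and v are lists.
    Raises ZeroDivisionError if the updated matrix is singular.
    """
    n = len(a_inv)
    # Column vector A^-1 u and row vector v^T A^-1.
    au = [sum(a_inv[i][k] * u[k] for k in range(n)) for i in range(n)]
    va = [sum(v[k] * a_inv[k][j] for k in range(n)) for j in range(n)]
    denom = 1 + sum(v[k] * au[k] for k in range(n))
    if denom == 0:
        raise ZeroDivisionError("A + u v^T is singular")
    return [
        [a_inv[i][j] - au[i] * va[j] / denom for j in range(n)]
        for i in range(n)
    ]
```

Passing `Fraction` entries keeps the arithmetic exact, which makes the update easy to check by hand; a fixed-point version would additionally have to manage scaling and rounding of the division by `denom`.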

  • A guaranteed-throughput switch for network-on-chip

    Page(s): 31 - 34

    Today's systems on a chip (SoC) contain numerous complex functional blocks integrated by an elaborate network of interconnects and buses. As systems grow in complexity, the on-chip interconnect network is expected to become critical for overall system-level metrics, such as performance, power consumption, and reliability. However, present-day dedicated channels and shared buses do not scale and therefore do not meet these requirements. The emerging network-on-chip (NoC) approach, based on an on-chip communication network, might solve these problems (A. Jantsch and H. Tenhunen, 2003). In this paper, a guaranteed-throughput switch designed for NoC is described. This switch provides in-order delivery and supports multicast operation. It is implemented with random access memory at the input and output. The input and output are then connected by a fully connected interconnect network.

  • Abstract RTOS modeling for multiprocessor system-on-chip

    Page(s): 147 - 150

    In this paper, we present a SystemC-based framework to study the effects of running multi-threaded application software on a multiprocessor platform under the control of one or more abstract real-time operating systems (RTOSs). We propose a modeling framework consisting of basic RTOS service models (scheduling, synchronization, and resource allocation) and a generic task model that is able to model periodic and aperiodic tasks as well as task properties such as varying execution times, offsets, deadlines, and data dependencies. A given multiprocessor system is formed by the composition of RTOS service models and the allocation of tasks (the application software) onto RTOSs. We demonstrate the potential of our approach by simulating and analyzing a small multiprocessor system.

  • Mixed static/dynamic profiling for dictionary based code compression

    Page(s): 159 - 163

    Many compression techniques have been proposed to fit ever-growing software into the restricted memory area of embedded systems. Recently, these techniques have been shown to improve other design constraints such as energy and performance. This paper proposes a blended dictionary model based on static/dynamic profiling that leads to the best trade-offs among compression, performance and energy savings. We also propose a new dictionary-based code compression algorithm, independent of the cache organization and processor, to support our experiments. A mix of benchmarks and MiBench suites reveals that compression ratios of 75% can be obtained while decreasing bus accesses to the cache by 31% for the Leon processor. These results simultaneously approach the best solutions obtained with purely static or purely dynamic dictionary techniques.
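
A dictionary-based scheme of the general kind this abstract refers to can be sketched as follows. This is a generic illustration, not the paper's algorithm: the dictionary is filled from static occurrence counts only, whereas the paper blends static and dynamic profiles, and the token layout with one flag bit per instruction and all function names are assumptions.

```python
from collections import Counter


def build_dictionary(instructions, size):
    """Pick the `size` most frequent instruction words (static profile)."""
    return [word for word, _ in Counter(instructions).most_common(size)]


def compress(instructions, dictionary):
    """Replace dictionary hits by ("idx", i); keep misses as ("raw", word)."""
    index = {word: i for i, word in enumerate(dictionary)}
    return [
        ("idx", index[word]) if word in index else ("raw", word)
        for word in instructions
    ]


def decompress(tokens, dictionary):
    """Invert compress()."""
    return [dictionary[v] if kind == "idx" else v for kind, v in tokens]


def compression_ratio(tokens, dictionary, word_bits, index_bits):
    """Compressed size over original size, counting the dictionary itself.

    Assumption: each token carries one flag bit marking it as idx or raw.
    """
    body = sum(
        1 + (index_bits if kind == "idx" else word_bits) for kind, _ in tokens
    )
    table = len(dictionary) * word_bits
    return (body + table) / (len(tokens) * word_bits)
```

A dynamic profile would rank words by execution count instead of occurrence count, which tends to favor fetch traffic over code size. Note that papers differ in whether a compression ratio is reported as compressed over original size or as the fraction saved, so figures are not directly comparable without checking the definition.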
