System on Chip (SoC), 2011 International Symposium on

Date: Oct. 31 - Nov. 2, 2011

Displaying Results 1 - 25 of 34
  • [Front cover]

    Publication Year: 2011 , Page(s): c1
    Freely Available from IEEE
  • Foreword

    Publication Year: 2011 , Page(s): 1 - 2
    Freely Available from IEEE
  • Committee

    Publication Year: 2011 , Page(s): 1 - 2
    Freely Available from IEEE
  • Sponsors

    Publication Year: 2011 , Page(s): 1
    Freely Available from IEEE
  • Statistics

    Publication Year: 2011 , Page(s): 1
    Freely Available from IEEE
  • Invited talk abstracts and biographies

    Publication Year: 2011 , Page(s): 1 - 4

    Presents abstracts of invited presentations from the conference proceedings.
  • SOC 2011: Advanced program

    Publication Year: 2011 , Page(s): 1 - 4
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2011 , Page(s): 1 - 2
    Freely Available from IEEE
  • Automatic calibration of streaming applications for software mapping exploration

    Publication Year: 2011 , Page(s): 136 - 142
    Cited by:  Papers (4)

    Streaming models have lately gained a lot of interest in embedded software design as they closely resemble computation of signal processing applications typically found in wireless and multimedia domains. To map streaming applications onto MPSoCs (Multi-Processor Systems-on-Chip) efficiently, programmers need not only to validate software but also to estimate the performance of their software accurately. Therefore, fast MPSoC virtual platforms which support fully functional execution of software with good timing accuracy are required. In this paper, we propose a tool-flow to construct such MPSoC virtual platforms. The key idea is to annotate timing of sequential execution of streaming applications automatically by calibration in a configurable abstract MPSoC virtual platform. A case study of applying the tool-flow to a real-life heterogeneous MPSoC, TI's OMAP, has been conducted to prove the tool-flow's feasibility and show good accuracy of the calibrated virtual platform for software mapping exploration.
  • Building a RTOS for MPSoC dataflow programming

    Publication Year: 2011 , Page(s): 143 - 146
    Cited by:  Papers (3)

    Multiprocessor Systems-on-Chip (MPSoC) are becoming the standard high performance Digital Signal Processing (DSP) systems. Hardware complexity abstraction is needed to enable efficient MPSoC programming. A major challenge of MPSoC programming is efficiently combining the new features necessary in an MPSoC operating system (load balancing and efficient use of the parallel resources) with the more traditional features of Real-Time Operating Systems (RTOS): resource sharing between applications, task priorities and reactivity to events. This paper presents a method to combine dataflow methods and RTOS features. The resulting system prototypes an RTOS for symmetric multiprocessing MPSoCs whose inputs are dataflow graphs of applications. The prototype is built on the μC/OS-II RTOS. Experimental results are given on a 3GPP Long Term Evolution algorithm executed on a 4-core MPSoC.
  • Impact of proactive temperature management on performance of Networks-on-Chip

    Publication Year: 2011 , Page(s): 116 - 121
    Cited by:  Papers (1)

    With the progress of deep submicron technology, power consumption and temperature-related issues have become dominant factors for chip design. Therefore, very large-scale integrated systems like Systems-on-Chip (SoCs) are exposed to increasing thermal stress. On the one hand, this necessitates effective mechanisms for thermal management. On the other hand, applying thermal management is accompanied by disturbance of system integrity and degradation of system performance. In this paper we propose to precompute and proactively manage on-chip temperature of systems based on Networks-on-Chip (NoCs). Thereby, traditional reactive approaches, which utilize the NoC infrastructure to perform thermal management, can be replaced. This results not only in shorter response times for applying management measures, and therefore in a reduction of temperature and thermal imbalances, but also in less impairment of system integrity and performance. Simulations show that proactive management achieves improvements of nearly 150% regarding reduction of average temperature inside a 3×3 NoC compared to identical reactive approaches, while mitigating additional delay for packet transmission by more than 50%.
  • Co-designs of parallel Rijndael

    Publication Year: 2011 , Page(s): 72 - 77

    State-of-the-art Field Programmable Gate Arrays (FPGAs) have inspired the innovation of hardware/software co-design methodologies that provide a high level of abstraction in the design process. In this paper, we explore the effectiveness of a formal methodology in the co-design of parallel versions of the Rijndael cryptographic algorithm. The investigated methodology employs the functional paradigm for specifications, derived concurrency, and hardware mapping. Several implementations are developed with different performance characteristics. The refined designs are tested on the RC-1000 reconfigurable computer with its two-million-gate FPGA.
  • A set of traffic models for Network-on-Chip benchmarking

    Publication Year: 2011 , Page(s): 78 - 81
    Cited by:  Papers (3)

    This paper presents a set of 9 application traffic models for benchmarking Network-on-Chip designs. Common benchmarks allow fair comparison and reproduction of research results, and accelerate NoC development. The set is based on real applications found in the literature and is executable on a freely available benchmarking tool called Transaction Generator (TG). It was found that traffic target distribution is far from uniform and that bandwidth requirements vary widely between tasks and between applications. TG and the model source codes are freely available. Models are stored in XML format and are based on task graphs with on average 15 processing tasks. The average communication load is about 2 GByte/s.
  • Analyzing synchronous dataflow scenarios for dynamic software-defined radio applications

    Publication Year: 2011 , Page(s): 14 - 21
    Cited by:  Papers (2)

    Contemporary embedded systems for wireless communications support various radios. A software-defined radio (SDR) is a radio implemented as concurrent software processes that typically run on a multiprocessor system-on-chip (MPSoC). SDRs are real-time streaming applications with throughput requirements. One efficient approach for timing analysis of concurrent real-time applications is the dataflow model of computation (MoC). Nonetheless, the dataflow modeling of SDRs is challenging due to their dynamically changing data processing workload. A dataflow MoC that is not expressive enough to capture this dynamism gives pessimistic throughput results. On the other hand, if it is too expressive and detailed, it may not be analyzable at all. In this paper, we address the challenge of dataflow modeling of SDRs such that their timing behavior can be accurately analyzed to guarantee real-time requirements without unnecessarily over-allocating MPSoC resources. The basis of our modeling approach is splitting the dynamic data processing behavior of an SDR into a group of static modes of operation. Each static mode of operation is then modeled by a Synchronous Dataflow (SDF) graph, which we refer to as a scenario. This paper has two main contributions: 1) a scenario-based dataflow model of Long Term Evolution (LTE), which is the latest standard in cellular communication, and 2) investigation of existing throughput analysis techniques of SDF scenarios for our LTE model. Our results show that scenario-based worst-case throughput computation is 2 to 3.4 times more accurate than a state-of-the-art SDF analysis technique. Our investigation also shows that existing timing analysis techniques of SDF scenarios have very low run-time that scales very well with increasing graph size. This makes SDF scenarios suitable in practice for modeling and analyzing SDRs as well as similar dynamic applications.
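
The abstract above models each static mode of operation as an SDF graph. As a rough illustration of what such a per-scenario model makes analyzable, the sketch below solves the standard SDF balance equations to obtain the repetition vector (how often each actor fires per graph iteration) for two scenarios. This is not the paper's throughput analysis or its LTE model: the actor names, token rates, and scenario names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    actors: list of actor names (the graph is assumed to be connected).
    edges:  list of (src, dst, prod, cons) tuples, where src produces
            `prod` tokens per firing and dst consumes `cons` per firing.
    Returns the smallest positive integer firing counts per actor.
    """
    # Build an undirected view of the graph with firing-rate ratios.
    neighbours = {a: [] for a in actors}
    for src, dst, prod, cons in edges:
        neighbours[src].append((dst, Fraction(prod, cons)))
        neighbours[dst].append((src, Fraction(cons, prod)))

    # Propagate relative firing rates from an arbitrary start actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        current = stack.pop()
        for other, ratio in neighbours[current]:
            expected = rate[current] * ratio
            if other not in rate:
                rate[other] = expected
                stack.append(other)
            elif rate[other] != expected:
                raise ValueError("inconsistent token rates: no repetition vector")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the fractional rates to the smallest integer solution.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


if __name__ == "__main__":
    actors = ["A", "B", "C"]  # hypothetical actors
    # Two hypothetical scenarios: same structure, different token rates.
    scenarios = {
        "low_rate": [("A", "B", 2, 3), ("B", "C", 1, 2)],
        "high_rate": [("A", "B", 1, 1), ("B", "C", 1, 2)],
    }
    for name, edges in scenarios.items():
        print(name, repetition_vector(actors, edges))
```

Because each scenario is a static SDF graph, its firing counts are fixed and can be computed once per scenario, which is what keeps this kind of model analyzable at design time.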
  • Analyzing transport and MAC layer in system-level performance simulation

    Publication Year: 2011 , Page(s): 1 - 8
    Cited by:  Papers (2)

    Modern mobile embedded devices support complex distributed applications via heterogeneous multi-core platforms. For the successful deployment of these applications, scalability and performance analysis must be performed at all layers of the OSI model. This helps to identify potential bottlenecks at different layers so that the necessary optimizations can be performed. To achieve this goal, a framework is needed which accurately models the functionalities at different layers. The technical contributions described in this article include the extensions of ABstract inStruction wOrkLoad & execUtion plaTform based performance simulation (ABSOLUT) for the performance and scalability analysis of Transport and Medium Access Control (MAC) layers in system-level performance simulation. The article elaborates the design accuracy of the modeled components and their application in the context of M3 (multi-device, multi-vendor, multi-domain), which is a tri-layered conceptual interoperability architecture for embedded devices. These extensions pave the way towards full coverage of the OSI model in the system-level performance simulation of distributed embedded systems. Network simulators such as ns-2, OMNeT++ and OPNET provide detailed models of transport and MAC protocols, but they do not provide any framework through which these models can be used by application workload models to mimic real-world use cases. These models also do not capture the execution workload of the protocols on a particular execution platform and hence cannot be used in the architectural exploration of distributed embedded systems.
  • Effects of loop unrolling and use of instruction buffer on processor energy consumption

    Publication Year: 2011 , Page(s): 82 - 85

    In the area of embedded systems, instruction memories are one of the critical components consuming significant amounts of energy. The relation between the compiler optimization flags and the size of the compiled program, and consequently the required size of the instruction memory, is well known. In particular, loop transformations such as loop unrolling, while having the potential to increase performance dramatically, often cause unreasonable growth in the size of the required instruction memory, so that the benefit of a lower cycle count is lost from the overall system energy point of view. One way to decrease the energy consumption of the memories is to use instruction buffers. Frequently executed loops are stored in the buffer and executed from there, while main memory is not read. In this paper, we show how the compiler flag controlling loop unrolling influences the structure of the loops in the program. While unrolling improves performance, unrolled loops can disappear from the program completely, or grow to an unreasonable size where use of an instruction buffer brings no benefit from the energy point of view.
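
The trade-off described in the abstract above can be made concrete with a toy model. The sketch below is not the paper's experimental setup: the instruction counts, the buffer capacity, and the assumption that loop-control overhead is paid once per unrolled pass are all hypothetical simplifications, and the extra static code for a remainder loop is ignored. It only shows how a larger unroll factor reduces the number of executed instructions while the loop body eventually outgrows a fixed-size instruction buffer.

```python
def unrolled_loop_size(body_instrs, control_instrs, factor):
    """Static size of the loop after unrolling `factor` times.

    The body is replicated `factor` times; the loop-control instructions
    (counter update, compare, branch) appear once per unrolled pass.
    """
    return factor * body_instrs + control_instrs


def dynamic_instruction_count(trip_count, body_instrs, control_instrs, factor):
    """Instructions executed for `trip_count` original iterations.

    Leftover iterations that do not fill a whole unrolled pass are
    assumed to run in the original, non-unrolled loop.
    """
    full_passes, leftover = divmod(trip_count, factor)
    unrolled = full_passes * unrolled_loop_size(body_instrs, control_instrs, factor)
    remainder = leftover * (body_instrs + control_instrs)
    return unrolled + remainder


def summarize(trip_count, body_instrs, control_instrs, buffer_capacity, factors):
    """Tabulate size, executed instructions, and buffer fit per unroll factor."""
    rows = []
    for factor in factors:
        size = unrolled_loop_size(body_instrs, control_instrs, factor)
        rows.append({
            "factor": factor,
            "loop_size": size,
            "executed": dynamic_instruction_count(
                trip_count, body_instrs, control_instrs, factor),
            "fits_buffer": size <= buffer_capacity,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical numbers: 100 iterations, 6-instruction body,
    # 2 control instructions, 32-entry instruction buffer.
    for row in summarize(100, 6, 2, 32, [1, 2, 4, 8]):
        print(row)
```

With these made-up numbers, unrolling by 8 executes the fewest instructions but produces a 50-instruction loop that no longer fits the 32-entry buffer, so every fetch would go to main instruction memory again.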
  • A hybrid model of speculative execution and scout threading for auto-memoization processor

    Publication Year: 2011 , Page(s): 22 - 28

    We have proposed an auto-memoization processor based on computation reuse, and merged it with speculative multi-threading based on value prediction into a parallel speculative execution model. In this model, speculative cores do not work when the target instruction region is not suitable for computation reuse. This paper proposes a new parallel speculative execution model where the idle speculative cores execute scout threads for reducing cache miss penalties. The scout thread is based on value prediction, and can handle an instruction region which accesses addresses with several strides. It can also reduce execution cycles by raising the computation reuse ratio. The result of the experiment with SPEC CPU95 FP suite benchmarks shows that the new hybrid model of parallel speculative execution and scout threading improves the maximum speedup from 40.6% to 41.3%, and the average speedup from 15.0% to 19.1%.
  • Applying IP-XACT in product data management

    Publication Year: 2011 , Page(s): 86 - 91
    Cited by:  Papers (3)

    A key challenge for industrial embedded system companies is product data management (PDM) of rapidly changing requirements, new platforms and their own legacy intellectual property. This can be alleviated with IP component reuse methods, platform-based design, and Model Driven Development (MDD) methodologies. We propose an open source product integration environment suitable for small and mid-size enterprises (SME) utilizing FPGAs. We extend the use of the IP-XACT standard from HW integration at IP level to board- and chip-level designs, describe SW as IP-XACT metadata objects, and add reusability status to objects for PDM. We do not modify or extend the IP-XACT metadata format, in order to maintain compatibility with standard-compliant tools. Instead, we propose naming conventions and usage profiles. An exemplar FPGA product shows feasibility in practice in our Kactus2 tool. The benefits are coverage of complete products, including SW objects and HW/SW mappings, while strictly keeping the standard XML format.
  • Static analysis method for deadlock detection in SystemC designs

    Publication Year: 2011 , Page(s): 42 - 47
    Cited by:  Papers (1)

    One of the goals of SystemC is high-level system design verification at an early stage. Currently, simulation is widely used for this purpose. As the level of design parallelism grows, the efficiency of simulation-based verification methods decreases. Thus different formal verification methods for SystemC are actively researched. In this paper we present an approach to deadlock detection in SystemC designs based on static code analysis. Our approach to static analysis considers SystemC scheduler semantics. The developed approach has been implemented in the Deadlock Analyzer tool. We demonstrate the efficiency of our tool by applying it to dining philosophers, crossroads, and producer-consumer cases and to a real-life model of a video accelerator.
  • Increasing energy efficiency of automotive E/E-architectures with Intelligent Communication Controllers for FlexRay

    Publication Year: 2011 , Page(s): 92 - 95
    Cited by:  Papers (1)

    When a modern vehicle is in use, its interconnected Electronic Control Units (ECUs) are effectively always on. By temporarily deactivating those ECUs whose functions are not needed, we can contribute to a vehicle's energy efficiency and lower its fuel consumption as well as its emissions. FlexRay is an automotive bus system typically used to interconnect high-performance ECUs. FlexRay, however, does not support the deactivation of single nodes. In this paper, we introduce the concept of Intelligent Communication Controllers, which allow us to perform a demand-dependent deactivation of FlexRay ECUs. We also outline an FPGA-based prototypical implementation.
  • An automatic experimental set-up for robustness analysis of designs implemented on SRAM FPGAs

    Publication Year: 2011 , Page(s): 96 - 101
    Cited by:  Papers (5)

    This paper introduces an experimental test-flow for evaluating the susceptibility of SRAM-based FPGA designs to SEUs (Single Event Upsets). Using this method it is possible to cover both SEUs and MBUs (Multiple Bit Upsets) in the configuration memory of Xilinx FPGAs for applications based on tiny soft microprocessors. The introduced test-flow imposes minimal effort on the system developer and achieves a good estimate of the percentage of critical bits in the configuration memory of a design. This flow is executed for a design using multiple tiny soft microprocessors, and the reliability values extracted by the test-flow are compared to non-experimental estimation techniques.
  • Customizable Datapath Integrated Lock Unit

    Publication Year: 2011 , Page(s): 29 - 33

    Multicore Application-Specific Instruction-Set Processors (MCASIP) offer an interesting alternative for implementing parallel applications in MPSoCs. Flexible MCASIP architecture templates allow matching the instruction- and task-level parallelism provided by the processor to the requirements of the application at hand. The processing throughput provided by shared memory (SM) multicores is commonly limited by the SM bandwidth. Synchronizing the execution of multiple threads using lock variables residing in the SM further adds to the bottleneck. In this paper we present a technique to reduce SM contention in the case of MCASIPs where application-specific hardware customization can be used. The proposed solution is to use customized Datapath Integrated Lock Units (DILU) that enable the implementation of lightweight synchronization primitives which minimize SM traffic. The paper presents an experiment with a 48-core MCASIP which shows that the SM impact of the proposed fast barrier based on DILU is up to 64% smaller than that of a basic SM polling barrier. The size of the DILU hardware is negligible.
  • Bringing Network-on-Chip links to 45nm

    Publication Year: 2011 , Page(s): 122 - 127
    Cited by:  Papers (2)

    The literature lacks a comprehensive overview of achievable NoC link performance when key parameters are swept in the link microarchitecture and in the NoC floorplan. This paper bridges this basic gap while at the same time capturing how link performance is affected by the migration from a 65nm to a 45nm technology node. Finally, it identifies the requirements on EDA tools to keep up with technology scaling.
  • SAMOSA: Scratchpad aware mapping of streaming applications

    Publication Year: 2011 , Page(s): 48 - 55

    Scratchpad memories have now emerged as an alternative to caches for energy-constrained embedded systems. However, effectively mapping data on them while considering energy/timing trade-offs remains a challenge. We present SAMOSA as a technique for mapping streaming applications to scratchpad-based MPSoCs. The contribution of this approach is a representation and transformation of the mapping problems (buffer dimensioning and allocation) to a constraint-based optimization problem. SAMOSA was used to explore energy-execution time trade-offs for mapping the H.264 decoder to a scratchpad-based MPSoC. Results show that scratchpad awareness has a significant impact on the energy-execution time trade-offs.
  • A coarse-grained reconfigurable protocol processor

    Publication Year: 2011 , Page(s): 102 - 107
    Cited by:  Papers (1)

    The trade-off between flexibility and performance has become an important factor for characterizing modern protocol processing architectures. While some solutions, like GPPs, tend to be more flexible and less computationally efficient, other solutions like custom ASIC devices provide high computational efficiency while losing the ability to cope with the diversity of current and evolving protocols. We propose a reconfigurable protocol processor that is flexible and highly adaptable to the needs of the required protocol, with the ability to operate individually or integrated into a multi-core processor. We show how a common protocol processing task that consumes one third of RISC CPU time can be performed on our processor at high speed and low energy cost.