Digital System Design: Architectures, Methods and Tools, 2006. DSD 2006. 9th EUROMICRO Conference on

Date Aug. 30 2006-Sept. 1 2006

  • 9th EUROMICRO Conference on Digital System Design - Cover

    Page(s): c1
  • 9th EUROMICRO Conference on Digital System Design - Title

    Page(s): i - iii
  • 9th EUROMICRO Conference on Digital System Design - Copyright

    Page(s): iv
  • 9th EUROMICRO Conference on Digital System Design - TOC

    Page(s): v - xi
  • Message from the Program Chair

    Page(s): xii
  • Conference Committees

    Page(s): xiii - xiv
  • The Challenges for High Performance Embedded Systems

    Page(s): 3 - 7

    Consumer electronics devices traditionally rely on non-programmable circuits for their "streaming" part. Recent demands for flexibility have shifted the balance towards the use of programmable components. However, there is a major gap between current programmable processors and the actual requirements of applications. To bridge this gap, it is necessary to use parallel architectures consisting of multiple programmable compute blocks, specifically designed for efficient processing of data streams. Programming those architectures poses major challenges and requires appropriate tools. Managing the ever-increasing complexity of those embedded systems is certainly one of the most important challenges besides power consumption. Complexity will make systems unreliable and unpredictable. The new technology nodes (65 nm and below) will also bring their own additional challenges: the global interconnect delay that does not scale, the predominant leakage current, and the increasing variability of components.
  • Digital RF

    Page(s): 8
  • Deep Sub-100 nm Design Challenges

    Page(s): 9 - 16

    This paper describes, from several perspectives, the problems in the design and development of deep sub-100 nm system LSIs and/or SoCs. One of the most difficult problems is the large power consumption, in both active and stand-by modes. Another problem is how to improve efficiency in the development of large-scale chips and the related software. Lithography, which has become increasingly difficult, is also an issue, because it directly impacts the chip fabrication yield. Several approaches to counteract these problems will be discussed: various low-power technologies from the device, circuit and architecture viewpoints, a high-level-language-based design flow and platform-based IP reuse, and DFM (design for manufacturing) related technologies.
  • New Directions in Mobile Device Architectures

    Page(s): 17 - 26

    The mobile industry will face several disruptive changes in the coming years. Emerging digital convergence will bring new, exciting multifunctional products to consumers, but it will also place new requirements on product development. The shift from a vertical to a horizontal mode will have a deep impact on R&D across the whole mobile industry. Emerging dominant platforms and architectures will challenge the whole community to maintain the spirit of open innovation instead of ending up with the dominance of one or two key players. In this presentation we consider the challenges and requirements for future architectures, and open up our thinking on the right architectural directions to take.
  • Robustness in SOC Design

    Page(s): 27 - 36

    Embedded systems, ubiquitous computing and networked architectures are becoming more and more important within our society. System parts are often completely implemented as integrated circuits (SoC = system on chip). Consequently, their complexity and heterogeneity have grown dramatically in the recent past. Moreover, embedded systems are used in environments where parameters are subject to continuous change. Hence, they have to respond to environmental requirements and to changes in their own system parameters in a robust manner. To gain this robustness and to cope with the design methodology, formal measures and metrics are of great importance. Such measures need to be combined with the still increasing requirement for computing performance. The implementation of robust features requires adaptivity through reconfiguration and parallelism. We call the corresponding systems adaptive computing systems (ACS). The ACS class offers the opportunity to adapt the whole architecture, or parts of it, to the changing needs of applications or to changing environments. The paper addresses some of these aspects and presents some ideas for modelling and designing adaptive computing systems. In particular, measures, metrics and taxonomies for reliability, adaptivity and robustness are analysed and discussed. Robust behaviour of electronic systems will contribute to significantly higher trust in modern technology on the part of society. It is therefore of very high economic relevance for industry and commerce.
  • Towards Performance-Oriented Pattern-Based Refinement of Synchronous Models onto NoC Communication

    Page(s): 37 - 44

    We present a performance-oriented refinement approach that refines a perfectly synchronous communication model onto network-on-chip (NoC) communication. We first identify four basic forms of NoC interaction patterns at the process level, namely producer-consumer, peers, client-server, and multicast. We propose a three-step top-down refinement method: channel refinement, protocol refinement and channel mapping. We describe the method in detail for the producer-consumer pattern. In channel refinement, we deal with interfacing multiple clock domains and use a stochastic process to model channel delay and jitter. In protocol refinement, we show how to refine communication towards application requirements such as reliability and throughput. In channel mapping, we discuss channel convergence and channel merge arising from channel overlapping. All the refinements have been conducted and validated as an integral design phase towards implementation in ForSyDe, a formal system-level design methodology based on a synchronous model of computation.
  • Resource-Efficient Routing and Scheduling of Time-Constrained Network-on-Chip Communication

    Page(s): 45 - 52

    Network-on-chip-based multiprocessor systems-on-chip are considered to be future embedded systems platforms. One of the steps in mapping an application onto such a parallel platform involves scheduling the communication on the network-on-chip. This paper presents different scheduling strategies that minimize resource usage by exploiting all the scheduling freedom offered by networks-on-chip. Our experiments show that resource utilization is improved when compared to existing techniques.
  • On Cache Coherency and Memory Consistency Issues in NoC Based Shared Memory Multiprocessor SoC Architectures

    Page(s): 53 - 60

    The concept of network on chip (NoC) is a recent breakthrough in the system on chip (SoC) design area. A lot of work has been done to define efficient NoC architectures and implementations. In this paper, our goal is twofold. Firstly, we want to point out that the use of a NoC-based shared-memory multiprocessor SoC challenges the application integrator because of the underlying assumptions of software, namely cache coherency and memory consistency. These problems are well known in general-purpose shared-memory multiprocessors. However, when designing a SoC, we benefit on the one hand from knowledge of the applications, the much simpler usage of virtual memory, lower interconnect latencies and very high bandwidth at low cost, but on the other hand we suffer from tighter design constraints (yield, power, predictable performance, ...). Secondly, we define simple and yet attractive solutions - in terms of design time and hardware cost - to both problems in the context of application-specific multiprocessor SoCs.
  • Partition Based Dynamic 2D HW Multitasking Management

    Page(s): 61 - 70

    The design of computing systems is facing an interesting challenge with the opportunity to include runtime reconfigurable (RTR) devices in them. Operating systems (OS) need to be extended with functionality that allows such devices to be managed efficiently. We present a simple and fast algorithm for the management of FPGA area in a general-purpose computing system with hardware multitasking. It divides the device area into four partitions of different sizes. Each partition has an associated queue in which the hardware manager places each arriving task depending on its size, shape and deadline requirements. Rectangular tasks may be rotated when necessary, and partitions may be merged for tasks that do not fit any single partition. The queue selection criterion and the size of the partitions may be changed at run-time in order to adapt the algorithm's behaviour to different circumstances. This is a constant-complexity algorithm, and we present experimental results showing that it can compete in performance with other algorithms.
  • Global Analysis of Resource Arbitration for MPSoC

    Page(s): 71 - 78

    Modern applications require the use of multi-processor systems for reasons of scalability and power efficiency. As more and more applications are integrated on a single device, mapping and analyzing them on a multi-processor system becomes a multi-dimensional problem. Each possible set of applications that can be active simultaneously leads to a different use-case (also referred to as a scenario) for which the system has to be verified and tested. Analyzing the feasibility and resource utilization of all possible use-cases is very demanding and often infeasible. In this paper, we highlight the issue of composability, i.e. being able to analyze applications in isolation while still reasoning about their overall behavior. We observe that arbitration plays an important role in this analysis. We compare two simple, yet commonly used, arbitration mechanisms and highlight the properties that are important for such analysis. We conclude that neither of these arbitration mechanisms is ideal for such an analysis, and we propose some variations to make them better suited to it.
  • Energy-Efficient Cache Coherence for Embedded Multi-Processor Systems through Application-Driven Snoop Filtering

    Page(s): 79 - 82

    Keeping local caches coherent in bus-based multiprocessor systems results in significantly elevated power consumption, as the bus snooping protocols require local cache lookups for each memory reference placed on the common bus. Such a conservative approach is warranted in general-purpose systems, where no prior knowledge regarding the communication structure between threads or processes is available. In such a general-purpose context the assumption is that each memory request is potentially a reference to a shared memory region, which may result in cache inconsistency if no corrective action is taken. The approach we propose exploits the fact that in embedded systems, important knowledge is available to the system designers regarding communication activities between tasks allocated to the different processor nodes. We demonstrate how the snoop-related cache probing activity can be drastically reduced by identifying in a deterministic way all the shared memory regions and the communication patterns between the processor nodes. Cache snoop activity is enabled only for the fraction of bus transactions that refer to locations belonging to known shared memory regions for each processor node; for the remaining, larger part of memory references, known to have no relation to the given processor node, snoop probes in the local cache are completely disabled, thus saving a large amount of power. The experiments we have performed on a number of important applications demonstrate the effectiveness of the proposed approach.
  • Comparison of GALS and Synchronous Architectures with MPEG-4 Video Encoder on Multiprocessor System-on-Chip FPGA

    Page(s): 83 - 88

    In large system-on-chip (SoC) architectures, balancing the clock network is increasingly difficult. The globally asynchronous locally synchronous (GALS) approach removes the need for a global clock net, and also provides an efficient means of managing complexity and re-use in large architectures. However, quantitative comparisons of GALS against similar synchronous structures are rare for full SoC architectures. In this paper, we compare our SoC GALS architectures to a synchronous architecture with a fully functional MPEG-4 video encoder on FPGA. The results show that the area and performance overhead of GALS is only 1%. That is negligible compared to the benefits of the GALS architecture, such as multiple clock frequencies for intellectual property (IP) blocks, dynamic frequency/voltage scaling, clock tree removal, and re-usability. Our architecture does not require modifications to the IP blocks already used with synchronous architectures, providing an ideal solution for a rapid switch to a GALS architecture.
  • Multi-Bank Main Memory Architecture with Dynamic Voltage Frequency Scaling for System Energy Optimization

    Page(s): 89 - 96

    Several techniques have been developed to reduce processor consumption, which was the predominant source of dissipation. However, with the evolution of technology and the development of new applications that make heavy use of large amounts of data in memory, the energy savings obtained by these techniques become limited. In this article we show that the dynamic voltage frequency scaling (DVFS) technique increases main memory consumption. A multi-banked memory architecture, capable of setting banks in low-power modes when they are not accessed, is adopted to reduce the memory consumption. An approach to task allocation and bank configuration that reduces the memory energy is developed at system level for multi-task and real-time systems. Experimental results show that, when the DVFS technique is combined with an efficient multi-bank architecture and task-to-bank allocation, a system energy saving of up to 35% is obtained.
  • A Monitoring-Aware Network-on-Chip Design Flow

    Page(s): 97 - 106

    Networks-on-chip (NoC) are a scalable interconnect solution for systems on chip and are rapidly becoming reality. Monitoring is a key enabler for debugging or performance analysis and quality-of-service techniques. The NoC design problem and the NoC monitoring problem cannot be treated in isolation. We propose a monitoring-aware NoC design flow able to take into account the monitoring requirements in general. We illustrate our flow with a debug driven monitoring case study of transaction monitoring. By treating the NoC design and monitoring problems in synergy, the area cost of monitoring can be limited to 3-20% in general.
  • A Run-Time Re-configurable Parametric Architecture for Local Neighborhood Image Processing

    Page(s): 107 - 115

    We propose a run-time re-configurable parametric architecture (fabric) for local neighborhood image processing. The proposed architecture is composed of polymorphous cells, where each cell accesses neighborhood data from a local cell memory and executes a neighborhood function sequentially. The architecture is flexible since different neighborhood functions can be implemented by rewriting a cell's software micro-code. High throughput is achieved because many cells execute concurrently. We show that for a satellite image feature extraction application, our architecture, implemented on Stratix II and Virtex 2 field programmable gate arrays, achieves performance, hardware resource utilization, and throughput similar to those of a fully pipelined systolic array architecture, yet offers improved flexibility to the developer. We compare and contrast these two architectures for their usability to the image processing community.
  • A Hardware IP-Core for Information Retrieval

    Page(s): 115 - 122

    With the ever-increasing amounts of information stored on the Web or archived within computing systems, high-performance data processing architectures are required to process this data in real time. The aim of the work presented in this paper is the development of a hardware text mining IP-Core for use in FPGA-based systems. We describe the development of our text processing hardware pipeline, with the addition of complex word-stemming and loadable stop-list stages. The performance of this system is then compared to our initial prototype and to an equivalent software implementation using the Lucene software library.
  • Thermal-Aware Scheduling: A Solution for Future Chip Multiprocessors Thermal Problems

    Page(s): 123 - 126

    The increased complexity and operating frequency of current microprocessors are resulting in diminishing performance improvements. In order to keep up with the expected performance gains, major manufacturers have started to offer chip-multiprocessor architectures. Nevertheless, the integration of several cores on the same chip leads to increased heat dissipation and consequently to additional costs, decreased reliability, and performance loss, among other problems. In this paper we propose thermal-aware scheduling (TAS), a technique that aims to minimize all these problems. When assigning processes to cores, TAS takes the cores' temperature into account, avoiding thermal violation events. As a side effect, performance is improved. Simulation results show that for a 25-core CMP, a simple TAS heuristic reduces the performance loss introduced by excessive temperature from 52% to 18%. At the same time, TAS decreases the chip's temperature by 2.6 °C.
  • Evaluating Dataflow and Pipelined Vector Processing Architectures for FPGA Co-processors

    Page(s): 127 - 130

    This paper describes the development of an FPGA-based co-processor architecture for accelerating vector comparisons, e.g. Euclidean distance. We compare traditional pipelined and dataflow implementations in terms of processing speed and area requirements. Processing performance is compared against a software implementation to evaluate the possible speedup.
  • Solving the Fundamental Problem of Digital Design - A Systematic Review of Design Methods

    Page(s): 131 - 138

    During the last decade various asynchronous circuit structures and design methods have been proposed that seem to be quite different. In essence, however, all these methods contribute to solving the same fundamental design problem in one way or another. In this paper we use a simple communication model to figure out what this fundamental design problem actually is and to highlight its roots. We show how each of the related sub-problems can be conceptually solved in the time domain and in the information domain. Having this model in mind we finally develop a common framework to classify the most popular asynchronous design methods and figure out which sub-problem they actually solve and in which respect they differ from each other.