Third IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '05)

Date: Sept. 2005

  • Hardware and software architectures for the CELL processor

    Publication Year: 2005 , Page(s): 1
    Cited by:  Papers (2)

    The Cell processor is a first instance of a new family of processors intended for the broadband era. The processors will find early use in game systems (PlayStation 3™), a variety of other consumer electronics applications, a wide variety of embedded applications, and various forms of computational accelerators. Cell is a non-homogeneous multi-core processor, with one POWER processor core (two threads) dedicated to the operating system and other control functions, and eight synergistic processors optimized for compute-intensive applications. Cell addresses two of the main limiters to microprocessor performance: increased memory latency, and performance limitations induced by system power limits. Memory latency is addressed by introducing another software-managed level of private "local" memory, in between the private registers and shared system memory. Data is transferred between this local memory and shared memory with asynchronous cache-coherent DMA commands, and synergistic processor load and store commands access the local store only. This organization of memory makes it possible for the Cell processor to have over 100 memory transactions in flight at the same time, more than enough to cover memory latency. Power limitations are addressed by two main mechanisms: a non-homogeneous multi-core organization, and an ultra-high-frequency design that allows the chip to be operated at 3.2 GHz at low voltage. The Cell processor supports many of today's programming models by introducing the concept of heterogeneous tasks or threads. Both POWER-processor- and SPE-based threads can be managed by the operating system and effectively utilized by applications, from the relatively straightforward function-offload model to the more complex single-source heterogeneous parallel programming model. Cell achieves between one and two orders of magnitude of performance advantage over conventional single-core processors on compute-intensive (32-bit) applications, by permitting programmers and compilers explicit control over instruction scheduling, data movement, and the use of a large register file.

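    The local-store organization described above is what makes latency hiding possible: software stages data through private buffers with asynchronous DMA while computation proceeds on the other buffer. A minimal sketch of the resulting double-buffering pattern, with Python lists standing in for the local store and DMA transfers (buffer size and data are illustrative):

```python
# Double buffering over a software-managed local store (illustrative sketch).
# While "compute" works on one local buffer, the "DMA" prefetches the next
# chunk from shared memory into the other buffer, hiding memory latency.

CHUNK = 4  # local-store buffer size in elements (illustrative)

def process_stream(shared_mem):
    """Sum all elements, staging them through two local buffers."""
    buffers = [None, None]
    total = 0
    chunks = [shared_mem[i:i + CHUNK] for i in range(0, len(shared_mem), CHUNK)]
    if not chunks:
        return 0
    buffers[0] = chunks[0]                        # initial DMA-in to buffer 0
    for i in range(len(chunks)):
        if i + 1 < len(chunks):
            buffers[(i + 1) % 2] = chunks[i + 1]  # async DMA-in of next chunk
        total += sum(buffers[i % 2])              # compute on current buffer
    return total

print(process_stream(list(range(10))))  # -> 45
```

    A real SPE program would issue the DMA command, compute, and then wait on a tag group before swapping buffers; the copy above models only the overlap structure.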
  • Performance and power analysis of computer systems

    Publication Year: 2005 , Page(s): 2

    This tutorial presents an overview of techniques for architectural-level performance and power analysis of computer systems. It starts with a discussion of metrics for both performance and power, followed by an overview of some widely used benchmarks, including SPEC, MediaBench, and MiBench. It then illustrates the use of these benchmarks with some published performance results. After this initial overview, the tutorial focuses on architectural simulators to measure performance and power. Architectural simulators model systems on a (clock) cycle-by-cycle basis. Their operation is illustrated with two popular examples: SimpleScalar and M5. Besides performance analysis, these simulators can be extended to include power estimation. Full simulations of complete applications can be extremely time-consuming. The tutorial explains how sampling techniques can be used to reduce simulation time. Finally, it concludes with a discussion of the accuracy that can be expected from architectural simulators.

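    The core of a cycle-by-cycle simulator of the kind the tutorial covers is a very small loop: advance one cycle at a time, charging latency and energy to each operation. A toy sketch (the latency and per-cycle energy numbers are invented for illustration, not taken from SimpleScalar or M5):

```python
# Toy cycle-by-cycle performance/power model in the spirit of architectural
# simulators such as SimpleScalar or M5. Latencies and per-cycle energy
# numbers below are illustrative, not measured values.

LATENCY = {"alu": 1, "mul": 3, "load": 5}                  # cycles per op (assumed)
ENERGY_PER_CYCLE = {"alu": 1.0, "mul": 2.5, "load": 1.5}   # nJ per cycle (assumed)

def simulate(trace):
    """Advance one cycle at a time; return (total_cycles, total_energy_nJ)."""
    cycles = 0
    energy = 0.0
    for op in trace:
        for _ in range(LATENCY[op]):   # each op occupies the machine for its latency
            cycles += 1
            energy += ENERGY_PER_CYCLE[op]
    return cycles, energy

cycles, energy = simulate(["load", "alu", "mul", "alu"])
print(cycles, energy)  # -> 10 17.0
```

    Real simulators model pipelining, caches, and overlap, so ops do not simply serialize as here; sampling techniques shorten the trace this loop must walk.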
  • The challenges of embedded system design

    Publication Year: 2005 , Page(s): 3

    As complexity grows, the importance of system-level exploration, design, verification, and debug becomes inescapable. However, while there has been great progress in process technology, tools, and methodologies, resulting in ever more sophisticated platforms, some of the critical design parameters have been moving in the wrong direction. Power is becoming a critical challenge that will require new solutions for both design and manufacturing if the roller-coaster ride is to continue.

  • Future processors: flexible and modular

    Publication Year: 2005 , Page(s): 4 - 6
    Cited by:  Papers (6)

    The ability to continue increasing processor frequency and single-thread performance is being severely limited by exponential increases in leakage and active power. To continue to improve system performance, future designs will rely on increasing numbers of smaller, more power-efficient cores and special-purpose accelerators integrated on a chip. In this paper, we describe how these trends are leading to more modular, SoC-like designs for future processor chips, which can still achieve very high throughput performance while using simplified components and a cost-efficient design methodology.

  • Future wireless convergence platforms

    Publication Year: 2005 , Page(s): 7 - 12

    As wireless platforms converge to multimedia systems, architectures must converge to support voice, data, and video applications. From a processor architecture perspective, a convergent device will require support for signal processing (both audio and video), control code, and Java execution. Traditionally, wireless communications systems have been implemented in hardware. Convergent devices must be able to roam seamlessly across multiple communications systems. To avoid excessive hardware costs, a Software Defined Radio (SDR) approach offers a programmable and dynamically reconfigurable method of reusing hardware to implement physical-layer processing. In this paper, we discuss trends in wireless platforms, which are inherently convergence platforms. We also present the Sandbridge platform, a state-of-the-art example that supports both communications and multimedia application processing. The architecture efficiently executes Java, Digital Signal Processing (DSP), and control code. Architectural features that reduce power dissipation and enable real-time processing are described. All of the communications and multimedia processing is executed completely in software, without specialized hardware support. The processor is programmed in C, with supercomputer-class compiler support for automatic vectorization, multithreading, and DSP semantic analysis.

  • A core flight software system

    Publication Year: 2005 , Page(s): 13 - 14

    No two flight missions are alike; hence, development and on-orbit software costs are high. Software portability and adaptability across hardware platforms and operating systems have been minimal at best. Standard interfaces across applications and/or common applications are almost non-existent. To reduce flight software costs, these issues must be addressed. This presentation describes how the Flight Software Branch at Goddard Space Flight Center has architected a solution to these problems.

  • Conflict analysis in multiprocess synthesis for optimized system integration

    Publication Year: 2005 , Page(s): 15 - 20
    Cited by:  Patents (1)

    This paper presents a novel approach to multiprocess synthesis supporting well-tailored module integration at the system level. The goal is to extend the local scope of existing architectural synthesis approaches in order to apply global optimization techniques across process bounds for shared system resources (e.g., memories, buses, global ALUs) during scheduling and binding. This allows an area-efficient implementation of untimed or cycle-fixed multiprocess specifications at the RT or algorithmic level of abstraction. Furthermore, this approach supports environment-oriented synthesis for optimized module integration by scheduling accesses to global resources with respect to the access schedules of other modules communicating with the same global resources. As a result, dynamic access conflicts can be avoided by construction, and hence there is no need for dynamic arbitration of bus and memory accesses with potentially unpredictable timing behavior.

  • A cycle-accurate compilation algorithm for custom pipelined datapaths

    Publication Year: 2005 , Page(s): 21 - 26
    Cited by:  Papers (11)

    Traditional high-level synthesis (HLS) techniques generate a datapath and controller for a given behavioral description. The growing wiring cost and delay of today's technologies require aggressive optimizations, such as interconnect pipelining, that cannot be applied after generating the datapath without invalidating the schedule. On the other hand, increasing manufacturing complexities demand approaches that favor design for manufacturability (DFM). To address these problems, we propose an approach in which the datapath of the architecture is fully allocated before scheduling and binding. We compile a C program directly to the datapath and generate the controller. We can support the entire ANSI C syntax because the datapath can be as complex as the datapath of a processor. Since there is no instruction abstraction in this architecture, we call it a No-Instruction-Set Computer (NISC). As the first step towards realization of a NISC-based design flow, we present an algorithm that maps an application onto a given datapath by performing scheduling and binding simultaneously. With this algorithm, we achieved up to 70% speedup on a NISC with a datapath similar to that of a MIPS processor, compared to code compiled with gcc for MIPS. The algorithm also efficiently handles different datapath features such as pipelining, forwarding, and multi-cycle units.

  • Highly flexible multi-mode system synthesis

    Publication Year: 2005 , Page(s): 27 - 32
    Cited by:  Papers (2)

    Multi-mode systems have emerged as an area- and power-efficient approach to implementing multiple time-wise mutually exclusive algorithms and applications in a single hardware space. These systems have limited flexibility, and temporal separation between modes is achieved by changing only the dataflow between components. This paper presents a synthesis methodology for integrating flexible components and controllers into primarily fixed-logic multi-mode systems, thereby increasing their overall flexibility and efficiency. The components are built using a technique called small-scale reconfigurability that provides the necessary flexibility without the penalties associated with general-purpose reconfigurable logic. The reconfiguration latency is small, enabling both inter-mode and intra-mode reconfiguration of components. Datapath and controller area and power consumption are reduced beyond what current multi-mode systems provide, without sacrificing performance. The results show an average 7% reduction in datapath component area, 26% reduction in register area, 36% reduction in interconnect MUX cost, and a 68% reduction in the number of controller signals for a set of benchmark 32-bit signal processing applications. There is also an average 38% increase in component utilization.

  • Energy-efficient address translation for virtual memory support in low-power and real-time embedded processors

    Publication Year: 2005 , Page(s): 33 - 38

    In this paper we present an application-driven address translation scheme for low-power and real-time embedded processors with virtual memory support. The power inefficiency and nondeterministic execution times of address-translation mechanisms have been major barriers to adopting and utilizing the benefits of virtual memory in embedded processors with low-power and real-time constraints. To address this problem, we propose a novel Customizable Translation Table (CTT) organization, where application knowledge of the virtual memory footprint is used to eliminate conflicts in the hardware translation buffer and thus achieve tag-free address translation lookups. The set of virtual pages is partitioned into groups such that, for each group, only a few of the least significant bits are used as an index to obtain the physical page number. We outline an efficient compile-time algorithm for identifying these groups and allocating their translation entries optimally into the CTT. The proposed methodology relies on the combined efforts of the compiler, operating system, and hardware architecture to achieve a significant power reduction. The experiments that we have performed on a set of embedded applications show power reductions in the range of 55% to 80% compared to a general-purpose Translation Lookaside Buffer (TLB).

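    The tag-free lookup idea can be made concrete: if the low-order bits of a group's virtual page numbers are already mutually distinct, those bits index a table directly and no tag comparison (and hence no tag array power) is needed. A sketch under assumed page numbers; the paper's grouping algorithm is more sophisticated than this brute-force bit search:

```python
# Sketch of the tag-free lookup idea behind a customizable translation table:
# for a group of virtual page numbers known at compile time, find the fewest
# low-order bits that index them without conflicts, so no tag compare is
# needed. Group contents and the page mapping below are illustrative.

def min_index_bits(vpns):
    """Smallest k such that the low k bits of the vpns are all distinct."""
    k = 0
    while len({v & ((1 << k) - 1) for v in vpns}) < len(vpns):
        k += 1
    return k

def build_table(page_map):
    """page_map: {vpn: ppn}. Returns (k, table) supporting tag-free lookups."""
    k = min_index_bits(list(page_map))
    table = [None] * (1 << k)
    for vpn, ppn in page_map.items():
        table[vpn & ((1 << k) - 1)] = ppn
    return k, table

page_map = {0x10: 7, 0x21: 3, 0x32: 9, 0x43: 5}   # illustrative group
k, table = build_table(page_map)
assert all(table[v & ((1 << k) - 1)] == p for v, p in page_map.items())
print(k)  # -> 2 (two low bits suffice for this group)
```

    The table access is then a plain indexed read, which is what makes the lookup both fast and deterministic.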
  • Automated data cache placement for embedded VLIW ASIPs

    Publication Year: 2005 , Page(s): 39 - 44
    Cited by:  Papers (1)

    Memory bandwidth issues present a formidable bottleneck to accelerating embedded applications, particularly data bandwidth for multiple-issue VLIW processors. Providing an efficient ASIP data cache solution requires that the cache design be tailored to the target application. Multiple caches, or caches with multiple ports, allow simultaneous parallel access to data, alleviating the bandwidth problem if data is placed effectively. We present a solution that greatly simplifies the creation of targeted caches and automates the process of explicitly allocating individual memory accesses to caches and banks. The effectiveness of our solution is demonstrated with experimental results.

  • An efficient direct mapped instruction cache for application-specific embedded systems

    Publication Year: 2005 , Page(s): 45 - 50
    Cited by:  Papers (3)  |  Patents (1)

    Caches may consume half of a microprocessor's total power, and cache misses incur off-chip memory accesses, which are both time-consuming and energy-costly. Therefore, minimizing cache power consumption and reducing cache misses are important for reducing the total energy consumption of embedded systems. Direct-mapped caches consume much less power than same-sized set-associative caches, but have a poorer hit rate on average. Through experiments, we observe that the memory space of direct-mapped instruction caches is not used efficiently in most embedded applications. We design an efficient cache: a configurable instruction cache that can be tuned to utilize the cache sets efficiently for a particular application, exploiting cache memory more effectively through index remapping. Experiments on 11 benchmarks drawn from MediaBench show that the efficient cache achieves almost the same miss rate as a conventional two-way set-associative cache on average, with total memory-access energy savings of 30% compared with that cache.

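    The index-remapping idea is easy to demonstrate with a small miss-rate simulator: a trace that thrashes a direct-mapped cache under plain modulo indexing stops thrashing once the index function spreads the conflicting blocks across sets. The XOR-based remap below is a generic stand-in to show where an application-specific mapping plugs in; it is not the paper's tuning scheme:

```python
# Miss counter for a direct-mapped cache with an optional index-remapping
# hook. The XOR fold is a generic illustrative remap, NOT the paper's scheme.

SETS = 4
LINE = 16  # bytes per line

def misses(addrs, remap=None):
    cache = [None] * SETS          # one resident block per set (direct mapped)
    n = 0
    for a in addrs:
        block = a // LINE
        idx = remap(block) % SETS if remap else block % SETS
        if cache[idx] != block:    # tag mismatch -> miss, fill the line
            cache[idx] = block
            n += 1
    return n

def xor_remap(block):
    return block ^ (block >> 2)    # fold upper bits into the index

# Two interleaved streams whose blocks collide in every set under plain
# modulo indexing (blocks 0/4 and 1/5 map to the same sets):
trace = [i * LINE for i in (0, 4, 0, 4, 1, 5, 1, 5)]
print(misses(trace), misses(trace, xor_remap))  # -> 8 4
```

    With remapping, only the four cold misses remain; all conflict misses disappear for this trace.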
  • Shift buffering technique for automatic code synthesis from synchronous dataflow graphs

    Publication Year: 2005 , Page(s): 51 - 56
    Cited by:  Papers (1)

    This paper proposes a new, efficient buffer management technique called shift buffering for automatic code synthesis from synchronous dataflow (SDF) graphs. Two previous buffer management methods, linear buffering and modulo (or circular) buffering, assume that samples are queued in the arc buffers in arrival order and are accessed by moving buffer indices. But both methods have significant overhead for general multi-rate systems: the linear buffering method requires large buffers, and the modulo buffering method incurs the run-time overhead of buffer index computation. The proposed shift buffering method shifts samples rather than moving buffer indices. We develop optimal shift buffering algorithms to minimize the number of shifted samples. Our experimental results show that the proposed algorithm saves up to 90% of the performance overhead while requiring the same amount of buffer memory as modulo buffering. Considering the sample copy overhead, shift buffering is applicable when memory size is more critical than performance overhead and the shifting overhead is less than the modulo addressing overhead. Another advantage of the shift buffering technique is that it supports library code written with the linear buffering assumption, which is important in practice.

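    The contrast between modulo and shift buffering comes down to where the bookkeeping happens: modulo buffering moves read/write indices, paying index arithmetic on every access, while shift buffering copies leftover samples to the front so every read starts at index 0. A minimal sketch of the shift side, with illustrative producer/consumer rates:

```python
# Minimal sketch of shift buffering on an SDF arc (rates are illustrative).
# After the consumer fires, leftover samples are shifted to the front of the
# buffer, so reads always start at index 0 and no modulo index arithmetic is
# needed -- the cost moves into the occasional copy instead.

class ShiftBuffer:
    def __init__(self):
        self.buf = []

    def push(self, samples):
        """Producer firing: append produced samples."""
        self.buf.extend(samples)

    def pop(self, n):
        """Consumer firing: read n samples, then shift the rest to the front."""
        head = self.buf[:n]
        self.buf[:] = self.buf[n:]   # the "shift": leftovers move to index 0
        return head

b = ShiftBuffer()
b.push([1, 2])       # producer fires twice at rate 2
b.push([3, 4])
print(b.pop(3))      # consumer fires at rate 3 -> [1, 2, 3]
print(b.buf)         # leftover sample now at the front -> [4]
```

    This is also why library code written for linear buffers works unchanged: it can always assume its input starts at the beginning of the buffer.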
  • Implementation of dynamic streaming applications on heterogeneous multi-processor architectures

    Publication Year: 2005 , Page(s): 57 - 62
    Cited by:  Papers (2)

    System design based on static task graphs does not match well with modern consumer electronic devices running dynamic stream processing applications. We propose the TTL API for task graph reconfiguration services, which can be used to describe the dynamic behaviour of applications. We demonstrate an efficient implementation of the TTL API on a heterogeneous multi-processor architecture. It is possible to design dynamic streaming applications with reusable, reconfiguration-aware tasks, and we argue that the TTL API serves as a good starting point for standardization.

  • Using minimal minterms to represent programmability

    Publication Year: 2005 , Page(s): 63 - 68
    Cited by:  Papers (2)

    We address the problem of formally representing the programmability of a system. We define the programmability of a system as the set of valid execution paths that can be configured statically by software. We formally represent this programmability as a Boolean function. From this representation, we extract a subset of on-set minterms that we call minimal minterms. We prove that these minimal minterms represent the set of smallest schedulable atomic actions of the system, and that we can use a special generator relation to determine if subsets of these actions can be executed in parallel. We also prove that given an arbitrary Boolean function we can extract the minimal minterms and recreate the entire on-set by applying the generator relation to every element of the power set of the set of minimal minterms. Thus, the minimal minterms represent the complete instruction set supported by the system, and the generator relation represents the inherent parallelism among the instructions. Furthermore, we automatically generate the required software development tools and hardware implementation from this representation of programmability. Finally, we show that we can efficiently compute the minimal minterms and apply the generator relation to verify parallel executions on interesting data path systems.

  • Key research problems in NoC design: a holistic perspective

    Publication Year: 2005 , Page(s): 69 - 74
    Cited by:  Papers (52)

    Networks-on-Chip (NoCs) have recently been proposed as a promising solution to complex on-chip communication problems. The lack of a unified representation of applications and architectures makes NoC problem formulation and classification both difficult and obscure. To remedy this situation, we provide a general description of NoC architectures and applications and then enumerate several outstanding research problems (denoted P1-P8) organized under three topics: communication infrastructure synthesis, communication paradigm selection, and application mapping optimization. Far from being exhaustive, the discussed problems are deemed essential for future NoC research.

  • A unified approach to constrained mapping and routing on network-on-chip architectures

    Publication Year: 2005 , Page(s): 75 - 80
    Cited by:  Papers (25)  |  Patents (2)

    One of the key steps in Network-on-Chip (NoC) based design is the spatial mapping of cores and the routing of the communication between those cores. Known solutions to the mapping and routing problem first map cores onto a topology and then route communication, using separate and possibly conflicting objective functions. In this paper we present a unified single-objective algorithm, called Unified MApping, Routing and Slot allocation (UMARS). As the main contribution, we show how to couple path selection, mapping of cores, and TDMA time-slot allocation such that the network required to meet the constraints of the application is minimized. The time-complexity of UMARS is low, and experimental results indicate a run-time only 20% higher than that of path selection alone. We apply the algorithm to an MPEG decoder System-on-Chip (SoC), reducing area by 33%, power by 35%, and worst-case latency by a factor of four over a traditional multi-step approach.

  • Spatial division multiplexing: a novel approach for guaranteed throughput on NoCs

    Publication Year: 2005 , Page(s): 81 - 86
    Cited by:  Papers (11)  |  Patents (2)

    To ensure low power consumption while maintaining flexibility and performance, future Systems-on-Chip (SoC) will combine several types of processor cores and data memory units of widely different sizes. To interconnect the IPs of these heterogeneous platforms, Networks-on-Chip (NoC) have been proposed as an efficient and scalable alternative to shared buses. NoCs can provide throughput and latency guarantees by establishing virtual circuits between source and destination. State-of-the-art NoCs currently exploit Time-Division Multiplexing (TDM) to share network resources among virtual circuits, but this typically results in high network area and energy overhead with long circuit set-up times. We propose an alternative solution based on Spatial Division Multiplexing (SDM). This paper describes our first design of an SDM-based network, discusses design alternatives for network implementation, and shows why SDM should be better adapted to NoCs than TDM for a limited number of circuits. Our case study clearly illustrates the advantages of our technique over TDM in terms of energy consumption, area overhead, and flexibility. SDM thus deserves to be explored in more depth, in particular in combination with TDM in a hybrid scheme.

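    The basic TDM/SDM trade can be seen with back-of-the-envelope arithmetic: TDM grants a circuit some of the time slots at full link width, while SDM grants it some of the wires all of the time. The raw bandwidth for an equal share is identical; what differs is the sharing mechanism (slot tables and set-up time versus wire partitioning). A sketch with assumed link parameters:

```python
# Back-of-the-envelope comparison of TDM and SDM link sharing. The link
# parameters below (width, clock, slots per frame) are illustrative.

WIRES, F_MHZ, SLOTS = 32, 500, 8

def tdm_bw(slots_granted):
    """Mbit/s for a circuit granted `slots_granted` of SLOTS time slots,
    using the full link width during its slots."""
    return WIRES * F_MHZ * slots_granted / SLOTS

def sdm_bw(wires_granted):
    """Mbit/s for a circuit granted `wires_granted` of WIRES wires,
    using them every cycle."""
    return wires_granted * F_MHZ

# A quarter of the link either way yields the same raw bandwidth:
print(tdm_bw(2), sdm_bw(8))  # -> 4000.0 4000 (Mbit/s each)
```

    The granularity differs, though: TDM can allocate bandwidth in 1/SLOTS steps of the full link, whereas SDM steps in units of one wire, which is part of why the paper argues SDM suits a limited number of circuits.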
  • Increasing on-chip memory space utilization for embedded chip multiprocessors through data compression

    Publication Year: 2005 , Page(s): 87 - 92
    Cited by:  Papers (1)

    Minimizing the number of off-chip memory references is very important in chip multiprocessors from both the performance and power perspectives. To achieve this, the distance between successive reuses of the same data block must be reduced. However, this may not be possible in many cases due to data dependences between computations assigned to different processors. This paper focuses on software-managed on-chip memory space utilization for embedded chip multiprocessors and proposes a compression-based approach to reduce the memory space occupied by data blocks with large inter-processor reuse distances. The proposed approach has two major components: a compiler and an ILP (integer linear programming) solver. The compiler's job is to analyze the application code and extract information on data access patterns. This access pattern information is then passed to our ILP solver, which determines the data blocks to compress/decompress and the times (the program points) at which to compress/decompress them. We tested the effectiveness of this ILP-based approach using access patterns extracted by our compiler from application codes. Our experimental results reveal that the proposed approach is very effective in reducing power consumption. Moreover, it leads to lower energy consumption than an alternate scheme evaluated in our experiments for all the test cases studied.

  • CRAMES: compressed RAM for embedded systems

    Publication Year: 2005 , Page(s): 93 - 98
    Cited by:  Papers (5)  |  Patents (1)

    Memory is a scarce resource in many embedded systems. Increasing memory often increases packaging and cooling costs, size, and energy consumption. This paper presents CRAMES, an efficient software-based RAM compression technique for embedded systems. The goal of CRAMES is to dramatically increase effective memory capacity without hardware design changes, while maintaining high performance and low energy consumption. To achieve this goal, CRAMES takes advantage of an operating system's virtual memory infrastructure by storing swapped-out pages in compressed format. It dynamically adjusts the size of the compressed RAM area, protecting applications capable of running without it from performance or energy consumption penalties. In addition to compressing working data sets, CRAMES also enables efficient in-RAM filesystem compression, thereby further increasing RAM capacity. CRAMES was implemented as a loadable module for the Linux kernel and evaluated on a battery-powered embedded system. Experimental results indicate that CRAMES is capable of doubling the amount of RAM available to applications. Execution time and energy consumption for a broad range of examples increase only slightly, by averages of 0.35% and 4.79%. In addition, this work identifies the software-based compression algorithms that are most appropriate for low-power embedded systems.

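    The core mechanism, storing swapped-out pages compressed in a RAM-resident area and decompressing them on demand, can be sketched in a few lines. zlib stands in here for whichever algorithm a real system would select; the page size and contents are illustrative:

```python
import zlib

# Sketch of the compressed-swap idea: pages evicted from RAM are stored
# compressed in a RAM-resident area and decompressed on a page fault.
# zlib is a stand-in compressor; page size and data are illustrative.

PAGE_SIZE = 4096
compressed_area = {}   # page number -> compressed bytes

def swap_out(page_no, data):
    """Evict a page: compress it into the in-RAM swap area."""
    assert len(data) == PAGE_SIZE
    compressed_area[page_no] = zlib.compress(data)

def swap_in(page_no):
    """Fault a page back in: decompress and remove it from the area."""
    return zlib.decompress(compressed_area.pop(page_no))

page = (b"embedded " * 456)[:PAGE_SIZE]   # compressible working-set data
swap_out(7, page)
stored = len(compressed_area[7])
assert swap_in(7) == page                 # lossless round trip
print(f"{PAGE_SIZE} -> {stored} bytes")
```

    The effective capacity gain depends entirely on how compressible the working set is, which is why the paper's algorithm selection matters.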
  • Efficient behavior-driven runtime dynamic voltage scaling policies

    Publication Year: 2005 , Page(s): 105 - 110
    Cited by:  Papers (3)

    Power consumption has long been a limiting factor in microprocessor design. In the search for energy-efficiency solutions, dynamic voltage/frequency scaling (DVFS), a technique to vary voltage and frequency on the fly, has emerged as a powerful and practical power/energy reduction technique that exploits computation slack due to relaxed deadlines and memory accesses. DVFS has been implemented in some modern processors such as the Intel XScale and Transmeta Crusoe. Hence, the bulk of research effort has been devoted to developing policies that detect slack and pick appropriate V/f assignments such that energy is minimized while meeting performance requirements. Since slack is a product of memory accesses and relaxed deadlines, the number of instances and the duration of available slack are highly dependent on runtime program behavior. Runtime DVFS policies must take program characteristics into consideration in order to achieve significant energy savings. In this paper, we characterize program behavior and classify programs in terms of memory access behavior. We propose a runtime DVFS policy that takes into consideration the characteristics of program behavior for each category. We then examine the efficiency of the proposed DVFS policies by comparing them with previously derived upper bounds on energy savings. Results show that the proposed runtime DVFS policies approach the upper bounds of energy savings in most cases.

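    In its simplest form, a runtime DVFS policy picks the lowest frequency level that still meets the deadline implied by the available slack. A sketch with assumed frequency levels and a quadratic voltage-scaling energy model; real policies, including the behavior-driven ones proposed here, are considerably more elaborate:

```python
# Minimal DVFS policy sketch: pick the lowest available frequency that still
# finishes `work_cycles` within `deadline_s`, exploiting slack. The frequency
# levels and the V^2-per-cycle energy model below are illustrative.

LEVELS_MHZ = [200, 400, 600, 800, 1000]   # assumed available V/f levels

def pick_level(work_cycles, deadline_s):
    need_mhz = work_cycles / deadline_s / 1e6
    for f in LEVELS_MHZ:                  # lowest feasible level wins
        if f >= need_mhz:
            return f
    return LEVELS_MHZ[-1]                 # deadline unmeetable even flat out

def rel_energy(f_mhz):
    """Per-cycle energy relative to full speed, assuming V scales with f."""
    return (f_mhz / LEVELS_MHZ[-1]) ** 2  # dynamic energy/cycle ~ V^2

f = pick_level(work_cycles=3e8, deadline_s=0.6)   # needs >= 500 MHz
print(f, rel_energy(f))  # -> 600 0.36
```

    The hard part, and the subject of the paper, is estimating the slack at runtime from program behavior rather than assuming it is known.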
  • DVS for buffer-constrained architectures with predictable QoS-energy tradeoffs

    Publication Year: 2005 , Page(s): 111 - 116
    Cited by:  Papers (7)
    Save to Project icon | Click to expandQuick Abstract | PDF file iconPDF (265 KB) |  | HTML iconHTML  

    We present a new scheme for dynamic voltage and frequency scaling (DVS) for processing multimedia streams on architectures with restricted buffer sizes. The main advantage of our scheme over previously published DVS schemes is its ability to provide hard QoS guarantees while still achieving considerable energy savings. Our scheme can handle workloads characterized by both, the data-dependent variability in the execution time of multimedia tasks and the burstiness in the on-chip traffic arising out of multimedia processing. Many previous DVS algorithms capable of handling such workloads rely on control-theoretic feedback mechanisms or prediction schemes based on probabilistic techniques. Usually it is difficult to provide QoS guarantees with such schemes. In contrast, our scheme relies on worst-case interval-based characterization of the workload. The main novelty of our scheme is a combination of offline analysis and runtime monitoring to obtain worst case bounds on the workload and then improving these bounds at runtime. Our scheme is fully scalable and has a bounded application-independent runtime overhead. View full abstract»

  • A system-level methodology for fully compensating process variability impact of memory organizations in periodic applications

    Publication Year: 2005 , Page(s): 117 - 122
    Cited by:  Papers (6)  |  Patents (1)

    Process variability is an emerging problem that becomes worse with each new technology node. Its impact on the performance and energy of memory organizations is severe and degrades system-level parametric yield. In this paper we propose a broadly applicable system-level technique that can guarantee the parametric yield of the memory organization while minimizing the energy overhead associated with variability in the conventional design process. It is based on offering configuration capabilities at the memory level and exploiting them at the system level. This technique can decrease the energy overhead introduced by state-of-the-art process variability compensation techniques, including statistical timing analysis, by up to a factor of 5. In this way we obtain results close to the ideal nominal design.

  • What will system level design be when it grows up?

    Publication Year: 2005 , Page(s): 123

    We have seen growing new interest in Electronic System Level (ESL) architectures, design methods, tools, and implementation fabrics in the last few years. But the picture of which types of and approaches to building embedded systems will become the most widely accepted norms in the future remains fuzzy at best. Everyone wants to know where systems and system design are going "when they grow up", if they ever "grow up". Some of the key questions that need to be answered include: which applications will be key system drivers; which SW and HW architectures will suit them best; how programmable and configurable will they be; will system designers need to deal with physical implementation issues, or will those be hidden behind fabric abstractions and programming models; and what will those abstractions and models be? Moreover, will these abstractions stabilize and remain useful as the underlying technology keeps developing at high speed? This panel consists of proponents of a number of alternative visions for where we will end up, and how we will get there.

  • The design of a smart imaging core for automotive and consumer applications: a case study

    Publication Year: 2005 , Page(s): 124 - 129

    This paper describes the design of a low-cost, low-power smart imaging core that can be embedded in cameras. The core integrates an ARM9 processor, a camera interface, and two dedicated hardware blocks for image processing: a smart imaging coprocessor and an enhanced motion estimator. Both coprocessors were designed using high-level synthesis tools, taking the C programming language as a starting point. The resulting RTL code of each coprocessor has been synthesized and verified on an FPGA board. Two automotive and two mobile smart imaging applications are mapped onto the resulting smart imaging core. The process of mapping the original C++ applications onto the smart imaging core is also presented in this paper.
