System Synthesis, 1999. Proceedings. 12th International Symposium on

Date: 10-12 Nov. 1999

  • Proceedings 12th International Symposium on System Synthesis

  • Table of contents

    Page(s): v - vii
  • Index of authors

    Page(s): 141
  • A graph theoretic approach for design and synthesis of multiplierless FIR filters

    Page(s): 94 - 99

    We present a novel approach for obtaining multiplierless implementations of finite impulse response (FIR) digital filters. The main idea is to reorder the filter coefficients so that an implementation based on differential coefficients requires only a few adders. We represent this problem using a graph in which vertices represent the coefficients and edges represent the resources required when the differential coefficient corresponding to the edge is used in a computation. We also present a graph model for an implementation based on second-order coefficient differences. Finding the optimal solution to the coefficient reordering problem is equivalent to the well-known problem of finding the Hamiltonian path of smallest weight in this graph. We use two approaches to find the smallest weight Hamiltonian cycle: a greedy approach, and the heuristic algorithm proposed by Lin and Kernighan. The power and potential of this approach are demonstrated by presenting results for large filters (lengths up to >300) which show that, in general, for 18-bit coefficients, the total number of adders required per coefficient is less than 2. Hence, high performance and/or low power filters can be designed and synthesized using the proposed approach.
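
    The greedy variant of the coefficient reordering idea can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the edge-weight definition, so the cost model here (number of nonzero bits in a coefficient difference) is hypothetical, as are the names `adder_cost`, `greedy_order`, and `path_cost` and the sample coefficients. The Lin and Kernighan heuristic is not shown.

```python
from typing import List


def adder_cost(value: int) -> int:
    """Assumed cost model: number of nonzero bits in |value|."""
    return bin(abs(value)).count("1")


def greedy_order(coeffs: List[int]) -> List[int]:
    """Greedy nearest-neighbour path through the coefficient graph."""
    remaining = list(coeffs)
    # Start from the coefficient that is cheapest to realise directly.
    current = min(remaining, key=adder_cost)
    remaining.remove(current)
    path = [current]
    while remaining:
        # Follow the cheapest edge (difference) to an unvisited coefficient.
        nxt = min(remaining, key=lambda c: adder_cost(c - current))
        remaining.remove(nxt)
        path.append(nxt)
        current = nxt
    return path


def path_cost(path: List[int]) -> int:
    """Cost of the first coefficient plus all first-order differences."""
    return adder_cost(path[0]) + sum(
        adder_cost(b - a) for a, b in zip(path, path[1:])
    )


coeffs = [7, 56, 9, 57, 8]  # hypothetical integer filter coefficients
ordered = greedy_order(coeffs)
print(ordered, path_cost(coeffs), path_cost(ordered))
```

    Under this toy cost model the original ordering costs 16 and the greedy ordering costs 6. A greedy walk gives no optimality guarantee, which is consistent with the abstract also using a stronger heuristic.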

  • RTGEN: an algorithm for automatic generation of reservation tables from architectural descriptions

    Page(s): 44 - 50

    Reservation tables (RTs) have long been used to detect conflicts between operations that simultaneously access the same architectural resource. Traditionally, these RTs have been specified explicitly by the designer. However, the increasing complexity of modern processors makes the manual specification of RTs cumbersome and error-prone. Furthermore, manual specification of such conflict information is infeasible for supporting rapid architectural exploration. We present an algorithm to automatically generate RTs from a high-level processor description, with the goal of avoiding manual specification of RTs, resulting in more concise architectural specifications and also supporting faster turn-around time in design space exploration. We demonstrate the utility of our approach on a set of experiments using the TI C6201 VLIW DSP and DLX processor architectures, and a suite of multimedia and scientific applications.
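
    To make the role of a reservation table concrete, the sketch below shows how two RTs can be checked for a resource conflict. It illustrates how RTs are used, not the RTGEN generation algorithm itself. The representation (resource name mapped to the set of cycles, relative to issue, in which it is busy) and the `MUL` and `ADD` tables are hypothetical.

```python
from typing import Dict, List, Set

# A reservation table maps each resource to the cycles (relative to the
# operation's issue cycle) in which the operation occupies that resource.
ReservationTable = Dict[str, Set[int]]


def conflicts(first: ReservationTable, second: ReservationTable,
              distance: int) -> bool:
    """True if `second`, issued `distance` cycles after `first`, collides."""
    for resource, cycles in first.items():
        later = second.get(resource, set())
        # `second` uses the resource at absolute cycle distance + c2, so a
        # clash with cycle c of `first` requires c2 == c - distance.
        if any((c - distance) in later for c in cycles):
            return True
    return False


def forbidden_distances(first: ReservationTable, second: ReservationTable,
                        max_distance: int) -> List[int]:
    """Issue distances in 0..max_distance that cause a resource conflict."""
    return [d for d in range(max_distance + 1) if conflicts(first, second, d)]


# Hypothetical tables: both operations share a single write-back port.
MUL: ReservationTable = {"mult": {1, 2}, "wb": {3}}
ADD: ReservationTable = {"alu": {1}, "wb": {2}}

print(forbidden_distances(MUL, ADD, 4))
```

    Here an `ADD` issued one cycle after a `MUL` would need the write-back port in the same cycle, so distance 1 is the only forbidden one in this range.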

  • Loop alignment for memory accesses optimization

    Page(s): 71 - 77

    Portable and embedded systems today support increasingly complex applications such as multimedia. These applications and submicron technologies have made the power consumption criterion crucial. We propose new techniques for optimizing the behavioral description of an integrated system before hardware/software partitioning (codesign). These transformations are performed on “for” loops, which constitute the main parts of multimedia code that handle arrays. We present two new (polynomial) techniques for minimizing memory accesses in loop nests by optimizing temporal data locality.

  • A framework for scheduling and context allocation in reconfigurable computing

    Page(s): 134 - 140

    Reconfigurable computing is emerging as a viable design alternative to implement a wide range of computationally intensive applications. The scheduling problem becomes a critical issue in achieving the high performance that these kinds of applications demand. The paper describes the different aspects of the scheduling problem in a reconfigurable architecture. We also propose a general strategy for performing, at compilation time, a scheduling that includes all possible optimizations regarding context (configuration) and data transfers. In particular, we focus on the methodology and mechanisms to solve the context scheduling. Some experimental results are presented to validate our assumptions. Finally, the problem of data transfers is formulated, to be addressed in future work.

  • Catalyst: a DSIP design flow development in industry

    Page(s): 122 - 127

    The Motorola System on Chip Design Technologies (SoCDT) team aims at providing a system design environment for its customers. The Toulouse branch concentrates on design efforts incorporating DSP functionality. This is referred to as the Catalyst methodology. We found that in current systems, very often the software development cycle is longer than that of the silicon development. To ease the software burden, we have changed the silicon architecture and its flow to permit the DSP software to be written in the C language instead of assembler code, as is normally done. The resulting architecture is domain specific; it is smaller, has a reduced design cycle and is simpler to implement because it is tuned to the application software we are providing. The paper describes the methodology which we are developing to create domain specific architectures; it shows one example architecture and the aspects which are critical for industry acceptance.

  • A buffer merging technique for reducing memory requirements of synchronous dataflow specifications

    Page(s): 78 - 84

    Synchronous dataflow (SDF), a subset of dataflow, has proven to be a good match for specifying DSP programs. Because of the limited amount of memory in embedded DSPs, a key problem during software synthesis from SDF specifications is the minimization of the memory used by the target code. We develop a powerful formal technique called buffer merging that attempts to overlay buffers in the SDF graph systematically, in order to drastically reduce data buffering requirements. We give a polynomial-time algorithm based on this formalism, and show that code synthesized using this technique results in more than a 60% reduction of the buffering memory consumption compared to existing techniques.

  • Optimized system synthesis of complex RT level building blocks from multirate dataflow graphs

    Page(s): 38 - 43

    In order to cope with the ever increasing complexity of today's application specific integrated circuits, a building block based design methodology is established. The system is composed of high level building blocks, of which some are reused from previous designs while others might have been created by behavioral synthesis. In data flow oriented designs, these blocks usually have complex non-matching interface properties, making it necessary to generate additional interfacing and controlling hardware to integrate them into an operable system. An RTL-HDL code generation from a synchronous data flow representation is introduced that efficiently automates the generation of the required additional hardware. While existing code generation approaches impose strong limitations on the building block interfacing properties, our method enables the integration of components that access their ports periodically with arbitrary patterns. In order to reduce interface register cost, a minimum-area retiming approach, which is known to have polynomial time complexity, is taken to determine optimum building block activation times. The code generation methodology is compared to an existing approach using a simple case study.

  • Middleware techniques and optimizations for real-time, embedded systems

    Page(s): 12 - 16

    Due to constraints on footprint, performance, and weight/power consumption, real time, embedded system software development has historically lagged mainstream software development methodologies. As a result, real time, embedded software systems are costly to evolve and maintain. Moreover, they are often so specialized that they cannot adapt readily to meet new market opportunities or technology innovations. To further exacerbate matters, a growing class of real time, embedded systems require end-to-end support for various quality of service (QoS) aspects, such as bandwidth, latency, jitter, and dependability. These applications include telecommunication systems (e.g., call processing and switching), avionics control systems (e.g., operational flight programs for fighter aircraft), and multimedia (e.g., Internet streaming video and wireless PDAs). In addition to requiring support for stringent QoS requirements, these systems are often targeted at highly competitive markets, where deregulation and global competition are motivating the need for increased software productivity and quality. Requirements for increased software productivity and quality motivate the use of Distributed Object Computing (DOC) middleware (A. Gokhale and D.C. Schmidt, 1999). Middleware resides between client and server applications and services in complex software systems. The goal of middleware is to integrate reusable software components to decrease the cycle time and effort required to develop high quality real time and embedded applications and services.

  • Pre-fetching for improved core interfacing

    Page(s): 51 - 55

    Reuse of cores can reduce design time for systems-on-a-chip. Such reuse is dependent on being able to easily interface a core to any bus. To enable such interfacing, many propose separating a core's interface from its internals. However, this separation can lead to a performance penalty when reading a core's internal registers. We introduce pre-fetching, which is analogous to caching, as a technique to reduce or eliminate this performance penalty, involving a tradeoff with power and size. We describe the pre-fetching technique, classify different types of registers, describe our initial pre-fetching architectures and heuristics for certain classes of registers, and highlight experiments demonstrating the performance improvements and size/power tradeoffs.

  • System synthesis of synchronous multimedia applications

    Page(s): 128 - 133

    Modern system design is being increasingly driven by applications such as multimedia and wireless sensing and communications, which all have intrinsic quality of service (QoS) requirements, such as throughput, error-rate, and resolution. One of the most crucial QoS guarantees that the system has to provide is the timing constraints among the interacting media (synchronization) and within each medium (latency). We have developed the first framework for systems design with timing QoS guarantees, latency and synchronization. In particular, we address how to design a system-on-chip with minimal silicon area to meet timing constraints. We propose a two-phase design methodology. In the first phase, we select an architecture which serves the needs of synchronous low latency applications well. In the second phase, for a given processor configuration, we use our new scheduler in such a way that storage requirements are minimized. We have developed scheduling algorithms that solve the problem optimally for a-priori specified applications. The algorithms have been implemented and their effectiveness demonstrated on a set of simulated MPEG streams from popular movies.

  • Loop scheduling and partitions for hiding memory latencies

    Page(s): 64 - 70

    Partition scheduling with prefetching (PSP) is a memory latency hiding technique which combines the loop pipelining technique with data prefetching. In PSP, the iteration space is first divided into regular partitions. Then two parts of the schedule, the ALU part and the memory part, are produced and balanced to produce an overall schedule with high throughput. These two parts are executed simultaneously, and hence the remote memory latencies are overlapped. We study the optimal partition shape and size so that a well balanced overall schedule can be obtained. Experiments on DSP benchmarks show that the proposed methodology consistently produces optimal or near optimal solutions.

  • Path-based edge activation for dynamic run-time scheduling

    Page(s): 30 - 36

    We present a tool that performs real time analysis and dynamic execution of software tasks in a mixed hardware-software system with a custom run time scheduler. The tasks in hardware and software have control flow constraints (precedence and alternative execution), resource constraints, relative timing constraints, and a rate constraint. The custom run time scheduler dynamically executes tasks in different orders, based on the conditional execution path, such that a hard real time rate constraint can be predictably met. We describe the task modelling, run time scheduler implementation, and real time analysis. We introduce the concept of path based edge activation utilizing conditional edges. We show how our approach fits into an overall tool flow and target architecture. Finally, we conclude with a sample application of the system to a design example.

  • Automatic architectural synthesis of VLIW and EPIC processors

    Page(s): 107 - 113

    The paper describes a mechanism for automatic design and synthesis of very long instruction word (VLIW), and its generalization, explicitly parallel instruction computing (EPIC) processor architectures starting from an abstract specification of their desired functionality. The process of architecture design makes concrete decisions regarding the number and types of functional units, number of read/write ports on register files, the datapath interconnect, the instruction format, its decoding hardware, and the instruction unit datapath. The processor design is then automatically synthesized into a detailed RTL-level structural model in VHDL, along with an estimate of its area. The system also generates the corresponding detailed machine description and instruction format description that can be used to retarget a compiler and an assembler respectively. All this is part of an overall design system, called Program-In-Chip Out (PICO), which has the ability to perform automatic exploration of the architectural design space while customizing the architecture to a given application and making intelligent, quantitative, cost-performance tradeoffs.

  • Compressed code execution on DSP architectures

    Page(s): 56 - 61

    Decreasing the program size has become an important goal in the design of embedded systems targeted to mass production. This problem has led to a number of efforts aimed at designing processors with shorter instruction formats (e.g. ARM Thumb and MIPS16), or that can execute compressed code (e.g. IBM CodePack PowerPC). However, much of this work has been directed towards RISC architectures. This paper proposes a solution to the problem of executing compressed code on embedded DSPs. The experimental results reveal an average compression ratio of 75% for typical DSP programs running on the TMS320C25 processor. This number includes the size of the decompression engine. Decompression is performed by a state machine that translates codewords into instruction sequences during program execution. The decompression engine is synthesized using the AMS standard cell library and a 0.6 μm 5V technology. Gate level simulation of the decompression engine reveals minimum operation frequencies of 150 MHz.

  • Exploration and synthesis of dynamic data sets in telecom network applications

    Page(s): 85 - 91

    We present a novel exploration and optimization method to select customized implementations for dynamic data sets, as encountered in telecom network, database and multimedia applications. Our method fits in the context of embedded system synthesis for such applications, and enables us to further raise the abstraction level of the initial specification, where dynamic data sets can be specified without low-level details. Our method is suited for hardware and software implementations. It mainly aims at minimizing the memory power consumption, although it can also be driven by other cost functions such as area or performance. Compared with existing methods, it can save up to 2/3 of the memory power consumption and 3/4 of the memory area.

  • Event-driven power management of portable systems

    Page(s): 18 - 23

    The policy optimization problem for dynamic power management has received considerable attention in the recent past. We formulate policy optimization as a constrained optimization problem on continuous-time semi-Markov decision processes (SMDP). SMDPs generalize the stochastic optimization approach based on discrete-time Markov decision processes (DTMDP) presented in earlier work by relaxing two limiting assumptions. In SMDPs, decisions are made at each event occurrence instead of at each discrete time interval as in DTMDP, thus saving power and giving higher performance. In addition, SMDPs can have general inter-state transition time distributions, allowing for greater generality and accuracy in modeling real-life systems where transition times between power states are not geometrically distributed.

  • Bit-width selection for data-path implementations

    Page(s): 114 - 119

    Specifications of data computations may not necessarily describe the ranges of the intermediate results that can be generated. However, such information is critical to determine the bandwidths of the resources required for a data-path implementation. We present a novel approach based on interval computations that provides not only guaranteed range estimates that take into account dependencies between variables, but also estimates of their probability density functions that can be used when some truncation must be performed due to constraints in the specification. Results show that interval based estimates are obtained in reasonable times and are more accurate than those provided by independent range computation, thus leading to substantial reductions in area and latency of the corresponding data-path implementation.
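
    The basic idea of deriving bit-widths from ranges can be sketched with plain interval arithmetic. This is a minimal illustration, not the method of the paper: the `Interval` class and `signed_bits` helper are hypothetical, and this naive version treats every operand as independent, which is exactly the limitation the abstract says the proposed approach avoids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed integer range [lo, hi] of a signal."""
    lo: int
    hi: int

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "Interval") -> "Interval":
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other: "Interval") -> "Interval":
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))


def signed_bits(iv: Interval) -> int:
    """Smallest two's-complement width that holds every value in iv."""
    bits = 1
    while not (-(1 << (bits - 1)) <= iv.lo and iv.hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


x = Interval(-8, 7)   # hypothetical 4-bit signed input
y = Interval(0, 15)   # hypothetical input known to be non-negative

print(x * y, signed_bits(x * y))
print(x + y, signed_bits(x + y))
print(x - x, signed_bits(x - x))  # true range is {0}; naive result is wider
```

    The last line shows the dependency problem: `x - x` is always zero, yet independent range computation reports [-15, 15] and therefore 5 bits.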

  • Real-time task scheduling for a variable voltage processor

    Page(s): 24 - 29

    The paper presents a real time task scheduling technique for a variable voltage processor, which can vary its supply voltage dynamically. Using such a processor, running tasks with a low supply voltage leads to drastic power reduction. However, reducing the supply voltage may violate real time constraints. We propose a scheduling technique which simultaneously assigns both CPU time and a supply voltage to each task so as to minimize total energy consumption while satisfying all real time constraints. Experimental results demonstrate the effectiveness of the proposed technique.
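
    The tradeoff between supply voltage and deadlines can be illustrated for a single task. This toy sketch is not the scheduling technique of the paper: the operating points in `LEVELS`, the quadratic energy model in `relative_energy`, and the function names are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical operating points: (supply voltage in V, clock frequency in MHz).
LEVELS: List[Tuple[float, float]] = [(2.5, 25.0), (3.3, 33.0), (5.0, 50.0)]


def relative_energy(cycles: int, voltage: float) -> float:
    """Assumed model: switching energy per cycle scales with voltage squared."""
    return cycles * voltage ** 2


def lowest_feasible_level(cycles: int,
                          deadline_us: float) -> Optional[Tuple[float, float]]:
    """Lowest-voltage operating point that still meets the deadline."""
    for voltage, freq_mhz in sorted(LEVELS):
        # cycles divided by MHz gives the execution time in microseconds.
        if cycles / freq_mhz <= deadline_us:
            return voltage, freq_mhz
    return None  # no operating point is fast enough


chosen = lowest_feasible_level(cycles=1000, deadline_us=35.0)
saving = relative_energy(1000, chosen[0]) / relative_energy(1000, 5.0)
print(chosen, round(saving, 4))
```

    With these assumed numbers the 2.5 V point misses the 35 µs deadline, so 3.3 V is chosen, using about 44% of the energy needed at 5 V. Scheduling several tasks together, as the abstract describes, additionally requires dividing CPU time among them.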

  • Efficient scheduling of DSP code on processors with distributed register files

    Page(s): 100 - 106

    Code generation methods for digital signal processors are increasingly hampered by the combination of tight timing constraints imposed by the algorithms and the limited capacity of the available register files. Traditional methods that schedule spill code to satisfy storage capacity have difficulty satisfying the timing constraints. The method presented in the paper analyses the combination of limited register file capacity, resource constraints, and timing constraints during scheduling. Value lifetimes are serialized until all capacity constraints are guaranteed to be satisfied after scheduling. Experiments in the FACTS environment show that we efficiently obtain high quality instruction schedules for innermost loops of DSP algorithms.
