
Hardware/Software Codesign, 2002. CODES 2002. Proceedings of the Tenth International Symposium on

Date 8-8 May 2002


  • Proceedings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002 (IEEE Cat. No.02TH8627)


    The following topics are dealt with: advances in system specification and system design frameworks; system design methods: analysis and verification; design space exploration and architectural design of HW/SW systems; co-design architecture and synthesis; system partitioning and timing analysis; energy efficiency in system design; system design methods: scheduling advances.

  • Algorithmic transformation techniques for efficient exploration of alternative application instances

    Page(s): 7 - 12

    Following the Y-chart paradigm for designing a system, an application and an architecture are modeled separately and mapped onto each other in an explicit design step. Next, a performance analysis for alternative application instances, architecture instances and mappings has to be done, thereby exploring the design space of the target system. Deriving alternative application instances is not trivial. Nevertheless, many instances of a single application exist that are worth deriving for exploration. We present algorithmic transformation techniques for systematic and fast generation of alternative application instances that express task-level concurrency hidden in an application in some degree of explicitness. These techniques help a system designer to significantly speed up the design space exploration process.

  • Authors index

    Page(s): 217
    Freely Available from IEEE
  • Hardware-software bipartitioning for dynamically reconfigurable systems

    Page(s): 145 - 150

    The main unique feature of dynamically reconfigurable systems is the ability to time-share the same reconfigurable hardware resources. However, the energy-delay cost associated with reconfiguration must be accounted for during hardware-software partitioning. We propose a method for mapping nodes of an application control flow graph either to software or reconfigurable hardware, explicitly targeting minimization of the energy-delay cost due to both computation and configuration. The addressed problems are energy-delay product minimization, delay-constrained energy minimization, and energy-constrained delay minimization. We show how these problems can be tackled by using network flow techniques, after transforming the original control flow graph into an equivalent network. If there are no constraints, as in the case of the energy-delay product minimization, we are able to generate an optimal solution in polynomial time.

  • Simulation Bridge: a framework for multi-processor simulation

    Page(s): 49 - 54

    Multi-processor solutions in the embedded world are being designed to meet the ever increasing computational demands of the emerging applications. Such architectures comprise two or more processors (often a mix of general purpose and digital signal processors) together with a rich peripheral mix to provide a high performance computational platform. While there are many simulation solutions in the industry available to address the system partitioning issues and also the verification of HW-SW interactions in these complex systems, there are very few solutions targeted towards the SW application developers' needs. The primary concern of the SW application developers is to debug and optimize their code. Hence, cycle accuracy and performance of the simulation solution become the key enablers. Desired observability and controllability of the models are additional careabouts. Secondly, application developers are more comfortable with instruction level simulations than they are with RTL or gate level simulation. These specific requirements have a bearing on the choices in the simulation solutions. This paper describes the design of a generic, C based multi-processor instruction set simulator framework in the context of the above parameters. This framework, termed the "simulation bridge", facilitates highly accurate, yet efficient simulation. The SimBridge performs clock correct lock-step simulation of the models in the architecture using a global simulation engine that handles both intra-processor and inter-processor communication in a homogeneous fashion. It addresses the multiple key issues of execution control, synchronization, connectivity and communication. The paper concludes with the performance analysis of the SimBridge in an experimental test setup as well as in the Texas Instruments (TI) TMS320C54x-based simulators.

  • Codesign-extended applications

    Page(s): 1 - 6

    We challenge the widespread assumption that an embedded system's functionality can be captured in a single specification and then partitioned among software and custom hardware processors. The specification of some functions in software is very different from the specification of the same function in hardware - too different to conceive of automatically deriving one from the other. We illustrate this concept using a digital camera example. We introduce the idea of codesign-extended applications to deal with the situation, wherein critical functions are written in multiple versions, and integrated such that simple compiler/synthesis flags instantiate a particular version along with the necessary control and communication behavior. By capturing a specification as a codesign-extended application, a designer enables smooth migration among platforms with increasing amounts of on-chip configurable logic.

  • Fast system-level power profiling for battery-efficient system design

    Page(s): 157 - 162

    An increasing disparity between the energy requirements of portable electronic devices and available battery capacities is driving the development of new design methodologies for battery-efficient systems. A crucial requirement for battery-efficient system design is to be able to efficiently and accurately estimate battery life for candidate system architectures. Recently, efficient techniques have been developed to estimate battery life under given profiles of system power consumption over time. However, techniques for generating the power profiles themselves are either too cumbersome for system level exploration, or too inaccurate for battery life estimation. In this paper, we present a new methodology for efficiently and accurately generating power profiles for different system-level architectures. The designer can specify the manner in which (i) system tasks are mapped to a set of available implementations, and (ii) system communications are mapped to a specified communication architecture. For a given architecture, a power profile is automatically generated by analyzing an abstract representation of the system execution traces, while taking into account the selected implementations of the system's computations and communications. Experiments conducted on the design of an IEEE 802.11 MAC processor indicate that the power profiling approach offers run times that are several orders of magnitude lower than a simulation based power profiling technique, while sustaining negligible loss of accuracy (average profiling error was observed to be less than 3.4%).

  • A novel codesign approach based on distributed virtual machines

    Page(s): 109 - 114

    This paper describes a hardware/software codesign approach for the design of embedded systems based on digital signal processors and FPGAs. Our approach is based on distributed virtual machines for simulation and verification of the application on a Linux cluster and for running the application on different target architectures (DSPs, FPGAs) as well. The main focus is the description of the virtual machine, which was designed to make DSP applications portable across different platforms while maintaining optimal code.

  • Scratchpad memory: a design alternative for cache on-chip memory in embedded systems

    Page(s): 73 - 78

    In this paper we address the problem of on-chip memory selection for computationally intensive applications, by proposing scratchpad memory as an alternative to cache. Area and energy for different scratchpad and cache sizes are computed using the CACTI tool while performance was evaluated using the trace results of the simulator. The target processor chosen for evaluation was AT91M40400. The results clearly establish scratchpad memory as a low power alternative in most situations with an average energy reduction of 40%. Further, the average area-time reduction for the scratchpad memory was 46% of the cache memory.

  • Metrics for design space exploration of heterogeneous multiprocessor embedded systems

    Page(s): 55 - 60

    This paper considers the problem of designing heterogeneous multiprocessor embedded systems. The focus is on a step of the design flow: the definition of innovative metrics for the analysis of the system specification to statically identify the most suitable processing elements class for each system functionality. Experimental results are also included, to show the applicability and effectiveness of the proposed methodology.

  • FPGA resource and timing estimation from Matlab execution traces

    Page(s): 31 - 36

    We present a simulation-based technique to estimate area and latency of an FPGA implementation of a Matlab specification. During simulation of the Matlab model, a trace is generated that can be used for multiple estimations. For estimation the user provides some design constraints such as the rate and bit width of data streams. In our experience the runtime of the estimator is only about 1/10 of the simulation time, which is typically fast enough to generate dozens of estimates within a few hours and to build cost-performance trade-off curves for a particular algorithm and input data. In addition, the estimator reports on the scheduling and resource binding used for estimation. This information can be utilized not only to assess the estimation quality, but also as a starting point for the final implementation.

  • Optimization and synthesis for complex reactive embedded systems by incremental collapsing

    Page(s): 115 - 120

    We propose a software synthesis procedure for reactive real-time embedded systems. In our approach, control parts of the system are represented in a decomposed form enabling more complex control structures to be represented. We propose a synthesis procedure for this representation that incrementally aggregates elements of the representation while keeping the resulting code size under tight control. This method combined with heuristic strategies works very well on real-life designs and demonstrates the potential to produce results that challenge or beat hand-written implementations.

  • Locality-conscious process scheduling in embedded systems

    Page(s): 193 - 198

    In many embedded systems, the existence of a data cache might significantly influence the effectiveness of the process scheduling policy. Consequently, a scheduling policy that takes inter-process data reuse into account might result in large performance benefits. In this paper, we focus on array-intensive embedded applications and present a locality-conscious scheduling strategy where we first evaluate the potential data reuse between processes, and then, using the results of this evaluation, select an order for process executions. We also show how process codes can be transformed by an optimizing compiler for increasing inter-process data reuse, thereby making locality-conscious scheduling more effective. Our experimental results obtained using two large, multi-process application codes indicate significant runtime benefits.
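
The two-step idea in this abstract (estimate inter-process data reuse, then pick an execution order) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: reuse is approximated as the number of array names two processes share, the ordering heuristic is a simple greedy chain, and the names `reuse`, `greedy_order`, and the process table are hypothetical.

```python
def reuse(arrays_a, arrays_b):
    """Toy reuse estimate (an assumption): number of arrays both processes touch."""
    return len(arrays_a & arrays_b)


def greedy_order(accesses, first):
    """Order processes so each one shares as many arrays as possible
    with the process scheduled immediately before it.

    accesses: dict mapping process name -> set of array names it accesses.
    first:    name of the process to run first.
    """
    order = [first]
    remaining = set(accesses) - {first}
    while remaining:
        last = order[-1]
        # sorted() makes tie-breaking deterministic (alphabetical).
        nxt = max(sorted(remaining),
                  key=lambda p: reuse(accesses[last], accesses[p]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical process table: which arrays each process reads or writes.
accesses = {
    "P1": {"A", "B"},
    "P2": {"C", "D"},
    "P3": {"B", "C"},
    "P4": {"A"},
}
print(greedy_order(accesses, "P1"))  # ['P1', 'P3', 'P2', 'P4']
```

A real evaluation would weigh reuse by data volume and cache capacity rather than counting shared array names; the sketch only shows the shape of the evaluate-then-order loop.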

  • HW/SW partitioning and code generation of embedded control applications on a reconfigurable architecture platform

    Page(s): 151 - 156

    This paper studies the use of a reconfigurable architecture platform for embedded control applications aimed at improving real time performance. The HW/SW codesign methodology from POLIS is used. It starts from high-level specifications, optimizes an intermediate model of computation (extended finite state machines) and derives both hardware and software, based on performance constraints. We study a particular architecture platform, which consists of a general purpose processor core, augmented with a reconfigurable function unit and data-path to improve run time performance. A new mapping flow and algorithms to partition hardware and software are proposed to generate implementations that best utilize this architecture. Encouraging preliminary results are shown for automotive electronic control examples.

  • Worst-case performance analysis of parallel, communicating software processes

    Page(s): 37 - 42

    In this paper we present a method to perform static timing analysis of SystemC models that describe parallel, communicating software processes. The paper combines a worst-case execution time (WCET) analysis with an analysis of the communication behavior. The communication analysis allows the detection of points where the program flows of two or more concurrent processes are synchronized. This knowledge allows the determination of the worst-case response time (WCRT). The method does not rely on restrictions on the system design to prevent deadlocks or data loss. Furthermore, possible deadlocks and data loss can be detected during the analysis.

  • Transformation of SDL specifications for system-level timing analysis

    Page(s): 121 - 126

    Complex embedded systems are typically specified using multiple domain-specific languages. After code-generation, the implementation is simulated and tested. Validation of non-functional properties, in particular timing, remains a problem because full test coverage cannot be achieved for realistic designs. The alternative, formal timing analysis, requires a system representation based on key application and architecture properties. These properties must first be extracted from a system specification to enable analysis. In this paper we present a suitable transformation of SDL specifications for system-level timing analysis. We show ways to vary modeling accuracy in order to apply available formal techniques. A practical approach utilizing a recently developed system model is presented.

  • Reconfigurable SoC design with hierarchical FSM and synchronous dataflow model

    Page(s): 199 - 204

    We present a method of runtime configuration scheduling in reconfigurable SoC design. As a model of computation in system representation, we use a popular formal model of computation, the hierarchical FSM (HFSM) with synchronous dataflow (SDF) model, or HFSM-SDF model for short. In reconfigurable SoC design with the HFSM-SDF model, the problem of configuration scheduling is challenging due to the dynamic behavior of the system such as concurrent execution of state transitions (by AND relation), complex control flow (in the HFSM), and complex schedules of SDF actor firing. Thus, compile-time static configuration scheduling may not efficiently hide configuration latency. To resolve the problem, it is necessary to know the exact order of required configurations during runtime and to perform runtime configuration scheduling. To obtain the exact order of configurations, we exploit the inherent property of HFSM-SDF that the execution order of SDF actors can be determined before the execution of the state transition of the top FSM. After obtaining the order information in a queue called the ready configuration queue, we execute the state transition. During the execution, whenever an FPGA resource becomes available, a new configuration is selected from the queue and fetched by the runtime configuration scheduler. We applied the method to an MPEG4 decoder design and obtained up to 21.8% improvement in system runtime with a negligible overhead of runtime (1.4%) and memory usage (0.94%).

  • Symbolic model checking of dual transition Petri Nets

    Page(s): 43 - 48

    This paper describes the formal verification of the recently introduced Dual Transition Petri Net (DTPN) models, using model checking techniques. The methodology presented addresses the symbolic model checking of embedded systems behavioural properties, expressed in either computation tree logics (CTL) or linear temporal logics (LTL). The embedded system specification is given in terms of DTPN models, where elements of the model are captured in a four-module library which implements the behaviour of the model. Key issues in the development of the methodology are the heterogeneity and the nondeterministic nature of the model. This is handled by introducing some modifications in both structure and behaviour of the model, thus reducing the points of nondeterminism. Several features of the methodology are discussed and two examples are given in order to show the validity of the model.

  • Program slicing for codesign

    Page(s): 91 - 96

    Program slicing is a software analysis technique that computes the set of operations in a program that may affect the computation at a particular operation. Interprocedural slicing techniques have separately addressed concurrent programs and hardware description languages. However, application of slicing to codesign of embedded systems requires dependence analysis across the hardware-software interface. We extend program slicing for a codesign environment. Hardware-software interactions common in component-based systems are mapped to previously introduced dependences, including the interference and signal dependences. We introduce a novel access dependence that models a memory access side effect that results in activation of a process. A slicing algorithm that incorporates this variety of dependences is described.
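
As a rough illustration of the definition in the first sentence of this abstract, a backward slice can be sketched as reachability over a dependence graph. This is a minimal sketch under stated assumptions, not the paper's algorithm: it treats every dependence kind (data, control, interference, signal, access) as a plain edge, ignores calling context, and the function name `backward_slice` and the operation names in the example are hypothetical.

```python
from collections import deque


def backward_slice(dependences, criterion):
    """Return every operation that may affect `criterion`.

    dependences: dict mapping an operation to the operations it directly
                 depends on (all dependence kinds merged into one edge set,
                 which is a simplifying assumption).
    criterion:   the operation whose computation we are interested in.
    """
    in_slice = {criterion}
    worklist = deque([criterion])
    while worklist:
        op = worklist.popleft()
        for dep in dependences.get(op, ()):
            if dep not in in_slice:
                in_slice.add(dep)
                worklist.append(dep)
    return in_slice


# Hypothetical example: software writes a register, which activates a
# hardware process; software later reads the hardware result.
dependences = {
    "sw_read_result": ["hw_compute"],
    "hw_compute": ["sw_write_reg"],   # stands in for an access dependence
    "sw_write_reg": ["sw_init"],
    "sw_log": ["sw_init"],            # unrelated to the slicing criterion
}
print(sorted(backward_slice(dependences, "sw_read_result")))
```

In this example `sw_log` is left out of the slice because nothing on the path to `sw_read_result` depends on it, while the hardware operation is pulled in through the edge that crosses the hardware-software interface.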

  • Extended quasi-static scheduling for formal synthesis and code generation of embedded software

    Page(s): 211 - 216

    With the computerization of most daily-life amenities such as home appliances, the software in a real-time embedded system now accounts for as much as 70% of a system design. On one hand, this increase in software has made embedded systems more accessible and easy to use, while on the other hand, it has also necessitated further research on how complex embedded software can be designed automatically and correctly. Enhancing recent advances in this research, we propose an Extended Quasi-Static Scheduling (EQSS) method for formally synthesizing and automatically generating code for embedded software, using the Complex-Choice Petri Nets (CCPN) model. Our method improves on previous work in three ways: (1) by removing model restrictions to cover a much wider range of applications, (2) by proposing an extended algorithm to schedule the more unrestricted model, and (3) by implementing a code generator that can produce multi-threaded embedded software programs. The requirements of embedded software are specified by a set of CCPNs, which is scheduled using EQSS such that the schedules satisfy limited embedded memory requirements and task precedence constraints. Finally, a POSIX-based multi-threaded embedded software program is generated in the C programming language. Through an example, we illustrate the feasibility and advantages of the proposed EQSS method.

  • Compiler-directed customization of ASIP cores

    Page(s): 97 - 102

    This paper presents an automatic method to customize embedded application-specific instruction processors (ASIPs) based on compiler analysis. ASIPs, also known as embedded soft cores, allow certain hardware parameters in the processor to be customized for a specific application domain. They offer low design cost as they use pre-designed and verified components. Our design goal is choosing parameter values for fastest runtime within a given silicon area budget for a particular application set. Present-day technologies for choosing parameter values rely on exhaustive simulation of the application set on all possible combinations of parameter values - a time-consuming and non-scalable procedure. We propose a compiler-based method that automatically derives the optimal values of parameters without simulating any configuration. Further, we expand the space of parameters that can be changed from the limited set today, and evaluate the importance of each. Results show that for our benchmarks, the runtimes for different configurations are predicted with an average error of 2.5%. In the two area-constrained customization problems we evaluate, our method is able to recommend the same configuration that is recommended by brute force exhaustive simulation.
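
To make the exhaustive baseline mentioned in this abstract concrete, the sketch below enumerates every combination of parameter values and keeps the fastest one that fits the area budget. It illustrates only the brute-force search that the paper aims to avoid, not the compiler-based method. The parameter names and the `area_of` and `runtime_of` cost models are made-up placeholders; in practice each runtime evaluation would be a full simulation, which is what makes this approach scale poorly.

```python
from itertools import product


def best_configuration(param_values, area_of, runtime_of, area_budget):
    """Exhaustively search all parameter combinations.

    param_values: dict mapping parameter name -> list of candidate values.
    area_of, runtime_of: callables taking a config dict (placeholders for
                         an area model and a simulator run).
    Returns (runtime, config) for the fastest config within the budget,
    or None if no configuration fits.
    """
    names = sorted(param_values)
    best = None
    for combo in product(*(param_values[n] for n in names)):
        config = dict(zip(names, combo))
        if area_of(config) > area_budget:
            continue  # violates the silicon area budget
        runtime = runtime_of(config)
        if best is None or runtime < best[0]:
            best = (runtime, config)
    return best


# Toy cost models (assumptions, not measured data).
param_values = {"alus": [1, 2, 4], "regs": [16, 32]}
area_of = lambda c: 10 * c["alus"] + c["regs"] // 4
runtime_of = lambda c: 100 / c["alus"] + 64 / c["regs"]

print(best_configuration(param_values, area_of, runtime_of, area_budget=30))
# (52.0, {'alus': 2, 'regs': 32})
```

The number of evaluations is the product of the value counts of all parameters, so adding parameters multiplies the number of simulator runs.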

  • The design context of concurrent computation systems

    Page(s): 19 - 24

    The design for performance optimization of programmable, semicustom SoCs requires the ability to model and optimize the behavior of the system as a whole. Neither the hardware-testbench style nor the software-benchmark style is adequate to capture completely the design interactions required in concurrent software-on-hardware systems. We use a formal relationship between a computer system design content and its external context to motivate the need to consider a more effective modeling framework to which concurrent software-on-hardware computer systems are designed.

  • Dynamic run-time HW/SW scheduling techniques for reconfigurable architectures

    Page(s): 205 - 210

    Dynamic run-time scheduling in System-on-Chip platforms has recently become an active area of research because of the performance and power requirements of new applications. Moreover, dynamically reconfigurable logic (DRL) architectures are an exciting alternative for embedded systems design. However, all previous approaches to DRL multi-context scheduling and HW/SW scheduling for DRL architectures are based on static scheduling techniques. In this paper, we address this problem and present: (1) a dynamic scheduler hardware architecture, and (2) four dynamic run-time scheduling algorithms for DRL-based multi-context platforms. The scheduling algorithms have been integrated in our codesign environment, where a large number of experiments have been carried out. Results demonstrate the benefits of our approach.

  • Hardware support for real-time embedded multiprocessor system-on-a-chip memory management

    Page(s): 79 - 84

    The aggressive evolution of the semiconductor industry - smaller process geometries, higher densities, and greater chip complexity - has provided design engineers the means to create complex, high-performance Systems-on-a-Chip (SoC) designs. Such SoC designs typically have more than one processor and huge memory, all on the same chip. Dealing with the global on-chip memory allocation/de-allocation in a dynamic yet deterministic way is an important issue for the upcoming billion transistor multiprocessor SoC designs. To achieve this, we propose a memory management hierarchy we call Two-Level Memory Management. To implement this memory management scheme - which presents a paradigm shift in the way designers look at on-chip dynamic memory allocation - we present a System-on-a-Chip Dynamic Memory Management Unit (SoCDMMU) for allocation of the global on-chip memory, which we refer to as Level Two memory management (Level One is the operating system management of memory allocated to a particular on-chip Processing Element). In this way, processing elements (heterogeneous or non-heterogeneous hardware or software) in an SoC can request and be granted portions of the global memory in a fast and deterministic time (for an example of a four processing element SoC, the dynamic memory allocation of the global on-chip memory takes sixteen cycles per allocation/de-allocation in the worst case). In this paper, we show how to modify an existing Real-Time Operating System (RTOS) to support the new proposed SoCDMMU. Our example shows that a multiprocessor SoC that utilizes the SoCDMMU has a 440% overall speedup of the application transition time over fully shared memory that does not utilize the SoCDMMU.

  • Strongly polynomial-time algorithm for over-constraint resolution: efficient debugging of timing constraint violations

    Page(s): 127 - 132

    A system of binary linear constraints or difference constraints (SDC) contains a set of variables that are constrained by a set of unary or binary linear inequalities. In such diverse applications as scheduling, interface timing verification, real-time systems, multimedia systems, layout compaction, and constraint satisfaction, SDCs have successfully been used to model systems of both temporal and spatial constraints. Formally, SDCs are modeled by weighted, directed (constraint) graphs. The consistency of an SDC means that there is at least one instantiation of its variables that satisfies all its constraints. It is well known that the absence of positive cycles in a graph implies the consistency of the corresponding SDC, so the consistency can be decided in strongly polynomial time. If an SDC is found to be inconsistent, it has to be repaired to make it consistent. This task is equivalent to removing positive cycles from the corresponding graph. All the previous algorithms for this task take time proportional to the number of positive cycles in the graph, which can grow exponentially. In this paper, we propose a strongly polynomial-time algorithm, i.e., an algorithm whose time complexity is polynomial in the size of the graph. Our algorithm takes in a graph and returns a list of edges and the changes in their weights to remove all the positive cycles from the graph. We experimentally quantify the length of the edge list and the running time of the algorithm on large benchmark graphs. We show that both are very small, so our algorithm is practical.
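
The consistency check that this abstract calls well known can be sketched with a Bellman-Ford style longest-path relaxation. This sketch covers only the consistency test, not the paper's repair algorithm, and it makes assumptions the abstract does not state: each constraint is written as `x[j] - x[i] >= w` (so an inconsistency shows up as a positive cycle), unary constraints are omitted, and the function name `find_consistent_assignment` is hypothetical.

```python
def find_consistent_assignment(num_vars, constraints):
    """Decide consistency of a system of difference constraints.

    constraints: list of (i, j, w) meaning x[j] - x[i] >= w, i.e. an edge
                 i -> j with weight w in the constraint graph.
    Returns a satisfying assignment as a list, or None if the graph has a
    positive cycle (the system is inconsistent).
    """
    # Starting every variable at 0 is equivalent to adding a virtual
    # source with zero-weight edges to all nodes.
    x = [0] * num_vars
    for _ in range(num_vars + 1):
        changed = False
        for i, j, w in constraints:
            if x[i] + w > x[j]:      # constraint violated: raise x[j]
                x[j] = x[i] + w
                changed = True
        if not changed:
            return x                 # fixed point: all constraints hold
    return None                      # still relaxing: positive cycle


# Cycle 0 -> 1 -> 2 -> 0 has total weight 2 + 3 - 6 = -1: consistent.
print(find_consistent_assignment(3, [(0, 1, 2), (1, 2, 3), (2, 0, -6)]))
# [0, 2, 5]

# Same cycle with total weight 2 + 3 - 4 = +1: inconsistent.
print(find_consistent_assignment(3, [(0, 1, 2), (1, 2, 3), (2, 0, -4)]))
# None
```

The loop runs at most `num_vars + 1` passes over the constraint list, so the check is polynomial in the size of the graph regardless of how many positive cycles it contains.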
