Design Automation Conference (ASP-DAC), 2010 15th Asia and South Pacific

Date 18-21 Jan. 2010

  • A PUF design for secure FPGA-based embedded systems

    Publication Year: 2010 , Page(s): 1 - 6
    Cited by:  Papers (17)

    The concept of having an integrated circuit (IC) generate its own unique digital signature has broad application in areas such as embedded systems security and IP/IC counter-piracy. Physically unclonable functions (PUFs) are circuits that compute a unique signature for a given IC based on the process variations inherent in the IC manufacturing process. This paper presents the first PUF design specifically targeted for field-programmable gate arrays (FPGAs). Our novel design makes use of the underlying FPGA architecture, and unlike prior published PUFs, the proposed PUF can be naturally embedded into a design's HDL, consuming very little area, and does not require the use of "hard macros" with fixed routing. Measured results on the Xilinx Virtex-5 65 nm FPGA demonstrate PUF signatures to be both unique and reliable under temperature variation.
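
    The abstract reports that PUF signatures are unique and reliable. One common way to quantify such claims, not necessarily the metric used in the paper, is the Hamming distance between signature bit strings: uniqueness as the mean pairwise distance across chips (ideally near 0.5) and reliability as one minus the mean distance between a reference reading and re-measurements of the same chip. A minimal sketch follows; the function names and the 4-bit signatures are hypothetical illustrations, not data from the paper.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of bit positions in which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def uniqueness(signatures):
    """Mean pairwise Hamming fraction across signatures from different chips."""
    pairs = list(combinations(signatures, 2))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


def reliability(reference, remeasured):
    """One minus the mean Hamming fraction between a reference and re-readings."""
    drift = sum(hamming_fraction(reference, r) for r in remeasured) / len(remeasured)
    return 1 - drift


# Made-up signatures for three hypothetical chips.
chips = ["0000", "1111", "0011"]
chip_uniqueness = uniqueness(chips)

# Made-up re-readings of one chip, e.g. at different temperatures.
chip_reliability = reliability("0011", ["0011", "0111"])
```

    Real signatures are far longer than four bits; the short strings only keep the arithmetic easy to check by hand.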

  • Adaptive power management for real-time event streams

    Publication Year: 2010 , Page(s): 7 - 12
    Cited by:  Papers (1)

    Dynamic power management has become essential for battery-driven embedded systems. This paper explores how to efficiently and effectively reduce the energy consumption of a device (system) for serving multiple event streams. Considering two preemptive scheduling policies, namely earliest deadline first and fixed priority, we propose a new method to adaptively control the power mode of the device according to historical arrivals of events. Our method can not only tackle arbitrary event arrivals but also provide hard real-time guarantees with respect to both timing and backlog constraints. Simulation results are presented as well to demonstrate the effectiveness of our approach.

  • An alternative polychronous model and synthesis methodology for model-driven embedded software

    Publication Year: 2010 , Page(s): 13 - 18
    Cited by:  Papers (6)

    Multi-clocked synchronous (a.k.a. polychronous) specification languages do not assume that execution proceeds by sampling inputs at predetermined global synchronization points. The software synthesized from such specifications is paced by the arrival of certain inputs or the evaluation of certain internal variables. Here, we present an alternate polychronous model of computation termed the Multi-rate Instantaneous Channel connected Data Flow (MRICDF) actor network model. Sequential embedded software can be synthesized from MRICDF specifications using epoch analysis, a technique proposed to form a unique order of events without a reference time line. We show how to decide on the implementability of an MRICDF specification and how additional epoch information can help in synthesizing deterministic sequential software. The semantics of MRICDF is akin to that of SIGNAL, but is visual and easier to specify. Also, our prime-implicate-based epoch analysis technique avoids the complex clock-tree-based analysis required in SIGNAL. We experimented with the usability of the MRICDF formalism by creating EmCodeSyn, our visual specification and synthesis tool. Our aim is to make polychronous-specification-based software synthesis more accessible to engineers by proposing this alternative model with a different semantic exposition and simpler analysis techniques.

  • Trace-based performance analysis framework for heterogeneous multicore systems

    Publication Year: 2010 , Page(s): 19 - 24
    Cited by:  Papers (1)

    Performance evaluation is key to the optimization of computer applications on multicore systems. While many techniques and profiling tools are available for measuring performance on homogeneous multicore platforms, most of them depend on hardware support from the vendors. For developing applications on heterogeneous multicore systems, very few analysis tools exist to help developers. This paper describes a software-based trace collection and performance analysis framework that can be ported to a variety of platforms via code instrumentation at the source level. A pure software profiling toolkit, called ParallelTracer, was implemented based on ANTLR, an open source parser generator, to support this framework. In this paper, we present our framework and toolkit. We use the IBM Cell processor as a case study to demonstrate the capability of ParallelTracer. Our results show that ParallelTracer provided useful information for programmers to understand program behaviors and identify potential performance bottlenecks via graphical visualization. We also discuss the runtime overhead of ParallelTracer. With proper usage, the performance and code size overhead introduced by our toolkit are limited to around 19% to 5% and 9%, respectively, for the benchmark program in the case study.

  • Efficient model reduction of interconnects via double gramians approximation

    Publication Year: 2010 , Page(s): 25 - 30
    Cited by:  Papers (1)

    Gramian approximation methods have been proposed recently to overcome the high computing costs of classical balanced-truncation-based reduction methods. But those methods typically gain efficiency by projecting the original system onto only one dominant subspace of the approximate system gramian (for instance, using only the controllability gramian). This single gramian reduction method can lead to large errors, as the subspaces of controllability and observability can be quite different for general interconnects with unsymmetric system matrices. In this paper, we propose a fast balanced truncation method where the system is balanced in terms of two approximate gramians, as in the classical balanced truncation method. The novelty of the new method is that it keeps computing costs similar to those of the single gramian method. The proposed algorithm is based on a generalized SVD-based balancing scheme such that the dominant subspace of the approximate gramian product can be obtained in a very efficient way without explicitly forming the gramians. Experimental results on a number of published benchmarks show that the proposed method is much more accurate than the single gramian method with similar computing costs.

  • Wideband reduced modeling of interconnect circuits by adaptive complex-valued sampling method

    Publication Year: 2010 , Page(s): 31 - 36
    Cited by:  Papers (1)

    In this paper, we propose a new wideband model order reduction method for interconnect circuits by using a novel adaptive sampling and error estimation scheme. We try to address the outstanding error control problems in the existing sampling-based reduction framework. In the new method, called WBMOR, we explicitly compute the exact residual errors to guide the sampling process. We show that by sampling along the imaginary axis and performing a new complex-valued reduction, the reduced model will match exactly with the original model at the sample points. We show theoretically that the proposed method can achieve the error bound over a given frequency range. Practically, the new algorithm can help designers choose the best order of the reduced model for the given frequency range and error bound via an adaptive sampling scheme. As a result, it can perform accurate wideband reductions of interconnect circuits for analog and RF applications. We compare several sampling schemes such as linear, logarithmic, and recently proposed re-sampling methods. Experimental results on a number of RLC circuits show that WBMOR is much more accurate than all the other simple sampling methods and the recently proposed re-sampling scheme with the same reduction orders. Compared with the real-valued sampling methods, the complex-valued sampling method is more accurate for the same computational costs.

  • VISA: Versatile Impulse Structure Approximation for time-domain linear macromodeling

    Publication Year: 2010 , Page(s): 37 - 42
    Cited by:  Papers (1)

    We develop a rational function macromodeling algorithm named VISA (Versatile Impulse Structure Approximation) for macromodeling of system responses with (discrete) time-sampled data. The ideas of the Walsh theorem and the complementary signal are introduced to convert the macromodeling problem into a non-pole-based Steiglitz-McBride (SM) iteration (a class of first- and second-order interpolations) without an initial guess or eigenvalue computation. We demonstrate the fast convergence and the versatile adoption of macromodeling requirements through a P-norm approximation expansion, using examples from practical data.

  • An extension of the generalized Hamiltonian method to S-parameter descriptor systems

    Publication Year: 2010 , Page(s): 43 - 47
    Cited by:  Papers (2)

    A generalized Hamiltonian method (GHM) was recently proposed for the passivity test of hybrid descriptor systems. This paper extends the GHM theory to its S-parameter counterpart. Based on the S-parameter GHM, a passivity test flow is proposed, which is capable of detecting nonpassive regions of descriptor-form physical models. The proposed method is applicable to S-parameter and hybrid systems either in the standard state-space or descriptor forms. Experimental results confirm the effectiveness and accuracy of the proposed method.

  • Simultaneous slack budgeting and retiming for synchronous circuits optimization

    Publication Year: 2010 , Page(s): 49 - 54
    Cited by:  Papers (1)

    With the challenges of growing functionality and scaling chip size, possible performance improvements should be considered in the earlier IC design stages, which gives more freedom to later optimization. Potential slack, as an effective metric of possible performance improvements, is considered in this work, which, as far as we know, is the first to maximize the potential slack by retiming for synchronous sequential circuits. A simultaneous slack budgeting and incremental retiming algorithm is proposed for maximizing potential slack. The overall slack budget is optimized by relocating the FFs iteratively with the MIS-based slack estimation. Compared with the potential slack of a well-known min-period retiming, our algorithm improves potential slack by 19.6% on average without degrading the circuit performance, in reasonable runtime. Furthermore, at the expense of a small amount of timing performance, 0.52% and 2.08%, the potential slack is increased on average by 19.89% and 28.16%, respectively, which hints at the tradeoff between the timing performance and the slack budget.

  • A fast SPFD-based rewiring technique

    Publication Year: 2010 , Page(s): 55 - 60
    Cited by:  Papers (1)

    Circuit rewiring can be used to explore a larger solution space by modifying circuit structure to suit a given optimization problem. Among several rewiring techniques that have been proposed, SPFD-based rewiring has been shown to be more effective in terms of solution space coverage. However, its adoption in practice has been limited due to its long runtime. We propose a novel SAT-based algorithm that is much faster than the traditional BDD-based methods. Unlike BDD-based methods that completely specify all pairs of SPFD using BDDs, our algorithm uses a few SAT instances to perform rewiring for a given wire without explicitly enumerating all SPFDs. Experimental results show that our algorithm's runtime is only 13% of that of a conventional one when each wire has at most 25 candidate wires and the runtime scales well with the number of candidate wires considered. Our approach evaluates each rewiring instance independently in the order of milliseconds, rendering deployment of an SPFD-based rewiring inside the optimization loop of synthesis tools a possibility.

  • iRetILP: An efficient incremental algorithm for min-period retiming under general delay model

    Publication Year: 2010 , Page(s): 61 - 67

    Retiming is one of the most powerful sequential transformations; it relocates flip-flops in a circuit without changing its functionality. The min-period retiming problem seeks a solution with the minimal clock period. Since most min-period retiming algorithms assume a simple constant delay model that does not take into account many prominent electrical effects in ultra-deep-submicron VLSI designs, a general delay model was proposed to improve the accuracy of the retiming optimization. Due to the complexity of the general delay model, the formulation of min-period retiming under such a model is based on integer linear programming (ILP). However, because the previous ILP formulation was derived on a dense path graph, it incurred huge storage and running time overhead for the ILP solvers, and the application was limited to small circuits. In this paper, we present the iRetILP algorithm to solve the min-period retiming problem efficiently under the general delay model by formulating and solving the ILP problems incrementally. Experimental results show that iRetILP is on average 100× faster than the previous algorithm for small circuits and is highly scalable to large circuits in terms of memory consumption and running time.

  • Room-temperature fuel cells and their integration into portable and embedded systems

    Publication Year: 2010 , Page(s): 69 - 74

    Direct methanol fuel cells (DMFCs) are a promising next-generation energy source for portable applications, due to their high energy density and the ease of handling of the liquid fuel. However, the limited range of output power obtainable from a fuel cell requires hybridization, the introduction of a battery, to form a stand-alone portable power source. Furthermore, the stringent operating conditions to be met by active DMFC systems mandate complicated balance of plant (BOP) control. We present a complete hybrid active DMFC system design and implementation in which a DMFC stack and a Li-ion battery are linked by a hybridization circuit to share the applied load, exploiting the high energy density of the fuel cell and the high power density of the battery. We describe systems for fuel delivery, air supply, temperature management, current and voltage measurement, DC-DC conversion and power distribution, motor driving, battery charge management, DMFC and circuit protection, and control of the DMFC and battery as a hybrid. We have designed and implemented an embedded system controller that consists of a 32-bit microcontroller running under a real-time operating system and incorporates multiple cascaded feedback control loops which manage the dynamics of BOP control. We demonstrate reliable and efficient maintenance of a constant fuel cell output current in spite of severe fluctuation of the load current.

  • Maximizing the harvested energy for micro-power applications through efficient MPPT and PMU design

    Publication Year: 2010 , Page(s): 75 - 80
    Cited by:  Papers (2)

    Energy harvesting is becoming more and more popular for micro-power applications where the environmental energy is used to power up the systems. In order to prolong the device lifetime and guarantee the system operation, the harvested power from the energy transducer to supply the system load should be maximized. This paper reviews different techniques and solutions to maximize the harvested power. Different environmental energy sources and the characteristics of the corresponding energy transducers are discussed. Algorithms to detect and track the maximum power point (MPP) of the energy transducer are summarized. Different power management unit (PMU) designs to execute MPP tracking (MPPT) algorithms are presented.
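
    The abstract does not say which MPPT algorithms the paper covers. As an illustration only, perturb-and-observe is one widely used MPP tracking scheme: nudge the operating voltage, and reverse direction whenever the measured power drops. A minimal sketch, assuming a toy power curve `toy_power` with its maximum at 0.5 (a hypothetical stand-in for a real transducer characteristic):

```python
def perturb_and_observe(power, v0, step, iterations):
    """Track the maximum of power(v) by repeatedly perturbing the voltage v."""
    v = v0
    p_prev = power(v)
    direction = 1  # +1 raises the voltage, -1 lowers it
    for _ in range(iterations):
        v += direction * step
        p = power(v)
        if p < p_prev:
            # Power dropped, so the last perturbation moved away from the MPP.
            direction = -direction
        p_prev = p
    return v


def toy_power(v):
    """Hypothetical concave power curve with its maximum at v = 0.5."""
    return v * (1.0 - v)


tracked_v = perturb_and_observe(toy_power, v0=0.1, step=0.01, iterations=200)
```

    The tracked voltage does not settle exactly on the MPP; it oscillates within a couple of steps of it, which is the usual tradeoff between step size, tracking speed, and steady-state loss in this scheme.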

  • Dynamic power management in environmentally powered systems

    Publication Year: 2010 , Page(s): 81 - 88
    Cited by:  Papers (3)

    In this paper a framework for energy management in energy harvesting embedded systems is presented. As a possible example scenario, we focus on wireless sensor nodes which are powered by solar cells. We demonstrate that classical power management solutions have to be reconceived and/or new problems arise if perpetual operation of the system is required. In particular, we provide a set of algorithms and methods for different application scenarios, including real-time scheduling, application rate control as well as reward maximization. The goal is to optimize the performance of the application subject to given energy constraints. Our methods optimize the system performance which allows the usage of, e.g., smaller solar cells and smaller batteries. Our theoretical results are supported by simulations using long-term measurements of solar energy in an outdoor environment. Furthermore, to demonstrate the practical relevance of our approaches, we measured the implementation overhead of our algorithms on real sensor nodes.

  • Micro-scale energy harvesting: A system design perspective

    Publication Year: 2010 , Page(s): 89 - 94
    Cited by:  Papers (3)

    Harvesting electrical power from environmental energy sources is an attractive and increasingly feasible option for several micro-scale electronic systems such as biomedical implants and wireless sensor nodes that need to operate autonomously for long periods of time (months to years). However, designing highly efficient micro-scale energy harvesting systems requires an in-depth understanding of various design considerations and tradeoffs. This paper provides an overview of the area of micro-scale energy harvesting and discusses the various challenges and considerations involved from a system-design perspective.

  • Co-optimization of memory access and task scheduling on MPSoC architectures with multi-level memory

    Publication Year: 2010 , Page(s): 95 - 100
    Cited by:  Papers (6)

    An MPSoC system usually consists of a number of processors, a memory hierarchy and a communication mechanism between processors. Because of the gap between the constantly increasing processor speed and slower memory access, how to utilize the memory subsystem more efficiently has become a critical issue for improving the overall system performance. To address this problem, two algorithms are proposed in this paper. The first one uses the integer linear programming method so that the memory access cost is minimized while tasks are scheduled in as short a time as possible. The second one is a heuristic algorithm which can achieve close to optimum results with linear running time. The experimental results show that the memory access cost can be reduced by up to 56% compared to LIST scheduling.

  • A new compilation technique for SIMD code generation across basic block boundaries

    Publication Year: 2010 , Page(s): 101 - 106

    Although SIMD instructions are effective for many digital signal processing applications, current compilers cannot take full advantage of SIMD instructions. One factor inhibiting SIMD code generation is control flow structure; the target scope of SIMD code generation is currently limited to a single basic block or a loop that consists of a single basic block. SIMD instructions typically cannot be mapped across basic block boundaries even if the basic blocks inside the control structure have enough parallelism. In this paper, a new compilation technique to generate SIMD code without modifying control flow structure is proposed. The data dependency between basic blocks is exploited to generate SIMD instructions. The packing cost is introduced for effective vectorization to maintain data dependency across basic block boundaries. Experimental results show that the new SIMD code generation technique reduced the dynamic execution cycles of inter prediction in an H.264 decoder by 67%.

  • LibGALS: A library for GALS systems design and modeling

    Publication Year: 2010 , Page(s): 107 - 112
    Cited by:  Papers (2)

    LibGALS is a library and run-time environment that extends a multi-process host operating system (OS) to support the design of Globally Asynchronous Locally Synchronous (GALS) software systems and models. LibGALS provides an application programming interface (API) that enables the designer to describe GALS concurrent programs and reactivity in sequential programming languages. Moreover, it facilitates the interface between the GALS concurrent program and other processes through the services provided by the host OS. LibGALS is also suitable as a target for code generation from GALS and synchronous concurrent languages. The experiments demonstrate code size and run-time gains when compared with other approaches to GALS system implementation.

  • Joint variable partitioning and bank selection instruction optimization on embedded systems with multiple memory banks

    Publication Year: 2010 , Page(s): 113 - 118
    Cited by:  Papers (2)

    Using multiple memory banks with bank switching is a technique to increase memory size without extending address buses. A special instruction, the Bank Selection Instruction (BSL), is inserted into the original programs to modify the bank register to point to the right bank, which increases both the code size and runtime overhead. In this paper, we carefully partition variables into different banks and insert BSLs at different positions so that the overheads can be minimized. Minimizing code size and minimizing runtime overhead are two objectives investigated in this paper. Experiments show that the algorithms proposed can reduce the overhead caused by BSLs efficiently.
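
    To see why variable partitioning affects BSL overhead, consider a straight-line sequence of variable accesses: a BSL is needed whenever the next variable lives in a bank other than the one currently selected. The sketch below only counts those switches for a given partition. It is an illustration of the cost being minimized, not the paper's optimization algorithm, and the function name, variables, and bank assignments are hypothetical.

```python
def count_bank_switches(accesses, bank_of, initial_bank=0):
    """Count the BSLs needed to run a straight-line access sequence.

    accesses: variable names in program order.
    bank_of: mapping from variable name to its assigned bank.
    """
    current = initial_bank
    switches = 0
    for var in accesses:
        bank = bank_of[var]
        if bank != current:
            # The bank register points elsewhere, so a BSL must be inserted.
            switches += 1
            current = bank
    return switches


accesses = ["a", "b", "a", "b", "c", "c"]

# Partition that separates the two variables accessed alternately.
interleaved = {"a": 0, "b": 1, "c": 0}
# Partition that keeps the alternating pair in the same bank.
grouped = {"a": 0, "b": 0, "c": 1}

interleaved_cost = count_bank_switches(accesses, interleaved)
grouped_cost = count_bank_switches(accesses, grouped)
```

    With these made-up inputs the interleaved partition needs four switches and the grouped one needs one. Real programs add control flow and bank capacity limits, which this straight-line count ignores.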

  • On-chip power network optimization with decoupling capacitors and controlled-ESRs

    Publication Year: 2010 , Page(s): 119 - 124
    Cited by:  Papers (1)

    In this paper, we propose an efficient approach to minimize the noise on power networks via the allocation of decoupling capacitors (decap) and controlled equivalent series resistors (ESR). The controlled-ESR is introduced to reduce the on-chip power voltage fluctuation, including both voltage drop and overshoot. We formulate an optimization problem of noise minimization with the constraint of decap budget. A revised sensitivity calculation method is derived to consider both voltage drop and overshoot. The sequential quadratic programming (SQP) algorithm is adopted to solve the optimization problem where the revised sensitivity is regarded as the gradient. Experimental results show that considering voltage drop without overshoot leads to underestimating noise by 4.8%. We also demonstrate that the controlled-ESR is able to reduce the noise by 25% with the same decap budget.

  • An adaptive parallel flow for power distribution network simulation using discrete Fourier transform

    Publication Year: 2010 , Page(s): 125 - 130
    Cited by:  Papers (7)

    A frequency-time-domain co-simulation flow using discrete Fourier transform (DFT) is introduced in this paper to analyze large power distribution networks (PDNs). The flow not only allows designers to gain insight into the frequency-domain characteristics of the PDN but also to obtain accurate time-domain voltage responses according to different load current profiles. An adaptive method achieves accurate results in even less time compared to the basic DFT flow. In addition, parallel processing is incorporated which leads to a significant reduction in simulation time. Error bounds of the DFT flow are derived to assure the accuracy of simulation results. Experimental results show that the proposed flow has a relative error of 0.093% and a speedup of 10× compared to SPICE transient simulation with a single processor.
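
    The basic idea of a DFT-based frequency-time flow can be sketched for a single node: transform the load current to the frequency domain, multiply each bin by the network impedance at that frequency, and transform back to get the time-domain voltage drop. The sketch below assumes a periodic current profile and a hypothetical one-pole impedance `impedance` (a resistor in parallel with a capacitor); it does not reproduce the paper's adaptive or parallel methods, and all names and component values are illustrative.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, including the 1/n scaling."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def impedance(f, r=0.5, c=1e-9):
    """Hypothetical one-pole impedance: resistor r in parallel with capacitor c."""
    return r / (1 + 2j * math.pi * f * r * c)


def voltage_drop(current, dt, z=impedance):
    """Time-domain voltage drop for a periodic load current sampled every dt seconds."""
    n = len(current)
    shaped = []
    for k, bin_value in enumerate(dft(current)):
        # Map upper bins to negative frequencies so the response stays real.
        signed_k = k if k <= n // 2 else k - n
        shaped.append(z(signed_k / (n * dt)) * bin_value)
    return [v.real for v in idft(shaped)]


# A constant 2 A load only excites the DC bin, where the impedance equals r.
drop = voltage_drop([2.0] * 8, dt=1e-9)
```

    Because the DFT treats the input as periodic, a real flow has to choose the window length and sampling rate with care; the error bounds mentioned in the abstract address the accuracy of this kind of approximation.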

  • Technique for controlling power-mode transition noise in distributed sleep transistor network

    Publication Year: 2010 , Page(s): 131 - 136
    Cited by:  Papers (1)

    Power gating is an effective technique for achieving both low leakage and high performance in circuits. This work focuses on power-mode transition noise (i.e., ground noise) in power gated circuit design. Even though satisfying the limit on power-mode transition noise is an important design constraint, few works have seriously addressed it so far; they simply sacrifice the wakeup delay to meet the constraint by turning on the sleep transistors sequentially, one by one. In this work, we analyze how the switching current affects the size of sleep transistors and, from this, how the power-mode transition noise can be mitigated by controlling the power-up sequence of sleep transistors. We then propose a systematic solution to the problem of integrating the power-up control of sleep transistors into the power gated design flow for a distributed sleep transistor network, taking into account the power-mode transition noise constraint as well as the performance loss constraint. Experiments with ISCAS benchmarks confirm that, under the same power-mode transition noise constraint, our proposed solution is able to reduce the wakeup delay by 23% - 51% compared to the designs produced by a previous power gated design technique.

  • A novel FDTD algorithm based on alternating-direction explicit method with PML absorbing boundary condition

    Publication Year: 2010 , Page(s): 137 - 141
    Cited by:  Papers (1)

    In this paper, we propose a new FDTD (Finite-Difference Time-Domain) method using the alternating-direction explicit (ADE) method for efficient electromagnetic field simulation. Furthermore, a modified PML (Perfectly Matched Layer) absorbing boundary condition, which is applicable to the proposed new method, is introduced. Finally, the efficiency of the ADE-FDTD method is evaluated by computer simulations.

  • Speeding up SoC virtual platform simulation by data-dependency-aware synchronization and scheduling

    Publication Year: 2010 , Page(s): 143 - 148

    In this paper, we propose a novel simulation scheme, called data-dependency-aware synchronization and scheduling, for SoC virtual platform simulation. In contrast to the conventional clock- or transaction-based synchronization, our simulation scheme can work with the clock decoupling and direct-data-access techniques to implement the trace-driven virtual synchronization methodology. In addition, we further extend the virtual synchronization concept to handle the interrupt signals in the system. This enables the porting of an operating system (uCLinux) to our virtual platform. The experimental results show that our virtual platform can achieve a simulation speed of 3 to 5 million instructions per second, or a 44 times speed-up over the conventional cycle accurate approach, while still maintaining the same cycle-count accuracy.

  • SCGPSim: A fast SystemC simulator on GPUs

    Publication Year: 2010 , Page(s): 149 - 154
    Cited by:  Papers (16)

    The main objective of this paper is to speed up the simulation performance of SystemC designs at the RTL abstraction level by exploiting the high degree of parallelism afforded by today's general purpose graphics processors (GPGPUs). Our approach parallelizes SystemC's discrete-event simulation (DES) on GPGPUs by transforming the model of computation of DES into a model of concurrent threads that synchronize as and when necessary. Unlike the cooperative threading model employed in the SystemC reference implementation, our threading model is capable of executing in parallel on the large number of simple processing units available on GPUs. Our simulation infrastructure is called SCGPSim and it includes a source-to-source (S2S) translator to transform synthesizable SystemC models into parallelly executable programs targeting an NVIDIA GPU. The translator retains the simulation semantics of the original designs by applying semantics preserving transformations. The resulting transformed models mapped onto the massively parallel architecture of GPUs improve simulation efficiency quite substantially. Preliminary experiments with varying-sized examples such as AES, ALU, and FIR have shown simulation speed-ups ranging from 30× to 100×. Considering that our transformations are not yet optimized, we believe that optimizing them will improve the simulation performance even further.
