
2010 8th IEEE Workshop on Embedded Systems for Real-Time Multimedia (ESTIMedia)

Date: 28-29 Oct. 2010


Displaying Results 1 - 21 of 21
  • 2010 8th IEEE Workshop on Embedded Systems for Real-Time Multimedia [Breaker page]

Publication Year: 2010, Page(s): 1
PDF (438 KB)
    Freely Available from IEEE
  • [Copyright notice]

Publication Year: 2010, Page(s): 1
PDF (36 KB)
    Freely Available from IEEE
  • Message from the chairs

Publication Year: 2010, Page(s): 1
PDF (153 KB)
    Freely Available from IEEE
  • Committees

Publication Year: 2010, Page(s): 1
PDF (429 KB)
    Freely Available from IEEE
  • Table of contents

Publication Year: 2010, Page(s): 1 - 2
PDF (463 KB)
    Freely Available from IEEE
  • Adoptability as a key success factor for media architectures

Publication Year: 2010, Page(s): 1
PDF (54 KB) | HTML

    Summary form only given. Deep submicron technologies have allowed modern SoCs to be composed of hundreds of distinct IP blocks, ranging from I/O interfaces to networks-on-chip to processing elements. These SoCs often take years from concept to production. Despite this huge silicon design cost, actual system design costs are dominated by software development, which often happens at multiple customers for each SoC. Adoptability, as defined here, is the ability of those software teams first to select a new SoC for their system and then to actually deploy the system software on it. To commit effort to a new platform, software teams must be confident that they can reuse existing investment or that baseline software is provided for them. The software effort they do invest must go into areas that give the end product significant differentiation from previous products in the eyes of the customer. This presentation discusses the various aspects of adoptability.

  • Real-time multimedia with C compiler for Hardware

Publication Year: 2010, Page(s): 1
PDF (60 KB) | HTML

    This paper presents C-based design and a dynamically reconfigurable processor (DRP) chip for real-time data processing such as HD movies, encryption, and face recognition. Even though CPU and GPU performance keeps growing, SoCs will contain ever more hardware accelerators because of their high performance-per-power efficiency. The drawbacks of hardware accelerators are 1) the huge design effort and 2) the lack of programmability. These problems are solved by C-based design and the DRP. C-based design, or behavioral synthesis, which automatically generates RTL from C, has matured and is used for various types of commercial LSI and FPGA chips. The C description of hardware is easier and more natural than a description for a DSP or GPU. The paper explains how a C compiler for hardware differs from one for software, and likewise how C descriptions for hardware and for software differ. It then describes how to debug an algorithm that would take weeks in RTL simulation. Next, it discusses how C programs are mapped onto the DRP. The DRP contains dozens or hundreds of contexts, each of which has hundreds of ALUs; if each context is viewed as a very large instruction, the DRP can be thought of as a VLIW processor with non-fixed instruction sets. Advantages and future possibilities of C-based design and the DRP are discussed with some commercial examples.

  • Operating system runtime management of partially dynamically reconfigurable embedded systems

Publication Year: 2010, Page(s): 1 - 10
Cited by: Papers (2)
PDF (191 KB) | HTML

    Reconfigurable embedded systems are capable of changing their functionality by dynamically adding or removing components. This enables hardware/software architectures to adapt to changes in the system environment at run-time, which has proven to be a very useful technique in multimedia applications. This paper proposes a novel methodology that combines hardware and software into a System-on-Programmable-Chip architecture to exploit the flexibility and power of a reconfigurable system solution. A novel layered approach to system design is presented, which offers a unified interface to the application by systematically abstracting from the hardware resources. We present a prototype in which a Linux operating system manages the dynamically reconfigurable hardware resources.

  • A new method for minimizing buffer sizes for Cyclo-Static Dataflow graphs

Publication Year: 2010, Page(s): 11 - 20
Cited by: Papers (5)
PDF (265 KB) | HTML

    Several optimizations must be considered in the design of streaming applications (e.g. multimedia or network packet processing). These applications can be modelled as a set of processes that communicate through buffers. For this purpose, Cyclo-Static Dataflow graphs, an extension of Synchronous Dataflow graphs, make it possible to model a large class of industrial applications. This paper presents an original methodology to minimize the total surface of the buffers for a Cyclo-Static Dataflow graph under a given throughput constraint. It is proved that, if the processes are periodic, each buffer introduces a linear constraint that can be described analytically. The optimization problem is then modelled as an Integer Linear Program. A polynomial algorithm based on its relaxation provides a quasi-optimal solution for real-life problems. The resolution of the optimization problem for a Reed-Solomon decoder application is then detailed.

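As a much-simplified illustration of the buffer-sizing problem the paper attacks with Integer Linear Programming, the sketch below brute-forces the smallest capacity of a single Synchronous Dataflow edge (a special case of Cyclo-Static Dataflow) that still allows one complete graph iteration. The rates and the greedy firing rule are illustrative assumptions, not the paper's method, which scales to full CSDF graphs under a throughput constraint.

```python
# Two-actor SDF edge: A produces p tokens per firing, B consumes q.
# Find the smallest buffer capacity allowing one complete iteration.
from math import gcd

def iteration_completes(p, q, capacity):
    reps_a, reps_b = q // gcd(p, q), p // gcd(p, q)  # repetition vector
    tokens, fired_a, fired_b = 0, 0, 0
    while fired_a < reps_a or fired_b < reps_b:
        if fired_a < reps_a and tokens + p <= capacity:
            tokens += p; fired_a += 1   # fire producer
        elif fired_b < reps_b and tokens >= q:
            tokens -= q; fired_b += 1   # fire consumer
        else:
            return False                # deadlock: neither actor can fire
    return True

def min_buffer(p, q):
    cap = max(p, q)                     # no smaller capacity can work
    while not iteration_completes(p, q, cap):
        cap += 1
    return cap

print(min_buffer(3, 2))  # → 4
```

For a single SDF edge this recovers the classical bound p + q - gcd(p, q); the paper's contribution is handling whole CSDF graphs, where per-edge reasoning alone is insufficient.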
  • Translating affine nested-loop programs with dynamic loop bounds into Polyhedral Process Networks

Publication Year: 2010, Page(s): 21 - 30
Cited by: Papers (2)
PDF (304 KB) | HTML

    The Process Network (PN) is a suitable parallel model of computation (MoC) used to specify embedded streaming applications in a parallel form, facilitating efficient mapping onto embedded parallel execution platforms. Unfortunately, specifying an application using a parallel MoC is a difficult and highly error-prone task. To overcome the associated difficulties, an automated procedure exists for deriving polyhedral process networks (PPNs) from static affine nested-loop programs (SANLPs). This procedure is implemented in the pn compiler. However, many applications, e.g., multimedia applications (MPEG coders/decoders, smart cameras, etc.), have adaptive and dynamic behavior which cannot be expressed as SANLPs. Therefore, in order to handle more dynamic multimedia applications, this paper addresses the important question of whether some of the restrictions of SANLPs can be relaxed while keeping the ability to perform compile-time analysis and to derive PPNs. Achieving this would significantly extend the range of applications that can be parallelized in an automated way. The main contribution of this paper is a first approach for the automated translation of affine nested-loop programs with dynamic loop bounds into input-output equivalent polyhedral process networks.

  • Value-based scheduling of distributed fault-tolerant real-time systems with soft and hard timing constraints

Publication Year: 2010, Page(s): 31 - 40
Cited by: Papers (3)
PDF (275 KB) | HTML

    We present an approach to scheduling fault-tolerant embedded applications composed of soft and hard real-time processes running on distributed embedded systems. The hard processes are critical and must always complete on time. A soft process may complete after its deadline, and its completion time is associated with a value function that characterizes its contribution to the quality of service of the application. We propose a quasi-static scheduling algorithm that generates a tree of fault-tolerant distributed schedules which maximize the application's quality value while guaranteeing hard deadlines.

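The notion of a value function for soft processes can be made concrete with a small sketch. The linear decay shape, the process names, and the candidate schedules below are invented for illustration; they are not taken from the paper, whose algorithm builds a tree of schedules rather than picking from a fixed list.

```python
# Hard processes must meet their deadline; soft processes contribute a
# value that decays with lateness. A schedule is usable only if every
# hard deadline holds; among usable ones, pick the highest total value.

def soft_value(finish, deadline, max_value=100):
    """Full value on time, linearly decaying afterwards (assumed shape)."""
    if finish <= deadline:
        return max_value
    return max(0, max_value - 10 * (finish - deadline))

def schedule_quality(schedule, hard_deadlines, soft_deadlines):
    # schedule maps each process to its finish time
    for p, d in hard_deadlines.items():
        if schedule[p] > d:
            return None  # infeasible: a hard process misses its deadline
    return sum(soft_value(schedule[p], d) for p, d in soft_deadlines.items())

candidates = [
    {"h1": 4, "s1": 6, "s2": 9},   # hard on time, s2 one unit late
    {"h1": 5, "s1": 9, "s2": 7},   # hard on time, s1 two units late
]
hard = {"h1": 5}
soft = {"s1": 7, "s2": 8}
best = max((s for s in candidates
            if schedule_quality(s, hard, soft) is not None),
           key=lambda s: schedule_quality(s, hard, soft))
```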
  • NASA: A generic infrastructure for system-level MP-SoC design space exploration

Publication Year: 2010, Page(s): 41 - 50
Cited by: Papers (8)
PDF (442 KB) | HTML

    System-level simulation and design space exploration (DSE) are key ingredients in the design of multiprocessor system-on-chip (MP-SoC) based embedded systems. Efforts in this area, however, typically use ad-hoc software infrastructures to facilitate and support the system-level DSE experiments. In this paper, we present a new, generic system-level MP-SoC DSE infrastructure called NASA (Non Ad-hoc Search Algorithm). This highly modular framework uses well-defined interfaces to easily integrate different system-level simulation tools as well as different combinations of search strategies in a simple plug-and-play fashion. Moreover, NASA deploys a so-called dimension-oriented DSE approach, allowing designers to configure the appropriate number of, possibly different, search algorithms to simultaneously co-explore the various design space dimensions. As a result, NASA provides a flexible and reusable framework for the systematic exploration of the multi-dimensional MP-SoC design space, starting from a set of relatively simple user specifications. To demonstrate the distinct aspects of NASA, we also present several DSE experiments in which we, e.g., compare NASA configurations using a single search algorithm for all design space dimensions to configurations using a separate search algorithm per dimension. These experiments indicate that the latter multi-dimensional co-exploration can find better design points and evaluates a greater diversity of design alternatives than the more traditional approach of using a single search algorithm for all dimensions.

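The dimension-oriented idea, one search algorithm per design-space dimension co-exploring in rounds, can be sketched as coordinate descent on a toy, separable cost function. The objective, parameter ranges, and dimension names here are assumptions for illustration; NASA's actual interfaces plug in real simulators and search strategies.

```python
def evaluate(point):
    """Hypothetical simulator cost: the design space has three dimensions
    (e.g. processor count, mapping index, frequency step) with a single
    optimum at (4, 2, 3)."""
    targets = (4, 2, 3)
    return sum((p - t) ** 2 for p, t in zip(point, targets))

def search_dimension(point, dim, lo, hi):
    """Per-dimension searcher: exhaustively pick the best value for one
    dimension while the others stay fixed."""
    best = min(range(lo, hi + 1),
               key=lambda v: evaluate(point[:dim] + (v,) + point[dim + 1:]))
    return point[:dim] + (best,) + point[dim + 1:]

point = (1, 1, 1)
for _ in range(2):            # co-exploration rounds over all dimensions
    for dim in range(3):
        point = search_dimension(point, dim, 0, 8)
```

On this separable toy cost, one round already reaches the optimum; real design spaces have interacting dimensions, which is where combining different per-dimension search algorithms matters.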
  • Conservative application-level performance analysis through simulation of MPSoCs

Publication Year: 2010, Page(s): 51 - 60
PDF (148 KB) | HTML

    Applications, often with real-time requirements, are mapped onto Multiprocessor Systems on Chip (MPSoCs). Hard real-time applications tolerate no deadline misses, and a formal modelling approach must be used to analyse worst-case performance, which is complicated and time-consuming. Such models are restricted to specific application behaviours and are not generally applicable. Soft real-time applications such as video decoders often do not fit these models while having less strict requirements: an infrequent frame drop is barely noticeable, and a worst-case analysis is too pessimistic. For such applications it suffices to meet deadlines for a given set of traces. In this work we propose conservative simulation as an alternative to formal modelling for soft real-time applications. We introduce a hybrid simulation method which enables performance guarantees on a per-trace basis, without any modelling effort. Furthermore, we evaluate an implementation of the described technique and compare it with an actual MPSoC instance implemented on an FPGA. Our results show that the simulation technique is conservative, with less than a 10% difference in timing compared to the actual implementation, for a software JPEG decoder.

  • Task-level timed-functional simulation for multi-core embedded systems

Publication Year: 2010, Page(s): 61 - 70
PDF (395 KB) | HTML

    Since design validation and correction costs increase drastically as the design steps proceed, software verification is highly desirable before the simulation model for the target architecture is constructed. As timing correctness is as important as functional correctness in real-time multimedia embedded systems, an important research issue is how to perform timed functional simulation with reasonable accuracy on a host machine. In addition, to allow design space exploration, a simulation platform should reflect the hardware architecture, the task mapping, and the operating system's scheduling policy. To meet these requirements, we propose a timed functional simulator that assumes the application behavior is specified by a task graph. While previous works usually resort to an event-driven simulator for timed simulation, the proposed technique separates data communication from timing management. Since the simulation kernel manages the timing and task scheduling, the simulation speed approaches that of a functional simulator. The proposed simulation consists of two steps: preliminary simulation and timing simulation. First, preliminary simulation profiles the data size of each task execution. Then, timing simulation verifies the timing correctness of the application with the profiled data sizes. Experimental results show that the proposed timed functional simulation approach is fast enough for early verification of embedded software designs.

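The separation of timing management from data communication can be illustrated with a minimal kernel that advances a virtual clock as it dispatches task-graph nodes. The single-processor list-scheduling policy and the data structures below are simplifying assumptions, not the paper's simulator, which also models mappings and OS scheduling policies.

```python
def simulate(tasks, deps, exec_time):
    """tasks: task names; deps: {task: set of predecessor names};
    exec_time: {task: virtual duration}. Single processor, acyclic graph
    assumed. Returns {task: virtual finish time}."""
    finish, done, clock = {}, set(), 0
    pending = set(tasks)
    while pending:
        # functional side: which tasks are ready (all predecessors done)
        ready = sorted(t for t in pending if deps.get(t, set()) <= done)
        t = ready[0]                  # deterministic pick by name
        # timing side: the kernel, not a per-signal event queue,
        # advances the virtual clock by the task's profiled duration
        clock += exec_time[t]
        finish[t] = clock
        done.add(t)
        pending.discard(t)
    return finish

# toy task graph: a feeds both b and c
times = simulate(["a", "b", "c"],
                 {"b": {"a"}, "c": {"a"}},
                 {"a": 2, "b": 3, "c": 1})
```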
  • A scalable performance prediction heuristic for implementation planning on heterogeneous systems

Publication Year: 2010, Page(s): 71 - 80
Cited by: Papers (1)
PDF (395 KB) | HTML

    Despite speedups of 10x to 1000x, effective usage of multi-core and heterogeneous systems has largely been limited to experts, due to the increased application design complexity that results from requiring significantly different algorithms for different device types and counts. Compiler and high-level synthesis research has attempted to address this problem but is fundamentally limited to the algorithm specified by the high-level code. Thus, future compilers will need to choose from numerous implementations/algorithms for a given function when optimizing for a multi-core heterogeneous system. This emerging problem, which we refer to as the implementation planning problem, requires compilers and similar tools to rapidly determine the performance of a particular implementation on different devices for all possible input parameters. To help solve the implementation planning problem, we introduce a heuristic that repeatedly selects statistically significant input values, measures actual execution time, and then statistically analyzes the results to predict the execution time for all inputs within requested accuracy and confidence levels. We evaluated the heuristic using twelve examples on three different platforms with up to 16 microprocessor cores and a field-programmable gate array, achieving an average prediction error of 6.2% and a root-mean-squared error of 7.4% while requiring an average of only 463 samples and 51 seconds to complete.

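A drastically simplified version of the sample-measure-predict loop: time a stand-in kernel at a few input sizes, fit a least-squares line, and predict an unmeasured size. The kernel, the linear model, and the fixed sample points are assumptions; the paper's heuristic additionally selects samples statistically and bounds prediction error with confidence levels.

```python
import time

def work(n):
    """Stand-in kernel whose runtime grows roughly linearly in n."""
    s = 0
    for i in range(n * 1000):
        s += i
    return s

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

samples, measured = [10, 40, 70, 100], []
for n in samples:
    t0 = time.perf_counter()
    work(n)
    measured.append(time.perf_counter() - t0)

a, b = fit_line(samples, measured)
predicted = a * 55 + b   # predicted runtime for an unmeasured input size
```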
  • Reducing disk power consumption in portable media players

Publication Year: 2010, Page(s): 81 - 89
PDF (331 KB) | HTML

    A video player may prefetch video frames into a buffer to allow the disk to go into standby mode, which involves a complete spindown of the spindle motor. Frequent spindowns, however, affect disk longevity, so it is essential to limit the number of times the disk enters standby mode. We present the design and implementation of a data prefetching scheme that minimizes disk power consumption for a limited number of disk spindowns. We first present a data prefetching model that fully utilizes the available buffer space and analyze how power consumption is affected by the bit-rates of the frames in the buffer. We then formulate the problem of determining when the disk should enter standby mode and provide an optimal solution using dynamic programming. We implemented our algorithm in MPlayer running on Linux 2.6. Experimental results show that it reduces disk energy consumption by up to 59%.

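The core trade-off, buffered low-bitrate frames stretch the standby interval but each spindown cycle has a fixed cost, can be sketched with a toy energy model. The power numbers and the one-frame-per-time-unit drain model are invented for illustration; the paper's dynamic-programming solution, which chooses optimal spindown points under a spindown budget, is not reproduced here.

```python
def standby_time(buffer_bits, frame_bitrates):
    """Playback time units the buffered frames cover, assuming one frame
    per time unit at the given per-frame bitrates (assumed model)."""
    t, used = 0, 0
    for rate in frame_bitrates:
        if used + rate > buffer_bits:
            break                      # buffer full: stop prefetching
        used += rate
        t += 1
    return t

def energy_saved(buffer_bits, frame_bitrates,
                 p_active=2.0, p_standby=0.2, spin_cost=5.0):
    """Net energy saving of one spindown cycle (assumed power numbers):
    savings accrue while in standby, minus the fixed spinup/spindown cost."""
    t = standby_time(buffer_bits, frame_bitrates)
    return (p_active - p_standby) * t - spin_cost
```

With these numbers a spindown only pays off once the buffer holds enough low-bitrate frames, which mirrors the paper's observation that power consumption depends on the bit-rates of the buffered frames.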
  • Process variation aware transcoding for low power H.264 decoding

Publication Year: 2010, Page(s): 90 - 96
Cited by: Papers (2)
PDF (392 KB) | HTML

    Transcoding is commonly used in media servers to adapt video bitstreams to the capabilities and specifications of the receiving playback devices or of the transmission network channel in between. The primary adjustments are made to the video format, the resolution, and the bitrate. In this paper we propose using transcoding as a means of converting a regular standard video bitstream into a standard video bitstream that is also aware of the power-reliability characteristics of the target decoder device, with a main focus on process variations within the on-chip decoder reference buffer memory. More specifically, we introduce an H.264 video transcoding scheme in which the bitstream generated by the transcoder is tolerant of defective SRAM cells in the decoder reference buffer. Such an error-aware scheme allows voltage scaling on the decoder reference buffer in the presence of process variation and can result in significant power reduction, considering the increasing power share of memories on SoCs. Our estimates show this scheme can realize about 40% power savings on decoder devices (in 32 nm) while preserving the decoded video quality and the bitrate of the bitstream.

  • Combining process splitting and merging transformations for Polyhedral Process Networks

Publication Year: 2010, Page(s): 97 - 106
Cited by: Papers (3)
PDF (330 KB) | HTML

    We use the polyhedral process network (PPN) model of computation to program and map streaming media applications onto embedded Multi-Processor System-on-Chip (MPSoC) platforms. Previous works have shown how to apply different process network transformations in isolation. In this work, we present a holistic approach that combines the process splitting and merging transformations and show that using both transformations in combination is necessary to achieve the best performance results, which cannot be achieved using only one transformation. We solve the problem of ordering the two transformations and, in addition, relieve the designer of the task of selecting the processes to which the transformations should be applied. Thus, our approach combines both transformations to exploit the data-level parallelism available in a PPN as much as possible, even in cases where the parallelism is restricted by topological cycles and stateful processes in the PPN.

  • A reprogrammable computing platform for JPEG 2000 and H.264 SHD video coding

Publication Year: 2010, Page(s): 107 - 113
Cited by: Papers (1)
PDF (1151 KB) | HTML

    In this paper, the architecture of a DSP/FPGA-based hardware platform is presented, conceived to leverage programmable-logic processing power for high-definition video processing. The system is reconfigurable and scalable, since multiple boards may be parallelized to speed up the most demanding tasks. JPEG 2000 and H.264, both at HD and Super HD (SHD) resolutions, have been simulated and their performance evaluated on the embedded processing cores. The results show that real-time, or near real-time, encoding is viable, and the modularity of the architecture allows for parallelization and performance scalability.

  • Two-stage configurable decoder model for multiple forward error correction standards

Publication Year: 2010, Page(s): 114 - 120
PDF (226 KB) | HTML

    Mobile terminals such as mobile phones and personal digital assistants (PDAs) have become widespread in daily life. Multimedia applications often execute on such terminals, which are tightly coupled with wireless communication. Forward error correction (FEC) is one of the most important and computationally heavy tasks in wireless communication. Leading-edge mobile embedded systems usually support not just one FEC standard but multiple FEC standards in order to adapt to various wireless communication standards. In this paper, we propose a two-stage configurable decoder model for multiple FEC standards based on Viterbi and Turbo coding, which vary in constraint length, coding rate, etc. The proposed decoder model realizes a decoder instance that supports a dedicated set of FEC standards. The model is configurable in two stages, at hardware generation time and at runtime, and designers can easily specify these configurations through various design parameters. Experimental results show that the proposed two-stage configurable decoder model is quite reasonable as an FEC decoder.

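As a small concrete instance of the Viterbi decoding that such a configurable decoder generalizes over constraint lengths and coding rates, here is a hard-decision decoder for the textbook rate-1/2, constraint-length-3 convolutional code with generators (7, 5) in octal. This is a standard example, not the paper's decoder model.

```python
from math import inf

def encode(bits):
    """Rate-1/2, K=3 convolutional encoder, generators 7 and 5 (octal).
    State s holds the previous two input bits."""
    s, out = 0, []
    for u in bits:
        out += [u ^ (s >> 1) ^ (s & 1), u ^ (s & 1)]
        s = ((u << 1) | (s >> 1)) & 3
    return out

def viterbi(received):
    """Hard-decision Viterbi decoding over the 4-state trellis."""
    metric = [0, inf, inf, inf]          # encoder starts in state 0
    paths = [[], [], [], []]
    for i in range(0, len(received), 2):
        r0, r1 = received[i], received[i + 1]
        new_metric, new_paths = [inf] * 4, [None] * 4
        for s in range(4):
            if metric[s] == inf:
                continue                 # state not yet reachable
            for u in (0, 1):             # hypothesize each input bit
                o0 = u ^ (s >> 1) ^ (s & 1)
                o1 = u ^ (s & 1)
                ns = ((u << 1) | (s >> 1)) & 3
                m = metric[s] + (o0 != r0) + (o1 != r1)  # Hamming metric
                if m < new_metric[ns]:   # keep the survivor path
                    new_metric[ns], new_paths[ns] = m, paths[s] + [u]
        metric, paths = new_metric, new_paths
    return paths[metric.index(min(metric))]

msg = [1, 0, 1, 1, 0, 0]                 # last two zeros flush the encoder
code = encode(msg)
code[3] ^= 1                             # inject a single channel bit error
assert viterbi(code) == msg              # the error is corrected
```

A configurable decoder in the paper's sense would parameterize exactly the parts hard-coded here: the generator polynomials, the state count 2^(K-1), and the rate.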
  • Improving transient memory fault resilience of an H.264 decoder

Publication Year: 2010, Page(s): 121 - 130
Cited by: Papers (3)
PDF (203 KB) | HTML

    Traditionally, fault tolerance has been the domain of expensive, hard real-time critical systems. However, the rates of transient faults occurring in semiconductor devices will increase significantly due to shrinking structure sizes and reduced operating voltages. Thus, even consumer-grade embedded applications with soft real-time requirements, like audio and video players, will require error detection and correction methods to ensure reliable everyday operation. Cost, timing, and energy considerations, however, prevent the embedded system developer from correcting every single error. In many situations a totally error-free system is not required; only perceptible errors have to be corrected. To distinguish between perceptible and non-perceptible errors, a classification of errors according to their relevance to the application is required. When real-time conditions have to be observed, the current timing properties of the system provide additional contextual information. In this paper, we present a structure for an error-correcting embedded system based on a real-time-aware classification. Using a cross-layer approach that exploits application annotations of error classifications as well as information available inside the operating system, the error-correction overhead can be significantly reduced. This is shown in a first evaluation by analyzing the achievable improvements in an H.264 video decoder under error injection and simulated error correction.

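The real-time-aware classification idea can be sketched as a tiny decision rule: correct a fault only if it is perceptible (it hits application-critical data) and the remaining slack before the deadline covers the correction cost. The region names, costs, and three-way outcome below are hypothetical illustrations, not the paper's annotation scheme.

```python
def handle_fault(region, slack_ms, correction_cost_ms,
                 critical_regions=("reference_frame", "bitstream")):
    """Classify a transient memory fault and pick a reaction.
    region: hypothetical name of the affected data region;
    slack_ms: time left before the frame deadline."""
    if region not in critical_regions:
        return "ignore"       # non-perceptible: let it pass silently
    if slack_ms >= correction_cost_ms:
        return "correct"      # perceptible, and there is time to fix it
    return "conceal"          # perceptible but no time: mask the artifact

assert handle_fault("pixel_scratch", 5, 3) == "ignore"
assert handle_fault("reference_frame", 5, 3) == "correct"
assert handle_fault("reference_frame", 1, 3) == "conceal"
```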