Embedded Systems for Real-Time Multimedia, 2009. ESTIMedia 2009. IEEE/ACM/IFIP 7th Workshop on

Date 15-16 Oct. 2009

  • [Title page]

    Publication Year: 2009 , Page(s): 1
    Freely Available from IEEE
  • 2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia - ESTIMedia 2009 [cover]

    Publication Year: 2009 , Page(s): c1
    Freely Available from IEEE
  • ESTIMedia 2009 [CD Hub Page]

    Publication Year: 2009 , Page(s): c2
    Freely Available from IEEE
  • ESTIMedia 2009 hub page

    Publication Year: 2009 , Page(s): 1
    Freely Available from IEEE
  • Session list

    Publication Year: 2009 , Page(s): 1
    Freely Available from IEEE
  • ESTIMedia 2009 Table of Contents

    Publication Year: 2009 , Page(s): 1 - 3
    Freely Available from IEEE
  • Author index

    Publication Year: 2009 , Page(s): 1 - 2
    Freely Available from IEEE
  • ESTIMedia 2009 - Detailed author index

    Publication Year: 2009 , Page(s): 1 - 8
    Freely Available from IEEE
  • The end of indexes

    Publication Year: 2009 , Page(s): 1
    Freely Available from IEEE
  • 2009 IEEE/ACM/IFIP 7th Workshop on Embedded Systems for Real-Time Multimedia (ESTIMEDIA 2009) [Conference welcome]

    Publication Year: 2009 , Page(s): 1
    Freely Available from IEEE
  • Technical program committee

    Publication Year: 2009 , Page(s): 1
    Freely Available from IEEE
  • Multimedia systems in a changing technology landscape

    Publication Year: 2009 , Page(s): 1

    Summary form only given. Multimedia systems are typically computation intensive. Ever more complex algorithms and continuously increasing resolution suggest that this trend will continue for the foreseeable future. Silicon IC technology scaling delivers the transistor densities required to meet the computational needs of multimedia systems, but at an ever-increasing cost. Changes in the technology landscape influence multimedia system architectures, resulting in design challenges and opportunities. In this presentation we provide a glimpse of this technology-design interaction in the context of multimedia systems, touching upon multi-core architectures, 3D chip-stacking, and embedded MEMS technology.
  • Media processing challenges in today's (and tomorrow's) consumer SoCs

    Publication Year: 2009 , Page(s): 1

    Summary form only given. This presentation provides an industrial, hardware-centric perspective on embedded systems for real-time multimedia. It consists of two parts. The first part provides some background on NXP consumer SoCs. It gives a high-level overview of a TV SoC, the PNX85500, and details one of its components, the TriMedia processor. This media processor is used for a multitude of diverse tasks and allows for efficient hardware/software co-design. The second part zooms in on some specific functions and challenges related to the design of a TV SoC. It addresses functions such as Frame Rate Conversion and 3D TV, and challenges such as off-chip memory communication, video compression, and the need for "design for flexibility". The intent of the presentation is to identify today's challenges, from which research opportunities may be derived.
  • A high-performance low-power H.264/AVC video decoder accelerator for embedded systems

    Publication Year: 2009 , Page(s): 1 - 8
    Cited by:  Papers (1)

    We present a high-performance, low-power, pure-hardware accelerator for decoding H.264/AVC video. We propose novel VLSI architectures for every stage of the decoding pipeline. We wrap the decoder core with an AMBA bus interface, integrate it into a multimedia SoC platform, and verify it with FPGA prototyping. To reduce external memory traffic, we propose a memory fetch unit that increases the length of burst accesses. Running at 16 MHz, our FPGA decoder prototype can decode D1 video (720×480) at 30 fps in real time. We also propose several techniques to reduce both average and peak power consumption. Simulation results show that our design consumes only 21.2 mW of average power. The proposed H.264/AVC video decoder is suitable for embedded multimedia systems for mobile applications.
  • A high-throughput pipelined architecture for JPEG XR encoding

    Publication Year: 2009 , Page(s): 9 - 17
    Cited by:  Papers (1)

    JPEG XR is an emerging image coding standard based on HD Photo, which was developed by Microsoft. It offers compression performance twice as high as that of the de facto image coding system, JPEG, and also has an advantage over JPEG 2000 in terms of computational cost. JPEG XR is expected to become widespread in many devices, including embedded systems, in the near future. In this paper, we propose a novel architecture for JPEG XR encoding. In previous architectures, entropy coding was the throughput bottleneck because it was implemented as a sequential algorithm to handle data with dependencies. We found that there is no dependency in intra-macroblock data, so all the encoding processes, including the entropy coding, can safely be pipelined. The proposed fully pipelined architecture achieves 100 Mpixel/s at 125 MHz, which previous work could not achieve.
  • A high-throughput, area-efficient hardware accelerator for adaptive deblocking filter in H.264/AVC

    Publication Year: 2009 , Page(s): 18 - 27
    Cited by:  Papers (1)

    In this paper, we present a high-throughput, area-efficient hardware accelerator for the deblocking filter in the H.264/AVC video compression standard. To achieve this goal, we start with algorithmic optimization and propose a novel decomposition of the filter kernels for the deblocking filter. The proposed decomposition reduces the number of adders by 51% and thereby greatly reduces the area required for its implementation. Subsequently, at the architecture level, two identical filtering units are used, and the transpose units are realized by efficient reuse of hardware resources to further reduce the area requirement. The two filtering units process the horizontal and vertical edges of the macroblock simultaneously and therefore further enhance the throughput of the hardware accelerator. Several other optimization techniques, such as reuse of intermediate results, pipelining, and merging of processing blocks on the critical path, result in a hardware accelerator for the deblocking filter with high throughput on the one hand and less area in terms of equivalent gate count on the other, when compared with existing state-of-the-art hardware accelerators in the literature. Operating at a clock frequency of 166 MHz and synthesized in 0.18 µm CMOS standard cell technology, it easily meets the throughput requirements of all the levels in the H.264/AVC video coding standard and consumes only 12.06 K gates (excluding SRAM).
  • An effective dictionary-based display frame compressor

    Publication Year: 2009 , Page(s): 28 - 34
    Cited by:  Papers (6)

    In all video applications, large amounts of data are processed within a bounded time. These data are usually stored in low-cost, slow external DRAM, which results in a high memory bandwidth requirement. The memory bandwidth dominates system performance, especially for applications running on embedded systems. In this paper, we propose an effective dictionary-based compression and decompression algorithm for display frames in a video decoding system and present its hardware implementation. We have integrated the proposed design into an H.264/AVC video decoder. Simulation results show that the proposed algorithm achieves a compression ratio of 54% and a memory traffic reduction of 34% when decoding 1080 HD video. It is much more effective than all previous work.
  • Efficient execution of Kahn process networks on multi-processor systems using protothreads and windowed FIFOs

    Publication Year: 2009 , Page(s): 35 - 44
    Cited by:  Papers (7)

    As single-processor systems are ceasing to scale effectively, multi-processor systems are becoming more and more popular. While designing multi-processor systems in hardware poses many challenges, writing efficient parallel applications that utilize the computing capability of multiple processors may prove even more challenging. In this paper, we introduce a framework that allows applications expressed as Kahn process networks to be executed efficiently on multi-processor systems using protothreads and windowed FIFOs. We show that application developers can use this framework to achieve considerable speed-ups on the Cell Broadband Engine without needing to write architecture-specific code.
  • Inter-kernel data reuse and pipelining on chip-multiprocessors for multimedia applications

    Publication Year: 2009 , Page(s): 45 - 54
    Cited by:  Papers (1)

    The increasing demand for low-power, high-performance multimedia embedded systems has motivated the need for effective solutions that satisfy application bandwidth and latency requirements under a tight power budget. As technology scales, it is imperative that applications are optimized to take full advantage of the underlying resources and meet both power and performance requirements. We propose a methodology capable of discovering and enabling parallelism opportunities via code transformations, efficiently distributing the computational load across resources, and minimizing unnecessary data transfers. Our approach decomposes the application's tasks into smaller units of computation called kernels, which are distributed and pipelined across the different processing resources. We exploit the ideas of inter-kernel data reuse to minimize unnecessary data transfers between kernels and early execution edges to drive performance. Our experimental results on a JPEG2000 case study show up to 80% performance improvement and 60% dynamic power reduction over standard application mapping approaches.
  • Fast configuration of MEMS-based storage devices for streaming applications

    Publication Year: 2009 , Page(s): 55 - 63

    An exciting class of storage devices is emerging: the class of Micro-Electro-Mechanical storage Systems (MEMS). Properties of MEMS-based storage devices include high density, small form factor, and low power. The use of this type of device in mobile infotainment systems, such as video cameras, is not at all obvious. We must explore their configuration and assess their benefit with respect to existing devices, such as Flash. In this paper, we study the configuration of the data layout of MEMS-based storage devices for predominantly streaming applications. The configuration targets timing performance, energy consumption, and capacity. We show that formatting the data layout based on knowledge of the expected streaming workload results in devices that consume at least 22% less energy than Flash and perform comparably. We present a fast and effective configuration method to format the data layout. This method exploits the bimodal distribution of the request size in streaming applications. Using the fast method, we present a design rule for streaming environments: reducing the amount of prefetching of streaming data makes it possible to reach configurations with small trade-offs between the design targets. Index Terms: Secondary storage, Probe storage, Quality of service, Energy efficiency, Design space, Data layout.
  • QoS management of dynamic video tasks by task splitting and skipping

    Publication Year: 2009 , Page(s): 64 - 69

    We have integrated processing with deterministic and non-deterministic resource usage in an overall application and evaluated its performance on a multi-core processor platform. The non-determinism involves image analysis, which features a high variation in computing and memory requirements, as opposed to regular stream-oriented video processing. Quality-of-Service (QoS) control is based on resource-usage estimation functions. Scalability in parallel-executing sub-functions is achieved through task skipping and splitting, as every video application can quickly be made scalable in this way. The complete framework was validated for accurate latency control of an interactive medical imaging application. The proposed QoS mechanism runs fast enough to be executed in real time, and we achieved a reduction in latency jitter of almost 70% for average-case processing, so that the quality can be significantly improved or an inexpensive hardware system can be employed.
  • The wizard of OS: a heartbeat for Legacy multimedia applications

    Publication Year: 2009 , Page(s): 70 - 79
    Cited by:  Papers (1)

    Multimedia applications are often characterised by implicit temporal constraints but, in many cases, they are not programmed using any specialised real-time API. These "legacy applications" have no way to communicate their temporal constraints to the OS kernel, and their quality of service (QoS), being necessarily linked to the temporal behaviour, fails to satisfy acceptable standards. In this paper we propose an innovative way of dealing with these applications, based on the combination of an on-line identification mechanism (which extracts from high-level observations such important parameters as the execution rate) and an adaptive scheduler (specialised for legacy applications) that identifies the correct amount of CPU needed by each application. Preliminary experimental results are reported, proving the effectiveness of the proposed idea in providing a widely used multimedia player on Linux with appropriate QoS guarantees, through an appropriate choice of the scheduling parameters. Finally, a detailed road-map is presented with the possible extensions to the approach.
  • System-level MP-SoC design space exploration using tree visualization

    Publication Year: 2009 , Page(s): 80 - 88

    The complexity of today's embedded systems forces designers to model and simulate systems and their components to explore the wide range of design choices. Such design space exploration is especially needed during the early design stages, where the design space is at its largest. Because the design space grows exponentially in real problems, evaluating and comparing every single point in the design space is infeasible. Therefore, heuristic search techniques, such as Evolutionary Algorithms (EA), are often used to search the design space for optimum design points using only a finite number of design-point evaluations. Understanding how the design space was searched by such algorithms, and gaining insight into the "landscape" of the design space, may be invaluable to the designer. To this end, this paper presents a novel interactive visualization application, based on tree visualization, to understand the search dynamics of an evolutionary algorithm and to visualize where the optimum design points are located in the design space.
  • Design space exploration for optimal memory mapping of data and instructions in multimedia applications to Scratch-Pad Memories

    Publication Year: 2009 , Page(s): 89 - 95

    In this paper, we propose a new methodology for optimal memory mapping of data and instructions to Scratch-Pad Memories (SPM). In the mapping process, we optimize, as the main priority, the number of memory accesses to minimize power consumption. Minimizing external memory accesses lowers switching activity and therefore power consumption. The optimization is done by finding Pareto points, using multi-objective optimization that combines different cost functions. Our methodology is intended for real-life situations in industry, where there is often a need to map third-party applications to a specific architecture. To evaluate our methodology, we also use commercial H.264 video and eAAC+ audio applications. Our experiments show that SPM is well suited to these applications for reducing external accesses, and thus power consumption, but has limited impact on overall performance. The proposed methodology provides a way to combine SPMs with caches to use this memory architecture optimally. Our experiments indicate that our methodology predicts SPM and external memory accesses with high accuracy. We have obtained 90% accuracy when comparing the results of our methodology with the results of executing applications on a given architecture.
  • Exploring trade-offs between performance and resource requirements for synchronous dataflow graphs

    Publication Year: 2009 , Page(s): 96 - 105
    Cited by:  Papers (6)

    Synchronous dataflow graphs (SDFGs) are widely used to model streaming applications such as signal processing and multimedia applications. These are often implemented on resource-constrained embedded platforms ranging from PDAs and cell phones to automobile equipment and printing systems. Trade-off analysis between resource usage and performance is critical in the life cycle of those products, from tailoring platforms to target applications at design time to resource management at runtime. We present a trade-off analysis method for SDFGs based on model-checking techniques and leveraging knowledge from the dataflow domain. We develop results to prune the state space of an SDFG for multi-objective model checking without losing optimality. To achieve scalability to large state spaces, we combine these pruning techniques with pragmatic heuristics. We evaluate our techniques with two sets of experiments. One set shows that we can now perform throughput-storage trade-off analysis for shared-memory architectures, with reductions in memory usage of 10-50% compared to existing distributed-memory-based analysis. A second set of experiments shows how our techniques support design-space exploration for the digital datapath of a professional printer system. Analysis times range from less than a second to at most several minutes.