Design and Architectures for Signal and Image Processing (DASIP), 2011 Conference on

Date 2-4 Nov. 2011

  • [Front cover]

    Page(s): c1
    Freely Available from IEEE
  • [Title page]

    Page(s): 1
    Freely Available from IEEE
  • [Copyright notice]

    Page(s): 1
    Freely Available from IEEE
  • Table of contents

    Page(s): 1 - 3
    Freely Available from IEEE
  • [Front matter]

    Page(s): 1 - 4
    Freely Available from IEEE
  • Session 1: System simulation and processor generation

    Page(s): 1
    Freely Available from IEEE
  • A SystemC TLM framework for distributed simulation of complex systems with unpredictable communication

    Page(s): 1 - 8

    Increasingly complex systems need parallelized simulation engines. In the context of SystemC simulation, existing proposals require predicting communication in the simulated system. However, communication is often unpredictable. In order to deal with unpredictable systems, this paper presents a parallelization approach using asynchronous communication without modification of the SystemC simulation engine. The simulated system model is partitioned and distributed across separate simulation engines, each part being evaluated in parallel with the others. Functional consistency is preserved thanks to the simulated system's write-exclusive memory access policy, while temporal consistency is guaranteed using explicit synchronization. Experimental results show a speed-up of up to 13× on 16 processors.
  • Performance evaluation of an automotive distributed architecture based on HPAV communication protocol using a transaction level modeling approach

    Page(s): 1 - 8

    Due to the increasing complexity of communication infrastructures in the automotive domain, reliable models are necessary to assist designers in the development of networked embedded systems. In this context, transaction level modeling, supported by languages such as SystemC, is a promising solution for assessing the performance of networked architectures with a good compromise between accuracy and simulation speed. This article presents the application of a specific modeling approach to the performance evaluation of a networked embedded system inspired by the automotive domain. The considered approach is illustrated by the modeling of a video transmission system made of three electronic controller units and based on a specific power line communication protocol. The created model incorporates descriptions of various communication layers, and its simulation allows evaluation of time properties and of the memory cost incurred.
  • Morpheo: A high-performance processor generator for a FPGA implementation

    Page(s): 1 - 8

    Complex applications in embedded systems, such as multimedia, telephony or cryptography, demand ever more performance, which can be achieved by using multiple levels of parallelism. Today, FPGAs are viable alternatives for these kinds of applications. Unfortunately, the processors available on FPGAs do not provide sufficient performance. This work proposes the Morpheo tool, a generator of configurable high-performance processors dedicated to FPGAs. As the FPGA architecture is more restrictive than that of an ASIC, VHDL models produced by Morpheo can also be used for an ASIC implementation. The main advantage is that there is no need for specific components; processors are therefore easier to generate. Despite the architectural changes related to the FPGA target, the IPC (Instructions Per Cycle) of the 2-way and 4-way superscalar processors is, respectively, 0.81 and 0.74 times that of M5 processors (ASIC targeted) with corresponding parameters. These processors can be placed in a Xilinx Virtex-5 xc5vlx330 using 15% and 31% of the available hardware resources and run at 79 MHz and 72 MHz, respectively.
  • Design of a processor optimized for syntax parsing in video decoders

    Page(s): 1 - 8

    Heterogeneous platforms aim to offer both performance and flexibility by providing designers with processors and programmable logic units on a single platform. Processors implemented on these platforms are usually soft-cores (e.g. Altera NIOS) or ASIC (e.g. ARM Cortex-A8). However, these processors still face performance limitations compared to full hardware designs, in particular for real-time video decoding applications. In this paper, we present an innovative approach to improve performance using both a processor optimized for syntax parsing (an Application-Specific Instruction-set Processor) and an FPGA. The case study has been synthesized on a Xilinx FPGA at a frequency of 100 MHz, and we estimate the performance that could be obtained with an ASIC.
  • Session 2: Low power design & methodologies

    Page(s): 1
    Freely Available from IEEE
  • Fast and accurate hybrid power estimation methodology for embedded systems

    Page(s): 1 - 7

    Nowadays, having appropriate Electronic System Level (ESL) tools for power estimation in embedded systems is becoming mandatory. The main challenge in designing such dedicated tools is to achieve a better trade-off between accuracy and speed. In this paper, a new power consumption estimation methodology for embedded systems is proposed. First, Functional Level Power Analysis (FLPA) is used to set up generic power models based on real board measurements. In the second step, a simulation framework is developed to accurately evaluate the architectural parameters of the elaborated power models. The proposed methodology has several benefits: it significantly improves the accuracy of the functional level approach, and the power consumption estimation can be accomplished without costly and complex equipment. In order to speed up the estimation process, our methodology relies on the selection of data pattern size and on the application sampling technique. Experimental results show that our tool achieves a simulation speed 21 times faster with a marginal power estimation error of 1%.
  • Embedded operating systems energy overhead

    Page(s): 1 - 6

    In this paper, a flow for characterizing the energy consumption of embedded operating systems is presented. The objective is to determine the energy overhead of the services of the embedded OS; we focus in particular on the context switch service. The modeling is based on measurements on the OMAP35x EVM hardware platform, running Linux omap. Based on the analysis results, a relationship between energy overhead and a set of hardware and software parameters is established.
  • Session 3: Reconfigurable systems & tools for signal & image processing — Part 1

    Page(s): 1
    Freely Available from IEEE
  • Flexible VLIW processor based on FPGA for real-time image processing

    Page(s): 1 - 8

    Modern FPGA chips, with their larger memory capacity and reconfigurability potential, are opening new frontiers in rapid prototyping of embedded systems. With the advent of high-density FPGAs, it is now possible to implement a high-performance Very Long Instruction Word (VLIW) processor core in an FPGA. With a VLIW architecture, processor effectiveness depends on the ability of compilers to provide sufficient Instruction Level Parallelism (ILP) from program code. This paper describes research results on enabling the VLIW processor model for real-time processing applications by exploiting FPGA technology. Our goals are to keep the flexibility of processors in order to shorten the development cycle, and to use the powerful FPGA resources in order to increase real-time performance. We present a flexible VLIW VHDL processor model with a variable instruction set and a customizable architecture, which allows the intrinsic parallelism of a target application to be exploited using advanced compiler technology and implemented in an optimal manner on an FPGA. Some common image processing algorithms were tested and validated on a Virtex-6 FPGA-based board using the proposed development cycle. Our approach applies several criteria for co-design tools: flexibility, modularity, performance, and reusability.
  • Acceleration of image reconstruction in 3D ultrasound computer tomography: An evaluation of CPU, GPU and FPGA computing

    Page(s): 1 - 8

    As today's standard screening methods frequently fail to diagnose breast cancer before metastases have developed, earlier breast cancer diagnosis is still a major challenge. Three-dimensional ultrasound computer tomography promises high-quality images of the breast, but is currently limited by a time-consuming synthetic aperture focusing technique based image reconstruction. In this work, we investigate the acceleration of the image reconstruction by a GPU, and by the FPGAs embedded in our custom data acquisition system. We compare the obtained performance results with a recent multi-core CPU and show that both platforms are able to accelerate processing. The GPU reaches the highest performance. Furthermore, we draw conclusions in terms of applicability of the accelerated reconstructions in future clinical application and highlight general principles for speed-up on GPUs and FPGAs.
  • Task model and online operating system API for hardware tasks in OLLAF platform

    Page(s): 1 - 7

    This article presents an original hardware task model and the corresponding online API for Fine Grained Dynamically Reconfigurable Architectures. We cover the integration of this API in the OLLAF platform and, more specifically, its application to memory access management in a dynamically reconfigurable environment. Methods offered by this platform are compared to existing software and hardware solutions. We also discuss the design complexity of an application using different solutions. We demonstrate that our solution can give application developers the same flexibility as a software implementation, with a very similar design complexity, while ensuring the same performance gain that a common FPGA-based IP would permit.
  • Session 4: Signal and image processing on GPU

    Page(s): 1
    Freely Available from IEEE
  • DFG implementation on multi GPU cluster with computation-communication overlap

    Page(s): 1 - 8

    Nowadays, it is possible to build a multi-GPU supercomputer, well suited for implementation of digital signal processing algorithms, for a few thousand dollars. However, to achieve the highest performance with this kind of architecture, the programmer has to focus on inter-processor communications, task synchronization, and so on. In this paper, we propose a design flow allowing an efficient implementation of a Digital Signal Processing (DSP) application specified as a Data Flow Graph (DFG) on a multi-GPU computer cluster. We focus particularly on the effective implementation of communications by automating the computation-communication overlap, which can lead to significant speedups as shown in the presented benchmark. The approach is validated on a 3D granulometry application developed for research on materials.
  • An efficient parallel motion estimation algorithm and X264 parallelization in CUDA

    Page(s): 1 - 8

    H.264/AVC video encoders have been widely used for their high coding efficiency. Since the computational demand, which is proportional to the frame resolution, is constantly increasing, there has been great interest in accelerating H.264/AVC by parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general purpose applications by exploiting fine-grain data parallelism. Despite extensive research effort to use GPUs to accelerate the H.264/AVC algorithm, no speed-up has been achieved over the x264 algorithm, which is known as the fastest CPU implementation, because of significant communication overhead between the host CPU and the GPU and intra-frame dependency in the algorithm. In this paper, we propose a novel motion estimation (ME) algorithm tailored for NVIDIA GPU implementation. It is accompanied by a novel pipelining technique, called sub-frame ME processing, to effectively hide the communication overhead between the host CPU and the GPU. The proposed H.264 encoder achieves more than 20% speed-up compared with x264.
  • Session 5: Dynamic architectures & adaptive management for image & signal processing

    Page(s): 1
    Freely Available from IEEE
  • Middleware approaches for adaptivity of Kahn Process Networks on Networks-on-Chip

    Page(s): 1 - 8

    We investigate and propose a number of different middleware approaches, namely virtual connector, virtual connector with variable rate, and request-driven, which implement the semantics of Kahn Process Networks on Network-on-Chip architectures. All of the presented solutions allow for run-time system adaptivity. We implement the approaches on a Network-on-Chip multiprocessor platform prototyped on an FPGA. Their comparison in terms of the introduced overhead is presented for two case studies with different communication characteristics. We found that the virtual connector mechanism outperforms the other approaches in the communication-intensive application. In the other case study, which has a higher computation/communication ratio, the middleware approaches show similar performance.
  • FPGA dynamic reconfiguration using the RVC technology: Inverse quantization case study

    Page(s): 1 - 7

    With the rapid evolution of technology, the latest FPGA architectures, such as the Xilinx Virtex series, introduced a new feature called Dynamic Partial Reconfiguration (DPR). This technique allows a designer to configure a portion of the FPGA while other parts continue to run on the same FPGA. The design of an embedded system based on the DPR functionality is still complex and tedious. The MPEG consortium proposes the Reconfigurable Video Coding (RVC) technology. RVC provides a high level description of video decoders described as a set of interconnected Functional Units. This paper studies the use of the RVC technology for the specification of an application and the design of a system based on the DPR functionality. We study the Inverse Quantization (IQ) algorithm of an MPEG-4 decoder and how to switch between the MPEG-2 and the H.263 IQ algorithms using RVC and DPR. This simple and concrete case study highlights the DPR restrictions that must be taken into account in an MPEG RVC description in order to use DPR.
  • Graphic rendering application profiling on a shared memory MPSOC architecture

    Page(s): 1 - 7

    This paper describes the implementation of a graphic rendering pipeline on an MPSoC architecture devoted to the dynamic management of static task graphs. It exhibits the highly non-stationary workloads of this application domain and provides initial useful feedback motivating the design of innovative embedded architectures that have to face heterogeneous computation domains such as graphics and telecommunications. In particular, these experiments stress the need for data-dependent resource allocation strategies.
  • Session 6: Signal processing and processor designs

    Page(s): 1
    Freely Available from IEEE