Signal Processing Systems, 2009. SiPS 2009. IEEE Workshop on

Date 7-9 Oct. 2009

  • [Title page]

    Page(s): 1
    Freely Available from IEEE
  • [Copyright notice]

    Page(s): 1
    Freely Available from IEEE
  • Foreword

    Page(s): 1
    Freely Available from IEEE
  • SiPS 2009 organizing committee

    Page(s): 1 - 2
    Freely Available from IEEE
  • SiPS 2009 reviewers

    Page(s): 1
    Freely Available from IEEE
  • SiPS 2009 technical papers

    Page(s): 1 - 5
    Freely Available from IEEE
  • Sparse severe error removal in OFDM demodulators for erasure channels

    Page(s): 001 - 006

    Fast Fourier transform (FFT) and inverse FFT (IFFT) are used in orthogonal frequency-division multiplexing (OFDM) systems for efficiently converting frequency domain signals that carry discrete information symbols from or to time domain sequences, respectively. This discreteness of the frequency-domain symbol set in context of FFT is of great interest in this paper. In such a paradigm, unfortunately, sparse data corruption in the time domain sequence leads to catastrophic signal degradation in FFT due to its natural noise spread effect, and little can be mitigated by using error control coding schemes (ECC). Potential causes of sparse errors include time-domain sample loss, impulsive noise, and circuit failure in integrated circuits. In this paper, we first characterize the impact of sparse severe errors in the context of FFT, as opposed to that of commonly considered additive white Gaussian noise (AWGN). Based on observation of distinct impact, a series of computationally efficient algorithms are proposed for detecting and mitigating such errors in lieu of ECC. Simulation results show that our proposed algorithms can effectively combat various sparse errors and dramatically improve the performance.

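
    The noise-spreading effect described in this abstract can be illustrated with a small numerical sketch. It is not the paper's detection or mitigation algorithm: the 16-point block, the QPSK-like symbol set, the error position (sample 5) and its amplitude (4.0) are all assumptions chosen for illustration, and the naive DFT below stands in for an FFT.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform (unnormalized)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT, scaled by 1/N so that dft(idft(X)) recovers X."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


# Hypothetical QPSK-like symbols, one per subcarrier (illustrative only).
N = 16
symbols = [complex(1 if (k & 1) else -1, 1 if (k & 2) else -1) for k in range(N)]

# Transmitter side: frequency-domain symbols -> time-domain sequence.
time_seq = idft(symbols)

# A single severe error in one time-domain sample (assumed amplitude 4.0).
corrupted = list(time_seq)
corrupted[5] += 4.0

# Receiver side: back to the frequency domain.
received = dft(corrupted)

# Error magnitude observed on every subcarrier.
bin_errors = [abs(r - s) for r, s in zip(received, symbols)]
print(bin_errors)
```

    With an unnormalized DFT, a single time-domain error of amplitude A shifts every frequency bin by |A|, so one corrupted sample disturbs all subcarriers at once.
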
  • A novel circulant approximation method for frequency domain LMMSE equalization

    Page(s): 007 - 012

    The linear minimum mean square error (LMMSE) equalizer coefficients of a stationary signal are defined by a Toeplitz system. The Toeplitz structure lends itself to computation in frequency domain, which reduces complexity. In this paper we investigate circulant embedding and circulant approximation methods applied to the preconditioned conjugate gradient (PCG) method and frequency domain equalization. We develop a novel circulant approximation method which improves the performance/complexity tradeoff. All considered algorithms are benchmarked in terms of implementation complexity and capacity achieved by a high speed downlink packet access (HSDPA) receiver in a multipath fading scenario.

  • Multi-level modulation soft-decision demapper for DVB-S2

    Page(s): 013 - 017

    This paper presents a low complexity soft-decision demapper for the digital video broadcasting - satellite second generation (DVB-S2) standard. To achieve a good bit error rate (BER) performance of a low density parity check (LDPC) code decoder, the received signal should be soft-decided rather than hard-decided. However, the soft-decision demapper requires high hardware complexity to support higher-order modulation modes. The proposed soft-decision demapper can reduce the hardware complexity by reusing multipliers. In addition, we propose an efficient soft-decision demapper interface that can operate at the symbol rate; placed between an M-phase shift keying (PSK) demodulator and the proposed demapper, it can replace a parallel to serial (P/S) converter. The proposed soft-decision demapper and its interface have been verified on a Xilinx™ Virtex II.

  • Design of rotated QAM mapper/demapper for the DVB-T2 standard

    Page(s): 018 - 023

    Signal space diversity (SSD) has lately been adopted into the second generation of the terrestrial digital video broadcasting standard, DVB-T2. While spectrally efficient, SSD improves the performance of QAM constellations over fading channels thanks to an increased diversity. In this paper, flexible mapper and demapper architectures for the DVB-T2 standard are detailed. A detection scheme based on the decomposition of the constellation into two-dimensional sub-regions in signal space, combined with an algorithmic simplification, constitutes the main novelty of this work. Together they strongly decrease the complexity of the demapper. The design and the FPGA prototyping of the resultant architecture are then described. Low architecture complexity and measured performance demonstrate the efficiency of the detection method.

  • Register file exploration for a multi-standard wireless forward error correction ASIP

    Page(s): 024 - 029

    Given the increase in the number of wireless standards, software defined radio has emerged as a cost effective way of supporting multiple standards on the same platform architecture. Embedded systems with such platforms need to be power efficient and meet the real time constraints of these wireless standards. Register files are known to be power and performance bottlenecks in high performance low power embedded processors. Given the strict power constraints and real-time performance constraints of these applications, a comprehensive study of the register file architecture is needed to reach the optimal architecture. In this paper we perform an in-depth analysis of different register file architectures and their configurations for wireless forward error correction algorithms of different wireless standards like 802.11n and 802.16e. We analyze the traditional clustered register file, hierarchical register file, stream register file as well as asymmetrical register files and show that there are various trade-offs between power and performance across these different architectures.

  • Implementation of the W-CDMA cell search on a MPSoC designed for software defined radios

    Page(s): 030 - 035

    This paper describes the implementation of the W-CDMA cell search algorithm on a homogeneous general purpose multi-processor system-on-chip architecture. The architecture is composed of nine nodes based on COFFEE RISC cores communicating using a hierarchical network-on-chip. The work focuses on the parallelization of the cell search algorithm, enabling execution on different processing nodes, and exploiting the capabilities of the network-on-chip. We achieved a total speed-up of 7.3× when compared with a single processing core system, taking into account the overhead related to the communication between different nodes. The result is significant since it is very close to the theoretical maximum of 9×. Considering the hardware implementation, the target cell search is performed in 104 ms on an FPGA with 75 MHz maximum frequency, and in 40 ms on an ASIC circuit with 200 MHz maximum frequency.

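
    The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a reading aid: the function names are hypothetical, and the use of Amdahl's law to back out an equivalent serial fraction is an assumption of this sketch, not something the abstract states.

```python
def parallel_efficiency(speedup, nodes):
    """Fraction of the ideal (linear) speed-up that was actually achieved."""
    return speedup / nodes


def amdahl_serial_fraction(speedup, nodes):
    """Serial fraction f implied by Amdahl's law S = 1 / (f + (1 - f) / nodes).

    Assumption: all lost speed-up is modelled as serial work, which lumps
    communication overhead into f.
    """
    return (1.0 / speedup - 1.0 / nodes) / (1.0 - 1.0 / nodes)


def cycle_count(time_ms, freq_mhz):
    """Clock cycles elapsed; 1 ms at 1 MHz is 1000 cycles."""
    return time_ms * freq_mhz * 1000


efficiency = parallel_efficiency(7.3, 9)
serial_fraction = amdahl_serial_fraction(7.3, 9)
fpga_cycles = cycle_count(104, 75)    # 104 ms at 75 MHz
asic_cycles = cycle_count(40, 200)    # 40 ms at 200 MHz

print(f"efficiency      = {efficiency:.3f}")
print(f"serial fraction = {serial_fraction:.4f}")
print(f"FPGA cycles     = {fpga_cycles}")
print(f"ASIC cycles     = {asic_cycles}")
```

    Under these assumptions the nine-node system reaches roughly 81% of linear speed-up, and the FPGA and ASIC timings correspond to 7.8 and 8.0 million clock cycles respectively, so the two quoted run times are consistent with each other.
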
  • Two-parallel concatenated BCH super-FEC architecture for 100-Gb/s optical communications

    Page(s): 036 - 039

    This paper presents a high-speed forward error correction (FEC) architecture based on the concatenated BCH code for 100-Gb/s optical communication systems. The concatenated BCH code consists of BCH(3860, 3824) and BCH(2040, 1930), which provides 7.98 dB net coding gain at 10^-12 corrected bit error rate without additive overhead as compared with the Reed-Solomon(255, 239) code standardized in ITU-T G.975 and G.709. This architecture has been implemented with 90-nm CMOS standard cell technology at a supply voltage of 1.1 V. The implementation results show that the concatenated BCH super-FEC architecture can operate at a clock frequency of 400 MHz and has a throughput of 102.4 Gb/s in 90-nm CMOS technology.

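
    The overhead claim in this abstract can be checked from the code parameters alone. The sketch below assumes that the overall rate of the concatenated code is simply the product of the two component rates (no extra framing or padding bits); the variable names are hypothetical.

```python
from fractions import Fraction


def code_rate(n, k):
    """Rate k/n of an (n, k) block code, kept exact as a fraction."""
    return Fraction(k, n)


# Component codes named in the abstract.
rate_a = code_rate(3860, 3824)
rate_b = code_rate(2040, 1930)

# Assumed: the concatenated rate is the product of the component rates.
concatenated_rate = rate_a * rate_b

# Reference code from ITU-T G.975 / G.709.
rs_rate = code_rate(255, 239)

# Redundancy relative to payload, (n - k) / k.
overhead = 1 / concatenated_rate - 1

# Payload bits that must be handled per clock cycle at the quoted figures.
throughput_bps = Fraction(1024, 10) * 10**9   # 102.4 Gb/s
clock_hz = 400 * 10**6                        # 400 MHz
bits_per_cycle = throughput_bps / clock_hz

print(concatenated_rate, rs_rate, float(overhead), bits_per_cycle)
```

    Under that assumption the concatenated rate reduces to 239/255, the same as RS(255, 239), which matches the statement that the coding gain comes without extra overhead. The quoted throughput and clock frequency imply 256 bits per clock cycle.
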
  • Low-power implementation of a high-throughput LDPC decoder for IEEE 802.11n standard

    Page(s): 040 - 045

    A flexible and scalable LDPC decoder architecture is developed for the IEEE 802.11n standard. The serial-parallel architecture is employed for achieving high throughput with low chip area, and triple-bank memory blocks are used for parallel factor expansion. Two low-power strategies using voltage over-scaling (VOS) and reduced-precision replica (RPR) are applied to the decoder. By applying these techniques, a power saving of up to 35% is demonstrated when implemented in a 90 nm CMOS technology.

  • Low-complexity frame-size down-scaling integrated with IDCT

    Page(s): 046 - 050

    This work develops a low-complexity method that integrates the inverse transform and downscaling. According to the application requirement, the reduction ratios of the frame width and length are adopted at the matrix computation of the inverse transform. Since a small matrix is employed to compute the inverse transform and downscaling, the non-zero high-frequency components are compensated in the spatial and frequency domains. Compared to the conventional methods, the computational complexity of the proposed method with frequency-domain compensation can be reduced effectively at similar decoded picture quality. In particular, the proposed method can meet various downscaling ratios to accomplish low-complexity transforms for mobile video applications.

  • Is the differential frequency-based attack effective against random delay insertion?

    Page(s): 051 - 056

    The secret key stored in a cryptographic device can be revealed from the power consumption using statistical analysis in a technique known as differential power analysis (DPA). However, DPA attacks are sensitive to measurement misalignments in the power samples that reduce the dependency between the power and the data. A countermeasure technique that increases this misalignment by inserting random delays between operations, known as random delay insertion (RDI), was shown in previous research to be effective against DPA on hardware implementations. A differential frequency-based attack (DFBA) is a DPA technique that involves a frequency-based preprocessing step and it can be utilized to attack security implementations that include misalignments. In this research, a DFBA attack is carried out on an AES algorithm implemented on both ASIC and FPGA devices. The results indicate that the length of delay which the DFBA attack can reduce is limited. Therefore, the RDI countermeasure is effective against DFBA when the inserted delay is larger than the effective DFBA window size.

  • Approximating sine functions using variable-precision Taylor polynomials

    Page(s): 057 - 062

    Sine is one of the fundamental mathematical functions which are widely used in a number of application fields. In particular, signal processing and telecommunications need to calculate sine and cosine of numerical values for several different purposes. One of the challenges which affected the implementation of sine calculation in digital signal processing (DSP) has been the method used to calculate it by means of rational functions, which would allow the implementation of sine calculation in a digital computer system. One possibility is to exploit the Taylor polynomials, even though their main drawback consists of a relatively high degree (thus computational load) already for relatively low-precision approximations. This paper proposes a variable-precision method that allows approximating sine and cosine functions with Taylor polynomials while significantly reducing the computational load required. Our analysis shows that our method can achieve the same accuracy as other approximation methods at a lower computational cost.

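
    The trade-off between polynomial degree and accuracy mentioned in this abstract can be sketched as follows. This is a plain Maclaurin-series evaluation, not the paper's variable-precision method, which the abstract does not describe. The helper names, the tolerance of 1e-6, and the restriction to |x| <= pi/2 (i.e. after range reduction) are assumptions of this sketch.

```python
import math


def taylor_sin(x, terms):
    """Sum the first `terms` nonzero terms of the Maclaurin series of sin(x)."""
    term = x
    total = x
    for n in range(1, terms):
        # Next term: multiply by -x^2 / ((2n)(2n + 1)).
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
    return total


def terms_needed(x, tol):
    """Smallest term count whose first omitted term is below `tol`.

    For |x| <= pi/2 the series terms shrink monotonically, so the first
    omitted term bounds the truncation error (alternating series bound).
    """
    n = 1
    bound = abs(x) ** 3 / 6.0          # first omitted term when n == 1
    while bound >= tol:
        n += 1
        bound *= x * x / ((2 * n) * (2 * n + 1))
    return n


for x in (0.1, 0.5, 1.0, math.pi / 2):
    n = terms_needed(x, 1e-6)
    err = abs(taylor_sin(x, n) - math.sin(x))
    print(f"x = {x:.4f}  terms = {n}  error = {err:.2e}")
```

    Small arguments need far fewer terms than arguments near pi/2 for the same tolerance, which is the kind of slack an argument-dependent polynomial degree could exploit.
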
  • Reducing processor energy consumption by compiler optimization

    Page(s): 063 - 068

    The purpose of embedded computing is to transform input data into an output format. The functionality required to achieve this goal is therefore a combination of operation executions on computing units and data transfers between those units. To avoid memory bottlenecks, processors use register files to store data during computation.

  • Hardware reduction methodology for 2-dimensional kurtotic FastICA based on algorithmic analysis and architectural symmetry

    Page(s): 069 - 074

    In this paper we propose a hardware reduction methodology through detailed algorithmic analysis and exploiting datapath symmetry for 2D Kurtotic Fast ICA. The relationship of the hardware saving with respect to input data frame-length and maximum iteration for convergence is also explored. An example architecture following the developed hardware reduction methodology consumes 3.55 mm² silicon area and 27.1 µW @ 1 MHz at 1.2 V supply using 0.13 µm standard cell CMOS technology, showing the effectiveness of the proposed methodology.

  • Parallel object detection on multicore platforms

    Page(s): 075 - 080

    Object detection is an important function for intelligent multimedia processing, but its computational complexity has prevented its pervasive use in consumer electronics. Cost-effective and energy-efficient computations are now available with various innovative multicore architectures proposed for embedded systems. However, extensive software optimizations are needed to unravel the inherent parallelisms in object detection for multicore processing. This paper presents interleaved reordering and splitting of parallel tasks in object detection. Overall performance improvements of 10% and 19%, respectively, have been measured for the proposed methods on a face detection prototype implemented on Sony PlayStation 3.

  • Reconfigurable video decoder with transform acceleration

    Page(s): 081 - 086

    Reconfigurable video coding (RVC) is an emerging concept allowing efficient usage of hardware resources. With the aid of reconfiguration the same computing resources can be used with different video coding standards and different decoding parameters. In this paper, an application specific processor (ASP) is proposed for the transform block of the RVC. The most intensive functions of several video coding standards are identified and their computations are accelerated with special function units (SFU). The same SFUs are shared among all the targeted video standards. As a result, the proposed ASP is capable of real-time decoding of MPEG-4, H.264, and VC-1 video streams with all the profiles and HD 720p resolution and frame rate. Moreover, the ASP is easily reconfigurable to support other formats with the same kernel functions.

  • SIMD processor based implementation of recursive filtering equations

    Page(s): 087 - 092

    Implementation of recursive equations using parallel computer architecture has long been of interest because the dependency problem makes it difficult to achieve significant speed-up. In this paper, efficient implementation of recursive filtering equations on partitioned data-path SIMD (Single Instruction Multiple Data) processors is studied. In particular, three parallel computation techniques (block filtering, recursive doubling, and multi-block filtering) are implemented and their performance is compared using a Pentium CPU based system. The performance evaluation result of the multi-block processing method on a scalable SIMD processor is also presented.

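
    Recursive doubling, one of the techniques named in this abstract, can be sketched for the first-order recursion y[n] = a*y[n-1] + x[n]. The abstract does not give the paper's formulation, so the following is a generic textbook-style version under stated assumptions: a first-order filter, zero initial state, and plain Python loops standing in for SIMD lanes. Both function names are hypothetical.

```python
def recursive_filter_serial(a, x):
    """Reference implementation: y[n] = a * y[n-1] + x[n], with y[-1] = 0."""
    y, prev = [], 0.0
    for sample in x:
        prev = a * prev + sample
        y.append(prev)
    return y


def recursive_filter_doubling(a, x):
    """Same recursion computed in about log2(N) passes.

    Invariant before each pass with distance d:
        y[i] = coef[i] * y[i - d] + acc[i]   for i >= d
        y[i] = acc[i]                        for i <  d
    Substituting the same relation for y[i - d] doubles the distance.
    """
    n = len(x)
    coef = [a] * n
    acc = list(x)
    d = 1
    while d < n:
        new_coef = list(coef)
        new_acc = list(acc)
        # Every i in this loop is independent of the others, so the loop
        # body could be executed across SIMD lanes in parallel.
        for i in range(d, n):
            new_acc[i] = coef[i] * acc[i - d] + acc[i]
            new_coef[i] = coef[i] * coef[i - d]
        coef, acc = new_coef, new_acc
        d *= 2
    return acc


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
serial = recursive_filter_serial(0.5, samples)
doubled = recursive_filter_doubling(0.5, samples)
print(serial)
print(doubled)
```

    The serial loop has a dependency chain of length N, while the doubling version trades extra multiply-adds for passes whose inner iterations have no dependencies on each other.
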
  • An adaptive fast multiple reference frame selection algorithm for H.264/AVC using reference region data

    Page(s): 093 - 096

    Multiple reference frame motion estimation, which improves video coding performance, is newly adopted in the H.264/AVC video coding standard. However, this feature adds considerable complexity. Therefore, we propose a fast reference frame selection algorithm based on information from the reference region used for a motion estimation process. Simulation results show that the proposed algorithm effectively reduces the complexity of multiple reference frame motion estimation with negligible degradation of video quality.

  • An early block type decision method for intra prediction in H.264/AVC

    Page(s): 097 - 101

    In this paper, we propose an early block type decision method to reduce the complexity of rate-distortion (R-D) cost computation in intra prediction. Our method decides the block size early among luma 4×4, 8×8 and 16×16 with a simple decision scheme. The R-D cost used for mode decision is that of a subblock (8×16 block) instead of that of a macroblock (MB), with the proposed encoding order for the intra prediction of 4×4 and 8×8 blocks. The R-D cost of a subblock consists of luma components only. Chroma components have a smaller portion of the R-D cost because they use subsampling (4:2:0) and there is relatively small variance among chroma pixels. Our method provides a reduction of computational complexity with an average 0.04 dB PSNR degradation and a rate increase of less than 1% in comparison with the full search method.

  • Design of an interlayer deblocking filter architecture for H.264/SVC based on a novel sample-level filtering order

    Page(s): 102 - 108

    This paper presents the architectural design for an interlayer deblocking filter of the H.264/SVC standard. The architecture described applies a novel and efficient processing order based on sample-level filtering. This order allows a better exploration of the filter parallelism, decreasing by 25% the number of cycles used to filter the videos when compared to the best related work. Four concurrent filter cores were used in the architecture, which was described in VHDL and synthesized for an Altera Stratix III FPGA device. The timing analysis results showed that this design is able to filter up to 130 HDTV (1920×1080 pixels) frames per second.
