
Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04)

Date: 17-21 May 2004

Displaying Results 1 - 25 of 274
  • 2004 IEEE International Conference on Acoustics, Speech and Signal Processing

    Page(s): 0_1
  • 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing

    Page(s): i
  • Copyright page

    Page(s): ii
  • IEEE Signal Processing Society 2004 Board of Governors

    Page(s): iii
  • ICASSP 2004 Conference Committee

    Page(s): iv
  • Technical Program Committee

    Page(s): v - xi
  • Future ICASSP conferences

    Page(s): xii
  • ICASSP 2005 Philadelphia

    Page(s): xiii
  • Conference proceedings overview

    Page(s): xiv
  • Table of contents

    Page(s): xv - xviii
  • ICASSP 2004 technical program

    Page(s): xix
  • Asynchronous multi-core architecture for level set methods

    Page(s): V-1 - V-4, vol. 5

    The paper proposes an asynchronous multi-core architecture for embedded systems running partial differential equation-based image processing algorithms. A study of data flow and timing is carried out to derive the global architecture specifications. The global architecture follows a semi-parallel approach, with several processing units running in parallel and shared memory blocks. The results are illustrated by the implementation of a continuous watershed transform, followed by a discussion of the measured execution time and computational load that demonstrates the efficiency of the approach.

  • Energy efficient cluster co-processors [3G wireless applications]

    Page(s): V-5 - V-8, vol. 5

    New 3G wireless algorithms require more performance than embedded processors can currently provide. ASICs provide the necessary performance but are costly to design and sacrifice generality. This paper introduces a clustered VLIW coprocessor approach that organizes the execution and storage resources differently from a traditional general-purpose processor or DSP. The execution units of the coprocessor are clustered and embedded in a rich set of communication resources. Fine-grain control of these resources is imposed by a wide-word horizontal microcode program. The advantages of this approach are quantified on a suite of six algorithms taken from both traditional DSP applications and the new 3G cellular telephony domain. The result is surprising: the execution clusters retain much of the generality of a conventional processor while improving performance by one to two orders of magnitude and reducing energy-delay by three to four orders of magnitude compared to a conventional embedded processor such as the Intel XScale.

  • Fully utilized and reusable architecture for fractional motion estimation of H.264/AVC

    Page(s): V-9 - V-12, vol. 5

    We contribute a new VLSI architecture for fractional motion estimation in the H.264/AVC video compression standard. Seven inter-related loops extracted from the complex procedure are analyzed, and two decomposition techniques are proposed to parallelize the algorithm for hardware with a regular schedule and full utilization. The proposed architecture, also characterized by its reusability, can support different specifications, multiple standards, fast algorithms, and various cost considerations. H.264/AVC baseline profile level 3 with complete Lagrangian mode decision can be realized with 290K gates at an operating frequency of 100 MHz. It is a useful intellectual property (IP) design for platform-based multimedia systems.

  • Memory analysis and architecture for two-dimensional discrete wavelet transform

    Page(s): V-13 - V-16, vol. 5

    The large amount of frame memory access and the die area occupied by the embedded internal buffer are the most critical issues in implementing the two-dimensional discrete wavelet transform (2D DWT). The former may consume the most power and waste system memory bandwidth; the latter may enlarge the chip size and also consume considerable power. We categorize and analyze 2D DWT architectures by their external memory scan methods. An overlapped stripe-based scan method is then proposed to provide an efficient and flexible implementation of the 2D DWT. Implementation issues of the internal buffer, both lifting-based and convolution-based, are also discussed. Real-life experiments show that the area and power of the internal buffer depend strongly on memory technology and working frequency, rather than on the required number of memory bits alone.

  • A low power reconfigurable DCT architecture to trade off image quality for computational complexity

    Page(s): V-17 - V-20, vol. 5

    We present a low power reconfigurable DCT design that achieves considerable reduction in the computational complexity of the DCT with minimal image quality degradation. The approach is based on modifying the DCT bases in a bit-wise manner. Different computational complexity/image quality trade-off levels are presented, and a reconfigurable architecture, which can dynamically switch from one trade-off level to another, is also proposed. The reconfigurable DCT architecture can achieve power savings ranging from 20% to 70% across 5 different trade-off levels.

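    A short Python sketch of the generic quality/complexity trade-off described above: truncating the DCT basis coefficients to fewer fractional bits cheapens each multiplication in a shift-and-add implementation at the cost of reconstruction error. This is an illustration only; it does not reproduce the paper's bit-wise basis modification scheme, its five trade-off levels, or its architecture.

        import numpy as np

        def dct_matrix(n=8):
            """Orthonormal n-point DCT-II basis matrix."""
            k = np.arange(n).reshape(-1, 1)
            m = np.arange(n).reshape(1, -1)
            c = np.cos((2 * m + 1) * k * np.pi / (2 * n))
            c[0, :] *= 1.0 / np.sqrt(2.0)
            return c * np.sqrt(2.0 / n)

        def truncate_bits(mat, frac_bits):
            """Keep only `frac_bits` fractional bits of each basis coefficient."""
            scale = 2.0 ** frac_bits
            return np.round(mat * scale) / scale

        rng = np.random.default_rng(0)
        block = rng.integers(0, 256, size=(8, 8)).astype(float)

        C = dct_matrix()
        for frac_bits in (12, 8, 6, 4, 3):
            Cq = truncate_bits(C, frac_bits)
            coeffs = Cq @ block @ Cq.T     # forward transform with truncated bases
            recon = C.T @ coeffs @ C       # exact inverse transform
            mse = np.mean((block - recon) ** 2)
            psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
            print(f"{frac_bits:2d} fractional bits -> PSNR {psnr:6.2f} dB")
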
  • Pipelining of parallel multiplexer loops and decision feedback equalizers

    Page(s): V-21 - V-24, vol. 5

    The high speed implementation of a DFE (decision feedback equalizer) requires reformulating the DFE into an array of comparators and a multiplexer loop. The throughput of the DFE is limited by the speed of the multiplexer loop. This paper proposes a novel look-ahead computation approach to pipeline multiplexer loops. The proposed technique is demonstrated by applying it to the design of multiplexer-loop-based DFEs with throughputs in the range of 3.125-10 Gbps.

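    A rough Python sketch of the reformulation and the look-ahead idea described above (not the paper's circuit): a 1-tap DFE is rewritten as a pair of precomputed comparator decisions selected by a multiplexer, and the selection is then unrolled over two symbols so the recursive dependence spans two symbol periods instead of one. Both forms produce identical decisions; the channel model and tap value are assumptions chosen only for illustration.

        import numpy as np

        rng = np.random.default_rng(1)
        n, b = 2000, 0.5                         # number of symbols, feedback tap
        bits = rng.choice([-1.0, 1.0], size=n)
        isi = b * np.concatenate(([0.0], bits[:-1]))
        r = bits + isi + 0.05 * rng.standard_normal(n)

        # Comparator array: candidate decisions for either previous symbol value.
        c_prev_pos = np.sign(r - b * (+1.0))     # decision if previous symbol was +1
        c_prev_neg = np.sign(r - b * (-1.0))     # decision if previous symbol was -1

        # Direct multiplexer loop: each decision selects between the two candidates.
        d_loop = np.empty(n)
        prev = 1.0
        for i in range(n):
            d_loop[i] = c_prev_pos[i] if prev > 0 else c_prev_neg[i]
            prev = d_loop[i]

        # Two-step look-ahead: compose two consecutive multiplexers so the loop
        # only depends on the decision made two symbols earlier.
        d_la = np.empty(n)
        d_la[0] = d_loop[0]
        prev2 = 1.0                              # decision at time i-2
        for i in range(1, n):
            mid = c_prev_pos[i - 1] if prev2 > 0 else c_prev_neg[i - 1]
            d_la[i] = c_prev_pos[i] if mid > 0 else c_prev_neg[i]
            prev2 = d_la[i - 1]

        print("identical decisions:", np.array_equal(d_loop, d_la))
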
  • Interleaved trellis coded modulation and decoding for 10 Gigabit Ethernet over copper

    Page(s): V-25 - V-28, vol. 5

    It is highly likely that 10 Gigabit Ethernet over copper (10GBASE-T) transceivers will use 10-level pulse amplitude modulation (PAM-10) as well as a 4D trellis code, as in 1000BASE-T. The traditional trellis coded modulation scheme, as in 1000BASE-T, leads to a design in which the corresponding decoder, with a long critical path, must operate at 833 MHz; meeting the critical path requirements of such a decoder is difficult. To solve this problem, two interleaved trellis coded modulation schemes are proposed that relax the inherent decoding speed requirements by factors of 4 and 2, respectively. Parallel decoding of the interleaved codes requires multiple decoders. To reduce the hardware overhead, time-multiplexed (folded) decoder structures are proposed in which only one decoder is needed and each delay in the decoder is replaced with four delays for scheme 1 and two delays for scheme 2; these delays can be used to reduce the critical path. Compared with the conventional decoder, the folded decoders for the two proposed schemes achieve speedups of 4 and 2, respectively. Simulation results show that the error-rate performance of the two schemes is quite close to that of the conventional scheme.

  • Memory sub-banking scheme for high throughput turbo decoder

    Page(s): V-29 - V-32, vol. 5

    Turbo codes have revolutionized coding theory with their superior performance. However, implementing these codes is both computationally and memory-intensive. Recently, the sliding window (SW) approach has been proposed as an effective means of reducing both the decoding delay and the memory requirements of turbo decoder implementations. In this paper, we present a sub-banked memory implementation of the SW-based approach that achieves high throughput, low decoding latency and reduced memory energy consumption. Our contributions include derivation of the optimal memory sub-bank structure for different SW configurations, a study of the relationship between memory size, energy consumption and decoding latency for different SW configurations, and a study of the effect of the number of sub-banks on the throughput and decoding latency of a given SW configuration. The theoretical study is validated using SimpleScalar for a rate-1/3 MAP decoder.

  • Reduced-complexity implementation of algebraic soft-decision decoding of Reed-Solomon codes

    Page(s): V-33 - V-36, vol. 5

    A reduced-complexity implementation of a soft Chase algorithm for algebraic soft-decision decoding of Reed-Solomon (RS) codes, based on the recently proposed algorithm of Koetter and Vardy, is presented. The reduction in complexity is obtained at the algorithm level by integrating the re-encoding and Chase algorithms, and at the architecture level by introducing a backup mode that sharply reduces the average computational complexity of the hybrid decoder.

  • Wordlength optimization with complexity-and-distortion measure and its application to broadband wireless demodulator design

    Page(s): V-37 - V-40, vol. 5

    Many digital signal processing algorithms are first developed in floating point and later mapped into fixed point for digital hardware implementation. During this mapping, wordlengths are searched to minimize total hardware cost and maximize system performance. Complexity and distortion measures have previously been researched separately for optimum wordlength selection. This paper proposes a complexity-and-distortion measure (CDM) method that combines the two measures, trading them off with a weighting factor. The proposed method is applied to the wordlength design of a fixed broadband wireless demodulator. For this case study, the proposed method finds the optimal solution in one-third of the time taken by exhaustive search. The contributions of this paper are: (1) a generalization of search methods based on complexity or distortion measures; (2) a framework for automatic wordlength optimization; and (3) a wireless demodulator case study.

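    The abstract does not spell out the CDM cost function, so the short Python sketch below only illustrates the general idea it describes: a single objective that weights a complexity measure against a distortion measure and is minimized over candidate wordlengths. The complexity and distortion models, the two hypothetical signals and the scaling constant are toy assumptions, not the paper's.

        import itertools

        # Candidate fractional wordlengths for two hypothetical fixed-point signals.
        candidates = range(4, 17)

        def complexity(w1, w2):
            # Toy model: hardware cost grows with the total number of bits.
            return w1 + w2

        def distortion(w1, w2):
            # Toy model: quantization noise power of two independent round-off sources.
            return 2.0 ** (-2 * w1) / 12 + 2.0 ** (-2 * w2) / 12

        def cdm_cost(w1, w2, alpha):
            # Complexity-and-distortion measure: weighted combination of both criteria.
            # The 1e6 factor just puts the two toy measures on comparable scales.
            return alpha * complexity(w1, w2) + (1 - alpha) * 1e6 * distortion(w1, w2)

        for alpha in (0.2, 0.5, 0.8):
            best = min(itertools.product(candidates, candidates),
                       key=lambda w: cdm_cost(*w, alpha))
            print(f"alpha={alpha:.1f} -> wordlengths {best}")
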
  • Floating-point to fixed-point conversion with decision errors due to quantization

    Page(s): V-41 - V-44, vol. 5

    Most existing analyses of quantization effects assume that all decision-making blocks in a system, if present, produce identical decisions in both the fixed-point and infinite-precision (IP) implementations. However, in floating-point to fixed-point conversion (FFC), a fixed-point design with occasional decision errors may still be an acceptable approximation of the IP system. We study the effect of such decision errors and relate their probability to the fixed-point data types. Our previous FFC methodology is then extended to cover systems with possible decision errors due to quantization. The extended approach is applied to both CORDIC and a BPSK transceiver.

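    A minimal Python Monte Carlo sketch of the quantity studied above, using a BPSK slicer as the decision block: it estimates how often the hard decision made on a quantized (fixed-point) soft value differs from the decision made on the full-precision value, as the number of fractional bits varies. The noise level and bit widths are assumptions chosen only for illustration.

        import numpy as np

        rng = np.random.default_rng(2)
        n = 200_000
        bits = rng.choice([-1.0, 1.0], size=n)
        soft = bits + 0.5 * rng.standard_normal(n)     # full-precision soft values

        def quantize(x, frac_bits):
            """Round to a fixed-point grid with the given number of fractional bits."""
            scale = 2.0 ** frac_bits
            return np.round(x * scale) / scale

        def slicer(x):
            """BPSK decision block."""
            return np.where(x >= 0, 1.0, -1.0)

        ref = slicer(soft)
        for frac_bits in (1, 2, 3, 4, 6, 8):
            p_err = np.mean(slicer(quantize(soft, frac_bits)) != ref)
            print(f"{frac_bits} fractional bits: P(decision differs) = {p_err:.2e}")
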
  • A methodology for IP integration into DSP SoC: a case study of a MAP algorithm for turbo decoder

    Page(s): V-45 - V-48, vol. 5

    The re-use of complex digital signal processing (DSP) coprocessors can be improved using IP cores described at a high level of abstraction. System integration, a major step in SoC design, requires taking communication and timing constraints into account when designing and integrating IP. In this paper, we describe an IP design approach that relies on three main phases: constraint modeling, IP constraint analysis for feasibility checking, and synthesis. Based on a generic architecture, the presented method automatically generates IP cores designed under integration constraints. We show the effectiveness of our approach in a case study of a maximum a posteriori (MAP) algorithm for a turbo decoder.

  • Area efficient decoding of quasi-cyclic low density parity check codes

    Page(s): V-49 - V-52, vol. 5

    This paper exploits the similarity between the two stages of the belief propagation decoding algorithm for low density parity check codes to derive an area-efficient design that maps the check node functional units and variable node functional units onto the same hardware. The proposed approach reduces the logic core size by approximately 21% without any performance degradation, and it also improves hardware utilization efficiency.

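    One way to see the kind of similarity such a remapping relies on is the log-domain formulation of belief propagation, where both node types are built around a sum over incoming messages (the check node adds sign handling and a nonlinear wrapper). The Python sketch below shows the two update rules in that form; it is an algorithmic illustration only and makes no claim about the paper's actual hardware mapping.

        import numpy as np

        def phi(x):
            """Gallager's phi(x) = -ln(tanh(x/2)); the function is its own inverse."""
            x = np.clip(x, 1e-12, 50.0)     # guard against log(0) for tiny inputs
            return -np.log(np.tanh(x / 2.0))

        def check_node_update(msgs):
            """Outgoing message on each edge of a check node (log-domain BP)."""
            msgs = np.asarray(msgs, dtype=float)
            out = np.empty_like(msgs)
            for j in range(len(msgs)):
                others = np.delete(msgs, j)
                sign = np.prod(np.sign(others))
                out[j] = sign * phi(np.sum(phi(np.abs(others))))  # adder tree plus phi lookups
            return out

        def variable_node_update(channel_llr, msgs):
            """Outgoing message on each edge of a variable node (log-domain BP)."""
            msgs = np.asarray(msgs, dtype=float)
            total = channel_llr + np.sum(msgs)
            return total - msgs                                   # same adder tree, no phi

        print(check_node_update([1.2, -0.7, 2.5]))
        print(variable_node_update(0.4, [1.2, -0.7, 2.5]))
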
  • A fast Newton/Smith algorithm for solving algebraic Riccati equations and its application in model order reduction

    Page(s): V-53 - V-56, vol. 5

    A very fast Smith-method-based Newton algorithm is introduced for solving large-scale continuous-time algebraic Riccati equations (CAREs). When the CARE contains low-rank matrices, as is common in the modeling of physical systems, the proposed algorithm, called the Newton/Smith CARE or NSCARE algorithm, offers significant computational savings over conventional CARE solvers. The effectiveness of the algorithm is demonstrated in the context of VLSI model order reduction, wherein stochastic balanced truncation (SBT) is used to reduce large-scale passive circuits. The NSCARE algorithm exhibits guaranteed quadratic convergence under mild assumptions. Moreover, two large matrix factorizations and one large-scale singular value decomposition (SVD) required by SBT can be omitted by utilizing the Smith method output in each Newton iteration, significantly speeding up the model reduction process.

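    To make the Newton outer loop concrete, the Python sketch below runs the standard Newton (Kleinman) iteration for the CARE A'X + XA - XBR^(-1)B'X + Q = 0, in which every step solves a Lyapunov equation. The paper's contribution is to solve that inner equation with a low-rank Smith iteration; this sketch instead substitutes SciPy's dense Lyapunov solver purely for illustration, and the small test system is assumed.

        import numpy as np
        from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

        # Assumed small test system (A is stable, so K = 0 is a stabilizing start).
        A = np.array([[-1.0, 2.0], [0.0, -3.0]])
        B = np.array([[1.0], [0.5]])
        Q = np.eye(2)
        R = np.array([[1.0]])

        def newton_care(A, B, Q, R, iters=20, tol=1e-12):
            """Newton (Kleinman) iteration for the CARE.

            Each step solves the Lyapunov equation
                (A - B K)' X + X (A - B K) = -(Q + K' R K),
            then updates K = R^(-1) B' X.  The NSCARE algorithm would replace this
            inner solve with a low-rank Smith iteration; a dense solver is used
            here only for illustration.
            """
            K = np.zeros((B.shape[1], A.shape[0]))
            X = np.zeros_like(A)
            for _ in range(iters):
                Ak = A - B @ K
                rhs = -(Q + K.T @ R @ K)
                X_new = solve_continuous_lyapunov(Ak.T, rhs)   # inner Lyapunov solve
                K = np.linalg.solve(R, B.T @ X_new)
                if np.linalg.norm(X_new - X) < tol:
                    return X_new
                X = X_new
            return X

        X = newton_care(A, B, Q, R)
        print("max |Newton - direct CARE solver| =",
              np.abs(X - solve_continuous_are(A, B, Q, R)).max())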