Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on

Date 23-23 March 2005

  • 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing

    Page(s): 0_1
    Freely Available from IEEE
  • 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing

    Page(s): i
    Freely Available from IEEE
  • Copyright page

    Page(s): ii
    Freely Available from IEEE
  • IEEE Signal Processing Society 2004 Board of Governors

    Page(s): iii
    Freely Available from IEEE
  • ICASSP 2005 Conference Committee

    Page(s): iv
    Freely Available from IEEE
  • Technical Program Committee

    Page(s): v - xiii
    Freely Available from IEEE
  • Future ICASSP conferences

    Page(s): xiv
    Freely Available from IEEE
  • Table of contents

    Page(s): xv - i
    Freely Available from IEEE
  • [Breaker page]

    Page(s): cxviii
    Freely Available from IEEE
  • Memory efficient JPEG2000 architecture with stripe pipeline scheme

    Page(s): v/1 - v/4 Vol. 5
    The memory issue is the most critical problem for a high performance JPEG2000 architecture. The tile memory occupies more than 50% of the area in conventional JPEG2000 architectures. To solve this problem, we propose a stripe pipeline scheme. For this scheme, a level switch discrete wavelet transform (LS-DWT) and a code-block switch embedded block coding (CS-EBC) are proposed. With small additional memory, the LS-DWT and the CS-EBC can process multiple levels and code-blocks in parallel by an interleaved scheme. As a result, the overall memory requirements of the proposed architecture can be reduced to only 8.5% of those of conventional architectures.
  • Three-level parallel high speed architecture for EBCOT in JPEG2000

    Page(s): v/5 - v/8 Vol. 5
    A multi-level parallel high speed architecture for embedded block coding with optimized truncation (EBCOT) tier-1 in JPEG2000 is proposed. To increase the system throughput, this architecture adopts three levels of parallelism: 1) parallelism among bit-planes - all the bit-planes can be processed simultaneously; 2) parallelism among three pass scannings - three passes scan one bit-plane in parallel; 3) parallelism among coding bits - bits that are coded in different passes can be coded simultaneously without any conflict. Experimental results show that the proposed architecture can encode one code block with size N×N in only 0.6×N×N clock cycles, and is twice as fast as the fastest architecture in the literature so far.
  • A low memory QCB-based DWT for JPEG2000 coprocessor supporting large tile size

    Page(s): v/9 - v/12 Vol. 5
    JPEG2000, which provides a higher compression ratio than the traditional JPEG, is an upcoming compression standard for still images. Experimental results indicate that a larger tile size in JPEG2000 yields better image quality. However, processing a large tile image requires more memory in the hardware implementation. To reduce the hardware resources, a QCB (quad codeblock) based DWT method is proposed to support the processing of large tile images with low memory. Based on the QCB-based DWT engine, three code-blocks belonging to the LH0, HL0 and HH0 bands can be generated recursively after each fixed time slice, and the EBC (embedded block coding) processors can directly process the three code-blocks. This reduces the tile memory size by up to 75%. Moreover, the remaining 1/4 of the tile memory can be decreased further through the zero holding extension for unavailable data. As a result, only 24 Kbytes of memory are required to support the processing of a 512×512 tile image, with slight image degradation, especially at low bit rates. The low memory requirement makes the hardware implementation practicable.
  • Novel high-throughput EBCOT architecture for JPEG2000

    Page(s): v/13 - v/16 Vol. 5
    Embedded block coding with optimized truncation (EBCOT) consumes more than 50% of the processing time in a JPEG2000 encoding system. Hardware implementation with careful handling of the control nature of tier-1 is essential. Although some architectures have been developed to speed up the coding operations, they still require a tedious checking mechanism to decide whether each sample is eligible for coding. We propose a novel checking scheme for the three coding passes of EBCOT that works in parallel with the encoding process to achieve the required high throughput. The simulation results show that the proposed architecture increases the throughput by 19% on average compared to other well-known architectures.
  • Memory analysis and throughput enhancement for cost effective bit-plane coder in JPEG2000 applications

    Page(s): v/17 - v/20 Vol. 5
    A cost effective bit-plane coder with throughput enhancement in JPEG2000 applications is proposed. Many papers and the results of chip implementation show that the memory requirement dominates the hardware cost of the bit-plane coder. In order to reduce the memory size, a memory-free algorithm is proposed to eliminate state variable memories by calculating three coding state variables (γp+1[n], σp+1[n], and πp[n]) on the fly. We also propose a stripe-column-based pass-parallel operation to perform the three coding passes in pipeline operation and to encode four samples within the stripe-column concurrently for the high throughput requirement. Experimental results show that the hardware cost and memory size of the proposed architecture are smaller than those of other existing architectures because of the proposed memory-free algorithm. Furthermore, the proposed architecture has 3 times greater throughput than other familiar architectures.
  • A novel efficient rate control algorithm for hardware implementation in JPEG2000

    Page(s): v/21 - v/24 Vol. 5
    In multimedia applications where images are extensively used, the rate control method plays a significant role in image encoder performance, computational complexity and hardware implementation. We propose a simple rate control algorithm suitable for the hardware implementation of a JPEG2000 encoder with less computational complexity and area. The proposed algorithm, which is based on exponential modeling of R-D curves, employs distortion instead of slope values. Simulation results show similar performance compared to the full search method, with considerable reduction in hardware resources.
  • A parametrizable low-power high-throughput turbo-decoder

    Page(s): v/25 - v/28 Vol. 5
    The paper presents a high performance turbo decoder. Its major building blocks, the maximum-a-posteriori decoder and the interleaver, are optimized from architecture to layout level to achieve high throughput at low power. This includes a novel architecture for parallel interleaving that sustains any interleaving scheme. Moreover, the key features of the major building blocks are analyzed and modeled for quick design space exploration, e.g., achieving 760 Mb/s at 570 mW in a 0.13 μm CMOS technology. Finally, the characterized implementations are benchmarked.
  • FPGA based implementation of decoder for array low-density parity-check codes

    Page(s): v/29 - v/32 Vol. 5
    Low density parity check (LDPC) codes have received much attention for their excellent performance, and the inherent parallelism involved in decoding them. We consider a type of structured binary LDPC codes, known as array LDPC codes, which have low encoding complexity and good performance, for implementation on a Xilinx field programmable gate array (FPGA) device.
  • Time and energy efficient Viterbi decoding using FPGAs

    Page(s): v/33 - v/36 Vol. 5
    State-of-the-art FPGAs integrate multi-million gate configurable logic and heterogeneous hardware components. They are an attractive choice for implementing Viterbi decoders. As more emphasis is placed on time and energy performance, previous FPGA implementations of Viterbi decoders either fail to provide high data throughput or are not energy efficient. We propose an architecture for implementing Viterbi decoders on FPGAs. Our architecture can provide various throughput and energy trade-offs. Considering the throughput/energy performance metric, experimental results show that our design achieves improvements up to 26.1% compared with previous designs.
  • Viterbi decoder for high-speed ultra-wideband communication systems

    Page(s): v/37 - v/40 Vol. 5
    Ultra-wideband (UWB) communication systems have attracted both academic research and commercial interests due to their potential high throughput and precise ranging capability. Convolutional codes are widely used in different proposals for high-speed UWB communication systems. In this paper, a novel Viterbi decoder architecture is studied for the multiband orthogonal frequency division multiplexing (MB-OFDM) UWB system. In the Viterbi decoder, sliding block, 2-step lookahead and 2 parallel techniques are combined to achieve the highest desired data rate. For lower data rates, it is possible to disable some parts of the decoder for power saving by proper analysis of the effects of puncturing on word length and trace back length. At the same time, in the add-compare-select (ACS) unit, a pipelined most-significant bit (MSB) first ACS unit is also utilized to shorten the length of the critical path.
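
To make the add-compare-select (ACS) step mentioned in the abstract above concrete, here is a minimal textbook sketch of hard-decision Viterbi decoding. It is not the paper's architecture: the sliding-block, lookahead, and MSB-first techniques are not modeled, and the rate-1/2, constraint-length-3 code with generators (7, 5) octal, along with every function name, is an assumption chosen only for illustration.

```python
G = (0b111, 0b101)   # assumed example generators (7, 5) octal, K = 3
N_STATES = 4         # 2 memory bits -> 4 trellis states

def parity(v):
    return bin(v).count("1") & 1

def branch_output(u, state):
    """Coded bit pair produced by input bit u when leaving `state`."""
    reg = (u << 2) | state
    return (parity(reg & G[0]), parity(reg & G[1]))

def encode(bits):
    state, out = 0, []
    for u in list(bits) + [0, 0]:          # two tail bits return the encoder to state 0
        out.append(branch_output(u, state))
        state = ((u << 2) | state) >> 1
    return out

def acs(pm_a, bm_a, pm_b, bm_b):
    """Add-compare-select: add branch metrics, keep the smaller path metric."""
    cand_a, cand_b = pm_a + bm_a, pm_b + bm_b
    return (cand_a, 0) if cand_a <= cand_b else (cand_b, 1)

def decode(symbols):
    inf = float("inf")
    pm = [0] + [inf] * (N_STATES - 1)      # the encoder starts in state 0
    decisions = []
    for r in symbols:
        new_pm, dec = [], []
        for ns in range(N_STATES):
            u = ns >> 1                    # input bit that leads into state ns
            preds = [((ns & 1) << 1) | low for low in (0, 1)]
            bms = []
            for s in preds:
                e = branch_output(u, s)
                bms.append((e[0] ^ r[0]) + (e[1] ^ r[1]))   # Hamming distance
            metric, d = acs(pm[preds[0]], bms[0], pm[preds[1]], bms[1])
            new_pm.append(metric)
            dec.append(d)
        pm = new_pm
        decisions.append(dec)
    state, bits = 0, []                    # trace back from the known final state 0
    for dec in reversed(decisions):
        bits.append(state >> 1)
        state = ((state & 1) << 1) | dec[state]
    bits.reverse()
    return bits[:-2]                       # drop the two tail bits
```

Each trellis state performs one ACS per received symbol, which is why the ACS recursion tends to sit on the critical path in hardware decoders.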
  • A memory efficient serial LDPC decoder architecture

    Page(s): v/41 - v/44 Vol. 5
    We present a memory efficient serial low density parity check (LDPC) decoder that implements a modified sum product algorithm (SPA). The modification is similar to the approximate min constraint presented by C. Jones et al. (see IEEE Conf. Military Commun., MILCOM 2003, p.157-162, 2003) but differs in hardware implementation to suit a serial architecture. Our main contribution is the proposed architecture that exploits the min constraint to reduce the storage of extrinsic messages, which forms the bulk of the hardware. The least reliable bit-to-check input along with the check sum are the only quantities stored in the decoder. Extrinsic message memory reduction increases with the rate of the code, and a saving of up to 68% is achieved for a rate 9/10 code. Simulation results show that the proposed changes do not degrade the bit error rate performance.
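
The idea of storing a compact summary per check node instead of every extrinsic message can be illustrated with the standard min-sum approximation. The sketch below is the generic min-sum check-node update, not the modified SPA or the architecture of the paper above; the function names and the choice to keep two minima are assumptions made for illustration.

```python
def check_node_summary(msgs):
    """Compress the incoming LLRs of one check node.

    Returns (sign_product, min1, min2, min_idx): the product of all signs,
    the smallest and second-smallest magnitudes, and the index of the smallest.
    """
    sign_product = 1
    min1 = min2 = float("inf")
    min_idx = -1
    for i, m in enumerate(msgs):
        if m < 0:
            sign_product = -sign_product
        a = abs(m)
        if a < min1:
            min1, min2, min_idx = a, min1, i
        elif a < min2:
            min2 = a
    return sign_product, min1, min2, min_idx

def extrinsic(summary, i, msg_i):
    """Rebuild the check-to-variable message for edge i from the summary.

    Only the sign of the edge's own input is needed in addition to the summary.
    """
    sign_product, min1, min2, min_idx = summary
    magnitude = min2 if i == min_idx else min1
    sign_i = -1 if msg_i < 0 else 1
    return sign_product * sign_i * magnitude
```

With this layout a check node of degree d keeps two magnitudes, one index, and d sign bits rather than d full-precision messages, so the relative saving grows with the check-node degree, which is larger for high-rate codes.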
  • Multiplier-less based parallel-pipelined FFT architectures for wireless communication applications

    Page(s): v/45 - v/48 Vol. 5
    This paper proposes two novel parallel-pipelined FFT architectures, based on multiplier-less implementation, targeting wireless communication applications, such as IEEE 802.11 wireless baseband chip and MC-CDMA receiver. The proposed parallel-pipelined architectures have the advantages of high throughput and high power efficiency. The multiplier-less architecture uses shift and addition operations to realize complex multiplications. By combining a new commutator architecture and a low power butterfly with this approach, the resulting power and area savings are up to 31% and 20% respectively, for 64-point and 16-point FFTs, as compared to parallel-pipelined FFTs based on Booth coded Wallace tree multipliers.
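
As a rough illustration of how shifts and additions can stand in for a multiplier, the sketch below multiplies an integer by a fixed constant using a canonical signed-digit (CSD) recoding. This is a generic technique and an assumption on our part; the abstract above does not say which recoding the paper uses, and the function names are hypothetical.

```python
def csd_digits(c):
    """CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0, or +1 and no two adjacent digits are nonzero.
    """
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)      # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

def shift_add_multiply(x, c):
    """Compute x * c using only shifts, additions, and subtractions."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

For a fixed twiddle factor, for example cos(π/4) scaled by 256 and rounded to 181, `shift_add_multiply(x, 181)` needs one adder or subtractor per nonzero CSD digit, which is the kind of cost a multiplier-less datapath trades against a general multiplier.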
  • Simulation of DSP algorithms on fixed point architectures

    Page(s): v/49 - v/52 Vol. 5
    This paper presents software tools to simulate DSP algorithms on a wide variety of fixed point architectures including microprocessors, DSPs and FPGA devices. Existing solutions for evaluating the signal quality in fixed point algorithms are either unable to deal with non-linear systems, fail to consider the architectural details of the target device or do not produce a real output that can be used in subjective testing. Using example non-linear algorithms, it is shown that architectural details must be considered when evaluating numerical performance.
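
One architectural detail of the kind the abstract above refers to is overflow handling. The sketch below is not taken from the paper's tools; it simply compares two's-complement wraparound with saturation in a 16-bit accumulator, with all names chosen for illustration.

```python
def wrap16(v):
    """Two's-complement wraparound to a signed 16-bit value."""
    return ((v + 0x8000) & 0xFFFF) - 0x8000

def sat16(v):
    """Saturate to the signed 16-bit range [-32768, 32767]."""
    return max(-0x8000, min(0x7FFF, v))

def accumulate(samples, overflow):
    """Sum samples, applying the given overflow rule after every addition."""
    acc = 0
    for s in samples:
        acc = overflow(acc + s)
    return acc

samples = [20000, 20000, -20000]           # the exact sum, 20000, fits in 16 bits
wrapped = accumulate(samples, wrap16)      # intermediate overflow cancels out
saturated = accumulate(samples, sat16)     # clipping at 32767 corrupts the result
```

Here `wrapped` is 20000 while `saturated` is 12767, so the same algorithm yields different outputs depending on how the target device treats overflow.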
  • Design of sigma-delta modulators with arbitrary transfer functions

    Page(s): v/53 - v/56 Vol. 5
    This paper addresses the design of sigma-delta modulators with arbitrary signal and noise transfer functions by presenting a genetic algorithm (GA) based search method. The objective function is defined to include the difference D between the magnitude of the frequency responses of the designed transfer functions and the ideal one, the quantizer gain λcritical for which the poles of the modulator start moving out of the unit circle, and the spread of the coefficients S. Stability can be improved by reducing λcritical while a smaller S reduces the implementation complexity. The GA searches for poles/zeros of the transfer functions to minimize the objective function D + w1*λcritical + w2*S, where w1 and w2 are two weighting factors. Numerical results demonstrate the effectiveness of the proposed method.
  • Accuracy evaluation of fixed-point APA algorithm [adaptive filter applications]

    Page(s): v/57 - v/60 Vol. 5
    The implementation of adaptive filters with fixed-point arithmetic requires us to evaluate the computation quality. The accuracy can be determined by calculating the global quantization noise power in the system output. In this paper, a new model for evaluating analytically the global noise power in the APA (affine projection algorithm) is developed. The model is presented and applied to the NLMS-OCF. The accuracy of our model is analyzed by experimentation.
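
For readers unfamiliar with quantization noise power, the sketch below compares a measured value against the standard uniform-noise prediction q²/12 for a single rounding quantizer with step q. It does not reproduce the paper's analytical APA model; the signal, the word length, and the function names are all assumptions.

```python
import random

def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return round(x / step) * step

def noise_power(samples, frac_bits):
    """Mean squared error introduced by quantizing the samples."""
    errors = [quantize(s, frac_bits) - s for s in samples]
    return sum(e * e for e in errors) / len(errors)

rng = random.Random(0)                       # fixed seed for repeatability
samples = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
frac_bits = 12                               # assumed fractional word length
measured = noise_power(samples, frac_bits)
predicted = (2.0 ** -frac_bits) ** 2 / 12    # uniform-error model, q^2 / 12
```

For a signal that spans many quantization steps, `measured` should land close to `predicted`. A global model of the kind the abstract describes has to propagate many such noise sources through the algorithm to the system output.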
  • A more efficient and flexible DSP design flow from Matlab-Simulink [FFT algorithm example]

    Page(s): v/61 - v/64 Vol. 5
    The design of complex digital signal processing systems requires minimizing architectural cost and maximizing timing performance while taking into account communication and memory access constraints for the integration of dedicated hardware accelerators. Unfortunately, the traditional Matlab/Simulink design flows rely on hardware blocks that are not very flexible. In this paper, we present a methodology and a tool that permit the high-level synthesis of DSP applications, under both I/O timing and memory constraints. Based on formal models and a generic architecture, this tool helps the designer in finding a reasonable trade-off between the circuit's latency and its architectural complexity. The efficiency of our approach is demonstrated on the case study of an FFT algorithm.