IEEE Transactions on Speech and Audio Processing

Issue 1 • Jan. 2004

  • Table of contents

    Publication Year: 2004, Page(s): C1
    Freely Available from IEEE
  • IEEE Signal Processing Society

    Publication Year: 2004, Page(s): 2
    Freely Available from IEEE
  • Optimal multistage vector quantization of LPC parameters over noisy channels

    Publication Year: 2004, Page(s):1 - 8
    Cited by:  Papers (10)

    The direct use of vector quantization (VQ) to encode LPC parameters in a communication system suffers from the following two limitations: 1) complexity of implementation for large vector dimensions and codebook sizes and 2) sensitivity to errors in the received indices due to noise in the communication channel. In the past, these issues have been simultaneously addressed by designing channel match...
  • Signal modification for robust speech coding

    Publication Year: 2004, Page(s):9 - 18
    Cited by:  Papers (4)

    Usually, the performance of a low-bit-rate speech coder degrades seriously in the presence of various interfering signals such as the background noise, acoustic echo, co-talkers' speech and other unwanted signals. This comes from the mismatch between the input signal and the assumed speech production model on which the design of the given speech coder is based. In this paper, we present an approac...
  • Sequential estimation with optimal forgetting for robust speech recognition

    Publication Year: 2004, Page(s):19 - 26
    Cited by:  Papers (11)

    Mismatch is known to degrade the performance of speech recognition systems. In real life applications we often encounter nonstationary mismatch sources. A general way to compensate for slowly time varying mismatch is by using sequential algorithms with forgetting. The choice of the forgetting factor is usually performed empirically on some development data, and no optimality criterion is used. In ...
  • Discriminative auditory-based features for robust speech recognition

    Publication Year: 2004, Page(s):27 - 36
    Cited by:  Papers (18)

    Recently, a new auditory-based feature extraction algorithm for robust speech recognition in noisy environments was proposed. The new features are derived by mimicking closely the human peripheral auditory process and the filters in the outer ear, middle ear, and inner ear are obtained from psychoacoustics literature with some manual adjustments. In this paper, we extend the auditory-based feature...
  • Modeling inverse covariance matrices by basis expansion

    Publication Year: 2004, Page(s):37 - 46
    Cited by:  Papers (31)  |  Patents (4)

    This paper proposes a new covariance modeling technique for Gaussian mixture models. Specifically, the inverse covariance (precision) matrix of each Gaussian is expanded in a rank-1 basis, i.e., $\Sigma_j^{-1} = P_j = \sum_{k=1}^{D} \lambda_k^j a_k a_k^T$, $\lambda_k^j \in \mathbb{R}$, $a_k$ ...
  • Target-directed mixture dynamic models for spontaneous speech recognition

    Publication Year: 2004, Page(s):47 - 58
    Cited by:  Papers (18)  |  Patents (5)

    In this paper, a novel mixture linear dynamic model (MLDM) for speech recognition is developed and evaluated, where several linear dynamic models are combined (mixed) to represent different vocal-tract-resonance (VTR) dynamic behaviors and the mapping relationships between the VTRs and the acoustic observations. Each linear dynamic model is formulated as the state-space equations, where the VTRs t...
  • Speech enhancement based on wavelet thresholding the multitaper spectrum

    Publication Year: 2004, Page(s):59 - 67
    Cited by:  Papers (105)

    It is well known that the "musical noise" encountered in most frequency domain speech enhancement algorithms is partially due to the large variance estimates of the spectra. To address this issue, we propose in this paper the use of low-variance spectral estimators based on wavelet thresholding the multitaper spectra for speech enhancement. A short-time spectral amplitude estimator is derived whic...
  • IIR-based pure linear prediction

    Publication Year: 2004, Page(s):68 - 75
    Cited by:  Papers (7)  |  Patents (2)

    This paper considers general, pure linear prediction schemes, where the prediction of the input signal is based on IIR-filtered versions of the one-sample-delayed input signal. Properties of these schemes are discussed, in particular, the whitening property and the realization and stability of the synthesis filter. In contrast to warped linear prediction, the synthesis filter can be realized in a ...
  • Maximum a-posteriori probability pitch tracking in noisy environments using harmonic model

    Publication Year: 2004, Page(s):76 - 87
    Cited by:  Papers (46)

    Modern speech processing applications require operation on a signal of interest that is contaminated by a high level of noise. This situation calls for greater robustness in the estimation of the speech parameters, a task which is hard to achieve using standard speech models. In this paper, we present an optimal estimation procedure for sound signals (such as speech) that are modeled by harmonic sources...
  • Edics

    Publication Year: 2004, Page(s): 88
    Freely Available from IEEE
  • Information for authors (Updated September 2002)

    Publication Year: 2004, Page(s):89 - 90
    Freely Available from IEEE
  • IEEE copyright form

    Publication Year: 2004, Page(s):91 - 92
    Freely Available from IEEE
  • IEEE Signal Processing Society Information

    Publication Year: 2004, Page(s): 3
    Freely Available from IEEE
  • [Breaker page]

    Publication Year: 2004, Page(s): c4
    Freely Available from IEEE

Aims & Scope

Covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

This Transactions ceased publication in 2005. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.
