IEEE Transactions on Speech and Audio Processing

Issue 3 • May 2004

  • Table of contents

    Publication Year: 2004, Page(s): c1 - c4
    Freely Available from IEEE
  • IEEE Transactions on Speech and Audio Processing publication information

    Publication Year: 2004, Page(s): c2
    Freely Available from IEEE
  • Speech recognition with auxiliary information

    Publication Year: 2004, Page(s): 189 - 203
    Cited by: Papers (16)

    State-of-the-art automatic speech recognition (ASR) systems are usually based on hidden Markov models (HMMs) that emit cepstral-based features which are assumed to be piecewise stationary. While not really robust to noise, these features are also known to be very sensitive to "auxiliary" information, such as pitch, energy, rate-of-speech (ROS), etc. Attempts so far to include such auxiliary inform...
  • A discriminative training algorithm for hidden Markov models

    Publication Year: 2004, Page(s): 204 - 217
    Cited by: Papers (23) | Patents (1)

    We introduce a discriminative training algorithm for the estimation of hidden Markov model (HMM) parameters. This algorithm is based on an approximation of the maximum mutual information (MMI) objective function and its maximization in a technique similar to the expectation-maximization (EM) algorithm. The algorithm is implemented by a simple modification of the standard Baum-Welch algorithm, and ...
  • Estimating cepstrum of speech under the presence of noise using a joint prior of static and dynamic features

    Publication Year: 2004, Page(s): 218 - 233
    Cited by: Papers (38) | Patents (8)

    In this paper, we present a new algorithm for statistical speech feature enhancement in the cepstral domain. The algorithm exploits joint prior distributions (in the form of Gaussian mixture) in the clean speech model, which incorporate both the static and frame-differential dynamic cepstral parameters. Full posterior probabilities for clean speech given the noisy observation are computed using a ...
  • Segmental minimum Bayes-risk decoding for automatic speech recognition

    Publication Year: 2004, Page(s): 234 - 249
    Cited by: Papers (24)

    Minimum Bayes-risk (MBR) speech recognizers have been shown to yield improvements over the conventional maximum a-posteriori probability (MAP) decoders through N-best list rescoring and A* search over word lattices. We present a segmental minimum Bayes-risk decoding (SMBR) framework that simplifies the implementation of MBR recognizers through the segmentation of the N-best lists or lat...
  • Mixtures of inverse covariances

    Publication Year: 2004, Page(s): 250 - 264
    Cited by: Papers (11) | Patents (4)

    We describe a model which approximates full covariances in a Gaussian mixture while reducing significantly both the number of parameters to estimate and the computations required to evaluate the Gaussian likelihoods. In this model, the inverse covariance of each Gaussian in the mixture is expressed as a linear combination of a small set of prototype matrices that are shared across components. In a...
  • Joint matrix quantization of face parameters and LPC coefficients for low bit rate audiovisual speech coding

    Publication Year: 2004, Page(s): 265 - 276
    Cited by: Papers (1)

    A key problem for videophony, that is telephony including the processing of images of the speaker's face in addition to acoustic speech, concerns signal compression for transmission. In such systems, audio and video compression are separately achieved by using both audio and video coders. In this paper, an audio-visual approach to this problem is considered, since we claim that the fundamental pro...
  • KLT-based adaptive classified VQ of the speech signal

    Publication Year: 2004, Page(s): 277 - 289
    Cited by: Papers (14) | Patents (1)

    Compared to scalar quantization (SQ), vector quantization (VQ) has memory, space-filling, and shape advantages. If the signal statistics are known, direct vector quantization (DVQ) according to these statistics provides the highest coding efficiency, but requires unmanageable storage requirements if the statistics are time varying. In code-excited linear predictive (CELP) coding, a single "comprom...
  • Time evolution in LPC spectrum coding

    Publication Year: 2004, Page(s): 290 - 301
    Cited by: Papers (12)

    Many current speech coders use a source-filter model in which the signal is decomposed on a frame-wise basis into an excitation component and a filter component. The filter component is then parameterized and coded by a separate spectrum encoder. The performance of such spectrum encoders is often evaluated in terms of a spectral distortion measure that treats each frame independently. Doing this t...
  • MDCT analysis of sinusoids: exact results and applications to coding artifacts reduction

    Publication Year: 2004, Page(s): 302 - 312
    Cited by: Papers (27) | Patents (15)

    Although widely used in many coding applications, the Modified Discrete Cosine Transform (MDCT) has the drawback of being sensitive to time shifts. With the popular choice of a sine window, we show that it is possible to compute an explicit formulation of this time dependency. Starting from the exact MDCT of a pure sine and a simple interpretation in terms of combined modulations, we propose a reg...
  • Active mitigation of nonlinear noise processes using a novel filtered-s LMS algorithm

    Publication Year: 2004, Page(s): 313 - 322
    Cited by: Papers (106)

    In many practical applications the acoustic noise generated from dynamical systems is nonlinear and deterministic or stochastic, colored, and non-Gaussian. It has been reported that the linear techniques used to control such noise exhibit degradation in performance. In addition, the actuators of an active noise control (ANC) system very often have nonminimum-phase response. A linear controller und...
  • A theoretical analysis of normal- and impaired-hearing intensity discrimination

    Publication Year: 2004, Page(s): 323 - 333
    Cited by: Papers (1)

    Interpretation of psychophysical data from impaired-hearing individuals on intensity discrimination tasks has been confounded by the fact that some impaired individuals' performance is near-normal in quiet, whereas for others, the difference limen is elevated. It has been observed that a subject's discrimination abilities may be related to the underlying audiogram configuration, which is often dep...
  • Adaptive language modeling with varied sources to cover new vocabulary items

    Publication Year: 2004, Page(s): 334 - 342
    Cited by: Papers (9)

    N-gram language modeling typically requires large quantities of in-domain training data, i.e., data that matches the task in both topic and style. For conversational speech applications, particularly meeting transcription, obtaining large volumes of speech transcripts is often unrealistic; topics change frequently and collecting conversational-style training data is time-consuming and expensive. I...
  • IEEE Transactions on Speech and Audio Processing EDICS

    Publication Year: 2004, Page(s): 343
    Freely Available from IEEE
  • IEEE Transactions on Speech and Audio Processing Information for authors

    Publication Year: 2004, Page(s): 344 - 345
    Freely Available from IEEE
  • Supplement on secure media

    Publication Year: 2004, Page(s): 346
  • Special issue on data mining of speech, audio and dialog

    Publication Year: 2004, Page(s): 347
    Freely Available from IEEE
  • Special issue on speech-to-speech machine translation

    Publication Year: 2004, Page(s): 348
    Freely Available from IEEE
  • IEEE Signal Processing Society Information

    Publication Year: 2004, Page(s): c3
    Freely Available from IEEE

Aims & Scope

Covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

This journal ceased publication in 2005. Its current, retitled successor is IEEE/ACM Transactions on Audio, Speech, and Language Processing.