IEEE Transactions on Audio, Speech, and Language Processing

Issue 2 • Feb. 2009

  • Table of contents

    Publication Year: 2009, Page(s): C1
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2009, Page(s): C2
    Freely Available from IEEE
  • Higher Order Cepstral Moment Normalization for Improved Robust Speech Recognition

    Publication Year: 2009, Page(s):205 - 220
    Cited by:  Papers (12)

    Cepstral normalization has widely been used as a powerful approach to produce robust features for speech recognition. Good examples of this approach include cepstral mean subtraction, and cepstral mean and variance normalization, in which either the first or both the first and the second moments of the Mel-frequency cepstral coefficients (MFCCs) are normalized. In this paper, we propose the family...
  • Optimized Sinusoid Synthesis via Inverse Truncated Fourier Transform

    Publication Year: 2009, Page(s):221 - 230
    Cited by:  Papers (2)  |  Patents (1)

    It was shown that sinusoid synthesis can be implemented efficiently by an inverse Fourier transform on consecutive frames where all but a small number of coefficients per oscillator are dropped. This leads to a compromise between computational complexity and approximation accuracy. The method can be improved by two approaches. First, optimal coefficients can be found by minimizing the average appr...
  • Integrated Speech Enhancement Method Using Noise Suppression and Dereverberation

    Publication Year: 2009, Page(s):231 - 246
    Cited by:  Papers (27)

    This paper proposes a method for enhancing speech signals contaminated by room reverberation and additive stationary noise. The following conditions are assumed. 1) Short-time spectral components of speech and noise are statistically independent Gaussian random variables. 2) A room's convolutive system is modeled as an autoregressive system in each frequency band. 3) A short-time power spectral de...
  • Effect of Emulated Head-Tracking for Reducing Localization Errors in Virtual Audio Simulation

    Publication Year: 2009, Page(s):247 - 252
    Cited by:  Papers (3)

    Virtual audio simulation uses head-related transfer function (HRTF) synthesis and headphone playback to create a sound field similar to real-life environments. Localization performance is influenced by parameters such as the recording method and the spatial resolution of the HRTFs, equalization of the measurement chain as well as common headphone playback errors. The most important errors are in-t...
  • Reverberant Speech Enhancement by Temporal and Spectral Processing

    Publication Year: 2009, Page(s):253 - 266
    Cited by:  Papers (16)

    This paper presents an approach for the enhancement of reverberant speech by temporal and spectral processing. Temporal processing involves identification and enhancement of high signal-to-reverberation ratio (SRR) regions in the temporal domain. Spectral processing involves removal of late reverberant components in the spectral domain. First, the spectral subtraction-based processing is performed...
  • Minimum Rank Error Language Modeling

    Publication Year: 2009, Page(s):267 - 276
    Cited by:  Papers (2)

    Statistical language modeling has been successfully developed for speech recognition and information retrieval. The minimum classification error (MCE) training was undertaken to enhance speech recognition performance by minimizing the word error rate. This paper presents a new minimum rank error (MRE) algorithm for n-gram language model training. Rather than speech recognition, the proposed...
  • Estimation of Place of Articulation During Stop Closures of Vowel–Consonant–Vowel Utterances

    Publication Year: 2009, Page(s):277 - 286
    Cited by:  Papers (7)

    Production of vowel-oral stop consonant-vowel utterances involves movement of articulators from the articulatory position of the initial vowel towards that of the oral stop closure, and then to that of the final vowel. As the closure segments have zero or low signal energy, linear predictive coding (LPC)-based estimation of vocal tract shape fails during stop closure. This paper reports a techniqu...
  • Robust Detection of Phone Boundaries Using Model Selection Criteria With Few Observations

    Publication Year: 2009, Page(s):287 - 298
    Cited by:  Papers (10)

    Automatic phone segmentation techniques based on model selection criteria are studied. We investigate the phone boundary detection efficiency of entropy- and Bayesian-based model selection criteria in continuous speech based on the DISTBIC hybrid segmentation algorithm. DISTBIC is a text-independent bottom-up approach that identifies sequential model changes by combining metric distances with sta...
  • Indeterminacy Free Frequency-Domain Blind Separation of Reverberant Audio Sources

    Publication Year: 2009, Page(s):299 - 311
    Cited by:  Papers (4)

    Blind separation of convolutive mixtures is a very complicated task that has applications in many fields of speech and audio processing, such as hearing aids and man-machine interfaces. One of the proposed solutions is the frequency-domain independent component analysis. The main disadvantage of this method is the presence of permutation ambiguities among consecutive frequency bins. Moreover, this...
  • Enhanced Speech Features by Single-Channel Joint Compensation of Noise and Reverberation

    Publication Year: 2009, Page(s):312 - 323
    Cited by:  Papers (24)  |  Patents (1)

    For a natural verbal communication between humans and machines, automatic speech recognition, which works reasonably well on recordings captured with mid- or far-field microphones, is essential. While a lot of research and development are devoted to address one of the two distortions frequently encountered in mid- and far-field sound pickup, namely noise or reverberation, less effort has been unde...
  • Static and Dynamic Variance Compensation for Recognition of Reverberant Speech With Dereverberation Preprocessing

    Publication Year: 2009, Page(s):324 - 334
    Cited by:  Papers (28)  |  Patents (1)

    The performance of automatic speech recognition is severely degraded in the presence of noise or reverberation. Much research has been undertaken on noise robustness. In contrast, the problem of the recognition of reverberant speech has received far less attention and remains very challenging. In this paper, we use a dereverberation method to reduce reverberation prior to recognition. Such a prepr...
  • Improving Multilabel Analysis of Music Titles: A Large-Scale Validation of the Correction Approach

    Publication Year: 2009, Page(s):335 - 343
    Cited by:  Papers (7)

    This paper addresses the problem of automatically extracting perceptive information from acoustic signals, in a supervised classification context. Global labels, i.e., atomic information describing a music title in its entirety, such as its genre, mood, main instruments, or type of vocals, are entered by humans. Classifiers are trained to map audio features to these labels. However, the performanc...
  • Particle Swarm Optimization for Sorted Adapted Gaussian Mixture Models

    Publication Year: 2009, Page(s):344 - 353
    Cited by:  Papers (8)

    Recently, we introduced the sorted Gaussian mixture models (SGMMs) algorithm providing the means to trade off performance for operational speed and thus permitting the speed-up of GMM-based classification schemes. The performance of the SGMM algorithm depends on the proper choice of the sorting function, and the proper adjustment of its parameters. In the present work, we employ particle swarm opti...
  • Speech Recognition Using Augmented Conditional Random Fields

    Publication Year: 2009, Page(s):354 - 365
    Cited by:  Papers (26)

    Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented...
  • Analysis and Compensation of Lombard Speech Across Noise Type and Levels With Application to In-Set/Out-of-Set Speaker Recognition

    Publication Year: 2009, Page(s):366 - 378
    Cited by:  Papers (24)

    Speech production in the presence of noise results in the Lombard effect, which is known to have a serious impact on speech system performance. In this study, Lombard speech produced under different types and levels of noise is analyzed in terms of duration, energy histogram, and spectral tilt. Acoustic-phonetic differences are shown to exist between different “flavors” of Lombard speech b...
  • Gaussian Mixture Kalman Predictive Coding of Line Spectral Frequencies

    Publication Year: 2009, Page(s):379 - 391
    Cited by:  Papers (4)

    Gaussian mixture model (GMM)-based predictive coding of line spectral frequencies (LSFs) has gained wide acceptance. In such coders, each mixture of a GMM can be interpreted as defining a linear predictive transform coder. In this paper, we use Kalman filtering principles to model each of these linear predictive transform coders to present GMM Kalman predictive coding. In particular, we show how s...
  • An Information-Theoretic View of Array Processing

    Publication Year: 2009, Page(s):392 - 401
    Cited by:  Papers (2)

    The removal of noise and interference from an array of received signals is a most fundamental problem in signal processing research. To date, many well-known solutions based on second-order statistics (SOS) have been proposed. This paper views the signal enhancement problem as one of maximizing the mutual information between the source signal and array output. It is shown that if the signal and no...
  • IEEE Transactions on Audio, Speech, and Language Processing EDICS

    Publication Year: 2009, Page(s):402 - 403
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing Information for authors

    Publication Year: 2009, Page(s):404 - 405
    Freely Available from IEEE
  • 2009 IEEE Workshop on Statistical Signal Processing, Cardiff, Wales, UK

    Publication Year: 2009, Page(s): 406
    Freely Available from IEEE
  • 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

    Publication Year: 2009, Page(s): 407
    Freely Available from IEEE
  • Access over 1 million articles - The IEEE Digital Library [advertisement]

    Publication Year: 2009, Page(s): 408
    Freely Available from IEEE
  • IEEE Signal Processing Society Information

    Publication Year: 2009, Page(s): C3
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Audio, Speech, and Language Processing covers the sciences, technologies, and applications relating to the analysis, coding, enhancement, recognition, and synthesis of audio, music, speech, and language.

This Transactions ceased publication in 2013. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research