IEEE Transactions on Audio, Speech, and Language Processing

Volume 21 Issue 6 • June 2013

  • [Front cover]

    Publication Year: 2013, Page(s): C1
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2013, Page(s): C2
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2013, Page(s):1121 - 1122
    Freely Available from IEEE
  • Multi-Microphone Noise Reduction Based on Orthogonal Noise Signal Decompositions

    Publication Year: 2013, Page(s):1123 - 1133
    Cited by: Papers (4)

    Multi-microphone noise reduction plays an increasing and important role in acoustic communication systems. Existing multichannel noise reduction filters are commonly computed based on a single noise covariance matrix. Recently, an orthogonal noise signal decomposition was proposed that uses a single noise signal as a reference. Using this decomposition, it was possible to reformulate the noise red...
  • Cross-Lingual Language Modeling for Low-Resource Speech Recognition

    Publication Year: 2013, Page(s):1134 - 1144
    Cited by: Papers (1)

    This paper proposes using cross-lingual language modeling with syntactic information for low-resource speech recognition. We propose phrase-level transduction and syntactic reordering for transcribing a resource-poor language and translating it into a resource-rich language, if necessary. The phrase-level transduction is capable of performing n-m cross-lingual transduction. The synt...
  • Noise Estimation Using a Constrained Sequential Hidden Markov Model in the Log-Spectral Domain

    Publication Year: 2013, Page(s):1145 - 1157
    Cited by: Papers (2)

    The temporal correlation of speech presence/absence is widely used in noise estimation. The most popular technique for exploiting temporal correlation is the smoothing of noisy spectra using a time-recursive filter, in which the forgetting factor is controlled by speech presence probability. However, this technique is not unified into a theoretical framework that enables optimal noise estimation. ...
  • Large Scale Distributed Acoustic Modeling With Back-Off N-Grams

    Publication Year: 2013, Page(s):1158 - 1169
    Cited by: Papers (3)

    The paper revives an older approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data and model size (as measured by the number of parameters in the model), to approximately 100 times larger than current sizes used in automatic speech recognition. In such a data-rich setting, we can expand the phonetic context significantly b...
  • Wavelet Maxima Dispersion for Breathy to Tense Voice Discrimination

    Publication Year: 2013, Page(s):1170 - 1179
    Cited by: Papers (7)

    This paper proposes a new parameter, the Maxima Dispersion Quotient (MDQ), for differentiating breathy to tense voice. Maxima derived following wavelet decomposition are often used for detecting edges in image processing, where locations of these maxima organize in the vicinity of the edge location. Similarly for tense voice, which typically displays sharp glottal closing characteristics, maxima f...
  • Learning Phrase Patterns for Text Classification

    Publication Year: 2013, Page(s):1180 - 1189
    Cited by: Papers (2)

    This paper introduces methods to discriminatively learn phrase patterns for use as features in text classification. An efficient solution is described using a recursive algorithm with a mutual information selection criterion. The algorithm automatically determines when word classes are useful in specific locations of a phrase pattern, allowing for variable specificity depending on the amount of la...
  • A Simple Prior for Audio Signals

    Publication Year: 2013, Page(s):1190 - 1200
    Cited by: Papers (1)

    We propose a simple prior for restoration problems involving oscillatory signals. The prior makes use of an underlying analytic frame decomposition with narrow subbands. Other than this, the prior does not have any other parameters, which makes it simple to use and apply. We demonstrate the utility of the proposed prior through some real audio restoration experiments.
  • Compensation of Loudspeaker–Room Responses in a Robust MIMO Control Framework

    Publication Year: 2013, Page(s):1201 - 1216
    Cited by: Papers (8)

    A new multichannel approach to robust broadband loudspeaker-room equalization is presented. Traditionally, the equalization (or room correction) problem has been treated primarily by single-channel methods, where loudspeaker input signals are prefiltered individually by separate scalar filters. Single-channel methods are generally able to improve the average spectral flatness of the acoustic trans...
  • Pairwise Discriminative Speaker Verification in the I-Vector Space

    Publication Year: 2013, Page(s):1217 - 1227
    Cited by: Papers (16)

    This work presents a new and efficient approach to discriminative speaker verification in the i-vector space. We illustrate the development of a linear discriminative classifier that is trained to discriminate between the hypothesis that a pair of feature vectors in a trial belong to the same speaker or to different speakers. This approach is alternative to the usual discriminative setup that disc...
  • Position-Dependent Crosstalk Cancellation Using Space Partitioning

    Publication Year: 2013, Page(s):1228 - 1239

    The present study tested a new stereo playback system that effectively cancels cross-talk signals at an arbitrary listening position. Such a playback system was implemented by integrating listener position tracking techniques and crosstalk cancellation techniques. The entire listening space was partitioned into a number of non-overlapped cells and a crosstalk cancellation filter was assigned to ea...
  • Underdetermined Sound Source Separation Using Power Spectrum Density Estimated by Combination of Directivity Gain

    Publication Year: 2013, Page(s):1240 - 1250
    Cited by: Papers (28)

    A method for separating underdetermined sound sources based on a novel power spectral density (PSD) estimation is proposed. The method enables up to M(M-1)+1 sources to be separated when we use a microphone array of M sensors and a Wiener post-filter calculated by the estimated PSDs. The PSD of a beamformer's output is modelled by a mixture of source PSDs multiplied by the bea...
  • Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding

    Publication Year: 2013, Page(s):1251 - 1260
    Cited by: Papers (3) | Patents (4)

    Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of the outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach where outputs of secondary systems are integrated in the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that should be eval...
  • Voice Activity Detection in Presence of Transient Noise Using Spectral Clustering

    Publication Year: 2013, Page(s):1261 - 1271
    Cited by: Papers (23)

    Voice activity detection has attracted significant research efforts in the last two decades. Despite much progress in designing voice activity detectors, voice activity detection (VAD) in presence of transient noise is a challenging problem. In this paper, we develop a novel VAD algorithm based on spectral clustering methods. We propose a VAD technique which is a supervised learning algorithm. Thi...
  • Enhanced Spoken Term Detection Using Support Vector Machines and Weighted Pseudo Examples

    Publication Year: 2013, Page(s):1272 - 1284
    Cited by: Papers (7) | Patents (1)

    Spoken term detection (STD) is a key technology for retrieval of spoken content, which will be very important to retrieve and browse multimedia content over the Internet. The discriminative capability of machine learning methods has recently been used to facilitate STD. This paper presents a new approach to improve STD using support vector machines (SVM) based on acoustic information. The concept ...
  • Eigentriphones for Context-Dependent Acoustic Modeling

    Publication Year: 2013, Page(s):1285 - 1294
    Cited by: Papers (4)

    Most automatic speech recognizers employ tied-state triphone hidden Markov models (HMM), in which the corresponding triphone states of the same base phone are tied. State tying is commonly performed with the use of a phonetic regression class tree which renders robust context-dependent modeling possible by carefully balancing the amount of training data with the degree of tying. However, tying ine...
  • Lexical Prefix Tree and WFST: A Comparison of Two Dynamic Search Concepts for LVCSR

    Publication Year: 2013, Page(s):1295 - 1307
    Cited by: Papers (4) | Patents (2)

    Dynamic network decoders have the advantage of significantly lower memory consumption compared to static network decoders, especially when huge vocabularies and complex language models are required. This paper compares the properties of two well-known search strategies for dynamic network decoding, namely history conditioned lexical tree search and weighted finite-state transducer-based search usi...
  • Correction to "Speaker Diarization: A Review of Recent Research" [Feb 12 356-370]

    Publication Year: 2013, Page(s): 1308

    In the following two articles, the author name "Xavier Anguera Miro" was published mistakenly. It should have been "Xavier Anguera." Please use "Xavier Anguera" when referencing these articles. [1] X. Anguera, S. Bozonnet, N. Evans, C. Fredouille, G. Friedland, and O. Vinyals, "Speaker diarization: A review of recent research," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 2, pp. 356-370...
  • IEEE Transactions on Audio, Speech, and Language Processing Edics

    Publication Year: 2013, Page(s):1309 - 1310
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing Information for Authors

    Publication Year: 2013, Page(s):1311 - 1312
    Freely Available from IEEE
  • IEEE Signal Processing Society Information

    Publication Year: 2013, Page(s): C3
    Freely Available from IEEE
  • [Blank page - back cover]

    Publication Year: 2013, Page(s): C4
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

This Transactions ceased publication in 2013. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research