IEEE Transactions on Audio, Speech, and Language Processing

Issue 1 • January 2008

  • Table of contents

    Publication Year: 2008, Page(s): C1-C4
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2008, Page(s): C2
    Freely Available from IEEE
  • Cascade Prediction Filters With Adaptive Zeros to Track the Time-Varying Resonances of the Vocal Tract

    Publication Year: 2008, Page(s): 1-7
    Cited by: Papers (4)

    In this paper, a simple and reliable technique is proposed to track vocal tract resonances in continuous speech. The approach is based on the use of predictor filters with adaptive zeros whose constrained trajectories guarantee the successful tracking of the frequency and the damping of each resonance. The zeros are adapted using a gradient-based algorithm to minimize an instantaneous prediction r...
  • Using Articulatory Representations to Detect Segmental Errors in Nonnative Pronunciation

    Publication Year: 2008, Page(s): 8-22
    Cited by: Papers (8)

    Motivated by potential applications in second-language pedagogy, we present a novel approach to using articulatory information to improve automatic detection of typical phone-level errors made by nonnative speakers of English, a difficult task that involves discrimination between close pronunciations. We describe a reformulation of the hidden-articulator Markov model (HAMM) framework that is approp...
  • Subjective Intelligibility Testing of Chinese Speech

    Publication Year: 2008, Page(s): 23-33
    Cited by: Papers (4)

    This paper presents a complete methodology and rationale for the subjective intelligibility testing of Chinese speech. It replaces the combination of several previously published Chinese intelligibility tests, which have been in use for almost a decade, with a single composite test procedure constructed from a foundation of subjective trials and auditory evidence. Since publication of the first ele...
  • Spectral Representations of Nonmodal Phonation

    Publication Year: 2008, Page(s): 34-46
    Cited by: Papers (1)

    Regions of nonmodal phonation, which exhibit deviations from uniform glottal-pulse periods and amplitudes, occur often in speech and convey information about linguistic content, speaker identity, and vocal health. Some aspects of these deviations are random, including small perturbations, known as jitter and shimmer, as well as more significant aperiodicities. Other aspects are deterministic, incl...
  • A Method for Automatic Detection of Vocal Fry

    Publication Year: 2008, Page(s): 47-56
    Cited by: Papers (7)

    Vocal fry (also called creak, creaky voice, and pulse register phonation) is a voice quality that carries important linguistic or paralinguistic information, depending on the language. We propose a set of acoustic measures and a method for automatically detecting vocal fry segments in speech utterances. A glottal pulse-synchronized method is proposed to deal with the very low fundamental frequency...
  • Generalized Postfilter for Speech Quality Enhancement

    Publication Year: 2008, Page(s): 57-64
    Cited by: Papers (4) | Patents (1)

    Postfilters are commonly used in speech coding for the attenuation of quantization noise. In the presence of acoustic background noise or distortion due to tandeming operations, the postfilter parameters are not adjusted and the performance is, therefore, not optimal. We propose a modification that consists of replacing the nonadaptive postfilter parameters with parameters that adapt to variations...
  • Regularized Linear Prediction of Speech

    Publication Year: 2008, Page(s): 65-73
    Cited by: Papers (11)

    All-pole spectral envelope estimates based on linear prediction (LP) for speech signals often exhibit unnaturally sharp peaks, especially for high-pitch speakers. In this paper, regularization is used to penalize rapid changes in the spectral envelope, which improves the spectral envelope estimate. Based on extensive experimental evidence, we conclude that regularized linear prediction outperforms...
  • Unit-Centric Feature Mapping for Inventory Pruning in Unit Selection Text-to-Speech Synthesis

    Publication Year: 2008, Page(s): 74-82

    The level of quality that can be attained in concatenative text-to-speech (TTS) synthesis is primarily governed by the inventory of units used in unit selection. This has led to the collection of ever larger corpora in the quest for ever more natural synthetic speech. As operational considerations limit the size of the unit inventory, however, pruning is critical to removing any instances that pro...
  • A Backward-Compatible Multichannel Audio Codec

    Publication Year: 2008, Page(s): 83-93
    Cited by: Papers (8)

    We propose in this paper a backward-compatible multichannel audio codec. This codec represents a multichannel audio input signal by a down mix and parametric data. In order to enable backward compatibility, it is necessary to have the possibility of exerting control over the down-mixing procedure. At the same time, in order to achieve a high coding efficiency, both signal and perceptual redundanci...
  • Frequency Region-Based Prioritized Bit-Plane Coding for Scalable Audio

    Publication Year: 2008, Page(s): 94-105
    Cited by: Papers (8)

    A perceptually enhanced prioritized bit-plane audio coding algorithm is presented in this paper. According to the energy distribution in different frequency regions, the bit-planes are prioritized with optimized parameters. Based on the statistical modeling of the frequency spectrum, a much more simplified implementation of prioritized bit-plane coding is integrated with the recent release of MPEG...
  • Time-Scale Modification of Audio Signals Using Enhanced WSOLA With Management of Transients

    Publication Year: 2008, Page(s): 106-115
    Cited by: Papers (6)

    In this paper, we present an algorithm for time-scale modification of music signals, based on the waveform similarity overlap-and-add technique (WSOLA). A well-known disadvantage of the standard WSOLA is the uniform time-scaling of the entire signal, including the perceptually significant transient sections (PSTs), where temporal envelope changes as well as significant spectral transitions occur. ...
  • Instrument-Specific Harmonic Atoms for Mid-Level Music Representation

    Publication Year: 2008, Page(s): 116-128
    Cited by: Papers (27)

    Several studies have pointed out the need for accurate mid-level representations of music signals for information retrieval and signal processing purposes. In this paper, we propose a new mid-level representation based on the decomposition of a signal into a small number of sound atoms or molecules bearing explicit musical instrument labels. Each atom is a sum of windowed harmonic sinusoidal parti...
  • An Objective Metric of Human Subjective Audio Quality Optimized for a Wide Range of Audio Fidelities

    Publication Year: 2008, Page(s): 129-136
    Cited by: Papers (12)

    The goal of this paper is to develop an audio quality metric that can accurately quantify subjective quality over audio fidelities ranging from highly impaired to perceptually lossless. As one example of its utility, such a metric would allow scalable audio coding algorithms to be easily optimized over their entire operating ranges. We have found that the ITU-recommended objective quality metric, ...
  • A Noise-Robust FFT-Based Auditory Spectrum With Application in Audio Classification

    Publication Year: 2008, Page(s): 137-150
    Cited by: Papers (7)

    In this paper, we investigate the noise robustness of Wang and Shamma's early auditory (EA) model for the calculation of an auditory spectrum in audio classification applications. First, a stochastic analysis is conducted wherein an approximate expression of the auditory spectrum is derived to justify the noise-suppression property of the EA model. Second, we present an efficient fast Fourier tran...
  • A Cascaded Broadcast News Highlighter

    Publication Year: 2008, Page(s): 151-161
    Cited by: Papers (4)

    This paper presents a fully automatic news skimming system which takes a broadcast news audio stream and provides the user with the segmented, structured, and highlighted transcript. This constitutes a system with three different, cascading stages: converting the audio stream to text using an automatic speech recognizer, segmenting into utterances and stories, and finally determining which utteran...
  • Adaptive System Identification in the Short-Time Fourier Transform Domain Using Cross-Multiplicative Transfer Function Approximation

    Publication Year: 2008, Page(s): 162-173
    Cited by: Papers (11)

    In this paper, we introduce cross-multiplicative transfer function (CMTF) approximation for modeling linear systems in the short-time Fourier transform (STFT) domain. We assume that the transfer function can be represented by cross-multiplicative terms between distinct subbands. We investigate the influence of cross-terms on a system identifier implemented in the STFT domain and derive analytical ...
  • Sparse Linear Regression With Structured Priors and Application to Denoising of Musical Audio

    Publication Year: 2008, Page(s): 174-185
    Cited by: Papers (15)

    We describe in this paper an audio denoising technique based on sparse linear regression with structured priors. The noisy signal is decomposed as a linear combination of atoms belonging to two modified discrete cosine transform (MDCT) bases, plus a residual part containing the noise. One MDCT basis has a long time resolution, and thus high frequency resolution, and is aimed at modeling tonal part...
  • Unsupervised Pattern Discovery in Speech

    Publication Year: 2008, Page(s): 186-197
    Cited by: Papers (34) | Patents (1)

    We present a novel approach to speech processing based on the principle of pattern discovery. Our work represents a departure from traditional models of speech recognition, where the end goal is to classify speech into categories defined by a prespecified inventory of lexical units (i.e., phones or words). Instead, we attempt to discover such an inventory in an unsupervised manner by exploiting th...
  • Adaptive Bayesian Latent Semantic Analysis

    Publication Year: 2008, Page(s): 198-207
    Cited by: Papers (18) | Patents (1)

    Due to the vast growth of data collections, statistical document modeling has become increasingly important in language processing areas. Probabilistic latent semantic analysis (PLSA) is a popular approach whereby the semantics and statistics can be effectively captured for modeling. However, PLSA is highly sensitive to task domain, which is continuously changing in real-world documents. In th...
  • Constrained Minimization and Discriminative Training for Natural Language Call Routing

    Publication Year: 2008, Page(s): 208-215
    Cited by: Papers (1)

    This paper presents a combination strategy of multiple individual routing classifiers to improve classification accuracy in natural language call routing applications. Since errors of individual classifiers in the ensemble should somehow be uncorrelated, we propose a combination strategy where the combined classifier accuracy is a function of the accuracy of individual classifiers and also the cor...
  • Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence

    Publication Year: 2008, Page(s): 216-228
    Cited by: Papers (19)

    With the advent of prosody annotation standards such as tones and break indices (ToBI), speech technologists and linguists alike have been interested in automatically detecting prosodic events in speech. This is because the prosodic tier provides an additional layer of information over the short-term segment-level features and lexical representation of an utterance. As the prosody of an utterance ...
  • Evaluation of Objective Quality Measures for Speech Enhancement

    Publication Year: 2008, Page(s): 229-238
    Cited by: Papers (200) | Patents (2)

    In this paper, we evaluate the performance of several objective measures in terms of predicting the quality of noisy speech enhanced by noise suppression algorithms. The objective measures considered a wide range of distortions introduced by four types of real-world noise at two signal-to-noise ratio levels by four classes of speech enhancement algorithms: spectral subtractive, subspace, statistic...
  • Factor Analyzed Subspace Modeling and Selection

    Publication Year: 2008, Page(s): 239-248
    Cited by: Papers (3)

    We present a novel subspace modeling and selection approach for noisy speech recognition. In subspace modeling, we develop a factor analysis (FA) representation of noisy speech, which is a generalization of a signal subspace (SS) representation. Using FA, noisy speech is represented by the extracted common factors, factor loading matrix, and specific factors. The observation space of noisy speech ...
    Full text access may be available. Click article title to sign in or learn about subscription options.
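The Regularized Linear Prediction of Speech entry above describes penalizing rapid changes in the all-pole spectral envelope so that linear prediction stops producing unnaturally sharp peaks for high-pitched voices. The Python sketch below illustrates only the general idea and is not the paper's method: it assumes a plain ridge (Tikhonov) penalty lam * ||a||^2 on the predictor coefficients, which turns the autocorrelation normal equations into (R + lam*I) a = r. The function names, the weight `lam`, and the synthetic 300 Hz test frame are illustrative assumptions.

```python
import numpy as np


def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0], ..., r[max_lag] of one frame."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(max_lag + 1)])


def regularized_lpc(x, order, lam=0.0):
    """Predictor coefficients a_1..a_p from (R + lam * I) a = r.

    lam = 0 is ordinary autocorrelation-method LP. lam > 0 adds a ridge
    (Tikhonov) penalty lam * ||a||^2, a stand-in for the paper's penalty.
    """
    r = autocorr(np.asarray(x, dtype=float), order)
    lags = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[lags]  # symmetric Toeplitz matrix, R[i, j] = r[|i - j|]
    return np.linalg.solve(R + lam * np.eye(order), r[1:])


def envelope_db(a, n_freq=512):
    """All-pole envelope 1 / |A(e^jw)|^2 in dB, where A(z) = 1 - sum_k a_k z^-k."""
    coeffs = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    A = np.fft.rfft(coeffs, 2 * (n_freq - 1))  # n_freq bins from 0 Hz to Nyquist
    return -20.0 * np.log10(np.abs(A))


# Usage: a 30 ms Hamming-windowed frame of a high-pitched (300 Hz) voiced sound at 8 kHz.
fs, n = 8000, 240
t = np.arange(n) / fs
rng = np.random.default_rng(0)
frame = sum(np.cos(2 * np.pi * 300 * k * t) / k for k in range(1, 11))
frame = np.hamming(n) * (frame + 0.01 * rng.standard_normal(n))

a_plain = regularized_lpc(frame, order=10)
a_ridge = regularized_lpc(frame, order=10, lam=0.01 * autocorr(frame, 0)[0])
print("envelope range, plain LP (dB):", np.ptp(envelope_db(a_plain)))
print("envelope range, ridge LP (dB):", np.ptp(envelope_db(a_ridge)))
```

Because R is positive definite, the norm of the ridge solution shrinks monotonically as `lam` grows, and for very large `lam` the coefficients approach zero and the envelope approaches a flat line. The paper's penalty acts on envelope changes directly, so its estimates would differ from this sketch.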

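The time-scale modification entry above builds on waveform similarity overlap-and-add (WSOLA). The Python sketch below shows plain WSOLA only, without the paper's management of perceptually significant transients: each output frame is taken from the input near its nominal position, shifted within +/- `tol` samples to the offset whose segment best matches (by unnormalized cross-correlation) the natural continuation of the previously copied segment, and then overlap-added with a 50%-overlap Hann window. The function name and the defaults for `frame_len` and `tol` are illustrative assumptions.

```python
import numpy as np


def wsola(x, alpha, frame_len=512, tol=128):
    """Plain WSOLA time-scale modification (no transient handling).

    alpha > 1 lengthens (slows down) the signal; alpha < 1 shortens it.
    frame_len and tol (search range in samples) are illustrative defaults.
    """
    x = np.asarray(x, dtype=float)
    hs = frame_len // 2        # synthesis hop: 50% overlap
    ha = hs / alpha            # nominal analysis hop
    # Periodic Hann window: copies spaced hs apart sum to exactly 1.
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)

    pad = tol + 2 * frame_len  # zero padding keeps every slice in range
    xp = np.concatenate((np.zeros(pad), x, np.zeros(pad)))
    out_len = int(round(len(x) * alpha))
    n_frames = out_len // hs + 1
    out = np.zeros(n_frames * hs + frame_len)

    prev = pad                 # input start of the previously copied segment
    out[:frame_len] += win * xp[prev:prev + frame_len]
    for k in range(1, n_frames):
        nominal = pad + int(round(k * ha))
        # Natural continuation of the previous segment in the input.
        target = xp[prev + hs: prev + hs + frame_len]
        # Candidates start anywhere within +/- tol of the nominal position.
        region = xp[nominal - tol: nominal + tol + frame_len]
        scores = np.correlate(region, target, mode="valid")  # 2 * tol + 1 values
        prev = nominal - tol + int(np.argmax(scores))
        out[k * hs: k * hs + frame_len] += win * xp[prev:prev + frame_len]
    return out[:out_len]


# Usage: stretch a 1 s, 440 Hz tone at 8 kHz to about 1.5 s without changing its pitch.
fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
y = wsola(x, 1.5)
print(len(x), len(y))
```

Because frame 0 has nothing to overlap with, the first `frame_len // 2` output samples fade in. Normalized cross-correlation is a common alternative similarity measure. Note that this sketch stretches every frame uniformly, transients included, which is the limitation the paper sets out to address.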
Aims & Scope

IEEE Transactions on Audio, Speech, and Language Processing covers the sciences, technologies, and applications relating to the analysis, coding, enhancement, recognition, and synthesis of audio, music, speech, and language.

This journal ceased publication under this title in 2013. It continues as IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research