Audio, Speech, and Language Processing, IEEE Transactions on

Issue 7 • Date Sept. 2009

  • Table of contents

    Publication Year: 2009 , Page(s): C1
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2009 , Page(s): C2
    Freely Available from IEEE
  • Building A Highly Accurate Mandarin Speech Recognizer With Language-Independent Technologies and Language-Dependent Modules

    Publication Year: 2009 , Page(s): 1253 - 1262
    Cited by:  Papers (2)

    We describe a system for highly accurate large-vocabulary Mandarin speech recognition. The prevailing hidden Markov model based technologies are essentially language independent and constitute the backbone of our system. These include minimum-phone-error discriminative training and maximum-likelihood linear regression adaptation, among others. Additionally, careful considerations are taken into ac...
  • Improved Features and Models for Detecting Edit Disfluencies in Transcribing Spontaneous Mandarin Speech

    Publication Year: 2009 , Page(s): 1263 - 1278
    Cited by:  Papers (2)

    Detection of edit disfluencies is key to transcribing spontaneous utterances. In this paper, we present improved features and models to detect edit disfluencies and enhance transcription of spontaneous Mandarin speech using hypothesized disfluency interruption points (IPs) and edit word detection. A comprehensive set of prosodic features that takes into account the special characteristics of edit ...
  • Acoustic Factor Analysis for Streamed Hidden Markov Modeling

    Publication Year: 2009 , Page(s): 1279 - 1291

    This paper presents a novel streamed hidden Markov model (HMM) framework for speech recognition. The factor analysis (FA) principle is adopted to explore the common factors from acoustic features. The streaming regularities in building HMMs are governed by the correlation between cepstral features, which is inherent in common factors. Those features corresponding to the same factor are generated b...
  • Time–Frequency Correlation-Based Missing-Feature Reconstruction for Robust Speech Recognition in Band-Restricted Conditions

    Publication Year: 2009 , Page(s): 1292 - 1304
    Cited by:  Papers (4)

    Band-limited speech represents one of the most challenging factors for robust speech recognition. This is especially true in supporting audio corpora from sources that have a range of conditions in spoken document retrieval requiring effective automatic speech recognition. The missing-feature reconstruction method has a problem when applied to band-limited speech reconstruction, since it assumes t...
  • Improving Structural Statistical Machine Translation for Sign Language With Small Corpus Using Thematic Role Templates as Translation Memory

    Publication Year: 2009 , Page(s): 1305 - 1315
    Cited by:  Papers (1)

    This paper presents a structural statistical machine translation (SSMT) model to deal with the data sparseness problem that occurs as a result of the necessarily small corpus to translate Chinese into Taiwanese Sign Language (TSL). A parallel bilingual corpus was developed, and linguistic information from the Sinica Treebank is adopted for Chinese sentence analysis. The synchronous context free gr...
  • Improving Throat Microphone Speech Recognition by Joint Analysis of Throat and Acoustic Microphone Recordings

    Publication Year: 2009 , Page(s): 1316 - 1324
    Cited by:  Papers (6)

    We present a new framework for joint analysis of throat and acoustic microphone (TAM) recordings to improve throat microphone only speech recognition. The proposed analysis framework aims to learn joint sub-phone patterns of throat and acoustic microphone recordings through a parallel branch HMM structure. The joint sub-phone patterns define temporally correlated neighborhoods, in which a linear p...
  • Stereo-Based Stochastic Mapping for Robust Speech Recognition

    Publication Year: 2009 , Page(s): 1325 - 1334
    Cited by:  Papers (12)

    We present a stochastic mapping technique for robust speech recognition that uses stereo data. The idea is based on constructing a Gaussian mixture model for the joint distribution of the clean and noisy features and using this distribution to predict the clean speech during testing. The proposed mapping is called stereo-based stochastic mapping (SSM). Two different estimators are considered. One ...
  • A Target-Oriented Phonotactic Front-End for Spoken Language Recognition

    Publication Year: 2009 , Page(s): 1335 - 1347
    Cited by:  Papers (2)

    This paper presents a strategy to optimize the phonotactic front-end for spoken language recognition. This is achieved by selecting a subset of phones from an existing phone recognizer's phone inventory such that only the phones that best discriminate each of the target languages are selected. Each such phone subset will be used to construct a target-oriented phone tokenizer (TOPT). In this study,...
  • A Novel Framework and Training Algorithm for Variable-Parameter Hidden Markov Models

    Publication Year: 2009 , Page(s): 1348 - 1360
    Cited by:  Papers (10)

    We propose a new framework and the associated maximum-likelihood and discriminative training algorithms for the variable-parameter hidden Markov model (VPHMM) whose mean and variance parameters vary as functions of additional environment-dependent conditioning parameters. Our framework differs from the VPHMM proposed by Cui and Gong (2007) in that piecewise spline interpolation instead of global p...
  • Monaural Musical Sound Separation Based on Pitch and Common Amplitude Modulation

    Publication Year: 2009 , Page(s): 1361 - 1371
    Cited by:  Papers (10)

    Monaural musical sound separation has been extensively studied recently. An important problem in separation of pitched musical sounds is the estimation of time-frequency regions where harmonics overlap. In this paper, we propose a sinusoidal modeling-based separation system that can effectively resolve overlapping harmonics. Our strategy is based on the observations that harmonics of the same sour...
  • Maximum Penalized Likelihood Kernel Regression for Fast Adaptation

    Publication Year: 2009 , Page(s): 1372 - 1381
    Cited by:  Papers (7)

    This paper proposes a nonlinear generalization of the popular maximum-likelihood linear regression (MLLR) adaptation algorithm using kernel methods. The proposed method, called maximum penalized likelihood kernel regression adaptation (MPLKR), applies kernel regression with appropriate regularization to determine the affine model transform in...
  • An Information Theoretic Approach to Speaker Diarization of Meeting Data

    Publication Year: 2009 , Page(s): 1382 - 1393
    Cited by:  Papers (19)

    A speaker diarization system based on an information theoretic framework is described. The problem is formulated according to the information bottleneck (IB) principle. Unlike other approaches where the distance between speaker segments is arbitrarily introduced, the IB method seeks the partition that maximizes the mutual information between observations and variables relevant for the problem whil...
  • Babble Noise: Modeling, Analysis, and Applications

    Publication Year: 2009 , Page(s): 1394 - 1407
    Cited by:  Papers (5)

    Speech babble is one of the most challenging forms of noise interference for all speech systems. Here, a systematic approach to model its underlying structure is proposed to further the existing knowledge of speech processing in noisy environments. This paper establishes a working foundation for the analysis and modeling of babble speech. We first address the underlying model for multiple speaker babble sp...
  • Increase and Subjective Evaluation of Feedback Stability in Hearing Aids by a Binaural Coherence-Based Noise Reduction Scheme

    Publication Year: 2009 , Page(s): 1408 - 1419
    Cited by:  Papers (7)  |  Patents (1)

    The effect of a binaural coherence-based noise reduction scheme on the feedback stability margin and sound quality in hearing aids has been analyzed. For comparison, a conventional adaptive feedback canceler (AFC) and the combination of the adaptive filter with the binaural coherence filter have been tested. The observed quantities are feedback stability and target signal attenuation. An objective...
  • Convolutive Transfer Function Generalized Sidelobe Canceler

    Publication Year: 2009 , Page(s): 1420 - 1434
    Cited by:  Papers (8)

    In this paper, we propose a convolutive transfer function generalized sidelobe canceler (CTF-GSC), which is an adaptive beamformer designed for multichannel speech enhancement in reverberant environments. Using a complete system representation in the short-time Fourier transform (STFT) domain, we formulate a constrained minimization problem of total output noise power subject to the constraint tha...
  • Empirical Methods to Determine the Number of Sources in Single-Channel Musical Signals

    Publication Year: 2009 , Page(s): 1435 - 1444
    Cited by:  Papers (3)

    We present a sequence of empirical methods to determine the number of sources in musical signals when only one channel is available. Rather than building evidence through a statistical model-based approach, we instead develop a carefully tuned and tested two-stage system that is able to function effectively even in extremely underdetermined conditions. A first, more general procedure accurately de...
  • IEEE Transactions on Audio, Speech, and Language Processing Edics

    Publication Year: 2009 , Page(s): 1445 - 1446
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing Information for authors

    Publication Year: 2009 , Page(s): 1447 - 1448
    Freely Available from IEEE
  • Special issue on Processing Reverberant Speech

    Publication Year: 2009 , Page(s): 1449
    Freely Available from IEEE
  • Special issue on distributed camera networks: sensing, processing, communication and computing

    Publication Year: 2009 , Page(s): 1450
    Freely Available from IEEE
  • IEEE Transactions on Multimedia special issue on Multimodal Affective Interaction

    Publication Year: 2009 , Page(s): 1451
    Freely Available from IEEE
  • Call for papers ISBI 2010

    Publication Year: 2009 , Page(s): 1452
    Freely Available from IEEE
  • IEEE ICIP 2010

    Publication Year: 2009 , Page(s): 1453
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

This Transactions ceased publication in 2013. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research