Audio, Speech, and Language Processing, IEEE Transactions on

Issue 3 • Date March 2008

  • Table of contents

    Publication Year: 2008, Page(s): C1-C4
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2008, Page(s): C2
    Freely Available from IEEE
  • A Minimum Distortion Noise Reduction Algorithm With Multiple Microphones

    Publication Year: 2008, Page(s): 481-493
    Cited by: Papers (19)

    The problem of noise reduction using multiple microphones has long been an active area of research. Over the past few decades, most efforts have been devoted to beamforming techniques, which aim at recovering the desired source signal from the outputs of an array of microphones. In order to work reasonably well in reverberant environments, this approach often requires such knowledge as the directi...
  • HMM Word and Phrase Alignment for Statistical Machine Translation

    Publication Year: 2008, Page(s): 494-507
    Cited by: Papers (5)

    Estimation and alignment procedures for word and phrase alignment hidden Markov models (HMMs) are developed for the alignment of parallel text. The development of these models is motivated by an analysis of the desirable features of IBM Model 4, one of the original and most effective models for word alignment. These models are formulated to capture the desirable aspects of Model 4 in an HMM alignm...
  • Combining Spectral Representations for Large-Vocabulary Continuous Speech Recognition

    Publication Year: 2008, Page(s): 508-518
    Cited by: Papers (12)

    In this paper, we investigate the combination of complementary acoustic feature streams in large-vocabulary continuous speech recognition (LVCSR). We have explored the use of acoustic features obtained using a pitch-synchronous analysis, STRAIGHT, in combination with conventional features such as Mel frequency cepstral coefficients. Pitch-synchronous acoustic features are of particular interest wh...
  • Random Forests of Phonetic Decision Trees for Acoustic Modeling in Conversational Speech Recognition

    Publication Year: 2008, Page(s): 519-528
    Cited by: Papers (7)

    In this paper, we present a novel technique of constructing phonetic decision trees (PDTs) for acoustic modeling in conversational speech recognition. We use random forests (RFs) to train a set of PDTs for each phone state unit and obtain multiple acoustic models accordingly. We investigate several methods of combining acoustic scores from the multiple models, including maximum-likelihood estimati...
  • Transcription and Separation of Drum Signals From Polyphonic Music

    Publication Year: 2008, Page(s): 529-540
    Cited by: Papers (15)

    The purpose of this article is to present new advances in music transcription and source separation with a focus on drum signals. A complete drum transcription system is described, which combines information from the original music signal and a drum track enhanced version obtained by source separation. In addition to efficient fusion strategies to take into account these two complementary sources ...
  • Noise Tracking Using DFT Domain Subspace Decompositions

    Publication Year: 2008, Page(s): 541-553
    Cited by: Papers (16) | Patents (1)

    All discrete Fourier transform (DFT) domain-based speech enhancement gain functions rely on knowledge of the noise power spectral density (PSD). Since the noise PSD is unknown in advance, estimation from the noisy speech signal is necessary. An overestimation of the noise PSD will lead to a loss in speech quality, while an underestimation will lead to an unnecessarily high level of residual noise. W...
  • Cascaded RLS–LMS Prediction in MPEG-4 Lossless Audio Coding

    Publication Year: 2008, Page(s): 554-562
    Cited by: Papers (9)

    This paper describes the cascaded recursive least square-least mean square (RLS-LMS) prediction, which is part of the recently published MPEG-4 Audio Lossless Coding international standard. The predictor consists of cascaded stages of simple linear predictors, with the prediction error at the output of one stage passed to the next stage as the input signal. A linear combiner adds up the intermedia...
  • Constructing Modulation Frequency Domain-Based Features for Robust Speech Recognition

    Publication Year: 2008, Page(s): 563-577
    Cited by: Papers (4)

    Data-driven temporal filtering approaches based on a specific optimization technique have been shown to be capable of enhancing the discrimination and robustness of speech features in speech recognition. The filters in these approaches are often obtained with the statistics of the features in the temporal domain. In this paper, we derive new data-driven temporal filters that employ the statistics ...
  • Capturing Local Variability for Speaker Normalization in Speech Recognition

    Publication Year: 2008, Page(s): 578-593
    Cited by: Papers (1)

    The new model reduces the impact of local spectral and temporal variability by estimating a finite set of spectral and temporal warping factors which are applied to speech at the frame level. Optimum warping factors are obtained while decoding in a locally constrained search. The model involves augmenting the states of a standard hidden Markov model (HMM), providing an additional degree of freedom...
  • Incorporating Model-Specific Score Distribution in Speaker Verification Systems

    Publication Year: 2008, Page(s): 594-606
    Cited by: Papers (11)

    It has been shown that the authentication performance of a biometric system is dependent on the models/templates specific to a user. As a result, some users may be more easily recognized or impersonated than others. The various categories of users have been characterized by Doddington et al. (1988). We refer to this unbalanced performance across users as the Doddington's zoo effect. ...
  • Rapid Speaker Adaptation Using Clustered Maximum-Likelihood Linear Basis With Sparse Training Data

    Publication Year: 2008, Page(s): 607-616
    Cited by: Papers (3)

    Speaker space-based adaptation methods for automatic speech recognition have been shown to provide significant performance improvements for tasks where only a few seconds of adaptation speech is available. However, these techniques are not widely used in practical applications because they require large amounts of speaker-dependent training data and large amounts of computer memory. The authors pr...
  • Conditional Random Fields for Integrating Local Discriminative Classifiers

    Publication Year: 2008, Page(s): 617-628
    Cited by: Papers (16)

    Conditional random fields (CRFs) are a statistical framework that has recently gained in popularity in both the automatic speech recognition (ASR) and natural language processing communities because of the different nature of assumptions that are made in predicting sequences of labels compared to the more traditional hidden Markov model (HMM). In the ASR community, CRFs have been employed in a met...
  • Highly Robust, Secure, and Perceptual-Quality Echo Hiding Scheme

    Publication Year: 2008, Page(s): 629-638
    Cited by: Papers (19) | Patents (2)

    Audio watermarking using echo hiding has fairly good perceptual quality. However, security and the tradeoff between robustness and imperceptibility are still relevant issues. This paper presents the echo hiding scheme in which the analysis-by-synthesis approach, interlaced kernels, and frequency hopping are adopted to achieve high robustness, security, and perceptual quality. The amplitudes of the...
  • Specmurt Analysis of Polyphonic Music Signals

    Publication Year: 2008, Page(s): 639-650
    Cited by: Papers (14)

    This paper introduces a new music signal processing method to extract multiple fundamental frequencies, which we call specmurt analysis. In contrast with cepstrum which is the inverse Fourier transform of log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of linear power spectrum with log-scaled frequency. Assuming that all tones in a polyphonic s...
  • The Modeling of Diffuse Boundaries in the 2-D Digital Waveguide Mesh

    Publication Year: 2008, Page(s): 651-665
    Cited by: Papers (8)

    The digital waveguide mesh can be used to simulate the propagation of sound waves in an acoustic system. The accurate simulation of the acoustic characteristics of boundaries within such a system is an important part of the model. One significant property of an acoustic boundary is its diffusivity. Previous approaches to simulating diffuse boundaries in a digital waveguide mesh are effective but e...
  • Microphone Array Shape Calibration in Diffuse Noise Fields

    Publication Year: 2008, Page(s): 666-670
    Cited by: Papers (24)

    This correspondence presents a microphone array shape calibration procedure for diffuse noise environments. The procedure estimates intermicrophone distances by fitting the measured noise coherence with its theoretical model and then estimates the array geometry using classical multidimensional scaling. The technique is validated on noise recordings from two office environments.
  • Dark Energy in Sparse Atomic Estimations

    Publication Year: 2008, Page(s): 671-676
    Cited by: Papers (6)

    Sparse overcomplete methods, such as matching pursuit, attempt to find an efficient estimation of a signal using terms (atoms) selected from an overcomplete dictionary. In some cases, atoms can be selected that have energy in regions of the signal that have no energy. Other atoms are then used to destructively interfere with these terms in order to preserve the original waveform. Because some term...
  • IEEE Transactions on Audio, Speech, and Language Processing Edics

    Publication Year: 2008, Page(s): 677-678
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing Information for authors

    Publication Year: 2008, Page(s): 679-680
    Freely Available from IEEE
  • IEEE Signal Processing Society Information

    Publication Year: 2008, Page(s): C3
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Audio, Speech, and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

This Transactions ceased publication in 2013. It continues under the new title IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research