IEEE Transactions on Speech and Audio Processing

Issue 3 • May 2005

  • Table of contents

    Publication Year: 2005, Page(s): c1 - c4
  • IEEE Transactions on Speech and Audio Processing publication information

    Publication Year: 2005, Page(s): c2
  • Stereophonic noise reduction using a combined sliding subspace projection and adaptive signal enhancement

    Publication Year: 2005, Page(s): 309 - 320
    Cited by: Papers (1) | Patents (1)

    A novel stereophonic noise reduction method is proposed. This method is based upon a combination of a subspace approach realized in a sliding window operation and two-channel adaptive signal enhancing. The signal obtained from the signal subspace is used as the input signal to the adaptive signal enhancer for each channel, instead of noise, as in the ordinary adaptive noise canceling scheme. Simul...

  • Adaptive categorical understanding for spoken dialogue systems

    Publication Year: 2005, Page(s): 321 - 329
    Cited by: Papers (5)

    In this paper, the speech understanding problem in the context of a spoken dialogue system is formalized in a maximum likelihood framework. Off-line adaptation of stochastic language models that interpolate dialogue state specific and general application-level language models is proposed. Word and dialogue-state n-grams are used for building categorical understanding and dialogue models, respectiv...
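
    As a rough illustration of interpolating a dialogue-state-specific language model with a general application-level one, the sketch below uses plain linear interpolation with a single fixed weight. The weight `lam`, the dictionary representation of n-gram probabilities, and the function name are hypothetical; the off-line adaptation procedure proposed in the paper is not reproduced here.

```python
def interpolate_lm(p_state, p_general, lam):
    """Linearly interpolate a state-specific LM with a general LM (illustrative sketch).

    p_state, p_general: dicts mapping (history, word) -> probability.
    lam: hypothetical weight on the state-specific model, in [0, 1].
    Returns a dict with P = lam * P_state + (1 - lam) * P_general.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Union of events so that entries missing from one model contribute probability 0 there.
    keys = set(p_state) | set(p_general)
    return {
        k: lam * p_state.get(k, 0.0) + (1.0 - lam) * p_general.get(k, 0.0)
        for k in keys
    }


p_state = {("<s>", "yes"): 0.6, ("<s>", "no"): 0.4}
p_general = {("<s>", "yes"): 0.2, ("<s>", "no"): 0.3, ("<s>", "maybe"): 0.5}
mixed = interpolate_lm(p_state, p_general, lam=0.5)
```

    If both inputs are properly normalized over the same histories, the interpolated distribution is normalized too, which is why a convex weight in [0, 1] is enforced.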

  • Speech act modeling and verification of spontaneous speech with disfluency in a spoken dialogue system

    Publication Year: 2005, Page(s): 330 - 344
    Cited by: Papers (6)

    This work presents an approach to modeling speech acts and verifying spontaneous speech with disfluency in a spoken dialogue system. According to this approach, semantic information, syntactic structure and fragment class of an input utterance are statistically encapsulated in a proposed speech act hidden Markov model (SAHMM) to characterize the speech act. An interpolation mechanism is exploited ...

  • Eigenvoice modeling with sparse training data

    Publication Year: 2005, Page(s): 345 - 354
    Cited by: Papers (110)

    We derive an exact solution to the problem of maximum likelihood estimation of the supervector covariance matrix used in extended MAP (or EMAP) speaker adaptation and show how it can be regarded as a new method of eigenvoice estimation. Unlike other approaches to the problem of estimating eigenvoices in situations where speaker-dependent training is not feasible, our method enables us to estimate ...

  • Histogram equalization of speech representation for robust speech recognition

    Publication Year: 2005, Page(s): 355 - 366
    Cited by: Papers (82)

    This paper describes a method of compensating for nonlinear distortions in speech representation caused by noise. The method described here is based on the histogram equalization method often used in digital image processing. Histogram equalization is applied to each component of the feature vector in order to improve the robustness of speech recognition systems. The paper describes how the propos...
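
    The per-component histogram equalization mentioned in this abstract can be illustrated with a minimal sketch that maps each feature dimension through y = F_ref^{-1}(F_x(x)). It assumes a standard normal reference distribution and a rank-based empirical CDF computed over one utterance; both choices, and the name `histogram_equalize`, are assumptions for illustration rather than the paper's exact formulation.

```python
from statistics import NormalDist


def histogram_equalize(frames):
    """Equalize each feature dimension to an assumed standard normal reference.

    frames: list of feature vectors (one list of floats per frame).
    Returns a new list of frames with the same shape.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    ref = NormalDist(mu=0.0, sigma=1.0)  # assumed reference distribution
    out = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        # Sort frame indices by this component's value to obtain empirical ranks.
        order = sorted(range(n), key=lambda t: frames[t][d])
        for rank, t in enumerate(order):
            # Empirical CDF with a half-step offset so it stays strictly inside (0, 1).
            cdf = (rank + 0.5) / n
            # Map through the inverse CDF of the reference distribution.
            out[t][d] = ref.inv_cdf(cdf)
    return out


frames = [[1.0, 10.0], [3.0, 5.0], [2.0, 7.0]]
equalized = histogram_equalize(frames)
```

    Because ranks are used directly, tied input values receive distinct outputs; a fuller implementation might average tied ranks or work with cumulative histogram bins as in image processing.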

  • Discriminative linear transforms for feature normalization and speaker adaptation in HMM estimation

    Publication Year: 2005, Page(s): 367 - 376
    Cited by: Papers (14)

    Linear transforms have been used extensively for training and adaptation of HMM-based ASR systems. Recently, procedures have been developed for the estimation of linear transforms under the Maximum Mutual Information (MMI) criterion. In this paper we introduce discriminative training procedures that employ linear transforms for feature normalization and for speaker adaptive training. We integrate t...

  • Predictive hidden Markov model selection for speech recognition

    Publication Year: 2005, Page(s): 377 - 387
    Cited by: Papers (16)

    This paper surveys a series of model selection approaches and presents a novel predictive information criterion (PIC) for hidden Markov model (HMM) selection. The approximate Bayesian using Viterbi approach is applied for PIC selection of the best HMMs providing the largest prediction information for generalization of future data. When the perturbation of HMM parameters is expressed by a product o...

  • Accurate compensation in the log-spectral domain for noisy speech recognition

    Publication Year: 2005, Page(s): 388 - 398
    Cited by: Papers (6)

    This paper presents a new algorithm for noise compensation in the log-spectral domain. We first note that, using a Gaussian mixture assumption, a compensation algorithm in the log-spectral domain is completely defined by three parameters for each Gaussian component: the noisy speech mean, the noisy speech variance, and the covariance of clean and noisy speech. Starting from a well-known mismatch fun...

  • Segmental eigenvoice with delicate eigenspace for improved speaker adaptation

    Publication Year: 2005, Page(s): 399 - 411
    Cited by: Papers (8)

    Eigenvoice techniques have been proposed to provide rapid speaker adaptation with very limited adaptation data, but the performance may be saturated when more adaptation data become available. This is because in these techniques an eigenspace with reduced dimensionality is established by properly utilizing the a priori knowledge from the large quantity of training data. The reduced dimensionality ...

  • Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion

    Publication Year: 2005, Page(s): 412 - 421
    Cited by: Papers (45)

    This paper presents a new technique for dynamic, frame-by-frame compensation of the Gaussian variances in the hidden Markov model (HMM), exploiting the feature variance or uncertainty estimated during the speech feature enhancement process, to improve noise-robust speech recognition. The new technique provides an alternative to the Bayesian predictive classification decision rule by carrying out a...

  • Understanding perceptual distortion in MPEG scalable audio coding

    Publication Year: 2005, Page(s): 422 - 431
    Cited by: Papers (12) | Patents (2)

    In this paper, we study coding artifacts in MPEG-compressed scalable audio. Specifically, we consider the MPEG advanced audio coder (AAC) using bit slice scalable arithmetic coding (BSAC) as implemented in the MPEG-4 reference software. First we perform human subjective testing using the comparison category rating (CCR) approach, quantitatively comparing the performance of scalable BSAC with the n...

  • Data embedding in audio using time-scale modification

    Publication Year: 2005, Page(s): 432 - 440
    Cited by: Papers (17)

    A new framework for data embedding in audio is proposed. The basic idea of the algorithm is to change the length of the intervals between salient points of the audio signal to embed data. The intervals are quantized and the data is embedded in the quantization indices. In our particular implementation, we use the wavelet extrema of the signal envelope as the salient points. We propose novel ideas ...
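
    The idea of embedding data in the quantization indices of inter-salient-point intervals can be sketched with a parity (odd/even) quantizer: each interval length is moved to the nearest multiple of a step size whose index parity equals the bit to embed. The parity rule, the `step` parameter, and the function names are assumptions for illustration, not necessarily the paper's quantizer; salient-point detection (wavelet extrema of the envelope) and the time-scale modification that actually imposes the new lengths on the audio are omitted.

```python
def embed_bits(intervals, bits, step):
    """Quantize intervals so the parity of each quantization index carries one bit.

    intervals: interval lengths (e.g. in samples) between successive salient points.
    bits: 0/1 values, one per interval.
    step: hypothetical quantization step size.
    Returns target interval lengths that a time-scale modification stage would impose.
    """
    if len(bits) > len(intervals):
        raise ValueError("not enough intervals to carry all bits")
    out = list(intervals)
    for i, bit in enumerate(bits):
        x = intervals[i] / step
        q = round(x)
        if q % 2 != bit:
            # Switch to the neighbouring index on the side of the original length.
            q = q + 1 if x > q else q - 1
        if q <= 0:
            q += 2  # keep the interval positive while preserving parity
        out[i] = q * step
    return out


def extract_bits(intervals, count, step):
    """Recover bits from the parity of each interval's nearest quantization index."""
    return [round(intervals[i] / step) % 2 for i in range(count)]


marked = embed_bits([100, 205, 310, 398], [1, 0, 1, 1], step=10)
```

    For example, with `step=10`, intervals `[100, 205, 310, 398]` and bits `[1, 0, 1, 1]` become `[90, 200, 310, 390]`, and `extract_bits` returns the original bits. A larger step tolerates more timing jitter at extraction, at the cost of larger (potentially more audible) interval changes.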

  • Automatic music classification and summarization

    Publication Year: 2005, Page(s): 441 - 450
    Cited by: Papers (36) | Patents (2)

    Automatic music classification and summarization are very useful for music indexing, content-based music retrieval and on-line music distribution, but it is a challenge to extract the most common and salient themes from unstructured raw music data. In this paper, we propose effective algorithms to automatically classify and summarize music content. Support vector machines are applied to classify mu...

  • IEEE Transactions on Speech and Audio Processing Edics

    Publication Year: 2005, Page(s): 451
  • IEEE Transactions on Speech and Audio Processing Information for authors

    Publication Year: 2005, Page(s): 452 - 453
  • Special issue on progress in rich transcription

    Publication Year: 2005, Page(s): 454
  • Call for Papers IEEE Transactions on Information Forensics and Security

    Publication Year: 2005, Page(s): 455
  • 2006 IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI'06)

    Publication Year: 2005, Page(s): 456
  • IEEE Signal Processing Society Information

    Publication Year: 2005, Page(s): c3

Aims & Scope

Covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

This journal ceased publication in 2005. Its current, retitled successor is IEEE/ACM Transactions on Audio, Speech, and Language Processing.