IEEE Transactions on Speech and Audio Processing

Issue 2 • Date March 2005

  • Table of contents

    Publication Year: 2005, Page(s): c1
  • IEEE Transactions on Speech and Audio Processing publication information

    Publication Year: 2005, Page(s): c2
  • Perceptual segmentation and component selection for sinusoidal representations of audio

    Publication Year: 2005, Page(s):149 - 162
    Cited by:  Papers (16)  |  Patents (3)

    This paper presents two fundamental enhancements in a hybrid audio signal model consisting of sinusoidal, transient, and noise (STN) components. The first enhancement involves a novel application of a perceptual metric for optimal time segmentation for the analysis of transients. In particular, Moore and Glasberg's model of partial loudness is modified for use with general signals and then integra...
  • Companded quantization of speech MDCT coefficients

    Publication Year: 2005, Page(s):163 - 173
    Cited by:  Papers (7)  |  Patents (13)

    Here, we propose speech-coding procedures achieving high subjective quality, avoiding speech-specific processing and interframe exploitation. Thus, the scheme is tractable for packet-based voice communication, and has the capability of coding generic audio. The architecture is based on a modified discrete cosine transform (MDCT) representation of the signal, and combines efficient vector quantiza...
  • Boosting with prior knowledge for call classification

    Publication Year: 2005, Page(s):174 - 181
    Cited by:  Papers (27)  |  Patents (3)

    The use of boosting for call classification in spoken language understanding is described in this paper. An extension to the AdaBoost algorithm is presented that permits the incorporation of prior knowledge of the application as a means of compensating for the large dependence on training data. We give a convergence result for the algorithm, and we describe experiments on four datasets showing tha...
  • Decision tree state tying using cluster validity criteria

    Publication Year: 2005, Page(s):182 - 193
    Cited by:  Papers (3)

    Decision tree state tying aims to perform divisive clustering, which can combine the phonetics and acoustics of the speech signal for large vocabulary continuous speech recognition. A tree is built by successively splitting the observation frames of a phonetic unit according to the best phonetic questions. To prevent building over-large tree models, a stopping criterion is required to suppress tree ...
  • Rapid online adaptation based on transformation space model evolution

    Publication Year: 2005, Page(s):194 - 202

    This paper presents a new approach to online linear regression adaptation of continuous density hidden Markov models based on transformation space model (TSM) evolution. The TSM which characterizes the a priori knowledge of the training speakers associated with maximum likelihood linear regression matrix parameters is effectively described in terms of the latent variable models such as the factor ...
  • Speaker verification using sequence discriminant support vector machines

    Publication Year: 2005, Page(s):203 - 210
    Cited by:  Papers (89)  |  Patents (2)

    This paper presents a text-independent speaker verification system using support vector machines (SVMs) with score-space kernels. Score-space kernels generalize Fisher kernels and are based on underlying generative models such as Gaussian mixture models (GMMs). This approach provides direct discrimination between whole sequences, in contrast with the frame-level approaches at the heart of most cur...
  • Speaker classification using composite hypothesis testing and list decoding

    Publication Year: 2005, Page(s):211 - 219
    Cited by:  Papers (7)

    Speaker classification is seen as a hypothesis testing problem of J simple hypotheses and a composite hypothesis. The simple hypotheses represent target speakers while the composite hypothesis represents nontarget speakers. The simple hypotheses have well-defined distributions that are estimated from training signals. The distribution of the signal under the composite hypothesis is assumed to belo...
  • Entropy-constrained polar quantization and its application to audio coding

    Publication Year: 2005, Page(s):220 - 232
    Cited by:  Papers (21)  |  Patents (1)

    In this work, we present a new method for quantization of sinusoidal amplitudes and phases, and apply the method to sinusoidal coding of speech and audio signals. The method is based on unrestricted polar quantization, where phase quantization accuracy depends on amplitude. Amplitude and phase quantizers are derived under an entropy (average rate) constraint using high-rate assumptions. First, we ...
  • Observer-based feedback linearization of dynamic loudspeakers with AC amplifiers

    Publication Year: 2005, Page(s):233 - 242
    Cited by:  Papers (6)

    For a reduction of nonlinear distortion produced by a dynamic loudspeaker, there exists a variety of approaches, one of them using the known principle of exact input-output linearization. In combination with a discrete state-space observer, this approach can successfully be applied as long as the amplifier driving the loudspeaker is dc-coupled. This paper discusses how ac amplifiers can be used in...
  • A bio-inspired companding strategy for spectral enhancement

    Publication Year: 2005, Page(s):243 - 253
    Cited by:  Papers (29)  |  Patents (4)

    This work presents a compressing-and-expanding, i.e., companding, strategy for spectral enhancement inspired by the operation of the auditory system. The companding strategy simulates the two-tone suppression phenomena of the auditory system and implements a soft local winner-take-all-like enhancement of the input spectrum. It performs multichannel syllabic compression without degrading spectral c...
  • Multiresolution sinusoidal model with dynamic segmentation for timescale modification of polyphonic audio signals

    Publication Year: 2005, Page(s):254 - 262
    Cited by:  Papers (3)  |  Patents (4)

    In this paper, we propose an efficient sinusoidal model of polyphonic audio signals that is particularly suited to timescale modification. One of the critical problems of sinusoidal modeling is that the signal is smeared during the synthesis frame, which is a very undesirable effect for transient parts. We solve this problem by introducing multiresolution analysis-synthesis and dynamic segm...
  • Multichannel audio synthesis by subband-based spectral conversion and parameter adaptation

    Publication Year: 2005, Page(s):263 - 274
    Cited by:  Papers (4)

    Multichannel audio can immerse a group of listeners in a seamless aural environment. Previously, we proposed a system capable of synthesizing the multiple channels of a virtual multichannel recording from a smaller set of reference recordings. This problem was termed multichannel audio resynthesis and the application was to reduce the excessive transmission requirements of multichannel audio. In t...
  • Beat tracking of musical performances using low-level audio features

    Publication Year: 2005, Page(s):275 - 285
    Cited by:  Papers (26)  |  Patents (1)

    This paper presents and compares two methods of tracking the beat in musical performances, one based on a Bayesian decision framework and the other a gradient strategy. The techniques can be applied directly to a digitized performance (i.e., a soundfile) and do not require a musical score or a MIDI transcription. In both cases, the raw audio is first processed into a collection of "rhythm tracks" ...
  • Convergence analysis of a complex LMS algorithm with tonal reference signals

    Publication Year: 2005, Page(s):286 - 292
    Cited by:  Papers (19)

    Tonal noise is often encountered in active noise control applications. Such noise, usually generated by periodic noise sources like rotating machines, is cancelled by synthesizing the so-called antinoise with a set of adaptive filters which are trained to model the noise generation mechanism. Performance of such noise cancellation schemes depends on, among other things, the conv...
  • Toward detecting emotions in spoken dialogs

    Publication Year: 2005, Page(s):293 - 303
    Cited by:  Papers (231)  |  Patents (4)

    The importance of automatically recognizing emotions from human speech has grown with the increasing role of spoken language interfaces in human-computer interaction applications. This paper explores the detection of domain-specific emotions using language and discourse information in conjunction with acoustic correlates of emotion in speech signals. The specific focus is on a case study of detect...
  • IEEE Transactions on Speech and Audio Processing EDICS

    Publication Year: 2005, Page(s): 304
  • IEEE Transactions on Speech and Audio Processing Information for authors

    Publication Year: 2005, Page(s):305 - 306
  • Have you visited lately? www.ieee.org [advertisement]

    Publication Year: 2005, Page(s): 307
  • Quality without compromise [advertisement]

    Publication Year: 2005, Page(s): 308
  • IEEE Signal Processing Society Information

    Publication Year: 2005, Page(s): c3
  • Blank page [back cover]

    Publication Year: 2005, Page(s): c4

Aims & Scope

Covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

This Transactions ceased publication in 2005. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.
