Speech and Audio Processing, IEEE Transactions on

Issue 4 • Date Jul 1997

  • Parallel auditory filtering by sustained and transient channels separates coarticulated vowels and consonants

    Page(s): 301 - 318

    A neural model of peripheral auditory processing is described and used to separate features of coarticulated vowels and consonants. After preprocessing of speech via a filterbank, the model splits into two parallel channels, a sustained channel and a transient channel. The sustained channel is sensitive to relatively stable parts of the speech waveform, notably synchronous properties of the vocalic portion of the stimulus. It extends the dynamic range of eighth nerve filters using coincidence detectors that combine operations of raising to a power, rectification, delay, multiplication, time averaging, and preemphasis. The transient channel is sensitive to critical features at the onsets and offsets of speech segments. It is built up from fast excitatory neurons that are modulated by slow inhibitory interneurons. These units are combined over high-frequency and low-frequency ranges using operations of rectification, normalization, multiplicative gating, and opponent processing. Detectors sensitive to frication and to onset or offset of stop consonants and vowels are described. Model properties are characterized by mathematical analysis and computer simulations. Neural analogs of model cells in the cochlear nucleus and inferior colliculus are noted, as are psychophysical data about perception of CV syllables that may be explained by the sustained-transient channel hypothesis. The proposed sustained and transient processing seems to be an auditory analog of the sustained and transient processing that is known to occur in vision.

  • Speaker-independent phonetic classification using hidden Markov models with mixtures of trend functions

    Page(s): 319 - 324

    We extend the nonstationary-state or trended hidden Markov model (HMM) from the previous single-trend formulation (Deng, 1992; Deng et al., 1994) to the current mixture-trended one. This extension is motivated by the observation of wide variations in the trajectories of the acoustic data in fluent, speaker-independent speech associated with a fixed underlying linguistic unit. It is also motivated by potential use of mixtures of trend functions to characterize heterogeneous time-varying data generated from distinctive sources such as the speech signals collected from different microphones or from different telephone channels. We show how HMMs with mixtures of trend functions can be implemented simply in the already well-established single-trend HMM framework via the device of expanding each state into a set of parallel states. Details of a maximum-likelihood-based (ML-based) algorithm are given for estimating state-dependent mixture trajectory parameters in the model. Experimental results on the task of classifying speaker-independent vowels excised from the TIMIT database demonstrate consistent performance improvement using phonemic mixture-trended HMMs over their single-trend counterpart.

  • Frequency-domain periodic active noise control and equalization

    Page(s): 348 - 358

    This paper analyzes a frequency-domain periodic active noise control (ANC) system. The adaptive filter employs the frequency-domain complex least mean square (LMS) algorithm driven by a unit value at each frequency bin. The synchronously sampled frequency-domain adaptive structure acts as a comb filter to cancel narrowband noises with harmonically related frequencies. A frequency-domain periodic active noise equalization (ANE) system, which reshapes the residual noise by controlling the output of the adaptive comb filter at each frequency bin, is also developed and analyzed in this paper. Computer simulations and real-time experiments attest to the practical usefulness of the proposed ANC and ANE systems.
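
To make the per-bin update concrete, here is a minimal Python sketch of a complex LMS weight driven by a unit reference at each frequency bin. It assumes an ideal (unity) secondary path and a stationary complex disturbance amplitude per harmonic; the function name `complex_lms_bins`, the step size `mu`, and the example amplitudes are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def complex_lms_bins(disturbance, mu=0.5, n_iter=50):
    """Adapt one complex weight per frequency bin with a unit reference input.

    Assumption: the secondary path is ideal, so the residual at each bin is
    simply the disturbance minus the weight times the unit reference.
    """
    d = np.asarray(disturbance, dtype=complex)
    w = np.zeros_like(d)
    history = []
    for _ in range(n_iter):
        e = d - w * 1.0                 # residual per bin (reference is 1)
        history.append(np.abs(e))       # track residual magnitude
        w = w + mu * e * np.conj(1.0)   # complex LMS update with unit reference
    return w, np.array(history)

# Hypothetical complex amplitudes of three harmonics
d = np.array([1.0 + 0.5j, -0.3 + 0.8j, 0.2 - 0.1j])
w, hist = complex_lms_bins(d)
print(np.round(hist[[0, 1, 2]], 4))
```

Under these assumptions the residual at each bin shrinks by a factor of (1 - mu) per iteration, so any step size with 0 < mu < 2 converges in this idealized setting.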

  • A secondary path modeling technique for active noise control systems

    Page(s): 374 - 377

    A secondary path modeling technique for active noise control systems is developed for both on-line and off-line modeling with faster convergence and higher modeling accuracy. The optimum delay for the adaptive prediction error filter to reduce the interference in system modeling is equal to the length of the impulse response of the secondary path being modeled.

  • Algorithm adaptation rate in active control: is faster necessarily better?

    Page(s): 378 - 381

    Owing to limitations on the convergence coefficient placed by time delays in the cancellation path, fast rates of algorithm adaptation may yield poorer performance than slow rates in active control implementations. This is shown experimentally for the performance criteria of error minimization, speed of convergence, and excitation of unreferenced signal components.

  • A constrained transform domain adaptive IIR filter structure for active noise control

    Page(s): 334 - 347

    An active noise controller utilizing a transform domain recursive LMS (RLMS) infinite impulse response (IIR) filter is presented in this paper. The new filter, termed the TRLMS adaptive filter, mitigates the filter instability problems that often exist with controllers based on direct form IIR filters by directly constraining the adaptation of the filter's frequency response. A filtered-U adaptive algorithm for the TRLMS filter structure is derived. Two different implementations of the constrained TRLMS filter are discussed. The first represents the adaptive filter in terms of its transform domain weights. They are directly adapted by the filtered-U TRLMS adaptive algorithm. The second implementation represents the filter in terms of its original time domain weights. These are updated by the filtered-U RLMS algorithm and subsequently projected onto an adaptation subspace at each iteration. Extensions of these algorithms that enforce soft constraints on the filter response are also presented. The stability of the constrained TRLMS adaptive filter is investigated utilizing two different transformations.

  • On the effects of short-term spectrum smoothing in channel normalization

    Page(s): 372 - 374

    We present a simple analysis showing that channel normalization techniques are less effective when applied to spectral energies obtained by (weighted) summation of components of the short-time Fourier power spectrum of speech. We show that applying channel normalization processing prior to critical band integration or linear predictive all-pole modeling improves the effectiveness of the techniques.
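
The effect described in this abstract can be illustrated with a small numerical sketch. It takes log-spectral mean subtraction as one example of channel normalization (an assumption; the paper may analyze other techniques) and models the channel as a fixed multiplicative gain per frequency bin. The random spectra, the gain values, and the helper name `log_mean_norm` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
frames, bins = 200, 8
S = rng.uniform(0.1, 1.0, size=(frames, bins))  # hypothetical clean power spectra
H = np.linspace(0.2, 3.0, bins)                 # hypothetical channel power gain per bin
Y = S * H                                       # channel-distorted power spectra

def log_mean_norm(p):
    """Subtract the time average of the log spectrum (one form of channel normalization)."""
    logp = np.log(p)
    return logp - logp.mean(axis=0)

# Case 1: normalize each bin before any summation.
per_bin_mismatch = np.max(np.abs(log_mean_norm(Y) - log_mean_norm(S)))

# Case 2: sum the bins into one band energy first, then normalize.
band_clean = S.sum(axis=1, keepdims=True)
band_chan = Y.sum(axis=1, keepdims=True)
band_mismatch = np.max(np.abs(log_mean_norm(band_chan) - log_mean_norm(band_clean)))

print(per_bin_mismatch, band_mismatch)
```

In this toy model the per-bin mismatch vanishes up to rounding error because the channel gain becomes an additive constant in the log domain, whereas after summation the gain no longer factors out of the band energy and a residual mismatch remains.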

  • Multiple-point equalization of room transfer functions by using common acoustical poles

    Page(s): 325 - 333

    A multiple-point equalization filter using the common acoustical poles of room transfer functions is proposed. The common acoustical poles correspond to the resonance frequencies, which are independent of source and receiver positions. They are estimated as common autoregressive (AR) coefficients from multiple room transfer functions. The equalization is achieved with a finite impulse response (FIR) filter, which has the inverse characteristics of the common acoustical pole function. Although the proposed filter cannot recover the frequency response dips of the multiple room transfer functions, it can suppress their common peaks due to resonance; it is also less sensitive to changes in receiver position. Evaluation of the proposed equalization filter using measured room transfer functions shows that it can reduce the deviations in the frequency characteristics of multiple room transfer functions better than a conventional multiple-point inverse filter. Experiments show that the proposed filter enables 1-5 dB additional amplifier gain in a public address system without acoustic feedback at multiple receiver positions. Furthermore, the proposed filter reduces the reflected sound in room impulse responses without the pre-echo that occurs with a multiple-point inverse filter. A multiple-point equalization filter using common acoustical poles can thus equalize multiple room transfer functions by suppressing their common peaks.
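
As a rough illustration of the idea, the sketch below estimates shared AR coefficients from two synthetic impulse responses by stacked least squares and then applies those coefficients as an FIR equalizer. The least-squares formulation, the function names `impulse_response` and `common_ar`, and all numeric values are assumptions made for this example; the paper's actual estimation procedure may differ.

```python
import numpy as np

def impulse_response(b, a, n):
    """Impulse response of B(z)/A(z) with a[0] == 1, computed by direct recursion."""
    h = np.zeros(n)
    for i in range(n):
        acc = b[i] if i < len(b) else 0.0
        for p in range(1, len(a)):
            if i - p >= 0:
                acc -= a[p] * h[i - p]
        h[i] = acc
    return h

def common_ar(responses, order, ma_order):
    """Least-squares estimate of AR coefficients shared by all responses."""
    rows, rhs = [], []
    start = max(ma_order + 1, order)
    for h in responses:
        for n in range(start, len(h)):
            rows.append(h[n - order:n][::-1])  # h[n-1], ..., h[n-order]
            rhs.append(-h[n])
    coeffs, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.concatenate(([1.0], coeffs))

# Two hypothetical transfer functions sharing one resonance (pole radius 0.9)
a_true = np.array([1.0, -1.2, 0.81])
h1 = impulse_response([1.0, 0.5], a_true, 64)
h2 = impulse_response([0.3, -0.7], a_true, 64)

a_est = common_ar([h1, h2], order=2, ma_order=1)

# The FIR equalizer is the estimated common AR polynomial itself.
eq1 = np.convolve(a_est, h1)[:64]
eq2 = np.convolve(a_est, h2)[:64]
print(np.round(a_est, 6))
```

In this noiseless toy case the equalized responses collapse to the short position-dependent numerators, which mirrors the abstract's point that the filter removes the common resonant peaks but leaves the position-dependent part untouched.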

  • Embedded RPE based on multistage coding

    Page(s): 367 - 371

    The feasibility and performance of an embedded regular pulse excited speech coder (ERPE) based on multistage coding is investigated. The simulated ERPE system exhibits a graceful reduction of reconstructed speech quality for bit rates from 14.8 to 6.4 kb/s in 4.2 kb/s steps, and carries a very small signal-to-noise ratio (SNR) penalty compared to its conventional version.
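
The general principle of multistage embedded coding can be sketched as follows: each stage quantizes the residual left by the previous stage, and a decoder may stop after any stage. This sketch substitutes plain uniform scalar quantizers for the regular pulse excitation stages, so it is not the coder studied in the paper; the step sizes, the test signal, and the function names are assumptions.

```python
import numpy as np

def multistage_encode(x, steps):
    """Quantize x in successive stages; each stage codes the previous residual."""
    residual = np.asarray(x, dtype=float)
    stages = []
    for step in steps:
        q = np.round(residual / step) * step  # uniform scalar quantizer
        stages.append(q)
        residual = residual - q
    return stages

def decode(stages, n_stages):
    """Reconstruct from the first n_stages only (the embedded property)."""
    return np.sum(stages[:n_stages], axis=0)

def snr_db(x, y):
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((x - y) ** 2))

x = 0.9 * np.sin(2 * np.pi * np.arange(200) / 37.0)  # hypothetical test signal
steps = [0.5, 0.125, 0.03125]                         # hypothetical step sizes
stages = multistage_encode(x, steps)
snrs = [snr_db(x, decode(stages, m)) for m in (1, 2, 3)]
print(np.round(snrs, 1))
```

A decoder that receives only the first one or two stages still produces a coarser reconstruction, and the SNR rises as more stages are added, which is the graceful degradation behavior the abstract refers to.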

  • The modulated lapped transform, its time-varying forms, and its applications to audio coding standards

    Page(s): 359 - 366

    The modulated lapped transform (MLT) is used in both audio and video data compression schemes. This paper describes its properties and how it can be used to generate a time-varying filterbank. Examples of its implementation in two audio coding standards are presented.
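
For readers unfamiliar with the transform, the following is a minimal sketch of a fixed (not time-varying) MLT with a sine window, using the standard orthogonal basis definition, together with a check that overlap-add synthesis recovers the interior samples. The block size, the test signal, and the function names are assumptions for illustration; window switching as used in the coding standards is not shown.

```python
import numpy as np

def mlt_basis(M):
    """2M x M matrix of MLT basis functions with a sine window."""
    n = np.arange(2 * M)[:, None]
    k = np.arange(M)[None, :]
    h = np.sin((n + 0.5) * np.pi / (2 * M))  # sine window
    c = np.cos((n + (M + 1) / 2.0) * (k + 0.5) * np.pi / M)
    return h * np.sqrt(2.0 / M) * c

def mlt_analysis(x, M):
    """Transform overlapping length-2M blocks taken with hop size M."""
    P = mlt_basis(M)
    n_blocks = len(x) // M - 1
    return np.array([x[b * M:b * M + 2 * M] @ P for b in range(n_blocks)])

def mlt_synthesis(X, M):
    """Inverse transform by overlap-adding the length-2M block outputs."""
    P = mlt_basis(M)
    y = np.zeros((X.shape[0] + 1) * M)
    for b, coeffs in enumerate(X):
        y[b * M:b * M + 2 * M] += P @ coeffs
    return y

M = 16
rng = np.random.default_rng(1)
x = rng.standard_normal(8 * M)  # hypothetical test signal
X = mlt_analysis(x, M)
y = mlt_synthesis(X, M)
print(np.max(np.abs(y[M:-M] - x[M:-M])))
```

Only the first and last M samples are excluded from the comparison, because they are covered by a single block and therefore lack the overlapping neighbor needed to cancel the time-domain aliasing.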


Aims & Scope

Covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

This Transactions ceased publication in 2005. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.