
IEEE Transactions on Speech and Audio Processing

Issue 6 • November 1995

  • A subband approach to time-scale expansion of complex acoustic signals

    Page(s): 515 - 519

    A new approach to time-scale expansion of short-duration complex acoustic signals is introduced. Using a subband signal representation, channel phases are selected to preserve a desired time-scaled temporal envelope. The phase representation is derived from locations of events that occur within filter bank outputs. A frame-based generalization of the method imposes phase consistency across consecutive synthesis frames. The method is applied to synthetic and actual complex acoustic signals consisting of closely spaced, rapidly damped sine waves. Time-frequency resolution limitations are discussed.

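    The event-based channel-phase selection is specific to the paper; as a rough, generic illustration of subband/STFT time-scale expansion, the following Python sketch uses a conventional phase-vocoder formulation instead (all function and parameter names are illustrative, and window-overlap gain is not normalized):

        import numpy as np

        def phase_vocoder_expand(x, rate=0.5, n_fft=1024, hop=256):
            """Time-scale expansion by phase-vocoder resynthesis (rate < 1 slows down)."""
            win = np.hanning(n_fft)
            n_frames = 1 + (len(x) - n_fft) // hop
            # Analysis STFT frames (subband representation)
            frames = np.stack([np.fft.rfft(win * x[i*hop : i*hop + n_fft])
                               for i in range(n_frames)])
            t = np.arange(0, n_frames - 1, rate)            # fractional analysis-frame positions
            omega = 2 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft   # nominal phase advance
            phase = np.angle(frames[0])
            out = np.zeros(int(len(t) * hop + n_fft))
            for k, ti in enumerate(t):
                i = int(ti)
                frac = ti - i
                mag = (1 - frac) * np.abs(frames[i]) + frac * np.abs(frames[i + 1])
                out[k*hop : k*hop + n_fft] += win * np.fft.irfft(mag * np.exp(1j * phase))
                # Accumulate channel phases from the measured phase increment
                dphi = np.angle(frames[i + 1]) - np.angle(frames[i]) - omega
                dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
                phase += omega + dphi
            return out

    With rate=0.5, the output is roughly twice as long as the input.
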
  • Reduction of broad-band noise in speech by truncated QSVD

    Page(s): 439 - 448

    We consider an algorithm for reduction of broadband noise in speech based on signal subspaces. The algorithm is formulated by means of the quotient singular value decomposition (QSVD). With this formulation, a prewhitening operation becomes an integral part of the algorithm. We demonstrate that this is essential in connection with updating issues in real-time recursive applications. We also illustrate by examples that we are able to achieve a satisfactory quality of the reconstructed signal.

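    The QSVD formulation couples the noisy and noise-only data directly; the closest conventional equivalent, sketched below with numpy/scipy, prewhitens a Hankel (trajectory) matrix with a noise-only estimate, truncates the SVD, and de-whitens. Segment lengths, the model order, the rank, and the helper names are all illustrative, not the paper's:

        import numpy as np
        from scipy.linalg import svd, hankel, pinv

        def truncated_subspace_enhance(noisy, noise_only, order=20, rank=8):
            """Signal-subspace noise reduction sketch: prewhiten with a noise-only
            segment, keep the dominant singular directions, then de-whiten."""
            # Hankel (trajectory) matrices of the noisy frame and a noise-only frame
            A = hankel(noisy[:order], noisy[order - 1:])
            B = hankel(noise_only[:order], noise_only[order - 1:])
            # Prewhitening: factor the noise trajectory matrix and whiten A with it
            Ub, sb, _ = svd(B, full_matrices=False)
            W = Ub @ np.diag(sb) / np.sqrt(B.shape[1])      # approximate noise "square root"
            Aw = pinv(W) @ A
            # Keep only the rank strongest components, then de-whiten
            U, s, Vt = svd(Aw, full_matrices=False)
            s[rank:] = 0.0
            A_hat = W @ (U @ np.diag(s) @ Vt)
            # Back to a 1-D signal by averaging along anti-diagonals
            out = np.zeros(len(noisy))
            counts = np.zeros(len(noisy))
            for i in range(A_hat.shape[0]):
                for j in range(A_hat.shape[1]):
                    out[i + j] += A_hat[i, j]
                    counts[i + j] += 1
            return out / counts
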
  • Isolated word recognition using Markov chain models

    Page(s): 458 - 463

    The paper describes how Markov chains may be applied to speech recognition. In this application, a spectral vector is modeled by a state of the Markov chain, and an utterance is represented by a sequence of states. The Markov chain model (MCM) offers a substantial reduction in computation, but at the expense of a significant increase in memory requirements when compared to the hidden Markov model (HMM). Experiments on isolated word recognition show that the MCM achieves results comparable to those of the HMMs used for comparison.

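    A minimal sketch of a Markov chain word model of this kind, assuming each spectral vector has already been labeled with its nearest VQ codeword (the "state"); scoring is then a table lookup, which is where the computational saving over an HMM's forward/Viterbi recursion comes from. Names and the smoothing floor are illustrative:

        import numpy as np

        def train_mcm(state_sequences, n_states):
            """Estimate a Markov chain (initial + transition probabilities) from
            VQ-labeled training utterances of one word."""
            pi = np.full(n_states, 1e-3)                   # small floor avoids zero probabilities
            A = np.full((n_states, n_states), 1e-3)
            for seq in state_sequences:
                pi[seq[0]] += 1
                for s, t in zip(seq[:-1], seq[1:]):
                    A[s, t] += 1
            pi /= pi.sum()
            A /= A.sum(axis=1, keepdims=True)
            return np.log(pi), np.log(A)

        def score_mcm(seq, log_pi, log_A):
            """Log-likelihood of an observed state sequence: a plain table lookup,
            with no per-frame output densities and no forward/Viterbi recursion."""
            ll = log_pi[seq[0]]
            for s, t in zip(seq[:-1], seq[1:]):
                ll += log_A[s, t]
            return ll

        def recognize(seq, word_models):
            """word_models maps a word name to its (log_pi, log_A) pair."""
            return max(word_models, key=lambda w: score_mcm(seq, *word_models[w]))
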
  • A fast algorithm for computing the vocal-tract impulse response from the transfer function

    Page(s): 449 - 457

    This paper describes a fast algorithm that computes the impulse response of the vocal tract from its transfer function. First, numerical methods for computing the transfer function of a given vocal-tract configuration are briefly outlined. These methods include techniques (1) to decompose the numerator and denominator of the transfer function and (2) to efficiently determine the resonance modes of the vocal tract. Next, the paper describes how to calculate the residues at the poles and how to express the vocal-tract transfer function as a partial fraction expansion. Each term in the expansion corresponds to an elementary formant generator, and the additive terms correspond to a parallel formant architecture. A second-order digital filter is derived for each formant generator. The impulse response of the vocal tract can therefore be specified compactly by a set of such filters. Good agreement is observed between the directly calculated transfer function and the one synthesized by the proposed algorithm. The algorithm is being used in the articulatory speech synthesizer under development both at Rutgers University and at the Royal Institute of Technology, Sweden. An ambitious goal is to incorporate the method into a text-to-speech synthesizer and/or an adaptive voice mimic system.

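    A hedged sketch of the parallel-formant decomposition step in Python/scipy, assuming simple (non-repeated) poles occurring in complex-conjugate pairs: the partial fraction expansion is taken with scipy.signal.residuez, and each conjugate pole pair is folded into one second-order section whose impulse responses are summed. The computation of the transfer function from a vocal-tract configuration is not reproduced:

        import numpy as np
        from scipy.signal import residuez, lfilter

        def parallel_formant_response(b, a, n_samples=256):
            """Impulse response of a rational transfer function B(z)/A(z) as a sum of
            second-order (formant) sections from a partial fraction expansion."""
            r, p, k = residuez(b, a)                 # residues, poles, direct terms
            h = np.zeros(n_samples)
            done = set()
            for i in range(len(p)):
                if i in done:
                    continue
                if abs(p[i].imag) < 1e-12:           # real pole -> first-order section
                    num, den = [r[i].real], [1.0, -p[i].real]
                    done.add(i)
                else:                                # complex pole: pair with its conjugate
                    j = next(m for m in range(len(p)) if m not in done and m != i
                             and np.isclose(p[m], np.conj(p[i])))
                    num = [2 * r[i].real, -2 * (r[i] * np.conj(p[i])).real]
                    den = [1.0, -2 * p[i].real, abs(p[i]) ** 2]
                    done.update({i, j})
                impulse = np.zeros(n_samples); impulse[0] = 1.0
                h += lfilter(num, den, impulse)      # each section is one formant generator
            if len(k):                               # direct (FIR) part, if any
                h[:len(k)] += np.real(k)
            return h
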
  • CELP coding using trellis-coded vector quantization of the excitation

    Page(s): 464 - 472

    We analyze the performance of a CELP coder where the vector quantization (VQ) of the excitation is replaced with trellis-coded vector quantization (TCVQ). Our results show that TCVQ performs significantly better than VQ, with reasonable complexity. This makes TCVQ a fair choice for trading quality against complexity and/or delay. We describe a systematic procedure to replace VQ with TCVQ for existing CELP coders. We propose an optimization algorithm to appropriately populate the trellis. We show how pseudo-Gray coding can be applied to the TCVQ codebook to improve intrinsic coder robustness to channel errors. Finally, we evaluate the complexity and performance of the method.

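    The trellis-population procedure and the CELP analysis-by-synthesis loop are specific to the paper; the sketch below only illustrates the TCVQ search itself for a generic vector source: a Viterbi pass over a trellis whose branches carry sub-codebooks. The trellis layout and codebooks are assumed given, and all names are illustrative:

        import numpy as np

        def tcvq_encode(targets, next_state, codebooks):
            """Viterbi search over a trellis: targets is (T, dim); next_state[s][b]
            gives the successor of state s under branch bit b; codebooks[s][b] is
            the (K, dim) sub-codebook on that branch. Returns (bit, index) per stage."""
            S, T = len(next_state), len(targets)
            cost = np.full(S, np.inf); cost[0] = 0.0          # paths start in state 0
            back = [[None] * S for _ in range(T)]
            for t in range(T):
                new_cost = np.full(S, np.inf)
                new_back = [None] * S
                for s in range(S):
                    if not np.isfinite(cost[s]):
                        continue
                    for b in (0, 1):
                        cb = codebooks[s][b]
                        err = np.sum((cb - targets[t]) ** 2, axis=1)
                        k = int(np.argmin(err))               # best codeword on this branch
                        ns = next_state[s][b]
                        c = cost[s] + err[k]
                        if c < new_cost[ns]:
                            new_cost[ns] = c
                            new_back[ns] = (s, b, k)
                cost, back[t] = new_cost, new_back
            # Trace back from the cheapest terminal state
            s = int(np.argmin(cost))
            path = []
            for t in range(T - 1, -1, -1):
                s_prev, b, k = back[t][s]
                path.append((b, k))
                s = s_prev
            return path[::-1], float(np.min(cost))
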
  • Analysis of the filtered-X LMS algorithm

    Page(s): 504 - 514

    The presence of a transfer function in the auxiliary-path following the adaptive filter and/or in the error-path, as in the case of active noise control, has been shown to generally degrade the performance of the LMS algorithm. Thus, the convergence rate is lowered, the residual power is increased, and the algorithm can even become unstable. To ensure convergence of the algorithm, the input to the error correlator has to be filtered by a copy of the auxiliary-error-path transfer function. This paper presents an analysis of the filtered-X LMS algorithm using stochastic methods. The influence of off-line and on-line estimation of the error-path filter on the algorithm is also investigated. Some derived bounds and predicted dynamic behavior are found to correspond very well to simulation results.

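    A minimal filtered-X LMS sketch in Python/numpy illustrating the point made above: the reference signal is passed through a copy (estimate) of the auxiliary/error-path transfer function before it is correlated with the error. Signal names, filter lengths, and the step size are illustrative:

        import numpy as np

        def fxlms(x, d, sec_path, sec_path_hat, L=64, mu=1e-3):
            """Filtered-X LMS for active noise control: x is the reference, d the
            disturbance at the error sensor, sec_path the true secondary
            (auxiliary/error) path, and sec_path_hat its off-line estimate."""
            w = np.zeros(L)                        # adaptive controller
            x_buf = np.zeros(L)                    # reference history for the controller
            fx_buf = np.zeros(L)                   # filtered-reference history for the update
            y_buf = np.zeros(len(sec_path))        # controller-output history (true path)
            xf_buf = np.zeros(len(sec_path_hat))   # reference history (path estimate)
            e = np.zeros(len(x))
            for n in range(len(x)):
                x_buf = np.roll(x_buf, 1); x_buf[0] = x[n]
                y = w @ x_buf                                  # anti-noise sample
                y_buf = np.roll(y_buf, 1); y_buf[0] = y
                e[n] = d[n] - sec_path @ y_buf                 # residual at the error sensor
                # Filter the reference through the path estimate before correlation
                xf_buf = np.roll(xf_buf, 1); xf_buf[0] = x[n]
                fx = sec_path_hat @ xf_buf
                fx_buf = np.roll(fx_buf, 1); fx_buf[0] = fx
                w += mu * e[n] * fx_buf                        # LMS update with filtered x
            return w, e
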
  • Neural network filters for speech enhancement

    Page(s): 433 - 438

    In adaptive noise cancelling, linear digital filters have been used to minimize the mean squared difference between filter outputs and the desired signal. However, for non-Gaussian probability density functions of the involved signals, nonlinear filters can further reduce the mean squared difference, thereby improving the signal-to-noise ratio at the system output. This is illustrated with a two-microphone beamformer for cancelling directional interference. In the case of a single uniformly distributed interference, we establish the optimum nonlinear performance limit. To approximate optimum performance, we realize two nonlinear filter architectures, the Volterra filter and the multilayer perceptron. The Volterra filter is also examined for speech interference. The beamformer is adapted to minimize the mean squared difference, but performance is measured with the intelligibility-weighted gain. This criterion requires the signal-to-noise ratio at the beamformer output. For the nonlinear processor, this can only be determined when no target components exist in the reference channel of the noise canceller, so that the target is transmitted without distortion. Under these ideal conditions and at equal filter lengths, the quadratic Volterra filter improves the intelligibility-weighted gain by at most 2 dB relative to the linear filter.

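    A rough sketch of the quadratic Volterra branch of this comparison, written as a two-channel adaptive noise canceller: the reference channel is expanded into linear taps plus all pairwise tap products, and the whole regressor is adapted with LMS to minimize the output power. The multilayer perceptron branch and the intelligibility-weighted evaluation are not reproduced; names and parameters are illustrative:

        import numpy as np

        def quadratic_volterra_canceller(primary, reference, L=8, mu=1e-3):
            """Adaptive noise canceller with a second-order Volterra filter on the
            reference channel."""
            n_quad = L * (L + 1) // 2
            w = np.zeros(L + n_quad)
            buf = np.zeros(L)
            iu, ju = np.triu_indices(L)            # index pairs for the quadratic terms
            out = np.zeros(len(primary))
            for n in range(len(primary)):
                buf = np.roll(buf, 1); buf[0] = reference[n]
                phi = np.concatenate([buf, buf[iu] * buf[ju]])   # linear + quadratic regressors
                y = w @ phi                        # interference estimate
                out[n] = primary[n] - y            # enhanced (target) output
                w += mu * out[n] * phi             # LMS: minimize output power
            return out, w
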
  • A differential perceptual audio coding method with reduced bitrate requirements

    Page(s): 490 - 503

    A new audio transform coding technique is proposed that reduces the bitrate requirements of perceptual transform audio coders by exploiting the stationarity characteristics of audio signals. The method detects frames that have significant audible content and codes them in a way similar to conventional perceptual transform coders. However, when successive data frames are found to be similar to those frames, only their audible differences are coded. An error analysis of the proposed method is presented, and results from tests on different types of audio material indicate that an average compression gain of 30% (over the conventional perceptual audio coders' bitrate) can be achieved, with only a small deterioration in the audio quality of the coded signal. The proposed method has the advantage of easy adaptation within the perceptual transform coder architecture and adds only a small computational overhead to these systems.

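    A much-simplified sketch of the differential idea only, assuming a plain DFT magnitude spectrum, a cosine-similarity test for frame similarity, and a uniform quantizer in place of the paper's perceptual transform coder and masking model; all thresholds and names are illustrative:

        import numpy as np

        def differential_transform_code(x, frame=512, sim_thresh=0.95, q_step=0.01):
            """When a frame's spectrum is highly correlated with the last coded frame,
            quantize only the difference (cheap to code); otherwise code the full frame."""
            prev = None
            coded = []
            for start in range(0, len(x) - frame + 1, frame):
                spec = np.fft.rfft(x[start:start + frame] * np.hanning(frame))
                mag = np.abs(spec)
                if prev is not None:
                    sim = mag @ prev / (np.linalg.norm(mag) * np.linalg.norm(prev) + 1e-12)
                else:
                    sim = 0.0
                if sim > sim_thresh:
                    q = np.round((mag - prev) / q_step)       # small values, few bits
                    mag_hat = prev + q * q_step
                    coded.append(("diff", q))
                else:
                    q = np.round(mag / q_step)
                    mag_hat = q * q_step
                    coded.append(("full", q))
                prev = mag_hat                                # decoder state: last reconstruction
            return coded
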
  • Adaptive cepstral analysis of speech

    Page(s): 481 - 489

    This paper proposes an algorithm for adaptive cepstral analysis based on the UELS (unbiased estimation of log spectrum). In the UELS, the model spectrum is represented by cepstral coefficients, and the mean square of the inverse filter output is minimized with respect to the cepstral coefficients. By introducing an instantaneous gradient estimate of the criterion, in a manner similar to that of the LMS algorithm, we develop an adaptive cepstral analysis algorithm. In the analysis system, an IIR adaptive filter whose coefficients are given by cepstral coefficients is realized using the log magnitude approximation (LMA) filter. The filter approximates an exponential transfer function, and its stability is guaranteed for approximation of speech spectra. To implement the Mth-order cepstral analysis, the algorithm requires O(M) operations per sample. It is shown that the algorithm has fast convergence properties in comparison with the LMS algorithm. Several examples of the adaptive cepstral analysis for synthetic signals and natural speech are shown to demonstrate the effectiveness of the algorithm.

  • A fast determination of stochastic excitation without codebook search in CELP coder

    Page(s): 473 - 480

    The major drawback of the code-excited linear prediction (CELP) coder is the computational complexity of finding the best excitation vector in a stochastic codebook. To provide a synthesized speech signal with reasonable quality, the stochastic codebook should be large, so the search becomes highly complex. Several methods have been proposed to overcome this difficulty. In this paper, we consider a method that enables us to determine the stochastic excitation vector directly, without a codebook search. The stochastic excitation vector is determined by projection onto a subspace obtained using the Karhunen-Loeve (K-L) expansion and the spectral properties of the random excitation residual vector. Since the excitation vector can be determined without a codebook search, the computational complexity is low. Experimental results show that the proposed coder provides a synthesized speech signal comparable in quality to that of the conventional CELP coder, at low computational complexity.

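    A minimal sketch of the projection idea, assuming the target excitation residual and a set of training residuals are available: the retained Karhunen-Loeve basis is the set of leading eigenvectors of the residual covariance, and the excitation follows by orthogonal projection rather than a codebook search. Synthesis-filter weighting, gains, and quantization are omitted, and names are illustrative:

        import numpy as np

        def kl_basis(training_residuals, n_keep):
            """Leading Karhunen-Loeve basis vectors (eigenvectors of the residual
            covariance) estimated from training excitation residuals (N, dim)."""
            R = np.cov(training_residuals, rowvar=False)
            vals, vecs = np.linalg.eigh(R)
            order = np.argsort(vals)[::-1]
            return vecs[:, order[:n_keep]]          # dim x n_keep

        def stochastic_excitation(target_residual, basis):
            """Determine the excitation directly by orthogonal projection onto the
            retained K-L subspace - no codebook search."""
            coeffs = basis.T @ target_residual      # projection coefficients (these are
            return basis @ coeffs, coeffs           # what would be quantized and transmitted)
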

Aims & Scope

Covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.


This Transactions ceased publication in 2005. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.
