
IEEE Transactions on Speech and Audio Processing

Volume 9, Issue 8 • Nov. 2001


Displaying Results 1 - 21 of 21
  • List of reviewers

    Publication Year: 2001, Page(s): 957 - 958
  • 2001 Index, IEEE Transactions on Speech and Audio Processing, Vol. 9 [Author Index]

    Publication Year: 2001, Page(s): 959 - 962
  • 2001 Index, IEEE Transactions on Speech and Audio Processing, Vol. 9 [Subject Index]

    Publication Year: 2001, Page(s): 962 - 970
  • Enhanced waveform interpolative coding at low bit-rate

    Publication Year: 2001, Page(s): 786 - 798
    Cited by:  Papers (7)  |  Patents (2)

    This paper presents a high-quality enhanced waveform interpolative (EWI) speech coder operating at a low bit rate. The system incorporates novel features such as optimization of the slowly evolving waveform (SEW) for interpolation, analysis-by-synthesis (AbS) vector quantization (VQ) of the SEW dispersion phase, dual-predictive AbS quantization of the SEW, efficient parameterization of the rapidly evolving waveform (REW) magnitude and VQ of the REW parameters, a special pitch search for transitions, and switched-predictive AbS gain VQ. Subjective tests indicate that the quality of the 2.8 kb/s EWI coder exceeds that of G.723.1 at 5.3 kb/s and is slightly better than that of G.723.1 at 6.3 kb/s.

  • Spectral subtraction using reduced delay convolution and adaptive averaging

    Publication Year: 2001, Page(s): 799 - 807
    Cited by:  Papers (35)  |  Patents (4)

    In hands-free speech communication, the signal-to-noise ratio (SNR) is often poor, which makes it difficult to have a relaxed conversation. By using noise suppression, the conversation quality can be improved. This paper describes a noise suppression algorithm based on spectral subtraction. The method employs a noise- and speech-dependent gain function for each frequency component. Proper measures have been taken to obtain a corresponding causal filter and to ensure that the circular convolution originating from fast Fourier transform (FFT) filtering yields truly linear filtering. A novel method that uses spectrum-dependent adaptive averaging to decrease the variance of the gain function is also presented. The results show a 10-dB background noise reduction for all input SNR situations tested in the range -6 to 16 dB, as well as improved speech quality and fewer noise artifacts compared with conventional spectral subtraction methods.
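
    For readers unfamiliar with the baseline that this method refines, the sketch below shows plain magnitude-domain spectral subtraction with a per-frequency, noise-dependent gain and a spectral floor. It is a minimal Python/NumPy illustration, not the authors' reduced-delay or adaptive-averaging algorithm; the frame length, hop size, oversubtraction factor, and floor value are assumptions chosen for the example.

        import numpy as np

        def spectral_subtraction(noisy, noise_psd, frame=256, hop=128, alpha=2.0, floor=0.1):
            # noisy:     1-D noisy speech signal (NumPy array)
            # noise_psd: estimated noise power spectrum, length frame // 2 + 1
            # alpha, floor: oversubtraction factor and spectral floor (assumed values)
            win = np.hanning(frame)              # Hann analysis window, 50% overlap-add
            out = np.zeros(len(noisy))
            for start in range(0, len(noisy) - frame + 1, hop):
                spec = np.fft.rfft(noisy[start:start + frame] * win)
                power = np.abs(spec) ** 2
                # Frequency-dependent gain: subtract scaled noise power, keep a floor
                gain = np.maximum(1.0 - alpha * noise_psd / np.maximum(power, 1e-12), floor)
                out[start:start + frame] += np.fft.irfft(gain * spec)
            return out

    The adaptive averaging proposed in the paper addresses the large frame-to-frame variance of such a gain function, which is what produces the familiar artifacts of conventional spectral subtraction.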

  • Combined noise and echo reduction in hands-free systems: a survey

    Publication Year: 2001, Page(s): 808 - 820
    Cited by:  Papers (18)  |  Patents (15)

    The modern telecommunications field is concerned with freedom and, in this context, hands-free systems offer subscribers the possibility of talking more naturally, without using a handset. This new type of use leads to new problems that were negligible in traditional telephony, namely the superposition of noise and echo on the speech signal. To solve these problems and provide quality that is sufficient for telecommunications, combined reduction of these disturbances is required. This paper presents a summary of the solutions adopted for this dual reduction in the context of single-channel and two-channel sound pick-up.

  • A Bayesian approach to the verification problem: applications to speaker verification

    Publication Year: 2001, Page(s): 874 - 884
    Cited by:  Papers (11)  |  Patents (1)

    In this paper, we study the general verification problem from a Bayesian viewpoint. In the Bayesian approach, the verification decision is made by evaluating Bayes factors against a critical threshold. The calculation of the Bayes factors in turn requires the computation of several Bayesian predictive densities. As a case study, we apply the method to speaker verification based on the Gaussian mixture model (GMM). We propose an efficient algorithm to calculate the Bayes factors for the GMM, in which the Viterbi approximation is adopted in the computation of joint Bayesian predictive densities. We evaluate the proposed method on the NIST98 speaker verification evaluation data. Experimental results show that the new Bayesian approach achieves moderate improvements over a well-trained baseline system using the conventional likelihood ratio test.
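
    As a compact statement of the decision rule sketched in the abstract (generic symbols, not necessarily the authors' notation): with H1 the claimed-speaker hypothesis, H0 the alternative, λ the model parameters integrated out to form the Bayesian predictive densities, and τ the critical threshold,

        \[
        B(X) \;=\; \frac{p(X \mid H_1)}{p(X \mid H_0)}
              \;=\; \frac{\int p(X \mid \lambda)\, p(\lambda \mid H_1)\, d\lambda}
                         {\int p(X \mid \lambda)\, p(\lambda \mid H_0)\, d\lambda},
        \qquad \text{accept the identity claim if } B(X) > \tau .
        \]

    The conventional likelihood ratio test used as the baseline replaces the two predictive densities with likelihoods evaluated at point estimates of the model parameters.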

  • On the perceptually irrelevant phase information in sinusoidal representation of speech

    Publication Year: 2001, Page(s): 900 - 905
    Cited by:  Papers (5)

    For efficient quantization of speech representations, it is essential to incorporate the perceptual characteristics of human hearing. So far, however, attention has been confined largely to the magnitude information of speech, and little has been paid to phase information. This paper presents a novel approach, termed perceptually irrelevant phase elimination (PIPE), to identify phase information in acoustic signals that is irrelevant to perceived quality. The proposed method, inspired by the observation that the relative phase relationship within a critical band is perceptually important, is derived not only for stationary Fourier signals but also for harmonic signals. For harmonic signals, a "critical phase frequency" is defined, below which phase information is perceptually irrelevant. The PIPE algorithm is incorporated into the harmonic analysis/synthesis of speech, and subjective test results demonstrate the effectiveness of the proposed method.

  • A delayless subband active noise control system for wideband noise control

    Publication Year: 2001, Page(s): 892 - 899
    Cited by:  Papers (11)  |  Patents (1)

    In this paper, we present an efficient active noise control algorithm based on the delayless subband adaptive filter. The algorithm reduces the computational complexity of the delayless subband filter by decomposing the secondary path response into a set of subband functions. In this new algorithm, the filtered reference signal is generated for each subband by a short impulse response filter that models the secondary path transfer function in subband-decomposed form. The computational efficiency of the presented method originates from the fact that the filtering process occurs in only one subband for each reference input sample. Furthermore, the efficiency is enhanced when the presented algorithm is combined with online identification of the secondary path transfer function. We also propose a frequency-domain implementation of the presented algorithm, in which the computational complexity is further reduced by employing a block-processing approach.
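
    For orientation, the sketch below is a plain full-band filtered-x LMS (FxLMS) loop, the structure whose cost the delayless subband scheme reduces: the reference signal is filtered through a model of the secondary path before it drives the adaptation. This is only an illustrative Python/NumPy baseline under assumed filter lengths and step size; the paper's subband decomposition of the secondary path and the delayless weight transformation are not shown.

        import numpy as np

        def fxlms(ref, disturbance, sec_path, L=128, mu=1e-3):
            # ref:         reference signal picked up ahead of the control point
            # disturbance: primary noise observed at the error microphone
            # sec_path:    FIR model of the secondary (actuator-to-microphone) path
            # L, mu:       control filter length and LMS step size (assumed values)
            w = np.zeros(L)                     # adaptive control filter
            x_buf = np.zeros(L)                 # recent reference samples
            fx_buf = np.zeros(L)                # recent filtered-reference samples
            y_buf = np.zeros(len(sec_path))     # control output history through sec. path
            s_buf = np.zeros(len(sec_path))     # reference history for filtered-x
            err = np.zeros(len(ref))
            for n in range(len(ref)):
                x_buf = np.roll(x_buf, 1); x_buf[0] = ref[n]
                y = w @ x_buf                   # anti-noise sample
                y_buf = np.roll(y_buf, 1); y_buf[0] = y
                err[n] = disturbance[n] + sec_path @ y_buf   # residual at the error mic
                s_buf = np.roll(s_buf, 1); s_buf[0] = ref[n]
                fx_buf = np.roll(fx_buf, 1); fx_buf[0] = sec_path @ s_buf
                w -= mu * err[n] * fx_buf       # LMS update on the filtered reference
            return err

    Generating the filtered reference in only one subband per reference input sample, as proposed in the paper, is where the computational saving over this full-band loop comes from.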

  • Noise robust speech parameterization using multiresolution feature extraction

    Publication Year: 2001, Page(s): 856 - 865
    Cited by:  Papers (3)

    In this paper, we present a multiresolution-based feature extraction technique for speech recognition in adverse conditions. The proposed front-end algorithm computes mel cepstrum-based features on subbands so that noise distortions do not spread over the entire feature space. Conventional full-band features are also appended to the final feature vector that is fed to the recognition unit. Other novel aspects of the proposed front end include emphasis of long-term spectral information combined with cepstral-domain feature vector normalization and the use of the PCA transform, instead of the DCT, to produce the final cepstral parameters. The proposed algorithm was evaluated experimentally in a connected-digit recognition task under various noise conditions. The results show that the new feature extraction algorithm improves word recognition accuracy by 41% compared with the mel cepstrum front end. A substantial increase in recognition accuracy was observed in all tested noise environments at all SNRs. The good performance of the multiresolution front end is not due solely to the higher feature vector dimension: the proposed algorithm clearly outperformed the mel cepstral front end when the same number of HMM parameters was used in both systems. We also propose methods to reduce the computational complexity of the multiresolution front-end-based speech recognition system. Experimental results indicate the viability of the proposed techniques.

  • Bias of feedback cancellation algorithms in hearing aids based on direct closed loop identification

    Publication Year: 2001, Page(s): 906 - 913
    Cited by:  Papers (26)  |  Patents (1)

    The undesired effects of acoustic feedback in hearing aids can be reduced with an internal feedback path that is an estimate of the external feedback path. This paper analyzes the limiting estimate of the feedback path for feedback cancellation schemes that apply a recursive prediction error method with a quadratic norm, e.g., least mean squares (LMS) or recursive least squares (RLS), to the output and input signals of the hearing aid in order to identify the feedback path. The data used for identification are collected in a closed loop, and the estimate used in one recursion affects the data used in succeeding recursions; these properties have to be considered in the analysis. The analysis shows that the limiting estimate may be biased if there is an error in the model used for the input signal to the hearing aid, and that the system is not identifiable unless either a second input signal is added to the output of the hearing aid or the signal processing used to modify the signal for the impaired ear is nonlinear. The limiting estimate is presented as the solution to an optimization problem in the frequency domain. An analytical expression for the limiting estimate is given for a special case; for other cases, an algorithm is presented that can be used to find a numerical solution. The results can be useful when choosing the model structure used with the recursive identification.

  • An SNR-incremental stochastic matching algorithm for noisy speech recognition

    Publication Year: 2001, Page(s): 866 - 873
    Cited by:  Papers (3)

    In this paper, a signal-to-noise ratio (SNR)-incremental stochastic matching (SISM) algorithm is proposed for robust speech recognition in noisy environments. The SISM algorithm is an extension of Sankar and Lee's (1996) stochastic matching (SM) for dealing with the distortion due to additive noise. We address two issues concerning the original maximum-likelihood-based SM techniques. One is that the initial condition of the expectation-maximization (EM) algorithm has to be set carefully if the mismatch between training and testing is large. The other is that, in noise compensation, the performance is often limited by the newly adapted model rather than reaching the higher accuracy typically obtained in clean environments. The proposed SISM algorithm attempts to improve the initial condition and to relax this performance bound. First, the SISM algorithm provides a good initial condition by making use of a set of environment-matched models. Second, it operates recursively: the reference model in each recursion is changed in the direction of increasing SNR in order to push the recognition performance toward that obtained at higher SNR levels. Experimental results show that the SISM algorithm provides further improvement after the best environment-matched performance has been reached, and can therefore obtain additional discriminative power by using speech models with higher SNR instead of a retraining process.

  • Acoustic echo cancellation using iterative-maximal-length correlation and double-talk detection

    Publication Year: 2001, Page(s): 932 - 942
    Cited by:  Papers (6)

    The conventional maximal-length-correlation (MLC) algorithm for estimating the room impulse response in adaptive echo cancellation (AEC) is disturbed by both far-end and near-end speech. In this paper, a new iterative-maximal-length-correlation (IMLC) algorithm is proposed to reduce the far-end speech interference. To avoid the near-end interference, a new double-talk detection (DTD) method is proposed that tracks the squared coefficient errors of the AEC filter. This DTD method has well-separated detection margins among single-talk (ST), double-talk (DT), and echo-path changes. Statistical analysis and computer simulations confirm that the proposed IMLC-DTD algorithm outperforms conventional methods.

  • Linear predictive coding with modified filter structures

    Publication Year: 2001, Page(s): 769 - 777
    Cited by:  Papers (6)  |  Patents (3)

    In conventional one-step forward linear prediction, an estimate of the current sample value is formed as a linear combination of previous sample values. In this paper, a generalized form of this scheme is studied in which the prediction is based not simply on the previous sample values but on the signal history as seen through an arbitrary filterbank. It is shown how the coefficients of the modified model can be obtained and how the corresponding inverse and synthesis filters can be implemented, and various properties of such systems are derived. As an example, a novel linear predictive system using an inherently logarithmic frequency representation is introduced.
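
    In generic notation (not necessarily the paper's), the generalization can be written as follows: conventional one-step prediction forms the estimate from delayed samples, while the modified structure predicts from the outputs of an arbitrary, strictly causal filterbank applied to the signal history,

        \[
        \hat{x}(n) = \sum_{k=1}^{p} a_k\, x(n-k)
        \qquad \longrightarrow \qquad
        \hat{x}(n) = \sum_{k=1}^{p} a_k\, y_k(n),
        \qquad
        y_k(n) = \sum_{m \ge 1} h_k(m)\, x(n-m),
        \]

    where h_k is the impulse response of the k-th filterbank branch. Choosing h_k(m) = δ(m - k), i.e., a pure delay of k samples, recovers the conventional predictor; other branch responses lead to systems such as the logarithmic-frequency example mentioned at the end of the abstract.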

  • Speech recognition and utterance verification based on a generalized confidence score

    Publication Year: 2001, Page(s): 821 - 832
    Cited by:  Papers (14)

    In this paper, we introduce a generalized confidence score (GCS) function that provides a framework for integrating different confidence scores in speech recognition and utterance verification. A modified decoder based on the GCS is then proposed. The GCS is defined as a combination, by exponential weighting, of various confidence scores obtained from different confidence information sources, such as likelihood, likelihood ratio, duration, and language model probabilities. We also propose the use of a confidence preprocessor to transform raw scores into manageable terms for easy integration. We consider two kinds of hybrid decoders, an ordinary hybrid decoder and an extended hybrid decoder, as implementation examples based on the generalized confidence score. The ordinary hybrid decoder uses a frame-level likelihood ratio in addition to a frame-level likelihood, whereas a conventional decoder uses only the frame likelihood or likelihood ratio. The extended hybrid decoder uses not only the frame-level likelihood but also multilevel information such as frame-level, phone-level, and word-level confidence scores based on likelihood ratios. Our experimental evaluation shows that the proposed hybrid decoders give better results than conventional decoders, especially in dealing with ill-formed utterances that contain out-of-vocabulary words and phrases.
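
    One way to read the exponential weighting mentioned above (illustrative notation only, not necessarily the paper's exact definition): if c_1(x), ..., c_M(x) are the preprocessed confidence scores from the individual information sources and w_i their weights, then

        \[
        \mathrm{GCS}(x) \;=\; \prod_{i=1}^{M} c_i(x)^{\,w_i},
        \qquad\text{equivalently}\qquad
        \log \mathrm{GCS}(x) \;=\; \sum_{i=1}^{M} w_i \log c_i(x),
        \]

    so the decoder can accumulate a single weighted log-domain score in place of the usual frame log-likelihood.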

  • Passive parametric modeling of dynamic loudspeakers

    Publication Year: 2001, Page(s): 885 - 891
    Cited by:  Papers (4)

    In this paper, an electrical circuit is proposed that is suitable for parametric modeling of electrodynamic transducers, e.g., dynamic loudspeakers. Since the system being modeled is known to be passive, preservation of this property is a key issue of the approach presented here. This is of special importance when nonlinear effects are taken into account, as passivity always ensures various kinds of stability. The only nonlinear elements occurring in the circuit model are ideal transformers whose turns ratios are not constant but depend on some signal quantity. As a consequence, the circuit can be simulated digitally by applying principles known from the theory of wave digital filters.

  • Structured audio, Kolmogorov complexity, and generalized audio coding

    Publication Year: 2001, Page(s): 914 - 931
    Cited by:  Papers (6)

    Structured-audio techniques are a development in audio coding that establishes new connections between the existing practices of audio synthesis and audio compression. A theoretical basis for this coding model is presented, grounded in information theory and Kolmogorov complexity theory. It is demonstrated that algorithmic structured audio can provide higher compression ratios than other techniques for many audio signals, and it is proved rigorously that it can provide compression at least as good as any other technique (up to a constant term) for every audio signal. The MPEG-4 structured audio standard is the first practical application of algorithmic coding theory. It points toward a new paradigm of generalized audio coding, in which structured-audio coding subsumes all other audio-coding techniques. Generalized audio coding offers new marketplace models that enable advances in compression technology to be rapidly leveraged toward the solution of problems in audio coding.
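
    The "up to a constant term" claim mirrors the invariance property familiar from Kolmogorov complexity theory: for any computable coding scheme C there exists a constant c_C, independent of the signal, such that

        \[
        K(x) \;\le\; L_C(x) + c_C \qquad \text{for every signal } x,
        \]

    where K(x) is the length of the shortest algorithmic (structured-audio-style) description of x and L_C(x) is the length of its encoding under C; the constant covers the fixed cost of describing the decoder of C itself.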

  • Real-time passive source localization: a practical linear-correction least-squares approach

    Publication Year: 2001, Page(s): 943 - 956
    Cited by:  Papers (115)  |  Patents (1)

    A linear-correction least-squares estimation procedure is proposed for the source localization problem under an additive measurement error model. The method, which can be easily implemented in a real-time system with moderate computational complexity, yields an efficient source location estimator without assuming a priori knowledge of the noise distribution. Alternative existing estimators, including likelihood-based, spherical-intersection, spherical-interpolation, and quadratic-correction least-squares estimators, are reviewed, and their complexity, estimation consistency, and efficiency against the Cramér-Rao lower bound are compared. Numerical studies demonstrate that the proposed estimator performs better in many practical situations.
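
    For orientation, the sketch below solves the unconstrained linearized least-squares formulation of the time-difference-of-arrival problem, the spherical-interpolation-style baseline that a correction step then refines. It is a Python/NumPy illustration with assumed names and units, not the authors' estimator, which additionally enforces the constraint tying the source position to its range.

        import numpy as np

        def tdoa_ls_locate(sensors, tdoas, c=343.0):
            # sensors: (N, 3) microphone positions; sensors[0] is the reference
            # tdoas:   (N-1,) arrival-time differences relative to mic 0, in seconds
            # c:       speed of sound in m/s (assumed)
            ref = sensors[0]
            r = sensors[1:] - ref               # place the reference mic at the origin
            d = c * np.asarray(tdoas)           # range differences
            A = np.hstack([r, d[:, None]])      # unknowns: source position (3) and range R
            b = 0.5 * (np.sum(r ** 2, axis=1) - d ** 2)
            theta, *_ = np.linalg.lstsq(A, b, rcond=None)
            return theta[:3] + ref              # plain LS ignores the R = ||position|| constraint

    With five or more microphones the system is overdetermined; the quadratic- and linear-correction estimators discussed in the paper differ in how they restore the consistency between the estimated position and the estimated range that this unconstrained solution ignores.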

  • Linear prediction based packet loss concealment algorithm for PCM coded speech

    Publication Year: 2001, Page(s): 778 - 785
    Cited by:  Papers (23)  |  Patents (10)

    One of the well-known problems in real-time packetized voice applications is the degradation in voice quality due to delayed or misrouted packets. When a voice packet does not arrive at the receiver on time, the receiver needs a packet loss concealment algorithm to generate a signal in place of the missing voice segment. In this paper, we describe a high-performance packet loss concealment algorithm for pulse code modulation (PCM) coded speech. The algorithm extracts the residual signal of the previously received speech by linear prediction analysis, uses periodic replication to generate an approximation of the excitation signal of the missing speech, and synthesizes speech from this excitation. It also performs overlap-add and scaling operations to smooth transitions at frame boundaries. The new algorithm is compared with other algorithms in subjective quality tests and is found to be better than the existing algorithms in some cases.
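
    A minimal sketch of the general recipe described above: LP analysis of the most recent good speech, periodic replication of the excitation at the pitch period, and LP synthesis to fill the gap. This Python/NumPy illustration uses an autocorrelation LP solve and a crude pitch estimate, and it omits the overlap-add and scaling steps; the frame sizes and search ranges are assumptions, and it is not the paper's exact algorithm.

        import numpy as np
        from scipy.linalg import toeplitz
        from scipy.signal import lfilter, lfiltic

        def conceal_frame(history, frame_len, order=10, min_lag=40, max_lag=160):
            # history:   previously received PCM samples (1-D array, most recent last)
            # frame_len: number of samples to synthesise in place of the lost packet
            # order, min_lag, max_lag: LP order and pitch range in samples (assumed)
            seg = history[-3 * max_lag:].astype(float)

            # LP analysis (autocorrelation method) and residual extraction
            r = np.correlate(seg, seg, mode='full')[len(seg) - 1:]
            a = np.concatenate(([1.0], np.linalg.solve(toeplitz(r[:order]), -r[1:order + 1])))
            residual = lfilter(a, [1.0], seg)

            # Crude pitch lag from the residual autocorrelation
            rr = np.correlate(residual, residual, mode='full')[len(residual) - 1:]
            lag = min_lag + int(np.argmax(rr[min_lag:max_lag]))

            # Periodic replication of the last pitch cycle of the excitation
            excitation = np.tile(residual[-lag:], frame_len // lag + 1)[:frame_len]

            # LP synthesis, carrying the filter state over from the last good samples
            zi = lfiltic([1.0], a, y=seg[::-1][:order], x=residual[::-1][:order])
            synth, _ = lfilter([1.0], a, excitation, zi=zi)
            return synth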

  • New approaches for domain transformation and parameter combination for improved accuracy in parallel model combination (PMC) techniques

    Publication Year: 2001, Page(s): 842 - 855
    Cited by:  Papers (4)  |  Patents (1)

    Parallel model combination (PMC) techniques have been very successful and are widely used in many applications to improve the performance of speech recognition systems in noisy environments. However, some of the assumptions and approximations made in this approach, primarily in the domain transformation and parameter combination processes, are not necessarily accurate enough in certain practical situations, which may degrade the achievable performance of PMC. In this paper, the possible sources of performance degradation in these processes are carefully analyzed and discussed. Three new approaches are proposed to handle these problems, minimize such degradation, and improve the accuracy of PMC: the truncated Gaussian approach and the split-mixture approach for the domain transformation process, and the estimated cross-term approach for the parameter combination process. The proposed approaches were evaluated on two recognition tasks, one relatively simple and the other more complicated and realistic. Both sets of experiments show that the proposed approaches provide significant improvements over the original PMC method, especially under low SNR conditions.
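
    For context, the pipeline whose approximations are being examined can be summarised as follows (standard log-normal PMC, not the proposed refinements): cepstral-domain Gaussian parameters are mapped to the log-spectral domain with the inverse DCT matrix C^{-1}, transformed to the linear spectral domain under the log-normal assumption, combined additively with the noise model, and mapped back,

        \[
        \boldsymbol{\mu}^{\log} = \mathbf{C}^{-1}\boldsymbol{\mu}^{c},\qquad
        \boldsymbol{\Sigma}^{\log} = \mathbf{C}^{-1}\boldsymbol{\Sigma}^{c}(\mathbf{C}^{-1})^{\mathsf T},
        \]
        \[
        \mu^{\mathrm{lin}}_i = \exp\!\big(\mu^{\log}_i + \tfrac{1}{2}\Sigma^{\log}_{ii}\big),\qquad
        \Sigma^{\mathrm{lin}}_{ij} = \mu^{\mathrm{lin}}_i\,\mu^{\mathrm{lin}}_j\big(\exp(\Sigma^{\log}_{ij}) - 1\big),
        \]
        \[
        \hat{\mu}^{\mathrm{lin}} = g\,\mu^{\mathrm{lin}}_{\mathrm{speech}} + \mu^{\mathrm{lin}}_{\mathrm{noise}},\qquad
        \hat{\Sigma}^{\mathrm{lin}} = g^{2}\,\Sigma^{\mathrm{lin}}_{\mathrm{speech}} + \Sigma^{\mathrm{lin}}_{\mathrm{noise}},
        \]

    with g a gain-matching term, followed by the reverse mapping to the cepstral domain. The truncated Gaussian, split-mixture, and estimated cross-term approaches proposed in the paper target the errors introduced by these transformation and combination steps.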

  • Acoustic-phonetic features for the automatic classification of stop consonants

    Publication Year: 2001, Page(s): 833 - 841
    Cited by:  Papers (19)

    In this paper, the acoustic-phonetic characteristics of the American English stop consonants are investigated. Features studied in the literature are evaluated for their information content, and new features are proposed. A statistically guided, knowledge-based, acoustic-phonetic system for the automatic classification of stops in speaker-independent continuous speech is proposed. The system uses new auditory-based front-end processing and incorporates new algorithms for the extraction and manipulation of the acoustic-phonetic features that proved to be rich in information content. Recognition experiments are performed using hard-decision algorithms on stops extracted from continuous speech in the TIMIT database, spoken by 60 speakers (not used in the design process) from seven different dialects of American English. An accuracy of 96% is obtained for voicing detection, 90% for detection of the place of articulation, and 86% for the overall classification of stops.


Aims & Scope

Covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

 

This Transactions ceased publication in 2005. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.
