Speech and Audio Processing, IEEE Transactions on

Issue 6 • Date Nov. 2000

  • Comments on "Efficient training algorithms for HMMs using incremental estimation"

    Publication Year: 2000 , Page(s): 751 - 754
    Cited by:  Papers (4)

    The paper entitled "Efficient training algorithms for HMMs using incremental estimation" by Gotoh et al. (IEEE Trans. Speech Audio Processing, vol.6, p.539-48, Nov. 1998) investigated expectation maximization (EM) procedures that increase training speed. The claim of Gotoh et al. that these procedures are generalized EM (Dempster et al. 1977) procedures is shown to be incorrect in the present paper. We discuss why this is so, provide an example of nonmonotonic convergence to a local maximum in likelihood, and outline conditions that guarantee such convergence.

  • List of reviewers

    Publication Year: 2000 , Page(s): 757 - 758
    Freely Available from IEEE
  • Cell-based beamforming (CE-BABE) for speech acquisition with microphone arrays

    Publication Year: 2000 , Page(s): 738 - 743
    Cited by:  Papers (5)

    This paper introduces a microphone array processing method that possesses the robustness of fixed beamforming along with the ability to be dynamically reconfigured to limit interference and reverberation. The basic approach is to partition the environment into two regions: an interior region (containing sources that are physically present within the room enclosure), and an exterior region (containing virtual sources of reverberation). The interior region is further divided into cells, and standard source localization techniques are used to identify those cells containing the desired source as well as sources of interference (e.g., competing talkers). Beamforming weights are then found to pass the desired signal, while simultaneously minimizing a weighted combination of interior interference and exterior reverberation. Simulation results are presented to demonstrate the effectiveness of the proposed technique when compared with conventional beamforming methods.

  • A good read

    Publication Year: 2000 , Page(s): 645
  • A computationally efficient multipitch analysis model

    Publication Year: 2000 , Page(s): 708 - 716
    Cited by:  Papers (94)  |  Patents (6)

    A computationally efficient model for multipitch and periodicity analysis of complex audio signals is presented. The model essentially divides the signal into two channels, below and above 1000 Hz, computes a “generalized” autocorrelation of the low-channel signal and of the envelope of the high-channel signal, and sums the autocorrelation functions. The summary autocorrelation function (SACF) is further processed to obtain an enhanced SACF (ESACF). The SACF and ESACF representations are used in observing the periodicities of the signal. The model performance is demonstrated to be comparable to that of recent time-domain models that apply a multichannel analysis. In contrast to the multichannel models, the proposed pitch analysis model can be run in real time using typical personal computers. The parameters of the model are experimentally tuned for best multipitch discrimination with typical mixtures of complex tones. The proposed pitch analysis model may be used in complex audio signal processing applications, such as sound source separation, computational auditory scene analysis, and structural representation of audio signals. The performance of the model is demonstrated by pitch analysis examples using sound mixtures which are available for download at http://www.acoustics.hut.fi/-ttolonen/pitchAnalysis/

  • R/D optimal linear prediction

    Publication Year: 2000 , Page(s): 646 - 655
    Cited by:  Papers (19)  |  Patents (2)

    A common technique to extend linear prediction to nonstationary signals is time segmentation: the signal is split into small portions and the modelization is carried out locally. The accuracy of the analysis is, however, dependent on the window size and on the signal characteristics, so that the problem of finding a good segmentation is crucial to the entire modeling scheme. In this paper, we present an algorithm which determines the optimal segmentation with respect to a cost function relating prediction error to modeling cost. The proposed approach casts the problem in a rate/distortion (R/D) framework, whereby the segmentation is implicitly computed while minimizing the modelization distortion for a given modelization cost. The algorithm is implemented by means of dynamic programming and takes the form of a trellis-based Lagrangian minimization. The optimal linear predictor, when applied to speech coding, dramatically reduces the number of bits per second devoted to the modeling parameters in comparison to fixed-window schemes.

  • Fractionally addressed delay lines

    Publication Year: 2000 , Page(s): 717 - 727
    Cited by:  Papers (4)

    While traditional implementations of variable-length digital delay lines are based on a circular buffer accessed by two pointers, we propose an implementation where a single fractional pointer is used both for read and write operations. On modern general-purpose architectures, the proposed method is nearly as efficient as the popular interpolated circular buffer, and it behaves well for delay-length modulations commonly found in digital audio effects. The physical interpretation of the new implementation shows that it is suitable for simulating tension or density modulations in wave-propagating media.

  • A minimax search algorithm for robust continuous speech recognition

    Publication Year: 2000 , Page(s): 688 - 694
    Cited by:  Papers (9)

    In this paper, we propose a novel implementation of a minimax decision rule for continuous density hidden Markov-model-based robust speech recognition. By combining the idea of the minimax decision rule with a normal Viterbi search, we derive a recursive minimax search algorithm, where the minimax decision rule is repetitively applied to determine the partial paths during the search procedure. Because of the intrinsic nature of a recursive search, the proposed method can be easily extended to perform continuous speech recognition. Experimental results on Japanese isolated digits and TIDIGITS, where the mismatch between training and testing conditions is caused by additive white Gaussian noise, show the viability and efficiency of the proposed minimax search algorithm.

  • The time-conditioned approach in dynamic programming search for LVCSR

    Publication Year: 2000 , Page(s): 676 - 687
    Cited by:  Papers (1)

    This paper presents the time-conditioned approach in dynamic programming search for large-vocabulary continuous-speech recognition. The following topics are presented: the baseline algorithm, a time-synchronous beam search version, a comparison with the word-conditioned approach, and a comparison with stack decoding. The approach has been successfully tested on the NAB task using a vocabulary of 64000 words.

  • GA-based noisy speech recognition using two-dimensional cepstrum

    Publication Year: 2000 , Page(s): 664 - 675
    Cited by:  Papers (11)  |  Patents (4)

    Among various kinds of speech features, the two-dimensional (2-D) cepstrum (TDC) is a special one, which can simultaneously represent several types of information contained in the speech waveform: static and dynamic features, as well as global and fine frequency structures. Analysis results show that the coefficients located in the lower-index portion of the TDC matrix seem to be more significant than the others. Hence, to represent an utterance, only some TDC coefficients need to be selected to form a single feature vector instead of a sequence of feature vectors. This has the advantages of simple computation and less storage space. However, our experiments show that the selection of TDC coefficients is quite sensitive to background noise. In order to solve this problem, we propose the GA-based M-TDC (modified TDC) method in this paper to improve the representativeness and robustness of the selected TDC coefficients in noisy environments. The M-TDC differs from the standard TDC by the use of filters to remove the noise components. Furthermore, in the GA-based M-TDC method, we apply genetic algorithms (GAs) to find the robust coefficients in the M-TDC matrix. From the experiments with five noise types, we find that the GA-based M-TDC method has better recognition results than the original TDC approach in noisy environments.

  • A DCT-based fast signal subspace technique for robust speech recognition

    Publication Year: 2000 , Page(s): 747 - 751
    Cited by:  Papers (12)

    In this correspondence, a fast computational method is proposed to approximate the Karhunen-Loeve transform (KLT) for the covariance matrix of the autoregressive process. A fast algorithm which reduces the computation of eigenvalues of an N×N symmetric Toeplitz matrix from O(N^3) in KLT to O(N^2) is further developed. Experimental results demonstrate that the performance of the fast algorithm is very close to the KLT in eigenvalue computation and in energy constrained signal subspace speech enhancement for speech recognition in a car environment.

  • Efficient distance measure for quantization of LSF and its Karhunen-Loeve transformed parameters

    Publication Year: 2000 , Page(s): 744 - 746
    Cited by:  Papers (5)

    This paper presents a new distance measure that is based on the spectral sensitivity of the line spectrum frequency (LSF) parameters and their Karhunen-Loeve (KL) transformed coefficients. It is shown that the proposed distance measure achieves better vector quantization (VQ) performance than other methods in the field of LSF coding. In most cases, the percentage of outliers is reduced with the new measure, compared to the best results obtained with other conventional weighting functions. The use of this distance as the weighting function of the LSF transformed parameters is also suggested.

  • Double-talk robust fast converging algorithms for network echo cancellation

    Publication Year: 2000 , Page(s): 656 - 663
    Cited by:  Papers (58)

    There is a need for echo cancelers for echo paths with long impulse responses (⩾64 ms). This in turn creates a need for more rapidly converging algorithms in order to meet the specifications for network echo cancelers. Faster convergence, however, in general implies a higher sensitivity to near-end disturbances, especially “double-talk.” Previously, a fast converging algorithm called the proportionate normalized least mean squares (PNLMS) algorithm has been proposed. This algorithm exploits the sparseness of the echo path and has the advantage that no detection of active coefficients is needed. In this paper we propose a method for making the PNLMS algorithm more robust against double-talk. The slower divergence rate of these algorithms in combination with a standard Geigel double-talk detector improves the performance of a network echo canceler considerably during double-talk. The principle is based on a scaled nonlinearity which is applied to the residual error signal. This results in the robust PNLMS algorithm, which diverges much more slowly than PNLMS and standard NLMS. The tradeoff between convergence and divergence rate is easily adjusted with one parameter, and the added complexity is about seven instructions per sample, which is less than 0.3% of the total load of a PNLMS algorithm with 512 filter coefficients. A generalization of the robust PNLMS algorithm to a robust proportionate affine projection algorithm (APA) is also presented. It converges very fast, and unlike PNLMS, is not as dependent on the assumption of a sparse echo path response. The complexity of the robust proportionate APA of order two is roughly the same as that of PNLMS.

  • Control of feedback in hearing aids-a robust filter design approach

    Publication Year: 2000 , Page(s): 754 - 756
    Cited by:  Papers (3)

    A bound on the variability of the feedback path is employed in the design of fixed FIR hearing aid filters that are robust to the specified variability, thus avoiding instability and howling in everyday use. A design example is presented for a linear gain hearing aid filter with a given maximal mismatch of the feedback cancellation filter.

  • Nonminimum-phase equalization and its subjective importance in room acoustics

    Publication Year: 2000 , Page(s): 728 - 737
    Cited by:  Papers (30)  |  Patents (6)

    This paper investigates the perceptual significance of residual phase distortion due to an approximate equalization of the nonminimum-phase room response from a sound source to a microphone in a reverberant room. It is shown that disrupted phase relationships introduced by a minimum-phase equalization filter may have a detrimental effect on perceived sound quality. The subjective assessment of phase distortion on speech signals is related to an objective error criterion, newly introduced in this paper. An alternative approach to the minimum-phase/all-pass decomposition based on iterative flattening of the room transfer function (RTF) magnitude is also presented, which overcomes potential numerical problems and provides more insight into subjective aspects of magnitude and phase equalization in the reduction of acoustic reverberation. Factors contributing to the results and practical implications for equalization are discussed.

  • Rapid speaker adaptation in eigenvoice space

    Publication Year: 2000 , Page(s): 695 - 707
    Cited by:  Papers (196)  |  Patents (7)

    This paper describes a new model-based speaker adaptation algorithm called the eigenvoice approach. The approach constrains the adapted model to be a linear combination of a small number of basis vectors obtained offline from a set of reference speakers, and thus greatly reduces the number of free parameters to be estimated from adaptation data. These “eigenvoice” basis vectors are orthogonal to each other and guaranteed to represent the most important components of variation between the reference speakers. Experimental results for a small-vocabulary task (letter recognition) given in the paper show that the approach yields major improvements in performance for tiny amounts of adaptation data. For instance, we obtained 16% relative improvement in error rate with one letter of supervised adaptation data, and 26% relative improvement with four letters of supervised adaptation data. After a comparison of the eigenvoice approach with other speaker adaptation algorithms, the paper concludes with a discussion of future work.


Aims & Scope

Covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

 

This Transactions ceased publication in 2005. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.
