Volume 7 Issue 3 • May 1999
-
Improved phase vocoder time-scale modification of audio
Publication Year: 1999, Page(s):323 - 332
Cited by: Papers (79) | Patents (36)
The phase vocoder is a well-established tool for time scaling and pitch shifting speech and audio signals via modification of their short-time Fourier transforms (STFTs). In contrast to time-domain time-scaling and pitch-shifting techniques, the phase vocoder is generally considered to yield high quality results, especially for large modification factors and/or polyphonic signals. However, the pha...
-
Semi-tied covariance matrices for hidden Markov models
Publication Year: 1999, Page(s):272 - 281
Cited by: Papers (247) | Patents (9)
There is normally a simple choice made in the form of the covariance matrix to be used with continuous-density HMMs. Either a diagonal covariance matrix is used, with the underlying assumption that elements of the feature vector are independent, or a full or block-diagonal matrix is used, where all or some of the correlations are explicitly modeled. Unfortunately, when using full or block-diagonal ...
-
Wavelet packet filterbanks for low time delay audio coding
Publication Year: 1999, Page(s):310 - 322
Cited by: Papers (34) | Patents (1)
We study the application of wavelet packet filterbanks to low bit-rate transparent audio coding, taking the audio coders' delay requirements into account, and propose low-delay coders based on wavelet packet filterbanks. We first develop a method of comparison between filterbanks for perceptual audio coding by estimating the necessary bit-rate for a transparent compression. We use this comparison ...
-
Performance of an HMM speech recognizer using a real-time tracking microphone array as input
Publication Year: 1999, Page(s):346 - 349
Cited by: Papers (17)
This correspondence reports results for a tracking, real-time microphone array as an input to a hidden Markov model based (HMM-based) connected alpha-digits speech recognizer. For a talker in the near field of the array (within 0.5 m), performance approaches that of a close-talking microphone input device.
-
Adaptive microphone array employing calibration signals: an analytical evaluation
Publication Year: 1999, Page(s):241 - 252
Cited by: Papers (47) | Patents (4)
This paper gives an analytical description of an adaptive microphone array that facilitates a simple built-in calibration to the environment and instrumentation. This method, suggested for use in hands-free mobile telephones and speech recognition systems for cars, provides speech enhancement and acoustic echo-cancellation. The scheme offers several advantages, such as a simple calibration procedu...
-
A discriminative training algorithm for VQ-based speaker identification
Publication Year: 1999, Page(s):353 - 356
Cited by: Papers (23)
A novel method, referred to as group vector quantization (GVQ), is proposed to train VQ codebooks for closed-set speaker identification. In GVQ training, speaker codebooks are optimized for vector groups rather than for individual vectors. An evaluation experiment has been conducted to compare the codebooks trained by the Linde-Buzo-Gray (LBG), the learning vector quantization (LVQ), and the GVQ a...
-
Segmental modeling using a continuous mixture of nonparametric models
Publication Year: 1999, Page(s):262 - 271
Cited by: Papers (10)
A major limitation of hidden Markov model (HMM) based automatic speech recognition is the inherent assumption that successive observations within a state are independent and identically distributed (i.i.d.). The i.i.d. assumption is reasonable for some of the states (e.g., a state that corresponds to a steady-state vowel). However, most states clearly violate this assumption (e.g., states correspo...
-
Improved speech recognition using a subspace projection approach
Publication Year: 1999, Page(s):343 - 345
Cited by: Papers (4)
Two class separability criteria based on the divergence measure are proposed to improve speech recognition performance. The average and weighted average divergence measures are used as criteria for finding a transformation matrix which maps the original features into a more discriminative subspace. Results are presented for a highly confusable task.
-
A dynamical system model for generating fundamental frequency for speech synthesis
Publication Year: 1999, Page(s):295 - 309
Cited by: Papers (24) | Patents (5)
Higher quality speech synthesis is required for widespread use of text-to-speech (TTS) technology, and prosody is one component of synthesis technology with the greatest need for improvement. This paper describes a new approach to generation of two important cues to prosodic patterns, fundamental frequency (F0) and energy contours, given symbolic prosodic labels and text. Specifically, th...
-
Cepstrum-based pitch detection using a new statistical V/UV classification algorithm
Publication Year: 1999, Page(s):333 - 338
Cited by: Papers (85) | Patents (4)
An improved cepstrum-based voicing detection and pitch determination algorithm is presented. Voicing decisions are made using a multifeature voiced/unvoiced classification algorithm based on statistical analysis of cepstral peak, zero-crossing rate, and energy of short-time segments of the speech signal. Pitch frequency information is extracted by a modified cepstrum-based method and then carefull...
-
Partitioning the feature space of a classifier with linear hyperplanes
Publication Year: 1999, Page(s):282 - 288
Cited by: Papers (6) | Patents (1)
We describe the design and use of linear hyperplanes to partition the feature space of a classifier. The objective of the partitioning is to minimize the average entropy of the class distribution in the final partitions. The hyperplanes are characterized by a vector ν_n and scalar h_n, which are computed with the objective of maximizing the mutual information associated wit...
-
Understanding speech recognition using correlation-generated neural network targets
Publication Year: 1999, Page(s):350 - 352
Cited by: Papers (3)
Training neural networks with variable targets for speech recognition systems has been shown to be effective in improving word accuracy. In this correspondence, a new and simple method for estimating variable targets for a given training pattern is presented. It uses estimated correlations between different output nodes of a neural network to create a set of variable targets for each training patt...
-
Online adaptation of hidden Markov models using incremental estimation algorithms
Publication Year: 1999, Page(s):253 - 261
Cited by: Papers (25) | Patents (1)
The mismatch that frequently occurs between the training and testing conditions of an automatic speech recognizer can be efficiently reduced by adapting the parameters of the recognizer to the testing conditions. Two measures that characterize the performance of an adaptation algorithm are the speed with which it adapts to the new conditions, and its computational complexity, which is important fo...
-
Equalization of speech and audio signals using a nonlinear dynamical approach
Publication Year: 1999, Page(s):356 - 360
Cited by: Papers (15)
We present the minimum phase space volume (MPSV) technique, a nonlinear dynamical technique for enhancing speech and audio signals corrupted by convolutional noise. The MPSV technique requires no assumptions or a priori information about the original signal, remains effective when the inverse filter order is overestimated, and significantly outperforms the LS method.
-
A transcription-based approach to determine the difficulty of a speech recognition task
Publication Year: 1999, Page(s):339 - 342
A new parameter for estimating the difficulty of a continuous speech recognition task, called speech decoding difficulty, is presented. It is obtained from the language model defined for the recognition task and the phonetic similarity between the transcriptions of the words that make up the vocabulary used. Two variants of the proposed task difficulty measure are introduced: ideal speech decoding...
-
A comparison of speaker identification results using features based on cepstrum and Fourier-Bessel expansion
Publication Year: 1999, Page(s):289 - 294
Cited by: Papers (29)
A compact representation of speech is possible using Bessel functions because of the similarity between voiced speech and the Bessel functions. Both voiced speech and the Bessel functions exhibit quasiperiodicity and decaying amplitude with time. This paper presents the results of speaker identification experiments using features obtained from (1) the Fourier-Bessel expansion and (2) the cepstral ...
Aims & Scope
Covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.
This Transactions ceased publication in 2005. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.