1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings

17-17 Dec. 1997

  • 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings

    Publication Year: 1997
  • Experiments of hands-free connected digit recognition using a microphone array

    Publication Year: 1997, Page(s):490 - 497

    A scenario concerning hands-free connected digit recognition in a noisy office environment is investigated. An array of six omnidirectional microphones and a corresponding time delay compensation module are used to provide a beamformed signal as input to a hidden Markov model (HMM) based recognizer. Phone HMM adaptation is used to further reduce the mismatch between training and test conditions. B...
  • Author index

    Publication Year: 1997, Page(s):631 - 633
  • Flexible human speech recognition

    Publication Year: 1997, Page(s):273 - 283
    Cited by:  Papers (5)

    In describing human performance in sound perception, in word recognition, in speech understanding and in dialogue handling, we generally test human limits under controlled conditions and try to understand the underlying mechanisms. However, the human system itself has already been built by nature. In speech and language technology, we would like to equal, or perhaps even outrank, human performance...
  • Frame-synchronous adaptation of cepstrum by linear regression

    Publication Year: 1997, Page(s):420 - 427
    Cited by:  Papers (2)  |  Patents (1)

    We propose using hidden Markov models (HMMs) associated with the cepstrum coefficients to remove disturbances that degrade the speech recognition process. In order to perform this task in an online manner, we use the MUltipath Stochastic Equalization (MUSE) framework. This method allows one to process data at the frame level. Two equalization functions are examined: bias removal and linear regress...
  • A statistical language modeling approach integrating local and global constraints

    Publication Year: 1997, Page(s):262 - 269
    Cited by:  Papers (1)  |  Patents (8)

    A new framework is proposed to integrate the various constraints, both local and global, that are present in language. Local constraints are captured via n-gram language modeling, while global constraints are taken into account through the use of latent semantic analysis. An integrative formulation is derived for the combination of these two paradigms, resulting in several families of multi-span l...
  • Adaptive compensation for robust speech recognition

    Publication Year: 1997, Page(s):357 - 364
    Cited by:  Papers (3)  |  Patents (1)

    Adaptation and compensation are two commonly adopted strategies to improve the robustness of speech recognition systems, especially in those cases when the testing data do not resemble the training data. In many ways, adaptation and compensation share similar goals and should be considered as a unified strategy for robust speech recognition. In this paper, we discuss adaptive compensation in which...
  • Combined optimisation of baseforms and model parameters in speech recognition based on acoustic subword units

    Publication Year: 1997, Page(s):199 - 206
    Cited by:  Papers (2)  |  Patents (1)

    A major challenge in speech recognition is creating a lexicon which is robust to inter- and intra-speaker variations. This is even more so in speech recognisers based on non-linguistic units, e.g., acoustic subword units (ASWUs), since no standard pronunciation dictionaries are available. Thus the baseforms describing the vocabulary words in terms of the recognition units need to be generated from ...
  • Adaptation of HMMs in the presence of additive and convolutional noise

    Publication Year: 1997, Page(s):412 - 419
    Cited by:  Patents (5)

    The performance of speech recognizers deteriorates in the case of a mismatch between the conditions during training and recognition. One difference is the presence of a stationary background noise during recognition which is also referred to as additive noise. Furthermore the recognition is influenced by the frequency response of the whole transmission channel from the speaker to the audio input o...
  • Analyzing and predicting language model improvements

    Publication Year: 1997, Page(s):254 - 261
    Cited by:  Papers (2)  |  Patents (2)

    Statistical n-gram language models are traditionally developed using perplexity as a measure of goodness. However, perplexity often demonstrates a poor correlation with recognition improvements, mainly because it fails to account for the acoustic confusability between words and for search errors in a recognizer. In this paper, we study alternatives to perplexity for predicting language model perfo...
  • A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)

    Publication Year: 1997, Page(s):347 - 354
    Cited by:  Papers (246)  |  Patents (29)

    Describes a system developed at NIST to produce a composite automatic speech recognition (ASR) system output when the outputs of multiple ASR systems are available, and for which, in many cases, the composite ASR output has a lower error rate than any of the individual systems. The system implements a “voting” or rescoring process to reconcile differences in ASR system outputs. We refe...
  • Phone-context specific gender-dependent acoustic-models for continuous speech recognition

    Publication Year: 1997, Page(s):192 - 198
    Cited by:  Papers (7)

    Gender dependent systems are usually created by splitting the training data into each gender and building two separate acoustic models for each gender. This method assumes that every state of a subphonetic model is uniformly dependent on the gender. We use the premise that the acoustic realizations of various subphonetic units are dependent on gender in varying degrees across phones and more part...
  • Stressed speech recognition using multi-dimensional hidden Markov models

    Publication Year: 1997, Page(s):404 - 411
    Cited by:  Papers (1)

    Robust speech recognition systems must address variations due to perceptually induced stress in order to maintain acceptable levels of performance in adverse conditions. This study proposes a new approach which combines stress classification and speech recognition into one algorithm. This is accomplished by generalizing the one-dimensional hidden Markov model to a multi-dimensional hidden Markov m...
  • Language modeling for robust balancing of acoustic and linguistic probabilities

    Publication Year: 1997, Page(s):246 - 253

    The length of a word sequence is not taken into account under language modeling in n-gram local probability modeling. Due to this property, the optimal value of the language weight for balancing acoustic and linguistic probabilities is affected by the sequence length. To deal with this problem, a new language model is developed based on the Bernoulli trial model. By taking the sequence length into...
  • Stream derivation and clustering scheme for subspace distribution clustering hidden Markov model

    Publication Year: 1997, Page(s):339 - 346
    Cited by:  Papers (5)  |  Patents (1)

    Bocchieri and Mak (Proc. Eurospeech, vol. 1, p. 107-10, 1997) introduced a novel subspace distribution clustering hidden Markov model (SDCHMM) as an approximation to a continuous-density HMM (CDHMM). Deriving SDCHMMs from CDHMMs requires a definition of multiple streams and a Gaussian clustering scheme. Previously, we have tried 4 and 13 streams, which are common but ad hoc choices. In this paper,...
  • Progress towards speech models that model speech

    Publication Year: 1997, Page(s):115 - 123

    This paper presents a personal view of recent advances in automatic speech recognition. The analysis is concerned with progress in speech pattern modelling, rather than recogniser performance. Despite the limitations of current approaches, it is argued that extension and development of these techniques provides a viable way forward. It is further suggested that the significance of a number of rece...
  • Voice input for collaborative systems

    Publication Year: 1997, Page(s):19 - 25

    Collaborative decision making is a central part of command and control, logistics, planning, and many other applications where multiple people work together to solve a common problem and need to be able to view the same information. This paper focuses on issues in bringing voice input to collaborative environments, ranging from practical considerations of how we can incorporate speech input today,...
  • A fast segmental clustering approach to decision tree tying based acoustic modeling

    Publication Year: 1997, Page(s):185 - 191

    A fast two level segmental clustering approach to decision tree based state tying is proposed for large vocabulary speech recognition. This approach extends the conventional segmental K-means approach to phonetic decision tree tying based acoustic modeling. It achieves high recognition performances while reducing the model training time from days to hours, compared to approaches based on increment...
  • Jacobian adaptation of noisy speech models

    Publication Year: 1997, Page(s):396 - 403
    Cited by:  Papers (9)  |  Patents (1)

    A Jacobian approach to fast adaptation of acoustic models is described. Acoustic models of speech under assumed noise and channel condition A are compensated by Jacobian matrices with the difference between condition A and actual condition B. Compared with existing model composition approaches for noisy speech recognition, this approach drastically reduces the computational cost while providing eq...
  • Discriminative model combination

    Publication Year: 1997, Page(s):238 - 245
    Cited by:  Papers (8)  |  Patents (1)

    Discriminative model combination is a new approach in the field of automatic speech recognition. It aims at an optimal integration of all given (acoustic and language) models into one log-linear posterior probability distribution. As opposed to the maximum entropy approach, the coefficients of the log-linear combination are estimated on training samples using discriminative methods to obtain an op...
  • Variable threshold vector quantization for reduced continuous density likelihood computation in speech recognition

    Publication Year: 1997, Page(s):331 - 338
    Cited by:  Papers (2)

    Vector quantization (VQ) has been explored in the past as a means of achieving reductions in likelihood computation for hidden Markov models (HMMs) which use Gaussian mixtures for their output densities. In this paper, we present a new method for choosing which mixtures can be discarded for each pair of HMM state and vector quantization index. Traditionally, a global threshold was used to specify ...
  • A dynamic, feature-based approach to speech modeling and recognition

    Publication Year: 1997, Page(s):107 - 114
    Cited by:  Papers (1)

    An overview of a statistical paradigm for speech recognition is given where phonetic and phonological knowledge sources are seamlessly integrated into the structure of a speech model. A unifying computational formalism is outlined in which the sub-models for the discrete, feature-based phonological and the continuous, dynamic phonetic processes in human speech production are computationally interf...
  • Using expanded question sets in decision tree clustering for acoustic modelling

    Publication Year: 1997, Page(s):179 - 184
    Cited by:  Patents (1)

    Phone-like units (PLUs) for automatic speech recognition are derived using a decision tree algorithm. In our approach we use information such as target phone label, immediate context, lexical stress level and function word affiliation in the decision tree analysis. The resulting PLUs are shown to improve phone and word recognition.
  • Synergistic modalities for human/machine communication

    Publication Year: 1997, Page(s):1 - 8
    Cited by:  Papers (1)

    Natural communication with machines is a crucial factor in bringing the benefits of networked computers to mass markets. In particular, the sensory dimensions of sight, sound and touch are comfortable and convenient modalities for the human user. New technologies are now emerging in these domains that can support human/machine communication with features that emulate face-to-face interaction. A cu...
  • Application of sequential estimation to time-varying environment compensation [in speech recognition]

    Publication Year: 1997, Page(s):389 - 395
    Cited by:  Papers (2)

    Sequential approaches are proposed to compensate for the effects of a nonstationary environment. Unlike batch approaches, the proposed methods derive a different parameter estimate each time using the sequential expectation maximization (EM) algorithm. Moreover, we also propose a forward-backward estimation scheme as an improvement of the sequential parameter estimation.