
Automatic Speech Recognition and Understanding, 2001. ASRU '01. IEEE Workshop on

9-13 Dec. 2001

  • 2001 IEEE Workshop on Automatic Speech Recognition and Understanding. ASRU 2001. Conference Proceedings (Cat. No.01EX544)

    Publication Year: 2001
  • Markovian combination of language and prosodic models for better speech understanding and recognition

    Publication Year: 2001

    Summary form only given. Traditionally, "language" models capture only the word sequences of a language. A crucial component of spoken language, however, is its prosody, i.e., rhythmic and melodic properties. This paper summarizes recent work on integrated, computationally efficient modeling of word sequences and prosodic properties of speech, for a variety of speech recognition and understanding t...
  • Evaluating dialogue strategies and user behavior

    Publication Year: 2001

    Summary form only given. The need for accurate and flexible evaluation frameworks for spoken and multimodal dialogue systems has become crucial. In the early design phases of spoken dialogue systems, it is worthwhile evaluating the user's ease in interacting with different dialogue strategies, rather than the efficiency of the dialogue system in providing the required information. The success ...
  • Automatic accent identification using Gaussian mixture models

    Publication Year: 2001, Page(s):343 - 346
    Cited by:  Papers (11)  |  Patents (1)

    It is well known that speaker variability caused by accent is an important factor in speech recognition. Some major accents in China are so different as to make this problem very severe. We propose a Gaussian mixture model (GMM) based Mandarin accent identification method. In this method a number of GMMs are trained to identify the most likely accent given test utterances. The identified accent ty...
  • Finite-state transducers for speech-input translation

    Publication Year: 2001, Page(s):375 - 380
    Cited by:  Papers (7)

    Nowadays, hidden Markov models (HMMs) and n-grams are the basic components of the most successful speech recognition systems. In such systems, HMMs (the acoustic models) are integrated into an n-gram or a stochastic finite-state grammar (the language model). Similar models can be used for speech translation, and HMMs (the acoustic models) can be integrated into a finite-state transducer (the transl...
  • Author index

    Publication Year: 2001, Page(s):467 - 468
  • Multimodal browsing

    Publication Year: 2001, Page(s):272 - 275

    With the increasing development of devices such as personal computers, WAP enabled wireless telephones and personal digital assistants connected to the World Wide Web, end users feel the need to browse the Internet through multiple modalities. We intend to investigate how to create a user interface and a service distribution platform granting the user access to the Internet through standard I/O...
  • Example-based query generation for spontaneous speech

    Publication Year: 2001, Page(s):268 - 271
    Cited by:  Papers (1)  |  Patents (3)

    This paper proposes a new query generation method that is based on examples of human-to-human dialogue. Along with modeling the information flow in dialogue, a system for information retrieval in-car has been designed. The system refers to the dialogue corpus to find an example that is similar to input speech, and makes a query from the example. We also give the experimental results to show the ef...
  • Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval

    Publication Year: 2001, Page(s):311 - 314
    Cited by:  Papers (8)  |  Patents (44)

    We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, which are mapped into Mandarin syllables b...
  • An open concept metric for assessing dialog system complexity

    Publication Year: 2001, Page(s):264 - 267

    Techniques for assessing dialog system performance commonly focus on characteristics of the interaction, using metrics such as completion, satisfaction or time on task. However, such metrics are not always capable of differentiating systems that operate on fundamentally different principles, particularly when tested on tasks that focus on common-denominator capabilities. We introduce a new metric,...
  • Unsupervised training of acoustic models for large vocabulary continuous speech recognition

    Publication Year: 2001, Page(s):307 - 310
    Cited by:  Papers (13)

    For speech recognition systems, the amount of acoustic training data is of crucial importance. In the past, large amounts of speech were recorded and transcribed manually for training. Since untranscribed speech is available in various forms these days, the unsupervised training of a speech recognizer on recognized transcriptions is studied. A low-cost recognizer trained with only one hour of manu...
  • Dynamic sharings of Gaussian densities using phonetic features

    Publication Year: 2001, Page(s):425 - 428

    This paper describes a way to adapt the recognizer to pronunciation variability by dynamically sharing Gaussian densities across phonetic models. The method is divided into three steps. First, given an input utterance, an HMM recognizer outputs a lattice of the most likely word hypotheses. Then, the canonical pronunciation of each hypothesis is checked by comparing its theoretical phonetic features ...
  • Trend tying in the segmental-feature HMM

    Publication Year: 2001, Page(s):45 - 48

    We present a reduction method for the number of parameters in a segmental-feature HMM (SFHMM). Although the SFHMM shows better results than the CHMM, its number of parameters is greater than that of the CHMM. Therefore, there is a need for a new approach that reduces the number of parameters. In general, the trajectory can be separated by the trend and location. Since the trend means the variation of se...
  • Investigating stochastic speech understanding

    Publication Year: 2001, Page(s):260 - 263
    Cited by:  Papers (1)

    The need for human expertise in the development of a speech understanding system can be greatly reduced by the use of stochastic techniques. However, corpus-based techniques require the annotation of large amounts of training data. Manual semantic annotation of such corpora is tedious, expensive, and subject to inconsistencies. This work investigates the influence of the training corpus size on the...
  • Incremental language models for speech recognition using finite-state transducers

    Publication Year: 2001, Page(s):194 - 197
    Cited by:  Papers (12)  |  Patents (2)

    In the context of the weighted finite-state transducer approach to speech recognition, we investigate a novel decoding strategy to deal with very large n-gram language models often used in large-vocabulary systems. In particular, we present an alternative to full, static expansion and optimization of the finite-state transducer network. This alternative is useful when the individual knowledge sour...
  • ETUDE, a recursive dialog manager with embedded user interface patterns

    Publication Year: 2001, Page(s):244 - 247
    Cited by:  Papers (1)  |  Patents (6)

    We describe ETUDE, a dialog manager that supports recursive descriptions of the dialog flow in spoken dialog applications. We also introduce the notion of user interface patterns, i.e. those dialog patterns that are frequently used in applications. We then describe how these patterns can be built into the dialog manager engine in order to facilitate the design and development of complex applicatio...
  • Piecewise-linear transformation-based HMM adaptation for noisy speech

    Publication Year: 2001, Page(s):159 - 162
    Cited by:  Papers (1)  |  Patents (2)

    This paper proposes a new method using a piecewise-linear transformation for adapting phone HMM to noisy speech. Various noises are clustered according to their acoustic properties and signal-to-noise ratios (SNR), and a noisy speech HMM corresponding to each clustered noise is made. Based on the likelihood maximization criterion, the HMM which best matches the input speech is selected and further...
  • Multispeaker speech activity detection for the ICSI meeting recorder

    Publication Year: 2001, Page(s):107 - 110
    Cited by:  Papers (22)  |  Patents (18)

    As part of a project into speech recognition in meeting environments, we have collected a corpus of multichannel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in cha...
  • Robust speech recognition using wavelet coefficient features

    Publication Year: 2001, Page(s):445 - 448
    Cited by:  Papers (13)

    We propose a new vein of feature vectors for robust speech recognition that use denoised wavelet coefficients; greater robustness to unexpected additive noise or spectrum distortions begins with more robust acoustic features. The use of wavelet coefficients is motivated by human acoustic process modelling and by the ability of wavelet coefficients to capture important time and frequency features. ...
  • The ALERT system: advanced broadcast speech recognition technology for selective dissemination of multimedia information

    Publication Year: 2001, Page(s):301 - 306
    Cited by:  Papers (3)

    This paper presents a brief description of the ALERT system, which is under development by a consortium working on a research project sponsored by the European Commission. The ALERT system uses advanced speech recognition technology and video processing techniques in order to process large broadcast speech archives and multimedia information resources for the purpose of extracting specific informa...
  • MLLR adaptation techniques for pronunciation modeling

    Publication Year: 2001, Page(s):421 - 424

    Multiple regression class MLLR (maximum likelihood linear regression) transforms are investigated for use with pronunciation models that predict variation in the observed pronunciations given the phonetic context. Regression classes can be constructed so that MLLR transforms can be estimated and used to model specific acoustic changes associated with pronunciation variation. The effectiveness of t...
  • Continuous multi-band speech recognition using Bayesian networks

    Publication Year: 2001, Page(s):41 - 44
    Cited by:  Papers (3)

    Using the Bayesian networks framework, we present a new multi-band approach for continuous speech recognition. This new approach has the advantage of overcoming all the limitations of the standard multi-band techniques. Moreover, it leads to a higher fidelity speech modeling than HMMs. We provide a preliminary evaluation of the performance of our new approach on a connected digits recognition task...
  • Speech data retrieval system constructed on a universal phonetic code domain

    Publication Year: 2001, Page(s):323 - 326
    Cited by:  Papers (2)

    We propose a novel speech processing framework, where all of the speech data are encoded into universal phonetic code (UPC) sequences and speech processing systems, such as speech recognition, retrieval, digesting, etc., are constructed on this UPC domain. As the first step, we introduce a sub-phonetic segment (SPS) set, based on IPA (international phonetic alphabet), to deal with multilingual spe...
  • Robust speaker clustering in eigenspace

    Publication Year: 2001, Page(s):57 - 60
    Cited by:  Papers (8)

    We propose a speaker clustering scheme working in 'eigenspace'. Speaker models are transformed to a low-dimensional subspace using 'eigenvoices'. For the speaker clustering procedure, simple distance measures, e.g. Euclidean distance, can be applied. Moreover, clustering can be accomplished with base models (for eigenvoice projection) like Gaussian mixture models as well as conventional HMMs. In c...
  • MATCH: multimodal access to city help

    Publication Year: 2001, Page(s):256 - 259
    Cited by:  Papers (2)  |  Patents (7)

    Interfaces to mobile information access devices need to allow users to interact using whichever mode or combination of modes are most appropriate, given their user preference, task at hand, and physical and social environment. This paper describes a multimodal application architecture which facilitates rapid prototyping of flexible next-generation multimodal interfaces. Our sample application MATC...