Automatic Speech Recognition and Understanding, 2001. ASRU '01. IEEE Workshop on

9-13 Dec. 2001

  • 2001 IEEE Workshop on Automatic Speech Recognition and Understanding. ASRU 2001. Conference Proceedings (Cat. No.01EX544)

    Publication Year: 2001
    Freely Available from IEEE
  • Markovian combination of language and prosodic models for better speech understanding and recognition

    Publication Year: 2001

    Summary form only given. Traditionally, "language" models capture only the word sequences of a language. A crucial component of spoken language, however, is its prosody, i.e., rhythmic and melodic properties. This paper summarizes recent work on integrated, computationally efficient modeling of word sequences and prosodic properties of speech, for a variety of speech recognition and understanding t...
  • Evaluating dialogue strategies and user behavior

    Publication Year: 2001

    Summary form only given. The need for accurate and flexible evaluation frameworks for spoken and multimodal dialogue systems has become crucial. In the early design phases of spoken dialogue systems, it is worthwhile evaluating the user's easiness in interacting with different dialogue strategies, rather than the efficiency of the dialogue system in providing the required information. The success ...
  • Automatic accent identification using Gaussian mixture models

    Publication Year: 2001, Page(s):343 - 346
    Cited by: Papers (11) | Patents (1)

    It is well known that speaker variability caused by accent is an important factor in speech recognition. Some major accents in China are so different as to make this problem very severe. We propose a Gaussian mixture model (GMM) based Mandarin accent identification method. In this method a number of GMMs are trained to identify the most likely accent given test utterances. The identified accent ty...
  • Finite-state transducers for speech-input translation

    Publication Year: 2001, Page(s):375 - 380
    Cited by: Papers (7)

    Nowadays, hidden Markov models (HMMs) and n-grams are the basic components of the most successful speech recognition systems. In such systems, HMMs (the acoustic models) are integrated into an n-gram or a stochastic finite-state grammar (the language model). Similar models can be used for speech translation, and HMMs (the acoustic models) can be integrated into a finite-state transducer (the transl...
  • Author index

    Publication Year: 2001, Page(s):467 - 468
    Freely Available from IEEE
  • Brancusi, neo-plasticism, and the art of designing speech-recognition applications

    Publication Year: 2001, Page(s):9 - 14

    Designing over-the-phone speech-recognition systems requires that designers have a design methodology and philosophy that enables them to understand how to research, design, evaluate and re-design their application.
  • Phoneme-to-grapheme conversion for out-of-vocabulary words in large vocabulary speech recognition

    Publication Year: 2001, Page(s):413 - 416
    Cited by: Papers (2)

    We describe a method to enhance the readability of the textual output in a large vocabulary continuous speech recognition system when out-of-vocabulary words occur. The basic idea is to replace uncertain words in the transcriptions with a phoneme recognition result that is post-processed using a phoneme-to-grapheme converter. This converter turns phoneme strings into grapheme strings and is traine...
  • VoiceXML 2.0 and the W3C speech interface framework

    Publication Year: 2001, Page(s):5 - 8
    Cited by: Papers (1) | Patents (4)

    The World Wide Web Voice Browser Working Group has released specifications for four integrated languages for developing speech applications: VoiceXML 2.0, Speech Synthesis Markup Language, Speech Recognition Grammar Markup Language, and Semantic Interpretation. These languages enable developers to quickly specify conversational speech Web applications that can be accessed by any telephone or cell p...
  • Dialogue management in the Talk'n'Travel system

    Publication Year: 2001, Page(s):235 - 239
    Cited by: Papers (1)

    A central problem for mixed-initiative dialogue management is coping with user utterances that fall outside of the expected sequence of dialogue. Independent initiative by the user may require a complete revision of the future course of the dialogue, even when the system is engaged in activities of its own, such as querying a database, etc. This paper presents an event-driven, goal-based dialogue ...
  • Incremental language models for speech recognition using finite-state transducers

    Publication Year: 2001, Page(s):194 - 197
    Cited by: Papers (12) | Patents (2)

    In the context of the weighted finite-state transducer approach to speech recognition, we investigate a novel decoding strategy to deal with very large n-gram language models often used in large-vocabulary systems. In particular, we present an alternative to full, static expansion and optimization of the finite-state transducer network. This alternative is useful when the individual knowledge sour...
  • Acoustic analysis and recognition of whispered speech

    Publication Year: 2001, Page(s):429 - 432
    Cited by: Papers (3)

    The acoustic properties and a recognition method of whispered speech are discussed. A whispered speech database that consists of whispered speech, normal speech and the corresponding facial video images of more than 6,000 sentences from 100 speakers was prepared. The comparison between whispered and normal utterances shows that: 1) the cepstrum distance between them is 4 dB for voiced and 2 dB for ...
  • State synchronous modeling of audio-visual information for bi-modal speech recognition

    Publication Year: 2001, Page(s):409 - 412
    Cited by: Papers (3)

    There has been a higher demand recently for automatic speech recognition (ASR) systems able to operate robustly in acoustically noisy environments. This paper proposes a method to integrate audio and visual information effectively in audio-visual (bi-modal) ASR systems. Such integration inevitably necessitates modeling of the synchronization of the audio and visual information. To address the time...
  • The symbiosis of DSP and speech recognition or an outsider's view of the inside

    Publication Year: 2001, Page(s):1 - 4

    From an historical review of how we got to where we are now, we discuss the interrelationship between our system design objectives and goals, our modeling of the speech signal and its generation and parameterization, and the broadly developing DSP methodology. We take a critical look at some of the underlying assumptions in our modeling to see if they may be limiting the performance that can be o...
  • A finite-state approach to machine translation

    Publication Year: 2001, Page(s):381 - 388
    Cited by: Papers (4)

    The problem of machine translation can be viewed as consisting of two subproblems: (a) lexical selection; (b) lexical reordering. We propose stochastic finite-state models for these two subproblems. Stochastic finite-state models are efficiently learnable from data, effective for decoding and are associated with a calculus for composing models which allows for tight integration of constraints ...
  • Improvement of non-negative matrix factorization based language model using exponential models

    Publication Year: 2001, Page(s):190 - 193

    This paper describes the use of exponential models to improve non-negative matrix factorization (NMF) based topic language models for automatic speech recognition. This modeling technique borrows the basic idea from latent semantic analysis (LSA), which is typically used in information retrieval. An improvement was achieved when exponential models were used to estimate the a posteriori topic proba...
  • Dynamic sharings of Gaussian densities using phonetic features

    Publication Year: 2001, Page(s):425 - 428

    This paper describes a way to adapt the recognizer to pronunciation variability by dynamically sharing Gaussian densities across phonetic models. The method is divided into three steps. First, given an input utterance, an HMM recognizer outputs a lattice of the most likely word hypotheses. Then, the canonical pronunciation of each hypothesis is checked by comparing its theoretical phonetic features ...
  • Speech interfaces for mobile communications

    Publication Year: 2001, Page(s):93 - 95

    This paper explains speech interfaces for mobile communication. Mobile interfaces have three important design rules: do not disturb the user's main task, work within the restrictions of user's ability, and minimize the resource requirements. Social acceptance is also important. In Japan, trial and regular services with speech interfaces in mobile environments have already been launched, but they a...
  • Example-based query generation for spontaneous speech

    Publication Year: 2001, Page(s):268 - 271
    Cited by: Papers (1) | Patents (3)

    This paper proposes a new query generation method that is based on examples of human-to-human dialogue. Along with modeling the information flow in dialogue, a system for information retrieval in-car has been designed. The system refers to the dialogue corpus to find an example that is similar to input speech, and makes a query from the example. We also give the experimental results to show the ef...
  • High performance telephone bandwidth speaker independent continuous digit recognition

    Publication Year: 2001, Page(s):405 - 408
    Cited by: Papers (3)

    The development of a high-performance telephone-bandwidth speaker independent connected digit recognizer for Italian is described. The CSLU Speech Toolkit was used to develop and implement the hybrid ANN/HMM system, which is trained on context-dependent categories to account for coarticulatory variation. Various front-end processing and system architectures were compared and, when the best feature...
  • Acoustic factorisation

    Publication Year: 2001, Page(s):77 - 80
    Cited by: Papers (24)

    This paper describes a new technique for training a speech recognition system on inhomogeneous training data. The proposed technique, acoustic factorisation, attempts to model explicitly all the factors that affect the acoustic signal. By explicitly modelling all the factors, the trained model set may be used in a more flexible fashion than in standard adaptive training schemes. Since an individua...
  • n-gram and decision tree based language identification for written words

    Publication Year: 2001, Page(s):335 - 338
    Cited by: Papers (8) | Patents (8)

    As the demand for multilingual speech recognizers increases, the development of systems which combine automatic language identification, language-specific pronunciation modeling and language-independent acoustic models becomes increasingly important. When the recognition grammar is dynamic and obtained directly from written text, the language associated with each grammar item has to be identified ...
  • Semantic modeling for dialog systems in a pattern recognition framework

    Publication Year: 2001, Page(s):284 - 287
    Cited by: Patents (16)

    In this paper, we describe a multimodal dialog system based on the pattern recognition framework that has been successfully applied to automatic speech recognition. We treat the dialog problem as recognizing the optimal action based on the user's input and context. Analogous to the acoustic, pronunciation, and language models for speech recognition, the dialog system in this framework has languag...
  • Robust speaker clustering in eigenspace

    Publication Year: 2001, Page(s):57 - 60
    Cited by: Papers (8)

    We propose a speaker clustering scheme working in 'eigenspace'. Speaker models are transformed to a low-dimensional subspace using 'eigenvoices'. For the speaker clustering procedure, simple distance measures, e.g. Euclidean distance, can be applied. Moreover, clustering can be accomplished with base models (for eigenvoice projection) like Gaussian mixture models as well as conventional HMMs. In c...
  • MATCH: multimodal access to city help

    Publication Year: 2001, Page(s):256 - 259
    Cited by: Papers (2) | Patents (7)

    Interfaces to mobile information access devices need to allow users to interact using whichever mode or combination of modes is most appropriate, given their user preference, task at hand, and physical and social environment. This paper describes a multimodal application architecture which facilitates rapid prototyping of flexible next-generation multimodal interfaces. Our sample application MATC...