30 Nov.-4 Dec. 2003
-
2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721)
Publication Year: 2003
-
Technical reviewers
Publication Year: 2003, Page(s): iv
-
Maximum entropy direct models for speech recognition
Publication Year: 2003, Page(s):1 - 6
Cited by: Papers (9) | Patents (1)
Traditional statistical models for speech recognition have all been based on a Bayesian framework using generative models such as hidden Markov models (HMMs). The paper focuses on a new framework for speech recognition using maximum entropy direct modeling, where the probability of a state or word sequence given an observation sequence is computed directly from the model. In contrast to HMMs, feat...
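The direct-modeling idea the abstract describes, P(label sequence | observations) computed from a log-linear model rather than a generative one, can be sketched for a single decision. This is a toy illustration, not the paper's model: the feature functions and weights below are invented.

```python
import math

# Log-linear (maximum entropy) direct model sketch:
#   P(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x)
# Feature templates and weights are invented for illustration.

def features(x, y):
    # toy indicator features on (observation, label) pairs
    return {f"obs={x}&lab={y}": 1.0, f"lab={y}": 1.0}

def score(weights, x, y):
    return sum(weights.get(k, 0.0) * v for k, v in features(x, y).items())

def posterior(weights, x, labels):
    scores = {y: math.exp(score(weights, x, y)) for y in labels}
    z = sum(scores.values())  # partition function Z(x)
    return {y: s / z for y, s in scores.items()}

weights = {"obs=a&lab=A": 2.0, "lab=B": 0.5}
p = posterior(weights, "a", ["A", "B"])
print(p)  # normalized posterior over labels given observation "a"
```

Unlike an HMM, nothing here models how the observation was generated; the conditional probability is computed directly from weighted features of the (observation, label) pair.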
-
Design of fast LVCSR systems
Publication Year: 2003, Page(s):7 - 12
Cited by: Papers (10)
The paper describes the development of fast (less than 10 times real-time) large vocabulary continuous speech recognition (LVCSR) systems based on technology developed for unlimited runtime systems assembled for participation in recent DARPA/NIST LVCSR evaluations. A general system structure for 10 times real-time systems is proposed and two specific systems that have been built for broadcast news...
-
Support vector machines for segmental minimum Bayes risk decoding of continuous speech
Publication Year: 2003, Page(s):13 - 18
Cited by: Papers (10)
Segmental minimum Bayes risk (SMBR) decoding involves the refinement of the search space into sequences of small sets of confusable words. We describe the application of support vector machines (SVMs) as discriminative models for the refined search spaces. We show that SVMs, which in their basic formulation are binary classifiers of fixed dimensional observations, can be used for continuous speech...
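The "binary classifier of fixed dimensional observations" the abstract mentions can be illustrated with a minimal linear SVM trained by subgradient descent on the hinge loss. This is a toy stand-in for the discriminative models applied to confusable word pairs; the data points, learning rate, and regularizer are invented.

```python
# Minimal linear SVM via hinge-loss subgradient descent (toy sketch;
# data and hyperparameters are invented, not from the paper).

def train_svm(data, dim, epochs=200, lr=0.1, lam=0.01):
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:                     # y in {-1, +1}
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                    # hinge loss is active
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:                             # only shrink toward zero
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# two confusable "words" represented as fixed-dimension feature vectors
data = [([1.0, 1.0], 1), ([2.0, 1.5], 1),
        ([-1.0, -1.0], -1), ([-2.0, -0.5], -1)]
w, b = train_svm(data, dim=2)
print([predict(w, b, x) for x, _ in data])
```

The open problem the paper addresses, which this sketch ignores, is mapping variable-length acoustic segments onto such fixed-dimension inputs.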
-
Speaker recognition using prosodic and lexical features
Publication Year: 2003, Page(s):19 - 24
Cited by: Papers (7)
Conventional speaker recognition systems identify speakers by using spectral information from very short slices of speech. Such systems perform well (especially in quiet conditions), but fail to capture idiosyncratic longer-term patterns in a speaker's habitual speaking style, including duration and pausing patterns, intonation contours, and the use of particular phrases. We investigate the contri...
-
Recognizing emotions from student speech in tutoring dialogues
Publication Year: 2003, Page(s):25 - 30
Cited by: Papers (18)
We investigate the automatic classification of student emotional states in a corpus of human-human spoken tutoring dialogues. We first annotated student turns in this corpus for negative, neutral and positive emotions. We then automatically extracted acoustic and prosodic features from the student speech, and compared the results of a variety of machine learning algorithms that use 8 different fea...
-
Voice signatures
Publication Year: 2003, Page(s):31 - 36
Cited by: Papers (32)
Most current spoken-dialog systems only extract sequences of words from a speaker's voice. This largely ignores other useful information that can be inferred from speech such as gender, age, dialect, or emotion. These characteristics of a speaker's voice, voice signatures, whether static or dynamic, can be useful for speech mining applications or for the design of a natural spoken-dialog system. T...
-
Automatic model complexity control using marginalized discriminative growth functions
Publication Year: 2003, Page(s):37 - 42
Cited by: Papers (7)
Designing a large vocabulary speech recognition system is a highly complex problem. Many techniques affect both the system complexity and recognition performance. Automatic complexity control criteria are needed to quickly predict the recognition performance ranking of systems with varying complexity, in order to select an optimal model structure with the minimum word error. In this paper a novel ...
-
Baum-Welch training for segment-based speech recognition
Publication Year: 2003, Page(s):43 - 48
Cited by: Papers (2)
The use of segment-based features and segmentation networks in a segment-based speech recognizer complicates the probabilistic modeling because it alters the sample space of all possible segmentation paths and the feature observation space. This paper describes a novel Baum-Welch training algorithm for segment-based speech recognition which addresses these issues by an innovative use of finite-sta...
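For readers unfamiliar with the baseline the paper builds on, standard frame-based Baum-Welch re-estimation for a discrete HMM can be sketched as forward-backward statistics followed by one EM update. This is the textbook algorithm, not the paper's segment-based finite-state variant; the model sizes and observation sequence are invented.

```python
# Toy discrete-HMM Baum-Welch: forward-backward statistics, then one
# re-estimation step. Parameters and data are invented for illustration.

def forward(pi, A, B, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        alpha.append([B[j][obs[t]] * sum(alpha[-1][i] * A[i][j] for i in range(n))
                      for j in range(n)])
    return alpha

def backward(A, B, obs, n):
    beta = [[1.0] * n]
    for t in range(len(obs) - 2, -1, -1):
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * beta[0][j] for j in range(n))
                        for i in range(n)])
    return beta

def baum_welch_step(pi, A, B, obs):
    n, T = len(pi), len(obs)
    al, be = forward(pi, A, B, obs), backward(A, B, obs, n)
    ll = sum(al[-1])                                   # P(obs | model)
    gamma = [[al[t][i] * be[t][i] / ll for i in range(n)] for t in range(T)]
    xi = [[[al[t][i] * A[i][j] * B[j][obs[t + 1]] * be[t + 1][j] / ll
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(n)]
             for i in range(n)]
    m = len(B[0])
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T)) for k in range(m)]
             for i in range(n)]
    return new_pi, new_A, new_B, ll

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1, 0, 1]
pi, A, B, ll0 = baum_welch_step(pi, A, B, obs)
_, _, _, ll1 = baum_welch_step(pi, A, B, obs)
print(ll0, ll1)  # EM guarantees the likelihood is non-decreasing
```

The complication the paper tackles is that segment-based features change both the hidden path space and the observation space, so these sums over fixed frame alignments no longer apply directly.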
-
Two-stage continuous speech recognition using feature-based models: a preliminary study
Publication Year: 2003, Page(s):49 - 54
Cited by: Papers (4)
In recent research, we have demonstrated that linguistic features can be used to improve speech recognition for an isolated vocabulary recognition task. This paper addresses two important new research problems in our attempts to build a two-stage speech recognition system using linguistic features. First, through a controlled study we show that our knowledge-driven feature sets perform competitive...
-
Word-selective training for speech recognition
Publication Year: 2003, Page(s):55 - 60
Cited by: Papers (2) | Patents (2)
We previously proposed (Kamm and Meyer (2001, 2002)) a two-pronged approach to improve system performance by selective use of training data. We demonstrated a sentence-selective algorithm that, first, made effective use of the available humanly transcribed training data and, second, focused future human transcription effort on data that was more likely to improve system performance. We now extend ...
-
'Early recognition' of words in continuous speech
Publication Year: 2003, Page(s):61 - 66
Cited by: Patents (3)
In this paper, we present an automatic speech recognition (ASR) system based on the combination of an automatic phone recogniser and a computational model of human speech recognition - SpeM - that is capable of computing 'word activations' during the recognition process, in addition to doing normal speech recognition, a task in which conventional ASR architectures only provide output after the end...
-
In search of optimal data selection for training of automatic speech recognition systems
Publication Year: 2003, Page(s):67 - 72
Cited by: Papers (8)
This paper presents an extended study in the topic of optimal selection of speech data from a database for efficient training of ASR systems. We reconsider a method of optimal selection introduced in our previous work and introduce variosearch as an alternative selection method developed in order to find a representative sample of speech data with a simultaneous control of acoustical and statistic...
-
Accurate hidden Markov models for non-audible murmur (NAM) recognition based on iterative supervised adaptation
Publication Year: 2003, Page(s):73 - 76
Cited by: Papers (9) | Patents (10)
In previous works, we introduced a special device (Non-Audible Murmur (NAM) microphone) able to detect very quietly uttered speech (murmur), which cannot be heard by listeners near the talker. Experimental results showed the efficiency of the device in NAM recognition. Using normal-speech monophone hidden Markov models (HMM) retrained with NAM data from a specific speaker, we could recognize NAM ...
-
Variational Bayesian approach for automatic generation of HMM topologies
Publication Year: 2003, Page(s):77 - 82
Cited by: Papers (2)
We propose a new method of automatically creating non-uniform, context-dependent HMM topologies by using the variational Bayesian (VB) approach. The maximum likelihood (ML) criterion is generally used to create HMM topologies. However, it has an overfitting problem. Information criteria have been used to overcome this problem, but, theoretically, they cannot be applied to complicated models like H...
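The information-criterion baseline the abstract contrasts with VB can be sketched concretely. Below is a BIC-style topology ranking, not the paper's variational method; the log-likelihoods, parameter counts, and data size are invented for illustration.

```python
import math

# BIC-style model selection sketch: reward training likelihood, penalize
# free parameters. All numbers below are invented for illustration.

def bic(log_likelihood, num_params, num_frames):
    # BIC = logL - (k/2) * log(N)
    return log_likelihood - 0.5 * num_params * math.log(num_frames)

# candidate HMM topologies: (name, training log-likelihood, free parameters)
candidates = [("3-state", -12000.0, 300),
              ("5-state", -11800.0, 700),
              ("8-state", -11750.0, 1400)]
N = 10000  # number of training frames
best = max(candidates, key=lambda c: bic(c[1], c[2], N))
print(best[0])
```

Raw likelihood alone would pick the largest topology; the penalty term is what counters overfitting, and the paper's point is that such asymptotic criteria are not theoretically justified for models like HMMs, motivating the VB alternative.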
-
Slovenian large vocabulary speech recognition with data-driven models of inflectional morphology
Publication Year: 2003, Page(s):83 - 88
The paper describes experiments in large vocabulary speech recognition of the highly inflective Slovenian language. The main problem of an inflective language is its high OOV (out-of-vocabulary) rate. To achieve a usable OOV rate, smaller modeling units (namely stems and endings) are used instead of words. Word decompositions are based on data-driven methods. Experiments with different-sized vocab...
-
Comparing NN paradigms in hybrid NN/HMM speech recognition using tied posteriors
Publication Year: 2003, Page(s):89 - 93
Hybrid NN/HMM acoustic modeling is nowadays an established alternative approach in automatic speech recognition technology. A comparison of feedforward and recurrent neural network topologies integrated in the tied posteriors framework is presented. We give some insights in the training process of the networks estimating class posterior probabilities and show how the net's quality can be determine...
-
Phoneme-grapheme based speech recognition system
Publication Year: 2003, Page(s):94 - 98
Cited by: Papers (2)
State-of-the-art ASR systems typically use phonemes as the subword units. We investigate a system where the word models are defined in terms of two different subword units, i.e., phonemes and graphemes. We train models for both the subword units, and then perform decoding using either both or just one subword unit. We have studied this system for American English where there is weak correspondence...
-
Transcribing Mandarin broadcast news
Publication Year: 2003, Page(s):99 - 104
Cited by: Papers (4)
The paper describes improvements to the LIMSI broadcast news transcription system for the Mandarin language in preparation for the DARPA/NIST Rich Transcription 2003 (RT'03) evaluation. The transcription system has been substantially updated to deal with the varied acoustic and linguistic characteristics of the RT'03 test conditions. The major improvements come from the use of lightly supervised a...
-
Recent advances in broadcast news transcription
Publication Year: 2003, Page(s):105 - 110
Cited by: Papers (12) | Patents (2)
The paper describes recent advances in the CU-HTK Broadcast News English (BN-E) transcription system and its performance in the DARPA/NIST Rich Transcription 2003 Speech-to-Text (RT-03) evaluation. Heteroscedastic linear discriminant analysis (HLDA) and discriminative training, which were previously developed in the context of the recognition of conversational telephone speech, have been successful...
-
Partial change accent models for accented Mandarin speech recognition
Publication Year: 2003, Page(s):111 - 116
Cited by: Papers (10) | Patents (2)
Regional accents in Mandarin speech result mostly from partial phone changes due to the interlanguage system of non-native speakers. We propose partial change accent models based on accent-specific units with acoustic model reconstruction for accented Mandarin speech recognition. We use phonological rules of dialectical pronunciations together with likelihood ratio test to model actual accented va...
-
Pronunciation variation analysis based on acoustic and phonemic distance measures with application examples on Mandarin Chinese
Publication Year: 2003, Page(s):117 - 122
Cited by: Papers (8)
In this paper, two conceptually different statistical distance metrics are defined and analyzed. First, the asymmetric acoustic distance measures how the acoustic property of one phoneme is close to that of another, and is defined here based on the Mahalanobis distance between two hidden Markov models. Second, the asymmetric phonemic distance measures how probable a phoneme is realized as another,...
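The asymmetry of a Mahalanobis-style distance can be shown with a minimal example: measuring model q's mean under model p's covariance generally differs from the reverse. This is a single-Gaussian, diagonal-covariance toy, not the paper's HMM-level definition; all means and variances are invented.

```python
import math

# Mahalanobis distance of one model's mean under another model's
# (diagonal) covariance. Parameters are invented for illustration.

def mahalanobis(mean_p, var_p, mean_q):
    return math.sqrt(sum((mq - mp) ** 2 / vp
                         for mp, vp, mq in zip(mean_p, var_p, mean_q)))

p_mean, p_var = [0.0, 0.0], [1.0, 4.0]
q_mean, q_var = [1.0, 2.0], [0.5, 0.5]

d_pq = mahalanobis(p_mean, p_var, q_mean)  # q as seen from p
d_qp = mahalanobis(q_mean, q_var, p_mean)  # p as seen from q
print(d_pq, d_qp)  # generally unequal: the distance is asymmetric
```

Because each direction normalizes by a different covariance, the measure is asymmetric, which is exactly the property the paper exploits for its acoustic distance.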
-
Automatic pronunciation modelling for multiple non-native accents
Publication Year: 2003, Page(s):123 - 128
Cited by: Patents (11)
This paper describes an automatic method for generating non-native pronunciations and its combination with speaker adaptation to solve the problem of a performance decrease if state-of-the-art speech recognisers are faced with non-native speech. Although being a data-driven approach it overcomes the problem of gathering accented speech data for deriving the non-native variants. It rather uses sole...
-
Improvements in English ASR for the MALACH project using syllable-centric models
Publication Year: 2003, Page(s):129 - 134
Cited by: Papers (5) | Patents (1)
LVCSR systems have traditionally used phones as the basic acoustic unit for recognition. Syllable and other longer length units provide an efficient means for modeling long-term temporal dependencies in speech that are difficult to capture in a phone based recognition framework. However, it is well known that longer duration units suffer from training data sparsity problems since a large number of...