IEEE Workshop on Automatic Speech Recognition and Understanding, 2001 (ASRU '01)

9-13 Dec. 2001

  • 2001 IEEE Workshop on Automatic Speech Recognition and Understanding. ASRU 2001. Conference Proceedings (Cat. No.01EX544)

    Publication Year: 2001
  • The symbiosis of DSP and speech recognition or an outsider's view of the inside

    Publication Year: 2001, Page(s):1 - 4

    From an historical review of how we got to where we are now, we discuss the interrelationship between our system design objectives and goals, our modeling of the speech signal and its generation and parameterization, and the broadly developing DSP methodology. We take a critical look at some of the underlying assumptions in our modeling to see if they may be limiting the performance that can be o...
  • VoiceXML 2.0 and the W3C speech interface framework

    Publication Year: 2001, Page(s):5 - 8
    Cited by:  Papers (1)  |  Patents (4)

    The World Wide Web Consortium (W3C) Voice Browser Working Group has released specifications for four integrated languages for developing speech applications: VoiceXML 2.0, Speech Synthesis Markup Language, Speech Recognition Grammar Markup Language, and Semantic Interpretation. These languages enable developers to quickly specify conversational speech Web applications that can be accessed by any telephone or cell p...
  • Brancusi, neo-plasticism, and the art of designing speech-recognition applications

    Publication Year: 2001, Page(s):9 - 14

    Designing over-the-phone speech-recognition systems requires that designers have a design methodology and philosophy that enables them to understand how to research, design, evaluate and re-design their application.
  • Adaptive training for robust ASR

    Publication Year: 2001, Page(s):15 - 20
    Cited by:  Papers (11)

    Adaptive training is a powerful training technique for building speech recognition systems on nonhomogeneous data. The aim is to separate unwanted variability, such as changes in speaker, channel or acoustic environment, from the desired variability, namely the acoustic differences between words. During training, two sets of models are generated: a canonical model set for the desired "true" variability of the spee...
  • Histogram based normalization in the acoustic feature space

    Publication Year: 2001, Page(s):21 - 24
    Cited by:  Papers (23)  |  Patents (3)

    We describe a technique called histogram normalization that aims at normalizing feature space distributions at different stages in the signal analysis front-end, namely the log-compressed filterbank vectors, cepstrum coefficients, and LDA (linear discriminant analysis) transformed acoustic vectors. Best results are obtained at the filterbank, and in most cases there is a minor additional gain when ...
  • An improved union model for continuous speech recognition with partial duration corruption

    Publication Year: 2001, Page(s):25 - 28
    Cited by:  Papers (1)

    The probabilistic union model is improved for continuous speech recognition involving partial duration corruption, assuming no knowledge about the corrupting noise. The new developments include: an n-best rescoring strategy for union-based continuous speech recognition; a dynamic segmentation algorithm for reducing the number of corrupted segments in the union model; a combination of the union mod...
  • A comparative study of model-based adaptation techniques for a compact speech recognizer

    Publication Year: 2001, Page(s):29 - 32
    Cited by:  Patents (2)

    Many techniques for speaker adaptation have been successfully applied to automatic speech recognition. This paper compares the performance of several adaptation methods with respect to their memory requirements and processing demands. For adaptation of a compact acoustic model with 4k densities, eigenvoices and structural MAP (SMAP) are investigated alongside the well-known techniques of MAP (maximum a poste...
  • Gaussian mixture models of phonetic boundaries for speech recognition

    Publication Year: 2001, Page(s):33 - 36
    Cited by:  Papers (1)

    A new approach to representing temporal correlation in an automatic speech recognition system is described. It introduces an acoustic feature set that captures the dynamics of a speech signal at the phoneme boundaries, in combination with the traditional acoustic feature set representing the periods of speech that are assumed to be quasi-stationary. This newly introduced feature set represents an obse...
  • Multiple time resolutions for derivatives of Mel-frequency cepstral coefficients

    Publication Year: 2001, Page(s):37 - 40

    Most speech recognition systems are based on Mel-frequency cepstral coefficients and their first- and second-order derivatives. The derivatives are normally approximated by fitting a linear regression line to a fixed-length segment of consecutive frames. The time resolution and smoothness of the estimated derivative depend on the length of the segment. We present an approach to improve the repres...
  • Continuous multi-band speech recognition using Bayesian networks

    Publication Year: 2001, Page(s):41 - 44
    Cited by:  Papers (5)

    Using the Bayesian networks framework, we present a new multi-band approach for continuous speech recognition. This new approach has the advantage of overcoming all the limitations of the standard multi-band techniques. Moreover, it leads to higher-fidelity speech modeling than HMMs. We provide a preliminary evaluation of the performance of our new approach on a connected digits recognition task...
  • Trend tying in the segmental-feature HMM

    Publication Year: 2001, Page(s):45 - 48

    We present a method for reducing the number of parameters in a segmental-feature HMM (SFHMM). Although the SFHMM shows better results than the CHMM, it has more parameters than the CHMM. Therefore, there is a need for a new approach that reduces the number of parameters. In general, the trajectory can be separated into trend and location. Since the trend means the variation of se...
  • Improved MFCC feature extraction by PCA-optimized filter-bank for speech recognition

    Publication Year: 2001, Page(s):49 - 52
    Cited by:  Papers (2)

    Although Mel-frequency cepstral coefficients (MFCC) have been proven to perform very well under most conditions, only limited efforts have been made to optimize the shape of the filters in the filter-bank of the conventional MFCC approach. This paper presents a new feature extraction approach that designs the shapes of the filters in the filter-bank. In this new approach, the filter-bank coeffic...
  • Improved pronunciation modelling by inverse word frequency and pronunciation entropy

    Publication Year: 2001, Page(s):53 - 56
    Cited by:  Papers (1)  |  Patents (2)

    We propose a new approach to rank the potential pronunciations for each word by their pronunciation frequency and inverse word frequency (pf-iwf) weights. The pronunciation set obtained in this way can then be pruned with different criteria. This approach not only considers the frequencies of occurrence of the pronunciations, but tries to minimize the extra confusion which may be introduced by pro...
  • Robust speaker clustering in eigenspace

    Publication Year: 2001, Page(s):57 - 60
    Cited by:  Papers (8)

    We propose a speaker clustering scheme working in 'eigenspace'. Speaker models are transformed to a low-dimensional subspace using 'eigenvoices'. For the speaker clustering procedure, simple distance measures, e.g. Euclidean distance, can be applied. Moreover, clustering can be accomplished with base models (for eigenvoice projection) like Gaussian mixture models as well as conventional HMMs. In c...
  • Speaker-trained recognition using allophonic enrollment models

    Publication Year: 2001, Page(s):61 - 64
    Cited by:  Patents (2)

    We introduce a method for performing speaker-trained recognition based on context-dependent allophone models from a large-vocabulary, speaker-independent recognition system. A set of speaker-enrollment templates is selected from the context-dependent allophone models. These templates are used to build representations of the speaker-enrolled utterances. The advantages of this approach include impro...
  • Speech recognition using advanced HMM2 features

    Publication Year: 2001, Page(s):65 - 68
    Cited by:  Papers (5)  |  Patents (1)

    HMM2 is a particular hidden Markov model where state emission probabilities of the temporal (primary) HMM are modeled through (secondary) state-dependent frequency-based HMMs (see Weber, K. et al., Proc. ICSLP, vol. III, pp. 147-150, 2000). As we show in another paper (see Weber et al., Proc. Eurospeech, Sep. 2001), a secondary HMM can also be used to extract robust ASR features. Here, we further inve...
  • Construction of model-space constraints

    Publication Year: 2001, Page(s):69 - 72
    Cited by:  Papers (29)

    HMM systems exhibit a large amount of redundancy. Eigenvoices, a technique that exploits this redundancy, has been found to be very effective for speaker adaptation. The correlation between HMM parameters is exploited via a linear constraint called eigenspace. This constraint is obtained through a PCA of the training speakers. We show how PCA can be linked to the maximum-likelihood criterion. Then, we extend the m...
  • Eliminating inter-speaker variability prior to discriminant transforms

    Publication Year: 2001, Page(s):73 - 76
    Cited by:  Papers (4)  |  Patents (7)

    This paper shows the impact of speaker normalization techniques, such as vocal tract length normalization (VTLN) and speaker-adaptive training (SAT), prior to discriminant feature space transforms, such as LDA (linear discriminant analysis). We demonstrate that removing the inter-speaker variability by using speaker compensation methods results in improved discrimination as measured by the LDA eig...
  • Acoustic factorisation

    Publication Year: 2001, Page(s):77 - 80
    Cited by:  Papers (24)

    This paper describes a new technique for training a speech recognition system on inhomogeneous training data. The proposed technique, acoustic factorisation, attempts to model explicitly all the factors that affect the acoustic signal. By explicitly modelling all the factors, the trained model set may be used in a more flexible fashion than in standard adaptive training schemes. Since an individua...
  • Recursive noise estimation using iterative stochastic approximation for stereo-based robust speech recognition

    Publication Year: 2001, Page(s):81 - 84
    Cited by:  Papers (3)  |  Patents (5)

    We present an algorithm for recursive estimation of parameters in a mildly nonlinear model involving incomplete data. In particular, we focus on the time-varying deterministic parameters of additive noise in the nonlinear model. For the nonstationary noise that we encounter in robust speech recognition, different observation data segments correspond to different noise parameter values. Hence, recu...
  • Ubiquitous speech communication interface

    Publication Year: 2001, Page(s):85 - 92

    The Holy Grail of telecommunication is to bring people thousands of miles apart together, anytime and anywhere, to communicate as if they were having a face-to-face conversation in a ubiquitous telepresence scenario. One key component necessary to reach this Holy Grail is the technology that supports hands-free speech communication. Hands-free telecommunication (both telephony and teleconferencing) refe...
  • Speech interfaces for mobile communications

    Publication Year: 2001, Page(s):93 - 95

    This paper discusses speech interfaces for mobile communication. Mobile interfaces have three important design rules: do not disturb the user's main task, work within the restrictions of the user's abilities, and minimize the resource requirements. Social acceptance is also important. In Japan, trial and regular services with speech interfaces in mobile environments have already been launched, but they a...
  • ASR in portable wireless devices

    Publication Year: 2001, Page(s):96 - 102
    Cited by:  Papers (7)  |  Patents (6)

    This paper discusses the applicability and role of automatic speech recognition in portable wireless devices. Due to the author's background, the viewpoints are somewhat biased toward mobile telephones, but many of the aspects are nevertheless common to other portable devices as well. While the field is still dominated by speaker-dependent technology, there are signs today that, also in wireless devices, there...
  • Evaluating long-term spectral subtraction for reverberant ASR

    Publication Year: 2001, Page(s):103 - 106
    Cited by:  Papers (16)

    Even a modest degree of room reverberation can greatly increase the difficulty of automatic speech recognition. We have observed large increases in speech recognition word error rates when using a far-field (3-6 feet) microphone in a conference room, in comparison with recordings from head-mounted microphones. In this paper, we describe experiments with a proposed remedy based on the subtraction o...

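The histogram normalization idea in "Histogram based normalization in the acoustic feature space" can be illustrated with a minimal sketch of per-dimension cumulative-distribution matching. This is not the authors' implementation: it assumes a standard Gaussian as the target distribution (the abstract does not say which reference distribution is used) and uses a rank-based empirical CDF. The names `histogram_normalize`, `normalize_frames` and `STANDARD_NORMAL` are hypothetical.

```python
from statistics import NormalDist

# Assumption: the target is a standard Gaussian; the paper may instead map
# onto a reference histogram estimated from training data.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def histogram_normalize(values, target=STANDARD_NORMAL):
    """Map one feature dimension onto `target` by matching cumulative distributions.

    Each value is replaced by the target quantile at its empirical CDF position,
    so only the rank order of the inputs affects the result.
    """
    n = len(values)
    # Indices sorted by value; tied values receive distinct consecutive ranks.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        # Midpoint empirical CDF lies strictly inside (0, 1), so inv_cdf is finite.
        p = (rank + 0.5) / n
        out[i] = target.inv_cdf(p)
    return out


def normalize_frames(frames, target=STANDARD_NORMAL):
    """Histogram-normalize each dimension of a list of equal-length feature vectors."""
    if not frames:
        return []
    dims = len(frames[0])
    columns = [histogram_normalize([f[d] for f in frames], target) for d in range(dims)]
    # Transpose the per-dimension columns back into per-frame vectors.
    return [[columns[d][t] for d in range(dims)] for t in range(len(frames))]
```

Because only ranks matter, any monotonic distortion of a feature dimension (for example, a constant offset in the log filterbank domain) leaves the normalized output unchanged, which is why this kind of mapping can absorb some train/test mismatch.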
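The abstract of "Multiple time resolutions for derivatives of Mel-frequency cepstral coefficients" refers to the conventional practice of approximating derivatives by fitting a linear regression line to a fixed-length segment of frames. The following minimal sketch shows only that conventional regression formula, not the paper's proposed improvement (the truncated abstract does not describe it). The half-window `window` sets the segment length 2N+1; handling the sequence edges by replicating the first and last frame is an assumption, and the function name `deltas` is hypothetical.

```python
def deltas(frames, window=2):
    """Regression-based time derivative of a sequence of feature vectors.

    For half-window N = `window`, frame t gets
        d_t = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n**2),
    the least-squares slope of a line fitted to the 2N+1 frames centred on t.
    Assumption: frames past either end are replaced by the first or last frame.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    num_frames = len(frames)
    if num_frames == 0:
        return []
    dims = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    result = []
    for t in range(num_frames):
        acc = [0.0] * dims
        for n in range(1, window + 1):
            ahead = frames[min(t + n, num_frames - 1)]  # replicate last frame at the end
            behind = frames[max(t - n, 0)]              # replicate first frame at the start
            for k in range(dims):
                acc[k] += n * (ahead[k] - behind[k])
        result.append([a / denom for a in acc])
    return result
```

Computing `deltas(frames, window=1)` and `deltas(frames, window=4)` on the same MFCC sequence exposes the trade-off the abstract mentions: the longer segment gives a smoother estimate at the cost of time resolution.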