IEEE Transactions on Speech and Audio Processing

Issue 1 • Date Jan. 2001

  • Why speech synthesis? (in memory of Prof. Jonathan Allen, 1934-2000) [Special issue intro.]

    Publication Year: 2001, Page(s):1 - 2
    Freely Available from IEEE
  • Concatenative synthesis based on a harmonic model

    Publication Year: 2001, Page(s):11 - 20
    Cited by:  Papers (11)

    One of the most successful approaches to synthesizing speech, concatenative synthesis, combines recorded speech units to build full utterances. However, the prosody of the stored units is often not consistent with that of the target utterance and must be altered. Furthermore, several types of mismatch can occur at unit boundaries and must be smoothed. Thus, both pitch and time-scale modification t...
  • A Japanese TTS system based on multiform units and a speech modification algorithm with harmonics reconstruction

    Publication Year: 2001, Page(s):3 - 10
    Cited by:  Papers (13)

    This paper proposes a new text-to-speech (TTS) system that utilizes large numbers of speech segments to produce very natural and intelligible synthetic speech. There are two innovations: new multiform synthesis units and a new speech modification algorithm based on a vocoder that offers harmonics reconstruction. The multiform units make it possible to reduce acoustic discontinuities at concatenati...
  • Applying the harmonic plus noise model in concatenative speech synthesis

    Publication Year: 2001, Page(s):21 - 29
    Cited by:  Papers (104)  |  Patents (7)

    This paper describes the application of the harmonic plus noise model (HNM) for concatenative text-to-speech (TTS) synthesis. In the context of HNM, speech signals are represented as a time-varying harmonic component plus a modulated noise component. The decomposition of a speech signal into these two components allows for more natural-sounding modifications of the signal (e.g., by using different...
  • Control of spectral dynamics in concatenative speech synthesis

    Publication Year: 2001, Page(s):30 - 38
    Cited by:  Papers (21)  |  Patents (11)

    Current speech synthesis methods based on the concatenation of waveform units can produce highly intelligible speech capturing the identity of a particular speaker. However, the quality of concatenated speech often suffers from discontinuities between the acoustic units, due to contextual differences and variations in speaking style across the database. In this paper, we present methods to spectra...
  • Reducing audible spectral discontinuities

    Publication Year: 2001, Page(s):39 - 51
    Cited by:  Papers (42)  |  Patents (53)

    A common problem in diphone synthesis is discussed, viz., the occurrence of audible discontinuities at diphone boundaries. Informal observations show that spectral mismatch is most likely the cause of this phenomenon. We first set out to find an objective spectral measure for discontinuity. To this end, several spectral distance measures are related to the results of a listening experiment. T...
  • Statistical prosodic modeling: from corpus design to parameter estimation

    Publication Year: 2001, Page(s):52 - 66
    Cited by:  Papers (24)  |  Patents (4)

    The increasing availability of carefully designed and collected speech corpora opens up new possibilities for the statistical estimation of formal multivariate prosodic models. At Apple Computer, statistical prosodic modeling exploits the Victoria corpus, created to broadly support ongoing speech synthesis research and development. This corpus is composed of five constituent parts, each designed t...

Aims & Scope

Covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

This Transactions ceased publication in 2005. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.
