2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)

17-20 Sept. 2000

  • 2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No.00EX421)

    Publication Year: 2000
    Freely Available from IEEE
  • An overview of text-to-speech synthesis

    Publication Year: 2000
    Cited by:  Papers (1)

    Summary form only given. The article gives an overview of text-to-speech (TTS) technology and a description of some issues of potential interest to speech coding experts. After motivation for the use of TTS technology, it describes the general architecture of a text-to-speech system with particular emphasis on the speech synthesis component. Both formant synthesis and concatenative synthesis are p...
  • Signal processing for cochlear implants and low-rate speech coding

    Publication Year: 2000

    Summary form only given. Cochlear implants are now established as a new option for individuals with profound (sensorineural) hearing impairment. Many of the cochlear implant patients are able to understand speech without lip-reading, and some can communicate over the phone. The success of cochlear implants can be attributed to the combined efforts of scientists from various disciplines including b...
  • Acoustic front-end processing for communication systems

    Publication Year: 2000

    Summary form only given. As communication systems have become more mobile and portable, we now have situations where audio communication in difficult acoustic environments is common. Speech coders at low bit rates tend to have problems with non-speech signals that are typically found in noisy acoustic environments. As a result, there can be degradation in the perceived audio quality for low-bit ra...
  • Trellis-based optimization of MPEG-4 advanced audio coding

    Publication Year: 2000, Page(s):142 - 144
    Cited by:  Papers (11)  |  Patents (8)

    We outline a method to perform efficient low rate quantization for MPEG-4 advanced audio coding (AAC). The AAC bit stream consists of indices for quantized spectral coefficients as well as side information about quantizer step sizes and Huffman codebooks. The MPEG-4 Verification Model does not explicitly account for side information bits in its optimization and suffers from poor compression effici...
  • Author index

    Publication Year: 2000, Page(s): 157
    Freely Available from IEEE
  • Design and performance of a 4.0 kbit/s speech coder based on frequency-domain interpolation

    Publication Year: 2000, Page(s):8 - 10
    Cited by:  Patents (3)

    The 4.0 kbit/s speech codec described is based on a frequency domain interpolative (FDI) coding technique, which belongs to the class of prototype waveform interpolation (PWI) coding techniques. The codec also has an integrated voice activity detector (VAD) and a noise reduction capability. The input signal is subjected to LPC analysis and the prediction residual is separated into a slowly evolvin...
  • Coding of spectral magnitudes using optimized linear transformations

    Publication Year: 2000, Page(s):5 - 7
    Cited by:  Patents (1)

    This paper introduces a novel vector quantization (VQ) technique, wherein the quantized vector is obtained by applying a linear transformation selected from a first codebook to a codevector selected from a second codebook. The transformation is selected from a family of linear transformations, represented by a matrix codebook. Vectors in the second codebook are called residual codevectors. In orde...
  • A sinusoidal LPC vocoder

    Publication Year: 2000, Page(s):2 - 4
    Cited by:  Papers (2)

    Twenty years of work with sinusoidal modeling of speech has led to very competitive principles of low rate coding. In this study, we discuss a few issues in the design of a sinusoidal coding system. We stress that by a careful design of all blocks of the encoder and decoder, allowing for some additional complexity, it is possible to build a low rate coder free of many of the artifacts associated ...
  • An adaptive multi rate wideband speech codec with adaptive gain re-quantization

    Publication Year: 2000, Page(s):145 - 147
    Cited by:  Papers (1)  |  Patents (16)

    This paper describes an adaptive multi-rate wideband (AMR-WB) speech codec proposed for the GSM system and also for the evolving third generation (3G) mobile speech services. The coder is a multi rate SB-CELP (subband-code excited linear prediction) with five modes operating at bit rates from 24 kbit/s down to 9.1 kbit/s. Our basic approach consists of an unequal band-splitting of the input signal...
  • Analysis-by-synthesis voicing cut-off determination in harmonic coding

    Publication Year: 2000, Page(s):65 - 67

    In low bit-rate harmonic speech coding, voicing information is often specified by a cut-off frequency of the spectrum. Many approaches to cut-off estimation depend on spectral matching, where a fixed prototype spectrum is used to model voiced harmonics. However, voiced harmonics do not always show a regular shape. One of the causes is harmonic interference. We propose an analysis-by-synthesis voic...
  • Changes in voice quality judgments as a function of background noise level in the listening environment

    Publication Year: 2000, Page(s):26 - 28

    This study explores the extent to which differences in voice quality with different bit rates become less perceptible when users are listening in a noisy environment. The individual rate modes of two multi-rate codecs were rated by listeners in various background noise conditions, including a quiet baseline, crowd babble, street noise, factory noise, and two levels of car noise. The results sugges...
  • On the perceptual weighting function for phase quantization of speech

    Publication Year: 2000, Page(s):62 - 64
    Cited by:  Papers (1)  |  Patents (2)

    This paper addresses the use of the perceptual characteristics of the human auditory system for the phase quantization of speech signals. Taking into account the phase quantization noise, we propose a perceptual weighting function to keep the quantization noise below the threshold of human perception. The weighting function is derived from psychoacoustic experiments in which...
  • Predictive quantization of spectral amplitudes for harmonic coders

    Publication Year: 2000, Page(s):47 - 49

    We present a novel predictive vector quantization scheme for coding the variable-dimension spectral amplitude vectors produced by harmonic coders. The scheme has a safety-net prediction structure, but it uses analysis-by-synthesis codebook search. A “closed-loop” codebook design algorithm is devised. Significant improvement over conventional predictive VQ design methods is obtained.
  • Results on reverse water-filling, SNR, and log-spectral error in codebook-based coding

    Publication Year: 2000, Page(s):38 - 40

    This paper identifies optimum levels of reverse water-filling for codebook-based coding of noise and speech signals. We find that there is little to be gained from optimizing an effective rate parameter. We identify trade-offs between SNR and log-spectral error. We show that the use of a gain factor compares favorably with reverse water-filling in some situations.
  • New objective measures for characterisation of noise suppression algorithms

    Publication Year: 2000, Page(s):23 - 25
    Cited by:  Papers (6)

    We present two new objective quality measures for the assessment of the performance of noise suppression (NS) algorithms. The signal-to-noise ratio improvement (SNRI) measure attempts to characterise the capability of an NS method to enhance the speech component of a noisy speech signal from an additive background noise. The SNRI measure includes a segmentation of the input speech signal into thre...
  • Design of an MPEG-4 general audio coder for improving speech quality

    Publication Year: 2000, Page(s):139 - 141

    This paper proposes a design for an ISO/IEC MPEG-4 general audio encoder to improve the speech quality at low bit rates. The main contributions to the improvement are using i) a higher sampling rate to get higher time resolution for a given frame length and ii) adaptive preprocessing to reduce the bandwidth. Listening tests at 8 and 16 kbit/s showed that compared with a conventional audio coder an...
  • Model based spectrum prediction

    Publication Year: 2000, Page(s):117 - 119
    Cited by:  Papers (15)

    This paper presents methods for speech spectrum prediction based on Gaussian mixture models. Spectrum prediction may be useful in a packet transmission system where the sensitivity to packet losses is a major problem. Models of speech are trained by the expectation maximization algorithm using pairs, triples etc. of consecutive cepstral vectors. The models are used to design first, second etc. ord...
  • Very low rate speech coding using temporal decomposition and waveform interpolation

    Publication Year: 2000, Page(s):29 - 31

    In very low rate coding the aim is to accurately represent speech characteristics as efficiently as possible. High coding gains for the spectral features can be achieved through the use of temporal decomposition. Waveform interpolation coders accurately represent the excitation using characteristic waveforms (CWs) extracted at a constant rate. In this paper, the two approaches are combined into a ...
  • Exploring the characteristics of analytic decomposition of speech signals

    Publication Year: 2000, Page(s):59 - 61

    This paper investigates the properties of analytic transformation of speech into envelope and phase functions. The envelope is shown to evolve slowly with the pitch of the input speech, whilst the phase consists of two components: one evolving slowly with pitch and another that exhibits a more rapid evolution. We investigate decomposing the phase component further using two distinct methods: (a) f...
  • A preprocessing method for perfect reconstruction WI coding

    Publication Year: 2000, Page(s):53 - 55
    Cited by:  Papers (2)

    Waveform interpolation (WI) coding with perfect reconstruction is a promising speech coding method with high-level speech quality. However, for efficient coding several pitch values have to be transmitted per frame to guarantee the required accuracy of the pitch contour. In this paper we propose a preprocessing algorithm which slightly modifies the residual signal such that its pitch period evolve...
  • An efficient synthesis method for sinusoidal vocoders

    Publication Year: 2000, Page(s):44 - 46
    Cited by:  Papers (1)  |  Patents (1)

    An FFT-based technique for synthesizing a “sum-of-sinusoids” signal is described. The technique is useful for efficient speech synthesis in sinusoidal-type vocoders. Each sinusoid to be synthesized is represented by a small number of FFT coefficients around the frequency of interest. Linear amplitude modulation of the sinusoid is approximated by convolution with a three-point sequence....
  • Voicing detection in DAP-STC

    Publication Year: 2000, Page(s):35 - 37

    Sinusoidal transform coding (STC) requires an all-pole representation of spectra derived periodically from the short-term speech spectral envelope and a “voicing probability” frequency fv to divide each spectrum into two sub-bands: voiced below fv and unvoiced above fv. Discrete all-pole (DAP) modeling may be applied to STC to improve the accuracy of th...
  • Dichotic presentation of interleaving critical-band envelopes: an application to multi-descriptive coding

    Publication Year: 2000, Page(s):72 - 74
    Cited by:  Papers (1)

    A coding paradigm is proposed which is based solely on the properties of the human auditory system and does not assume any specific source properties. Hence, its performance is equally good for speech, noisy speech, and music signals. The signal decomposition in the proposed paradigm takes advantage of binaural properties of the human auditory system. This also leads to a natural multi-descriptive...
  • Application of multidimensional scaling to subjective evaluation of coded speech

    Publication Year: 2000, Page(s):20 - 22
    Cited by:  Papers (2)

    We propose a new procedure for subjective evaluation of coded speech. This procedure has the potential of providing an anchorable measure of quality that contains more information than the single number provided by MOS testing. A stimulus space and the relationship between this space and speech quality are established with multidimensional scaling techniques in a large-scale listening test. In the...