
IEEE Transactions on Speech and Audio Processing

Issue 1 • January 1995

Contents: 10 articles
  • DPCM system design for diversity systems with applications to packetized speech

    Publication Year: 1995 , Page(s): 48 - 58
    Cited by:  Papers (42)  |  Patents (6)
    PDF (804 KB)

    Speech quality in packetized speech systems can degrade substantially when packets are lost. We consider the design of DPCM systems for packetized speech, formulating it as a multiple description problem, and address the optimal selection of the encoder and decoder filters. We show that significant performance improvements are obtained compared with an earlier system proposed by Jayant and Christensen (1981). Further, we show that for a first-order Gauss-Markov source, significant performance improvements can be obtained by using a second-order predictor instead of a first-order predictor.

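To make the DPCM idea concrete, here is a toy sketch (not the paper's multiple-description design) of a first-order predictive coder running on a synthetic first-order Gauss-Markov source; the correlation coefficient and quantizer step are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthesize a first-order Gauss-Markov source x[n] = a*x[n-1] + w[n],
# scaled so the stationary variance is 1 ('a' chosen here for illustration).
a = 0.9
n = 1000
w = rng.normal(scale=np.sqrt(1 - a**2), size=n)
x = np.zeros(n)
for i in range(1, n):
    x[i] = a * x[i - 1] + w[i]

def dpcm(signal, pred_coeff, step=0.1):
    """Encode/decode with a first-order predictor and a uniform quantizer."""
    recon = np.zeros_like(signal)
    prev = 0.0
    for i, s in enumerate(signal):
        pred = pred_coeff * prev          # predict from the last reconstruction
        err = s - pred                    # prediction residual
        q = step * np.round(err / step)   # uniform quantization of the residual
        recon[i] = pred + q               # decoder reconstruction
        prev = recon[i]
    return recon

recon = dpcm(x, a)
snr = 10 * np.log10(np.mean(x**2) / np.mean((x - recon)**2))
```

Because the predictor operates on the reconstructed signal at both ends, the reconstruction error equals the residual quantization error, bounded by half the step size per sample.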
  • Adaptive enhancement of Fourier spectra

    Publication Year: 1995 , Page(s): 35 - 39
    PDF (372 KB)

    An adaptive enhancement procedure is presented that emphasizes continuant spectral features, such as formant frequencies, by imposing frequency and amplitude continuity constraints on a short-time Fourier representation of the speech signal. At each point in the time-frequency field, the direction of maximum energy correlation is determined by the angle of a linear window at which the energy density within it is closest in magnitude to the point under consideration. Weighted smoothing is then performed in that direction to enhance continuant features.

  • Dual-channel iterative speech enhancement with constraints on an auditory-based spectrum

    Publication Year: 1995 , Page(s): 22 - 34
    Cited by:  Papers (12)
    PDF (1216 KB)

    A new frequency-domain, constrained iterative algorithm is proposed for dual-channel speech enhancement. The dual-channel enhancement scheme is shown to follow the iterative expectation-maximization (EM) algorithm, resulting in a two-step dual-channel Wiener filtering scheme. A new technique for applying constraints during the EM iterations is developed so as to take advantage of the auditory properties of speech perception. An overriding goal is to enhance quality and at least maintain intelligibility of the estimated speech signal. Constraints are applied over time and iteration on mel-cepstral parameters that parametrize an auditory-based spectrum. These constraints also adapt to changing speech characteristics over time with the aid of an adaptive boundary detector. Performance is demonstrated in three areas for speech degraded by additive white Gaussian noise, aircraft cockpit noise, and computer cooling-fan noise. First, global objective speech quality measures show improved quality when compared to unconstrained dual-channel Wiener filtering and a traditional LMS-based adaptive noise cancellation technique, over a range of signal-to-noise ratios and cross-talk levels. Second, time waveforms and frame-to-frame quality measures show good improvement, especially in unvoiced and transitional regions of speech. Informal listening tests confirm improvement in quality as measured by objective measures. Finally, objective measures classified over individual phonemes for a subset of sentences from the TIMIT speech database show a consistent and superior improvement in quality.

  • The challenge of spoken language systems: research directions for the nineties

    Publication Year: 1995 , Page(s): 1 - 21
    Cited by:  Papers (35)  |  Patents (30)
    PDF (2276 KB)

    A spoken language system combines speech recognition, natural language processing and human interface technology. It functions by recognizing the person's words, interpreting the sequence of words to obtain a meaning in terms of the application, and providing an appropriate response back to the user. Potential applications of spoken language systems range from simple tasks, such as retrieving information from an existing database (traffic reports, airline schedules), to interactive problem solving tasks involving complex planning and reasoning (travel planning, traffic routing), to support for multilingual interactions. We examine eight key areas in which basic research is needed to produce spoken language systems: (1) robust speech recognition; (2) automatic training and adaptation; (3) spontaneous speech; (4) dialogue models; (5) natural language response generation; (6) speech synthesis and speech generation; (7) multilingual systems; and (8) interactive multimodal systems. In each area, we identify key research challenges, the infrastructure needed to support research, and the expected benefits. We conclude by reviewing the need for multidisciplinary research, for development of shared corpora and related resources, for computational support and for rapid communication among researchers. The successful development of this technology will increase accessibility of computers to a wide range of users, will facilitate multinational communication and trade, and will create new research specialties and jobs in this rapidly expanding area.

  • Markov model-based phoneme class partitioning for improved constrained iterative speech enhancement

    Publication Year: 1995 , Page(s): 98 - 104
    Cited by:  Papers (9)
    PDF (564 KB)

    Research has shown that degrading acoustic background noise influences speech quality across phoneme classes in a nonuniform manner. This results in variable quality performance of many speech enhancement algorithms in noisy environments. A phoneme classification procedure is proposed which directs single-channel constrained speech enhancement. The procedure performs broad phoneme class partitioning of noisy speech frames using a continuous mixture hidden Markov model recognizer in conjunction with a perceptually motivated cost-based decision process. Once noisy speech frames are identified, iterative speech enhancement based on all-pole parameter estimation with inter- and intra-frame spectral constraints is employed. The phoneme class-directed enhancement algorithm is evaluated using TIMIT speech data and shown to result in substantial improvement in objective speech quality over a range of signal-to-noise ratios and individual phoneme classes.

  • Adaptive postfiltering for quality enhancement of coded speech

    Publication Year: 1995 , Page(s): 59 - 71
    Cited by:  Papers (60)  |  Patents (64)
    PDF (1336 KB)

    An adaptive postfiltering algorithm for enhancing the perceptual quality of coded speech is presented. The postfilter consists of a long-term postfilter section in cascade with a short-term postfilter section and includes spectral tilt compensation and automatic gain control. The long-term section emphasizes pitch harmonics and attenuates the spectral valleys between pitch harmonics. The short-term section, on the other hand, emphasizes speech formants and attenuates the spectral valleys between formants. Both filter sections have poles and zeros. Unlike earlier postfilters that often introduced a substantial amount of muffling to the output speech, our postfilter significantly reduces this effect by minimizing the spectral tilt in its frequency response. As a result, this postfilter achieves noticeable noise reduction while introducing only minimal distortion in speech. The complexity of the postfilter is quite low. Variations of this postfilter are now being used in several national and international speech coding standards. This paper presents for the first time a complete description of our original postfiltering algorithm and the underlying ideas that motivated its development.

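The short-term section described in this abstract is commonly built from a bandwidth-expanded version of the frame's LPC polynomial. The sketch below illustrates that general pole-zero structure with a first-order tilt-compensation stage; the LPC coefficients and the constants `gamma_num`, `gamma_den`, and `mu` are hypothetical placeholders, not values from the paper:

```python
import numpy as np

def bandwidth_expand(lpc, gamma):
    """Scale LPC coefficients: A(z/gamma) has coefficients a_k * gamma**k."""
    return lpc * gamma ** np.arange(len(lpc))

def iir_filter(b, a, x):
    """Direct-form IIR filter with a[0] == 1 (pure-numpy stand-in for scipy.signal.lfilter)."""
    y = np.zeros_like(x, dtype=float)
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y[n] = acc
    return y

# Hypothetical LPC coefficients A(z) = 1 + a1*z^-1 + a2*z^-2 for one frame.
A = np.array([1.0, -1.2, 0.5])
gamma_num, gamma_den, mu = 0.5, 0.8, 0.3       # illustrative constants

b = bandwidth_expand(A, gamma_num)             # zeros: mild attenuation of valleys
a = bandwidth_expand(A, gamma_den)             # poles: formant emphasis
x = np.random.default_rng(1).normal(size=200)  # stand-in for decoded speech
y = iir_filter(b, a, x)
# First-order (1 - mu*z^-1) stage to reduce the low-pass spectral tilt.
y = iir_filter(np.array([1.0, -mu]), np.array([1.0]), y)
```

With `gamma_num < gamma_den`, the combined response dips between formants while staying close to flat overall, which is the mechanism the abstract credits for reducing muffling.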
  • Voiced speech coding at very low bit rates based on forward-backward waveform prediction

    Publication Year: 1995 , Page(s): 40 - 47
    Cited by:  Papers (3)  |  Patents (3)
    PDF (676 KB)

    Techniques for coding voiced speech at very low bit rates are investigated and a new algorithm, designed to produce high-quality speech with low complexity, is proposed. This algorithm encodes and transmits partial representative waveforms (RWs) from which the complete speech waveforms are reconstructed by using a method called forward-backward waveform prediction (FBWP). The RW is encoded at 20-30 ms intervals with a low complexity approach, taking into account the special initial conditions of short- and long-term filters. The basic idea of FBWP is essentially consistent with that of the prototype waveform interpolation (PWI) algorithm, which was reported to be capable of producing high-quality voiced speech at a bit rate of between 3.0 and 4.0 kb/s. Implementing FBWP in the time domain makes fast computation possible while high-quality speech can be obtained at a bit rate of about 3 kb/s. As in the PWI method, the proposed algorithm may be combined with an LP-based speech coder which uses a noise-like excitation to reproduce unvoiced speech.

  • Robust text-independent speaker identification using Gaussian mixture speaker models

    Publication Year: 1995 , Page(s): 72 - 83
    Cited by:  Papers (534)  |  Patents (73)
    PDF (1144 KB)

    This paper introduces and motivates the use of Gaussian mixture models (GMM) for robust text-independent speaker identification. The individual Gaussian components of a GMM are shown to represent some general speaker-dependent spectral shapes that are effective for modeling speaker identity. The focus of this work is on applications which require high identification rates using short utterances from unconstrained conversational speech and robustness to degradations produced by transmission over a telephone channel. A complete experimental evaluation of the Gaussian mixture speaker model is conducted on a 49 speaker, conversational telephone speech database. The experiments examine algorithmic issues (initialization, variance limiting, model order selection), spectral variability robustness techniques, large population performance, and comparisons to other speaker modeling techniques (uni-modal Gaussian, VQ codebook, tied Gaussian mixture, and radial basis functions). The Gaussian mixture speaker model attains 96.8% identification accuracy using 5 second clean speech utterances and 80.8% accuracy using 15 second telephone speech utterances with a 49 speaker population and is shown to outperform the other speaker modeling techniques on an identical 16 speaker telephone speech task.

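The identification rule in this line of work scores a test utterance's frames against each speaker's GMM and picks the speaker with the highest average log-likelihood. A minimal sketch of that scoring step, using diagonal-covariance mixtures with two toy "speaker models" whose parameters are entirely made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def diag_gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X under a diagonal-covariance GMM.

    X: (T, D) feature frames; weights: (M,); means, variances: (M, D).
    """
    diff = X[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_comp = (-0.5 * np.sum(diff**2 / variances
                              + np.log(2 * np.pi * variances), axis=2)
                + np.log(weights))                                # (T, M)
    m = log_comp.max(axis=1, keepdims=True)                       # log-sum-exp
    return np.mean(m[:, 0] + np.log(np.sum(np.exp(log_comp - m), axis=1)))

# Two hypothetical speaker models: (weights, means, variances).
speakers = {
    "alice": (np.array([0.5, 0.5]), np.array([[-2.0, 0.0], [2.0, 0.0]]), np.ones((2, 2))),
    "bob":   (np.array([0.5, 0.5]), np.array([[0.0, -2.0], [0.0, 2.0]]), np.ones((2, 2))),
}

# Test "utterance" drawn near one of alice's components; identify by max likelihood.
X = rng.normal(size=(200, 2)) + np.array([-2.0, 0.0])
scores = {name: diag_gmm_loglik(X, *params) for name, params in speakers.items()}
best = max(scores, key=scores.get)
```

In practice the frames would be short-time cepstral features and each model would be trained with EM on that speaker's enrollment speech; the averaging over frames is what gives the method its text independence.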
  • Speech classification embedded in adaptive codebook search for low bit-rate CELP coding

    Publication Year: 1995 , Page(s): 94 - 98
    Cited by:  Patents (23)
    PDF (440 KB)

    This correspondence proposes a new CELP coding method which embeds speech classification in adaptive codebook search. This approach retains synthesized speech quality at bit-rates below 4 kb/s. A pitch analyzer is designed to classify each frame by its periodicity, and with a finite-state machine, one of four states is determined. Then the adaptive codebook search scheme is switched according to the state. Simulation results show that higher SEGSNR and lower computational complexity can be achieved, and the pitch contour of the synthesized speech is smoother than that produced by conventional CELP coders.

  • Evaluation of short-time spectral attenuation techniques for the restoration of musical recordings

    Publication Year: 1995 , Page(s): 84 - 93
    Cited by:  Papers (7)  |  Patents (3)
    PDF (832 KB)

    This paper deals with the application of short-time spectral attenuation techniques to the restoration of musical recordings degraded by background noise. Signal distortions induced by the restoration process are evaluated analytically, and their audibility is assessed on the basis of objective criteria. The results obtained highlight the influence of adjustable parameters (e.g., short-time frame duration or noise overestimation) on the quality of the restoration.

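Short-time spectral attenuation, as studied in this last paper, applies a frequency-dependent gain to each analysis frame and resynthesizes by overlap-add. The sketch below shows one common variant (power spectral subtraction with a spectral floor and a noise-overestimation factor, two of the adjustable parameters the abstract mentions); the frame length, hop, and parameter values are illustrative, and the "recording" is a synthetic noisy tone:

```python
import numpy as np

def spectral_attenuation(noisy, noise_psd, frame=256, hop=128, floor=0.1, over=1.0):
    """Short-time spectral attenuation (power spectral subtraction) with overlap-add.

    noise_psd: estimated noise power spectrum, shape (frame//2 + 1,).
    'over' is a noise-overestimation factor, 'floor' a spectral floor on the gain.
    """
    win = np.hanning(frame)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame + 1, hop):
        seg = noisy[start:start + frame] * win
        spec = np.fft.rfft(seg)
        power = np.abs(spec) ** 2
        # Attenuate bins where the measured power is close to the noise estimate.
        gain = np.sqrt(np.maximum(1 - over * noise_psd / np.maximum(power, 1e-12),
                                  floor**2))
        out[start:start + frame] += np.fft.irfft(gain * spec, n=frame) * win
        norm[start:start + frame] += win**2
    return out / np.maximum(norm, 1e-8)

rng = np.random.default_rng(0)
t = np.arange(4096) / 8000.0
signal = np.sin(2 * np.pi * 440 * t)              # toy "recording"
noise = 0.3 * rng.normal(size=len(t))
# Flat noise PSD estimate matching the analysis window's energy.
noise_psd = np.full(129, 0.3**2 * np.sum(np.hanning(256)**2))
denoised = spectral_attenuation(signal + noise, noise_psd)
```

Raising `over` suppresses more residual noise at the cost of attenuating low-level signal components, which is exactly the distortion-versus-audibility trade-off the paper evaluates.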

Aims & Scope

Covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.


This Transactions ceased publication in 2005. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.
