IEEE Transactions on Audio, Speech, and Language Processing

Issue 6 • Aug. 2007

  • Table of contents

    Publication Year: 2007, Page(s):C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2007, Page(s): C2
    Freely Available from IEEE
  • Minimum Mean-Square Error Estimation of Discrete Fourier Coefficients With Generalized Gamma Priors

    Publication Year: 2007, Page(s):1741 - 1752
    Cited by:  Papers (113)

    This paper considers techniques for single-channel speech enhancement based on the discrete Fourier transform (DFT). Specifically, we derive minimum mean-square error (MMSE) estimators of speech DFT coefficient magnitudes as well as of complex-valued DFT coefficients based on two classes of generalized gamma distributions, under an additive Gaussian noise assumption. The resulting generalized DFT ...
  • Audible Noise Reduction in Eigendomain for Speech Enhancement

    Publication Year: 2007, Page(s):1753 - 1765
    Cited by:  Papers (6)

    A signal subspace scheme based on masking properties is proposed for enhancement of speech degraded by additive noise. Since the masking properties are related to the critical frequency band that is derived from the characteristics of human cochlea, the incorporation of masking threshold into a subspace technique requires the transformation between the frequency and eigen domains. We present and a...
  • Soft Mask Methods for Single-Channel Speaker Separation

    Publication Year: 2007, Page(s):1766 - 1776
    Cited by:  Papers (53)

    The problem of single-channel speaker separation attempts to extract a speech signal uttered by the speaker of interest from a signal containing a mixture of acoustic signals. Most algorithms that deal with this problem are based on masking, wherein unreliable frequency components from the mixed signal spectrogram are suppressed, and the reliable components are inverted to obtain the speech signal...
  • Combined Feedback and Noise Suppression in Hearing Aids

    Publication Year: 2007, Page(s):1777 - 1790
    Cited by:  Papers (5)

    In this paper, solutions for combined feedback and noise suppression in hearing aids are developed. The techniques presented are based on the generalized sidelobe canceller (GSC) and adaptive feedback canceller (AFC), with a prediction error method (PEM) adaptation to avoid speech distortion. Two possible cascades of GSC-based noise reduction and AFC, namely an “AFC first” and a “GSC f...
  • Dereverberation and Denoising Using Multichannel Linear Prediction

    Publication Year: 2007, Page(s):1791 - 1801
    Cited by:  Papers (13)  |  Patents (2)

    Reverberation in a room severely degrades the characteristics and auditory quality of speech captured by distant microphones, thus posing a severe problem for many speech applications. Several dereverberation techniques have been proposed with a view to solving this problem. There are, however, few reports of dereverberation methods working under noisy conditions. In this paper, we propose an exte...
  • Speech Analysis in a Model of the Central Auditory System

    Publication Year: 2007, Page(s):1802 - 1817
    Cited by:  Papers (20)

    Recently, there is a significant increase in research interest in the area of biologically inspired systems, which, in the context of speech communications, attempt to learn from human's auditory perception and cognition capabilities so as to derive the knowledge and benefits currently unavailable in practice. One particular pursuit is to understand why the human auditory system generally performs...
  • Batch and Online Underdetermined Source Separation Using Laplacian Mixture Models

    Publication Year: 2007, Page(s):1818 - 1832
    Cited by:  Papers (16)  |  Patents (2)

    In this paper, we explore the problem of sound source separation and identification from a two-sensor instantaneous mixture. The estimation of the mixing and the sources is performed using Laplacian mixture models (LMM). The proposed algorithm fits the model using batch processing of the observed data and performs separation using either a hard or a soft decision scheme. An extension of the algori...
  • A Virtual Head Driven by Music Expressivity

    Publication Year: 2007, Page(s):1833 - 1841
    Cited by:  Papers (8)

    In this paper, we present a system that visualizes the expressive quality of a music performance using a virtual head. We provide a mapping through several parameter spaces: on the input side, we have elaborated a mapping between values of acoustic cues and emotion as well as expressivity parameters; on the output side, we propose a mapping between these parameters and the behaviors of the virtual...
  • Robust Speech Feature Extraction by Growth Transformation in Reproducing Kernel Hilbert Space

    Publication Year: 2007, Page(s):1842 - 1849
    Cited by:  Papers (4)

    The performance of speech recognition systems depends on consistent quality of the speech features across variable environmental conditions encountered during training and evaluation. This paper presents a kernel-based nonlinear predictive coding procedure that yields speech features which are robust to nonstationary noise contaminating the speech signal. Features maximally insensitive to additive...
  • Switching Linear Dynamical Systems for Noise Robust Speech Recognition

    Publication Year: 2007, Page(s):1850 - 1858
    Cited by:  Papers (25)

    Real world applications such as hands-free dialling in cars may have to deal with potentially very noisy environments. Existing state-of-the-art solutions to this problem use feature-based HMMs, with a preprocessing stage to clean the noisy signal. However, the effect that raw signal noise has on the induced HMM features is poorly understood, and limits the performance of the HMM system. An altern...
  • Efficient Speaker Change Detection Using Adapted Gaussian Mixture Models

    Publication Year: 2007, Page(s):1859 - 1869
    Cited by:  Papers (15)

    A new approach to speaker change detection is proposed and investigated. The method, which is based on a probabilistic framework, provides an effective means for tackling the problem posed by phonetic variation in high-resolution speaker change detection. Additionally, the approach incorporates the capability for dealing with undesired effects of variations in speech characteristics. Using the exp...
  • Latent Prosody Analysis for Robust Speaker Identification

    Publication Year: 2007, Page(s):1870 - 1883
    Cited by:  Papers (3)

    Handsets that are not seen in the training phase (unseen handsets) are significant sources of performance degradation for speaker identification (SID) applications in the telecommunication environment. In this paper, a novel latent prosody analysis (LPA) approach to automatically extract the most discriminative prosodic cues for assisting in conventional spectral feature-based SID is proposed. The...
  • Discrimination Power of Vocal Source and Vocal Tract Related Features for Speaker Segmentation

    Publication Year: 2007, Page(s):1884 - 1892
    Cited by:  Papers (14)

    This paper presents an analysis of the speaker discrimination power of vocal source related features, in comparison to the conventional vocal tract related features. The vocal source features, named wavelet octave coefficients of residues (WOCOR), are extracted by pitch-synchronous wavelet transform of the linear predictive (LP) residual signals. Using a series of controlled experiments, it is sho...
  • A Cohort-Based Speaker Model Synthesis for Mismatched Channels in Speaker Verification

    Publication Year: 2007, Page(s):1893 - 1903
    Cited by:  Papers (5)

    Mismatch between enrollment and test data is one of the top performance degrading factors in speaker recognition applications. This mismatch is particularly true over public telephone networks, where input speech data is collected over different handsets and transmitted over different channels from one trial to the next. In this paper, a cohort-based speaker model synthesis (SMS) algorithm, design...
  • Automatic Prosodic Variations Modeling for Language and Dialect Discrimination

    Publication Year: 2007, Page(s):1904 - 1911
    Cited by:  Papers (16)

    This paper addresses the problem of modeling prosody for language identification. The aim is to create a system that can be used prior to any linguistic work to show if prosodic differences among languages or dialects can be automatically determined. In previous papers, we defined a prosodic unit, the pseudosyllable. Rhythmic modeling has proven the relevance of the pseudosyllable unit for automat...
  • Kneser–Ney Smoothing With a Correcting Transformation for Small Data Sets

    Publication Year: 2007, Page(s):1912 - 1921

    We present a technique which improves the Kneser-Ney smoothing algorithm on small data sets for bigrams, and we develop a numerical algorithm which computes the parameters for the heuristic formula with a correction. We give motivation for the formula with correction on a simple example. Using the same example, we show the possible difficulties one may run into with the numerical algorithm. Applyi...
  • The Replacement Attack

    Publication Year: 2007, Page(s):1922 - 1931
    Cited by:  Papers (1)

    Billions of dollars allegedly lost to piracy of multimedia have recently triggered the industry to rethink the way music and movies are distributed. As encryption is vulnerable to rerecording, currently all copyright protection mechanisms tend to rely on watermarking. A watermark is an imperceptive secret hidden in a host signal. In this paper, we analyze the security of multimedia copyright prote...
  • Bayesian Adaptive Inference and Adaptive Training

    Publication Year: 2007, Page(s):1932 - 1943
    Cited by:  Papers (9)

    Large-vocabulary speech recognition systems are often built using found data, such as broadcast news. In contrast to carefully collected data, found data normally contains multiple acoustic conditions, such as speaker or environmental noise. Adaptive training is a powerful approach to build systems on such data. Here, transforms are used to represent the different acoustic conditions, and then a c...
  • IEEE Transactions on Audio, Speech, and Language Processing Edics

    Publication Year: 2007, Page(s):1944 - 1945
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing Information for authors

    Publication Year: 2007, Page(s):1946 - 1947
    Freely Available from IEEE
  • Special issue on Genomic and Proteomic Signal Processing

    Publication Year: 2007, Page(s): 1948
    Freely Available from IEEE
  • IEEE Signal Processing Society Information

    Publication Year: 2007, Page(s): C3
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Audio, Speech, and Language Processing covers the sciences, technologies, and applications relating to the analysis, coding, enhancement, recognition, and synthesis of audio, music, speech, and language.

 

This journal ceased publication in 2013. Its retitled successor is IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research