IEEE Transactions on Audio, Speech, and Language Processing


• Table of contents

Publication Year: 2007, Page(s):C1 - C4
• IEEE Transactions on Audio, Speech, and Language Processing publication information

Publication Year: 2007, Page(s): C2
• Introduction to the Special Section on Blind Signal Processing for Speech and Audio Applications

Publication Year: 2007, Page(s):1509 - 1510
Cited by:  Papers (1)
• Spatio–Temporal FastICA Algorithms for the Blind Separation of Convolutive Mixtures

Publication Year: 2007, Page(s):1511 - 1520
Cited by:  Papers (42)

This paper derives two spatio-temporal extensions of the well-known FastICA algorithm of Hyvärinen and Oja that are applicable to the convolutive blind source separation task. Our time-domain algorithms combine multichannel spatio-temporal prewhitening via multistage least-squares linear prediction with novel adaptive procedures that impose paraunitary constraints on the multichannel separation fi...
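The instantaneous-mixture FastICA iteration that the paper extends to the convolutive case can be sketched as follows. This is a minimal symmetric FastICA with the tanh nonlinearity; the two-source uniform model, the mixing matrix, and all variable names are illustrative, not the paper's spatio-temporal algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two non-Gaussian (uniform) sources, instantaneously mixed.
n = 5000
S = rng.uniform(-1, 1, size=(2, n))          # source signals
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # illustrative mixing matrix
X = A @ S                                     # observed mixtures

# Prewhitening: decorrelate and normalize the mixtures.
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / n)
Z = (E / np.sqrt(d)) @ E.T @ X               # whitened data

# Symmetric FastICA with the tanh nonlinearity (Hyvarinen & Oja).
W = rng.standard_normal((2, 2))
for _ in range(200):
    Y = W @ Z
    g, gp = np.tanh(Y), 1.0 - np.tanh(Y) ** 2
    W_new = (g @ Z.T) / n - np.diag(gp.mean(axis=1)) @ W
    # Symmetric decorrelation: W <- (W W^T)^{-1/2} W via the SVD.
    U, _, Vt = np.linalg.svd(W_new)
    W = U @ Vt
Y = W @ Z                                     # estimated sources
```

The paper's contribution replaces the prewhitening step with multichannel linear prediction and the unitary constraint with paraunitary constraints on a polynomial separation matrix.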

• On the Assumption of Spherical Symmetry and Sparseness for the Frequency-Domain Speech Model

Publication Year: 2007, Page(s):1521 - 1528
Cited by:  Papers (12)

A new independent component analysis (ICA) formulation called independent vector analysis (IVA) was proposed in order to solve the permutation problem in convolutive blind source separation (BSS). Instead of running ICA in each frequency bin separately and correcting the disorder with an additional algorithmic scheme afterwards, IVA exploited the dependency among the frequency components of a sour...

• Blind Acoustic Beamforming Based on Generalized Eigenvalue Decomposition

Publication Year: 2007, Page(s):1529 - 1539
Cited by:  Papers (23)  |  Patents (4)

Maximizing the output signal-to-noise ratio (SNR) of a sensor array in the presence of spatially colored noise leads to a generalized eigenvalue problem. While this approach has extensively been employed in narrowband (antenna) array beamforming, it is typically not used for broadband (microphone) array beamforming due to the uncontrolled amount of speech distortion introduced by a narrowband SNR ...
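The narrowband max-SNR criterion mentioned above can be sketched with a generalized eigenvalue decomposition. The steering vector, noise model, and covariances below are simulated oracle quantities for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 20000

# One target plus spatially colored noise at an m-sensor array.
a = rng.standard_normal(m); a /= np.linalg.norm(a)   # illustrative steering vector
s = rng.standard_normal(n)                            # target waveform
B = rng.standard_normal((m, m))
noise = 0.7 * (B @ rng.standard_normal((m, n)))       # spatially colored noise
X = np.outer(a, s) + noise

Rs = np.outer(a, s) @ np.outer(a, s).T / n            # signal covariance (oracle)
Rn = noise @ noise.T / n                              # noise covariance (oracle)

# Max-SNR weights solve the generalized eigenproblem Rs w = lambda Rn w.
# Reduce to a standard symmetric problem with a Cholesky factor of Rn.
L = np.linalg.cholesky(Rn)
Linv = np.linalg.inv(L)
vals, vecs = np.linalg.eigh(Linv @ Rs @ Linv.T)
w = Linv.T @ vecs[:, -1]                              # principal generalized eigenvector

def snr(wv):
    """Output SNR of a beamformer with weights wv."""
    return (wv @ Rs @ wv) / (wv @ Rn @ wv)

e0 = np.zeros(m); e0[0] = 1.0                         # single-sensor reference
```

The achieved SNR equals the largest generalized eigenvalue, and by construction it is at least as high as that of any single sensor, which is the property the paper builds its blind broadband beamformer on.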

• Blind Separation of Underdetermined Convolutive Mixtures Using Their Time–Frequency Representation

Publication Year: 2007, Page(s):1540 - 1550
Cited by:  Papers (20)

This paper considers the blind separation of nonstationary sources in the underdetermined convolutive mixture case. We introduce two methods based on the sparsity assumption of the sources in the time-frequency (TF) domain. The first one assumes that the sources are disjoint in the TF domain, i.e., there is at most one source signal present at a given point in the TF domain. In the second method,...
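The TF-disjointness assumption of the first method can be illustrated with an oracle binary time-frequency mask on a toy two-tone mixture (frame-based FFT without overlap; in the paper the masks are estimated blindly from the mixtures, not from the true sources as here):

```python
import numpy as np

fs, n_frames, nfft = 8000, 64, 256

# Two sources that are approximately disjoint in the TF plane:
# a low-frequency tone and a high-frequency tone.
t = np.arange(n_frames * nfft) / fs
s1 = np.sin(2 * np.pi * 300 * t)
s2 = np.sin(2 * np.pi * 2500 * t)
x = s1 + s2                                   # single observed mixture

# Frame-based STFT (rectangular window, no overlap, for simplicity).
X = np.fft.rfft(x.reshape(n_frames, nfft), axis=1)
S1 = np.fft.rfft(s1.reshape(n_frames, nfft), axis=1)   # oracle reference
S2 = np.fft.rfft(s2.reshape(n_frames, nfft), axis=1)   # oracle reference

# Binary mask: each TF point is assigned entirely to the dominant source.
mask1 = np.abs(S1) > np.abs(S2)
y1 = np.fft.irfft(X * mask1, n=nfft, axis=1).ravel()
y2 = np.fft.irfft(X * ~mask1, n=nfft, axis=1).ravel()
```

When the sources rarely overlap in the TF plane, assigning each point to one source recovers both signals from a single mixture, which is what makes the underdetermined case tractable.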

• Convolutive Blind Source Separation in the Frequency Domain Based on Sparse Representation

Publication Year: 2007, Page(s):1551 - 1563
Cited by:  Papers (42)

Convolutive blind source separation (CBSS) that exploits the sparsity of source signals in the frequency domain is addressed in this paper. We assume the sources follow a complex Laplacian-like distribution, in which the real and imaginary parts of the complex-valued source signals are not necessarily independent. Based on the maximum a posteriori (MAP) criterion, we prop...

• Adaptation of Bayesian Models for Single-Channel Source Separation and its Application to Voice/Music Separation in Popular Songs

Publication Year: 2007, Page(s):1564 - 1578
Cited by:  Papers (63)  |  Patents (1)

Probabilistic approaches can offer satisfactory solutions to source separation with a single channel, provided that the models of the sources match accurately the statistical properties of the mixed signals. However, it is not always possible to train such models. To overcome this problem, we propose to resort to an adaptation scheme for adjusting the source models with respect to the actual prope...

• Robust Speech Dereverberation Using Multichannel Blind Deconvolution With Spectral Subtraction

Publication Year: 2007, Page(s):1579 - 1591
Cited by:  Papers (32)  |  Patents (3)

A robust dereverberation method is presented for speech enhancement in a situation requiring adaptation, where a speaker shifts his/her head under reverberant conditions, causing the impulse responses to change frequently. We combine correlation-based blind deconvolution with modified spectral subtraction to improve the quality of inverse-filtered speech degraded by the estimation error of inverse f...
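The spectral-subtraction half of the combination can be sketched as classic magnitude-domain subtraction with a spectral floor. The noise PSD is taken as known here, and the frame-based setup is illustrative; the paper pairs a modified version of this with blind deconvolution:

```python
import numpy as np

rng = np.random.default_rng(3)
nfft, n_frames = 256, 50

# Clean tone frames corrupted by additive noise.
clean = np.sin(2 * np.pi * 0.05 * np.arange(n_frames * nfft)).reshape(n_frames, nfft)
noise = 0.3 * rng.standard_normal((n_frames, nfft))
noisy = clean + noise

Noisy = np.fft.rfft(noisy, axis=1)
# Noise PSD assumed known here; in practice it is estimated (e.g., from pauses).
noise_psd = np.mean(np.abs(np.fft.rfft(noise, axis=1)) ** 2, axis=0)

alpha, beta = 1.0, 0.01                       # subtraction factor, spectral floor
mag2 = np.abs(Noisy) ** 2
cleaned_mag = np.sqrt(np.maximum(mag2 - alpha * noise_psd, beta * mag2))
# Resynthesize with the noisy phase, as is standard for spectral subtraction.
enhanced = np.fft.irfft(cleaned_mag * np.exp(1j * np.angle(Noisy)), n=nfft, axis=1)

def err(a):
    """Mean squared deviation from the clean frames."""
    return float(np.mean((a - clean) ** 2))
```

The floor `beta * mag2` prevents negative power estimates and limits musical noise, the main artifact of plain subtraction.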

• Grouping Separated Frequency Components by Estimating Propagation Model Parameters in Frequency-Domain Blind Source Separation

Publication Year: 2007, Page(s):1592 - 1604
Cited by:  Papers (64)  |  Patents (2)

This paper proposes a new formulation and optimization procedure for grouping frequency components in frequency-domain blind source separation (BSS). We adopt two separation techniques, independent component analysis (ICA) and time-frequency (T-F) masking, for the frequency-domain BSS. With ICA, grouping the frequency components corresponds to aligning the permutation ambiguity of the ICA solution...

• Content-Dependent Watermarking Scheme in Compressed Speech With Identifying Manner and Location of Attacks

Publication Year: 2007, Page(s):1605 - 1616
Cited by:  Papers (20)

As speech compression technologies have advanced, digital recording devices have become increasingly popular. However, data formats used in popular speech codecs are known a priori, such that compressed data can be modified easily via insertion, deletion, and replacement. This work proposes a content-dependent watermarking scheme suitable for codebook-excited linear prediction (CELP)-based speech ...

• On Growing and Pruning Kneser–Ney Smoothed N-Gram Models

Publication Year: 2007, Page(s):1617 - 1624
Cited by:  Papers (18)

N-gram models are the most widely used language models in large vocabulary continuous speech recognition. Since the size of the model grows rapidly with respect to the model order and available training data, many methods have been proposed for pruning the least relevant N-grams from the model. However, correct smoothing of the N-gram probability distributions is important and performance may degra...
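Interpolated Kneser–Ney smoothing, whose interaction with pruning the paper studies, can be sketched for the bigram case with a single absolute discount (toy corpus; all names are illustrative):

```python
from collections import Counter

def kn_bigram(tokens, D=0.75):
    """Interpolated Kneser-Ney bigram probabilities with one fixed discount D."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    hist = Counter(tokens[:-1])                      # history token counts
    n_following = Counter(h for (h, _) in bigrams)   # N1+(h,*): distinct continuations of h
    n_preceding = Counter(w for (_, w) in bigrams)   # N1+(*,w): distinct histories of w
    n_types = len(bigrams)                           # N1+(*,*): total bigram types
    vocab = sorted(set(tokens))

    def p(w, h):
        p_cont = n_preceding[w] / n_types            # continuation probability
        lam = D * n_following[h] / hist[h]           # mass freed by discounting
        return max(bigrams[(h, w)] - D, 0) / hist[h] + lam * p_cont

    return p, vocab

tokens = "the cat sat on the mat the cat ran".split()
p, vocab = kn_bigram(tokens)
```

The discounted mass is redistributed according to how many distinct contexts a word follows, rather than its raw frequency; this is the property that makes naive count pruning interact badly with the smoothed estimates.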

• Enhancing the Tracking of Partials for the Sinusoidal Modeling of Polyphonic Sounds

Publication Year: 2007, Page(s):1625 - 1634
Cited by:  Papers (10)

This paper addresses the problem of tracking partials, i.e., determining the evolution over time of the parameters of a given number of sinusoids with respect to the analyzed audio stream. We first show that the minimal frequency difference heuristic generally used to identify continuities between local maxima of successive short-time spectra can be successfully generalized using the linear predic...

• Joint High-Resolution Fundamental Frequency and Order Estimation

Publication Year: 2007, Page(s):1635 - 1644
Cited by:  Papers (50)

In this paper, we present a novel method for joint estimation of the fundamental frequency and order of a set of harmonically related sinusoids based on the multiple signal classification (MUSIC) estimation criterion. The presented method, termed HMUSIC, is shown to have an efficient implementation using fast Fourier transforms (FFTs). Furthermore, refined estimates can be obtained using a gradien...
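A much simpler relative of the harmonic estimation problem, harmonic summation over a grid of fundamental-frequency candidates, can be sketched as follows. This is not HMUSIC itself, only a baseline that exploits the same harmonic structure; the signal and all parameters are illustrative:

```python
import numpy as np

fs, f0_true, dur = 8000, 220.0, 0.5
t = np.arange(int(fs * dur)) / fs
# Harmonic test signal: fundamental plus a few harmonics, light noise.
x = sum(np.sin(2 * np.pi * k * f0_true * t) / k for k in range(1, 5))
x += 0.05 * np.random.default_rng(4).standard_normal(t.size)

nfft = 1 << 14
spec = np.abs(np.fft.rfft(x * np.hanning(x.size), nfft))

def harmonic_sum(f0, n_harm=4):
    """Sum spectral magnitude at the first n_harm harmonics of candidate f0."""
    idx = np.round(np.arange(1, n_harm + 1) * f0 * nfft / fs).astype(int)
    return spec[idx].sum()

candidates = np.arange(80.0, 400.0, 0.5)
f0_hat = candidates[np.argmax([harmonic_sum(f) for f in candidates])]
```

Scoring whole harmonic sets jointly, rather than single peaks, is what protects both this baseline and HMUSIC against octave errors; HMUSIC additionally estimates the model order via a subspace criterion.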

• Real-Time Signal Estimation From Modified Short-Time Fourier Transform Magnitude Spectra

Publication Year: 2007, Page(s):1645 - 1653
Cited by:  Papers (24)  |  Patents (1)

An algorithm for estimating signals from short-time magnitude spectra is introduced, offering a significant improvement in quality and efficiency over current methods. The key issue is how to invert a sequence of overlapping magnitude spectra (a "spectrogram") containing no phase information to generate a real-valued signal free of audible artifacts. Also important is that the algorithm per...

• Temporal Feature Integration for Music Genre Classification

Publication Year: 2007, Page(s):1654 - 1664
Cited by:  Papers (57)

Temporal feature integration is the process of combining all the feature vectors in a time window into a single feature vector in order to capture the relevant temporal information in the window. The mean and variance along the temporal dimension are often used for temporal feature integration, but they capture neither the temporal dynamics nor dependencies among the individual feature dimensions....
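The mean-variance baseline that the abstract contrasts against can be sketched as follows (synthetic feature frames and window length are illustrative; the paper proposes richer integration models that also capture temporal dynamics):

```python
import numpy as np

rng = np.random.default_rng(5)

# Short-time feature stream: 100 frames of 13-dimensional MFCC-like features.
frames = rng.standard_normal((100, 13))

def mean_var_integrate(F, win=10):
    """Stack per-window mean and variance into one vector per window."""
    n = F.shape[0] // win
    out = []
    for i in range(n):
        seg = F[i * win:(i + 1) * win]
        out.append(np.concatenate([seg.mean(axis=0), seg.var(axis=0)]))
    return np.array(out)

integrated = mean_var_integrate(frames)   # shape (n_windows, 2 * n_features)
```

Each window is summarized independently per dimension, so correlations across feature dimensions and the ordering of frames within the window are discarded, which is exactly the limitation the paper addresses.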

Publication Year: 2007, Page(s):1665 - 1680
Cited by:  Papers (26)

This paper indicates that an appropriate design of metric leads to significant improvements in the adaptive projected subgradient method (APSM), which unifies a wide range of projection-based algorithms [including normalized least mean square (NLMS) and affine projection algorithm (APA)]. The key is to incorporate a priori (or a posteriori) information on characteristics of an estimandum, a system...

• Selective-Tap Adaptive Filtering With Performance Analysis for Identification of Time-Varying Systems

Publication Year: 2007, Page(s):1681 - 1695
Cited by:  Papers (30)

Selective-tap algorithms employing the MMax tap selection criterion were originally proposed for low-complexity adaptive filtering. The concept has recently been extended to multichannel adaptive filtering and applied to stereophonic acoustic echo cancellation. This paper first briefly reviews least mean square versions of MMax selective-tap adaptive filtering and then introduces new recursive lea...
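The MMax tap-selection idea can be sketched for a single-channel NLMS filter identifying an unknown FIR system (a minimal illustration, not the paper's recursive least squares extension; filter length, step size, and system are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
L, n, m = 16, 4000, 4                         # filter length, samples, taps updated per step

h_true = rng.standard_normal(L) * np.exp(-0.3 * np.arange(L))  # unknown system
x = rng.standard_normal(n)                    # white excitation
d = np.convolve(x, h_true)[:n]                # desired signal (system output)

h = np.zeros(L)
mu, eps = 0.5, 1e-6
for k in range(L, n):
    xk = x[k:k - L:-1]                        # most recent L input samples, newest first
    e = d[k] - h @ xk
    # MMax selection: update only the m taps whose inputs have the largest magnitude.
    sel = np.argsort(np.abs(xk))[-m:]
    h[sel] += mu * e * xk[sel] / (eps + xk @ xk)

misalignment = np.linalg.norm(h - h_true) / np.linalg.norm(h_true)
```

Updating only the taps paired with the largest input magnitudes retains most of the update energy at a fraction of the cost, which is why the convergence penalty relative to full NLMS is small.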

• Short-Term Spatio–Temporal Clustering Applied to Multiple Moving Speakers

Publication Year: 2007, Page(s):1696 - 1710
Cited by:  Papers (10)

Distant microphones permit the processing of spontaneous multiparty speech with very few constraints on speakers, as opposed to close-talking microphones. Minimizing the constraints on speakers permits a large diversity of applications, including meeting summarization and browsing, surveillance, hearing aids, and more natural human-machine interaction. Such applications of distant microphones require ...

• Robust Speaker Recognition in Noisy Conditions

Publication Year: 2007, Page(s):1711 - 1723
Cited by:  Papers (95)  |  Patents (2)

This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise, but knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise a...

• Noise-Robust Automatic Speech Recognition Using a Predictive Echo State Network

Publication Year: 2007, Page(s):1724 - 1730
Cited by:  Papers (39)  |  Patents (2)

Artificial neural networks have been shown to perform well in automatic speech recognition (ASR) tasks, although their complexity and excessive computational costs have limited their use. Recently, a recurrent neural network with simplified training, the echo state network (ESN), was introduced by Jaeger and shown to outperform conventional methods in time series prediction experiments. We created...
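A minimal echo state network in the spirit of Jaeger's formulation can be sketched on a toy one-step-ahead prediction task (reservoir size, scalings, and the sine-prediction task are illustrative; the paper uses ESNs as predictive models inside a noise-robust ASR system):

```python
import numpy as np

rng = np.random.default_rng(7)
n_res, n_train, n_test = 100, 1000, 200

# Task: predict a noisy sine one step ahead.
t = np.arange(n_train + n_test + 1)
u = np.sin(0.2 * t) + 0.01 * rng.standard_normal(t.size)

# Random reservoir, rescaled to spectral radius < 1 (echo state property).
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, n_res)

# Drive the fixed reservoir with the input and collect its states.
states = np.zeros((t.size, n_res))
x = np.zeros(n_res)
for k in range(t.size):
    x = np.tanh(W @ x + W_in * u[k])
    states[k] = x

# Only the linear readout is trained, here by ridge regression.
washout = 100
A = states[washout:n_train]
y = u[washout + 1:n_train + 1]
w_out = np.linalg.solve(A.T @ A + 1e-6 * np.eye(n_res), A.T @ y)

pred = states[n_train:n_train + n_test] @ w_out
target = u[n_train + 1:n_train + n_test + 1]
nrmse = np.sqrt(np.mean((pred - target) ** 2)) / np.std(target)
```

Training reduces to one least-squares solve because the recurrent weights stay fixed, which is the "simplified training" the abstract refers to.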

• Comments on "Vocal Tract Length Normalization Equals Linear Transformation in Cepstral Space"

Publication Year: 2007, Page(s):1731 - 1732
Cited by:  Papers (5)

The bilinear transformation (BT) is used for vocal tract length normalization (VTLN) in speech recognition systems. We prove two properties of the bilinear mapping that motivated the band-diagonal transform proposed in M. Afify and O. Siohan ("Constrained maximum likelihood linear regression for speaker adaptation," in Proc. ICSLP, Beijing, China, Oct. 2000). This is in contrast to what is...
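The bilinear frequency warping underlying BT-based VTLN can be sketched as the phase of a first-order all-pass mapping; the warp factor 0.1 below is illustrative:

```python
import numpy as np

def bilinear_warp(omega, alpha):
    """Frequency warp of the bilinear (all-pass) map z -> (z - alpha)/(1 - alpha z)."""
    z = np.exp(1j * omega)
    return np.angle((z - alpha) / (1 - alpha * z))

w = np.linspace(0, np.pi, 200)       # normalized frequency grid
warped = bilinear_warp(w, 0.1)       # warped frequencies
```

For |alpha| < 1 the warp is a monotone bijection of [0, pi] onto itself, fixing 0 and pi, and it acts as a linear transformation on cepstral coefficients, which is the equivalence the commented paper and this comment discuss.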

• IEEE Transactions on Audio, Speech, and Language Processing Edics

Publication Year: 2007, Page(s):1733 - 1734
• IEEE Transactions on Audio, Speech, and Language Processing Information for authors

Publication Year: 2007, Page(s):1735 - 1736

Aims & Scope

IEEE Transactions on Audio, Speech, and Language Processing covers the sciences, technologies, and applications relating to the analysis, coding, enhancement, recognition, and synthesis of audio, music, speech, and language.

This Transactions ceased publication in 2013. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.


Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research