IEEE Transactions on Audio, Speech, and Language Processing

Issue 2 • Feb. 2013

Displaying Results 1 - 25 of 32
  • [Front cover]

    Publication Year: 2013 , Page(s): C1
    PDF (216 KB)
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2013 , Page(s): C2
    PDF (130 KB)
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2013 , Page(s): 221 - 222
    PDF (216 KB)
    Freely Available from IEEE
  • Sentence-Based Sentiment Analysis for Expressive Text-to-Speech

    Publication Year: 2013 , Page(s): 223 - 233
    Cited by:  Papers (5)
    PDF (943 KB) | HTML

    Current research to improve state-of-the-art Text-To-Speech (TTS) synthesis studies both the processing of input text and the ability to render natural expressive speech. Focusing on the former as a front-end task in the production of synthetic speech, this article investigates the proper adaptation of a Sentiment Analysis procedure (positive/neutral/negative) that can then be used as an input feature for expressive speech synthesis. To this end, we evaluate different combinations of textual features and classifiers to determine the most appropriate adaptation procedure. The effectiveness of this scheme for Sentiment Analysis is evaluated using the Semeval 2007 dataset and a Twitter corpus, chosen for their affective nature and their granularity at the sentence level, which is appropriate for an expressive TTS scenario. The experiments conducted validate the proposed procedure with respect to the state of the art for Sentiment Analysis.
  • TDOA-Based Speed of Sound Estimation for Air Temperature and Room Geometry Inference

    Publication Year: 2013 , Page(s): 234 - 246
    Cited by:  Papers (1)
    PDF (2738 KB) | HTML

    Spatially distributed acoustic sensors are finding increasingly many applications in speech-based human-machine interfaces. One well-researched topic is the localization of sound sources from Time Difference Of Arrival (TDOA) measurements. Typically, the propagation speed of sound is considered a known constant. However, due to temperature variations, its value is known only up to some uncertainty. This paper exploits TDOA-based localization techniques to accurately estimate the actual speed of sound. Experimental results using both simulated and real data demonstrate the feasibility of the proposed method. Furthermore, the practical validation of this work considers two distinct experiments aimed at inferring information about enclosed sound fields. The first experiment concerns the calculation of the air temperature from the estimated speed of sound. The second experiment highlights the effects of temperature variations on the inference of the physical location of reflective boundaries of the acoustic enclosure. In the latter case, it is shown that the position estimates of the reflective surfaces in a room can be improved when the correct propagation speed is first estimated using this method.
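The temperature inference in the first experiment rests on the standard dry-air relation between air temperature and sound speed; a minimal sketch of that relation and its inverse (the 331.3 m/s reference value is the usual textbook constant, not a figure from the paper):

```python
import math

def speed_of_sound(temp_c):
    """Ideal-gas approximation to the speed of sound in dry air (m/s)."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)

def air_temperature(c):
    """Invert the relation: recover air temperature (deg C) from speed (m/s)."""
    return 273.15 * ((c / 331.3) ** 2 - 1.0)

c20 = speed_of_sound(20.0)   # roughly 343 m/s at 20 deg C
t20 = air_temperature(c20)   # round-trips back to 20 deg C
```

Once TDOA-based localization yields an accurate speed estimate, the second function maps it directly to a temperature reading.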
  • Sound Field Reproduction of a Virtual Source Inside a Loudspeaker Array With Minimal External Radiation

    Publication Year: 2013 , Page(s): 247 - 259
    PDF (3858 KB) | HTML

    We derive an integral equation for reproducing the sound field of a virtual source inside an array of loudspeakers with reduced radiation to the outside. Reproduction of a sound field over a finite interior region inevitably generates sound waves that propagate outside the region. This undesirable radiation is reflected from walls and can induce artifacts in the interior region. In principle, the Kirchhoff-Helmholtz (KH) integral can be used to reproduce the interior sound field from an exterior virtual source without any external radiation. However, if there is a virtual source inside the array, the integral formula does not explicitly demonstrate how one can reproduce the sound field or minimize the external radiation. In this work, we derive an explicit formula for reproducing a sound field with minimal external radiation when a virtual source is located inside a loudspeaker array. The theory shows that external radiation can be effectively reduced without solving any inverse problem. The proposed formula follows the form of the KH integral and thus requires monopole and dipole sources. Although dipole sources are difficult to build in practice, the theory predicts that sound field reproduction with minimal external radiation is possible and that the room dependency of the sound field reproduction system can be decreased.
  • Graph-Based Query Strategies for Active Learning

    Publication Year: 2013 , Page(s): 260 - 269
    PDF (1912 KB) | HTML

    This paper proposes two new graph-based query strategies for active learning in a framework that is convenient to combine with semi-supervised learning based on label propagation. The first strategy selects instances independently so as to maximize the change to a maximum entropy model, using label propagation results in a gradient-length measure of model change. The second strategy involves a batch criterion that integrates label uncertainty with diversity and density objectives. Experiments on sentiment classification demonstrate that both methods consistently improve over a standard active learning baseline, and that the batch criterion also gives consistent improvement over semi-supervised learning alone.
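The semi-supervised label propagation that both query strategies build on can be sketched generically as follows; the graph, edge weights, clamped seed labels, and 0.5 prior below are illustrative assumptions, not the paper's actual sentiment graph:

```python
def label_propagation(adj, labels, n_iter=50):
    """Propagate scores over a weighted graph.

    adj:    dict node -> {neighbor: weight}
    labels: dict of seed node -> score in [0, 1]; seeds stay clamped.
    Unlabeled nodes repeatedly take the weighted mean of their neighbors.
    """
    scores = {n: labels.get(n, 0.5) for n in adj}
    for _ in range(n_iter):
        for n in adj:
            if n in labels:                      # clamp seed labels
                continue
            den = sum(adj[n].values())
            if den:
                scores[n] = sum(w * scores[m] for m, w in adj[n].items()) / den
    return scores

# Tiny chain a - b - c with opposite-polarity seeds at the ends.
adj = {'a': {'b': 1.0}, 'b': {'a': 1.0, 'c': 1.0}, 'c': {'b': 1.0}}
scores = label_propagation(adj, {'a': 1.0, 'c': 0.0})
```

Nodes whose propagated score stays near 0.5 are exactly the uncertain instances an active-learning query strategy would target.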
  • Exploring Monaural Features for Classification-Based Speech Segregation

    Publication Year: 2013 , Page(s): 270 - 279
    Cited by:  Papers (17)
    PDF (1220 KB) | HTML

    Monaural speech segregation has been a very challenging problem for decades. By casting speech segregation as a binary classification problem, recent advances have been made in computational auditory scene analysis on segregation of both voiced and unvoiced speech. So far, pitch and amplitude modulation spectrogram have been used as the two main kinds of time-frequency (T-F) unit-level features in classification. In this paper, we expand T-F unit features to include gammatone frequency cepstral coefficients (GFCC), mel-frequency cepstral coefficients, relative spectral transform (RASTA) and perceptual linear prediction (PLP). Comprehensive comparisons are performed in order to identify effective features for classification-based speech segregation. Our experiments in matched and unmatched test conditions show that these newly included features significantly improve speech segregation performance. Specifically, GFCC and RASTA-PLP are the best single features in matched-noise and unmatched-noise test conditions, respectively. We also find that pitch-based features are crucial for good generalization to unseen environments. To further explore complementarity in terms of discriminative power, we propose to use a group Lasso approach to select complementary features in a principled way. The final combined feature set yields promising results in both matched and unmatched test conditions.
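The group Lasso selection mentioned in the abstract hinges on block soft-thresholding, which shrinks each feature group's coefficient vector as a unit and zeroes out (deselects) whole groups at once; a sketch of that proximal step (group names and values are illustrative):

```python
import math

def group_soft_threshold(groups, lam):
    """Proximal operator of the group-Lasso penalty: each group's coefficient
    vector is shrunk by lam in Euclidean norm; groups whose norm falls below
    lam are zeroed out entirely (i.e., the feature group is deselected)."""
    out = {}
    for name, coefs in groups.items():
        norm = math.sqrt(sum(c * c for c in coefs))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        out[name] = [scale * c for c in coefs]
    return out

# A strong group survives (shrunk); a weak group is dropped wholesale.
shrunk = group_soft_threshold({'gfcc': [3.0, 4.0], 'weak': [0.3, 0.4]}, lam=1.0)
```

Selecting or discarding features group-by-group, rather than coefficient-by-coefficient, is what makes the selection "principled" for grouped T-F features.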
  • A Unified Trajectory Tiling Approach to High Quality Speech Rendering

    Publication Year: 2013 , Page(s): 280 - 290
    Cited by:  Papers (4)
    PDF (2242 KB) | HTML

    It is technically challenging to make a machine talk as naturally as a human so as to facilitate “frictionless” interactions between machine and human. We propose a trajectory tiling-based approach to high-quality speech rendering, where speech parameter trajectories, extracted from natural, processed, or synthesized speech, are used to guide the search for the best sequence of waveform “tiles” stored in a pre-recorded speech database. We test the proposed unified algorithm in both Text-To-Speech (TTS) synthesis and cross-lingual voice transformation applications. Experimental results show that the proposed trajectory tiling approach can render speech which is both natural and highly intelligible. The perceived high quality of rendered speech is also confirmed in both objective and subjective evaluations.
  • Classification and Ranking Approaches to Discriminative Language Modeling for ASR

    Publication Year: 2013 , Page(s): 291 - 300
    Cited by:  Papers (3)
    PDF (1461 KB) | HTML

    Discriminative language modeling (DLM) is a feature-based approach that is used as an error-correcting step after hypothesis generation in automatic speech recognition (ASR). We formulate this both as a classification and a ranking problem and employ the perceptron, the margin infused relaxed algorithm (MIRA) and the support vector machine (SVM). To decrease training complexity, we try count-based thresholding for feature selection and data sampling from the list of hypotheses. On a Turkish morphology-based feature set, we examine the use of first- and higher-order n-grams and present an extensive analysis of the complexity and accuracy of the models with an emphasis on statistical significance. We find that feature selection and data sampling yield significant computational savings without significant loss in accuracy. Using MIRA or the SVM does not lead to any further improvement over the perceptron, but the use of ranking as opposed to classification leads to a 0.4% reduction in word error rate (WER), which is statistically significant.
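The perceptron used for discriminative reranking can be sketched as a single update step: move the weights toward the oracle (lowest-WER) hypothesis and away from the hypothesis the current model ranks first. The feature dictionaries below are illustrative stand-ins for the paper's morphology-based n-gram features:

```python
def score(weights, feats):
    """Linear score of a hypothesis under the current weight vector."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def perceptron_rerank_update(weights, best_hyp_feats, oracle_feats):
    """One perceptron step: reward the oracle hypothesis's features and
    penalize those of the currently top-ranked (erroneous) hypothesis.
    Features shared by both hypotheses cancel out."""
    for f, v in oracle_feats.items():
        weights[f] = weights.get(f, 0.0) + v
    for f, v in best_hyp_feats.items():
        weights[f] = weights.get(f, 0.0) - v
    return weights

w = {}
oracle     = {'ng:a b': 1.0, 'ng:b': 1.0}   # lowest-WER hypothesis features
competitor = {'ng:a c': 1.0, 'ng:b': 1.0}   # model's current top hypothesis
perceptron_rerank_update(w, competitor, oracle)
```

After the update, the oracle hypothesis outscores the competitor, which is the error-correcting behavior DLM aims for.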
  • A Variable Step-Size FXLMS Algorithm for Narrowband Active Noise Control

    Publication Year: 2013 , Page(s): 301 - 312
    Cited by:  Papers (4)
    PDF (2093 KB) | HTML

    In this paper, a variable step-size filtered-x LMS (VSS-FXLMS) algorithm is proposed for a typical narrowband active noise control system. The new algorithm converges much faster than the conventional FXLMS algorithm and exhibits a convergence rate quite similar to that of the filtered-x recursive least squares (FXRLS) algorithm in stationary noise environments. It also considerably outperforms these two existing algorithms in nonstationary situations. The proposed algorithm requires somewhat more computation than the FXLMS algorithm; however, its computational complexity is significantly less than that of the FXRLS algorithm. Numerous simulations for stationary and nonstationary scenarios are conducted to demonstrate the superior performance of the proposed VSS-FXLMS algorithm as compared with the FXLMS and FXRLS algorithms.
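As background, the conventional fixed step-size FXLMS loop that the proposed VSS rule builds on can be sketched as follows. The paper's variable step-size rule itself is not reproduced here; the sinusoidal signals, the identity secondary path, and all parameter values below are illustrative assumptions:

```python
import math

def fxlms(ref, primary, s_hat, mu, n_taps):
    """Minimal fixed step-size FXLMS: adapt FIR weights w so that the
    reference, filtered by w and then by the secondary path, cancels the
    primary disturbance. s_hat holds the secondary-path FIR estimate."""
    w = [0.0] * n_taps
    xh = [0.0] * n_taps            # reference history
    xfh = [0.0] * n_taps           # filtered-reference history
    yh = [0.0] * len(s_hat)        # anti-noise history
    errors = []
    for x, d in zip(ref, primary):
        xh = [x] + xh[:-1]
        y = sum(wi * xi for wi, xi in zip(w, xh))          # anti-noise sample
        yh = [y] + yh[:-1]
        e = d - sum(si * yi for si, yi in zip(s_hat, yh))  # residual at error mic
        xf = sum(si * xi for si, xi in zip(s_hat, xh))     # reference through s_hat
        xfh = [xf] + xfh[:-1]
        for i in range(n_taps):                            # LMS step on filtered x
            w[i] += mu * e * xfh[i]
        errors.append(e)
    return w, errors

# Narrowband toy case: cancel a scaled sinusoid; identity secondary path.
ref = [math.sin(0.3 * k) for k in range(500)]
primary = [0.8 * r for r in ref]
w, errors = fxlms(ref, primary, s_hat=[1.0], mu=0.05, n_taps=4)
```

The VSS idea in the paper is to replace the fixed `mu` with a time-varying step size to speed up convergence and track nonstationary noise.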
  • New Family of Wave-Digital Triode Models

    Publication Year: 2013 , Page(s): 313 - 321
    PDF (1644 KB) | HTML

    A new family of wave-digital vacuum tube triode models is presented. These models are inspired by the triode model by Cardarilli, which provides realistic simulation of the triode's transconductance behavior, and hence high accuracy in saturation conditions. The triode is modeled as a single memoryless nonlinear three-port wave digital filter element in which the outgoing wave variables are computed by locally applying the one-dimensional secant method to one or two port voltages, depending on whether the grid current effect is taken into account. The proposed algorithms were found to produce a richer static harmonic response, introducing comparable or less aliasing and requiring approximately 50% less CPU time than previous models. The proposed models are suitable for real-time virtual analog circuit simulation.
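The per-sample root-finding step can be sketched with a generic one-dimensional secant iteration; the quadratic test function below is only a stand-in for the triode's implicit port-voltage equation:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=60):
    """One-dimensional secant iteration: approximate the derivative with a
    finite difference through the last two iterates and step to the root of
    the resulting secant line."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:                 # flat secant line: cannot step further
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, x1 = x1, x2
    return x1

# Toy equation standing in for the implicit port relation: v^2 - 2 = 0.
root = secant(lambda v: v * v - 2.0, 1.0, 2.0)
```

Unlike Newton's method, the secant method needs no analytic derivative of the nonlinearity, which is convenient for tabulated or fitted tube curves.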
  • Single-Microphone Early and Late Reverberation Suppression in Noisy Speech

    Publication Year: 2013 , Page(s): 322 - 335
    Cited by:  Papers (4)
    PDF (3377 KB) | HTML

    This paper presents a single-microphone approach to the enhancement of noisy reverberant speech via inverse filtering and spectral processing. An efficient algorithm is used to blindly estimate the inverse filter of the Room Impulse Response (RIR). This filter is used to attenuate the early reverberation. A simple technique to blindly determine the filter length is presented. A two-step spectral subtraction method is proposed to efficiently reduce the effects of background noise and the residual reverberation on the equalized impulse response. In general, the equalized impulse response has two detrimental effects: late impulses and pre-echoes. For the late impulses, an efficient spectral subtraction algorithm is developed which introduces only minor musical noise. Then a new algorithm is introduced which reduces the remaining pre-echo effects. The performance of this two-stage method is examined in different reverberant conditions, including real environments. It is also evaluated with white Gaussian and recorded babble noise. The results obtained demonstrate that the proposed blind method is superior in reducing early and late reverberation effects and noise compared to well-known single-microphone techniques in the literature.
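Spectral subtraction operates per time-frequency bin: subtract a (usually over-weighted) noise estimate from the frame magnitude and clamp to a spectral floor to limit musical noise. A minimal magnitude-domain sketch; the `alpha`/`beta` defaults are common illustrative choices, not the paper's tuned parameters:

```python
def spectral_subtract(frame_mag, noise_mag, alpha=2.0, beta=0.01):
    """Per-bin magnitude spectral subtraction.

    alpha: over-subtraction factor applied to the noise estimate.
    beta:  spectral floor, kept as a fraction of the noisy magnitude so
           no bin is driven fully to zero (reduces musical noise)."""
    return [max(m - alpha * n, beta * m) for m, n in zip(frame_mag, noise_mag)]

# One bin survives subtraction; the other is clamped to its floor.
cleaned = spectral_subtract([1.0, 0.5], [0.2, 0.4])
```

The paper applies this kind of processing twice, once against the late-reverberation residual and once against the remaining pre-echoes.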
  • A Statistical Analysis of Two-Channel Post-Filter Estimators in Isotropic Noise Fields

    Publication Year: 2013 , Page(s): 336 - 342
    Cited by:  Papers (1)
    PDF (2098 KB) | HTML

    This paper derives explicit expressions for the probability density functions of two-channel post-filter estimators in isotropic noise fields in order to study their statistical properties. Based on the analysis results, three methods are proposed to improve the performance of the noise field coherence (NFC)-based post-filter estimator.
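The NFC-based post-filter relies on the closed-form coherence of an ideal spherically isotropic (diffuse) noise field between two omnidirectional microphones, the textbook sinc expression sketched here:

```python
import math

def diffuse_coherence(freq_hz, mic_dist_m, c=343.0):
    """Coherence of an ideal spherically isotropic (diffuse) noise field
    between two omnidirectional microphones: sinc(2*pi*f*d/c).
    Fully coherent at DC, with nulls at multiples of f = c / (2 d)."""
    x = 2.0 * math.pi * freq_hz * mic_dist_m / c
    return math.sin(x) / x if x != 0.0 else 1.0

g_low = diffuse_coherence(100.0, 0.05)            # near 1 at low frequency
g_null = diffuse_coherence(343.0 / (2 * 0.05), 0.05)  # first null
```

Deviations of the measured coherence from this model are what the post-filter estimators exploit to separate speech from diffuse noise.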
  • Distributed Multiple Constraints Generalized Sidelobe Canceler for Fully Connected Wireless Acoustic Sensor Networks

    Publication Year: 2013 , Page(s): 343 - 356
    Cited by:  Papers (3)
    PDF (3330 KB) | HTML

    This paper proposes a distributed multiple constraints generalized sidelobe canceler (GSC) for speech enhancement in an N-node fully connected wireless acoustic sensor network (WASN) comprising M microphones. Our algorithm is designed to operate in reverberant environments with constrained speakers (including both desired and competing speakers). Rather than broadcasting M microphone signals, a significant communication bandwidth reduction is obtained by performing local beamforming at the nodes, and utilizing only transmission channels. Each node processes its own microphone signals together with the N + P transmitted signals. The GSC-form implementation, by separating the constraints and the minimization, enables the adaptation of the beamformer (BF) during speech-absent time segments, and relaxes the requirement of other distributed LCMV-based algorithms to re-estimate the sources' RTFs after each iteration. We provide a full convergence proof of the proposed structure to the centralized GSC-BF. An extensive experimental study of both narrowband and wideband speech signals verifies the theoretical analysis.
  • Learning Lexicons From Speech Using a Pronunciation Mixture Model

    Publication Year: 2013 , Page(s): 357 - 366
    Cited by:  Papers (1)
    PDF (1077 KB) | HTML

    In many ways, the lexicon remains the Achilles' heel of modern automatic speech recognizers. Unlike stochastic acoustic and language models that learn the values of their parameters from training data, the baseform pronunciations of words in a recognizer's lexicon are typically specified manually and do not change unless they are edited by an expert. Our work presents a novel generative framework that uses speech data to learn stochastic lexicons, thereby taking a step towards alleviating the need for manual intervention and automatically learning high-quality pronunciations for words. We test our model on continuous speech in a weather information domain. In our experiments, we see significant improvements over a manually specified “expert-pronunciation” lexicon. We then analyze variations of the parameter settings used to achieve these gains.
  • Image Feature Representation of the Subband Power Distribution for Robust Sound Event Classification

    Publication Year: 2013 , Page(s): 367 - 377
    Cited by:  Papers (4)
    PDF (1567 KB) | HTML

    The ability to automatically recognize a wide range of sound events in real-world conditions is an important part of applications such as acoustic surveillance and machine hearing. Our approach takes inspiration from both audio and image processing fields, and is based on transforming the sound into a two-dimensional representation, then extracting an image feature for classification. This provided the motivation for our previous work on the spectrogram image feature (SIF). In this paper, we propose a novel method to improve sound event classification performance in severe mismatched noise conditions. It is based on the subband power distribution (SPD) image - a novel two-dimensional representation that characterizes the spectral power distribution over time in each frequency subband. Here, the high-powered, reliable elements of the spectrogram are transformed to a localized region of the SPD and hence can easily be separated from the noise. We then extract an image feature from the SPD, using the same approach as for the SIF, and develop a novel missing-feature classification approach based on a k-nearest neighbor (kNN) classifier. We carry out comprehensive experiments on a database of 50 environmental sound classes over a range of challenging noise conditions. The results demonstrate that the SPD-IF is both discriminative over the broad range of sound classes and robust in severe non-stationary noise.
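The SPD image construction can be sketched as a per-subband histogram over time: each row of the image is the distribution of that subband's power values across frames. Bin count and power range below are illustrative assumptions:

```python
def subband_power_distribution(spectrogram, n_bins=10, lo=0.0, hi=1.0):
    """Build an SPD image: for each frequency subband (a row of the
    spectrogram, power values over time), histogram the values into n_bins
    over [lo, hi) and normalize by the number of frames."""
    spd = []
    for band in spectrogram:
        counts = [0] * n_bins
        for p in band:
            idx = min(int((p - lo) / (hi - lo) * n_bins), n_bins - 1)
            counts[idx] += 1
        spd.append([c / len(band) for c in counts])
    return spd

# Two subbands, two frames each: a bimodal band and a concentrated band.
spd = subband_power_distribution([[0.05, 0.95], [0.5, 0.5]], n_bins=2)
```

High-powered, reliable spectrogram elements pile up in the upper histogram bins, which is what localizes them away from the noise in the SPD.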
  • Generation of Isolated Wideband Sound Fields Using a Combined Two-stage Lasso-LS Algorithm

    Publication Year: 2013 , Page(s): 378 - 387
    Cited by:  Papers (2)
    PDF (2678 KB) | HTML

    The prohibitive number of speakers required for the reproduction of isolated sound fields is the major limitation preventing solution deployment. This paper addresses the provision of personal sound fields (zones) to multiple listeners using a limited number of speakers, with an underlying assumption of fixed virtual sources. For such multizone systems, optimization of speaker positions and weightings is important to reduce the number of active speakers. Typically, single-stage optimization is performed, but in this paper a new two-stage pressure-matching optimization is proposed for wideband sound sources. In the first stage, the least-absolute shrinkage and selection operator (Lasso) is used to select the speakers' positions for all sources and frequency bands. A second stage then optimizes reproduction using all selected speakers on the basis of a regularized least-squares (LS) algorithm. The performance of the new two-stage approach is investigated for different reproduction angles, frequency ranges and total speaker weight powers. The results demonstrate that two-stage Lasso-LS optimization can give up to 69 dB improvement in mean squared error (MSE) over a single-stage LS approach in the reproduction of two isolated audio signals within control zones using, e.g., 84 speakers.
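The point of the second LS stage is debiasing: the Lasso shrinks whatever it selects, so refitting the selected speakers without the penalty recovers unshrunk weights. Under a deliberately simplified orthonormal-design assumption (the paper instead solves a full Lasso and a regularized LS problem over loudspeaker weights), the select-then-refit idea reduces to:

```python
def select_then_refit(ls_coefs, lam):
    """Stage 1 (selection): with an orthonormal design, the Lasso keeps
    exactly the least-squares coefficients exceeding lam in magnitude.
    Stage 2 (refit): re-estimate the kept coefficients without shrinkage -
    here, trivially, the original LS values - removing the Lasso's bias."""
    support = [i for i, b in enumerate(ls_coefs) if abs(b) > lam]
    refit = [b if i in support else 0.0 for i, b in enumerate(ls_coefs)]
    return support, refit

# Speaker 1 is deselected; speakers 0 and 2 are refit at full LS strength.
support, refit = select_then_refit([3.0, 0.1, -2.0], lam=0.5)
```

In the multizone setting, stage 1 decides which loudspeakers stay active across all sources and bands, and stage 2 computes their final driving weights.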
  • The Deep Tensor Neural Network With Applications to Large Vocabulary Speech Recognition

    Publication Year: 2013 , Page(s): 388 - 396
    Cited by:  Papers (7)
    PDF (2001 KB) | HTML

    The recently proposed context-dependent deep neural network hidden Markov models (CD-DNN-HMMs) have proved highly promising for large vocabulary speech recognition. In this paper, we develop a more advanced type of DNN, which we call the deep tensor neural network (DTNN). The DTNN extends the conventional DNN by replacing one or more of its layers with a double-projection (DP) layer, in which each input vector is projected into two nonlinear subspaces, and a tensor layer, in which the two subspace projections interact with each other and jointly predict the next layer in the deep architecture. In addition, we describe an approach to map the tensor layers to conventional sigmoid layers so that the former can be treated and trained in a similar way to the latter. With this mapping we can consider a DTNN as a DNN augmented with DP layers, so that not only can the backpropagation (BP) learning algorithm of DTNNs be cleanly derived but also new types of DTNNs can be more easily developed. Evaluation on Switchboard tasks indicates that DTNNs can outperform the already high-performing DNNs, with 4-5% and 3% relative word error reduction using the 30-hr and 309-hr training sets, respectively.
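The double-projection-plus-tensor construction can be sketched in miniature: the same input is projected into two subspaces, which then interact through a third-order weight tensor to produce the next layer's activations. Dimensions and weights below are toy values, not the paper's architecture:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dp_tensor_layer(x, p1, p2, t):
    """Double-projection layer followed by a tensor layer.

    p1, p2: projection matrices (lists of rows) mapping x into two
            nonlinear subspaces h1 and h2.
    t:      third-order weight tensor; t[k][i][j] couples h1[i] and h2[j]
            when producing output unit k."""
    h1 = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in p1]
    h2 = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in p2]
    return [sigmoid(sum(h1[i] * t_k[i][j] * h2[j]
                        for i in range(len(h1)) for j in range(len(h2))))
            for t_k in t]

# Smallest possible instance: 1-d input, 1-d subspaces, single output unit.
out = dp_tensor_layer([1.0], [[0.0]], [[0.0]], [[[0.0]]])
out2 = dp_tensor_layer([2.0], [[1.0]], [[0.5]], [[[4.0]]])
```

The multiplicative h1-h2 interaction is what distinguishes the tensor layer from a plain sigmoid layer acting on a concatenation of the two projections.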
  • Privacy-Preserving Speaker Verification and Identification Using Gaussian Mixture Models

    Publication Year: 2013 , Page(s): 397 - 406
    Cited by:  Papers (1)
    PDF (1058 KB) | HTML

    Speech, being a unique characteristic of an individual, is widely used in speaker verification and speaker identification tasks in applications such as authentication and surveillance, respectively. In this article, we present frameworks for privacy-preserving speaker verification and speaker identification systems, where the system is able to perform the necessary operations without being able to observe the speech input provided by the user. In a speech-based authentication setting, this privacy constraint protects against an adversary who can break into the system and use the speech models to impersonate legitimate users. In surveillance applications, we require the system to first identify if the speech recording belongs to a suspect while preserving the privacy constraints. This prevents the system from listening in on conversations of innocent individuals. In this paper we formalize the privacy criteria for the speaker verification and speaker identification problems and construct Gaussian mixture model-based protocols. We also report experiments with a prototype implementation of the protocols on a standardized dataset, measuring execution time and accuracy.
  • Evaluating the Generalization of the Hearing Aid Speech Quality Index (HASQI)

    Publication Year: 2013 , Page(s): 407 - 415
    PDF (1332 KB) | HTML

    Many developers of audio signal processing strategies rely on objective measures of quality for initial evaluations of algorithms. As such, objective measures should be robust, and they should be able to predict quality accurately regardless of the dataset or testing conditions. Kates and Arehart have developed the Hearing Aid Speech Quality Index (HASQI) to predict the effects of noise, nonlinear distortion, and linear filtering on speech quality for both normal-hearing and hearing-impaired listeners, and they report very high performance with their training and testing datasets [Kates, J. and Arehart, K., J. Audio Eng. Soc., 58(5), 363-381 (2010)]. In order to investigate the generalizability of HASQI, we test its ability to predict normal-hearing listeners' subjective quality ratings of a dataset on which it was not trained. This dataset is designed specifically to contain a wide range of distortions introduced by real-world noises which have been processed by some of the most common noise suppression algorithms in hearing aids. We show that HASQI achieves prediction performance comparable to the Perceptual Evaluation of Speech Quality (PESQ), the standard for objective measures of quality, as well as some of the other measures in the literature. Furthermore, we identify areas of weakness and show that training can improve quantitative prediction.
  • A Multistream Feature Framework Based on Bandpass Modulation Filtering for Robust Speech Recognition

    Publication Year: 2013 , Page(s): 416 - 426
    Cited by:  Papers (3)
    PDF (1243 KB) | HTML

    There is strong neurophysiological evidence suggesting that the processing of speech signals in the brain happens along parallel paths which encode complementary information in the signal. These parallel streams are organized around a duality of slow vs. fast: coarse signal dynamics appear to be processed separately from rapidly changing modulations, both in the spectral and temporal dimensions. We adapt this duality in a multistream framework for robust speaker-independent phoneme recognition. The scheme presented here centers around a multi-path bandpass modulation analysis of speech sounds, with each stream covering an entire range of temporal and spectral modulations. By performing bandpass operations along the spectral and temporal dimensions, the proposed scheme avoids the classic feature-explosion problem of previous multistream approaches while maintaining the advantages of parallelism and localized feature analysis. The proposed architecture results in substantial improvements over standard and state-of-the-art feature schemes for phoneme recognition, particularly in the presence of nonstationary noise, reverberation and channel distortions.
  • Design of Steerable Spherical Broadband Beamformers With Flexible Sensor Configurations

    Publication Year: 2013 , Page(s): 427 - 438
    Cited by:  Papers (1)
    PDF (4039 KB) | HTML

    In broadband beamformer applications with dynamically moving sources, it can be important to have a simple mechanism to steer the main beam. It can also be desirable for the beampattern of the beamformer to be invariant to the look direction. A number of design methods for such beamformers, based on the spherical harmonic transform, have been reported in the literature. However, these methods require the sensor positions to satisfy a certain condition which may conflict with practical considerations. This paper proposes a design method which obviates this restriction, thus allowing for spherical arrays with arbitrary sensor configurations. Moreover, for a comparable level of performance and computational complexity to the existing spherical harmonic beamformers, the proposed beamformer requires fewer sensors. The trade-off is that the design of the beamformer now depends on the sensor positions. Other considerations, such as the effects of array mis-orientation and robustness, are also discussed in this paper and illustrated by design examples.
  • Acoustic Source Localization With Distributed Asynchronous Microphone Networks

    Publication Year: 2013 , Page(s): 439 - 443
    Cited by:  Papers (5)
    PDF (863 KB) | HTML

    We propose a method for localizing an acoustic source with distributed microphone networks. Time Differences of Arrival (TDOAs) of signals pertaining to the same sensor are estimated through Generalized Cross-Correlation. After a TDOA filtering stage that discards potentially unreliable measurements, source localization is performed by minimizing a fourth-order polynomial that combines hyperbolic constraints from multiple sensors. The algorithm turns out to exhibit a significantly lower computational cost than state-of-the-art techniques, while retaining excellent localization accuracy in fairly reverberant conditions.
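TDOA estimation by cross-correlation can be sketched directly in the time domain: the estimated delay is the lag that maximizes the correlation between the two microphone signals. The paper uses Generalized Cross-Correlation (a frequency-domain weighted variant); this unweighted version is a simplified stand-in:

```python
def estimate_tdoa(sig_a, sig_b, max_lag):
    """Estimate the TDOA (in samples) between two microphone signals as the
    lag maximizing their cross-correlation. A positive result means sig_b is
    a delayed copy of sig_a."""
    n = len(sig_a)
    best_lag, best_val = 0, float('-inf')
    for lag in range(-max_lag, max_lag + 1):
        v = sum(sig_a[i] * sig_b[i + lag]
                for i in range(max(0, -lag), min(n, n - lag)))
        if v > best_val:
            best_lag, best_val = lag, v
    return best_lag

# Impulse at sample 5 in mic a, at sample 8 in mic b: a 3-sample delay.
a = [0.0] * 16
b = [0.0] * 16
a[5] = 1.0
b[8] = 1.0
lag = estimate_tdoa(a, b, max_lag=6)
```

Each estimated TDOA constrains the source to a hyperbola; the paper's contribution is combining such constraints into a single fourth-order polynomial minimization.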
  • A Tuning-Less Approach in Secondary Path Modeling in Active Noise Control Systems

    Publication Year: 2013 , Page(s): 444 - 448
    PDF (616 KB) | HTML

    This correspondence proposes a new approach to the problem of secondary path estimation in active noise control (ANC) systems, aimed at avoiding any tuning during system setup. To meet this target, adaptation rules are proposed which have no parameters to be tuned and whose behavior can be predicted with great accuracy, because analytical expressions of their modeling mean square relative error curves are given. Computer simulations show that their performance is comparable with that obtained by using optimal variable step size (VSS) versions of the least mean square (LMS) algorithm.

Aims & Scope

IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

 

This Transactions ceased publication in 2013. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.


Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research