IEEE Transactions on Audio, Speech, and Language Processing

Issue 9 • November 2012

  • Table of Contents

    Publication Year: 2012, Page(s): C1 - C4
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2012, Page(s): C2
  • A Generalized Directional Laplacian Distribution: Estimation, Mixture Models and Audio Source Separation

    Publication Year: 2012, Page(s): 2397 - 2408
    Cited by: Papers (5)

    Directional or circular statistics pertain to the analysis and interpretation of directions or rotations. In this work, a novel probability distribution is proposed to model multidimensional sparse directional data. The Generalized Directional Laplacian Distribution (DLD) is a hybrid between the Laplacian distribution and the von Mises-Fisher distribution. The distribution's parameters are estimated using maximum-likelihood estimation over a set of training data points. Mixtures of Directional Laplacian Distributions (MDLD) are also introduced to model multiple concentrations of sparse directional data. The author explores the application of the derived DLD mixture model to cluster the sound sources that exist in an underdetermined instantaneous sound mixture. The proposed model can solve the general K × L (K < L) underdetermined instantaneous source separation problem, offering a fast and stable solution.

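    For background, the von Mises-Fisher density that the proposed hybrid draws on has the standard textbook form (general context, not the paper's DLD itself):

    ```latex
    f_p(\mathbf{x};\boldsymbol{\mu},\kappa)
      = C_p(\kappa)\,\exp\!\big(\kappa\,\boldsymbol{\mu}^{\top}\mathbf{x}\big),
    \qquad
    C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)},
    ```

    where \mathbf{x} is a unit vector in \mathbb{R}^p, \boldsymbol{\mu} is the mean direction, \kappa \ge 0 is the concentration parameter, and I_\nu denotes the modified Bessel function of the first kind. Per the abstract, the paper's hybrid combines this directional form with a Laplacian-style heavy-tailed profile to favor sparse directional data.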
  • Analysis of Extended Baum–Welch and Constrained Optimization for Discriminative Training of HMMs

    Publication Year: 2012, Page(s): 2409 - 2419
    Cited by: Papers (1)

    Discriminative training is an essential part of building a state-of-the-art speech recognition system. The Extended Baum–Welch (EBW) algorithm is the most popular method for carrying out this demanding large-scale optimization task. This paper presents a novel analysis of the EBW algorithm which shows that EBW performs a specific kind of constrained optimization. The constraints reveal an interesting connection between the improvement of the discriminative criterion and the Kullback–Leibler divergence (KLD). Based on the analysis, a novel method for controlling the EBW algorithm is proposed. The presented analysis uses decomposed formulae for Gaussian mixture KLDs which correspond to the ones used in the Constrained Line Search (CLS) optimization algorithm. The CLS algorithm for discriminative training is therefore also briefly presented and its connections to EBW are studied. Large-vocabulary speech recognition experiments are used to evaluate the proposed control of EBW, which is shown to outperform the common heuristics in model robustness. A comparison of EBW to CLS also shows differences in robustness, in favor of EBW. The constraints for Gaussian parameter optimization as well as the special mixture weight estimation method used with EBW are shown to be the key factors for good performance.

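    For orientation, the widely used EBW re-estimation formula for a Gaussian mean, in its standard form from the discriminative training literature (the paper's notation may differ), is

    ```latex
    \hat{\mu}_{jm}
      = \frac{\theta^{\mathrm{num}}_{jm}(\mathcal{O})
              - \theta^{\mathrm{den}}_{jm}(\mathcal{O})
              + D_{jm}\,\mu_{jm}}
             {\gamma^{\mathrm{num}}_{jm} - \gamma^{\mathrm{den}}_{jm} + D_{jm}},
    ```

    where \theta_{jm}(\mathcal{O}) are occupancy-weighted first-order statistics of the observations for Gaussian m of state j, \gamma are the corresponding numerator/denominator occupancies, and D_{jm} is the smoothing constant that controls the step size. The constrained-optimization view analyzed in the paper relates the control of this step to a KLD constraint between the old and new models.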
  • Spatial Encoding of Finite Difference Time Domain Acoustic Models for Auralization

    Publication Year: 2012, Page(s): 2420 - 2432
    Cited by: Papers (1)

    A single room impulse response can reveal information about the acoustics of a given space in both objective terms and, when used for auralization, subjective terms. However, for additional spatial information, or for more accurate and perceptually convincing auralization, multiple impulse responses are needed. Higher-order Ambisonics is a robust means of capturing the spatial qualities of an acoustic space over multiple channels for decoding and rendering over many possible speaker layouts. A method is presented for obtaining Nth-order Ambisonic impulse responses from a room acoustic model, based on lower orders using differential microphone techniques. This is tested using a third-order encoding of a 2-D finite difference time domain room acoustic simulation based on multiple circular arrays of receivers. Accurate channel directional profiles are obtained, and the results are verified in a series of listening tests comparing the localization of a sound source placed within the given simulation to the same source encoded directly. This generic encoding scheme can be applied to any room acoustic simulation technique where it is possible to obtain impulse responses across multiple receiver positions. Although the proposed method encompasses horizontal encoding only, it can also be applied directly in 3-D simulations where height information is not required in the final auralization.

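    As a point of reference for the encoding step, here is a minimal sketch of horizontal-only (2-D) Ambisonic encoding of a mono signal, assuming the common circular-harmonic channel convention; names and conventions are illustrative, not taken from the paper:

    ```python
    import numpy as np

    def encode_horizontal_ambisonics(signal, azimuth, order):
        """Encode a mono signal arriving from `azimuth` (radians) into
        horizontal Ambisonic channels up to `order`.
        Channel layout: [W, cos(phi), sin(phi), cos(2*phi), sin(2*phi), ...]."""
        sig = np.asarray(signal, dtype=float)
        channels = [sig]                               # order 0 (omnidirectional W)
        for m in range(1, order + 1):
            channels.append(sig * np.cos(m * azimuth))
            channels.append(sig * np.sin(m * azimuth))
        return np.stack(channels)                      # shape: (2*order + 1, n_samples)
    ```

    Applied to an impulse response simulated at a receiver, this yields one channel set per source direction; the paper's contribution is obtaining such higher-order responses from lower-order data via differential microphone techniques.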
  • Stochastic and Analytic Optimization of Sparse Aperiodic Arrays and Broadband Beamformers With Robust Superdirective Patterns

    Publication Year: 2012, Page(s): 2433 - 2447
    Cited by: Papers (9)

    This paper addresses the spatial processing of signals collected by a linear array of sensors that feeds a filter-and-sum, data-independent beamformer. When the frequency band spanned by the signals to be processed is extremely wide, a given array can be shorter than the wavelength (at the lowest frequencies) and, at the same time, too sparsely populated for correct sampling of the wavefield (at the highest frequencies). Superdirectivity and aperiodic sparse layouts are possible solutions to these two problems; however, they have never been considered jointly to achieve a broadband beam pattern with a desired profile. Through a mixed stochastic and analytic optimization, a method is proposed herein that synthesizes a sparse array layout and the tap coefficients of the beamformer filters to provide a broadband beam pattern that is superdirective, robust against fluctuations of the sensors' characteristics, free of grating lobes, and endowed with a controlled side-lobe level. Different types of beam patterns, from the frequency-invariant pattern to the maximum-directivity pattern, can be obtained, and the synthesized solutions retain their validity for any steering direction inside a given interval. The functioning of the method is proven by considering a microphone array with four different design targets and by discussing the performance and the robustness of the synthesized solutions.

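    To make the filter-and-sum setting concrete, a small sketch of evaluating the far-field beam pattern of a linear array at a single frequency (a generic textbook computation; the paper's stochastic/analytic synthesis of the weights is not reproduced here):

    ```python
    import numpy as np

    def beam_pattern(weights, positions, freq, angles, c=343.0):
        """Far-field beam pattern B(f, theta) of a filter-and-sum beamformer.

        weights   : complex weights w_n(f), one per sensor, at frequency f
        positions : sensor coordinates along the array axis (meters)
        angles    : look angles (radians) at which to evaluate the pattern
        """
        k = 2.0 * np.pi * freq / c                         # wavenumber
        # phase of each sensor's contribution toward each look angle
        steering = np.exp(1j * k * np.outer(np.sin(angles), positions))
        return steering @ np.asarray(weights)              # one value per angle
    ```

    Grating lobes appear when a sparse layout undersamples the wavefield, which is exactly what the joint layout-and-weights optimization in the paper is designed to avoid.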
  • A Ray Tracing Simulation of Sound Diffraction Based on the Analytic Secondary Source Model

    Publication Year: 2012, Page(s): 2448 - 2460
    Cited by: Papers (1)

    This paper describes a novel ray tracing method for solving sound diffraction problems. The method is a Monte Carlo solution to the multiple integration in the analytic secondary source model of edge diffraction; it uses ray tracing to calculate sample values of the integrand. The similarity between our method and general ray tracing makes it possible to utilize the various approaches developed for ray tracing. Our implementation employs the OptiX ray tracing engine, which exhibits good acceleration performance on a graphics processor. Two importance sampling methods are derived from different perspectives, and they provide an efficient and accurate way to solve the numerically challenging integration. The accuracy of our method was demonstrated by comparing its estimates with those calculated by reference software. An analysis of signal-to-noise ratios using an auditory filter bank was performed both objectively and subjectively in order to evaluate the error characteristics and perceptual quality. The applicability of our method was evaluated with a prototype system for interactive ray tracing.

  • Audio Watermarking Using Spatial Masking and Ambisonics

    Publication Year: 2012, Page(s): 2461 - 2469
    Cited by: Papers (1)

    Based on the spatial masking phenomenon and Ambisonics, a watermarking technique for audio signals is proposed. Ambisonics is a well-known technique for encoding and reproducing the spatial information of sound, and the proposed method exploits this feature. A watermark is represented as a slightly rotated version of the original sound scene. In another scenario, in which Ambisonic signals are synthesized from mono or stereo signals, watermarks are embedded near the signal by adding a small copy of the host signal. This scenario presents the important advantage that reversible watermarking is possible if loudspeakers are arranged properly for playback. This advantage is attributed to the fact that first-order Ambisonics represents audio signals as mutually orthogonal four-channel signals, thereby carrying a larger amount of data than the original version. The formulation of the proposed method is explained. Listening tests demonstrate the relations between primary parameter values and the imperceptibility of the watermark. Computer simulations conducted to assess the robustness of the proposed method against common signal processing attacks demonstrate that the method is robust against most of the tested attacks.

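    A hedged sketch of the scene-rotation idea for first-order horizontal B-format (W, X, Y): rotating the sound field about the vertical axis leaves W unchanged and rotates X and Y. This is the standard B-format rotation, shown only to illustrate the embedding principle; the parameter values are hypothetical:

    ```python
    import numpy as np

    def rotate_bformat(w, x, y, phi):
        """Rotate a first-order horizontal B-format scene by phi radians
        about the vertical axis; the omnidirectional W channel is invariant."""
        x_rot = np.cos(phi) * x - np.sin(phi) * y
        y_rot = np.sin(phi) * x + np.cos(phi) * y
        return w, x_rot, y_rot

    # Illustration: a watermark bit could map to a barely audible rotation,
    # e.g. phi = +0.01 rad for "1" and -0.01 rad for "0" (hypothetical values).
    ```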
  • Bayesian Restoration of Audio Signals Degraded by Impulsive Noise Modeled as Individual Pulses

    Publication Year: 2012, Page(s): 2470 - 2481
    Cited by: Papers (1)

    Impulsive noise, also known as clicks, is a very common type of distortion in old gramophone recordings. Existing methods (both heuristic and statistical) for the removal of this type of defect usually do not exploit the physical mechanism that generates it. This work proposes a more physically meaningful model in which each click is modeled individually. A Bayesian method based on the reversible-jump Metropolis-Hastings algorithm for the joint detection and removal of impulsive noise in audio signals is devised. Simulations with artificial and real audio signals, as well as comparisons with competing approaches, are presented to illustrate and validate the proposed method.

  • Statistical Utterance Comparison for Speaker Clustering Using Factor Analysis

    Publication Year: 2012, Page(s): 2482 - 2491
    Cited by: Papers (1)

    We propose a novel method of measuring the similarity between two or more speech utterances for speaker clustering, based on probability theory and factor analysis. The similarity function is formulated as the probability that the utterances originated from the same speaker, and it uses statistical eigenvoice and eigenchannel models to incorporate physical knowledge of interspeaker and intraspeaker variabilities, allowing the similarity function to be trainable and robust. The comparison function can be efficiently computed using a compact set of sufficient statistics for each speech utterance, allowing the acoustic features to be discarded. We begin by using only eigenvoices, and then show how the eigenchannels can be incorporated into the equation to yield an identical form but with a different set of sufficient statistics. We test the proposed model in a speaker clustering task using the CALLHOME telephone conversation corpus and show that it performs better than two other well-known similarity measures: the Cross-Likelihood Ratio (CLR) and the Generalized Likelihood Ratio (GLR).

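    In generic terms, a same-speaker similarity of this kind is a likelihood ratio (standard formulation; the paper's contribution is evaluating it with eigenvoice/eigenchannel priors and compact sufficient statistics):

    ```latex
    \ell(\mathcal{X}_1,\mathcal{X}_2)
      = \log
        \frac{p(\mathcal{X}_1,\mathcal{X}_2 \mid \text{same speaker})}
             {p(\mathcal{X}_1)\; p(\mathcal{X}_2)},
    ```

    where \mathcal{X}_1 and \mathcal{X}_2 are the feature sets of the two utterances: the numerator marginalizes over a shared speaker factor, while the denominator treats the utterances as independent.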
  • Automatic Parliamentary Meeting Minute Generation Using Rhetorical Structure Modeling

    Publication Year: 2012, Page(s): 2492 - 2504
    Cited by: Papers (1)

    In this paper, we propose a one-step rhetorical structure parsing, chunking, and extractive summarization approach to automatically generate meeting minutes from parliamentary speech using acoustic and lexical features. We investigate how to use lexical features extracted from imperfect ASR transcriptions, together with acoustic features extracted from the speech itself, to form extractive summaries with the structure of meeting minutes. Each business item in the minutes is modeled as a rhetorical chunk which consists of smaller rhetorical units. Principal Component Analysis (PCA) graphs of both acoustic and lexical features in meeting speech show clear self-clustering of speech utterances according to the underlying rhetorical state; for example, acoustic and lexical feature vectors from the question-and-answer or motion sections of a parliamentary speech are grouped together. We then propose a Conditional Random Fields (CRF)-based approach that performs both rhetorical structure modeling and extractive summarization in one step, by chunking, parsing, and extracting salient utterances. Extracted salient utterances are grouped under the labels of each rhetorical state, emulating meeting minutes to yield summaries that are more easily understandable by humans. We compare this approach to different machine learning methods. We show that our proposed CRF-based one-step minute generation system obtains the best summarization performance, both in terms of ROUGE-L F-measure at 74.5% and by human evaluation, at 77.5% on average.

  • Statistical Voice Conversion Techniques for Body-Conducted Unvoiced Speech Enhancement

    Publication Year: 2012, Page(s): 2505 - 2517
    Cited by: Papers (10)

    In this paper, we present statistical approaches to enhancing body-conducted unvoiced speech for silent speech communication. A body-conductive microphone called a nonaudible murmur (NAM) microphone is effectively used to detect very soft unvoiced speech, such as NAM or a whispered voice, while the speech sounds emitted outside the body remain almost inaudible. However, body-conducted unvoiced speech is difficult to use in human-to-human speech communication because it sounds unnatural and less intelligible owing to the acoustic change caused by body conduction. To address this issue, voice conversion (VC) methods from NAM to normal speech (NAM-to-Speech) and to a whispered voice (NAM-to-Whisper) are proposed, where the acoustic features of body-conducted unvoiced speech are converted into those of natural voices in a probabilistic manner using Gaussian mixture models (GMMs). Moreover, these methods are extended to convert not only NAM but also a body-conducted whispered voice (BCW) as another type of body-conducted unvoiced speech. Several experimental evaluations are conducted to demonstrate the effectiveness of the proposed methods. The experimental results show that 1) NAM-to-Speech effectively improves intelligibility but degrades naturalness owing to the difficulty of estimating natural fundamental frequency contours from unvoiced speech; 2) NAM-to-Whisper significantly outperforms NAM-to-Speech in terms of both intelligibility and naturalness; and 3) a single conversion model capable of converting both NAM and BCW is effectively developed in our proposed VC methods.

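    The GMM-based probabilistic mapping the abstract refers to is, in its standard minimum mean-square error form from the voice conversion literature (generic notation, not necessarily the paper's exact formulation):

    ```latex
    \hat{\mathbf{y}}
      = \sum_{m=1}^{M} P(m \mid \mathbf{x})
        \left[ \boldsymbol{\mu}^{(y)}_m
          + \boldsymbol{\Sigma}^{(yx)}_m
            \big(\boldsymbol{\Sigma}^{(xx)}_m\big)^{-1}
            \big(\mathbf{x} - \boldsymbol{\mu}^{(x)}_m\big) \right],
    ```

    where the GMM is trained on joint source-target feature vectors, \mathbf{x} is a source (e.g., NAM) feature, \hat{\mathbf{y}} is the converted target feature, and P(m \mid \mathbf{x}) is the posterior probability of mixture component m given the source feature.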
  • A CASA-Based System for Long-Term SNR Estimation

    Publication Year: 2012, Page(s): 2518 - 2527
    Cited by: Papers (1)

    We present a system for robust signal-to-noise ratio (SNR) estimation based on computational auditory scene analysis (CASA). The proposed algorithm uses an estimate of the ideal binary mask to segregate a time-frequency representation of the noisy signal into speech-dominated and noise-dominated regions. The energy within each of these regions is summed to derive the filtered global SNR. An SNR transform is introduced to convert the estimated filtered SNR to the true broadband SNR of the noisy signal. The algorithm is further extended to estimate subband SNRs. Evaluations are carried out using the TIMIT speech corpus and the NOISEX-92 noise database. The results indicate that both global and subband SNR estimates are superior to those of existing methods, especially at low SNR conditions.

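    A minimal sketch of the mask-based SNR computation described in the abstract, assuming a time-frequency energy matrix and an estimated binary mask (1 = speech-dominated); the paper's transform from filtered to true broadband SNR is not reproduced here:

    ```python
    import numpy as np

    def filtered_global_snr_db(energy, mask):
        """Global SNR estimate from a time-frequency energy matrix and an
        estimated ideal binary mask (1 = speech-dominated unit)."""
        speech_energy = energy[mask == 1].sum()    # speech-dominated units
        noise_energy = energy[mask == 0].sum()     # noise-dominated units
        return 10.0 * np.log10(speech_energy / noise_energy)
    ```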
  • Supervised Graph-Based Processing for Sequential Transient Interference Suppression

    Publication Year: 2012, Page(s): 2528 - 2538
    Cited by: Papers (5)

    In this paper, we present a supervised graph-based framework for sequential processing and apply it to the problem of transient interference suppression. Transients typically consist of an initial peak followed by decaying short-duration oscillations. Such sounds, e.g., keyboard typing and door knocking, often arise as interference in everyday applications: hearing aids, hands-free accessories, mobile phones, and conference-room devices. We describe a graph construction using a noisy speech signal and training recordings of typical transients. The main idea is to capture the transient interference structure, which may emerge from the construction of the graph. The graph parametrization is then viewed as a data-driven model of the transients and is utilized to define a filter that extracts the transients from noisy speech measurements. Unlike previous transient interference suppression studies, in this work the graph is constructed in advance from training recordings. The graph is then extended to newly acquired measurements, providing a sequential filtering framework for noisy speech.

  • Selective Sampling for Beat Tracking Evaluation

    Publication Year: 2012, Page(s): 2539 - 2548
    Cited by: Papers (8)

    In this paper, we propose a method that can identify challenging music samples for beat tracking without ground truth. Our method, motivated by the machine learning technique of “selective sampling,” is based on the measurement of mutual agreement between beat sequences. In calculating this mutual agreement, we show the critical influence of different evaluation measures. Using our approach, we demonstrate how to compile a new evaluation dataset comprised of difficult excerpts for beat tracking and examine this difficulty in the context of perceptual and musical properties. Based on tag analysis, we indicate the musical properties for which future advances in beat tracking research would be most profitable and those for which beat tracking is too difficult to be attempted. Finally, we demonstrate how our mutual agreement method can be used to improve beat tracking accuracy on large music collections.

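    A hedged sketch of the mutual-agreement idea: run several beat trackers on an excerpt and average a pairwise agreement score, so that low agreement flags excerpts that are hard to track. The tolerance-window F-measure below is deliberately simplified (nearest-neighbor hits, no one-to-one matching), and the paper studies several agreement measures rather than this one alone:

    ```python
    import numpy as np
    from itertools import combinations

    def f_measure(beats_a, beats_b, tol=0.07):
        """Simplified tolerance-window F-measure between two beat
        sequences given in seconds."""
        if len(beats_a) == 0 or len(beats_b) == 0:
            return 0.0
        hits = sum(np.min(np.abs(np.asarray(beats_b) - t)) <= tol
                   for t in beats_a)
        if hits == 0:
            return 0.0
        precision, recall = hits / len(beats_a), hits / len(beats_b)
        return 2 * precision * recall / (precision + recall)

    def mean_mutual_agreement(beat_sequences):
        """Mean pairwise agreement over the outputs of several trackers."""
        pairs = combinations(beat_sequences, 2)
        return float(np.mean([f_measure(a, b) for a, b in pairs]))
    ```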
  • Novel Acoustic Feedback Cancellation Approaches in Hearing Aid Applications Using Probe Noise and Probe Noise Enhancement

    Publication Year: 2012, Page(s): 2549 - 2563
    Cited by: Papers (6)

    Adaptive filters are widely used in acoustic feedback cancellation systems and have evolved to become the state of the art. One major remaining challenge is that the adaptive filter estimates are biased due to the nonzero correlation between the loudspeaker signals and the signals entering the audio system. In many cases, this bias problem causes the cancellation system to fail. The traditional probe noise approach, where a noise signal is added to the loudspeaker signal, can, in theory, prevent the bias. However, in practice, the probe noise level must often be so high that the noise is clearly audible and annoying; this makes the traditional probe noise approach less useful in practical applications. In this work, we explain theoretically the decreased convergence rate when using low-level probe noise in the traditional approach, before we propose and study analytically two new probe noise approaches utilizing a combination of specifically designed probe noise signals and probe noise enhancement. Despite using low-level and inaudible probe noise signals, both approaches significantly improve the convergence behavior of the cancellation system compared to the traditional probe noise approach. This makes the proposed approaches much more attractive in practical applications. We demonstrate this through a simulation experiment with audio signals in a hearing aid acoustic feedback cancellation system, where the convergence rate is improved by as much as a factor of 10.

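    The adaptive filters in question are typically of the LMS family; below is a generic normalized-LMS (NLMS) update for a feedback-path estimate, given as textbook background rather than the paper's probe-noise schemes:

    ```python
    import numpy as np

    def nlms_step(w, u, d, mu=0.5, eps=1e-8):
        """One NLMS update of the feedback-path estimate.

        w  : current filter coefficients (numpy array)
        u  : most recent loudspeaker samples (same length as w)
        d  : current microphone sample
        Returns the updated coefficients and the residual (error) sample.
        """
        e = d - w @ u                        # feedback-compensated signal
        w = w + mu * e * u / (u @ u + eps)   # normalized gradient step
        return w, e
    ```

    The bias discussed in the abstract arises because u is correlated with the incoming audio contained in d; probe noise decorrelates the two, and the proposed enhancement schemes recover fast convergence at inaudible probe levels.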
  • A Modal Analysis of Spatial Discretization of Spherical Loudspeaker Distributions Used for Sound Field Synthesis

    Publication Year: 2012, Page(s): 2564 - 2574

    The theory of sound field synthesis methods like Wave Field Synthesis (WFS) and Near-field Compensated Higher Order Ambisonics (NFC-HOA) may be formulated based on the assumption of continuous distributions of secondary sources that, in the general case, enclose the receiver area. In practice, a finite number of discrete loudspeakers is used, which constitutes a fundamental departure from the theoretical requirements. In this paper, we present a detailed analysis of the consequences of this spatial discretization on the synthesized sound field via an analytical frequency-dependent modal decomposition of the latter for the case of Gaussian sampling. It is shown that the underlying mechanisms are essentially similar to those in the discretization of circular secondary source distributions, so the results obtained in the latter context may be directly applied. The most notable parallel between the discretization of spherical and circular distributions is that, in both cases, repetitions in a given spatial-frequency domain occur. Therefore, the spatial bandwidth of the desired sound field has an essential influence on the properties of the resulting artifacts. We propose to categorize sound field synthesis methods into spatially narrowband, wideband, and fullband approaches, and we show that NFC-HOA constitutes a spatially narrowband method while WFS constitutes a spatially fullband method.

  • Optimal Real-Weighted Beamforming With Application to Linear and Spherical Arrays

    Publication Year: 2012, Page(s): 2575 - 2585
    Cited by: Papers (3)

    One of the uses of sensor arrays is spatial filtering, or beamforming. Current digital signal processing methods facilitate complex-weighted beamforming, providing flexibility in array design. Previous studies proposed the use of real-valued beamforming weights, which, although they reduce design flexibility, may provide a range of benefits, e.g., simplified beamformer implementation or efficient beamforming algorithms. This paper presents a new method for the design of arrays with real-valued weights that achieve maximum directivity, providing a closed-form solution for the array weights. The method is studied for linear and spherical arrays, where it is shown that rigid spherical arrays are particularly suitable for real-weight designs as they do not suffer from grating lobes, a dominant feature in linear arrays with real weights. A simulation study is presented for linear and spherical arrays, along with an experimental investigation, validating the theoretical developments.

  • A Joint Approach for Single-Channel Speaker Identification and Speech Separation

    Publication Year: 2012, Page(s): 2586 - 2601
    Cited by: Papers (4)

    In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification, a single-channel speaker identification algorithm is proposed which provides an estimate of the signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose a sinusoidal model-based algorithm. The speech separation algorithm consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters for finding optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we start from a situation where we have prior information on codebook indices, speaker identities, and SSR level, and then, by relaxing these assumptions one by one, we demonstrate the efficiency of the proposed fully blind system. In contrast to previous studies that mostly focus on automatic speech recognition (ASR) accuracy, here we report objective and subjective results as well. The results show that the proposed system performs as well as the best of the state of the art in terms of perceived quality, while its performance in terms of speaker identification and automatic speech recognition is generally lower. It outperforms the state of the art in terms of intelligibility, showing that the ASR results are not conclusive. The proposed method achieves, on average, 52.3% ASR accuracy, 41.2 points in MUSHRA, and 85.9% speech intelligibility.

  • Spoken Document Retrieval With Unsupervised Query Modeling Techniques

    Publication Year: 2012, Page(s): 2602 - 2612
    Cited by: Papers (4)

    Ever-increasing amounts of publicly available multimedia associated with speech information have made spoken document retrieval (SDR) an active area of intensive research in the speech processing community. Much work has been dedicated to developing elaborate indexing and modeling techniques for representing spoken documents, but little to improving query formulations for better representing the information needs of users. The latter is critical to the success of an SDR system. In view of this, we present in this paper a novel use of a relevance language modeling framework for SDR. It not only inherits the merits of several existing techniques but also provides a principled way to render the lexical and topical relationships between a query and a spoken document. We further explore various ways to glean both relevance and non-relevance cues from the spoken document collection so as to enhance query modeling in an unsupervised fashion. In addition, we investigate representing the query and documents with different granularities of index features to work in conjunction with the various relevance and/or non-relevance cues. Empirical evaluations performed on the TDT (Topic Detection and Tracking) collections reveal that the methods derived from our modeling framework hold good promise for SDR and are very competitive with existing retrieval methods.

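    For orientation, the classical relevance-model estimate from the information retrieval literature, on which relevance language modeling frameworks of this kind build (generic form, not the paper's exact model):

    ```latex
    P(w \mid R) \;\approx\; \sum_{D \in \mathcal{C}} P(w \mid D)\, P(D \mid Q),
    ```

    where \mathcal{C} is the (spoken) document collection, P(w \mid D) is a document language model, and P(D \mid Q) is the posterior probability of document D given the query Q. A common refinement then interpolates this relevance model with the original query model.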
  • An FIR Implementation of Zero Frequency Filtering of Speech Signals

    Publication Year: 2012, Page(s): 2613 - 2617
    Cited by: Papers (2)

    Zero frequency filtering is a technique used in the characterization and analysis of glottal activity from speech signals. The filter design originally proposed has an infinite impulse response (IIR) filter followed by two successive finite impulse response (FIR) filters. In this paper, the process of its computation is analyzed, and a simplified FIR implementation is proposed by exploiting the inherent pole-zero cancellation involved in the process. Theoretical proofs are derived in both the frequency and time domains. We show that the theoretically derived FIR filter is a convolution of two filters whose impulse responses are triangular in shape. The advantage of the proposed FIR filter lies in the reduced computational requirements for zero frequency filtering, which include 1) the use of single-precision floating point arithmetic and 2) the stability of the filter.

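    For context, a sketch of the classic zero frequency filtering pipeline in its textbook IIR form (cascaded 0-Hz resonators followed by local-mean trend removal); the paper's contribution, a simplified equivalent FIR implementation built on pole-zero cancellation, is not reproduced here, and the window length below is an illustrative choice:

    ```python
    import numpy as np

    def zero_frequency_filter(x, fs, win_ms=15.0):
        """Classic zero frequency filtering of a speech signal.

        Stage 1: difference the signal, then pass it through a cascade of
        two ideal resonators at 0 Hz: y[n] = x[n] + 2*y[n-1] - y[n-2].
        Stage 2: remove the slowly varying trend by subtracting a local
        mean over roughly 1-2 average pitch periods, applied twice.
        """
        y = np.diff(np.asarray(x, dtype=float), prepend=float(x[0]))
        for _ in range(2):                       # two 0-Hz resonators (IIR part)
            out = np.zeros_like(y)
            for n in range(len(y)):
                out[n] = y[n]
                if n >= 1:
                    out[n] += 2.0 * out[n - 1]
                if n >= 2:
                    out[n] -= out[n - 2]
            y = out
        w = int(win_ms * 1e-3 * fs)              # half-window in samples
        kernel = np.ones(2 * w + 1) / (2 * w + 1)
        for _ in range(2):                       # trend removal (FIR part)
            y = y - np.convolve(y, kernel, mode="same")
        return y
    ```

    The polynomial growth of the resonator output is exactly what makes the IIR form numerically delicate, which is one motivation the abstract gives for a stable, single-precision-friendly FIR equivalent.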
  • EDICS Categories for IEEE Transactions on Audio, Speech, and Language Processing

    Publication Year: 2012, Page(s): 2618 - 2619
  • IEEE Transactions on Audio, Speech, and Language Processing Information for Authors

    Publication Year: 2012, Page(s): 2620 - 2621
  • IEEE Xplore Digital Library

    Publication Year: 2012, Page(s): 2622
  • Open Access

    Publication Year: 2012, Page(s): 2623

Aims & Scope

IEEE Transactions on Audio, Speech, and Language Processing covers the sciences, technologies, and applications relating to the analysis, coding, enhancement, recognition, and synthesis of audio, music, speech, and language.

 

This Transactions ceased publication in 2013. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.


Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research