IEEE Transactions on Audio, Speech, and Language Processing

Issue 2 • February 2010

  • Table of contents

    Publication Year: 2010 , Page(s): C1
    PDF (99 KB)
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2010 , Page(s): C2
    PDF (39 KB)
    Freely Available from IEEE
  • Simulation of Directional Microphones in Digital Waveguide Mesh-Based Models of Room Acoustics

    Publication Year: 2010 , Page(s): 213 - 223
    PDF (1495 KB) | HTML

    Digital waveguide mesh (DWM) models are time-domain numerical methods providing computationally simple solutions for wave propagation problems. They have been used in various acoustical modeling and audio synthesis applications including synthesis of musical instrument sounds and speech, and modeling of room acoustics. A successful model of room acoustics should be able to account for source and receiver directivity. Methods for the simulation of directional sources in DWM models were previously proposed. This paper presents a method for the simulation of directional microphones in DWM-based models of room acoustics. The method is based on the directional weighting of the microphone response according to the instantaneous direction of incidence at a given point. The direction of incidence is obtained from the instantaneous intensity, which is calculated from local pressure values in the DWM model. The calculation of instantaneous intensity in DWM meshes and the directional accuracies of different mesh topologies are discussed. An intensity-based formulation for the response of a directional microphone is given. Simulation results for an actual microphone with a frequency-dependent, non-ideal directivity function are presented.
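
    As a rough illustration of the weighting idea described above (a toy sketch, not code from the paper: the function, its arguments, and the first-order cardioid pattern are all assumptions), the direction of incidence can be taken from the instantaneous intensity vector p·v at a receiver junction:

    ```python
    import numpy as np

    def directional_sample(p, vx, vy, mic_axis):
        """Weight the pressure at a receiver junction by a cardioid directivity.

        p        -- instantaneous pressure at the junction
        vx, vy   -- particle-velocity estimates at the junction (obtainable
                    from finite differences of neighboring junction pressures)
        mic_axis -- look direction of the simulated microphone, in radians
        """
        # The instantaneous intensity vector is (p*vx, p*vy); its angle gives
        # the direction of incidence at this point of the mesh.
        theta = np.arctan2(p * vy, p * vx)
        # First-order cardioid weighting relative to the microphone axis.
        gain = 0.5 * (1.0 + np.cos(theta - mic_axis))
        return gain * p
    ```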

  • Combining Auditory Preprocessing and Bayesian Estimation for Robust Formant Tracking

    Publication Year: 2010 , Page(s): 224 - 236
    Cited by:  Papers (5)
    PDF (1835 KB) | HTML

    We present a framework for estimating formant trajectories, with a focus on achieving high robustness in noisy environments. Our approach combines preprocessing based on functional principles of the human auditory system with a probabilistic tracking scheme. To enhance the formant structure in spectrograms, we use a Gammatone filterbank, spectral preemphasis, and spectral filtering with difference-of-Gaussians (DoG) operators. Finally, a contrast enhancement mimicking a competition between filter responses is applied. The probabilistic tracking scheme adopts the mixture modeling technique for estimating the joint distribution of formants. In conjunction with an algorithm for adaptive frequency range segmentation and Bayesian smoothing, an efficient framework for estimating formant trajectories is derived. Comprehensive evaluations of our method on the VTR-formant database demonstrate its high precision and robustness. We obtained superior performance compared to existing approaches for clean as well as echoic noisy speech. Finally, an implementation of the framework within the scope of an online system using instantaneous feature-based resynthesis demonstrates its applicability to real-world scenarios.
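
    The DoG filtering step lends itself to a very small sketch (the sigma values and the half-wave rectification are assumptions, not settings from the paper), applied to a magnitude spectrogram with frequency along the first axis:

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def dog_enhance(spectrogram, sigma_narrow=1.0, sigma_wide=4.0):
        """Difference-of-Gaussians filtering along the frequency axis to
        sharpen ridge-like formant structure in a magnitude spectrogram."""
        narrow = gaussian_filter1d(spectrogram, sigma_narrow, axis=0)
        wide = gaussian_filter1d(spectrogram, sigma_wide, axis=0)
        # Keep only positive contrast (on-center/off-surround behavior).
        return np.maximum(narrow - wide, 0.0)
    ```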

  • On Pole-Zero Model Estimation Methods Minimizing a Logarithmic Criterion for Speech Analysis

    Publication Year: 2010 , Page(s): 237 - 248
    Cited by:  Papers (3)
    PDF (1407 KB) | HTML

    A speech production model consists of a linear, slowly time-varying filter. Pole-zero models are required for a good representation of certain types of speech sounds, such as nasals and laterals. From a perceptual point of view, designing them by minimizing a logarithmic criterion is a particularly suitable approach. The most accurate available results are obtained by using Newton-like search algorithms to optimize pole and zero positions, or the coefficients of a decomposition into quadratic factors. In this paper, we propose to optimize the numerator and denominator coefficients instead. Experimental results show that this is the most computationally efficient approach, especially when the optimization criterion involves a psychoacoustical frequency scale. To illustrate its applicability in speech processing, we use the proposed method for formant and anti-formant tracking as well as speech resynthesis.
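
    A logarithmic criterion over direct numerator/denominator coefficients can be sketched as a generic objective for a numerical optimizer (a minimal sketch on a linear frequency grid; the paper also considers a psychoacoustical scale, and the variable names here are mine):

    ```python
    import numpy as np
    from scipy.signal import freqz
    from scipy.optimize import minimize

    def log_spectral_error(coeffs, target_mag, n_num):
        """Mean squared log-magnitude error of a pole-zero model B(z)/A(z),
        parameterized directly by its numerator and denominator coefficients."""
        b = coeffs[:n_num]
        a = np.concatenate(([1.0], coeffs[n_num:]))  # a[0] fixed to 1
        _, h = freqz(b, a, worN=len(target_mag))
        return np.mean((np.log(np.abs(h) + 1e-12)
                        - np.log(target_mag + 1e-12)) ** 2)

    # e.g. minimize(log_spectral_error, x0, args=(target_mag, n_num), method="BFGS")
    ```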

  • Room Impulse Response Shortening/Reshaping With Infinity- and p-Norm Optimization

    Publication Year: 2010 , Page(s): 249 - 259
    Cited by:  Papers (12)
    PDF (769 KB) | HTML

    The purpose of room impulse response (RIR) shortening and reshaping is usually to improve the intelligibility of the received signal by prefiltering the source signal before it is played through a loudspeaker in a closed room. In an alternative, but mathematically equivalent, setting one may aim to postfilter a recorded microphone signal to remove audible echoes. While least-squares methods have mainly been used for the design of shortening/reshaping filters for RIRs until now, we propose to use the infinity- or p-norm as the optimization criterion. In our method, design errors are uniformly distributed over the entire temporal range of the shortened/reshaped global impulse response. In addition, the psychoacoustic property of masking is considered during the filter design, which makes it possible to significantly reduce the filter length, compared to standard approaches, without affecting the perceived performance.
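
    The p-norm criterion is compact to state in code (a sketch under assumed definitions of the wanted/unwanted parts of the global response; the paper's windows and masking model are more refined):

    ```python
    import numpy as np

    def reshaping_objective(c, h, n_wanted, p=10):
        """Log-ratio of p-norms of the unwanted tail and the wanted head of
        the global impulse response g = h * c (h: RIR, c: reshaping filter).
        Large p approximates the infinity-norm, spreading the design error
        uniformly over the unwanted temporal range."""
        g = np.convolve(h, c)
        wanted, unwanted = g[:n_wanted], g[n_wanted:]
        return (np.log(np.linalg.norm(unwanted, p) + 1e-12)
                - np.log(np.linalg.norm(wanted, p) + 1e-12))
    ```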

  • On Optimal Frequency-Domain Multichannel Linear Filtering for Noise Reduction

    Publication Year: 2010 , Page(s): 260 - 276
    Cited by:  Papers (25)
    PDF (1559 KB) | HTML

    Several contributions have been made so far to develop optimal multichannel linear filtering approaches and show their ability to reduce acoustic noise. However, there has not been a clear unifying theoretical analysis of their performance in terms of both noise reduction and speech distortion. To fill this gap, we analyze frequency-domain (non-causal) multichannel linear filtering for noise reduction in this paper. For completeness, we consider the noise-reduction constrained optimization problem that leads to the parameterized multichannel non-causal Wiener filter (PMWF). Our contribution is fivefold. First, we formally show that the minimum variance distortionless response (MVDR) filter is a particular case of the PMWF by properly formulating the constrained optimization problem of noise reduction. Second, we propose new simplified expressions for the PMWF, the MVDR, and the generalized sidelobe canceller (GSC) that depend on the signals' statistics only. In contrast to earlier works, these expressions are explicitly independent of the channel transfer function ratios. Third, we quantify the theoretical gains and losses in terms of speech distortion and noise reduction when using the PMWF by establishing new simplified closed-form expressions for three performance measures, namely, the signal distortion index, the noise reduction factor (originally proposed in "New insights into the noise reduction Wiener filter" by J. Chen, IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1218-1234, Jul. 2006, to analyze the single-channel time-domain Wiener filter), and the output signal-to-noise ratio (SNR). Fourth, we analyze the effects of coherent and incoherent noise in addition to the benefits of utilizing multiple microphones. Fifth, we propose a new proof for the a posteriori SNR improvement achieved by the PMWF. Finally, we provide some simulation results to corroborate the findings of this work.
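
    In a simplified per-bin form, the kind of statistics-only expression the paper derives can be sketched as follows (a sketch, not the paper's exact derivation: the variable names are mine, and the speech covariance is estimated here by plain subtraction):

    ```python
    import numpy as np

    def pmwf(phi_y, phi_v, beta=1.0, ref=0):
        """Parameterized multichannel Wiener filter at one frequency bin.

        phi_y -- noisy-signal covariance matrix (M x M)
        phi_v -- noise covariance matrix (M x M)
        beta  -- trade-off parameter: beta=0 yields the MVDR filter,
                 beta=1 the classical multichannel Wiener filter
        ref   -- index of the reference microphone
        """
        phi_x = phi_y - phi_v              # speech covariance estimate
        m = np.linalg.solve(phi_v, phi_x)  # phi_v^{-1} @ phi_x
        lam = np.real(np.trace(m))
        return m[:, ref] / (beta + lam)
    ```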

  • A Robust Method to Extract Talker Azimuth Orientation Using a Large-Aperture Microphone Array

    Publication Year: 2010 , Page(s): 277 - 285
    Cited by:  Papers (4)
    PDF (2042 KB) | HTML

    Knowing the orientation of a talker in the focal area of a large-aperture microphone array enables the development of better beamforming algorithms (to obtain higher-quality speech output), improves source-location/tracking algorithms, and allows better selection and control of cameras in a video conferencing situation. Measurements in an anechoic room (e.g., Chu and Warnock, 2002) have quantified the average frequency-dependent magnitude (source radiation pattern) of the human speech source, showing a front-to-back difference in magnitude that increases with frequency by about 8 dB/decade, reaching about 18 dB at 8000 Hz. These amplitude differences, while severely masked by both coherent and noncoherent noise in a real environment, are the most readily extractable cues to a talker's orientation, compared to other phenomena such as phase differences due to the source or diffraction effects at the mouth. In this paper, we propose a robust, source-radiation-pattern-based method for extracting the azimuth angle of a single talker for whom an accurate point-source location estimate is known. The method requires no a priori training and has been tested in more than 100 situations with real human talkers at various locations and orientations in a room equipped with a large-aperture microphone array. We compare these results against earlier published algorithms and find that the method proposed herein is the most robust and is suitable for a real-time system.
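
    As a toy illustration of level-based orientation estimation (the smooth front-to-back roll-off below is an assumed stand-in for the measured radiation pattern, and the function is mine, not the paper's), one can grid-search the azimuth that best explains the per-microphone levels:

    ```python
    import numpy as np

    def estimate_azimuth(levels_db, mic_angles_rad, front_back_db=18.0):
        """Pick the talker azimuth whose predicted per-mic level pattern best
        matches the observed levels; using the variance of the residual makes
        the fit invariant to the talker's absolute loudness."""
        best, best_err = 0.0, np.inf
        for theta in np.radians(np.arange(0.0, 360.0, 5.0)):
            # Assumed radiation model: full level toward the front,
            # front_back_db lower directly behind the talker.
            predicted = -front_back_db * 0.5 * (1.0 - np.cos(mic_angles_rad - theta))
            err = np.var(levels_db - predicted)
            if err < best_err:
                best, best_err = theta, err
        return best
    ```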

  • Nonlinear Active Noise Control With NARX Models

    Publication Year: 2010 , Page(s): 286 - 295
    Cited by:  Papers (7)
    PDF (493 KB) | HTML

    The extension of active noise control (ANC) techniques to deal with nonlinear effects such as distortion and saturation requires the introduction of suitable nonlinear model classes and adaptive algorithms. Large models are typically used, resulting in an increased computational load, delayed convergence (and sometimes even algorithm instability), and other unwanted dynamical effects due to overparametrization. This paper discusses the use of polynomial nonlinear autoregressive with exogenous variables (NARX) models and model selection techniques to reduce the model size and increase its robustness, for more efficient and reliable ANC. An offline procedure is devised to identify the controller model structure, and the controller parameters are then updated with an adaptive algorithm based on the error gradient and on the residual noise. Simulation experiments show the effectiveness of the proposed approach. A brief analysis of the computational complexity involved is also provided.
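
    A minimal polynomial NARX predictor with a gradient-style update might look like the sketch below (the regressor set is an assumed toy structure; the paper selects terms systematically with model selection techniques):

    ```python
    import numpy as np

    def narx_regressors(y_hist, u_hist):
        """Toy polynomial NARX regressor vector: linear input/output lags plus
        two quadratic terms standing in for distortion/saturation effects."""
        return np.concatenate([y_hist, u_hist,
                               [u_hist[0] ** 2, u_hist[0] * y_hist[0]]])

    def narx_update(theta, phi, error, mu=1e-3):
        """LMS-style parameter correction driven by the residual noise."""
        return theta + mu * error * phi
    ```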

  • Unsupervised Data-Driven Feature Vector Normalization With Acoustic Model Adaptation for Robust Speech Recognition

    Publication Year: 2010 , Page(s): 296 - 309
    Cited by:  Papers (10)
    PDF (1636 KB) | HTML

    In this paper, an unsupervised data-driven robust speech recognition approach is proposed based on joint feature vector normalization and acoustic model adaptation. Feature vector normalization reduces the acoustic mismatch between training and testing conditions by mapping the feature vectors towards the training space. Model adaptation modifies the parameters of the acoustic models to match the test space. However, since neither is optimal, both approaches use an intermediate space between the training and testing spaces to map either the feature vectors or the acoustic models. The joint optimization of both approaches provides a common intermediate space with a better match between normalized feature vectors and adapted acoustic models. In this paper, feature vector normalization is based on a minimum mean square error (MMSE) criterion. A class-dependent multi-environment model linear normalization (CD-MEMLIN) based on two classes (silence/speech) with a cross probability model (CD-MEMLIN-CPM) is used. CD-MEMLIN-CPM assumes that each class of the clean and noisy spaces can be modeled with a Gaussian mixture model (GMM), training a linear transformation for each pair of Gaussians in an unsupervised data-driven training process. This feature vector normalization maps the recognition space feature vector to a normalized space. The acoustic model adaptation maps the training space to the normalized space by defining a set of linear transformations over an expanded HMM-state space, compensating for those degradations that the feature vector normalization is not able to model, like rotations. Experiments have been carried out with the Spanish SpeechDat Car and Aurora 2 databases using both the standard Mel-frequency cepstral coefficient (MFCC) and advanced ETSI front-ends. Consistent improvements were reached for both corpora and front-ends. Using the standard MFCC front-end, a 92.08% average improvement in WER for Spanish SpeechDat Car and a 69.75% average improvement for the clean-condition evaluation of Aurora 2 were obtained, improving on the results reached with the ETSI advanced front-end (83.28% and 67.41%, respectively). Using the ETSI advanced front-end with the proposed solution, a 75.47% average improvement was obtained for the clean-condition evaluation of the Aurora 2 database.
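
    The MMSE normalization step can be caricatured as a posterior-weighted bias subtraction (a sketch with diagonal-covariance Gaussians; the per-Gaussian shift vectors stand in for the pairwise linear transformations that CD-MEMLIN-CPM actually trains):

    ```python
    import numpy as np

    def mmse_normalize(x, weights, means, variances, shifts):
        """Map a noisy feature vector toward the training space by subtracting
        a bias averaged over GMM posteriors.

        weights, means, variances -- GMM over the noisy space (diagonal covs)
        shifts -- per-Gaussian noisy-to-clean bias vectors, learned offline
        """
        diff = x - means
        log_lik = -0.5 * np.sum(diff ** 2 / variances
                                + np.log(2.0 * np.pi * variances), axis=1)
        post = weights * np.exp(log_lik - log_lik.max())
        post /= post.sum()
        return x - post @ shifts
    ```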

  • On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset

    Publication Year: 2010 , Page(s): 310 - 319
    Cited by:  Papers (20)
    PDF (1241 KB) | HTML

    Monaural singing voice separation is an extremely challenging problem. While efforts in pitch-based inference methods have led to considerable progress in voiced singing voice separation, little attention has been paid to the inability of such methods to separate unvoiced singing voice, owing to its inharmonic structure and weaker energy. In this paper, we propose a systematic approach to identify and separate the unvoiced singing voice from the music accompaniment. We also enhance the separation of voiced singing via a spectral subtraction method. The proposed system follows the framework of computational auditory scene analysis (CASA), which consists of a segmentation stage and a grouping stage. In the segmentation stage, the input song signals are decomposed into small sensory elements at different time-frequency resolutions. The unvoiced sensory elements are then identified by Gaussian mixture models. The experimental results demonstrate that the quality of the separated singing voice is improved for both the unvoiced and voiced parts. Moreover, to address the lack of a publicly available dataset for singing voice separation, we have constructed a corpus called MIR-1K (Multimedia Information Retrieval lab, 1000 song clips) in which all singing voices and music accompaniments were recorded separately. Each song clip comes with human-labeled pitch values, unvoiced sounds and vocal/non-vocal segments, and lyrics, as well as a speech recording of the lyrics.
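
    The voiced-enhancement step relies on spectral subtraction; a generic magnitude-domain version reads as below (the over-subtraction factor and spectral floor are common defaults I assume, not the paper's settings):

    ```python
    import numpy as np

    def spectral_subtract(mag, noise_mag, alpha=2.0, floor=0.05):
        """Subtract an over-estimated interference magnitude and clamp to a
        spectral floor to limit musical-noise artifacts."""
        cleaned = mag - alpha * noise_mag
        return np.maximum(cleaned, floor * mag)
    ```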

  • Multi-View Semi-Supervised Learning for Dialog Act Segmentation of Speech

    Publication Year: 2010 , Page(s): 320 - 329
    Cited by:  Papers (8)
    PDF (988 KB) | HTML

    Sentence segmentation of speech aims at determining sentence boundaries in a stream of words as output by the speech recognizer. Typically, statistical methods are used for sentence segmentation. However, they require significant amounts of labeled data, preparation of which is time-consuming, labor-intensive, and expensive. This work investigates the application of multi-view semi-supervised learning algorithms to the sentence boundary classification problem using lexical and prosodic information. The aim is to find an effective semi-supervised machine learning strategy when only small sets of sentence boundary-labeled data are available. We especially focus on two semi-supervised learning approaches, namely, self-training and co-training. We also compare different example selection strategies for co-training, namely, agreement and disagreement. Furthermore, we propose another method, called self-combined, which is a combination of self-training and co-training. The experimental results obtained on the ICSI Meeting (MRDA) Corpus show that both multi-view methods outperform self-training, and the best results are obtained using co-training alone. This study shows that sentence segmentation is very appropriate for multi-view learning since the data sets can be represented by two disjoint and redundantly sufficient feature sets, namely, lexical and prosodic information. Performance of the lexical and prosodic models is improved by 26% and 11% relative, respectively, when only a small set of manually labeled examples is used. When both information sources are combined, the semi-supervised learning methods improve the baseline F-measure from 69.8% to 74.2%.
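
    A minimal co-training loop over the two views can be sketched as follows (assumes scikit-learn-style classifiers and integer class labels 0..C-1; the simple confidence ranking here stands in for the agreement/disagreement selection strategies compared in the paper):

    ```python
    import numpy as np
    from sklearn.base import clone

    def co_train(base_clf, X1, X2, y, U1, U2, rounds=5, k=10):
        """Each round, train one classifier per view (X1: lexical, X2:
        prosodic), move the k unlabeled examples labeled most confidently by
        either view into the labeled pool, and retrain."""
        c1, c2 = clone(base_clf), clone(base_clf)
        for _ in range(rounds):
            c1.fit(X1, y)
            c2.fit(X2, y)
            if len(U1) == 0:
                break
            p1, p2 = c1.predict_proba(U1), c2.predict_proba(U2)
            conf = np.maximum(p1.max(axis=1), p2.max(axis=1))
            pick = np.argsort(-conf)[:k]
            use1 = p1.max(axis=1)[pick] >= p2.max(axis=1)[pick]
            y_new = np.where(use1, p1.argmax(axis=1)[pick],
                             p2.argmax(axis=1)[pick])
            X1, X2 = np.vstack([X1, U1[pick]]), np.vstack([X2, U2[pick]])
            y = np.concatenate([y, y_new])
            keep = np.setdiff1d(np.arange(len(U1)), pick)
            U1, U2 = U1[keep], U2[keep]
        return c1, c2
    ```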

  • Trellis-Based Approaches to Rate-Distortion Optimized Audio Encoding

    Publication Year: 2010 , Page(s): 330 - 341
    Cited by:  Papers (2)
    PDF (1102 KB) | HTML

    Many important audio coding applications, such as streaming and playback of stored audio, involve offline compression. In such scenarios, encoding delays no longer represent a major concern. Despite this fact, most current audio encoders constrain delay by making encoding decisions on a per-frame basis. This paper is concerned with delayed-decision approaches to optimize the encoding operation for the entire audio file. Trellis-based dynamic programming is used for efficient search in the parameter space. A two-layered trellis effectively optimizes the choice of quantization and coding parameters within a frame, as well as window decisions and bit distribution across frames, while minimizing a psychoacoustically relevant distortion measure under a prescribed bit-rate constraint. The bitstream thus produced is standard compatible and there is no additional decoding delay. Objective and subjective results indicate substantial gains over the reference encoder.
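
    The per-frame decision search is a standard Viterbi recursion over a trellis; a generic sketch (the mode costs D + λR and the transition penalties are assumed inputs, standing in for the paper's two-layered structure):

    ```python
    import numpy as np

    def trellis_search(cost, trans):
        """Minimum-cost path through a mode trellis.

        cost[t, m]  -- distortion-plus-lambda*rate of mode m at frame t
        trans[i, j] -- penalty for switching from mode i to mode j
                       (e.g., encoding window-switching constraints)
        """
        T, M = cost.shape
        acc = cost[0].copy()
        back = np.zeros((T, M), dtype=int)
        for t in range(1, T):
            total = acc[:, None] + trans + cost[t][None, :]
            back[t] = np.argmin(total, axis=0)
            acc = total[back[t], np.arange(M)]
        path = [int(np.argmin(acc))]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]
    ```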

  • Theoretical Analysis of Binaural Multimicrophone Noise Reduction Techniques

    Publication Year: 2010 , Page(s): 342 - 355
    Cited by:  Papers (15)
    PDF (1360 KB) | HTML

    Binaural hearing aids use microphone signals from both the left and the right hearing aid to generate an output signal for each ear. The microphone signals can be processed by a procedure based on speech distortion weighted multichannel Wiener filtering (SDW-MWF) to achieve significant noise reduction in a speech-plus-noise scenario. In binaural procedures, it is also desirable to preserve binaural cues, in particular the interaural time difference (ITD) and interaural level difference (ILD), which are used to localize sounds. It has been shown in previous work that the binaural SDW-MWF procedure only preserves these binaural cues for the desired speech source, but distorts the noise binaural cues. Two extensions of the binaural SDW-MWF have therefore been proposed to improve binaural cue preservation, namely, the MWF with partial noise estimation (MWF-η) and the MWF with interaural transfer function extension (MWF-ITF). In this paper, the binaural cue preservation of these extensions is analyzed theoretically and tested with objective performance measures. Both extensions are able to preserve binaural cues for the speech and noise sources, while still achieving significant noise reduction performance.
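
    One simple way to picture the MWF-η idea (a schematic sketch, not the paper's exact estimator) is as blending a scaled portion of the unprocessed reference channel back into each ear's output, so the residual noise keeps the interaural cues of the original noise field:

    ```python
    def mwf_partial_noise(y_left, y_right, mwf_left, mwf_right, eta=0.2):
        """Blend eta of the unprocessed reference channels back in; eta trades
        noise reduction against preservation of the noise binaural cues."""
        out_left = (1.0 - eta) * mwf_left + eta * y_left
        out_right = (1.0 - eta) * mwf_right + eta * y_right
        return out_left, out_right
    ```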

  • Speech Enhancement Using Harmonic Emphasis and Adaptive Comb Filtering

    Publication Year: 2010 , Page(s): 356 - 368
    Cited by:  Papers (7)
    PDF (2199 KB) | HTML

    An enhancement method for single-channel speech degraded by additive noise is proposed. A spectral weighting function is derived by constrained optimization to suppress noise in the frequency domain. Two design parameters are included in the suppression gain, namely, the frequency-dependent noise-flooring parameter (FDNFP) and the gain factor. The FDNFP controls the level of admissible residual noise in the enhanced speech. Enhanced harmonic structures are incorporated into the FDNFP by time-domain processing of the linear prediction residuals of voiced speech. Further enhancement of the harmonics is achieved by adaptive comb filtering derived using the gain factor with a peak-picking algorithm. The performance of the enhancement method was evaluated by the modified bark spectral distance (MBSD), ITU Perceptual Evaluation of Speech Quality (PESQ) scores, composite objective measures, and listening tests. Experimental results indicate that the proposed method outperforms spectral subtraction, a main signal subspace method applicable to both white and colored noise conditions, and a perceptually based enhancement method with a constant noise-flooring parameter, particularly at lower signal-to-noise ratios. Our listening test indicated that 16 listeners on average preferred the proposed approach over any of the other three approaches about 73% of the time.
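
    The harmonic reinforcement amounts to a feedforward comb filter aligned with the pitch period; a plain time-domain sketch (the tap weights are an assumption; the paper derives the filter adaptively from the gain factor and a peak-picking algorithm):

    ```python
    import numpy as np

    def comb_filter(x, period, taps=(0.25, 0.5, 0.25)):
        """y[n] = sum_k taps[k] * x[n - k*period]; delays aligned with the
        pitch period add constructively at the harmonic frequencies."""
        y = np.zeros(len(x))
        for k, a in enumerate(taps):
            d = k * period
            if d < len(x):
                y[d:] += a * x[:len(x) - d]
        return y
    ```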

  • Detection and Interpretation of Opinion Expressions in Spoken Surveys

    Publication Year: 2010 , Page(s): 369 - 381
    Cited by:  Papers (3)
    PDF (821 KB) | HTML

    This paper describes a system for automatic opinion analysis from spoken messages collected in the context of a user satisfaction survey. Opinion analysis is performed from the perspective of opinion monitoring. A process is outlined for detecting segments expressing opinions in a speech signal. Methods are proposed for accepting or rejecting segments from messages that are not reliably analyzed due to the limitations of automatic speech recognition processes, for assigning opinion hypotheses to segments, and for evaluating hypothesis opinion proportions. Specific language models are introduced for representing opinion concepts. These models are used for hypothesizing opinion-carrying segments in a spoken message. Each segment is interpreted by a classifier based on the AdaBoost algorithm, which associates a pair of topic and polarity labels with each segment. The different processes are trained and evaluated on a telephone corpus collected in a deployed customer care service. The use of conditional random fields (CRFs) is also considered for detecting segments, and results are compared for different types of data and approaches. By optimizing the choice of the strategy parameters, it is possible to estimate user opinion proportions with a Kullback-Leibler divergence of 0.047 bits with respect to the true proportions obtained with a manual annotation of the spoken messages. The proportions estimated with such a low divergence are accurate enough for monitoring user satisfaction over time.
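
    The reported 0.047 bits is a Kullback-Leibler divergence between estimated and manually annotated opinion proportions, which is straightforward to compute:

    ```python
    import numpy as np

    def kl_divergence_bits(p_true, p_estimated, eps=1e-12):
        """D(p_true || p_estimated) in bits; eps guards against empty bins."""
        p = np.asarray(p_true, dtype=float) + eps
        q = np.asarray(p_estimated, dtype=float) + eps
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(p * np.log2(p / q)))
    ```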

  • Model-Based Expectation-Maximization Source Separation and Localization

    Publication Year: 2010 , Page(s): 382 - 394
    Cited by:  Papers (37)
    PDF (3543 KB) | HTML

    This paper describes a system, referred to as model-based expectation-maximization source separation and localization (MESSL), for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording. By clustering individual spectrogram points based on their interaural phase and level differences, MESSL generates masks that can be used to isolate individual sound sources. We first describe a probabilistic model of interaural parameters that can be evaluated at individual spectrogram points. By creating a mixture of these models over sources and delays, the multi-source localization problem is reduced to a collection of single source problems. We derive an expectation-maximization algorithm for computing the maximum-likelihood parameters of this mixture model, and show that these parameters correspond well with interaural parameters measured in isolation. As a byproduct of fitting this mixture model, the algorithm creates probabilistic spectrogram masks that can be used for source separation. In simulated anechoic and reverberant environments, separations using MESSL produced on average a signal-to-distortion ratio 1.6 dB greater and perceptual evaluation of speech quality (PESQ) results 0.27 mean opinion score units greater than four comparable algorithms.
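
    The interaural features that MESSL clusters are defined per spectrogram point; a minimal sketch given the complex STFTs of the two channels:

    ```python
    import numpy as np

    def interaural_features(stft_left, stft_right, eps=1e-12):
        """Per time-frequency-point interaural phase difference (IPD, radians)
        and interaural level difference (ILD, dB)."""
        ipd = np.angle(stft_left * np.conj(stft_right))
        ild = 20.0 * np.log10((np.abs(stft_left) + eps)
                              / (np.abs(stft_right) + eps))
        return ipd, ild
    ```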

  • Predictor–Corrector Adaptation by Using Time Evolution System With Macroscopic Time Scale

    Publication Year: 2010 , Page(s): 395 - 406
    Cited by:  Papers (1)
    PDF (1124 KB) | HTML

    Incremental adaptation techniques for speech recognition aim to adjust acoustic models to time-variant acoustic characteristics related to such factors as changes of speaker, speaking style, and noise source over time. In this paper, we propose a novel incremental adaptation framework, which models such time-variant characteristics by successively updating posterior distributions of acoustic model parameters on a macroscopic time scale (e.g., every set of more than a dozen utterances). The proposed incremental update involves a predictor-corrector algorithm based on a macroscopic time evolution system in accordance with Kalman filter theory. We also provide a unified interpretation of the proposal and the two major conventional approaches of indirect adaptation via transformation parameters [e.g., maximum-likelihood linear regression (MLLR)] and direct adaptation of classifier parameters [e.g., maximum a posteriori (MAP)]. We show analytically and experimentally that the proposed incremental adaptation realizes the predictor-corrector algorithm and subsumes both conventional approaches as well as their combination. Consequently, the proposal achieves robust recognition performance by balancing quickness and stability in incremental adaptation.
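
    The predictor-corrector step follows the usual Kalman recursion; a scalar caricature of one macroscopic epoch (real acoustic-model posteriors are high-dimensional, and the observation here stands in for a sufficient statistic of the epoch's adaptation data):

    ```python
    def predict_correct(mu, var, obs, q, r, a=1.0):
        """One predict/correct cycle for a scalar parameter posterior.

        mu, var -- current posterior mean/variance of the parameter
        obs     -- statistic observed from the new epoch's data
        q       -- process noise: how quickly the environment drifts
        r       -- observation noise: how reliable the new statistic is
        """
        mu_pred, var_pred = a * mu, a * var * a + q  # predictor (time evolution)
        gain = var_pred / (var_pred + r)             # Kalman gain
        mu_new = mu_pred + gain * (obs - mu_pred)    # corrector (new data)
        var_new = (1.0 - gain) * var_pred
        return mu_new, var_new
    ```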

  • MusicBox: Personalized Music Recommendation Based on Cubic Analysis of Social Tags

    Publication Year: 2010 , Page(s): 407 - 412
    Cited by:  Papers (10)
    PDF (369 KB) | HTML

    Social tagging is becoming increasingly popular in music information retrieval (MIR). It allows users to tag music items such as songs, albums, or artists. Social tags are valuable to MIR because they comprise a multifaceted source of information about genre, style, mood, users' opinions, or instrumentation. In this paper, we examine the problem of personalized music recommendation based on social tags. We propose modeling social tagging data with three-order tensors, which capture cubic (three-way) correlations between users, tags, and music items. The discovery of latent structure in this model is performed with the Higher Order Singular Value Decomposition (HOSVD), which helps to provide accurate and personalized recommendations, i.e., adapted to the particular users' preferences. To address the sparsity of social tagging data and further improve the quality of recommendation, we propose to enhance the model with a tag-propagation scheme that uses similarity values computed between the music items based on audio features. As a result, the proposed model effectively combines information about both social tags and audio features. The performance of the proposed method is examined experimentally with real data from Last.fm. Our results indicate the superiority of the proposed approach compared to existing methods that suppress the cubic relationships inherent in social tagging data. Additionally, our results suggest that the combination of social tagging data with audio features is preferable to the sole use of social tags.
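
    A compact HOSVD sketch for a user × tag × item tensor (plain truncated HOSVD via mode unfoldings; the tag-propagation enhancement based on audio similarity is omitted here):

    ```python
    import numpy as np

    def hosvd_reconstruct(T, ranks):
        """Truncated HOSVD of a 3-way tensor T (users x tags x items). The
        smoothed reconstruction exposes latent user-tag-item affinities,
        which can be ranked per user to produce recommendations."""
        factors = []
        for mode, r in enumerate(ranks):
            unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
            u, _, _ = np.linalg.svd(unfolding, full_matrices=False)
            factors.append(u[:, :r])
        core = np.einsum('ijk,ia,jb,kc->abc', T, *factors)
        return np.einsum('abc,ia,jb,kc->ijk', core, *factors)
    ```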

  • IEEE Transactions on Audio, Speech, and Language Processing Edics

    Publication Year: 2010 , Page(s): 413 - 414
    PDF (31 KB)
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing Information for authors

    Publication Year: 2010 , Page(s): 415 - 416
    PDF (46 KB)
    Freely Available from IEEE
  • IEEE Signal Processing Society Information

    Publication Year: 2010 , Page(s): C3
    PDF (32 KB)
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Audio, Speech, and Language Processing covers the sciences, technologies, and applications relating to the analysis, coding, enhancement, recognition, and synthesis of audio, music, speech, and language.

This Transactions ceased publication in 2013. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research