IEEE Transactions on Audio, Speech, and Language Processing

Issue 5 • July 2007

Displaying Results 1 - 25 of 30
  • Table of contents

    Page(s): C1 - C4
    PDF (48 KB)
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Page(s): C2
    PDF (36 KB)
    Freely Available from IEEE
  • Introduction to the Special Section on Blind Signal Processing for Speech and Audio Applications

    Page(s): 1509 - 1510
    PDF (490 KB)
    Freely Available from IEEE
  • Spatio–Temporal FastICA Algorithms for the Blind Separation of Convolutive Mixtures

    Page(s): 1511 - 1520
    PDF (960 KB) | HTML

    This paper derives two spatio-temporal extensions of the well-known FastICA algorithm of Hyvärinen and Oja that are applicable to the convolutive blind source separation task. Our time-domain algorithms combine multichannel spatio-temporal prewhitening via multistage least-squares linear prediction with novel adaptive procedures that impose paraunitary constraints on the multichannel separation filter. The techniques converge quickly to a separation solution without any step size selection or divergence difficulties, and unlike other methods, ours do not require special coefficient initialization procedures to obtain good separation performance. They also allow for the efficient reconstruction of individual signals as observed in the sensor measurements directly from the system parameters for single-input multiple-output blind source separation tasks. An analysis of one of the adaptive constraint procedures shows its fast convergence to a paraunitary filter bank solution. Numerical evaluations of the proposed algorithms and comparisons with several existing convolutive blind source separation techniques indicate the excellent relative performance of the proposed methods.
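
    For orientation, the following is a minimal NumPy sketch of the classic one-unit FastICA fixed-point iteration on prewhitened data, which this paper extends to the spatio-temporal (convolutive) case; the function and parameter names are illustrative, not the authors' code:

        import numpy as np

        def fastica_one_unit(Z, n_iter=100, tol=1e-6):
            """One-unit FastICA on prewhitened data Z (dims x samples)."""
            d, _ = Z.shape
            w = np.random.randn(d)
            w /= np.linalg.norm(w)
            for _ in range(n_iter):
                y = w @ Z                                    # current source estimate
                g, g_prime = np.tanh(y), 1.0 - np.tanh(y) ** 2
                w_new = (Z * g).mean(axis=1) - g_prime.mean() * w   # fixed-point step
                w_new /= np.linalg.norm(w_new)               # keep ||w|| = 1
                done = abs(abs(w_new @ w) - 1.0) < tol       # sign-invariant test
                w = w_new
                if done:
                    break
            return w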

  • On the Assumption of Spherical Symmetry and Sparseness for the Frequency-Domain Speech Model

    Page(s): 1521 - 1528
    PDF (1476 KB) | HTML

    A new independent component analysis (ICA) formulation called independent vector analysis (IVA) was proposed in order to solve the permutation problem in convolutive blind source separation (BSS). Instead of running ICA in each frequency bin separately and correcting the disorder with an additional algorithmic scheme afterwards, IVA exploits the dependency among the frequency components of a source and deals with them as a multivariate source by modeling it with sparse and spherically, or radially, symmetric joint probability density functions (pdfs). In this paper, we compare the speech separation performance of IVA using a group of lp-norm-invariant sparse pdfs, where the value of p, and hence the sparseness, can be controlled. We also derive an IVA algorithm from a nonparametric perspective under the constraints of spherical symmetry and high dimensionality. Simulation results confirm the efficiency of assuming sparseness and spherical symmetry for the speech model in the frequency domain.
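
    A common member of this family is the spherically symmetric Laplacian prior, whose score function couples all frequency bins of a source through their joint l2-norm; this coupling is what keeps permutations aligned across frequencies. A minimal NumPy sketch with illustrative names (not necessarily the exact pdfs compared in the paper):

        import numpy as np

        def iva_score(Y):
            """Score function of a spherically symmetric Laplacian prior.

            Y: complex array (freq_bins, frames) holding one source's
            frequency components; the bins interact only through the
            l2-norm across frequencies, i.e., phi_k(y) = y_k / ||y||_2.
            """
            norms = np.linalg.norm(Y, axis=0, keepdims=True) + 1e-12
            return Y / norms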

  • Blind Acoustic Beamforming Based on Generalized Eigenvalue Decomposition

    Page(s): 1529 - 1539
    PDF (578 KB) | HTML

    Maximizing the output signal-to-noise ratio (SNR) of a sensor array in the presence of spatially colored noise leads to a generalized eigenvalue problem. While this approach has been employed extensively in narrowband (antenna) array beamforming, it is typically not used for broadband (microphone) array beamforming because of the uncontrolled amount of speech distortion introduced by a narrowband SNR criterion. In this paper, we show how the distortion of the desired signal can be controlled by a single-channel post-filter, resulting in performance comparable to the generalized minimum variance distortionless response beamformer, where arbitrary transfer functions relate the source and the microphones. Results are given for both directional and diffuse noise. A novel gradient ascent adaptation algorithm is presented, and its good convergence properties are demonstrated experimentally by comparison with alternatives from the literature. A key feature of the proposed beamformer is that it operates blindly, i.e., it requires neither knowledge of the array geometry nor explicit estimation of the transfer functions from source to sensors or the direction of arrival.
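
    The core step is a generalized eigenvalue decomposition of speech and noise spatial covariance matrices, typically per frequency bin. A minimal sketch assuming SciPy (the blind covariance estimation and the single-channel post-filter from the paper are omitted):

        import numpy as np
        from scipy.linalg import eigh

        def gev_beamformer(phi_xx, phi_nn):
            """Max-SNR (GEV) beamformer weights from Hermitian speech and
            noise spatial covariance matrices (mics x mics).

            Solves phi_xx w = lambda * phi_nn w and returns the eigenvector
            of the largest eigenvalue, i.e., the maximizer of
            (w^H phi_xx w) / (w^H phi_nn w).
            """
            _, eigvecs = eigh(phi_xx, phi_nn)   # generalized EVD, ascending
            return eigvecs[:, -1]               # top generalized eigenvector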

  • Blind Separation of Underdetermined Convolutive Mixtures Using Their Time–Frequency Representation

    Page(s): 1540 - 1550
    PDF (1515 KB) | HTML

    This paper considers the blind separation of nonstationary sources in the underdetermined convolutive mixture case. We introduce two methods based on the sparsity of the sources in the time-frequency (TF) domain. The first assumes that the sources are disjoint in the TF domain, i.e., at most one source signal is present at a given point in the TF domain. In the second method, we relax this assumption by allowing the sources to be TF-nondisjoint to a certain extent. In particular, the number of sources present (active) at a TF point must be strictly less than the number of sensors. In that case, separation can be achieved by a subspace projection that allows us to identify the active sources and estimate their corresponding time-frequency distribution (TFD) values. Another contribution of this paper is a new estimation procedure for the mixing channel in the underdetermined case. Finally, numerical performance evaluations and comparisons of the proposed methods are provided, highlighting their effectiveness.
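
    Under the stricter TF-disjointness assumption, separation in each frequency bin reduces to assigning every time-frequency point to a single source. A minimal per-bin NumPy sketch with illustrative names (the subspace projection used for TF-nondisjoint points is not shown):

        import numpy as np

        def disjoint_masks(X, A):
            """X: (sensors, frames) mixture STFT coefficients in one bin;
            A: (sensors, sources) estimated mixing vectors for that bin.
            Each frame is assigned to the closest mixing direction;
            returns one binary mask per source."""
            A_n = A / np.linalg.norm(A, axis=0, keepdims=True)
            proj = np.abs(A_n.conj().T @ X)     # (sources, frames) match scores
            labels = np.argmax(proj, axis=0)    # winning source per TF point
            return [labels == s for s in range(A.shape[1])]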

  • Convolutive Blind Source Separation in the Frequency Domain Based on Sparse Representation

    Page(s): 1551 - 1563
    PDF (637 KB) | HTML

    Convolutive blind source separation (CBSS) that exploits the sparsity of source signals in the frequency domain is addressed in this paper. We assume the sources follow a complex Laplacian-like distribution in which the real and imaginary parts of the complex-valued source signals are not necessarily independent. Based on the maximum a posteriori (MAP) criterion, we propose a novel natural gradient method for complex sparse representation, and a new CBSS method is then developed on top of it. The developed CBSS algorithm works in the frequency domain, under the assumption that the source signals are sufficiently sparse there. If, in addition, the filter length of the mixing channels is relatively small and can be estimated, we can even achieve underdetermined CBSS. We illustrate the validity and performance of the proposed learning algorithm by several simulation examples.
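
    For illustration, here is one natural-gradient update in a single frequency bin under a radially symmetric complex sparse prior with score y/|y| (a common choice; the paper's Laplacian-like pdf, in which real and imaginary parts need not be independent, may differ in detail):

        import numpy as np

        def natural_gradient_step(W, Y, mu=0.01):
            """One natural-gradient ICA update in one frequency bin.
            W: (sources, mics) unmixing matrix; Y = W @ X: (sources, frames)
            current source estimates."""
            phi = Y / (np.abs(Y) + 1e-12)       # score of the sparse prior
            n_frames = Y.shape[1]
            G = np.eye(W.shape[0]) - (phi @ Y.conj().T) / n_frames
            return W + mu * G @ W               # DeltaW = (I - E[phi(y) y^H]) W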

  • Adaptation of Bayesian Models for Single-Channel Source Separation and its Application to Voice/Music Separation in Popular Songs

    Page(s): 1564 - 1578
    PDF (882 KB) | HTML

    Probabilistic approaches can offer satisfactory solutions to single-channel source separation, provided that the source models accurately match the statistical properties of the mixed signals. However, it is not always possible to train such models. To overcome this problem, we propose an adaptation scheme for adjusting the source models with respect to the actual properties of the signals observed in the mix. In this paper, we introduce a general formalism for source model adaptation expressed in the framework of Bayesian models. Particular cases of the proposed approach are then investigated experimentally on the problem of separating voice from music in popular songs. The results show that an adaptation scheme can consistently and significantly improve separation performance in comparison with nonadapted models.

  • Robust Speech Dereverberation Using Multichannel Blind Deconvolution With Spectral Subtraction

    Page(s): 1579 - 1591
    PDF (1327 KB) | HTML

    A robust dereverberation method is presented for speech enhancement in situations requiring adaptation, where a speaker shifts his/her head under reverberant conditions, causing the impulse responses to change frequently. We combine correlation-based blind deconvolution with modified spectral subtraction to improve the quality of inverse-filtered speech degraded by the estimation error of the inverse filters obtained in practice. Our method computes inverse filters using the correlation matrix between input signals, which can be observed without measuring room impulse responses. Inverse filtering reduces the early reflections, which contain most of the reverberant power; spectral subtraction then suppresses the tail of the inverse-filtered reverberation. The performance of our method under adaptation is demonstrated by experiments using measured room impulse responses. The subjective results indicate that this method provides speech quality superior to each of the individual methods, blind deconvolution and spectral subtraction, alone.
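
    The second stage is conceptually close to standard power spectral subtraction with a spectral floor. A minimal NumPy sketch, assuming the residual-reverberation power estimate is given (the paper's modifications are not reproduced):

        import numpy as np

        def spectral_subtract(S, resid_psd, beta=0.002):
            """Power spectral subtraction on an STFT S (freqs x frames).
            resid_psd: per-frequency estimate of the residual (late)
            reverberation power; beta: spectral floor against musical noise."""
            power = np.abs(S) ** 2
            clean = np.maximum(power - resid_psd[:, None], beta * power)
            return np.sqrt(clean) * np.exp(1j * np.angle(S))   # keep noisy phase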

  • Grouping Separated Frequency Components by Estimating Propagation Model Parameters in Frequency-Domain Blind Source Separation

    Page(s): 1592 - 1604
    PDF (1046 KB) | HTML

    This paper proposes a new formulation and optimization procedure for grouping frequency components in frequency-domain blind source separation (BSS). We adopt two separation techniques, independent component analysis (ICA) and time-frequency (T-F) masking, for the frequency-domain BSS. With ICA, grouping the frequency components corresponds to aligning the permutation ambiguity of the ICA solution in each frequency bin. With T-F masking, grouping the frequency components corresponds to classifying sensor observations in the time-frequency domain for individual sources. The grouping procedure is based on estimating anechoic propagation model parameters by analyzing ICA results or sensor observations. More specifically, the time delays of arrival and attenuations from a source to all sensors are estimated for each source. The focus of this paper includes the applicability of the proposed procedure to situations with wide sensor spacing where spatial aliasing may occur. Experimental results show that the proposed procedure effectively separates two or three sources with several sensor configurations in a real room, as long as the room reverberation is moderately low.
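
    As one ingredient, per-frequency time delays of arrival can be read off the phase of ratios of estimated mixing coefficients. A minimal NumPy sketch with illustrative names; the naive version below is valid only below the spatial-aliasing limit, whereas the paper explicitly handles wider sensor spacings:

        import numpy as np

        def tdoa_from_mixing(a, freqs, ref=0):
            """Per-frequency delay estimates for one source.
            a: (sensors, freqs) complex mixing coefficients estimated by ICA;
            freqs: center frequency of each bin in Hz; ref: reference sensor."""
            phase = np.angle(a / a[ref:ref + 1, :])        # phase vs. reference mic
            return -phase / (2 * np.pi * freqs[None, :] + 1e-12)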

  • Content-Dependent Watermarking Scheme in Compressed Speech With Identifying Manner and Location of Attacks

    Page(s): 1605 - 1616
    PDF (733 KB) | HTML

    As speech compression technologies have advanced, digital recording devices have become increasingly popular. However, the data formats used in popular speech codecs are known a priori, so compressed data can easily be modified via insertion, deletion, and replacement. This work proposes a content-dependent watermarking scheme, suitable for codebook-excited linear prediction (CELP)-based speech codecs, that ensures the integrity of compressed speech data. Speech data are initially partitioned into groups, each of which includes multiple speech frames. The watermark embedded in each frame is then generated according to the line spectrum frequency (LSF) feature of the current frame, the pitch extracted from the succeeding frame, the watermark embedded in the preceding frame, and the group index, which is determined by the location of the current frame. Finally, some of the least significant bits (LSBs) of the indices indicating the excitation pulse positions or excitation vectors are substituted for the watermark. Conventional watermarking schemes can only detect whether compressed speech data are intact; they cannot determine where the data were altered by insertion, deletion, or replacement, whereas the proposed scheme can. Experiments established that the proposed scheme, used in the G.723.1 6.3-kb/s speech codec, embeds 12 bits in each 189-bit compressed speech frame and decreases the perceptual evaluation of speech quality (PESQ) score by only 0.11. Additionally, its accuracy in detecting the locations of attacked frames is very high, with only two normal frames mistaken as attacked frames. Therefore, the proposed watermarking scheme effectively ensures the integrity of compressed speech data.
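
    The final embedding step amounts to least-significant-bit substitution in selected codec indices. A toy sketch of that step alone (plain Python; this is not the G.723.1 bitstream layout, and the paper's watermark-generation chain is omitted):

        def embed_lsb(indices, bits):
            """Replace the LSB of selected codec indices (e.g., excitation
            pulse positions) with watermark bits; purely illustrative."""
            out = list(indices)
            for i, b in enumerate(bits):
                out[i] = (out[i] & ~1) | (b & 1)
            return out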

  • On Growing and Pruning Kneser–Ney Smoothed N-Gram Models

    Page(s): 1617 - 1624
    PDF (476 KB) | HTML

    N-gram models are the most widely used language models in large-vocabulary continuous speech recognition. Since the size of the model grows rapidly with respect to the model order and available training data, many methods have been proposed for pruning the least relevant n-grams from the model. However, correct smoothing of the n-gram probability distributions is important, and performance may degrade significantly if pruning conflicts with smoothing. In this paper, we show that some of the commonly used pruning methods do not take into account how removing an n-gram should modify the backoff distributions in state-of-the-art Kneser-Ney smoothing. To solve this problem, we present two new algorithms: one for pruning Kneser-Ney smoothed models, and one for growing them incrementally. Experiments on Finnish and English text corpora show that the proposed pruning algorithm provides considerable improvements over previous pruning algorithms on Kneser-Ney smoothed models and is also better than the baseline entropy-pruned Good-Turing smoothed models. The models created by the growing algorithm provide a good starting point for our pruning algorithm, leading to further improvements. The improvements in Finnish speech recognition over the other Kneser-Ney smoothed models are statistically significant, as well.
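
    For reference, pruning must stay consistent with the interpolated Kneser-Ney estimate, written here in its standard textbook form rather than the paper's notation:

        P_{KN}(w \mid h) = \frac{\max\{c(hw) - D,\, 0\}}{c(h)} + \frac{D \, N_{1+}(h\,\cdot)}{c(h)} \, P_{KN}(w \mid \bar{h})

    where c(hw) is the n-gram count, D the discount, N_{1+}(h\,\cdot) the number of distinct words observed after history h, and \bar{h} the shortened backoff history. Removing an n-gram moves its probability mass into the backoff term, which is why the backoff distributions must be updated rather than left untouched.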

  • Enhancing the Tracking of Partials for the Sinusoidal Modeling of Polyphonic Sounds

    Page(s): 1625 - 1634
    PDF (544 KB) | HTML

    This paper addresses the problem of tracking partials, i.e., determining the evolution over time of the parameters of a given number of sinusoids with respect to the analyzed audio stream. We first show that the minimal-frequency-difference heuristic generally used to identify continuities between local maxima of successive short-time spectra can be successfully generalized using the linear prediction formalism to handle modulated sounds such as musical tones with vibrato. The spectral properties of the evolutions in time of the parameters of the partials are next studied to ensure that the parameters of the partials effectively satisfy the slowly time-varying constraint of the sinusoidal model. These two improvements are combined in a new algorithm designed for the sinusoidal modeling of polyphonic sounds. Comparative tests show that onsets/offsets of sinusoids as well as closely spaced sinusoids are better identified and stochastic components are better avoided.
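
    The generalized continuation criterion can be approximated by linearly predicting each partial's next frequency from its recent trajectory, so that a vibrato-modulated track is followed rather than broken. A minimal NumPy sketch with illustrative names:

        import numpy as np

        def predict_next_freq(track, order=2):
            """Least-squares AR prediction of a partial's next frequency
            from its past values; generalizes the minimal-frequency-
            difference heuristic. track: past frequencies in Hz."""
            f = np.asarray(track, dtype=float)
            if len(f) <= order:
                return float(f[-1])            # too short: assume constancy
            X = np.array([f[i:i + order] for i in range(len(f) - order)])
            y = f[order:]                      # targets: each next value
            a, *_ = np.linalg.lstsq(X, y, rcond=None)
            return float(f[-order:] @ a)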

  • Joint High-Resolution Fundamental Frequency and Order Estimation

    Page(s): 1635 - 1644
    PDF (1493 KB) | HTML

    In this paper, we present a novel method for joint estimation of the fundamental frequency and order of a set of harmonically related sinusoids based on the multiple signal classification (MUSIC) estimation criterion. The presented method, termed HMUSIC, is shown to have an efficient implementation using fast Fourier transforms (FFTs). Furthermore, refined estimates can be obtained using a gradient-based method. Illustrative examples of the application of the algorithm to real-life speech and audio signals are given, and the statistical performance of the estimator is evaluated using synthetic signals, demonstrating its good statistical properties.
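
    A sketch of an HMUSIC-style criterion for one candidate fundamental, assuming a noise-subspace basis G has already been obtained from the eigendecomposition of the sample covariance matrix (NumPy; the FFT-based fast implementation and the gradient refinement are omitted, and all names are illustrative):

        import numpy as np

        def hmusic_cost(f0, L, G, fs, N):
            """Fit of a harmonic model (L harmonics of f0) to the data:
            columns of A are length-N complex sinusoids at the harmonic
            frequencies, G is the noise-subspace basis (N x .).
            The pseudospectrum 1 / hmusic_cost peaks near the true f0."""
            n = np.arange(N)[:, None]
            harmonics = np.arange(1, L + 1)[None, :]
            A = np.exp(2j * np.pi * (f0 / fs) * n * harmonics)
            return np.linalg.norm(A.conj().T @ G, 'fro') ** 2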

  • Real-Time Signal Estimation From Modified Short-Time Fourier Transform Magnitude Spectra

    Page(s): 1645 - 1653
    PDF (692 KB) | HTML

    An algorithm for estimating signals from short-time magnitude spectra is introduced, offering a significant improvement in quality and efficiency over current methods. The key issue is how to invert a sequence of overlapping magnitude spectra (a “spectrogram”) containing no phase information to generate a real-valued signal free of audible artifacts. Also important is that the algorithm performs in real time, both structurally and computationally. In the context of spectrogram inversion, structurally real-time means that the audio signal at any given point in time only depends on transform frames at local or prior points in time. Computationally, real-time means that the algorithm is efficient enough to run in less time than the reconstructed audio takes to play on the available hardware. The spectrogram inversion algorithm is parameterized to allow tradeoffs between computational demands and the quality of the signal reconstruction. The algorithm is applied to audio time-scale and pitch modification and compared to classical algorithms for these tasks on a variety of signal types, including both monophonic and polyphonic audio signals such as speech and music.
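
    For context, the classic offline solution (the Griffin-Lim iteration) alternates between imposing the target magnitude and re-estimating a self-consistent phase; the paper's contribution is a structurally and computationally real-time variant of this idea. A minimal sketch assuming SciPy:

        import numpy as np
        from scipy.signal import stft, istft

        def griffin_lim(mag, n_iter=50, nperseg=1024, noverlap=768):
            """Estimate a signal whose STFT magnitude matches 'mag'
            (freqs x frames, one-sided). Offline baseline, not the
            paper's real-time algorithm."""
            phase = np.exp(2j * np.pi * np.random.rand(*mag.shape))
            for _ in range(n_iter):
                _, x = istft(mag * phase, nperseg=nperseg, noverlap=noverlap)
                _, _, S = stft(x, nperseg=nperseg, noverlap=noverlap)
                if S.shape[1] < mag.shape[1]:    # guard against length drift
                    S = np.pad(S, ((0, 0), (0, mag.shape[1] - S.shape[1])))
                phase = np.exp(1j * np.angle(S[:, :mag.shape[1]]))
            return istft(mag * phase, nperseg=nperseg, noverlap=noverlap)[1]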

  • Temporal Feature Integration for Music Genre Classification

    Page(s): 1654 - 1664
    PDF (889 KB) | HTML

    Temporal feature integration is the process of combining all the feature vectors in a time window into a single feature vector in order to capture the relevant temporal information in the window. The mean and variance along the temporal dimension are often used for temporal feature integration, but they capture neither the temporal dynamics nor the dependencies among the individual feature dimensions. Here, a multivariate autoregressive feature model is proposed to solve this problem for music genre classification. This model gives two different feature sets, the diagonal autoregressive (DAR) and multivariate autoregressive (MAR) features, which are compared against the baseline mean-variance features as well as two other temporal feature integration techniques. Reproducibility of the performance ranking of the temporal feature integration methods was demonstrated using two data sets, with five and eleven music genres, and four different classification schemes. The methods were further compared to human performance. The proposed MAR features perform better than the other features at the cost of increased computational complexity.
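
    A minimal sketch of MAR-style temporal feature integration: fit a multivariate autoregressive model to the short-time features in a texture window by least squares and use the stacked coefficients as that window's single feature vector (NumPy, illustrative names):

        import numpy as np

        def mar_features(X, p=3):
            """X: (frames, dims) short-time features, e.g., MFCCs, in one
            window. Fits X_t ~ c + sum_k A_k X_{t-k} and returns the
            flattened coefficients; unlike mean/variance, the A_k capture
            temporal dynamics and cross-dimension dependencies."""
            T, _ = X.shape
            Z = np.hstack([X[p - k - 1:T - k - 1] for k in range(p)])  # lags 1..p
            Z = np.hstack([Z, np.ones((T - p, 1))])                    # intercept
            W, *_ = np.linalg.lstsq(Z, X[p:], rcond=None)
            return W.ravel()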

  • Adaptive Parallel Quadratic-Metric Projection Algorithms

    Page(s): 1665 - 1680
    PDF (827 KB) | HTML

    This paper shows that an appropriate design of the metric leads to significant improvements in the adaptive projected subgradient method (APSM), which unifies a wide range of projection-based algorithms [including the normalized least mean square (NLMS) and affine projection algorithm (APA)]. The key is to incorporate a priori (or a posteriori) information on the characteristics of an estimandum, the system to be estimated, into the metric design. We propose a family of efficient adaptive filtering algorithms based on a parallel use of quadratic-metric projections, which assign every point to the nearest point in a closed convex set in the quadratic-metric sense. We present two versions: (1) constant-metric and (2) variable-metric, i.e., the metric function employed is either constant or varies across iterations. In the constant-metric version, the adaptive parallel quadratic-metric projection (APQP) and adaptive parallel min-max quadratic-metric projection (APMQP) algorithms are naturally derived from the APSM and are endowed with desirable properties such as convergence to a point that is optimal in an asymptotic sense. In the variable-metric version, the adaptive parallel variable-metric projection (APVP) algorithm is derived from a generalized APSM and enjoys an extended monotone property at each iteration. By employing a simple quadratic metric, the computational complexity of the proposed algorithms is kept linear in the filter length. Numerical examples demonstrate the remarkable advantages of the proposed algorithms in an application to acoustic echo cancellation.
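
    The building block is projection under a quadratic metric ||x||_Q^2 = x^T Q x onto a closed convex set; for a hyperplane it has a closed form. A minimal NumPy sketch with illustrative names (Q = I recovers the ordinary projections behind NLMS/APA):

        import numpy as np

        def qmetric_hyperplane_projection(w, a, b, Q_inv):
            """Q-metric projection of w onto the hyperplane {x : a^T x = b},
            i.e., the minimizer of (x - w)^T Q (x - w) on that set."""
            residual = b - a @ w
            return w + Q_inv @ a * (residual / (a @ Q_inv @ a))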

  • Selective-Tap Adaptive Filtering With Performance Analysis for Identification of Time-Varying Systems

    Page(s): 1681 - 1695
    PDF (679 KB) | HTML

    Selective-tap algorithms employing the MMax tap selection criterion were originally proposed for low-complexity adaptive filtering. The concept has recently been extended to multichannel adaptive filtering and applied to stereophonic acoustic echo cancellation. This paper first briefly reviews least mean square versions of MMax selective-tap adaptive filtering and then introduces new recursive least squares and affine projection MMax algorithms. We subsequently formulate an analysis of the MMax algorithms for time-varying system identification by modeling the unknown system using a modified Markov process. Analytical results are derived for the tracking performance of MMax selective-tap algorithms for normalized least mean square, recursive least squares, and affine projection algorithms. Simulation results are shown to verify the analysis.
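
    A minimal sketch of the MMax idea in its normalized-LMS form: each update adapts only the M taps aligned with the largest-magnitude input samples, trading a small performance loss for lower complexity (NumPy, illustrative names):

        import numpy as np

        def mmax_nlms_step(w, x, d, M, mu=0.5, eps=1e-8):
            """One MMax-NLMS update. w: filter (L,); x: current input
            vector (L,); d: desired sample; M: number of taps to adapt."""
            e = d - w @ x
            idx = np.argpartition(np.abs(x), -M)[-M:]   # MMax tap selection
            w = w.copy()
            w[idx] += mu * e * x[idx] / (x @ x + eps)   # partial NLMS update
            return w, e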

  • Short-Term Spatio–Temporal Clustering Applied to Multiple Moving Speakers

    Page(s): 1696 - 1710
    PDF (1342 KB) | HTML

    Distant microphones make it possible to process spontaneous multiparty speech with very few constraints on speakers, as opposed to close-talking microphones. Minimizing the constraints on speakers permits a large diversity of applications, including meeting summarization and browsing, surveillance, hearing aids, and more natural human-machine interaction. Such applications of distant microphones require determining where and when the speakers are talking. This is inherently a multisource problem, because of background noise sources as well as the natural tendency of multiple speakers to talk over each other. Moreover, spontaneous speech utterances are highly discontinuous, which makes it difficult to track the multiple speakers with classical filtering approaches such as Kalman filtering or particle filters. As an alternative, this paper proposes a probabilistic framework to determine the trajectories of multiple moving speakers in the short term only, i.e., only while they speak. Instantaneous location estimates that are close in space and time are grouped into “short-term clusters” in a principled manner. Each short-term cluster determines the precise start and end times of an utterance and a short-term spatial trajectory. Contrastive experiments clearly show the benefit of using short-term clustering, on real indoor recordings with seated speakers in meetings as well as multiple moving speakers.
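
    A toy illustration of the grouping rule (greedy thresholds for brevity; the paper casts this as a principled probabilistic framework): location estimates join a short-term cluster when they are close to its latest member in both time and space, and each finished cluster yields an utterance's start/end times plus a short trajectory. All thresholds below are assumptions:

        import numpy as np

        def short_term_clusters(times, locs, dt=0.25, dx=0.3):
            """times: (n,) seconds; locs: (n, 2 or 3) meters.
            Greedy spatio-temporal grouping of instantaneous estimates."""
            clusters = []
            for i in np.argsort(times):
                for c in clusters:
                    j = c[-1]                   # latest member of the cluster
                    if (times[i] - times[j] < dt
                            and np.linalg.norm(locs[i] - locs[j]) < dx):
                        c.append(i)
                        break
                else:
                    clusters.append([i])        # start a new short-term cluster
            return clusters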

  • Robust Speaker Recognition in Noisy Conditions

    Page(s): 1711 - 1723
    PDF (3578 KB) | HTML

    This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise but that knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise an additional biometric layer of security to protect the user, the practical implementation of such systems faces many challenges. One of these is environmental noise. Due to the mobile nature of such systems, the noise sources can be highly time-varying and potentially unknown. This raises the requirement for noise robustness in the absence of information about the noise. This paper describes a method that combines multicondition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics. Multicondition training is conducted using simulated noisy data with limited noise variation, providing a “coarse” compensation for the noise, and missing-feature theory is applied to refine the compensation by ignoring noise variation outside the given training conditions, thereby reducing the training and testing mismatch. This paper is focused on several issues relating to the implementation of the new model for real-world applications. These include the generation of multicondition training data to model noisy speech, the combination of different training data to optimize the recognition performance, and the reduction of the model's complexity. The new algorithm was tested using two databases with simulated and realistic noisy speech data. The first database is a redevelopment of the TIMIT database, rerecorded in the presence of various noise types and used to test the model for speaker identification with a focus on the varieties of noise. The second database is a handheld-device database collected in realistic noisy conditions, used to further validate the model for real-world speaker verification. The new model is compared to baseline systems and is found to achieve lower error rates.

  • Noise-Robust Automatic Speech Recognition Using a Predictive Echo State Network

    Page(s): 1724 - 1730
    PDF (212 KB) | HTML

    Artificial neural networks have been shown to perform well in automatic speech recognition (ASR) tasks, although their complexity and excessive computational costs have limited their use. Recently, a recurrent neural network with simplified training, the echo state network (ESN), was introduced by Jaeger and shown to outperform conventional methods in time series prediction experiments. We created the predictive ESN classifier by combining the ESN with a state machine framework. In small-vocabulary ASR experiments, we compared the noise-robust performance of the predictive ESN classifier with a hidden Markov model (HMM) as a function of model size and signal-to-noise ratio (SNR). The predictive ESN classifier outperformed an HMM by 8 dB SNR, and both models achieved maximum noise-robust accuracy for architectures with more states and fewer kernels per state. Using ten trials of random sets of training/validation/test speakers, accuracy for the predictive ESN classifier, averaged between 0 and 20 dB SNR, was 81±3%, compared to 61±2% for an HMM. The closed-form regression training for the ESN significantly reduced the computational cost of the network, and the reservoir of the ESN created a high-dimensional representation of the input with memory, which led to increased noise-robust classification.
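
    For orientation, a minimal echo state network sketch: a fixed random reservoir is driven by the input and only the linear readout is trained, in closed form, which is the computational saving the abstract mentions (NumPy, illustrative names; the paper's predictive classifier additionally wraps the ESN in a state-machine framework):

        import numpy as np

        def esn_states(U, W_in, W, leak=1.0):
            """Run a reservoir over inputs U (T, n_in) and return states
            (T, n_res). W_in: (n_res, n_in); W: (n_res, n_res), scaled in
            advance for the echo state property (spectral radius < 1)."""
            X = np.zeros((U.shape[0], W.shape[0]))
            x = np.zeros(W.shape[0])
            for t in range(U.shape[0]):
                x = (1 - leak) * x + leak * np.tanh(W_in @ U[t] + W @ x)
                X[t] = x
            return X

        def ridge_readout(X, Y, lam=1e-4):
            """Closed-form readout weights: regularized least squares of
            targets Y (T, n_out) on states X (T, n_res)."""
            A = X.T @ X + lam * np.eye(X.shape[1])
            return np.linalg.solve(A, X.T @ Y).T    # (n_out, n_res)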

  • Comments on “Vocal Tract Length Normalization Equals Linear Transformation in Cepstral Space”

    Page(s): 1731 - 1732
    PDF (84 KB) | HTML

    The bilinear transformation (BT) is used for vocal tract length normalization (VTLN) in speech recognition systems. We prove two properties of the bilinear mapping that motivated the band-diagonal transform proposed in M. Afify and O. Siohan, “Constrained maximum likelihood linear regression for speaker adaptation,” in Proc. ICSLP, Beijing, China, Oct. 2000. This is in contrast to the statement in M. Pitz and H. Ney, “Vocal tract length normalization equals linear transformation in cepstral space,” IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 930-944, September 2005, that the transform of Afify and Siohan was motivated by empirical observations.
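
    For reference, the bilinear transformation in question is the all-pass warping z^{-1} -> (z^{-1} - alpha) / (1 - alpha z^{-1}), which maps frequency as (standard form):

        \tilde{\omega}(\omega) = \omega + 2 \arctan\!\left( \frac{\alpha \sin\omega}{1 - \alpha \cos\omega} \right)

    so the warping is determined by the single parameter alpha, and its effect on the cepstral coefficients is linear, which is the setting both commented papers analyze.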

  • IEEE Transactions on Audio, Speech, and Language Processing Edics

    Page(s): 1733 - 1734
    PDF (30 KB)
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing Information for authors

    Page(s): 1735 - 1736
    PDF (45 KB)
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

 

This Transactions ceased publication in 2013. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research