IEEE/ACM Transactions on Audio, Speech, and Language Processing

Issue 3 • Date March 2014

  • [Front cover]

    Publication Year: 2014, Page(s): C1
    Freely Available from IEEE
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2014, Page(s): C2
    Freely Available from IEEE
  • Table of Contents

    Publication Year: 2014, Page(s): 581 - 582
    Freely Available from IEEE
  • Table of Contents

    Publication Year: 2014, Page(s): 583 - 584
    Freely Available from IEEE
  • Synthesis of Spontaneous Speech With Syllable Contraction Using State-Based Context-Dependent Voice Transformation

    Publication Year: 2014, Page(s): 585 - 595

    Pronunciation normally varies in spontaneous speech, and is an integral aspect of spontaneous expression. This study describes a voice transformation-based approach to generating spontaneous speech with syllable contractions for Hidden Markov Model (HMM)-based speech synthesis. A multi-dimensional linear regression model is adopted as the context-dependent, state-based transformation function to c...
  • Quasi Closed Phase Glottal Inverse Filtering Analysis With Weighted Linear Prediction

    Publication Year: 2014, Page(s): 596 - 607
    Cited by: Papers (7)

    This study presents a new glottal inverse filtering (GIF) technique based on closed phase analysis over multiple fundamental periods. The proposed quasi closed phase (QCP) analysis method utilizes weighted linear prediction (WLP) with a specific attenuated main excitation (AME) weight function that attenuates the contribution of the glottal source in the linear prediction model optimization. This ...
  • Online Speech Dereverberation Algorithm Based on Adaptive Multichannel Linear Prediction

    Publication Year: 2014, Page(s): 608 - 619

    This paper proposes a real-time acoustic channel equalization method that uses an adaptive multichannel linear prediction technique. In general, multichannel equalization algorithms can eliminate reverberation if they meet the following specific conditions including: the co-primeness between channels and sufficient filter length. It also requires the characteristic of correct channel information, ...
  • Structured Sparsity Models for Reverberant Speech Separation

    Publication Year: 2014, Page(s): 620 - 633
    Cited by: Papers (11)

    We tackle the speech separation problem through modeling the acoustics of the reverberant chambers. Our approach exploits structured sparsity models to perform speech recovery and room acoustic modeling from recordings of concurrent unknown sources. The speakers are assumed to lie on a two-dimensional plane and the multipath channel is characterized using the image model. We propose an algorithm f...
  • Multichannel Equalization in the KLT and Frequency Domains With Application to Speech Dereverberation

    Publication Year: 2014, Page(s): 634 - 646
    Cited by: Papers (1) | Patents (1)

    Equalization of acoustic channels usually involves inversion of acoustic impulse responses (AIRs), and generally employs multichannel techniques. In this paper, we propose three equalization algorithms, one in the Karhunen-Loève transform (KLT) domain and the other two in the frequency domain. Our proposed algorithm in the KLT domain provides a platform to achieve equalization in conjuncti...
  • Wavefield Analysis Over Large Areas Using Distributed Higher Order Microphones

    Publication Year: 2014, Page(s): 647 - 658
    Cited by: Papers (4)

    Successful recording of large spatial soundfields is a prevailing challenge in acoustic signal processing due to the enormous numbers of microphones required. This paper presents the design and analysis of an array of higher order microphones that uses 2D wavefield translation to provide a mode matching solution to the height invariant recording problem. It is shown that the use of Mth order micro...
  • Exploiting Psychological Factors for Interaction Style Recognition in Spoken Conversation

    Publication Year: 2014, Page(s): 659 - 671
    Cited by: Papers (6)

    Determining how a speaker is engaged in a conversation is crucial for achieving harmonious interaction between computers and humans. In this study, a fusion approach was developed based on psychological factors to recognize Interaction Style (IS) in spoken conversation, which plays a key role in creating natural dialogue agents. The proposed Fused Cross-Correlation Model (FCCM) provides a unified...
  • Genre-Based Music Language Modeling with Latent Hierarchical Pitman-Yor Process Allocation

    Publication Year: 2014, Page(s): 672 - 681

    In this work we present a new Bayesian topic model: latent hierarchical Pitman-Yor process allocation (LHPYA), which uses hierarchical Pitman-Yor process priors for both word and topic distributions, and generalizes a few of the existing topic models, including the latent Dirichlet allocation (LDA), the bigram topic model and the hierarchical Pitman-Yor topic model. Using such priors allows for in...
  • The Theory of Compressive Sensing Matching Pursuit Considering Time-domain Noise with Application to Speech Enhancement

    Publication Year: 2014, Page(s): 682 - 696
    Cited by: Papers (5)

    Compressive sampling matching pursuit (CoSaMP) is an efficient compressive sensing algorithm holding rigorous estimation error bounds and low computational complexity, when it deals with an additive noise signal model in the observation domain. However, in some applications, e.g., speech enhancement (SE), noise is added to a signal in the time domain, where the conventional CoSaMP cannot be direct...
  • Cascaded Long Term Prediction for Enhanced Compression of Polyphonic Audio Signals

    Publication Year: 2014, Page(s): 697 - 710

    Audio compression systems exploit periodicity in signals to remove inter-frame redundancies via the long term prediction (LTP) tool. This simple tool capitalizes on the periodic component of the waveform by selecting a past segment as the basis for prediction of the current frame. However, most audio signals are polyphonic in nature, containing a mixture of several periodic components. While such ...
  • Theoretical Analysis of Diversity in an Ensemble of Automatic Speech Recognition Systems

    Publication Year: 2014, Page(s): 711 - 726
    Cited by: Papers (6)

    Diversity or complementarity of automatic speech recognition (ASR) systems is crucial for achieving a reduction in word error rate (WER) upon fusion using the ROVER algorithm. We present a theoretical proof explaining this often-observed link between ASR system diversity and ROVER performance. This is in contrast to many previous works that have only presented empirical evidence for this link or h...
  • Direction of Arrival Based Spatial Covariance Model for Blind Sound Source Separation

    Publication Year: 2014, Page(s): 727 - 739
    Cited by: Papers (12)

    This paper addresses the problem of sound source separation from a multichannel microphone array capture via estimation of source spatial covariance matrix (SCM) of a short-time Fourier transformed mixture signal. In many conventional audio separation algorithms the source mixing parameter estimation is done separately for each frequency thus making them prone to errors and leading to suboptimal s...
  • [Blank page]

    Publication Year: 2014, Page(s): B740
    Freely Available from IEEE
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics

    Publication Year: 2014, Page(s): 741 - 742
    Freely Available from IEEE
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors

    Publication Year: 2014, Page(s): 743 - 744
    Freely Available from IEEE
  • Open Access

    Publication Year: 2014, Page(s): 745
    Freely Available from IEEE
  • Publish your article in IEEE Access

    Publication Year: 2014, Page(s): 746
    Freely Available from IEEE
  • IEEE Signal Processing Society Information

    Publication Year: 2014, Page(s): C3
    Freely Available from IEEE
  • [Blank page - back cover]

    Publication Year: 2014, Page(s): C4
    Freely Available from IEEE

Aims & Scope

The IEEE/ACM Transactions on Audio, Speech, and Language Processing is dedicated to innovative theory and methods for processing signals representing audio, speech and language, and their applications. This includes analysis, synthesis, enhancement, transformation, classification and interpretation of such signals as well as the design, development, and evaluation of associated signal processing systems.

Meet Our Editors

Editor-in-Chief

Haizhou Li
Institute for Infocomm Research, A*STAR
Singapore 138632

hli@i2r.a-star.edu.sg