IEEE/ACM Transactions on Audio, Speech, and Language Processing

Issue 5 • May 2016

  • Front Cover

    Publication Year: 2016, Page(s): C1
    Freely Available from IEEE
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2016, Page(s): C2
    Freely Available from IEEE
  • Table of Contents

    Publication Year: 2016, Page(s): 829 - 830
    Freely Available from IEEE
  • Table of Contents

    Publication Year: 2016, Page(s): 831 - 832
    Freely Available from IEEE
  • Robust and Efficient Multiple Alignment of Unsynchronized Meeting Recordings

    Publication Year: 2016, Page(s): 833 - 845

    This paper proposes a way to generate a single high-quality audio recording of a meeting using no equipment other than participants' personal devices. Each participant in the meeting uses their mobile device as a local recording node, and they begin recording whenever they arrive in an unsynchronized fashion. The main problem in generating a single summary recording is to temporally align the vari...
  • Turbo Automatic Speech Recognition

    Publication Year: 2016, Page(s): 846 - 862

    Performance of automatic speech recognition (ASR) systems can significantly be improved by integrating further sources of information such as additional modalities, or acoustic channels, or acoustic models. Given the arising problem of information fusion, striking parallels to problems in digital communications are exhibited, where the discovery of the turbo codes by Berrou et al. was a groundbrea...

    Open Access
  • Unsupervised Incremental Online Learning and Prediction of Musical Audio Signals

    Publication Year: 2016, Page(s): 863 - 874

    Guided by the idea that musical human-computer interaction may become more effective, intuitive, and creative when basing its computer part on cognitively more plausible learning principles, we employ unsupervised incremental online learning (i.e. clustering) to build a system that predicts the next event in a musical sequence, given as audio input. The flow of the system is as follows: 1) segment...
  • A Flexible Bio-Inspired Hierarchical Model for Analyzing Musical Timbre

    Publication Year: 2016, Page(s): 875 - 889

    A flexible and multipurpose bio-inspired hierarchical model for analyzing musical timbre is presented in this paper. Inspired by findings in the fields of neuroscience, computational neuroscience, and psychoacoustics, not only does the model extract spectral and temporal characteristics of a signal, but it also analyzes amplitude modulations on different timescales. It uses a cochlear filter bank ...
  • Fundamental Frequency Estimation in Speech Signals With Variable Rate Particle Filters

    Publication Year: 2016, Page(s): 890 - 900

    Fundamental frequency estimation, known as pitch estimation in speech signals, is of interest both to the research community and to industry. Meanwhile, the particle filter is known to be a powerful Bayesian inference method to track dynamic parameters in nonlinear state-space models. In this paper, we propose a speech model under a time-varying source-filter speech model, and use variable rate par...
  • Automatic Transcription of Flamenco Singing From Polyphonic Music Recordings

    Publication Year: 2016, Page(s): 901 - 913
    Cited by: Papers (1)

    Automatic note-level transcription is considered one of the most challenging tasks in music information retrieval. The specific case of flamenco singing transcription poses a particular challenge due to its complex melodic progressions, intonation inaccuracies, the use of a high degree of ornamentation, and the presence of guitar accompaniment. In this study, we explore the limitations of existing...
  • On Analytic Methods for 2.5-D Local Sound Field Synthesis Using Circular Distributions of Secondary Sources

    Publication Year: 2016, Page(s): 914 - 926
    Cited by: Papers (1)

    Local sound field synthesis allows for synthesizing a given desired sound field inside a limited target region such that the field is free of considerable spatial aliasing artifacts. Spatial aliasing artifacts are a consequence of overlaps due to unavoidable repetitions of the space-spectral coefficients of the secondary source driving function. We analyze various conceivable analytic ways of rest...
  • An End-to-End Neural Network for Polyphonic Piano Music Transcription

    Publication Year: 2016, Page(s): 927 - 939
    Cited by: Papers (3)

    We present a supervised neural network model for polyphonic piano music transcription. The architecture of the proposed model is analogous to speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network used for estimating the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models t...
  • Fundamental Frequency Informed Speech Enhancement in a Flexible Statistical Framework

    Publication Year: 2016, Page(s): 940 - 951

    Conventional statistical clean speech estimators, like the Wiener filter, are frequently used for the spectro-temporal enhancement of noise corrupted speech. Most of these approaches estimate the clean speech independently for each time-frequency point, neglecting the structure of the underlying speech sound. In this work, we derive a statistical estimator that explicitly takes into account inform...
  • Binaural Noise Cue Preservation in a Binaural Noise Reduction System With a Remote Microphone Signal

    Publication Year: 2016, Page(s): 952 - 966
    Cited by: Papers (2)

    A general binaural noise reduction system is considered that employs the multichannel Wiener filter with partial noise estimation (MWFη) allowing for an explicit tradeoff between noise reduction and binaural noise cue preservation. In this paper, it is assumed that along with the general binaural system, a remote microphone signal with a high input signal-to-noise ratio (SNR) is ...
  • A Deep Ensemble Learning Method for Monaural Speech Separation

    Publication Year: 2016, Page(s): 967 - 977
    Cited by: Papers (7)

    Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN)-based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences betwe...
  • Scalable Discovery of Audio Fingerprint Motifs in Broadcast Streams With Determinantal Point Process Based Motif Clustering

    Publication Year: 2016, Page(s): 978 - 989

    In this paper, we study the scalable discovery of audio repetitive patterns/motifs in long broadcast streams, where two segments are said to be repetitive if their audio fingerprints are close to each other. In this task, as we are confined to handle limited variability, we can adapt an audio hashing technique, originally proposed for searching a given music clip in music tracks, to successfully d...
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics

    Publication Year: 2016, Page(s): 990 - 991
    Freely Available from IEEE
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing information for authors

    Publication Year: 2016, Page(s): 992 - 993
    Freely Available from IEEE
  • Special issue on sound scene and event analysis

    Publication Year: 2016, Page(s): 994
    Freely Available from IEEE
  • Special Issue on Biosignal-based Spoken Communication

    Publication Year: 2016, Page(s): 995
    Freely Available from IEEE
  • IEEE Power Electronics Society Information

    Publication Year: 2016, Page(s): C3
    Freely Available from IEEE
  • Blank page

    Publication Year: 2016, Page(s): C4
    Freely Available from IEEE

Aims & Scope

The IEEE/ACM Transactions on Audio, Speech, and Language Processing is dedicated to innovative theory and methods for processing signals representing audio, speech and language, and their applications. This includes analysis, synthesis, enhancement, transformation, classification and interpretation of such signals as well as the design, development, and evaluation of associated signal processing systems.

Meet Our Editors

Editor-in-Chief
Haizhou Li
Department of Electrical and Computer Engineering, Department of Mechanical Engineering
National University of Singapore
Singapore 119077
eleliha@nus.edu.sg