
IEEE Transactions on Audio, Speech, and Language Processing

Issue 12 • Dec. 2013

Displaying Results 1 - 24 of 24
  • [Front cover]

    Publication Year: 2013, Page(s): C1
    PDF (274 KB)
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2013, Page(s): C2
    PDF (133 KB)
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2013, Page(s): 2467 - 2468
    PDF (208 KB)
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2013, Page(s): 2469 - 2470
    PDF (208 KB)
    Freely Available from IEEE
  • Epoch Extraction Based on Integrated Linear Prediction Residual Using Plosion Index

    Publication Year: 2013, Page(s): 2471 - 2480
    Cited by: Papers (8)
    PDF (1165 KB) | HTML

    Epoch is defined as the instant of significant excitation within a pitch period of voiced speech. Epoch extraction continues to attract the interest of researchers because of its significance in speech analysis. Existing high performance epoch extraction algorithms require either dynamic programming techniques or a priori information of the average pitch period. An algorithm without such requireme…
  • Body Conducted Speech Enhancement by Equalization and Signal Fusion

    Publication Year: 2013, Page(s): 2481 - 2492
    PDF (2208 KB) | HTML

    This paper studies body-conducted speech for noise robust speech processing purposes. As body-conducted speech is typically limited in bandwidth, signal processing is required to obtain a signal that is both high in quality and low in noise. We propose an algorithm that first equalizes the body-conducted speech using filters obtained from a pre-defined filter set and subsequently fuses this equali…
  • Soundfield Imaging in the Ray Space

    Publication Year: 2013, Page(s): 2493 - 2505
    Cited by: Papers (2)
    PDF (2559 KB) | HTML

    In this work we propose a general approach to acoustic scene analysis based on a novel data structure (ray-space image) that encodes the directional plenacoustic function over a line segment (Observation Window, OW). We define and describe a system for acquiring a ray-space image using a microphone array and refer to it as ray-space (or “soundfield”) camera. The method consists of ac…
  • Cross-Lingual Automatic Speech Recognition Using Tandem Features

    Publication Year: 2013, Page(s): 2506 - 2515
    PDF (911 KB) | HTML

    Automatic speech recognition depends on large amounts of transcribed speech recordings in order to estimate the parameters of the acoustic model. Recording such large speech corpora is time-consuming and expensive; as a result, sufficient quantities of data exist only for a handful of languages; there are many more languages for which little or no data exist. Given that there are acoustic similarit…
  • Dominance Based Integration of Spatial and Spectral Features for Speech Enhancement

    Publication Year: 2013, Page(s): 2516 - 2531
    Cited by: Papers (1)
    PDF (3471 KB) | HTML

    This paper proposes a versatile technique for integrating two conventional speech enhancement approaches, a spatial clustering approach (SCA) and a factorial model approach (FMA), which are based on two different features of signals, namely spatial and spectral features, respectively. When used separately the conventional approaches simply identify time frequency (TF) bins that are dominated by in…
  • Linearly-Constrained Minimum-Variance Method for Spherical Microphone Arrays Based on Plane-Wave Decomposition of the Sound Field

    Publication Year: 2013, Page(s): 2532 - 2540
    Cited by: Papers (3)
    PDF (1072 KB) | HTML

    Speech signals recorded in real environments may be corrupted by ambient noise and reverberation. Therefore, noise reduction and dereverberation algorithms for speech enhancement are typically employed in speech communication systems. Although microphone arrays are useful in reducing the effect of noise and reverberation, existing methods have limited success in significantly removing both reverbe…
  • Source/Filter Factorial Hidden Markov Model, With Application to Pitch and Formant Tracking

    Publication Year: 2013, Page(s): 2541 - 2553
    PDF (2392 KB) | HTML

    Tracking vocal tract formant frequencies (fp) and estimating the fundamental frequency (f0) are two tracking problems that have been tackled in many speech processing works, often independently, with applications to articulatory parameters estimations, speech analysis/synthesis or linguistics. Many works assume an auto-regressive (AR) model to fit the spectral envelope, hence…
  • A Bag of Systems Representation for Music Auto-Tagging

    Publication Year: 2013, Page(s): 2554 - 2569
    Cited by: Papers (2)
    PDF (2231 KB) | HTML

    We present a content-based automatic tagging system for music that relies on a high-level, concise “Bag of Systems” (BoS) representation of the characteristics of a musical piece. The BoS representation leverages a rich dictionary of musical codewords, where each codeword is a generative model that captures timbral and temporal characteristics of music. Songs are represented as a BoS…
  • HMM Based Intermediate Matching Kernel for Classification of Sequential Patterns of Speech Using Support Vector Machines

    Publication Year: 2013, Page(s): 2570 - 2582
    PDF (2428 KB) | HTML

    In this paper, we address the issues in the design of an intermediate matching kernel (IMK) for classification of sequential patterns using support vector machine (SVM) based classifier for tasks such as speech recognition. Specifically, we address the issues in constructing a kernel for matching sequences of feature vectors extracted from the speech signal data of utterances. The codebook based I…
  • Geometry-Based Spatial Sound Acquisition Using Distributed Microphone Arrays

    Publication Year: 2013, Page(s): 2583 - 2594
    Cited by: Papers (3)
    PDF (4112 KB) | HTML

    Traditional spatial sound acquisition aims at capturing a sound field with multiple microphones such that at the reproduction side a listener can perceive the sound image as it was at the recording location. Standard techniques for spatial sound acquisition usually use spaced omnidirectional microphones or coincident directional microphones. Alternatively, microphone arrays and spatial filters can…
  • A Class of Optimal Rectangular Filtering Matrices for Single-Channel Signal Enhancement in the Time Domain

    Publication Year: 2013, Page(s): 2595 - 2606
    Cited by: Papers (1)
    PDF (3028 KB) | HTML

    In this paper, we introduce a new class of optimal rectangular filtering matrices for single-channel speech enhancement. The new class of filters exploits the fact that the dimension of the signal subspace is lower than that of the full space. By doing this, extra degrees of freedom in the filters, which are otherwise reserved for preserving the signal subspace, can be used for achieving an improve…
  • Understanding Effects of Subjectivity in Measuring Chord Estimation Accuracy

    Publication Year: 2013, Page(s): 2607 - 2615
    Cited by: Papers (2)
    PDF (1576 KB) | HTML

    To assess the performance of an automatic chord estimation system, reference annotations are indispensable. However, owing to the complexity of music and the sometimes ambiguous harmonic structure of polyphonic music, chord annotations are inherently subjective, and as a result any derived accuracy estimates will be subjective as well. In this paper, we investigate the extent of the confounding ef…
  • Investigations on an EM-Style Optimization Algorithm for Discriminative Training of HMMs

    Publication Year: 2013, Page(s): 2616 - 2626
    PDF (1893 KB) | HTML

    Today's speech recognition systems are based on hidden Markov models (HMMs) with Gaussian mixture models whose parameters are estimated using a discriminative training criterion such as Maximum Mutual Information (MMI) or Minimum Phone Error (MPE). Currently, the optimization is almost always done with (empirical variants of) Extended Baum-Welch (EBW). This type of optimization requires sophistica…
  • Declipping of Audio Signals Using Perceptual Compressed Sensing

    Publication Year: 2013, Page(s): 2627 - 2637
    Cited by: Papers (1)
    PDF (1743 KB) | HTML

    The restoration of clipped audio signals, commonly known as declipping, is important to achieve an improved level of audio quality in many audio applications. In this paper, a novel declipping algorithm is presented, jointly based on the theory of compressed sensing (CS) and on well-established properties of human auditory perception. Declipping is formulated as a sparse signal recovery problem us…
  • List of Reviewers

    Publication Year: 2013, Page(s): 2638 - 2640
    PDF (90 KB) | HTML
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing Edics

    Publication Year: 2013, Page(s): 2641 - 2642
    PDF (108 KB)
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing Information for Authors

    Publication Year: 2013, Page(s): 2643 - 2644
    PDF (146 KB)
    Freely Available from IEEE
  • 2013 Index IEEE Transactions on Audio, Speech, and Language Processing Vol. 21

    Publication Year: 2013, Page(s): 2645 - 2672
    PDF (505 KB)
    Freely Available from IEEE
  • IEEE Signal Processing Society Information

    Publication Year: 2013, Page(s): C3
    PDF (109 KB)
    Freely Available from IEEE
  • [Blank page - back cover]

    Publication Year: 2013, Page(s): C4
    PDF (5 KB)
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Audio, Speech, and Language Processing covers the sciences, technologies, and applications relating to the analysis, coding, enhancement, recognition, and synthesis of audio, music, speech, and language.

This Transactions ceased publication in 2013. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.


Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research