IEEE Transactions on Audio, Speech, and Language Processing

Issue 8 • Date Nov. 2009

  • Table of contents

    Publication Year: 2009, Page(s):C1 - C4
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2009, Page(s): C2
    Freely Available from IEEE
  • Point Process Models for Spotting Keywords in Continuous Speech

    Publication Year: 2009, Page(s):1457 - 1470
    Cited by:  Papers (19)  |  Patents (1)

    We investigate the hypothesis that the linguistic content underlying human speech may be coded in the pattern of timings of various acoustic "events" (landmarks) in the speech signal. This hypothesis is supported by several strands of research in the fields of linguistics, speech perception, and neuroscience. In this paper, we put these scientific motivations to the test by formulating a p...
  • Automatic Speech Recognition for Under-Resourced Languages: Application to Vietnamese Language

    Publication Year: 2009, Page(s):1471 - 1482
    Cited by:  Papers (25)

    This paper presents our work in automatic speech recognition (ASR) in the context of under-resourced languages with application to Vietnamese. Different techniques for bootstrapping acoustic models are presented. First, we present the use of acoustic-phonetic unit distances and the potential of crosslingual acoustic modeling for under-resourced languages. Experimental results on Vietnamese showed ...
  • A Multichannel Sinusoidal Model Applied to Spot Microphone Signals for Immersive Audio

    Publication Year: 2009, Page(s):1483 - 1497
    Cited by:  Papers (3)

    In this paper, a multichannel version of the sinusoids plus noise model (also known as deterministic plus stochastic decomposition) is proposed and applied to spot microphone signals of a music recording. These are the recordings captured by the various microphones placed in a venue, before the mixing process produces the final multichannel audio mix. Coding these microphone signals makes them ava...
  • Binaural Sound Source Distance Learning in Rooms

    Publication Year: 2009, Page(s):1498 - 1507
    Cited by:  Papers (13)

    A method for learning the distance of a sound source in a room is presented. The proposed method is based on short-time magnitude-squared coherence between the two channels of a binaural signal. Based on white noise as the training data, a coherence profile is obtained at each desired position in the room. These profiles can then be used to identify the most likely distance of a speech signal in t...
  • A Speech Enhancement Algorithm Based on a Chi MRF Model of the Speech STFT Amplitudes

    Publication Year: 2009, Page(s):1508 - 1517
    Cited by:  Papers (4)

    A speech enhancement algorithm that takes advantage of the time and frequency dependencies of speech signals is presented in this paper. The above dependencies are incorporated in the statistical model using concepts from the theory of Markov Random Fields. In particular, the speech short-time Fourier transform (STFT) amplitude samples are modeled with a novel Chi Markov Random Field prior, which ...
  • Dynamic Speech Spectrum Representation and Tracking Variable Number of Vocal Tract Resonance Frequencies With Time-Varying Dirichlet Process Mixture Models

    Publication Year: 2009, Page(s):1518 - 1532
    Cited by:  Papers (5)

    In this paper, we propose a new approach for dynamic speech spectrum representation and tracking vocal tract resonance (VTR) frequencies. The method involves representing the spectral density of the speech signals as a mixture of Gaussians with unknown number of components for which time-varying Dirichlet process mixture model (DPM) is utilized. In the resulting representation, the number of forma...
  • Discriminative Input Stream Combination for Conditional Random Field Phone Recognition

    Publication Year: 2009, Page(s):1533 - 1546
    Cited by:  Papers (5)

    In recent studies, we and others have found that conditional random fields (CRFs) can be effectively used to perform phone classification and recognition tasks by combining non-Gaussian distributed representations of acoustic input. In previous work by I. Heintz (latent phonetic analysis: Use of singular value decomposition to determine features for CRF phone recognition, Proc. ...
  • On the Use of Anti-Word Models for Audio Music Annotation and Retrieval

    Publication Year: 2009, Page(s):1547 - 1556
    Cited by:  Papers (4)

    Query-by-semantic-description (QBSD) is a natural way for searching/annotating music in a large database. To improve QBSD, we propose the use of anti-words for each annotation word based on the concept of supervised multiclass labeling (SML). More specifically, words that are highly associated with the opposite semantic meaning of a word constitute its anti-word set. By modeling both a word and it...
  • The SIGMA Algorithm: A Glottal Activity Detector for Electroglottographic Signals

    Publication Year: 2009, Page(s):1557 - 1566
    Cited by:  Papers (28)

    Accurate estimation of glottal closure instants (GCIs) and opening instants (GOIs) is important for speech processing applications that benefit from glottal-synchronous processing. The majority of existing approaches detect GCIs by comparing the differentiated EGG signal to a threshold and are able to provide accurate results during voiced speech. More recent algorithms use a similar approach acro...
  • Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System

    Publication Year: 2009, Page(s):1567 - 1576
    Cited by:  Papers (9)

    This work focuses on the development of expressive text-to-speech synthesis techniques for a Chinese spoken dialog system, where the expressivity is driven by the message content. We adapt the three-dimensional pleasure-displeasure, arousal-nonarousal and dominance-submissiveness (PAD) model for describing expressivity in input text semantics. The context of our study is based on response messages...
  • Parameter Estimation of a State-Space Model of Noise for Robust Speech Recognition

    Publication Year: 2009, Page(s):1577 - 1590
    Cited by:  Papers (3)

    In this paper, parameter estimation of a state-space model of noise or noisy speech cepstra is investigated. A blockwise EM algorithm is derived for the estimation of the state and observation noise covariance from noise-only input data. It is supposed to be used during the offline training mode of a speech recognizer. Further a sequential online EM algorithm is developed to adapt the observation ...
  • A Class of Sparseness-Controlled Algorithms for Echo Cancellation

    Publication Year: 2009, Page(s):1591 - 1601
    Cited by:  Papers (29)

    In the context of acoustic echo cancellation (AEC), it is shown that the level of sparseness in acoustic impulse responses can vary greatly in a mobile environment. When the response is strongly sparse, convergence of conventional approaches is poor. Drawing on techniques originally developed for network echo cancellation (NEC), we propose a class of AEC algorithms that can not only work well in b...
  • Music Recommendation Based on Acoustic Features and User Access Patterns

    Publication Year: 2009, Page(s):1602 - 1611
    Cited by:  Papers (18)

    Music recommendation is receiving increasing attention as the music industry develops venues to deliver music over the Internet. The goal of music recommendation is to present users lists of songs that they are likely to enjoy. Collaborative-filtering and content-based recommendations are two widely used approaches that have been proposed for music recommendation. However, both approaches have the...
  • Story Segmentation and Topic Classification of Broadcast News via a Topic-Based Segmental Model and a Genetic Algorithm

    Publication Year: 2009, Page(s):1612 - 1623
    Cited by:  Papers (6)

    This paper presents a two-stage approach to story segmentation and topic classification of broadcast news. The two-stage paradigm adopts a decision tree and a maximum entropy model to identify the potential story boundaries in the broadcast news within a sliding window. The problem for story segmentation is thus transformed to the determination of a boundary position sequence from the potential bo...
  • Speech Watermarking for Analog Flat-Fading Bandpass Channels

    Publication Year: 2009, Page(s):1624 - 1637
    Cited by:  Papers (16)

    We present a blind speech watermarking algorithm that embeds the watermark data in the phase of non-voiced speech by replacing the excitation signal of an autoregressive speech signal representation. The watermark signal is embedded in a frequency subband, which facilitates robustness against bandpass filtering channels. We derive several sets of pulse shapes that prevent intersymbol interference ...
  • List of reviewers

    Publication Year: 2009, Page(s):1638 - 1641
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing Edics

    Publication Year: 2009, Page(s):1642 - 1643
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing Information for authors

    Publication Year: 2009, Page(s):1644 - 1645
    Freely Available from IEEE
  • Special issue on distributed camera networks: sensing, processing, communication and computing

    Publication Year: 2009, Page(s): 1646
    Freely Available from IEEE
  • ISBI 2010

    Publication Year: 2009, Page(s): 1647
    Freely Available from IEEE
  • International Conference on Image Processing

    Publication Year: 2009, Page(s): 1648
    Freely Available from IEEE
  • 2009 Index IEEE Transactions on Audio, Speech, and Language Processing Vol. 17

    Publication Year: 2009, Page(s):1649 - 1664
    Freely Available from IEEE
  • IEEE Signal Processing Society Information

    Publication Year: 2009, Page(s): C3
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Audio, Speech, and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

This Transactions ceased publication in 2013. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research