Two-Dimensional Speech-Signal Modeling

Authors: Wang, T.T. (Lincoln Lab., Massachusetts Inst. of Technol. (MIT), Lexington, MA, USA); Quatieri, T.F.

Traditional approaches in speech-signal processing analyze short-time frames of the signal (e.g., the short-time Fourier transform). Findings from auditory neurophysiology coupled with image processing principles, however, have motivated an alternative 2-D processing framework in which 2-D analysis is performed on the time-frequency distribution itself. This paper develops a 2-D model of speech in local time-frequency regions of narrowband spectrograms using sinusoidal-series-based modulation. Our model is shown to distribute vocal tract and onset/offset content based on source information (e.g., noise and voicing) in a transformed 2-D space, thereby explicitly representing different classes of energy modulations commonly observed in spectrograms. We demonstrate the model's ability to represent speech sounds by developing and evaluating algorithms for analysis/synthesis of spectrograms. As an example application, we demonstrate the utility of the model for co-channel speaker separation using prior pitch information of two overlapping speakers. Finally, our separation scheme based on 2-D modeling is compared against a reference (frame-based) sinusoidal separation system using both prior and estimated pitch.
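The idea of running a 2-D analysis on local time-frequency regions of a narrowband spectrogram can be illustrated with a minimal sketch. This is not the paper's sinusoidal-series model or its analysis/synthesis algorithm; it is a plain 2-D DFT of one tapered spectrogram patch, offered only to show why harmonic (voiced) energy concentrates at a predictable point in the transformed space. The synthetic signal, the helper names `narrowband_spectrogram` and `local_2d_transform`, and all parameter values (window length, hop, patch size, pitch) are assumptions chosen for illustration.

```python
import numpy as np

def narrowband_spectrogram(x, win_len=256, hop=40, n_fft=512):
    """Magnitude STFT with a long (narrowband) analysis window.

    Returns an array of shape (n_frames, n_fft // 2 + 1).
    """
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

def local_2d_transform(spec, t_start, f_start, n_t=32, n_f=64):
    """2-D DFT magnitude of one local time-frequency patch.

    The patch mean is removed and a 2-D Hann taper is applied so that
    the result reflects modulations inside the patch rather than its
    average level or its edges.
    """
    patch = spec[t_start:t_start + n_t, f_start:f_start + n_f]
    patch = patch - patch.mean()
    taper = np.outer(np.hanning(n_t), np.hanning(n_f))
    # rfft2 keeps only non-negative modulation indices on the frequency axis.
    return np.abs(np.fft.rfft2(patch * taper))

# Hypothetical test signal: 12 equal-amplitude harmonics of a steady 250 Hz pitch.
fs = 8000
pitch_hz = 250.0
n_fft = 512
n_f = 64
t = np.arange(fs) / fs
x = sum(np.cos(2 * np.pi * k * pitch_hz * t) for k in range(1, 13))

spec = narrowband_spectrogram(x, n_fft=n_fft)
mod = local_2d_transform(spec, t_start=50, f_start=8, n_f=n_f)

# Location of the strongest modulation component in the patch.
peak_t, peak_f = np.unravel_index(np.argmax(mod), mod.shape)

# A peak at index peak_f on the frequency axis means the harmonic lines
# repeat every n_f / peak_f bins; convert that spacing to Hz.
bin_hz = fs / n_fft
estimated_pitch_hz = (n_f / peak_f) * bin_hz
```

For a steady pitch the harmonic lines run parallel to the time axis, so the dominant component should sit at zero temporal modulation, and its position along the other axis encodes the harmonic spacing. A changing pitch would tilt the lines and move the peak off that axis, which is the kind of structure a 2-D model of the patch can exploit.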

Published in: Audio, Speech, and Language Processing, IEEE Transactions on (Volume: 20, Issue: 6)

Date of Publication: Aug. 2012
