Relevance of auditory cortical representations to speech processing and recognition

Author: S. Shamma, University of Maryland, College Park, MD

Summary form only given. Humans readily understand speech despite substantial distortion, high levels of ambient noise, or interference from other speakers. Several factors contribute to this robust performance, ranging from stable early auditory representations to sophisticated linguistic knowledge. In this talk, I shall describe processes that occur at intermediate levels of the central auditory pathway, specifically the midbrain and primary auditory cortex. At these levels, the relatively simple short-term acoustic spectra extracted early, at the cochlea, are elaborated into multidimensional representations that integrate spectral and temporal information over many scales. This transformation is carried out by cortical cells that are selective not simply to the spectral energy of the acoustic signal but to complex combinations of its spectral and temporal modulations, which are the true carriers of intelligibility in speech and, more generally, of timbre in sound. For instance, some cells selectively encode rapidly changing broadband spectra, while others are sensitive to slowly varying narrowband energy. This decomposition of the spectrogram gives the brain a representation that is both rich and versatile. It can serve as a "metric" for assessing sound quality or speech intelligibility, and it can be used to manipulate sound characteristics in a variety of auditory tasks. I will explain the physiological and psychoacoustical data relevant to these representations, the mathematical formulation of the cortical model, and how the model can be adapted to applications in ASR, speech intelligibility assessment, speech enhancement, and signal conditioning for hearing aids.
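The abstract does not give the model's equations. As a rough illustration of what it means to decompose a spectrogram into joint spectral and temporal modulations, the sketch that follows takes the 2D Fourier transform of a log-frequency spectrogram. This yields a temporal-modulation axis in Hz and a spectral-modulation axis in cycles per octave. It is a deliberate simplification and not the cortical model described in the talk, which the abstract characterizes in terms of cell selectivity rather than a single global transform. The function names, the synthetic "ripple" inputs, and the settings of 24 channels per octave and 100 frames per second are all hypothetical choices made for illustration.

```python
import numpy as np


def modulation_spectrum(spec, frame_rate_hz, channels_per_octave):
    """Magnitude of the 2D Fourier transform of a log-frequency spectrogram.

    Illustrative stand-in only (not the cortical model from the talk).
    spec has shape (n_channels, n_frames): rows are log-spaced frequency
    channels, columns are time frames. Returns (magnitude, scales, rates):
    scales in cycles/octave (spectral modulation), rates in Hz (temporal
    modulation), both ordered from negative to positive.
    """
    n_ch, n_fr = spec.shape
    # Remove the mean so the DC term does not dominate the magnitude.
    centered = spec - spec.mean()
    mag = np.abs(np.fft.fftshift(np.fft.fft2(centered)))
    # Channel spacing is 1/channels_per_octave octaves -> cycles/octave.
    scales = np.fft.fftshift(np.fft.fftfreq(n_ch, d=1.0 / channels_per_octave))
    # Frame spacing is 1/frame_rate_hz seconds -> Hz.
    rates = np.fft.fftshift(np.fft.fftfreq(n_fr, d=1.0 / frame_rate_hz))
    return mag, scales, rates


def ripple(n_ch, n_fr, frame_rate_hz, channels_per_octave, rate_hz, scale_cpo):
    """Hypothetical test input: a single moving spectrotemporal ripple."""
    octaves = np.arange(n_ch)[:, None] / channels_per_octave
    seconds = np.arange(n_fr)[None, :] / frame_rate_hz
    return np.cos(2 * np.pi * (rate_hz * seconds + scale_cpo * octaves))


def dominant_modulation(spec, frame_rate_hz, channels_per_octave):
    """Return (|scale| in cycles/octave, |rate| in Hz) of the largest peak."""
    mag, scales, rates = modulation_spectrum(spec, frame_rate_hz, channels_per_octave)
    i, j = np.unravel_index(np.argmax(mag), mag.shape)
    return abs(scales[i]), abs(rates[j])


# 128 channels at 24/octave (about 5.3 octaves), 200 frames at 100 frames/s (2 s).
fast_broad = ripple(128, 200, 100.0, 24, rate_hz=16.0, scale_cpo=0.1875)
slow_narrow = ripple(128, 200, 100.0, 24, rate_hz=2.0, scale_cpo=1.5)
print(dominant_modulation(fast_broad, 100.0, 24))
print(dominant_modulation(slow_narrow, 100.0, 24))
```

A high temporal rate combined with a low spectral scale corresponds to the "rapidly changing broadband" pattern described above. A low rate combined with a higher scale corresponds to the "slowly varying narrowband" one. Each synthetic ripple recovers its own parameters as its dominant modulation.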
I shall also highlight recent approaches in ASR that incorporate many of the features that make these representations powerful, specifically the integration of spectral information over relatively long time scales (hundreds of milliseconds) and over broad spectral bandwidths (more than 1 octave). Finally, I shall discuss the relevance of new discoveries in rapid cortical plasticity to the design of adaptive speech processing strategies and of algorithms for separating speech streams in monaural (single-channel) signals.
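To make these window sizes concrete, assume a spectrogram at 100 frames per second with 24 channels per octave. Under that assumption, a 250 ms window spans 25 frames and a 1.5-octave band spans 36 channels. The following hypothetical sketch simply averages a log-frequency spectrogram over patches of that size. The 250 ms and 1.5-octave values are illustrative choices within the quoted ranges (hundreds of milliseconds, more than one octave). Plain averaging also stands in for whatever filtered or learned integration an actual ASR front end would use.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def long_context_features(spec, frame_rate_hz, channels_per_octave,
                          window_ms=250.0, band_octaves=1.5):
    """Hypothetical feature: mean energy over long, wide time-frequency patches.

    spec has shape (n_channels, n_frames). Each output value is the mean
    of a patch spanning `window_ms` milliseconds and `band_octaves` octaves.
    """
    n_t = int(round(window_ms / 1000.0 * frame_rate_hz))  # frames per patch
    n_f = int(round(band_octaves * channels_per_octave))  # channels per patch
    if n_f > spec.shape[0] or n_t > spec.shape[1]:
        raise ValueError("patch is larger than the spectrogram")
    # View of shape (n_ch - n_f + 1, n_fr - n_t + 1, n_f, n_t); no copy is made.
    patches = sliding_window_view(spec, (n_f, n_t))
    return patches.mean(axis=(-2, -1))


rng = np.random.default_rng(0)
spec = rng.random((128, 200))  # assumed: 24 channels/octave, 100 frames/s
feats = long_context_features(spec, frame_rate_hz=100.0, channels_per_octave=24)
print(feats.shape)  # (93, 176): 36-channel x 25-frame patches
```

Each output cell therefore summarizes a quarter second of signal across one and a half octaves. This is the kind of long, wide integration region the paragraph contrasts with short-term, narrowband spectral slices.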

Published in: 2005 IEEE Workshop on Automatic Speech Recognition and Understanding

Date of Conference: 27 Nov. 2005