It is well known that speech sounds evolve at multiple timescales, over the course of tens to hundreds of milliseconds. Such temporal modulations are crucial for speech perception and are believed to directly influence the underlying code for representing acoustic stimuli. The present work seeks to explicitly quantify this relationship using the principle of temporal coherence. Here we show that by constraining the outputs of model linear neurons to be highly correlated over timescales relevant to speech, we observe the emergence of neural response fields that are bandpass, localized, and reflective of the rich spectro-temporal structure present in speech. The emergent response fields also share qualitative similarities with those observed in auditory neurophysiology. Importantly, learning is accomplished using unlabeled speech data, and the emergent neural properties well characterize the spectro-temporal statistics of the input. We analyze the characteristics and coverage of ensembles of learned response fields for a variety of timescales, and suggest uses of such a coherence learning framework for common speech tasks.
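The core quantity in such a framework is the temporal coherence of model neuron outputs over a chosen timescale. As a rough illustration (not the authors' implementation), the sketch below scores a set of linear response fields by the average lagged self-correlation of their responses to a spectrogram-like input; all array shapes, names, and the synthetic input are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a speech spectrogram:
# 64 frequency channels x 500 time frames (illustrative sizes).
n_freq, n_time, n_units = 64, 500, 16
X = rng.standard_normal((n_freq, n_time))

# Linear model neurons: each row of W is one neuron's response field.
W = rng.standard_normal((n_units, n_freq)) * 0.1

def coherence_objective(W, X, max_lag=10):
    """Average correlation of each unit's response with its own
    response up to max_lag frames later -- higher values mean the
    outputs are more temporally coherent over that timescale."""
    R = W @ X                                    # (n_units, n_time)
    R = R - R.mean(axis=1, keepdims=True)        # zero-mean per unit
    R = R / (R.std(axis=1, keepdims=True) + 1e-8)  # unit variance
    lag_corrs = []
    for lag in range(1, max_lag + 1):
        # lagged correlation per unit, averaged across units
        c = (R[:, :-lag] * R[:, lag:]).mean(axis=1)
        lag_corrs.append(c.mean())
    return float(np.mean(lag_corrs))

score = coherence_objective(W, X)
# A gradient-based learner would adjust W to increase this score,
# typically with a norm or decorrelation constraint to rule out
# trivial solutions (e.g., all units converging to the same field).
```

For white-noise input the score sits near zero; on real speech, whose spectro-temporal envelope varies slowly over tens to hundreds of milliseconds, maximizing it is what would drive the filters toward the bandpass, localized response fields described above.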