Speech sounds are encoded by time-varying spectral patterns called acoustic cues. The processing and detection of these acoustic cues give rise to events, defined as the psychological correlates of the cues. Because of similarities among their acoustic cues, speech sounds form natural confusion groups; when the defining feature of a sound within a group is masked by noise, one event can turn into another. A systematic psychoacoustic "3-D method" has been developed to explore the perceptual cues of stop consonants in naturally produced speech. For each sound, our 3-D method measures the contribution of each subcomponent by time-truncating, high-pass/low-pass filtering, and masking the speech with noise. The AI-gram, a visualization tool that simulates auditory peripheral processing, is used to predict the audible components of a speech sound. The results show that plosive consonants are defined by short-duration bursts, characterized by their center frequency, together with the delay to the onset of voicing; fricatives are characterized by the duration and bandwidth of a noise-like feature. Pilot studies of hearing-impaired (HI) speech perception indicate that cochlear dead regions have a considerable impact on consonant identification. An HI listener may fail to understand speech simply because he/she cannot hear certain sounds: the events are missing due either to the hearing loss or to the masking introduced by the noise.
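The three manipulations of the 3-D method (time truncation, high-/low-pass filtering, and noise masking) can be illustrated with a minimal signal-processing sketch. This is not the authors' implementation; it is a simplified illustration using NumPy, with crude FFT zeroing in place of proper filter design, and the function names and parameters are hypothetical.

```python
import numpy as np

def truncate(signal, keep_samples):
    """Time truncation: keep the first keep_samples samples, zero the rest."""
    out = np.zeros_like(signal)
    out[:keep_samples] = signal[:keep_samples]
    return out

def lowpass(signal, cutoff_hz, fs):
    """Crude FFT-based low-pass: zero all spectral bins above the cutoff."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spec[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spec, n=len(signal))

def highpass(signal, cutoff_hz, fs):
    """Crude FFT-based high-pass: zero all spectral bins below the cutoff."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spec[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spec, n=len(signal))

def mask_with_noise(signal, snr_db, rng=None):
    """Noise masking: add white noise scaled to a target SNR in dB."""
    rng = rng or np.random.default_rng(0)
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise
```

By sweeping the truncation point, the cutoff frequency, and the SNR while recording a listener's responses, one can localize a perceptual cue in time, frequency, and intensity, which is the idea behind the three dimensions of the method.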