This paper considers 2-D Fourier analysis of local time-frequency regions of wideband spectrograms, a representation we refer to as the wideband Grating Compression Transform (WGCT). We develop frequency-dependent, speech-production-based models of speech signals for the WGCT, building on previous work in modeling narrowband-based GCT representations (NGCT). Comparisons show important distinctions, including dual behavior, between the wideband and narrowband models, and distinct ways in which vocal tract/formant content is distributed redundantly throughout the NGCT and WGCT spaces. Our results motivate a novel taxonomy of speech-signal behavior as an interpretative framework (i.e., in relation to speech-production characteristics) for 2-D processing of speech using the GCT, as well as for other 2-D approaches and time-frequency distributions such as the auditory spectrogram. We demonstrate and evaluate the ability of the model to represent real speech content through demodulation techniques for analysis/synthesis of wideband spectrograms. Finally, we develop a co-channel speaker separation method, using prior and estimated pitch information, based on the WGCT, as well as through fusion with the NGCT. These GCT-based separation systems are compared against and further fused with a reference sinusoidal separation system.
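As an illustration of the representation described above, the following is a minimal sketch of a 2-D Fourier transform applied to one local time-frequency region of a wideband (short-window) spectrogram. It is not the authors' implementation. The function names `wideband_spectrogram` and `local_gct`, the 5 ms window, the 1 ms hop, the patch size, the use of a linear-magnitude spectrogram, and the synthetic impulse-train input are all assumptions chosen for the example.

```python
import numpy as np

def wideband_spectrogram(x, fs, win_ms=5.0, hop_ms=1.0, nfft=256):
    """Magnitude STFT with a short analysis window (assumed 5 ms),
    so that individual pitch pulses are resolved in time.
    Returns an array of shape (nfft // 2 + 1, n_frames)."""
    win_len = int(round(fs * win_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, n=nfft, axis=0))

def local_gct(spec, f_start, t_start, f_len, t_len, frame_rate, bin_hz):
    """2-D Fourier magnitude of one local spectrogram region.
    Axis 0 is modulation along frequency (cycles per Hz),
    axis 1 is modulation along time (Hz)."""
    patch = spec[f_start : f_start + f_len, t_start : t_start + t_len]
    patch = patch - patch.mean()  # remove the DC term of the region
    window2d = np.outer(np.hamming(f_len), np.hamming(t_len))
    gct = np.abs(np.fft.fft2(patch * window2d))
    freq_mod = np.fft.fftfreq(f_len, d=bin_hz)
    time_mod = np.fft.fftfreq(t_len, d=1.0 / frame_rate)
    return gct, freq_mod, time_mod

# Synthetic stand-in for voiced speech: a 100 Hz impulse train (assumption).
fs = 8000
x = np.zeros(fs // 2)
x[::80] = 1.0

spec = wideband_spectrogram(x, fs)
gct, freq_mod, time_mod = local_gct(
    spec, f_start=20, t_start=50, f_len=32, t_len=40,
    frame_rate=1000.0,   # 1 ms hop
    bin_hz=fs / 256,     # spacing of the STFT frequency bins
)

# Location of the dominant component in the 2-D transform.
k_f, k_t = np.unravel_index(np.argmax(gct), gct.shape)
peak_time_mod_hz = abs(time_mod[k_t])
peak_freq_mod = abs(freq_mod[k_f])
print(peak_time_mod_hz, peak_freq_mod)
```

In this toy case the wideband spectrogram shows vertical striations at the pitch period, so the energy of the local 2-D transform concentrates along the temporal-modulation axis at the pitch rate. A long analysis window would instead resolve harmonics along frequency, which is the narrowband counterpart of this behavior.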