What kind of internal representations develop in networks that transform the speech of one speaker into that of another? This paper addresses the question with a novel supervised coding scheme: cross-coding. Instead of performing auto-association, we train networks with intermediate bottlenecks to map the speech of many speakers to the speech of a particular speaker. The internal representations that develop are then fed to another network trained to label the corresponding sounds. Interestingly, the cross-codings seem to have captured speaker-invariant properties of the different sounds. Experiments on a multispeaker syllable recognition task show that the proposed scheme outperforms the corresponding multilayered net
Published in: Proceedings of the 12th IAPR International Conference on Pattern Recognition, 1994, Vol. 2 - Conference B: Computer Vision & Image Processing (Volume: 2)
Date of Conference: 9-13 Oct 1994
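
The two-stage scheme described in the abstract (a bottleneck network trained to map many speakers onto one target speaker, followed by a labeler trained on the bottleneck codes) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the synthetic "speech" frames, the linear per-speaker distortions, the layer sizes, the learning rates, and the use of a single softmax layer as the labeling network. The function names `train_cross_coder` and `train_labeler` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for speech frames (assumption: real inputs would be
# acoustic features; here each "speaker" linearly distorts a class prototype).
n_classes, n_speakers, dim, per = 3, 4, 8, 20
prototypes = rng.normal(size=(n_classes, dim))
speaker_maps = [np.eye(dim) + 0.3 * rng.normal(size=(dim, dim))
                for _ in range(n_speakers)]
target_map = speaker_maps[0]  # speaker 0 plays the "particular speaker"

X, Y, labels = [], [], []
for s in range(n_speakers):
    for c in range(n_classes):
        noise = 0.05 * rng.normal(size=(per, dim))
        X.append(prototypes[c] @ speaker_maps[s].T + noise)        # any speaker
        Y.append(np.tile(prototypes[c] @ target_map.T, (per, 1)))  # target speaker
        labels += [c] * per
X, Y, labels = np.vstack(X), np.vstack(Y), np.array(labels)


def train_cross_coder(X, Y, bottleneck=2, lr=0.02, epochs=3000):
    """Gradient descent on MSE for input -> tanh bottleneck -> linear output."""
    W1 = 0.1 * rng.normal(size=(X.shape[1], bottleneck))
    b1 = np.zeros(bottleneck)
    W2 = 0.1 * rng.normal(size=(bottleneck, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # bottleneck activations (the codes)
        err = H @ W2 + b2 - Y
        losses.append(float(np.mean(err ** 2)))
        g_out = 2 * err / err.size        # gradient of the mean squared error
        g_H = (g_out @ W2.T) * (1 - H ** 2)
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_H)
        b1 -= lr * g_H.sum(axis=0)
    return (W1, b1, W2, b2), losses


def train_labeler(codes, labels, n_classes, lr=0.5, epochs=500):
    """Softmax regression on the codes (a simplification of 'another network')."""
    W = np.zeros((codes.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = codes @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        g = (p - onehot) / len(labels)
        W -= lr * (codes.T @ g)
        b -= lr * g.sum(axis=0)
    return W, b


params, losses = train_cross_coder(X, Y)
W1, b1, _, _ = params
codes = np.tanh(X @ W1 + b1)              # cross-codings for every input frame
W, b = train_labeler(codes, labels, n_classes)
pred = np.argmax(codes @ W + b, axis=1)
accuracy = float(np.mean(pred == labels))
print(f"cross-coding MSE: {losses[0]:.3f} -> {losses[-1]:.3f}, "
      f"label accuracy: {accuracy:.2f}")
```

Because the cross-coder's target depends only on the sound and not on who spoke it, the bottleneck is pushed to discard speaker-specific variation, which is the intuition behind the speaker-invariance observation in the abstract. The toy data above cannot confirm or reproduce the paper's results.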