This paper presents a new high-performance neural network architecture for phoneme recognition, shift-tolerant K-subspaces. The architecture combines the time-delay design used for phoneme recognition with the technique of MLP autoassociators. For each phoneme category, K time-delay linear autoassociators are constructed and trained on speech data belonging to that category, using a proposed K-subspace clustering procedure similar to the K-means algorithm. Because training is not based on classification, the architecture avoids a drawback of most conventional neural-network-based speech recognition systems: their network output values do not represent candidate likelihoods. The architecture obtained 87.37% recognition accuracy, only slightly lower than the 88.44% obtained with a TDNN and the 88.30% obtained with a shift-tolerant LVQ, both trained by classification learning procedures on the same data set.
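The abstract does not spell out the K-subspace clustering procedure, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each linear autoassociator can be stood in for by a PCA-style affine subspace, that the "K-means-like" loop alternates between assigning each sample to the subspace with the smallest reconstruction error and refitting each subspace to its assigned samples, and that recognition picks the category whose best subspace reconstructs the input most closely. The time-delay (shift-tolerant) structure is omitted entirely, and all names (`fit_subspace`, `recon_error`, `k_subspaces`, `classify`) and parameters (`dim`, `n_iter`, `seed`) are hypothetical.

```python
import numpy as np

def fit_subspace(X, dim):
    """Fit an affine subspace (mean, orthonormal basis) to the rows of X via SVD.

    Assumption: this stands in for training one linear autoassociator.
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim]

def recon_error(X, subspace):
    """Squared reconstruction error of each row of X under one subspace."""
    mean, basis = subspace
    centered = X - mean
    residual = centered - centered @ basis.T @ basis
    return np.sum(residual ** 2, axis=1)

def k_subspaces(X, k, dim, n_iter=20, seed=0):
    """Cluster the rows of X into k subspaces with a K-means-style loop."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))  # random initial assignment
    subspaces = []
    for _ in range(n_iter):
        subspaces = []
        for j in range(k):
            members = X[labels == j]
            if len(members) <= dim:
                # Too few samples to fit: re-seed from random samples.
                idx = rng.choice(len(X), size=dim + 1, replace=False)
                members = X[idx]
            subspaces.append(fit_subspace(members, dim))
        # Reassign every sample to the subspace that reconstructs it best.
        errors = np.stack([recon_error(X, s) for s in subspaces], axis=1)
        new_labels = errors.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return subspaces

def classify(x, models):
    """Return the category whose best subspace gives the lowest error.

    models maps a category name to its list of K subspaces.
    """
    scores = {
        category: min(recon_error(x[None, :], s)[0] for s in subspaces)
        for category, subspaces in models.items()
    }
    return min(scores, key=scores.get)
```

In this sketch, one call to `k_subspaces` per phoneme category, using only that category's feature vectors, would build the `models` dictionary passed to `classify`. Since each category is modeled from its own data alone, the per-category reconstruction errors act as distance-like scores rather than discriminatively trained outputs.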
Published in: Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on (Volume: 6)
Date of Conference: 7-10 May 1996