In this paper we present an adaptation technique that exploits inter- and intra-speaker variation in vowel phonemes with respect to the tongue-hump position within the oral cavity. The 13 vowels of American English speech can be classified into three areas according to tongue-hump position. The vowels in each of these areas, taken from the DARPA TIMIT phonetic database, are classified using one-class-in-one-network (OCON) feedforward subnets, similar to those proposed by Kung et al. (1995) and Jou et al. (1991), joined by a common front-end adaptation layer. Because speaker information is comparable within each area, adaptation can be concentrated primarily on speaker characteristics, so that adapting towards a single phoneme improves recognition of the other vowel phonemes in the same network. This reduces the need for a new speaker to recite every vowel in order to achieve complete vowel phoneme adaptation. Results show increases of over 12% in the recognition rates of vowel phonemes after adaptation towards other phonemes in the same tongue-hump-position area. However, vowels that are well separated within the same group have little, or even a negative, effect on each other's recognition after adaptation.
Published in: IEE Colloquium on Pattern Recognition (Digest No. 1997/018)
Date of Conference: 26 Feb 1997
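
The architecture described in the abstract (per-vowel OCON subnets behind a shared front-end adaptation layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature dimension, hidden-layer size, sigmoid units, squared-error criterion, learning rate, identity initialisation of the front end, and the three vowel labels are all assumptions made for illustration, and the class names `OconSubnet`, `AdaptationLayer`, and `VowelGroupNet` are hypothetical. The sketch models the network for a single tongue-hump-position area and updates only the shared front end during adaptation, leaving the subnets frozen.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class OconSubnet:
    """One-class-in-one-network: a tiny net that scores a single vowel."""

    def __init__(self, dim, hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                   for _ in range(hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    def forward(self, z):
        h = [sigmoid(sum(w * zi for w, zi in zip(row, z))) for row in self.w1]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)))
        return h, y

    def input_gradient(self, z, target):
        """Gradient of 0.5 * (y - target)**2 with respect to the input z."""
        h, y = self.forward(z)
        dy = (y - target) * y * (1.0 - y)
        grad = [0.0] * len(z)
        for j, row in enumerate(self.w1):
            dh = dy * self.w2[j] * h[j] * (1.0 - h[j])
            for i, w in enumerate(row):
                grad[i] += dh * w
        return grad


class AdaptationLayer:
    """Shared linear front end, initialised to the identity (assumption)."""

    def __init__(self, dim):
        self.a = [[1.0 if i == j else 0.0 for j in range(dim)]
                  for i in range(dim)]
        self.b = [0.0] * dim

    def forward(self, x):
        return [sum(a * xi for a, xi in zip(row, x)) + b
                for row, b in zip(self.a, self.b)]

    def update(self, x, grad_out, lr):
        # Gradient step on the front-end weights and biases only.
        for i, g in enumerate(grad_out):
            for j, xj in enumerate(x):
                self.a[i][j] -= lr * g * xj
            self.b[i] -= lr * g


class VowelGroupNet:
    """Network for one tongue-hump-position area (hypothetical layout)."""

    def __init__(self, vowels, dim=4, hidden=3, seed=0):
        rng = random.Random(seed)
        self.front_end = AdaptationLayer(dim)
        self.subnets = {v: OconSubnet(dim, hidden, rng) for v in vowels}

    def scores(self, x):
        z = self.front_end.forward(x)
        return {v: net.forward(z)[1] for v, net in self.subnets.items()}

    def classify(self, x):
        s = self.scores(x)
        return max(s, key=s.get)

    def adapt_to_speaker(self, x, vowel, lr=0.5):
        """Adapt the shared front end on one labelled frame of one vowel.

        Subnet weights are never modified, so the change to the front end
        affects every vowel in the group.
        """
        z = self.front_end.forward(x)
        grad = [0.0] * len(z)
        for v, net in self.subnets.items():
            target = 1.0 if v == vowel else 0.0
            g = net.input_gradient(z, target)
            grad = [a + b for a, b in zip(grad, g)]
        self.front_end.update(x, grad, lr)


# Illustrative usage: labels and feature values are made up, not TIMIT data.
net = VowelGroupNet(["iy", "ih", "eh"])
frame = [0.8, -0.3, 0.5, 0.1]
for _ in range(20):
    net.adapt_to_speaker(frame, "iy")
print(net.scores(frame))
```

Because every subnet in the group reads the output of the same front end, a correction learned from frames of one vowel shifts the inputs seen by the other vowels' subnets as well. This is the mechanism by which adaptation towards a single phoneme could help, or in the case of well-separated vowels hurt, recognition of its neighbours.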