This paper describes an automatic method, called Automatic Diphone Bootstrapping (A.D.B.), for extracting templates for Speaker-Adaptive Continuous Speech Recognition (C.S.R.) using "diphones" as speech units. Diphones have proved very suitable for C.S.R. because they meet the main requirements of phonetic units: invariance with context and economy. Furthermore, diphone-based speaker-dependent C.S.R. systems achieve very high performance. For a long time, manual extraction was presented in the literature as the only completely reliable method of creating sub-word templates for any speaker (see [1] as an example). Some automatic techniques for reference pattern extraction have recently been developed [2,3], but they still require some manual corrections. The A.D.B. procedure operates without any manual intervention and performed very well for all the speakers on which it was tested. In a connected digit recognition task, a word recognition rate (W.R.R.) of 98.79% was achieved using the speaker-adaptive templates created by the A.D.B. procedure.