Acoustic-phonetic conversion is probably the most critical step in continuous speech recognition. Transitional information can be used to improve its results, as follows. First we constitute a lexicon of the phoneme steady-state spectra and a lexicon of all the transitions (diphones), each one being characterized by a "differential spectrum". The unknown continuous speech wave is segmented into quasi steady-state and transitional segments; the labelling of each quasi steady-state segment admits several candidates. The transitional segment between two quasi steady-state segments is then compared to the diphones of the lexicon selected by combining the possible phoneme labels of the two surrounding segments. In practice, only the comparisons that are compatible with the recent past of the message are made. When working as a phoneme vocoder, the whole procedure runs at about three times real time, without any optimization.
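The procedure described above might be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: spectra and differential spectra are taken to be short feature vectors, the comparison is a plain Euclidean distance, "compatible with the recent past" is read as keeping only the left-hand labels that survived the previous step, and all names (`top_candidates`, `decode`, the toy lexicons) are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Toy lexicons (invented values, for illustration only).
PHONEMES = {"a": (1.0, 0.0), "i": (0.0, 1.0), "u": (1.0, 1.0)}
DIPHONES = {  # (left, right) -> assumed "differential spectrum"
    ("a", "i"): (-1.0, 1.0),
    ("i", "a"): (1.0, -1.0),
    ("a", "u"): (0.0, 1.0),
    ("u", "i"): (-1.0, 0.0),
}


def top_candidates(spectrum, phoneme_lexicon, k=2):
    """Return the k labels whose steady-state spectra are closest."""
    ranked = sorted(phoneme_lexicon,
                    key=lambda p: dist(spectrum, phoneme_lexicon[p]))
    return ranked[:k]


def decode(steady, transitions, phoneme_lexicon, diphone_lexicon, k=2):
    """Label n steady-state segments using the n-1 transitions between them.

    Each surviving path is stored as last_label -> (cost, label sequence).
    """
    paths = {
        p: (dist(steady[0], phoneme_lexicon[p]), [p])
        for p in top_candidates(steady[0], phoneme_lexicon, k)
    }
    for spectrum, transition in zip(steady[1:], transitions):
        new_paths = {}
        for right in top_candidates(spectrum, phoneme_lexicon, k):
            # Only diphones whose left phoneme survived so far are compared.
            for left, (cost, labels) in paths.items():
                reference = diphone_lexicon.get((left, right))
                if reference is None:
                    continue  # no such diphone in the lexicon
                total = (cost
                         + dist(transition, reference)
                         + dist(spectrum, phoneme_lexicon[right]))
                if right not in new_paths or total < new_paths[right][0]:
                    new_paths[right] = (total, labels + [right])
        if not new_paths:
            raise ValueError("no diphone matches the candidate labels")
        paths = new_paths
    return min(paths.values(), key=lambda path: path[0])[1]


# The second segment is equally close to "i" and "u";
# the transition decides between them.
print(decode([(1.0, 0.0), (0.5, 1.0)], [(0.0, 1.0)], PHONEMES, DIPHONES))
# ['a', 'u']
```

In this toy case the steady-state spectrum alone cannot separate the two candidates, and the diphone comparison selects the label; restricting the comparisons to surviving left-hand labels keeps the number of diphone matches per transition at most k × k.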