We have developed a novel approach to speech feature extraction based on a modulation model of a band-pass signal. Speech is processed by a bank of band-pass filters. The output of each filter is subjected to a log-derivative operation, which naturally decomposes the band-pass signal into analytic (denoted $\dot{\alpha}(t) + j\hat{\dot{\alpha}}(t)$) and antianalytic (denoted $\dot{\beta}(t) - j\hat{\dot{\beta}}(t)$) components. The average instantaneous frequency (AIF) and average log-envelope (ALE) are then extracted as coarse features at the output of each filter. We indicate how further refined features may also be extracted from the analytic and antianalytic components. We evaluated the feature extraction procedure on the Aurora 2 task, in which the noise corruption is synthetic. For clean training, compared to the mel-cepstrum front end with a 5-mixture HMM back end, our AIF/ALE front end achieves an average improvement of 13.97% on set A, 17.92% on set B, and a negative 'improvement' of -31.72% on set C. The overall improvement in accuracy rates for clean training is 7.97%. Although the improvements are modest, the strengths of the approach are the novelty of the front end and its potential for future enhancement.
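To make the two coarse features concrete, the following is a minimal sketch of how an average instantaneous frequency and an average log-envelope might be computed for one band-pass filter output. It does not reproduce the log-derivative decomposition described above. As a stated substitution, it uses the conventional FFT-based analytic signal (Hilbert transform) to obtain the envelope and phase. The function names `analytic_signal` and `aif_ale`, the `eps` floor, the 8 kHz sample rate, and the choice to average over the whole input rather than over short frames are all assumptions made for illustration.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real sequence via the FFT.

    Negative-frequency bins are zeroed and positive-frequency bins doubled,
    which is the standard discrete Hilbert-transform construction.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin unchanged
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def aif_ale(band_signal, sample_rate, eps=1e-12):
    """Estimate (AIF in Hz, ALE in natural-log units) for one band-pass output.

    Hypothetical helper: averages over the whole input; a real front end
    would presumably apply this frame by frame to each filter output.
    """
    z = analytic_signal(np.asarray(band_signal, dtype=float))
    envelope = np.abs(z)
    phase = np.unwrap(np.angle(z))
    # Instantaneous frequency as the discrete derivative of the phase.
    inst_freq = np.diff(phase) * sample_rate / (2.0 * np.pi)
    aif = float(np.mean(inst_freq))
    ale = float(np.mean(np.log(envelope + eps)))  # eps guards against log(0)
    return aif, ale


# Demo on a synthetic tone standing in for a single filter output.
fs = 8000.0                               # assumed sample rate in Hz
t = np.arange(800) / fs
tone = 0.5 * np.cos(2.0 * np.pi * 1000.0 * t)
aif, ale = aif_ale(tone, fs)
print(f"AIF = {aif:.1f} Hz, ALE = {ale:.3f}")
```

For a pure tone, the AIF should land near the tone frequency and the ALE near the logarithm of its amplitude, which gives a quick sanity check before applying the same function to real filter-bank outputs.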