This paper discusses an approach to handling unexpected acoustic elements in speech. The approach is motivated by observations of human performance on such problems, which suggest that the human speech-processing system contains multiple parallel processing streams and that humans can tell when they are receiving correct information. The paper reviews earlier engineering approaches in multistream automatic speech recognition (ASR) that targeted noisy speech and unexpected out-of-vocabulary words. It also reviews currently active research in multistream ASR, focusing mainly on feedback-based techniques that fuse information across individual processing streams. The difference between the system's behavior on its training data and its behavior during operation is proposed as a substitute for the human ability of “knowing when knowing.” The most recent results indicate a 9% relative improvement in phoneme recognition error rates on high signal-to-noise ratio speech, and relative improvements as high as 30% in moderate noise.
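The idea of comparing a system's behavior on training data with its behavior in operation can be illustrated with a minimal sketch. It is not the method used in the paper: the choice of statistic (mean per-frame entropy of the stream's class posteriors), the stream names, the function names, and all numbers below are assumptions made for illustration only. Each stream stores one statistic computed on its training data, and at test time the stream whose statistic has drifted least is preferred.

```python
import math

def mean_entropy(posteriors):
    """Average per-frame entropy (in nats) of a list of posterior vectors."""
    total = 0.0
    for frame in posteriors:
        total += -sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(posteriors)

def select_stream(train_stats, test_posteriors_by_stream):
    """Pick the stream whose test-time statistic is closest to its training value.

    train_stats: stream name -> mean entropy measured on training data.
    test_posteriors_by_stream: stream name -> list of posterior vectors
    observed during operation.
    """
    mismatch = {
        name: abs(mean_entropy(post) - train_stats[name])
        for name, post in test_posteriors_by_stream.items()
    }
    best = min(mismatch, key=mismatch.get)
    return best, mismatch

# Hypothetical values: both streams were equally confident on training data.
train_stats = {"low_band": 0.30, "high_band": 0.30}

# Hypothetical test-time posteriors over three classes, two frames per stream.
# The high-band stream is nearly uniform, as if its input were corrupted.
test_posteriors = {
    "low_band": [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05]],
    "high_band": [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]],
}

best, mismatch = select_stream(train_stats, test_posteriors)
print(best, mismatch)
```

In this toy setting the stream whose posteriors stay close to their training-time behavior is selected without any reference transcription, which is the sense in which train/test mismatch can stand in for “knowing when knowing.” A practical system might instead weight or fuse the streams rather than pick a single one.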