Skip to Main Content
In this work, we propose the use of a set of new features based on CELP (Code Excited Linear Prediction) to enhance the performance of the environmental sound recognition (ESR) problem. Traditionally, Mel Frequency Cepstral Coefficients (MFCC) have been used for the recognition of structured data like speech and music. However, their performance for the ESR problem is limited. An audio signal can be well preserved by its highly compressed CELP bit streams, which motivates us to study the CELP-based features for the audio scene recognition problem. We present a way to extract a set of features from the CELP bit streams and compare the performance of ESR using different feature sets with the Bayesian network classifier. It is shown by experimental results that the CELP-based features outperform the MFCC features in the ESR problem by a significant 9% margin in average and the integrated MFCC and CELP-based feature set can even reach a correct classification rate of 95.2% using the Bayesian network classifier.