Audio Signal Feature Extraction and Classification Using Local Discriminant Bases

3 Author(s)
Karthikeyan Umapathy ; Dept. of Electr. & Comput. Eng., Univ. of Western Ontario, London, Ont. ; Sridhar Krishnan ; Raveendra K. Rao

Audio feature extraction plays an important role in analyzing and characterizing audio content. Auditory scene analysis, content-based retrieval, indexing, and fingerprinting of audio are a few of the applications that require efficient feature extraction. The key to extracting strong features that characterize the complex nature of audio signals is to identify their discriminatory subspaces. In this paper, we propose an audio feature extraction and multigroup classification scheme that focuses on identifying discriminatory time-frequency subspaces using the local discriminant bases (LDB) technique. Two dissimilarity measures were used in the process of selecting the LDB nodes and extracting features from them. The extracted features were then fed to a linear discriminant analysis-based classifier for a three-level hierarchical classification of audio signals into ten classes. In the first level, the audio signals were grouped into artificial and natural sounds. Each of the first-level groups was subdivided to form the second-level groups, viz. instrumental, automobile, human, and nonhuman sounds. The third level was formed by subdividing the four groups of the second level into the final ten groups (drums, flute, piano, aircraft, helicopter, male, female, animals, birds, and insects). A database of 213 audio signals was used in this study, and average classification accuracies of 83% for the first level (113 artificial and 100 natural sounds), 92% for the second level (73 instrumental and 40 automobile sounds; 40 human and 60 nonhuman sounds), and 89% for the third level (27 drums, 15 flute, and 31 piano sounds; 23 aircraft and 17 helicopter sounds; 20 male and 20 female speech; 20 animals, 20 birds, and 20 insects sounds) were achieved. In addition, a separate classification was performed combining the LDB features with the mel-frequency cepstral coefficients. The average classification accuracies achieved using the combined features were 91% for the first level, 99% for the second level, and 95% for the third level.
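The three-level grouping described in the abstract can be written down as a small tree, which makes it easy to check that the reported per-class counts add up to the group sizes at each level. The following is only an illustrative sketch: the group names and counts come from the abstract, while the dictionary layout and the helper names (`HIERARCHY`, `count`, `level_counts`, `path_to`) are assumptions made for this example and are not part of the paper.

```python
# Class hierarchy and per-class signal counts as reported in the abstract.
# The data layout and function names are hypothetical, chosen for illustration.
HIERARCHY = {
    "artificial": {
        "instrumental": {"drums": 27, "flute": 15, "piano": 31},
        "automobile": {"aircraft": 23, "helicopter": 17},
    },
    "natural": {
        "human": {"male": 20, "female": 20},
        "nonhuman": {"animals": 20, "birds": 20, "insects": 20},
    },
}


def count(node):
    """Total number of signals under a node (leaves are integer counts)."""
    if isinstance(node, int):
        return node
    return sum(count(child) for child in node.values())


def level_counts(tree, depth):
    """Map each group name at the given depth (1 = first level) to its signal count."""
    if depth == 1:
        return {name: count(child) for name, child in tree.items()}
    merged = {}
    for child in tree.values():
        merged.update(level_counts(child, depth - 1))
    return merged


def path_to(tree, label, prefix=()):
    """Return the sequence of group decisions leading to a class, or None."""
    for name, child in tree.items():
        here = prefix + (name,)
        if name == label:
            return here
        if isinstance(child, dict):
            found = path_to(child, label, here)
            if found is not None:
                return found
    return None


if __name__ == "__main__":
    for depth in (1, 2, 3):
        print(depth, level_counts(HIERARCHY, depth))
    print(path_to(HIERARCHY, "piano"))
```

In a hierarchical scheme of this shape, a signal labeled "piano" would pass through one decision per level (artificial, then instrumental, then piano), which is the path `path_to` returns.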

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 15, Issue: 4)