Speech vs Nonspeech Segmentation of Audio Signals Using Support Vector Machines


Authors:
Danisman, T.; Dept. of Computer Engineering, Dokuz Eylul Univ., Izmir, Turkey; Alpkocak, A.

In this study, we present a speech vs. non-speech segmentation of audio signals extracted from video. For training, we used 4330 seconds of audio extracted from the "Lost" TV series. The training set is built automatically using the timestamp information in the subtitles. In a further processing step, silence regions within those speech segments are discarded. Next, the standard deviation of MFCC feature vectors of size 20 is computed. Finally, Support Vector Machines (SVM) are used with the one-vs-all method for classification. For evaluation, we used 7545 seconds of audio from the "Lost" and "How I Met Your Mother" TV series. We achieved an overall accuracy of 87.77% for speech vs. non-speech segmentation and a recall of 90.33% for the non-speech classes.
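The classification stage described in the abstract (per-segment standard deviation of 20-dimensional MFCC vectors fed to a one-vs-all SVM) could be sketched as follows with scikit-learn. This is a minimal illustration, not the authors' implementation: MFCC extraction is assumed to have happened upstream, and the synthetic feature vectors, class labels, and SVM kernel are assumptions for the sake of a runnable example.

```python
# Sketch of the SVM classification step, assuming each audio segment has
# already been reduced to a 20-dimensional feature vector: the per-coefficient
# standard deviation of its MFCC frames. The data here is synthetic.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in features: one row per segment, 20 std-of-MFCC values per row.
# (Illustrative separable clusters, not real MFCC statistics.)
speech = rng.normal(loc=1.0, scale=0.2, size=(100, 20))
nonspeech = rng.normal(loc=0.3, scale=0.2, size=(100, 20))
X = np.vstack([speech, nonspeech])
y = np.array([1] * 100 + [0] * 100)  # 1 = speech, 0 = non-speech

# One-vs-all SVM, as named in the abstract; RBF kernel is an assumption.
clf = OneVsRestClassifier(SVC(kernel="rbf"))
clf.fit(X, y)

# Classify a new segment's std-of-MFCC vector.
segment_features = rng.normal(loc=1.0, scale=0.2, size=(1, 20))
print(clf.predict(segment_features))
```

With two classes the one-vs-all scheme reduces to a single binary SVM, but the same wrapper generalizes unchanged if further audio classes (e.g. music, noise) are added.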

Published in:

2007 IEEE 15th Signal Processing and Communications Applications Conference (SIU 2007)

Date of Conference:

11-13 June 2007