Abstract:
Classification of normal vs. pathological infant cry is an interesting and technologically challenging research problem. The difficulty arises from the quasi-periodic sampling of the vocal tract spectrum by high pitch-source harmonics, which results in extremely poor spectral resolution for commonly used spectral features, such as Mel Frequency Cepstral Coefficients (MFCC). To that effect, in this paper, we propose a new approach to feature extraction based on the Constant Q Transform (CQT), which is known to have variable spectro-temporal resolution with respect to Heisenberg's uncertainty principle in the signal processing framework. Further, CQT is also known to preserve the form-invariance property better than its Short-Time Fourier Transform (STFT) counterpart; this is a desirable attribute, since feature descriptors should be invariant with respect to shape, shift, rotation, and scaling. CQT-based features are then transformed to the cepstral domain to derive Constant Q Cepstral Coefficients (CQCC), which are then fed to statistical and discriminative classifiers, namely, Gaussian Mixture Model (GMM) and Support Vector Machine (SVM), respectively. The CQCC-GMM and CQCC-SVM systems gave relatively better results than MFCC for various experimental evaluation factors on the infant cry classification task, using the widely used and statistically meaningful Baby Chilanto Database. The best performance, 99.82% accuracy (0.44% EER), is observed for the CQCC-GMM system.
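The variable spectro-temporal resolution mentioned above can be illustrated with a minimal sketch of a generic constant-Q bin layout. This is not the authors' implementation: the function name `constant_q_bins` and the parameter values (`f_min`, `bins_per_octave`, `n_bins`, `sample_rate`) are hypothetical and chosen only for illustration. The sketch assumes the textbook CQT definition, in which center frequencies are geometrically spaced and the ratio of center frequency to bandwidth (Q) is the same for every bin.

```python
import math


def constant_q_bins(f_min, bins_per_octave, n_bins, sample_rate):
    """Return ([(center_hz, bandwidth_hz, window_samples), ...], q).

    Generic constant-Q layout (an assumption, not taken from the paper):
    center frequencies grow geometrically, and Q = center / bandwidth
    is identical for all bins.
    """
    # Q follows from the spacing between adjacent geometric bins.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bins = []
    for k in range(n_bins):
        center = f_min * 2.0 ** (k / bins_per_octave)
        bandwidth = center / q  # wider bands at higher frequencies
        # Analysis window shrinks as frequency rises, so low bins get
        # fine frequency resolution and high bins fine time resolution.
        window = math.ceil(q * sample_rate / center)
        bins.append((center, bandwidth, window))
    return bins, q


# Hypothetical settings: 2 octaves above 100 Hz, 12 bins per octave, 16 kHz audio.
bins, q = constant_q_bins(100.0, 12, 25, 16000)
for center, bandwidth, window in bins[::12]:
    print(f"{center:8.1f} Hz  bandwidth {bandwidth:6.2f} Hz  window {window} samples")
```

In contrast, an STFT uses one window length for all frequencies, so its bandwidth is fixed rather than proportional to the center frequency.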
Published in: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 23-27 May 2022
Date Added to IEEE Xplore: 27 April 2022