Abstract:
We propose a novel modified Mel-discrete cosine transform (MMD) filter bank structure, which restricts the overlap of each filter response to its immediate neighbor. In contrast to the well-known triangular filters employed in the extraction of the Mel-frequency cepstral coefficients (MFCC), the proposed filter structure has a smoother response and performs the discrete cosine transform and Mel-scale filtering in a single operation. It is known that using MFCC as the only feature for voice activity detection (VAD) does not yield substantial improvements in performance. Even with the long-term approach, we observe unencouraging VAD performance when MFCC features are employed. However, other long-term VAD algorithms that do not use MFCC are known to provide a substantial improvement in performance at low SNR when the statistics of speech and/or noise are time-varying. In this work, we show that employing the MMD followed by the long-term differential entropy of the voice signal for VAD provides significant improvements in detection accuracy compared with other well-known long-term algorithms. Thus, this study points to possible benefits of the proposed MMD filter bank in other speech processing applications.
Published in: IEEE Signal Processing Letters (Volume: 27)