In this paper, a robust voice activity detection (VAD) algorithm based on the perceptual wavelet packet transform (PWPT) is proposed. The first step of the new algorithm uses the PWPT to decompose the input speech into 17 critical subband signals. To enhance the energy of voice frames and attenuate the energy of non-voice frames, the voice activity shape (VAS) is derived by applying the Teager energy operator (TEO) to these critical subband signals. The adaptive weighted threshold (AWT) is then calculated from the second derivative recursive mean (SDRM) of the VAS and an estimate of the environmental noise. It is shown in this paper that the AWT is a robust threshold for VAD under various noisy environments. One advantage of the new algorithm is that no preset threshold values are required. In addition, the proposed algorithm adapts the VAD threshold to varying speech conditions. Experimental results show that the new VAD algorithm outperforms the G.729B and adaptive multi-rate (AMR) VAD algorithms.
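To illustrate the TEO step, the following is a minimal sketch of the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], applied to a single signal. It is not the paper's VAS computation, which the abstract does not specify in detail. The function names `teager_energy` and `frame_teo_energy`, the non-overlapping framing, and the use of the mean absolute value per frame are assumptions made for illustration only.

```python
import math


def teager_energy(x):
    """Standard discrete Teager energy operator.

    psi[n] = x[n]^2 - x[n-1] * x[n+1], evaluated at interior samples only,
    so the result is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_teo_energy(x, frame_len):
    """Mean absolute TEO value per non-overlapping frame.

    Assumption for illustration: the paper's own framing and per-subband
    combination are not given in the abstract.
    """
    psi = teager_energy(x)
    return [
        sum(abs(v) for v in psi[i:i + frame_len]) / frame_len
        for i in range(0, len(psi) - frame_len + 1, frame_len)
    ]


# For a pure sinusoid A*cos(w*n), the TEO is constant and equals A^2 * sin(w)^2,
# so it responds to both amplitude and frequency.
signal = [2.0 * math.cos(0.3 * n) for n in range(50)]
frames = frame_teo_energy(signal, 16)
```

In a PWPT-based scheme such an operator would be applied to each subband signal separately; the sinusoid here only serves to show that the operator tracks both the amplitude and the frequency of its input.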