Abstract:
Recently, various speaker-dependent Voice Activity Detection (VAD) methods have been proposed that detect a target speaker's speech in noisy environments. Speaker-dependent VAD is similar to knowledge distillation in that it learns the distribution of each speaker from a speaker embedding model trained on many speakers. That is, the key idea is to learn the speaker embedding vector distribution well enough to enhance personalization. In this paper, we propose new strategies to enhance the personalization of speaker-dependent VAD. To better capture the personal characteristics of speakers, we consider several factors based on model size, language, and gender. Our experiments show that the proposed strategies achieve a significant performance improvement, with Average Precision (AP) of 0.959 and 0.935, compared to 0.735 and 0.530 for the baseline model on the respective language evaluation sets.
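The abstract reports results as Average Precision (AP) but does not describe the evaluation protocol. As a minimal sketch, assuming frame-level binary labels (1 = target speaker is speaking) and one model score per frame, AP can be computed as the mean of the precision values at each correctly ranked positive frame. The function name `average_precision` and the toy data are illustrative assumptions, not taken from the paper, and tied scores are not given special treatment.

```python
def average_precision(labels, scores):
    """Non-interpolated AP: mean precision at the rank of each positive frame.

    labels: list of 0/1 ground-truth values (1 = target speaker speech, assumed)
    scores: list of model confidences, higher means more likely positive
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("AP is undefined when there are no positive frames")

    # Rank frames from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_pos


# Toy example (made-up values, not data from the paper).
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
print(f"AP = {toy_ap:.3f}")
```

A score ordering that places every positive frame ahead of every negative one yields an AP of 1.0, which is why values such as 0.959 indicate a ranking close to ideal.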
Published in: 2021 International Conference on Information and Communication Technology Convergence (ICTC)
Date of Conference: 20-22 October 2021
Date Added to IEEE Xplore: 07 December 2021
ISBN Information:
Print on Demand (PoD) ISSN: 2162-1233