Abstract:
Audio deepfakes pose a risk to society because they can erode trust in any audio recording. In this paper, we present a novel approach to audio deepfake detection that combines Generative Adversarial Networks (GANs) and contrastive learning in a multi-stage detection framework. We first apply pre-trained models (PTMs) to extract phonetic, speaker-identity, and other prosodic features that are crucial for detection. We then enhance the model's performance with a GAN-based data augmentation strategy built on HiFi-GAN. Finally, contrastive learning is used to improve the model's ability to discriminate real speech from fake speech. Our experiments demonstrate that this method outperforms existing approaches in both detection accuracy and robustness.
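The abstract does not include implementation details, so the following is only a minimal sketch, assuming a PyTorch setup, of the contrastive stage described above: embeddings of utterances (e.g., produced by a pre-trained feature extractor) are pulled together when they share a real/fake label and pushed apart otherwise. The projection head, feature dimension, temperature, and loss formulation are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code): a supervised contrastive objective
# over PTM-style embeddings with binary labels (0 = real, 1 = fake).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Maps fixed-size PTM features to the space where the contrastive loss is applied."""
    def __init__(self, in_dim=768, proj_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, in_dim), nn.ReLU(), nn.Linear(in_dim, proj_dim)
        )

    def forward(self, x):
        # L2-normalize so that dot products are cosine similarities
        return F.normalize(self.net(x), dim=-1)

def supervised_contrastive_loss(z, labels, temperature=0.07):
    """Pulls same-class embeddings together and pushes real/fake pairs apart.
    z: (batch, dim) L2-normalized embeddings; labels: (batch,) with 0/1."""
    sim = z @ z.T / temperature                                  # pairwise similarities
    mask_self = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, float("-inf"))              # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)   # log-softmax per anchor
    positives = (labels[:, None] == labels[None, :]) & ~mask_self
    pos_counts = positives.sum(1).clamp(min=1)
    # average log-probability of same-class pairs for each anchor
    loss = -(log_prob.masked_fill(~positives, 0.0)).sum(1) / pos_counts
    return loss.mean()

# Usage with placeholder features standing in for PTM embeddings:
head = ProjectionHead()
features = torch.randn(8, 768)                    # batch of 8 utterances
labels = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])   # 0 = real, 1 = fake
loss = supervised_contrastive_loss(head(features), labels)
loss.backward()

In this sketch the contrastive loss would be combined with the detection head and the HiFi-GAN-augmented training data described in the abstract; those components are omitted here because the paper's concrete architecture is not specified in this summary.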
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025