Improving Short Utterance Anti-Spoofing with AASIST2


Abstract:

Countermeasures that combine wav2vec 2.0 with the integrated spectro-temporal graph attention network (AASIST) achieve strong performance in speech anti-spoofing. However, current spoof speech detection systems are trained and evaluated on fixed-duration input, and their performance degrades significantly on short utterances. To address this problem, we improve AASIST to AASIST2 by modifying its residual blocks into Res2Net blocks. The modified Res2Net blocks extract multi-scale features and improve detection for speech of different durations, which in turn improves short utterance evaluation performance. In addition, adaptive large margin fine-tuning (ALMFT) has improved performance in short utterance speaker verification. We therefore apply Dynamic Chunk Size (DCS) and ALMFT training strategies to speech anti-spoofing to further improve short utterance evaluation performance. Experiments demonstrate that the proposed AASIST2 improves short utterance evaluation performance while maintaining regular evaluation performance on different datasets.
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024
Conference Location: Seoul, Korea, Republic of


1. INTRODUCTION

With the recent surge of Artificial Intelligence Generated Content (AIGC), spoofing algorithms have also gained momentum. Spoof speech has become both easier to generate and significantly higher in quality. Consequently, the risk of its malicious use has increased. Spoof speech is now used not only to attack automatic speaker verification (ASV) systems but also for telecommunications fraud and cognitive warfare. Much work has been proposed to counter these dangers, facilitated in particular by the flagship ASVspoof Challenge series [1], [2], [3], [4]. The latest challenge, ASVspoof 2021, considered the impact of cross-channel conditions and compression codecs on spoof speech detection systems. The spoof speech countermeasure (CM) based on pre-trained wav2vec 2.0 [5] and an integrated spectro-temporal graph attention network (AASIST) [6] achieves good results on multiple datasets [7].
