
Can Large-Scale Vocoded Spoofed Data Improve Speech Spoofing Countermeasure with a Self-Supervised Front End?


Abstract:

A speech spoofing countermeasure (CM) that discriminates between unseen spoofed and bona fide data requires diverse training data. While many datasets use spoofed data generated by speech synthesis systems, it was recently found that data vocoded by neural vocoders is also effective as spoofed training data. Since many neural vocoders are fast to build and fast at waveform generation, this study used multiple neural vocoders to create more than 9,000 hours of vocoded data based on the VoxCeleb2 corpus. It investigates how this large-scale vocoded data can improve spoofing countermeasures that use data-hungry self-supervised learning (SSL) models. Experiments demonstrated that the overall CM performance on multiple test sets improved when using features extracted by an SSL model continually trained on the vocoded data. Further improvement was observed when using a new SSL model distilled from the two SSLs before and after the continual training. The CM with the distilled SSL outperformed the previous best model on challenging unseen test sets, including ASVspoof 2019 logical access, WaveFake, and In-the-Wild.
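The distillation step mentioned above can be pictured with a minimal sketch: a student SSL encoder is trained to match the hidden features of two frozen teachers, the original SSL model and the one continually trained on vocoded data. The encoder class, feature dimensions, loss, and equal teacher weighting below are assumptions for illustration, not the authors' exact recipe.

```python
# Minimal sketch (illustrative only): distill a student SSL encoder from two
# frozen teachers -- the SSL model before and after continual training on
# vocoded data -- by matching their frame-level hidden features.
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for a wav2vec-style SSL encoder."""
    def __init__(self, feat_dim: int = 768):
        super().__init__()
        self.conv = nn.Conv1d(1, feat_dim, kernel_size=400, stride=320)
        self.proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) -> features: (batch, frames, feat_dim)
        h = self.conv(wav.unsqueeze(1)).transpose(1, 2)
        return self.proj(h)


teacher_orig = TinyEncoder().eval()      # SSL before continual training
teacher_vocoded = TinyEncoder().eval()   # SSL after continual training on vocoded data
student = TinyEncoder()                  # new SSL to be distilled

for teacher in (teacher_orig, teacher_vocoded):
    for p in teacher.parameters():
        p.requires_grad_(False)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
l1 = nn.L1Loss()

wav = torch.randn(2, 16000)              # dummy 1-second batch at 16 kHz

# One distillation step: the student matches both teachers' features.
s = student(wav)
with torch.no_grad():
    t1 = teacher_orig(wav)
    t2 = teacher_vocoded(wav)
loss = 0.5 * l1(s, t1) + 0.5 * l1(s, t2)  # equal weighting is an assumption
optimizer.zero_grad()
loss.backward()
optimizer.step()
```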
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024
Conference Location: Seoul, Korea, Republic of

1. INTRODUCTION

The detection of spoofed speech generated by text-to-speech (TTS) and voice conversion (VC) systems is usually formulated as a binary classification task [1]. A detector, referred to as a spoofing countermeasure (CM), requires a significant amount of training data containing diverse human (bona fide) and synthesized (spoofed) speech waveforms. However, preparing diverse spoofed training data is costly. For example, it took a few months of trial and error to develop the TTS and VC systems that generated the training set of the ASVspoof 2019 logical access (LA) database [2].
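As a concrete illustration of this binary formulation, the sketch below pools frame-level front-end features (e.g., from an SSL model) into an utterance-level embedding and scores it with a two-class output. The pooling, layer sizes, and loss are assumptions chosen for brevity, not the CM back end used in this paper.

```python
# Minimal sketch of the spoofed-vs-bona-fide binary classification formulation
# (illustrative only; not the CM architecture used in the paper).
import torch
import torch.nn as nn


class SimpleCM(nn.Module):
    def __init__(self, feat_dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.backend = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, feat_dim) from a front end such as an SSL model
        utt_emb = feats.mean(dim=1)   # simple average pooling over frames
        return self.backend(utt_emb)  # (batch, 2) logits: [bona fide, spoofed]


cm = SimpleCM()
criterion = nn.CrossEntropyLoss()

feats = torch.randn(4, 100, 768)       # dummy front-end features
labels = torch.tensor([0, 1, 0, 1])    # 0 = bona fide, 1 = spoofed
loss = criterion(cm(feats), labels)
loss.backward()
```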

