Utterance-level Aggregation for Speaker Recognition in the Wild

Abstract:

The objective of this paper is speaker recognition 'in the wild', where utterances may be of variable length and may also contain irrelevant signals. Crucial elements in the design of deep networks for this task are the type of trunk (frame-level) network and the method of temporal aggregation. We propose a powerful speaker recognition deep network that can be trained end-to-end, using a 'thin-ResNet' trunk architecture and a dictionary-based NetVLAD or GhostVLAD layer to aggregate features across time. We show that our network achieves state-of-the-art performance by a significant margin on the VoxCeleb1 test set for speaker recognition, whilst requiring fewer parameters than previous methods. We also investigate the effect of utterance length on performance, and conclude that for 'in the wild' data, a longer length is beneficial.
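
For readers unfamiliar with dictionary-based aggregation, the following is a minimal NumPy sketch of the general NetVLAD/GhostVLAD pooling idea as commonly formulated: each frame is softly assigned to a set of cluster centres, the residuals are summed per cluster, and any "ghost" clusters are dropped before normalisation. It is not taken from the paper. The function name `ghostvlad_pool`, the parameter shapes, and the random inputs are all assumptions made for illustration, and in a real network the centres and assignment weights would be learned jointly with the trunk.

```python
import numpy as np

def ghostvlad_pool(frames, centers, weights, biases, num_ghost=0):
    """Aggregate T frame-level features of shape (T, D) into one vector.

    centers: (K + G, D) cluster centres; the last G rows are ghost clusters.
    weights: (D, K + G) and biases: (K + G,) define the soft assignment.
    All names and shapes here are illustrative assumptions.
    """
    # Soft-assign every frame to every cluster (ghosts included).
    logits = frames @ weights + biases                # (T, K+G)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)       # softmax over clusters

    # Sum assignment-weighted residuals to each centre over time.
    residuals = frames[:, None, :] - centers[None, :, :]   # (T, K+G, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)    # (K+G, D)

    # Ghost clusters absorb frames during assignment but are discarded here.
    if num_ghost > 0:
        vlad = vlad[:-num_ghost]                      # (K, D)

    # L2-normalise each cluster's residual, then the flattened vector.
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)
    flat = vlad.reshape(-1)
    return flat / (np.linalg.norm(flat) + 1e-12)

# Hypothetical sizes: D-dim frames, K real clusters, G ghost clusters.
D, K, G = 8, 4, 2
rng = np.random.default_rng(0)
centers = rng.normal(size=(K + G, D))
weights = rng.normal(size=(D, K + G))
biases = rng.normal(size=(K + G,))

# Two "utterances" of different lengths map to vectors of the same size.
short_vec = ghostvlad_pool(rng.normal(size=(50, D)), centers, weights, biases, G)
long_vec = ghostvlad_pool(rng.normal(size=(300, D)), centers, weights, biases, G)
```

The point of the sketch is that the output length is K × D regardless of the number of frames T, which is what lets a layer of this kind handle variable-length utterances. Setting `num_ghost=0` reduces it to plain NetVLAD-style pooling.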

Published in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Date of Conference: 12-17 May 2019

Date Added to IEEE Xplore: 17 April 2019

DOI: 10.1109/ICASSP.2019.8683120

Conference Location: Brighton, UK
