On-the-Fly Data Loader and Utterance-Level Aggregation for Speaker and Language Recognition | IEEE Journals & Magazine | IEEE Xplore

Abstract:

In this article, our recent efforts on directly modeling utterance-level aggregation for speaker and language recognition are summarized. First, an on-the-fly data loader for efficient network training is proposed. The data loader acts as a bridge between the full-length utterances and the network. It generates mini-batch samples on the fly, which allows batch-wise variable-length training and online data augmentation. Second, the traditional dictionary learning and Baum-Welch statistical accumulation mechanisms are applied to the network structure, and a learnable dictionary encoding (LDE) layer is introduced. The LDE layer accumulates discriminative statistics from the variable-length input sequence and outputs a single fixed-dimensional utterance-level representation. Experiments were conducted on four different datasets, namely NIST LRE 2007, AP17-OLR, SITW, and NIST SRE 2016. Experimental results show the effectiveness of the proposed batch-wise variable-length training with online data augmentation and the LDE layer, which significantly outperform the baseline methods.
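
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; it only assumes what the abstract states (one crop length per mini-batch, an optional augmentation step applied online, and a pooling layer that turns a variable-length sequence into a fixed-dimensional vector). The function names `variable_length_batches` and `lde_pool`, the `augment` hook, the frame-repetition rule for short utterances, and the softmax-weighted residual formula are all assumptions made for illustration.

```python
import math
import random


def variable_length_batches(utterances, batch_size, min_len, max_len,
                            augment=None, seed=0):
    """Yield (length, batch) pairs; the crop length is drawn once per batch.

    utterances: list of (label, frames); frames is a non-empty list of vectors.
    Crops inside one batch share a length, so they stack into a rectangular
    tensor, while the length changes from batch to batch.
    """
    rng = random.Random(seed)
    while True:
        length = rng.randint(min_len, max_len)  # one length for the whole batch
        batch = []
        for _ in range(batch_size):
            label, frames = rng.choice(utterances)
            if len(frames) <= length:
                # Assumption: short utterances are repeated to reach the length.
                reps = -(-length // len(frames))  # ceiling division
                crop = (frames * reps)[:length]
            else:
                start = rng.randint(0, len(frames) - length)
                crop = frames[start:start + length]
            if augment is not None:
                crop = augment(crop, rng)  # hypothetical online augmentation hook
            batch.append((label, crop))
        yield length, batch


def lde_pool(frames, centers, scales):
    """Pool T x D frames into a fixed C*D vector (assumed LDE-style formula).

    Each frame is softly assigned to C dictionary components with weights
    softmax_c(-s_c * ||x - mu_c||^2); weighted residuals are averaged per
    component and concatenated. In a network, centers and scales are learned.
    """
    n_comp, dim = len(centers), len(centers[0])
    num = [[0.0] * dim for _ in range(n_comp)]
    den = [0.0] * n_comp
    for x in frames:
        logits = [-scales[c] * sum((x[d] - centers[c][d]) ** 2
                                   for d in range(dim))
                  for c in range(n_comp)]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        for c in range(n_comp):
            w = exps[c] / total
            den[c] += w
            for d in range(dim):
                num[c][d] += w * (x[d] - centers[c][d])
    return [num[c][d] / max(den[c], 1e-12)
            for c in range(n_comp) for d in range(dim)]
```

Whatever crop length the loader picks, `lde_pool` returns a vector of size `len(centers) * len(centers[0])`, which is the property that makes batch-wise variable-length training compatible with a fixed-size classifier input.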
Page(s): 1038 - 1051
Date of Publication: 16 March 2020
