Loading [a11y]/accessibility-menu.js
Aipnet: Generative Adversarial Pre-Training of Accent-Invariant Networks for End-To-End Speech Recognition | IEEE Conference Publication | IEEE Xplore

Aipnet: Generative Adversarial Pre-Training of Accent-Invariant Networks for End-To-End Speech Recognition


Abstract:

As one of the major sources in speech variability, accents have posed a grand challenge to the robustness of speech recognition systems. In this paper, our goal is to bui...Show More

Abstract:

As one of the major sources in speech variability, accents have posed a grand challenge to the robustness of speech recognition systems. In this paper, our goal is to build a unified end-to-end speech recognition system that generalizes well across accents. For this purpose, we propose a novel pre-training framework AIPNetbased on generative adversarial nets (GAN) for accent-invariant representation learning: Accent Invariant Pre-training Networks. We pre-train AIPNetto disentangle accent-invariant and accent-specific characteristics from acoustic features through adversarial training on accented data for which transcriptions are not necessarily available. We further fine-tune AIPNetby connecting the accent-invariant module with an attention-based encoder-decoder model for multi-accent speech recognition. In the experiments, our approach is compared against four baselines including both accent-dependent and accent-independent models. Experimental results on 9 English accents show that the proposed approach outperforms all the baselines by 2.3 ~ 4.5% relative reduction on average WER when transcriptions are available in all accents and by 1.6 ~ 6.1% relative reduction when transcriptions are only available in US accent.
Date of Conference: 04-08 May 2020
Date Added to IEEE Xplore: 09 April 2020
ISBN Information:

ISSN Information:

Conference Location: Barcelona, Spain

Contact IEEE to Subscribe

References

References is not available for this document.