Domain Adaptation by Data Distribution Matching Via Submodularity For Speech Recognition

Abstract:

We study the problem of building a domain-specific speech recognition model given some text from the target domain. One of the most popular approaches to this problem is shallow fusion, which incorporates a domain-specific language model built from the given text. However, shallow fusion significantly increases the model size and inference cost, which makes deployment harder. In this paper, we propose domain adaptation by data distribution matching, where a subset is selected from existing multi-domain training data to match the target-domain distribution, and a model is fine-tuned on that subset. A submodular optimization algorithm with a novel extension is employed for the subset selection. Experiments on LibriSpeech, a corpus of audiobooks in which we treat each book as a domain, show that the proposed distribution-matching approach achieves WERs equivalent to those of the conventional shallow-fusion approach, without any increase in model size or inference cost.
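The abstract does not spell out the selection objective or its "novel extension", so the following is only a minimal sketch of the general idea under stated assumptions: submodular, distribution-matching subset selection. It assumes hypothetical word-unigram features, weights each word by its relative frequency in the target-domain text, and scores a subset S with a feature-based objective f(S) = Σ_u p_target(u) · sqrt(count_S(u)). This objective is monotone submodular (a concave function of modular counts with non-negative weights), so plain greedy selection under a cardinality budget carries the classic (1 − 1/e) approximation guarantee. The function names `unigram_distribution` and `greedy_select` are illustrative and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def unigram_distribution(texts: Sequence[str]) -> Dict[str, float]:
    """Relative word frequencies of the target-domain text (assumed feature choice)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def greedy_select(pool: Sequence[str], target: Dict[str, float], budget: int) -> List[int]:
    """Greedily pick up to `budget` pool indices maximizing the hypothetical objective
    f(S) = sum_u target[u] * sqrt(count_S(u)), which is monotone submodular."""
    feats = [Counter(t.lower().split()) for t in pool]
    covered: Counter = Counter()  # word counts of the currently selected subset S
    selected: List[int] = []
    remaining = set(range(len(pool)))
    for _ in range(min(budget, len(pool))):
        best, best_gain = None, 0.0
        for i in sorted(remaining):  # sorted: ties go to the smallest index
            # Marginal gain f(S + {i}) - f(S); only words present in item i change.
            gain = sum(
                target.get(u, 0.0) * (math.sqrt(covered[u] + c) - math.sqrt(covered[u]))
                for u, c in feats[i].items()
            )
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining item adds any target-domain mass
            break
        selected.append(best)
        covered.update(feats[best])
        remaining.remove(best)
    return selected


target = unigram_distribution(["the stock market rallied", "market prices rose"])
pool = [
    "the team won the match",
    "stock prices fell in the market",
    "travel to paris",
    "market rallied today",
]
print(greedy_select(pool, target, 2))  # → [1, 3]
```

The square root provides diminishing returns: once "market" is well represented in the subset, further "market" sentences gain less, which pushes the selection toward covering the whole target distribution rather than its most frequent word. In this sketch, the selected indices would then define the fine-tuning subset.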

Published in: 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Date of Conference: 16-20 December 2023

Date Added to IEEE Xplore: 19 January 2024

DOI: 10.1109/ASRU57964.2023.10389721

Conference Location: Taipei, Taiwan

1. Introduction

Domain adaptation [1, 2, 3] is the process of building a domain-specific model given a base model and some target-domain data. It is one of the most important capabilities in applied artificial intelligence, including speech and language processing, since different applications have different domains of interest (e.g., finance, sports, travel), and it is often desirable to deploy a customized, domain-specific model for each such application. In this work, we study domain adaptation of speech recognition models when some text from the target domain is available. We chose this scenario because “paired” data (i.e., speech and its transcription) is unavailable in many practical use cases, owing to the cost of manually transcribing speech, whereas text-only data is relatively easy to prepare [4, Section 6].
