Domain Adaptation by Data Distribution Matching Via Submodularity For Speech Recognition | IEEE Conference Publication | IEEE Xplore

Abstract:

We study the problem of building a domain-specific speech recognition model given some text from the target domain. One of the most popular approaches to this problem is shallow fusion, which incorporates a domain-specific language model built from the given text. However, shallow fusion significantly increases the model size and inference cost, which makes deployment harder. In this paper, we propose domain adaptation by data distribution matching, in which a subset of existing multi-domain training data is selected to match the target-domain distribution, and a model is fine-tuned on that subset. A submodular optimization algorithm with a novel extension is employed for the subset selection. Experiments on LibriSpeech, a corpus of audiobooks in which we treat each book as a domain, show that the proposed distribution-matching approach achieves WERs equivalent to those of the conventional shallow-fusion approach, without any increase in model size or inference cost.
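The abstract describes selecting a subset of existing training data whose distribution matches the target domain by means of submodular optimization. The following is a minimal sketch of that general idea, not the paper's algorithm and not its novel extension. It assumes unigram word features, a standard feature-based submodular objective f(S) = Σ_u p_T(u) log(1 + m_u(S)), where p_T(u) is the target-text word distribution and m_u(S) is the count of word u in the selected transcripts, and the classic greedy algorithm under a budget on the number of utterances. The function names `target_distribution`, `gain`, and `greedy_select` are hypothetical. The pool transcripts stand in for the paired multi-domain training data; fine-tuning on the selected audio is not shown.

```python
import math
from collections import Counter


def target_distribution(target_text: list[str]) -> dict[str, float]:
    """Hypothetical feature choice: unigram distribution p_T(u) of the target-domain text."""
    counts = Counter(word for line in target_text for word in line.lower().split())
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def gain(coverage: Counter, feats: Counter, weights: dict[str, float]) -> float:
    """Marginal gain f(S + {i}) - f(S) for f(S) = sum_u p_T(u) * log(1 + m_u(S)),
    where m_u(S) is the count of word u in the selected transcripts."""
    total = 0.0
    for word, c in feats.items():
        w = weights.get(word, 0.0)
        if w > 0.0:  # words absent from the target text contribute nothing
            total += w * (math.log1p(coverage[word] + c) - math.log1p(coverage[word]))
    return total


def greedy_select(pool: list[str], target_text: list[str], budget: int) -> list[int]:
    """Greedily pick up to `budget` pool transcripts that maximize f(S)."""
    weights = target_distribution(target_text)
    feats = [Counter(t.lower().split()) for t in pool]
    coverage: Counter = Counter()  # m_u(S) for the current selection S
    selected: list[int] = []
    remaining = set(range(len(pool)))
    while len(selected) < budget and remaining:
        # Sorting makes ties resolve to the lowest index, so results are deterministic.
        best = max(sorted(remaining), key=lambda i: gain(coverage, feats[i], weights))
        if gain(coverage, feats[best], weights) <= 0.0:
            break  # no remaining transcript shares a word with the target text
        selected.append(best)
        coverage.update(feats[best])
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    pool = [
        "the ship sailed at dawn",
        "stock prices fell sharply",
        "the captain steered the ship",
        "interest rates rose",
    ]
    target = ["the ship and the captain", "the sea was calm"]
    print(greedy_select(pool, target, budget=2))  # prints [2, 0]
```

In the toy usage above, the two nautical transcripts are chosen and the finance ones are never selected, because they share no words with the target text. The concave log gives diminishing returns for covering the same word repeatedly, which is what makes the objective monotone submodular; for such objectives under a cardinality constraint, the plain greedy algorithm is guaranteed to reach at least a (1 - 1/e) fraction of the optimum.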
Date of Conference: 16-20 December 2023
Date Added to IEEE Xplore: 19 January 2024
Conference Location: Taipei, Taiwan

1. Introduction

Domain adaptation [1, 2, 3] is the process of building a domain-specific model given a base model and some target-domain data. It is one of the most important capabilities in applications of artificial intelligence such as speech and language processing, since different applications have different domains of interest (e.g. finance, sports, or travel), and it is often desirable to deploy a customized, domain-specific model for each such application. In this work, we study domain adaptation of speech recognition models when some text from the target domain is available. We chose this scenario because “paired” data (i.e. speech and its transcription) is unavailable in many practical use cases owing to the cost of transcribing speech manually, whereas text-only data is relatively easy to prepare [4, Section 6].

